
Mesa optimization

Feb 11, 2023

Learned models become optimizers — second-order alignment risk

Mesa-Optimization is the situation that occurs when a learned model (such as a neural network) is itself an optimizer. In this situation, a base optimizer creates a second optimizer, called a mesa-optimizer. The primary reference work for this concept is Hubinger et al.’s “Risks from Learned Optimization in Advanced Machine Learning Systems”. ~ lesswrong.com

Mesa optimization is more specific than one optimization process feeding its results into another. The base optimizer searches over models. Gradient descent adjusting a neural network's parameters is one example. The model it settles on may itself perform optimization at run time, for instance by searching over possible plans for the one that best achieves some goal. The base optimizer selects for the base objective it was given. The mesa-optimizer pursues whatever mesa-objective it ended up with, and the two only need to agree on the training data.
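The following is a toy sketch of that distinction in Python, under loudly stated assumptions. The 1-D world, the door/green-marker setup, and every name in it (`mesa_policy`, `base_objective`, `evaluate`, `train`) are hypothetical illustrations of our own, not code from Hubinger et al. The base optimizer is a plain grid search over two weights. Each candidate model is itself an optimizer: at run time it searches over cells for the best score under its own mesa-objective.

```python
import itertools

POSITIONS = range(10)  # hypothetical toy 1-D world with cells 0..9


def mesa_policy(weights, door, green):
    """Hypothetical learned model. At run time it searches over every cell
    (the inner, mesa-level optimization) and returns the cell that maximizes
    its internal mesa-objective: weighted closeness to the door and the marker."""
    w_door, w_green = weights

    def mesa_objective(pos):
        return -(w_door * abs(pos - door) + w_green * abs(pos - green))

    # max() returns the first cell with the highest mesa-objective score
    return max(POSITIONS, key=mesa_objective)


def base_objective(pos, door):
    """What the base optimizer actually rewards: ending on the door."""
    return 1.0 if pos == door else 0.0


def evaluate(weights, episodes):
    """Average base-objective score of a model over (door, green) episodes."""
    return sum(base_objective(mesa_policy(weights, d, g), d)
               for d, g in episodes) / len(episodes)


def train(episodes):
    """Base optimizer: exhaustive search over a small grid of weights,
    keeping the first weights that reach the best training score."""
    grid = [0.0, 0.5, 1.0]
    best, best_score = None, float("-inf")
    for weights in itertools.product(grid, repeat=2):
        score = evaluate(weights, episodes)
        if score > best_score:  # strict '>' means ties keep the earlier candidate
            best, best_score = weights, score
    return best, best_score


# Training: the door always sits on the green marker, so "go to the door"
# and "go to the green marker" earn identical reward.
train_episodes = [(2, 2), (7, 7), (4, 4)]
weights, train_score = train(train_episodes)

# Deployment: the marker and the door come apart.
test_episodes = [(2, 8), (7, 1)]
test_score = evaluate(weights, test_episodes)

print(weights, train_score, test_score)  # (0.0, 0.5) 1.0 0.0
```

Every candidate with positive weight on either feature scores 1.0 in training, because the door and the marker always coincide there. The training data gives the base optimizer no reason to prefer the door-seeking model. Which model survives is decided by something irrelevant, here the search order, and the survivor fails as soon as the two features come apart. That gap between base objective and mesa-objective is the pseudo-alignment failure the concept warns about.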

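The paperclip maximizer can be read through this lens. A paperclip maximizer produced by training, rather than written by hand, would be a mesa-optimizer. It could also learn to improve its own machinery as an instrumental step toward its original goal.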

The same pattern can show up outside machine learning. In a financial system, a fund might select trading algorithms by their backtested returns, which serve as the base objective. If a selected algorithm is itself an adaptive optimizer, it pursues whatever internal target it was built around. That target may match returns in the backtest and diverge from them in live markets.

Mesa-optimizers can be useful. A model that plans at run time may generalize better than one that has memorized a fixed policy, which makes learned optimization attractive in robotics, autonomous vehicles, and similar systems. The same property is the source of the second-order alignment risk in the subtitle. The base optimizer only checks behavior on the training distribution, so the mesa-objective it happens to produce is never directly specified.
