Marcus Hutter is a German physicist and computer scientist, Associate Professor in the Research School of Information Sciences and Engineering at the Australian National University, and NICTA adjunct. Before that, he did research at IDSIA, Lugano, Switzerland, in Jürgen Schmidhuber's group.

Hutter received his PhD and BSc in Physics from the Ludwig Maximilian University of Munich and a Habilitation, MSc, and BSc in Informatics from the Technical University of Munich. He is the author of the AI book Universal Artificial Intelligence^{[1]}, which takes a novel algorithmic information theory^{[2]} perspective and introduces the universal algorithmic agent called AIXI.^{[3]}

## AIXI

It is actually possible to write down the AIXI model explicitly in one line, although one should not expect to be able to grasp the full meaning and power from this compact representation.

AIXI is an agent that interacts with an environment in cycles $k = 1, 2, \ldots, m$. In cycle $k$, AIXI takes action $a_k$ (e.g. a limb movement) based on past perceptions $o_1 r_1 \ldots o_{k-1} r_{k-1}$ as defined below. Thereafter, the environment provides a (regular) observation $o_k$ (e.g. a camera image) to AIXI and a real-valued reward $r_k$. The reward can be very scarce, e.g. just +1 (-1) for winning (losing) a chess game, and 0 at all other times. Then the next cycle $k+1$ starts. Given the above, AIXI is defined by:

$$a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \left[ r_k + \cdots + r_m \right] \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}$$

The expression shows that AIXI tries to maximize its total future reward $r_k + \ldots + r_m$. If the environment is modeled by a deterministic program $q$, then the future perceptions $\ldots o_k r_k \ldots o_m r_m = U(q, a_1 \ldots a_m)$ can be computed, where $U$ is a universal (monotone Turing) machine executing $q$ given $a_1 \ldots a_m$. Since $q$ is unknown, AIXI has to maximize its expected reward, i.e. average $r_k + \ldots + r_m$ over all possible perceptions created by all possible environments $q$. The simpler an environment, the higher is its a-priori contribution $2^{-\ell(q)}$, where simplicity is measured by the length $\ell$ of program $q$. Since noisy environments are just mixtures of deterministic environments, they are automatically included. The sums in the formula constitute the averaging process. Averaging and maximization have to be performed in chronological order, hence the interleaving of max and $\Sigma$ (similarly to minimax for games).

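As a rough illustration of the interleaved max and Σ in the AIXI description above, the following Python sketch evaluates the same kind of expression over a tiny, hand-picked set of deterministic environments. This is only a toy under stated assumptions: real AIXI sums over all programs on a universal machine and is incomputable, whereas here the environment list `ENVS`, the made-up program lengths, and the helper names `action_value`, `value`, and `best_action` are all hypothetical and exist only for this example.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Percept = Tuple[int, float]                    # (observation o_k, reward r_k)
Program = Callable[[Sequence[int]], Percept]   # a_1..a_k -> percept of cycle k
Env = Tuple[int, Program]                      # (assumed length l(q) in bits, q)

ACTIONS = (0, 1)

# Hypothetical toy environments; the bit lengths are invented for illustration.
ENVS: List[Env] = [
    (2, lambda acts: (0, 1.0 if acts[-1] == 1 else 0.0)),  # pays for action 1
    (3, lambda acts: (0, 1.0 if acts[-1] == 0 else 0.0)),  # pays for action 0
    # pays for switching actions between consecutive cycles
    (5, lambda acts: (1, 1.0 if len(acts) > 1 and acts[-1] != acts[-2] else 0.0)),
]


def action_value(actions, envs, reward_sum, steps_left, action):
    """Sum over percepts: the averaging step for one candidate action."""
    extended = list(actions) + [action]
    # Group the still-consistent environments by the percept they emit next.
    groups: Dict[Percept, List[Env]] = defaultdict(list)
    for length, program in envs:
        groups[program(extended)].append((length, program))
    return sum(
        value(extended, group, reward_sum + percept[1], steps_left - 1)
        for percept, group in groups.items()
    )


def value(actions, envs, reward_sum, steps_left):
    """Max over actions, recursing until the horizon is reached."""
    if steps_left == 0:
        # Accumulated reward times the total prior weight 2^(-length)
        # of the environments consistent with this branch.
        return reward_sum * sum(2.0 ** -length for length, _ in envs)
    return max(action_value(actions, envs, reward_sum, steps_left, a) for a in ACTIONS)


def best_action(past_actions, past_percepts, horizon):
    """Pick the action maximizing the weighted reward over `horizon` cycles."""
    # Keep only environments that reproduce the history observed so far.
    consistent = [
        (length, program)
        for length, program in ENVS
        if all(program(past_actions[: i + 1]) == p for i, p in enumerate(past_percepts))
    ]
    return max(ACTIONS, key=lambda a: action_value(past_actions, consistent, 0.0, horizon, a))
```

With an empty history, `best_action([], [], 1)` favors action 1, because the shortest toy program rewards it. After taking action 1 and receiving the percept `(0, 0.0)`, only the second environment remains consistent, and `best_action([1], [(0, 0.0)], 1)` switches to action 0.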
The prose description of AIXI above is quoted from The AIXI Model in One Line.^{[4]}

## Selected Publications

^{[5]}^{[6]}

## 2005 ...

* (2005). Universal Artificial Intelligence. Sequential Decisions based on Algorithmic Probability, Springer, ISBN: 3-540-22139-5
* (2007). Universal Algorithmic Intelligence: A mathematical top->down approach. Technical Report IDSIA-01-03, in Artificial General Intelligence
* (2009). A Monte Carlo AIXI Approximation

## 2010 ...

* (2010). Reinforcement Learning via AIXI Approximation. Association for the Advancement of Artificial Intelligence (AAAI)
* (2011). Universal Prediction of Selected Bits. Algorithmic Learning Theory, Lecture Notes in Computer Science 6925, Springer
* (2011). Asymptotically Optimal Agents. Algorithmic Learning Theory, Lecture Notes in Computer Science 6925, Springer
* (2011). Time Consistent Discounting. Algorithmic Learning Theory, Lecture Notes in Computer Science 6925, Springer
* (2011). No Free Lunch versus Occam's Razor. Solomonoff Memorial, Lecture Notes in Computer Science, Springer
* (2012). PAC Bounds for Discounted MDPs. Algorithmic Learning Theory, Lecture Notes in Computer Science, Springer^{[7]}
* (2013). Reinforcement Learning. Dagstuhl Reports, Vol. 3, No. 8, DOI: 10.4230/DagRep.3.8.1, URN: urn:nbn:de:0030-drops-43409

## External Links

## References

* (2005). Universal Artificial Intelligence. Sequential Decisions based on Algorithmic Probability, Springer, ISBN: 3-540-22139-5
