An AI agent was incentivized to be curious and explore the game, and for the first time an AI surpassed human performance on ‘Montezuma’s Revenge’.
Instead of rewarding the agent based on something like game score, or giving it information about the underlying state of the game, the agent was rewarded for visiting ‘unfamiliar states’.
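The idea above can be sketched with a minimal, count-based novelty bonus: pay the agent more for states it has rarely visited, and let the bonus decay as a state becomes familiar. (The actual Montezuma’s Revenge result used a more sophisticated novelty signal; the class and scaling rule below are illustrative assumptions, not the published method.)

```python
from collections import defaultdict
import math

class NoveltyBonus:
    """Intrinsic reward for visiting unfamiliar states (count-based sketch)."""

    def __init__(self, scale=1.0):
        self.counts = defaultdict(int)  # visit count per state
        self.scale = scale

    def reward(self, state):
        # Increment the visit count, then pay a bonus that shrinks
        # as the state becomes familiar: scale / sqrt(count).
        self.counts[state] += 1
        return self.scale / math.sqrt(self.counts[state])

bonus = NoveltyBonus()
print(bonus.reward("room_1"))  # first visit: full bonus (1.0)
print(bonus.reward("room_1"))  # repeat visit: smaller bonus
print(bonus.reward("room_2"))  # brand-new state: full bonus again
```

Adding this bonus to (or using it in place of) the game score steers the agent toward exploration rather than pure score maximization.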
I can’t help but relate this to real life. We often get trapped in incentive structures set up by external authorities – marks given to students by teachers, promotions given by managers, and so on – and orient our behavior around those rewards instead of maximizing for curiosity and exploration.
I wonder how many of us sub-optimize our potential by removing uncertainty and exploration from our lives.