Reinforcement learning is one of the primary areas of research in AI today. Most approaches to date have been model-free: agents learn behaviors directly from millions of observed image sequences, without an underlying understanding of the dynamics of the system they're operating in.
Model-based RL, in contrast, attempts to have agents learn how the world behaves in general. Instead of directly mapping observations to actions, this allows an agent to explicitly plan ahead, selecting actions more carefully by "imagining" their long-term outcomes. Model-based approaches have achieved substantial successes, including AlphaGo
, which imagines taking sequences of moves on a fictitious board with the known rules of the game. However, to leverage planning in unknown
environments (such as controlling a robot given only pixels as input), the agent must learn the rules or dynamics
from experience. Because such dynamics models in principle allow for higher efficiency and natural multi-task learning, creating models that are accurate enough for successful planning is a long-standing goal of RL.
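The core idea of planning with a learned model can be sketched in a few lines: roll candidate action sequences forward through the model, score the imagined trajectories, and execute the first action of the best one. The sketch below uses a hypothetical toy one-dimensional dynamics function and exhaustive search over a small action set purely for illustration; PlaNet itself learns a latent dynamics model from pixels and searches over action sequences with a sampling-based optimizer rather than a grid.

```python
from itertools import product

# Hypothetical stand-in for a learned dynamics model: the state moves
# halfway toward the chosen action. In PlaNet the model is a latent-state
# network trained from images; this toy version keeps the example runnable.
def dynamics(state, action):
    return state + 0.5 * (action - state)

def reward(state):
    """Toy reward: stay close to a goal at 1.0."""
    return -abs(state - 1.0)

# Small discrete action set so we can search exhaustively (an assumption
# made for this sketch, not part of PlaNet).
ACTIONS = (-1.0, 0.0, 1.0, 2.0)

def plan(state, horizon=3):
    """Return the first action of the imagined sequence with the highest
    cumulative reward, rolling the model forward over `horizon` steps."""
    best_seq, best_return = None, float("-inf")
    for seq in product(ACTIONS, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = dynamics(s, a)
            total += reward(s)
        if total > best_return:
            best_return, best_seq = total, seq
    return best_seq[0]
```

In practice the agent replans after every step it takes in the real environment (model-predictive control), so an imperfect model only has to be accurate over short imagined horizons.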
The article goes into detail on PlaNet, a collaboration between Google AI and DeepMind: an agent that learns a model of its environment's dynamics and plans forward using that model.
There wasn't a ton of fanfare around this launch, but I think this is a big deal. Also, PlaNet is open source.