core components of Deep RL that enabled successes like AlphaGo: self-play and look-ahead planning.
Self-play is the idea that an agent can improve its gameplay by playing against slightly different versions of itself because it’ll progressively encounter more challenging situations. In the space of LLMs, it is almost certain that the largest portion o...
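To make the self-play idea concrete, here is a minimal sketch on a toy game rather than anything resembling AlphaGo. Everything in it is an assumption chosen for illustration: the game (a single-pile Nim where players remove 1 to 3 stones and whoever takes the last stone wins), the tabular value estimates, and the names `self_play`, `choose`, and `sync_every`. The "slightly different version of itself" is modeled as a frozen copy of the learner's table that is refreshed every few hundred games.

```python
import random
from collections import defaultdict

ACTIONS = (1, 2, 3)  # stones a player may remove on one turn (toy rule, assumed)


def legal(pile):
    """Moves that do not remove more stones than remain."""
    return [a for a in ACTIONS if a <= pile]


def choose(q, pile, epsilon, rng):
    """Epsilon-greedy move selection from a table of value estimates."""
    moves = legal(pile)
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda a: q[(pile, a)])


def self_play(episodes=5000, start_pile=10, sync_every=200, epsilon=0.2, seed=0):
    """Train a value table by playing against periodically refreshed copies of itself."""
    rng = random.Random(seed)
    q = defaultdict(float)        # learner's estimate of the outcome for (pile, move)
    counts = defaultdict(int)     # visit counts for the running mean
    opponent = defaultdict(float, q)

    for episode in range(episodes):
        if episode % sync_every == 0:
            # The opponent is a slightly older snapshot of the learner.
            opponent = defaultdict(float, q)

        pile = start_pile
        learner_to_move = rng.random() < 0.5
        learner_won = False
        history = []              # the learner's own (pile, move) pairs this game

        while pile > 0:
            if learner_to_move:
                move = choose(q, pile, epsilon, rng)
                history.append((pile, move))
            else:
                move = choose(opponent, pile, epsilon, rng)
            pile -= move
            learner_won = learner_to_move   # whoever moved last took the last stone
            learner_to_move = not learner_to_move

        outcome = 1.0 if learner_won else -1.0
        for key in history:
            counts[key] += 1
            q[key] += (outcome - q[key]) / counts[key]   # running mean of outcomes

    return q
```

Because the opponent keeps changing, the averaged outcomes are only a rough estimate; the sketch is meant to show the training loop's shape (play, record, update, refresh the opponent), not a tuned algorithm.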
After his Super Mario Bros. agent has played the game long enough, “It just starts to stay in the beginning. . . . Because there is no reward anywhere—everywhere error is very, very low—so it just learns to not go anywhere.”
Brian Christian • The Alignment Problem
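The behavior in the quote can be illustrated with a small sketch, assuming the agent's only reward is the squared prediction error of a learned forward model (the quote mentions error and reward but does not spell out the formula). The five-state chain, the learning rate, and the function name `curiosity_rewards` are hypothetical stand-ins, not the Super Mario Bros. setup. As the model's predictions improve, the reward available anywhere in the environment shrinks toward zero.

```python
def curiosity_rewards(n_states=5, sweeps=30, lr=0.5):
    """Return the total prediction-error reward collected on each sweep of a toy chain."""
    # Forward model: one predicted next-state value per state, initially all zero.
    predicted_next = [0.0] * n_states
    totals = []

    for _ in range(sweeps):
        total = 0.0
        for state in range(n_states):
            # Deterministic toy dynamics: step right, stopping at the last state.
            actual_next = float(min(state + 1, n_states - 1))
            error = actual_next - predicted_next[state]
            total += error ** 2                  # intrinsic reward = squared error
            predicted_next[state] += lr * error  # the model learns from the surprise
        totals.append(total)

    return totals


if __name__ == "__main__":
    rewards = curiosity_rewards()
    print("first sweep:", rewards[0])
    print("last sweep:", rewards[-1])
```

Once every transition is predicted almost perfectly, no state offers more reward than any other, which is one way to read the agent's decision to stay put.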
If intelligence is, as computer scientist John McCarthy famously said, “the computational part of the ability to achieve goals in the world,” then reinforcement learning offers a strikingly general toolbox for doing so.