[Major Survey] Deep Reinforcement Learning for Robotic Manipulation - Zhihu
in this sense, the errors scale linearly with the scope of the task. Imitation learning is far, far worse, they discovered. Because a single mistake could cause the system to see things it had never prepared for before, once it makes a first mistake, all bets are off.
Brian Christian • The Alignment Problem
If intelligence is, as computer scientist John McCarthy famously said, “the computational part of the ability to achieve goals in the world,” then reinforcement learning offers a strikingly general toolbox for doing so.
Brian Christian • The Alignment Problem
At its most basic, reinforcement learning is a study of learning by trial and error, and the simplest algorithmic form this trial (or groping, if you prefer) takes is what’s called “epsilon-greedy”
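To make the epsilon-greedy idea concrete, here is a minimal sketch on a multi-armed bandit. It is an illustration under stated assumptions, not something taken from the book: the function names `epsilon_greedy_action` and `run_bandit`, the Gaussian reward model, and the sample-average value update are all hypothetical choices made for this example. The rule itself is simple: with probability epsilon try a random action, otherwise take the action that currently looks best.

```python
import random


def epsilon_greedy_action(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploit: pick the action with the highest current estimate
    # (ties go to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Learn action-value estimates by trial and error on a toy bandit.

    Assumption for illustration: each arm pays a Gaussian reward with
    the given mean and unit standard deviation.
    """
    rng = random.Random(seed)
    q = [0.0] * len(true_means)    # estimated value of each arm
    counts = [0] * len(true_means)  # how often each arm was pulled
    for _ in range(steps):
        a = epsilon_greedy_action(q, epsilon, rng)
        reward = rng.gauss(true_means[a], 1.0)
        counts[a] += 1
        # Incremental sample average: nudge the estimate toward the new reward.
        q[a] += (reward - q[a]) / counts[a]
    return q, counts


if __name__ == "__main__":
    estimates, pulls = run_bandit([0.1, 0.5, 0.9])
    print("estimated values:", [round(v, 2) for v in estimates])
    print("pull counts:", pulls)
```

With epsilon set to 0 the agent never explores and can lock onto whichever arm happened to pay off first; with epsilon set to 1 it never uses what it has learned. Small values such as 0.1 are a common compromise, though the right setting depends on the problem.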