【重磅综述】用于机器人操作的深度强化学习- 知乎

zhuanlan.zhihu.com

RelatedInsightsHighlights

in this sense, the errors scale linearly with the scope of the task. Imitation learning is far, far worse, they discovered. Because a single mistake could cause the system to see things it had never prepared for before, once it makes a first mistake, all bets are off.

Brian Christian • The Alignment Problem

它旨在将仿真环境中 (源域) 训练得到的策略在现实环境中 (目标域) 进行再适应。这种方法背后的假设是,不同域之间具有相同的特征,智能体在一个域中学习得到的行为和特征能够帮助其在另一个域中学习。

小米技术 • Article

If intelligence is, as computer scientist John McCarthy famously said, “the computational part of the ability to achieve goals in the world,”88 then reinforcement learning offers a strikingly general toolbox for doing so.

Brian Christian • The Alignment Problem

At its most basic, reinforcement learning is a study of learning by trial and error, and the simplest algorithmic form this trial (or groping, if you prefer) takes is what’s called “epsilon-greedy”