What are the current mainstream approaches to applying deep reinforcement learning in robotics? - Zhihu
You can do it by learning how much reward certain states or actions can bring (“value” learning), or by simply knowing which strategies tend on the whole to do better than which others (“policy” learning).
Brian Christian • The Alignment Problem
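The value-versus-policy distinction in the quote can be sketched on a toy three-armed bandit. This is a hypothetical setup chosen for illustration: the payout probabilities in `ARM_PROBS` and every function name below are invented. `value_learning` estimates how much reward each action brings and acts on those estimates (ε-greedy with sample averages). `policy_learning` never estimates per-action values. It nudges softmax action preferences toward actions whose reward beats a running average, using a gradient-bandit update, which is a simple form of policy-gradient learning. Both are minimal sketches and not robotics-grade algorithms.

```python
import math
import random

# Hypothetical 3-armed bandit: arm i pays reward 1 with probability ARM_PROBS[i], else 0.
ARM_PROBS = [0.2, 0.5, 0.8]


def pull(arm, rng):
    return 1.0 if rng.random() < ARM_PROBS[arm] else 0.0


def value_learning(steps=5000, epsilon=0.1, seed=0):
    """Value learning: estimate how much reward each action brings, then act on the estimates."""
    rng = random.Random(seed)
    q = [0.0] * len(ARM_PROBS)       # estimated value of each action
    counts = [0] * len(ARM_PROBS)
    for _ in range(steps):
        if rng.random() < epsilon:   # explore occasionally
            arm = rng.randrange(len(ARM_PROBS))
        else:                        # otherwise pick the action with the highest estimate
            arm = max(range(len(q)), key=lambda a: q[a])
        r = pull(arm, rng)
        counts[arm] += 1
        q[arm] += (r - q[arm]) / counts[arm]  # incremental sample average
    return q


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def policy_learning(steps=5000, lr=0.1, seed=0):
    """Policy learning: adjust action probabilities directly, with no per-action value estimates."""
    rng = random.Random(seed)
    prefs = [0.0] * len(ARM_PROBS)   # softmax preferences define the policy
    baseline, t = 0.0, 0             # running average of all rewards, used as a baseline
    for _ in range(steps):
        probs = softmax(prefs)
        arm = rng.choices(range(len(probs)), weights=probs)[0]
        r = pull(arm, rng)
        # Gradient-bandit update: raise the chosen action's preference if r beat the
        # baseline, lower it otherwise, and move the other actions the opposite way.
        for a in range(len(prefs)):
            indicator = 1.0 if a == arm else 0.0
            prefs[a] += lr * (r - baseline) * (indicator - probs[a])
        t += 1
        baseline += (r - baseline) / t
    return softmax(prefs)
```

With the seeds fixed, both learners should end up favoring arm 2, the arm with the highest payout probability. The value learner reaches it through its `q` estimates. The policy learner reaches it through the action probabilities it returns.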
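We can classify the available RL algorithms by whether the agent knows a model of the environment (model-based vs. model-free). Knowing the model tells the agent the state-transition probability matrix and the future rewards in advance.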
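When the model is known, the agent can plan without sampling the environment at all. Value iteration is a minimal sketch of this. It assumes a hypothetical three-state MDP whose transition matrix `P` and reward function are given to the agent up front (all numbers are invented for illustration). The algorithm repeatedly applies the Bellman optimality backup V(s) ← max_a Σ_{s'} P(s'|s,a)[r(s,a,s') + γV(s')] until the values stop changing, then reads off the greedy policy.

```python
# Hypothetical three-state MDP, fully known to the agent (model-based setting).
GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2
LEFT, RIGHT = 0, 1
GOAL = 2  # absorbing goal state

# P[a][s][s2] = probability of landing in s2 after taking action a in state s.
P = [
    [[1.0, 0.0, 0.0],   # LEFT from state 0
     [1.0, 0.0, 0.0],   # LEFT from state 1
     [0.0, 0.0, 1.0]],  # LEFT from the goal (absorbing)
    [[0.1, 0.9, 0.0],   # RIGHT from state 0: succeeds with probability 0.9
     [0.0, 0.1, 0.9],   # RIGHT from state 1
     [0.0, 0.0, 1.0]],  # RIGHT from the goal (absorbing)
]


def reward(s, a, s2):
    """Known reward function: +1 for entering the goal state, 0 otherwise."""
    return 1.0 if s2 == GOAL and s != GOAL else 0.0


def q_value(v, s, a):
    """Expected one-step return: sum over s2 of P(s2|s,a) * (r + GAMMA * V(s2))."""
    return sum(
        P[a][s][s2] * (reward(s, a, s2) + GAMMA * v[s2]) for s2 in range(N_STATES)
    )


def value_iteration(tol=1e-10):
    """Plan with the known model: apply Bellman optimality backups until convergence."""
    v = [0.0] * N_STATES
    while True:
        new_v = [max(q_value(v, s, a) for a in range(N_ACTIONS)) for s in range(N_STATES)]
        delta = max(abs(new - old) for new, old in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = [max(range(N_ACTIONS), key=lambda a: q_value(v, s, a)) for s in range(N_STATES)]
    return v, policy


values, policy = value_iteration()
print(values, policy)
```

In this toy MDP the greedy policy moves RIGHT from both non-goal states. Solving the backup for state 1 by hand gives V(1) = 0.9 / (1 − 0.1·0.9) ≈ 0.989. A model-free method would instead have to estimate these values from sampled transitions, because `P` and `reward` would not be available to it.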
[Major Survey] Deep Reinforcement Learning for Robotic Manipulation - Zhihu
Third, we ideally want to be learning not just after the fact but as we go along.
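In RL terms, learning "as we go along" is what temporal-difference (TD) learning provides. Monte Carlo methods wait until an episode ends and update toward the final return. TD(0) instead updates each state's value estimate at every step, toward the observed reward plus the current estimate of the next state's value. The sketch below assumes a hypothetical five-state random walk (start in the middle, step left or right with equal probability, reward 1 only for exiting on the right, no discounting). For this walk the true state values are 1/6, 2/6, ..., 5/6.

```python
import random

N = 5  # non-terminal states are 1..N; states 0 and N + 1 are terminal


def td0_random_walk(episodes=10000, alpha=0.01, seed=0):
    """TD(0) value estimation on a hypothetical symmetric random walk."""
    rng = random.Random(seed)
    v = [0.0] + [0.5] * N + [0.0]  # terminal values stay fixed at 0
    for _ in range(episodes):
        s = (N + 1) // 2  # start in the middle state
        while s not in (0, N + 1):
            s2 = s + rng.choice((-1, 1))
            r = 1.0 if s2 == N + 1 else 0.0
            # Update immediately from the bootstrapped estimate v[s2],
            # rather than waiting for the episode's final outcome.
            v[s] += alpha * (r + v[s2] - v[s])
            s = s2
    return v[1:N + 1]


print(td0_random_walk())
```

A small constant step size `alpha` trades learning speed for lower noise in the final estimates. With the settings above, the learned values should land close to k/6 for state k.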