[Major Survey] Deep Reinforcement Learning for Robotic Manipulation - Zhihu
You can do that by explicitly trying to understand how the world works (“model-based” RL), or just by honing your instincts (“model-free” RL).
Brian Christian • The Alignment Problem
The system would then attempt to refine its inference about the reward function based on the human’s feedback, and then use this inferred reward (as in typical reinforcement learning) to find behaviors that performed well by its lights.
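The loop described in this passage (refine a reward estimate from human feedback, then pick behaviors that score well under it) can be sketched roughly as follows. This is a minimal illustration, not the method from the book or any particular paper. It assumes a linear reward over two made-up features (progress, collisions), a Bradley-Terry preference model for the feedback, and a hypothetical `simulated_human` function standing in for a real person. The final "find behaviors" step is reduced to picking the best of four fixed candidates rather than running a full reinforcement learning algorithm.

```python
import itertools
import math


def reward(w, features):
    """Linear reward: dot product of weights and behavior features."""
    return sum(wi * fi for wi, fi in zip(w, features))


def infer_reward_weights(comparisons, n_features, lr=0.5, epochs=500):
    """Fit reward weights from (preferred, rejected) pairs.

    Assumes a Bradley-Terry model: P(a preferred over b) = sigmoid(r(a) - r(b)).
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in comparisons:
            diff = [p - r for p, r in zip(preferred, rejected)]
            score = sum(wi * di for wi, di in zip(w, diff))
            # Probability the current estimate assigns to the human's choice.
            p_choice = 1.0 / (1.0 + math.exp(-score))
            # Gradient ascent step on the log-likelihood of that choice.
            w = [wi + lr * (1.0 - p_choice) * di for wi, di in zip(w, diff)]
    return w


def simulated_human(a, b, hidden_w=(1.0, -2.0)):
    """Hypothetical stand-in for a person: prefers the behavior with the
    higher hidden reward. The weights here are invented for the example."""
    if reward(hidden_w, a) >= reward(hidden_w, b):
        return (a, b)
    return (b, a)


# Candidate behaviors as (progress, collisions) feature pairs; values are made up.
behaviors = [(0.2, 0.0), (0.9, 0.8), (0.7, 0.1), (0.4, 0.5)]

# Step 1: collect feedback on every pair of behaviors.
comparisons = [simulated_human(a, b)
               for a, b in itertools.combinations(behaviors, 2)]

# Step 2: refine the inferred reward from that feedback.
w_hat = infer_reward_weights(comparisons, n_features=2)

# Step 3: choose the behavior that does best under the inferred reward.
best = max(behaviors, key=lambda f: reward(w_hat, f))
print("inferred weights:", w_hat)
print("chosen behavior:", best)
```

The system never sees `hidden_w`; it only sees which of two behaviors was preferred. In a real setting the last step would be a reinforcement learning run against the inferred reward, and the feedback and optimization steps would alternate rather than run once.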