
The Alignment Problem

At its most basic, reinforcement learning is a study of learning by trial and error, and the simplest algorithmic form this trial (or groping, if you prefer) takes is what’s called “epsilon-greedy”
Brian Christian • The Alignment Problem
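A note on the mechanics: epsilon-greedy gropes at random with probability epsilon, and otherwise exploits the best estimate so far. Below is a minimal sketch on a multi-armed bandit; the arm means, epsilon value, and step count are illustrative choices, not taken from the book.

```python
import random

def epsilon_greedy_bandit(arm_means, epsilon=0.1, steps=1000, seed=0):
    """Minimal epsilon-greedy on a multi-armed bandit: with probability
    epsilon pull a random arm (trial and error), otherwise pull the arm
    with the highest estimated value so far."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(arm_means[arm], 1.0)  # noisy payoff from that arm
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts

estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print([round(e, 2) for e in estimates], counts)
```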
The system had not been shown a single human game to learn from. But it was, nonetheless, learning by imitation. It was learning to imitate . . . itself.
Brian Christian • The Alignment Problem
the agent’s behavior exhibited a kind of restlessness. Unlike pursuit of the in-game score, which often leads to a fairly stable and consistent set of best practices, for the “maximally curious” agent, the only reward is from this exploratory behavior, and those rewards aren’t stable
Brian Christian • The Alignment Problem
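Why those rewards aren't stable: in the simplest curiosity schemes, a state's payout decays with every visit, so any behavior the agent settles into stops paying. A sketch using a count-based novelty bonus, one simple stand-in for the prediction-error curiosity such agents typically use (the 1/sqrt(count) form is an assumption for illustration):

```python
from collections import defaultdict

def novelty_bonus(visit_counts, state):
    """Count-based curiosity: reward shrinks as 1/sqrt(visits),
    so a state pays less each time the agent returns to it."""
    visit_counts[state] += 1
    return 1.0 / visit_counts[state] ** 0.5

counts = defaultdict(int)
for state in ["A", "A", "A", "B"]:
    print(state, round(novelty_bonus(counts, state), 3))
# A 1.0, A 0.707, A 0.577, B 1.0 -- staying put bleeds reward away;
# the only way to keep earning is to keep moving: hence the restlessness.
```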
The chimpanzee, in contrast, has no such sophisticated model of the human demonstrator. The logic seems to be much simpler: “The human is dumb and isn’t taking the best action to get the food. Whatever. I can see the best way to get the food, so I’ll just do that.”
Brian Christian • The Alignment Problem
The dopamine neurons, he wrote, “discharged a burst of impulses in response to [the cue] but entirely failed to respond to the touch of food.”
Brian Christian • The Alignment Problem
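Schultz's finding is the signature of a temporal-difference prediction error: once the cue reliably predicts the food, the surprise migrates from the reward to the cue. A toy TD(0) simulation of a cue-then-food trial shows the same migration (the learning rate and trial count are arbitrary, and the cue is assumed to arrive unpredictably, so nothing before it carries a prediction):

```python
# One trial: the cue appears, then food (reward 1.0) is delivered.
alpha, trials = 0.3, 40
V = {"cue": 0.0, "food": 0.0}   # learned value of each moment

for t in range(trials):
    burst_at_cue = V["cue"] - 0.0               # error when the unpredicted cue appears
    V["cue"] += alpha * (V["food"] - V["cue"])  # cue -> food step, no reward yet
    burst_at_food = 1.0 - V["food"]             # error when the food arrives
    V["food"] += alpha * burst_at_food
    if t in (0, trials - 1):
        print(f"trial {t:2d}: burst at cue {burst_at_cue:+.2f}, "
              f"at food {burst_at_food:+.2f}")
# trial  0: burst at cue +0.00, at food +1.00  (untrained: the food is the surprise)
# trial 39: burst at cue +1.00, at food +0.00  (trained: the cue is, the food is not)
```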
Caplan noted that while there are no legal penalties for ignoring such a tattoo, there may be legal problems if the doctors let a patient die without having their official DNR paperwork. As he puts it: “The safer course is to do something.”
Brian Christian • The Alignment Problem
eventually comes to learn what score it achieved, but it may never know, win or lose, what the “correct” or “best” actions should have been.
Brian Christian • The Alignment Problem
“Given a reward signal, what behavior will optimize it?,” inverse reinforcement learning (or “IRL”) asks the reverse: “Given the observed behavior, what reward signal, if any, is being optimized?”
Brian Christian • The Alignment Problem
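In a small enough world, the IRL question can be answered by brute force: propose candidate reward functions, solve each one forward, and keep those under which the observed behavior would have been optimal. A sketch on an invented three-state chain (the states, action set, candidate reward grid, and discount are all illustrative, not from the book):

```python
import itertools

GAMMA, N_STATES = 0.9, 3
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Deterministic chain: move left/right, clipped at the ends."""
    return min(N_STATES - 1, max(0, state + ACTIONS[action]))

def optimal_policy(reward):
    """Forward RL: value iteration, then the greedy policy for this reward."""
    V = [0.0] * N_STATES
    for _ in range(200):
        V = [max(reward[step(s, a)] + GAMMA * V[step(s, a)] for a in ACTIONS)
             for s in range(N_STATES)]
    # Ties break toward "left", so only rewards that make the observed
    # behavior strictly optimal survive the filter below.
    return [max(ACTIONS, key=lambda a: reward[step(s, a)] + GAMMA * V[step(s, a)])
            for s in range(N_STATES)]

observed = ["right", "right", "right"]   # the behavior we watched

# IRL inverts the forward loop: which rewards explain what we saw?
consistent = [r for r in itertools.product([0, 1], repeat=N_STATES)
              if optimal_policy(r) == observed]
print("rewards consistent with the observed behavior:", consistent)
# -> [(0, 0, 1)]: the agent was, apparently, optimizing for the rightmost state.
```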
the line between critic and artist can be a thin one.
Brian Christian • The Alignment Problem