
The Alignment Problem

“stepwise” baselines. Maybe certain actions are unavoidably high-impact based on the goal you’re setting out to achieve.
Brian Christian • The Alignment Problem
Put differently, we’ve been looking for human wisdom in the wrong place. Perhaps it is not in the mind of the human decider, but embodied in the standards and practices that determined exactly which pieces of information to put on their desk. The rest is just math—or, at any rate, should be.
Brian Christian • The Alignment Problem
Because of this resilience, the knowledge-seeking agent “may therefore be the most suitable agent for an AGI in our own world, a place that allows self-modifications and contains many ways to deceive oneself.”
Brian Christian • The Alignment Problem
For instance, he says, we might develop an index of “twenty billion” or so metrics that describe the world—“the air pressure in Dhaka, the average night-time luminosity at the South Pole, the rotational speed of Io, and the closing numbers of the Shanghai stock exchange”—and design an agent
Brian Christian • The Alignment Problem
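The highlight above gestures at a “low-impact” agent: score an action not only by task reward but by how far it pushes a huge battery of world descriptors away from a do-nothing baseline. A minimal sketch of that idea, with made-up metric values and a simple weighted absolute-deviation penalty (the function names, weights, and penalty form are my assumptions, not the book’s):

```python
import numpy as np

def impact_penalty(metrics_after, metrics_baseline, weights=None):
    """Weighted absolute deviation of a battery of world-state metrics
    from their 'agent did nothing' baseline values."""
    diff = np.abs(np.asarray(metrics_after) - np.asarray(metrics_baseline))
    if weights is None:
        weights = np.ones_like(diff)
    return float(np.dot(weights, diff))

# Four toy stand-ins for the billions of descriptors in the quote:
# Dhaka air pressure (hPa), South Pole luminosity, Io's rotation, a Shanghai close.
baseline = [1005.2, 0.03, 17.334, 3210.5]  # predicted values if the agent does nothing
after    = [1005.2, 0.03, 17.334, 3195.0]  # the agent's action moved the market

task_reward = 10.0
lam = 0.1  # how much low impact matters relative to the task
print(task_reward - lam * impact_penalty(after, baseline))  # 8.45
```

The hard part in practice is the baseline: deviations must be measured against what the world would have looked like had the agent done nothing, which is exactly the counterfactual the earlier “stepwise” baselines highlight is about.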
The correlation that the rule-based system had learned, in other words, was real. Asthmatics really were, on average, less likely to die from pneumonia than the general population. But this was precisely because of the elevated level of care they received.
Brian Christian • The Alignment Problem
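A tiny simulation can make the mechanism vivid. The numbers below are entirely synthetic, chosen only to mirror the confound described above: asthmatics arrive at higher underlying risk but receive more aggressive care, so their observed mortality comes out lower.

```python
import random

random.seed(0)

def simulate_patient(asthma):
    # Hypothetical rates, illustrative only: asthmatics start at higher risk,
    # but hospital policy routes them to aggressive care (e.g. direct ICU
    # admission), which more than compensates.
    base_risk = 0.15 if asthma else 0.10
    aggressive_care = asthma
    risk = base_risk * (0.5 if aggressive_care else 1.0)
    return random.random() < risk  # True = patient died

def mortality(asthma, n=100_000):
    return sum(simulate_patient(asthma) for _ in range(n)) / n

print(f"asthmatic mortality:     {mortality(True):.3f}")   # ~0.075
print(f"non-asthmatic mortality: {mortality(False):.3f}")  # ~0.100
```

A model fit to data like this would correctly report that asthma predicts lower mortality, and would be dangerously wrong as a triage rule: acting on it withholds exactly the extra care that produced the correlation.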
models overly shaped by clinical intuition.
Brian Christian • The Alignment Problem
Perhaps this will bring out our better selves, some may argue; humans do, it’s true, tend to act more virtuously when they feel they’re being watched.
Brian Christian • The Alignment Problem
the key insight is that we should strive to reward states of the world, not actions of our agent.
Brian Christian • The Alignment Problem
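A toy contrast shows why this matters (my illustration, not the book’s): pay a cleaning robot per cleaning action and the optimal policy manufactures messes to clean up; pay it per timestep the room is actually clean and leaving a tidy room alone wins.

```python
def action_based_reward(action):
    # Rewards the behavior itself: +1 per cleaning action performed.
    return 1.0 if action == "clean" else 0.0

def state_based_reward(state):
    # Rewards the state of the world: +1 per timestep the room is mess-free.
    return 1.0 if state["messes"] == 0 else 0.0

def step(state, action):
    s = dict(state)
    if action == "clean" and s["messes"] > 0:
        s["messes"] -= 1
    elif action == "make_mess":
        s["messes"] += 1
    return s

def rollout(policy, horizon=10):
    state = {"messes": 0}
    a_total = s_total = 0.0
    for t in range(horizon):
        action = policy(t, state)
        state = step(state, action)
        a_total += action_based_reward(action)
        s_total += state_based_reward(state)
    return a_total, s_total

tidy = lambda t, s: "wait"                                   # leaves the clean room alone
gamer = lambda t, s: "make_mess" if t % 2 == 0 else "clean"  # manufactures its own work

print(rollout(tidy))   # (0.0, 10.0): no action reward, full state reward
print(rollout(gamer))  # (5.0,  5.0): the action-based reward prefers the mess-maker
```

Under the action-based reward the mess-making policy dominates doing nothing; under the state-based reward the incentive to create messes disappears.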
“invoking the principle of not choosing an irreversible path when faced with uncertainty.”
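One simple way to make that principle operational (a sketch under my own assumptions, for a small, fully known, deterministic environment, not a method from the book): before committing to an action, check whether the current state would still be reachable afterward, and prefer actions that keep a way back.

```python
from collections import deque

def reachable(graph, start):
    """All states reachable from `start` in a deterministic transition graph."""
    seen, frontier = {start}, deque([start])
    while frontier:
        s = frontier.popleft()
        for nxt in graph.get(s, {}).values():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def reversible_actions(graph, state):
    """Keep only actions after which the current state is still reachable,
    i.e. actions the agent could in principle undo."""
    return [action for action, nxt in graph.get(state, {}).items()
            if state in reachable(graph, nxt)]

# Toy world: from 'vase_intact' the agent can polish (a harmless loop) or
# break the vase; nothing leads back from 'vase_broken'.
graph = {
    "vase_intact": {"polish": "vase_intact", "break": "vase_broken"},
    "vase_broken": {"wait": "vase_broken"},
}
print(reversible_actions(graph, "vase_intact"))  # ['polish'] -- 'break' is filtered out
```

Real environments are neither small nor fully known, so practical proposals estimate reachability or penalize the loss of attainable options rather than enumerating states.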