Like Humans, Insecure Robots Are Easier to Control
The overly confident robots? They deactivated their own off-switches.
Insecurity helps ensure caution in humans, and now researchers at the University of California, Berkeley and OpenAI have found that it’s also essential to keeping robots safe and predictable. Published last week, a new paper suggests that we very much need robots to trust our observations more than they trust their own, at least most of the time.
It’s an insight the researchers think could prevent further A.I. debacles like Microsoft’s racist Twitter bot and Facebook’s epidemic of fake news.
To test the concept of confidence and insecurity in robots, the team introduced the “Off-Switch Game,” in which a robot is assigned a task along with an external switch that a human can use to turn the robot off before it finishes. The twist is that the robot can choose to deactivate this switch, making itself immune to the experimenters’ attempts to stop it. Highly confident robots learned to deactivate the switch and disregard human input, and as a result the experimenters couldn’t tell the robot when it was about to do something really stupid.
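To make that trade-off concrete, here is a minimal sketch in Python. It is not the paper’s formal model; the simulate function, the Gaussian noise, and the numbers are all illustrative assumptions. A robot holds a noisy estimate of how valuable its planned action is, and we compare an overconfident policy that disables the switch against a deferential one that accepts the human’s veto.

```python
import random

# Illustrative sketch only (not the paper's model): compare a robot that
# disables its off-switch with one that defers to a human veto.
def simulate(robot_uncertainty, trials=10_000):
    """Average payoff of acting unilaterally vs. deferring to the human."""
    unilateral, deferential = 0.0, 0.0
    for _ in range(trials):
        true_value = random.gauss(0, 1)                            # actual utility of the action
        belief = true_value + random.gauss(0, robot_uncertainty)   # robot's noisy estimate

        # Overconfident policy: trust the belief and cut the human out of the loop.
        if belief > 0:
            unilateral += true_value

        # Deferential policy: propose the action and accept the human's veto;
        # here the human approves only actions that are actually good.
        if true_value > 0:
            deferential += true_value
    return unilateral / trials, deferential / trials

for sigma in (0.1, 1.0, 3.0):
    disable, defer = simulate(sigma)
    print(f"uncertainty={sigma}: disable-switch payoff={disable:.3f}, defer payoff={defer:.3f}")
```

In this toy setup, the shakier the robot’s own estimate, the bigger the payoff gap in favor of leaving the switch alone, which mirrors the paper’s core intuition.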
The researchers think that if complex A.I. were designed to seek out human approval of its actions, at least at a reliable rate, there would be fewer errors; Facebook, for example, could have more easily labeled certain stories as fake by drawing on human insight. Then again, it would also introduce human bias into the process, something Facebook has been desperately trying to avoid. It’s the conceptual beginning of robot puppies that literally seek the approval of their masters, and just like puppies, an A.I. can be trained to be bad just as easily as good.
That’s why the paper also argues that there’s a minimum level of confidence robots should maintain, along with a certain skepticism of human input. A simple example would be a self-driving car refusing to let a child exit the vehicle before arriving at school, but the concept goes further. If a human kept telling a Facebook news bot never to publish articles with a certain political bent, factual though they may be, we might want an A.I. that could notice that this instruction is at odds with its own observations about people’s reading habits. We might even want this A.I. to trust its own insights and disregard the unhelpful human meddling.
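As a rough illustration of that balance, a hypothetical decision rule (the function, threshold, and probabilities below are invented for this example, not taken from the paper) might defer to human overrides by default but escalate the ones that clash sharply with the system’s own evidence:

```python
# Hypothetical sketch: defer to a human override by default, but escalate
# when the override sharply contradicts the system's own observations.
def decide(own_estimate, human_override, confidence, skepticism_threshold=0.9):
    """
    own_estimate: probability, from the system's own data, that acting is right
    human_override: True if a human says "don't do it"
    confidence: how much the system trusts its own estimate (0 to 1)
    """
    if human_override and own_estimate * confidence > skepticism_threshold:
        # The instruction clashes with strong evidence: surface the conflict
        # for review instead of silently complying.
        return "escalate_for_review"
    if human_override:
        return "defer_to_human"
    return "act" if own_estimate > 0.5 else "hold"

print(decide(own_estimate=0.97, human_override=True, confidence=0.95))  # escalate_for_review
print(decide(own_estimate=0.60, human_override=True, confidence=0.95))  # defer_to_human
```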
This research is conceptually similar to a study published last year by Google on robot “kill-switches.” The idea is that the A.I. needs to be programmed not to take certain things into account in its learning process, such as when a researcher simply decides to turn it off. Google found that robots that learn to associate their own deactivation with an inability to complete the assigned task may actually try to avoid the researchers who always deactivate them.
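In code, that idea might look something like the sketch below, a loose illustration rather than Google’s actual implementation: episodes that end because a human pulled the plug are simply left out of the learning update, so being switched off never registers as lost reward.

```python
# Loose illustration of the interruptibility idea (not Google's implementation):
# human-interrupted episodes are excluded from the learning signal entirely.
def update_value(value, episodes, learning_rate=0.1):
    """Running estimate of reward that skips episodes ended by a human interruption."""
    for reward, was_interrupted in episodes:
        if was_interrupted:
            continue  # the shutdown never enters the learning signal
        value += learning_rate * (reward - value)
    return value

episodes = [(1.0, False), (0.0, True), (1.0, False), (0.0, True)]
print(update_value(0.5, episodes))  # interruptions don't drag the estimate down
```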
Taken together, this shows that we need to think hard, in advance, about how we want to interact with robots and control their behavior, and that this process will not be as simple as we might hope. For now, it falls to each individual developer to figure these things out, but with A.I. taking over ever-more-important facets of society, engineers are finally starting to home in on the principles that could make sure robots do what we want them to do, and not necessarily what we tell them to.