Did Microsoft's Tay Fail the Turing Test? Yes, but Racism Wasn't the Problem
The A.I. program certainly became a racist bigot, but was it effective at passing for human?
If the measure of success for an artificial intelligence is the creation of an artificial personality, Microsoft’s A.I. bot, Tay, might be considered a sign of progress. But judged more broadly, Tay was a failure.
“The goal of most people who are working on conversational agents of this sort is not so much to pass any sort of Turing test, but to actually have a useful command of language that responds sensibly to things and provides people access to knowledge,” Miles Brundage, a Ph.D. student studying human and social dimensions of science and technology at Arizona State University, tells Inverse.
Microsoft illustrated some of the problems of constructing A.I. programs this week when, in less than 24 hours, the internet turned what was meant to be an automated, millennial, female Twitter personality into a mouthpiece for the most racist and vile speech the trolls had to offer.
Microsoft immediately shut the experiment down and apologized: “We are deeply sorry for the unintended offensive and hurtful tweets from Tay, which do not represent who we are or what we stand for, nor how we designed Tay.”
When it comes to the Turing test, the famous thought experiment used to assess A.I., Brundage says there are, generally speaking, two schools of thought: literal and theoretical.
In a 1950 paper, Alan Turing endeavored to answer the question, “Can machines think?” He proposed an imitation game, modeled on a parlor game in which an observer tries to guess the gender of two hidden players: in Turing’s version, a machine takes one player’s place, and the observer must determine which interviewee is the computer. If the machine fools enough observers, it has passed the test.
If we apply the test literally, Tay holds up reasonably well. In Inverse’s private conversation with her, she responded to political questions eloquently, made references to the shackles of the “mannnnn” on society, and used some common texting abbreviations and emojis. Brundage says Tay did display millennial behavior, but notes that these sorts of A.I. have been built before.
“Being able to produce seemingly teenagerish remarks on Twitter is not really the sort of broad linguistic and intellectual ability that Turing had in mind,” Brundage says. “That said, if we actually were to take the Turing test literally, which I don’t think is necessarily advisable, one variant is that a lot of her comments were seemingly human-like.”
But if we take the broader approach, as Brundage suggests, then it’s obvious that Tay did not display reasonable human speech.
Microsoft was testing what it calls “conversational understanding”: the more people Tay talked to through Twitter, GroupMe, and Kik, the more she was supposed to learn and adapt. Instead, she wound up simply parroting back to the world whatever other users fed her.
“Most humans would not just repeat after you everything you said,” Brundage says.
Articulating “Bush did 9/11” and “Hitler would have done a better job than the monkey we have got now” may echo something someone actually typed to Tay, but it’s not exactly polite conversation.
“Arguably, his purpose in articulating the Turing test was less to prescribe the details of some test and more to provoke people to think ‘at what point would you be willing to say you’re dealing with a system this intelligent,’ and to open up people’s thinking to the possibility that machines are able to think,” Brundage says.