Algorithm That Mastered 'Pong' Now Excellent at 'Flappy Bird', Still Single
All you need for a super-human score are smart rewards and iteration.
Improving on a deep-learning method pioneered for Pong, Space Invaders, and other Atari games, Stanford University computer science student Kevin Chen has created an algorithm that’s quite good at the classic 2014 side-scroller Flappy Bird. Chen has leveraged a concept known as “Q-learning,” in which an agent aims to improve its reward score with each iteration of playing, to perfect a nearly impossible and impossibly addictive game.
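For readers who want to see the idea in miniature, here is a rough sketch of the textbook Q-learning update in Python. It is not Chen's code: his system uses a deep neural network that reads pixels, while this toy version keeps its estimates in a plain lookup table. The state names, the two actions, and the `alpha` and `gamma` values are all assumptions made for illustration.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new estimate.

    alpha (learning rate) and gamma (discount factor) are illustrative values.
    """
    # Value of the best action available from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: the immediate reward plus the discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction of the way toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]

# Hypothetical two-action game: flap or do nothing.
ACTIONS = ("flap", "wait")
q = defaultdict(float)  # every unseen (state, action) pair starts at 0.0

first = q_update(q, "s0", "flap", 1.0, "s1", ACTIONS)
second = q_update(q, "s0", "flap", 1.0, "s1", ACTIONS)
```

Each repeat of the same rewarding move nudges the estimate a little higher, which is the "improve with each iteration" behavior in its simplest form.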
Chen created a system wherein his algorithm was optimized to seek three rewards: a small positive reward for each frame it stayed alive, a large reward for passing through a pipe, and an equally large (but negative) reward for dying. Thus motivated, the so-called deep Q-network can outplay humans, according to the report Chen authored: “We were able to successfully play the game Flappy Bird by learning straight from the pixels and the score, achieving super-human results.”
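The three-part reward scheme described above could be written down roughly as follows. This is a sketch under stated assumptions, not Chen's implementation: the article gives only the relative sizes of the rewards, so the numbers below, along with the function and constant names, are hypothetical.

```python
# Illustrative magnitudes only; the article specifies "small", "large",
# and "equally large (but negative)", not exact values.
ALIVE_REWARD = 0.1   # small positive reward for each frame survived
PIPE_REWARD = 1.0    # large reward for passing through a pipe
DEATH_REWARD = -1.0  # equally large, but negative, reward for dying

def frame_reward(died: bool, passed_pipe: bool) -> float:
    """Return the reward for a single frame of play."""
    if died:
        return DEATH_REWARD
    if passed_pipe:
        return PIPE_REWARD
    return ALIVE_REWARD

# A short hypothetical episode: two uneventful frames, one pipe, then a crash.
episode = [(False, False), (False, False), (False, True), (True, False)]
total = sum(frame_reward(died, passed) for died, passed in episode)
```

Because dying cancels out exactly one pipe's worth of reward, an agent tuned this way has a strong incentive to survive rather than gamble on a risky flap.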
The original Atari paper, published in 2015 in Nature, came from Google-owned DeepMind (now famous for its mastery of the ancient Chinese board game Go). The DeepMind accomplishment was a breakthrough in that its system took visual (or at least pixel) information and, with minimal other input, was able to maximize rewards. Such a reward system has been likened to a simplified version of the brain’s dopaminergic response.
It’s not the first time an algorithm has conquered the flapping bird: An earlier class of Stanford University computer science students created a program whose score, after training overnight, improved from 0 pipes passed to 1,600.