
How the Multi-Armed Bandit Determines What Ads and Stories You See Online

The algorithms emerging from a classic gambler's dilemma shape the modern internet.

by Sarah Sloat

Imagine you’re a gambler and you’re standing in front of several slot machines. Your goal is to maximize your winnings, but you don’t actually know anything about the potential rewards offered by each machine. You do, however, understand that the levers you pull and the frequency with which you do so will affect the results of your gambling binge.

This scenario, faced every day by visitors to Las Vegas and Atlantic City (to whatever degree people still go to Atlantic City), is also a classic logic puzzle called the “Multi-Armed Bandit” problem; slot machines are referred to as “one-armed bandits” by aging Reno-types because they have one lever and take people’s money. There is no single correct way to solve a Multi-Armed Bandit situation (the closest candidate is the Gittins index), but there are strategic approaches to these problems that you encounter, without registering it, every day when you go online. Many of the algorithms governing how content is surfaced through Google and on websites are built around MAB strategies. The goal in almost all cases is to balance exploration (learning which option pays off) against exploitation (cashing in on the best option found so far).
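To make that tradeoff concrete, here is a minimal epsilon-greedy sketch of the slot-machine scenario in Python. Epsilon-greedy is just one common MAB strategy, not the Gittins index itself, and the payout rates and parameter values below are invented for illustration.

```python
import random

# The true payout probabilities are hypothetical; a real gambler wouldn't know them.
TRUE_PAYOUT_RATES = [0.15, 0.25, 0.40]   # one entry per slot machine (arm)
EPSILON = 0.1                            # how often we explore a random machine

pulls = [0] * len(TRUE_PAYOUT_RATES)     # times each machine has been played
wins = [0] * len(TRUE_PAYOUT_RATES)      # wins observed on each machine

def choose_machine():
    """Mostly exploit the best machine so far, occasionally explore at random."""
    if random.random() < EPSILON or sum(pulls) == 0:
        return random.randrange(len(TRUE_PAYOUT_RATES))
    estimates = [wins[i] / pulls[i] if pulls[i] else 0.0 for i in range(len(pulls))]
    return estimates.index(max(estimates))

for _ in range(10_000):
    arm = choose_machine()
    reward = 1 if random.random() < TRUE_PAYOUT_RATES[arm] else 0
    pulls[arm] += 1
    wins[arm] += reward

print("Pulls per machine:", pulls)  # the best machine should dominate over time
```

Run it a few times and the machine with the highest hidden payout rate ends up getting most of the pulls, even though the gambler never knew the rates in advance.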

A multi-armed bandit approach is used by The Washington Post to figure out which photos and headlines you’re most likely to click, and by wireless networks to figure out which energy-conserving routes are best. The algorithms that grow out of MAB approaches are hugely important to these companies and many others because they largely determine which advertisements appear online, and when.
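The Post hasn’t published its exact code, but one common bandit strategy for this kind of headline test is Thompson sampling: show each headline in proportion to the probability that it’s the best one so far. Here is a hedged sketch; the headlines and click-through rates are made up.

```python
import random

headlines = ["Headline A", "Headline B", "Headline C"]
true_ctr = [0.04, 0.06, 0.05]        # hypothetical click-through rates

# Beta(1, 1) prior on each headline's click rate: track clicks and skips.
clicks = [0, 0, 0]
skips = [0, 0, 0]

for visitor in range(50_000):
    # Sample a plausible click rate for each headline from its posterior,
    # then show the headline whose sample came out on top.
    samples = [random.betavariate(clicks[i] + 1, skips[i] + 1) for i in range(3)]
    shown = samples.index(max(samples))

    # Simulate whether this visitor clicked the headline they were shown.
    if random.random() < true_ctr[shown]:
        clicks[shown] += 1
    else:
        skips[shown] += 1

for h, c, s in zip(headlines, clicks, skips):
    print(f"{h}: shown {c + s} times, clicked {c} times")
```

The appeal over a classic A/B test is that weak headlines stop getting traffic automatically as evidence piles up, rather than waiting for an editor to end the experiment.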

Figuring out which ads to show people is a challenging problem because there are so many ads competing for so many people clicking stuff online. MAB algorithms for advertisements are typically framed as a “mortal multi-armed bandit problem,” in which the pool of ads changes rapidly and each arm (an ad) is only available for a finite period before its campaign ends. Traffic data is used to keep refining which ads get shown as candidates come and go.
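A rough sketch of what “mortal” means in practice, assuming invented click rates and a fixed campaign lifetime: each ad can only be shown so many times before it expires, and the algorithm has to start learning about its replacement from scratch.

```python
import random

LIFETIME = 2_000   # impressions an ad campaign lasts (made up for illustration)
EPSILON = 0.1      # how often we explore a random ad

def new_ad():
    """A fresh ad with a hidden click rate and a finite lifetime."""
    return {"ctr": random.uniform(0.01, 0.08), "shown": 0, "clicks": 0, "left": LIFETIME}

ads = [new_ad() for _ in range(5)]

for impression in range(100_000):
    # Explore occasionally; otherwise show the ad with the best observed click rate.
    if random.random() < EPSILON:
        i = random.randrange(len(ads))
    else:
        i = max(range(len(ads)),
                key=lambda j: ads[j]["clicks"] / ads[j]["shown"] if ads[j]["shown"] else 1.0)

    ad = ads[i]
    ad["shown"] += 1
    ad["left"] -= 1
    if random.random() < ad["ctr"]:
        ad["clicks"] += 1

    # When a campaign expires, its slot is filled by a new ad and learning restarts.
    if ad["left"] == 0:
        ads[i] = new_ad()
```

Because no arm lives forever, the algorithm can never settle permanently on a single winner the way the slot-machine version can.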

It’s difficult to peg MABs to a single purpose, because so many variations of the formula exist. K-armed bandits, for example, have a fixed set of “arms” competing to deliver the highest expected reward. Contextual bandits do the same but with “expert advice” (data previously collected about the user), and the memorably named “ILOVETOCONBANDITS” algorithm only works on a schedule of pre-specified rounds. In contrast, a classical MAB approach uses no side information at all; the reward depends only on the action chosen.
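To see how side information changes things, here is a toy contextual bandit in the same epsilon-greedy style (not the ILOVETOCONBANDITS algorithm itself). The contexts, ads, and click rates are all invented: where a classical bandit keeps one estimate per ad, this keeps one per (context, ad) pair.

```python
import random

CONTEXTS = ["sports_fan", "news_junkie"]        # the "expert advice" about the user
ARMS = ["ad_sneakers", "ad_newspaper"]
TRUE_RATES = {("sports_fan", "ad_sneakers"): 0.09, ("sports_fan", "ad_newspaper"): 0.02,
              ("news_junkie", "ad_sneakers"): 0.02, ("news_junkie", "ad_newspaper"): 0.07}
EPSILON = 0.1

shown = {(c, a): 0 for c in CONTEXTS for a in ARMS}
clicks = {(c, a): 0 for c in CONTEXTS for a in ARMS}

for _ in range(100_000):
    context = random.choice(CONTEXTS)           # what we already know about this visitor
    if random.random() < EPSILON:
        arm = random.choice(ARMS)
    else:
        # Exploit: pick the ad with the best observed rate for this context.
        arm = max(ARMS, key=lambda a: clicks[(context, a)] / shown[(context, a)]
                  if shown[(context, a)] else 1.0)
    shown[(context, arm)] += 1
    if random.random() < TRUE_RATES[(context, arm)]:
        clicks[(context, arm)] += 1

for key in sorted(shown):
    print(key, "shown", shown[key], "clicked", clicks[key])
```

With the made-up numbers above, the sneaker ad wins for sports fans and the newspaper ad wins for news junkies, which a context-blind bandit would average away.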

While the most useful applications for MABs so far appear to be internet-related, researchers are working toward ways to apply them to “real life” (aka meatspace) scenarios. In a 2015 paper, researchers from the University of British Columbia consider the application of MABs to medical trials. The goal, if MABs prove workable here, is that a MAB algorithm could measure the effect of a particular medication while the trial is still running, steering more patients toward whichever treatment looks best. The obvious problem is that, unless a computer-simulated version could be created, the approach would simply be too time-consuming: each “pull” of an arm is a patient receiving a treatment and waiting for an outcome, which takes far longer than serving a web page. For now, there’s no practical way to drop a MAB design into a clinical trial.

The idea is nice, but not feasible for now. Until that future arrives, you’ll mostly feel the looming presence of a multi-armed bandit when you’re desperately trying to click out of pop-up ads.
