Multi-Arm Bandit

By Deane Barker

This is a model from probability and statistics in which multiple exclusive options are available, each with its own individual probability of success. The player must decide whether to keep using a single option or switch to a new one in the hope of a better outcome.

The name comes from a gambler standing in front of a row of slot machines (“one-armed bandits”). They can only play one at a time, so do they keep playing one machine and accept its probability of success, or do they switch to another and hope for better?

The model is also known as the “exploration vs. exploitation” tradeoff. Do we exploit the odds we have now, or do we explore new options to see if the odds are better?

This is related to the concept of “local maximums.” When we have maxed out one situation, sometimes we have to back out and explore other situations, maybe even go slightly backwards, to move forward again.
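One standard way to make the explore/exploit tradeoff concrete is the “epsilon-greedy” strategy: with some small probability, pull a random arm (explore); otherwise, pull the arm with the best observed payout so far (exploit). Here is a minimal sketch in Python; the function name, parameters, and the simulated payouts are all illustrative, not from any particular library.

```python
import random

def epsilon_greedy(arms, pulls=5000, epsilon=0.1):
    """Simulate an epsilon-greedy bandit strategy.

    `arms` is a list of hypothetical success probabilities, one per
    slot machine. Returns the total payout and how often each arm
    was pulled.
    """
    counts = [0] * len(arms)      # times each arm was pulled
    rewards = [0.0] * len(arms)   # total payout per arm
    total = 0.0

    for _ in range(pulls):
        if random.random() < epsilon:
            # Explore: pick a random arm.
            arm = random.randrange(len(arms))
        else:
            # Exploit: pick the arm with the best observed average
            # payout so far (unplayed arms are tried first).
            arm = max(
                range(len(arms)),
                key=lambda i: rewards[i] / counts[i] if counts[i] else float("inf"),
            )
        payout = 1.0 if random.random() < arms[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += payout
        total += payout

    return total, counts
```

Run against three imaginary machines, e.g. `epsilon_greedy([0.2, 0.5, 0.8])`, and the pull counts will concentrate on the highest-paying arm, while roughly an `epsilon` fraction of pulls continues to explore the others.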

Why I Looked It Up

It came up quite a bit in The Model Thinker.

However, my company makes an experimentation product, and it has several modes that people have referred to as a “multi-arm bandit.” I don’t think this is an officially-named product feature; it’s just how people refer to different ways of using the product. I never knew what was meant by this. (I didn’t realize it was a general term – I thought it was specific to our product.)
