100 Prisoners and One Light Bulb

Here is an interesting problem:

A new prison has been built. It holds exactly 100 prisoners. Each prisoner is held in a solitary room with no way of communicating with the other prisoners, or anyone on the outside. Each day, the warden selects one prisoner at random, and escorts the selected prisoner to a central room for one hour. There is nothing in the central room except a light switch that controls a light bulb installed in the ceiling, which is visible but out of reach. While a prisoner is in the central room, they are allowed to turn the light switch on or off. They can also decide to tell the warden that they are certain that all 100 prisoners have been in the room. For simplicity's sake, assume the light in the central room is off before the first prisoner goes in.

If they are correct then everyone will be set free, but if they are not correct, then everyone will be executed. So, it is essential that they only talk to the warden when they are 100% positive that they are correct.

All 100 prisoners are allowed to gather together for one hour before they are put in prison to discuss their strategy for getting out. Can you come up with a plan that guarantees success (eventually)?

At first the task may seem quite impossible, but it turns out that there are many possible solutions. This leads us to the next question: What is best solution? In other words, what is the solution that gets everyone out of prison in the least amount of time on average?

On this page I will discuss some common solutions, and hopefully work towards getting a solution that is close to optimal. I have also included C++ source code for those of you who want to simulate these solutions and try to tweak them to get better results. In fact, if you are interested in reading this page, it might be a good idea to download the source code now as I may refer to it later on.

I am going to describe solutions in rough order of "worst" to "best" in terms of average time spent in prison.

Solution 1: Dark Frustration Until Death

This solution technically works, but it is so bad that I couldn't include it in my simulation because it would take too long to run. This one earns its title because the light will almost always be off, and the prisoners will always be frustrated even if they occasionally get lucky and get called in on the right day. For each cycle of 100 days, the chances of all the prisoners getting picked in exactly the right order are 10-200. This means that on average prisoners can expect to wait approximately 2.7 * 10199 years before they get out of jail. That's a long time. So long, that it is difficult to describe how long that is. It is about 2 * 10189 times longer than the age of the universe.

We can improve on this a bit by relaxing the condition that the prisoners have to appear in order. All we really care about is whether each prisoner has made it into the room. We can change the rules as follows:

Essentially, this allows the prisoners to be called into the room in any order, as long as they are each called exactly once in a 100 day cycle. This greatly improves the odds that such a cycle will happen by a factor of 100! (that's 100 factorial), meaning that the prisoners will expect to wait about 2.9 * 1041 years before they get out of jail.

We can do better.

Solution 2: Pass The Torch

We can improve on the previous solution if we realize that once prisoner N has seen the light on the correct day, they know that all of the prisoners from 0 to N-1 have been in the room. Thus, in the future, they can turn on the light on day N regardless of whether it was already on. The rest of the algorithm is exactly the same as before.

This means that for each cycle of 100 days, the chances of a pair (N-1, N) of prisoners being called in on the right days are 1/10000. Since every pair has to be called in the right order, we can simply multiply the expected length of time (10000*100 days) by the number of pairs (99) to find out how long this solution should take on average. By my calculation, the prisoners should be free in about 271200 years. This is MUCH better than the previous solution, but unfortunately the prisoners will still all be long dead before they can be set free.

I have included this solution in the provided source code, but it is not enabled because it takes too long just to run the simulation more than a couple of times!

Solution 3: Windowed Pass The Torch (two approaches)

Recent discussions prompted me to think about this problem some more, and I thought of an improvement to the "Pass The Torch" strategy, plus another similar way of solving the problem. It turns out that both methods have almost identical runtimes.

The first method is the same as "Pass The Torch", but instead of each prisoner being assigned one day in a cycle of 100 days, each prisoner is assigned a "window" in a cycle of 100 windows. The window length is fixed, and decided beforehand. During window N, prisoner N is allowed to turn on the light if they have previously observed the light on in window N-1, and prisoner N+1 is allowed to observe the light. Simulations suggest that the optimal window size is 900 days. This large window size ensures that it is very likely that everyone will make it through "in order" during the first cycle of 100 windows. This gives us an average runtime of about 273 years, which is 1000x better than standard "Pass The Torch".

Another similar algorithm basically works like this: Divide the time into windows again. If you are called into the room in window N, you are allowed to turn on the light if you know that all prisoners from 0..N have been in the room. Anyone is allowed to observe the light, and if they see it, then they know that prisoners 0..N have all been in the room. You can quickly see how this is equivalent to the "Windowed Pass The Torch" algorithm, since the prisoners will wind up turning the light on in order from 0..99. Sure enough, simulations show that the optimal window length for this algorithm is 900 days again, and the runtime is about 273 years.

Solution 4: The Information Snowball

Here is an idea I thought of myself. It also turns out to be quite bad, but I'm proud of it, so it earns a more detailed discussion. I call this "The Information Snowball" because at the start of the incarceration, the light will remain off for most of the time. In fact, the only way it will be turned on at the start is if prisoner N gets called in on day N, which has a 1% chance of happening each day. However, once the light gets turned on a few times, information starts spreading around, until finally the light is on most of the time until the first person clues in that everyone has been in the room.

This solution is actually just a slight modification of Pass The Torch. In that solution, we tried to improve things by having each prisoner remember a simple fact about the people who came before them -- namely, whether they had all been in the room yet. This time, we are having the prisoners remember more complicated information. Also, in Solution 2 we restricted who could send information and who could receive it. This solution still restricts who can send information, but it allows anyone to receive it, thus greatly improving the rate of information transfer, hopefully.

It is still not a great solution though, again because it is so unlikely that person N will get picked on day N. On average, after the first 100 day cycle, only one of the prisoners is likely to have got in on their day, and the prisoner who received that information will only have a 1 in 100 chance of passing that information on in the next cycle. However, true to its name, the information eventually starts snowballing, and our average performance is much better than the previous solution: About 250 years.

Here is a chart of how much knowledge the prisoners have on average, averaged over 1000 trials.

You can see that it takes a while for the first bits of information to travel, and then information starts spreading quickly, but then the flow of information tapers off again as it takes a while for just the right gaps to be filled in the prisoners' knowledge. Unfortunately for the prisoners, they will still all be dead before they can be set free.

This is pretty much the end of the road for these types of solutions. Can you think of any other ways of improving information transfer using similar techniques?

Solution 5: The Lamplighter

This algorithm comes courtesy of Ian Vollick. It has quite a good runtime, and it also has the interesting property that it can solve a more general version of the 100 Prisoners problem, where the initial state of the lightbulb is unknown, and prisoners are called into the room at random (unknown) intervals. Let's take a look: Sorry for the awkward wording of this solution, but it will help you compare it to the other algorithms below, especially the "Simple Count" algorithm. In Ian's original description, the prisoners were all supposed to turn the light off, and the collector was the only one allowed to turn it on. That is why he called the collector "The Lamplighter".

This solution works in the more general case because it effectively communicates to the prisoners when the counting has started. However, because of this property, it takes a little longer on average than other simpler solutions. On average, this algorithm takes just over 30 years to run. This is WAY better than the previously mentioned algorithms, which all take hundreds or more years!

Solution 6: Simple Count

This is the solution that most people eventually think of when they are presented with this problem, and it actually turns out to be a surprisingly good solution. Usually this solution isn't really described in this way, but it will help me tie it to the other solutions below. In any case, you can see that this is quite an elegantly simple solution. Essentially, each prisoner turns on the light the first time they are in the room with the light off. Once the collector has seen the light on 99 times, they know that everyone has been in the room at least once.

But how long will this take, on average, to set everyone free? Well, the counter has to get in the room at least 99 times, which takes 9900 days -- just over 27 years -- on average. During the times in between, it is actually fairly likely that someone will get into the room and turn on the light, so usually the light will be on every single time the collector gets in.

Sure enough, simulations seem to indicate that this method takes only 28 years on average to set everyone free. Now we are getting into the realm where our prisoners can expect to get out of jail within their lifetime! We can still do better though.

Note: We can improve this algorithm slightly if use "dynamic role assignment", which means that we don't pick the collector ahead of time. The optimal time to pick a collector is actually on the second day, because the person who got called in on the first day definitely turned on the light, so we want to make sure that the count gets picked up as soon as possible. On average, this will knock an entire cycle (100 days) off our running time.

Solution 7: Double Or Nothing

This is a very clever solution that expands on the technique used in the Simple Count strategy. The natural question you may have is: "How can we communicate counts greater than 1?" This is essential if we want to improve on the Simple Count strategy, because as we saw there it takes 27 years just for the collector to be called into the room 99 times.

The first thought that most people have is to divide time into segments, or phases, during which the light bulb can mean something other than 1 or 0. These phases have to be agreed upon before hand, as there is no way to communicate phase changes once everyone is in solitary confinement. In order for the phases to be useful, we also have to expand the possibilities of who can collect counts. We don't want to just let anyone collect counts though, or else counts will just bounce back and forth between prisoners without reaching the people who really need them.

The following algorithm takes all of these factors into consideration, and is also mathematically appealing:

I like the name "Double Or Nothing" for this strategy, because every prisoner's basic strategy is to either double their count or reduce it to zero. There are some situations (like at the end of each phase) where prisoners may be forced to pick up counts that they don't want. In that case they are no longer doubling or zeroing their count, so the name loses its meaning a bit, but it is still a good way to think of it.

As you can see from my description, I prefer to think of it in terms of binary bits. Each phase represents a different bit in a prisoner's count. Prisoners can only pick up a bit if they already have one (doubling the bit), and obviously they can only discard a bit if they have one (reducing it to nothing). Since there are only 100 prisoners, 7 bits are needed to count them all, which is why we have 7 phases.

Where did the number 460 come from? Well, it turns out that the optimal phase length for N prisoners is around N*log(N), which is approximately 460 when N = 100. I also ran some simulations to try different phase lengths, and came up with the following graph:

As you can see, the algorithm is actually quite tolerant of phase lengths between 400 and 800 days in length. In general, longer phase lengths mean a higher probability that all of the prisoners get to participate in each phase. This means that longer phase lengths trade off minimum finishing times for lower maximum finishing times, keeping the average finishing times roughly the same.

You can also see from the graph that this algorithm takes an average of around 4850 days (13.3 years) to complete. That is almost twice as fast as the Simple Count strategy, again earning the name "Double Or Nothing". Can we improve this time even more? It turns out that there are a few minor tweaks that we can easily try to improve things.

The most obvious is to relax the restriction that each phase has to be the same length. At the beginning, it makes sense for phases to be as long as possible to make sure that everyone gets a chance to get rid of their count. As counts get more concentrated with fewer prisoners, it makes sense to reduce the phase length because not all prisoners are required to participate in each phase anymore. But how long should each phase be?

In an attempt to answer that question, I coded up a quick and dirty genetic algorithm to try to evolve optimal phase lengths. I tried four different sets of evolutionary parameters to see if they agreed on what the optimal solution should be. The results are in this graph:

The algorithms ran for about 15000 generations each (although the cyan one ran for less than 2000 generations), and for the most part, they all seem to basically agree on the lengths for the first 7 phases. The randomness that most of the algorithms seem to have in the second and third set of phases comes from the fact that most of the time the prisoners are set free during the seventh phase, which means it doesn't really matter what the length of subsequent phases are.

From this we can see that our phase lengths should start at around 750 days and gradually drop down to less than 400 days for the seventh phase. Using these phase lengths, my simulations drop down to an average of around 4700 days (12.8 years), which is a pretty good improvement.

Another way we can try to improve things is to recognize that we want to quickly get high counts to a few prisoners. The sooner we can get more prisoners to drop their counts to zero, the better, and the sooner we can get a few prisoners' counts up to high values, the better. One way of doing this is to have a "boost" phase right at the beginning. This boost phase will be a bit of a gamble, but it should pay off on average. Here is how the boost phase works:

The objective of the boost phase is that by the end of it, 12 prisoners have a count of zero, and one prisoner has a count of 13. This would give a good head start to the accumulation process. Choosing an optimal length for the boost phase involves solving the Birthday Problem where instead of 365 days in a year, we have 100 prisoners to choose from. In other words, we want the maximum number of days where there is at least a 50% chance that no duplicate prisoners chosen. Mathematically, if N is the number of days, N2-N < 2*100*ln(2). This gives 12 as a reasonable number to choose for the length of the boost phase.

Adding this "boost" phase to our solution knocks our time down to about 4650 days (12.7 years), which isn't much, but is pretty good for such a small modification to the algorithm.

Solution 8: Two Stage Count

First, a story: I started writing this page in early 2008, but I didn't get around to finishing it because I got distracted with my wedding and honeymoon and other such things. I know that getting married isn't usually a valid excuse to interrupt the solution of a math problem, but what can you do? When I started, my goal was to try to come up with the best known algorithm for solving this problem. However, it seems that in the meantime other people have found results that are better than mine. Oh well, that's what I get for slacking off.

The Two Stage Count strategy works like this:

Now there is a bit of a tricky thing to consider with this algorithm. At the end of the second phase, someone has to pick up the count "package" as noted above and get promoted to "assistant". If the prisoner is already an assistant, then they basically have to become two assistants in one. If the prisoner is the collector, then he doesn't have to become an assistant... he can simply collect the package of counts. See the source code for details.

So clearly there are a few parameters to consider here. First, a, which is the number of assistants. This parameter also determines a parameter called q, which is the quota for each assistant, and is equal to 100/a. On a hunch, I started out setting a to 10, which is the square root of 100. Subsequent testing with other values seems to show that it is in fact the best choice.

We also need to determine the optimal phase lengths for each stage of the strategy. To do this, I wrote a quick program to iterate through different possible values for the lengths of each phase and then I charted the results in Excel. The closer it gets to green, the better.

You can see from the above image that there appears to be a sweet spot at around 300 days for each phase, but then there is an even better sweet spot at around 1200 days, and then curiously it gets worse until you get to around 2400 days for each phase. I zoomed in to investigate further.

The last image is a more detailed look at the area around (2400, 2400). At this point the results are rather noisy, due to the low number of samples per data point (I only used 10,000 trials per point). So it appears that 2400 days is a good number to pick for the phase lengths. The numbers get difficult to calculate here, but basically this phase length maximizes the chances of solving the problem in the first cycle, while minimizing the length of time that the first cycle takes.

This algorithm seems to take around 3896 days (10.67 years) to complete on average, which is surprisingly good for such a simple modification to the "Simple Count" algorithm. Can we do even better? The first "obvious" improvement would be letting the collector start with a master count of 1 (himself), rather than having him act like a regular prisoner in the first phase. This works because the collector has to get into the room a few times before he can get his count up to 100, so it is safe for him to count himself as being in the room right off the start. However, this means that instead of 100 prisoners that need to be counted, there are now 99. This means that a must divide evenly into 99, which means that it has to be either 1, 3, 9, 11, 33, or 99. This shifts the results of the algorithm slightly... making it difficult to directly compare with the simpler algorithm. Setting a to 11 seems to give the best results in this case, and it seems to be a slight improvement over the original algorithm, shortening the average runtime down to around 3868 days (10.6 years), an improvement of 28 days, on average. (Note that you have to recalculate the optimal phase lengths... my preliminary findings indicate that phase lengths of 2270 and 2470 are good phase lengths.)

Where to go from here?

Since I started writing this website, other people have made good headway on the problem. There is a good thread at http://www.ocf.berkeley.edu/~wwu/cgi-bin/yabb/YaBB.cgi?board=riddles_hard;action=display;num=1236693415 describing the best currently known solutions. I haven't had time to read through it yet to understand what is going on.

Oh, one note about the source code: If you are going to play with it, I recommend replacing the default random number generator with a better one, like a Mersenne Twister. All of my internal testing was done with the mtrand library, but I removed it from the posted source code for brevity.





Home - Feedback

Hosted by theorem.ca