A new prison has been built. It holds exactly 100 prisoners. Each prisoner is held in a solitary room with no way of communicating with the other prisoners, or anyone on the outside. Each day, the warden selects one prisoner at random, and escorts the selected prisoner to a central room for one hour. There is nothing in the central room except a light switch that controls a light bulb installed in the ceiling, which is visible but out of reach. While a prisoner is in the central room, they are allowed to turn the light switch on or off. They can also decide to tell the warden that they are certain that all 100 prisoners have been in the room. For simplicity's sake, assume the light in the central room is off before the first prisoner goes in.
If they are correct then everyone will be set free, but if they are not correct, then everyone will be executed. So, it is essential that they only talk to the warden when they are 100% positive that they are correct.
All 100 prisoners are allowed to gather together for one hour before they are put in prison to discuss their strategy for getting out. Can you come up with a plan that guarantees success (eventually)?
At first the task may seem quite impossible, but it turns out that there are many possible solutions. This leads us to the next question: What is best solution? In other words, what is the solution that gets everyone out of prison in the least amount of time on average?
On this page I will discuss some common solutions, and hopefully work towards getting a solution that is close to optimal. I have also included C++ source code for those of you who want to simulate these solutions and try to tweak them to get better results. In fact, if you are interested in reading this page, it might be a good idea to download the source code now as I may refer to it later on.
I am going to describe solutions in rough order of "worst" to "best" in terms of average time spent in prison.
We can improve on this a bit by relaxing the condition that the prisoners have to appear in order. All we really care about is whether each prisoner has made it into the room. We can change the rules as follows:
We can do better.
This means that for each cycle of 100 days, the chances of a pair (N-1, N) of prisoners being called in on the right days are 1/10000. Since every pair has to be called in the right order, we can simply multiply the expected length of time (10000*100 days) by the number of pairs (99) to find out how long this solution should take on average. By my calculation, the prisoners should be free in about 271200 years. This is MUCH better than the previous solution, but unfortunately the prisoners will still all be long dead before they can be set free.
I have included this solution in the provided source code, but it is not enabled because it takes too long just to run the simulation more than a couple of times!
The first method is the same as "Pass The Torch", but instead of each prisoner being assigned one day in a cycle of 100 days, each prisoner is assigned a "window" in a cycle of 100 windows. The window length is fixed, and decided beforehand. During window N, prisoner N is allowed to turn on the light if they have previously observed the light on in window N-1, and prisoner N+1 is allowed to observe the light. Simulations suggest that the optimal window size is 900 days. This large window size ensures that it is very likely that everyone will make it through "in order" during the first cycle of 100 windows. This gives us an average runtime of about 273 years, which is 1000x better than standard "Pass The Torch".
Another similar algorithm basically works like this: Divide the time into windows again. If you are called into the room in window N, you are allowed to turn on the light if you know that all prisoners from 0..N have been in the room. Anyone is allowed to observe the light, and if they see it, then they know that prisoners 0..N have all been in the room. You can quickly see how this is equivalent to the "Windowed Pass The Torch" algorithm, since the prisoners will wind up turning the light on in order from 0..99. Sure enough, simulations show that the optimal window length for this algorithm is 900 days again, and the runtime is about 273 years.
This solution is actually just a slight modification of Pass The Torch. In that solution, we tried to improve things by having each prisoner remember a simple fact about the people who came before them -- namely, whether they had all been in the room yet. This time, we are having the prisoners remember more complicated information. Also, in Solution 2 we restricted who could send information and who could receive it. This solution still restricts who can send information, but it allows anyone to receive it, thus greatly improving the rate of information transfer, hopefully.
It is still not a great solution though, again because it is so unlikely that person N will get picked on day N. On average, after the first 100 day cycle, only one of the prisoners is likely to have got in on their day, and the prisoner who received that information will only have a 1 in 100 chance of passing that information on in the next cycle. However, true to its name, the information eventually starts snowballing, and our average performance is much better than the previous solution: About 250 years.
Here is a chart of how much knowledge the prisoners have on average, averaged over 1000 trials.
You can see that it takes a while for the first bits of information to travel, and then information starts spreading quickly, but then the flow of information tapers off again as it takes a while for just the right gaps to be filled in the prisoners' knowledge. Unfortunately for the prisoners, they will still all be dead before they can be set free.
This is pretty much the end of the road for these types of solutions. Can you think of any other ways of improving information transfer using similar techniques?
This solution works in the more general case because it effectively communicates to the prisoners when the counting has started. However, because of this property, it takes a little longer on average than other simpler solutions. On average, this algorithm takes just over 30 years to run. This is WAY better than the previously mentioned algorithms, which all take hundreds or more years!
But how long will this take, on average, to set everyone free? Well, the counter has to get in the room at least 99 times, which takes 9900 days -- just over 27 years -- on average. During the times in between, it is actually fairly likely that someone will get into the room and turn on the light, so usually the light will be on every single time the collector gets in.
Sure enough, simulations seem to indicate that this method takes only 28 years on average to set everyone free. Now we are getting into the realm where our prisoners can expect to get out of jail within their lifetime! We can still do better though.
Note: We can improve this algorithm slightly if use "dynamic role assignment", which means that we don't pick the collector ahead of time. The optimal time to pick a collector is actually on the second day, because the person who got called in on the first day definitely turned on the light, so we want to make sure that the count gets picked up as soon as possible. On average, this will knock an entire cycle (100 days) off our running time.
The first thought that most people have is to divide time into segments, or phases, during which the light bulb can mean something other than 1 or 0. These phases have to be agreed upon before hand, as there is no way to communicate phase changes once everyone is in solitary confinement. In order for the phases to be useful, we also have to expand the possibilities of who can collect counts. We don't want to just let anyone collect counts though, or else counts will just bounce back and forth between prisoners without reaching the people who really need them.
The following algorithm takes all of these factors into consideration, and is also mathematically appealing:
As you can see from my description, I prefer to think of it in terms of binary bits. Each phase represents a different bit in a prisoner's count. Prisoners can only pick up a bit if they already have one (doubling the bit), and obviously they can only discard a bit if they have one (reducing it to nothing). Since there are only 100 prisoners, 7 bits are needed to count them all, which is why we have 7 phases.
Where did the number 460 come from?
Well, it turns out that the optimal phase length for N prisoners is around N*log(N), which is approximately 460 when N = 100.
I also ran some simulations to try different phase lengths, and came up with the following graph:
As you can see, the algorithm is actually quite tolerant of phase lengths between 400 and 800 days in length. In general, longer phase lengths mean a higher probability that all of the prisoners get to participate in each phase. This means that longer phase lengths trade off minimum finishing times for lower maximum finishing times, keeping the average finishing times roughly the same.
You can also see from the graph that this algorithm takes an average of around 4850 days (13.3 years) to complete. That is almost twice as fast as the Simple Count strategy, again earning the name "Double Or Nothing". Can we improve this time even more? It turns out that there are a few minor tweaks that we can easily try to improve things.
The most obvious is to relax the restriction that each phase has to be the same length. At the beginning, it makes sense for phases to be as long as possible to make sure that everyone gets a chance to get rid of their count. As counts get more concentrated with fewer prisoners, it makes sense to reduce the phase length because not all prisoners are required to participate in each phase anymore. But how long should each phase be?
In an attempt to answer that question, I coded up a quick and dirty genetic algorithm to try to evolve optimal phase lengths.
I tried four different sets of evolutionary parameters to see if they agreed on what the optimal solution should be.
The results are in this graph:
The algorithms ran for about 15000 generations each (although the cyan one ran for less than 2000 generations), and for the most part, they all seem to basically agree on the lengths for the first 7 phases. The randomness that most of the algorithms seem to have in the second and third set of phases comes from the fact that most of the time the prisoners are set free during the seventh phase, which means it doesn't really matter what the length of subsequent phases are.
From this we can see that our phase lengths should start at around 750 days and gradually drop down to less than 400 days for the seventh phase. Using these phase lengths, my simulations drop down to an average of around 4700 days (12.8 years), which is a pretty good improvement.
Another way we can try to improve things is to recognize that we want to quickly get high counts to a few prisoners. The sooner we can get more prisoners to drop their counts to zero, the better, and the sooner we can get a few prisoners' counts up to high values, the better. One way of doing this is to have a "boost" phase right at the beginning. This boost phase will be a bit of a gamble, but it should pay off on average. Here is how the boost phase works:
Adding this "boost" phase to our solution knocks our time down to about 4650 days (12.7 years), which isn't much, but is pretty good for such a small modification to the algorithm.
The Two Stage Count strategy works like this:
So clearly there are a few parameters to consider here. First, a, which is the number of assistants. This parameter also determines a parameter called q, which is the quota for each assistant, and is equal to 100/a. On a hunch, I started out setting a to 10, which is the square root of 100. Subsequent testing with other values seems to show that it is in fact the best choice.
We also need to determine the optimal phase lengths for each stage of the strategy. To do this, I wrote a quick program to iterate through different possible values for the lengths of each phase and then I charted the results in Excel. The closer it gets to green, the better.
You can see from the above image that there appears to be a sweet spot at around 300 days for each phase, but then there is an even better sweet spot at around 1200 days, and then curiously it gets worse until you get to around 2400 days for each phase. I zoomed in to investigate further.
The last image is a more detailed look at the area around (2400, 2400). At this point the results are rather noisy, due to the low number of samples per data point (I only used 10,000 trials per point). So it appears that 2400 days is a good number to pick for the phase lengths. The numbers get difficult to calculate here, but basically this phase length maximizes the chances of solving the problem in the first cycle, while minimizing the length of time that the first cycle takes.
This algorithm seems to take around 3896 days (10.67 years) to complete on average, which is surprisingly good for such a simple modification to the "Simple Count" algorithm. Can we do even better? The first "obvious" improvement would be letting the collector start with a master count of 1 (himself), rather than having him act like a regular prisoner in the first phase. This works because the collector has to get into the room a few times before he can get his count up to 100, so it is safe for him to count himself as being in the room right off the start. However, this means that instead of 100 prisoners that need to be counted, there are now 99. This means that a must divide evenly into 99, which means that it has to be either 1, 3, 9, 11, 33, or 99. This shifts the results of the algorithm slightly... making it difficult to directly compare with the simpler algorithm. Setting a to 11 seems to give the best results in this case, and it seems to be a slight improvement over the original algorithm, shortening the average runtime down to around 3868 days (10.6 years), an improvement of 28 days, on average. (Note that you have to recalculate the optimal phase lengths... my preliminary findings indicate that phase lengths of 2270 and 2470 are good phase lengths.)
Oh, one note about the source code: If you are going to play with it, I recommend replacing the default random number generator with a better one, like a Mersenne Twister. All of my internal testing was done with the mtrand library, but I removed it from the posted source code for brevity.
Hosted by theorem.ca