I was asked a question about shuffling yesterday which got me thinking. How do you write an algorithm to truly shuffle a deck of cards without any bias?
There are a couple of well-known algorithms to do this, both popularized by Donald Knuth. At a very abstract high-level they are:
- generate a random number for each card in the deck then sort the cards by number. If two cards are assigned the same number then try again;
- go through the deck, taking each card in turn and swap it with some random position in the deck.
Clearly #1 could take a longer time to run since you’ve got to sort cards and deal with clashes. Although with only 52 cards in a deck you are probably not too worried about algorithmic complexity.
#2 looks good on the surface but there are a few gotchas to be aware of. With a deeper mathematical analysis you can see why. The first is that if you swap cards with any position in the pack you will not get an even distribution with shuffles. This is because you’ve written an algorithm that has n^n execution paths and there are only n! permutations. Using the wikipedia example consider just 3 cards: your algorithm can produce 3^3 = 27 outcomes but there are only 6 permutations for shuffling. You cannot fit 27 into 6 so there must be some outcomes from your algorithm that are more likely (see pigeonhole principal).
The solution is to swap with the portion of the pack that has not yet been swapped with.
Wikipedia has a clear article on shuffling and implementations with further details on the impact of using the mod operator with random numbers (again, the space of randoms being generated then having mod applied is not an even distribution). A final note is that you need to seed your random number generator or it’ll be pseudo-random. Or better yet use random.org