Fisher-Yates Shuffle

Jeremy Thompson on Flickr

If you need to sort a set of data, and you don’t happen to have the Hogwarts sorting hat, you’ll have to rely instead on algorithms like the Fisher-Yates Shuffle. While it sounds like a type of dance, in fact, it’s a way to sort data with as little bias as possible. Plus it can do much more than sort Hogwarts students into one of four houses.

The Fisher-Yates Shuffle is an algorithm — a set of rules — to shuffle data in an unbiased way. Each result is as likely as any other result.

In 1938, two English statisticians, Sir Ronald Fisher and Frank Yates, worked together to create a set of rules to be done with pen and paper:

Step 1 — Write down numbers from 1 through N. So N could be 100.
Step 2 — Pick a random number k between the numbers 1 and the number of numbers not crossed out in your list in step 1.
Step 3 — Counting from the low end of your list of numbers not crossed out, cross out the kth number not yet struck out then write it down at the end of a separate list of numbers.
Step 4 — Repeat from the second step until all numbers have been crossed out.

The sequence of numbers in the separate list of numbers, in step 3, is now a random permutation of the original list of numbers from step 1.

The modern version of this shuffling algorithm is called Algorith P, from the book, The Art of Computer Programming by Donald Knuth. It simplifies step 3 of the Fisher-Yates Shuffle to reduce time spent by the computer counting numbers left unstruck. The modern version also shuffles in place. Instead of creating a copy of the data set, shuffling is done within the original data.

Of course, the modern version used in programming has more subtleties to make programming easier. For example, what if you do not know the value of N, the highest number in your data set?

How Fisher-Yates Works with Pencil, Paper, and Dice

Here’s how the Fisher-Yates Shuffle works with pencil, paper, and one dice. We’ll start with an initial number data set between 1 and 4:

Step 1 — We roll dice to get our initial value for k, our random number. In this example, it’s the number 2 so we cross out the number in the second position in our initial number set, the number 2 in this case, and write in the number 2 for our sorted number set.
Step 2 — We roll dice and our k value is 3. We cross out the number in the third unstruck (not crossed out) position in our initial number set, the number 4 in this case, and write in the number 4 for our sorted number set, after the number 2.
Step 3 — We have two numbers left not crossed out, 1 and 3. We roll the dice and our k value is 1 so we pick the number in the first position in our initial number set, 1 in this case, and add it to our sorted number set.
Step 4 — We have only the number 3 left so we cross it out and add it to our sorted number set.

concepts-fisher-yates-table-pencil-paper
Original Fisher-Yates Shuffle Algorithm

How Fisher-Yates Shuffle Works with Computers

The modern method, used in computing, follows this pencil and paper process but with a few interesting changes. We can’t pick the last available number, for example, and the modern method substitutes the last number not chosen with the chosen number (instead of crossing out the chosen number). We also remove the chosen number, shortening our number set each time we pick a random number k.

Here’s how the Fisher-Yates Shuffle works with modern method used by computers. We’ll start with an initial number data set between 1 and 4:

Step 1 — We roll dice to get our initial value for k, our random number. In this example, it’s the number 2 so we replace the number in the second position in our initial number set with the last number in our initial number set, the number 4 in this case. We write in the number 2 for our sorted number set.
Step 2 — We roll dice and our k value is 2 (because the modern method does not select the last available unselected number). We replace the number in the second position in our initial number set, the number 4 in this case. We write in the number 4 for our sorted number set, before the number 2.
Step 3 — We have two numbers left not selected, 1 and 3. We roll the dice and our k value is 1 so we replace the number in the first position in our initial number set, 1 in this case, with the last available number, 3. We add the number 1 to the first position in our sorted number set.
Step 4 — We have only the number 3 left so we move it from our initial number set and add it to our sorted number set, in the first position. 3142 is our random number.

concepts-fisher-yates-table-computer
Modern Fisher-Yates Shuffle Algorithm

The Fisher-Yates Shuffle works best with large data sets, not limited sets like the numbers 1 through 4. But these examples give you an idea of the process Fisher and Yates came up with for their statistics, as well as the version Knuth and others come up with to use with computers. The end result is the same or similar: an unbiased sorting of large sets of data.

Indeed, the Fisher-Yates shuffle works so well, if it were put into a hat as wearable technology, the Hogwarts sorting hat might eat itself, if it really means what it says when it sings this song:

“Oh, you may not think I’m pretty,
But don’t judge on what you see,
I’ll eat myself if you can find
A smarter hat than me.”

Learn More

Harry Potter Sorting Hat

https://www.pottermore.com/explore-the-story/the-sorting-hat

The Fisher-Yates Shuffle

https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle

Sir Ronald Fisher

https://en.wikipedia.org/wiki/Ronald_Fisher

Frank Yates

https://en.wikipedia.org/wiki/Frank_Yates

Sorting Algorithm

https://en.wikipedia.org/wiki/Sorting_algorithm

The Art of Computer Programming

http://www-cs-faculty.stanford.edu/~uno/taocp.html
https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming

Donald Knuth

https://en.wikipedia.org/wiki/Donald_Knuth
http://www-cs-faculty.stanford.edu/~uno/boss.html
https://en.wikipedia.org/wiki/Knuth_reward_check
https://en.wikipedia.org/wiki/San_Serriffe