dark mode light mode Search Menu

Chain of Command, Command of Language

Sam Carpenter on Flickr

Predictive text is a technology you’re probably very familiar with from things like smartphones and tablets. Every time you type or choose a word, you get a choice of other words that can come next. How does a phone choose what text to suggest next?

In this article, you’ll learn a simple technique for making your own text predictors as well as how to generate text in the style of a particular author!

The trick to both is an old mathematical concept called a Markov chain. A Markov chain is basically a process that can have a number of “states” it can be in, and whose next “state” is determined randomly. Flipping a coin over and over is a Markov chain with a single state. What about a Markov chain with multiple states?

Picture a checkerboard, or grab/draw one if you’d like, and put a coin down on it. Roll a four sided die to determine which direction to move next: up, down, left, or right. If you can’t move a direction, then you stay where you are. This is a Markov chain where the states are the squares on the board.

How does a game like this help us generate text? Imagine a different kind of board, one where the squares represent the last several words you’ve seen and moving to another square means choosing a word that comes next.

There’s a lot of words in a language, though, many more than just the four directions in our checkerboard example. How do we choose which one comes next? If we choose any word with equal probability, on some imaginary ~250,000 sided die, we’d get things that don’t sound anything like English! “The a his the the fish” would be as equally likely a sentence as “The ball rolled down the hill”.

We don’t have to be experts in languages to figure out what the probabilities should be, though!

Instead, we can cheat a little and figure it out by taking some really large example of the language, like a novel, and go through the entire document counting how often we see combinations of words. What we’ll have at the end is the big table of the probabilities you need for choosing different words based on what the previous several words you’ve seen are.

So as an example, imagine that the last three words you’ve generated are [‘I’,’am’,’so’] and the probabilities for the next word are a 20% chance of “awesome”, 40% chance of “tired”, and 40% chance of “late”. If the next word picked is “tired”, then the new “state” will be [‘am’,’so’,’tired’] and the process continues.

You can generate as big a piece of text as you want by repeating this process over and over to keep choosing words. To predict text, instead of choosing text by its probability you offer suggestions based on how likely they are and let the user make the final choice.

Now, we’ve said that you use “the last few words”, but how many words should you remember when generating text?

The answer is it depends. The more words you remember the more the generated text is just going to look like the source material you used to build the Markov chain. If you remember only one word, you’ll get gibberish that isn’t grammatic. If you remember ten words, then you’ll get entire sentences and paragraphs that are straight out of the source. The sweet spot is somewhere in the middle, where you’ll get something that’s grammatically correct but isn’t just a quote.

Below is most of a small Python program that reads in such a text file and then runs the Markov chain process to generate a large number of words. The one thing to note is that numWords is the number of previous words you’re using for prediction. The link to the full code can be found below.

def groupWords(text):
    ourDict = {}
    group = []
    for w in text:
	if len(group) == numWords:
	    if ourDict.get(tuple(group)) == None:
		ourDict[tuple(group)] = [w]
	    group = group[1:]
    return ourDict
def markovMulti(numTimes,ourDict):
    outwords = []
    testGroup = []
    for i in range(0,numTimes):
	if(len(testGroup) < numWords):
	    testGroup = list(random.choice(ourDict.keys()))
	    outwords = list(testGroup)
	    w = random.choice(ourDict[tuple(testGroup)])
	    testGroup = testGroup[1:]
    return " ".join(outwords)

Now, you can adapt this code into predicting text instead of generating text by making a program that will
Initially present the five most common words as options and give the option to choose one of those or write your own word. Repeat this until you have enough words to start the prediction. For example, if you chose to predict text based on the last three words then keep going until you have three words.

Once you have enough words that you can start predicting text, then show the five most common next words given what’s already been typed and give an option to choose your own.

If the current history of words doesn’t have any matches for prediction, just go back to showing the five most common words.
A Python program that does this is available in the repository linked at the end of the article. Try playing around with the number of words used in prediction as well as making the text prediction more complicated. You can try making it learn over time, recording the words you choose every time and making them more likely to show up again!

Learn More

Github Text Generation Repository/Sample Code


Natural Language Generation

The general process of trying to create text that looks like it was written by a person is called natural language generation

Project Gutenberg

Project Gutenberg has a massive number of old books available as plain text files suitable for setting up your Markov chain

Markov Chains Explained Visually


Markov and You


Related Posts