University of Michigan School of Information

Winter 2015 | Ann Arbor, MI

Shannon guesser game

In my Python programming class, I used the Shannon guesser game (a game in which you train a computer to guess which letter or word comes next based on the preceding text) to analyze the redundancy, or guessability, of President Obama's speeches, specifically his State of the Union speech from each year of his presidency. I trained the Shannon guesser on four different texts: Obama's inauguration speech from 2008; the first chapter of Obama's book, Dreams from My Father; George W. Bush's 2008 State of the Union speech; and Obama's first State of the Union speech, from 2009. As test data, I used Obama's State of the Union addresses from 2010 to 2015.
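The core of the game can be sketched in a few lines of Python. This is a minimal character-level version, not the original program: it assumes a fixed context of `k` preceding characters and a simple frequency-based strategy, where the computer guesses the characters that most often followed that context in the training text, in order, until it hits the right one. The names `train`, `guess_order`, and `count_guesses` are hypothetical. A lower total means the test text is more predictable (more redundant) given the training text.

```python
from collections import Counter, defaultdict


def train(training_text, k=3):
    """Map each k-character context to counts of the character that followed it."""
    model = defaultdict(Counter)
    for i in range(k, len(training_text)):
        context = training_text[i - k:i]
        model[context][training_text[i]] += 1
    return model


def guess_order(model, context, alphabet):
    """Candidate next characters: the most frequent followers of this context
    first (ties broken alphabetically), then every other known character."""
    followers = model.get(context, Counter())  # .get does not add unseen keys
    ranked = sorted(followers, key=lambda ch: (-followers[ch], ch))
    rest = sorted(alphabet - set(ranked))
    return ranked + rest


def count_guesses(model, test_text, k=3):
    """Total guesses needed to predict test_text one character at a time.
    The first k characters only serve as context and are not scored."""
    # Every character that can appear, so the correct answer is always in the list.
    alphabet = set(test_text)
    for counts in model.values():
        alphabet |= set(counts)
    total = 0
    for i in range(k, len(test_text)):
        order = guess_order(model, test_text[i - k:i], alphabet)
        total += order.index(test_text[i]) + 1  # position of the correct guess, 1-based
    return total
```

For one training/test pair, `count_guesses(train(training_text), test_text)` gives the total; running it for each training text against the same test speech shows which training text makes that speech most guessable.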

Run the program

To get the files needed to run this program, download the whole folder called "Obama's Speech Variablility."

To run my program, you need to import test106 as test. You also need to install Plotly's Python package. Plotly requires an account; you can try my username and API key (copied below), but you might need to make your own account to run it. Making one is quick and easy because you can connect it to your Gmail or Facebook account.

Either run the following commands or run the file click_me:
$ sudo pip install plotly
$ python -c "import plotly; plotly.tools.set_credentials_file(username='aeweiner', api_key='lsfzgbh0t6', stream_ids=['scdy4y00pq', 'y821tf8hi3'])"


For each of the four training texts, I found the average number of guesses the Shannon guesser needed across the six test speeches, to see which training text was best at predicting Obama's State of the Union speeches. The best training text was, not surprisingly, Obama's 2009 State of the Union speech. Using it, I made a time graph showing how the redundancy of Obama's speeches has changed over his presidency.
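The comparison can be sketched as follows, assuming the guess counts are stored in a nested dictionary keyed first by training text and then by test speech (a hypothetical layout, not necessarily the one the original program used). Fewer guesses on average means the training text predicts the test speeches better, so the best training text is the one with the lowest average.

```python
from statistics import mean


def best_training_text(guesses):
    """guesses maps training-text name -> {test-speech name: guess count}.

    Returns (best_name, averages), where averages maps each training text
    to its mean guess count and best_name has the lowest mean.
    """
    averages = {name: mean(list(per_test.values()))
                for name, per_test in guesses.items()}
    best_name = min(averages, key=averages.get)
    return best_name, averages
```

Keeping the full `averages` dictionary, rather than only the winner, makes it easy to reuse the same numbers in a summary table.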

I found that longer speeches (those with more characters) needed more guesses in total. To adjust for this, I examined the same number of characters in each speech: since the shortest speech had 39,354 characters, I looked at only that many characters of each speech.

I added these adjusted values to the same time graph to show the difference between the raw number of guesses and the adjusted number, which removes the extra guesses that longer speeches accumulate simply by having more characters.
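The length adjustment and the two series on the time graph can be sketched like this. The helper names are hypothetical, and `count_guesses` stands for any function that returns the total number of guesses for a text under one fixed training text. Slicing with `text[:shortest]` keeps the opening characters of each speech, which is an assumption about which 39,354 characters were used.

```python
def truncate_to_shortest(speeches):
    """speeches maps a label (e.g. the year) -> full speech text.
    Returns a new dict with every text cut to the shortest text's length."""
    shortest = min(len(text) for text in speeches.values())
    return {label: text[:shortest] for label, text in speeches.items()}


def raw_and_adjusted(speeches, count_guesses):
    """Return (label, raw_total, adjusted_total) for each speech, sorted by label.

    raw_total uses the whole speech; adjusted_total uses only the truncated
    text, so longer speeches no longer get extra guesses from extra length.
    """
    trimmed = truncate_to_shortest(speeches)
    return [(label, count_guesses(speeches[label]), count_guesses(trimmed[label]))
            for label in sorted(speeches)]
```

Plotting the second and third items of each tuple against the first gives the raw and adjusted lines on one time graph.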

I also made a bar graph showing the number of guesses the Shannon guesser took on each of the six test texts under each training text, to compare how consistently each training text performed across different speeches.
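The bar graph shows consistency visually. One way to put a number on it, which is an addition here and not necessarily part of the original program, is the population standard deviation of each training text's guess counts across the six test speeches. This sketch assumes the counts are stored in a hypothetical nested dictionary keyed by training text and then by test speech.

```python
from statistics import pstdev


def consistency(guesses):
    """guesses maps training-text name -> {test-speech name: guess count}.

    Returns each training text's population standard deviation of guess
    counts; a smaller value means more consistent performance across speeches.
    """
    return {name: pstdev(list(per_test.values()))
            for name, per_test in guesses.items()}
```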

There is also a CSV file that summarizes the performance of each training text. For each one, it lists the average number of guesses, a label describing how effective it was, and a description of the speech.
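Writing such a summary can be sketched with Python 3's standard `csv` module. The column names, the function names, and passing the effectiveness labels in as a dictionary are all assumptions; the original does not say how the labels were assigned. Rows are sorted so the most effective training text (fewest average guesses) comes first.

```python
import csv

# Hypothetical column names for the summary file.
HEADER = ["training_text", "average_guesses", "effectiveness", "description"]


def summary_rows(averages, labels, descriptions):
    """Build one row per training text, fewest average guesses first."""
    return [[name, averages[name], labels[name], descriptions[name]]
            for name in sorted(averages, key=averages.get)]


def write_summary(path, averages, labels, descriptions):
    """Write the header and the summary rows to a CSV file at path."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(summary_rows(averages, labels, descriptions))
```

Separating `summary_rows` from the file writing keeps the row logic easy to check without touching the disk.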