As someone who is on the road a lot, I frequently have a lot of downtime during road trips. I don’t like driving, because it’s mostly wasted time, so I thought I would do something useful while driving. Unfortunately, about the only useful thing you can do while driving is listen to music, but there’s something better: Audiobooks.
Audiobooks are a fantastic way to read (or, well, listen to) books on the road, and it’s largely made long car trips bearable, if not desirable. On the last trip, I decided I would listen to some Lovecraft, as I enjoy his writing and the genre of horror in general, so I loaded “At the Mountains of Madness” on my mobile phone and off I went. While listening, however, I was struck by his frequent use of words like “terrible”, “horrible” etc. I know he’s trying to convey a sense of foreboding, but, come on, Howard, not every rock has to be grotesque.
This realization made me curious to see how many times the word “horrible” was used in the book (hint: six), and then I pondered whether it would be possible to “summarize” a book by extracting the most frequently used words. Using a small Python script, I calculated the frequencies of each word (stemmed so that words with the same root are merged), ignored the most common English words and printed the 150 most common words in each book.
It turns out that this process does, indeed, produce a list of words that more or less describes the contents of the book rather accurately, but a list of words is not very interesting. To spice it up, I used the excellent Wordle to produce word clouds from each list, and here they are below:
As you can see, the clouds are quite representative of the books, to the point where one can guess which book generated each cloud just from the words.
I’d appreciate notes/feedback! If you have any, please post them in the comments below.