When I searched my set of 1000 love poems for poems that did not contain the word 'love' (or 'loves', 'lover', 'lovers', 'loved'), I was not expecting to find over 400 poems. Pulling out the poems that contained the words 'kiss,' 'kisses,' 'heart,' and 'hearts' as well left me with about 300 poems--nearly one-third of the full set. The love carousel on this page contains a random selection of these 'love'-less poems.
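For the curious, a filter like the one described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample lines are invented, the helper names `words_in` and `without` are hypothetical, and whole-word matching (so 'beloved' does not count as 'loved') is a guess at the rules, not a record of the search actually used.

```python
import re

# Word families named in the text above.
LOVE_WORDS = {"love", "loves", "lover", "lovers", "loved"}
EXTRA_WORDS = {"kiss", "kisses", "heart", "hearts"}

def words_in(poem):
    """Return the set of lowercase words that appear in a poem."""
    return set(re.findall(r"[a-z]+", poem.lower()))

def without(poems, banned):
    """Keep only the poems that contain none of the banned words."""
    return [p for p in poems if not (words_in(p) & banned)]

# Invented sample lines standing in for the real set of 1000 poems.
poems = [
    "I love the way the morning finds you",
    "Your hand in mine beneath the moon",
    "My heart keeps time with the rain",
]

no_love = without(poems, LOVE_WORDS)                # first pass: no 'love' words
no_love_kiss_heart = without(no_love, EXTRA_WORDS)  # second pass: no 'kiss' or 'heart' either
```

Running the two passes in sequence mirrors the two steps in the paragraph: first the 'love'-less poems, then the smaller group that also avoids 'kiss' and 'heart'.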
The poems are written by hundreds of different authors at different points in history. Still, I wondered if there were words that they tended to have in common. Which words are used most frequently in these love poems? I asked my computer...
To figure out which words were most common in my set of love poems, I used spaCy, a natural language processing library, to break the poems into lists of discrete words, known as 'tokens.' Once the poems had been tokenized, I could count the number of times each word appeared. The top ten most frequent words in my set of love poems (along with the number of times each appears) are:
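The tokenize-then-count step can be sketched roughly like this. To keep the example self-contained, a simple regular expression stands in for spaCy's tokenizer, which is far more careful about punctuation and contractions; the `tokenize` helper and the two sample lines are invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (a crude stand-in for spaCy's tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

# Invented sample lines standing in for the full set of poems.
poems = [
    "The sea and the sky and I",
    "I gave you the sea",
]

counts = Counter()
for poem in poems:
    counts.update(tokenize(poem))  # add this poem's tokens to the running tally

top_ten = counts.most_common(10)   # (word, count) pairs, most frequent first
```

Note that lowercasing folds 'I' and 'i' together, one of many small choices that can shift a ranking.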
These words are all very common (they are sometimes called 'stop words'), but I am fond of them. I like learning that in my set of love poems, 'I' appears more frequently than 'you,' or that 'my' appears over twice as often as 'your,' or that one person or thing is 'in' well over two times as often as 'on.' However, people often ignore stop words in favor of more exotic words when looking at things like word frequency. When I removed the set of stop words identified by spaCy, the total number of words in my set of love poems dropped from 227,896 to 102,498, and the most frequent words remaining were:
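Removing stop words amounts to dropping every token that appears on a fixed list before counting. The sketch below uses a tiny made-up stop list and an invented sample line (spaCy's English list is much longer), then works through the totals quoted above, which show that a little over half of all the words were stop words.

```python
from collections import Counter

# A tiny illustrative stop list; this is NOT spaCy's actual list.
STOP_WORDS = {"i", "you", "my", "your", "in", "on", "the", "and", "a"}

# Invented sample line, already lowercased and split into tokens.
tokens = "i keep your letters in my coat and the rain on my sleeve".split()

all_counts = Counter(tokens)
content_counts = Counter(t for t in tokens if t not in STOP_WORDS)

# Totals reported in the text, before and after removing stop words.
total_before, total_after = 227_896, 102_498
removed = total_before - total_after    # number of stop-word tokens dropped
share_removed = removed / total_before  # fraction of all tokens that were stop words
```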
You might notice that 'like'--the second most common word in the stop-word-free ranked list that I made using spaCy--doesn't appear in the word cloud. This is because I used WordCloud to make the image from the text of the original poems. WordCloud does its own work of breaking the poems into individual words, as spaCy did, but the list of stop words it uses is different (e.g., 'like' is listed as a stop word and excluded by WordCloud, just as 'will' and 'one'--which are quite large in the picture WordCloud made--are considered stop words and excluded by spaCy). I expected 'hand' (251 occurrences) to be smaller than 'know' (391), until I noticed that WordCloud merges singular and plural forms by default (the poems contain 217 'hands' in addition to the 251 singular ones). In the end I discovered that even something as seemingly straightforward as word counts can be complicated, just like love.
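The 'hand' puzzle comes down to simple arithmetic, sketched here. The `merge_plurals` helper is a hypothetical imitation of plural merging, not WordCloud's own code (in the WordCloud library this behavior is, as far as I know, controlled by the `normalize_plurals` option). The three counts are the ones quoted above.

```python
def merge_plurals(counts):
    """Fold 'words' into 'word' when both appear (a rough imitation of plural merging)."""
    merged = dict(counts)
    for word in list(merged):
        singular = word[:-1]
        # Only merge a trailing-'s' form when its singular is also present.
        if word.endswith("s") and not word.endswith("ss") and singular in merged:
            merged[singular] += merged.pop(word)
    return merged

# Counts quoted in the text.
counts = {"hand": 251, "hands": 217, "know": 391}
merged = merge_plurals(counts)  # 'hands' is folded into 'hand': 251 + 217 = 468
```

Once the two forms are combined, 'hand' stands at 468 and overtakes 'know' at 391, which is why it looks larger in the picture.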
See more poems and images in the Love Carousels.