How Divided Are We? Top 20 Words in Biden's vs Trump's Inauguration Speeches

50% of the top 10 words in Biden's inaugural speech were the same as Trump's top 10 words. Analyze both inaugural speeches with Python to find all the similarities and differences.


By Derek Xiao

Can Python unite the nation?

Today we saw our country's 45th successful transfer of the presidency.

This marked the end of a highly contested election during which our nation at times felt more divided than ever.

But as I sat in my living room today with my parents and watched Biden’s inaugural address, I felt hopeful.

“Some days you need a hand. There are other days when we're called to lend a hand. That's how it has to be, that's what we do for one another. And if we are that way our country will be stronger, more prosperous, more ready for the future. And we can still disagree.” - Joe Biden, 2021 Inaugural Address

As a citizen I was inspired by Joe's promise for a united nation.

But as a developer I started to wonder: could I quantify this hope?

The inaugural address is a president’s first speech to the nation. The speech is meticulously written by a team of writers to capture the mood of the nation and the most pressing issues that we face.

Could the specific words used in this speech give us insight into the path ahead?

I compared the top 20 most common words in Biden's speech with the top 20 words in Trump's 2017 inaugural address to see where our country is now compared to four years ago, and what to expect over the next four years.

Using Python to Find the Top 20 Most Common Words

This next section is a tutorial for the Python analysis. If natural language processing doesn't get you excited, you may want to jump to the end (but it's also only 20 lines of code, so this could be the time to learn!)

The goal for this analysis was to take each inaugural address and find the most common words. The analysis was made up of two parts:

  1. Scraping the speech from the web using Beautiful Soup
  2. Processing the words using NLTK

If you want to run the code at home, this is what you'll need to do to get set up:

  1. Install python 3
  2. Install requests, Beautiful Soup, and NLTK with pip3 install requests beautifulsoup4 nltk
  3. brew install jupyter and then open a jupyter notebook by running jupyter notebook. Now you can run all of the commands below in the jupyter notebook!

If you want to skip the scraping and cleaning, download Arctype and use the database credentials at the end to see the end data.

1. Web Scraping with Beautiful Soup

Web scraping is the process of collecting information from the web. In this scenario, we're going to be scraping transcripts of each president's inauguration speech.

Transcripts of both speeches are available online; Biden's transcript URL is used in the scraping code below.

We first use the requests package to scrape the entire HTML code from each website.

import requests

URL = 'https://www.yahoo.com/now/full-transcript-joe-bidens-inauguration-175723360.html'
page = requests.get(URL)

Congrats, you've built your first web scraper!

This code makes an HTTP request to retrieve the HTML from the server where the speech is stored.

Now we have to take this mess of HTML and find just the text from each president's speech. We can do this easily with Python's Beautiful Soup package.

from bs4 import BeautifulSoup

biden_speech = BeautifulSoup(page.content, 'html.parser')

In the code above, we've converted the HTML from earlier into a Beautiful Soup object that is easy to parse.

Using Chrome DevTools to find an HTML tag

Now we have to find the specific HTML block that contains the text we're looking for. We can do this using the browser's DevTools console.

Open the speech in a new tab in your browser and press cmd+option+I to open the DevTools console. Highlight the text you're looking for, and you'll be able to see the HTML tag that contains that text in the console on the right.

For Biden's speech, we can see that it's contained in a <div> tag labeled with a caas-body class name. Switching back to Python, we can find that tag using the find_all method on our Beautiful Soup object from before.

biden_speech_content = biden_speech.find_all('div', class_='caas-body')

When we look at the biden_speech_content object, we'll still find other HTML tags that aren't related to the speech, such as:

<div class="caas-readmore caas-readmore-collapse">
	<button aria-label="" class="link rapid-noclick-resp caas-button
    collapse-button" data-ylk="elm:readmore;slk:Story continues"
    title="">
    	Story continues
    </button>
</div>

In order to find just the text from Biden's speech, we can filter for the <p> tags that aren't labeled with a class:

biden_speech_content_v2 = biden_speech_content[0].find_all('p', attrs={'class': None})

Now we have the right elements, but each sentence is still wrapped in <p> tags. We can strip these HTML tags with the Beautiful Soup get_text method:

biden_speech_str = ""

for sentence in biden_speech_content_v2:
    text = sentence.get_text()
    biden_speech_str = biden_speech_str + " " + text
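
Building the string by repeated concatenation works, but str.join is the more idiomatic (and faster) way to do the same thing. Here's a dependency-free sketch using hypothetical stand-ins for the scraped text, just for illustration:

```python
# Hypothetical stand-ins for the text pulled out of each <p> tag
sentences = ["Some days you need a hand.", "We can still disagree."]

# str.join concatenates every piece with a single space in one pass
speech_str = " ".join(sentences)

print(speech_str)  # Some days you need a hand. We can still disagree.
```

In the scraper above, the same idea would be " ".join(s.get_text() for s in biden_speech_content_v2).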

Finally, we should be left with a clean speech that we can analyze with the nltk package.

Biden's 2021 Inauguration Speech, scraped and cleaned from the web using Python and Beautiful Soup

2. Finding Word Frequency with NLTK

We're getting close to the end now! The final step is to apply some basic natural language processing (NLP) techniques using the Python NLP package NLTK.

We could do a frequency analysis of the speech now, but this would show words like "I", "We", and "The" as the most common words. In natural language processing these are called stop words.

We can use NLTK's list of English stop words to find just the words that we're interested in.

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk import FreqDist

nltk.download('punkt')      # tokenizer models (first run only)
nltk.download('stopwords')  # stop word list (first run only)

stop_words = set(stopwords.words('english'))

biden_words = word_tokenize(biden_speech_str.lower())

filtered_biden_speech = [w for w in biden_words if not w in stop_words and w.isalpha()]

Let's break down what the code is doing:

  1. Using .lower() to cast the entire speech to lower case so it can be compared to the stop words
  2. Separating the string into individual words with word_tokenize
  3. Removing stop words: if not w in stop_words
  4. Removing punctuation like periods and commas: w.isalpha()
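
Here's a self-contained sketch of those steps on a single sentence. To keep it runnable without downloading any NLTK corpora, it uses a tiny hand-rolled stop word list and a simple split in place of NLTK's tokenizer (both are stand-ins for illustration only):

```python
import string

sentence = "We can still disagree, and our country will be stronger."

# A tiny stand-in for NLTK's English stop word list
stop_words = {"we", "can", "and", "our", "will", "be"}

# Lowercase, strip punctuation, and split on whitespace
words = sentence.lower().translate(str.maketrans("", "", string.punctuation)).split()

# Drop stop words, keeping only the content words
filtered = [w for w in words if w not in stop_words]

print(filtered)  # ['still', 'disagree', 'country', 'stronger']
```

The real stop word list has over a hundred entries, which is why we lean on NLTK instead of maintaining our own.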

Now we have a list of words that we can count!

freq = FreqDist(filtered_biden_speech)
print(freq.most_common(20))
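
NLTK's FreqDist behaves much like the standard library's collections.Counter. Here's a dependency-free sketch of the counting step on a toy word list (hypothetical words, just for illustration):

```python
from collections import Counter

words = ["america", "nation", "america", "people", "nation", "america"]

# Count occurrences and list the most frequent words first
freq = Counter(words)
print(freq.most_common(2))  # [('america', 3), ('nation', 2)]
```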

But what you might find as you look through the list is that there are separate counts for similar words such as "country" and "countries". In order to count these as one word, we have to lemmatize the list so that every word is converted to its base word.

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download('wordnet')  # WordNet database (first run only)
wordnet_lemmatizer = WordNetLemmatizer()

lemmatized_biden = [wordnet_lemmatizer.lemmatize(word) for word in filtered_biden_speech]

freq_lemma = FreqDist(lemmatized_biden)
print(freq_lemma.most_common(20))

Done! You've successfully scraped data from the web and analyzed it with NLP all while supporting democracy. Let's take a look at the results.

Biden's vs. Trump's Inauguration Speeches: Most Frequent Words

import plotly.express as px

# Unzip the (word, count) pairs into two tuples
k, v = zip(*freq_lemma.most_common(10))

fig = px.bar(x=v, y=k, orientation='h')
fig.update_layout(yaxis=dict(autorange="reversed"))
fig.show()
Top 10 Most Frequent Words from Biden's 2021 Inauguration Speech

The top word was distorted by the lemmatizer (which clipped it down to "u"), but the original word was "us".

These were the top 10 words from Trump's speech in 2017:

Top 10 Most Frequent Words from Trump's 2017 Inauguration Speech

What stood out to me is that 50% of the top 10 words for both presidents were the same:

  • America
  • American
  • Nation
  • People
  • One

The optimistic side in me looks at this data and sees a nation that shares common values. We care about our country, and we care about each other.

But at the same time, we are all facing our own unique issues. If we look at the next 10 most common words for each president's speech we begin to see some differences.

Biden's Speech:

Biden's 11th-20th Most Common Words (2021 Inauguration Speech)

Trump's Speech:

Trump's 11th-20th Most Common Words (2017 Inauguration Speech)

Biden's speech was undeniably a call to bring our nation together in unity. On the other side, we can see Trump appealing to Americans whose jobs are under threat and who need to protect their livelihoods and families.

The data shows two groups of people facing their own challenges, but I also see one nation with common values.

We set off to see if we could quantify "hope". And I believe we found an answer.

If two presidents with polar opposite political views can appeal to their supporters with 50% of the same vocabulary, then there is still hope to unite around our similarities.

“What are the common objects we as Americans love, that define us as Americans? I think we know. Opportunity, security, liberty, dignity, respect, honor, and yes, the truth.” - Joe Biden, 2021 Inaugural Address

A Full Speech Comparison with Arctype

I shared the top 20 words, but there were more than 500 unique words in Biden's inauguration speech. If you want to see more analysis, we've uploaded all the speech data to Arctype so you can skip the scraping and cleaning.

The dataset includes 2 tables:

  • Frequencies table: full list of the word frequencies for both speeches
  • Sentences table: cleaned sentences for both speeches so you can do your own analysis

Here's how to connect to the data:

Connecting to a database with the Arctype SQL Client
  1. Download the free Arctype SQL Client
  2. Input the credentials below in Arctype to connect to the database
  3. Run a query!

Database credentials:

  • host: arctype-pg-demo.c4i5p0deezvq.us-west-2.rds.amazonaws.com
  • port: 5432
  • user: root
  • password: HC9x0OkI9vVO4wqprscg
  • database: inauguration_2021

After you look at the data, leave a comment below with your own takeaways!
