
Using Sentiment Analysis to Measure Election Surprise

Sentiment analysis is a powerful natural language processing technique for quantifying the emotion expressed in a body of text. Part of what makes it so useful is that its results are easy to interpret and provide a big-picture metric for a dataset.

One recent event that surprised many people was the November 8th US Presidential election. Hillary Clinton, who ended up losing the race, had been given a chance of victory ranging from 71.4% (FiveThirtyEight) to 85% (The New York Times) to more than 99% (Princeton Election Consortium).


Credit: New York Times

To measure the shock of this upset, we decided to examine comments made during the announcement of the election results and see how (or whether) sentiment changed. A comment's sentiment is measured by whether its words carry negative or positive connotations: a score of 0.00 means the comment is neutral, a higher score means the sentiment is more positive, and a negative score means the comment is negative.
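The scoring scheme described above can be sketched with a tiny lexicon-based scorer. This is a minimal illustration, not the tool used for this analysis (production work typically uses a library such as VADER), and the word scores below are invented for demonstration:

```python
# Minimal lexicon-based sentiment scorer (illustrative only).
# The word scores are made up; real tools use large, validated lexicons.
LEXICON = {
    "great": 0.8, "win": 0.6, "happy": 0.7,
    "sad": -0.6, "lose": -0.5, "terrible": -0.9,
}

def sentiment_score(comment: str) -> float:
    """Average the lexicon scores of known words; 0.0 means neutral."""
    words = comment.lower().split()
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(sentiment_score("what a great win"))   # positive
print(sentiment_score("a terrible night"))   # negative
print(sentiment_score("the polls closed"))   # neutral, 0.0
```

A comment with no lexicon words scores exactly 0.00, matching the neutral baseline described above.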

Our dataset is a .csv of all Reddit comments made from 11/8/2016 to 11/10/2016 (UTC), courtesy of /u/Stuck_In_the_Matrix. All times below are in EST, and we’ve annotated the timeline (the height of each bar denotes the number of comments posted during that hour):
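The per-hour aggregation behind the charts that follow can be sketched like this. The sample rows are invented stand-ins for the real comment dump, and `score` represents each comment's precomputed sentiment:

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Invented sample rows: (UTC timestamp, sentiment score).
# The real analysis iterates over the full Reddit comment CSV.
rows = [
    ("2016-11-09 02:15:00", 0.4),
    ("2016-11-09 02:45:00", 0.2),
    ("2016-11-09 03:10:00", -0.3),
]

buckets = defaultdict(list)
for ts, score in rows:
    utc = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    est = utc - timedelta(hours=5)                    # shift UTC to EST
    buckets[est.replace(minute=0, second=0)].append(score)

# Per-hour comment count and mean sentiment, as plotted in the charts.
for hour in sorted(buckets):
    scores = buckets[hour]
    print(hour, len(scores), sum(scores) / len(scores))
```

Each bar chart plots `len(scores)` per hour; each sentiment chart plots the hourly mean.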


We examined five political subreddits to gauge their reactions. Our first target was /r/hillaryclinton, Clinton’s primary support base. The number of comments peaked starting at around 9pm EST, but sentiment gradually fell as news came in that Donald Trump was winning more states than expected.


/r/hillaryclinton: Number of Comments per Hour


/r/hillaryclinton: Mean Sentiment Score per Hour

What is interesting is the low number of comments made after the election was called for Donald Trump. I suspect it may have been a subreddit-wide pause on commenting due to concerns about trolls, but I’m not sure; I contacted the moderators but haven’t yet received a response.

A few other left-leaning subreddits had interesting results as well. While /r/SandersforPresident was closed for the election season after Bernie Sanders’s concession, its successor, /r/Political_Revolution, remained open and experienced a similar decline in comment sentiment.


/r/Political_Revolution: Number of Comments per Hour


/r/Political_Revolution: Mean Sentiment Score per Hour

On /r/The_Donald (Donald Trump’s base), the results were the opposite. 


/r/The_Donald: Number of Comments per Hour


/r/The_Donald: Mean Sentiment Score per Hour

There are also a few subreddits that are less candidate- or ideology-specific: /r/politics and /r/PoliticalDiscussion. /r/PoliticalDiscussion showed no clear shift, but /r/politics became more muted, at least compared with the previous night.


/r/PoliticalDiscussion: Number of Comments per Hour


/r/PoliticalDiscussion: Mean Sentiment Score per Hour

/r/politics: Mean Sentiment Score per Hour

To recap:

  1. Political subreddits experienced a sizable increase in activity as the election results came in
  2. Subreddits differed in their reactions to the news along ideological lines, with pro-Trump subreddits showing more positive sentiment than pro-Clinton subreddits

What could be the next steps for this type of analysis?

  1. Can we use these patterns to classify the readership of the comments sections of newspapers as left- or right-leaning?
  2. Can we apply these time-series sentiment analyses to other events, such as sporting events (which also involve two ‘teams’)?
  3. Can we use sentiment analysis to evaluate the long-term health of communities, such as subreddits dedicated to candidates who eventually lost, like Bernie Sanders?