Data-driven Analysis of Presidential Candidates
Analytics can yield insights in almost any field, yet text, which offers no obvious data points to measure, still presents a special challenge to data scientists. In her master’s thesis project, MS in Predictive Analytics student Kelsey O’Neill uses topic modeling to analyze Twitter posts from three leading presidential candidates through June 2016. Data scientists such as O’Neill use topic modeling to explore large quantities of text and identify common themes.
Thousands of tweets were collected from March through June 2016, an early stretch of the election period during which Bernie Sanders was still in the race. The results suggest that the discussion can be organized around roughly fifteen topics, listed in the chart below. The chart shows, by percentage, the topics on which Donald Trump’s and Hillary Clinton’s tweets focus.
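As a rough illustration of how per-candidate topic percentages of this kind might be computed, the sketch below assumes each tweet has already been assigned a single dominant topic. The function name topic_shares, the sample pairs, and the topic labels other than "Trump Slogan" and "Hillary Clinton" are hypothetical and are not taken from the research data.

```python
from collections import Counter, defaultdict

# Made-up (candidate, dominant topic) pairs, one per tweet, for illustration only.
# "Trade", "Healthcare" and "Education" are hypothetical topic labels.
labeled_tweets = [
    ("Trump", "Trump Slogan"),
    ("Trump", "Hillary Clinton"),
    ("Trump", "Trump Slogan"),
    ("Trump", "Trade"),
    ("Clinton", "Healthcare"),
    ("Clinton", "Education"),
    ("Clinton", "Hillary Clinton"),
    ("Clinton", "Healthcare"),
]


def topic_shares(pairs):
    """Return {candidate: {topic: percent of that candidate's tweets}}."""
    counts = defaultdict(Counter)
    for candidate, topic in pairs:
        counts[candidate][topic] += 1

    shares = {}
    for candidate, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[candidate] = {
            topic: 100.0 * n / total for topic, n in topic_counts.items()
        }
    return shares


shares = topic_shares(labeled_tweets)
for candidate, by_topic in shares.items():
    # List each candidate's topics from most to least frequent.
    for topic, pct in sorted(by_topic.items(), key=lambda kv: -kv[1]):
        print(f"{candidate:8s} {topic:16s} {pct:5.1f}%")
```

Normalizing within each candidate, rather than across all tweets, keeps the percentages comparable even when one candidate tweets far more often than another.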
The results support the perception that, at least during the timeframe in which the tweets were collected, Donald Trump’s campaign was less issues-based and more focused on rivals, while Hillary Clinton focused more on issues.
These results, organized as candidate social media discussion by topic, may be segmented into people-focused topics and issue-focused topics. The titles of these topics represent key terms and themes within tweets. The underlying data (discussed further in the full research report) reveal that Trump focused directly on Clinton, often referring to her as “Crooked Hillary.” Clinton’s tweets are represented in large part by the Hillary Clinton topic as well. Trump’s campaign slogan “Make America Great Again” (labeled as ‘Trump Slogan’) is so prominent in the data that it is represented as a distinct topic among the candidates’ tweets. The Trump Rhetoric topic proves to be driving the conversation, as Clinton finds reason to discuss and debate it.
As pillars of his campaign, Trump tweets about domestic job loss, trade agreements, and America as a competitor abroad. He mentions President Barack Obama frequently and discusses Obama’s foreign policies and perceived failure to call recent domestic attacks “radical Islamic terrorism.” Clinton’s focus frequently includes issues such as healthcare, living wage, and education.
While this analysis supports some commonly held perceptions, it is distinct from most other political analysis in that it is entirely data-driven. Identified topics are not the opinions of political analysts or experts. Rather, topics reviewed in the research emerge from the words of the candidates themselves, words posted to social media. It is text data and unsupervised machine learning algorithms that discover general topics and themes. This is political data science, empirical research rather than political commentary and expert opinion.
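To make the idea of topics emerging from text more concrete, here is a minimal sketch of one unsupervised topic model. The article does not name the algorithm used in the research; latent Dirichlet allocation (LDA) fitted by collapsed Gibbs sampling is assumed here only because it is a common choice. The function name lda_gibbs, its parameters, and the toy documents are all hypothetical, and the documents are invented word lists, not real tweets.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling (a sketch)."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * n_topics for _ in docs]         # words per topic in each doc
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # total words per topic
    assignments = []

    # Start from a random topic for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the word's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample in proportion to how well each topic fits doc and word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed share of each topic within each document.
    doc_shares = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    top_words = [[w for w, _ in tw.most_common(3)] for tw in topic_word]
    return doc_shares, top_words


# Invented toy documents (already tokenized), for illustration only.
toy_docs = [
    ["jobs", "trade", "jobs", "china"],
    ["trade", "jobs", "china", "trade"],
    ["healthcare", "wage", "education"],
    ["education", "healthcare", "wage", "healthcare"],
]
doc_shares, top_words = lda_gibbs(toy_docs, n_topics=2)
for k, words in enumerate(top_words):
    print(f"topic {k}: {words}")
```

No topic labels are supplied to the model. A human reader names each topic afterward by inspecting its most frequent words, which is presumably how titles such as ‘Trump Slogan’ are attached.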
Methods of text analytics such as topic modeling and sentiment analysis have applications beyond the analysis of political discourse. Social media, consumer reviews, and the web are largely text. Data scientists can use these methods to learn what consumers think and feel about products and brands, as well as candidates.