More tweets correlate with congressional election wins, says paper


A study finds a correlation between candidate name mentions on Twitter and election outcomes, a finding study authors say persists even when controlling for other variables such as media coverage, incumbency, district partisanship and demography.

Study authors--academics and students from Indiana University-Bloomington--examined a random sample of 537 million tweets generated from Aug. 1 to Nov. 1, 2010, and compared it to data from 406 congressional elections.

"Our results show that the percentage of Republican-candidate name mentions correlates with the Republican vote margin in the subsequent election," the paper states. The model was valid in 404 of the 406 races, says study coauthor Fabio Rojas, a Bloomington associate professor of sociology.

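As a rough illustration of the relationship the paper describes, the sketch below computes a Republican share of name mentions per district and its Pearson correlation with the Republican vote margin. The district figures and the function names are invented for illustration and are not taken from the study; the study's own model also controls for factors such as incumbency and media coverage, which this sketch does not attempt.

```python
from math import sqrt


def republican_tweet_share(rep_mentions, dem_mentions):
    """Percentage of candidate name mentions that name the Republican."""
    total = rep_mentions + dem_mentions
    if total == 0:
        raise ValueError("district has no candidate mentions")
    return 100.0 * rep_mentions / total


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical districts, NOT study data:
# (Republican mentions, Democratic mentions, Republican vote margin in points)
districts = [
    (120, 80, 6.0),
    (40, 160, -22.0),
    (300, 100, 18.0),
    (90, 110, -3.0),
]

shares = [republican_tweet_share(r, d) for r, d, _ in districts]
margins = [margin for _, _, margin in districts]
r = pearson_r(shares, margins)
```

With these made-up numbers the correlation is strongly positive, which is only a property of the invented data; it says nothing about the size of the effect the authors report.
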
Researchers decided to focus on raw Twitter mentions and not attempt to filter the data according to location of the sender. Volunteered Twitter location information is unreliable, and while tweets from smartphones may contain actual geographic data, those tweets don't constitute a large enough data sample, Rojas said.

Nor did the study examine the context of the Twitter mentions to sort them into positive and negative groups. The study's data show that what matters is raw mentions, Rojas says. The fact that people are talking about a candidate shows that the candidate is a contender, he says. "You don't really have to worry about the geography so much, you don't even have to worry about whether the guy tweeting is happy about the candidate."

On average, possible biases introduced into the data set by a particularly fervent tweeter who keeps posting about a candidate even out, Rojas said. Researchers have also analyzed the data set by the number of users mentioning a candidate, rather than the overall number of tweets, and got a similar result, he added.
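
The difference between counting tweets and counting users can be sketched as follows. The `(user_id, candidate)` input format, the candidate names and the `mention_counts` helper are assumptions made for illustration, not the researchers' actual pipeline.

```python
from collections import defaultdict


def mention_counts(tweets):
    """Count mentions per candidate two ways: by tweet and by distinct user.

    `tweets` is an iterable of (user_id, candidate) pairs (hypothetical format).
    """
    tweet_counts = defaultdict(int)
    users_by_candidate = defaultdict(set)
    for user_id, candidate in tweets:
        tweet_counts[candidate] += 1
        users_by_candidate[candidate].add(user_id)
    user_counts = {c: len(users) for c, users in users_by_candidate.items()}
    return dict(tweet_counts), user_counts


# Made-up sample: user "u1" is a fervent tweeter who posts about Smith three times.
sample = [
    ("u1", "Smith"), ("u1", "Smith"), ("u1", "Smith"),
    ("u2", "Smith"),
    ("u3", "Jones"), ("u4", "Jones"),
]

by_tweet, by_user = mention_counts(sample)
```

In this toy sample Smith leads Jones four to two by tweet count, but the two are tied at two users each, which is the kind of gap a per-user count is meant to expose.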

The model does break down in outlier or idiosyncratic races, Rojas acknowledges. The two districts that didn't conform to the model were both highly uncompetitive. Other races that feature bizarre candidates likely will also break the model, he said. Twitter traffic about New York City mayoral candidate Anthony Weiner--whose political career is commonly held to have degenerated into a sideshow of self-immolation--wouldn't be an accurate barometer, for example. Traffic about Christine O'Donnell, the Republican Senate candidate in Delaware, would be another such case. O'Donnell attracted more Facebook fans than electoral victor Chris Coons, but "she was personally very controversial--a lightning rod. Most political candidates aren't that," Rojas said.

"We're talking about statistical average. There are always going to be cases that don't fit the curve," he added.

The fact that the Twitter data set isn't a statistically valid sample of the voting population wasn't a problem, either, Rojas said. "You don't need a representative sample of people to know that a building is on fire. You only need a few people to talk about it on Twitter--that's how politics are."

Rojas said additional research to be published in the coming months will look deeper into building electoral outcome models at the user level. One thing Rojas and his team may have found, he said, is that tweets from unverified accounts that appear to come from casual users (as denoted by a relative lack of hashtags, @account mentions and hyperlinks) are likely a better source of predictive data.
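
One way to picture the "casual user" signal is a simple heuristic that counts hashtags, @account mentions and hyperlinks in an account's tweets. This is only a sketch: the regular expressions, the `looks_casual` helper and the 0.5 markers-per-tweet threshold are assumptions, since the article does not say how the researchers define a "relative lack" of these markers.

```python
import re

# Simple patterns for the three markers named in the article.
HASHTAG = re.compile(r"#\w+")
MENTION = re.compile(r"@\w+")
LINK = re.compile(r"https?://\S+")


def marker_count(text):
    """Number of hashtags, @mentions and hyperlinks in one tweet."""
    return sum(len(p.findall(text)) for p in (HASHTAG, MENTION, LINK))


def looks_casual(tweets, max_markers_per_tweet=0.5):
    """Guess whether an account looks casual from its tweets.

    The threshold is an arbitrary illustrative value, not one from the study.
    """
    if not tweets:
        return False
    average = sum(marker_count(t) for t in tweets) / len(tweets)
    return average <= max_markers_per_tweet


casual = looks_casual([
    "I think Smith might actually win this one",
    "Saw Smith at the diner today",
])
campaign_style = looks_casual(["Vote @smith2010 #election http://example.com/x"])
```

Here the first account, which uses none of the markers, is flagged as casual, while the second, with three markers in a single tweet, is not.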

For more:
- download the paper, "More Tweets, More Votes: Social Media as a Quantitative Indicator of Political Behavior," from SSRN
