Carnegie Mellon University researchers have reported that a computer analysis of sentiments expressed in a billion Twitter messages during 2008-09 yielded measures of consumer confidence and of presidential job approval similar to those of well-established public opinion polls. The findings suggest that analysing the text in streams of tweets could become a cheap, rapid means of gauging public opinion on at least some subjects. But the researchers cautioned that tools for extracting public opinion from social media text are still crude and that social media remain in their infancy, so the extent to which these methods could replace or supplement traditional polling is still unknown. The study findings will be presented on May 25 at the Association for the Advancement of Artificial Intelligence's International Conference on Weblogs and Social Media in Washington, DC.
In the study, the researchers collected a billion microblog messages posted to Twitter during 2008 and 2009, averaging about 11 words each. They used simple text analysis techniques to identify messages that pertained to the economy or to politics, and then looked within those messages for words indicating whether the writer was expressing positive or negative sentiment.
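The article does not specify the text analysis techniques beyond calling them simple. One common approach of this kind is keyword filtering followed by counting words from positive and negative word lists, sketched below in Python. The keyword set, the two word lists and the functions `is_on_topic`, `sentiment_counts` and `sentiment_ratio` are hypothetical placeholders chosen for illustration; they are not the researchers' lexicon or code, and summarising a day as a positive-to-negative ratio is likewise an assumption.

```python
import re
from typing import List, Optional, Set, Tuple

# Hypothetical topic keywords and sentiment word lists, for illustration only.
# A real analysis would rely on a much larger, validated lexicon.
TOPIC_KEYWORDS: Set[str] = {"economy", "jobs", "job"}
POSITIVE_WORDS: Set[str] = {"good", "great", "hope", "better", "win"}
NEGATIVE_WORDS: Set[str] = {"bad", "worse", "fear", "lost", "crisis"}


def tokenize(message: str) -> List[str]:
    """Lowercase a message and split it into word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def is_on_topic(tokens: List[str], keywords: Set[str]) -> bool:
    """Treat a message as on topic if it contains any topic keyword."""
    return any(token in keywords for token in tokens)


def sentiment_counts(messages: List[str]) -> Tuple[int, int]:
    """Count on-topic messages containing at least one positive word and
    at least one negative word. A message with both kinds of words is
    counted in both totals; off-topic messages are skipped."""
    positive = negative = 0
    for message in messages:
        tokens = tokenize(message)
        if not is_on_topic(tokens, TOPIC_KEYWORDS):
            continue
        if any(token in POSITIVE_WORDS for token in tokens):
            positive += 1
        if any(token in NEGATIVE_WORDS for token in tokens):
            negative += 1
    return positive, negative


def sentiment_ratio(messages: List[str]) -> Optional[float]:
    """Positive-to-negative ratio for one day's messages (None if no negatives)."""
    positive, negative = sentiment_counts(messages)
    return positive / negative if negative else None


# Invented example messages for a single day.
day = [
    "The economy looks better today",
    "Lost my job, this crisis is bad",
    "Great game last night",  # no topic keyword, so it is ignored
]
print(sentiment_counts(day))  # (1, 1)
print(sentiment_ratio(day))   # 1.0
```

Applied to each day's batch of messages in turn, a procedure like this produces one sentiment value per day, a daily series that can then be set against poll results. Word-list matching is cheap enough to run over a billion short messages, but it misses sarcasm, negation and the unusual vocabulary of microblogs.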
Results regarding consumer confidence were compared with the Index of Consumer Sentiment (ICS) from the Reuters/University of Michigan Surveys of Consumers and with the Gallup Organization's Economic Confidence Index. Political sentiments regarding President Obama were compared with Gallup's daily tracking poll on presidential job approval, and views regarding the 2008 US presidential election were compared with a compilation of 46 different polls prepared by Pollster.com. The ICS, Gallup and Pollster.com measurements were all obtained from telephone surveys using traditional polling techniques.
The Twitter-derived sentiment measurements were much more volatile from day to day than the polling data, but when the researchers ‘smoothed’ the results by averaging them over a period of days, the Twitter-derived measures often correlated closely with the polling data. Similarly, the Twitter-derived sentiments and the traditional polls both reflected declining approval of President Obama's job performance during 2009, with a 72 percent correlation between them. The researchers concluded that improved computational methods for understanding natural language, particularly the unusual lexicon of microblogs, will be necessary before Twitter feeds can be mined reliably enough to predict elections.
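The article says only that results were averaged "over a period of days". The sketch below assumes a trailing moving average and measures agreement with the Pearson correlation coefficient, on the assumption that the reported 72 percent figure corresponds to a coefficient of about 0.72; the exact statistic and window the researchers used are not stated here. The function names, the window length of 3 and all series values are invented for illustration and are not data from the study.

```python
import math
from typing import List, Sequence


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Trailing moving average: each value averages one day and the
    window - 1 days before it, so the first window - 1 days are dropped."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


# Invented toy numbers: a noisy daily sentiment series and a poll series
# covering the same final four days.
daily_sentiment = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]
poll = [40.0, 45.0, 44.0, 50.0]

smoothed = moving_average(daily_sentiment, window=3)
print(smoothed)                 # [2.0, 3.0, 3.0, 4.0]
print(pearson(smoothed, poll))  # correlation of the aligned series
```

A trailing window uses only the current and earlier days, so each smoothed value could be computed on the day itself; a centred window would smooth just as well but needs future data. Longer windows damp more of the day-to-day noise at the cost of reacting more slowly to genuine shifts in opinion.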
The paper is available online at http://www.cs.cmu.edu/~nasmith.