Jan 17, 2012
When analysts want to study public opinion in open societies, they have many tools, including face-to-face surveys, on-site observation, media analysis, and telephone polls. But if they want to do the same in societies in which freedom of expression may be limited—such as China, Pakistan, and Iran—analysts face significant barriers, in part because people in such societies may fear retribution for expressing themselves freely. In recent years, people in closed societies have increasingly turned to social media (such as Twitter) to try to preserve anonymity. This means that such media could offer analysts an alternative source of data on public opinion and mood in more closed societies.
With funding from RAND’s continuing program of self-initiated independent research, researchers developed an innovative approach to analyze politically oriented social media content and then tested it in a “proof-of-concept” analysis by looking at the months after the contested Iranian presidential election in June 2009, when Iranians blogged, posted to Facebook, and, most visibly, coordinated large-scale protests on Twitter. Some of the results of that analysis are covered in the interactive graphic box in this brief, while the rest of the brief focuses more on how the approach was developed and applied, the lessons learned from using it, and the potential for its use in the future as a research tool.
While researchers can manually analyze social media, doing so has many limitations. For example, manual review is time-consuming and allows analysts to cover only limited amounts of material. As a result, researchers cannot gauge public opinion or mood on a mass scale, and their biases may affect their interpretations of what they read.
An automated content analysis program known as Linguistic Inquiry and Word Count 2007 (LIWC, developed by James Pennebaker and colleagues) can analyze thousands of social media posts in only a few seconds and may mitigate some of these limitations (although it may also introduce others, such as sacrificing nuance or implicit meanings). Analysts can gain a bird’s-eye view of what people are saying and feeling across many social media sites over time and can quantify their data, thus reducing the chance that their biases affect their interpretations.
LIWC was developed to analyze the characteristics and patterns of written text, allowing analysts to draw conclusions about people’s psychological states (e.g., emotions, desire for social interaction) based on their usage of specific categories of words. LIWC contains approximately 80 such categories, such as first-person singular pronouns, positive emotion words, and swear words. For a given text, LIWC first counts the total number of words in that text. Then, it searches for all words contained in each of the 80 categories, keeping a tally of the number of instances in each category. Finally, for each category, it divides that tally by the total number of words in the text.
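The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the LIWC software itself: the three tiny word lists, the tokenizer, and the function name `category_proportions` are all assumptions made for the example, and the real LIWC 2007 dictionary is far larger and also matches word stems.

```python
import re
from collections import Counter

# Hypothetical miniature dictionaries for illustration only; the real
# LIWC 2007 dictionary has roughly 80 categories and many more words.
CATEGORIES = {
    "first_person_singular": {"i", "me", "my", "mine", "myself"},
    "positive_emotion": {"happy", "good", "hope", "love"},
    "negative_emotion": {"sad", "angry", "afraid", "hate"},
}

# Simple tokenizer (an assumption): runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def category_proportions(text, categories=CATEGORIES):
    """Return {category: matching words / total words} for one text."""
    tokens = TOKEN_RE.findall(text.lower())
    total = len(tokens)  # step 1: count all words in the text
    tally = Counter()
    for token in tokens:  # step 2: tally the words found in each category
        for name, words in categories.items():
            if token in words:
                tally[name] += 1
    if total == 0:
        return {name: 0.0 for name in categories}
    # step 3: divide each category tally by the total word count
    return {name: tally[name] / total for name in categories}


proportions = category_proportions("I hope my vote counts and I am not afraid")
```

In this made-up sentence of ten words, three are first-person singular pronouns, so that category's proportion is 0.3, while the positive and negative emotion categories each come to 0.1.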
A strong precedent exists for using LIWC in this way in research. For example, researchers have used it to study language patterns after traumatic events, to investigate how men and women communicate differently, and to detect deception. However, LIWC has not been widely applied to understanding non-Western political contexts.
| Word Category | Attitude or Mood Expressed |
|---|---|
| First-person singular pronouns | Feelings of depression within the population |
| Second-person pronouns | An intent and desire to interact with others |
| Plural pronouns | A sense of (1) group or collective identity and (2) coping with shared trauma |
| Positive emotions | Feeling generally good or happy |
| Negative emotions | The degree to which people have been affected by a given trauma; also, feeling angry, anxious, or sad |
| Swear/curse words | Frustration or anger |
This research tested LIWC in the Iranian political context. The researchers built an automated software program to parse and clean the social media texts: a total of 2,675,670 tweets marked with the “IranElection” hashtag, posted by 124,563 distinct individuals and dated from June 17, 2009, to February 28, 2010. This sample necessarily included observers throughout the world, including in Iran, who were writing in English. But a review of many of the tweets suggested that their authors were Iranians living inside Iran. For example, some tweets referred to having participated in protests, others were about mobilizing for upcoming protests, and many seemed to be communications from Iranians to people in other countries. Thus, the sample should contain a broad range of opinions that reflects these interactions. Because this study focused on exploring this diversity of opinion, the researchers did not attempt to separate tweets written by people in Iran from those written by people outside Iran, which likely would not have been feasible anyway.
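The brief does not describe the parsing and cleaning program in detail. As a rough sketch of what such a step might involve, the Python below removes a few kinds of Twitter-specific tokens before word counting. The specific rules (dropping links, retweet markers, and @-mentions, and keeping hashtag text as ordinary words) and the function name `clean_tweet` are assumptions for illustration, not the researchers’ actual rules.

```python
import re

# Assumed cleaning rules, chosen for illustration only.
URL_RE = re.compile(r"https?://\S+")          # web links
RT_RE = re.compile(r"^\s*RT\b:?\s*", re.IGNORECASE)  # leading retweet marker
MENTION_RE = re.compile(r"@\w+:?")            # @username, with optional colon


def clean_tweet(raw):
    """Return the tweet text with links, RT markers, and mentions removed."""
    text = URL_RE.sub(" ", raw)
    text = RT_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag's word, drop the symbol
    return " ".join(text.split())  # collapse runs of whitespace


# Made-up example tweet.
cleaned = clean_tweet("RT @user: Protest tomorrow #IranElection http://t.co/abc")
```

Choices like these matter for the later ratios: any token left in the text counts toward the total number of words, so keeping or dropping hashtags changes every category’s proportion slightly.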
The tweets were processed using LIWC, with researchers conducting qualitative interpretations of the quantitative LIWC output. Table 1 shows the specific word categories used to gain insight into the Iran election and describes the attitude or mood expressed; each was drawn from prior research that used that interpretation for the word category.
The researchers then looked for patterns in the data and interpreted those patterns in relation to specific events on the ground and to public figures and topics important during the post-election period. For example, they sought spikes and dips in word usage—that is, sudden reactions to a specific event or action (e.g., a protest or holiday) that might offer insight into public opinion about that event—and compared trends in certain word categories against others.
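One simple way to look for spikes and dips of this kind is to compare each period’s category rate with the average for the whole series. The sketch below is one possible approach, not the method used in the study: the z-score rule, the threshold of 1.5 standard deviations, and the weekly swear-word rates are all hypothetical.

```python
from statistics import mean, pstdev


def flag_spikes_and_dips(rates, threshold=1.5):
    """Flag periods whose rate is far above or below the series average.

    `rates` is a list of category proportions, one per period (e.g., week).
    Returns a list of (index, "spike" | "dip") pairs.
    """
    mu = mean(rates)
    sigma = pstdev(rates)
    if sigma == 0:  # a flat series has no spikes or dips
        return []
    flags = []
    for index, rate in enumerate(rates):
        z = (rate - mu) / sigma  # distance from the mean in standard deviations
        if z >= threshold:
            flags.append((index, "spike"))
        elif z <= -threshold:
            flags.append((index, "dip"))
    return flags


# Hypothetical weekly swear-word proportions, not data from the study.
weekly_rates = [0.010, 0.011, 0.010, 0.030, 0.010, 0.009, 0.010]
flags = flag_spikes_and_dips(weekly_rates)
```

With these made-up numbers, only the fourth week (index 3) is flagged as a spike. An analyst would then check that date against events on the ground, such as a protest or holiday.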
By many accounts, social media played a pivotal role in the large-scale protests that occurred after the 2009 presidential election in Iran, enabling the opposition to communicate and coordinate under censorship. RAND researchers used LIWC to analyze the millions of tweets that came out over nine months, from mid-June 2009 to the end of February 2010.
Below we highlight two scenarios we analyzed. Each is captured in an interactive graphic with a corresponding discussion. For example, hovering the cursor over the date icons at the top of the graphic shows what happened on that date. The first scenario is displayed by default, with its analysis below it; clicking on the second scenario’s tab brings up its graphic and analysis. The third tab, Customizable Scenario, enables readers to interact directly with the data and create their own graphics.
The graphic above highlights one scenario that was analyzed. It shows that people’s use of swear words on Twitter tracked closely with events and protests on the ground and served as a good predictor of when protests would occur. With each large-scale protest, rates of swearing spiked on Twitter or rose in the weeks leading up to the event. For example, levels of swearing were elevated in the weeks before the Quds Day protest (September 18, 2009), one of the major protests of the post-election period, and there was a large spike at the end of December, when a protest on Ashura Day occurred. The overall trend in swearing was a gradual decline over time to a relatively stable level, lower than the initial level observed in June and July 2009. This suggests that the opposition movement had probably resigned itself to the political situation nine months after the election.
The full report contains more detailed analysis of this and other examples.
In the scenario shown above, we find that the Green Movement was viewed more positively than the Islamic Revolutionary Guards Corps (IRGC) and the Basij throughout the period measured. The Green Movement is the broad-based opposition movement that developed in the weeks following the presidential election. The IRGC is an elite military force and currently Iran’s most powerful economic, social, and political institution. The Basij comprises a set of pro-government paramilitary organizations serving under the leadership of the IRGC. In the initial weeks after the election, Twitter users expressed high levels of positive sentiment toward the Green Movement. However, this sentiment dropped sharply after the initial post-election period and remained low until the end of February 2010. Still, the levels of positive sentiment for the IRGC and the Basij were consistently low throughout the entire period. Between the two, Twitter users expressed more anger at the Basij forces than at the IRGC on the whole.
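A comparison like the one above depends on first selecting the tweets that mention each political figure or group and then computing a word-category rate for that subset. The Python sketch below shows one way this could be done. The positive-word list, the substring-matching rule, the choice to pool words across tweets, and the example tweets are all assumptions for illustration.

```python
import re

# Hypothetical positive-emotion words; not the LIWC dictionary.
POSITIVE_WORDS = {"hope", "brave", "support", "love"}
TOKEN_RE = re.compile(r"[a-z']+")


def positive_rate_for_topic(tweets, topic_terms):
    """Share of positive words among all words in tweets mentioning a topic.

    A tweet counts as mentioning the topic if any term in `topic_terms`
    appears in it (case-insensitive substring match, an assumed rule).
    Words are pooled across all matching tweets before dividing.
    """
    total = 0
    positive = 0
    for tweet in tweets:
        lowered = tweet.lower()
        if not any(term in lowered for term in topic_terms):
            continue  # tweet does not mention this topic
        tokens = TOKEN_RE.findall(lowered)
        total += len(tokens)
        positive += sum(1 for token in tokens if token in POSITIVE_WORDS)
    return positive / total if total else 0.0


# Made-up tweets for illustration; not drawn from the study's sample.
sample = [
    "We support the green movement",
    "Basij on the streets again",
    "Love and hope for the green movement",
]
green_rate = positive_rate_for_topic(sample, ["green movement"])
basij_rate = positive_rate_for_topic(sample, ["basij"])
```

In this toy sample, the rate is 0.25 for tweets mentioning the Green Movement and 0.0 for those mentioning the Basij. Computing the same rate per week for each topic would give plotted lines that can be compared over time.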
The graphics in Scenarios One and Two were created by clicking on a “word category” and a “political figures and events” category. The graphic above is interactive: readers can work with the underlying data to customize their own scenarios. Clicking on a particular “word category” button and a “political figures and events” button creates a customized scenario, which appears as a graphic; it is possible to click on more than one “political figures and events” button for each “word category” button, which is how Scenario Two was created. To see what was happening over the nine-month period and how it corresponds with the plotted line(s), readers can hover their cursors over the date icons at the top of the chart.
The full report contains more detailed analysis of many of the possible word category/topic combinations.
Select a word category to view its usage in tweets marked with the #IranElection hashtag. The graph displays the proportion of words in each category relative to the total words used. Psychological research has shown that certain patterns of word usage are associated with attitudes, opinions, and psychological states. Roll over a word category for further details.
Select a word or phrase corresponding to various political figures and events to view word use associated with them. Note, for instance, how public opinion may vary when people used different phrasings to refer to the same topic (e.g., “Iran” vs. “Islamic Republic”).
To view overall word use, check “All Political Figures and Events.”
This test case of Iran suggests that using the LIWC-based method to analyze social media holds much promise, particularly in countries where freedom of expression is limited. The potential policy uses are many. With this approach, analysts can
One key limitation is important to note. In most cases, the tweets appeared to express the emotions signified by the LIWC words they contained, based on the researchers’ manual review of them. But in certain cases, tweets that contained words thought to denote sadness, positive emotion, and anxiety did not express the emotion expected, given the words they contained. To manage this problem, analysts could construct alternate, tailored word categories that contain words that seem most useful in context and then validate those categories.
Such validation is important. This approach should be considered part of the analyst’s tool chest, along with the more traditional methods, such as elicitation of expert opinion, open-source analysis, and polling. The approach should be validated by comparing its results against such methods; by testing the assumption that linkages demonstrated in Western contexts also apply to Iran; by replicating the current study inside Iran over successive years to determine whether the methodology functions in the same way at different time periods; and by replicating it in other countries that have experienced political upheaval or traumatic experiences, such as Pakistan, to determine whether this approach is cross-culturally valid or applies only to the Iranian context.
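One simple check on a tailored word category, in the spirit of the validation discussed above, is to compare it against manual review: of the tweets that contain a category word, what share did human reviewers judge to express the intended emotion? The sketch below assumes that such manual judgments exist as a list of true/false labels. The function name, the whitespace tokenization, and the example data are hypothetical.

```python
def category_precision(tweets, category_words, manual_labels):
    """Share of category-flagged tweets that reviewers confirmed.

    `manual_labels[i]` is True if a human reviewer judged that tweet i
    expresses the emotion the category is meant to capture.
    Returns None when no tweet contains a category word.
    """
    flagged = 0
    confirmed = 0
    for tweet, expresses_emotion in zip(tweets, manual_labels):
        tokens = set(tweet.lower().split())
        if tokens & category_words:  # tweet contains at least one category word
            flagged += 1
            if expresses_emotion:
                confirmed += 1
    return confirmed / flagged if flagged else None


# Made-up tweets and reviewer judgments, for illustration only.
tweets = ["so sad today", "missing the rally sadly", "not sad at all", "great news"]
labels = [True, True, False, True]
precision = category_precision(tweets, {"sad"}, labels)
```

Here two tweets contain the word “sad” and reviewers confirmed sadness in only one of them, giving a precision of 0.5. A low value like this would suggest revising the word list, for example to handle negation, before relying on the category.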
Having established that LIWC can generate informative output about public mood and opinion in a society in which freedom of expression may be limited, RAND researchers are beginning to incorporate this approach into a multimethod research design to answer timely policy questions, both to validate the method and to explore additional research questions.
As of 2012, researchers are using the LIWC approach to explore whether the growth of independent social media in Pakistan, such as Twitter and blogs, may have contributed to the spread of anti-American rhetoric in the country. The research focuses on bloggers because they often serve as both “opinion leaders” and “opinion reflectors” in Pakistan. The anti-American rhetoric and violence there following the release of the “Innocence of Muslims” film underscore the value such research can have.