Tweetbombs, Googlebombs and Politics
When I got into the office Monday morning, I started my daily social media review and happened to notice a lot of very similar tweets on the subject of “text mining” from different Twitter IDs sharing very similar bio photos. Later that day, a colleague sent me an article on “Twitter bombing” titled “Cyber Attacks May Have Affected Massachusetts Senate Election” by P. Takis Metaxas and Eni Mustafaraj at Wellesley College, Mass.
Research by Metaxas and Mustafaraj shows that the Tweetbomb attack on Martha Coakley in the 2010 Massachusetts elections was undertaken by the same group that attacked John Kerry in 2004. Metaxas describes Twitter-bombing (not yet defined on Wikipedia) as “creating a large number of Twitter accounts and sending a large number (in this case, about 1,000) of Tweets within a short period of time.” The tweetbomb of 185,000 messages sent by 40,000 users reached an audience of 60,000. But 10 users sent between 512 and 1,024 messages each!
A similar phenomenon, the Google-bomb, refers to “practices, such as creating large numbers of links, that cause a web page to have a high ranking for searches on unrelated or off topic keyword phrases, often for comical or satirical purposes” (definition from Wikipedia). Tweetbombs make for viral/trending tweets and Google-bombs make for high rankings.
Tweetbombs and Text Analytics
As for the higher-than-usual flurry of “text mining” tweets I saw via TweetDeck, I noticed that a lot of them were very similar even though the Twitter ID of the person posting them was different (see chart below). There were some similarities in the Twitter IDs, too. Connecting the tweets back to a single tweeter was based on visual confirmation alone: the similarity of the tweeters' avatars.
Note that this flurry of tweets contained no links and had no effect on a Google “text mining” alert, nor did it affect the first page of a Google search. I’m wondering if the tweeter is performing research of her own!
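The visual check described above could be complemented by a simple text comparison. Here is a minimal sketch in Python, assuming tweets are available as (user, text) pairs; it flags near-identical tweets posted from different accounts using Jaccard similarity on word sets. The function names, the 0.8 threshold and the sample tweets are all made up for illustration, and this is not the method used by the researchers or by any particular product.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase a tweet and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets, from 0.0 to 1.0."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(tweets, threshold=0.8):
    """Return (user_a, user_b, similarity) for similar tweets
    posted by different accounts.

    `tweets` is a list of (user, text) pairs. The threshold is an
    arbitrary illustrative value, not a tuned one.
    """
    tokenized = [(user, tokens(text)) for user, text in tweets]
    pairs = []
    for (u1, t1), (u2, t2) in combinations(tokenized, 2):
        if u1 == u2:
            continue  # only compare tweets from different accounts
        sim = jaccard(t1, t2)
        if sim >= threshold:
            pairs.append((u1, u2, sim))
    return pairs


# Hypothetical sample data, invented for this example.
sample = [
    ("user_a1", "Text mining is changing how companies read feedback"),
    ("user_a2", "Text mining is changing how companies read their feedback"),
    ("user_b", "Heading to lunch, back at one"),
]

for user_a, user_b, sim in near_duplicates(sample):
    print(f"{user_a} and {user_b} posted similar tweets (similarity {sim:.2f})")
```

Comparing every pair of tweets is fine for a few hundred messages; a larger collection would need something smarter, such as grouping tweets by shared phrases first.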
Going back to the research by Metaxas and Mustafaraj, they have also written a great paper explaining their work in more detail (From Obscurity to Prominence in Minutes: Political Speech and Real-time Search). They contend that the introduction of real-time search in the major search engines means that there is “disproportionate exposure to personal opinions, fabricated content, unverified events, lies and misrepresentations that otherwise would not find their way in the first page, giving them the opportunity to spread virally.”
The researchers collected 185,000 messages from Twitter containing the keywords “Coakley” and “Scott Brown”. Some 41% of these messages were retweets. Some users posted over 1,000 tweets each! Metaxas and Mustafaraj determined the political orientation of these users by reviewing their bios and messages (manual) and by searching for phrases showing sentiment (automatic).
I think that Attensity’s text analytics technology could be employed here to replace the manual effort by automatically identifying common, contextually related keywords and surfacing overused phrases (excessive retweets).
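To make the idea concrete, here is a rough sketch in Python of the kind of counting involved: the share of retweets, the accounts that post unusually often, and the phrases that recur across tweets. It assumes tweets come as (user, text) pairs and that a retweet starts with "RT @", a common convention at the time. The function names and sample tweets are hypothetical, and this is a toy illustration, not Attensity's technology or the researchers' procedure.

```python
from collections import Counter


def retweet_share(tweets):
    """Fraction of tweets whose text starts with 'RT @'.

    Assumes the 'RT @user' retweet convention; other retweet styles
    would not be counted.
    """
    if not tweets:
        return 0.0
    retweets = sum(1 for _, text in tweets if text.startswith("RT @"))
    return retweets / len(tweets)


def heavy_posters(tweets, min_count):
    """Map each user with at least `min_count` tweets to their tweet count."""
    counts = Counter(user for user, _ in tweets)
    return {user: n for user, n in counts.items() if n >= min_count}


def repeated_phrases(tweets, n=3, min_count=2):
    """Return (phrase, tweet_count) for word n-grams that appear in at
    least `min_count` tweets, most frequent first."""
    counts = Counter()
    for _, text in tweets:
        words = text.lower().split()
        # Use a set so a phrase is counted once per tweet.
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        counts.update(grams)
    return [(gram, c) for gram, c in counts.most_common() if c >= min_count]


# Hypothetical sample data, invented for this example.
sample = [
    ("u1", "RT @u9 vote tomorrow in the special election"),
    ("u1", "do not forget to vote tomorrow in the morning"),
    ("u2", "RT @u9 vote tomorrow in the special election"),
    ("u3", "nice weather today"),
]

print("Retweet share:", retweet_share(sample))
print("Heavy posters:", heavy_posters(sample, min_count=2))
print("Repeated phrases:", repeated_phrases(sample))
```

On a real collection, the thresholds (`min_count`, the phrase length `n`) would have to be chosen by looking at the data; the values here are only placeholders.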
Googlebombs
To achieve higher search engine rankings, users also employ a technique known as the Google-bomb or Googlewashing. Here, users create large numbers of links that cause a web page to rank highly for searches on unrelated or off-topic keyword phrases, often for comical or satirical purposes. In the case of political elections, they associate an obscure, negative term with a public entity. Negative language and entities have text analytics written all over them!
Accounting for Crowdsourced Fraud
Deep natural language understanding is absolutely vital to automatically and accurately identifying negative comments and ensuring they are kept separate from the positive ones. As more of the general public weighs in on local, state and national elections, searching and manually identifying themes and patterns from the comments will become prohibitive. Text analytics offers the ability to capture these trends automatically and provide a variety of integrated reporting and dashboarding options such as side-by-side displays of positive vs. negative comment frequencies, trends of tweet counts over time, identification of “hotspots” (trending topics and themes that are statistically significant), tag-clouds of contextually related keywords and phrases, and much more! The over-representation of tweeters/blog posters and tweets/blogs/comments is a very tangible problem. This new crowdsourced fraud can be identified and accounted for using text analytics, allowing companies to separate the signal from the noise.
Photo credits: busyPrinting – drop tweets not bombs