The "Thai Crisis Snapshot" Theory
Overview
Over the past two days I have been trying to create a word-picture snapshot (a Wordle, if you like) using the script from Wordle.net. I wanted to find out which keywords internet users in Thailand were most actively discussing, and which words regarding the Thailand Crisis were mentioned most on Twitter, or on websites and blogs relayed through Twitter, as far as I was able to gather data on them. Below I describe in detail how I gathered the data, the refinements I had to make to it, and the process used to create the word-picture snapshots.
Please note that this is NOT a scientific analysis, nor a statistically significant snapshot of the exact situation in the Thailand Crisis. My data is limited to the period from 1900 hrs (GMT+7) on the 14th of May, 2010 until 1600 hrs (GMT+7) on the 17th of May, 2010.
Data Gathering Process
All my data is limited to a very large number of Twitter feeds, which I managed to gather only in installments (at varying frequency and consistency) over the two days I was active. Moreover, the data is limited to a handful of the most active and most frequent Twitterers on the Thailand Crisis whom I was able to follow. Although the most consistent data came from English-language tweets, I also obtained a bulk of Twitter activity in Thai from a large set of Thai-language Twitterers. These tweets were later translated, spelling-normalized and centralized so that they blend with the English meanings and references.
As I could not find any proper Twitter backup software, or any other way of accumulating tweets automatically, I had to do it manually (which was really laborious). I copied all the tweets that showed up on my live feed and pasted them into a file, refreshing every minute, gathering the tweets that emerged at a very fast rate and pasting them on top of the earlier ones. I also copied all the retweets and all discussion related to each tweet, as it was almost impossible to filter everything out fast enough by hand. Below is a list of some of the #hashtags from which I gathered tweets, as well as a truncated list of the Twitterers whom I started following in order to keep informed.
#thaicrisis
#redtweet
#RedTweetsClub
#PADtweet
#WeLoveThai
And this is the list I have been following
Please also note that I was NOT able to acquire the tweets at exact intervals or with consistent regularity. However, I was able to obtain the bulk of the tweets during the peak hours of the crisis, at times when there was palpably increased tension in the news reports or when the frequency of live tweets was highest. Here is my eventual acquisition pattern:
14th May, starting at 1900, consistently (refreshing every minute) up till 0200, 15th May
15th May, 0300 multiple times (two to three times hourly) until 0900
15th May, 1600 consistently until 2000, 16th May
15th May, 2300 frequently (more than ten times every hour) until 1000, 16th May
16th May, 1400 multiple times until 0100, 17th May
17th May, multiple times between 1700 and 1800
Moreover, please note that the data consists of tweets only. A wealth of further material is associated with each tweet through the website, blog, video or picture link that came with it, and my data does NOT include any of that material. Even if I had made the effort to add all that text to this analysis, there would still be a LOT of other material I could not access, as many sources of information are 'blindfolded' by the Thai government's 'Information and Intellect Blocking Agency', or I-BA for short.
Data Assembly, Refinement and Preparation
Formatting
During the two days I obtained over 10,000 lines of raw data: approximately 60,000 words, or 400 A4 pages. As I had manually copy-pasted everything from Twitter's live feed page, all my text was unformatted, but at least it was pasted in chronological order. Initially I wanted to format every single tweet into a long-running TELEX-style single paragraph (chatter), something like this. But this procedure took so much time (almost 7 hours) that I was only able to format about 10% of the data. I therefore decided to skip this part and simply process the data into meaningful and usable bits for the Wordle program.
Translation
First, I removed all the irrelevant tweets I had collected: those which came through from my other 'Following' sources and those unrelated to the Thailand Crisis. Then I proceeded to translate all the Thai-language tweets into English. As there was a huge bulk of tweets in Thai, I used the Google Translate website to translate the most common words. I then went over the Google-translated material by hand and tried to make sense of the mess of words that came up. It was a headache, so I didn't correct every word into a meaningful phrase, as Wordle only needs to read and analyse words, not phrases or context. To be honest, though, Google Translate did save a lot of time; it would have taken days to manually translate every tweet and give meaning to every single slang term, idiom and insult.
Examples of Thai-language tweets were as follows:
อย่าเอาตัวเองไปเปรียบกับ พธม. เลยเสื้อแดง เพราะตอนนี้มึงเลวแซงหน้า Al-Qaeda ไปแล้วเค้าสู้เพื่อประเทศแต่มึงทำลายประเทศตัวเอง #thaicrisis 1 minute ago via Echofon
เผายางเฉยๆน่าเสียดาย จับแกนนำนั่งยางแล้วเผาน่าจะดีกว่า #thailandONLY #thaicrisis #redtweet
เศร้า เปิดเพลงนักสู้นิรนาม ประกอบภาพ เสธ.แดง OMG!!!! I love him. He's gone to heaven #thaicrisis 15 minutes ago via UberTwitter
And examples of English tweets:
Just got a report that Lumpini Tower on Rama 4 is on fire. Can anyone confirm? Can only see smoke from here, probably burning tyres. 23 minutes ago via mobile web
All guests have evacuated Century Park Hotel, where most ground-floor door/window glass has been shattered. (Nation TV) 2 minutes ago via TweetDeck
And so on.
Spelling Check, 'Normalization' and 'Centralization'
As you might expect, almost everyone spells a Thai word in English text differently, which makes it nightmarishly difficult for any program to correct automatically. On top of that, I couldn't find any reference book that could settle spelling disputes. For example, "Rajvithee" would often be spelled Rajwithee, Rajvithi, Rajavithee, Rajawithi, Ratchvithee or any other possible variant. Therefore, for all the words to be represented correctly in the Wordle, I had to 'Centralize' the different variants of each transliterated Thai word into a single menu of unique names. Below is the 'Centralized' menu I came up with. Please use this menu extensively for all your work (or else send me yours).
Individuals
Abhisit, Nawamin, Chavalit, Thaksin, Redshirts, Kothom, Wallop, Banharn, Chaisaeng, Chaisiri, Chaiwat, Chanudom, Chaturon, Cheewa, Chinnawut, Natthawut, Jatuporn, Jaturon, Panitan, Pinthongta, Paethongtarn, Panthongtae, Pojaman, Ponpat, Promman, Prompan
Places, Locations or Agencies
DusitThani (includes all Dusit), Century, Bonkai, Bantadthong, Bangkhae, ChaengWattana, Chidlom, Chonburi, DinDaeng, Ekamai, Erawan, Hotel, Khao, KhonKaen, KlongToei, Krabi, KlongDan, Dusit, Lerdsin, Langsuan, LatPhrao, Lumphini, Lumphoo, Langkawi, LaemChabang, Monument, MBK, NangLerng, Nonthaburi, Narathiwat, Phetchburi, Pathumwanaram, Pathumwan, Pattana, Pattaya, Phattanakarn, Phayathai, Phetchaburi, Phloenchit, Phloeng, Pratumwanaram, Pratunam, Rama4, Rajprasong, Rajvithee, Rajdamri, Rajthewee, Ratchada, Rajprarob, Ratchadapisek, Ratchathani, Rattanakosin, Rajdamnern, Rangnam, Ranong, Silom, Sukhumvit, Sukhothai, Sutthisarn, SalaDaeng, BTS, MRT, BMCL, Ruamkatanyu, NorPorChor (from Chor)
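I built the menu above by hand, but for anyone who wants a head start, here is a minimal sketch of how a program could suggest which menu entry a stray spelling belongs to. It is an assumption on my part, not what I actually ran: it uses fuzzy string matching from Python's standard `difflib` module, and the function name `centralize` and the 0.8 cutoff are made up for illustration. Only four entries of the menu are included to keep it short.

```python
import difflib
from typing import Optional

# A few entries from the centralized menu (shortened for the example).
MENU = ["Rajvithee", "Rajprasong", "Rajdamri", "Lumphini"]

# Compare in lowercase, but return the menu's own capitalisation.
_LOWER_TO_MENU = {name.lower(): name for name in MENU}


def centralize(word: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest menu spelling for `word`, or None if nothing is close enough."""
    matches = difflib.get_close_matches(
        word.lower(), list(_LOWER_TO_MENU), n=1, cutoff=cutoff
    )
    return _LOWER_TO_MENU[matches[0]] if matches else None


for spelling in ["Rajwithee", "Lumpini", "Sukhumvit"]:
    print(spelling, "->", centralize(spelling))
```

Suggestions like these still need checking by eye, because some genuinely different places (Rajvithee and Rajthewee, for instance) are spelled almost alike and a looser cutoff could merge them.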
And finally, the 'Normalization' process, whereby words of the same meaning that are spelled differently (or in short form), such as 'gvnt', 'gov', 'govnmnt', etc., are renamed to 'government'. I also combined various synonyms into a single term: for example, 'press', 'media', 'reporters' and 'photographers' were all renamed to 'journalists', and 'troops', 'soldiers', 'army' and 'forces' were renamed to 'military', and so on. I also automatically converted all 'demonstrators', 'street people', 'mafia', 'poor people', 'onlookers', 'babies', 'dogs', 'cats', 'danrivers' and all 'ghosts' into 'protesters', using a handy script uneasily downloadable from CRES.edu.org.
After running multiple spelling checks, word replacements and so on, it was finally time to create the Wordles. I used the following master data file; please feel free to download it, modify it and use it as you like. It is not a perfect or scientifically significant piece of data, but it was the best I could do.
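The renaming described above amounts to a lookup table applied to whole words. Here is a rough sketch of that idea in Python, using only the straightforward examples from the paragraph; the names `NORMALIZATION` and `normalize` are my own for illustration, and this is not the script mentioned above.

```python
import re

# Canonical term -> variants, taken from the examples in the text.
NORMALIZATION = {
    "government": ["gvnt", "gov", "govnmnt"],
    "journalists": ["press", "media", "reporters", "photographers"],
    "military": ["troops", "soldiers", "army", "forces"],
}

# Invert into a variant -> canonical lookup table.
VARIANT_TO_CANONICAL = {
    variant: canonical
    for canonical, variants in NORMALIZATION.items()
    for variant in variants
}

# One pattern that matches any variant as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, VARIANT_TO_CANONICAL)) + r")\b",
    re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Replace every known variant in `text` with its canonical term."""
    return _PATTERN.sub(lambda m: VARIANT_TO_CANONICAL[m.group(1).lower()], text)


print(normalize("The gov sent troops while reporters watched"))
```

Matching on whole words matters here: without the word boundaries, the short form "gov" would also be replaced inside "government" itself.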
Wordle Creation
My intention was to create various snapshots based on all the words collected from the Twitter feed. Ideally, it would be best to create a series of different Wordles, including, excluding or specifying certain groups of words in order to get a clear picture of things.
To start off, I copy-pasted ALL the processed text to produce a 'Superset' Wordle, a snapshot which simply shows the most frequent words in the entire file as is.
Wordle 1, SUPERSET 1 // Thailand Crisis (15-17th May 2010). (Click on image to enlarge).
Analysis.
The most immediately visible term is "thaicrisis" (#thaicrisis), the most frequent word in the data sample, followed by "tweetdeck" (the client most users posted their tweets from), "redshirts", "bangkok", "military", "people", "redtweet", "@georgebkk", "web" and so on.
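Wordle does its own counting, but the ranking it draws boils down to a word-frequency table. As a minimal sketch of that idea (my own illustration, with a made-up two-tweet sample and a simple tokenizer that keeps the leading # or @), it could look like this in Python:

```python
import re
from collections import Counter

# A token is a run of word characters, optionally led by '#' or '@'.
TOKEN_RE = re.compile(r"[#@]?\w+")


def word_counts(text: str) -> Counter:
    """Count how often each lowercased token appears in `text`."""
    return Counter(token.lower() for token in TOKEN_RE.findall(text))


# Made-up sample in the style of the collected tweets.
sample = (
    "Smoke on Rama4 #thaicrisis via TweetDeck\n"
    "RT @georgebkk: military moving on Rama4 #thaicrisis via web\n"
)

counts = word_counts(sample)
for word, n in counts.most_common(5):
    print(n, word)
```

Note that words like "via", "tweetdeck" and "web" are counted just like any other, which is exactly why they show up so large in the Superset.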
However, this simply shows how often each word is used in the Twitter data, and it cannot paint any specific or analytic snapshot of the Thailand Crisis during the past two days. Therefore, for the second Wordle, I removed all Twitter-system specific words such as "tweetdeck", "retweet", "RT", "web", etc.
or
Wordles 2a and 2b, SUBSETS 1 // Thailand Crisis (15-17th May 2010). (Click on images to enlarge).
The resulting snapshot is a little clearer, although "thaicrisis" continues to be the most prominent word in the data set. A maximum of 150 (top) words was used for the above Wordle, and by clicking to view the larger picture you can compare the relative prominence of different words. Still, the names of individual Twitterers and the Twitter hashtags outweigh the actual words of the messages being tweeted.
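The two steps used for this Wordle, dropping the Twitter-system words and capping the picture at the top 150 words, can be sketched as follows. This is only an illustration under my own assumptions: the function name `top_words` is hypothetical, and the counts in the example are invented, not taken from my data file.

```python
from collections import Counter

# Twitter-system words named in the text; extend the set as needed.
TWITTER_NOISE = {"tweetdeck", "retweet", "rt", "web"}


def top_words(counts, exclude, limit=150):
    """Drop excluded words, then return the `limit` most frequent (word, count) pairs."""
    kept = Counter({w: n for w, n in counts.items() if w.lower() not in exclude})
    return kept.most_common(limit)


# Invented counts, for illustration only.
example = Counter({"thaicrisis": 9, "tweetdeck": 5, "rt": 4, "military": 3})
print(top_words(example, TWITTER_NOISE, limit=2))
```

Raising `limit` (to 250, say) lets rarer words into the picture, at the cost of much smaller text for each of them.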
Therefore,
Wordle 3, SUBSET 3 // Thailand Crisis (15-17th May 2010). (Click on image to enlarge).
after removing all the various Twitter hashtags such as "thaicrisis", "redtweet" and "PADtweet", you get the snapshot above. I also increased the maximum word count to 250, so that in the enlarged image you can see the various hidden words in much smaller text. Notice the following words, which appear between the other (obvious) prominent words:
"fire", "tires", "shot", "Rama4", "KlongToei", "journalists", "CRES", "half", "now". Given the times at which I gathered the tweets, you can understand that these were the terms being discussed most often in the online community. However, notice the early emergence of other words such as "children", "schools" and "smoke", which could hint at the growing discussion of children among the protesters at that moment. Also notice how much less "police" is mentioned than "military", and note how few terms relate to "die" or "died".
The blogger / Twitterer with the most posts or mentions, as you can see, is @georgebkk, who runs Thaivisa.com. He is followed by @tulsathit, the editor of The Nation; @RichardBarrow, who created the Google Bangkok Dangerous map; and @freakingcat, the person who yesterday shot the video of the child apparently being used as a human shield by the red shirt protesters at the tire barricades. The popular @bangkokpundit has a smaller tweet volume than the others, as he said he was quite busy throughout these weeks. He still hasn't mentioned who the person in the title picture on his blog is; it often makes me think of @bangkokpundit as either an old person or a younger person with no hair.
As a result,
Wordle 4, SUBSET 4 // Thailand Crisis (15-17th May 2010). (Click on image to enlarge).
in Wordle 4, having removed all the Twitterers, we now see a trend where names of places and locations become more prominent, reflecting the fact that everybody is informing everybody else of where the action and disaster is taking place. You can see "Rama4" and "KlongToei", which are in essence almost the same place, as one is only a block from the other. You can see "Lumphini", "Bonkai", "DusitThani", "Rajprasong", "expressway", "DinDaeng" and so on.
But notice how there is almost no significant sign of "Abhisit", "Thaksin", "terrorists" or "weapons". Perhaps these terms are mentioned often in full-length news reports or news highlights, but on Twitter the priority seems to be keeping people informed of the live situation or breaking news rather than detailed analysis, as you can tell from how prominent the word "now" is.
The next Wordle is interesting,
It is interesting because I removed only the following FIVE words,
rem "thailand", rem "military", rem "redshirts", rem "bangkok", rem "people"
and it instantly made a huge difference to the topography of the snapshot. Suddenly many smaller words start appearing and become easier to see. Words such as "rumors", "leave", "stage", "black", "help", "home", "closed", "still", "many" suddenly become more visible, and all this paints quite a precise picture of the topics and news items being discussed at those moments.
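Why do five words make such a difference? A rough way to see it, assuming (my assumption, I don't know Wordle's exact formula) that each word is drawn at a size relative to the most frequent word left in the picture, is the sketch below. The counts are invented purely for illustration and are not from my data file.

```python
def relative_weights(counts, removed=()):
    """Scale each remaining word's count against the largest remaining count."""
    kept = {w: n for w, n in counts.items() if w not in removed}
    top = max(kept.values())
    return {w: n / top for w, n in kept.items()}


# Invented counts, for illustration only.
counts = {
    "thailand": 400, "military": 350, "redshirts": 300,
    "bangkok": 280, "people": 250, "rumors": 40, "help": 30,
}
five = ("thailand", "military", "redshirts", "bangkok", "people")

before = relative_weights(counts)
after = relative_weights(counts, removed=five)
print("rumors before:", before["rumors"], "after:", after["rumors"])
print("help before:", before["help"], "after:", after["help"])
```

With the five giants gone, a word that was a tenth the size of the largest one becomes the largest itself, and everything below it grows along with it.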
Also notice the link " http://bit.ly/9VkUId " that keeps showing up, which is actually a link to the page of Thaivisa.com's live report. It is clearly the most tweeted, retweeted link on Twitter regarding the Thailand Crisis in the past two days.
Wordle 6, SUBSET 6 // Thailand Crisis (15-17th May 2010). (Click on image to enlarge).
For this Wordle I removed a lot of terms and tried to dig out some really specific words which are hidden in the data set but impossible to see because they are outshone by other, brighter stars in the sky. I removed all non-specific descriptive items, regular adjectives, adverbs, etc.: words such as "since", "tonight", "set", "exit", "look", "coming", more than 40 such words in all, in order to see underneath them.
Here's what you can observe:
Words such as "like", "think", "must", "please", "come", "away", "know" are moderately prominent, although I struggle to link these particular words to the context of the news being reported, and I struggle to explain why they appear more often than some other words which should have been discussed more. For example:
Why are "M79", "sniper" and "crackdown" mentioned so rarely? Or is it true, as the CRES keeps saying, that they do not have any snipers and are not using M79s during these operations? Also notice how relatively little chatter there is about "peace", "thaksin", "love", "better", "soon", "well", "safe", "democracy". Are people tweeting very little about these things because they really don't want them right now, or simply because these things aren't there, so there is nothing to tweet about?
There is a lot to think about, and I think you would agree that the Wordle snapshots are quite accurate and representative of the actual sentiment or essence of the information being exchanged within the time frames of this particular data set. Anyway, as for the final Wordle,
These are, in my opinion, the truest and most specific topics actually being discussed in the community. Once we have removed all the generic or circumstantial words, the chatter and the surrounding noise, we end up with the following words, which are always visible:
"abhisit", "thaksin", "king", "world".
I say this because I am certain that as the days move on and the 'every day point of interest' changes, terms like Rama4 or DinDaeng will be replaced by other location names, but those four underlying references above will still be there all the time.
A True Snapshot
I know that all of the above is not a scientific experiment, and that there are many, many fallacies and errors. But all I wanted to know was WHAT people were really, really talking about: what was the actual message hidden in all the tweets around the world with regard to the Thailand Crisis. I know that my data set does not represent 100% of the actual tweets, and it does not even take into account the news accounts, reports, blogs, rumors, gossip and the wealth of the world's conversation. But I just wanted to know, and I would still love to know.
I would love to take the bulk of ALL the insights, analysis, reports and talk about everything related to the Thailand situation, even all the secret blogs, the blogs on the other side of the I-BA (Intellect Blocking Agency) blindfold, and throw everything into a Wordle script, so that one could really see the real truth behind everything.
Thanks for reading, and please forgive my lengthy text; I am not a very good writer, I just realized.
Godspeed!









11 comments:
you should print those images out and take them to a gallery, very nice art work in my opinion
If you have a really large version of this, it'd be fantastic. Hell, I'd order it if you had it:
http://farm4.static.flickr.com/3380/4616400683_1882674938_o.gif
Really interesting work there, thanks for doing it. I watched it all unfold on Twitter too and this is a fascinating way to look back.
This was fascinating. Well done.
Impressed!!
you've really freeze framed moments in a way that'll uniquely cue some of this time's memories. Thanks for taking the time to do this for us all. Hopefully lessons learned will help color a brighter future for this wonderful country.
fascinating , i agree with one post , you MUST find a gallery to show this . good job, very smart idea , The new way of art/information with new technology.
I admire your touch in the colours that represent each crisis/issue/plot, showing the weight and manipulation of the text order. I LOVE IT.
I saved all your pictures and kept it in my album in FB. Thanks you are a genius. You wrote history and preserved them as it goes.
This is one of the most fascinating things I've ever come across. I can imagine how many hours you put into this, thanks for all the work. The end result on the final Wordle is no surprise really, is it?
Agree too with the comments about it making great artwork.
Magnificent job done. (I have no idea where the name 'miss' came from, the blog has picked up my gmail account but there's no 'miss' or anything like it in account or real name, but I don't know how to change it)
Thank you for all the comments!
All credit for the beauty of the images goes to Wordle.net for their magnificent site, though. The only difficulty is in arranging and rearranging the words so that they are spread evenly across the squares and are easier to see. Yes, you can play around with the color options too. There is a huge selection of text, layout and language options on the site; I myself only came to know about it while watching CNN one day. My effort was mostly in copying the data and preparing it, that's all.
@Kevin, the size of the image was limited by the screen resolution, the same as the webpage resolution at Wordle, hence the ones here were the maximum resolution I could get. Please feel free to download.
For those of us who stay in Bangkok and followed Twitter minute by minute, what you have done is really something meaningful to see. This reminds me of Picasso's Guernica.