Last week, I tweeted that I was in Canberra. I was there to acquire some documentation required for my visa (and they’d better goddamn give me my visa – I’ve been waiting about 2 years for it already, and my partner already has her permanent visa). And of course, this tweet about a commotion from the gallery of the House of Representatives would indicate I was in Parliament at the time. Of course, that was a delayed tweet, since phones weren’t allowed in the gallery. The following tweet would then give clues to what I thought about the visit to Question Time.
Yes, I decided to make a visualization of Question Time. I sat through the entire session, and I thought it might be interesting to do a dataviz on what I saw. I went to the Parliament Hansard records site (the words “Hansard records” come from the redundant department of redundancy), picked out the Hansard for the day I went (September 13), and processed the Question Time transcript. The day I was there, I noted only two main topics of interest – Asylum Seekers (a.k.a. the Malaysia Plan vs the Pacific Solution) and the Carbon Tax (as an economist, I must say the carbon tax is a brilliant idea). I also noticed that the Carbon Tax/Climate Change questions came from government backbenchers, and that the opposition rarely asked questions (only ridiculous interjections). The Malaysia vs Nauru plan for refugees, however, had brilliant responses from both sides (no, actually, not really: both sides’ arguments are utter and total crap – I can see much better solutions that both parties could join together and opt for. Seems like politicians don’t think outside the box). This, to me, indicated that there would be two ways to visualize Question Time.
So, without much ado, I present: the Carbon Tax/Climate Change visualization from parliament Question Time on 13th September 2011:
Why a Word Cloud?
I chose a word cloud because of the nature of the queries. I quickly noted, though it was my first time sitting through an entire Question Time (my other exposure has been through ABC News, which only shows snippets), that there were two kinds of questions – one that supports the cause of the person being questioned, and another that opposes it. It was clear that the questions asked about the carbon tax were supportive of it, and merely served as exposition and elaboration of the carbon tax. Because the exchange is fairly one-sided, we can treat it pretty much like a speech (though delivered by different people) about the carbon tax.
How Was the Word Cloud Made?
First I extracted the text from the Hansard and grouped it by topic – in this case the carbon tax. Helpfully, the Hansard itself is nicely classified too: if you read the Hansard for the 13th of September, you’ll notice the headings. I’ve lumped Climate Change, Carbon Tax, and Clean Energy Future together for this visualization.
I then put the text through a natural language processing script, which picked out all the nouns and counted how often each one appeared.
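As a rough illustration of the counting step (this is not the original script, which isn't shown in this post), here is a minimal Python sketch. It assumes the text has already been run through a part-of-speech tagger that emits (word, tag) pairs with Penn Treebank-style tags, where every noun tag starts with "NN". The function name `noun_frequencies` and the sample tokens are made up for illustration.

```python
from collections import Counter

def noun_frequencies(tagged_tokens):
    """Count nouns in a list of (word, tag) pairs.

    Assumes Penn Treebank-style tags, where every noun tag starts with "NN".
    Words are lowercased so "Carbon" and "carbon" are counted together.
    """
    nouns = (word.lower() for word, tag in tagged_tokens if tag.startswith("NN"))
    return Counter(nouns)

# Hypothetical tagger output, not taken from the actual Hansard.
sample = [
    ("The", "DT"), ("carbon", "NN"), ("price", "NN"), ("will", "MD"),
    ("cut", "VB"), ("pollution", "NN"), ("and", "CC"), ("the", "DT"),
    ("Carbon", "NN"), ("price", "NN"), ("supports", "VBZ"), ("families", "NNS"),
]

freqs = noun_frequencies(sample)
for word, count in freqs.most_common():
    print(word, count)
```

The resulting word/count pairs are the kind of input a word cloud generator needs.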
I then generated the word cloud with the advanced version of wordle.net. I tried a couple of word cloud generators (including one that generates HTML – which was what I was initially looking for), but I was ultimately happiest with Wordle. My first attempt with Wordle produced the image on the left. Feel free to click to enlarge.
Do note that it looks very different. The font I had chosen was my default go-to font for data visualization – Gentium. I later changed it to Tank Lite, because that was more impactful. Also, do note the word sizes. “Carbon”, “Energy” and “Price” are disproportionately large. This was the original image I had. It’s now updated to the one above.
The difference between the two is the frequency scale. In the original (the one on the left), I had set it to an absolute scale. The word “Carbon” was said 1056 times, and the word “Price” was said 784 times. When I decided to generate the new word cloud above, I converted the frequencies to a log scale, creating a more eye-pleasing visualization.
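To show what the log scale does to the numbers, here is a small Python sketch. The post doesn't say which log base or offset was used, so the natural log of (count + 1) below is an assumption. The counts for “carbon” and “price” are the ones quoted above; the count for “family” is a hypothetical small value added only to show the compression.

```python
import math

def log_scale(freqs):
    """Map raw counts to log-scaled weights.

    Uses ln(count + 1); the base and the +1 offset are assumptions,
    chosen so that a count of 0 maps to a weight of 0.
    """
    return {word: math.log(count + 1) for word, count in freqs.items()}

raw = {
    "carbon": 1056,  # quoted in the post
    "price": 784,    # quoted in the post
    "family": 50,    # hypothetical, for illustration only
}
scaled = log_scale(raw)

# Compare how much bigger the top word is than the small word on each scale.
raw_ratio = raw["carbon"] / raw["family"]
scaled_ratio = scaled["carbon"] / scaled["family"]
print(f"raw ratio: {raw_ratio:.2f}, log-scaled ratio: {scaled_ratio:.2f}")
```

On the absolute scale the most frequent word dwarfs everything else; on the log scale the ordering is kept but the gap shrinks, which is why the smaller words become readable.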
It is a toss-up, really, between eye-pleasing visuals and a scale that humans are biologically not used to.
What Can We Learn from the Government’s Responses?
Well, for one, the government is very interested in carbon pricing and clean energy. However, “Government” appears almost twice as often as “Family” and “Future”. A disappointing thing I noticed is that the science is pitiful. The number of science-related keywords in a debate on the carbon tax is tiny (32 mentions). Mentions of the economy were high, but figures relating to economic growth were again quite few.
This is what I think – do note it is entirely an opinion and not fact backed by hard evidence: there is a lot of gas about climate change and the carbon tax. I was excited to go there and listen to people talk about the science of climate change. You know, I had in my mind something like Julia Gillard drowning Tony Abbott with facts and figures and regression analyses on climate change (and maybe beating him down with a huge statistics book and telling him to RTFM). Instead, what I got was a lot of rhetoric and empty statements. Colour me disappointed, but I guess this is politics.
On Asylum Seekers
The topic of asylum seekers was more lively. Remember what I said about having opposing questions? This was one with opposing questions, and this has allowed me to work a visualization that I’ve wanted to try for a long time – a document contrast diagram. It was first invented by the guys at Neoformix, but I thought it’d be a good way to convey the topics talked about by two parties – specifically the Government and the Opposition. Behold, the document contrast diagram for Question Time on 13th September 2011:
How to Read the Chart
- The size of the bubble is the frequency of the word in the entire conversation. The bigger the bubble, the more times the word was used.
- The horizontal position of a bubble indicates which party spoke the word more – words at -1.0 were spoken only by the Government, and words at 1.0 were spoken only by the Opposition.
- The colour of a bubble indicates the average sentiment when the word was spoken. Red represents a negative sentiment, yellow is neutral and blue is positive.
How the Chart Was Made
Like the cloud above, I first mined the text, separated it into Government and Opposition, and then ran it through a natural language processor to pick up the nouns and determine the sentiment of the words used. I collected the word frequency by both parties, intersected them, and calculated a horizontal position for them. I then plugged all the data into R and used ggplot2 to plot this with geom_point and geom_text.
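The exact formula for the horizontal position isn't given here, so the following Python sketch is only one plausible way to get values between -1.0 (Government only) and 1.0 (Opposition only): the difference of the two counts divided by their sum. It also takes the union of both vocabularies, on the assumption that words used by only one side still need a position. The function names and the sample counts are hypothetical.

```python
from collections import Counter

def horizontal_position(gov_count, opp_count):
    """Return a value in [-1.0, 1.0]: -1.0 is Government only, 1.0 is Opposition only.

    Assumed formula: (opposition - government) / (opposition + government).
    """
    total = gov_count + opp_count
    if total == 0:
        raise ValueError("word was not spoken by either side")
    return (opp_count - gov_count) / total

def contrast_rows(gov_counts, opp_counts):
    """Build one row per word with its bubble size and horizontal position."""
    rows = []
    for word in sorted(set(gov_counts) | set(opp_counts)):
        g = gov_counts.get(word, 0)
        o = opp_counts.get(word, 0)
        rows.append({"word": word, "size": g + o, "x": horizontal_position(g, o)})
    return rows

# Hypothetical noun counts per side, not taken from the actual Hansard.
gov = Counter({"opposition": 6, "malaysia": 4, "arrangement": 3})
opp = Counter({"minister": 5, "malaysia": 4, "nauru": 7})

rows = contrast_rows(gov, opp)
for row in rows:
    print(row)
```

This version works on raw counts. If one side speaks far more than the other, dividing each count by that side's total word count first would be a reasonable variation. Rows like these can then be handed to a plotting library as points with text labels.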
There are a lot of interesting observations that can be made. At first I made the mistake of thinking that both parties obsess a lot about the other party – for example, “Opposition” appears a lot on the Government’s side while “Prime Minister” appears a lot on the Opposition’s side. However, I soon realized it’s because they were addressing one another. Another thing of note is that the Government seems to be using more words of positive sentiment. It should be noted, however, that sentiment mining is still in its infancy, and I actually spent more of my time writing code to extract text from a PDF.
Another interesting observation is that both parties don’t share words as much as I expected. Perhaps it’s because I’ve been playing around more with debates whose audiences are there to be swayed, and hence similar words of positive sentiments are shared by both parties.
So there you go… that’s how I’d visualize Question Time. There was another method that I wanted to try, which was more similar to the NY Times’ Debate Transcript Visualization, but I ended up having to worry more about my visa than about coding, hence this post was a work in progress for over 3 days. Plus, I also got sidetracked with more natural language processing methods. So what do you think about it?