Monthly Archives: January 2014

Data Journalism – a modern day Bletchley Park?

Photo by DaveonFlickr CC licence (CC BY-SA 2.0)

I can’t help it. Whenever I hear people talking about their data-driven journalism work, I start imagining starkly-decorated rooms with complex machines crunching endless numbers; teams of frighteningly intelligent young people looking for meaningful patterns in apparent chaos. They work round-the-clock, driven by a mission to save the free world from oppression, fuelled by cups of tea.

(I’m going to ignore Jonathan Hewett who shattered this image last night and insisted that day-to-day life in his Interactive Journalism department at City University is actually far too mundane to be compared to WW2 code breakers. I’m sure they had timetabling issues in Bletchley Park too.)

Last night I found myself in a room intimidatingly full of Data Journalists. It was a Media Society event to launch the new book, Data Journalism: Mapping the Future, to which I contributed a chapter. It was a fascinating evening, which Adam Tinworth has captured very well in his excellent blog.

The evening was almost ruined when Raymond Snoddy unexpectedly asked me to say something from the floor. I burbled something along the lines of “where do Data Journalists come from and how did you get to be that way?” I imagine the recruitment process is probably similar to Bletchley Park. An ability to complete the Daily Telegraph crossword in under twelve minutes, for example. Certainly, the route into Data Journalism is not an obvious one, and a period of studying journalism at a UK university doesn’t seem to be part of that route.

This makes me sad.

But would adding a compulsory statistics module onto journalism courses, for example, help? I’m sure many students attracted to journalism because they like WRITING would run a mile. So perhaps we need to stop marketing journalism as purely a subject for arts students (who often take a worrying pride in their ignorance of maths). Perhaps we should make it obvious – to a world which really doesn’t know this yet – that journalism is also a subject for numerate students... who can also write. (Because nobody boasts about being rubbish at writing, do they?)

In the States, some journalism students graduating with data journalism and/or programming skills have taken to calling themselves “unicorns” because their set of skills makes them so rare. This doesn’t help. It implies their skills are exceptional, difficult, elite.

Worse, it lets the rest of us off the hook.

Sure, some aspects of data journalism do require an exceptional level of specialist knowledge and skill. But other aspects are definitely attainable so long as we decide to make the effort.

And that’s where journalism educators come in. We have to show that we believe these basic numeracy, stats, spreadsheet and web scraping skills are perfectly attainable, rather than always treating them as peripheral and geeky.

There is no such thing as a unicorn; only hard work.

Digital Journalism Classroom Activity – Creating a simple survey, visualising and mapping the results

This has taken up a lot of my time today. I had a fairly simple idea in mind. At the start of my Digital Journalism module with first year undergraduates in a few weeks’ time, I want to conduct a very simple survey of the students – how many own smartphones, what social media they use, and which one they use most. I thought it would be fun to compare our students with the national and international picture.

I wanted to avoid Survey Monkey or the Blackboard tool because I wanted to show the students how data could be collected, put into a machine-readable format and visualised. This would be a very simple introduction to the idea of Data Journalism. My idea is to collect the data before they go on a break then show them what I’ve been able to do with that data when they get back. I may not show them all the nitty gritty of the process in week 1.

It’s been a fun task because I had to learn so much along the way and solve some tasks I didn’t think I’d be able to. This is another lesson I want to pass on to students – you can Google your way out of any situation if you put your mind to it.

So if you’re a data-beginner like me, some of this might be useful.

  • Create the Survey

This is the easy bit! Go to Google Drive, create a form and start putting together your survey questions.

Screen Shot 2014-01-16 at 21.13.46

Screen Shot 2014-01-16 at 22.00.51

You’ll be asked to Choose Response Destination. I chose to create a new spreadsheet (but you could choose to keep the data in the Google form and download it later).

This is where all the responses to your questions will appear. The beauty of having them in a spreadsheet is that you can analyse the results and use various Google tools to visualise them.

Here’s the ludicrously simple form I’m using as an example for this post.

Screen Shot 2014-01-16 at 22.05.23

NB the third question “Which social media do you use?” is a checkbox question – the students can tick as many boxes as they want. The following question is multiple choice so they can only tick one box. This becomes important later on, folks!

There’s a Send Form button at the bottom which generates a Link so you can send this to your whole class by email, for example.

  • Get the Results

As your students fill in the form, the results are sent to the Google spreadsheet you created. You now have some basic information about the students in a machine-readable format. It’ll look something like this.

Screen Shot 2014-01-16 at 22.07.15

  • Create a Map

This is the fun bit! We can straight away visualise some of this data on a map. Just go to Google Maps (having watched this tutorial if, like me, you need a refresher).

Click on My Custom Maps and then Create. You will be given the option to Import. So go ahead and import your Google spreadsheet of survey responses.

I ran into a problem here straightaway the first time I tried this.

Screen Shot 2014-01-18 at 00.36.28

Google Maps rejects any spreadsheet with punctuation it doesn’t like in the column headings (which correspond to your survey questions), so it’s best to avoid brackets, commas and the like in your survey questions. You can, of course, edit the column headings on the spreadsheet before importing it into Google Maps.

UPDATE – I’ve found another upload error in Google Maps, which generates this message: “Column names cannot include these characters.” If the column name is too long, the spreadsheet gets rejected too. So you might have to shorten some column headings if the survey questions on your original form were long.
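If you have a lot of headings to tidy, the clean-up can be scripted. What follows is only a minimal Python sketch of the idea: the function name `clean_heading`, the sample headings and the 40-character cap are all assumptions made for illustration. The post does not say exactly which characters or what length Google Maps objects to, so adjust both to whatever gets your spreadsheet accepted.

```python
import re

# Assumed cap for illustration only; the real limit is not stated in this post.
MAX_LEN = 40


def clean_heading(heading, max_len=MAX_LEN):
    """Return a heading with punctuation removed and its length capped."""
    # Keep letters, digits and spaces; swap anything else for a space.
    text = re.sub(r"[^A-Za-z0-9 ]+", " ", heading)
    # Collapse the runs of spaces left behind by the removed punctuation.
    text = re.sub(r"\s+", " ", text).strip()
    # Cut to the assumed maximum length and tidy any trailing space.
    return text[:max_len].rstrip()


# Hypothetical survey questions used as column headings.
headings = [
    "Do you own a smartphone?",
    "Which social media do you use? (tick all that apply)",
]
cleaned = [clean_heading(h) for h in headings]
print(cleaned)
```

A blunt cut like this can chop a heading mid-word, so for a handful of columns it is probably nicer to rewrite them by hand.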

You’ll end up with something like this, following the instructions in the tutorial. I gave YES responses (pins) a different colour from NO responses. Note that Google Maps also tells you how many YES and NO responses you got, so you might want to make a note of those figures to make a chart later.

Screen Shot 2014-01-16 at 22.36.47

  • Create a Pie Chart

You could now go back to your Google spreadsheet and create a pie chart to show which is the most used social media. Click on the drop-down menu at the top of this column and choose Sort A-Z. This will cluster the different responses. Count them up and create another two columns on your spreadsheet to display this new information. But if you’re feeling fancy (and if you’re dealing with a lot of responses), you should use a COUNTIF formula to automate this process and save you having to count. This video explains it really well.
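The same tally can also be done outside the spreadsheet. This is a minimal Python sketch, assuming the answers to the "most used" question have already been copied into a list; the sample answers are made up for illustration and are not the real survey results.

```python
from collections import Counter

# Hypothetical answers to the single-choice question, one per student.
most_used = ["Facebook", "Twitter", "Facebook", "Instagram", "Facebook", "Twitter"]

# Counter tallies every distinct answer in one go, much like writing
# one COUNTIF formula per value in the spreadsheet.
tally = Counter(most_used)

# most_common() lists the answers from most to least frequent.
for network, count in tally.most_common():
    print(f"{network}: {count}")
```

The two columns it prints (answer and count) are the same two columns the pie chart needs.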

Screen Shot 2014-01-16 at 22.48.41

Not very exciting. Perhaps a pie chart will spice it up?

The Google chart facility makes this job really easy once you’ve sorted the data. Again, the video above explains the process really clearly. And I think the pie chart does the job here. Facebook wins!

Screen Shot 2014-01-16 at 22.49.56

  • Google Refine – multiple values in a single cell on spreadsheet

That just leaves one survey question we’ve not tackled yet. Remember I created a checkbox question where students were asked to tick all the social media they used. That means we’ve got multiple responses in a single cell on our spreadsheet. This is not very machine-friendly and you won’t be able to do any visualisation with it looking like this. We need to separate out those responses into different rows or columns.

Screen Shot 2014-01-16 at 23.05.15

I ended up using Google Refine to do this. It’s amazing! It’s a tool for cleaning up messy data, which is just what I need here, and these tutorials are pretty useful, although the second is a bit daunting for a beginner like me.

You need to download Google Refine (easy) and import your Google spreadsheet (I had to export it to Excel first). Go to the column with the problem data and click the drop-down arrow on the column header. Choose Edit Column – Split into several columns.

Screen Shot 2014-01-16 at 23.11.28

The pop-up box then asks you what’s separating the different elements in the cell that you want separating. Easy – a comma (but it might be a hyphen or just a space, for example).

Screen Shot 2014-01-16 at 23.12.17

Click OK. Wow!

Screen Shot 2014-01-16 at 23.14.49

Now all the responses are separated into different cells and the computer can do something with them!
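For the curious, the split itself is a tiny operation. This is a minimal Python sketch of the same idea, not what Google Refine does internally; the helper name `split_cell` and the sample cells are assumptions made for illustration.

```python
def split_cell(cell, separator=","):
    """Split one multi-answer cell into a list of trimmed values."""
    # Break the cell on the separator, trim stray spaces around each
    # piece and drop any piece that ends up empty.
    return [part.strip() for part in cell.split(separator) if part.strip()]


# Hypothetical checkbox answers, one cell per student.
cells = ["Facebook, Twitter, Instagram", "Facebook", "Twitter, Snapchat"]

# One list of separate values per student.
rows = [split_cell(cell) for cell in cells]
print(rows)
```

As in the pop-up box, the separator is a choice: pass `"-"` or `" "` instead of the default comma if that is what divides the values in your cells.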

OK, I’m finished with Google Refine now and can head back to Google Spreadsheet. So I Export as a CSV (Comma Separated Values) file, then import that file into Google Spreadsheet.

But I still want to be able to count up the responses for each social media and it’s a pain having them spread over several columns like that. So I just copy and paste the values from each column into one column.

Screen Shot 2014-01-17 at 10.03.43

At this point I highlight the whole spreadsheet and go to the View tab so I can Freeze Row 1. That just stops things moving around when you come to sorting your data.

Screen Shot 2014-01-17 at 09.53.58

So now I’ve got all the instances of each social media in a separate cell in one single column. It’s pretty straightforward from here on in. I start by selecting the column and sorting it A-Z so that it clusters the different values together for me, which just makes it neater and easier to handle.

Screen Shot 2014-01-17 at 10.01.53

Then I start another column in the spreadsheet and I copy and paste each value onto a separate row like this.

Screen Shot 2014-01-17 at 09.59.43

I copy and paste rather than writing out manually just in case I make a mistake or inadvertently add a space or something which might skew the results. Now I want to find out how many times each of those words appears in the Column C I created in the step before. We can use a COUNTIF formula here for each value. For example, =COUNTIF(C2:C17, O2) should tell us how many times the value FACEBOOK appears in Column C. So just do that for each value.
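The flatten-then-count steps can also be sketched in a few lines of Python. This is a rough illustration under assumptions, not a replacement for the spreadsheet method: the `rows` data is invented, and the clean-up (trimming spaces and upper-casing) is one possible guard against the stray-space problem mentioned above.

```python
from collections import Counter

# Hypothetical rows after the split step: one list of answers per student.
rows = [
    ["Facebook", "Twitter", "Instagram"],
    ["facebook ", "Twitter"],
    ["Instagram", "Facebook"],
]

# Flatten everything into a single column, trimming spaces and unifying
# case so that "facebook " and "Facebook" are counted as the same value.
column = [value.strip().upper() for row in rows for value in row]

# One tally per distinct value, like one COUNTIF formula per row.
counts = Counter(column)
for network, count in sorted(counts.items()):
    print(network, count)
```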

It’s then a simple process to turn it into a chart as before.

Screen Shot 2014-01-16 at 23.22.08

...and voilà!

“Recalculating the newsroom: the Rise of the journo-coder?”

So, this is exciting.

As part of my journey to becoming more tech-minded, I wrote a chapter for a new book called “Data Journalism: Mapping the Future”. (It’s due out in a couple of weeks’ time and I’ll post details then so you can rush out and buy a copy.) It’s edited by John Mair and Richard Lance Keeble with Paul Bradshaw and Teodora Beleaga (Abramis).

There’s a book launch on January 22nd at the Adam Street Private Members Club, just off The Strand. Do come!

 Data journalism – mapping the future?

Chair: Raymond Snoddy. (Former Media Editor The Times)

Panel: David Ottewell – Head of Digital Trinity Mirror

Martin Stabe – Head of Interactive News The Financial Times

Jacqui Taylor – CEO Flyingbinary Limited

A new way for journalism or just old clothes disguised as new? Should journalists be programmers? Should they all have computing skills? Does Data Journalism help comprehension?

To mark the publication of Data Journalism: Mapping the Future, edited by John Mair and Richard Lance Keeble with Paul Bradshaw and Teodora Beleaga (Abramis)

So, what’s the chapter about?

I decided it was time to find out just how much coding skill journalists need in newsrooms today. Is the journo-coder a myth? Do we all need to have a github account as well as a blog? Should students ditch shorthand and learn to code instead?

I interviewed journalists and developers working in the interactive news departments at the BBC and Financial Times. They were extremely helpful and very generous with their time. I was interested in what skills they had, how they learnt their skills and how they worked together.

Only one person was comfortable describing themselves as a “journo-coder” or any of the other ugly, hybrid phrases that are out there. The rest strongly identified as either journalist or developer. BUT, when pressed, the journalists admitted they had a pretty exceptional skill set that you wouldn’t expect a conventional newsroom journo to have. A top-notch developer wouldn’t call them coding skills and wouldn’t even mention them on their CV because they’re pretty trivial. But for a journalist, WOW! Writing complex Excel functions or managing a database using SQL or writing a simple scraper in Python – these are cutting-edge skills that move your journalism into an exciting new era of interactive storytelling. They are also highly marketable skills.

And they didn’t learn them in journalism class.

I’ll post more after publication and I’ll probably re-version it as a slideshare as well for anyone who’s interested.