Author Archives: LizHannaford

Doing Journalism with Data MOOC – Top 5 Takeaways

So the data journalism MOOC has come to an end – for me at least. I just did the final test and applied for my certificate of completion. I’m going to miss it, without a doubt. I liked the discipline of heading up to my study four evenings a week to learn something new. And I always like messing around with numbers and problem-solving.

I also liked interacting with fellow participants on discussion boards and on Twitter under the hashtag #ddjMOOC. I’ve made some good connections with fellow students and I hope those connections continue and lead to new projects and more knowledge-sharing.

Here’s a list of my Top 5 takeaways from the course. But what are yours?

  1. Pivot tables. Some years back, I’d taken against Pivot Tables when I first started to explore them. Don’t know why. So thanks to Steve Doig for the clear explanations and excellent examples. I’m a convert!  
  2. Open Refine. I’d played around with Open Refine before but had always been a bit wary of GREL. Nicolas Kayser-Bril helped me understand the logic and structure of GREL, so I feel a lot more confident exploring the possibilities.
  3. Open Refine again! Text facets and Clusters. Can’t wait to get my hands on some filthy data.
  4. ChangeDetection tells you when a website changes. Sounds so simple but any data journalist will tell you this is GOLD. Lots of fellow participants seemed to agree. Thanks Paul Bradshaw for all the tips on finding data.
  5. I was fascinated by the science behind data visualisation as explained by William Cleveland and Robert McGill in their 1984 paper on Graphical Perception. So thanks to Alberto Cairo for highlighting this important piece of research and visualising it so beautifully.

    Relative accuracy of comparison using different basic visual features, from Cleveland and McGill. Visualisation from Alberto Cairo’s The Functional Art
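Takeaway no. 1 deserves a quick illustration. A pivot table’s core operation – grouping rows by two fields and aggregating – can be sketched in plain Python (the rider data here is invented for the example):

```python
from collections import defaultdict

# Invented rows: (year, nationality) for Tour de France riders
rows = [
    (1947, "France"), (1947, "Italy"), (1947, "Poland"),
    (1948, "France"), (1948, "France"), (1948, "Italy"),
]

# Pivot: count riders per year, per nationality
pivot = defaultdict(lambda: defaultdict(int))
for year, nationality in rows:
    pivot[year][nationality] += 1

print(dict(pivot[1948]))  # {'France': 2, 'Italy': 1}
```

The spreadsheet version is the same idea with a friendlier interface: one field down the rows, another across the columns, counts in the cells.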

So what are your Top 5 Takeaways from the MOOC? Drop me a line.

Tour de France – Doing Journalism with Data – Beware “Official” sources

Bike [Die, France] by Biphop. Licensed under Creative Commons (CC BY-NC-SA 2.0)

I wanted to be able to apply some of the things I’m learning on the Doing Journalism with Data MOOC to a topical project that interests me and isn’t too heavy. So I decided to look at how international the Tour de France has become. For example, how has the proportion of French riders changed over the years? Which was the first non-European nation to take part? Which European countries have never had a rider in the Tour? With the race starting in Yorkshire this year, it seemed a good time to look at this topic.

But I needed a set of data with the names and nationalities of each of the participants from 1903 to the present day. That should be simple, shouldn’t it?

I started with the official Tour de France site‘s historical archive compiled by the current race organisers, ASO. It looks good and has various search options.

Screenshot - Historique de Tour de France

It wasn’t easy to scrape because the nationalities appear as flag jpegs rather than the actual name of the country. So I had a look to see if there were other websites with a simpler structure. One site based in the States had a high Google ranking. It doesn’t look as slick as the official Tour de France site, but it was easy to grab the table of participants.
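For what it’s worth, the flag-jpeg problem is solvable if the image filenames encode the country. This is only a sketch – the HTML fragment, the file paths and the code-to-country mapping are invented, not taken from the actual Tour de France site:

```python
import re

# Invented HTML in the shape described: the nationality is only a flag image
html = """
<tr><td>KLABINSKI Edward</td><td><img src="/img/flags/pl.jpg"></td></tr>
<tr><td>ROBIC Jean</td><td><img src="/img/flags/fr.jpg"></td></tr>
"""

# Hypothetical mapping from flag filename to country name
FLAG_TO_COUNTRY = {"pl": "Poland", "fr": "France"}

riders = [
    (name, FLAG_TO_COUNTRY.get(code, "unknown"))
    for name, code in re.findall(
        r'<td>([^<]+)</td><td><img src="[^"]*/(\w+)\.jpg">', html
    )
]
print(riders)
```

The point is that an image can still carry machine-readable data – you just have to dig it out of the filename rather than the cell text.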

BUT its data didn’t match the official site so my initial reaction was to assume it wasn’t a reliable source. I persevered with scraping the Tour de France site.


At regular intervals, I would cross-reference the list of participants with Wikipedia (and the sources cited in Wikipedia), especially if I came across a non-European participant in those early days. Hmm, some things just weren’t matching up. I started to get concerned about the “official” data. My concerns reached crisis point when I checked the data for 1947. The Wikipedia entry says this was the first year a Polish rider took part – Edward Klabinski. There was absolutely no mention of him in the official Tour de France archives. Nor was he mentioned in 1948 or 1949 when he also, apparently, took part. And he wasn’t a nobody! He came 18th in 1947 and was also the first winner of the Dauphiné, so there are plenty of records for him out there. I checked to see if there was an alternative spelling of his name (sometimes he was known as Eduard, but that didn’t appear either) or if he’d changed nationality. No, nothing. He did appear on that other site, though…
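The cross-referencing itself is mechanical once you have two lists of names: Python’s set difference shows exactly who appears in one archive but not the other (the names below are illustrative, not a real comparison):

```python
# Two versions of the same start list (illustrative names only)
official = {"Jean Robic", "Gino Bartali", "Fausto Coppi"}
amateur = {"Jean Robic", "Gino Bartali", "Edward Klabinski"}

# Set difference: who does each archive have that the other is missing?
missing_from_official = amateur - official
missing_from_amateur = official - amateur

print(sorted(missing_from_official))  # ['Edward Klabinski']
print(sorted(missing_from_amateur))   # ['Fausto Coppi']
```

Run both directions: a rider missing from the official list is one story, a rider the official list invents is another.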

So I sent an email to Bill McGann who runs that site to see if I could get to the bottom of this. He was kind enough to email straight back and said he too had been surprised about how unreliable a resource the official website was! He suggested a couple of other resources which he had used – the Tour Encyclopedie (out-of-print) and Memoire du Cyclisme which claims to be the most reliable source on the web.

I felt bad for having assumed Bill’s website wouldn’t be as good a source as the official site. It’s actually extremely good! But he didn’t have the list of participants for every year that I needed so I had to carry on looking.

Memoire du Cyclisme requires a small subscription to enter so I actually parted with cash – I was that desperate. I’ve not been sent the password yet so need to chase it up…

Other sites had data but I couldn’t scrape it (this was before I’d discovered OutWit Hub which may have helped).

Then I remembered we had a History of the Tour de France by Geoffrey Wheatcroft on the bookshelf. What did he have to say on the matter?

The sources for the history of the Tour de France are exiguous, inaccessible, and largely corrupt. Plenty of popular books on the subject are riddled with error; and when three different, officially sanctioned reference works, the Tour Encyclopedie, the press office’s Histoire, and the Tour’s website, can’t agree on the number of entrants in the field one year, or on the spelling of a rider’s name, then it’s tempting to echo Sir Francis Hinsley in ‘The Loved One’ – ‘I was always the most defatigable of hacks’ – and give up.

(Wheatcroft, 2003, p329)

I too was on the point of giving up but I shared my frustrations with the MOOC community on one of the discussion boards and was really pleased to get a couple of replies. They spurred me on!

Harness the power of Wikipedia

I trawled through the sources mentioned in Wikipedia references – they can be a really useful resource. I found Cycling Archives, which matched well with Bill’s data.

I started scraping it into my spreadsheet – still cross-referencing. By now I’d come across a couple of other amateur archives, in particular TourFacts.DK. There were still a few anomalies with the number of participants but only a few either way.

I emailed Cycling Archives and TourFacts.DK to ask about their sources. Svend has been working on TourFacts.DK for seven years! He used mainly one key archive plus a lot of googling. He said he’d emailed the Tour de France a long time ago about their errors but never got a reply. (Alarm bells were raised for him when a rider listed as being born in 1956 rode the race in 1923!!)

I got a reply from CyclingArchives today. They directed me to their page about the sources they use. It’s pretty comprehensive (including Memoire du Cyclisme) but very interestingly, their list does NOT include the official Tour de France site.

So, I may not have found the PERFECT data. I don’t think the perfect data exists for the Tour de France which was so chaotic in its early years. But the community of archivists I’ve been able to speak to seem to agree that I’m probably using the least unreliable one. I think that’s the best I can hope for.

What have I learnt?

  • Never underestimate the reliability of an amateur archivist. These people are passionate about what they do and very diligent. They’re also very well-connected and rely on a network of fellow experts who are equally enthusiastic about maintaining reliable, historical data. Chapeau!
  • Never assume official data is the best. Just because a site looks fancy, doesn’t mean it’s using reliable data.
  • Cross-reference all the time. 
  • Talk to people. In my experience, they are extremely happy to help. Let them know how much you appreciate the work that’s gone into their archive and ask if they can tell you more about the sources they’ve used, how they’ve dealt with discrepancies etc. Become part of the community!

If you’ve had similar experiences trying to get hold of seemingly straightforward data, let me know. I’d love to hear your stories and share your frustrations.


Get ready for Doing Journalism with Data MOOC

Exit Festival 2012 by Bernard Bodo. Creative Commons (CC BY-NC-SA 2.0)

We’re excited! This MOOC from the European Journalism Centre – “A free online data journalism course with 5 leading experts” – starts on Monday 19th May.

….and it’s not too late to join the party! I’m doing it because I want to keep building my Data Journalism skills and find out how data journalism is developing around the world. But, as an educator studying for a Masters in Blended/Online Learning, I’m also interested in the whole MOOC phenomenon.

So what’s the best way to get ready for this MOOC? Here are a few ideas.

Understand how Massive Online Learning works

Watch this for a quick familiarisation from Dave Cormier.

There are different platforms for MOOCs. Coursera is probably the best known, but there’s also Udacity and a platform created by the UK’s Open University called FutureLearn. The Doing Journalism with Data MOOC will be using Canvas. I recommend adding the Canvas bookmarklet for this DJ course to your browser so it’s really easy to get to and a constant reminder you have work to do!

What kind of MOOC course are you doing?

Screenshot of Dave Cormier's MOOC video

  • xMOOCs refer to the Coursera-type model where a teacher-expert transmits knowledge through carefully packaged videos and checks that knowledge has been acquired through computer-graded quizzes. Support comes from occasional tutor participation in discussion forums, but it is also encouraged through students organising face-to-face meet-ups in their locality. There’s probably a Coursera meet-up group in your area!
  • cMOOCs rely on a more connectivist approach, making use of the networked web 2.0 technology. There can be an emphasis on content creation, for example, as a way of building knowledge. They are more student-centred in that there is no set route through the course and a limited structure so there’s more learner autonomy. Webinars with guest speakers, blogs and online facilities for students to connect with each other are a strong feature. Support comes from peers and is facilitated by networked technology.
  • quasi-MOOCs are not much more than Open Educational Resources tutorials such as the Khan Academy and, more recently, Codecademy. The learning resources are asynchronous and don’t really offer social interaction unless students generate it themselves. quasi-MOOCs are not packaged as a course but as a series of standalone tutorials. So support would be minimal here.
  • Dead MOOCs OK, so this is my own category. I use it to describe archived MOOCs. So the actual MOOC is no longer running – the tutors aren’t around and the submission deadlines have passed – but you can still watch the video lectures and do the online quizzes. You won’t get a badge or certificate at the end but you can still learn.

I don’t know which model Doing Journalism with Data will follow so it’ll be interesting to find out.

You can read this article by George Siemens to learn more about the MOOC phenomenon.

Statistics: Making Sense of Data

By Ainali. Creative Commons CC-BY-SA 3.0

Without a basic understanding of statistics, data doesn’t mean much. I’ve been taking this Statistics MOOC from the University of Toronto for the last couple of months. It uses the Coursera platform and, thanks to the video lectures from Jeffrey Rosenthal and Alison Gibbs, I can now only talk about data in a Canadian accent. It’s a great example of a dead xMOOC! Even though the discussion forums are a year old, they’re still really useful when I get stuck (often). The submission deadlines are long gone so I’m not going to get any badges or certificates, but I still get marks for the multiple choice quizzes I do and I can even do my assignments because the lecturers have supplied a “model answers” sheet for me to check against. It’s been a great way of brushing up my A-Level Statistics and putting it into a more practical context, and I highly recommend adding some statistics to your Data Journalism skill set.

Let’s kill the myth that journalists can’t do maths here and now!

Explore examples of Data Journalism

This is a great way to get in the zone. Once you start looking, you’ll find loads of examples. Think about the kind of data that was used and where it came from. What journalistic processes were added to the raw data to make it journalism – e.g. context, visualisation, interactivity?

And a note of caution, just because it’s data doesn’t mean it’s journalism. You still have to check your facts, the data source and make sure you’re not asking your data to do more than it’s capable of. Here’s a cautionary tale you should read before embarking on your Data Journalism MOOC. It’s about this map of kidnappings in Nigeria produced by FiveThirtyEight.

See you there….

So, if you’re one of the 20 393 people already registered for the MOOC, I hope to see you online and share some learning. Do drop by and say hello!

Next job – I’ll be putting together a list of Top Tips for learning with MOOCs in the next day or so.

Preparing data for Google Calendar

So it seemed quite straightforward. My husband works irregular shifts at the BBC and his monthly shifts appear in a table on the BBC Intranet. But he would like to import those shift times into his Google Calendar. Here’s how I did it. (This may not be the most efficient way so do please add a comment if you can streamline the process). By working through this problem, I picked up a lot of useful little new tricks along the way which proves the general point that the best way to pick up new tech skills is to give yourself a project to work through.

Grab the data

The shifts are in a table. You need to copy and paste it directly into a spreadsheet. I’m using a Google spreadsheet here. This isn’t the actual data from my husband’s timetable – for privacy purposes I’ve made it up, but the format is the same as the original.

Spreadsheet pix

I’ve also reduced it to just 5 rows of data so it’s easier to see what I’m doing. In real life, for a data set this small, it would probably be quicker just to type the shifts into your calendar. But the original is much longer than this so the process is worth it.

At this point, you may want to do some tidying up. Sometimes the descriptions of the shift can run over to an adjacent cell or down to the next row so you might need to copy & cut & paste those so they’re in the right place.

Google Calendar is fussy

Very fussy! You have to get the column headings for each column absolutely right or it pretends it can’t see the data. The column headers you need are Subject, Start Date, Start Time, End Date, End Time with the capital letters in the correct place. So let’s put some of those in where we can. We can also delete those blank rows. You end up with this.

NewSpreadsheet pix

So our spreadsheet is looking pretty neat now BUT Google Calendar still won’t like it. We’ve got the start time and end time in the same cell which is no good. And the date format is wrong. Google Calendar only recognises dd/mm/yyyy so we need to get rid of the day abbreviation. 

Split cells in Google Spreadsheet

This is a useful function (which can be done in Google Refine too as explained in an earlier blog post) but we’re going to stick with the spreadsheet to save all that import/export nonsense.

Let’s deal with the Start Date column first. We need to get rid of the day of the week so we’re just left with dd/mm/yyyy. We’re going to use a function called SPLIT to separate the data we don’t want from the data we do want. In cell D2, write this function: =SPLIT(B:B, " "). That means you’re selecting column B as your range of data, and the separator (in this case a space) goes in inverted commas. You might need to grab the bottom right-hand corner of the cell and drag it down to apply the formula down the length of the column. You should end up with this.

Split pix

Delete the surplus columns. Duplicate the column with the correctly formatted dates because you can use that for your End Dates.

NB This SPLIT function is very useful when you want to split full names into separate columns for first name and surname, for example.
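The same SPLIT-on-a-space step can be sketched in Python, assuming dates arrive in the “Mon 01/07/2013” shape described above:

```python
# "Mon 01/07/2013" -> "01/07/2013", the same job as =SPLIT(B:B, " ")
raw_dates = ["Mon 01/07/2013", "Tue 02/07/2013"]

clean_dates = [d.split(" ")[1] for d in raw_dates]
print(clean_dates)  # ['01/07/2013', '02/07/2013']
```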

Format Times in Google Spreadsheet

Now we need to look at those times. We have two problems here. The start and end time are in the same column so we need to split them in much the same way as we split the date above. But there’s an additional problem. Google Calendar likes the time in the format hh:mm (I told you it was fussy).

We’re going to deal with the second problem first. Highlight two empty columns. We’re going to format these columns so that when numbers appear in them, the spreadsheet will automatically convert them into the format we want. So once the columns are highlighted, click on Format>number>more formats>custom number format.

In the text box at the top, type in 0#:00 (this gets round the problem that some times may have been input as just 3 digits instead of 4). You won’t notice anything happening at this stage. But wait…..!

At the top of the first of those columns you highlighted and formatted, write another SPLIT function like this: =SPLIT(C:C, "-"). This tells the spreadsheet to split column C (I’ve moved things around a bit so that’s where my times are now) at the separator – the hyphen. Not only will the times now be split into two columns (representing Start Time and End Time), but they will be automatically formatted as hh:mm, as we wanted!! Again, to make this apply to the whole column, you may need to grab the bottom right-hand corner of the first cell and drag it down to the end of the column. Voila!
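The whole time step – splitting on the hyphen and padding three-digit times – can be sketched in Python, assuming shift cells look like “930-1730”:

```python
def split_shift(cell):
    """'930-1730' -> ('09:30', '17:30'): split on the hyphen, pad to
    four digits, then insert the colon (the job of the 0#:00 format)."""
    start, end = cell.split("-")
    return tuple(t.zfill(4)[:2] + ":" + t.zfill(4)[2:] for t in (start, end))

print(split_shift("930-1730"))  # ('09:30', '17:30')
```

The zfill(4) padding is what the 0#:00 custom number format was quietly doing for us in the spreadsheet.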

Cleaning a messy spreadsheet

So everything should now be correctly formatted. But the spreadsheet probably looks a bit of a mess with surplus columns and columns in the wrong place (remember how fussy Google Calendar is). We can’t just start deleting and moving around though because a lot of the values we can see are dependent on functions we carried out on columns we’re going to delete. Once the column is deleted, the spreadsheet can’t perform the function to generate the values. So we need to do some copy&pasting into empty cells.

BUT we can’t just do an ordinary copy and paste of, say, the newly formatted times. (Try it and see what happens!). Instead we need to use the Paste special option, which you’ll find in the Edit menu. Select the option to paste values only and you’ll get the actual number. You might need to reformat it again to hh:mm.

Once you’ve done all that, deleted surplus columns, rewritten the headers in the way Google Calendar likes them, in the right order etc etc, your spreadsheet should look like this.


Finished pix

Pretty, huh?

Import spreadsheet into Google Calendar

We’ve finally made it. First, you need to download your spreadsheet as a .csv file.

Go to your Google Calendar. Go to Settings (Click on the wheel icon on the top right-hand side). Then select Calendars in the top left-hand corner and half way down, choose Import Calendar. Import your .csv file and, if the Google spiders are on your side, you should see all your events magically appear in your calendar.
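If you ever want to script this step, the .csv file Google Calendar expects can be built with Python’s csv module. A minimal sketch – the shift details are made up, but the headers are the ones listed above:

```python
import csv
import io

# The exact headers Google Calendar expects, capitalised like this
HEADERS = ["Subject", "Start Date", "Start Time", "End Date", "End Time"]

# Made-up shift, already in the formats worked out above
shifts = [("Late shift", "01/07/2013", "09:30", "01/07/2013", "17:30")]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(HEADERS)
writer.writerows(shifts)
csv_text = buf.getvalue()

print(csv_text)
```

Write csv_text to a file ending in .csv and it should import exactly as the spreadsheet download does.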

ALL DATA IS DIRTY. So there are a few good techniques here which are useful for all sorts of situations.



I decided to play around with Piktochart for creating free infographics. The topic I chose was Crimea – very much in the news at the moment. So I thought an infographic explaining what it is would be helpful.

I’m hopeless at visuals and layout but Piktochart does allow even me to create something which broadly resembles an infographic. Using the embed code from Piktochart didn’t work here so I uploaded it as an image but it looks OK. You can share on social media too.

It might be a nice classroom tool to use so I need to think about some good applications for it.

Crimea infographic

Data Journalism – a modern day Bletchley Park?

Photo by DaveonFlickr CC licence (CC BY-SA 2.0)

I can’t help it. Whenever I hear people talking about their data-driven journalism work, I start imagining starkly-decorated rooms with complex machines crunching endless numbers; teams of frighteningly intelligent young people looking for meaningful patterns in apparent chaos. They work round-the-clock, driven by a mission to save the free world from oppression, fuelled by cups of tea.

(I’m going to ignore Jonathan Hewett who shattered this image last night and insisted that day-to-day life in his Interactive Journalism department at City University is actually far too mundane to be compared to WW2 code breakers. I’m sure they had timetabling issues in Bletchley Park too.)

Last night I found myself in a room intimidatingly full of Data Journalists. It was a Media Society event to launch the new book, Data Journalism: Mapping the Future, to which I contributed a chapter. It was a fascinating evening which Adam Tinworth has captured very well in his excellent blog.

The evening was almost ruined when Raymond Snoddy unexpectedly asked me to say something from the floor. I burbled something along the lines of “where do Data Journalists come from and how did you get to be that way?” I imagine the recruitment process is probably similar to Bletchley Park. An ability to complete the Daily Telegraph crossword in under twelve minutes, for example. Certainly, the route into Data Journalism is not an obvious one and a period of studying journalism at a UK University certainly doesn’t seem to be part of that route.

This makes me sad.

But would adding a compulsory statistics module to journalism courses, for example, help? I’m sure many students attracted to journalism because they like WRITING would run a mile. So perhaps we need to stop marketing journalism as purely a subject for arts students (who often take a worrying pride in their ignorance of maths). Perhaps we should make it obvious – to a world which really doesn’t know this yet – that journalism is also a subject for numerate students… who can also write. (Because nobody boasts about being rubbish at writing, do they?)

In the States, some journalism students graduating with data journalism and/or programming skills have taken to calling themselves “unicorns” because their set of skills makes them so rare. This doesn’t help. It implies their skills are exceptional, difficult, elite.

Worse, it lets the rest of us off the hook.

Sure, some aspects of data journalism do require an exceptional level of specialist knowledge and skill. But other aspects are definitely attainable so long as we decide to make the effort.

And that’s where journalism educators come in. We have to show that we believe these basic numeracy, stats, spreadsheet, web scraping skills are perfectly attainable rather than always treating them as peripheral, geeky.

There is no such thing as a unicorn; only hard work.

Digital Journalism Classroom Activity – Creating a simple survey, visualising and mapping the results

This has taken up a lot of my time today. I had a fairly simple idea in mind. At the start of my Digital Journalism module with first-year undergraduates in a few weeks’ time, I want to conduct a very simple survey of the students – how many own smartphones, what social media do they use, which one do they use most. I thought it would be fun to compare our students with the national and international picture.

I wanted to avoid Survey Monkey or the Blackboard tool because I wanted to show the students how data could be collected, put into a machine-readable format and visualised. This would be a very simple introduction to the idea of Data Journalism. My idea is to collect the data before they go on a break then show them what I’ve been able to do with that data when they get back. I may not show them all the nitty gritty of the process in week 1.

It’s been a fun task because I had to learn so much along the way and solve some tasks I didn’t think I’d be able to. This is another lesson I want to pass on to students – you can Google your way out of any situation if you put your mind to it.

So if you’re a data-beginner like me, some of this might be useful.

  • Create the Survey

This is the easy bit! Go to Google Drive, create a form and start putting together your survey questions.

Screen Shot 2014-01-16 at 21.13.46

Screen Shot 2014-01-16 at 22.00.51

You’ll be asked to Choose Response Destination. I chose to create a new spreadsheet (but you could choose to keep the data in the Google form and download it later).

This is where all the responses to your questions will appear. The beauty of having them in a spreadsheet is that you can analyse the results and use various Google tools to visualise them.

Here’s the ludicrously simple form I’m using as an example for this post.

Screen Shot 2014-01-16 at 22.05.23

NB the third question “Which social media do you use?” is a checkbox question – the students can tick as many boxes as they want. The following question is multiple choice so they can only tick one box. This becomes important later on, folks!

There’s a Send Form button at the bottom which generates a Link so you can send this to your whole class by email, for example.

  • Get the Results

As your students fill in the form, the results are sent to the Google spreadsheet you created. You now have some basic information about the students in a machine-readable format. It’ll look something like this.

Screen Shot 2014-01-16 at 22.07.15

  • Create a Map

This is the fun bit! We can straight away visualise some of this data on a map. Just go to Google Maps (having watched this tutorial if, like me, you need a refresher).

Click on My Custom Maps and then Create. You will be given the option to Import. So go ahead and import your Google spreadsheet of survey responses.

I ran into a problem here straightaway the first time I tried this.

Screen Shot 2014-01-18 at 00.36.28

Google Maps rejects any spreadsheet which has punctuation it doesn’t like in the column headings (which correspond to your survey questions), so it’s best to avoid brackets, commas etc. in your survey questions. You can, of course, edit the column headings on the spreadsheet before importing it into Google Maps.

UPDATE – I’ve found another upload error in Google Maps which generates this message: “Column names cannot include these characters.” If a column name is too long, the spreadsheet gets rejected too. So you might have to shorten some column headings if the survey questions on your original form were long.
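If you hit these rejections a lot, you could pre-clean the headings before upload. A sketch, assuming stripping punctuation and truncating is enough (the 40-character limit here is my guess, not a documented figure):

```python
import re

def safe_header(question, max_len=40):
    """Drop everything except letters, digits and spaces, then truncate.
    The 40-character cap is a guess, not Google's documented limit."""
    cleaned = re.sub(r"[^\w ]", "", question).strip()
    return cleaned[:max_len]

print(safe_header("Which social media do you use? (tick all that apply)"))
```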

You’ll end up with something like this, following the instructions in the tutorial. I gave YES responses (pins) a different colour from NO responses. Note that Google Maps also tells you how many YES and NO responses you got, so you might want to make a note of those figures to make a chart later.

Screen Shot 2014-01-16 at 22.36.47

  • Create a Pie Chart

You could now go back to your Google spreadsheet and maybe you want to create a pie chart to show which is the most used social media. Click on the drop-down menu at the top of this column and choose Sort A-Z. This will cluster the different responses. Count them up and create another two columns on your spreadsheet to display this new information. But if you’re feeling fancy (and if you’re dealing with a lot of responses) you should use a COUNTIF formula to automate this process and save you having to count. This video explains it really well.

Screen Shot 2014-01-16 at 22.48.41

Not very exciting. Perhaps a pie chart will spice it up?

The Google Charts facility makes this job really easy once you’ve sorted the data. Again, the video above explains the process really clearly. And I think the pie chart does the job here. Facebook wins!

Screen Shot 2014-01-16 at 22.49.56

  • Google Refine – multiple values in a single cell on spreadsheet

That just leaves one survey question we’ve not tackled yet. Remember I created a checkbox question where students were asked to tick all the social media they used. That means we’ve got multiple responses in a single cell on our spreadsheet. This is not very machine-friendly and you won’t be able to do any visualisation with it looking like this. We need to separate out those responses into different rows or columns.

Screen Shot 2014-01-16 at 23.05.15

I ended up using Google Refine to do this. It’s amazing! It’s a tool for cleaning up messy data, which is just what I need here, and these tutorials are pretty useful although the second is a bit daunting for a beginner like me.

You need to download Google Refine (easy) and import your Google spreadsheet (I had to export it to Excel first). Go to the column with the problem data and click the drop-down arrow on the column header. Choose Edit Column – Split into several columns.

Screen Shot 2014-01-16 at 23.11.28

The pop-up box then asks you what’s separating the different elements in the cell that you want separating. Easy – a comma (but it might be a hyphen or just a space, for example).

Screen Shot 2014-01-16 at 23.12.17

Click OK. Wow!

Screen Shot 2014-01-16 at 23.14.49

Now all the responses are separated into different cells and the computer can do something with them!

OK, I’m finished with Google Refine now and can head back to my Google spreadsheet. So I export from Google Refine as a CSV (Comma Separated Values) file, then import that file into the Google spreadsheet.

But I still want to be able to count up the responses for each social media and it’s a pain having them spread over several columns like that. So I just copy and paste the values from each column into one column.

Screen Shot 2014-01-17 at 10.03.43

At this point I highlight the whole spreadsheet and go to the View tab so I can Freeze Row 1. That just stops things moving around when you come to sort your data.

Screen Shot 2014-01-17 at 09.53.58

So now I’ve got every instance of each social media in a separate cell in one single column. It’s pretty straightforward from here on in. I start by selecting the column and sorting it A-Z so that it clusters the different values together for me, which just makes it neater and easier to handle.

Screen Shot 2014-01-17 at 10.01.53

Then I start another column in the spreadsheet and I copy and paste each value onto a separate row like this.

Screen Shot 2014-01-17 at 09.59.43


I copy and paste rather than typing manually just in case I make a mistake or inadvertently add a space or something which might skew the results. Now I want to find out how many times each of those words appears in Column C, which I created in the step before. We can use a COUNTIF formula here for each value. For example, =COUNTIF(C2:C17, O2) should tell us how many times the value FACEBOOK appears in Column C. So just do that for each value.
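For anyone doing this outside a spreadsheet, the COUNTIF step is a one-liner with Python’s collections.Counter (responses invented for the example):

```python
from collections import Counter

# One response per cell, as after the Open Refine split (invented values)
responses = ["Facebook", "Twitter", "Facebook", "Instagram", "Facebook"]

counts = Counter(responses)
print(counts["Facebook"])  # 3
```

counts.most_common() then hands you the sorted table you’d otherwise build by hand for the pie chart.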

It’s then a simple process to turn it into a chart as before.

Screen Shot 2014-01-16 at 23.22.08

…..and voila!