This has taken up a lot of my time today. I had a fairly simple idea in mind. At the start of my Digital Journalism module with first year undergraduates in a few weeks time, I want to conduct a very simple survey of the students – how many own smartphones, what social media do they use, which one do they use most. I thought it would be fun to compare our students with the national and international picture.
I wanted to avoid Survey Monkey or the Blackboard tool because I wanted to show the students how data could be collected, put into a machine-readable format and visualised. This would be a very simple introduction to the idea of Data Journalism. My idea is to collect the data before they go on a break then show them what I’ve been able to do with that data when they get back. I may not show them all the nitty gritty of the process in week 1.
It’s been a fun task because I had to learn so much along the way and solve some tasks I didn’t think I’d be able to. This is another lesson I want to pass on to students – you can Google your way out of any situation if you put your mind to it.
So if you’re a data-beginner like me, some of this might be useful.
This is the easy bit! Go to Google Drive, create a form and start putting together your survey questions.
You’ll be asked to Choose Response Destination. I chose to create a new spreadsheet (but you could chose to keep the data in the google form and download it later).
This is where all the responses to your questions will appear. The beauty of having them in a spreadsheet is that you can analyse the results and use various Google tools to visualise them.
Here’s the ludicrously simple form I’m using as an example for this post.
NB the third question “Which social media do you use?” is a checkbox question – the students can tick as many boxes as they want. The following question is multiple choice so they can only tick one box. This becomes important later on, folks!
There’s a Send Form button at the bottom which generates a Link so you can send this to your whole class by email, for example.
As your students fill in the form, the results are sent to the Google spreadsheet you created. You now have some basic information about the students in a machine-readable format. It’ll look something like this.
This is the fun bit! We can straight away visualise some of this data on a map. Just go to Google Maps (having watched this tutorial if, like me, you need a refresher).
Click on My Custom Maps and then Create. You will be given the option to Import. So go ahead and import your Google spreadsheet of survey responses.
I ran into a problem here straightaway the first time I tried this.
Googlemaps rejects any spreadsheets which has punctuation it doesn’t like in the column headings (which correspond to your survey questions) so it’s best to avoid brackets and commas etc in your survey questions. You can, of course, edit the column headings on the spreadsheet before importing it into Googlemaps.
UPDATE – I’ve found an other upload error into google maps which generates this message – “Column names cannot include these characters.” If the column name is too long, the spreadsheet gets rejected too. So you might have to rewrite some column headings to shorten them if the survey questions on your original form were long.
You’ll end up with something like this, following the instructions in the tutorial. I gave YES responses (pins) a different colour from NO responses. Note that Google maps also tells you how many YES and NO responses you got so you might want to make a note of those figures to make a chart later.
You could now go back to your Google spreadsheet and maybe you want to create a pie chart to show which is the most used social media. Click on the drop down menu at the top of this column and choose Sort a-z. This will cluster the different responses. Count them up and create another two columns on your spreadsheet to display this new information. But, if you’re feeling fancy (and if you’re dealing with a lot of responses) you should use a COUNTIF formula to automate this process and save you having to count. This video explains it really well.
Not very exciting. Perhaps a pie chart will spice it up?
The Googlechart facility makes this job really easy once you’ve sorted the data. Again, the video above explains the process really clearly. And I think the pie chart does the job here. Facebook wins!
Google Refine – multiple values in a single cell on spreadsheet
That just leaves one survey question we’ve not tackled yet. Remember I created a checkbox question where students were asked to tick all the social media they used. That means we’ve got multiple responses in a single cell on our spreadsheet. This is not very machine-friendly and you won’t be able to do any visualisation with it looking like this. We need to separate out those responses into different rows or columns.
I ended up using Google refine to do this. It’s amazing! It’s a tool for cleaning up messy data which is just what I need here and these tutorials are pretty useful although the second is a bit daunting for a beginner like me.
You need to download GoogleRefine (easy) and import your Google spreadsheet (I had to export it to Excel first). Go to the column with the problem data and click the drop down arrow on the column header. Choose Edit Column – Split into several columns.
The pop-up box then asks you what’s separating the different elements in the cell that you want separating. Easy – a comma (but it might be a hyphen or just a space, for example).
Click OK. Wow!
Now all the responses are separated into different cells and the computer can do something with them!
OK, I’m finished with Google refine now and can head back to Google Spreadsheet. So I can Export as a CSV (Comma Separated Values) file then import that file into Google spreadsheet.
But I still want to be able to count up the responses for each social media and it’s a pain having them spread over several columns like that. So I just copy and paste the values from each column into one column.
As this point I highlight the whole spreadsheet and go to the View tab so I can Freeze Row 1. That just stops things moving around when you come to sorting your data.
So now I’ve got all the instances of each social media in a separate cell in one single column. Now it’s pretty straightforward from here on in. I start by selecting the column and sorting it A-Z so that it clusters the different values together for me which just makes it neater and easier to handle.
Then I start another column in the spreadsheet and I copy and paste each value onto a separate row like this.
I copy and paste rather than writing out manually just in case I make a mistake or inadvertently add a space or something which might skew the results. So I want to find out how many times each of those words appears in Column C I created in the step before. We can use a COUNTIF formula here for each value. For example, =COUNTIF(C2:c17, O2) should tell us how many times the value FACEBOOK appears in Column C. So just do that for each value.
It’s then a simple process to turn it into a chart as before.