
“You couldn’t work as a journalist if you were not able to do an interview. The same applies to data journalism in the age of digitalization,” says Nils Mulvad, a world-renowned data journalist, editor at Kaas & Mulvad and associate professor at The Danish School of Media and Journalism, at the Data Harvest 2014 conference.

Paulina Pacuła from the European Journalism Observatory conducted this interview with Nils Mulvad. We republish it with her permission. The interview was published in Polish here. An edited version in English can be read here.

For many journalists data journalism is basically about making data look nice – visualising it, creating interactive charts etc. But that’s not all, is it?

Definitely not. Data journalism is an early-alert tool. By analysing data you can sometimes recognize problems before they actually cause a lot of harm – or before you are able to notice them using other journalistic tools.

To give a fairly recent example: the largest financial pyramid in the history of the US, created by Bernard Madoff, was exposed in 1999, years before the big scandal broke. The man behind it was financial analyst Harry Markopolos. He informed the Securities and Exchange Commission that he believed it was legally and mathematically impossible to achieve the gains Madoff claimed to deliver. Markopolos came to these conclusions through very comprehensive data analysis. That was the job journalists should have been doing.

This is very arduous work to do.

Using data journalism tools like spreadsheets, scraping and data visualisation makes it much easier. First of all, data journalism provides tools for continuous research. Scraping allows you to monitor how different institutions work on a daily basis, because it lets you track new material showing up on different webpages.

On the other hand, data journalism tools make it possible to work with huge amounts of data, which is very important – especially today, when public institutions produce tons of data accessible to journalists through freedom-of-information laws. It would be difficult to deal with farm subsidies or EU tender data using only your brain and calculating skills. Today we have great software that allows us to analyse thousands of records.

Data journalism became this big trend only a couple of years ago. But in fact this field is much older.

Yes. Journalism was always about data analysis. The Wall Street Journal was born in 1889 from a daily Dow Jones afternoon letter about the stock exchange. In sports reporting, data has always been an essential part of the coverage. John Snow’s map of cholera outbreaks in nineteenth-century London changed how we saw a disease – and gave data journalists a model of how to work today.

But the data journalism we know today started with something called computer-assisted reporting, back in the 1950s, when reporters started to analyse data using software tools for social and scientific analysis. One of the first examples of computer-assisted reporting was in 1952, when CBS television used a UNIVAC I computer to analyze returns from the U.S. presidential election.

At the beginning there was a lot of excitement about the tools. Journalists would report in their stories what kind of computers they used, how many records they had analysed and so on. It’s as if today you would say: after doing five interviews on an iPhone 5, I came to this and that conclusion (laughs). Back then, the problem was that very often CAR specialists would focus only on getting the data, instead of getting stories out of it. So basically it was about showing the readers ‘look, we can analyse so much data’, instead of really using it as a tool.

Databases became central to journalists’ work by the 1980s because of the ongoing process of digitalization. New jobs also emerged in newsrooms for people mainly focused on presenting and visualizing the data. Today data journalism is one of the most important trends in the media world.

Professional data journalism requires lots of skills, which were not associated with journalism before. For example nobody required journalists to understand programming and coding.

Yes, that’s true, but this is not exactly the case today either. Of course, in order to do data journalism you need to combine three types of skills – journalistic, programming and web design. Sometimes you will find them combined in one person and sometimes you need a whole team. But it is necessary that people understand the other fields of work, because only then can they communicate and work effectively. If you know nothing about coding, it will be difficult for you to work with a coder, because you will not know what he or she needs to do the job. So you need to have knowledge in all three areas and be at least very good in one of them.

I would say that there is another type of skill required: data analysis which comes from social sciences.

Yes, of course. You need to understand the basics of statistics and other social science methods. One needs to be very careful with data, use it in a proper way and not come to conclusions which cannot be supported. That’s why data journalism is also called evidence-based reporting.

There is this quality associated with data: facts. Data means facts, something objective. But isn’t it true that data-driven journalism is still opinion-based journalism? There is always a certain level of subjectivity in how the story is shaped and presented, and in what conclusions are derived from the data.

The more the story is documented and researched, the closer to reality it gets, so data journalism is more about facts than opinions.

Very often data is only a first step in the process of creating the story. But it is very important that the journalist looks into the data himself, instead of relying on other researchers. This is something journalists do too often: they interview people who know the data. But those people may have a hidden agenda; their conclusions may be a bit coloured, in order to justify the meaning of their work. Journalists should be able to find the most important stories in the data themselves and then interview sources on their findings.

By the way – I don’t like the term ‘data-driven journalism’. It sounds as if the data is the most important part of the job. We see a tendency to underestimate journalism. This is what I was talking about at the beginning. These machines are not the most important part of the process; they are just tools. It is still the journalists and their ability to think critically that are the most important part of the job.

If you look at data journalism from an academic point of view, how does it change journalism? Is it a big shift?

I think data journalism itself is not a big shift. There are a lot of new tools emerging all the time, and they only change the methods of work, not the work itself. If most of the information today is in digital form, you, as a journalist, should be able to gather it and analyse it. It’s the same as when you gather information by doing interviews. You couldn’t work as a journalist if you weren’t able to do an interview. Data journalism is the same – it’s just interviewing data sets instead of people.

But yes, we are in a period of change and it has a lot to do with things going digital. The business models of many media, especially print ones, aren’t working well. Many media companies need to reinvent their business models, but they are very reluctant to do that. It seems like they are waiting for the internet to go away, but that’s not going to happen.

What we see is a shift from institutional media to more socially based media. Journalists are getting much more independent thanks to cross-border cooperation opportunities, crowdfunding and the emergence of new communication platforms. Those having a difficult time in media organizations that are trying to act the old way can either walk away or try to introduce changes, instead of banging their heads against the wall. If you are a good journalist, people will be more interested in following your work than the media institution you represent. Sometimes it’s easier to attract readers to a whole new website than to the old media.

And at this point you see another important part of what data journalism is about. It’s also about new ways of spreading your stories.

Social media?

Exactly. The “one to many” model of communication is coming to an end. Now it’s more like “many to many”. Journalism doesn’t finish the moment you have prepared the story and published it; the next step is to get it out to people. The stories should be well targeted in order to create an impact. If you are not capable of spreading your story, it means it’s only written for the archive.

The skills of building a community, identifying key influencers, showing how you write the story, forcing media to disseminate it – this is also very important in the new media landscape. Data journalism is a mixture of all these things. Of using all these new tools – also social media – in a professional, journalistic way.

There has been interesting research done on the use of social media by journalists in Poland. What is peculiar is that they don’t really treat social media as a possible source of information. They use it mostly for “inspiration” instead of proper research – finding people, facts and numbers. Is there something we are missing about these new platforms of communication?

Yes, definitely. Social media can be used as a tool for tracking social relations, which is very useful in investigative journalism. There are some examples of investigative stories based on analyses of social circles on Facebook or LinkedIn. One journalist from Slovakia actually managed to track possible corruption cases in health service procurements in Slovakia by analysing all the contracts and the personal networks between hospital boards and the management of the companies taking part in the procurements (link). People leave a lot of data and information about themselves on social media, so it’s good to know how to follow this.

But research is not all. Social media are very powerful tools for spreading your stories, interacting with readers and creating impact. And it is quite funny how many media do it. Journalists don’t get involved in discussions; they don’t interact with the audience. Once they post the story, that’s it. This is totally wrong.

You should think about social media as being like a party. When you come to a party and people are talking to you but you are not responding, they think you are rather impolite. If the only thing you do at that party is occasionally stand up in the middle of the room and shout what you have to say, and then, when some people approach you to discuss it, you say ‘oh no, I’m not going to get into a discussion’, they will probably think you are crazy or arrogant.

There is a lot of this kind of craziness or arrogance in traditional media. But this model is coming to an end. If you don’t know how to treat your reader, you’re going to lose him.

Who should we look to for best practices?

Well, I would say the Guardian is doing a very good job, and the New York Times and the Los Angeles Times. The Norwegian press is very good at adapting to changes. The Danish Broadcasting Corporation has also set up a database analysis team, and they are doing some good stuff. But these are just some examples that come to mind right now.

But when looking for best practices in data journalism, you should also look at the many cross-border, independent networks, for example the ICIJ – the International Consortium of Investigative Journalists. The best example is their tax haven story. They got access to data about people holding accounts and companies in tax havens. They teamed up with different media all over the world and gave each access to the data for its country. The analysis took half a year. They decided to let the stories out in three waves to create a bigger impact. They exposed the biggest CEOs of German banks, the wealthiest people in China, the UK, etc.

It was a very big thing last year. They got a lot of awards for it. As a result, the EU started working on anti-money-laundering rules. It could not have been achieved without using the tools of data journalism. What was also important in this project was the international cooperation between journalists. Deep expertise was needed from each country involved, so it had to be done by an international group. But today, with tools such as social media and free software, everything is possible. Data journalism is a sort of journalistic punk of our times.



11 tips for scrapers at the next level

Nils Mulvad, editor at Kaas & Mulvad and associate professor at The Danish School of Media and Journalism, session “When to Scrape: Tools and Techniques”, Saturday 1 March 2014, Baltimore, NICAR.

Scraping data daily on foreign workers in Denmark – the purpose is to help check that workers are paid fairly. Photo: Sidse Buch

Big data is here, meaning we have access to more and more data from different sources. We have also more and more tools to extract, match, analyze and present the stories in the data.

Here are 11 tips on what to be aware of. We use Kapow software for our scrapers and run them on a Linux server, collecting the data in MySQL databases. We have two servers on Rackspace Cloud.

1.       Don’t be scared of big data

There are so many different definitions, normally either from the perspective of job type or possibilities in the data.

Data experts define it by the tools they need to handle the data. Data providers see it as new sources of data. Analysts describe it by what you can do with the data.

Look at this as a big new area to gather and combine your material from. Take your data from sensors, GPS, authority data, social media, corporate data and scientific data.

2.        Data journalists are the kings

For years we have been working with data with a single purpose. Can we extract stories from them? This skill of looking for content in data is perhaps the most important to add to the possibilities right now.

Data analysts are too narrow-minded – they look at all data as equal. Journalists are too narrow-minded – they look at data as a totally incomprehensible part of life.

We operate in between. Very few can do that well. Your time has come. Go for the stories in the data.

3.       Combine scraping with other tools

Often you need to scrape the same source on a schedule. If it’s an official website, you can use the content directly.

Negotiating with the authority might get you the material as an XML file or via API access.

This will make it a lot easier for you and for them, saving server time for both and ensuring higher quality of data.
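When an authority does provide a feed, the switch from scraping is small. Here is a minimal Python sketch; the feed layout (`<records><record …/></records>`) is a made-up example for illustration, not any real authority schema:

```python
import xml.etree.ElementTree as ET

def parse_feed(xml_text):
    """Turn an XML feed into a list of plain dicts, one per record."""
    root = ET.fromstring(xml_text)
    return [rec.attrib for rec in root.findall("record")]

# Sample feed standing in for what an authority might publish.
sample = """<records>
  <record company="Example A/S" amount="1200"/>
  <record company="Another ApS" amount="800"/>
</records>"""

rows = parse_feed(sample)
# rows[0] -> {'company': 'Example A/S', 'amount': '1200'}
```

A single request like this replaces a multi-page scrape and is far less likely to break when the site layout changes.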

4.       Matching data

Sometimes the scraping will combine material from different sources to add more context to your material. Sometimes the best solution combines different tools. If so, use these different tools.

WHERE (rutOK.IdCompany LIKE CONCAT('%', rutCompany.RUTNumber, '%') OR rutCompany.RUTNumber LIKE CONCAT('%', rutOK.IdCompany, '%') OR rutCompany.ForeignCompanyRegistration LIKE CONCAT('%', rutOK.IdCompany, '%') OR rutOK.RutNumber = rutCompany.RUTNumber)

The code above is the main part of a script combining our scraper language with MySQL to match fields in the extraction against fields in a table, either on a perfect match or where the content of one field is part of another field. Code can do a lot.
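The same two-way substring match can be sketched outside MySQL as well. Below is a small SQLite version in Python with made-up table contents, using `||` in place of MySQL's CONCAT:

```python
import sqlite3

# In-memory database with two tiny tables mimicking the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rutOK (IdCompany TEXT);
CREATE TABLE rutCompany (RUTNumber TEXT);
INSERT INTO rutOK VALUES ('DK-12345'), ('99999');
INSERT INTO rutCompany VALUES ('12345'), ('88888');
""")

# Match where either field's content is contained in the other.
rows = conn.execute("""
SELECT rutOK.IdCompany, rutCompany.RUTNumber
FROM rutOK JOIN rutCompany
  ON rutOK.IdCompany LIKE '%' || rutCompany.RUTNumber || '%'
  OR rutCompany.RUTNumber LIKE '%' || rutOK.IdCompany || '%'
""").fetchall()
# rows -> [('DK-12345', '12345')]
```

'DK-12345' matches '12345' because one ID contains the other; the other pairs do not match in either direction.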

5.       The perfect scraper doesn’t exist – log errors

You can do many things to optimize your scraper, but there will always be a risk of errors. In some cases the authority builds and later rebuilds its website slightly differently, and you could not foresee that when you built your scraper.

You should have an instinct for reverse engineering, for finding patterns on their website and possible sources of errors. And then you also need to keep logs of your scraping, so you have a warning system for errors.
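One way to sketch such a warning system in Python; the function names and the `<table` check are our own illustration, not part of any particular scraper:

```python
import logging
import urllib.request

logging.basicConfig(filename="scraper.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

def fetch(url):
    """Fetch one page, logging failures instead of crashing the whole run."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read()
    except Exception:
        logging.exception("failed to fetch %s", url)
        return None

def parse(html):
    """Warn when an expected pattern disappears - a hint the site was rebuilt."""
    if html is None or b"<table" not in html:
        logging.warning("expected table not found - layout may have changed")
        return []
    return html  # real field extraction would go here
```

The point is that a schedule-driven scraper should never die silently: every failed fetch and every layout surprise leaves a trace you can act on the next morning.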

6.       Always include metadata in your scraping

Scraping data into the same table on a schedule requires you to keep track of each record, meaning you have to include at least:

This should be in every record.
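In Python this might look like the helper below; the field names (`scraped_at`, `source_url`, `scraper_version`) are our suggestion for a minimum set, not a fixed standard:

```python
from datetime import datetime, timezone

def with_metadata(record, url, scraper_version="1.0"):
    """Return a copy of a scraped record with bookkeeping fields attached."""
    out = dict(record)
    out["scraped_at"] = datetime.now(timezone.utc).isoformat()
    out["source_url"] = url
    out["scraper_version"] = scraper_version
    return out

row = with_metadata({"company": "Doux"}, "https://example.org/recipients")
# row keeps the scraped fields plus when, where and how it was collected
```

With these fields in every row, you can later tell which scraper run produced a record and reproduce the page it came from.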

7.       Scraping thousands of websites a day, several levels down

Monitoring changes on big websites means you have to go several levels down. Following, for instance, 20 websites with 50 URLs on each page means you have 20 URLs on level 1, 1,000 URLs on level 2, then 50,000 URLs on level 3 and in the end 2,500,000 URLs on level 4.

You then need to build a system so you open each URL only once at each level and never open a URL at a given level if it has been opened before. A clear structure is the answer.
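One possible structure, sketched in Python: a breadth-first walk that keeps a set of already-opened URLs, so no URL is fetched twice no matter how often it is linked. `get_links` is a stand-in for your scraper's own link extraction:

```python
def crawl_levels(start_urls, get_links, depth):
    """Walk `depth` levels down, opening each URL at most once."""
    seen = set()
    level = list(start_urls)
    for _ in range(depth):
        next_level = []
        for url in level:
            if url in seen:
                continue  # already opened at an earlier level - skip
            seen.add(url)
            next_level.extend(get_links(url))
        level = next_level
    return seen

# Tiny fake site: page "a" links to "b" and "c"; "b" links back to "a".
links = {"a": ["b", "c"], "b": ["a"], "c": []}
opened = crawl_levels(["a"], lambda u: links.get(u, []), depth=3)
# opened -> {'a', 'b', 'c'}; "a" is never fetched a second time
```

At the scale of hundreds of thousands of URLs the `seen` set is what keeps server time, and politeness towards the target site, under control.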

8.       Tuning scrapers for loading URLs

If you have the chance to edit the way your scraper handles URLs, then turn these things off:

Load frames

Execute javascripts

It’s simply the most effective way to minimize server time and speed up the scraping. Sometimes, though, frames or JavaScript are needed to load the data; then leave them on.

9.       Always focus on the story – the context

As data journalists we love data; we simply can’t get enough. But never be satisfied with just the data. Always think of the user, the viewer, anybody who will interact with your material.

Make it as simple and easy to use as possible. Rethink and rethink. Mobile means simple and clearly focused.

10.   Know the limits of the scraping and machine-generated content

In some situations the scraped material will feed an output – either as email or as a presentation. But pay close attention to the finish.

Sometimes it takes a human touch at the end to make the result better and even more focused on the end user. Follow your products closely.

11.   Make scraper-operations scalable

If you begin scheduling daily scraping jobs, be sure that you have a system that is easy to scale up.

We run all scraping on scalable cloud-servers. We can just upscale everything on the fly. Make expanding easy.


Resources for the GIJC conference in Rio 2013

Google Fusion

Open Refine

Helium Scraper

Social and mobile media for investigative journalism.

Paul Myers: Researchclinic

Pitch of Investigative Reporting Denmark and a concrete cross-border project on pesticides.

Material for the network-meeting on climate and environment.

Handout with methods, findings and tips for uncovering secrets in food and nature.

We will upload material, when it is ready. Here are the sessions, where we train or speak:

Google Fusion, Saturday 11-12.30

Refine, Saturday 14-15.30

Social Media, Sunday 9-10.30

Helium Scraper, Sunday 16-17.30

Investigating the Food Industry, Monday 13.45-15.15

Environmental Collaboration Workshop, Monday 15.30-17.30

Tommy Kaas is an editor and a partner at Kaas & Mulvad. He was co-founder of DICAR – the Danish International Center for Analytical Reporting. Besides his work in Kaas & Mulvad he is a part-time lecturer at the Department of Communication, Business and Information Technologies at Roskilde University. Twitter: @tbkaas

Nils Mulvad is a co-founder of the Global Network for Investigative Journalism and other international networks such as Farmsubsidy.org. He was CEO of the Danish International Center for Analytical Reporting 2001-2006 and European journalist of the year in 2006, and he also teaches data and web courses for journalists with a focus on using social and mobile media. He is editor at Kaas & Mulvad and associate professor at The Danish School of Media and Journalism. Twitter: @nmulvad


Good data sources and methods


Scraping for data

Tools and tricks for extracting data from the web.


Gathering and archiving information from social media, fact-checking, crowdsourcing, using big datasets and interactive graphics are all methods to be shared and developed both in the coverage of events and investigative journalism.


A nearby storm to hit journalism education


Journalism education will go through the same turmoil as the media have experienced over the last 20 years. It is beginning now, on a big scale.

Storified by Nils Mulvad · Tue, Sep 11 2012 21:40:30

According to Howard Finberg, director of Partnerships and Alliances at the Poynter Institute, the changes will split journalism education from journalism degrees; traditional classes aren’t as effective as e-learning; training will be driven by the students themselves, who will tend to get it from the best places in the world; teachers have no chance of keeping up with the rapid changes; and we will need to innovate inside the classroom.
He presented his ideas at the European Journalism Centre’s 20th anniversary celebration on 4 June 2012 in Maastricht, the Netherlands. Eleven days later a summary of his speech was published on Poynter’s website:

Journalism education cannot teach its way to the future | Poynter
Here you can see the full text of Howard Finberg’s speech:
The Future of Journalism Education. A Personal Perspective | Poynter’s News University
Just after the speech, a Storify of some of the reactions was put together by Howard Finberg:
The Future of Journalism Education: Keynote to European Journalism Centre Conference
After the summary was published on the Poynter website, comments on Twitter kept coming in:
Interesting Howard Finberg speech on the future of journalism/journalism degrees. Don’t agree w all but worth read: http://www.newsu.org/future-journalism-education – Tony J. Lee
Howard Finberg: Journalism education is at its own inflection point http://bit.ly/MQtDdR – Steffen Konrath
According to Howard Finberg, "there is a nearby storm about to strike journalism education." http://bit.ly/Mfar8j – Kevin Sablan
“the world is changing faster than the people who are supposed to teach students can learn themselves.” – howard finberg http://www.poynter.org/how-tos/journalism-education/177219/journalism-education-cannot-teach-its-way-to-the-future/ – Steve Mays
I think the input is really important. But I need to think and discuss more to form my opinion on this.
At this time, I’m rather convinced of the need to get training from the best sources, not wasting the students’ time on old systems and knowledge. As a teacher you need to find a way to innovate as you go.
My co-teacher Kristian Strøbech at The Danish School of Media and Journalism and I try to do it by integrating new methods into the training in digital journalism without knowing them beforehand, and developing their use together with the students – especially in covering events using social and mobile media and integrating it with traditional coverage.
You also need to move more and more of the traditional training out of the classroom, teaching the students methods to solve the different tasks and innovate themselves.
There’s another interesting blog post on the change in journalism education, written by Paul Bradshaw:
The 3 forces changing journalism education part 2: the education business | Online Journalism Blog
Media crisis
The discussion of the specific need for change in journalism training now comes after some heavy input on the situation in the media. A very interesting presentation of the situation for the media was given by Stijn Debrouwer:
Fungible
Burt Herman from Storify made a collection of the reactions to this blog post:
Is journalism being replaced?
Here Mathew Ingram has given another rather hard and precise description of the situation for the media:
The hard truth: Newspaper monopolies are gone forever
More important input, collected in August 2012:
This is a good recap of the discussion from Mary McGuire, from The Canadian Journalism Project:
J-schools at a turning point | J-source.ca
A report from August 2012 shows a rather rapid change in the need for journalism training, as expressed by journalists themselves – most from the US and Latin America.
Knight report on training shows journalists want technology, multimedia, data skills | Poynter
Here’s a very concrete suggestion for future training:
How far should journalism education reform go?
The Guardian has asked for the best tips on the journalistic skills of tomorrow – and presents these nine:
9 top tips for the journalists of tomorrow
This is a very important description of what foundations will support – and not support – in the future education of journalists. They will not support old traditions of education and skills. They demand fast change and point out some of the paths. They can set the standards – also outside the US.
An Open Letter to America’s University Presidents
Here is a recap of the discussion after the open letter from the six foundations:
6 foundations tell journalism schools to change faster or risk future funding | Poynter
A story in the Financial Times focuses on e-learning as the main driver of change in training:
Ivory towers will be toppled by an online ‘tsunami’ – FT.com
Good input on why people take online training – and especially that they will turn to the best places to get it:
RT @TLBissette: 3 Reasons Why People Take Massively Open Online Learning Courses http://ow.ly/1m3I27 – Nils Mulvad
A personal update on the discussion – taking society in general as our focus, not only students and the media industry:
RT @kstrobech: Rebooting journalism schools – great post summing up this summer’s hot debate: http://www.ojr.org/ojr/people/Geneva/201208/2084 #journalism – Nils Mulvad
Howard Finberg comments on the update:
RT @Hif: Rebooting journalism education means constant state of change. @hif writes about @genevaoh post on @poynter http://poy.nu/NcSLlh – Nils Mulvad
Update with a few more inputs, 2012/09/12
RT @smfrogers: Learn basic #datajournalism with me at @frontlineclub on Friday – last-minute places available http://ow.ly/dCBT6 #ddj – Nils Mulvad
RT @EricNewton1: Academic research in journalism and communication education: Some of it is so unhelpful no one cites it. http://t.co/OA … – Nils Mulvad
RT @knightfdn: How to better protect #students doing #journalism in the “teaching hospital” model http://kng.ht/Tsiki4 v @amberralertt … – Nils Mulvad
Conclusions:
E-learning: This will be delivered by the best institutions around the world, and things will change fast. There is no reason not to point to the best places. Your own e-learning should not try to master everything. Be careful not to build a copy of the traditional training.
Degrees: Check and work with these new ideas. Make them easy to implement.
Innovate in the classroom: It’s impossible to master everything. The students must be part of the innovation and take responsibility for it. Focus on the students. Every semester, every single teacher must do some innovation and experimentation focused on new methods and the adaptation of journalism to the future. We have a responsibility to try to find new paths. Innovation will not come from the top – the top can only provide the framework for it.
Digital media: It is no longer possible to take a final degree purely in print, TV or radio – everything must be web, mobile and social media, or at least integrated with it.
Digital journalism: We need a new curriculum to attract good IT people to journalism – to combine the best from two worlds.
Accept you’re not the master: Development is going faster and faster, and it is simply not possible to stay ahead. Things change faster than our ability to learn. Accept not being in control.


Lessons from the Data Harvest Festival in Europe

Close to 100 journalists, visualizers and hackers gathered data and shared methods from 6 to 8 May 2012 in Brussels.

Storified by Nils Mulvad · Mon, May 14 2012 23:25:43

A great many young journalists participated in the fourth Data Harvest Festival in Europe. Next year Data Harvest will again be in Brussels, from 9 to 11 May 2013. The cooperation continues on the website:
Join the mailing list:
On twitter – use the hashtag: #DataHarvest

Shyamlal Yadav, an Indian journalist specialising in publishing data on the Indian government http://instagr.am/p/KUeMCXAh9N/ #DataHarvest – Sylvain Malcorps
The Indian editor Shyamlal Yadav participated and talked about wobbing – freedom of information.
Farmsubsidy data
Farmsubsidy.org ran a track on farm subsidy data covering 2011 from EU countries. Around 55 percent of all payments, and 92 percent of recipients, are kept in the dark, according to the data gathered by farmsubsidy.org.
Most important tweets are gathered here:
Junta de Andalucia tops the list of CAP beneficiaries with €98 million #dataharvest – farmsubsidy.org
Public bodies are prominent in list of biggest end beneficiaries of farm subsidies. Is this real transparency? #dataharvest – farmsubsidy.org
CY, GR and LU still haven’t published any 2011 farmsubsidy data. bad, bad, bad. Rest of 24 EU-countries have published. #Dataharvest – Nils Mulvad
Top private sector recipient is French poultry firm Doux, with €54 million. http://www.doux.com/ #dataharvest – farmsubsidy.org
We estimate just 40% of 2011 CAP funds have been disclosed. #dataharvest – farmsubsidy.org
Many of the biggest recipients of CAP funds have been redacted from published data to protect their privacy. #dataharvest – farmsubsidy.org
List of 1500+ beneficiaries who get more than €1 million in #EU farm subsidies: https://docs.google.com/spreadsheet/oimg?key=0At9hEvGB0JsUdHh4TzBEdmNaaTFkRjdITGdvMmozeEE&oid=1&zx=xbpmnibt7gs2 #dataharvest – farmsubsidy.org
Extremely variable performance of EU countries in CAP transparency #dataharvest http://twitpic.com/9ido0a – farmsubsidy.org
Approx. 94% of UK beneficiaries of farm subsidies have been redacted from the data just published by Defra. #dataharvest – farmsubsidy.org
91 per cent of all recipients of CAP funds have been redacted from new data for 2011. #dataharvest #privacy – farmsubsidy.org
View UK farm subsidy data for 2011 (including redactions) on Google Fusion tables: https://www.google.com/fusiontables/DataSource?docid=1_9YaRF1TlL1B6Ir3KauvnQEhKH7ageQjF3skksI #dataharvest – farmsubsidy.org
Danish billionaire Anders Holch Povlsen is one of Scotland’s biggest farm subsidy recipients: http://www.dailyrecord.co.uk/news/scottish-news/2012/05/08/danish-billionaire-becomes-one-of-scotland-s-largest-landowners-as-he-snaps-up-sutherland-estates-86908-23851760/ #dataharvest – farmsubsidy.org
Very handy spreadsheet of EU spending by member state & policy area (2000-2010) http://ec.europa.eu/budget/library/biblio/publications/2010/fin_report/fin_report_10_data.xls #dataharvest – farmsubsidy.org
Farm subsidy story in Politiken (DK): http://politiken.dk/erhverv/ECE1616398/golfbaner-lufthavne-hoteller–og-spejdere-faar-landbrugsstoette/ #dataharvest – farmsubsidy.org
@farmsubsidy publishes list of €million+ farm subsidy payments http://bit.ly/ISxefz #dataharvest – Johann Tasker
New from Google: BigQuery ‘interactive analysis of huge datasets’. https://developers.google.com/bigquery/ Anyone at #dataharvest used it? Thoughts? – farmsubsidy.org
Top private sector recipient is French poultry firm Doux, with €54 million. http://www.doux.com/ #dataharvest RT @farmsubsidy – Véronique Mermaz
@farmsubsidy went further. Found out that Malta gets most subsidy / land area and Ireland per population #dataharvest http://highcharts.teelmo.info/data/eu-maataloustuet-2011.htm – Teemo Tebest
Press statement from @farmsubsidy following this year’s #dataharvest: http://ow.ly/aSifs – Nils Mulvad
Get farmsubsidy data for 2011 for EU-countries here: http://ow.ly/aSiaJ organised by #DataHarvest Festival and @farmsubsidy – Nils Mulvad
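The redaction figures in the tweets above (91 per cent of recipients redacted, roughly 40% of funds disclosed) are simple ratio calculations. A minimal sketch, using purely hypothetical counts chosen to reproduce those percentages (the real farmsubsidy.org totals are not given here):

```python
def redaction_stats(total_recipients, named_recipients, total_funds, disclosed_funds):
    """Return (percent of recipients redacted, percent of funds disclosed)."""
    redacted_share = 100 * (total_recipients - named_recipients) / total_recipients
    disclosed_share = 100 * disclosed_funds / total_funds
    return round(redacted_share, 1), round(disclosed_share, 1)

# Hypothetical illustration: 9,000 of 100,000 recipients named,
# €22bn of €55bn in payments disclosed.
print(redaction_stats(100_000, 9_000, 55.0e9, 22.0e9))  # -> (91.0, 40.0)
```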
Wobbing is the European word for FOI. Another track focused on the status of the threats in this area.
State of play of EU #wob reform by Statewatch http://bit.ly/J7zNu7 #dataharvest – Brigitte Alfter
Helpful for #wobbing on side effects of medicines EU ombudsman decision http://bit.ly/IRTWxJ #dataharvest – Brigitte Alfter
Indian WOB specialist Shyamlal Yadav wobbed government officials’ foreign visits #74tripstothemoon #dataharvest @RTIExpress – Wobbing Europe
Martin Rosenbaum: New Zealand only nation where FOIA law is strictly for its own inhabitants. Or are there more countries? #dataharvest – jfuruly
In Denmark we integrate FOI with the making of stories on http://ow.ly/aJPbD #dataharvest – Nils Mulvad
In DK we have won access to environmental data: antibiotics to pigs, new geographical measurement, numbers of animals. #Dataharvest – Nils Mulvad
@brenno EU supervisor on data protection warns against this http://www.wobbing.eu/news/supervisor-alarmed-threats-access-rights #Dataharvest – Brigitte Alfter
EU wob threatened – most recent developments under Danish presidency http://www.wobbing.eu/news/presidency-criticised-%E2%80%9Deven-worse-commission%E2%80%9D #Dataharvest – Brigitte Alfter
Martin Rosenbaum: Tip 1: Think inside the document cabinet. What do they have & collect? #dataharvest – jfuruly
Martin Rosenbaum: Tip 2: Don’t use a wide trawl. Go specific! #dataharvest – jfuruly
Martin Rosenbaum: Using FOIA as a fishing expedition normally catches small fish. Go for the big ones, go specific. #dataharvest – jfuruly
"Dead people have no privacy rights" #dataharvest – Ides Debruyne
Very important to look in detail at the new data directive http://europa.eu/rapid/pressReleasesAction.do?reference=IP/12/46, it will affect #transparencia #periodismodatos #dataharvest – Mar Cabra
Here is the link to the Danish website with wobbing requests and stories: http://ow.ly/aLKBC #dataharvest – Nils Mulvad
Google Fusion
Tommy Kaas conducted a training session on Google Fusion. Lots of good material gathered here:
Three Google Fusion hands-on sessions tomorrow on making interactive maps. Like this: http://bit.ly/IEPx6P #dataharvest – Tommy Kaas
Training material and tip sheets for the Google Fusion sessions today http://bit.ly/IzPWYh #dataharvest – Tommy Kaas
Made my first Google Fusion Table map visualization. Cool stuff can be obtained. Thx @tbkaas #dataharvest http://twitpic.com/9iq83q – Teemo Tebest
Social media
Another part of the conference was looking into the use of social media – hosted by Paul Myers from BBC.
@pauliemyers you can get the Facebook ID through FB’s Graph API. See for example https://graph.facebook.com/teelmo & https://graph.facebook.com/635279474 #dataharvest – Teemo Tebest
Tommy Kaas: Check out the freebies at LinkedIn For Journalists. You can get a free upgrade to the Exec. version. #dataharvest – jfuruly
Thought I knew how to use Facebook and Twitter to research people. Then I met @pauliemyers at #Dataharvest More here: http://www.researchclinic.co.uk – Andreas Marckmann
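The Graph API lookup mentioned in the tweets above is just an HTTP GET against graph.facebook.com, which returns a JSON record containing the numeric ID. A minimal sketch; note that current Graph API versions require an access token, whereas the open, unauthenticated lookups in the tweet reflect the API as it worked at the time:

```python
import json
import urllib.request

GRAPH = "https://graph.facebook.com"  # Graph API root

def graph_url(handle: str) -> str:
    """Build the lookup URL for a username or numeric ID."""
    return f"{GRAPH}/{handle}"

def lookup(handle: str) -> dict:
    """Fetch the profile record and parse the JSON response.
    (Modern API versions will reject this without an access token.)"""
    with urllib.request.urlopen(graph_url(handle)) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage (network access and, nowadays, a token required):
#   record = lookup("teelmo")   # resolves the username to a record with an "id" field
```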
Handouts and links
A lot of other good stuff was presented. Some of the links are gathered here: 
Thanks for comments at #dataharvest last week. My presentation on sponsored doctors is up now http://slidesha.re/KUiLuo – anders pedersen
@teelmo The project on sponsored doctors is up here and avail. as download #dataharvest http://www.information.dk/databloggen – anders pedersen
Pad with links from the DataHarvest conference in Bruxelles http://bit.ly/IOTBBi #ddj via @Hackette7 – Datenjournalist
Link to the data retention visualization http://www.zeit.de/datenschutz/malte-spitz-data-retention by @opendatacity #dataharvest – Michael Kreil
And the http://crowdflow.net project, including an animation of the movement of 900 cellphone owners http://crowdflow.net/2011/07/12/fireflies-hd/ #dataharvest – Michael Kreil
To my friends at the #dataharvest here’s my presentation "How to crack open information from Spain?" https://docs.google.com/presentation/pub?id=1RhwfZ3pwlRnvKB1qtPdogmjRC9NunwS-MNxuaqKsGMM&start=false&loop=false&delayms=3000 #periodismodatos – Mar Cabra
Estonian register court decisions https://www.riigiteataja.ee/kohtuteave/maa_ringkonna_kohtulahendid/main.html #dataharvest – graafik
Great presentation by @okvivi about mapping Romanian politicians: http://www.hartapoliticii.ro/ #dataharvest – Friedrich Lindenberg
Polish portal for watching politicians http://sejmometr.pl/ #dataharvest CC: @okvivi – Stefan Urbanek
Using Eurostat data to choose your favourite region suggested by @pudo http://bit.ly/KyfaaK #dataharvest – Brigitte Alfter
Some stats on living in the EU http://bit.ly/Kyfpm8 #dataharvest – Brigitte Alfter
RT Collection of sources to finance journalism @ursulean: (…) http://delicious.com/ides/funding #dataharvest – Brigitte Alfter
More and more interesting links on the pad #Dataharvest http://bit.ly/IOTBBi – Brigitte Alfter
EU open spending data visualisation http://bit.ly/IOTl5j #Dataharvest – Brigitte Alfter
RT @knightfdn: Via @jonathanstray: "Here’s my ‘crash course in data journalism’ notes" http://bit.ly/Ixvvq0 #opengov #opendata #dataharvest – Ides Debruyne
Now at #dataharvest in Brussels. Tomorrow speaking about #maps #charts http://bit.ly/GLIKqm – Erik
Aftenposten word cloud on Cablegate http://www.aftenposten.no/spesial/cablegate/ More specials here: http://www.aftenposten.no/spesial/ #dataharvest #cablegate – jfuruly
Goldmine: Norway’s Electronic Public Records http://www.oep.no/nettsted/fad?lang=en Use it w GoogleTranslate #dataharvest <impressed> – paul james martin
RT @Hackette7: Keep adding EU data sources to http://caelainn.primarypad.com/1 #Dataharvest – alesha novichkov
V. useful primer on fisheries subsidies, by @oceana: http://oceana.org/en/our-work/promote-responsible-fishing/fishing-subsidies/learn-act/more-on-fisheries-subsidies #dataharvest – fishsubsidy.org
About to present the http://fishsubsidy.org & Looting the Seas stories with @cabralens in Room 1.25 #dataharvest – fishsubsidy.org


Fishsubsidy.org: New online database of €1.1 billion in EU fisheries subsidies; concerns about declining data standards

Fishsubsidy.org, the transparency project which in 2009 launched an online database of EU fisheries subsidies from 1994 to 2006, has launched a new database of payments under the European Fisheries Fund, from 2007 to 2010.

The fisheries subsidy database is online at http://www.fishsubsidy.org/eff

A critical report on the availability of the data: http://eutransparency.org/wp-uploads/2011/11/eyes-wide-shut.pdf

The project’s co-founders have sounded a grave warning about the deteriorating quality of data released to the public, and the implications this has for uncovering waste, fraud and abuse of EU funds.

Nils Mulvad, the Danish data journalist and fishsubsidy.org co-founder who led the collection of data from the twenty-seven EU member states, said:

“It is really, really bad. Many governments don’t comply with basic EU laws on transparency. Some governments publish no data at all, others are publishing incomplete data in bad formats like PDF files running to thousands of pages. This is money from the EU budget, paid for by European citizens who have a right to know who gets what. The European Commission must get a grip.”

Fishsubsidy.org co-founder Jack Thurston said:

“There is a new European Transparency Initiative. But today we have less information on EU fish subsidy payments than we used to have in past years. It’s a real step backwards in transparency, and at a time when we desperately need to know how this money is being spent. Are EU funds being used to fish for over-exploited fish stocks, or perhaps worse, for criminal fishing operations? We just don’t know. What is most startling is that neither does the Commission, because we know that they have not themselves asked for this data from national governments.”

A transparency index evaluates the data published by member states for completeness, detail and accessibility. It shows which countries are doing better and which are doing badly. The ranking is topped by Sweden. Belgium, the Czech Republic, Estonia and the UK score relatively highly, though with significant deficiencies. The worst performers were Greece and Portugal, which appear to have published no data at all, despite spending a significant share of EU fisheries funds. Spain, which accounts for some 40 per cent of fisheries subsidies spending, scored just 48% in the transparency ranking.
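An index like the one described is typically a weighted combination of sub-scores. As an illustration only, a minimal sketch with equal weighting on a 0–100 scale; the report's actual methodology and weights are not given here and may differ:

```python
def transparency_score(completeness: float, detail: float, accessibility: float) -> float:
    """Equal-weight transparency index on a 0-100 scale.
    Hypothetical weighting, not the report's actual formula."""
    return round((completeness + detail + accessibility) / 3, 1)

# Hypothetical sub-scores averaging to a 48.0 overall score:
print(transparency_score(60, 45, 39))  # -> 48.0
```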

“Eyes Wide Shut: EU rules on transparency in fisheries subsidies are failing citizens – and the European Commission couldn’t care less” is a report describing fishsubsidy.org’s quest for data on the European Fisheries Fund. It is available at: http://eutransparency.org/wp-uploads/2011/11/eyes-wide-shut.pdf

Fishsubsidy.org is a project coordinated by EU Transparency, a nonprofit organisation in the UK, and the Pew Environment Group. The aim is to obtain detailed data related to payments and recipients of fisheries subsidies in every EU member state and make this data available in a way that is useful to European citizens. Subsidies paid to owners of fishing vessels and others working in the fishing industry under the European Fisheries Fund total about €1 billion a year (2007-2013).

Detailed analysis of EU fisheries subsidies from 2000 to 2006 is available in “FIFG 2000-2006 Shadow Evaluation” (Cappell, R., T. Huntingdon and G. Macfadyen) at http://www.pewenvironment.eu/resources/view/id/115178?download=true

A list of vessels in the tuna fleet that receive EU subsidies is available at www.fishsubsidy.org/EU/tuna-fleet, and a list of vessels convicted of serious infringements (illegal fishing) is available at www.fishsubsidy.org/infringements. A list of vessels that received EU grants for modernisation and shortly afterwards grants for scrapping is available at: www.fishsubsidy.org/news/features/modernised-then-scrapped/

Under the European Transparency Initiative, details of all end beneficiaries of EU funds should be published, to improve accountability, legitimacy and as a way of combating fraud and abuse. See: www.ec.europa.eu/transparency/eti/index_en.htm

The Pew Environment Group is the conservation arm of The Pew Charitable Trusts, a nongovernmental organisation that applies a rigorous, analytical approach to improve public policy, inform the public and stimulate civic life.


Program for the conference

Hands-on training

Sarah Cohen on timelines


Albrecht Ude: Surfing anonymously and unfiltered

Albrecht Ude: Searching people

Yahoo Pipes

Paul Bradshaw on Yahoo Pipes

Luuk Sengers and John Bones: Excel 1
Luuk Sengers and John Bones: Excel 2
Extra material from Jennifer LaFleur

Google Fusion:

Jennifer LaFleurs material

Hand-out on polygons
Exercise

Google Refine:

Combining data with Refine

Cleaning data with Refine

Chase Davis on Google Refine API

Site on Google Refine with tutorials
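The "Cleaning data with Refine" material above centres on Refine's clustering of near-duplicate values (e.g. recipient names spelled several ways in subsidy data). A minimal sketch of the key-collision "fingerprint" method Refine uses, reimplemented in plain Python; the sample names are hypothetical:

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(name: str) -> str:
    """Key-collision fingerprint, after Google Refine's clustering method:
    strip accents, lowercase, drop punctuation, sort the unique tokens."""
    s = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    s = re.sub(r"[^\w\s]", "", s.lower()).strip()
    return " ".join(sorted(set(s.split())))

def cluster(names):
    """Group names that share a fingerprint; return only the groups
    with more than one spelling (the candidates for merging)."""
    groups = defaultdict(list)
    for n in names:
        groups[fingerprint(n)].append(n)
    return [g for g in groups.values() if len(g) > 1]

print(cluster(["Doux S.A.", "doux sa", "DOUX, S.A.", "Junta de Andalucia"]))
# -> [['Doux S.A.', 'doux sa', 'DOUX, S.A.']]
```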

Sebastian Mondial: Advice for crowdsourcing

Nils Mulvad: New methods for covering events and investigative journalism melt together
Nils Mulvad: Crowdsourcing in investigative journalism.

Apps for investigations

Nils Mulvad: Some Apps for investigations
Story-based inquiry
English: Story-based inquiry

Russian: Story-based inquiry

Presentation of Story-based inquiry in Kiev

Access to data

David Smallman’s handout 1
David Smallman’s handout 2


Andy Lehren’s presentation on Wikileaks
Jan Gunnar Furuly’s presentation on Wikileaks
Musikilu Mojeed’s presentation on Wikileaks
David Leigh’s presentation on Wikileaks
Jan Michael Ihl’s presentation: How can journalists and leakers trust each other again

Some other material

Giannina Segnini on CAR-methods

Sarah Cohen on textmining

Sarah Cohen and Aron Pilhofer on visualisation

Andy Lehren: Social Media Tools

James Grimaldi: Multimedia

Bo Elkjær’s presentation

Frédéric Zalac, Serena Tinari and Susanne Reber on Investigating Big Pharma

Arms smuggling – TRILOGY In the Name of the State – Matej Surc, Blaz Zgaga – Slovenia
(Maps in this presentation are copyrighted by Sanje Publishing House, Ljubljana, Slovenia. www.sanje.si)

Tracking the billions – with O’Murchu and Shleynov – presentation 1
Tracking the billions – with O’Murchu and Shleynov – presentation 2


Ana Arana’s presentation

Result of group thinking – lots of good tips

Advice from Albrecht Ude on New Tools for Today’s Investigative Journalist.