Data journalism needs to be more than external data sets

Paul Bradshaw has a good column at Poynter about how the increasing availability of data will force journalists and news organizations to change:

Data journalism takes in a huge range of disciplines, from Computer Assisted Reporting (CAR) and programming, to visualisation and statistics. If you are a journalist with a strength in one of those areas, you are currently exceptional. This cannot last for long: The industry will have to skill up, or it will have nothing left to sell. …

So on a commercial level, if nothing else, publishing will need to establish where the value lies in this new environment, and where new efficiencies can make journalism viable. Data journalism is one of those areas.

Journalists should read and heed everything Bradshaw writes. But it’s important to make sure the discussion of data doesn’t get too narrowly confined to external data, without considering how journalism itself fits holistically into the data-centric future.

The big challenge for news organizations isn’t just how to better ingest, analyze, and present extant external (if sometimes hard-to-access) data sets. Inculcating a new skill set industrywide may be non-trivial as a matter of scale and institutional-cultural inertia, but at least that skill set is pretty well defined.

Rather, the trickier and less-addressed challenge for news organizations is how to turn the raw materials and finished products of non-database journalism into data.

While Bradshaw begins by defining data broadly as “information that can be processed by computers,” he mostly talks about one specific type of data: spreadsheets and databases containing digitized government and organizational information.

Emphases mine:

The growth of the spreadsheet and the database from the 1960s onwards kicked things off by making it much easier for organisations — including governments — to digitise information, from what they spent our money on to how many people were being treated for which diseases, and where. …

The open data movement campaigns for important information — such as government spending, scientific information and maps — to be made publicly available. …

That means, for instance, a computer can see that the director of a company named in a particular government contract is the same person who was paid as a consultant on a related government policy document. …

Concrete results of both movements can be seen in the US and UK — most visibly with the launch of government data repositories and in 2009 and 2010 respectively — but also less publicised experiments such as “Where Does My Money Go?”, which uses data to show how public expenditure is distributed, and “Mapumental,” which combines travel data, property prices and public ratings of ‘scenicness’ to help show at a glance which areas of a city might be the best place to live based on individual requirements. …

While government and organizational stats and data sets will be a huge part of journalism’s future, they very likely won’t be the only part — particularly for local news organizations. What Bradshaw refers to as the “base metals” of traditional journalism — “eyewitness accounts and interviews … official reports, research papers” — aren’t going away.

Neither are news organizations’ ever-growing repositories of information about local businesses and people, sports, recreation, travel, entertainment, things to do, etc.

News organizations will best set themselves up for the future if journalists become more skilled at handling external data AND if traditional narrative journalism itself is data-fied (along with the non-narrative information mentioned above).

This isn’t a new idea; Adrian Holovaty, Dan Conover, and Stijn Debrouwere, among others, have been fleshing out this line of thinking for several years. But (generalization alert!) somewhere along the way, the journalism-as-structured-data discussion seems to have been overtaken by the journalists-as-processors-of-external-data one.

I thought about this divergent discussion while reading Mindy McAdams’ recent lament about lack of innovation in mobile/iPad news apps (emphases hers):

I have yet to see an app from any news organization — for a phone or the iPad — that spells innovation. Steve [Yelvington] refers to “completely new information experiences that don’t even vaguely resemble old products,” but whatever these are, I have not seen them.

I agree with McAdams as far as that goes, just as I agree with Bradshaw. But there’s an obvious reason why we haven’t seen “completely new information experiences” in news apps: there’s currently little “completely new information” being created that could power such apps.

After all, there’s a limit to how innovative front-end wizardry can be on its own. There are only so many ways to present largely unstructured stories, blog posts, photos, and videos — still the vast majority of content produced by news organizations.

Getting to a point where news orgs routinely produce structured (or semi-structured) data — what Conover calls the Informatics Scenario — will require new tools, CMSes, processes, culture, knowledge. You know, no big whoop.

But we shouldn’t let this scenario’s difficulty/unlikelihood turn into a blind spot.


6 responses to “Data journalism needs to be more than external data sets”

  2. Couldn’t agree more – we need to rethink all our newsroom processes to create the building blocks of new products: sites/apps/databases that really speak to the way people absorb and use information these days.
    It’s a topic I focus on at my blog. Would love to keep pushing on this point.

  3. Thanks for the comment, Reg, and for the mention — I’m sorry I didn’t come across your blog before writing this! Now I have something new to read over Thanksgiving …

  4. Hear, hear. You have advanced an evolving line of thought (and I especially like the mention of Stijn Debrouwere’s work, which is excellent and little-known).

    One notion I had: newsrooms need an info filing system. Perhaps like a CRM app — a cross-referenced filing system wherein is logged every phone call to confirm, every recorded interview, every story note. This is the primary material of reporting, and it’s currently mostly wasted. How many times have you done an hour interview and used two quotes? Tiny drabs of info make it into finished stories, but that source material could be re-used in countless other ways. I call this problem “editing like it’s paper.”

  5. Jonathan, definitely agree that phone calls, interviews (recorded or just note-taken), and notes should be logged. It would also be important to come up with a data standard for these “raw materials” of journalism, so it can all be logged as structured/semi-structured data.

  6. Jonathan, Josh, I agree that we waste/lose/throw away too much of our daily work. And, if we’re to keep it, we need a data structure for it to be valuable. The trick, with all data structures, is that it involves semi-permanent choices about what parts of information are important and what aren’t – is time of day important, or just day? Should location of an interview be logged, or does it not matter? Do we need the person’s age? And so on. If we’re not careful, we can drown in a sea of valuable – but overly complex – information. Two thoughts: One is how Politifact cut back what they wanted from reporters to a very simple structure; it “lost” a lot of information, but it made up for it with a strong new type of product. Another is that we need to come up with products first, and then structure the work around them, rather than try to structure data around the pretty messy life of a normal journalist. People are adaptable; they don’t need to keep working the way they used to. Reg
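
To make the trade-offs raised in the comments above a little more concrete, here is a minimal sketch in Python of what one logged "raw material" record might look like. Everything in it is an assumption for illustration: the `SourceRecord` name, its fields, and the sample values are hypothetical, not an existing newsroom standard or anything proposed in the thread. It settles the "time of day, or just day?" question by storing the day only, and it treats location as optional.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import date
from typing import List, Optional, Set


@dataclass
class SourceRecord:
    """One logged piece of reporting raw material (hypothetical schema)."""
    kind: str                        # e.g. "phone_call", "interview", "note"
    reporter: str
    source_name: str
    day: date                        # day only; time of day is deliberately dropped
    summary: str
    location: Optional[str] = None   # optional: it may not matter for every record
    quotes: List[str] = field(default_factory=list)

    def unused_quotes(self, published: Set[str]) -> List[str]:
        """Return quotes that never made it into a finished story."""
        return [q for q in self.quotes if q not in published]


def to_json(record: SourceRecord) -> str:
    """Serialize a record so it can be filed and cross-referenced later."""
    data = asdict(record)
    data["day"] = record.day.isoformat()  # dates are not JSON-serializable as-is
    return json.dumps(data, sort_keys=True)


# Made-up sample values, purely for illustration.
record = SourceRecord(
    kind="interview",
    reporter="Reporter A",
    source_name="Source B",
    day=date(2010, 11, 22),
    summary="Hour-long interview about the city budget.",
    quotes=["First quote", "Second quote", "Third quote"],
)

leftover = record.unused_quotes({"First quote"})
filed = to_json(record)
```

The structure is kept small on purpose, in the spirit of the point about Politifact: each extra field is a semi-permanent choice, so a sketch like this would more plausibly start from the product it feeds and grow only as that product requires.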