Mapping the news: a story of personal failure

I have a secret: I’m a closet taxonomist. I love mapping relationships and understanding how things came to be from the information that we have at hand. It must be the biology training that I never got over. One project whose results I am not so happy about started from a simple idea: I tried to take news and map it.

I guess that I’m not as bright as the thousands of brilliant minds thinking through how to add geographically and contextually relevant information to everything. I loved using a Wii that I had won to spin the globe and read the news; however, I just couldn’t make the idea work for me. In theory this is fascinating stuff: hydrate (a beautiful way of saying add) news information with geocoding and throw those ‘points of interest’ on a map.

I sat with taxonomists, technologists, handset manufacturers, project managers and designers. I whiteboarded, I sat through somewhat interesting video conferences, and I wrote business plans, but nothing came of it, even though I remember influencing roadmaps so that geocode data would be as precise as possible. I remember that I wanted journalists to code up their entries with lat/longs as soon as they started writing. I was aching for a panacea where I could throw all this stuff on a map and it would look beautiful.

There were plenty of technical problems to overcome:

Byline mapping

In the first revision, every story would have its byline analyzed and then mapped by a rule engine to the geography that corresponded with the byline. It helped to know that a story was about Cairo, but this wasn’t granular enough to show that a story about a riot in a souk was in the souk. No, the lat-long was a central point in the middle of Cairo. If I’m from Nebraska, that’s probably enough. But it wasn’t enough for me. I wanted to know exactly where these stories were happening. I wanted to be able to track the riots neighborhood by neighborhood. Byline mapping also provided only one geocode, the one for where the story was filed. What if the story referenced the souk or a neighborhood: not the city, but a place in it?
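To make the limitation concrete, here is a minimal sketch of what a byline rule lookup might look like. It is not the rule engine we actually had; the `BYLINE_RULES` table, the `geocode_byline` function and the sample bylines are all hypothetical, and the coordinates are approximate city centers.

```python
from typing import Optional, Tuple

# Hypothetical rule table: place name found in the byline -> approximate
# city-center (latitude, longitude). A real rule engine would be far larger.
BYLINE_RULES = {
    "CAIRO": (30.0444, 31.2357),
    "BEIJING": (39.9042, 116.4074),
}


def geocode_byline(byline: str) -> Optional[Tuple[float, float]]:
    """Return the single city-level point matched by the byline, or None."""
    upper = byline.upper()
    for place, point in BYLINE_RULES.items():
        if place in upper:
            return point
    return None


# Two very different Cairo stories collapse onto the same central point.
souk_riot = geocode_byline("CAIRO (Newswire) - Riot in the souk")
traffic_story = geocode_byline("Cairo - Traffic report")
```

Both stories get the same pin in the middle of the city, which is exactly the granularity problem: the lookup knows where the story was filed, not where it happened.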

Distinguishing multiple addresses

Worse: what if the story referenced two places, or three? How would you decide which was the top location to hydrate with geocode data? Sure, you could hydrate them all (not a problem), but that didn’t really solve the underlying problem of which was more important.
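One naive way to pick a "top" location is to score each place by how often it is mentioned and how early it first appears. The sketch below assumes exactly that heuristic; `rank_places`, the scoring weights and the sample story are made up for illustration, and nothing here claims to measure real editorial importance.

```python
from typing import List, Tuple


def rank_places(text: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Rank candidate place names by mention count plus an early-mention bonus."""
    lowered = text.lower()
    scored = []
    for place in candidates:
        name = place.lower()
        count = lowered.count(name)
        if count == 0:
            continue  # place is never mentioned in this story
        first = lowered.find(name)
        # Bonus between 0 and 1: the earlier the first mention, the larger.
        position_bonus = 1.0 - first / len(lowered)
        scored.append((place, count + position_bonus))
    return sorted(scored, key=lambda item: item[1], reverse=True)


story = (
    "Protests spread across Cairo on Friday. "
    "Crowds in Cairo were larger than those in Alexandria."
)
ranked = rank_places(story, ["Alexandria", "Cairo", "Beijing"])
```

Here Cairo ranks above Alexandria simply because it is mentioned twice and first. A story that mentions a city once in the last paragraph, where the real event took place, would defeat it.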

Hydrating language in the story

But what if the story didn’t have a distinct address? What if the murder happened at 3rd Street between Sullivan and Thompson? We’d have to take a best guess as to whether this was referring to Detroit, where the attacker was from; Philadelphia, where the victim was from; or Minnesota, where the shooting took place. Proximity within the article could help you get there, I’m sure, but it wasn’t foolproof.
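The proximity idea can be sketched as follows: attach the vague address to whichever place name sits closest to it in the text. This is only an illustration under that assumption; `nearest_place` and the sample sentence are hypothetical, and the gap is measured in plain characters.

```python
from typing import List, Optional


def nearest_place(text: str, address: str, places: List[str]) -> Optional[str]:
    """Return the place whose mention is closest (in characters) to the address."""
    start = text.find(address)
    if start == -1:
        return None  # the address phrase is not in the text
    end = start + len(address)
    best, best_gap = None, None
    for place in places:
        pos = text.find(place)
        while pos != -1:
            if pos >= end:
                gap = pos - end  # mention comes after the address
            else:
                gap = max(0, start - (pos + len(place)))  # mention comes before
            if best_gap is None or gap < best_gap:
                best, best_gap = place, gap
            pos = text.find(place, pos + 1)
    return best


text = (
    "A man from Detroit was charged on Monday. The victim, from Philadelphia, "
    "was shot at 3rd Street between Sullivan and Thompson in Minnesota."
)
address = "3rd Street between Sullivan and Thompson"
guess = nearest_place(text, address, ["Detroit", "Philadelphia", "Minnesota"])
```

With this wording the guess lands on Minnesota, but only because the writer happened to put it right after the address. Reorder the sentence and the same heuristic would happily pick Philadelphia.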

Distinguishing whether a story is a Chinese story or a Wisconsin story

When an American won gold at the Beijing Olympic Games, it became my flagship example of how confusing this was. Should the story be tagged Beijing (it was certainly filed from there, and referred to events that happened there)? Should it be tagged US (because a US swimmer won the medal)? Or should it be tagged Wisconsin (where that swimmer ultimately hailed from)?

Secret sources

How would you deal with a story from a journalist who had met with members of Al Qaeda and wanted to keep the location of the meeting and her own location out of the story, in order to gather more information and not put herself or her sources at risk? How do you deal with a story that has no geographical focus in a world of geographical mapping?

Who’s going to pay for all this?

In the end it came down to: was this effort really worth it? I knew I could crack this if I could come up with a way for people to get excited enough to pay for the information, or for advertisers to subsidize the creation of the content. The problem is: who wants to advertise against the exact location of a murder? Let’s be honest, most news is depressing. Knowing that there’s depressing stuff happening all around you is even more depressing: that’s why I never logged into EveryBlock a second time. What advertiser or reader is going to pay for that experience?

Am I thinking about this the wrong way? Am I missing something critical, an open sesame? Maybe. Let me know.