Sunday, November 11, 2007

Give us the data and forget the shiny interface

Rufus Pollock has a crucially important point to make about data projects i.e. forget the shiny interface as it mainly gets in the way.

"One thing I find remarkable about many data projects is how much effort goes into developing a shiny front-end for the material. Now I’m not knocking shiny front-ends, they’re important for providing a way for many users to get at the material (and very useful for demonstrating to funders where all the money went). But shiny front ends (SFEs from now on) do have various drawbacks:
  • They often take over completely and start acting as a restriction on the way you can get data out of the system. (A classic example of this is the Millenium Development Goals website which has lots of shiny ajax which actually make it really hard to grab all of the data out of the system — please, please just give me a plain old csv file and a plain old url).
  • Even if the SFE doesn’t actually get in the way, they do take money away from the central job of getting the data out there in a simple form, and …
  • They tend to date rapidly. Think what a website designed five years ago looks like today (hello css). Then think about what will happen to that nifty ajax+css work you’ve just done. By contrast ascii text, csv files and plain old sql dumps (at least if done with some respect for the ascii standard) don’t date — they remain forever in style.
  • They reflect an interface centric, rather than data centric, point of view. This is wrong. Many interfaces can be written to that data (and not just a web one) and it is likely (if not certain) that a better interface will be written by someone else (albeit perhaps with some delay). Furthermore the data can be used for many other purposes than read-only access. To summarize: The data is primary, the interface secondary.
  • Taking this issue further, for many projects, because the interface is taken as primary, the data does not get released until the interface has been developed. This can cause significant delay in getting access to that data.

When such points are made people often reply: “But you don’t want the data raw, in all its complexity. We need to clean it up and present it for you.” To which we should reply:

“No, we want the data raw, and we want the data now”"

I couldn't agree more, particularly in the context of open access content projects, and this message should be embedded in the dna of the folks who run or facilitate such projects.

No comments: