Wednesday, July 12, 2006

Jeff Jonas on information sharing

Jeff Jonas of IBM spoke on a panel about how technology is changing intelligence services at the Council on Foreign Relations a couple of months ago. When you listen to folks like Jonas outline key principles of information gathering, it makes you wonder how politicians can still get things so wrong.

"All right. Well, let me start with this notion that there’s a lot of conversation about information sharing, and I think that that’s kind of like a second principle.

And I have kind of come up with what I’m calling the information-sharing paradox, and I think it’s—if you start with information sharing, we’re likely to fail. And the reason is that it’s not really practical to share everything with everybody. And if you can’t share everything with everybody, your next option is, can you ask everybody every question? And it turns out that’s not practical either.

So this information sharing paradox is, if you can’t share everything with everybody and you can’t ask everyone every question every day, how is someone going to find something? And that’s the information sharing paradox.

And I think that the solution is discovery. You have to know who to ask for what. So think about this in terms of the card catalog at the Library of Congress. The Reader’s Digest version is: no one goes to the library and roams the halls looking for the book; you go to the card file, and the card file tells you where to go. If someone were to put books in the halls or in one of the aisles of the library without putting a card in the card file, those books would be nondiscoverable.

So a first principle is that holders of data should be publishing some subset of the data (subject, title, author) to card files, and card files are used for discovery, and then you know who to ask.

So the first principle is discovery.

And from a policy standpoint, the question then is, how do you get people to contribute data to the card file? What data are they putting in the card file? And when I think about that, I think about motivating data holders. If you have an owner of a system, their value to the enterprise is the degree to which their data is usable. But before it’s usable, it must be discoverable.

So if you quantified people’s contributions to the card file, if you had an aisle at the library, a system, and they contributed no cards to the card file, their enterprise value would be less.

So I’m speaking here to a metric about how one would quantify discoverability.

And it turns out as you create these things that the card file itself becomes the target. When you put a few billion things in there that point to the documents in the holdings and the assets across many different systems or silos, after a while it starts to feel like the card file is the risk: the risk of unintended disclosure, the risk of that running away from you, of having an insider run off with it.

So one of the things that I’ve been pursuing is the ability to anonymize the card file so that even if the database administrator who oversees this card file is corrupt, he or she can’t actually look through it and shop or scan for names or addresses. The data in it has been scrambled in a way that it’s... When I use the word anonymize, by the way: some of the recent news has indicated that a phone number by itself is anonymized; I would differ. To me, anonymized data is data that is not human-interpretable and not reversible. So if there is a match, no single person can unlock it. They actually have to go back to the holder of the data and request the record, and in an information-sharing model from the public sector to the private sector, that request would then come as consent or as a subpoena, NSL, or FISA.

So I thought I’d start there."
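To make the anonymized card file concrete, here is a minimal Python sketch, assuming the data holders share a secret key that the card-file administrator never sees. It swaps in a keyed one-way hash (HMAC-SHA256) as the anonymizer; that choice, the `HOLDER_KEY` value, and the `AnonymizedCardFile` class are all hypothetical illustrations, not Jonas's or IBM's actual implementation. The card file stores only opaque tokens and pointers, so a match tells you who to ask, and the record itself still has to be requested from the holder.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret shared by the data holders but NOT by the card-file
# administrator. A plain unkeyed hash of a name could be reversed by hashing
# a phone book of candidate names; the key blocks that dictionary attack.
HOLDER_KEY = b"example-secret-shared-only-among-data-holders"


def normalize(value: str) -> str:
    """Canonicalize a value so trivially different spellings hash the same."""
    value = unicodedata.normalize("NFKC", value)
    return " ".join(value.casefold().split())


def anonymize(value: str, key: bytes = HOLDER_KEY) -> str:
    """Keyed one-way hash: not human-interpretable and not reversible without the key."""
    return hmac.new(key, normalize(value).encode("utf-8"), hashlib.sha256).hexdigest()


class AnonymizedCardFile:
    """Maps anonymized tokens to (holder, record_id) pointers, never to the data itself."""

    def __init__(self) -> None:
        self._cards: dict[str, set[tuple[str, str]]] = {}

    def publish(self, token: str, holder: str, record_id: str) -> None:
        # Each holder hashes locally and publishes only the token and a pointer.
        self._cards.setdefault(token, set()).add((holder, record_id))

    def who_to_ask(self, token: str) -> set[tuple[str, str]]:
        # A match reveals where a record lives, not what it contains; the
        # requester must still go back to the holder for the actual record.
        return set(self._cards.get(token, set()))


if __name__ == "__main__":
    card_file = AnonymizedCardFile()
    card_file.publish(anonymize("Jane Q. Public"), "airline", "PNR-001")
    card_file.publish(anonymize("jane q.  public"), "hotel", "GUEST-42")
    card_file.publish(anonymize("John Doe"), "hotel", "GUEST-7")

    # A participating querier hashes its lookup value the same way.
    print(sorted(card_file.who_to_ask(anonymize("JANE Q. PUBLIC"))))
    # [('airline', 'PNR-001'), ('hotel', 'GUEST-42')]
```

Normalizing before hashing matters here: because a hash destroys similarity, "Jane Q. Public" and "jane q.  public" would never match unless they are canonicalized first.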

Just to repeat the crux of that, "if you can’t share everything with everybody and you can’t ask everyone every question every day, how is someone going to find something? And that’s the information sharing paradox.

And I think that the solution is discovery. You have to know who to ask for what."
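The "know who to ask for what" idea can be sketched in a few lines of Python. This is a hypothetical illustration, not anything Jonas described in code: holders publish a subset of each record (subject, title, author) plus a pointer back to themselves, and one lookup names the holders to ask instead of querying everyone. The `discoverability` score is an assumed way to express his "quantify contributions to the card file" metric as the share of a holder's records that have a card.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """A published subset of a record: enough to find it, not the record itself."""
    subject: str
    title: str
    author: str
    holder: str     # which system or silo to ask
    record_id: str  # the holder's own identifier for the full record


class CardCatalog:
    """Hypothetical card file, indexed by subject only to keep the sketch short."""

    def __init__(self) -> None:
        self._by_subject: dict[str, list[Card]] = defaultdict(list)
        self._cards_per_holder: dict[str, int] = defaultdict(int)

    def publish(self, card: Card) -> None:
        self._by_subject[card.subject.casefold()].append(card)
        self._cards_per_holder[card.holder] += 1

    def who_to_ask(self, subject: str) -> set[str]:
        """Discovery: one lookup returns the holders, instead of asking everyone."""
        return {card.holder for card in self._by_subject.get(subject.casefold(), [])}

    def discoverability(self, holder: str, total_records: int) -> float:
        """Assumed metric: fraction of a holder's records that have a card."""
        if total_records <= 0:
            return 0.0
        return self._cards_per_holder.get(holder, 0) / total_records


if __name__ == "__main__":
    catalog = CardCatalog()
    catalog.publish(Card("Aviation", "Flight manifest", "Airline A", "airline_db", "M-17"))
    catalog.publish(Card("aviation", "Maintenance log", "Airline B", "maint_db", "L-3"))
    print(sorted(catalog.who_to_ask("AVIATION")))  # ['airline_db', 'maint_db']
    print(catalog.discoverability("airline_db", total_records=4))  # 0.25
```

A holder with records but no cards scores 0.0, which matches Jonas's point: undiscoverable data contributes nothing to the enterprise.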

No amount of hoping that a computer will magically find something after sucking up voluminous quantities of data about everyone is going to save you if you don't know what that something is, or who or what to ask to find it. Thanks to William for the link.
