Friday, November 16, 2007

Ask first on ACTA

Michael Geist thinks Canada should follow Australia's lead and launch a public consultation on whether it should be involved in the forthcoming Anti-Counterfeiting Trade Agreement negotiations.

Geist recently closed the Terra Incognita conference, the 29th International Conference of Data Protection and Privacy Commissioners.

Latest anti-P2P bill progresses in Congress

The latest anti-P2P file-sharing bill, buried in the midst of a larger 'College Opportunity and Affordability Act', is making steady progress through Congress, reports Anne Broache.

"The U.S. House of Representatives has taken a step toward approving a Hollywood-backed spending bill requiring universities to consider offering "alternatives" and "technology-based deterrents" to illegal peer-to-peer file sharing.

In the House Education and Labor Committee's mammoth College Opportunity and Affordability Act (PDF) lies a tiny section, which dictates universities that participate in federal financial aid programs "shall" devise plans for "alternative" offerings to unlawful downloading, such as subscription-based services, or "technology-based deterrents to prevent such illegal activity." The committee unanimously approved the bill Thursday."

Potential Hazards of the "Protect America Act"

Via EPIC, a who's who of security experts, including Steve Bellovin, Susan Landau, Whitfield Diffie, Matt Blaze, Peter Neumann and Jennifer Rexford released an important report in October on the Congressional rubber stamping, via the "Protect America Act", of the Bush administration's mass telephone and internet wiretapping programme. EPIC says:
"Security Experts Report on Hazards of New Surveillance Architecture

This summer's Protect America Act (PAA) temporarily authorized
warrantless surveillance of communications that Americans have with
individuals abroad. The use of this authority will require the
deployment of new interception technologies. These new technologies
raise several significant security risks.

The report identified the three most serious security risks. The experts
pointed to the danger that the system could be exploited by unauthorized
users. A Greek wiretapping system was exploited by an as yet unknown
party to listen in on government conversations. FBI documents of the DCS
3000 telephone wiretap system revealed several problems in the system's
implementation. This risk turns a surveillance system on its head.

Another risk is the misuse by a trusted insider. Someone with access to
the system could use it for improper purposes. Robert Hanssen abused his
access to FBI systems to steal information and to track investigations
of him. Recently a treasury agent was indicted for using the Treasury
Enforcement Communications System (TECS) in order to stalk his former

The third major risk is misuse by the US government. Watergate era
investigations revealed wiretaps of Congressional staff, supreme court
justices. These abuses also targeted non-violent activists such as
Martin Luther King, the American Friends Service Committee and the
National Association for the Advancement of Colored People.

The security experts provide key recommendations to guard against these
risks. First is minimization. Decreasing the number of interception
points simplifies security problems. Experts also recommend that
architecture be developed with communications carriers, maintaining them
as a check on government activity. Finally they recommend independent
oversight, with regular detailed reporting.

Risking Communications Security: Potential Hazards of the "Protect
America Act" (pdf):

A Gateway For Hackers -- Susan Landau:

Privacy On the Line: The Politics of Wiretapping and Encryption, Updated
and Expanded Edition:"

From the report itself:

"1 Introduction
The Protect America Act passed in August 2007 changes U.S. law to allow warrantless foreign-intelligence
wiretapping from within the U.S. of any communications believed to include one party
located outside the United States. U.S. systems for foreign intelligence surveillance located outside
the United States minimize access to the traffic of U.S. persons by virtue of their location. The
new law does not—and could lead to surveillance on a unprecedented scale that will unavoidably
pick up some purely domestic communications. The civil-liberties concern is whether the new
law puts Americans at risk of spurious — and invasive — surveillance by their own government.
The security concern is whether the new law puts Americans at risk of illegitimate surveillance
by others. We focus on security. If the system is to work, it is important that the surveillance
architecture not decrease the security of the U.S. communications networks.
The choice of architecture matters; minor changes can have significant effects, particularly with
regard to limiting the scope of inadvertent interception. In attempting to collect communications
with one end outside the United States, the new law allows the development of a system that
will probably pick up many purely domestic communications. How will the collection system
determine that communications have one end outside the United States? How will the surveillance
be secured?
We examine security risks posed by the new law and put forth recommendations to address
them. We begin by presenting background, first legal and policy, and then technical. Next we examine the difficulties in monitoring international Internet traffic. We follow with a general discussion
of risks in communications surveillance systems and then an analysis of those we fear may
result from implementing the Protect America Act. We conclude with a set of recommendations
regarding design and implementation...

5 Recommendations
The change from a system that wiretaps particular lines upon receipt of a wiretap order specifying
those lines to one that sorts through transactional data in real time and selects communications of
interest is massive. Where interception occurs and how the data sources — CDRs, traffic, other
information— are combined and used — will not only affect how powerful a tool the warrantless
wiretapping is, it will affect how likely the system is to pick up purely domestic communications.
In building a communications surveillance system itself — and saving its enemies the effort —
the U.S. government is creating three distinct serious security risks: danger of exploitation of the
system by unauthorized users, danger of criminal misuse by trusted insiders, and danger of misuse
by U.S. government agents. How should the U.S. mitigate the risks?
Minimization matters. Allowing collection of calls on U.S. territory necessarily entails greater
access to the communications of U.S. persons. An architecture that minimizes the collection of
communications lowers the risk of exploitation by outsiders and exposure to insider attacks. Traffic
should be collected at international cableheads rather than at tandem switches or backbone
routers, which also carry purely domestic traffic. Surveilling at the cableheads will help minimize
collection but it is not sufficient in and of itself. Intercepted traffic should be studied (by
geo-location and any other available techniques) to determine whether it comes from non-targeted
U.S. persons and if so, discarded before any further processing is done. It should be fundamental
to the design of the system that the combination of interception location and selection methods
minimizes the collection of purely domestic traffic.
Architecture matters. Using real-time transactional information to intercept high volume traffic
makes architectural choices critical. Robust auditing and logging systems must be part of the
system design. Communication providers, who have technical expertise and decades of experience
protecting the security and privacy of their customers’ communications, should have an active
role in both design and operation. “Two-person control” is applicable to organizations as well as
Oversight matters. The new system is likely to operate differently from previous wiretapping
regimes, and likely to be using new technologies for purposes of targeting wiretaps. There should
be appropriate oversight by publicly accountable bodies. While the details of problems may remain
classified, there should be a publicly known system for handling situations when “mistakes
are made.” To assure independence the overseeing authority should be as far removed from the
intercepting authority as practical. To guarantee that electronic surveillance is effective and free
of abuse and that minimization is in place and working appropriately, it is necessary that there be
frequent, detailed reports on the functioning of the system. Of particular concern is the real-time
use of CDR for targeting content, which must neither be abused by the U.S. government nor allowed
to fall into unauthorized hands. For full oversight, such review should be done by a branch
of government different from the one conducting the surveillance. We recommend frequent ex post
facto review of the CDR-based real-time targeting. The oversight mechanism must include outside
reviewers who regularly ask, “What has gone wrong lately—regardless of whether you recovered
— that you have not yet told us about?”
Security of U.S. communications has always been fundamental to U.S. national security. The
surveillance architecture implied by the Protect America Act will, by its very nature, capture some
purely domestic communications, risking the very national security that the act is supposed to
protect. In an age so dependent on communication, the loss may be greater than the gain. To
prevent greater threats to U.S. national security, it is imperative that proper security — including
minimization, robust control, and oversight — be built into the system from the start. If security
cannot be assured, then any surveillance performed using that system will be inherently fraught
with risks that may be fundamentally unacceptable."

Comcast sued for blocking BitTorrent

Apparently a Comcast customer has decided to sue the company, "arguing that the company's secret use of technology to limit peer-to-peer applications such as BitTorrent violates federal computer fraud laws, their user contracts and anti-fraudulent advertising statutes."

A suit worth watching on the control of Net traffic flows.

UK government broke data protection laws

The Information Commissioner's Office has ruled that the UK government has broken the terms of the Data Protection Act by failing to properly protect visa applications made over the internet using its UKvisas website.

Every day diplomacy

Also via Schneier: a Japanese tourist was verbally abused, threatened and removed from a train between New York and Boston, for taking pictures of the passing landscape through the train window.

More dodgy airport security

Also from Crypto-gram:

"A classified 2006 TSA report on airport security was leaked to USA Today. (Other papers covered the story, but their articles all seem to be derived from the original USA Today article.)

Weirdest news: "At San Diego International Airport, tests are run by passengers whom local TSA managers ask to carry a fake bomb, said screener Cris Soulia, an official in a screeners union." Someone please tell me this doesn't actually happen. "Hi Mr. Passenger. I'm a TSA manager. You know I'm not lying to you because of this official-looking laminated badge I have. We need you to help us test airport security. Here's a 'fake' bomb that we'd like you to carry through security in your luggage. Another TSA manager will, um, meet you at your destination. Give the fake bomb to him when you land. And, by the way, what's your mother's maiden name?" How in the world is this a good idea? And how hard is it to dress real TSA managers up like vacationers?

TSA claims that this doesn't happen:
Here's someone who said that it did, at Dulles Airport:"

Reality beats fiction every time.

War on innocent but different

Bruce Schneier's essay War on the Unexpected, in his latest Crypto-gram, should be compulsory reading for Gordon Brown, Jacqui Smith and their advisers. In fact they should be locked in a room - for 58 days if necessary - and made to read it repeatedly until they get the message.

"We've opened up a new front on the war on terror. It's an attack on the unique, the unorthodox, the unexpected; it's a war on different. If you act different, you might find yourself investigated, questioned, and even arrested -- even if you did nothing wrong, and had no intention of doing anything wrong. The problem is a combination of citizen informants and a CYA attitude among police that results in a knee-jerk escalation of reported threats.

This isn't the way counterterrorism is supposed to work, but it's happening everywhere. It's a result of our relentless campaign to convince ordinary citizens that they're the front line of terrorism defense. "If you see something, say something" is how the ads read in the New York City subways. "If you suspect something, report it" urges another ad campaign in Manchester, UK. The Michigan State Police have a seven-minute video. Administration officials from then-attorney general John Ashcroft to DHS Secretary Michael Chertoff to President Bush have asked us all to report any suspicious activity...

Watch how it happens. Someone sees something, so he says something. The person he says it to -- a policeman, a security guard, a flight attendant -- now faces a choice: ignore or escalate. Even though he may believe that it's a false alarm, it's not in his best interests to dismiss the threat. If he's wrong, it'll cost him his career. But if he escalates, he'll be praised for "doing his job" and the cost will be borne by others. So he escalates. And the person he escalates to also escalates, in a series of CYA decisions. And before we're done, innocent people have been arrested, airports have been evacuated, and hundreds of police hours have been wasted...

Of course, by then it's too late for the authorities to admit that they made a mistake and overreacted, that a sane voice of reason at some level should have prevailed. What follows is the parade of police and elected officials praising each other for doing a great job, and prosecuting the poor victim -- the person who was different in the first place -- for having the temerity to try to trick them. For some reason, governments are encouraging this kind of behavior...

If you ask amateurs to act as front-line security personnel, you shouldn't be surprised when you get amateur security.

We need to do two things. The first is to stop urging people to report their fears. People have always come forward to tell the police when they see something genuinely suspicious, and should continue to do so. But encouraging people to raise an alarm every time they're spooked only squanders our security resources and makes no one safer.

We don't want people to never report anything. A store clerk's tip led to the unraveling of a plot to attack Fort Dix last May, and in March an alert Southern California woman foiled a kidnapping by calling the police about a suspicious man carting around a person-sized crate. But these incidents only reinforce the need to realistically assess, not automatically escalate, citizen tips...

Equally important, politicians need to stop praising and promoting the officers who get it wrong. And everyone needs to stop castigating, and prosecuting, the victims just because they embarrassed the police by their innocence.

Causing a city-wide panic over blinking signs, a guy with a pellet gun, or stray backpacks, is not evidence of doing a good job: it's evidence of squandering police resources. Even worse, it causes its own form of terror, and encourages people to be even more alarmist in the future. We need to spend our resources on things that actually make us safer, not on chasing down and trumpeting every paranoid threat anyone can come up with.

This essay originally appeared in

Some links didn't make it into the original article. There's this creepy "if you see a father holding his child's hands, call the cops" campaign:
There's this story of an iPod found on an airplane:
There's this story of an "improvised electronics device" trying to get through airport security:
This is a good essay on the "war on electronics.""

As usual, Crypto-gram is full of gems and worth perusing in full. One victim of the UK mentality of collective panic in the wake of the 7 July bombings was a man who slipped into a diabetic coma on a bus. Police shot him twice with a Taser gun while he was unconscious, because they thought he was a suicide bomber. When he came round he found himself handcuffed in the back of a police van, presumably with no idea of how he'd got there. Police eventually took him to hospital when he explained he was diabetic, but kept the handcuffs on whilst he was treated. When they finally accepted he was telling the truth and let him go, they apparently gave him a half-hearted apology, claiming he looked Egyptian. Nine days later Jean Charles de Menezes was shot dead by police who thought he was a suicide bomber.

Security through liberty

I liked Timothy Garton Ash's soundbite in the Guardian yesterday:

"What is needed is a change of paradigm: from liberty through security to security through liberty."

The most effective defense against terrorism is to be not afraid. Sadly the government don't even say "don't panic"; in a Corporal Jones-like frenzy, they say "danger here! danger there! danger everywhere! But we'll protect you by slicing away your freedoms."

Wednesday, November 14, 2007

Nurse sacked over press interview

Mark Steel has a sadly unsurprising tale to tell in the Independent today: You can't go round telling people you've been sacked.

A nurse has been sacked for
  • speaking to the press about 24 mental health patients being kept in 20 beds,
  • telling people, after she had been suspended, that she had been suspended,
  • telling people she was innocent,
  • allowing the press to print misleading statements about her case.
No, you could not make it up.

You know I stood during the silence on Remembrance Day observations on Sunday and thought about the sacrifice that all those people have made for this country and others - including, only weeks ago, a young man, originally based locally, killed in Iraq, leaving behind his wife pregnant with their first child - and I asked myself if they would now consider that sacrifice worthwhile, for a country
I could go on but it is way too depressing. Yet I still have to believe the sacrifice of those killed in war has to have been worth it. My younger son is really interested in the Second World War at the moment and he has concluded that war and wars "are just so stupid, dad... there were much better ways to fix those things..." My kids and their friends are a lot smarter than me and hopefully they will have the confidence, the perseverance and the strength to use those 'better ways'.

Tuesday, November 13, 2007

Government claim security problems with Electronic Patient Records fixed

The Government has responded to the Health Committee report on the Electronic Patient Record. It's another appalling example of how the government grinds irrationally on with the construction of white elephant information systems, in spite of all the evidence pointing towards the inevitability of the coming catastrophic failures of the system. From page 6:

"Recommendation (paragraph 121)

“Sealed envelopes” are a vital mechanism if sensitive information is to be held on the SCR. We recommend that:
• The right to break the seal protecting information in “sealed envelopes” should only be held by patients themselves, except where there is a legal requirement to override this measure; and
• Information in “sealed envelopes” should not be made available to the Secondary Uses Service under any circumstances; this will allow patients to prevent data being used for research purposes without their consent.

The Government accepts the first of these recommendations. Patient-sealed envelopes provide the mechanism whereby patients can restrict access to the parts of their SCR they consider to be particularly sensitive. Patients will be able to request that parts of their record are either ‘sealed’ or ‘sealed and locked’. These procedures form a level of access control deployed at the direction of the patient, not the NHS.

Sealed information will be recorded on the SCR and system users will be aware that some information has been sealed. However, access to the sealed information from outside of the team recording it will be obtainable only with the patient’s consent or in exceptional circumstances. Only those users with the necessary privileges will be able to gain temporary access to sealed information without the patient’s consent. A privacy officer will be alerted to the temporary access by any user and patients registered with HealthSpace will receive a notification when access permissions are changed or when temporary access is gained.

Sealed and locked information cannot be accessed outside of the team that recorded it. Users who do not have permission to access the sealed and locked information will be unaware of its presence.

The circumstances where patient-identifiable sealed and locked information may be lawfully disclosed by the clinical team that has access to it, and the circumstances where patient-identifiable information that is simply ‘sealed’ can be accessed by those outside of the team that recorded it, without the patient’s consent, are essentially the same. They are limited to circumstances where the information is required by law or where a significant public interest justification exists (for example, serious crime, child protection etc).

The Government does not accept the second of the recommendations. Patient consent to the use of anonymised or effectively pseudonymised data is not required by law and the use of such data for secondary uses, including research, is both accepted and actively promoted by the relevant professional and regulatory bodies. The Committee received strong evidence on the need for health information to be made available for research from a number of organisations. The design of the Secondary Uses Service ensures that patient confidentiality is protected."

So they reject the notion that sealed envelope (that is, confidential) medical data be kept out of the 'secondary uses service', the database that lots of folk from civil servants to researchers have access to. The claim that "the design of the Secondary Uses Service ensures that patient confidentiality is protected" is patently false when the data going into the secondary uses service will be neither confidential nor anonymised.
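
For what it's worth, the access rules set out in the Government's response quoted above can be sketched in a few lines of Python. This is only my reading of the quoted text, not anything published by the NHS or Connecting for Health. The names `Seal`, `Entry`, `AccessLog`, `is_visible` and `can_read` are made up for illustration, and I have left out the legal and public-interest disclosures by the recording team that the response also mentions.

```python
from dataclasses import dataclass, field
from enum import Enum


class Seal(Enum):
    NONE = "none"
    SEALED = "sealed"
    SEALED_AND_LOCKED = "sealed and locked"


@dataclass
class Entry:
    recording_team: str          # hypothetical: the team that recorded the entry
    seal: Seal = Seal.NONE


@dataclass
class AccessLog:
    # Hypothetical stand-in for "a privacy officer will be alerted".
    privacy_officer_alerts: list = field(default_factory=list)


def is_visible(entry, user_team):
    """Can this user tell that the entry exists at all?"""
    if entry.seal is Seal.SEALED_AND_LOCKED:
        # Users outside the recording team are unaware of its presence.
        return user_team == entry.recording_team
    # Unsealed or merely sealed: users know something is there.
    return True


def can_read(entry, user_team, patient_consents=False, privileged=False, log=None):
    """Can this user read the content of the entry?"""
    if user_team == entry.recording_team or entry.seal is Seal.NONE:
        return True
    if entry.seal is Seal.SEALED_AND_LOCKED:
        # Cannot be accessed outside the team that recorded it.
        return False
    # Sealed only: needs patient consent, or privileged temporary access.
    if patient_consents:
        return True
    if privileged:
        if log is not None:
            # Temporary access without consent triggers an alert.
            log.privacy_officer_alerts.append((user_team, entry.recording_team))
        return True
    return False
```

Note how much rests on that `privileged` flag: on this reading, a merely 'sealed' entry is only as private as the process for handing out privileges and following up the alerts.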

Look out for Ross Anderson's analysis which will hopefully appear at Light Blue Touchpaper soon. Ross was special adviser to the committee.

Texas evoting anomalies

BlackBoxVoting are reporting that there were some anomalies with ES&S iVotronic e-voting machines in Texas last week.

"When Wharton County, Texas citizen Jim Welch voted last Tuesday, he watched in disbelief as the voting machine changed the vote he'd entered a few moments earlier.

The machine was an ES&S iVotronic touch-screen, the same model recently subjected to a blistering Dan Rather investigative report, but what Welch witnessed does not seem explainable as a manufacturing defect or screen calibration problem like those exposed in Rather's report...

"Vote-flipping" on touch-screens has been documented before. Manufacturers claim votes show up for a different choice than that chosen by the voter sometimes, explaining that this is due to miscalibration of the computer's touch-screen...

What Welch witnessed was votes that registered CORRECTLY when he touched the screen, switching later to a different vote choice, when he was almost finished voting the full page.

Welch was stunned to see a correctly marked vote take on a life of its own, hopping over to a different spot while he voted on other items. He called an elections worker over to show him the problem. The elections worker helped him re-vote the ballot, and both men watched as the vote registered correctly, but later spontaneously altered to shift to another ballot choice...

What Welch saw was not a screen calibration problem because it registered on the screen correctly. It was not "voter error" because he literally watched the vote re-write itself to another selection, not once, but twice.

The election worker called the Wharton County elections office. Welch was astute enough to see that the suggested solution was not responsive to the real issue:

"You may continue on with this ballot if you like," said the elections worker after conferring with Wharton County elections personnel, "Or I can void this and you can start over."

This is a machine that had already demonstrated it can't be trusted. This is a machine that would fail the much-touted "Logic & Accuracy" testing purported to prove voting machines don't cheat. This is a machine that would not have passed certification tests had it performed this way for the test labs. This is a machine that has no business counting votes at all...

Jim Welch spoke with Wharton County Clerk Judy Owens about the matter, and she provided answers that were even more unrelated to the problem:

"You can go back and check your vote before casting it," she pointed out, referring to the voter's ability to page back one by one to review each panel. But if the machine can alter a vote – especially if the timing is such that this happens after you have moved to a new page – what good will that do?

"We can print each vote out," she said, but Welch astutely questioned how and when votes can be printed. They aren't printed at the same time as the voter votes, and the printouts simply re-create what the computer program records, so what good is that?"

Fox fight for fair use of 'When you wish upon a star'

Here's an interesting case, Bourne v Fox. Bourne is the sole copyright owner of the song made famous by Disney, 'When You Wish Upon a Star.' Julie Hilden explains the background at Findlaw.

"Fox's animated show "The Family Guy" is being sued for copyright violation - for the second time this year.

The plaintiff in the suit is the company that owns the rights to the song "When You Wish Upon a Star." The suit, filed October 3, alleges that the song was combined with what some have suggested were anti-Semitic lyrics. (The full lyrics can be found on page 9 of the complaint, and they refer to Jews as having "killed my Lord.")

The song, entitled "I Need a Jew" was featured in an episode called "When You Wish Upon a Weinstein." (The "Weinstein" involved is a non-famous person named "Max," though the selection of last name may well have intentionally evoked brothers Bob and Harvey, founders of Miramax, whose father was also named Max.) In the episode, the reason the singer says he "needs a Jew" is to handle his money after a financial reversal - providing some basis for claims of anti-Semitic stereotyping, at least on the character's part...

If the use of "When You Wish Upon a Star" was a commentary on the original, it was a commentary only in the very loosest possible sense. Indeed, I believe most viewers would not deem the episode to have been "about" the song "When You Wish Upon a Star"...

That's a problem for "The Family Guy," because a parody, to be a parody, has to have an object. Moreover, it has to have the right object: If the parody's object isn't the very material that is appropriated, then there is no need to use the "recognizable sight or sound" to evoke that material. Deciding to do a parody of one particular work doesn't give you an all-purpose "license to infringe" that you can use with respect to any work you want."

Squares very nicely with threatening selected Republican presidential candidates with copyright infringement suits, doesn't it?

Monday, November 12, 2007

NHS Care Record data safety fears grow

From Pulse:

"Staff from across the NHS are accessing sensitive patient-identifiable data through the controversial Secondary Uses Service, Pulse has learnt.

The revelation has sparked fresh fears over the safety of data from Summary Care Records, which will be linked to the SUS when they are rolled out across England next year.

New guidance from Connecting for Health reveals three users in every organisation within the NHS have been given access to patient-identifiable information contained within Commissioning Data Sets and Payment by Results data.

The guidance admits ‘this appears to be in total contradiction to the purpose of SUS’, which was supposed to protect patient data through pseudonymisation."

Senators want Justice Department to sue P2P pirates

The latest in a long line of attempts to get the Department of Justice to pick up the entertainment industry's legal bills has been presented to the Senate. The Intellectual Property Enforcement Act of 2007 has previously been passed by the Senate on three occasions but didn't manage to make it into law. It has been known in the past as the PIRATE Act.

"A July 2002 letter from prominent politicians to U.S. Attorney General John Ashcroft urged the prosecution of Americans who "allow mass copying from their computer over peer-to-peer networks."

But the Justice Department has been less than eager to file criminal charges against people like Jammie Thomas, who recently was found liable for $222,000 in damages in a lawsuit brought by the RIAA. Federal prosecutors have indicated that they're hesitant to target peer-to-peer pirates with criminal charges for two reasons: Imprisoning file-swapping teens on felony charges isn't the department's top priority, and it's difficult to make criminal charges stick.

The relative ease of winning civil cases compared to criminal prosecutions is one big reason why the RIAA and MPAA adore the Pirate Act, called the Intellectual Property Enforcement Act in its latest incarnation. The burden of proof is lower, and a civil defendant has far fewer rights under the law.

There are two other benefits for copyright holders. It's cheaper for copyright holders because they don't have to take the risk of hiring expensive lawyers to sue a defendant who's judgment-proof (and can't cough up a check if found liable). And judges and juries may be more likely to side with Justice Department prosecutors, who claim they're looking out for the public interest, than law firms employed by the for-profit companies comprising the RIAA."

The FBI would also get more money to act as the entertainment industries' dedicated police force. How kind.

Update: Meanwhile the Congress folks are busy with some proposed legislation which would force colleges to police networks for copyright infringement and pay for music subscription services.

"New federal legislation says universities must agree to provide not just deterrents but also "alternatives" to peer-to-peer piracy, such as paying monthly subscription fees to the music industry for their students, on penalty of losing all financial aid for their students.

The U.S. House of Representatives bill (PDF), which was introduced late Friday by top Democratic politicians, could give the movie and music industries a new revenue stream by pressuring schools into signing up for monthly subscription services such as Ruckus and Napster. Ruckus is advertising-supported, and Napster charges a monthly fee per student...

According to the bill, if universities did not agree to test "technology-based deterrents to prevent such illegal activity," all of their students--even ones who don't own a computer--would lose federal financial aid.

The prospect of losing a combined total of nearly $100 billion a year in federal financial aid, coupled with the possibility of overzealous copyright-bots limiting the sharing of legitimate content, has alarmed university officials."

2 students face 20 years in jail for hacking to change grades

Two California students reportedly face a possible 20-year jail term for hacking into their university's system to change their grades.

Sunday, November 11, 2007

On being in bed with Google - in defence of the Google library project

The Google library project has come in for some criticism in relation to the restrictions (and non-disclosure agreements) Google placed on the libraries taking part. Paul Courant, library chief at the University of Michigan, which is involved in the Google library project, says the critics have got it wrong.

"One of the things that surprises me most about reactions to the Google Library Project is that smart people whom I respect seem to think that the only reason that a university library would be involved with Google is because, in some combination, its leadership is stupid, evil, or at best intellectually lazy. To the contrary, although I may be proved wrong, I believe that the University of Michigan (and the other partner libraries) and Google are changing the world for the better. Four years from now, all seven million volumes in the University of Michigan Libraries will have been digitized – the largest such library digitization project in history. Google Book Search and our own MBooks collection already provide full-text access to well over a hundred thousand public domain works, and make it possible to search for keywords and phrases within hundreds of thousands more in-copyright materials. This access is altering the way that we do research. At least as important, the project is itself an experiment in the provision and use of digitized print collections in large research libraries. I do not see how we can discover the best ways to use such collections without experiments at this scale. In sum, I believe that our library is doing exactly what it should do in the best interests of scholarship and our users, now and in the future.

So I’m puzzled when people ask, “How could serious libraries be doing this? How could they abdicate their responsibilities as custodians of the world’s knowledge by offering their collections up as a sacrifice on the altar of corporate power? Why don’t they join the virtuous ranks of the Open Content Alliance partners, who pay thousands of dollars to digitize books at a rate of tens of thousands of volumes a year?” It seems like those who ask such questions have little appreciation of what Michigan and the other Google partners are actually up to.

Google is on pace to scan over 7 million volumes from U-M libraries in six years at no cost to the University. As part of our arrangement with Google, they give us copies of all the digital files, and we can keep them forever. Our only financial outlay is for storage and the cost of providing library services to our users. Anyone who searches U-M’s library catalog, Mirlyn, can access the scanned files via our MBooks interface. That’s right, anyone. (Copyright law constrains what we can display in full text, and what we can offer only for searching, but we share as much as we can consistent with prudent interpretations of the law.) For an example of an MBook, take a look at The Acquisitive Society by R. H. Tawney.

In a recent New York Times article about mass digitization projects, Brewster Kahle was quoted as saying: “Scanning the great libraries is a wonderful idea, but if only one corporation controls access to this digital collection, we’ll have handed too much control to a private entity.”

I agree with him. I’m an economist with a particular interest in public goods, which is how I came to be involved with libraries in the first place. Libraries have a long and honorable history of preserving information and making it accessible. Moreover, even at their best, for-profit institutions cannot be expected to serve general public interests when those interests run counter to those of their shareholders. So I would be distressed if a single corporation controlled access to the collections of the great academic libraries, just as I find it troubling, on a smaller scale, that a handful of publishers control access to much of the current scientific literature.

But Google has no such control. After Google scans a book, they return the book to the library (like any other user), and they give us a copy of the digital file. Google is not the only entity controlling access to the collection – the University of Michigan and other partner libraries control access as well. Except we don’t think of it as controlling access so much as providing it."

Siva Vaidhyanathan has a response, which also includes a follow-up response from Courant in the comments. Vaidhyanathan says:

"Sadly, Paul does not actually address the real-world consequences of the Google project:

• He dismisses serious search problems as temporary, yet fails to confront the problem that Google cannot and will not explain the factors and standards that put one book above another in search results.

• As users discover poorly-scanned files on the Google index, how can they alert Google to the problem? Why does nothing in the contract between Michigan and Google include quality-control standards or methods?

• How do we know this index will last for decades? What image file system is Google using and what ensures its preservation?

• How is the "library copy," that electronic file that Michigan and others receive as payment for allowing Google to exploit their treasures, NOT an audacious infringement of copyright? It violates both the copyright holder's right to copy and right to distribute. Doesn't a university library have an obligation to explain this?

• What about user confidentiality? Why have universities failed to make a stand on this issue?

I look forward to responses from Paul and others. I have been waiting two years for them, of course. And all I get is the silence created by non-disclosure agreements.

BTW, should public university librarians be signing non-disclosure agreements about their core services?"

Courant is again robust in responding and it is good to see the debate getting a public airing. Thanks to Peter Suber for the link.

Give us the data and forget the shiny interface

Rufus Pollock has a crucially important point to make about data projects: forget the shiny interface, as it mainly gets in the way.

"One thing I find remarkable about many data projects is how much effort goes into developing a shiny front-end for the material. Now I’m not knocking shiny front-ends, they’re important for providing a way for many users to get at the material (and very useful for demonstrating to funders where all the money went). But shiny front ends (SFEs from now on) do have various drawbacks:
  • They often take over completely and start acting as a restriction on the way you can get data out of the system. (A classic example of this is the Millennium Development Goals website which has lots of shiny ajax which actually make it really hard to grab all of the data out of the system — please, please just give me a plain old csv file and a plain old url).
  • Even if the SFE doesn’t actually get in the way, they do take money away from the central job of getting the data out there in a simple form, and …
  • They tend to date rapidly. Think what a website designed five years ago looks like today (hello css). Then think about what will happen to that nifty ajax+css work you’ve just done. By contrast ascii text, csv files and plain old sql dumps (at least if done with some respect for the ascii standard) don’t date — they remain forever in style.
  • They reflect an interface centric, rather than data centric, point of view. This is wrong. Many interfaces can be written to that data (and not just a web one) and it is likely (if not certain) that a better interface will be written by someone else (albeit perhaps with some delay). Furthermore the data can be used for many other purposes than read-only access. To summarize: The data is primary, the interface secondary.
  • Taking this issue further, for many projects, because the interface is taken as primary, the data does not get released until the interface has been developed. This can cause significant delay in getting access to that data.

When such points are made people often reply: “But you don’t want the data raw, in all its complexity. We need to clean it up and present it for you.” To which we should reply:

“No, we want the data raw, and we want the data now”"

I couldn't agree more, particularly in the context of open access content projects, and this message should be embedded in the DNA of the folks who run or facilitate such projects.
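
To make the "data is primary, the interface secondary" point a little more concrete, here is a minimal sketch of my own (not Pollock's code) using only Python's standard csv module. The records, field names and helper functions are all hypothetical; the idea is simply that once the data exists as plain CSV text, anyone can read it back and build whatever interface or calculation they like on top of it.

```python
import csv
import io

# Hypothetical sample records; a real project would pull these from its own database.
RECORDS = [
    {"country": "A", "year": 2005, "value": 10.5},
    {"country": "B", "year": 2005, "value": 7.25},
]


def dump_csv(records, fieldnames):
    """Serialise a list of dicts to plain CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def load_csv(text):
    """Parse CSV text back into a list of dicts (every value comes back as a string)."""
    return list(csv.DictReader(io.StringIO(text)))


# The raw dump is the thing to publish at a plain old URL.
raw = dump_csv(RECORDS, ["country", "year", "value"])

# Any consumer can then write their own "interface" over the same data,
# for example a simple total of the value column.
rows = load_csv(raw)
total = sum(float(r["value"]) for r in rows)
```

Nothing here depends on a web front end: the same raw text could feed a spreadsheet, a script or a shinier site written later by someone else.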