Tuesday, July 15, 2008

Google and Viacom deal on anonymising YouTube data

The YouTube blog and the Guardian are suggesting that Google and Viacom have reached an agreement on anonymising the data from the YouTube logging database that a court ordered Google to hand over to Viacom. From the Guardian:

"Google has struck a deal to protect the personal data of millions of YouTube users in the $1bn (£497m) copyright court case brought against the video-sharing website by Viacom.

Under the deal, Google will make user information and internet protocol addresses from its YouTube subsidiary anonymous before handing over the data to Viacom in the US legal case."

YouTube blog (Google) says:

"As we let you know on July 4, YouTube received a court order to produce viewing history data. We are pleased to report that Viacom, MTV and other litigants have backed off their original demand for all users' viewing histories and we will not be providing that information. (Read the official legalese here.)

In addition, Viacom and the plaintiffs had originally demanded access to users' private videos, our search technology, and our video identification technology. Our lawyers strongly opposed each of those demands and the court sided with us.

We'll keep you informed of any important developments in this lawsuit. We remain committed to protecting your privacy and we'll continue to fight for your right to share and broadcast your work on YouTube."

As with all these things, the devil will be in the detail. The key provision in the agreement is:

"1. Substituted Values: When producing data from the Logging Database pursuant to the Order, Defendants shall substitute values while preserving uniqueness for entries in the following fields: User ID, IP Address and Visitor ID. The parties shall agree as promptly as feasible on a specific protocol to govern this substitution whereby each unique value contained in these fields shall be assigned a correlative unique substituted value, and preexisting interdependencies shall be retained in the version of the data produced. Defendants shall promptly (no later than 7 business days after execution of this Stipulation) provide a proposed protocol for this substitution. Defendants agree to reasonably consult with Plaintiffs’ consultant if necessary to reach agreement on the protocol."

And the key wording in that provision is "each unique value contained in these fields shall be assigned a correlative unique substituted value, and preexisting interdependencies shall be retained in the version of the data produced". In other words, each substituted value still ties together all the records belonging to one user, so they can work back to identifying people at a later date, when the dust has settled on the negative publicity. Google are also going to have to be very careful with the protocol they use for substituting the entries.
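To make the concern concrete, here is a minimal sketch of what a substitution of this kind might look like. It is not the protocol the parties agreed (that had not been published at the time of writing); the field names, the sample rows and the `substitute_values` helper are all my own assumptions for illustration.

```python
from itertools import count

# Hypothetical field names, modelled on the three fields named in the stipulation.
FIELDS = ("user_id", "ip_address", "visitor_id")


def substitute_values(rows):
    """Replace each distinct value in FIELDS with a stable, unique stand-in.

    Returns the produced rows plus the lookup tables. Whoever holds the
    tables can reverse the substitution.
    """
    tables = {field: {} for field in FIELDS}
    counters = {field: count(1) for field in FIELDS}
    produced = []
    for row in rows:
        new_row = dict(row)
        for field in FIELDS:
            table = tables[field]
            original = row[field]
            if original not in table:
                # Sequential tokens keep the sketch readable; they leak the
                # order of first appearance, so a real protocol would want
                # random or keyed tokens instead.
                table[original] = f"{field}-{next(counters[field]):06d}"
            new_row[field] = table[original]
        produced.append(new_row)
    return produced, tables


# Invented sample log rows, purely for illustration.
rows = [
    {"user_id": "alice", "ip_address": "192.0.2.10", "visitor_id": "v1", "video": "clip-a"},
    {"user_id": "alice", "ip_address": "192.0.2.10", "visitor_id": "v1", "video": "clip-b"},
    {"user_id": "bob", "ip_address": "192.0.2.77", "visitor_id": "v2", "video": "clip-a"},
]

produced, tables = substitute_values(rows)

# The real identifiers are gone, but each stand-in still links a full viewing history.
history = {}
for row in produced:
    history.setdefault(row["user_id"], []).append(row["video"])
```

In this sketch `history` groups every video under one substituted user ID, which is exactly what "preexisting interdependencies shall be retained" implies: the names are replaced, but the per-person viewing profile survives intact.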

The upshot is that the data will not be truly anonymised but Viacom and Google will probably each claw back some points in the PR stakes (for effort), Viacom in particular having been caught off guard by the negative public reaction to their courtroom success.

Viacom's page on the litigation is here.
