Monday, November 05, 2012

Mr Gove touting access to National Pupil Database

The Department for Education is holding a "Consultation on proposed amendments to individual pupil information prescribed persons regulations", a title likely, should it come to any notables' attention, to provoke a collective yawn.

Further down the consultation page, though, it is explained that this is
"A consultation on proposals to amend regulations to enable the Department for Education to share extracts of data held in the National Pupil Database for a wider range of purposes than currently possible.
The aim is to maximise the value of this rich dataset."
Seriously?

The current government really want to provide corporate and wider access to intimate details of school children's files? The NPD holds up to 400 variables on over half a million children including names, addresses, 'looked after status', 'in need status', birth dates, gender, ethnicity, first language, eligibility for free school meals, information about special educational needs (SEN), exam results, attendance, reasons for absence and exclusions. 

Here in full is what the Secretary of State for Education, Michael Gove, told MPs this week:
"I am today launching a public consultation on proposals to amend the Education (Individual Pupil Information) (Prescribed Persons) (England) Regulations 2009 to enable the Department for Education to share extracts of data held in the National Pupil Database for a wider range of purposes than currently possible in order to maximise the value of this rich dataset.
The National Pupil Database holds one of the richest educational datasets in the world and forms a significant part of the education evidence base. It is a longitudinal database which holds information on children in schools in England. This includes pupil level data relating to school attended, teacher assessments, test and exam results by subject, prior attainment, progression and pupil characteristics.
We have already significantly expanded the content of school performance tables for primary and secondary schools and were commended in the National Audit Office report “Implementing Transparency” (April 2012) for opening up access to our data. Recently, we have also improved the application arrangements for requesting access to data from the National Pupil Database under our existing regulations for those who need pupil level data for research purposes.
However, we are aware that the existing Prescribed Persons Regulations may prevent some potentially beneficial uses of the data by third-party organisations, as use is currently restricted to “research into educational achievement”. For example, we have had to reject requests to use the data for analysis on sexual exploitation, the impact on the environment of school transport, and demographic modelling, all of which seem to be legitimate and fruitful areas for further research.
We want to give organisations greater freedom to use extracts of the data for wider purposes, while still ensuring its confidentiality and security. Existing arrangements for access to the data would apply to all future requests: all requests to access extracts of data would go through a robust approval process and successful organisations would be subject to strict terms and conditions covering their handling and use of the data, including having appropriate security arrangements in place. Organisations granted access would need to comply with the Data Protection Act, and any reports, statistical tables, or other products published or released, would need to fully protect the identity of individuals.
Amending these regulations should encourage more organisations to use the data for wider research, such as socio-economic analysis, or research into equality issues, including disability, gender or race. It could also help stimulate the market for innovative tools and services which present anonymised versions of the data.
If, having listened to the views expressed in the public consultation and subject to the will of the House, I decide to proceed with the proposed amendments, I expect the revised regulations to come into force in spring 2013.
The public consultation on this proposal will commence today and run for six weeks. A consultation document containing full details of this proposal and how interested parties can respond to the consultation will be published on the Department for Education website. Copies of that document will also be placed in the House Libraries."
Let's just pick out a couple of points from this.

Mr Gove wants "to give organisations greater freedom to use extracts of the data for wider purposes, while still ensuring its confidentiality and security." That will be a neat trick. I wonder if the Secretary of State has ever heard of diametrically opposed, mutually exclusive goals? Well, whether he has or not, he's found some here, and he shouldn't need a Ross Anderson or a Bruce Schneier to explain why.

Next he says "Organisations granted access would need to comply with the Data Protection Act, and any reports, statistical tables, or other products published or released, would need to fully protect the identity of individuals... It could also help stimulate the market for innovative tools and services which present anonymised versions of the data."

Note that not even minimal effort is to be made by government to anonymise the data (imperfect though those methods undoubtedly are - anonymisation is really difficult) in advance of release. It is the organisations given access to the data who will have to pay lip service to "fully protecting [sic] the identity of individuals". And the private sector can beta test "tools and services which present anonymised versions of the data" on real live kids' personal details.
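To make the "anonymisation is really difficult" point concrete, here is a minimal sketch of why stripping names alone does not anonymise pupil-level data. Everything in it is an assumption for illustration: the records are invented, the field names are hypothetical rather than the actual NPD schema, and the check shown (counting how many records share the same combination of "quasi-identifiers") is only one simple measure of re-identification risk, not a description of what the Department or any applicant actually does.

```python
from collections import Counter

# Invented, name-free records. Field names are hypothetical, not the NPD schema.
RECORDS = [
    {"birth_date": "2001-03-14", "gender": "F", "postcode_district": "OX1", "fsm": True},
    {"birth_date": "2001-03-14", "gender": "F", "postcode_district": "OX1", "fsm": False},
    {"birth_date": "2001-07-02", "gender": "M", "postcode_district": "OX1", "fsm": False},
    {"birth_date": "2002-11-30", "gender": "F", "postcode_district": "OX2", "fsm": True},
    {"birth_date": "2002-11-30", "gender": "M", "postcode_district": "OX2", "fsm": False},
]


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group(records, quasi_identifiers):
    """Size of the smallest group; a value of 1 means someone is uniquely identifiable."""
    return min(group_sizes(records, quasi_identifiers).values())


def unique_fraction(records, quasi_identifiers):
    """Fraction of records that are the only one with their combination of values."""
    sizes = group_sizes(records, quasi_identifiers)
    unique = sum(1 for size in sizes.values() if size == 1)
    return unique / len(records)


if __name__ == "__main__":
    coarse = ["postcode_district"]
    fine = ["birth_date", "gender", "postcode_district"]
    print("smallest group, coarse:", smallest_group(RECORDS, coarse))
    print("smallest group, fine:", smallest_group(RECORDS, fine))
    print("unique fraction, fine:", unique_fraction(RECORDS, fine))
```

Even in this toy set, combining birth date, gender and a coarse postcode leaves most records unique, so anyone who knows those three facts about a child can read off the sensitive column. With hundreds of variables per pupil the problem only gets worse.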

Just for starters, the whole thing breaches Kim Cameron's first three laws of identity.
1. User control and consent - technical identity systems must only reveal information identifying a user with the user's consent.
2. Minimum disclosure for constrained use - the solution that discloses the least amount of identifying information and best limits its use is the most stable long-term solution.
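3. Justifiable parties - disclosure of identifying information must be limited to parties with a necessary and justifiable place in the identity relationship; here the information will be in the control of, or at least accessible by, parties who have no right to it.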
But then current government practice on facilitating access to this database already does that. The difference with this new proposal is one of scale, and that's almost impossible to explain to a politician. If anyone has any bright ideas on breaking through this cognitive fog in a way that the average government minister would understand, answers on a postcard please, or preferably in the comments below...

When Privacy International warned in the summer that the Department for Education had sponsored an "appathon", allowing attendees access to the National Pupil Database, I wasn't really too concerned, despite their reasonable questions at the time:
"1) What data access arrangements will attendees be provided with?
2) What legal commitments will attendees be required to make?
3) What data protection/management guidance will be given to attendees?
3b) Given the use of an API, will synthetic data be provided for testing/debugging/public exhibition?
4) Who gave permission in the first instance and can we see the letter of agreement?"
Computer scientists have been warning for decades of the practical problems of securing valuable databases. European and (to a lesser degree) US courts have shown an inclination to step in to correct calculated or negligent mismanagement of, or unauthorised access to, such systems. The degree of such intervention by the courts would suggest they would look unkindly on untrammelled access to the NPD of the kind that Mr Gove appears to favour. But by the time that happens the damage would already be done.

Look, the enthusiasm for opening up government datasets is encouraging when it doesn't compromise personal privacy. But transparency is not automatically the right thing in all circumstances.

The problem of striking a balance between protecting privacy and facilitating empirical research/use of valuable datasets, like medical records or the NPD, in the public interest, is potentially one of the defining political issues of the 21st century. The technical and legal problems are also almost impossibly challenging, as FIPR, Paul Ohm and others have illustrated.

The NPD, however, is not the sandpit in which to be experimenting with all this.
