This fourth article of our cloud computing and privacy series (links to our previous articles below) addresses the topic of anonymisation and pseudonymisation of data in the cloud computing context.
Rendering data anonymous or pseudonymous has been a hot topic in the past few years, in particular with the emergence of new phenomena such as big data, the Internet of Things (IoT) and cloud computing. All of these require, at some point, taking into account issues relating to privacy and the processing of personal data. It is therefore only natural that there has been growing interest in techniques that would eliminate, or at least mitigate, the risks related to the processing of such data.
In this article we examine to what extent anonymisation, and in some cases even pseudonymisation, could play a role in a cloud computing environment. To that end, in the following sections we examine how anonymisation and pseudonymisation techniques are acknowledged in the EU and in certain key Member States(1).
Anonymisation: a new concept in the EU?
In spite of the relatively recent interest in the issues related to anonymisation, the EU Data Protection Directive (95/46/EC) already addressed the question in 1995, putting forth the following logic under Recital 26:
- The principles of data protection must apply to any information concerning an identified or identifiable person;
- To determine whether a person is identifiable, account should be taken of all the means likely reasonably to be used either by the controller or by any other person to identify the said person;
- The principles of protection shall not apply to data rendered anonymous in such a way that the data subject is no longer identifiable.
Although the above provides an interesting basis, it is not sufficient to understand precisely what the notion of 'anonymisation' and the related concept of 'pseudonymisation' encompass.
Anonymisation and pseudonymisation: blurred concepts?
Anonymisation is a process by which information is manipulated (concealed or hidden) to make it difficult to identify data subjects(2). This can be done either by deleting or omitting "identifying details" or by aggregating information(3). Pseudonymisation, on the other hand, involves replacing names or other direct identifiers with codes or numbers(4).
One possible technique of pseudonymisation is encryption. Encryption is the process of changing plain text into unintelligible code(5). The use of encryption has been tipped as essential for the wider adoption of cloud computing services(6). In that respect, it has been argued that as long as the encryption is effective, which requires a strong encryption algorithm and a strong encryption key that is kept secure, the data may not be considered personal in the hands of the cloud service provider ("CSP"). Indeed, the CSP may not know that encrypted data is stored on its (or its sub-provider's) infrastructure, considering it is uploaded by the customer in self-service fashion(7). Nevertheless, in its 'Opinion 05/2014 on Anonymisation Techniques', the Article 29 Working Party (the "Working Party") takes a stricter approach (see below).
The Working Party considers that equating pseudonymisation with anonymisation is a common misconception among data controllers. This is because pseudonymised data still allows an individual data subject to be singled out and linked across different data sets. Therefore, in most instances it can be concluded that pseudonymised data remains subject to the data protection rules(9). Accordingly, all privacy and data protection principles fully apply.
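The Working Party's point about linkability can be illustrated with a minimal sketch. It assumes a keyed hash (HMAC-SHA256) as the pseudonymisation function; the key, the datasets, the field names and the helper functions are all hypothetical and are not taken from the Opinion.

```python
import hashlib
import hmac

# Hypothetical secret key held by the data controller (illustrative only).
SECRET_KEY = b"controller-held-secret"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed code (HMAC-SHA256, truncated)."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def pseudonymise_records(records):
    """Return copies of the records with 'name' replaced by a pseudonym 'pid'."""
    return [
        {"pid": pseudonymise(r["name"]),
         **{k: v for k, v in r.items() if k != "name"}}
        for r in records
    ]


# Two separate, made-up datasets that happen to concern the same person.
health_records = [
    {"name": "Alice Example", "diagnosis": "asthma"},
    {"name": "Bob Example", "diagnosis": "diabetes"},
]
purchase_records = [
    {"name": "Alice Example", "item": "inhaler"},
    {"name": "Carol Example", "item": "book"},
]

health_pseudo = pseudonymise_records(health_records)
purchase_pseudo = pseudonymise_records(purchase_records)

# The names are gone, yet the same person carries the same code in both
# datasets, so the records can still be linked to one another.
purchases_by_pid = {r["pid"]: r for r in purchase_pseudo}
linked = [
    {**h, "item": purchases_by_pid[h["pid"]]["item"]}
    for h in health_pseudo
    if h["pid"] in purchases_by_pid
]
print(linked)
```

In this sketch the pseudonym hides the name but still works as a stable key across both datasets, which is why such data would, on the Working Party's reasoning, generally remain personal data.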
Anonymisation and pseudonymisation: guidance and recognition
Recital 26 of the Data Protection Directive specifies that codes of conduct may be a useful instrument for providing guidance as to the ways in which data may be rendered anonymous and retained in a form in which identification of the data subject is no longer possible. Our study shows that almost two decades were necessary to see the emergence of comprehensive opinions and/or decisions at EU and national levels on the topic of anonymisation.
At EU level, the Working Party adopted on 10 April 2014 'Opinion 05/2014 on Anonymisation Techniques', already mentioned above (link), which analyses in depth the effectiveness but also the limits of anonymisation techniques.
After underlining the legal background(10), the Working Party concludes that the "underlying rationale is that the outcome of anonymisation as a technique applied to personal data should be, in the current state of technology, as permanent as erasure, i.e. making it impossible to process personal data".
Moreover, Opinion 05/2014 highlights four key features:
- Anonymisation can be a result of processing personal data with the aim of irreversibly preventing identification of the data subject;
- Several anonymisation techniques may be envisaged; there is no prescriptive standard in EU legislation;
- Importance should be attached to contextual elements; and
- A risk factor is inherent to anonymisation.
Since Opinion 05/2014 underlines that "anonymisation constitutes a further processing of personal data", the process of anonymisation must comply with the test of compatibility with the original purpose. According to the Working Party, for the anonymisation to be considered as compatible with the original purpose of the processing, the anonymisation process should produce reliably anonymised information.
However, assessing the anonymisation process as compatible or incompatible with the original purpose might not represent a sound approach. This is because, for example, anonymisation could be used to comply with Article 6(1)(e) of the Data Protection Directive, which requires that information should be kept in a form that permits identification for no longer than is necessary for the purposes for which the data were collected or for which they are further processed. In this sense, anonymisation might constitute a compulsory processing activity that enables a controller to comply with its data protection duties(11).
According to the Working Party, once data is truly anonymised and individuals are no longer identifiable, EU data protection rules no longer apply. However, some commentators have been critical of this proposition on the basis that the Working Party applies an absolute definition of acceptable risk in the form of zero risk(12). First, the Data Protection Directive itself does not require a zero risk approach. Second, if the acceptable risk threshold is zero for any potential recipient of the data, there is no existing technique that can achieve the required degree of anonymisation. This would imply that anonymisation or, for that matter, the sharing of data would only be possible on the basis of the legitimate grounds (including consent) listed in Article 7 of the Data Protection Directive(11). This might encourage the processing of data in identifiable form, which in fact presents higher risks.
Therefore, the notion of identifiability should be approached in light of the "all means likely reasonably" test provided in Recital 26 of the Data Protection Directive. In other words, the question rests on whether identification has become "reasonably" impossible. This would be measured mainly in terms of both the time and the resources required to identify the individual(12). Accordingly, if it is not reasonably possible, given the time, expense, technology and labour required, to associate the data with a particular individual, then the data would remain non-personal. Another factor that needs to be considered is whether there is any kind of data in the hands of the controller or any other person that could be used to identify the individual. For example, if a data controller keeps the original (identifiable) data and hands over part of this dataset to another party after removing or masking the identifiable data, the resulting dataset will still constitute personal data(8).
In the third and substantial section of Opinion 05/2014, the Working Party examines the various anonymisation practices and techniques, elaborating on the robustness of each technique based on three cumulative questions:
- Is it still possible to single out an individual?
- Is it still possible to link records relating to an individual?
- Can information be inferred concerning an individual?
According to the Working Party, "knowing the main strengths and weaknesses of each technique helps to choose how to design an adequate anonymisation process in a given context".
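As a rough illustration of the first and third questions, the sketch below groups a made-up dataset by its remaining indirect identifiers, flags records that are unique (singling out), and flags groups in which every record shares the same sensitive value (inference). The dataset, the choice of indirect identifiers and the function names are assumptions made for illustration only. The second question (linkability) would require a second dataset and is not covered here.

```python
from collections import defaultdict

# Hypothetical released dataset: no names, but indirect identifiers remain.
records = [
    {"age_range": "30-39", "postcode": "1000", "condition": "asthma"},
    {"age_range": "30-39", "postcode": "1000", "condition": "asthma"},
    {"age_range": "40-49", "postcode": "1050", "condition": "diabetes"},
    {"age_range": "50-59", "postcode": "1000", "condition": "flu"},
    {"age_range": "50-59", "postcode": "1000", "condition": "asthma"},
]
QUASI_IDENTIFIERS = ("age_range", "postcode")  # assumed for illustration
SENSITIVE = "condition"                        # assumed for illustration


def group_by_quasi_identifiers(rows, keys=QUASI_IDENTIFIERS):
    """Group rows that share the same combination of indirect identifiers."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)


def singling_out_risks(rows):
    """Combinations matched by exactly one record: that record is unique."""
    return [key for key, group in group_by_quasi_identifiers(rows).items()
            if len(group) == 1]


def inference_risks(rows):
    """Combinations whose records all share one sensitive value."""
    return [key for key, group in group_by_quasi_identifiers(rows).items()
            if len({row[SENSITIVE] for row in group}) == 1]


print("Can be singled out:", singling_out_risks(records))
print("Sensitive value can be inferred:", inference_risks(records))
```

A check of this kind only measures risk within the released dataset itself. It says nothing about other data that a recipient may hold, which is one reason why the Working Party calls for a case-by-case risk analysis.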
Opinion 05/2014 provides some conclusions and recommendations. In a nutshell it indicates that "anonymisation techniques can provide privacy guarantees, but only if their application is engineered appropriately". Indeed, according to the Working Party, some techniques show inherent limitations and each technique examined fails to meet with certainty the criteria of effective anonymisation in light of the three questions above. Consequently, a case-by-case approach should be favoured in order to determine the optimal solution, always in combination with a risk analysis. Overall, the Working Party seems to imply that true anonymisation might not be achievable in a world of "open" datasets, indicating that, given the current state of technology and the increase in computational power and tools available, identification is easily attainable(13). Such an approach will significantly affect the widespread use of cloud services.
In conclusion, Opinion 05/2014 provides an important clarification of the status of anonymisation techniques. However, it does not seem to encourage businesses to use anonymisation and pseudonymisation when processing personal data. Furthermore, the Opinion does not provide any guidance to be followed by data controllers or data processors in the anonymisation of their data(11). As the Working Party has indicated, combinations of different anonymisation techniques could be used to reach the required level of anonymisation, in which case the Data Protection Directive does not apply. A further consideration could be to mitigate some obligations with respect to the use of a specific anonymisation technique if certain risks no longer exist(14). This kind of approach moves away from the "all or nothing approach" regarding personal data, making room for "more or less personal" data and accordingly "more or less protection"(15). This would not only encourage the wider use of such techniques but could also lead to the wider adoption of cloud computing services.
National recognition of anonymisation and pseudonymisation techniques
Our study shows that half of the Key Member States have not issued any guidance or do not provide any case-law covering the issues of anonymisation. The situation is however different for the other half. While three of them only have administrative decisions, our study demonstrates that the authorities in France(16) and the United Kingdom provide specific guidance, the "ICO Code of Practice on Anonymisation" in the United Kingdom being the most substantial instrument (examined below).
In Italy, there is no specific guidance or similar document focused on data anonymisation or pseudonymisation techniques. Nevertheless, some references to such measures can be found in resolutions or decisions of the Italian data protection authority ("DPA") on specific matters, including:
- Code of conduct and professional practice applying to processing of personal data for statistical and scientific purposes dated 16 June 2004 (Annex A.4 to the Italian Data Protection Code(17) (link)) specifying the criteria that render information capable of identifying an individual.
- Decision of 16 January 2014 related to the "Processing of personal data contained in the Italian Registry of Dialysis and Transplantation"(18) and the decision of 10 April 2014 related to the "Processing of health data collected by diagnostic equipment"(19) in which the Italian DPA suggested examples of solutions that make data "anonymous"(20).
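The kind of solution described in footnote 20 (ranges instead of point values, binary instead of multi-valued attributes, and a minimum of three data subjects per combination of values) could be sketched along the following lines. This is only one possible reading of that footnote, not the Italian DPA's own method; the field names, the ten-year range width and the sample data are hypothetical.

```python
from collections import Counter

MIN_GROUP_SIZE = 3  # threshold of three data subjects, per footnote 20


def to_range(age: int, width: int = 10) -> str:
    """Replace a point value with a range, e.g. 44 -> '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalise(record: dict) -> dict:
    """Coarsen a record: age becomes a range, session count becomes a flag."""
    return {
        "age_range": to_range(record["age"]),
        "on_dialysis": record["sessions_per_week"] > 0,
    }


def release(records, min_size: int = MIN_GROUP_SIZE):
    """Keep only records whose combination of values is shared by at
    least `min_size` data subjects; all other records are suppressed."""
    generalised = [generalise(r) for r in records]
    counts = Counter((g["age_range"], g["on_dialysis"]) for g in generalised)
    return [g for g in generalised
            if counts[(g["age_range"], g["on_dialysis"])] >= min_size]


# Made-up registry extract, for illustration only.
registry = [
    {"age": 41, "sessions_per_week": 3},
    {"age": 44, "sessions_per_week": 2},
    {"age": 47, "sessions_per_week": 3},
    {"age": 52, "sessions_per_week": 0},
    {"age": 58, "sessions_per_week": 3},
]
released = release(registry)
print(released)
```

Here the two records in the 50-59 range are suppressed because neither combination of values reaches three data subjects, while the three 40-49 records are released in generalised form.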
More generic decisions on this matter are:
- Decision of 4 April 2013 related to the "Implementing Measures with Regard to the Notification of Personal Data Breaches", describing the criteria according to which data could be considered unintelligible from the Italian DPA's point of view (link).
- Decision of 10 July 2014 relating to Google, holding that information stored in so-called back-up systems "must be protected against unauthorised access by means of suitable encryption techniques or, where necessary, by anonymising the data in question", specifying that such provision is in line with the principles set forth by the Working Party Opinion 05/2014 (link).
Also in Spain, there are no practice guides on the subject of anonymisation. However, the Spanish DPA has established its criteria through resolutions.
Article 5 of the Spanish Data Protection Regulation defines the dissociation procedure as "any data processing allowing dissociated data to be obtained". Data can either be anonymous from the outset or may be associated with personal data and then be anonymised through the use of a dissociation process which, reversibly or irreversibly, destroys the link with personal data.
In this sense, and in line with the criteria that the Spanish DPA currently applies in its resolutions, the following elements must be taken into account on a case-by-case basis in order to determine whether the process is reversible and, hence, whether the data is effectively anonymised: (i) reasonable means to identify a person; (ii) time; (iii) costs; and (iv) disproportionate endeavour.
Moreover, as per the Spanish DPA report no. 119/2006, anonymisation may be achieved by using an identifiable characteristic that would allow the processor to classify the data but not to link it to a data subject. However, it should be noted that the Spanish DPA has determined that when a company is named after a physical person, that name is not considered personal data. Hence, it will not fall within the scope of anonymisation.
In the United Kingdom, the Information Commissioner's Office ("ICO") published in November 2012 a Code of Practice on managing the risks related to anonymisation (link). The Code explains how to protect the privacy rights of individuals while providing rich sources of data.
The Code contains a framework enabling practitioners to assess the risks of anonymisation related to data protection and identification of individuals. It also includes examples of how successful anonymisation can be achieved, such as how personal data can be anonymised for medical research when responding to Freedom of Information requests, and how customer data can be anonymised to help market researchers analyse purchasing habits. It contains less technical detail than the Working Party Opinion and also takes a different view on when data will constitute personal data. The view in the UK is that where information is anonymous in the hands of the recipients, it will not be considered to be personal data in the hands of those recipients, even if the original controller retains the ability to re-identify that data.
The ICO also announced that a consortium led by the University of Manchester, along with the University of Southampton, the Office for National Statistics and the government’s new Open Data Institute, will run a new UK Anonymisation Network ("UKAN"). The UKAN will enable the sharing of good practice related to anonymisation, across the public and private sector.
The information given in this document concerning technical, legal or professional subject matter is for guidance only and does not constitute legal or professional advice.
This series of articles has been made possible thanks to the CoCo Cloud project (www.coco-cloud.eu) funded under the European Union’s Seventh Framework Programme, and of which Bird & Bird LLP is a partner. Said project aims to establish a platform allowing cloud users to securely and privately share their data in the cloud.
We would also like to thank the Norwegian Research Centre for Computers and Law of the University of Oslo for their valuable input.
Our next article will address the topic of security breach notification.
Read our first article entitled "Cloud computing and privacy series: the general legal framework (part 1 of 6)".
Read our second article entitled "Cloud computing and privacy series: the data protection legal framework (part 2 of 6)".
Read our third article entitled "Cloud computing and privacy series: security requirements and guidance (part 3 of 6)".
(1) Our study examined the particularities of national laws on key specific issues in ten selected EU Member States, i.e., Belgium, Czech Republic, Denmark, Finland, France, Germany, Italy, Poland, Spain and the United Kingdom ("Key Member States").
(2) Paul Ohm, 'Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization', UCLA Law Review 57, 2009, 1707. (link)
(3) W Kuan Hon, Christopher Millard and Ian Walden, 'The Problem of 'Personal Data' in Cloud Computing: What Information is Regulated? The Cloud of Unknowing', International Data Privacy Law (2011) 1(4), 211-214.
(4) Article 29 Working Party, 'Opinion 4/2007 on the concept of personal data', adopted on 20 June 2007 (WP136), 18.
(5) Aaron Perkins, 'Encryption Use: Law and Anarchy on the Digital Frontier [comments]' Houston Law Review. Vol.41.No.5. (2005) 1628.
(6) However, the fuller exhaustion of the technology is hindered by the legal restrictions on the import, export and use of encryption in different jurisdictions. See Christopher Kuner, 'Legal Aspects of Encryption on the Internet' (1996) International Business Lawyer 24, 186.
(7) W Kuan Hon, Eleni Kosta, Christopher Millard and Dimitra Stefanatou, 'Cloud Accountability: The Likely Impact of the Proposed EU Data Protection Regulation', Tilburg Law School Legal Studies Research Paper Series No. 07/2014, 10. (link)
(8) See Article 29 Working Party, 'Opinion 05/2014 on Anonymisation Techniques', adopted on 10 April 2014 (WP216).
(9) See Article 29 Working Party, 'Opinion 05/2014 on Anonymisation Techniques', adopted on 10 April 2014 (WP216), 10. See also at 29, noting that pseudonymisation reduces the linkability of a dataset with the original identity of a data subject; as such, it is considered a useful security measure but not a method of anonymisation.
(10) Directive 95/46/EC but also the ePrivacy Directive 2002/58/EC.
(11) Khaled El Emam and Cecilia Alvarez, 'A Critical Appraisal of the Article 29 Working Party Opinion 05/2014 on Data Anonymisation Techniques', Draft paper for a web conference, 3-9.
(12) Draft Regulation, Recital 23 states that "to ascertain whether means are likely reasonably to be used to identify the individual, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration both available technology at the time of the processing and technological development".
(13) It is clear from case studies and research publications that the creation of a truly anonymous dataset from a rich set of personal data, whilst retaining as much of the underlying information as required for the task, is not a simple proposition. For example, a dataset considered to be anonymous may be combined with another dataset in such a way that one or more individuals can be identified.
(14) For example, the European Commission has held, in its Frequently Asked Questions (FAQs), that the transfer of key-coded data outside the EU without transferring or revealing the key does not involve transfer of personal data to a third country; W Kuan Hon, Christopher Millard and Ian Walden, 'The Problem of 'Personal Data' in Cloud Computing: What Information is Regulated? The Cloud of Unknowing', International Data Privacy Law (2011) 1(4), 216.
(15) Neil Robinson, Hans Graux, Maarten Botterman, and Lorenzo Valeri, ‘Review of the European Data Protection Directive’ (2009) RAND Europe technical report, 26-27 (link).
(16) In France, the issues relating to the anonymisation and pseudonymisation of data are only referred to within the guides already mentioned in this Study, and in a more general perspective, within the Guide on the Security of Personal Data which comprises a dedicated factsheet (Factsheet n°16) on anonymisation (link).
(17) Similar provisions were already contained in a former version of the Code (Code of conduct and professional practice applying to the processing of personal data for statistical and scientific research purposes within the framework of the national statistical system) dated 2002. (link)
(18) Doc.web no. 2937031, available only in Italian. (link)
(19) Doc.web no. 3152119, available only in Italian. (link)
(20) E.g., the use of discrete values in place of continuous values for attributes (ranges instead of point values), the introduction where possible of binary values, such as true / false, instead of multi-valued attributes, etc., so as to ensure that only records whose combinations of attribute values relate to three or more data subjects can be extracted.