Big Data & Issues & Opportunities: Privacy and Data Protection

In this second article of our "Big Data & Issues & Opportunities" series (see our first article here), we focus on some of the privacy and data protection aspects in a big data context. Where relevant, illustrations from the transport sector will be provided.

The analysis of privacy and data protection aspects in a big data context can be relatively complex from a legal perspective. Indeed, certain principles and requirements can be difficult to reconcile with some of the main characteristics of big data analytics, as this article will show. In this respect, it is important to note that “the process of aggregation implies that data is often combined from many different sources and that it is used and/or shared by many actors and for a wide range of purposes.”[1] This multitude of sources, actors and purposes cannot always be reconciled with the legal requirements related to data protection and security. Despite the intricacies of the legal analysis, it remains important to examine carefully how the legal requirements can be implemented in practice.

The legal assessment must take into account the newly adopted EU legal framework, and notably the General Data Protection Regulation (hereinafter the "GDPR"), which became applicable on 25 May 2018 and introduced a raft of changes to the existing data protection regime in the EU. While some of the data protection principles, obligations and rights already existed, the GDPR has enhanced some of them and created others.

In the remainder of this article, we will not delve into all the rights and obligations included in the GDPR. We will, however, examine some of the core principles and concepts of the GDPR that many actors active in big data analytics at European level will be confronted with, and explain why these may be difficult to reconcile with disruptive technologies.

Privacy and Data Protection in a Big Data Context: Challenges & Opportunities

This section analyses some of the challenges and opportunities related to privacy and data protection, and aims to show the intricacies that certain concepts, principles and obligations may cause for a disruptive technology such as big data.

The main findings, grouped by topic, can be summarised as follows:

The concepts of "personal data" and "processing"

The GDPR applies to the "processing"[2]  of "personal data"[3]. As these definitions and the interpretation thereof are very broad, numerous obligations under the GDPR will apply in many circumstances when performing big data analytics.

Moreover, in the context of big data, it cannot be excluded that the data analysis concerns "sensitive data"[4], the processing of which is prohibited in principle and permitted only in limited cases, or that it will have a “transformational impact” on data. For instance, the processing of non-sensitive personal data could, through data mining, generate data that reveal sensitive information about an individual.[5]

The broad scope of application of the GDPR and the possible processing of sensitive data may require organisations to limit certain processing activities or adapt technical developments in order to comply with the stringent rules of the GDPR.

        Illustration in the transport sector: In its Opinion 3/2017 on Cooperative Intelligent Transport Systems (hereinafter "C-ITS"),[6] the Article 29 Working Party observed that personal data processed through such systems may also include data relating to criminal convictions and offences, which Article 10 of the GDPR subjects to strict conditions. More specifically, it found that such data may be collected through, and broadcast to, other vehicles, for example in the form of speeding data or signal violations. It notably concluded that "as a consequence [such C-ITS] applications should be modified to prevent collection and broadcast of any information that might fall under Article 10".

Various actors, roles and responsibilities

Where personal data is processed (as is often the case in data analytics), it is important to examine the concrete situation in order to determine precisely the role played by each of the actors involved in the processing. The concepts enshrined in EU data protection law, and in particular the distinction between “data controller” and “data processor” and the interaction between them, are of paramount importance in determining responsibilities. These concepts are also essential to determine the territorial application of data protection law and the competence of the supervisory authorities.

The qualification of actors and the distinction between “controller” and “processor” can quickly become complex in a big data context, especially once additional data protection roles such as joint controllership, controllers in common and sub-processors are taken into account. This is mainly because many actors may be involved in the data value chain, and mapping that chain can be rather burdensome.

Hence, additional guidance and template agreements, compliant with the strict requirements of the GDPR, are more than welcome to clarify the relationships in the big data value cycle.

Data protection principles

The GDPR sets out six data protection principles that must be complied with when processing personal data,[7] most of which are challenged by some key features of big data.

  • The principle of "lawfulness" implies that any processing of personal data must be based on a legal ground (see next section).

  • The principle of “fairness and transparency” means that the controller must provide information to individuals about its processing of their data, unless the individual already has this information. The transparency principle in a big data context – where the complexity of the analytics renders the processing opaque – can become particularly challenging and implies that “individuals must be given clear information on what data is processed, including data observed or inferred about them; better informed on how and for what purposes their information is used, including the logic used in algorithms to determine assumptions and predictions about them.”[8]
      Illustration in the transport sector: In its guidelines on automated individual decision-making and profiling adopted on 3 October 2017, the Article 29 Working Party takes the example of car insurance to illustrate the possible issues of fair, lawful and transparent processing of personal data in the transport sector.[9] It indicates that some insurers offer insurance rates and services based on an individual’s driving behaviour. The data collected would then be used for profiling to identify bad driving behaviour (such as fast acceleration, sudden braking and speeding). The Article 29 Working Party concludes that in such cases, controllers must ensure that they have a lawful basis for this type of processing. They must also provide the data subject with information about the collected data, the existence of automated decision-making, the logic involved, and the significance and envisaged consequences of such processing.
  • The principle of "purpose limitation" requires personal data to be collected and processed for specified, explicit and legitimate purposes. First and foremost, this means that any processing of personal data must have a clearly defined purpose in order to be permitted. This may be particularly difficult in a big data context because “at the time personal data is collected, it may still be unclear for what purpose it will later be used. However, the blunt statement that the data is collected for (any possible) big data analytics is not a sufficiently specified purpose.”[10]

  • The principle of "data minimisation" provides that personal data must be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed. At first sight, “data minimisation” and big data are clearly contradictory concepts. Indeed, “the perceived opportunities in big data provide incentives to collect as much data as possible and to retain this data as long as possible for yet unidentified future purposes.”[11]

  • Furthermore, personal data must be "accurate" and, where necessary, kept up to date. Like the other principles, the accuracy principle is challenged by some key features of big data. Indeed, “big data applications typically tend to collect data from diverse sources, and without careful verification of the relevance or accuracy of the data thus collected.”[12]

  • The principle of "storage limitation" requires personal data to be kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed. The GDPR does not specify exact data retention periods, since these are necessarily context-specific. Big data analytics is a good illustration of the possibilities of processing personal data for a longer period and of the difficulties that may arise in relation to the storage limitation principle. For instance, the principle may undermine the ability to make predictions, which is one of the opportunities created by big data analytics. Indeed, if big data analytics enables predictions, it is precisely because algorithms can compare current data with stored past data to determine what is likely to happen in the future.

It follows from the above that the core data protection principles are, for the most part, at odds with some of the key features of big data analytics, and thus difficult to reconcile with them. Nevertheless, rethinking certain processing activities as well as IT developments may help organisations comply with these principles, notably by keeping data well-managed, up to date and relevant. Ultimately, this may also improve data quality and thus benefit the analytics.
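Part of this rethinking can be built into the data pipeline itself. The sketch below is one illustrative way to do so, not a prescribed method: a hypothetical purpose register (`PURPOSE_REGISTER`) refuses processing for purposes that were never specified (purpose limitation), strips fields not needed for the declared purpose (data minimisation) and checks a retention period (storage limitation). The purpose name, fields and 90-day period are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical purpose register. For each specified purpose it records the
# fields deemed necessary (data minimisation) and how long records may be kept
# in identifiable form (storage limitation). All values are illustrative.
PURPOSE_REGISTER = {
    "traffic_flow_analysis": {
        "fields": {"road_segment", "timestamp", "speed_kmh"},
        "retention": timedelta(days=90),
    },
}


def minimise(record: dict, purpose: str) -> dict:
    """Return a copy of `record` reduced to the fields needed for `purpose`."""
    if purpose not in PURPOSE_REGISTER:
        # Purpose limitation: refuse processing for an unspecified purpose.
        raise ValueError(f"no specified purpose registered: {purpose!r}")
    allowed = PURPOSE_REGISTER[purpose]["fields"]
    return {key: value for key, value in record.items() if key in allowed}


def within_retention(collected_at: datetime, purpose: str, now: datetime) -> bool:
    """True while a record collected at `collected_at` is within the retention period."""
    return now - collected_at <= PURPOSE_REGISTER[purpose]["retention"]


if __name__ == "__main__":
    raw = {"driver_id": "D-17", "road_segment": "E40-km12",
           "timestamp": "2019-01-04T08:00:00Z", "speed_kmh": 87}
    print(minimise(raw, "traffic_flow_analysis"))
    # {'road_segment': 'E40-km12', 'timestamp': '2019-01-04T08:00:00Z', 'speed_kmh': 87}
    collected = datetime(2019, 1, 1, tzinfo=timezone.utc)
    print(within_retention(collected, "traffic_flow_analysis",
                           datetime(2019, 5, 1, tzinfo=timezone.utc)))  # False (120 days > 90)
```

Keeping the register as data rather than scattering field lists through the code makes it easier to document and review which data each purpose actually needs.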

Legal grounds to process personal data

Where the GDPR applies, any processing of personal data must be based on one of the grounds listed in Article 6(1) of the GDPR. In other words, for a processing activity to be lawful, it must be based, from the outset and throughout, on one of the six grounds exhaustively listed in the GDPR.[13] However, only four of them appear realistically applicable in a big data context.

  • Consent: While "consent" is the first ground that can permit the processing of personal data, it can quickly become a difficult concept to comply with in light of its definition and the many conditions that must be met. More precisely, consent under the GDPR must be freely given, specific, informed and unambiguous.[14] Furthermore, the controller should be able to demonstrate that the data subject has given consent to the processing operation and should allow the data subject to withdraw his or her consent at any time.[15] These conditions are stringent and may be particularly difficult to meet. Relying on consent may therefore prove impractical or even impossible in a big data context, especially in its more complex applications.

  • Performance of or entering into a contract: The processing ground provided under Article 6(1)(b) GDPR can be relied upon by the data controller when it needs to process personal data in order to perform a contract to which the data subject is party or in order to take steps at the request of the data subject prior to entering into a contract, e.g. in case of purchase and delivery of a product or service. This ground will generally be difficult to apply in a big data context, because it is unlikely that the processing of personal data for specific big data analytics purposes is “necessary” for the performance of a contract with the individual. Indeed, although big data analytics involves a complex chain of actors and multiple contracts, there is often little direct interaction with the data subjects themselves.

  • Legal obligation: Under Article 6(1)(c), the GDPR provides a legal ground in situations where “processing is necessary for compliance with a legal obligation to which the controller is subject”. Generally, it is unlikely that personal data processing in a big data analytics context can be based on a “legal obligation”. This being said, according to the Article 29 Working Party, such legal ground should not automatically be set aside in a technology context.
    Illustration in the transport sector: In its Opinion on C-ITS, the Article 29 Working Party concludes that the long-term legal basis for this type of processing is the enactment of an EU-wide legal instrument. Indeed, the Article 29 Working Party considers it likely, given the projected prevalence of (semi-)autonomous cars, that the inclusion of C-ITS in vehicles will become mandatory at some point in time, comparable to the legal obligation on car manufacturers to include eCall functionality in all new vehicles.[16]
  • Legitimate interests: The protection of privacy and personal data is not absolute and often requires a balancing of interests. Given the difficulty of relying on the abovementioned processing grounds in a big data context, the legitimate interests of an organisation may offer a suitable alternative.[17] Article 6(1)(f) of the GDPR permits the processing of personal data where it is necessary "for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data.” However, in an Opinion on the recent developments on the Internet of Things (hereinafter "IoT"), the Article 29 Working Party warns that processing will not always be justified merely by the economic interests of the IoT stakeholder, given the potential severity of the interference with the privacy of the data subject.[18] A similar reasoning could be transposed to a big data context. Therefore, when relying on legitimate interests, a careful balancing test between the interests of the big data stakeholder and those of the data subject remains of the utmost importance.

Finding the most adequate legal ground for the processing of personal data in the context of big data analytics may prove difficult. Indeed, the conditions associated with the grounds exhaustively listed in the GDPR are stringent and may limit or prohibit certain processing activities. Nonetheless, thorough assessments, such as a legitimate interests assessment, are likely to help identify the most appropriate processing ground, while providing the evidence needed to demonstrate the reasoning behind that choice, in accordance with the accountability principle.[19]
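The accountability point lends itself to a simple record-keeping structure. The sketch below is a hypothetical register entry (`ProcessingRecord` and its fields are our own names, not a prescribed format) that ties each purpose to one of the six Article 6(1) grounds, stores the documented rationale, and, for consent-based purposes, logs when consent was given and withdrawn (Article 7(1) and 7(3) GDPR). It only tracks timing; it does not assess whether consent was freely given, specific, informed and unambiguous.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class LegalGround(Enum):
    """The six grounds listed in Article 6(1) GDPR."""
    CONSENT = "6(1)(a)"
    CONTRACT = "6(1)(b)"
    LEGAL_OBLIGATION = "6(1)(c)"
    VITAL_INTERESTS = "6(1)(d)"
    PUBLIC_TASK = "6(1)(e)"
    LEGITIMATE_INTERESTS = "6(1)(f)"


@dataclass
class ProcessingRecord:
    """Hypothetical register entry documenting the ground chosen for one purpose."""
    purpose: str
    ground: LegalGround
    rationale: str  # e.g. a summary of the legitimate interests balancing test
    consent_given_at: Optional[datetime] = None
    consent_withdrawn_at: Optional[datetime] = None

    def withdraw_consent(self, when: datetime) -> None:
        """Log a withdrawal of consent (Article 7(3) GDPR)."""
        self.consent_withdrawn_at = when

    def consent_active_at(self, when: datetime) -> bool:
        """True if consent had been given and not yet withdrawn at `when`."""
        if self.ground is not LegalGround.CONSENT:
            raise ValueError(f"{self.purpose!r} does not rely on consent")
        if self.consent_given_at is None or when < self.consent_given_at:
            return False
        return self.consent_withdrawn_at is None or when < self.consent_withdrawn_at
```

Storing the rationale next to the chosen ground is the design choice that matters here: it is what allows the controller to show, later, why that ground was selected.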

Core obligations under the GDPR

Some of the core obligations of the GDPR applicable to controllers (and processors) may be particularly relevant in the context of big data. This is certainly the case for the requirements to conduct data protection impact assessments (hereinafter "DPIAs") and to implement privacy by design and privacy by default measures.

DPIAs are required only in certain cases, i.e. when processing is “likely to result in a high risk” to the rights and freedoms of natural persons, taking into account the nature, scope, context and purposes of the processing. While Article 35(1) GDPR expressly singles out processing “using new technologies” as a relevant factor, Article 35(3) and Recital 91 of the GDPR provide a non-exhaustive list of cases in which DPIAs are required. For other processing activities, the organisation should determine whether the processing activity poses a high risk to individuals; Recital 75 of the GDPR provides some relevant elements that may help determine whether a (high) risk exists. In addition to these illustrations and elements, Article 35(4) of the GDPR requires national supervisory authorities to establish a list of processing operations that are necessarily subject to a DPIA ("black list"), whereas Article 35(5) allows them to establish a list of processing operations for which no DPIA is required ("white list").

An analysis of the various lists and guidance published by the different authorities easily leads to the conclusion that new technologies, and in particular big data analytics, will almost systematically require carrying out a DPIA. Indeed, some of the key characteristics of big data appear to be targeted, such as “large scale processing”, “systematic monitoring”, “automated decision-making with legal or similar significant effect”, and “matching or combining datasets”. Similarly, the use of data to analyse or predict situations, preferences or behaviours, or the systematic exchange of data between multiple actors, or the use of devices to collect data (and in particular relying on IoT) should lead to the requirement to carry out a DPIA.
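These triggers can be turned into a first-pass screening aid. The sketch below uses the features named above as criteria and adopts, as an adjustable assumption, the rule of thumb in the Article 29 Working Party's DPIA guidelines (WP248) that processing meeting two or more criteria will in most cases require a DPIA. The criterion labels and the `dpia_indicated` helper are hypothetical; a negative result does not replace the risk assessment described above.

```python
from typing import Iterable

# Screening criteria drawn from the features listed above. The set is
# illustrative and not exhaustive; national lists may add or phrase items differently.
CRITERIA = frozenset({
    "new_technologies",
    "large_scale_processing",
    "systematic_monitoring",
    "automated_decision_with_legal_or_similar_effect",
    "matching_or_combining_datasets",
    "analysis_or_prediction_of_behaviour",
    "systematic_data_exchange_between_actors",
    "device_or_iot_collection",
})


def dpia_indicated(features: Iterable[str], on_authority_list: bool = False,
                   threshold: int = 2) -> bool:
    """Rough screening: does the processing look likely to require a DPIA?

    `on_authority_list` means the operation appears on a supervisory authority's
    Article 35(4) list. `threshold` encodes the assumed rule of thumb that meeting
    two or more criteria will in most cases call for a DPIA.
    """
    found = set(features)
    unknown = found - CRITERIA
    if unknown:
        raise ValueError(f"unrecognised criteria: {sorted(unknown)}")
    if on_authority_list:
        return True
    return len(found) >= threshold
```

A typical big data project combining datasets at scale would already meet two criteria, which is consistent with the observation that such analytics will almost systematically require a DPIA.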

Furthermore, the requirement to adopt “privacy by design” measures[20] entails that the controller must implement appropriate technical and organisational measures (e.g. pseudonymisation techniques) designed to implement the data protection principles (e.g. data minimisation). As for compliance with the “privacy by default” requirement[21], the controller must implement appropriate technical and organisational measures to ensure that, by default, only personal data necessary for each specific purpose of the processing are processed. This applies to the amount of data collected as well as to the extent of processing, period of storage and accessibility of the data. The measures adopted by the controller must guarantee that, by default, personal data are not made accessible to an indefinite number of individuals without the data subject’s intervention.

These requirements to implement dedicated "by design" and "by default" measures are particularly relevant in IT environments, and thus also to big data. In practice, they require organisations to consider privacy and data protection issues at the design phase and throughout the lifecycle of any system, service, product or process. The requirements can therefore be far-reaching. They apply to all IT systems, services, products and processes involving the processing of personal data, but they also require organisations to review policies, processes, business practices and/or strategies that have privacy implications, and to rethink the physical design of certain products and services as well as data sharing initiatives. Moreover, organisations must take technical measures that meet individuals' expectations, notably to delimit what data will be processed and for what purpose, to process only the data strictly necessary for the purpose for which they were collected, to inform individuals appropriately and give them sufficient control to exercise their rights, and to prevent personal data from being made public by default.
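Pseudonymisation, cited above as an example of a "by design" measure, can be sketched with a keyed hash. The example below is a minimal illustration assuming HMAC-SHA256 from the Python standard library and a hypothetical `pseudonymise` helper; it is not the only possible technique. Under Article 4(5) GDPR, the key is the "additional information" that must be kept separately, and pseudonymised data remain personal data. The remaining fields (here, trip start and end) may also still allow re-identification.

```python
import hashlib
import hmac
import secrets


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    The same identifier and key always give the same pseudonym, so records can
    still be linked for analytics. Without the key, pseudonyms cannot be
    recomputed from guessed identifiers, so the key must be kept separately.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # store in a separate key vault, not with the dataset
    trip = {"vin": "EXAMPLEVIN0000001", "start": "Brussels", "end": "Leuven"}
    trip["vin"] = pseudonymise(trip["vin"], key)
    print(trip)  # the VIN is replaced by a 64-character hex pseudonym
```

A keyed hash is preferred over a plain hash here because vehicle identifiers follow a known format, so an unkeyed hash could be reversed by hashing candidate identifiers.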

   Illustration in the transport sector: The past decade has seen the rise of new transportation modes such as ridesharing, which allows car owners to fill the empty seats in their cars with other travellers. Ridesharing services, however, have privacy and data protection implications for their users. Indeed, users of a ridesharing service need to share their location data with the ridesharing operator so that a meeting point for drivers and riders can be determined. Aïvodji et al.[22] have developed a privacy-preserving approach to computing such meeting points. In line with the privacy by design principle, they integrated existing privacy-enhancing technologies with multimodal routing algorithms to compute, in a privacy-preserving manner, meeting points that suit both drivers and riders.
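Aïvodji et al.'s method is not reproduced here. As a much simpler, swapped-in illustration of the same design idea, sharing less precise location data by default, a client application could coarsen a position to a grid cell before sending it to the operator. The `coarsen` function and the 0.01-degree cell size are assumptions for the example, and coarsening alone does not guarantee anonymity (for instance, where a cell contains a single home address).

```python
import math
from typing import Tuple


def coarsen(lat: float, lon: float, cell_deg: float = 0.01) -> Tuple[float, float]:
    """Replace a precise position by the centre of the grid cell containing it.

    With cell_deg=0.01, a cell spans roughly 1.1 km north-south; its east-west
    width shrinks with latitude.
    """
    row = math.floor(lat / cell_deg)
    col = math.floor(lon / cell_deg)
    return (round((row + 0.5) * cell_deg, 6), round((col + 0.5) * cell_deg, 6))
```

Nearby positions in the same cell map to the same output, so the operator learns only the area, not the exact point.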

Rights of individuals

The GDPR aims to protect natural persons in relation to the processing of their personal data and therefore grants several rights to such persons.[23] In addition to these rights, the GDPR further provides for strict procedures to respond to any data subject request in exercise of their rights, notably regulating issues with respect to the timing and format of the response, or the fees that may be requested. It also regulates the right for individuals to lodge a complaint with a supervisory authority, the rights to an effective judicial remedy against a supervisory authority, a controller or a processor, and the possibility for data subjects to mandate a not-for-profit body, organisation or association to lodge a complaint on their behalf.

The numerous rights granted by the GDPR to individuals can be particularly challenging in relation to complex processing activities. Generally speaking, such rights can be far-reaching and thus difficult to integrate in the context of big data analytics. It is nonetheless important to consider the various rights carefully and anticipate their concrete application. Technology can also give individuals new ways to exercise their rights, for instance through privacy-enhancing technologies.

    Illustration in the transport sector: In its guidelines on the right to data portability[24] of 5 April 2017, the Article 29 Working Party notably advocates a broad interpretation, whereby “raw data processed by a smart meter or other connected objects, activity logs, history of website usage or search activities” fall within the scope of the portability right.[25] Therefore, in a big data analytics context, exercising the right to portability of data collected through intelligent cars (e.g. by various sensors, smart meters, connected objects) or related to C-ITS might prove almost impossible, notably from an engineering perspective, particularly in view of the Article 29 Working Party's far-reaching interpretation of this right.
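Article 20(1) GDPR requires data to be provided "in a structured, commonly used and machine-readable format". The sketch below, with a hypothetical `portability_export` helper and field names, exports one data subject's records as JSON, keeping raw observed data and leaving out inferred fields such as a risk score, in line with the distinction drawn by the Article 29 Working Party. It assumes the records have already been located; in practice, finding them across many systems and actors is the hard engineering part.

```python
import json
from typing import Iterable

# Hypothetical classification of fields derived by the controller. Following
# the Article 29 Working Party's guidelines, inferred data such as a risk score
# are left out, while raw observed data (e.g. sensor readings) are kept.
INFERRED_FIELDS = frozenset({"driver_risk_score", "predicted_destination"})


def portability_export(records: Iterable[dict], subject_id: str) -> str:
    """Export one data subject's records as structured, machine-readable JSON."""
    rows = [
        {key: value for key, value in record.items() if key not in INFERRED_FIELDS}
        for record in records
        if record.get("subject_id") == subject_id
    ]
    return json.dumps({"subject_id": subject_id, "records": rows}, indent=2, sort_keys=True)
```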

International data transfers

The GDPR maintains the general principle that the transfer of personal data to any country outside the European Economic Area (hereinafter the "EEA")[26] is prohibited unless that third country ensures an adequate level of privacy protection. Accordingly, transfers of personal data to “third countries” (i.e. to countries outside the EEA not ensuring an adequate level of protection) are restricted. In such cases, the data flow must be based on a particular instrument to allow the data transfer to take place, such as Standard/Model Contractual Clauses (SCCs)[27], Binding Corporate Rules (BCRs)[28], codes of conduct and certifications, or derogations.[29]

The provision of big data analytics services may entail the transfer outside the EEA of the personal data collected and processed, particularly when cloud computing services are used. The GDPR requirements on transfers of personal data must therefore be taken into account to determine the most adequate solution to permit such international flows.

Any data flows should therefore be carefully assessed and mapped, notably as part of the mapping of the different actors, in order to determine the data location and put in place the adequate (contractual) instruments.
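Such a mapping can be kept as structured data so that gaps are easy to spot. The sketch below uses a hypothetical `DataFlow` record and flags flows to countries that are neither in the EEA nor covered by an adequacy decision and for which no transfer instrument (SCCs, BCRs or a derogation) is recorded. Both country sets are supplied by the caller and must be kept in line with current EEA membership and the European Commission's adequacy decisions; nothing here encodes those official lists.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Optional


@dataclass
class DataFlow:
    """One entry in a hypothetical data-flow map."""
    name: str
    recipient: str
    country: str                      # where the data will be located (ISO 3166-1 alpha-2)
    instrument: Optional[str] = None  # e.g. "SCCs", "BCRs" or the derogation relied on


def flows_missing_instrument(flows: Iterable[DataFlow], eea: AbstractSet[str],
                             adequate: AbstractSet[str]) -> List[DataFlow]:
    """Return flows to third countries for which no transfer instrument is recorded."""
    return [
        flow for flow in flows
        if flow.country not in eea
        and flow.country not in adequate
        and flow.instrument is None
    ]
```

Because each flow also names its recipient, the same map can support the mapping of actors and roles discussed earlier.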


The present article only looks into, and illustrates, the most topical issues, without claiming exhaustiveness. It does, however, demonstrate that finding a balance between the various interests at stake is of paramount importance. It is therefore essential to keep in mind Recital 4 of the GDPR, which states that the right to the protection of personal data is not an absolute right, that it must be considered in relation to its function in society and be balanced against other fundamental rights, and that this must be done in accordance with the principle of proportionality.

Accordingly, any guidance or administrative/judicial decision should carefully take into account all interests at stake. Failing to do so would necessarily impede the development of disruptive technologies and prevent the emergence of a true data economy.

Our next article will address anonymisation and pseudonymisation in the context of big data, with illustrations drawn from the transport sector.

This series of articles has been made possible by the LeMO Project, of which Bird & Bird LLP is a partner. The LeMO project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement no. 770038.

The information given in this document concerning technical, legal or professional subject matter is for guidance only and does not constitute legal or professional advice.

The content of this article reflects only the authors’ views. The European Commission and Innovation and Networks Executive Agency (INEA) are not responsible for any use that may be made of the information it contains.

[1] Gloria González Fuster and Amandine Scherrer, 'Big Data and Smart Devices and Their Impact on Privacy. Study for the LIBE Committee' (European Parliament, Directorate-General for Internal Policies, Policy Department C Citizens' rights and constitutional affairs, 2015) 20 <> accessed 4 January 2019.

[2] Any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction (GDPR, art 4(2)).

[3] Any information relating to an identified or identifiable natural person (GDPR, art 4(1)).

[4] Personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs or trade union membership; genetic data; biometric data processed for the purpose of uniquely identifying a natural person; data concerning health; or data concerning a person’s sex life or sexual orientation (GDPR, art 9(1)).

[5] Gloria González Fuster and Amandine Scherrer, 'Big Data and Smart Devices and Their Impact on Privacy. Study for the LIBE Committee' (European Parliament, Directorate-General for Internal Policies, Policy Department C Citizens' rights and constitutional affairs, 2015) 20 <> accessed 4 January 2019.

[6] Article 29 Data Protection Working Party, 'Opinion 3/2017 on Processing personal data in the context of Cooperative Intelligent Transport Systems (C-ITS)' (2017) WP252, 8.

[7] Pursuant to Article 5(1) GDPR, these principles relate to: (i) lawfulness, fairness and transparency; (ii) purpose limitation; (iii) data minimisation; (iv) accuracy; (v) storage limitation; and (vi) integrity and confidentiality.

[8] European Data Protection Supervisor, 'Opinion 7/2015. Meeting the Challenges of Big Data. A Call for Transparency, User Control, Data Protection by Design and Accountability' (EDPS 2015) 4 <> accessed 3 January 2019; See also Paul De Hert and Gianclaudio Malgieri, 'Making the Most of New Laws: Reconciling Big Data Innovation and Personal Data Protection within and beyond the GDPR' in Elise Degrave, Cécile de Terwangne, Séverine Dusollier and Robert Queck (eds), Law, Norms and Freedoms in Cyberspace / Droit, Normes et Libertés dans le Cybermonde (Larcier 2018).

[9] Article 29 Data Protection Working Party, 'Guidelines on Automated individual decision-making and profiling for the purposes of regulation 2016/679' (2017) WP251, 15.

[10] Nikolaus Forgó, Stefanie Hänold and Benjamin Schütze, 'The Principle of Purpose Limitation and Big Data' in Marcelo Corrales, Mark Fenwick and Nikolaus Forgó (eds), New Technology, Big Data and the Law (Perspectives in Law, Business and Innovation, Springer 2017).

[11] European Data Protection Supervisor, 'Opinion 7/2015. Meeting the Challenges of Big Data. A Call for Transparency, User Control, Data Protection by Design and Accountability' (EDPS 2015) 8 <> accessed 3 January 2019. 

[12] Ibid.

[13] These are (i) the consent of the data subject; (ii) the necessity for the performance of a contract with the data subject or to take steps prior to entering into a contract; (iii) the necessity for the purposes of legitimate interests of the controller or a third party; (iv) the necessity for compliance with a legal obligation to which the controller is subject; (v) the necessity for the protection of the vital interests of a data subject or another person where the data subject is incapable of giving consent; and (vi) the necessity for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller.

[14] GDPR, art 4(11).

[15] GDPR, art 7.

[16] Article 29 Data Protection Working Party, 'Opinion 3/2017 on Processing personal data in the context of Cooperative Intelligent Transport Systems (C-ITS)' (2017) WP252, 11.

[17] “Legitimate interests may provide an alternative basis for the processing, which allows for a balance between commercial and societal benefits and the rights and interests of individuals.” Information Commissioner's Office, 'Big Data, Artificial Intelligence, Machine Learning and Data Protection' (ICO 2017) 34 <> accessed 3 January 2019.

[18] Article 29 Data Protection Working Party, 'Opinion 8/2014 on the Recent Developments on the Internet of Things' (2014) WP223, 15.

[19] GDPR, art 5(2): "The controller shall be responsible for, and be able to demonstrate compliance with, paragraph 1 (‘accountability’)."

[20] GDPR, art 25(1).

[21] GDPR, art 25(2).

[22] Ulrich Matchi Aïvodji, Sébastien Gambs, Marie-José Huguet and Marc-Olivier Killijian, 'Meeting Points in Ridesharing: A Privacy-preserving Approach' (2016) 72 Transportation Research Part C: Emerging Technologies 239.

[23] These include: (i) the right of access (Article 15 GDPR); (ii) the right to rectification (Article 16 GDPR); (iii) the right to erasure (Article 17 GDPR); (iv) the right to restriction of processing (Article 18 GDPR); (v) the right to data portability (Article 20 GDPR); (vi) the right to object (Article 21 GDPR); (vii) the right not to be subject to automated decision-making, including profiling (Article 22 GDPR); and (viii) the right to withdraw consent (Article 7(3) GDPR).

[24] Article 29 Data Protection Working Party, 'Guidelines on the Right to Data Portability' (2017) WP242.

[25] By contrast, “inferred” personal data, such as “the profile created in the context of risk management and financial regulations (e.g. to assign a credit score or comply with anti-money laundering rules)” are outside the scope of the portability right.

[26] The European Economic Area includes the 28 EU countries and Iceland, Liechtenstein and Norway.

[27] A contract between the importer and exporter of the personal data containing sufficient safeguards regarding data protection.

[28] A binding internal code of conduct through which multinational corporations, international organisations and groups of companies wishing to transfer data within their corporate group comprising members established outside the EEA provide safeguards with respect to data protection.

[29] Derogations include: (i) explicit consent; (ii) contractual necessity; (iii) important reasons of public interest; (iv) legal claims; (v) vital interests; and (vi) public register data. The GDPR also provides for a limited derogation for non-repetitive transfers involving a limited number of data subjects where the transfer is necessary for compelling legitimate interests of the controller (which are not overridden by the interests or rights of the data subject) and where the controller has assessed (and documented) all the circumstances surrounding the data transfer and has, on the basis of that assessment, provided suitable safeguards. The controller must inform the supervisory authority and the data subjects when relying on this derogation.
