Re-aligning Data Protection Law in the Age of Big Data
Summary
This article re-examines the limitations of current data protection law in the context of Big Data analytics. It argues for a new focus on managing output data and inferences, proposing a 'right to reasonable inferences' to enhance privacy protection. The article explores how current laws may not effectively address the unique challenges posed by automated decision-making and profiling.
Full Transcript
572 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 VI. RE-ALIGNING THE REMIT OF DATA PROTECTION LAW IN THE AGE OF BIG DATA: A RIGHT TO REASONABLE INFERENCES As should now be clear, inferences receive less protection under data protection law than other types of personal data provided by the data subject. In many ways, the lower status of inferences reflects the limitations placed on the remit of data protection law by the ECJ.343 Specifically, in standing jurisprudence the ECJ has argued that data protection law is not intended to assess the accuracy of decision-making processes or ensure good administrative practices. 344 Such assessments are instead deferred to sectoral and Member State law, and relevant governance bodies. While the ECJ plays a key role in defining the remit of data protection law, the novel risks introduced by Big Data analytics and automated decision-making345 suggest that the prescribed remit of data protection law may be too narrow to realize the law’s original aims. In this Part, this Article argues that continuing to rely on sensitivity and identifiability as metrics for the level of protection to grant data is misguided. Rather, greater emphasis must be placed on management of output data, or inferences and decisions, to reconfigure privacy as a holistic concept. A right to reasonable inferences is proposed as an accountability mechanism reflecting this reconfiguration of data protection law. Tensions between profiling, discrimination, privacy, and data protection law have long been acknowledged.346 In this regard, the term “data protection” is misleading, as it suggests that the laws aim to protect the data, when in fact it is See supra Part IV. See supra Section IV.C. 345 See supra Section II.A. 346 MAYER-SCHÖNBERGER, supra note 10; MAYER-SCHÖNBERGER & CUKIER, supra note 23; PROFILING THE EUROPEAN CITIZEN, supra note 61; Wachter, Privacy, supra note 54. See generally CASES, MATERIALS AND TEXT ON NATIONAL, SUPRANATIONAL AND INTERNATIONAL NON‐DISCRIMINATION LAW 674 (Dagmar Schiek, eds., 2007). 343 344 No. 2:494] A RIGHT TO REASONABLE INFERENCES 573 intended to protect people.347 Data can both directly and indirectly reveal aspects of an individual’s private life, which then, among other things, offer grounds for discrimination. The right to privacy offers protection against such disclosures which can lead to discrimination and irreversible harms, “and have long-term consequences for the individual as well as his social environment.”348 The current limitations placed on the remit of data protection law can be detrimental to its broader aim of protecting privacy against the risks posed by new technologies. As Bygrave explains, privacy is about individuality, autonomy, integrity and dignity.349 The broader right to privacy addresses personal and family life, economic relations, and more broadly an individual’s ability to freely express her personality without fear of ramifications.350 Protecting this right is a key aim of data protection law. Standing jurisprudence of the ECJ351 and ECHR352 has recognized that the aim of data protection law is to protect these broader aspects of privacy, or, in other words, to restrict 347 See Mireille Hildebrandt, Profiling: From Data to Knowledge, 30 DATENSCHUTZ UND DATENSICHERHEIT 548 (2006); Wachter, supra note 54. 348 Article 29 Data Prot. Working Party, supra note 307, at 4. 349 LEE A BYGRAVE, DATA PROTECTION LAW: APPROACHING ITS RATIONALE, LOGIC AND LIMITS 128–129 (2002). 350 See generally Daniel J. 
Solove, “I’ve Got Nothing to Hide” and Other Misunderstandings of Privacy, 44 SAN DIEGO L. REV. 745 (2007). 351 See, e.g., Case C-101/01, Criminal Proceedings Against Bodil Lindqvist, 2003 E.R.C. I-12971; Case C‑434/16 Peter Nowak v. Data Prot. Comm’r, 2017 E.C.R. I-994; Case C‑582/14, Patrick Breyer v. Bundesrepublik Deutschlan, 2016 E.C.R. I-779. 352 Amann v. Switzerland, 2000-II Eur. Ct. H.R. 1, 20 § 65 (“... the term ‘private life’ must not be interpreted restrictively. In particular, respect for private life comprises the right to establish and develop relationships with other human beings; furthermore, there is no reason of principle to justify excluding activities of a professional or business nature from the notion of ‘private life’.... That broad interpretation corresponds with that of the Council of Europe’s Convention of 28 January 1981 [.]”); see also COUNCIL OF EUROPE, CASE LAW OF THE EUROPEAN COURT OF HUMAN RIGHTS CONCERNING THE PROTECTION OF PERSONAL DATA (2017), https://rm.coe.int/case-law-on-data-protection/1680766992 [https://perma. cc/MP7S-2DKP]. 574 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 the processing of personally identifiable data that impacts these areas. Data protection is thus only one segment of privacy. Reflecting this, privacy and data protection have traditionally been seen as individual rights in the EU.353 Stemming from the idea that an individual should have the right to be left alone by the state, the right to privacy was originally proposed as a defense mechanism against governmental surveillance.354 Legal remedies addressing data protection provide tools that prevent individuals from being identified or unduly singled out. On the other hand, legal remedies against discrimination were created based on the experience during the Second World War, seen in Article 14 of the EU Convention of Human Rights. 355 Both aims are reflected in the 1995 Data Protection Directive and now the GDPR, which restrict processing of personally identifiable information to prevent “singling out,” with special provisions for processing of sensitive data due to concerns with discrimination.356 Sensitive or protected attributes are linked to observable variables that have historically proven discriminatory (e.g. ethnicity, religion). As the novel risks of automated decision-making and profiling suggest,357 these systems disrupt traditional concepts of privacy and discrimination by throwing the potential value and sensitivity of data into question. A question thus becomes apparent: Are the fundamental aims of data protection law still being met in the age of Big Data, or 353 Alessandro Mantelero, Personal Data for Decisional Purposes in the Age of Analytics: From an Individual to a Collective Dimension of Data Protection, 32 COMPUTER L. & SECURITY REV. 238, 243 (2016); Alessandro Mantelero & Giuseppe Vaciago, Data Protection in a Big Data Society. Ideas for a Future Regulation, 15 DIGITAL INVESTIGATION 104, 107 (2015). 354 Mantelero, supra note 353, at 245. 355 Grabenwarter, supra note 354. 356 Regulation (EU) 2016/679 of the European Parliament and of the Council on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC, 2016 O.J. (L 119) art. 9. 357 See supra Part II. No. 2:494] A RIGHT TO REASONABLE INFERENCES 575 is a re-alignment of the remit of data protection required to restore adequate protection of privacy? 
To answer this question, it is necessary to evaluate whether individual-level rights can be effectively applied to inferences, and whether the distinction between types of data in data protection law based on identifiability and sensitivity is actually effective when applied to inferences. Concerning the first point, the preceding discussion revealed that data subjects are often unable to access or evaluate inferences drawn about them, as well as the processes that led to these inferences. At a minimum, inferences enjoy less protection under data protection law due to the necessity of balancing requests for access, erasure, or other rights with the interests of data controllers (e.g., trade secrets, intellectual property) and the rights and freedoms of others. Ironically, inferences receive the least protection of all the types of data addressed in data protection law, and yet now pose perhaps the greatest risks in terms of privacy and discrimination.358 Concerning the second point, if these distinctions break down when applied to inferences, protections under data protection law are arbitrarily applied, creating greater opportunities for invasions of privacy and related harms (e.g., discrimination). Many inferences can be drawn from an individual’s personal data, but this is not the only possible source. Third party personal data, anonymized data, and other forms of non-personal data can also be used to develop inferences and profiles. This background knowledge, built from anonymized, non-personal, or third-party data, can then be applied to individual data subjects. 359 The process of drawing inferences and constructing profiles can in this way be separated from their eventual application to an identifiable person. See supra Section II.A. Wim Schreurs. Mireille Hildebrandt, Els Kindt & Michaël Vanfleteren, Cogitas, Ergo Sum. The Role of Data Protection Law and NonDiscrimination Law in Group Profiling in the Private Sector, in PROFILING THE EUROPEAN CITIZEN 241, 246 (Mireille Hildebrandt & Serge Gutwirth eds., 2008). 358 359 576 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 As a result, a gap exists between the capacity of controllers or devices to collect data and draw inferences about people from it, and data protection law’s capacity to govern inferential analytics not addressing an identifiable individual.360 Ultimately, affected individuals are not (fully) able to exercise their data protection rights (e.g. access361 or erasure362) until standalone inferences or profiles based on anonymized, non-personal, or third party data have been applied at an individual level.363 By using data about people not linked to a particular individual, or by purposefully anonymizing data prior to drawing inferences and constructing profiles,364 companies can thus avoid many of the restrictions of data protection law. This is not to suggest that individuals should have rights over the data of others, or data which has not been applied to them. Rather, the difficulty is that individuals lack redress against the constituent third party or anonymous data and processing that have led to the inferences or profiles applied to them, unless relevant sectoral decision-making standards apply (e.g. anti-discrimination law). Identifiability thus poses a barrier to meaningful accountability for inferential analytics. As an example, concerns have been raised about the classification of data collected by autonomous cars. 
Sensors can scan the road ahead, detecting objects to avoid, which may 360 See supra note 347 (taking the view that data subjects need to consent before data is anonymized.) 361 Hildebrandt, supra note 347, at 550 (explaining “that citizens have no legal right to even access the knowledge that is inferred from these anonymised data and may be used in ways that impact their lives”). 362 Rubinstein, for example, doubts that the right to be forgotten would apply to profiles built from anonymised or aggregated data. Ira S. Rubinstein, Big Data: The End of Privacy or a New Beginning?, 3 INT’L DATA PRIVACY L. 74, 80 (2013) (“[I]t is not even clear whether Article 17 [of the GDPR] would apply to predictive inferences based on personal data that may have been anonymized or generalized as a result of analytic techniques at the heart of Big Data.”). 363 See also Hildebrandt, supra note 347, at 550. On why exclusion of anonymous data from data protection law is a problem, see Schreurs et al., supra note 359, at 241. 364 See Schreurs et al., supra note 359, at 248. No. 2:494] A RIGHT TO REASONABLE INFERENCES 577 include pedestrians. Such data describing the car’s surroundings does not clearly fall within the scope of “personal data” in data protection law. 365 Although undoubtedly data about people, such images do not normally allow for unambiguous identification of recorded individuals. For data to be “identifiable,” it does not need to identify an individual with absolute certainty. Rather, it seems to be enough that the person can be singled out from a group, even if, for example, his or her name is not known, but other characteristics describe the person sufficiently.366 The possibility of identifying a person must be evaluated reasonably, considering “all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.”367 This can have major implications for assessing problematic behavior of the car, such as a crash, not least because such a definition of “identifiability” is fluid and changes with advances in technology.368 Scholars have shown that anonymized data can often be linked back to individuals. 369 365 See Sandra Wachter, Brent Mittelstadt & Luciano Floridi, Transparent, Explainable, and Accountable AI for Robotics, SCI. ROBOTICS, May 31, 2017, at 1, 1. See generally Inge Graef, Raphaël Gellert, Nadezhda Purtova & Martin Husovec, Feedback to the Commission’s Proposal on a Framework for the Free Flow of Non-Personal Data (Jan. 22, 2018) (unpublished manuscript). 366 See Korff, supra note 323, at 45. On why the distinction between identifiable and non-identifiable uses is important in the Big Data era, see Colin J. Bennett & Robin M. Bayley, Privacy Protection in the Era of ‘Big Data’: Regulatory Challenges and Social Assessments, in EXPLORING THE BOUNDARIES OF BIG DATA 205, 209–10 (Bart van der Sloot, Dennis Broeders & Erik Schrijvers eds., 2016); Ira S. Rubinstein & Woodrow Hartzog, Anonymization and Risk, 91 WASH. L. REV. 703, 704–05 (2016). 367 Regulation (EU) 2016/679 of the European Parliament and of the Council on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC, 2016 O.J. (L 119) 5. 368 See Korff, supra note 323, at 46. 
369 See, e.g., Ohm, supra note 298, at 1752; Nadezhda Purtova, Do Property Rights in Personal Data Make Sense after the Big Data Turn?: Individual Control and Transparency, 10 J.L. & ECON. REG. 64, 74 (2017); 578 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 The driver, pedestrians, insurance companies, regulators, and others could all have an interest in accessing non-personal sensor data, yet the question of access would fall outside of the scope of data protection law. On a similar note, data does not need to be linked to an identifiable or identified individual to impact his or her life. Schreurs et al. give the example of a shopping cart that can suggest certain products based only on the products that it senses are put in the basket and the speed at which the cart is pushed.370 In this case, the customer does not need to be identified for choices to be tailored to his or her perceived preferences or needs. To prevent data harms (e.g., discrimination) and bypass the murky issue of what constitutes personal data, it has been suggested that the “personal data” classification is fundamentally broken and should be abandoned.371 Abandoning this distinction would, of course, leave a gap in data protection law requiring some other classification of data to be introduced to constrain the scope of application of the law. Without a new classification, all data relating to people would effectively become personal data, greatly expanding the scope of coverage of data protection law.372 While such a move to treat all data as personal data has its merits, such as eliminating overlapping boundaries between personal and non-personal data, such a radical step is not strictly necessary to resolve the specific weaknesses of data protection law concerning inferences. Of course, (sensitive) personal data should never be collected without the explicit consent of the user. But the problem does not lie so much with data collection, but rather with what can be read from the data and the decisions that are based on this knowledge. Latanya Sweeney, Only You, Your Doctor, and Many Others May Know, TECH. SCI. (Sept. 29, 2015), https://techscience.org/a/2015092903 [https://perma.cc/38L5-ATQ8]; Vijay Pandurangan, On Taxis and Rainbows, MEDIUM (June 21, 2014), https://tech.vijayp.ca/of-taxis-andrainbows-f6bc289679a1 [https://perma.cc/HW7B-C6UW]. 370 Schreurs et al., supra note 359, at 246. 371 See Purtova, note 86, at 58–59; Wachter, supra note 7, at 443. 372 See Purtova, supra note 86. No. 2:494] A RIGHT TO REASONABLE INFERENCES 579 Therefore, this Article suggests that continuing to rely on sensitivity and identifiability, or on the blurry distinction among personal data, sensitive data, non-personal, and anonymized data as metrics for the level of protection to grant to data is misguided. This approach fails to protect privacy in the broader sense described above from the novel risks of Big Data analytics and automated decision-making. Rather, greater emphasis should be placed on managing the outputs of data processing, understood here as inferences or decisions, regardless of the type of data informing them. This would reconfigure privacy as a holistic concept, and be more in line with the ECHR,373 the Council of Europe’s “Modernised Convention for the Protection of Individuals with Regard to the Processing of Personal Data” 374 and their guidelines on AI and data protection,375 and the European Parliament’s resolution on a comprehensive European industrial policy on artificial intelligence and robotics. 
376 One could also argue for a mediated application of privacy as a human right, and advocate for a “positive obligation” of states to implement laws. However, the immediate political appeal of such a move is doubtful, given a recent proposal in the EU to facilitate 373 For an overview of ECHR jurisprudence on privacy to 2017, see Council of Europe, supra note 60. 374 Comm. of Ministers, Modernised Convention for the Protection of Individuals with Regard to the Processing of Personal Data, 128th Sess., CM/Inf(2018)15-final (2018), https://search.coe.int/cm/Pages/result_details. aspx?ObjectId=09000016807c65bf [https://perma.cc/RD3F-MVVS?type= image]. 375 Directorate Gen. of Human Rights and Rule of Law, Consultative Committee of the Convention for the Protection of Individuals with Regard to Automatic Processing of Personal Data (Convention 108): Guidelines on Artificial Intelligence and Data Protection, T-PD(2019)01 (Jan. 25, 2019), https://rm.coe.int/guidelines-on-artificial-intelligence-and-dataprotection/168091f9d8 [https://perma.cc/H563-X673]. 376 A Comprehensive European Industrial Policy on Artificial Intelligence and Robotics, EUR. PARL. DOC. P8_TA-PROV(2019)0081 (2019) [hereinafter Artificial Intelligence and Robotics]), http://www.europarl.europa.eu/sides/getDoc.do?pubRef=//EP//NONSGML+TA+P8-TA-2019-0081+0+DOC+PDF+V0//EN [https://perma.cc/UD8Y-ZF6E]. 580 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 exchange of non-personal data.377 Unfortunately, the proposal lacks serious consideration of the privacy risks of nonpersonal data, along the lines outlined above. To make this proposal work, the ECJ would need to redefine the remit of data protection law as a tool to ensure accurate and fair data driven decision-making. Given these challenges, in order to fully meet the aims of data protection law in the age of Big Data, a “right to reasonable inferences” must be introduced. In response to the novel threats posed by “high-risk inferences,” a right to reasonable inferences can be derived from the right to privacy when viewed as a mechanism intended to protect identity, reputation, and capacities for self-presentation. This right would offer data subjects additional protections against inferences drawn through Big Data analytics that (1) are predicted or shown to cause reputational damage or invade one’s privacy, and (2) have low verifiability in the sense of being predictive or opinion-based while being used to make important decisions. To make such a right feasible, the ECJ should broaden its interpretation of data protection law regarding an individual’s rights over inferred and derived data, profiling, and automated decision-making involving such information. The following Section sketches the scope of this right. To implement a “right to reasonable inferences,” new policy mechanisms are needed focusing on ex-ante justification and ex-post contestation of unreasonable inferences, which can likewise support challenges to subsequent decisions. Justification would be established by providing evidence of the normative acceptability, relevance and reliability of inferences and the methods used to draw them. If the right were implemented, high-risk inferences would receive Proposal for a Regulation of the European Parliament and of the Council on a Framework for the Free Flow of Non-Personal Data in the European Union, at 2, COM (2017) 495 final (Sept. 
13, 2017), https://ec.europa.eu/digital-single-market/en/news/proposal-regulationeuropean-parliament-and-council-framework-free-flow-non-personal-data [https://perma.cc/A63H-HHJS]. 377 No. 2:494] A RIGHT TO REASONABLE INFERENCES 581 comparable levels of protection to automated individual decision-making.378 A. Justification to Establish Acceptability, Relevance, and Reliability The ex-ante component of the right to reasonable inferences would thus require data controllers to proactively establish whether an inference is reasonable. Data controllers would need to explain (1) why certain data are a normatively acceptable basis to draw inferences; (2) why these inferences are normatively acceptable and relevant for the chosen processing purpose or type of automated decision; and (3) whether the data and methods used to draw the inferences are accurate and statistically reliable.379 These requirements should be enacted through the introduction of legally binding verification and notification requirements to be met by data controllers prior to deploying high-risk inferential analytics at scale.380 The current rules in Article 5 around fairness, purpose limitation, accuracy, and data minimization (including relevance for the pursued purpose) look promising at first glance, but seem to be insufficient. Eskens convincingly argues that “fairness” as used relates to transparency and requires that the user be informed about data processing and See generally Wachter, Mittelstadt & Floridi, supra note 11. On why the immutable attributes rationale for prohibiting discrimination on suspect grounds (e.g., ethnicity) is unhelpful because talent and intelligence cannot be changed either but are treated as a legitimate basis for decision-making, see Janneke Gerards, The Discrimination Grounds of Article 14 of the European Convention on Human Rights, 13 HUM. RTS. L. REV. 99, 114–115, 115 n.70 (2013). 380 The caveat “at scale” is included to ensure that data controllers can carry out the initial processing necessary to demonstrate normative acceptability, relevance, and reliability. Without this condition, data controllers would be unable to engage in exploratory analysis or develop new methods and types of inferences. The intention is to introduce justificatory requirements to be met prior to widespread deployment, not to prevent development and deployment themselves. 378 379 582 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 their respective rights.381 The fact that “fairness” is not defined in the GDPR and only appears in relation to lawfulness or transparency makes it questionable that “the fairness principle has any independent meaning at all,” and because “fair processing” is never mentioned, 382 it seems unlikely that the GDPR is intended to govern it. The European Data Protection Board (EDPB) also recently discussed fairness in relation to purpose limitation and legitimate interests of data controllers.383 The EDPB stated fairness relates to reasonable expectations (e.g. Recital 47 and 50) for data subjects in relation to potential harms and consequences. However, even if this view is followed, “user expectation” is not a democratic or normative justification. The fact that something has become “normal” or commonplace does not necessarily mean it is justifiable or socially desirable. Similarly, problems arise with purpose and data minimization (including relevance). 
In the past these provisions have not proven effective owing to the fact that very vague and broad purposes are named in terms and conditions governing data collection and processing. Recent instructive examples are the complaints filed relating to forced consent, as Article7(4) of the GDPR clarifies that consent can only be considered freely given if the data requested is limited to that which is necessary for the provision of a service.384 If, as a prerequisite of using a service, consent must be given for the collection and processing data beyond that which is strictly necessary for service provision, the consent cannot be 381 Sarah Johanna Eskens, Profiling the European Consumer in the Internet of Things: How Will the General Data Protection Regulation Apply to This Form of Personal Data Processing, and How Should it? 27 (Feb. 29, 2016) (unpublished manuscript), https://papers.ssrn.com/sol3/papers.cfm ?abstract_id=2752010 [https://perma.cc/W78T-TQZ2]. For a different view, see Lee A. Bygrave, Minding the Machine v2.0: The EU General Data Protection Regulation and Automated Decision Making, in ALGORITHMIC REGULATION (Karen Yeung & Martin Lodge eds.) (forthcoming 2019). 382 Eskens, supra note 381, at 27 n.125. 383 European Data Prot. Bd., supra note 2, at 5, 9. 384 See GDPR: Noyb.eu Filed Four Complaints over “Forced Consent” Against Google, Instagram, WhatsApp and Facebook, NOYB (May 25, 2018), https://noyb.eu/4complaints/ [https://perma.cc/6KXJ-P6ZZ]. No. 2:494] A RIGHT TO REASONABLE INFERENCES 583 considered freely given. Critically, “purpose limitation,” “accuracy” and “data minimization” (including relevance for the pursued purpose) seem to only apply to input data. In general, Article 5 is seen as a transparency tool, not a justification mechanism. One of the problems is that it is the data controllers who define the purpose and relevance of the collected data. A right to reasonable inferences, on the other hand, would open up a dialogue with individual data subjects and society to discuss whether processing practices are normatively acceptable. Finally, a right to reasonable inferences would apply equally to inferences drawn by the data controller and those received from a third party which can subsequently be re-purposed. In the first instance, the right should apply only to “highrisk inferences” drawn through Big Data analytics which (1) are privacy-invasive or damaging to reputation, or have a high likelihood of being so in the future, or (2) have low verifiability in the sense of being predictive or opinion-based while being used for important decisions. The first condition effectively sets a proportionality test for normative acceptability, according to which the damage to privacy or reputation caused by using a particular data source to draw an inference must be proportional to its predicted benefit or utility. Assessments of proportionality and the potential invasiveness of a data source and processing purpose should not be performed by data controllers in isolation.385 Concerning the second condition, the right in effect applies to both verifiable and nonverifiable inferences in different ways, but is most immediately concerned with mitigating the potential harms of non-verifiable inferences.386 These two conditions are proposed as a starting point for application of the right to reasonable inferences. Inferences meeting either condition would meet the threshold for a “right to reasonable inferences” to be exercised. 
Alternatively, the both conditions could be seen as necessary for the right to be 385 This type of assessment could conceivably form part of a data protection impact assessment, provided for in Article 35 of the GDPR, if a sufficient level of external review or governance could be guaranteed. 386 See infra Section VI.B. 584 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 exercised. However, requiring low verifiability in addition to damage to privacy or reputation establishes a threshold that is perhaps too high in practice given the novel risks of inferential analytics.387 The necessity of each condition for the right to apply should remain open to debate to determine their impact and assess whether general or sector-specific thresholds are preferable. Alternative grounds for application or additional conditions may also be feasible. For example, the right could alternatively be based solely on the notion of “legal or similarly significant effects” as prescribed in Article 22(1) GDPR.388 In the conditions proposed here, “important decisions” are those which have such “legal or similarly significant effects.” However, such effects are not limited to “solely automated” decisions as is the case in Article 22(1) because the risks to private life caused by using non-intuitive inferences are not dependent on the extent of automation in the decision-making process. In any case, basing the right entirely on a threshold of “legal or similarly significant” effects would position it as a complementary protection for the right not to be subject to automated individual decision-making, found in Article 22, which may be desirable. The Article 29 Working Party has provided examples of such effects in relation to Article 22: differential pricing and targeted advertisements that affect vulnerable groups, such as children playing online games being profiled as susceptible to advertisements or adults experiencing financial difficulties.389 The precise scope of “legal or similarly significant effects” remains unclear in See infra Part II. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC (General Data Protection Regulation), 2016 O.J. (L119) art. 22(1). Specifically, “automated decision-making” is defined as “a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.” Id. 389 Article 29 Data Prot. Working Party, supra note 17, at 22, 29. 387 388 No. 2:494] A RIGHT TO REASONABLE INFERENCES 585 practice, although it will be clarified as the GDPR matures via legal commentary, national implementation, and jurisprudence. These proposals are not arbitrarily chosen; rather, they reflect current trends in recent EU policy and offer a solution to the worrying weaknesses in data protection law described above. 
With regards to relevance, the Article 29 Working Party, for example, argues that disclosures providing “meaningful information about the logic involved” in automated decision-making, as required by Articles 13–15, should include “details of the main characteristics considered in reaching the decision, the source of this information and the relevance.”390 The Working Party explicitly warns that data controllers should prevent “any over-reliance on 391 correlations,” and explain why a “profile is relevant to the automated decision-making process.”392 The second component of justification—reliability— requires data controllers to demonstrate that the analytical methods and data used to draw inferences (and potentially make automated decisions) are reliable, for example via statistical verification techniques.393 The need to demonstrate reliability aligns with the GDPR’s Recital 71, which suggests that in order to ensure fair and transparent processing, data controllers are directed to verify the statistical accuracy of their systems, ensure that inaccuracies in personal data can be corrected, and prevent discriminatory effects of automated decision-making.394 Similarly, the Article 29 Working Party Id. at 25–26. Id. at 28. 392 Id. at 31. 393 Schreurs et al., supra note 359, at 253 (“Another matter of concern is the fact that group profiles may incorporate falsified presumptions, such as statistics that wrongly presume that mobile phones will cause cancer or information that people from a certain area have for instance been exposed to radioactive radiation. Knowledge of the logic involved could support an objection to the use of such profiles, even if no personal data of an identifiable person are collected to construct the profile.”). 394 Regulation (EU) 2016/679 of the European Parliament and of the Council on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing 390 391 586 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 explicitly calls for “algorithmic auditing” to be implemented to assess “the accuracy and relevance of automated decisionmaking including profiling[.]”395 Controllers have a similar responsibility for input data, which must be shown to not be “inaccurate or irrelevant, or taken out of context,”396 and to not violate “the reasonable expectations of the data subjects”397 in relation to the purpose for which the data was collected.398 The right to reasonable inferences would apply similar conditions to inferences, understood as a type of output data. The obligation to demonstrate the reliability of input data and methods aligns with the Council of Europe’s views on automated data processing and profiling. The Council has acknowledged a “lack of transparency, or even ‘invisibility,’ of profiling and the lack of accuracy that may derive from the automatic application of pre-established rules of inference [which] can pose significant risks for the individual’s rights and freedoms.”399 It recommends that data controllers “should Directive 95/46/EC, 2016 O.J. 
(L 119) 14 (“In order to ensure fair and transparent processing in respect of the data subject, taking into account the specific circumstances and context in which the personal data are processed, the controller should use appropriate mathematical or statistical procedures for the profiling, implement technical and organisational measures appropriate to ensure, in particular, that factors which result in inaccuracies in personal data are corrected and the risk of errors is minimised, secure personal data in a manner that takes account of the potential risks involved for the interests and rights of the data subject and that prevents, inter alia, discriminatory effects on natural persons on the basis of racial or ethnic origin, political opinion, religion or beliefs, trade union membership, genetic or health status or sexual orientation, or that result in measures having such an effect.”). 395 Article 29 Data Prot. Working Party, supra note 17, at 28, 32. 396 Id. at 17. 397 Id. at 11. 398 Id.; Bart Custers, Simone van der Hof, Bart Schermer, Sandra Appleby-Arnold & Noellie Brockdorff, Informed Consent in Social Media Use – The Gap Between User Expectations and EU Personal Data Protection Law, 10 SCRIPTED 435, 445–46 (2013). 399 The Protection of Individuals with Regard to Automatic Processing of Personal Data in the Context of Profiling, at 6, CM/Rec (2010)13 (Nov. 23 2010), https://rm.coe.int/16807096c3 [https://perma.cc/DDX4-HLKQ]. periodically and within a reasonable time reevaluate the quality of the data and of the statistical inferences used.” 400 Acceptability, relevance and reliability requirements for inferences are not without precedent in European data protection law and policy. Similar requirements for credit scoring have existed since 2010 in Germany’s data protection law, although it is worth noting that this law is no longer in force. Specifically, Section 28b required data controllers making predictions or predictive inferences to establish that:
1. The methods being used are sound according to the state of the art in science, mathematics, or statistics, and that the data being used is relevant to the type of prediction being made.
2. Only legally obtained data is used.
3. Predictions regarding the probability of an event happening are not based solely on a data subject’s physical address (e.g., post code).
4. If physical addresses are used, the data subject is informed of this fact, and it has been documented that the data subject has been so informed.401
These requirements closely align with this Article’s proposal for data controllers to establish the normative acceptability, relevance and reliability of proposed methods and data sources for drawing inferences. In particular, requiring data subjects to be notified when known proxies for sensitive attributes are used is crucial. If legally binding requirements are created along these lines, a balance must be struck between data subject and controller interests. At a minimum, data controllers should be obligated to provide information regarding the intended content or purpose of the inferences being drawn, the extent to which these inferences rely on proxies for sensitive Id. at 11. The authors translated this from German. See Gesetz zur Änderung des Bundesdatenschutzgesetzes [Law Amending the Federal Data Protection Act], Jul. 29, 2009, BGBL I at 2254, § 28b (Ger.). 400 401 588 COLUMBIA BUSINESS LAW REVIEW [Vol.
2019 attributes, and counterintuitive relationships between input data and the target inference (e.g., basing creditworthiness on clicking behavior). This type of information is intended to be the starting point of a dialogue between data subjects and data controllers regarding the justifiability of particular inferences. One of the greatest risks of inferential Big Data analytics and automated decision-making is the loss of control over how individuals are perceived, and the predictability or intuitive link between actions and the perceptions of others. The proposed notification requirements are intended to make the process of evaluating the data subject more open, inclusive, and discursive, and to provide a new channel of remedies for data subjects who believe that unreasonable inferences have been drawn. B. Contestation of Unreasonable Inferences To complement ex-ante notification requirements, the second half of a “right to reasonable inferences” should provide an effective ex-post accountability mechanism for the data subject. The ex-ante justification is bolstered by an additional ex-post mechanism enabling unreasonable inferences to be challenged. 402 This right would allow data subjects to contest inferences themselves (e.g., credit score), which complements the existing right to contest automated decisions found in Article 22(3).403 With the considerations of justification in Section VI.A in mind, the right to contest would be transformed from a mere procedural tool404 to a remedy that allows assessment of the content behind a decision. In practice, contesting would amount to raising an objection with the data controller if an inference drawn is found by the data subject to be inaccurate or unreasonable 402 In favor of such a solution, see Mireille Hildebrandt & Bert-Jaap Koops, The Challenges of Ambient Law and Legal Protection in the Profiling Era, 73 MOD. L. REV. 428, 448–49 (2010). On the need to remedy unjust judgments based on inferences, see Leenes, supra note 73, at 298. 403 Mendoza & Bygrave, supra note 332, at 6, 14. 404 See supra Section V.E. No. 2:494] A RIGHT TO REASONABLE INFERENCES 589 (e.g., if based on non-intuitive, unreliable, or invasive features or source data), and to offering supplementary information that could lead to an alternative preferred outcome. Contesting as imagined here encourages dialogue between the data subject and the controller if the accuracy or reasonableness of an inference is questioned. The ex-post component of the right to reasonable inferences is not, however, intended to shift decision-making autonomy from private actors to data subjects. Contesting an inference and offering supplementary information does not guarantee that the inference in question (or subsequent decisions challenged under Article 22(3) of the GDPR) will also be modified. Data controllers have private autonomy in the ways they evaluate data subjects and make decisions about them. The right to reasonable inferences is not intended to violate this autonomy, but rather to provide the data subject with a way to learn more about the data controller’s perceptions and decision-making processes, and to potentially convince the controller that one or both is wrong. For verifiable inferences (e.g., Jessie is a homeowner), it is reasonable to assume that offering supplementary information demonstrating the original inference is inaccurate would lead to rectification of the inference, as accurate data is in the interests of both parties. 
This type of right is nothing new, as data subjects can already rectify data in this way under Article 16 of the GDPR.405 This proposal only suggests broadening the scope of Article 16 from merely input data to also output data, which is in line with the Article 29 Working Party’s view. 406 For non-verifiable or predictive inferences (e.g., Jade will default on a loan in the next five years), data subjects arguably do not have an equivalent form of rectification. Nonverifiable inferences cannot be rectified as such due to their 405 Regulation (EU) 2016/679 of the European Parliament and of the Council on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC, 2016 O.J. (L 119) art. 16. 406 See supra Part III. 590 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 inherent uncertainty or subjectivity.407 The data subject may nonetheless disagree with the controller’s views or assessment if, for example, it does not align with their selfperception, the source data is perceived as irrelevant, or the scope of data considered was incomplete or insufficient. Contesting the normative acceptability, relevance or reliability of an inference on any of these grounds is distinct from rectifying a provably inaccurate inference. The right to rectification in Article 16 may arguably already offer a remedy for non-verifiable inferences. Whether this is the case depends upon one’s view of the necessity of verifiability in classifying inferences as personal data 408 and its impact on subsequent application of data protection rights. The ECJ, for example, argues that the right to rectification is not intended to apply to the content of subjective (and thus non-verifiable) opinions and assessments. 409 In contrast, the Article 29 Working Party believes predictive inferences can also be “rectified” by providing supplementary information that would alter the assessment, meaning that verifiability is not necessary to exercise the right of rectification.410 The proposal for an ex-ante right to contest inferences made here may thus not represent a radical departure from existing law. Rather, if adopted, the right to reasonable inferences would effectively enshrine an answer to the verifiability question in law, and thus strengthen data protection rights over inferences regardless of their verifiability. This sort of strengthening is essential if the interests of data controllers are to form less of a barrier to exercising individual data protection rights against inferences than is currently the case. 411 In conjunction with the ex-ante notification requirements, the data subject’s chances of successfully contesting inferences (and automated decisionmaking based upon them) would likewise improve, as the 407 408 409 410 411 See supra Sections III.B, V.B. See supra Sections III.B, V.B. See supra Part IV, Section V.B. See supra Part III. See supra Part V. No. 2:494] A RIGHT TO REASONABLE INFERENCES 591 subject could draw on the justification disclosure made by the controller prior to an inference being drawn. VII.BARRIERS TO A RIGHT TO REASONABLE INFERENCES: IP LAW AND TRADE SECRETS As shown in Parts III and IV, the first hurdles to the implementation of a right to reasonable inferences lies with determining the legal status of inferences. 
Once consensus has been reached on whether inferences are personal data, the rights granted in the GDPR very often need to be counterbalanced with the legitimate interests of data controllers concerning, for example, trade secrets, intellectual property, or third-party privacy.412 The easiest legal solution to prevent unreasonable inferences from being drawn would be to allow data subjects to prevent models from being built in the first place, or to grant them control over the models used in inferential analytics, and how they are applied. Such a solution is of course not to be recommended, as it fails to respect the substantial public and commercial interests advanced by analytics and technological development more broadly. With regard to the mechanisms recommended in the preceding Section, a more reasonable approach would be to require controllers to justify to regulators or data subjects their design, choice, and usage of models and particular data types to draw inferences about individuals. However, there are an alarming number of provisions in the GDPR and other (proposed) regulations that could seriously hinder the protection afforded to data subjects against inferences. In short, the GDPR, new and old IP laws, and the new European directive on trade secrets do much to facilitate Big Data analytics and the construction of machine learning models. This Part considers models to be the outputs of data processing involving inferential analytics that uses an individual’s personal data. In other words, personal data is used to draw inferences which lead to a model, which can then be applied to other people, cases, or data to make decisions. 412 See supra Part V. 592 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 Under the GDPR and the new Copyright Directive,413 data subjects’ rights are restricted for the purpose of constructing models. For construction that does not meet the requirements of the statistical purpose exemptions, data subjects would retain these rights. However, once an output (the model) has been produced, new regulations dealing with copyright and trade secrets would give the individual little say in how the model is used, and little to no share in the benefits it produces. A. Algorithmic Models and Statistical Purposes in the GDPR The GDPR may facilitate inferential analytics by granting a number of privileges to processing for statistical purposes.414 After data is collected based on one of the legal bases in Article 6, the strict “purpose limitation” in Article 5 no longer applies.415 Article 5(1)(b) states that “further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes shall, in accordance with Article 89(1), not be considered to be incompatible with the initial purposes[.]”416 The same privilege applies to the strict principle of storage limitation in Article 5(1)(e)417, and thus the data does not need to be deleted after it is no longer necessary for the original processing purpose. This means as long as data is collected in a lawful manner following Article 6, and in accordance with 413 Proposal for a Directive of the European Parliament and of the Council on Copyright in the Digital Single Market, COM(2016) 593 final (Sept. 14, 2016). 414 Mayer-Schönberger & Padova, supra note 50, at 326–27. 
But see Bertram Raum, Verarbeitung zu Archivzwecken, Forschungszwecken, in DATENSCHUTZ-GRUNDVERORDNUNG 31–32 (Eugen Ehmann & Martin Selmayr eds., 2017) (expressing uncertainty over whether the exemptions apply to Big Data). 415 Regulation (EU) 2016/679 of the European Parliament and of the Council on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC, 2016 O.J. (L 119) art. 5(1)(b). 416 Id. 417 Id. at art. 5(1)(e). No. 2:494] A RIGHT TO REASONABLE INFERENCES 593 “appropriate safeguards” pursuant to Article 83 (e.g., pseudonymization) are in place, the subsequent use for statistical purposes is lawful and does not require any additional legal basis for processing (e.g., consent) to be established. Mayer-Schönberger and Padova believe that Big Data analytics can be considered “processing for statistical purposes,” as they are strongly based on statistical methods.418 Relatedly, Zarsky argues that Big Data would face significant difficulty to fall within this exemption. 419 If the exemption is applied, Member State law can grant controllers numerous privileges and exemptions from other rights and duties in the GDPR, as described in Article 89(2). These include exemptions from Articles 14(5)(b), 15, 16, 17(3)(d), 18 and 21, as well as the strict limitations on the use of sensitive data in Article 9(2)(j) and Recital 52. 420 These exemptions have two implications for the diffusion of inferential analytics. First, they encourage the creation of new statistical models and profiles by lowering data protection requirements for such processing. Second, following from this relaxation of the law, when personal data is used for statistical purposes data subjects are unable to exercise the majority of their rights, and thus cannot prevent statistical uses. Similarly, data subjects lack any claim or rights over the resulting models or profiles (i.e., “statistical results” in Article 89(1)), despite having been built with their personal data. It is important to note a further restriction on the Article 89 privileges. Recital 162 clarifies that statistical results generated under the statistical purposes exemption (which are aggregate data, not personal data), as well as the input 418 Mayer-Schönberger & Padova, Regime Change, supra note 50, at 330. See Zarsky, supra note 317, at 1007–08. Regulation (EU) 2016/679 of the European Parliament and of the Council on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC, 2016 O.J. (L 119) 1, 10; see also id. at arts. 9(2)(j), 14(5)(b), 17(3)(d), 89(2). 419 420 594 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 personal data, cannot be used “in support of measures or decisions regarding any particular natural person.”421 It is difficult to imagine how compliance and enforcement of this restriction will be handled (i.e., how to ensure that the model is not applied or intended to be applied to a natural person), or how to manage the sale of models generated under Article 89 exemptions to third parties. Presumably, if the results (which must not be personal data, per Recital 162) are then used to make decisions about individuals, the privileges granted by the statistical purposes exemption are no longer applicable, meaning normal data processing rules, such as Articles 6 and 22, will apply. 
422 An important point of contention regarding these exemptions is whether they apply to commercial data controllers, or only to public and research entities, such as government bodies and universities. Mayer-Schönberger and Padova argue that these privileges apply to “private companies for commercial gain” as well.423 A similar view 421 Regulation (EU) 2016/679 of the European Parliament and of the Council on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC, 2016 O.J. (L 119) 30; see also Zarsky, supra note 317, at 1008; Schreurs et al., supra 359, at 248. Both Zarsky and Schruers et al. are silent on the view of applying profiles after creation but hint that the law might prohibit this, albeit without any clear supporting evidence. This view translates to the GDPR because the DPD had a similar provision in Recital 29. For a view that the later application should be covered by Article 6, see Richter, supra note 328, at 585 who also warns that this can never be sufficiently regulated as there is no way of assessing how the models are subsequently used for other processing or by other data controllers. 422 Article 29 Data Prot. Working Party, supra note 17, at 7 (“For instance, a business may wish to classify its customers according to their age or gender for statistical purposes and to acquire an aggregated overview of its clients without making any predictions or drawing any conclusion about an individual. In this case, the purpose is not assessing individual characteristics and is therefore not profiling.”); Raum, supra note 414, at 41 (explaining how further usage of statistical results is no longer covered by the privileges, but can be used if the normal requirements for data processing in the GDPR are met). For example, to assess individuals with a model built under the statistical purposes exemption, a further legitimate basis for processing would need to be established, such as consent. 423 Mayer-Schönberger & Padova, supra note 50, at 326. No. 2:494] A RIGHT TO REASONABLE INFERENCES 595 comes from Richter, who argues that the statistical purposes exemption can be used to pursue commercial interests as long as the results are not applied to individuals. 424 In contrast, Raum suggests that the exemptions cannot be used for commercial interests, and that any subsequent usage of statistical results generated under these exemptions for commercial interests would require justification according to the GDPR’s standard data processing requirements. 425 This suggestion is, however, not supported with any further legal argumentation. Once the model is applied to a person, regardless of whether it was built under the statistical purposes exemption, the outcome of this application (i.e., an inference or decision) becomes the personal data of the person being assessed and the restrictions detailed in Part V apply. Members of the training set also retain rights over any of their personally identifiable data contained within the model, unless statistical purposes exemptions apply. However, while the model is admittedly applied to a data subject for the purpose of assessment, this does not mean the model will be considered the personal data of the person being assessed or the data subjects represented in the training data. Further, neither party will have rights over the model. To understand why this is the case, it is necessary to return to the judgements discussed in Part IV. 
In Nowak, the ECJ made clear that the exam questions are not the candidate’s personal data, 426 even if used to assess him. The exam questions are comparable with the model that is used to assess an individual. The same holds in the case of YS and M and S, where immigration law is comparable to a statistical model. The fact that immigration law was applied 424 See Richter, supra note 328, at 585 (arguing later application should not be lawful even if it fulfills Article 6 requirements due to the possible risks). Richter does not, however, offer a legal argument to justify this claim. He further warns that the GDPR legalizes many applications that would have been illegal in Germany (e.g., private sector uses). See id. 425 See Raum, supra note 414, at 41–42. 426 Case C-434/16, Peter Nowak v. Data Prot. Comm’r, 2017 E.C.R. I994, ¶ 58. 596 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 to the applicant to make a decision on residency does not mean the law itself became the applicant’s personal data. 427 The data subject thus cannot rectify or delete the law. Therefore, neither the exam questions nor the applicable law are subject to the rights granted in the GDPR. The request to access the “legal analysis” in YS and M and S further clarifies the distinction between a model and application of the model. As already discussed, immigration law provides the background framework, or model, in which residency applications are assessed. The application of this law to the particulars of an applicant’s case, or the “legal analysis,” can be considered equivalent to the application of a statistical model (i.e., the analysis or reasoning) to a data subject to make a decision. This relationship between a model and analysis can be equally applied to algorithmic decisionmaking models. For example, a decision tree used to make a decision on the basis of personal data can be considered a model. The analysis in this context would constitute the specific path, or branch, followed in the decision tree to reach an output or decision. So, in other words, a specific path in the decision tree relevant to deciding a specific case constitutes “analysis,” whereas the entire tree constitutes a “model.” Even if models (e.g., immigration law or exam questions) were treated as personal data, the rights in the GDPR must be interpreted teleologically to avoid nonsensical results. 428 In Nowak, this was clearly seen in the determination that allowing the candidate to rectify answers on an exam would be nonsensical as it would undermine the original processing purpose of evaluating the candidate’s performance, despite being the candidate’s personal data. The same applies to rectification of the exam questions, which are not considered personal data. In the case of statistical or algorithmic decision-making models, rectification of the model itself would 427 Not even the legal analysis (as an abstract application of the law) is personal data, but rather only the personal data undergoing processing is. Joined Cases C-141/12 & 372/12, YS, M and S v. Minister voor Immigratie, Integratie en Asiel, 2014 E.C.R. I-2081, ¶¶ 48–49, 59. Therefore, the law will also not be seen as personal data. 428 See supra Part IV. No. 2:494] A RIGHT TO REASONABLE INFERENCES 597 often be equally nonsensical, or at least not constitute a fair balance of subject and controller interests, due to its potential impact on application of the model to other cases, or research and business interests more broadly. 
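To make the distinction between a model and an analysis concrete, the following minimal sketch (an illustration only, not drawn from the Article or from any controller's system; the data, feature names, and use of the scikit-learn library are assumptions) trains a decision tree as the "model" and extracts the single root-to-leaf path used to assess one applicant as the "analysis":

```python
# Minimal sketch: "model" (the whole fitted tree) vs. "analysis" (the specific
# path followed for one data subject). All data and feature names are invented.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: columns are [income, clicks_per_day];
# the label records whether a past loan was repaid (1) or not (0).
X_train = np.array([[20, 5], [35, 40], [50, 8], [80, 60], [15, 70], [90, 10]])
y_train = np.array([0, 0, 1, 1, 0, 1])

model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

# Applying the model to one applicant yields an inference (output data)...
applicant = np.array([[45, 55]])
inference = model.predict(applicant)

# ...while the "analysis" is the particular root-to-leaf path used in this case.
path = model.decision_path(applicant)
print("inference for applicant:", inference[0])
print("node ids on the decision path:", list(path.indices))
```

On this reading, the applicant would hold data protection rights over the resulting inference (the output applied to her), while the fitted tree as a whole would remain outside those rights, much like the exam questions in Nowak or the immigration law in YS and M and S.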
Finally, the remit of data protection law does not include assessment of the accuracy or justifiability of decisions (and underlying opinions or evaluations),429 and does not allow individuals to decide which models (e.g., exam questions, laws) are used to assess them.430 Rather, these choices fall within the data controller’s private decision-making autonomy. An example may help to illustrate why models cannot be considered personal data. If a doctor asks about a patient’s height, and she replies 166 centimeters, such an utterance is her personal data. This data falls under the GDPR and can be rectified, deleted, etc. However, the fact that her height is expressed in centimeters does not mean that the metric system (i.e., the model used to assess her height) becomes her personal data, meaning that she would have rights over it. By having her height measured, she will not gain the right to rectify or delete the metric system. Similarly, she would not have a right to require that a different measuring system or model be used, for example the imperial system, because she prefers the imperial system or finds it more accurate.431 One could argue that this example is not equivalent to trained algorithmic models, as personal data was not used to construct the metric system. So, while the model would not constitute personal data of the individual being assessed, it may still conceivably be the personal data of the individuals whose data was used to construct it. To address this alternative, consider instead a marking rubric as was presumably used in Nowak. In addition to the exam questions, such a rubric would constitute a model used to make a decision about the performance of the candidate. The rubric is arguably constructed from personal data, insofar as it is derived from the past experiences and opinions of the assessor or course leader with other exams, and perhaps specific answers provided by candidates in prior years. The rubric could even go so far as to include personal data, if for example a prior candidate’s answer was copied into the rubric as an example response to a question. In this case, it would be equally nonsensical to assume that the prior candidate whose personal data is contained in the rubric would have data protection rights over the rubric as a whole.

429 Joined Cases C-141/12 & 372/12, YS, M and S v. Minister voor Immigratie, Integratie en Asiel, 2014 E.C.R. I-2081, ¶¶ 32, 46–48; Case C-434/16, Peter Nowak v. Data Prot. Comm’r, 2017 E.C.R. I-994, ¶¶ 52–54. With regard to the ECJ’s judgment in Nowak, the examples of cases in which exam answers or the examiner’s comments could be considered “inaccurate” deal with cases in which the input data for a decision is somehow incomplete or corrupted (e.g., pages of answers were missing from the script assessed by the examiner). A clear distinction is drawn in paragraph 54 between the examiner’s comments and the examiner’s evaluation of the candidate’s performance, with the former being treated as “recording” the examiner’s evaluation. See id. ¶ 54. The ECJ is thus indicating that the examiner’s comments, which can themselves be considered inferences or subjective statements of opinion, can be rectified if they have been recorded on the basis of incomplete or corrupted input data. The candidate is not granted the right to rectify the opinion, analysis, or evaluation criteria of the examiner.

430 See supra Part IV.
Rather, in line with the ECJ’s stance on the right to erasure in relation to exam answers, the prior candidate would retain rights over the extract of her responses contained in the rubric (assuming she was still identifiable, for example if the author of the rubric recalled who provided the example in question). In line with her data protection rights over personally identifiable data, the candidate could justifiably request access or deletion of the extracted431 response.432 In the context of a trained algorithmic model, the right to erasure could be interpreted as requiring the data to be removed from the training set, thus requiring the model to be re-trained.433 Regardless of whether the prior candidate’s requests would be successful in the real world, they demonstrate why personal data being contained in a model should not be thought to automatically grant individual rights over the model itself. Rather, the data subject’s rights apply only to the specific personally identifiable data contained within the model. This approach aligns with the teleological interpretation of individual rights described by the ECJ and AG in Nowak.434 The purpose of a model is to assess individuals; it would be nonsensical to assume that individuals whose data was used to train the model would be able to modify or delete the model entirely, and thus have an unjustifiably significant impact on the individuals being assessed by it. The scope of data protection rights must be appropriately applied and constrained to reflect the relationship between the data subject and the model, and the relevant processing purposes. In other words, the mere presence of personal data in a model in no way equates to the full, unbounded exercise of rights over it. Finally, law and policy on IP, copyright, and trade secrets also apply to the model, which may prevent the exercise of individual data protection rights. In particular, these frameworks are likely to prevent requests to “delete” personal data from a model by re-training it from succeeding, if doing so requires significant effort or is disruptive to business practice. The impact of these conflicts between frameworks is explored in the next three Sections.

431 For a view that trained models might be personal data, meaning the data subject would have rights over the model in its entirety, see Michael Veale, Reuben Binns & Lilian Edwards, Algorithms That Remember: Model Inversion Attacks and Data Protection Law, 376 PHIL. TRANSACTIONS ROYAL SOC’Y A 1, 1 (2018). This view, however, misinterprets the standing jurisprudence of the ECJ addressed here and does not take the remit of data protection law and the need to balance individual rights with trade secrets and IP law into account.

432 See supra Section IV.B.1 (discussing erasure of examination answers), Part V (outlining necessary conditions).

433 On the challenges of implementing the right to be forgotten for AI systems, see Villaronga, Kieseberg & Li, supra note 260.

434 See supra Section IV.B.2.

B. Algorithmic Models and the EU’s Copyright Directive

The previous Section shows that the GDPR facilitates the creation of profiles and models, either built from inferences (among other data) or capable of producing them when applied to individuals. When the statistical purpose exemption applies, the individual cannot object to the model’s construction and has no rights over it, even if the model is built using personal data.
Further, even if the model is applied to a natural person (meaning the statistical purposes exemptions no longer apply), no control or rights over the model are likely to be granted if the jurisprudence of the ECJ is maintained. Similarly, members of the training data set will retain data protection rights over any personal data contained in the model and may be able to exercise rights in relation to it (unless statistical purposes exemptions apply), but this will not equate to any control or rights over the model as a whole. The facilitation of model construction and the lack of individual rights seen in the GDPR can also be seen in IP and copyright law. Current discussion of machine learning and inferential analytics in the context of IP law focuses broadly on two issues: (1) whether the training data used to construct a model (e.g., content uploaded or created by users) is protected by IP laws; and (2) whether the outcome of the algorithmic process can be protected under IP law.435 A new EU Copyright Directive436 is currently under debate, which will complement the existing legal framework on copyright437 and will, among other things, govern the legal status of data mining.438 The Directive is concerned, inter alia, with research organizations such as universities and research institutes (including public-private partnerships439) that use new technologies that “enable the automated computational analysis of information in digital form, such as text, sounds, images or data, generally known as text and data mining. Those technologies allow researchers to process large amounts of information to gain new knowledge and discover

435 See generally Daniel Schönberger, Deep Copyright: Up- and Downstream Questions Related to Artificial Intelligence (AI) and Machine Learning (ML), 10 INTELL. PROP. J. 35 (2018); Annemarie Bridy, The Evolution of Authorship: Work Made by Code, 39 COLUM. J.L. & ARTS 395 (2016).

436 Proposal for a Directive of the European Parliament and of the Council on Copyright in the Digital Single Market, COM (2016) 593 final (Sept. 14, 2016).

437 In order of enactment, see generally Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the Legal Protection of Databases, 1996 O.J. (L 77) 20–28; Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the Harmonisation of Certain Aspects of Copyright and Related Rights in the Information Society, 2001 O.J. (L 167) 10–19 (implementing the “WIPO Copyright Treaty”); Directive 2006/115/EC of the European Parliament and of the Council of 12 December 2006 on Rental Right and Lending Right and on Certain Rights Related to Copyright in the Field of Intellectual Property, 2006 O.J. (L 376) 28–35; Directive 2009/24/EC of the European Parliament and of the Council of 23 April 2009 on the Legal Protection of Computer Programs, 2009 O.J. (L 111) 16–22; Directive 2012/28/EU of the European Parliament and of the Council of 25 October 2012 on Certain Permitted Uses of Orphan Works, 2012 O.J. (L 299) 5–12; and Directive 2014/26/EU of the European Parliament and of the Council of 26 February 2014 on Collective Management of Copyright and Related Rights and Multi-Territorial Licensing of Rights in Musical Works for Online Use in the Internal Market, 2014 O.J. (L 84) 72–98. Other frameworks are relevant but go beyond the scope of this paper. See, e.g., Marrakesh Agreement Establishing the World Trade Organization, Apr.
15, 1994, 1869 U.N.T.S. 299, Annex 1C, Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS). For more, see Patents for Software?: European Law and Practice, EUR. PATENT OFF., https://www.epo.org/news-issues/issues/software.html [https://web.archive.org/web/20180613235106/http://www.epo.org/newsissues/issues/software.html]. 438 See Amendments by the European Parliament to the Commission Proposal Directive (EU) 2019/…of the European Parliament and of the Council of on Copyright and Related Rights in the Digital Single Market and Amending Directives 96/9/EC and 2001/29/EC, A800245/271, art. 4, http://www.europarl.europa.eu/doceo/document/A-8-2018-0245-AM-271271_EN.pdf?redirect [https://perma.cc/KC48-HY8M]. 439 Id. at 10–11. 602 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 new trends.”440 For text and data mining activities in such research environments, the Directive pushes for exceptions to the copyright regime (e.g., foregoing a need for license agreements441 or remuneration442), as well as for exemptions from the Database Directive443 to uses of data to monitor trends.444 These exemptions are concerning when considered alongside the GDPR’s exemptions in Articles 85 445 and 89, which already grant exemptions from most of the rights granted in the GDPR (e.g., Articles 14, 15, 16, 18, 17(3)(d) and 21) for data controllers “processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes.” 446 Recital 159 of the GDPR explains that “scientific research purposes” should be interpreted broadly to include “privately funded research.”447 Universities 440 Proposal for a Directive of the European Parliament and of the Council on Copyright in the Digital Single Market, COM (2016) 593 final (Sept. 14, 2016) at 14. 441 See id. at 14, art. 3. For arguments in favor of license fees and access to data for AI training, see generally Schönberger, supra note 435. 442 Proposal for a Directive of the European Parliament and of the Council on Copyright in the Digital Single Market, at art. 15, COM (2016) 593 final (Sept. 14, 2016). 443 Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the Legal Protection of Databases, 1996 O.J. (L 77) 2028. 444 Proposal for a Directive of the European Parliament and of the Council on Copyright in the Digital Single Market, COM (2016) 593 final (Sept. 14, 2016) at art. 3(1) (providing exemptions for text and data mining). Article 2(2) of the Proposal defines “text and data mining” as “any automated analytical technique aiming to analyse text and data in digital form in order to generate information such as patterns, trends and correlations.” Id. at art. 2(2). 445 Article 85 addresses the inclusion of journalistic purposes in these exemptions. 446 Regulation (EU) 2016/679 of the European Parliament and of the Council on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC, 2016 O.J. (L 119) art. 89. 447 Id. at 30. For a discussion on the legal problems associated with forprofit research, see Tal Z. Zarsky, Desperately Seeking Solutions: Using Implementation-Based Solutions for the Troubles of Information Privacy in No. 2:494] A RIGHT TO REASONABLE INFERENCES 603 and research institutes covered by the new Copyright Directive will therefore receive substantial exemptions to data protection and IP requirements when constructing algorithmic models. 
In fact, the draft of the Copyright Directive recently passed by the EP goes even further and states in Recital 11 that:

[In line with] Union research policy, which encourages universities and research institutes to collaborate with the private sector, research organisations should also benefit from such an exception when their research activities are carried out in the framework of public-private partnerships. While research organisations and cultural heritage institutions should continue to be the beneficiaries of that exception, they should also be able to rely on their private partners for carrying out text and data mining, including by using their technological tools.448

These privileges cover “access to content that is freely available online”449 and there is no longer any clear storage limitation.450 At the same time, the private partner must not have “decisive influence” over the research organisation, and research that is not carried out on a not-for-profit basis or pursuant to a public-interest mission does not enjoy these privileges.451 The draft Copyright Directive thus appears likely to exempt research institutions, including public-private partnerships, from the copyright regime for data mining. Users will thus have no control over how their data is used to build models under the GDPR’s statistical exemptions, and under the Copyright Directive’s research exemptions. It must be noted that, at the moment, the actual impact of the Directive on inferential analytics and algorithmic models remains unclear, and the framework is still in Trilogue negotiations.452

the Age of Data Mining and the Internet Society, 56 ME. L. REV. 14 (2004). For a focus on the GDPR, see Gabe Maldoff, How GDPR Changes the Rules for Research, INT’L ASS’N OF PRIVACY PROFS. (Apr. 19, 2016), https://iapp.org/news/a/how-gdpr-changes-the-rules-for-research/ [https://perma.cc/PV6J-VGBS].

448 Amendments by the European Parliament to the Commission Proposal Directive (EU) 2019/…of the European Parliament and of the Council of on Copyright and Related Rights in the Digital Single Market and Amending Directives 96/9/EC and 2001/29/EC, A800245/271, at 11, http://www.europarl.europa.eu/doceo/document/A-8-2018-0245-AM-271271_EN.pdf?redirect [https://perma.cc/KC48-HY8M].

449 See id. at 14.

450 See id. at arts. 3(3), 4(3).

451 See id. at 12.

C. Algorithmic Models and Outcomes and Intellectual Property Law

Thus far, this Article has determined that data subjects are unlikely to have data protection rights over statistical models (e.g. those produced by machine learning) applied to them or built from their personal data under the GDPR. With regard to the EU Copyright Directive, if an algorithm is trained in a research environment via data mining, consent, license agreements, and remuneration are not required to use data as inputs to train the model. Therefore, these regulations could also form a new barrier to control over inferences. In addition to the legal status of training data addressed thus far, there is growing debate on whether the data generated or creative “work” performed by algorithms should fall under intellectual property law. If IP law is applicable, business interests will be pitted against data subjects’ rights.453 This means that the new EU Copyright Directive or

452 For all draft reports of the European Parliament, see Draft Reports, EUR.
PARLIAMENT COMMITTEES, http://www.europarl.europa.eu/committees/ en/juri/draft-reports.html?ufolderComCode=JURI&ufolderId=07947& urefProcCode=&linkedDocument=true&ufolderLegId=8&urefProcYear=& urefProcNum [https://perma.cc/T7EH-Z5U7]. For further legal and ethical discussion, see Bart W. Schermer, The Limits of Privacy in Automated Profiling and Data Mining, 27 COMPUTER L. & SECURITY REV. 45 (2011); Zarsky, supra 65. 453 Madeleine de Cock Buning, Is the EU Exposed on the Copyright of Robot Creations?, 1 ROBOTICS L.J. 8, 8 (2015) (“It can either be the creator of the software who is deemed the owner of the rights; or it could be the owner of the software; or it could be both. It can also be the entity or person who invested financially in the software.”); see also CHRISTOPHE LEROUX ET AL., SUGGESTION FOR A GREEN PAPER ON LEGAL ISSUES IN ROBOTICS: CONTRIBUTION TO DELIVERABLE D3.2.1 ON ELS ISSUES IN ROBOTICS (2012), 452 No. 2:494] A RIGHT TO REASONABLE INFERENCES 605 the InfoSoc Directive 2001/29/EC454 could apply to work generated by algorithms, in addition to training data.455 In any case, Directive 2009/24/EC on the protection of computer programs applies to software. Here, software is interpreted broadly, as Art 1(2) states that the “Directive shall apply to the expression in any form of a computer program.”456 In the ECJ’s judgment in SAS Institute Inc. v. World Programming Ltd, this has been interpreted as applying to at least preparatory design material, machine code, source code, and object code, but not the functionality of the computer program or the format of the data files.457 Following this judgment, while it remains unclear whether the output of software (here, a model or an inference) is protected under Directive 2009/24/EC, information about how the output was produced will be protected. IP law can thus form an additional https://www.researchgate.net/publication/310167745_A_green_paper_on_l egal_issues_in_robotics [https://perma.cc/8752-NYDD]; Report with Recommendations to the Commission on Civil Law Rules on Robotics (2015/2013 (INL)), EUR. PARL. DOC. A8-0005/2017 (Jan. 27, 2017), http://www.europarl.europa.eu/sides/getDoc.do?pubRef=//EP//NONSGML+REPORT+A8-2017-0005+0+DOC+PDF+V0//EN [https://perma.cc/P5BB-TFTC]; Malgieri, supra note 302; Autonomous Creation – Creation by Robots: Who Owns the IP Rights?, IPKM BLOG (Mar. 5, 2015), https://law.maastrichtuniversity.nl/ipkm/autonomous-creationcreation-by-robots-who-owns-the-ip-rights/ [https://perma.cc/85MC-3L2C]. 454 Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the Harmonisation of Certain Aspects of Copyright and Related Rights in the Information Society, 2001 O.J. (L 167) 10. 455 Note there is also a discussion on whether algorithms should be equipped with personhood to be able to hold copyright, or alternatively whether copyrights should be transferred to the user or coder of the system. For discussion, see Bridy, supra note 435; James Grimmelmann, There’s No Such Thing as a Computer-Authored Work - And It’s a Good Thing, Too, 39 COLUM. J.L. & ARTS 403 (2016); Schönberger, supra note 435 (exploring the idea that the AI creation and the copyright should be in the hands of the public domain). 456 Directive 2009/24/EC of the European Parliament and of the Council of 23 April 2009 on the Legal Protection of Computer Programs (Codified Version) (Text with EEA Relevance), 2009 O.J. (L 111) 16. 457 See Case C-406/10, SAS Inst. Inc. v. World Programming Ltd., 2012 E.C.R. I-259. 
606 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 barrier to accessing the reasoning or analysis that has led to a model or inference. D. Algorithmic Models and Outcomes and Trade Secrets The final framework to discuss as a potential barrier to the right to reasonable inferences is a “catch all” framework that may pose a substantial barrier to learning the justification behind inferences. Even if the aforementioned frameworks were not to apply to inferential analytics, the new EU Trade Secrets Directive458 is likely to substantially limit controllers’ transparency obligations.459 The framework, which came into effect on June 9, 2018, may result in the creation of new data being classified as a trade secret. Article 2 of the Directive defines a trade secret as any information that is not “generally known,” has commercial value due to this secrecy, and has been subject to reasonable steps to ensure it remains a 458 Directive (EU) 2016/943 of the European Parliament and of the Council of 8 June 2016 on the Protection of Undisclosed Know-How and Business Information (Trade Secrets) Against Their Unlawful Acquisition, Use and Disclosure, 2016 O.J. (L 157) 1. 459 See Rembert Niebel, Lorenzo de Martinis & Birgit Clark, The EU Trade Secrets Directive: All Change for Trade Secret Protection in Europe?, 13 J. INTELL. PROP. L. & PRAC. 445, 448–49 (2018). No. 2:494] A RIGHT TO REASONABLE INFERENCES 607 secret.460 Recital 1 further adds “valuable know-how and business information” to the definition.461 The definition of a trade secret is so broad as to include nearly any data handled by a commercial entity. For example, trade secrets could include “shopping habits and history of customers,”462 “customer lists and profiles,”463 464 “algorithms,” and “[information about a] customer’s behavior (creditworthiness, lifestyle, reliability, etc.), personalized marketing plans (e.g. pricing), or forecasts about [a] customer’s future life based on probabilistic studies (life expectancy, estimated advancements in career, etc.).”465 460 Directive (EU) 2016/943 of the European Parliament and of the Council of 8 June 2016 on the Protection of Undisclosed Know-How and Business Information (Trade Secrets) Against Their Unlawful Acquisition, Use and Disclosure, 2016 O.J. (L 157) 1, art. 2. Article 39(2) of the TRIPS agreement has a similar definition of a trade secret. Marrakesh Agreement Establishing the World Trade Organization, Apr. 15, 1994, 1869 U.N.T.S. 299, Annex 1C, Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPS) art. 39(2). For discussion of trade secrets as a hindrance to due process and algorithmic accountability, see generally PASQUALE, supra note 11; Rebecca Wexler, Life, Liberty, and Trade Secrets: Intellectual Property in the Criminal Justice System, 70 STAN. L. REV. 1343 (2018); Amy J. Schmitz, Secret Consumer Scores and Segmentations: Separating “Haves” from “Have-Nots”, 2014 MICH. ST. L. REV. 1411 (2014); Brenda Reddix-Smalls, Credit Scoring and Trade Secrecy: An Algorithmic Quagmire or How the Lack of Transparency in Complex Financial Models Scuttled the Finance Market, 12 U.C. DAVIS BUS. L.J. 87 (2011). 461 Directive (EU) 2016/943 of the European Parliament and of the Council of 8 June 2016 on the Protection of Undisclosed Know-How and Business Information (Trade Secrets) Against Their Unlawful Acquisition, Use and Disclosure, 2016 O.J. (L 157) 1, 1. 462 Graef, Husovec & Purtova, supra note 302, at 1381. 463 Purtova, supra note 369, at 71. 
464 Guido Noto La Diega, Against the Dehumanisation of DecisionMaking: Algorithmic Decisions at the Crossroads of Intellectual Property, Data Protection, and Freedom of Information, 9 J. INTELL. PROP. INFO. TECH. & ELECTRONIC COM. L. 3, 12 (2018). 465 Malgieri, supra note 273, at 113–14 (internal citations omitted). According to Malgieri, disclosing, rectifying, or erasing any of these data “can probably adversely affect the ‘dynamic’ trade secret interest of business people and of employees.” Id. at 114. 608 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 An EDPS document commenting on an early draft of the Directive466 and a European Commission impact assessment accompanying the proposal for the Directive467 further clarify the scope of trade secrets. According to these sources, trade secrets can consist of “data such as information on customers and suppliers, business plans or market research and strategies,”468 “list[s] of clients/ customers; internal datasets containing research data,” 469 “private collations of individual items of publicly available information,”470 as well as “data on customers and their behaviour and on the ability to collect and to monetise those data.” The inclusion of customer data shows that personal data, subject to data protection law, can nonetheless constitute trade secrets. 471 Tension between individual privacy interests and business interests, or data protection and trade secrets laws, is thus inevitable. The EDPS foresaw these possible tensions, urging “greater precision on the concept of trade secrets and clearer safeguards... to address adequately the potential effects of the proposal on the rights to privacy and to the protection of personal data.”472 The EDPS also recommended amending Article 4 of the Trade Secrets Directive to ensure that the data subject’s “right to access the data being processed and to 466 European Data Prot. Supervisor, Opinion of the European Data Protection Supervisor on the Proposal for a Directive of the European Parliament and of the Council on the Protection of Undisclosed Know-How and Business Information (Trade Secrets) Against Their Unlawful Acquisition, Use and Disclosure (Mar. 12, 2014), https://edps.europa.eu/sites/edp/files/publication/14-03-12_trade_secrets_ en.pdf [https://perma.cc/7UE9-8WB6]. 467 Commission Staff Working Document: Impact Assessment: Accompanying the Document Proposal for a Directive of the European Parliament and of the Council on the Protection of Undisclosed Know-How and Business Information (Trade Secrets) Against Their Unlawful Acquisition, Use and Disclosure, at 107–18, 248–62, COM (2013) 813 final (Nov. 28, 2013). 468 European Data Prot. Supervisor, supra note 466, at 3. 469 Id. 470 Id. 471 Id. 472 Id. at 2. No. 2:494] A RIGHT TO REASONABLE INFERENCES 609 obtain rectification, erasure or blocking of the data where it is incomplete or inaccurate”473 is guaranteed, referring to a case involving Facebook 474 where requests were denied. This suggestion was not adopted but rather moved to Recital 35. The final Directive in Article 9(4) only requires that “any processing of personal data pursuant to paragraphs 1, 2 or 3 shall be carried out in accordance with Directive 95/46/EC,” 475 without any clarification as to resolving the tension between trade secrets and data protection law. It is thus unclear how these clashes will play out, although Member States may implement new rules. 
In any case, given the broad definition of trade secrets and the clear inclusion of personal data in its scope, it is safe to assume that derived and inferred data will be covered by the Trade Secrets Directive.476 Even with this outlook, a fair balance between the right to privacy, IP laws, and the rights to conduct a business and freedom of expression will be necessary; the ECJ’s jurisprudence has long reflected this position.477

473 Id. at 5.

474 Letter from Facebook User Operations—Data Access Request Team, to Max Schrems (Sept. 28, 2011), http://www.europe-vfacebook.org/FB_E-Mails_28_9_11.pdf [https://perma.cc/B3TZ-UK4R].

475 Directive (EU) 2016/943 of the European Parliament and of the Council of 8 June 2016 on the Protection of Undisclosed Know-How and Business Information (Trade Secrets) Against Their Unlawful Acquisition, Use and Disclosure, art. 9(4), 2016 O.J. (L 157).

476 For an overview of the definition of trade secrets according to the ECJ and its constituent courts, see Case T-353/94, Postbank NV v. Comm’n of the European Cmtys., 1996 E.C.R. II-921, and Case T-198/03, Bank Austria Creditanstalt AG v. Comm’n of the European Cmtys., 2006 E.C.R. II-1429.

477 See Case C-275/06, Productores de Música de España (Promusicae) v. Telefónica de España SAU, 2008 E.C.R. I-271; Case C-70/10, Scarlet Extended SA v. Société Belge des Auteurs, Compositeurs et Éditeurs SCRL (SABAM), 2011 E.C.R. I-00000; Case C-557/07, LSG-Gesellschaft zur Wahrnehmung von Leistungsschutzrechten GmbH v. Tele2 Telecommunication GmbH, 2009 E.C.R. I-1227; Case C-461/10, Bonnier Audio AB, Earbrooks AB, Norstedts Förlagsgrupp AB, Piratförlaget AB & Storyside AB v. Perfect Commc’n Sweden AB, ECLI:EU:C:2012:219.

Taking into account the novel risks of inferential analytics and trends in the European legal landscape that appear to place greater emphasis on commercial and research interests, implementation of a right to reasonable inferences takes on renewed importance to ensure that the level of protection against inferences increases to reasonable standards. Data subjects require a new right addressing the riskiest type of personal data that, ironically, currently receives the least protection.

VIII. CONCLUSION AND RECOMMENDATIONS

Calls for accountability in Big Data analytics and algorithmic decision-making systems are motivated by a common concern: Assessments and inferences drawn from disparate, often non-intuitive features and data sources increasingly drive decision-making about people. These inferences are based not only on data individuals have provided or that has been observed about them, but also on information derived or inferred from it, as well as from anonymous or third-party data. Similarly, inferential analytics can be used to infer our preferences, weaknesses, sensitive attributes (e.g. race or sexual orientation), and opinions (e.g. political stances). These can form the basis for micro-targeting, nudging, and manipulation, as seen in online advertising478 or the recent Cambridge Analytica scandal. Too much emphasis is placed on governing the collection of these types of data, while too little attention is paid to how they are evaluated.479 To illustrate, even if a bank can explain which data and variables have been used to make a decision (e.g. banking records, income, post code), the decision turns on inferences drawn from these sources; for example, that the applicant is not a reliable borrower.
This is an assumption or prediction about future behavior that cannot be verified or refuted at the time of decision-making. Thus, the actual risks posed by Big Data analytics and AI are the underpinning inferences that determine how we, as data subjects, are being viewed and evaluated by third parties.

478 Ryan Calo, Digital Market Manipulation, 82 GEO. WASH. L. REV. 995 (2014).

479 See Wachter, supra note 316.

This Article has considered whether inferences or derived data constitute personal data according to the Article 29 Working Party’s three-step model and jurisprudence of the European Court of Justice. If inferences are seen as personal data, the rights in the GDPR could apply and allow data subjects to know about (Articles 13–14), access (Article 15), rectify (Article 16), delete (Article 17), and object to them (Article 21). Further, profiling and automated decision-making, which may include inferences, can already be contested (Article 22). The Article 29 Working Party sees verifiable and unverifiable inferences as personal data (e.g. results of a medical analysis), but leaves open whether the reasoning and process behind that inference are seen as personal data. The ECJ is still finding its voice on this topic, as its current jurisprudence is inconsistent. Future jurisprudence will continue to define the scope of personal data and the protection afforded to it. It is crucial to note that the question of whether inferences are personal data is not the most important one. The underlying problem goes much deeper and relates to the tension over whether individuals have rights, control, or recourse over how they are seen by others. Some scholars are worried that broad interpretation of personal data turns data protection law into the “law of everything.”480 However, as shown in Part V, inferences are treated as “economy class” personal data that are afforded little meaningful protection, and certainly less than personal data provided by the data subject or sensitive personal data. In part, third parties may have an interest in inferences and derived data and the techniques used to create them (e.g. trade secrets) due to their value or the costs involved. The GDPR, the draft e-Privacy regulation, the Digital Content Directive, and legal scholars attribute only limited rights over inferences to data subjects. At the same time, new frameworks such as the EU Copyright Directive and provisions in the GDPR push to facilitate data mining, knowledge discovery, and Big Data analytics by limiting data subjects’ rights over their data. The new Trade Secrets Directive also poses a barrier to accountability, as models, algorithms, and inferences may very well fall under this framework. Even if the ECJ decides to consistently classify inferences as personal data, current jurisprudence is a strong indicator that the court will offer insufficient protection against unreasonable inferences under data protection law. The core problem stems from how the ECJ interprets the remit of data protection law. In standing jurisprudence, the ECJ (in Bavarian Lager,481 YS and M and S,482 and Nowak483) and Advocate General (in YS and M and S484 and Nowak485) have consistently explained that the remit of data protection law is not to assess whether inferences and decisions based upon them are accurate or justified.

480 Purtova, supra note 86.
Rather, individuals need to consult sectoral laws and governing bodies applicable to their specific case to seek possible recourse. More generally, the ECJ views data protection law as a tool for data subjects to assess whether the (input) data undergoing processing was legally obtained, and whether the purpose for processing is lawful. To ensure this, data protection law grants various rights to individuals, for example the rights of access, rectification, and deletion.486 Of course this can change in the future, as the definition of personal data and the associated rights depend on the purpose for which the data was collected. As the rights in the GDPR must be interpreted teleologically, it is not unthinkable that future jurisprudence could apply these rights to the content of assessments and inferences. A change is, however, unlikely in inherently antagonistic situations that pit data subjects’ rights to privacy, identity, and reputation against companies’ rights to freedom of contract and free speech. Dialogue is needed to determine the point at which the right to privacy must take precedence over the private autonomy of decision-makers. This situation is ironic, as data subjects are most in need of protection from the risks posed by inferences and derived data. To close these accountability gaps and promote justification of inferences, this Article proposes a new “right to reasonable inferences” applicable to “high risk” inferences that cause damage to privacy or reputation, or have low verifiability in the sense of being predictive or opinion-based while being used for important decisions. This right would require ex-ante justification to be given by the data controller to establish whether an inference is reasonable. This disclosure would address (1) why certain data are normatively acceptable bases to draw inferences; (2) why these inferences are normatively acceptable and relevant for the chosen processing purpose or type of automated decision; and (3) whether the data and methods used to draw the inferences are accurate and statistically reliable. An ex-post mechanism would allow data subjects to challenge unreasonable inferences, which can support challenges against automated decisions exercised under Article 22(3) of the GDPR. Of course, a solution outside of data protection law may be possible.487 However, few standards exist, especially in the private sector, that govern how decisions are made. A right to reasonable inferences is an essential response to the novel risks introduced by inferential analytics. It is both the essence and the extension of data protection law. In the same way it was necessary to create a “right to be forgotten” in a Big Data world,488 it is now necessary to create

481 Case C-28/08 P, European Comm’n v. Bavarian Lager, 2010 E.C.R. I-6055.

482 Joined Cases C-141/12 & C-372/12, YS, M and S v. Minister voor Immigratie, Integratie en Asiel, 2014 E.C.R. I-2081.

483 Case C-434/16, Peter Nowak v. Data Prot. Comm’r, 2017 E.C.R. I-994.

484 Joined Cases C-141/12 & C-372/12, YS, M and S v. Minister voor Immigratie, Integratie en Asiel, 2013 E.C.R. I-838.

485 Case C-434/16, Peter Nowak v. Data Prot. Comm’r, 2017 E.C.R. I-582.

486 Case C-553/07, College van Burgemeester en Wethouders van Rotterdam v. M. E. E. Rijkeboer, 2009 E.C.R. I-3889.

487 See generally Bert-Jaap Koops, The Trouble with European Data Protection Law, 4 INT’L DATA PRIVACY L. 250 (2014).
488 See generally MAYER-SCHÖNBERGER, supra note 10; VAN HOBOKEN, supra note 8. 614 COLUMBIA BUSINESS LAW REVIEW [Vol. 2019 a “right on how to be seen.” The proposed re-imagining of the purpose of data protection law would be more in line with the original remit proposed in the ECHR,489 as well as the Council of Europe’s Modernised Convention for the Protection of Individuals with Regard to the Processing of Personal Data 490 and its guidelines on AI, 491 and the European Parliament’s resolution on a comprehensive European industrial policy on artificial intelligence and robotics.492 It would reconfigure privacy as a holistic concept with a stronger focus on adaptable identity, self-presentation, and reputation. One could also argue for a mediated application of the human right of privacy, and advocate for a “positive obligation” of states to implement laws to protect citizens from privacy invasion by the public and private sectors.493 Based on the preceding analysis of the legal status and protection of inferences, the following recommendations can be made for European policy: A. Re-Define the Remit of Data Protection Law In order to ensure data protection law protects against the novel risks introduced by Big Data analytics and algorithmic decision-making, the ECJ should re-define the law’s remit to include assessment of the reasonableness of inferential analytics and accuracy of decision-making processes. However, it has to be noted that the court’s limitation of Article 16 made sense in this regard in the discussed case law. It would be an odd situation where data protection authorities are competent to rule on the accuracy of immigration cases or examination disputes. In these cases, procedures are in place to deal with complaints. However, the same cannot always be said for inferences that the private sector draws. It is often left to the private autonomy of industry to assess and evaluate 489 See also ECHR jurisprudence on privacy until 2017, reviewed in Council of Europe, supra note 60. 490 Comm. of Ministers, supra note 374. 491 Directorate Gen. of Human Rights and Rule of Law, supra note 375. 492 See Artificial Intelligence and Robotics, supra note 376. 493 De Hert & Gutwirth, supra note 73, at 61; Wachter, supra note 54. No. 2:494] A RIGHT TO REASONABLE INFERENCES 615 people. Companies are relatively free in how they assess people, except where laws exist (e.g. anti-discrimination law) that limit this freedom. As discussed in Part II, due to the widespread implementation of inferential analytics by companies for profiling, nudging, manipulation, or automated decision-making, these “private” decisions can to a large extent impact the privacy of individuals. Thus, dialogue is needed to determine the point at which the right to privacy must be given greater weight than the private autonomy of decision-makers. In effect, individuals should have a right t