Data Privacy, Security and Ethics - Chapter 1 PDF

Summary

This document discusses data privacy, security, and ethical considerations, particularly focusing on the principles of law, morality, and ethics. It explores different aspects of ethics, including its technical, practical, and philosophical underpinnings, with examples and explanations. It also touches on professional ethics and its regulation.

Full Transcript

Chapter 1 Law, Morals and Ethics

Morals: relating to the standards of good or bad behavior, fairness, honesty, etc. that each person believes in; behaving in ways considered by most people to be correct and honest; standards for good or bad character and behavior. Morals help us understand what we should do according to our personal ideas and principles -- INDIVIDUAL LEVEL. Different people might have the same or different morals!

Ethics
Technically: branch of philosophy that studies the basis of morals (Le Petit Larousse, 2000); systematic reflection on moral norms, seeking to understand, justify or question these norms; the philosophical study of morality.
Practically: a system of accepted rules and morals about behavior, based on what is considered right and wrong (e.g. business/professional ethics); rules of conduct in a particular culture or group recognized by an external source or social system. For example, a medical code of ethics that medical professionals must follow.

"Shame is stealing and getting caught!" It seems to be more important to avoid the consequences of an act than to not do the act itself. Morally, the principles lie in the reputation of being caught rather than in the immorality of the act; ethically, social consequences are privileged over genuine consideration of the justice and morality of behavior.

Non-normative and Normative Ethics
Non-normative ethics studies what effectively is the case, not what ethically ought to be the case or what is ethically valuable. It does not directly engage with prescribing or evaluating actions as right or wrong, but instead focuses on understanding and analyzing the nature of moral concepts. Types:
- Descriptive ethics, also known as comparative ethics: the study of people's beliefs about morality; it studies and reports how people reason and act.
- Meta-ethics: studies the meaning of moral propositions such as right, obligation, virtue or responsibility and how their truth values (if any) may be determined. Instead of asking what actions are right or wrong, meta-ethics asks what it means to say something is right or wrong (e.g. Is morality more a matter of taste than truth? Are moral standards culturally relative? Are there moral facts? If there are moral facts, what are their origin and nature?).
Normative ethics studies the practical means of determining a moral course of action. It determines what actions are right or wrong, what people ought to do, and what kinds of people they ought to be. It is focused on establishing ethical standards and principles that guide behavior and help people make moral decisions. It includes the formulation of moral rules that have direct implications for what human actions, institutions, and ways of life should be like. It is typically contrasted with theoretical ethics, or meta-ethics, which is concerned with the nature rather than the content of ethical theories and moral judgments. Types:
- Applied ethics: studies how moral outcomes can be achieved in specific situations; the interpretation of general norms to address particular problems
(includes professional ethics such as research ethics).

Professional Ethics
Most professions have standards of conduct that are generally acknowledged by those who are serious about their professional responsibilities. Members of professions often informally adhere to widely accepted ethical guidelines (non-discrimination, honesty and truthfulness). There are also specific professional ethical rules (informed consent and medical confidentiality). Main causes of professional ethics problems: conflicts over appropriate professional standards; conflicts between professional commitments and commitments outside the profession. As professional ethics rules are usually very broad, some professions adopt more detailed codes of standards aimed at reducing vagueness -- Codes of Ethical Conduct. Codes sometimes include rules of etiquette in addition to rules of ethics (for example, do not criticize other medical doctors who have previously seen the patient). Codes of Ethical Conduct foster and reinforce member identification with the core values of the profession.

Public Regulation of Professional Ethics
Public policy (any set of enforceable guidelines set by an official public body -- national commissions, advisory committees, councils) sometimes provides additional professional ethics guidelines. Public policy enforces ethical standards within various professions through governmental or quasi-governmental bodies. These regulations are designed to ensure that professionals adhere to certain ethical principles and meet the standards of their respective fields.
General Medical Council (GMC): "We set the standards for doctors. We set the values, knowledge, skills and behaviors expected of all doctors working in the UK."
ABET: "At ABET, our approach, the standards we set and the quality we guarantee, inspires confidence in those who aim to build a better world. We accredit college and university programs."

Law
Law aims to regulate the essential relationships within a given society. The juridical rule does not require individual conscience to be applied (it is assumed that everyone knows the law); legal violations can lead to loss of liberties. Everyone can form a moral/ethical opinion, but the legal opinion is usually given by a judge. Suspicion is sometimes enough to affect someone's ethical reputation; in the eyes of the law, people are innocent until proven guilty -- the presumption of innocence. No matter how hard and rigid the law may be, it must always be followed.

Ethical Dilemmas
Ethical dilemmas are circumstances in which ethical obligations demand or appear to demand that a person adopt each of two (or more) alternative but incompatible actions, such that the person cannot perform all the required actions. (See examples in the slides.) No magic bullets exist for resolving ethical dilemmas. Acknowledging them helps deflate unrealistic expectations that ethical principles, rules and theories provide universal solutions. Reasoning and reflection are necessary, but they do not guarantee that a perfect solution will be found.

Basic Ethics Principles
These aim to express general norms of common ethics (widely shared, stable social agreements) that act as a starting point in resolving ethical dilemmas:
- Respect for autonomy (respect and support autonomous decisions);
- Nonmaleficence (avoiding the causation of harm);
- Beneficence (relieving or preventing harm, balancing benefits against risks and costs);
- Justice (fairly distributing benefits, risks and costs).
Weighing and Balancing
Principles, rules, professional obligations, and rights often need to be weighed and balanced. Balancing is the process of finding reasons to support beliefs about which ethical norms should prevail. Balancing is more than norm specification: it is concerned with the relative weights and strengths of different norms, not just their scope/range, and it allows appropriate application of norms to particular circumstances. Ethical norms alone cannot accurately quantify factors such as risk and burden; weighing and balancing considers these factors in a given circumstance.

Diversity and Disagreement
Conscientious and reasonable moral agents understandably disagree over priorities and choices among conflicting norms. Disagreement does not necessarily indicate ethical ignorance or flaw. There is not a single solution to many ethical questions and ethical dilemmas. Reasons for ethical disagreement and diversity of points of view:
- Factual disagreements (e.g. the suffering or damage an action will cause);
- Insufficient information or evidence;
- Disagreement over which norms apply to the circumstances;
- Disagreements about the relative weights of applicable norms;
- Disagreement on whether a genuine ethical problem exists.
Disagreements may persist even among ethically committed persons. When ethical disagreements arise, ethical agents can defend their decision without reproaching others who defend different solutions. Recognition of legitimate diversity is fundamental.

Chapter 2 Data Ethics
Data ethics encompasses the moral obligations of gathering, protecting, and using personally identifiable information and how it affects individuals. Data ethics is of the utmost concern to analysts, data scientists, and information technology professionals; anyone who handles data, however, must be well-versed in its basic principles. It is a set of principles behind how organizations gather, protect, and use data, ensuring respect for human rights, the promotion of justice, and the avoidance of harm.
Data sources: loyalty programs, apps, sensors, credit cards, online shops, social media.

Why is Data Ethics Important?
Cambridge Analytica obtained data from millions of Facebook users without their consent. The firm used this data to create detailed voter profiles and target individuals with personalized political ads during the 2016 U.S. presidential election. Ethical concerns: the scandal raised serious ethical concerns about privacy, consent, and the manipulation of voter behavior. The unauthorized use of personal data undermined trust in social media platforms and sparked global discussions on data privacy and the ethical responsibilities of tech companies.
In 2012, Target used predictive analytics to identify customers who might be pregnant based on their purchasing patterns. The company inadvertently revealed one teenager's pregnancy to her father before she had told him. Ethical concerns: the case revealed a lack of privacy considerations. Although the analysis was for marketing purposes, it still affected the customer's life. Note: you give consent to the collection of data when signing up for loyalty cards.
Ashley Madison is a Canadian dating website, founded in 2002 and aimed at people looking for an affair. In July 2015, the website suffered a massive data breach when a group of hackers stole the personal information of approximately 32 million users. The hackers released this data publicly, exposing names, passwords, email addresses, and personal preferences.
Ethical concerns: the company had inadequate data security measures; although users could pay to delete their profiles, the company did not delete them; the company failed to protect its users.
During the COVID-19 pandemic, many countries developed contact tracing apps to track the spread of the virus. While these apps helped manage the public health crisis, they also raised concerns about privacy and data security. Transparent data practices, informed consent, and clear limitations on data use were essential to maintaining public trust and ensuring the ethical use of the technology.

Key Principles of Data Ethics
- Ownership: An individual has ownership over their personal information. This involves understanding who owns the data and who has the right to control its use.
- Transparency: Individuals have a right to know how organizations plan to collect, store, and use their data (Terms and Conditions).
- Privacy: Organizations should ensure and respect individuals' privacy.
- Outcomes/Avoidance of Harm: Ensuring that data practices do not cause harm to individuals or communities. This includes considering the potential negative consequences of data use.
- Fairness: Ensuring that data procedures do not lead to discrimination or bias. This principle aims to prevent harm that might arise from data misuse or unequal access to data-driven technologies.
- Accountability: Organizations and individuals responsible for collecting and using data are answerable for the consequences of their actions. It involves taking ownership of any ethical violations.
- Informed Consent: Ensuring that individuals are fully informed about what data is being collected, how it will be used, and the implications of its use.
- Data Security: Protecting data from unauthorized access, breaches, and other forms of misuse. This involves implementing robust security measures to safeguard data.

Data Protection Law
GDPR: General Data Protection Regulation, for the European Union
CCPA: California Consumer Privacy Act, for the USA -- California
PIPEDA: Personal Information Protection and Electronic Documents Act, for Canada

GDPR
- The main principle is that personal data need to be processed 'lawfully, fairly and in a transparent manner in relation to the data subject'.
- Requirement of informed consent from individuals whose data are processed, unless the law states otherwise (i.e. specific permission provided by law).
- The principles of data minimization and storage limitation are particularly important.
- Data subjects have a number of rights as against the controller(s) and processor(s) of their data. A number of these rights may be subject to limitations in certain cases.

GDPR Key Points
Transparency and accountability; special provisions for scientific research; enhanced rights for data subjects; mandatory procedures for managing data breaches; special provisions for protecting data of minors; mandatory Data Protection Impact Assessments; mandatory appointment of a Data Protection Officer (subject to exceptions); pan-European validation of European Codes of Conduct; certification mechanisms specifically for data protection; remedies, sanctions and fines.

GDPR Articles
Right to information (art. 13 ff.); right of access (art. 15); right to rectification (art. 16); right to erasure ('right to be forgotten') (art. 17); right to restriction of processing (art. 18); right to receive communication of any rectification or erasure of personal data or restriction of processing (art. 19); right to data portability (art. 20);
right to object to processing and to automated individual decision-making, including profiling (arts. 21 and 22).
In a nutshell, data subjects have the right to:
- request the restriction of the processing of personal data in specific cases;
- receive personal data in a machine-readable format and send it to another controller ('data portability');
- request that decisions based on automated processing concerning them and based on their personal data are made by natural persons, not only by computers;
- obtain access to the personal data held about them;
- ask for incorrect, inaccurate or incomplete personal data to be corrected;
- request that personal data be erased when it's no longer needed or if processing it is unlawful;
- object to the processing of their personal data for marketing purposes or on grounds relating to their particular situation.
These rights apply across the EU, regardless of where the data is processed and where the company is established. These rights also apply when goods and services are bought from non-EU companies operating in the EU.

Right to be forgotten
The right to be forgotten appears in Recitals 65 and 66 and in Article 17 of the GDPR. It states, "The data subject shall have the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay" if one of a number of conditions applies. "Undue delay" is considered to be about a month.
When does the right to be forgotten apply? An individual has the right to have their personal data erased if:
- The personal data is no longer necessary for the purpose an organization originally collected or processed it.
- An organization is relying on an individual's consent as the lawful basis for processing the data and that individual withdraws their consent.
- An organization is relying on legitimate interests as its justification for processing an individual's data, the individual objects to this processing, and there is no overriding legitimate interest for the organization to continue with the processing.
- An organization is processing personal data for direct marketing purposes and the individual objects to this processing.
- An organization processed an individual's personal data unlawfully.
- An organization must erase personal data in order to comply with a legal ruling or obligation.
- An organization has processed a child's personal data to offer their information society services.
However, an organization's right to process someone's data might override their right to be forgotten. Here are the reasons cited in the GDPR that trump the right to erasure:
- The data is being used to exercise the right of freedom of expression and information.
- The data is being used to comply with a legal ruling or obligation.
- The data is being used to perform a task that is being carried out in the public interest or when exercising an organization's official authority.
- The data being processed is necessary for public health purposes and serves the public interest.
- The data being processed is necessary to perform preventive or occupational medicine. This only applies when the data is being processed by a health professional who is subject to a legal obligation of professional secrecy.
- The data represents important information that serves the public interest, scientific research, historical research, or statistical purposes, and erasure of the data would be likely to impair or halt progress towards the goals of the processing.
- The data is being used for the establishment of a legal defense or in the exercise of other legal claims.

Data Protection Law -- see slides

GDPR Penalties
Under the GDPR, fines are administered by the data protection regulator in each EU country. That authority will determine whether an infringement has occurred and the severity of the penalty. They will use the following 10 criteria to determine whether a fine will be assessed and in what amount: gravity and nature; intention; mitigation; precautionary measures; history; cooperation; data category; notification; certification; aggravating/mitigating factors.
The less severe infringements could result in a fine of up to €10 million, or 2% of the firm's worldwide annual revenue from the preceding financial year, whichever amount is higher. The more serious infringements go against the very principles of the right to privacy and the right to be forgotten that are at the heart of the GDPR. These types of infringements could result in a fine of up to €20 million, or 4% of the firm's worldwide annual revenue from the preceding financial year, whichever amount is higher. (A short worked sketch of this "whichever is higher" arithmetic appears at the end of this chapter.)

GDPR Particularities
Your company/organization can only process personal data in the following circumstances:
- with the consent of the individuals concerned;
- where there is a contractual obligation (a contract between your company/organization and a client);
- to meet a legal obligation under EU or national legislation;
- where processing is necessary for the performance of a task carried out in the public interest under EU or national legislation;
- to protect the vital interests of an individual;
- for your organization's legitimate interests, but only after having checked that the fundamental rights and freedoms of the person whose data you're processing aren't seriously impacted.

CCPA
The California Consumer Privacy Act (CCPA) gives consumers more control over the personal information that businesses collect about them, and the CCPA regulations provide guidance on how to implement the law. It grants:
- The right to know about the personal information a business collects about them and how it is used and shared;
- The right to delete personal information collected from them (with some exceptions);
- The right to opt out of the sale or sharing of their personal information;
- The right to non-discrimination for exercising their CCPA rights;
- The right to correct inaccurate personal information that a business has about them;
- The right to limit the use and disclosure of sensitive personal information collected about them.

PIPEDA
The Personal Information Protection and Electronic Documents Act (PIPEDA) sets the ground rules for how private-sector organizations collect, use, and disclose personal information in the course of for-profit, commercial activities across Canada. PIPEDA also applies to the personal information of employees of federally regulated businesses. The 10 principles: accountability; identifying purposes; consent; limiting collection; limiting use, disclosure, and retention; accuracy; safeguards; openness; individual access; challenging compliance.
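The following is a minimal sketch of the "whichever is higher" penalty-cap arithmetic described above under GDPR Penalties. The function name and revenue figures are hypothetical and purely illustrative; actual fines are set case by case by the regulator using the 10 criteria listed earlier.

```python
def gdpr_fine_cap(annual_revenue_eur: float, serious: bool) -> float:
    """Maximum possible fine: the higher of a fixed cap and a revenue percentage.

    Less severe infringements: up to EUR 10 million or 2% of worldwide annual
    revenue, whichever is higher. More serious infringements: up to EUR 20
    million or 4%, whichever is higher.
    """
    fixed_cap, pct = (20_000_000, 0.04) if serious else (10_000_000, 0.02)
    return max(fixed_cap, pct * annual_revenue_eur)


# Hypothetical firm with EUR 2 billion in worldwide annual revenue:
print(gdpr_fine_cap(2_000_000_000, serious=False))  # 40000000.0 (2% exceeds 10M)
print(gdpr_fine_cap(2_000_000_000, serious=True))   # 80000000.0 (4% exceeds 20M)
# Hypothetical small firm with EUR 5 million in revenue: the fixed cap dominates.
print(gdpr_fine_cap(5_000_000, serious=True))       # 20000000
```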
Privacy Literacy
Allied with these laws, it is still very important for individuals to have privacy literacy. Debatin (2011) stated that privacy literacy "encompasses an informed concern for [...] privacy and effective strategies to protect it" (p. 51). Trepte et al. (2015) further elaborated that "Online privacy literacy may be defined as a combination of factual or declarative ('knowing that') and procedural ('knowing how') knowledge about online privacy. In terms of declarative knowledge, online privacy literacy refers to the users' knowledge about technical aspects of online data protection, and about laws and directives as well as institutional practices. In terms of procedural knowledge, online privacy literacy refers to the users' ability to apply strategies for individual privacy regulation and data protection" (p. 339).

Chapter 3 Informed Consent
Informed consent is understood as the informed authorization given by the user prior to undergoing a certain medical act, any act integrated into the provision of health care, or participation in research or a clinical trial. This authorization presupposes an explanation, and the respective understanding, of what is intended to be done, how to act, why, and the expected result of the intervention consented to. In principle, living individuals should not be the subject of a research project without being informed.

The Milgram Experiment
Stanley Milgram, a psychologist at Yale University, carried out one of the most famous studies of obedience in psychology. He conducted an experiment focusing on the conflict between obedience to authority and personal conscience. Milgram (1963) examined justifications for acts of genocide offered by those accused at the World War II Nuremberg war criminal trials. Their defense often was based on obedience -- that they were just following orders from their superiors. "Could it be that Eichmann and his million accomplices in the Holocaust were just following orders? Could we call them all accomplices?" (Milgram, 1974).

The Milgram Experiment -- Ethical Issues
Baumrind (1964) criticized the ethics of Milgram's research, as participants were prevented from giving their informed consent to take part in the study. Participants assumed the experiment was benign and expected to be treated with dignity. Participants were exposed to extremely stressful situations that may have had the potential to cause psychological harm. Many of the participants were visibly distressed (Baumrind, 1964). Signs of tension included trembling, sweating, stuttering, laughing nervously, biting lips and digging fingernails into the palms of hands. Three participants had uncontrollable seizures, and many pleaded to be allowed to stop the experiment. The experimenter gave four verbal prods which mostly discouraged withdrawal from the experiment.

Worldcoin
Worldcoin is a digital identification platform designed to provide everyone with a way to verify their identity as a real human, not a bot or AI. It aims to create a global, decentralized identity and financial network that preserves privacy. Here are some key components of Worldcoin:
- World ID: A digital identity system that verifies you are a unique human.
- World App: A wallet app that supports World ID and provides access to decentralized finance.
- World Chain: An upcoming blockchain designed to support the Worldcoin ecosystem.
Worldcoin is a way to definitively distinguish between humans and AIs.
If all humans online could prove that they were, in fact, humans, then scams and imposters would dramatically decrease, and digital landscapes would become more accurate representations of us as a society. So, in order to prove that humans are humans, Worldcoin scans irises, which are unique to their owners. Worldcoin's platform verifies a user's identity by scanning their iris to create personal, secure identification codes. The codes are saved on a decentralized blockchain. Individuals are "rewarded" with cryptocurrency -- WLD.

Worldcoin -- What does the media say? Are there any ethical issues? (See previous slides.)
The Orb creates a unique iris code (which does not identify anyone by name) and sends it to the blockchain. This code is created through a one-way function, making it impossible to reconstruct the original photograph. This means that the danger of 'identity theft' simply does not exist. Additionally, the Worldcoin team has been developing new features that allow users to delete their iris code even after verification. It also now enables personal custody of data -- an essential concept in the Web3 world. In other words, all the information used to generate the iris code is stored only on the user's phone, giving them total and absolute control over their data.

Wherever possible, the informed consent process should be integrated into a broader informed consent procedure that meets the standards set out in the Commission's Guidance note on informed consent. However, for projects involving particularly complex or sensitive data-processing operations, or intrusive methods such as behavioural profiling, audio/video recording or geo-location tracking, you should implement a specific informed consent process covering the data-processing component of your project.
For consent to data processing to be 'informed', the data subject must be provided with detailed information about the envisaged data processing in an intelligible and easily accessible form, using clear and plain language. As a minimum, this should include:
- the identity of the data controller and, where applicable, the contact details of the DPO;
- the specific purpose(s) of the processing for which the personal data will be used;
- the subject's rights as guaranteed by the GDPR and the EU Charter of Fundamental Rights, in particular the right to withdraw consent or access their data, the procedures to follow should they wish to do so, and the right to lodge a complaint with a supervisory authority;
- information as to whether data will be shared with or transferred to third parties and for what purposes; and
- how long the data will be retained before they are destroyed.

Consent to Certain Areas of Scientific Research
It is often not possible to fully identify the purpose of personal data processing for scientific research purposes at the time of data collection. Therefore, data subjects should be allowed to give their consent to certain areas of scientific research when in keeping with recognised ethical standards for scientific research. Data subjects should have the opportunity to give their consent only to certain areas of research or parts of research projects to the extent allowed by the intended purpose.
- If in the course of your research project you wish to make any significant changes to your methodology or processing arrangements that have a bearing on the data subjects' rights or the use of their data, you must make them aware of the intended changes, and seek and obtain their express consent; it is not enough to offer them the opportunity to opt out. This must be done before you make the changes.
- If your project involves complex and large-scale data processing, if you plan to use the data in multiple projects or for multiple purposes, or if it is not possible to fully identify the purpose of the data processing at the time of data collection, it may be appropriate to use a consent management application. Various service providers now offer ethically robust, secure informed consent platforms that can help you to manage, document and evidence your consent processes.

Consent Management Application
A Consent Management Platform (CMP) is a software solution that helps you collect and manage personal information and consents in line with data protection laws and regulations. It enables you to gain insight into the personal data lifecycle from the moment of opt-in to data removal, allowing you to track, monitor, and respond to the data subject's requests and consent preferences. The CMP also allows you to centrally manage notices and propagate them to all consent collection channels. (A minimal data-model sketch appears after the "Conditions for Consent" section below.)

Are there exceptions to the duty to inform?
The information should not be passed on to the user where it could cause serious damage to their health. That is, when the doctor or health professional considers that the user's knowledge of the clinical situation may represent a danger to their health, they should not provide the information. Likewise, the user has the right to remain unaware. This right may be restricted, either in the interest of the user himself/herself or for the protection of others. It is not necessary to impose the obligation to provide information where the data subject already possesses the information, where the recording or disclosure of the personal data is expressly laid down by law, or where the provision of information to the data subject proves to be impossible or would involve a disproportionate effort. The latter could in particular be the case where processing is carried out for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes. In that regard, the number of data subjects, the age of the data and any appropriate safeguards adopted should be taken into consideration.

Conditions for Consent
Consent should be given by a clear affirmative act establishing a freely given, specific, informed and unambiguous indication of the data subject's agreement to the processing of personal data relating to him or her, such as by a written statement, including by electronic means, or an oral statement. Silence, pre-ticked boxes or inactivity should not therefore constitute consent. Consent should cover all processing activities carried out for the same purpose or purposes. When the processing has multiple purposes, consent should be given for all of them. Consent must be regularly evaluated. It is never definitive and can be withdrawn at any time.
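Tying together the CMP description and the conditions for consent above, here is a minimal, hypothetical sketch of how per-purpose consent could be recorded so that it can be documented, re-evaluated and withdrawn at any time. It is not the data model of any real CMP product; all class names and fields are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    """One consent record per data subject and per purpose (hypothetical model)."""
    subject_id: str
    purpose: str                        # consent is given separately for each purpose
    given_at: datetime
    information_shown: str              # evidence of what the subject was told
    withdrawn_at: datetime | None = None

    def withdraw(self) -> None:
        # Consent can be withdrawn at any time; the record is kept as evidence.
        self.withdrawn_at = datetime.now(timezone.utc)

    @property
    def valid(self) -> bool:
        return self.withdrawn_at is None


# Multiple purposes require multiple consents (one record each).
records = [
    ConsentRecord("subj-001", "survey-analysis", datetime.now(timezone.utc),
                  "purpose, retention period, controller/DPO contact, GDPR rights"),
    ConsentRecord("subj-001", "follow-up-contact", datetime.now(timezone.utc),
                  "purpose, retention period, controller/DPO contact, GDPR rights"),
]

records[1].withdraw()  # the subject withdraws consent for one purpose only
print([(r.purpose, r.valid) for r in records])
```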
The Case of Organ Donation
Portuguese legislation is based on the concept of presumed donation, so that a person acquires the status of donor from the moment they are born. Someone who wishes not to be a donor will have to register, on their own initiative or through someone with the right to represent them (parents, in the case of minors), on the National Registry of Non-Donors (RENNDA). This objection may be total or partial. Other countries with presumed consent: Austria, Spain, Belgium, the UK, France, Colombia, Norway, Italy, the Czech Republic, Finland, Greece, Hungary, Luxembourg, Slovenia, Sweden, etc.
Any ethical concerns about this topic? Do you agree? Are you aware?
- Autonomy and Informed Consent: One of the primary concerns is that presumed consent may violate the principle of respect for autonomy. This principle underlies the concept of informed consent, which requires that individuals make voluntary and informed decisions about their bodies.
- Public Awareness and Education: For presumed consent to be ethically acceptable, it is crucial that the public is well-informed about the policy and the mechanisms for opting out.
- Equity and Accessibility: Ensuring that all individuals, regardless of socioeconomic status, have equal access to information and the ability to opt out is essential.

Use of secondary data
If your research project uses data from social media networks and you do not intend to seek the data subjects' explicit consent to the use of their data, you must assess whether those persons actually intended to make their information public (e.g. in the light of the privacy settings or the limited audience to which the data were made available). It is not enough that the data be accessible; they must have been made public to the extent that the data subjects do not have any reasonable expectation of privacy. You must also ensure that your intended use of the data complies with any terms and conditions published by the data controller.

Data protection impact assessments
The risk-based approach to data processing upon which the GDPR is predicated can help researchers with complex, sensitive or large-scale data processing requirements to identify and address the ethics issues that arise from their methods and objectives. The DPIA is a process designed to assess the data-protection impacts of a project, policy, programme, product or service and, in consultation with relevant stakeholders, to ensure that remedial actions are taken as necessary to correct, avoid or minimise the potential negative impacts on the data subjects.
Under the GDPR, a DPIA is mandatory for processing operations that are 'likely to result in a high risk to the rights and freedoms of natural persons' (art. 35). These include in particular:
- a 'systematic and extensive' analysis of personal data in the context of automated processing, including profiling, where this has a significant effect on the data subject;
- large-scale processing of 'special categories' of personal data, or of personal data relating to criminal convictions and offences; or
- systematic monitoring of a publicly accessible area on a large scale.
More specifically, a DPIA is needed:
- If you're using new technologies;
- If you're tracking people's location or behavior;
- If you're processing personal data related to "racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation";
- If you're processing children's data.
See slides.

Confidentiality and anonymity
Confidentiality: the fact of private information being kept secret.
Anonymity: the situation in which someone's name is not given or known; the process of removing personal identifiers, both direct and indirect, that may lead to an individual being identified.

Confidentiality and anonymity in research
Anonymity means you don't know who the participants are, while confidentiality means you know who they are but remove identifying information from your research report. Both are important ethical considerations. You can only guarantee anonymity by not collecting any personally identifying information -- for example, names, phone numbers, email addresses, IP addresses, physical characteristics, photos, or videos. You can keep data confidential by using aggregate information in your research report, so that you only refer to groups of participants rather than individuals.
Example: employee engagement surveys. This topic is particularly relevant when collecting data via surveys. The words anonymous and confidential are all too often used synonymously when describing employee engagement surveys. The fact is, they mean very different things, and when it comes to your employees' privacy, it's important to make the distinction between the two. (See slides.)
The GDPR only applies to personal data, not to anonymised/anonymous (i.e. non-personal) data. Is anonymisation always required? No. Anonymisation will not always be required; other means should also be considered. In the majority of cases, data have to be de-identified to the extent that objectives can be achieved.

Chapter 4 Personally Identifiable Information
Personally identifiable information (PII) is any information that can be linked to a specific person. Examples of PII include: name; address; phone number; email address; Social Security number; driver's license number; social media handles; bank account number; passport number.

Direct and indirect identifiers
There are two types of PII: direct identifiers and indirect identifiers.
Direct identifiers are unique to a person and include things like a passport number or driver's license number. A single direct identifier is typically enough to determine someone's identity.
Indirect identifiers are not unique. They include more general personal details like race and place of birth. A single indirect identifier can't identify a person, but a combination can.

Sensitive PII versus non-sensitive PII
Among PII, some pieces of information are more sensitive than others. Sensitive PII is information that directly identifies an individual and could cause significant harm if leaked or stolen. Examples of sensitive PII include:
- Unique identification numbers, such as driver's license numbers, passport numbers and other government-issued ID numbers.
- Biometric data, such as fingerprints and retinal scans.
- Financial information, including bank account numbers and credit card numbers.
- Medical records.
Sensitive PII is typically not publicly available, and most existing data privacy laws require organizations to safeguard it by encrypting it, controlling who accesses it or taking other cybersecurity measures.
Non-sensitive PII is personal data that, in isolation, would not cause significant harm to a person if leaked or stolen. It may or may not be unique to a person. For example, a malicious actor couldn't commit identity theft armed with only a social media account name. Other examples of non-sensitive PII include: a person's full name; mother's maiden name; telephone number; IP address; place of birth; date of birth; geographical details; employment information; email address or mailing address; race or ethnicity; religion.
Non-sensitive PII is often publicly available. For example, telephone numbers may be listed in a phonebook, and addresses may be listed in a local government's public property records. Some data privacy regulations don't require the protection of non-sensitive PII, but many companies put safeguards in place anyway. That's because criminals could cause trouble by assembling multiple pieces of non-sensitive PII. For example, a hacker could break into someone's bank account app with their phone number, email address and mother's maiden name: the email gives them a username, spoofing the phone number allows the hackers to receive a verification code, and the mother's maiden name provides an answer to the security question.

How should organizations protect PII?
1. Identify all PII in the organization's systems.
2. Minimize the collection and use of PII, and regularly dispose of any PII no longer needed.
3. Categorize PII according to sensitivity level.
4. Apply data security controls.
Data security measures: apply de-identification techniques; train employees on how to properly handle and dispose of PII; use anonymized data when possible; draft an incident response plan for PII leaks and breaches; create cybersecurity tools.

De-Identifying a Dataset
The importance of de-identifying a dataset: when non-identifiable information is linked to PII in a dataset, an individual's privacy is lost. It is of the utmost importance that consent is given before any PII is collected or made public. To protect privacy, one tactic is to de-identify data, or remove all PII from a dataset. For example, if a company is tracking spending habits across various demographics, it needs to remove customers' names, contact information, address, and credit card details, leaving only their demographics (for instance, age and gender) and purchase history. This ensures that the company can still analyze variables of interest without putting customers' privacy at risk. The process of de-identification requires you to think critically about connections that can be made through data, so it is truly de-identified.

De-Identification Techniques
Suppression: removing data (e.g., from a cell or row in a table, or data element(s) in a record) prior to dissemination to prevent the identification of individuals in small groups or those with unique characteristics.
Generalization: collecting or reporting values in a given range (e.g., using age or an age range instead of date of birth); including individual data as a member of a set (e.g., creating categories that incorporate unique cases); or reporting rounded values instead of exact amounts.
Perturbation: replacing sensitive info with realistic but inauthentic data or modifying original data based on predetermined masking rules (which may include randomization).
Pseudonymization: replacing a real name with a made-up name or a real value with a made-up value.
Aggregation: combining individual subject data with a sufficient number of other subjects to disguise the attributes of a single subject (e.g., reporting a group average instead of an individual value).
Pixelation: modifying or obscuring visual information (e.g., blurring out faces in a photograph).
Scrambling/Encryption: data are algorithmically scrambled and only those with access to the appropriate key can view the encrypted data.
Differential Privacy: a privacy-preserving technique that conceals individual data points in a dataset by adding controlled random noise, aiming to provide sanitized responses to data requests while balancing privacy protection and data utility. Differential privacy is a state-of-the-art definition of privacy used when analyzing large data sets. It guarantees that adversaries cannot discover an individual within the protected data set by comparing the data with other data sets. Differential privacy adds noise (i.e., small random changes) to the data or analysis results, so that even if an attacker gains access to the data set or results, they cannot accurately identify whether or not a specific person is included in the dataset. (A small sketch combining several of these techniques appears at the end of this chapter.)

Re-identification
Re-identification is the process of combining two or more datasets to reveal identities, and it presents a significant threat to privacy. "If somebody takes a dataset that's supposed to be anonymous and re-identifies the people in it, all kinds of harm can happen."
Example: the Netflix Competition. In 2006, Netflix launched a very famous competition, where it shared approximately 100 million individual movie ratings, and the dates of those ratings, for roughly 500,000 users. All other data about each user was anonymized and reduced to a unique numeric ID, used only to know which ratings belonged to the same user. The competition was aimed at letting the public experiment with new techniques and find a recommendation algorithm that beat what Netflix had at the time by a 10% accuracy gain. The prize was set at USD 1,000,000. Netflix had been very careful not to add any data that could identify a user, like zip code, birthdate, and of course name, personal IDs, etc. Nevertheless, only a couple of weeks after the release, a PhD student announced that they had been able to connect many of the unique IDs in the Netflix dataset to real people, by cross-referencing another publicly available dataset: the movie ratings on the IMDb site, where many users post publicly under their own names. This ended in a big lawsuit and the competition being canceled.
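To make the techniques above concrete, here is a minimal, self-contained Python sketch using hypothetical records (all names, values and the salt are invented for illustration). It applies pseudonymization and generalization, shows how the remaining quasi-identifiers can still be linked to an external dataset in the spirit of the Netflix/IMDb episode, and answers an aggregate query with Laplace noise as a rough nod to differential privacy.

```python
import hashlib
import random

# Hypothetical "raw" records: a direct identifier (name) plus quasi-identifiers.
raw = [
    {"name": "Ana Silva",  "birth_year": 1990, "zip": "4200-072", "rating": 5},
    {"name": "Joao Sousa", "birth_year": 1985, "zip": "1000-205", "rating": 2},
]

def pseudonymize(name: str, secret_salt: str = "keep-this-secret") -> str:
    """Replace the real name with a stable made-up value (a one-way hash)."""
    return hashlib.sha256((secret_salt + name).encode()).hexdigest()[:12]

def de_identify(record: dict) -> dict:
    """Suppress the name, pseudonymize it, and generalize the quasi-identifiers."""
    return {
        "pid": pseudonymize(record["name"]),                 # pseudonymization
        "birth_decade": (record["birth_year"] // 10) * 10,   # generalization
        "zip_prefix": record["zip"][:2],                     # coarser location
        "rating": record["rating"],
    }

released = [de_identify(r) for r in raw]
print(released)

# Linkage risk: an attacker holding an external dataset that shares the same
# quasi-identifiers can still re-identify people, as in the Netflix/IMDb case.
external = [{"name": "Ana Silva", "birth_decade": 1990, "zip_prefix": "42"}]
for d in released:
    for e in external:
        if (d["birth_decade"], d["zip_prefix"]) == (e["birth_decade"], e["zip_prefix"]):
            print("Possible re-identification:", e["name"], "->", d["pid"])

# Differential privacy (rough sketch): answer "how many 5-star ratings?" with
# Laplace noise of scale 1/epsilon, so one individual's presence is hard to infer.
def dp_count(records: list, epsilon: float = 1.0) -> float:
    true_count = sum(1 for r in records if r["rating"] == 5)
    laplace_noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + laplace_noise

print(round(dp_count(released), 2))
```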
Chapter 5 Cybercrime and Cybersecurity

Cybercrime
While there is no universally accepted definition of cybercrime, it is essentially an act using information technology to perpetrate or facilitate a crime. Specifically, in its glossary, the International Organisation for Standardisation (ISO) defines cybercrime as 'the commission of criminal acts in cyberspace'. More informally, it is understood as the use or exploitation of information and communication technology (ICT) and/or the internet to commit crime. It may be argued that cybercrime is not a new crime; it is merely a new way of perpetrating crime in general. Where once criminals stole from banks, post offices and individuals in face-to-face encounters, criminals now use electronic means and have often escaped before it is realised that a crime has been committed. According to estimates from Statista's Market Insights, the global cost of cybercrime is expected to surge in the next four years, rising from $9.22 trillion in 2024 to $13.82 trillion by 2028.

Cybercrime Categories
Cyber-dependent crimes: any crime that can only be committed using computers, computer networks or other forms of information communication technology. Such crimes target ICT systems and are typified by hacking and malware, including ransomware.
Cyber-enabled crimes: traditional crimes facilitated by the internet and digital technologies. They have evolved in scale and form through the increased use of the internet and communication technology and include fraud through phishing, piracy and counterfeiting.

Cybercrime Forms
Phishing: This type of crime involves the perpetrators sending malicious email attachments or links to an individual in order to gain access to their accounts or computer. Users are tricked into reacting to emails claiming to be from a bona fide business or organization informing them that they need to change their password or update their billing information, giving criminals access to their accounts and their funds. Phishing is popular among cybercriminals and highly effective. According to IBM's Cost of a Data Breach report, phishing is the most common data breach vector, accounting for 15% of all breaches. Breaches caused by phishing cost organizations an average of USD 4.88 million.

Types of Phishing
Bulk email phishing: scammers indiscriminately send spam emails to as many people as possible, hoping that a fraction of the targets fall for the attack. Scammers often create emails that appear to come from large, legitimate businesses, such as banks, online retailers or the makers of popular apps.
SMS phishing/smishing: uses fake text messages to trick targets. Scammers commonly pose as the victim's wireless provider, sending a text that offers a "free gift" or asks the user to update their credit card information.
Voice phishing/vishing: phishing by phone call. Vishing incidents have exploded in recent years, increasing by 260% between 2022 and 2023. The rise of vishing is partly due to the availability of voice over IP (VoIP) technology, which scammers can use to make millions of automated vishing calls per day. Scammers often use caller ID spoofing to make their calls appear to come from legitimate organizations or local phone numbers. Vishing calls typically scare recipients with warnings of credit card processing problems, overdue payments or trouble with the law. Recipients end up providing sensitive data or money to the cybercriminals to "resolve" their issues.
Spear phishing: a targeted phishing attack on a specific individual. The target is usually someone with privileged access to sensitive data or special authority that the scammer can exploit, such as a finance manager who can move money from company accounts.
A spear phisher studies their target to gather the information they need to pose as someone the target trusts, such as a friend, boss, coworker, vendor or financial institution. Social media and professional networking sites -- where people publicly congratulate coworkers, endorse vendors and tend to overshare -- are rich sources of information for spear phishing research. Spear phishers use their research to craft messages that contain specific personal details, making them seem highly credible to the target. For example, a spear phisher might pose as the target's boss and send an email that reads: "I know you're leaving tonight for vacation, but can you please pay this invoice before the close of business today?"
Business email compromise (BEC): a class of spear phishing attacks that attempt to steal money or valuable information -- for example, trade secrets, customer data or financial information -- from a business or other organization. BEC attacks can take several forms. Two of the most common include:
- CEO fraud: The scammer impersonates a C-level executive, often by hijacking the executive's email account. The scammer sends a message to a lower-level employee instructing them to transfer funds to a fraudulent account, make a purchase from a fraudulent vendor or send files to an unauthorized party.
- Email account compromise (EAC): The scammer compromises a lower-level employee's email account, such as the account of a manager in finance, sales or research and development. The scammer uses the account to send fraudulent invoices to vendors, instruct other employees to make fraudulent payments or request access to confidential data.

Cybercrime Forms
Malware: malicious software that 'infects' systems and individual devices and either damages or gains access to computers or networks with criminal intent. Malware is usually spread through emails and untrustworthy websites, or hidden in other files such as documents or images. It will spread when opened, which allows the malware to install itself. The intent behind the distribution of malware is to gain control over a system or device and to steal information or data.
Ransomware: the crime that regularly hits the headlines. It is a form of malware employed by criminals which usually takes control of a system or network, preventing users from accessing their records. The criminal will then either continue to lock the files until the payment of a ransom is made, or threaten to release the data, often confidential, unless payment is made.
Worms: designed to cause damage and disruption by infecting systems, corrupting files, and spreading rapidly. But unlike viruses, which require a host file or program to activate and self-replicate, worms are self-sufficient. In other words, worms self-replicate without needing external input like a host file or program, making them a particularly sophisticated and dangerous cyberthreat.
Stuxnet is a computer worm that was designed and deployed to attack nuclear facilities. Arguably the world's first cyberweapon that impacted physical infrastructure, Stuxnet targeted Iranian nuclear centrifuges, damaging and destroying critical military capabilities and causing major disruption to Iran's nuclear program. Stuxnet spread rapidly, making use of previously unknown zero-day vulnerabilities* in the Windows operating system to jump from computer to computer.
But the computers infected in the 2010 Stuxnet zero-day attack were not the final target of the worm -- they were simply the vehicles for getting at the hardware they controlled.
*Zero-day vulnerabilities are security flaws in software that are unknown to the manufacturer and therefore do not have a patch or security update available. These vulnerabilities are called 'zero-day' because developers have zero days to fix the flaw before it can be exploited by attackers.

Social Engineering
The act of deceiving or manipulating individuals into providing confidential or personal information that is subsequently used to carry out a fraudulent activity. The criminal will employ tactics that gain the trust of the victim and exploit common patterns of human behaviour. Phishing is a form of social engineering.

Cybersecurity
Cybersecurity refers to any technologies, practices and policies for preventing cyberattacks or mitigating their impact. Cybersecurity aims to protect computer systems, applications, devices, data, financial assets and people against ransomware and other malware, phishing scams, data theft and other cyberthreats. Implementing effective cybersecurity measures is particularly challenging today because there are more devices than people, and attackers are becoming more innovative.

Why is cybersecurity important?
In today's connected world, everyone benefits from advanced cyberdefense programs. At an individual level, a cybersecurity attack can result in everything from identity theft, to extortion attempts, to the loss of important data like family photos. Everyone relies on critical infrastructure like power plants, hospitals, and financial service companies. Securing these and other organizations is essential to keeping our society functioning. Everyone also benefits from the work of cyberthreat researchers, like the team of 250 threat researchers at Talos, who investigate new and emerging threats and cyberattack strategies. They reveal new vulnerabilities, educate the public on the importance of cybersecurity, and strengthen open source tools. Their work makes the Internet safer for everyone.

Cybersecurity challenges
The pervasive adoption of cloud computing can increase network management complexity and raise the risk of cloud misconfigurations, improperly secured APIs and other avenues hackers can exploit. More remote work, hybrid work and bring-your-own-device (BYOD) policies mean more connections, devices, applications and data for security teams to protect. Proliferating Internet of Things (IoT) and connected devices, many of which are unsecured or improperly secured by default, can be easily hijacked by bad actors. The rise of artificial intelligence (AI), and of generative AI in particular, presents an entirely new threat landscape that hackers are already exploiting through prompt injection and other techniques. According to recent research from the IBM Institute for Business Value, only 24% of generative AI initiatives are secured.

Security operations center (SOC)
An in-house or outsourced team of IT security professionals dedicated to monitoring an organization's entire IT infrastructure 24x7. Its mission is to detect, analyze and respond to security incidents in real time. This orchestration of cybersecurity functions allows the SOC team to maintain vigilance over the organization's networks, systems and applications and ensures a proactive defense posture against cyberthreats.
The SOC also selects, operates and maintains the organization's cybersecurity technologies and continually analyzes threat data to find ways to improve the organization's security posture. SOC activities and responsibilities fall into three general categories:
- Preparation, planning and prevention: an exhaustive inventory of what needs to be protected and the existing tools for protection, creation of an incident response plan, maintenance of protection software, and regular tests.
- Monitoring, detection and response: monitoring of the entire IT infrastructure, collection and analysis of log data for every event, threat detection, and incident response (shutting down systems, deleting files, running antivirus, isolating, investigating).
- Recovery, refinement and compliance: impact assessment, knowledge collection, and ensuring compliance with the law (GDPR, CCPA, etc.).

Penetration Testing
A penetration test, or "pen test," is a security test that launches a mock cyberattack to find vulnerabilities in a computer system. Before a pen test begins, the testing team and the company set a scope for the test. The scope outlines which systems will be tested, when the testing will happen, and the methods pen testers can use. The scope also determines how much information the pen testers will have ahead of time:
- In a black-box test, pen testers have no information about the target system. They must rely on their own research to develop an attack plan, as a real-world hacker would.
- In a white-box test, pen testers have total transparency into the target system. The company shares details like network diagrams, source code, credentials, and more.
- In a gray-box test, pen testers get some information but not much. For example, the company might share IP ranges for network devices, but the pen testers have to probe those IP ranges for vulnerabilities on their own (a minimal illustration of this kind of probing follows below).
At the end of the simulated attack, pen testers clean up any traces they've left behind, like backdoor trojans they planted or configurations they changed. That way, real-world hackers can't use the pen testers' exploits to breach the network. Then, the pen testers prepare a report on the attack. The report typically outlines vulnerabilities that they found, exploits they used, details on how they avoided security features, and descriptions of what they did while inside the system. The report may also include specific recommendations on vulnerability remediation. The in-house security team can use this information to strengthen defenses against real-world attacks.
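As referenced above, probing in-scope hosts for reachable services is a typical early reconnaissance step of a pen test. The following is a minimal, hypothetical sketch of such a probe using only Python's standard library; it is not a full scanning tool, and it should only ever be pointed at systems that are explicitly inside the agreed test scope.

```python
import socket

def scan_ports(host: str, ports: list[int], timeout: float = 0.5) -> list[int]:
    """Return the subset of `ports` on `host` that accept a TCP connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:   # 0 means the connection succeeded
                open_ports.append(port)
    return open_ports

if __name__ == "__main__":
    # "127.0.0.1" stands in for an in-scope test host agreed with the client.
    print(scan_ports("127.0.0.1", [22, 80, 443, 8080]))
```

A real engagement would log every probe so it can be included in the final report, and would stay strictly within the scope document.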
The role of the purple team is to encourage efficient communication and collaboration between the two teams to allow for the continuous improvement of both teams and the organization's cybersecurity.

CSIRT vs SOC

CSIRTs are typically activated in response to an incident rather than monitoring threats around the clock.

Proactivity vs. reactivity: SOCs are proactive, continuously monitoring and preventing threats, while CSIRTs are reactive, focusing on responding to incidents after they occur.

Scope of work: SOCs have a broader scope, encompassing overall cybersecurity, whereas CSIRTs have a narrower focus on incident response.

Operational model: SOCs operate continuously, while CSIRTs are typically activated in response to specific incidents.

Chapter 6

Bias and discrimination, transparency and explainability in AI

Dilemmas in AI

Nowadays, AI has a huge impact on how society and organizations operate. It has been used in almost every industry, ranging from health care, banking and retail to manufacturing. Although it represents an improvement in many areas, improving efficiency, bringing down costs, and accelerating research and development, it may also raise some concerns. These concerns may be related to:

- Biased and discriminatory results
- Explainability and transparency

Bias and discrimination in AI

In your favourite search engine, on the images tab:

Type "greatest leaders of all time" → count the number of women vs. men
Type "CEO" → count the number of women vs. men
Type "school boy" → results show children
Type "school girl" → results show sexualized content

A search engine can become an echo chamber that upholds biases of the real world and further entrenches these prejudices and stereotypes online. Examples of AI bias in the real world show us that when discriminatory data and algorithms are baked into AI models, the models deploy biases at scale and amplify the resulting negative effects.

What is bias in artificial intelligence?

AI bias, also referred to as machine learning bias or algorithm bias, refers to AI systems that produce biased results that reflect and perpetuate human biases within a society, including historical and current social inequality. Bias can be found in the initial training data, the algorithm, or the predictions the algorithm produces. When bias goes unaddressed, it hinders people's ability to participate in the economy and society. It also reduces AI's potential. Businesses cannot benefit from systems that produce distorted results and foster mistrust among people of color, women, people with disabilities, the LGBTQ community, or other marginalized groups of people.

Real examples of bias in AI

In healthcare, underrepresenting data of women or minority groups can skew predictive AI algorithms. For example, computer-aided diagnosis (CAD) systems have been found to return lower accuracy results for African-American patients than white patients.

While AI tools can streamline the automation of resume scanning during a search to help identify ideal candidates, the information requested and answers screened out can result in disproportionate outcomes across groups. For example, if a job ad uses the word "ninja," it might attract more men than women, even though that is in no way a job requirement.

As a test of image generation, Bloomberg requested more than 5,000 AI images be created and found that, "The world according to Stable Diffusion is run by white male CEOs. Women are rarely doctors, lawyers or judges.
Men with dark skin commit crimes, while women with dark skin flip burgers."

(See slides.)

Bias and discrimination in AI

Stages where bias can occur

Bias can occur at various stages of the AI pipeline:

Data collection: Bias often originates here. The AI algorithm might produce biased outputs if the data is not diverse or representative (e.g., gender or racial imbalance).

Model training: A critical phase; if the training data is not balanced or the model architecture is not designed to handle diverse inputs, the model may produce biased outputs (e.g., certain groups are favored because of the way the algorithm is optimized).

Deployment: This stage can also introduce bias if the system is not tested with diverse inputs or monitored for bias after deployment (e.g., when the AI is applied in the real world, where environmental or contextual factors can differ from the training setting).

Types of discrimination in AI:

Direct discrimination: When the AI system directly discriminates based on sensitive attributes like race, gender, or ethnicity (e.g., facial recognition systems performing worse on people of color).

Indirect discrimination: When AI uses a proxy variable that correlates with sensitive attributes, leading to unequal outcomes even without explicit reference to them (e.g., ZIP codes as a proxy for race in credit scoring).

Examples of direct discrimination:

Hiring algorithms: If an AI system is trained using historical hiring data that includes explicit gender preferences (e.g., favoring male candidates over female ones), the algorithm might directly reject women because it learned to do so from the biased data. This is direct discrimination because the AI is making decisions explicitly based on a sensitive attribute (gender).

Facial recognition technology: Some AI facial recognition systems perform worse on people with darker skin tones.

Examples of indirect discrimination:

In the U.S. health insurance system, scores are often used to decide who qualifies for insurance. This scoring system might unintentionally benefit people who already have good insurance or are wealthier. As a result, an AI system that helps decide how hospital resources are used could unfairly limit healthcare access, using wealth and insurance status as indirect factors.

Another example is in diagnosing illnesses. If symptoms that are more common in men are used as the main guide, women might not get the right diagnosis because their symptoms may be different. Also, AI used in diagnostic tools may work better for some racial or ethnic groups than others, leading to unequal care in detecting and treating diseases.

Selection bias: This happens when the data used to train an AI system is not representative of the reality it is meant to model. It can occur for various reasons, such as incomplete data, biased sampling, or other factors that may lead to an unrepresentative dataset. If a model is trained on a dataset that only includes male employees, for example, it will not be able to predict female employees' performance accurately.

Confirmation bias: This type of bias happens when an AI system is tuned to rely too much on pre-existing beliefs or trends in the data. This can reinforce existing biases and fail to identify new patterns or trends.

Measurement bias: This bias occurs when the data collected differs systematically from the actual variables of interest.
For instance, if a model is trained to predict students' success in an online course, but the data collected is only from students who have completed the course, the model may not accurately predict the performance of students who drop out of the course.

Stereotyping bias: This happens when an AI system reinforces harmful stereotypes. An example is when a facial recognition system is less accurate in identifying people of color, or when a language translation system associates certain languages with certain genders or stereotypes.

Out-group homogeneity bias: This happens when an AI system is less capable of distinguishing between individuals who are not part of the majority group in the training data. It may result in misclassification or inaccuracy when dealing with minority groups.

Will AI ever be unbiased?

The short answer? Yes and no. It is possible, but it is unlikely that an entirely impartial AI will ever exist, because it is unlikely that an entirely impartial human mind will ever exist. An artificial intelligence system is only as good as the quality of the data it receives as input. If you could clear your training dataset of conscious and unconscious preconceptions about race, gender, and other ideological notions, you would be able to create an artificial intelligence system that makes impartial, data-driven judgments. In the real world, however, we know this is unlikely. AI is shaped by the data it is given and learns from, and that data is generated by humans. Humans hold many prejudices, and new biases are continually being identified, which keeps increasing the overall number of biases to account for.

Examples of AI bias in the real world:

Healthcare: Underrepresented data of women or minority groups can skew predictive AI algorithms. For example, computer-aided diagnosis (CAD) systems have been found to return lower accuracy results for Black patients than white patients.

Applicant tracking systems: Issues with natural language processing algorithms can produce biased results within applicant tracking systems. For example, Amazon stopped using a hiring algorithm after finding it favored applicants based on words like "executed" or "captured," which were more commonly found on men's resumes.

Online advertising: Biases in search engine ad algorithms can reinforce job role gender bias. Independent research at Carnegie Mellon University in Pittsburgh revealed that Google's online advertising system displayed high-paying positions to males more often than to women.

Image generation: Academic research found bias in the generative AI art generation application Midjourney. When asked to create images of people in specialized professions, it showed both younger and older people, but the older people were always men, reinforcing gendered assumptions about the role of women in the workplace.

Predictive policing tools: AI-powered predictive policing tools used by some organizations in the criminal justice system are supposed to identify areas where crime is likely to occur. However, they often rely on historical arrest data, which can reinforce existing patterns of racial profiling and disproportionate targeting of minority communities.

Strategies to mitigate bias and discrimination:

Diverse and representative data: Ensure training data reflects the diversity of the population, including all relevant subgroups. Ensure that all the proper checks and guardrails are in place when collecting data.
It should be done in a representative way, balanced by age, gender, race and any other critical factor that could lead to bias.

Fairness-aware algorithms: Use techniques such as fairness constraints (e.g., demographic parity, equal opportunity) to adjust AI predictions and make them fairer (a minimal sketch of how these metrics can be computed appears at the end of these notes).

Regular audits and testing: Regularly audit AI systems for biases and ensure that they are fair and ethical in real-world applications. Introduce tools like Google's What-If Tool, IBM AI Fairness 360, or Microsoft's Fairlearn, which can be used to detect and mitigate bias in AI models.

Finally, it is crucial to ensure that data and engineering teams are themselves diverse, as this makes a variety of perspectives and experiences available during design, development and testing.

Explainability and Transparency in AI

What is explainable AI?

Explainable artificial intelligence (XAI) is a set of processes and methods that allows human users to comprehend and trust the results and output created by machine learning algorithms. It is the ability to understand and explain how an AI system or algorithm arrives at its decisions or predictions.

What is transparency in AI?

Openness about how an AI system was developed, trained, and deployed, including details about the data used, the design of the algorithms, and the decision-making process.

Why is it important?

Explainability is crucial for an organization in building trust and confidence when putting AI models into production. AI explainability also helps an organization adopt a responsible approach to AI development. With explainable AI, organizations can gain access to the AI technology's underlying decision-making and are empowered to make adjustments. Explainable AI can improve the user experience of a product or service by helping the end user trust that the AI is making good decisions.

Comparing AI and XAI

What exactly is the difference between "regular" AI and explainable AI? XAI implements specific techniques and methods to ensure that each decision made during the ML process can be traced and explained. AI, on the other hand, often arrives at a result using an ML algorithm, but the architects of the AI system do not fully understand how the algorithm reached that result. This makes it hard to check for accuracy and leads to loss of control, accountability and auditability.

(See slides.)

Transparent AI Done Badly!

OpenAI, creators of ChatGPT and the image generation model Dall-E, have been accused of failing to be transparent over what data is used to train their models. This has led to lawsuits from artists and writers claiming that their material was used without permission. However, some believe that OpenAI's users could themselves face legal action in the future if copyright holders are able to successfully argue that material created with the help of OpenAI's tools also infringes their IP rights. This example demonstrates how opacity around training data can potentially lead to a breakdown in trust between an AI service provider and its customers.

In banking and insurance, AI is increasingly being used to assess risk and detect fraud. If these systems aren't transparent, it could lead to customers being refused credit, having transactions blocked, or even facing criminal investigations, while having no way of understanding why they have been singled out or put under suspicion.

Even more worrying are the dangers posed by non-transparency around systems and data used in healthcare.
As AI is increasingly used for routine tasks like spotting signs of cancer in medical imagery, biased data can lead to dangerous mistakes and worse patient outcomes. With no measures in place to ensure transparency, biased data is less likely to be identified and removed from systems used to train AI tools.

Chapter 7

Ethical Dilemmas in AI

Ethical and moral values in AI

The development of AI raises several ethical and moral dilemmas. These issues are critical because AI systems are increasingly making decisions that affect people's lives, raising concerns about fairness, accountability, and the impact of technology on society.

(See slides.)
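As a short appendix to the fairness-aware mitigation strategies discussed in Chapter 6, the following is a minimal, illustrative sketch of the two fairness metrics named there: demographic parity and equal opportunity. The toy data, the two-group setup and the function names (demographic_parity_difference, equal_opportunity_difference) are all hypothetical choices for this example, not the API of any specific library.

```python
# Minimal, illustrative fairness-metric sketch using made-up data.
# Demographic parity: groups should receive positive predictions at similar rates.
# Equal opportunity: groups should have similar true positive rates.

import numpy as np

def demographic_parity_difference(y_pred, group):
    """Largest gap in positive-prediction rates across groups."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def equal_opportunity_difference(y_true, y_pred, group):
    """Largest gap in true positive rates across groups."""
    tprs = []
    for g in np.unique(group):
        mask = (group == g) & (y_true == 1)   # actual positives in this group
        tprs.append(y_pred[mask].mean())      # share of them predicted positive
    return max(tprs) - min(tprs)

# Toy data: 1 = favorable outcome (e.g., loan approved); group A/B is a sensitive attribute.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1, 0, 0])
group  = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

print("Demographic parity difference:", demographic_parity_difference(y_pred, group))
print("Equal opportunity difference: ", equal_opportunity_difference(y_true, y_pred, group))
```

A difference close to zero suggests the two groups are treated similarly on that metric. In a real audit, an established toolkit such as Fairlearn or AI Fairness 360 (both mentioned above) would normally be used instead of hand-rolled metrics, and the analysis would cover more groups, more metrics and the uncertainty in each estimate.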
