ETHICS, LAW AND AI
Prof. Faroldi - LAW

What is AI?
As machines become increasingly capable, tasks considered to require "intelligence" are often removed from the definition of AI, a phenomenon known as the AI effect (for instance, optical character recognition). One of the latest definitions is: "the study and design of intelligent agents, where an intelligent agent is a system that perceives its environment and takes actions that maximize its chances of success". There are no explicit references to humans or minds.
- Weak AI: programs that do not experience consciousness or do not have a mind in the sense people do, but can (only) act as if they think and have a mind and consciousness (whatever passes Turing's test).
- Strong AI: the ability of an intelligent agent to understand, feel or think like a human.
- Narrow AI: the ability of an intelligent agent to learn and perform a specific task, often with at least human proficiency.
- General AI (AGI): the ability of an intelligent agent to learn and perform any intellectual task (a wide range of tasks) that a human being can. Generality can be horizontal (by number of tasks) or vertical (a task at a much more general and higher level).
- Superintelligence (ASI): a hypothetical agent that would possess intelligence far surpassing that of the brightest and most gifted human minds (to what extent can it be better than humans across a huge range of tasks?).

Turing's Computing Machinery and Intelligence
The Turing test tries to answer the question: "can a machine imitate the behavior of a human?". If it can, it is indistinguishable from a human in its thinking, and Turing therefore devised an imitation game to test this.

Problems and objections to Turing's test
Empirical question: Turing made an empirical prediction: within 50 years we would be able to program computers so well that an average interrogator would have no more than a 70% chance of making the right identification after five minutes of questioning. (Will it be true?)
Conceptual question: if the answer to the empirical question is yes, should we conclude that the machine exhibits some level of thought, intelligence, or mentality? The first problem with this question is: what do we consider to be intelligent?
1. Just humans?
2. Just things that are able to sustain a conversation with us? (Chauvinistic)
3. Would the Turing test be a good test for any kind of entity (animals, aliens, other computers)?
The Mathematical Objection draws on Gödel's Incompleteness Theorem and Turing's Halting Problem, which demonstrate that within any formal system there are true statements that cannot be proven. This introduces the idea of "unanswerable" questions, which are relevant to AI. However, Turing noted that these questions are only problematic if humans can answer them, which isn't always the case.
Arguments from Various Disabilities focus on claims that machines will never be able to perform certain tasks or exhibit certain behaviors, such as kindness, resourcefulness, humor, learning from experience, or falling in love. These arguments reflect human-like traits that some believe machines cannot replicate. Turing, however, questioned whether these limitations are permanent and whether humans truly exhibit behaviors that are entirely new.
Lady Lovelace's Objection suggests that machines are inherently limited to performing tasks they are explicitly programmed to do, and therefore cannot surprise us or create anything genuinely original.
Turing responded by challenging whether humans themselves can ever do something entirely unexpected or new.
The Informality of Behavior argument assumes that human behavior is not governed by a strict set of rules, while machines operate within fixed rules. From this it is inferred that human behavior is too unpredictable and informal to be reduced to machine-like processes, suggesting that humans and machines are fundamentally different in this respect.

Assessments
- The Turing Test provides logically necessary and sufficient conditions for the attribution of intelligence. This would mean passing the Turing Test is both required and enough to prove intelligence; this view is widely rejected.
- The Turing Test provides logically sufficient but not logically necessary conditions for the attribution of intelligence. Passing the Turing Test is enough to suggest intelligence, but something could be intelligent without passing the test.
- The Turing Test provides "criterial", i.e. defeasible, sufficient conditions for the attribution of intelligence.
- The Turing Test provides (more or less strong) probabilistic support for the attribution of intelligence. Passing the test increases the likelihood that the system is intelligent, but it does not guarantee it.
- The Turing test is too hard: there may well be features of human cognition that are particularly hard to simulate but that are not in any sense essential for intelligence.
- The Turing Test is too narrow: first, success in Turing's Imitation Game might come for reasons other than the possession of intelligence; second, success in the Imitation Game would be only one example of the kinds of things intelligent beings can do, and hence in itself could not be taken as a reliable indicator of intelligence.

Should the Turing Test be considered harmful?
➔ The Turing Test had the function of countering skepticism about computers and their powers. It should not be seen as having been meant as a true test of intelligence. It is also not a good test of intelligence: it is informal and based on a highly contingent definition of a human mind, which depends on biology, evolution and culture and cannot be reverse-engineered to build an intelligent machine.

Searle's Chinese Room, 1980
Even if something behaves like a human, this does not mean that it has a sense of self or consciousness. The thought experiment devised to prove this is the "Chinese room": "Searle imagines himself alone in a room following a computer program for responding to Chinese characters slipped under the door. Searle understands nothing of Chinese, and yet, by following the program for manipulating symbols and numerals just as a computer does, he sends appropriate strings of Chinese characters back out under the door, and this leads those outside to mistakenly suppose there is a Chinese speaker in the room."
>> He is provided with a way to respond correctly to given questions, even without knowing Chinese. As in the Turing test, the outcome could make us assume that he is a native speaker, even though he actually has no understanding of what he is reading or writing.
Conclusion: computers that pass Turing's test use symbolic manipulation, which does not convey meaning or semantics.
>> Human minds are not computer-like computational or information-processing systems: they must result from biological processes; computers can at best simulate these biological processes. Mental states are high-level emergent features that are caused by low-level physical processes in the neurons, and it is the (unspecified) properties of the neurons that matter.
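As a toy illustration of Searle's point (not from the original text, and with invented strings), the minimal sketch below mimics the room: a purely syntactic lookup produces appropriate-looking replies without anything in the program "understanding" the symbols it manipulates.

```python
# A toy "Chinese room": the program matches incoming symbol strings against a rulebook
# and copies out the listed reply. Nothing here attaches any meaning to the symbols.
RULEBOOK = {
    "你好吗？": "我很好，谢谢。",        # "How are you?" -> "I'm fine, thanks."
    "你会说中文吗？": "会，一点点。",     # "Do you speak Chinese?" -> "Yes, a little."
}

def room_occupant(note_under_door: str) -> str:
    # Pure symbol manipulation: look up the incoming string, return the paired string.
    return RULEBOOK.get(note_under_door, "对不起，我不明白。")  # default: "Sorry, I don't understand."

if __name__ == "__main__":
    print(room_occupant("你好吗？"))  # looks like a competent reply, with zero understanding
```

An outside observer sees fluent replies; inside there is only rule-following, which is exactly the gap the argument trades on.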
EU AI Act
The EU has a strong regulatory force:
- formal impact: on EU countries and on services related to an EU country;
- informal impact: because the EU is the largest global market, many non-EU countries and companies adopt or comply with its legal frameworks even when not legally required to do so. For example, the GDPR privacy regulation has been widely adopted beyond the EU. The proposed AI regulations are expected to have a similar dual impact, influencing both non-EU governments and companies to comply voluntarily.

Intended scope
Build an Artificial Intelligence (AI) regulatory framework compatible with European values (as a regulation, it applies immediately and uniformly in all Member States and cannot be adapted by further national laws according to the needs of different states) and with the protection of human rights, even if the main goal is to improve the functioning of the internal market, foster trustworthy AI and support innovation (this is also an opportunity for innovation for Europe). The Act:
- provides a definition and classification of AI systems (based on their risks);
- establishes a set of requirements for their implementation, commercialization and use;
- outlines a set of monitoring tools and sanctions.

Structure
It is divided into two parts: the first sets out the considerations behind the second, which is the only legally binding part.
FIRST PART: outlines the broader goals, ethical considerations, and general guidelines for AI development and use. While it provides context and sets the tone for the legislation, this part is not legally binding.
SECOND PART: the enforceable part of the AI Act. It contains specific rules, obligations, and requirements that must be followed by AI developers, users, and companies operating in the EU or offering AI-related services within its jurisdiction.

PART I
FOR WHOM? (Article 2)
Mainly providers and professional users, not a common user's rights. The law regulates the placing on the market of a forbidden program, not its development: it is product safety legislation, and everything placed on the market needs to have an intended purpose. It applies to:
1. suppliers placing on the market or commissioning AI systems within the Union;
2. users of AI systems located in the Union;
3. users and suppliers of systems located in a third country when "the output produced by the system is used in the Union" (Art. 2).
However, it excludes military and defense applications, as well as certain systems built for scientific research and development.

Definitions
Definition of AI system: "software that is developed with one or more techniques and approaches listed in Annex I and can, for a given set of human-defined objectives, generate outputs such as content, predictions, recommendations, or decisions influencing the environments they interact with." This definition includes a broad range of AI technologies, including machine learning (supervised and unsupervised learning, reinforcement learning), approaches based on logic and explicit knowledge models (such as expert systems), and statistical approaches (such as Bayesian estimation).
Definition of general purpose AI model: an AI model that is not designed for a specific task but can be adapted to perform a wide range of tasks across different applications and sectors. These models are versatile and can be used in a variety of contexts, including language processing, image recognition, and more. Given their adaptability, the EU AI Act places particular emphasis on ensuring that such models comply with the relevant regulatory standards, regardless of how they are ultimately applied.
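To make the breadth of the quoted "AI system" definition concrete, here is a hedged, hypothetical sketch (the checker, its names and its inputs are not part of the Act); it simply mirrors the three elements of the definition and shows that even a plain statistical scoring rule would qualify, which is the point the remarks below press on.

```python
# Hypothetical checker mirroring the quoted definition: Annex I-style technique +
# human-defined objectives + outputs that influence an environment. Not the Act's own test.
ANNEX_I_STYLE_TECHNIQUES = {"machine_learning", "logic_and_knowledge_based", "statistical"}

def counts_as_ai_system(techniques: set,
                        has_human_defined_objectives: bool,
                        generates_outputs: bool) -> bool:
    uses_listed_technique = bool(techniques & ANNEX_I_STYLE_TECHNIQUES)
    return uses_listed_technique and has_human_defined_objectives and generates_outputs

# Even an ordinary credit-scoring formula built on a statistical model would qualify:
print(counts_as_ai_system({"statistical"}, True, True))   # True
```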
Remarks on definitions
- Broad interpretation: the reference to multiple AI approaches could cover technologies not usually seen as AI, including most algorithmic decision systems.
- Expanded scope: logical and statistical models risk extending the regulation to nearly all algorithmic systems.
- Unclear boundaries: ambiguity between techniques (e.g., statistical vs. procedural) creates uncertainty for market participants.
- Exclusion of high-risk traditional systems: narrow definitions might exclude automated systems that are high risk but developed via traditional methods.
- Conflict with the risk-based approach: such a broad definition contradicts the goal of a risk-based, technology-neutral framework.

PART II
What is prohibited
The Act addresses certain practices (not systems) carried out through AI systems that generate, or may generate, effects considered unacceptable because they are contrary to the fundamental values and rights of the Union. For these practices the Proposal places a blanket ban, with exceptions allowed in particular cases.
Types of risk: risk is the fundamental category of classification in the Act. A risk is the combination of the probability of a harm and its severity; the Act distinguishes levels of risk: minimal risk, limited risk, high risk, and unacceptable risk.
The Act prohibits the use of:
➔ Subliminal techniques with an objective of manipulation. >> BUT: what type of harm is intended, physical or psychological? The notion of "subliminal techniques" is unclear, nor does the meaning of "consciousness", in relation to practices that would operate beyond it, appear understandable.
➔ Exploitation of vulnerable individuals/groups: use of a system "that exploits the vulnerabilities of a specific group of persons, due to age or physical or mental disability, in order to materially distort the behavior of a person belonging to that group in a way that causes or is likely to cause that person or another person physical or psychological harm". >> However, it is unclear what the actual added value of this provision is compared to the previous one, no clue is given as to the meaning of "exploitation", and some types of vulnerability are not considered in the Proposal (e.g., financial vulnerability). When dealing with AI systems, vulnerability is structural, meaning it arises from the very nature of how AI systems function rather than from the personal characteristics or situations of individuals. This type of vulnerability affects everyone who interacts with the system, as it is rooted in the AI's design, decision-making processes, or operational frameworks.
➔ Biometric categorisation: using systems to infer personal characteristics such as race or sexual orientation, or to assess an individual's behavior and trustworthiness based on data. It is distinct from identification, which focuses on recognizing or verifying a specific individual.
➔ Public social scoring systems, which rank individuals based on their behavior or personal characteristics: these are prohibited if they lead to prejudicial or unfavorable treatment by public authorities or private entities in two cases:
1. Unrelated Context: the data used by the system is applied in social contexts different from where it was originally collected.
2. Disproportionate Harm: the harm caused is unjustified or disproportionate to the behavior or its severity.

HIGH-RISK SYSTEMS
AI systems are classified as high-risk if, in light of their intended purpose, they pose a high risk of harm to the health and safety or the fundamental rights of persons, taking into account both the severity of the possible harm and its probability of occurrence, and they are used in a number of specifically pre-defined areas specified in the Regulation (Article 6). The definition of high risk is adapted on a case-by-case basis, and may include categories of AI also found in the prohibited practices, depending on the use.
➔ First, a system is considered high-risk if it is a safety component of a product (or part thereof) regulated by European Union harmonization legislation (Article 43).
➔ The second part of the definition of high risk is again a list, contained in Annex III, covering eight macro-areas in which the AI systems used are presumed to be high risk. The macro-areas include biometric identification, access to public and private services, law enforcement, and the administration of justice. The high-risk categories are the following (a toy classification sketch follows the list):
1. Biometric Identification and Categorization of Natural Persons (Article 10): AI systems used for the remote biometric identification (e.g., facial recognition) or categorization of individuals in publicly accessible spaces, often used in law enforcement or border control.
2. Education and Vocational Training (Article 10): AI systems used to determine access to education or vocational training, or to influence individuals' educational decisions, such as grading systems or systems used in university admissions.
3. Access to and Enjoyment of Essential Private and Public Services (Article 13): AI systems determining eligibility for essential services like healthcare, financial services, welfare, or public assistance. These systems might influence decisions on loans, insurance, or social benefits.
4. Administration of Justice (Article 13): AI systems used in legal proceedings or decision-making processes related to justice, such as tools that assist in judicial decision-making, legal case outcomes, or risk assessments for bail and sentencing.
5. Law Enforcement (Article 27): AI systems used by law enforcement authorities for predicting criminal activities, profiling, risk assessments, or any system influencing judicial decisions. These systems could impact individuals' fundamental rights and freedoms: "AI systems intended for use by law enforcement authorities [...] for assessing the reliability of evidence in the investigation or prosecution of crimes," "for detecting deep-fakes," "for profiling natural persons," and "for criminal analysis with respect to natural persons, which enable law enforcement authorities to search complex, related and unrelated data sets made available from different data sources or in different formats in order to detect unknown patterns or discover hidden relationships in the data."
6. Management and Operation of Critical Infrastructure (Article 29): AI systems that manage or operate critical infrastructure, such as electricity, water, and transport systems, where failures could have significant safety or operational consequences.
7. Employment, Workers Management, and Access to Self-Employment (Article 29): AI systems involved in recruiting or managing workers, including automated resume filtering or employee monitoring systems. These systems could affect access to employment and workplace conditions.
8. Migration, Asylum, and Border Control (Article 29): AI systems used for assessing visa or asylum applications, managing immigration, or controlling borders. These could include AI-driven tools that predict migration flows or automate application processing.
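The toy sketch below is hypothetical and deliberately simplified (it is not the Act's own procedure); it only mirrors the structure just described: prohibited practices are banned outright, safety components and Annex III areas are presumed high-risk, and everything else falls into the lower tiers.

```python
# Hypothetical, simplified classifier reflecting the tiers described above.
PROHIBITED_PRACTICES = {
    "subliminal_manipulation", "exploitation_of_vulnerable_groups", "public_social_scoring",
}
ANNEX_III_AREAS = {
    "biometric_identification", "education", "essential_services", "administration_of_justice",
    "law_enforcement", "critical_infrastructure", "employment", "migration_border_control",
}

def classify_system(practice, area, is_safety_component):
    # practice: name of the practice carried out (or None); area: Annex III macro-area (or None).
    if practice in PROHIBITED_PRACTICES:
        return "prohibited (unacceptable risk)"
    if is_safety_component or area in ANNEX_III_AREAS:
        return "high-risk (risk management, conformity assessment, etc.)"
    return "limited/minimal risk (transparency duties or voluntary codes of conduct)"

print(classify_system(None, "employment", False))   # high-risk
```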
RISK MANAGEMENT SYSTEMS
The EU AI Act defines strict rules for providers and users of high-risk AI systems to ensure safety, accountability, and transparency throughout their lifecycle (Art. 9). Key provisions include:
Training and validation (Art. 9): providers of high-risk AI systems must establish and implement a thorough training and validation process to ensure the system performs as intended. The risk management system shall consist of a continuous iterative process run throughout the entire lifecycle of a high-risk AI system, requiring regular systematic updating. It shall comprise the following steps:
(a) identification and analysis of the known and foreseeable risks associated with each high-risk AI system;
(b) estimation and evaluation of the risks that may emerge when the high-risk AI system is used in accordance with its intended purpose and under conditions of reasonably foreseeable misuse;
(c) evaluation of other possibly arising risks based on the analysis of data gathered from the post-market monitoring system referred to in Article 61;
(d) adoption of suitable risk management measures in accordance with the provisions of the following paragraphs.
*Risk identification: the systematic use of available information to identify hazards, where a hazard is a potential source of harm.
*Risk estimation: the calculation of the probability of occurrence of harm and the severity of that harm, whereas risk evaluation means the determination of whether a risk is acceptable.
The risk management measures are based on three steps:
1. design and development;
2. mitigation and control measures;
3. information and training.
The second component of the risk management system is testing, to identify the best risk management measures, to ensure consistency, and to ensure legal compliance.

REQUIREMENTS FOR HIGH-RISK AI SYSTEMS
The EU AI Act outlines essential requirements for high-risk AI systems in Chapter 2, placing the primary responsibility on the providers who develop these systems. Providers must establish a quality management mechanism to ensure compliance with several key requirements, including:
1. Data Governance (Art. 10): providers must ensure the quality of the data used for training, validation, and testing. The data must be relevant, representative, error-free, and complete, considering the specific characteristics of the geographic area of application. This also includes implementing appropriate data governance practices, even for systems that do not rely on automated training (a minimal sketch of such a check follows this list).
2. Record Retention (Art. 12): providers are required to maintain accurate records of their AI systems' design and development to ensure traceability and accountability throughout the system's life cycle.
3. Transparency and User Information (Art. 13, Art. 27): providers must be transparent with users, providing clear and understandable information about the AI system's capabilities, limitations, and potential risks. High-risk AI systems must be designed in a way that makes their operation understandable to users and providers.
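As a minimal, hypothetical sketch of the Art. 10 data-quality idea (requirement 1 above), the check below flags missing values and unrepresented regions in a training set; the fields, inputs and structure are invented for illustration, not taken from the Act.

```python
# Hypothetical data-governance check in the spirit of Art. 10: flag obviously incomplete
# or unrepresentative training data before it is used. All names here are invented.
def check_training_data(records, required_fields, region_field, regions_of_intended_use):
    issues = []
    # Completeness: every record should carry every required field.
    missing = sum(1 for r in records for f in required_fields if r.get(f) in (None, ""))
    if missing:
        issues.append(f"{missing} missing value(s) across required fields")
    # Representativeness: the geographic areas of intended use should appear in the data.
    seen_regions = {r.get(region_field) for r in records}
    absent = set(regions_of_intended_use) - seen_regions
    if absent:
        issues.append(f"no training examples for regions: {sorted(absent)}")
    return issues

sample = [{"age": 34, "income": 40_000, "region": "IT"},
          {"age": None, "income": 52_000, "region": "IT"}]
print(check_training_data(sample, ["age", "income"], "region", {"IT", "FR"}))
```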
In addition, the regulation emphasizes the monitoring, detection, and correction of bias. Providers must prevent discriminatory outcomes by managing biases that could arise from the data. This is particularly challenging because detecting bias often requires knowledge of protected characteristics (such as race or gender), which are sensitive and may fall under Article 9 of the GDPR. The EU AI Act aims to ensure that even high-risk systems that do not rely on automated training have robust data governance mechanisms in place to minimize risks.
A criticism of the EU AI Act is that it lacks a clear, risk-based approach to identifying high-risk AI systems. While the Act defines sensitive areas, it does not offer a specific method for determining when an AI system actually poses a risk. This can lead to confusion, as even low-risk systems (e.g., a legal search engine for judges) might be classified as high-risk. In contrast, the GDPR follows a structured risk-based approach with clear criteria for assessing data processing risks, such as evaluative processing, systematic monitoring, and the use of innovative technologies like facial recognition. The GDPR provides clear guidelines, allowing organizations to assess risks and implement safeguards accordingly. The concern is that, without a clearer method, the EU AI Act could lead to over-regulation of low-risk systems and under-regulation of high-risk ones.

HUMAN SUPERVISION
Article 14 of the AI Proposal mandates that high-risk AI systems must be developed with appropriate human-machine interface tools to enable effective supervision by natural persons (humans). These systems are expected to include measures, either integrated by the provider or implemented by the user, to support human oversight. Specifically, the Proposal requires that these measures help the supervising person to:
1. fully understand the capabilities and limitations of the AI system;
2. monitor the system's operation to ensure it functions as intended;
3. remain informed about potential risks of bias in the system;
4. correctly interpret the system's output, avoiding blind reliance on its decisions;
5. decide not to use the system or override its decisions, including ignoring, canceling, or reversing its outputs;
6. intervene in or interrupt the operation of the system, such as by using a "stop" button or similar procedure when necessary.
In paragraph 4 of Article 14, these supervisory tasks are described as essential for preventing misuse of, or over-reliance on, AI systems, especially when their operation might impact important areas such as safety, health, or fundamental rights (a minimal sketch of such oversight hooks follows below).
Article 29 establishes the obligations for users of high-risk AI systems. Under Article 29(1), users are required to use the system according to the operating instructions, and under Article 29(3) they must monitor the system's functioning.
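A minimal, hypothetical sketch of what such oversight hooks might look like in code (the class and method names are invented; Article 14 does not prescribe any implementation): the supervisor sees the output rather than acting on it blindly, can override it, and can stop the system entirely.

```python
# Hypothetical human-oversight wrapper in the spirit of Art. 14; names are invented and
# the design only illustrates measures 4-6 listed above.
class HumanOversightWrapper:
    def __init__(self, model):
        self.model = model          # any callable AI component
        self.stopped = False

    def propose(self, x):
        """Run the system and surface its output for human interpretation (measure 4)."""
        if self.stopped:
            raise RuntimeError("system halted by the human supervisor")
        return {"proposal": self.model(x), "requires_human_confirmation": True}

    def override(self, human_decision):
        """Ignore or reverse the system's output in favour of the human's decision (measure 5)."""
        return {"decision": human_decision, "overridden": True}

    def stop(self):
        """'Stop button': interrupt the operation of the system (measure 6)."""
        self.stopped = True

wrapper = HumanOversightWrapper(lambda x: "reject application")   # toy model
print(wrapper.propose({"applicant_id": 42}))
print(wrapper.override("approve application"))
wrapper.stop()
```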
However, concerns arise regarding the feasibility of the human supervision prescribed by Article 14. It can be difficult for individuals to perform the required tasks effectively, as many AI systems are complex and exceed human cognitive and decision-making abilities. This can create a mismatch between the system's complexity and the supervising person's ability to fully assess its functioning, particularly when AI is used for tasks that surpass human capabilities in processing information or making decisions. Experts argue that this creates a competence-responsibility gap: AI systems are often adopted precisely because they are more capable than humans at specific tasks, such as decision-making and processing vast amounts of information. Humans, however, are still required to supervise and evaluate these systems, which is sometimes seen as an "impossible task." Balancing complex protection requirements (e.g., safety, health, human rights) adds further complications, making it challenging for individuals to exercise full control over the system's outcomes.

MEDIUM-RISK SYSTEMS
In the EU AI Act, medium-risk AI systems are seen as potentially problematic, particularly because of their manipulative potential. Transparency requirements are therefore placed on these systems to ensure that individuals interacting with them are aware that they are engaging with an AI system. This transparency, however, is distinct from the "explainability" required by Article 13 for high-risk AI systems, which provides a detailed explanation of the AI's functioning or decision-making process.
Article 52 addresses transparency for three specific categories of AI systems:
1. Systems intended to interact with natural persons: this category is designed to address systems like chatbots or personified AI systems. While the definition may seem somewhat ambiguous, the goal is to ensure that users understand they are interacting with an AI system. These systems may manipulate conversations or behaviors, and transparency is thus crucial to prevent deception.
2. Emotion recognition and biometric categorization systems: these systems analyze biometric data to make assessments about individuals, such as emotion recognition (through facial expressions or voice) or biometric-based psychographic profiling (e.g., determining health status via wearable devices like the Apple Watch). Since these systems involve sensitive personal data, transparency helps prevent misuse or hidden manipulation.
3. Deep fake systems: AI systems used to generate or manipulate content (image, audio, or video) that closely resembles real people, places, objects, or events in a way that could falsely appear authentic or truthful. The goal of transparency here is to ensure that users are not deceived by artificially generated content.
There is, however, an exception to this transparency requirement when the systems are used for investigation or crime detection purposes, acknowledging the need for secrecy in certain law enforcement applications.

PREVENTIVE CONTROL OF AI SYSTEMS AND POST-MARKET MONITORING
The EU AI Act establishes several key mechanisms and processes to regulate the introduction and use of high-risk AI systems, ensuring compliance with safety, transparency, and ethical standards.
1. Preliminary assessment of high-risk AI systems
Before high-risk AI systems can enter the market, they must undergo a preliminary conformity assessment to ensure compliance with the Regulation's requirements. This assessment is essential to verify that the system meets the safety, transparency, and ethical standards outlined by the Regulation. Article 43 specifies that the conformity assessment is typically based on the supplier's internal control (governed by Annex VI): suppliers independently evaluate their own quality management systems and technical documentation, ensuring they align with the regulatory requirements and common standards. Suppliers can either comply with existing standards or implement the essential requirements at the technical level on their own. However, for specific systems, such as biometric categorization systems (Article 43), a self-assessment is not sufficient.
In such cases, third-party approval from an accredited, independent notified body is required to verify the system's compliance (Article 33). If the AI system is a safety component of a product regulated under the New Legislative Framework (NLF), it follows the evaluation procedures of the relevant legislation, to avoid duplication of the regulatory process (Article 43). In many cases, suppliers are still allowed to conduct a self-assessment and apply the EU mark of conformity independently, without requiring third-party approval.
2. Governance and oversight
To ensure consistency and effective enforcement, the Proposal establishes a new governing body at the EU level: the European Artificial Intelligence Board (Title VI, Chapter 1, Art. 85), created to coordinate governance of the system. Its main responsibilities include ensuring cooperation among national authorities (e.g., sharing best practices), providing support to the Commission through opinions and guidelines, and helping to manage overall compliance.
Member States are required to designate one or more national competent authorities responsible for enforcing the Regulation (Art. 86). These authorities have several key powers:
○ they can request information and documentation from suppliers;
○ they can urge suppliers to take corrective action or even withdraw non-compliant systems from the market.
Additionally, a database for high-risk independent AI systems (Art. 66) will be established at the European Commission, providing greater transparency and oversight over such systems.
3. Codes of conduct for non-high-risk AI systems
Although the Proposal primarily focuses on high-risk AI systems, Title IX encourages the adoption of codes of conduct for AI systems that fall outside the high-risk category. These codes are designed to promote responsible AI development and should aim to match the standards set for high-risk systems, even if this is not legally required. This is an effort to raise the bar for all AI systems, ensuring they operate safely and ethically.
4. Sanctions and penalties (Art. 71)
The Proposal also sets out administrative sanctions for non-compliance, particularly for violations involving high-risk AI systems. For severe violations, such as those relating to the prohibitions in Article 5 or failure to meet the requirements in Article 10, fines can reach up to thirty million euros or 6% of a company's global annual turnover, whichever is higher (Article 71). Lesser violations can result in penalties of up to twenty million euros or 4% of global turnover (a toy calculation of these caps follows point 5 below).
5. Challenges with enforcement
Despite the well-laid-out framework, there are concerns about the effectiveness of enforcement. The Proposal relies heavily on national authorities to enforce compliance, but enforcement practices and resources can vary significantly between Member States. This creates a risk of enforcement discrepancies, where some countries may be better equipped to handle enforcement than others. Additionally, the Proposal does not provide mechanisms for individuals or groups who may be aggrieved by AI systems, such as a formal complaint process to the market surveillance authority. This lack of direct recourse for affected parties limits their ability to hold AI suppliers accountable.
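A toy calculation of the Art. 71 caps described in point 4 above (illustrative only; the thresholds are the ones quoted in the text, and the function is not part of the Act):

```python
# Toy calculation of the administrative fine caps quoted above (Art. 71):
# the applicable cap is the fixed amount or the turnover percentage, whichever is higher.
def fine_cap_eur(global_annual_turnover_eur, severe_violation):
    fixed_cap = 30_000_000 if severe_violation else 20_000_000
    turnover_cap = (0.06 if severe_violation else 0.04) * global_annual_turnover_eur
    return max(fixed_cap, turnover_cap)

print(fine_cap_eur(2_000_000_000, severe_violation=True))    # 120000000.0 (6% of turnover)
print(fine_cap_eur(100_000_000, severe_violation=False))     # 20000000 (fixed cap applies)
```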
Article 66 of the EU AI Act mandates the creation of a public database for high-risk AI systems by the European Commission. Providers must submit information about their systems (e.g., contact details, intended use, and conformity assessment results) before market entry and keep the data updated. The database is publicly accessible, ensuring transparency and accountability for high-risk AI systems.

Alignment problem
The alignment problem consists in having artificial systems do what human principals intend them to do. There are conceptual issues and technical issues. The aim is to propose a novel alignment strategy that respects the following desiderata: (i) representativity, (ii) flexibility and value change, (iii) prescriptiveness, and (iv) motivation, based on the ground-breaking concept of law-following AI, developed through an original reasons-based approach.
Risks and harms of existing AI systems, which are fairly limited in the narrow tasks they can accomplish (thus known as narrow AI), are emerging, and there is consensus that they need to be reduced by improved design (e.g., machine ethics), regulation (the EU AI Act), and other means. Risks and harms of future AI systems (fAI) are more difficult to foresee, because it is unclear to what extent fAI will reach something approaching general intelligence or superintelligence, what that would look like, and how difficult it would be for humans to understand and control it. The proposed EU AI Act, for instance, even in its revised version, fails to address how general fAI can be covered by its risk-based approach, considering that a general system, by its very nature, does not have an intended purpose. Some people are worried about future systems engaging in power-seeking behaviors, because these could lead to existential risk; even if this is an extreme way of thinking, everybody agrees that such systems need to be ethical and moral.
We can divide the field into two parts:
1. AI ethics spells this out in terms of various principles or requirements, e.g. transparency, justice and fairness, non-maleficence, responsibility, privacy, beneficence, freedom and autonomy, trust, sustainability, dignity, solidarity, some of which filter down to regulatory proposals.
2. AI safety spells this out in terms of alignment, i.e. being in line with humans' preferences, objectives, or values (respectively the value alignment and superalignment problems). Value alignment needs to be distinguished from alignment tout court, because alignment per se is a value-neutral term: an AI can be aligned with respect to an evil policy. If an AI is not aligned, we say it is misaligned.
> The principle-based, top-down approach of AI ethics, however, faces at least two kinds of problems:
1. there are too many principles, making it hard to find a workable application and resulting in overlap and ambiguity (e.g. "transparency" is used in two different senses in the proposed EU AI Act);
2. even settling on a few and resolving the ambiguities, the principles are difficult to make concrete and therefore to implement in a technical setting, and they do not take into account general AI.
> There is no consensus on a definition of alignment in AI safety. There are two main reasons:
1. AI safety is bottom-up and is not yet in a normal-science phase, so it is to be expected that there are competing approaches and concerns;
2. work on alignment has remained confined to computer science and has not yet benefited from philosophical reflection.
This has resulted in at least the following problems in state-of-the-art research on alignment:
➔ a general failure of conceptual analysis, often resulting in incoherent, inconsistent definitions;
➔ a lack of awareness of, and coordination with, individual well-established debates, e.g. the problem of what constitutes agency, the fact-value distinction, the rule-following problem, and the structure of values and norms.
Principles are many and too abstract, and alignment is too fragmented and hard to generalize. The synthesis needed points towards a framework that takes into account the plurality and human-level complexity of principles from AI ethics, and the technical implementation of alignment strategies from AI safety. Such a framework is the law (and related phenomena): for instance, it can be argued that the law converts opaque human goals and values into actionable conduct, which can be generalized to novel situations; moreover, the law has built-in resolution mechanisms that ethics lacks. The law is general, but admits exceptions, and can be adapted and extended to new scenarios in principled ways.
However, a general artificial agent cannot just be programmed to follow principles, rules or laws, as they are too general (thus AI ethics is insufficient); but neither can it be programmed to follow narrow conduct, for then it could easily engage in reward hacking and goal misgeneralisation, and be misaligned in a way that cannot be judged from external conduct during training (thus AI safety is insufficient). It also needs to be responsive to the high-level reasons behind actions and decisions (thus needing something more than first-order alignment from AI safety), and not just to human preferences, a technique used both by state-of-the-art large language models and by recently proposed frameworks such as the Human Compatible framework.
The literature on alignment is extremely fragmented, in part because different communities work on it from different perspectives, in part because standards have not yet emerged, and of course because techniques are fast improving. There are two large families of alignment definitions:
1. Minimalist (aka narrow) approaches focus on avoiding catastrophic outcomes. The best example is Christiano's concept of intent alignment: "When I say an AI A is aligned with an operator H, I mean: A is trying to do what H wants it to do."
2. Maximalist (aka ambitious) approaches attempt to make AIs adopt or defer to a specific overarching set of values, like a particular moral theory, a global democratic consensus, or a meta-level procedure for deciding between moral theories.
Recent work points to the concerning fact that even if an AI follows the specification during training, it can still capably pursue an unintended goal during deployment: this is the problem of goal misgeneralization, and it appears not just with deep learning but across different machine learning techniques: misalignment is "robust".
Current approaches to value alignment can be characterized by how they deal with or represent values, i.e. either implicitly or explicitly (a toy contrast follows below):
- Implicit values: values are determined implicitly via large-scale human feedback, e.g. from contractors, as in large language models (which of course has socio-political problems we do not focus on here, such as bias and exploitation).
- Explicit values: values and principles are introduced explicitly, and there might then be further training on them, i.e. what the company Anthropic calls "Constitutional AI".
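To make the contrast concrete, here is a deliberately toy sketch (all names, rules and scores are invented, and real systems use learned preference models and much richer principles): the "implicit" route scores an output with a stand-in for a preference model learned from human feedback, while the "explicit" route checks the output against a short written list of principles.

```python
# Toy contrast between implicit and explicit value specification. Everything here is a
# stand-in for the real techniques described in the text above.
def implicit_score(text):
    # Implicit values: in practice a reward model trained on large-scale human comparisons;
    # here, a crude placeholder for such a learned score.
    return 1.0 if "please" in text.lower() else 0.3

CONSTITUTION = {
    "avoid insults": ["idiot", "stupid"],
    "avoid threats": ["or else"],
}

def explicit_score(text):
    # Explicit values: the output is checked against written principles, one by one.
    ok = sum(1 for banned in CONSTITUTION.values()
             if not any(phrase in text.lower() for phrase in banned))
    return ok / len(CONSTITUTION)

print(implicit_score("Please hand over the report."))   # 1.0
print(explicit_score("Hand it over, or else."))         # 0.5 (violates 'avoid threats')
```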
Problems: these two families of approaches have one common feature: they are, in a sense, descriptive, i.e. they rely on an implicit or explicit list of values that are contemporary and are held by a specific group of individuals.
1. This has practical problems: it is not representative of a majority and is predicated on the single-human/single-machine interaction style; it is fixed in time (e.g. the Romans would have found slavery not just acceptable, but good), thus preventing moral progress absent constant retraining or updating of the list of values. Moreover, humans are notoriously bad moral teachers, at least if the teaching is by example: we cannot expect machines to learn what values there are just by observing our behavior. They have to learn what our preferences are; this is the idea behind IRL (inverse reinforcement learning), a technique proposed by Russell.
2. It also has theoretical problems: it blurs the descriptive/prescriptive distinction, because one cannot determine what is right, and therefore to be done (the normative), from observing behaviors (the normal). And even if it ensured compliance, it would not ensure compliance for the right reason: an agent can behave in an aligned fashion just to get a reward, until presented with new scenarios or when in deployment rather than training.
> Shah shows that, in a variety of domains and techniques, a model trained with a correct reward function can pursue undesirable goals in novel tasks (the goal misgeneralization problem). This suggests that it is possible to have an AI system that pursues a misaligned goal but, knowing we would stop it, does what we want until it can make sure we cannot stop it anymore.
> Hart hypothesized that a switch to an internal point of view is required. From the external perspective, an observer merely notes that members of a society tend to behave in accordance with certain rules. The observer sees a pattern of behavior, but does not themselves feel bound by these rules.
Progress on alignment research therefore has to respect the basic families of desiderata we can derive from the problems just mentioned:
- representativity, in order to take into account wide interests and values in society, not to exclude minorities, and to improve on the single-human/single-machine interaction style that leads to distortions in societal contexts;
- flexibility and value change, in order to take into account changing preferences both at the individual and the societal level, with an outlook of normative progress (update mechanisms);
- prescriptiveness, in order to avoid reading values from facts, which leads to theoretical and practical problems;
- motivation, in order to solve problems such as goal misgeneralization and reward hacking.
The first two desiderata capture, at a high level, some of the principles and concerns of AI ethics, at least at the procedural level. The latter two capture some of the concerns and objectives of AI safety, once that field is subjected to some philosophical analysis.

Why a law-following AI fulfills these desiderata
One complex human phenomenon is able to respect these desiderata, at least in principle: the law. Legal systems can be representative of a large population in a way that tries to create some synthesis of different values and ends. Legal systems can adapt to societal change through, e.g., elections, interpretation, legislation, or customs.
Legal systems are prescriptive, as they normally enforce sanctions, and genuine law-following requires what Hart called the internal point of view.

What does it take to follow the law?
Two elements of rule- or law-following need to be distinguished:
- the external element, which can be empirically established (although with space for interpretation and disagreement about what constitutes conformity) (ACTUS REUS);
- the internal element, which is taken to be acceptance, intentionality, etc. (MENS REA).
Could we establish whether conduct complies with the law by simply focusing on an empirical, behavioral criterion? This would consist in observing external behavior and seeing whether it complies (or not) with (a reasonably accepted interpretation of) the law, without any regard to factors such as motivation or understanding. But an agent can continue to comply outwardly, yet for the wrong reasons, and at a certain critical point reveal that it was pursuing a different, misaligned objective all along.
The thin view holds that agents can be legal agents as soon as there is a valid norm addressing them in the proper way. However, this position is somewhat weak on a variety of fronts.
> First, it is philosophically uninteresting, because it fails to address the issue of whether such an agent can be or should be the subject of such norms.
> Second, and relatedly, such a thin view is also useless, for it does not consider the alignment between the means and the aim of the law.
The thick view holds that artificial agents can display characteristics that make them substantially subject to the law. The important distinction is how far these requirements are "natural" (i.e., depend on a presupposed notion of a "natural" person) or artificial (i.e., established arbitrarily to a degree), such that they can apply to non-human agents as well.

Intentions and responsibility in AI agents
Defining an Agent
The concept of an agent has been debated extensively in philosophy, from Aristotle to modern thinkers like Hume and Davidson. Broadly, an agent is any entity capable of entering causal relationships. A narrower definition focuses on entities capable of intentional actions, initiating activities, or forming second-order desires. Outside philosophy, fields like computer science and biology define agents as systems that direct actions toward goals using feedback control mechanisms.
Artificial Agents
Artificial intelligence has introduced systems referred to as "agentic," such as personal assistants and large language models. These systems are considered agentic if they can pursue goals independently in complex environments, act autonomously based on natural language instructions, or use tools like web search and programming. This functional perspective evaluates AI agency based on observable behaviors rather than intrinsic mental characteristics.
Linking Agency to Intentions
Core elements of agency (goal pursuit, autonomy, and planning) are closely tied to intentions. According to Bratman's Belief-Desire-Intention (BDI) model, intentions arise from beliefs about the world and desires to act within it. While narrow AI can display agentic behavior without general intelligence, it remains debatable whether general AI inherently requires agency.
Theories of Intentions
Intentions are explained through two primary frameworks:
1. The belief-desire theory views intentions as mental states combining a desire to act with the belief in executing that action.
2. The goal-oriented theory regards intentions as stable attitudes essential for reasoning and planning, guiding conduct without frequent reconsideration.
Legal Perspectives on Intentions
In law, intentions are critical to determining responsibility. Both common and civil law systems require mental (mens rea) and physical (actus reus) components for criminal acts. Common law includes purpose, knowledge, recklessness, and negligence under mens rea. In civil law, particularly in Italy, "dolo" involves two elements: an epistemic element (foreseeing the consequences) and a volitional element (the will to act).
Motivations for Legal Insights
The legal framework offers a codified and democratic foundation for examining intentions in artificial agents. Focusing on the strongest requirement, such as intention, ensures clarity in determining responsibility. Scholars like Lagioia and Sartor argue that artificial systems programmed with sufficient mens rea elements could theoretically commit crimes. This raises the question of whether current legal paradigms are sufficient or whether new systems are needed to address the unique challenges posed by AI agents.
Reinforcement Learning (RL) Agents
Reinforcement learning agents are typically modeled using a Markov Decision Process (MDP), represented as a tuple M = (S, A, T, R, b). This includes:
- State Space (S): represents the world's possible states.
- Actions (A): the set of actions the agent can take.
- State Transition Function (T): gives the probability of transitioning from one state to another after an action.
- Reward Function (R): determines the immediate reward for being in a specific state or taking specific actions.
- Policy: maps states to actions, guiding the agent's decisions.
Belief-Desire-Intention (BDI) Agents
BDI agents are structured around:
- Beliefs: represent what the agent knows.
- Desires: define the goals the agent wants to achieve.
- Intentions: focused commitments to specific goals.
Intentions are executed through "intention plans" (i-plans), which are sequences of actions the agent follows.
Comparing MDP and BDI Models
While both models share common actions and transitions, differences arise:
- MDP focuses on achieving a goal through optimal policies, ignoring the mental states behind the actions.
- BDI includes epistemic (knowledge-based) and bouletic (desire-based) components, adding depth to the agent's decision-making.
Limitations of the MDP-BDI Comparison
A simplistic mapping of BDI intentions onto MDP target states is seen as:
- Trivial: it overlooks the complex epistemic and bouletic structures that define true intentions.
- Circular: designing agents to act intentionally requires more than ensuring outcomes; the underlying reasons for actions matter.
To address these issues, an alternative approach involves mapping beliefs to the transition function and desires to the reward function within an MDP framework (a minimal sketch follows below).
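A minimal sketch of the two representations and of the mapping just mentioned (beliefs to the transition function T, desires to the reward function R). The concrete field types are assumptions made for illustration; in particular, the text does not gloss b, which is read here as an initial state distribution.

```python
from dataclasses import dataclass, field

# Minimal sketch of the MDP tuple M = (S, A, T, R, b) and of a BDI agent, plus the
# mapping suggested above: beliefs -> transition function, desires -> reward function.
# Concrete types are assumptions for illustration; b is read as an initial state distribution.

@dataclass
class MDP:
    S: list                 # state space
    A: list                 # actions
    T: dict                 # T[(s, a)] = distribution over next states
    R: dict                 # R[s] = immediate reward in state s
    b: dict                 # initial state distribution (assumed reading of b)

@dataclass
class BDIAgent:
    beliefs: dict                                   # epistemic component: assumed dynamics
    desires: dict                                   # bouletic component: how states are valued
    intentions: list = field(default_factory=list)  # committed goals, executed via i-plans

def bdi_to_mdp(agent, S, A, b):
    # Map the epistemic component to T and the bouletic component to R.
    return MDP(S=S, A=A, T=agent.beliefs, R=agent.desires, b=b)

# Tiny usage example: an agent that believes "work" leads from "poor" to "rich" and desires "rich".
agent = BDIAgent(
    beliefs={("poor", "work"): {"rich": 0.7, "poor": 0.3}},
    desires={"rich": 1.0, "poor": 0.0},
    intentions=["become rich"],
)
m = bdi_to_mdp(agent, S=["poor", "rich"], A=["work"], b={"poor": 1.0})
print(m.R["rich"], m.T[("poor", "work")]["rich"])   # 1.0 0.7
```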
Backward-Looking Responsibility
Attributing responsibility to AI agents retrospectively requires considering their intentions. Philosophical frameworks, such as Plato's reflections on lifeless objects and responsibility, illustrate that intention might be a helpful but not a necessary criterion.
Reactive-Attitudes Theories
Theories based on emotional responses to peers (e.g., blame or praise) may not apply to AI, as they require AI to be viewed as part of a community, a notion that is still largely theoretical.
Reason-Responsiveness Views
Responsibility requires agents to be guided by reasons. For AI, this raises critical questions:
- Can AI possess "reasons"?
- Can it respond to moral considerations?
Fischer and Ravizza argue that without moral receptivity, AI cannot be attributed responsibility in the traditional sense.
Prospective Aim: Alignment
Alignment refers to ensuring that AI systems behave in ways consistent with human values and goals. It connects to forward-looking responsibility, aiming to align AI motives and behaviors with societal needs. Schlick's view frames responsibility as applying appropriate incentives (reward/punishment) to achieve desired outcomes.
Intentions Are Not Enough
Designing agents with intentions marks a beginning but requires further refinements:
- Complexity threshold: not all systems displaying intentional behavior qualify for legal or moral consideration.
- Normative requirements: agents must respond to reasons, demanding the integration of ethical frameworks into architectures like reinforcement learning.
Behaviorist Perspectives
Behaviorist approaches infer intentions from actions and context rather than from internal states. This parallels how courts evaluate human intentions based on evidence. While practical, this perspective avoids deeper philosophical questions about true intention.
Other AI Techniques
Beyond reinforcement learning, systems like large language models can exhibit agentic behaviors by:
1. modeling environments to link actions with consequences;
2. holding preferences over states to identify desirable outcomes.
These capabilities align with the epistemic (knowledge-based) and bouletic (desire-based) elements of intentionality. The study distinguishes between descriptive questions (what intentions AI systems can have) and normative questions (what intentions they should have). While current methods raise conceptual challenges (e.g., interpreting reward functions), they open possibilities for aligning future AI systems with individual and collective human values. The alignment process remains central to ensuring responsible AI deployment and ethical integration.