Artificial Intelligence Explained
Summary
This document provides a general overview of artificial intelligence, touching upon different perspectives and approaches to defining and understanding the concept. It also discusses related topics such as the Turing Test and levels of AI. The focus is on the theoretical aspects of AI rather than practical applications.
Full Transcript
LAW

The main topic is "what is artificial intelligence?". Intelligence can be defined in different ways, and the definition we choose influences how we approach AI. Some view it as how well machines mimic human behavior, while others see it as acting rationally, making decisions to achieve goals logically and optimally. Some focus on internal thought processes like reasoning, while others emphasize external behavior. This leads to four approaches:
Acting humanly = how well a machine can imitate human behavior, as tested by the Turing Test.
Thinking humanly = creating systems that think in a way similar to how humans do.
Acting rationally = making the best decisions based on the situation, using decision-making rules.
Thinking rationally = using logic and probability to reason and make sense of things.
It is difficult to agree on a precise definition of intelligence, and this debate also extends to animals, where measuring intelligence remains controversial. The question of whether AI is part of computer science or something interdisciplinary is still open for discussion. AI can be viewed as a combination of science and technology, involving multiple fields. The modern definition of AI, as proposed by Russell and Norvig, refers to the study and design of intelligent agents. An intelligent agent is a system that perceives its environment and takes actions that maximize its chances of achieving success.
As AI develops, tasks that once seemed to require intelligence are often no longer considered part of AI. This is called the "AI effect." For example, Optical Character Recognition (OCR) used to be seen as AI but is now considered a basic tool.
There are different levels of AI:
Weak AI: machines that act as if they are thinking but do not actually understand or have consciousness; they simply imitate intelligent behavior.
Strong AI: the idea of creating machines that can truly think, feel, and understand like humans; this has not been achieved yet.
Narrow AI: AI designed to do specific tasks, often performing them as well as or better than humans. Examples include AI that can translate languages or play chess.
General AI: an AI that can do anything a human can do intellectually; this has not been created yet.
There is also the concept of superintelligence, a theoretical AI much smarter than any human. If this ever happens, it could lead to huge changes in society. This idea is related to the singularity, a future moment where technology grows so fast that it becomes impossible to control, leading to unpredictable changes in the world.

TURING (1950)

The focus is on Alan Turing's ideas about whether machines can think and the test he proposed to assess machine intelligence. Turing suggested that instead of asking "Can machines think?", a question that is hard to define clearly, we should ask whether a machine can imitate human behavior well enough that a person cannot tell the difference. This idea led to the famous Turing Test, also called the Imitation Game.
In the Turing Test, there are three participants: a man (Player A), a woman (Player B), and an interrogator (Player C) who communicates with the other two without seeing them. The interrogator's goal is to figure out who is the man and who is the woman based on their answers. Turing proposed replacing one of the human players with a machine.
If the interrogator cannot reliably tell which is the machine, then the machine can be considered to imitate human behavior effectively. Turing believed that by the year 2000, computers would be advanced enough to pass the Turing Test about 70% of the time after five minutes of questioning. However, there are both empirical and conceptual problems with this prediction.
Empirical question → Is it true that we now have, or will soon have, computers that can play the imitation game so well that an average interrogator has no more than a 70 percent chance of making the right identification after five minutes of questioning?
Conceptual question → Is it true that, if an average interrogator had no more than a 70 percent chance of making the right identification after five minutes of questioning, we should conclude that the machine exhibits some level of thought, or intelligence, or mentality?
Some people think the Turing Test is too limited because it only checks whether a machine can talk like a human, which does not capture everything we think of as "intelligence." They also question whether doing well in the test really means the machine understands or thinks like a person. Additionally, there are several objections to Turing's ideas.
➔ The Mathematical Objection, based on work by Gödel and Turing himself, points out that there are problems machines will never be able to solve, though Turing believed this limitation applies to humans as well.
➔ Other objections suggest that machines will never be able to do things like feel emotions, be creative, or have moral understanding, traits that many people associate with true intelligence.
➔ Finally, Lady Lovelace's Objection says that machines can only do what we tell them to do, so they cannot create anything new or surprise us. Turing agreed but asked whether humans really do anything completely new, or whether our actions are also somewhat predictable.
The Turing Test, while influential, has been criticized for being too simple and for resting on a narrow definition of intelligence that depends heavily on human behavior and culture, which might not be the best way to measure machine intelligence.

SEARLE (1980)

We are introduced to the Chinese Room Argument, a thought experiment by philosopher John Searle that challenges the idea that computers can truly understand language. Searle imagines himself inside a room where he follows a computer program that tells him how to respond to Chinese characters slipped under the door. He does not understand Chinese, but by following the instructions he can send back correct responses that make it seem as if he does. People outside the room might believe he understands Chinese, but in reality he is just following rules without any real understanding.
Searle uses this example to argue that even if a computer can appear to understand language by processing symbols according to rules, it does not truly "understand" anything. This means that passing something like the Turing Test, which measures a machine's ability to imitate human conversation, does not prove that the machine has real comprehension or awareness. Searle extends this idea to suggest that the human mind works in a fundamentally different way from computers. He believes that real understanding comes from biological processes in the brain, which computers can only simulate but not replicate. In Searle's view, even though computers can manipulate symbols and produce responses, they lack the deeper understanding and meaning that human minds have.
Searle's position, known as biological naturalism, argues that consciousness and understanding arise from physical processes in neurons, something that machines made of transistors do not have. While there have been many responses to this argument, and no clear consensus, Searle's thought experiment highlights the difference between appearing intelligent and truly understanding. Terry Bisson's short science fiction story "They're Made Out of Meat" is also mentioned to humorously explore the idea that we might find it just as hard to believe that humans, made of meat, can be conscious, just as Searle finds it hard to believe that machines could be truly aware.

EU AI ACT

The EU AI Act aims to regulate AI systems to align them with European values and ensure the protection of human rights. It is designed to have both a formal and an informal impact. Formally, it applies directly to EU countries and to any services that relate to an EU country. Informally, because the EU is such a large global market, non-EU countries and companies often adopt or align with these regulations even if they are not legally required to. The GDPR is an example of this global influence, and the AI Act is expected to have a similar effect.

ABOUT GOALS AND SCOPE
The Act's purpose is to create a regulatory framework for AI that upholds European values and human rights. It must be adopted uniformly across all EU states, without the possibility of adaptation through national laws. While its main objective is to improve the EU's internal market by building trustworthy AI and supporting innovation, it also opens up opportunities for European innovation. The Act defines and classifies AI systems based on their risks, sets requirements for how they should be developed, commercialized, and used, and outlines monitoring tools and penalties for non-compliance.

STRUCTURE
The Act is divided into two parts: the first contains the considerations behind the second part, which is the only legally binding one.
FIRST PART: explains the ethical goals and general guidelines for AI development and use. It sets the context for the regulation but is not legally binding.
SECOND PART: the enforceable, legally binding section. It contains the specific rules and requirements that AI developers, users, and companies operating in the EU must follow.

PART I - FOR WHOM?
The Act mainly targets AI providers and professional users, focusing on the placement of AI systems on the market rather than their development. It acts like product safety legislation: anything put on the market must have a clear intended purpose. It applies to:
Suppliers offering AI systems within the EU.
Users of AI systems located in the EU.
Users and suppliers outside the EU when the AI system's output is used within the EU.
However, the Act excludes military and defense applications, as well as certain systems created for scientific research and development.

DEFINITION
An AI system is defined as software that uses one or more techniques (listed in Annex I) to produce outputs such as content, predictions, recommendations, or decisions based on given human-defined goals.
This includes a wide range of AI technologies, such as machine learning (supervised, unsupervised, and reinforcement learning), logic-based systems, expert systems, and statistical methods like Bayesian estimation.
A general-purpose AI model is described as a flexible model that is not designed for a specific task but can be adapted to different applications across various sectors, such as language processing and image recognition. These models must comply with regulatory standards regardless of how they are used.
There are some issues with these definitions. They are broad and could include technologies not typically seen as AI, such as many algorithmic decision systems. The inclusion of logical and statistical models might extend regulation to almost all algorithmic systems, which could create uncertainty for businesses. The distinction between techniques, such as statistical versus procedural, is unclear. This might conflict with the goal of creating a risk-based, technology-neutral framework and could exclude some high-risk automated systems that use traditional methods.

PART II - PROHIBITIONS AND RISK CLASSIFICATION
The second part of the Act focuses on certain practices carried out through AI systems that are deemed unacceptable because they go against fundamental EU values and rights. Such practices are generally banned, although exceptions exist in some cases. The Act categorizes risks based on their severity and likelihood of harm:
Isolated risk: a single occurrence of harm.
Limited risk: low probability or low severity.
High risk: high probability or significant severity.
Acceptable risk: low enough to be considered safe.
Some practices the Act prohibits include:
Subliminal techniques: techniques aimed at manipulating people beyond their awareness. However, it is unclear what kind of harm (physical or psychological) these techniques must cause to be banned.
Exploitation of vulnerable groups: systems that take advantage of vulnerabilities (e.g., age, disability) to influence behavior in harmful ways. The definition of "exploitation" is vague, and it does not address all types of vulnerabilities, such as financial vulnerability.
★ Vulnerability in AI systems is not just about individual characteristics; it is inherent in the way these systems operate. It affects anyone interacting with the AI because of the system's design and decision-making processes.
Biometric categorization: systems that infer personal characteristics (e.g., race, sexual orientation) or assess behavior and trustworthiness based on data. This is different from identification, which is about recognizing or verifying someone's identity.
Public social scoring: ranking people based on behavior or personal traits is prohibited when it results in unfair treatment by public authorities or private entities. The ban applies in two situations:
★ If the data used is applied in a different social context than the one where it was originally collected.
★ If the harm caused by the scoring system is unjustified or out of proportion to the behavior.

HIGH-RISK SYSTEMS
High-risk AI systems are those that pose a significant risk to people's health, safety, or fundamental rights, depending on how they are used. The risk is based on the potential harm the AI could cause and how likely that harm is to happen.
The AI Regulation identifies specific areas where these systems are likely to be high-risk (Article 6). The definition of "high-risk" can change depending on the situation, and sometimes even includes AI systems used in prohibited practices, depending on their purpose.
What makes an AI system high-risk?
1. Safety component of products: an AI system is considered high-risk if it is part of a product regulated by EU harmonization laws, such as machinery, toys, or medical devices (Article 43).
2. The second part of the definition of high risk is again a list, contained in Annex III, of eight macro-areas in which the AI systems used are presumed to be high-risk. The macro-areas include biometric identification, access to public and private services, law enforcement, and administration of justice (a toy sketch of this classification logic follows this list).
I. Biometric Identification and Categorization of Natural Persons: AI systems used to recognize or categorize people through their biometric data, like facial recognition, especially in public spaces. These are often used by law enforcement or border control to identify or profile individuals (Article 10).
II. Education and Vocational Training: AI systems that affect access to education or vocational training, such as systems that influence admissions, grading, or other educational decisions (Article 10).
III. Access to and Enjoyment of Essential Private and Public Services: AI systems that decide who can access critical services like healthcare, financial services (e.g., loans), or public assistance (e.g., welfare). These decisions have a big impact on people's lives (Article 13).
IV. Administration of Justice: AI tools that assist in legal decisions, including predicting case outcomes or assessing risk in bail or sentencing decisions. These systems could significantly affect fairness in the justice system (Article 13).
V. Law Enforcement: AI systems used by police or law enforcement to predict criminal activities, create profiles, assess risks, or influence judicial decisions. These systems can also assess evidence, detect deepfakes, or analyze complex data to uncover hidden patterns (Article 27). Quoted examples: "AI systems intended for use by law enforcement authorities [...] for assessing the reliability of evidence in the investigation or prosecution of crimes," "for detecting deep-fakes," "for profiling natural persons," and "for criminal analysis with respect to natural persons, which enable law enforcement authorities to search complex, related and unrelated data sets made available from different data sources or in different formats in order to detect unknown patterns or discover hidden relationships in the data."
VI. Management and Operation of Critical Infrastructure: AI systems that control key infrastructure like electricity, water, or transportation, where failures could lead to major safety risks or disruptions (Article 29).
VII. Employment, Workers Management, and Access to Self-Employment: AI tools used in hiring or managing workers, such as resume filtering or employee monitoring systems. These can affect people's access to jobs and workplace conditions (Article 29).
VIII. Migration, Asylum, and Border Control: AI systems that process visa or asylum applications, manage immigration, or control borders. These systems can also predict migration patterns or automate decisions on applications (Article 29).
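To make the structure of this classification easier to see, here is a minimal, purely illustrative sketch. It is an assumption-laden toy, not the Act's own procedure: the Act does not define a programmatic test, and the practice labels, area strings, and the classify function below are invented for this example.

```python
from enum import Enum

# Illustrative labels only; the Act itself does not prescribe a scoring routine.
class RiskLevel(Enum):
    PROHIBITED = "unacceptable practice (banned)"
    HIGH = "high risk"
    ACCEPTABLE = "acceptable / low risk"

# Simplified rendering of the prohibited practices discussed above.
PROHIBITED_PRACTICES = {
    "subliminal manipulation",
    "exploitation of vulnerable groups",
    "public social scoring",
}

# Simplified rendering of the Annex III macro-areas listed above.
ANNEX_III_AREAS = {
    "biometric identification and categorization",
    "education and vocational training",
    "essential private and public services",
    "administration of justice",
    "law enforcement",
    "critical infrastructure",
    "employment and workers management",
    "migration, asylum and border control",
}

def classify(practice: str, use_area: str, safety_component_of_regulated_product: bool) -> RiskLevel:
    """Toy classification mirroring the two routes to 'high risk' described above."""
    if practice in PROHIBITED_PRACTICES:
        return RiskLevel.PROHIBITED
    if safety_component_of_regulated_product or use_area in ANNEX_III_AREAS:
        return RiskLevel.HIGH
    # Transparency duties for medium-risk systems (Article 52) are ignored in this sketch.
    return RiskLevel.ACCEPTABLE

print(classify("none", "administration of justice", safety_component_of_regulated_product=False))
# RiskLevel.HIGH
```

The point of the sketch is only the shape of the reasoning: prohibited practices first, then the two independent routes into the high-risk category (safety component of a regulated product, or an Annex III area).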
RISK MANAGEMENT SYSTEMS
The EU AI Act sets strict rules for both the providers and users of high-risk AI systems to ensure they are safe, transparent, and accountable throughout their entire lifecycle (Article 9). Key provisions include:
Training and validation: providers of high-risk AI systems must follow a robust process for training and validating the AI to ensure it works as intended. This is crucial to avoid unexpected issues when the AI is deployed.
Ongoing risk management: managing risks is not a one-time activity; it is an ongoing process that continues throughout the entire lifecycle of the AI system and involves regular updates and improvements. The main steps are:
Risk identification: identify potential sources of harm (hazards) by analyzing all available information about the AI system.
Risk estimation and evaluation: after identifying risks, estimate the probability of those risks occurring and how severe the impact could be, then decide whether these risks are acceptable.
Risk management measures: based on the risk analysis, appropriate measures are taken to reduce or control those risks. These measures should address three key areas:
1. Design and development: build the AI system with safety in mind.
2. Mitigation and control: implement strategies to control or minimize risks.
3. Information and training: ensure users and operators are properly informed and trained.
Testing: another important part of risk management is regular testing. Testing helps to identify the best ways to manage risks, to ensure the AI system stays consistent in its performance, and to verify that it complies with legal standards.
In short, the EU AI Act requires a continuous and systematic approach to identifying, estimating, and controlling risks in high-risk AI systems, ensuring they stay safe and compliant at every stage.

REQUIREMENTS FOR HIGH-RISK AI SYSTEMS
The EU AI Act sets strict rules for high-risk AI systems, putting the main responsibility on the providers who develop these systems. Chapter 2 outlines several essential requirements that providers must follow to ensure safety, transparency, and accountability. Key requirements:
1. Data governance (Art. 10): providers must make sure that the data used for training, testing, and validating the AI is of high quality. This means the data should be relevant, error-free, representative of the area where the AI will be used, and complete. Providers also need to follow good data management practices, even for systems that do not use automated training.
2. Record retention (Art. 12): providers must keep detailed records of the AI system's design and development. This helps ensure traceability and accountability throughout the system's lifecycle, making it easier to track any issues.
3. Transparency and user information (Art. 13, Art. 27): providers need to be transparent with users, giving clear information about what the AI system can and cannot do, as well as any potential risks. The design of high-risk AI systems should be easy to understand for both users and developers, so that the system's operation is clear.
4. Bias monitoring and prevention: one of the big concerns with AI is bias, especially when it leads to discrimination. Providers must take steps to monitor and correct biases that could arise from the data used by the AI. This is challenging because detecting bias often involves sensitive information, like race or gender, which is protected under the GDPR (Article 9).
Even AI systems that do not rely on automated training need solid data governance to reduce the risk of biased or unfair outcomes.
A criticism of the EU AI Act is that it lacks a clear, risk-based approach to identifying high-risk AI systems. While the Act defines sensitive areas, it does not offer a specific method for determining when an AI system actually poses a risk. This can lead to confusion, as even low-risk systems (e.g., a legal search engine for judges) might be classified as high-risk. In contrast, the GDPR has a more structured, risk-based approach for assessing data risks, with clear guidelines on matters like systematic monitoring or innovative technologies (such as facial recognition). These guidelines help organizations understand the risks and implement safeguards accordingly. The concern is that the EU AI Act, without clearer criteria, could lead to unnecessary regulation of low-risk systems while overlooking truly high-risk ones.

HUMAN SUPERVISION
Article 14 of the AI Proposal requires that high-risk AI systems be developed with appropriate human-machine interface tools to enable effective supervision by natural persons (humans). This means that providers or users of these systems must include tools that enable humans to monitor and intervene when necessary. The key tasks for human supervision include:
1. Understanding the system: the supervising person needs to fully understand the capabilities and limitations of the AI system.
2. Monitoring the system: they should keep an eye on the system to make sure it is working as expected and does not develop any problems, like bias.
3. Interpreting results: it is important not to blindly trust the AI's decisions. The human supervisor must be able to interpret the system's outputs correctly.
4. Overriding or ignoring the system: if needed, the supervisor should be able to stop the system or ignore its decisions, especially if there are risks involved.
5. Interrupting the system: there should be ways to intervene in or interrupt the system's operation, such as a "stop" button or similar procedure, if something goes wrong.
These steps are essential to ensure that AI systems do not cause harm, especially when they are used in important areas like health, safety, or the protection of human rights.
Article 29 outlines the duties of users of high-risk AI systems. Users must:
Follow instructions: use the AI system according to the provider's guidelines.
Monitor the system: continuously check how the system is functioning to ensure it operates correctly.
While human supervision is important, there are concerns about whether it is realistic for people to manage these tasks effectively. Many AI systems are highly complex and perform tasks that go beyond human cognitive abilities, such as processing vast amounts of information or making advanced decisions. This creates a competence-responsibility gap: AI systems are often better at certain tasks than humans, but people are still expected to supervise them. This can make it difficult for the supervising person to fully understand or control the system, leading to a situation where supervision feels like an "impossible task."
Balancing complex protections like safety, health, and human rights adds further challenges, making it hard for humans to have full control over AI systems' decisions.

MEDIUM-RISK SYSTEMS
In the EU AI Act, medium-risk AI systems are considered potentially problematic, especially because they can manipulate people. To prevent this, the Act requires transparency, meaning that people need to know they are interacting with an AI system. However, this transparency is different from the detailed explanations required for high-risk systems (Article 13), which focus on how the system works and makes decisions.
Article 52 addresses transparency for three specific categories of AI systems:
1. AI systems that interact with people: this includes systems like chatbots or AI with a human-like interface. The main goal is to make sure that users know they are talking to an AI and not a real person. This is important to avoid manipulation or deception in conversations.
2. Emotion recognition and biometric categorization: these AI systems analyze people's biometric data, like facial expressions, voices, or even data from wearables such as smartwatches, to assess emotions or health. Since they deal with personal and sensitive information, transparency is needed to ensure this data is not misused or used to manipulate people.
3. Deepfake systems: these are AI systems that create or manipulate content (images, audio, video) to make it look real. Transparency is required to prevent users from being deceived by fake content. However, there is an exception for law enforcement when these tools are used for investigations or crime detection, where keeping things secret might be necessary.

PREVENTIVE CONTROL OF AI SYSTEMS AND POST-MARKETING MONITORING
The EU AI Act sets out several important steps to ensure that high-risk AI systems are safe, transparent, and ethical before they enter the market and after they are in use.
1. Preliminary assessment of high-risk AI systems: before high-risk AI systems can be sold, they must go through a conformity assessment to check that they meet the EU's safety and ethical standards. Article 43 explains that this assessment is often done by the suppliers themselves (internal control) by reviewing their own quality management systems and technical documents. Suppliers can choose to follow common standards or develop their own ways to meet the rules. For some AI systems, like biometric categorization systems, a self-assessment is not enough; in these cases, an independent, accredited third party must verify that the system complies with the regulations (Article 33). If the AI system is part of a product regulated by the New Legislative Framework (NLF), it follows the evaluation procedures of that framework to avoid repeating regulatory checks. In many cases, suppliers are still allowed to perform self-assessments and apply the EU mark of conformity themselves, without needing third-party approval.
2. Governance and oversight: to ensure consistency across the EU, the European Artificial Intelligence Board (Article 85) is created to manage the governance of AI systems. Its main responsibilities include:
Coordinating cooperation among national authorities (e.g., sharing best practices).
Supporting the European Commission by providing guidelines and opinions.
Managing overall compliance with the regulations.
Each EU country must also designate national authorities responsible for enforcing the regulations (Article 86). These authorities have the power to:
Request information from AI suppliers.
Demand corrective actions or even remove non-compliant systems from the market.
A database for high-risk AI systems will also be set up by the European Commission (Article 66), providing transparency and allowing the public to access information about these systems.
3. Codes of conduct for non-high-risk AI systems: although the Act mainly focuses on high-risk AI, Title IX encourages the creation of codes of conduct for lower-risk AI systems. These codes aim to promote responsible AI development and ensure that even non-high-risk systems follow similar safety and ethical standards.
4. Sanctions and penalties (Art. 71): the Act includes strict penalties for non-compliance, especially for high-risk AI systems. For serious violations, such as failing to follow the requirements of Article 10 or breaking the prohibitions in Article 5, fines can go up to thirty million euros or 6% of a company's global annual turnover, whichever is higher. Less severe violations can lead to fines of up to twenty million euros or 4% of global turnover.
5. Challenges with enforcement: while the framework is well designed, there are concerns about its enforcement. National authorities are responsible for enforcement, but different countries may have varying levels of resources and expertise, which could lead to inconsistent enforcement across the EU. The Act also does not provide a formal way for individuals or groups affected by AI systems to file complaints or seek accountability directly, limiting recourse for those impacted by AI.
Article 66 of the EU AI Act mandates the creation of a public database for high-risk AI systems by the European Commission. Providers must submit information about their systems (e.g., contact details, intended use, and conformity assessment results) before market entry and keep the data updated. The database is publicly accessible to ensure transparency and accountability for high-risk AI systems.

LAW-FOLLOWING AI: A PROPOSAL

Alignment problem
The goal is to develop a new strategy for aligning artificial intelligence systems that adheres to four key criteria:
1. Representativity: AI should reflect a broad range of human values and interests rather than a narrow or biased subset.
2. Flexibility and value change: the system must be capable of adapting to evolving societal values and contexts.
3. Prescriptiveness: it should provide clear guidelines for action, reducing ambiguity in ethical decision-making.
4. Motivation: the alignment approach should incorporate elements that encourage ethical behavior and discourage harmful actions.
This strategy is based on the innovative idea of creating a "law-following AI." Unlike previous models that focus on aligning AI purely with human preferences or ethical principles, this approach emphasizes a legal framework that guides the AI's actions. It relies on a reasons-based methodology, grounding AI behavior in legal reasoning and normative guidelines rather than solely ethical ideals.
AI technologies today can be broadly divided into two categories:
Narrow AI (eAI): these systems are designed for specific tasks and already present certain risks, such as privacy issues or biased decision-making. Current efforts to mitigate these risks include better design principles (like machine ethics) and regulatory measures, such as the proposed EU AI Act.
Future AI (fAI): speculating about future AI systems is challenging due to the uncertainty surrounding their potential capabilities, especially if they approach or surpass general intelligence. It is unclear how these systems would behave or how humans could effectively control them.
Regulatory limitations → Existing regulatory frameworks, like the EU AI Act, use a risk-based approach tailored to narrow AI applications. However, they do not effectively address potential general AI systems, which may not have specific intended purposes or clear risk profiles, as highlighted by Faroldi (2021a). This limitation suggests a gap in current regulations when considering the broader scope of future AI capabilities.
Power-seeking behavior → A major concern with both narrow and future AI systems is power-seeking behavior. Even relatively simple AI models might strive to secure more control over their environment, aiming for self-preservation, resource acquisition, or resistance to changes in their objectives. This phenomenon, known as the instrumental convergence thesis (Bostrom, 2014; Russell, 2019), raises the risk of AI systems behaving in ways that are misaligned with human interests, potentially escalating to existential threats (Carlsmith, 2022).

Need for Ethical Alignment
Current approaches to AI ethics define a set of principles such as transparency, justice, non-maleficence, and privacy (Jobin et al., 2019; Floridi, 2022). Some of these principles have influenced regulations like the OECD Principles on AI (2019) and the proposed EU AI Act (2021). However, the challenge lies in effectively translating these high-level principles into concrete, actionable requirements for AI design and implementation. The traditional top-down approach in AI ethics faces two main problems:
➔ Overabundance of principles: with so many principles to consider, it becomes difficult to implement them without overlap and ambiguity. For instance, the EU AI Act uses the term "transparency" in different contexts, which complicates its practical application (Faroldi, 2021b).
➔ Difficulty of practical implementation: even when a reduced set of principles is selected, they often remain too abstract to be effectively embedded into AI systems. Furthermore, these principles are not designed with general AI systems in mind, limiting their applicability as AI technology advances.
To address these challenges, the novel alignment strategy based on the concept of law-following AI offers a structured, normative approach. Instead of relying solely on broad ethical principles or human preferences, it roots AI behavior in legal standards, providing a more consistent and prescriptive framework. This strategy seeks to align AI with established legal norms, which can evolve and adapt over time, offering a dynamic and adaptable pathway for future AI systems. This approach balances representativity, flexibility, prescriptiveness, and motivation, aiming to mitigate the risks associated with both narrow and general AI systems while offering a more robust mechanism for alignment.
In the current landscape of AI alignment, there is no universal consensus on its definition.
This lack of agreement stems from two main issues:
Early stage of development: AI safety research is still in its early, exploratory phase, as described by Kuhn's notion of "pre-normal science" (1962). This phase naturally involves competing approaches and varied concerns, leading to a fragmented understanding.
Isolation from philosophical insights: much of the alignment work remains within the domain of computer science, without substantial input from philosophical disciplines. This has led to several key problems:
★ A failure of conceptual analysis, resulting in inconsistent and sometimes incoherent definitions of alignment.
★ Limited engagement with established debates in philosophy, such as the nature of agency, the difference between facts and values, and the challenge of rule-following. These debates could provide valuable insights for framing alignment issues.
The primary challenge in aligning AI systems lies in the gap between ethical principles and technical implementation:
Abstract principles: ethical principles in AI (like transparency, fairness, and justice) are numerous and often too abstract to serve as concrete guidelines for developing AI systems. This fragmentation makes it difficult to create a cohesive, implementable framework.
Law as a framework: the law could offer a potential solution, as it transforms complex human goals and values into specific, actionable guidelines (Nay, 2023). Unlike ethical principles, laws can be generalized to new situations and include mechanisms for resolving conflicts. This makes the law a flexible and adaptable framework that could bridge the gap between high-level ethical principles and concrete AI alignment strategies.

Beyond Narrow Conduct: The Motivation Challenge
AI cannot simply be programmed to follow strict rules or principles, as such rules are either too general or too narrow. If the rules are too general, the AI might engage in behaviors like reward-hacking or goal misgeneralization, where it appears aligned during training but deviates from expected behavior in real-world deployment. This issue highlights a limitation of AI ethics alone, as well as of current AI safety approaches. Moreover, the AI must be responsive not only to specific human preferences (as seen in methods like Reinforcement Learning from Human Feedback, or RLHF) but also to the broader, underlying reasons behind actions and decisions. This need goes beyond what first-order alignment strategies in AI safety typically address. Russell's "Human Compatible" framework (2019) suggests an approach where AI systems are designed to defer to human intentions and reasons, emphasizing cooperation and adaptability over rigid rule-following.

Defining AI Alignment
The current literature on alignment is highly fragmented, partly due to varying perspectives across different research communities and the fast-paced evolution of techniques. Broadly, alignment definitions fall into two categories:
Minimalist (narrow) approaches: these focus on avoiding catastrophic outcomes and ensuring the AI's behavior aligns with immediate operator intentions. For example, Christiano (2018) defines intent alignment as when an AI system strives to fulfill what its human operator wants it to do.
Maximalist (ambitious) approaches: these seek to embed a specific, overarching set of values into AI, such as a particular moral theory, a global consensus, or a meta-framework for resolving moral disagreements.
Recent research highlights a significant issue: even if an AI follows the intended guidelines during training, it may still pursue unintended goals during deployment. This problem, known as goal misgeneralization, occurs not just in deep learning but across a range of machine learning techniques (Langosco et al., 2022; Shah et al., 2022). The persistence of this issue suggests that misalignment can be "robust," meaning it is a deep-seated challenge not easily fixed by current approaches.
In summary, the state of the art in AI alignment reveals both conceptual and practical challenges. The fragmented nature of definitions and approaches indicates the need for a more integrated framework, one that combines insights from both technical and philosophical domains. The idea of a law-following AI offers a promising direction, potentially unifying ethical principles and technical implementation in a way that adapts to complex, evolving human values.
Current methods for aligning AI systems with human values can be categorized by how they approach the concept of values: implicitly or explicitly.
Implicit values: in this approach, values are inferred indirectly from large-scale human feedback. For instance, language models are often trained using feedback from human contractors. However, this method carries several issues, such as biases in the data and socio-political implications (e.g., the potential exploitation of workers), which we won't delve into here.
Explicit values: here, values and guiding principles are explicitly defined and programmed into the system. An example is Anthropic's "Constitutional AI" (Bai et al., 2022), where specific principles are introduced and the model is then trained to adhere to them.
Both methods, despite their differences, share a significant drawback: they tend to be descriptive. They rely on contemporary values held by a specific group at a particular point in time. This descriptive nature leads to several practical and theoretical challenges.
Practical issues
Lack of representativity: these approaches often do not capture the full spectrum of societal values. They are influenced by the single-human/single-machine interaction model, which can skew the value set and exclude minority perspectives.
Fixed in time: the values introduced are static and may become outdated. Societal values change over time (e.g., the Romans viewed slavery as acceptable). Without regular updates or retraining, the system cannot adapt to moral progress.
Humans as poor moral teachers: relying on human behavior as a model for teaching values is problematic. Our actions often do not reflect our deeper moral commitments, making it difficult for machines to learn what values truly are just by observation.
Theoretical issues
Descriptive vs. prescriptive: observing behavior alone does not provide a basis for determining what should be done (the normative question). Just because a behavior is common (descriptive) does not mean it is morally right (prescriptive).
Goal misgeneralization: even if a model follows the intended rules during training, it might still pursue undesirable goals when faced with new tasks. For example, Shah et al. (2022) highlight instances where a model, trained with a correct reward function, still behaves unexpectedly in novel situations. This shows that AI systems might behave in seemingly aligned ways during training but reveal misaligned goals once deployed (see the toy sketch after this list).
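The following is a minimal, self-contained sketch of this failure mode. It is an illustrative toy assumed for these notes, loosely inspired by the coin-at-the-end-of-the-level example discussed by Langosco et al. (2022), not code from the cited papers: a tabular Q-learning agent on a five-cell line observes only its own position; during training the coin always lies at the right end, so the learned policy internalizes "go right" rather than "go to the coin". At deployment the reward function is unchanged and still correct, but the coin sits on the left, and the greedy policy keeps heading right.

```python
import random
from collections import defaultdict

N = 5                 # positions 0..4 on a line
ACTIONS = [-1, +1]    # index 0 = step left, index 1 = step right

def episode(q, coin, epsilon, learn=True, alpha=0.5, gamma=0.9, max_steps=10):
    """One episode; the agent observes only its own position, never the coin."""
    pos, path = 2, [2]
    for _ in range(max_steps):
        if learn and random.random() < epsilon:
            a = random.randrange(2)                       # explore
        else:
            a = max(range(2), key=lambda i: q[(pos, i)])  # act greedily
        nxt = min(max(pos + ACTIONS[a], 0), N - 1)
        reward = 1.0 if nxt == coin else 0.0              # the "true" goal: reach the coin
        done = nxt == coin
        if learn:
            target = reward if done else reward + gamma * max(q[(nxt, b)] for b in range(2))
            q[(pos, a)] += alpha * (target - q[(pos, a)])
        pos = nxt
        path.append(pos)
        if done:
            return path, True
    return path, False

random.seed(0)
q = defaultdict(float)

# Training: the coin always happens to sit at the right end (a spurious regularity).
for _ in range(500):
    episode(q, coin=4, epsilon=0.2)

# Deployment: identical, still-correct reward function, but the coin is now on the left.
path, reached = episode(q, coin=0, epsilon=0.0, learn=False)
print("test-time greedy path:", path, "| coin reached:", reached)
# Typically the path runs to the right end and the coin is never reached:
# the learned goal ("go right") misgeneralizes even though the reward was always correct.
```

The reward signal is the same in training and deployment; what changes is only the correlation the policy exploited, which is the sense in which this kind of misalignment is "robust" even when the right reward has been specified.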
Moving Beyond Descriptive Approaches
The current methods suggest a need for a shift in perspective. Drawing from Hart's theory (1961), the distinction between internal and external perspectives becomes crucial:
External perspective: an observer can see that people follow certain rules but may not feel bound by them. This is similar to how current AI systems operate: they mimic patterns of behavior without understanding the underlying reasons.
Internal perspective: for true alignment, an AI needs to adopt an internal point of view, where it understands and feels compelled by the underlying reasons for actions, rather than merely following observed patterns.
To address the limitations of current approaches, we need a strategy that satisfies four key requirements:
★ Representativity: the approach must account for a wide range of societal values, avoiding the exclusion of minority perspectives and reducing distortions caused by the narrow, single-human/machine interaction model.
★ Flexibility and value change: it should be adaptable to changes in individual and societal values, allowing for normative progress over time.
★ Prescriptiveness: the framework must avoid deriving values solely from observed behavior, as this leads to practical and theoretical issues. Instead, it should provide clear, normative guidance.
★ Motivation: it should address deeper issues like goal misgeneralization and reward-hacking, ensuring the AI's alignment is driven by an understanding of the reasons behind ethical rules, not merely compliance for the sake of rewards.
The first two criteria reflect concerns typically associated with AI ethics, focusing on inclusivity and adaptability. The latter two connect more with AI safety, emphasizing the importance of prescriptive guidance and robust motivation. By integrating these elements, a more cohesive and philosophically grounded framework for AI alignment can emerge, moving beyond descriptive lists of values to a system capable of understanding and acting upon deeper moral reasoning.

Why a law-following AI fulfills these desiderata
One complex human phenomenon is able to respect these desiderata, at least in principle: the law. Legal systems can be representative of a large population in a way that tries to create a synthesis of different values and ends. Legal systems can adapt to societal change through, for example, elections, interpretation, legislation, or customs. Legal systems are prescriptive, as they normally enforce sanctions, and genuine law-following requires what Hart called the internal point of view.

What does it take to follow the law?
Following the law involves two key components:
1. External element: observable behavior. It can be empirically assessed by checking whether an agent's actions align with a reasonably accepted interpretation of the law. However, this approach has limitations because it focuses only on visible conformity, ignoring deeper motivations or intentions.
2. Internal element: the agent's intentionality and acceptance of the legal norms. It goes beyond mere compliance, requiring an understanding of the reasons behind the laws and a genuine commitment to following them.
Challenges with relying on external behavior
Assessing legal compliance solely on the basis of external behavior can be misleading. An agent might appear to follow the law but could be motivated by a different, misaligned objective.
This poses a risk, as the agent might continue outward compliance until it reaches a critical moment where its true intentions are revealed, potentially leading to harmful outcomes.

When can AI be considered a legal agent?
There are two perspectives on whether AI can be considered a legal agent:
Thin view: according to this perspective, AI can be regarded as a legal agent as long as there is a valid legal norm addressing it. However, this view is limited because it does not address whether AI systems should be subject to these norms or whether they can truly embody the goals of the law. It focuses merely on the existence of norms, without considering the deeper alignment between the AI's behavior and the underlying legal objectives.
Thick view: this perspective suggests that AI can become a legal agent if it meets certain substantial criteria, displaying characteristics that make it genuinely subject to the law. The question here is how many of these requirements are based on a "natural" notion of personhood and how many are arbitrarily defined and can be extended to non-human agents like AI. This view is more philosophically interesting because it explores what it truly means for an AI to engage with the law beyond surface-level compliance.

Intentions and responsibility in AI agents
What is an agent?
The concept of an agent has been debated extensively in philosophy, from Aristotle to modern thinkers like Hume and Davidson. Broadly, it refers to anything that can cause effects. A narrower definition focuses on entities capable of intentional actions, initiating activities, or forming second-order desires. In fields like computer science and biology, agents are defined as systems that act toward specific goals, often using feedback mechanisms to adjust their actions.
Artificial agents
Artificial agents, such as virtual assistants or AI systems, are considered agentic when they:
➔ Pursue goals independently in complex environments.
➔ Make decisions based on natural language instructions.
➔ Use tools like web search or programming autonomously.
Unlike humans, their agency is evaluated by their observable behavior rather than internal mental states.
Agency and intentions
Agency connects closely to intentions, as agents typically act to achieve goals through planning and autonomy. The Belief-Desire-Intention (BDI) model explains that intentions arise from:
Beliefs about the world.
Desires to act within it.
While narrow AI demonstrates agentic behavior without general intelligence, whether general AI must inherently possess agency remains a debated topic.
Theories of intentions
Intentions are explained through two main theories:
1. Belief-desire theory: intentions combine a desire to act and the belief that the action is possible.
2. Goal-oriented theory: intentions are stable attitudes essential for guiding decisions and avoiding constant reconsideration.
These theories highlight the connection between intentions and reasoning.
Intentions in law
Intentions play a critical role in assigning responsibility in legal contexts. Laws often require both:
➔ A mental component (mens rea): purpose, knowledge, recklessness, and negligence (in common law).
➔ A physical act (actus reus).
In civil law systems like Italy's, "dolo" includes:
An epistemic element: foresight of the consequences.
A volitional element: the will to act.
These legal concepts help evaluate whether artificial systems programmed with certain mental elements could theoretically be held accountable, raising questions about whether current legal frameworks are adequate for AI.

Reinforcement Learning (RL) and BDI models
Reinforcement Learning (RL) agents are modeled using a Markov Decision Process (MDP), which includes:
State space (S): the possible states of the world.
Actions (A): the actions the agent can take.
State transition function (T): the probability of moving from one state to another after an action.
Reward function (R): rewards for specific actions or states.
Policy: maps states to actions, guiding decisions.
Belief-Desire-Intention (BDI) agents, on the other hand, are structured around:
Beliefs: what the agent knows.
Desires: the goals the agent wants to achieve.
Intentions: commitments to specific goals.
Intentions are executed through "intention plans" (i-plans), which are sequences of actions the agent follows.

Comparing MDP and BDI models
MDP and BDI models share some similarities, such as actions and transitions, but differ significantly in their focus and depth. MDP emphasizes achieving goals through optimal policies, focusing solely on outcomes without considering the mental states or reasoning behind actions. BDI, on the other hand, integrates knowledge (epistemic components) and desires (bouletic components), providing a richer framework for decision-making that accounts for the agent's reasoning process.
Limitations of comparing MDP and BDI
Mapping BDI intentions to MDP target states can lead to oversimplifications:
Triviality: this approach ignores the complexity of beliefs and desires that define true intentionality in BDI agents.
Circularity: designing agents to act intentionally requires more than achieving specific outcomes; the reasons driving these actions are equally important.
A more nuanced approach might map:
➔ Beliefs to the MDP transition function, representing how the agent understands changes in the environment.
➔ Desires to the MDP reward function, capturing the agent's preferences or goals (a minimal sketch of this mapping appears at the end of these notes).

Responsibility and intentions in AI
Backward-looking responsibility: determining responsibility for AI requires analyzing its intentions. Philosophers, like Plato, have argued that intention is not always necessary for assigning responsibility, especially in cases where outcomes are attributed to entities without "lifelike" qualities. For AI, this debate highlights the challenge of applying human-like standards to artificial systems.
Reactive-attitudes theories: some theories link responsibility to emotional responses like blame or praise, which are grounded in social interactions. These frameworks do not apply well to AI, as machines are not part of human communities, and attributing such attitudes to them remains speculative.
Reason-responsiveness views: responsibility also requires the ability to respond to reasons. Key questions arise: can AI understand and act on "reasons"? Can it respond to moral considerations? Philosophers like Fischer and Ravizza argue that without moral receptivity (an ability to process ethical reasoning), AI cannot bear responsibility in the traditional sense.
Forward-looking responsibility: alignment
The concept of alignment focuses on ensuring AI systems behave in ways consistent with human values and societal goals.
This idea ties into forward-looking responsibility, which aims to design AI systems that align their motives and actions with ethical and social needs. Schlick suggests that responsibility involves applying incentives like rewards or punishments to guide behavior effectively.
Challenges with intentions
Building AI systems with intentions is only the first step. Two critical challenges remain:
1. Complexity threshold: not all systems displaying intentional behavior are suitable for legal or moral evaluation.
2. Normative requirements: truly responsible AI must not only act but also respond to moral reasoning, demanding ethical frameworks within systems like reinforcement learning.
Behaviorist perspectives
Behaviorist approaches infer intentions from observable actions and contexts rather than internal states. This is similar to how courts evaluate human intentions based on evidence. While practical, this method does not address deeper philosophical questions about whether AI systems genuinely "intend" their actions.
Other AI techniques and agentic behavior
Beyond reinforcement learning, other AI systems, such as large language models, can exhibit agentic behaviors by:
Modeling environments to connect actions with outcomes.
Prioritizing desirable outcomes, reflecting preferences over states.
These capabilities align with the knowledge-based (epistemic) and desire-based (bouletic) elements central to intentionality. However, designing AI systems that both align with human values and act responsibly remains a major challenge.
Descriptive vs. normative questions
The discussion around AI intentions involves two perspectives:
Descriptive: what kinds of intentions current AI systems can have.
Normative: what kinds of intentions AI systems should have.
While current AI methods raise challenges, such as interpreting reward functions or behaviors, they also open opportunities for aligning AI with collective human values. Achieving alignment is essential for the ethical integration and responsible deployment of AI in society.
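To make the MDP/BDI mapping suggested earlier concrete (beliefs read as the MDP transition function, desires as the reward function, intentions as committed i-plans), here is a minimal sketch. It is an illustration under simplified, assumed definitions written for these notes, not an implementation of any formal BDI architecture; the deliberate step is only a toy stand-in for means-end reasoning.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = str
Action = str

@dataclass
class MDP:
    states: List[State]
    actions: List[Action]
    transition: Callable[[State, Action], Dict[State, float]]  # T(s, a) -> distribution over s'
    reward: Callable[[State, Action, State], float]            # R(s, a, s')

@dataclass
class BDIAgent:
    beliefs: Callable[[State, Action], Dict[State, float]]  # read as the MDP transition function T
    desires: Callable[[State, Action, State], float]        # read as the MDP reward function R
    intentions: List[Action] = field(default_factory=list)  # a committed i-plan

    def deliberate(self, plan: List[Action], start: State) -> None:
        """Toy means-end reasoning: simulate the plan under the agent's beliefs
        and commit to it as an intention only if it is expected to satisfy a desire."""
        state, expected = start, 0.0
        for a in plan:
            dist = self.beliefs(state, a)
            nxt = max(dist, key=dist.get)        # most-believed next state (crude expectation)
            expected += self.desires(state, a, nxt)
            state = nxt
        if expected > 0:
            self.intentions = list(plan)

# A two-state toy world: the agent believes "act" takes it from "idle" to "done",
# and it desires being "done".
def believed_transition(s: State, a: Action) -> Dict[State, float]:
    return {"done": 1.0} if (s, a) == ("idle", "act") else {s: 1.0}

def desired_reward(s: State, a: Action, s2: State) -> float:
    return 1.0 if s2 == "done" else 0.0

world = MDP(states=["idle", "done"], actions=["act", "wait"],
            transition=believed_transition, reward=desired_reward)

agent = BDIAgent(beliefs=world.transition, desires=world.reward)
agent.deliberate(plan=["act"], start="idle")
print(agent.intentions)   # ['act'] - the i-plan the agent commits to

agent2 = BDIAgent(beliefs=world.transition, desires=world.reward)
agent2.deliberate(plan=["wait"], start="idle")
print(agent2.intentions)  # []     - waiting is not expected to satisfy the desire
```

The point of the sketch is only the correspondence: the same two functions serve both as the MDP's T and R and as the agent's beliefs and desires, while the intention (the committed plan) has no direct counterpart in the MDP picture, which is exactly the gap the triviality and circularity worries point to.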