Week 1 Lec 1.pdf

Full Transcript


Responsible & Safe AI
Prof. Ponnurangam Kumaraguru (PK), IIITH; Prof. Balaraman Ravindran, IIT Madras; Prof. Arun Rajkumar, IIT Madras
Week 1: AI Risks

Improvement in AI capabilities

What is the current situation?
- Hard to differentiate between AI and human

How did we get here?
- Scaling up algorithms
- Scaling up data for training
- Increasing computing capabilities
- Not many predicted that we would have these advancements
- Worry about AI overtaking humans

AI capabilities
- Vision
- Reinforcement learning
- Language
- Multi-paradigm, and more

GANs (2014)

Image generation
- New algorithms: GANs, transformers, diffusion models
- Scaling up of compute and data used during training
- Example prompt: "Professor teaching Responsible and Safe AI course at IIIT Hyderabad for 70+ students"

Video generation
- 2019: DeepMind's DVD-GAN model
- April 2022
- Oct 2022: prompts such as "Tiny plant sprout coming out of the land" and "Teddy bear running in New York City"
- https://openai.com/index/sora/

Video games
- 2013: Pong and Breakout
- 2018: StarCraft, Dota 2

Strategy games
- 2016/17: AlphaGo
- 2022: Diplomacy (hidden alliances, negotiations, deceiving other players)
- https://en.wikipedia.org/wiki/Diplomacy_(game)

Language-based tasks
- Text generation
- Common-sense Q&A
- Planning and strategic thinking

Language models
- 2011
- GPT-2 (2019)
- GPT-3 (2020): same as GPT-2, with 100x the parameters
- ChatGPT (2022): significant changes from GPT-3

Common-sense Q&A
- Google's 2022 PaLM model

AP exam

Planning and strategic thinking

Acting on instructions / plans
- https://www.adept.ai/blog/act-1
- https://arxiv.org/pdf/2307.07924.pdf

ChatGPT
- Facts
- Writing email
- Writing code
- And many more...

Any use cases / experiences from your side?

Coding: GPT-3 with the Codex LM
- Codex is the model that powers GitHub Copilot
- Training data: natural language and billions of lines of source code from publicly available sources
- OpenAI Codex is most capable in Python, but it is also proficient in over a dozen languages, including JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, and even shell (a minimal usage sketch follows below)
- https://openai.com/blog/openai-codex#spacegame
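Since Codex-style coding models are driven purely through natural-language prompts, here is a minimal sketch of that workflow using the current OpenAI Python SDK. Treat it as an illustration rather than the exact API the slide describes: the original Codex models have since been deprecated, and the model name below is a placeholder for any code-capable model.

```python
# Minimal sketch: natural-language-to-code generation, Codex-style.
# Assumes the OpenAI Python SDK (`pip install openai`) and an API key in
# the OPENAI_API_KEY environment variable. The model name is a placeholder.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Write a Python function is_palindrome(s: str) -> bool that ignores "
    "case and non-alphanumeric characters."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any code-capable chat model works
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)  # the generated code, as text
```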
Math: Google's MINERVA model (a PaLM variant)

Math: AlphaTensor
https://deepmind.google/discover/blog/discovering-novel-algorithms-with-alphatensor/

Life sciences: AlphaFold2
- Predicting protein structure
- GDT is a measure of similarity between two protein structures (a worked sketch follows below)
- https://en.wikipedia.org/wiki/Global_distance_test
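To make the GDT idea concrete: the GDT_TS variant averages, over distance cutoffs of 1, 2, 4, and 8 Å, the fraction of residues in the predicted structure that lie within the cutoff of the corresponding residues in the reference structure. A minimal sketch, assuming the two structures are already superimposed and given as C-alpha coordinate arrays (real scoring also searches over superpositions, omitted here):

```python
# Minimal sketch of the GDT_TS similarity score between two protein
# structures, assuming both are already superimposed and represented as
# (N, 3) NumPy arrays of C-alpha coordinates in angstroms.
import numpy as np

def gdt_ts(pred: np.ndarray, ref: np.ndarray) -> float:
    """Average fraction of residues within 1, 2, 4, and 8 angstroms."""
    deviations = np.linalg.norm(pred - ref, axis=1)  # per-residue distance
    cutoffs = [1.0, 2.0, 4.0, 8.0]
    fractions = [(deviations <= c).mean() for c in cutoffs]
    return 100.0 * float(np.mean(fractions))  # conventionally scaled 0-100

# Toy check: a perfect prediction scores 100.
ref = np.random.rand(50, 3) * 10.0
print(gdt_ts(ref, ref))  # -> 100.0
```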
https://blog.google/technology/ai/google-deepmind-isomorphic-alphafold-3-ai-model/#life-molecules
https://blog.google/technology/research/google-ai-research-new-images-human-brain/

Similar systems / applications
- Bard by Google: connected to the internet, Docs, Drive, Gmail
- LLaMA by Meta: open-source LLM
- Bing Chat by Microsoft: integrates GPT with the internet
- Copilot X by GitHub: integrates with VS Code to help you write code
- HuggingChat: open-source ChatGPT alternative
- BLOOM by BigScience: multilingual LLM
- OverflowAI by Stack Overflow: LLM trained by Stack Overflow
- Poe by Quora: has chatbot personalities
- YouChat: LLM powered by the search engine You.com
- More on the list: Devin, GPT-4o

In summary
- Most of these advancements came in 2022 and beyond
- Good at taking actions in complex environments, strategic thinking, and connecting to the real world

Activity #AICapabilities
Imagine the optimal collaboration between AI and humans across sectors like healthcare, education, environmental management, and more. What innovations are necessary to achieve this? What challenges could arise, and what potential risks might we face in this best-case scenario?
Drop your answers as a response on the mailing list with the subject line "Activity #AICapabilities".

White House: Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence (Oct 2023)
https://www.whitehouse.gov/briefing-room/statements-releases/2023/10/30/fact-sheet-president-biden-issues-executive-order-on-safe-secure-and-trustworthy-artificial-intelligence/

Deepfakes
https://www.youtube.com/watch?v=cQ54GDm1eL0
https://www.youtube.com/watch?v=enr78tJkTLE

Deepfakes: what goes on behind the scenes (go to the Colab notebook)
https://colab.research.google.com/github/JaumeClave/deepfakes_first_order_model/blob/master/first_order_model_deepfakes.ipynb

Lip sync
https://bhaasha.iiit.ac.in/lipsync/example_upload1

Face recognition
https://youtu.be/jZl55PsfZJQ?si=3wD5xxRHgnD1p1fR

Weaponization
https://www.theguardian.com/world/2023/dec/01/the-gospel-how-israel-uses-ai-to-select-bombing-targets

Errors / bias in algorithms
https://techcrunch.com/2023/06/06/a-waymo-self-driving-car-killed-a-dog-in-unavoidable-accident/
https://www.theguardian.com/technology/2022/dec/22/tesla-crash-full-self-driving-mode-san-francisco
https://www.indiatoday.in/technology/news/story/robot-confuses-man-for-a-box-of-vegetables-pushes-him-to-death-in-factory-2460977-2023-11-09

What is going on? ☺
https://www.youtube.com/watch?v=lnyuIHSaso8&t=75s

More
https://economictimes.indiatimes.com/news/new-updates/man-gets-caught-in-deepfake-trap-almost-ends-life-among-first-such-cases-in-india/articleshow/105611955.cms

Malicious use: ChaosGPT
"Empowering GPT with Internet and Memory to Destroy Humanity."
https://decrypt.co/126122/meet-chaos-gpt-ai-tool-destroy-humanity
https://en.wikipedia.org/wiki/Tsar_Bomba
https://www.youtube.com/watch?v=kqfsuHsyJb8

Your list of AI risks?

What is an alignment problem?
https://www.youtube.com/watch?v=yWDUzNiWPJA

Misalignment?
https://www.ndtv.com/offbeat/ai-chatbot-goes-rogue-swears-at-customer-and-slams-company-in-uk-4900202
https://twitter.com/ashbeauchamp/status/1748034519104450874/

Bias in AI-generated images
https://flowingdata.com/2023/11/03/demonstration-of-bias-in-ai-generated-images/
https://blog.google/products/gemini/gemini-image-generation-issue/

Any questions?

Risk sources / taxonomy
- Malicious use
- AI race
- Organizational risks
- Rogue AIs

Malicious use
AI could be used to engineer new pandemics or for propaganda, censorship, and surveillance, or released to autonomously pursue harmful goals.

Malicious use: bioterrorism
- The ability to engineer a pandemic is rapidly becoming more accessible
- Gene synthesis costs are halving every 15 months
- Benchtop DNA synthesis could help rogue actors create new biological agents with no safety measures
- https://www.nature.com/articles/s42256-022-00465-9

Malicious use: ChaosGPT
"Empowering GPT with Internet and Memory to Destroy Humanity."
https://decrypt.co/126122/meet-chaos-gpt-ai-tool-destroy-humanity

Persuasive AI
- AIs will enable sophisticated, personalized influence campaigns that may destabilize our shared sense of reality
- AIs have the potential to increase the accessibility, success rate, scale, speed, stealth, and potency of cyberattacks
- Cyberattacks can destroy critical infrastructure

Concentration of power
If material control of AIs is limited to a few, it could represent the most severe economic and power inequality in human history.

Malicious use: solutions
- Improving biosecurity: restricted access controls; biological capabilities removed from general-purpose AI; use of AI for biosecurity
- Restricting access to dangerous AI models: controlled interactions; developers required to prove minimal risk
- Technical research on anomaly detection
- Holding AI developers liable for harms

AI race
Competition could push nations and corporations to rush AI development, relinquishing control to these systems. Cyberwarfare, autonomous weapons, and the automation of human labor could lead to mass unemployment and dependence on AI systems.

AI race: military
Low-cost automated weapons, such as drone swarms outfitted with explosives, could autonomously hunt human targets with high precision, performing lethal operations for both militaries and terrorist groups and lowering the barriers to large-scale violence.

AI race: corporate
As AIs automate increasingly many tasks, the economy may become largely run by AIs. Eventually, this could lead to human enfeeblement and dependence on AIs for basic needs.

AI race: solutions
- Safety regulation: self-regulation by companies; competitive advantage for safety-oriented companies
- Data documentation: transparency and accountability
- Meaningful human oversight: human supervision
- AI for cyberdefense: anomaly detection
- International coordination: standards for AI development; robust verification and enforcement
- Public control of general-purpose AIs

Organizational risks
- Organizations developing advanced AI could cause catastrophic accidents by putting profits over safety
- AIs could be accidentally leaked to the public or stolen by malicious actors, and organizations could fail to properly invest in safety research
- New capabilities can emerge quickly and unpredictably during training, such that dangerous milestones may be crossed without our knowing

Organizational risks: the Swiss cheese model
The Swiss cheese model shows how technical factors can improve organizational safety. Multiple layers of defense compensate for each other's individual weaknesses, leading to a low overall level of risk (see the sketch below).
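A back-of-the-envelope illustration of the Swiss cheese intuition, under the simplifying assumption (made up for this sketch) that defense layers fail independently:

```python
# Toy illustration of the Swiss cheese model: if defense layers fail
# independently, the probability that a hazard slips through every layer
# is the product of per-layer failure probabilities. The numbers are
# invented; real layers are rarely fully independent.
layer_failure_probs = [0.1, 0.2, 0.3]  # e.g., red teaming, audits, monitoring

p_all_layers_fail = 1.0
for p in layer_failure_probs:
    p_all_layers_fail *= p

print(f"P(hazard passes all layers) = {p_all_layers_fail:.3f}")  # 0.006
```

Even layers that individually fail fairly often can, stacked together, leave little residual risk, which is the slide's point about layers compensating for each other's weaknesses.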
Organizational risks: solutions
- Red teaming
- Proving safety before deployment
- Publication reviews
- Response plans
- Risk management: employ a chief risk officer and an internal audit team
- Processes for important decisions: make sure AI training or deployment decisions involve the chief risk officer and other key stakeholders, ensuring executive accountability

Rogue AIs
We risk losing control over AIs as they become more capable.
Proxy gaming: YouTube / Instagram optimize user engagement at the expense of mental health (a toy simulation follows below).
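The sketch below (all numbers invented) illustrates proxy gaming: an agent that maximizes an engagement proxy can systematically pick the option that scores worst on the objective the proxy was meant to track.

```python
# Toy illustration of proxy gaming: a recommender picks content to
# maximize engagement (the proxy), even when that choice scores poorly
# on user well-being (the true objective). All values are invented.
content = {
    # name: (engagement proxy, well-being / true value)
    "balanced news": (0.4, 0.8),
    "outrage bait": (0.9, 0.1),
    "doomscroll feed": (0.8, 0.2),
}

proxy_optimal = max(content, key=lambda name: content[name][0])
truly_optimal = max(content, key=lambda name: content[name][1])

print(f"Proxy-optimal choice:      {proxy_optimal}")   # outrage bait
print(f"Well-being-optimal choice: {truly_optimal}")   # balanced news
```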
Rogue AIs: power seeking
- It can be instrumentally rational for AIs to engage in self-preservation.
- Loss of control over such systems could be hard to recover from.

Rogue AIs: deception
- Various resources, such as money and computing power, can sometimes be instrumentally rational to seek.
- AIs which can capably pursue goals may take intermediate steps to gain power and resources.

Rogue AIs: solutions
- AIs should not be deployed in high-risk settings, such as autonomously pursuing open-ended goals or overseeing critical infrastructure, unless proven safe.
- We need to advance AI safety research in areas such as adversarial robustness, model honesty, transparency, and removing undesired capabilities.

World GDP adjusted for inflation
https://ourworldindata.org/economic-growth

Rapid acceleration
- It took hundreds of thousands of years for Homo sapiens to reach the agricultural revolution, and millennia more for the industrial revolution.
- Centuries later: the AI revolution.
- Technology is a double-edged sword; consider nuclear weapons.

Solutions to these risks?

Solutions to mentioned risks
- People
- Policy
- Technology

A notional decomposition of risk (this segment follows Dan Hendrycks, Introduction to ML Safety)
Risk ≈ Vulnerability × Hazard Exposure × Hazard
- Vulnerability: a factor or process that increases susceptibility to the damaging effects of hazards
- Hazard exposure: the extent to which elements (e.g., people, property, systems) are subjected or exposed to hazards
- Hazard: a source of danger with the potential to harm
Notes: this is the risk corresponding to a specific hazard, not total risk; "×" here just denotes a nonlinear interaction; and "Hazard" is shorthand for hazard probability and severity.

Example: injury from falling on a wet floor
- Vulnerability: bodily brittleness
- Hazard exposure: floor utilization
- Hazard: floor slipperiness

Example: COVID
- Vulnerability: old age, poor health, etc.
- Hazard exposure: contact with carriers
- Hazard: prevalence and severity

Let's look at ML systems. In the disaster risk equation, each safety area targets one factor:
- Alignment: reduce the probability and severity of inherent model hazards; agents must pursue good goals
- Robustness: withstand hazards; agents must withstand hazards
- Monitoring: identify hazards; agents must identify and avoid hazards
- Systemic safety: reduce systemic risks; remove hazards

Reducing risk vs. estimating risk (the same decomposition applies to both).

Errors in algorithms
https://www.indiatoday.in/technology/news/story/robot-confuses-man-for-a-box-of-vegetables-pushes-him-to-death-in-factory-2460977-2023-11-09

Example: robot confuses a man for veggies
- Vulnerability: injury / death to employees
- Hazard exposure: employees and the robot working around each other
- Hazard: misclassifying humans as veggies

Other examples? (A worked numeric sketch of the decomposition follows below.)
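To see how the decomposition guides interventions, here is a toy numeric sketch: all values are invented, and "×" is simplified to a plain product even though the slides stress the interaction is nonlinear. Each safety lever from the ML-systems mapping shrinks a different factor:

```python
# Toy sketch of Risk ~ Vulnerability x Hazard Exposure x Hazard, where
# "Hazard" is shorthand for hazard probability and severity. All numbers
# are invented for illustration.
def risk(vulnerability: float, exposure: float,
         hazard_prob: float, hazard_severity: float) -> float:
    return vulnerability * exposure * hazard_prob * hazard_severity

# Wet-floor example: brittle bodies, heavy floor use, slippery floor.
baseline = risk(vulnerability=0.8, exposure=0.9,
                hazard_prob=0.5, hazard_severity=0.7)

# Each intervention targets one factor of the decomposition:
less_slippery = risk(0.8, 0.9, 0.1, 0.7)  # reduce the hazard (alignment)
less_brittle = risk(0.3, 0.9, 0.5, 0.7)   # reduce vulnerability (robustness)
less_exposed = risk(0.8, 0.2, 0.5, 0.7)   # reduce exposure (monitoring)

print(baseline, less_slippery, less_brittle, less_exposed)
```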
X-Risks

AI could someday reach human intelligence
- Human intelligence arises from changes that are not necessarily that dramatic architecturally.
- The train won't stop at the human station: on the intelligence axis shown in the slides, the scale runs from an ant, to a cat, to a dumb person, to Albert Einstein, and then keeps going.

Intelligence is power
- Gorillas are far stronger than we are, yet their existence depends entirely on us.
- The difference: intelligence.

"It Isn't Going to Happen"
- Sept 11, 1933, Ernest Rutherford: "Anyone who looks for a source of power in the transformation of the atoms is talking moonshine."
- Sept 12, 1933: Leo Szilard invents neutron-induced nuclear chain reactions. "We switched everything off and went home. That night, there was very little doubt in my mind that the world was headed for grief."

Models are not always truthful

Models are not always honest
We can show models "know" the truth but sometimes are not incentivized to output it.

Emergent capabilities are common
- Larger "LMs exhibit qualitatively different reasoning abilities, e.g., RoBERTa succeeds in reasoning tasks where BERT fails completely."
- Capabilities are only continuing to get better.

Power-seeking can be instrumentally incentivized
- "By default, suitably strategic and intelligent agents, engaging in suitable types of planning, will have instrumental incentives to gain and maintain various types of power, since this power will help them pursue their objectives more effectively." - Joseph Carlsmith, Is Power-Seeking AI an Existential Risk?
- "One might imagine that AI systems with harmless goals will be harmless. This paper instead shows that intelligent systems will need to be carefully designed to prevent them from behaving in harmful ways." ~ Omohundro
- https://en.wikipedia.org/wiki/Steve_Omohundro

Power-seeking can be explicitly incentivized
"Whoever becomes the leader in [AI] will become the ruler of the world." - Vladimir Putin

Stephen Hawking on AI risk
- "Unless we learn how to prepare for, and avoid, the potential risks, AI could be the worst event in the history of our civilization. It brings dangers, like powerful autonomous weapons, or new ways for the few to oppress the many. It could bring great disruption to our economy."
- "The development of full artificial intelligence could spell the end of the human race."

Elon Musk on AI risk
- "I think we should be very careful about artificial intelligence. If I were to guess like what our biggest existential threat is, it's probably that. … With artificial intelligence we are summoning the demon."
- "As AI gets probably much smarter than humans, the relative intelligence ratio is probably similar to that between a person and a cat, maybe bigger."

Hillary Clinton on AI risk
"Think about it: Have you ever seen a movie where the machines start thinking for themselves that ends well? Every time I went out to Silicon Valley during the campaign, I came home more alarmed about this. My staff lived in fear that I'd start talking about 'the rise of the robots' in some Iowa town hall. Maybe I should have."

Alan Turing on AI risk
"Once the machine thinking method had started, it would not take long to outstrip our feeble powers. At some stage therefore we should have to expect the machines to take control."

Norbert Wiener on AI risk
"Moreover, if we move in the direction of making machines which learn and whose behavior is modified by experience, we must face the fact that every degree of independence we give the machine is a degree of possible defiance of our wishes. The genie in the bottle will not willingly go back in the bottle, nor have we any reason to expect them to be well disposed to us."

Geoffrey Hinton on AI risk
"There are very few examples of a more intelligent thing being controlled by a less intelligent thing."
https://edition.cnn.com/videos/tv/2023/05/02/the-lead-geoffrey-hinton.cnn

Speculative hazards and failure modes

Weaponized AI
- Recently, it was shown that AI could generate potentially deadly chemical compounds.
- AI could be used to create autonomous weapons.
- Deep RL methods outperform humans in simulated aerial combat.

What to do about weaponized AI?
- Anomaly detection: detect novel hazards such as novel biological phenomena; detect malicious use and nation-state misuse (a minimal sketch of one baseline follows below)
- Systemic safety (forecasting, ML for cyberdefense, cooperative AI): reduce the probability of conflict
- Policy: out of scope for this course
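One concrete baseline from the anomaly-detection literature, Hendrycks and Gimpel's maximum softmax probability (MSP), can be sketched in a few lines: flag any input on which the classifier's most confident class probability is unusually low. The logits and threshold below are invented for illustration.

```python
# Minimal sketch of the maximum-softmax-probability (MSP) baseline for
# anomaly / out-of-distribution detection. Logits and the threshold are
# invented; in practice the threshold is tuned on held-out data.
import numpy as np

def msp_score(logits: np.ndarray) -> float:
    """Max softmax probability; low values suggest an anomalous input."""
    z = logits - logits.max()              # stabilize the softmax
    probs = np.exp(z) / np.exp(z).sum()
    return float(probs.max())

in_dist = np.array([5.0, 0.2, -1.0])       # confident, familiar input
anomaly = np.array([0.4, 0.3, 0.2])        # flat logits: model is unsure

THRESHOLD = 0.7
for name, logits in [("in-distribution", in_dist), ("anomaly", anomaly)]:
    score = msp_score(logits)
    print(f"{name}: MSP={score:.2f}, flagged={score < THRESHOLD}")
```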
Proxy gaming
- Future artificial agents could over-optimize and game faulty proxies, which could mean systems aggressively pursue goals and create a world that is distinct from what humans value.
- In the real world, "what gets measured gets managed," so we will need to appropriately measure our values.

Treacherous turns
- AI could behave differently once it has the ability to do so. For instance, it could turn after reaching a high enough intelligence, detecting that it is "deployed in the real world", gaining enough power, or the removal of a safeguard.
- Such turns might be difficult to predict beforehand and difficult to stop.

Deceptive alignment
- Deception doesn't require a superhuman model; the robot only appears to be grabbing the ball.

Persuasive AI
- Superintelligent AI could be extremely persuasive; it may become difficult to differentiate reality from fiction.
- Current examples: disinformation, social media bots, deepfakes.

https://arxiv.org/pdf/2206.05862.pdf#page=13
https://arxiv.org/pdf/2206.13353.pdf
https://www.youtube.com/watch?v=UbruBnv3pZU&t=37s

Activity #AIRisks
Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence. What to do?
1. Please go through the FACT SHEET.
2. Submit the following with the subject line "Activity #AIRisks":
   I. At least 3 technical issues that are highlighted in the Order
   II. At least 3 ideas that you think you can take up as a course project

Thank you for attending the class!!!
[email protected]
Handles: pk.profgiri | Ponnurangam.kumaraguru | /in/ponguru | ponguru
