Data Ethics in AI: Key Points From Crawford's Atlas of AI PDF

**Key Points of Chapter 3: *Data* from *Atlas of AI* by Kate Crawford**

This chapter of Crawford's *Atlas of AI* examines how data, particularly in the context of artificial intelligence (AI), is gathered, classified, and used, often in ways that dehumanize individuals and perpetuate inequality. Below are the core themes and key points:

**1. Data Extraction and Exploitation:**

- Crawford highlights that data used for AI systems is often extracted without consent or context. A notable example is the **NIST Special Database 32**, which contains mug shots of individuals, many of whom have been arrested multiple times. These images are used for testing AI systems, but the people in them are reduced to mere data points, which dehumanizes them.
- Data extraction is normalized within the tech industry, where anything available online or in government databases is seen as a resource. The people whose images make up this data are rarely considered, nor are the ethical concerns surrounding the use of their images.

**2. The Shift from Image to Infrastructure:**

- Crawford emphasizes how the meaning of images, such as mug shots or other biometric data, changes once they are part of AI systems. Originally, mug shots were used in law enforcement to identify individuals. Now they are part of a much larger system, where they serve as the technical foundation for training AI algorithms to recognize faces.
- This transformation marks a shift in how data is perceived: specific personal images come to be treated as raw, supposedly neutral data used for technological purposes. The social and political meanings of these images, including the power dynamics involved in their creation, are erased when they become part of algorithmic systems.

**3. Dehumanization through AI Training Datasets:**

- The chapter critiques how individuals captured in datasets are stripped of their humanity. People in mug shot databases or other data collections are treated as technical resources, not as individuals with personal histories or rights. The focus is on refining AI's technical performance, not on the potential harms to the people whose data is being used.
- A significant theme is how AI systems are trained on biased and incomplete datasets, further perpetuating inequality and discrimination. For example, the use of mug shots in AI facial recognition algorithms has roots in practices like eugenics, where faces were studied to make judgments about character or criminality based on physical appearance.

**4. Training Data: "There's No Data Like More Data":**

- Crawford discusses the massive scale of data needed to train AI systems. For instance, companies like Google or Facebook collect millions of images and interactions daily, which are then used to train machine learning models. This practice feeds the growing demand for data in AI, where the guiding principle is "more data is better."
- Training data is used to make AI models more accurate, but these datasets are often biased or flawed. Crawford argues that the collection and use of data in AI are driven by a profit-oriented, extractive logic that prioritizes the needs of corporations over the rights of individuals.

**5. Lack of Consent and Ethical Oversight:**

- Crawford highlights the absence of consent in the data collection process. Many of the images in later datasets such as **ImageNet** were scraped from the internet without individuals' knowledge or permission, in contrast to earlier collections like **FERET**, which were assembled through organized photo sessions.
- Furthermore, ethical oversight in AI research is minimal. Universities and tech companies often bypass ethical review processes, treating publicly available data as fair game regardless of how its use may harm individuals or communities. This raises concerns about privacy, consent, and the ethical implications of using people's data to train AI systems.

**6. Data as Capital and Resource:**

- The chapter discusses how data has been commodified and is now seen as a valuable resource, akin to natural resources like oil. Data is treated as something to be mined, refined, and used to fuel AI development. This "data as oil" metaphor reflects the exploitative nature of data collection, where the focus is on accumulation rather than ethical use.
- The commodification of data perpetuates inequality, as those with access to the largest datasets (typically tech giants) gain more power and control over AI development. Meanwhile, marginalized groups are disproportionately impacted by surveillance and the extraction of their data.

**Really Important Hard Facts:**

1. **Data extraction** is often done without consent, using images (like mug shots) for training AI systems, which dehumanizes individuals and perpetuates inequality.
2. **AI training datasets** erase the social and political context of the data, treating people's images as raw material for machine learning models.
3. **Ethical oversight is lacking**, as AI research often bypasses consent and privacy concerns, scraping data from the internet or public sources.
4. **The commodification of data** means that it is treated like a resource (often compared to oil), with tech companies prioritizing profit over the rights of individuals.
5. **Bias in AI**: AI systems trained on biased datasets replicate and amplify discrimination, particularly against marginalized communities.

For your exam, focus on how Crawford critiques the **exploitative nature of data extraction**, the lack of **consent** in AI training processes, and the shift from **personal data to commodified data** in the broader context of AI development.
