Data Labor & User-Generated Data

Questions and Answers

According to the authors, what is a key issue concerning data production in the current technological landscape?

  • Data producers have too much control over how their data is used.
  • Data producers have little say in what data is captured, how it is used, or who benefits. (correct)
  • Organizations with the ability to process data possess too little power.
  • There is an overabundance of transparency regarding data reuse.

Which of the following best describes the core concept of 'data labor' as presented in the paper?

  • The ethical considerations involved in monetizing user-generated content.
  • The production of data for computing, reconceptualized as a form of labor. (correct)
  • The legal frameworks governing data privacy and user rights.
  • The automated processes that technology companies use to collect and analyze user data.

According to the authors, what opportunities exist to empower data producers in their relationship with tech companies?

  • Advocating for less transparency about data reuse.
  • Developing mechanisms to ensure tech companies have complete control over data revenue.
  • Restricting feedback channels between data producers and companies.
  • Creating feedback channels between data producers and companies. (correct)

Which of the following is a key dimension the authors use to characterize data labor?

  • Legibility (correct)

What is the primary goal of the roadmap that the authors provide in the paper?

  • To empower data producers and address the power imbalance between them and large technology firms. (correct)

Which concept is related to data labor?

  • Digital records useful for capital generation (correct)

What could be a real-world example of 'data labor'?

  • Labeling images on Amazon Mechanical Turk for a computer vision company. (correct)

What does the article state about government agencies, research organizations, and non-profits generating capital from data?

  • These entities do so to a lesser extent than prominent tech companies. (correct)

According to the authors, which concept is most similar to digital labor?

  • Monetized online activities, regardless of whether they occur at traditional workplaces or are compensated (correct)

What is one limitation of digital labor?

  • Not all passively produced data is considered digital labor. (correct)

According to the document, how does 'power-to' relate to data labor?

  • Power-to corresponds to an individual's ability to freely make decisions around their data labor. (correct)

What can data leverage refer to?

  • Data strikes, data poisoning, and conscious data contribution (correct)

What is a key component of 'legibility' in relation to data labor?

  • Whether data laborers know their labor is being captured. (correct)

According to the document, can making illegible data legible always translate to power for data producers?

  • No. Even when data labor becomes more legible, external social constraints can still limit data producers' power over it. (correct)

What is end-use awareness in the context of data labor?

  • The degree to which data producers are aware of how resulting data is used downstream to generate capital. (correct)

What does the document state regarding empowering open data labor?

  • Both A and C. (correct)

How is the 'replaceability' of data labor defined?

  • Whether certain background, knowledge, skills, or contextual aspects are required to perform data labor. (correct)

According to the document, are technology companies always in an employment relationship with data producers?

  • No, technology companies are not always in an employment relationship with data producers. (correct)

From a technology company's perspective, which parameter can help assess whether an activity represents data labor?

  • Shelf life (correct)

Flashcards

What is data?

User-generated data produced through interactions with computing systems or scraped from the web.

What is data labor?

The production of data for computing, reconceptualized as a form of labor.

Data Labor Definition

Activities that produce digital records useful for capital generation.

Illegible Data Labor

Activities where digital records arise without the person's knowledge.

Legible Data Labor

Activities where data capture is clear to the data laborers.

End-Use Awareness

The degree to which data producers know how their data is used to generate capital.

End Use-Aware Data Labor

Data producers are informed of data use or have some understanding of how the output of their labor is being used.

Collaboration Requirement

The extent to which data laborers work together.

Collaborative Data Labor

Data labor that involves deliberation, communication, and other forms of teamwork; prevalent in social computing systems.

Non-collaborative Data Labor

Data laborers perform activities in isolation, without discussion with others.

Data Labor Openness

How accessible the downstream data is to the public.

Closed Data Labor

Data labor captured by private systems to benefit specific individuals or groups, to the exclusion of others.

Open Data Labor

Data labor captured in systems that adopt licenses making the fruits of data labor public.

Data Labor Replaceability

The degree to which particular background, knowledge, skills, or contextual aspects are required to perform data labor.

Irreplaceable labor

Data labor through which laborers can affect technology performance within their field of knowledge.

Replaceable Labor

Data labor that requires little background, knowledge, skill, or context to perform.

Livelihood Overlap

The extent to which data labor overlaps with data producers' occupational activities.

Overlap with Livelihood

Data labor occurs in addition to core occupational activities, thereby generating supplemental income for data laborers.

No Overlap with Livelihood

Data labor occurs apart from core occupational activities and thus has no impact on data laborers' existing livelihoods or income.

Study Notes

  • Many recent technological advances, such as ChatGPT and search engines, rely on massive amounts of user-generated data obtained through user interactions or web scraping.
  • Data producers often have limited control over what data is collected, how it is utilized, and who benefits from it.
  • Organizations like OpenAI and Google have significant influence in shaping the technology landscape due to their ability to access and process user data.
  • Synthesizing existing data labor literature provides opportunities to empower data producers in their relationships with tech companies.
  • Researchers, policymakers, and activists can advocate for transparency in data reuse, create feedback channels, and develop mechanisms to share data revenue more broadly.
  • Data labor can be characterized in six dimensions: legibility, end-use awareness, collaboration requirement, openness, replaceability, and livelihood overlap.
  • These dimensions are based on parallels between data labor and other types of labor in computing literature.

Introduction

  • Technology users generate vast amounts of data through their interactions with computing systems, including behavior logs, content, and personal information.
  • Currently, only a handful of tech companies with the resources to collect, process, and model data at scale benefit from this data.
  • Generative AI models like ChatGPT and DALL-E were made possible by publicly available text and artwork that model developers scraped from billions of web pages and processed.
  • Data producers such as artists, writers, and users have scant power over decisions about how their data is used or who benefits.
  • This imbalance of power has led to public criticism of tech industry practices, including unapproved reuse of work and implications for employment opportunities.
  • The authors argue that understanding data generation as a form of labor, or "data labor," can pave the way for distributing the power and benefits of data more broadly.
  • Proposals include supporting "data unions," mediators of individual data, legislation granting users greater control, and tools supporting user-driven collective action.

Defining Data Labor

  • The research community has yet to provide a concrete, actionable characterization of data labor.
  • A clear characterization is essential to guide researchers, data producers, and policymakers in addressing the existing power imbalance between the public and large tech companies.
  • A clear characterization will highlight how different data labor types require different strategies and interventions to empower data producers through research, development, and policy practices.
  • A working definition is offered: Activities that produce digital records useful for capital generation.
  • An activity must meet two criteria to qualify as data labor: creating or enhancing data ("digital records") and helping to generate capital.
  • The focus is on data labor that subsidizes prominent tech companies, because of the emerging, substantial power inequity between these entities and the data-generating public.
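
The two-criteria test in the working definition can be read as a simple conjunction. The snippet below is only an illustrative sketch of that reading: the `Activity` class, its field names, and the `is_data_labor` helper are hypothetical names introduced here, and the true/false judgments for each example are simplifying assumptions based on examples mentioned elsewhere in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Hypothetical record describing an activity (names are illustrative)."""
    name: str
    produces_digital_records: bool  # criterion 1: creates or enhances data
    helps_generate_capital: bool    # criterion 2: the records help generate capital


def is_data_labor(activity: Activity) -> bool:
    """Both criteria of the working definition must hold."""
    return activity.produces_digital_records and activity.helps_generate_capital


# Assumed classifications, loosely following examples in these notes.
examples = [
    Activity("labeling images for a computer vision company", True, True),
    Activity("completing a behavior experiment with no capital generation", True, False),
]

for activity in examples:
    print(f"{activity.name}: data labor = {is_data_labor(activity)}")
```

In practice, deciding whether an activity helps generate capital is a judgment call rather than a boolean flag; the sketch only shows that failing either criterion is enough to fall outside the definition.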

Computer-Mediated Labor

  • Computer-mediated labor, a focus of HCI and CSCW research, encompasses both compensated and uncompensated activities.
  • Data labor includes both compensated activities and unwitting ones like content creation on social networks.
  • Crowdsourcing, peer production, and content moderation are examples of data labor that advance computing systems and benefit technology companies financially.

Digital Labor

  • Digital labor refers to monetized online activities, regardless of whether they occur at traditional workplaces or are compensated.
  • The internet is animated by cultural and technical labor, continuously producing value.
  • Not every instance of digital labor is an instance of data labor, and vice versa.

Crowdwork

  • Crowdwork, in which distributed laborers complete tasks for payment, falls under computer-mediated labor.
  • Examples include image labeling and text production for spam filters.
  • Completing behavior experiments may not be data labor unless it leads to capital generation.

Data Work

  • Data work is a relatively new term that includes user data generation, data labeling, and data cleaning.
  • Data labor, by contrast, primarily concerns the upstream activity of data generation, not downstream processing activities such as data cleaning and filtering.
  • Data scientists cleaning datasets may also perform data labor by labeling images, for example.

Frameworks of Power and Data Leverage

  • This work expands on existing literature about power imbalance and social inequalities in computing systems.
  • The roadmap offers actionable steps toward more equitable social relationships.
  • Power-to: an individual's ability to freely make decisions around their data labor, such as choosing not to participate in data-generating activities or deleting data.
  • Power-over: the ability to influence technology operators' decisions about data-driven technology.
  • Data leverage involves data strikes, data poisoning, and conscious data contribution.
  • Leverage requires collective action and critical mass participation.

Dimensions of Data Labor

  • Six key dimensions of data labor are described: legibility, end-use awareness, collaboration requirement, openness, replaceability, and livelihood overlap.
  • Each dimension is examined for its relationship to power and for avenues to empower data laborers.
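
One way to keep the six dimensions straight is to treat them as fields of a record. The sketch below is a study aid under stated assumptions, not the authors' formalization: the `DataLaborProfile` class and its field names are hypothetical, each dimension is flattened to a yes/no value even though the notes describe some of them as a spectrum, and `None` marks a dimension that these notes do not characterize for the given activity.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class DataLaborProfile:
    """Hypothetical profile of one activity along the six dimensions."""
    activity: str
    legible: Optional[bool] = None              # do laborers know labor is captured?
    end_use_aware: Optional[bool] = None        # do they know the downstream use?
    collaborative: Optional[bool] = None        # does the labor involve teamwork?
    open_data: Optional[bool] = None            # is downstream data public?
    replaceable: Optional[bool] = None          # is little skill or context required?
    overlaps_livelihood: Optional[bool] = None  # does it overlap with occupation?


def unrated_dimensions(profile: DataLaborProfile) -> List[str]:
    """Return the dimensions that have not been characterized yet."""
    return [
        f.name
        for f in fields(profile)
        if f.name != "activity" and getattr(profile, f.name) is None
    ]


# These notes list reCAPTCHA as illegible and non-collaborative;
# the remaining dimensions are left unrated rather than guessed.
recaptcha = DataLaborProfile(
    "completing a reCAPTCHA", legible=False, collaborative=False
)
print(unrated_dimensions(recaptcha))
```

Leaving unknown dimensions as `None` rather than guessing keeps the profile honest about what the source actually states for each activity.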

Legibility

  • Addresses whether data laborers know their labor is being captured.
  • Illegible data labor involves activities where individuals are unaware their actions create digital records.
  • User interaction logs and completing reCAPTCHA are examples of illegible data labor.
  • Legible data labor includes activities such as contributing ratings and completing clearly disclosed crowdwork tasks.

Relationship with Power

  • Legibility is essential for data laborers to exert control over technology companies.
  • A lack of awareness inhibits the power to withhold or change labor, and it limits collective action.
  • One imperative is to make illegible data labor legible.
  • Tools can measure and communicate the economic value of data labor, and activists can use them to equip data laborers with knowledge about how to leverage that value effectively against tech companies.
  • Mitigating illegibility may not always empower data producers, because external social constraints can still limit their power.

End-Use Awareness

  • Centers on whether data laborers know how their labor is used to generate capital.
  • Data labor may be captured without the laborers having any end-use awareness, to the benefit of technology companies.
  • Content creators, Wikipedia editors, and journalists may be unaware of downstream uses such as improving search engine performance or training large language models.
  • Data laborers who are end-use aware are informed of, or otherwise understand, how their output is used.
  • Such uses include supporting particular functions or features, such as targeted advertising, personalized newsfeed algorithms, and recommender systems.
  • End-use awareness can empower data laborers by giving them an understanding of the downstream capital generation implications of the data they produce.

Mitigating a lack of awareness

  • Policymakers may mandate end-use awareness for sensitive data, requiring tech companies to disclose how data will be used.
  • Opt-in mechanisms can give laborers control over the end use of their data.
  • Data laborers' concerns about end use can disincentivize them from performing data labor.
  • Activists may develop tools illustrating the effects of collectively withholding or poisoning data.
  • Strengthening legal frameworks can give data laborers control over their data and a way to prevent unwanted downstream usage.

Collaboration Requirement

  • This dimension captures the extent of teamwork among data laborers, on a spectrum from non-collaborative to collaborative activities.
  • The computer-mediated labor literature draws a similar distinction between team and individual work.
  • Non-collaborative data labor: data production performed in isolation.
  • Examples include completing a reCAPTCHA or an MTurk task.
  • Collaborative data labor: deliberation, communication, and interaction among data laborers, especially in social computing systems.

Relationship with Power

  • Collective withholding of data labor requires social connections among laborers, but those connections also create costs.
  • Without collaboration, data laborers lack a shared identity.
  • Building a shared identity among non-collaborative laborers is a crucial step toward empowerment and future action.
  • In theory, collaborative laborers can leverage their networks against their "employers."

Openness

  • Data labor is characterized by the accessibility of downstream data to the public.
  • Closed data labor is captured by private systems and excludes others from the resulting data.
  • At the most open end, copyleft licenses make the fruits of data labor publicly available.

Examples of Open Labor

  • Academia
  • Example: the Pushshift Reddit dataset.
  • Relevant regulations include the GDPR in the European Union and the CCPA in California.

Replaceability

  • Data laborers have more ability to directly affect technology performance when that performance depends directly on their labor.
  • For a technology to perform at its best, the requirements placed on the underlying data labor must be fully met.
  • Gaining and exerting labor power is harder for tasks that are easily replaced or that many similar people can perform.

Livelihood Overlap

  • This dimension captures whether data labor overlaps with data producers' occupational activities.
  • Foundation models are expected to increase livelihood overlap.
  • Companies may not establish formal jobs in place of data labor, leaving the workers without control over their work and its outcomes.
  • No overlap: activities such as web searching; usually low paying.
  • Overlap: activities such as writing code on GitHub; usually higher paying.

Discussion of the Article

  • The six dimensions identified and articulated above are only a starting point for understanding the rich variety of data labor activities.
  • Additional considerations emerge from the perspective of data-dependent operators.
  • This complexity calls for a direct discussion of how revenue is generated from data labor.
  • More analysis and study of these questions is needed.

Conclusion

  • The study synthesizes the literature on labor and data and offers a roadmap, organized around six dimensions, for empowering data producers and helping them take power back.
