Cultural Trends in Digitized Books Analysis

Questions and Answers

What is the primary goal of culturomics?

  • Studying the evolution of grammar
  • Analyzing trends in the English language
  • Investigating cultural trends quantitatively (correct)
  • Understanding the impact of technology on culture

What is the approximate percentage of books ever printed that are included in the corpus used in this research?

  • 5%
  • 12%
  • 4% (correct)
  • 15%

What is the trend in forgetting events over time?

  • Forgetting speeds up with each passing year. (correct)
  • Forgetting rates fluctuate randomly.
  • Forgetting slows down over time.
  • Forgetting occurs at a constant rate.

What is the name of the website where the full culturomics data set is available?

  • www.culturomics.org (correct)

Which of the following is NOT mentioned as a factor affecting memory of past events?

  • The regularity of the verb used to describe the event. (correct)

Which of the following is NOT mentioned as a field that culturomics can provide insights into?

  • History of Science (correct)

What does the author observe about the use of precise dates in recording historical events?

  • The use of precise dates is becoming more frequent. (correct)

What is the author investigating regarding the assimilation of new information?

  • Whether the rate of forgetting affects the rate of learning new information. (correct)

What are the two central factors that contribute to culturomic trends?

  • Linguistic change and cultural change (correct)

What does the author conclude about the relationship between inventing and forgetting?

  • Inventions are forgotten more quickly than other events. (correct)

Flashcards

Culturomics

The quantitative analysis of cultural trends through large corpora of text.

Cultural Change

Transformations in cultural norms and practices over time.

Linguistic Change

Alterations in language, including vocabulary and grammar, influenced by cultural shifts.

Collective Memory

Shared memories of a group that shape identity and cultural narratives.

N-grams

Contiguous sequences of 'n' items from a given sample of text, used in linguistic analysis.

Regular verbs

Verbs whose regularity (the share of past-tense uses that take the -ed form) is greater than 50%.

Irregular verbs

Verbs with a regularity below 50%, whose past tense usually does not follow the standard -ed rule.

-t suffix usage

Some verbs form the past tense with -t instead of -ed (e.g., learnt, spoilt).

Invention frequency decline

Mentions of an invention decline in the years after they reach their peak frequency, indicating that inventions lose recognition over time.

Study Notes

Quantitative Analysis of Culture Using Millions of Digitized Books

  • A corpus of 5,195,769 digitized books (~4% of all books ever published) was created.
  • Computational analysis of this corpus allows observation of cultural trends.
  • The corpus comprises over 500 billion words, predominantly in English, French, Spanish, German, Chinese, Russian, and Hebrew.
  • The oldest books date back to the 1500s. The corpus grows from several hundred thousand words per year in the earliest decades to 98 million words per year by 1800, 1.8 billion by 1900, and 11 billion by 2000.
  • The study focused on analyzing 1-gram and n-gram usage frequencies over time.
  • A 1-gram is a sequence of characters without spaces (e.g., words, numbers, typos).
  • An n-gram is a sequence of n 1-grams.
  • Frequencies were calculated by dividing the number of instances of a given n-gram in a given year by the total words in the corpus of that year.
  • Cultural change influences concepts and subjects discussed (e.g., slavery).
  • Linguistic change, rooted in culture, affects the words used (e.g., "the Great War" vs. "World War I").
  • Analysis encompasses linguistic changes (lexicon, grammar); cultural phenomena (remembering people/events); and technology adoption, fame pursuit, censorship, and epidemiology.
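
The frequency calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual pipeline: the function names `ngrams` and `yearly_frequency` and the toy corpus are assumptions made for the example, and tokenization is simplified to splitting on whitespace.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive 1-grams, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def yearly_frequency(texts_by_year, phrase):
    """Map each year to (occurrences of phrase) / (total 1-grams in that year)."""
    n = len(phrase.split())
    freq = {}
    for year, text in texts_by_year.items():
        tokens = text.split()  # a 1-gram is a string uninterrupted by a space
        counts = Counter(ngrams(tokens, n))
        freq[year] = counts[phrase] / len(tokens) if tokens else 0.0
    return freq

# Toy corpus invented for illustration; it is not real data.
toy = {
    1920: "the Great War ended and the Great War was remembered",
    1950: "World War I is what we now call the Great War",
}
print(yearly_frequency(toy, "Great War"))
```

Dividing by the year's total word count normalizes for the corpus growing over time, so frequencies from different years are comparable.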

Size of the English Lexicon

  • Common 1-grams (frequency above one per billion) were sampled for the years 1900, 1950, and 2000 to estimate how many of them are genuine words.
  • The English lexicon was estimated as 544,000 words in 1900; 597,000 in 1950; and 1,022,000 in 2000.
  • The lexicon grew by about 8,500 words per year between 1950 and 2000, an expansion of over 70% in those 50 years.
  • Many words found in the corpus weren't present in dictionaries.
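
The growth figures above follow directly from the three lexicon estimates. A short check, using only the numbers quoted in these notes (the helper name `growth` is an assumption for the example):

```python
# Lexicon size estimates quoted in the notes above.
lexicon = {1900: 544_000, 1950: 597_000, 2000: 1_022_000}

def growth(start, end):
    """Return (words added per year, percentage growth) between two years."""
    added = lexicon[end] - lexicon[start]
    per_year = added / (end - start)
    percent = 100 * added / lexicon[start]
    return per_year, percent

per_year, percent = growth(1950, 2000)
print(f"{per_year:.0f} words/year, {percent:.1f}% growth")
```

For 1950 to 2000 this gives 8,500 words per year and roughly 71% growth, in line with the figures in the list.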

Evolution of Grammar

  • Irregular verbs were studied: verbs that are conjugated idiosyncratically (e.g., stick/stuck, come/came, get/got).
  • Regular verbs, by contrast, form the past tense by adding "-ed" (e.g., jump/jumped).
  • High-frequency irregular verbs are more resistant to replacement.
  • Regularization of irregular verbs (e.g., from "burnt" to "burned") shows a gradual trend, with some regularizing more rapidly than others in different parts of the world.
  • The collapse of a group of irregular verbs whose past tense uses "-t" instead of "-ed" (e.g., burnt, spoilt) has been a significant driver of regularization in the last 200 years.
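
One way to make the regular/irregular distinction concrete is the 50% regularity threshold from the flashcards. The sketch below assumes that regularity means the share of a verb's past-tense occurrences that use the "-ed" form; the function names and the counts are invented for illustration and are not taken from the corpus.

```python
def regularity(ed_count, irregular_count):
    """Share of past-tense occurrences that use the regular -ed form."""
    total = ed_count + irregular_count
    if total == 0:
        raise ValueError("no past-tense occurrences to classify")
    return ed_count / total

def classify(ed_count, irregular_count):
    """Label a verb 'regular' when its regularity exceeds 50%."""
    if regularity(ed_count, irregular_count) > 0.5:
        return "regular"
    return "irregular"

# Invented counts for illustration only: (uses of "burned", uses of "burnt").
toy_counts = {1800: (20, 80), 1900: (45, 55), 2000: (75, 25)}
labels = {year: classify(*pair) for year, pair in toy_counts.items()}
print(labels)
```

Tracking this ratio year by year is what lets a gradual shift such as "burnt" to "burned" show up as a trend rather than a single switch.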

Forgetting and Cultural Adoption

  • The frequency of a specific year used as a 1-gram (e.g., "1951") peaks around that year and then declines steadily, which suggests that events are forgotten with time.
  • New things are being adopted into the culture at a gradually accelerating pace.
  • The half-life of such peaks (the time taken for frequency to fall to half its peak value) has decreased significantly.
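
The half-life mentioned above can be read as the number of years a frequency series takes to fall to half of its peak value. The following is a rough sketch under that assumption; `peak_half_life` is a hypothetical helper and the series is made up, so it shows the idea rather than the study's exact procedure.

```python
def peak_half_life(series):
    """Years from the peak until frequency first drops to half the peak or less.

    `series` maps year -> frequency. Returns None if the series never
    falls that far after its peak.
    """
    years = sorted(series)
    peak_year = max(years, key=lambda y: series[y])
    half = series[peak_year] / 2
    for year in years:
        if year > peak_year and series[year] <= half:
            return year - peak_year
    return None

# Invented frequencies (arbitrary units) for the 1-gram "1951".
toy_series = {1950: 1.0, 1951: 8.0, 1952: 7.0, 1955: 5.0, 1960: 4.0, 1970: 2.0}
print(peak_half_life(toy_series))
```

Comparing this value across many target years is one way to see whether the decline after the peak is getting faster.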

Fame Analysis

  • The frequency of a person's name was used to track their rise to fame and subsequent decline.
  • The trajectories are all similar: a pre-celebrity period, a rapid rise to peak fame, and a slow decline.

Censorship and Suppression

  • Culturomic tools aid lexicographers in identifying low-frequency words and offering current frequency trend estimates.
  • Nazi censorship of artists (e.g., Marc Chagall) and writers had a quantifiable impact: their names appear far less often in German books published during the Nazi era.
  • Suppression indices were computed, and victims of Nazi repression were identified.
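
These notes do not say how the suppression indices were defined. As a hedged illustration only, one plausible form compares a name's mean frequency during a period with its mean frequency in surrounding reference years; the function name, the choice of reference years, and the data below are all assumptions.

```python
from statistics import mean

def suppression_index(series, period, reference_years):
    """Mean frequency inside `period` divided by mean frequency in reference years.

    `series` maps year -> frequency; `period` is an inclusive (start, end) pair.
    Values well below 1 suggest the name was mentioned unusually rarely.
    """
    start, end = period
    during = [f for year, f in series.items() if start <= year <= end]
    baseline = [series[y] for y in reference_years if y in series]
    if not during or not baseline or mean(baseline) == 0:
        raise ValueError("not enough data to compute an index")
    return mean(during) / mean(baseline)

# Invented frequencies (arbitrary units) for a single name.
toy_name = {1930: 4.0, 1931: 4.0, 1936: 1.0, 1940: 1.0, 1950: 6.0, 1951: 6.0}
index = suppression_index(toy_name, (1933, 1945), [1930, 1931, 1950, 1951])
print(index)
```

Ranking many names by such a ratio is one way that unusually suppressed individuals could be flagged for closer inspection.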

Culturomics Applications

  • Culturomics utilizes high-throughput data collection and analysis for human culture study, encompassing books, newspapers, manuscripts, artwork, etc.
  • Historical epidemiology, Civil War studies, gender studies, and more can all be investigated with the method.
