Questions and Answers
What is the primary goal of culturomics?
What is the approximate percentage of books ever printed that are included in the corpus used in this research?
What is the trend in forgetting events over time?
What is the name of the website where the full culturomics data set is available?
Which of the following is NOT mentioned as a factor affecting memory of past events?
Which of the following is NOT mentioned as a field that culturomics can provide insights into?
What does the author observe about the use of precise dates in recording historical events?
What is the author investigating regarding the assimilation of new information?
What are the two central factors that contribute to culturomic trends?
What does the author conclude about the relationship between inventing and forgetting?
Flashcards
Culturomics
The quantitative analysis of cultural trends through large corpora of text.
Cultural Change
Transformations in cultural norms and practices over time.
Linguistic Change
Alterations in language, including vocabulary and grammar, influenced by cultural shifts.
Collective Memory
N-grams
Regular verbs
Irregular verbs
-t suffix usage
Invention frequency decline
Study Notes
Quantitative Analysis of Culture Using Millions of Digitized Books
- A corpus of 5,195,769 digitized books (~4% of all books ever published) was created.
- Computational analysis of this corpus allows observation of cultural trends.
- The corpus comprises over 500 billion words, predominantly in English, French, Spanish, German, Chinese, Russian, and Hebrew.
- The oldest books date to the 1500s. The early decades contribute only a few books per year (several hundred thousand words); the corpus then grows to 98 million words per year by 1800, 1.8 billion by 1900, and 11 billion by 2000.
- The study focused on analyzing 1-gram and n-gram usage frequencies over time.
- A 1-gram is a sequence of characters without spaces (e.g., words, numbers, typos).
- An n-gram is a sequence of n 1-grams.
- Frequencies were calculated by dividing the number of instances of a given n-gram in a given year by the total words in the corpus of that year.
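The frequency calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual pipeline: the function names (`ngrams`, `yearly_frequency`), the whitespace tokenizer, and the toy corpus are all assumptions made for the example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every n-gram (tuple of n consecutive 1-grams) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def yearly_frequency(texts_by_year, target):
    """Relative frequency of `target` in each year.

    texts_by_year: dict mapping year -> list of document strings (toy data).
    target: the n-gram as a space-separated string, e.g. "Great War".
    """
    target_tokens = tuple(target.split())
    n = len(target_tokens)
    result = {}
    for year, texts in texts_by_year.items():
        total_words = 0
        hits = 0
        for text in texts:
            # Assumption: a 1-gram is any run of characters without spaces.
            tokens = text.split()
            total_words += len(tokens)
            hits += Counter(ngrams(tokens, n))[target_tokens]
        # Instances of the n-gram divided by total words for that year.
        result[year] = hits / total_words if total_words else 0.0
    return result

# Hypothetical two-year corpus, invented purely for illustration.
toy_corpus = {
    1920: ["the Great War ended", "after the Great War"],
    1950: ["World War I began in 1914"],
}
freqs = yearly_frequency(toy_corpus, "Great War")
```

Here `freqs[1920]` is 2 occurrences over 8 words, and `freqs[1950]` is zero because the phrase does not occur in that year's text.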
Cultural Trends and Linguistic Changes
- Cultural change influences concepts and subjects discussed (e.g., slavery).
- Linguistic change, rooted in culture, affects the words used (e.g., "the Great War" vs. "World War I").
- The analysis covers linguistic change (lexicon, grammar), collective memory (how people and events are remembered), technology adoption, the pursuit of fame, censorship, and historical epidemiology.
Size of the English Lexicon
- Common 1-grams (those with a frequency above one per billion) were identified in the corpus for 1900, 1950, and 2000, and the results were compared against dictionaries.
- The English lexicon was estimated as 544,000 words in 1900; 597,000 in 1950; and 1,022,000 in 2000.
- The lexicon grew rapidly between 1950 and 2000: roughly 8,500 words were added per year, expanding it by over 70% in those 50 years.
- Many words found in the corpus weren't present in dictionaries.
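The growth figures follow directly from the three lexicon estimates listed above. The short check below reproduces them; the variable and function names are chosen only for this example.

```python
# Lexicon size estimates quoted in the notes above.
lexicon_size = {1900: 544_000, 1950: 597_000, 2000: 1_022_000}

def growth(start_year, end_year):
    """Return (words added per year, percent growth) between two estimates."""
    added = lexicon_size[end_year] - lexicon_size[start_year]
    per_year = added / (end_year - start_year)
    percent = 100 * added / lexicon_size[start_year]
    return per_year, percent

per_year_late, percent_late = growth(1950, 2000)    # 425,000 words added
per_year_early, percent_early = growth(1900, 1950)  # 53,000 words added
```

The 1950 to 2000 interval gives 8,500 words per year and about 71% growth, matching the figures in the notes. The 1900 to 1950 interval gives only about 1,060 words per year, which shows how much the growth accelerated.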
Evolution of Grammar
- Irregular verbs were studied: verbs such as "stick/stuck" that are conjugated idiosyncratically.
- Regular verbs form the past tense by adding "-ed" (e.g., jump/jumped); irregular verbs do not follow this pattern (e.g., stick/stuck, come/came, get/got).
- High-frequency irregular verbs are more resistant to replacement.
- Regularization of irregular verbs (e.g., from "burnt" to "burned") is gradual, and its pace differs from verb to verb and from one part of the world to another.
- The collapse of a group of irregular verbs that use "-t" instead of "-ed" has been a significant driver of regularization in the last 200 years.
Forgetting and Cultural Adoption
- The frequency of a year's 1-gram (e.g., "1951") peaks near that year and steadily declines afterward, suggesting that events are forgotten with time.
- The half-life of these peaks has decreased significantly: more recent years fade from use faster than earlier ones did.
- Cultural adoption of new things has gradually accelerated.
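One way to measure the half-life of a peak is to count the years between the peak and the first later point at which the frequency has dropped to half the peak value. The sketch below assumes that definition. The `half_life` helper and both frequency series are hypothetical and are not taken from the corpus.

```python
def half_life(series):
    """Years from the peak until the frequency first falls to half the peak.

    series: dict mapping year -> frequency.
    Returns None if the frequency never falls that far.
    """
    peak_year = max(series, key=series.get)
    half = series[peak_year] / 2
    for year in sorted(y for y in series if y > peak_year):
        if series[year] <= half:
            return year - peak_year
    return None

# Invented frequency curves (arbitrary units) for an earlier and a later peak.
early_peak = {1900: 10.0, 1910: 8.0, 1920: 6.5, 1930: 5.2, 1940: 4.0}
late_peak = {1970: 10.0, 1975: 7.0, 1980: 5.0, 1985: 3.0}

early_half_life = half_life(early_peak)  # 40: first at or below 5.0 in 1940
late_half_life = half_life(late_peak)    # 10: reaches 5.0 in 1980
```

A shorter half-life for the later peak is the pattern the notes describe as faster forgetting.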
Fame Analysis
- The frequency of a person's name was used to track their rise to, and decline from, fame.
- The trajectories are all similar: a pre-celebrity period, a rapid rise to peak fame, and a slow decline.
Censorship and Suppression
- Culturomic tools aid lexicographers in identifying low-frequency words and offering current frequency trend estimates.
- Nazi censorship of artists (e.g., Marc Chagall) and writers had a quantifiable impact: their names appear markedly less often in German-language books of the period.
- Suppression indices were computed, and victims of Nazi repression were identified.
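The notes do not say how a suppression index is computed. One plausible form, assumed here purely for illustration, is the ratio of a name's mean frequency during the period of interest to its mean frequency in reference windows before and after. The function name, the window boundaries, and the data below are all hypothetical.

```python
from statistics import mean

def suppression_index(series, period, reference_windows):
    """Mean frequency in `period` divided by mean frequency in the references.

    series: dict mapping year -> frequency of a name.
    period: (start_year, end_year), inclusive.
    reference_windows: list of (start_year, end_year) baseline windows.
    Values well below 1 suggest the name appeared less often than expected.
    """
    def window_values(start, end):
        return [f for y, f in series.items() if start <= y <= end]

    during = window_values(*period)
    baseline = [f for w in reference_windows for f in window_values(*w)]
    if not during or not baseline or mean(baseline) == 0:
        return None  # not enough data to form a ratio
    return mean(during) / mean(baseline)

# Invented frequencies (arbitrary units) for a single name.
name_freq = {1928: 4.0, 1930: 4.0, 1936: 1.0, 1940: 1.0, 1958: 4.0, 1960: 4.0}
index = suppression_index(
    name_freq,
    period=(1933, 1945),                              # assumed window
    reference_windows=[(1925, 1932), (1955, 1965)],   # assumed baselines
)
```

With these toy numbers the index is 0.25, meaning the name appeared at a quarter of its baseline rate during the period.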
Culturomics Applications
- Culturomics applies high-throughput data collection and analysis to the study of human culture, and can extend beyond books to newspapers, manuscripts, artwork, and other sources.
- Historical epidemiology, Civil War studies, gender studies, and more can all be investigated with the method.
Description
Explore the fascinating quantitative analysis of cultural trends using a corpus of over 5 million digitized books. This quiz delves into the methodology of calculating n-gram frequencies and how they reflect linguistic and cultural changes over time. Test your knowledge on the impact of these historical texts and their significance in understanding cultural dynamics.