Information Theory in English Prose

Questions and Answers

What does Shannon mean by the term ‘redundancy’ in the context of English prose?

  • The repetition of the same word or phrase multiple times.
  • The predictability of letters or words within a sequence. (correct)
  • The length of the words in sentences.
  • The presence of unnecessary letters in a word.

What percentage of English prose does Shannon estimate to be redundant?

  • 25 percent
  • 50 percent
  • 90 percent
  • 75 percent (correct)

Which of the following is an example of a highly redundant letter pairing in English?

  • tk
  • zs
  • th (correct)
  • qx

If a letter combination is very likely, what does that imply about the amount of information it conveys?

  • It conveys a minimal amount of information. (correct)

What does Shannon's work on quantifying information help us to measure with regards to language?

  • The likelihood of words appearing in specific orders. (correct)

Based on the passage, if writers use word combinations that are highly predictable, what is the effect on the information content of their work?

  • It decreases the information content. (correct)

In the context of Shannon's work, which of the following is NOT something that can be quantified?

  • The number of synonyms for a particular word. (correct)

The article mentions that when the first couple of letters of a word were correct, the remainder of the word was easily guessed. What does this suggest about human interpretation of written language?

  • Humans rely heavily on redundancy for comprehension. (correct)

What is the primary purpose of the validation process described in the text?

  • To demonstrate the effectiveness of the tool in attributing authorship. (correct)

Which of the following sources is NOT listed as a primary resource for authorship attribution?

  • Personal correspondence of playwrights. (correct)

What type of plays are EXCLUDED from the set of works used for building author profiles?

  • Court masques and civic entertainments. (correct)

What does the term 'relative entropy' refer to in the context of this validation?

  • A measure used to determine the stylistic distance between a play and an author’s profile. (correct)

What does the acronym 'WAN' refer to in the context of this study?

  • A profile of each playwright. (correct)

What is the primary basis for determining authorship before using the tool?

  • A reasonable consensus in scholarship. (correct)

What is the next step after calculating the relative entropy between plays and author profiles?

  • Testing known collaborative plays against the author profiles. (correct)

Why are plays surviving only as small fragments, such as Jonson's Mortimer's Fall, excluded from the validation process?

  • They do not contain sufficient text for a reliable stylometric analysis. (correct)

What does Shannon's mathematics of relative entropy quantify?

  • The likelihood of a data source emitting a given symbol after another. (correct)

How is the difference between two WANs calculated using Shannon's relative entropy?

  • By subtracting the natural logarithm of the weight in the second WAN from the natural logarithm of the weight in the first WAN, then multiplying by the weight in the first WAN and the limit probability of the originating node. (correct)

Why is the calculation of the difference between two WANs performed twice, switching the designation of 'first' and 'second'?

  • To measure the absolute difference, since calculations are dependent on the order. (correct)

What is 'entropy' in the context described in the text?

  • A measure of differential predictability within any symbolic system. (correct)

What do the 'symbols' represent when measuring collocation habits of writers?

  • Whole words. (correct)

What makes 'th' a common combination of symbols in English?

  • It is a frequently used consonant cluster in words. (correct)

What is the goal of comparing WANs directly?

  • To measure the totality of differences between all corresponding edges in two networks. (correct)

How is the author of a play determined in this method?

  • By identifying the author profile with the lowest relative entropy based on function word adjacencies. (correct)

What purpose does the deduction of the 'background reading' serve in the relative entropy calculation?

  • To normalize the relative entropy by creating a common reference for comparisons. (correct)

What is the significance of the 'limit probability' mentioned in the calculation?

  • It represents the long-term probability of the node from where the edge originates. (correct)

What does a relative entropy of '0' between a play and an author profile indicate?

  • The play is as similar to that author as it is to the collective works of all six authors. (correct)

What does a negative relative entropy between a play and an author profile suggest?

  • The play is more similar to that author's work than it is to the combined work of all six authors. (correct)

When testing a play against its known author's profile, what adjustment is made?

  • The play's analysis is excluded from the set of all plays by that author. (correct)

What is considered a successful attribution outcome for a collaborative play?

  • If the top ranked author was involved in the collaboration. (correct)

What was the accuracy rate achieved when considering only the 94 plays with undisputed authorship?

  • 93.6 percent. (correct)

What does the relative entropy method primarily use to distinguish between authors?

  • The adjacencies of function words found within their plays. (correct)

Which of the following words from the list is used to indicate a position relative to something else?

  • Amidst (correct)

Which word functions as a conjunction to link two contrasting ideas?

  • Although (correct)

Which of the following is NOT typically a preposition indicating location or direction?

  • Therefore (correct)

Which word on this list is used to indicate a point in time?

  • Since (correct)

Which of the words from the text is used to describe something that is 'on top of' or 'covering'?

  • Above (correct)

Which word could be categorized as indicating an exception or exclusion?

  • Barring (correct)

Which of these words from the list is primarily used to indicate a quantity?

  • Several (correct)

Among the provided function words, which one is used to express possession?

  • Its (correct)

Which word serves as a conjunction used to introduce a condition?

  • If (correct)

Which of the following is a function word that can indicate a direction or a pathway?

  • Along (correct)

Which play is attributed to Ben Jonson?

  • Every Man in His Humour (correct)

Which work was written by Thomas Middleton?

  • The Witch (correct)

What is the correct title of George Chapman's play associated with love and honor?

  • The Revenge of Bussy D’Ambois (correct)

Which of the following is not a play written by Robert Greene?

  • Monsieur D’Olive (correct)

Which play written by Thomas Middleton features themes around betrayal and revenge?

  • The Revenger’s Tragedy (correct)

Which of the following titles is written by George Peele?

  • The Arraignment of Paris (correct)

Which title was authored by Ben Jonson that reflects a comedic viewpoint on human nature?

  • The Devil Is an Ass (correct)

Which of the following pairs matches a playwright with their respective play?

  • Ben Jonson - The Staple of News (correct)

Flashcards

Redundancy in Language

The predictability of letter sequences in English prose, where certain letters are more likely to follow others.

Information Content

The amount of information a message carries, based on the likelihood of its components. A highly predictable sequence carries less information.
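
The inverse link between likelihood and information can be made concrete with Shannon's measure of self-information, the negative base-2 logarithm of a probability. The sketch below is illustrative only: the function name `surprisal_bits` and the two probabilities are assumptions chosen for the example, not values from the lesson.

```python
import math

def surprisal_bits(probability: float) -> float:
    """Return the information content, in bits, of an event with this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)

# Hypothetical probabilities, picked only to show the contrast.
likely = surprisal_bits(0.5)     # a highly predictable continuation
unlikely = surprisal_bits(0.01)  # a rare continuation
print(f"likely: {likely:.2f} bits, unlikely: {unlikely:.2f} bits")
```

The more probable the continuation, the fewer bits it carries, which is the sense in which a predictable sequence conveys little information.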

Shannon's Theory

Claude Shannon's theory quantifies the information content of messages, including text, by analyzing the predictability of its elements.

Writer's Preferences

The relative frequency of certain words or letter combinations in a text. These preferences influence the overall structure and information content.

Redundancy in English

The property of a language where the meaning of a message can be understood even with missing or inaccurate letters, due to the predictability of the language.

Guessing Missing Letters

The process of inferring the missing parts of a message based on context and language patterns, illustrating the redundancy present in written language.

Shannon's Mathematics

The mathematical framework for quantifying the information content of any message, whether letters, words, or other symbols.

Efficiency of Language

The overall predictability of a language allows for information to be conveyed with fewer elements, as the context and patterns provide a sufficient amount of information.

Entropy

A mathematical concept that measures the predictability of a sequence of symbols within a data source, like a writer's work.
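
As a rough illustration, entropy can be estimated from single-letter frequencies, and redundancy taken as the shortfall from the maximum possible entropy. This is a simplified sketch: it ignores the context between letters, so it understates the redundancy that Shannon's fuller analysis finds in English. The function names and the 26-letter alphabet are assumptions of the example.

```python
import math
from collections import Counter

def entropy_bits(text: str) -> float:
    """Shannon entropy, in bits per letter, of the letter frequencies in text."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def redundancy(text: str, alphabet_size: int = 26) -> float:
    """Fraction by which the observed entropy falls short of the maximum."""
    return 1.0 - entropy_bits(text) / math.log2(alphabet_size)

sample = "the theory of the thing is that the text is predictable"
print(f"entropy: {entropy_bits(sample):.3f} bits per letter")
print(f"redundancy: {redundancy(sample):.3f}")
```

A text that used all 26 letters equally often would have zero redundancy by this measure; a text made of one repeated letter would be fully redundant.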

Word Adjacency Network (WAN)

A network of words and the connections between them in a written text, used to analyze a writer's characteristic patterns of word usage.
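
One way such a network might be built is sketched below. The details are assumptions made for illustration, not the study's stated procedure: the look-ahead `window`, the `decay` that discounts more distant pairs, and the normalization of each node's outgoing weights so that they sum to 1 are all choices of this example.

```python
from collections import defaultdict

def build_wan(tokens, function_words, window=5, decay=0.75):
    """Build a word adjacency network as {source: {target: weight}}.

    Each function word is linked to the function words that follow it within
    `window` positions; nearer words contribute more (assumed scheme).
    """
    raw = defaultdict(lambda: defaultdict(float))
    for i, source in enumerate(tokens):
        if source not in function_words:
            continue
        for distance in range(1, window + 1):
            if i + distance >= len(tokens):
                break
            target = tokens[i + distance]
            if target in function_words:
                raw[source][target] += decay ** (distance - 1)
    # Normalize outgoing weights so each row reads as transition probabilities.
    wan = {}
    for source, targets in raw.items():
        total = sum(targets.values())
        wan[source] = {t: w / total for t, w in targets.items()}
    return wan

tokens = "the cat and the dog".split()
print(build_wan(tokens, {"the", "and"}, window=5, decay=0.5))
```

Normalizing the rows lets the network be read as a Markov chain over function words, which is what makes limit probabilities and relative entropy applicable.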

Relative Entropy

A mathematical method for comparing two WANs, quantifying their differences by examining the relative likelihood of words occurring together.

Symbol Probability

The likelihood that a data source emits a specific symbol, given the context in which it appears (for example, the symbol that precedes it).

Edge Weight

The specific weight assigned to each edge in a WAN, representing the frequency of co-occurrence between the connected nodes.

WAN Comparison Procedure

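A procedure for comparing two WANs edge by edge: the natural logarithm of an edge's weight in the second network is subtracted from the natural logarithm of its weight in the first, and the result is multiplied by the weight in the first network and by the limit probability of the edge's originating node.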

Node Limit Probability

The long-run probability of being at a specific node (word) in a WAN, determined by the node's position within the network and the strength of its connections.
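
If a WAN's outgoing weights are normalized to sum to 1 at each node, the limit probabilities can be approximated by repeatedly pushing probability mass along the edges until it settles. The sketch below uses this power-iteration approach as an assumption of the example; the uniform handling of nodes with no outgoing edges is likewise an illustrative choice, and the iteration is not guaranteed to settle for every possible network.

```python
def limit_probabilities(wan, iterations=200):
    """Approximate the long-run probability of each node in a normalized WAN."""
    nodes = sorted(set(wan) | {t for targets in wan.values() for t in targets})
    prob = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        for source in nodes:
            targets = wan.get(source)
            if not targets:
                # Assumed rule: a node with no outgoing edges spreads its mass evenly.
                for node in nodes:
                    nxt[node] += prob[source] / len(nodes)
                continue
            for target, weight in targets.items():
                nxt[target] += prob[source] * weight
        prob = nxt
    return prob

# Hypothetical two-node network.
toy_wan = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 1.0}}
print(limit_probabilities(toy_wan))
```

In the toy network, node "a" is visited about twice as often as node "b" in the long run, so edges leaving "a" would count for more in a comparison.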

Total Difference

The total difference between two WANs, calculated by summing the relative entropy values for each edge common to both networks.
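
The summation can be sketched as follows, using the per-edge quantity described in the quiz: the difference of natural logarithms, weighted by the first network's edge weight and the limit probability of the originating node. The function name, the dictionary layout, and the toy weights are assumptions of this example.

```python
import math

def relative_entropy(wan_first, wan_second, limit_first):
    """Sum the per-edge relative entropy over edges present in both networks."""
    total = 0.0
    for source, targets in wan_first.items():
        for target, weight_first in targets.items():
            weight_second = wan_second.get(source, {}).get(target, 0.0)
            if weight_first <= 0.0 or weight_second <= 0.0:
                continue  # skip edges that are not common to both networks
            total += (
                limit_first.get(source, 0.0)
                * weight_first
                * (math.log(weight_first) - math.log(weight_second))
            )
    return total

# Hypothetical one-node networks with different outgoing weights.
wan_a = {"the": {"and": 0.5, "of": 0.5}}
wan_b = {"the": {"and": 0.25, "of": 0.75}}
limits = {"the": 1.0}

print(relative_entropy(wan_a, wan_b, limits))  # first = A, second = B
print(relative_entropy(wan_b, wan_a, limits))  # first = B, second = A
```

The two directions give different values, which is why the calculation is run twice with the roles of "first" and "second" swapped.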

Stylometric Analysis

A method for determining the authorship of a text by analyzing its writing style and comparing it to known works of different authors.

Database of Early English Playbooks (DEEP)

A bibliographic database of printed plays from the early modern English period, consulted as a reference for the established authorship of each play.

Validation

The process of verifying a method's accuracy by using it to analyze texts with known authorship.

Sole-authored plays

Plays written solely by one author.

Collaborative Plays

Plays written by two or more authors.

Author Profile

A collection of data representing the writing style of a particular author.

Non-commercial drama

Plays that are not commercially produced, such as court masques or civic entertainments.

Discriminating Function Words

A collection of words that are most effective in distinguishing between authors based on their writing styles.

Background Reading

A play's relative entropy calculated against the collective works of all authors.

Play Attribution

The process of determining the most likely author of a play based on its relative entropy compared to different author profiles.
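
The ranking step might look like the sketch below: the background reading is deducted from each author's score, and the lowest adjusted value wins. The author labels and all numbers are hypothetical, and the function name `attribute` is an assumption of this example.

```python
def attribute(play_entropies, background):
    """Deduct the background reading and return (best_author, adjusted_scores)."""
    adjusted = {author: value - background for author, value in play_entropies.items()}
    best_author = min(adjusted, key=adjusted.get)
    return best_author, adjusted

# Hypothetical relative entropies of one play against three author profiles.
scores = {"Author A": 0.12, "Author B": 0.20, "Author C": 0.17}
background = 0.15  # hypothetical reading against the works of all authors combined

best, adjusted = attribute(scores, background)
print(best)
print(adjusted)
```

After the deduction, a negative score means the play sits closer to that author than to the combined body of work, and a score of zero means it is equally close to both.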

Collaborative Play Attribution

A collaborative play is considered correctly attributed if at least one of the actual collaborators is ranked as the most likely author based on the attribution method.

Attribution Accuracy

The accuracy of the attribution model is calculated by comparing the predicted authorship with the known or consensus authorship of plays.
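
A minimal sketch of the accuracy count follows, treating each play's accepted authorship as a set so that the collaborative-play rule (top-ranked author among the collaborators) falls out of the same membership check. The play results listed are hypothetical placeholders, and the function name is an assumption of this example.

```python
def attribution_accuracy(results):
    """Fraction of plays whose top-ranked author is among the accepted authors.

    `results` is a list of (predicted_author, accepted_authors) pairs.
    """
    correct = sum(1 for predicted, accepted in results if predicted in accepted)
    return correct / len(results)

# Hypothetical outcomes: two sole-authored plays and one collaboration.
results = [
    ("Author A", {"Author A"}),
    ("Author B", {"Author B", "Author C"}),  # collaboration, counted as correct
    ("Author A", {"Author C"}),              # misattribution
]
print(f"{attribution_accuracy(results):.1%}")
```

For scale, 88 correct attributions out of 94 plays would round to the 93.6 percent quoted in the quiz, though the quiz itself gives only the percentage.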

Authorship Attribution

The process of determining the most likely author of a text from its stylistic features and patterns, often using statistical and computational techniques.

Function Words

Words that connect other words, phrases, and clauses in a sentence. They carry little lexical meaning of their own but establish grammatical relationships.

Function Word Analysis

Analyzing and comparing function word frequencies to determine the most likely author of a text.

Function Word Style Differences

The differences in the use of function words between authors.

Function Word Signature

The specific way an author uses function words, which may reflect their education, time period, or personal preferences.

Function Word Frequency

The frequency of a certain function word within a text.

Function Word Attribution

A method used to determine the authorship of a text by analyzing and comparing the frequency of specific function words used.

Function Word List

A set of words that are commonly used in written text. They are typically short and frequently used, such as prepositions, conjunctions, and pronouns.

Function Word Selection

The process of selecting the most relevant function words for analysis in a specific attribution study.

Attributing the Authorship of Henry VI Plays

Studying the adjacency patterns of function words to determine who wrote the Henry VI plays traditionally attributed to William Shakespeare.

Henry VI Plays

A set of plays with disputed authorship, which may have been written by Shakespeare or other playwrights of his time.

The Alchemist (Jonson)

A play written by Ben Jonson, known for satirical humor and witty dialogue.

The Magnetic Lady (Jonson)

A play by Ben Jonson that features a love triangle and explores the concept of self-deception.

Bartholomew Fair (Jonson)

A comedy by Ben Jonson that satirizes the follies of society. It features a variety of colorful characters and is famous for its lively atmosphere.

Every Man in His Humour (Jonson)

A play by Ben Jonson that portrays a world where people are driven solely by their obsessions.

Bussy D’Ambois (Chapman)

A play by George Chapman that follows a story of betrayal, revenge, and the consequences of ambition.

A Chaste Maid in Cheapside (Middleton)

A play by Thomas Middleton that uses the setting of a London marketplace to examine themes of love, marriage, and social mobility.

A Game at Chess (Middleton)

A play by Thomas Middleton that is a political allegory set during the reign of James I.

The Revenger’s Tragedy (Middleton)

A play by Thomas Middleton that is well known for its violent and revenge-driven plot.

Study Notes

Attributing Authorship of Henry VI Plays

  • The study attributes authorship by word adjacency: how often particular words appear near one another.
  • Authors have distinct patterns of word adjacency (collocation).
  • Shannon's information theory quantifies how predictable those word choices are.
  • A word adjacency network (WAN) records which words appear close to which other words, and how strongly.
  • The proximity of rare words is studied as well as the frequency of common words.
  • Comparing WANs measures each author's characteristic word patterns quantitatively.

Methodology

  • Analyze word placement in phrases and collocations (words appearing near each other).
  • Use corpus-comparison software for large-scale analyses.
  • Account for the size differences in the examined texts.
  • Statistical analysis of lexical choices is employed to distinguish authors.
  • Word adjacency networks can identify subtle differences in style.
  • Focus on patterns, rather than individual words or frequencies.
  • Normalize data for meaningful comparisons across texts.

Results

  • Each play tested is attributed to the author whose profile it most closely matches.
  • The study supports the theory that authors have unique word placement patterns, quantifiable through word adjacency.
  • Patterns of collocation and phrasing are shown to differentiate authors.
  • The method successfully attributes authorship in contentious cases.