Questions and Answers
What is the linguistic term for an individual's unique version of a language?
- Dialect
- Sociolect
- Idiolect (correct)
- Vernacular
The concept of linguistic fingerprinting guarantees a perfect match in forensic authorship analysis.
False (B)
What case provided an early example of the forensic application of idiolectal co-selection?
Unabomber case
De Morgan hypothesized that average ______ length would be writer-specific and virtually constant.
Match the following linguists with their contributions to authorship attribution:
Which of the following factors did Winter and Woolls consider to be potentially significant in authorship attribution?
Honoré's formula for vocabulary richness accurately differentiates between open and closed set items.
What is the term for words that a writer uses only once in a text?
______ is a statistical approach to authorship analysis that utilizes the cumulative sum of deviations from the average.
According to psychological evaluations, what is a limitation of the CUSUM method?
The CUSUM method has consistently demonstrated its reliability in distinguishing between single and multiple authors.
In their analysis of The Federalist Papers, what preference did Mosteller and Wallace discover that distinguished Hamilton from Madison?
Matthews and Merriam trained a ______ network to distinguish between Shakespeare's plays and those by Marlowe.
Match the following discourse markers with the musician who used them more frequently in press interviews, according to Kredens:
What is the first major question an analyst may be asked when tasked with looking for consistency in authorship?
Mistakes and errors, as defined by Corder, are not useful as authorship markers.
According to McMenamin, what is required for authorship attribution, given that unique markers are rare?
McMenamin labels the task of comparing a ransom note with a set of writings looking for ______.
In the JonBenét Ramsey case, what method did McMenamin use to elicit richer data from the suspects?
Linguistic analysis played no role in Derek Bentley's conviction for murder.
In the Derek Bentley case, what phrase in the confession was used to imply Bentley knew Craig had a gun?
In the Derek Bentley case, frequent use of the word '______' in the confession was considered a feature of police register.
What did evidence show in Derek Bentley's confession regarding a negative response?
What is a phrase that indicates possible coerced input in the Derek Bentley statements?
The use of a word is never an indication of a person's specific language use.
In the passage 'On textual borrowing', the author claims the artist concealed the relationship between texts that ______ otherwise create with source materials.
Looking at textual borrowing from the outside, what is a potential challenge concerning proof of plagiarism?
All artistic plagiarism is a criminal offense.
When did text analysis become something that was seen as plagiarism?
One possible issue with relying too heavily on text messaging is the ______ of their abbreviations.
Match the type of information with how it could be used in text analysis:
What concept defines how a reader allows a writer time to clarify a potentially combative point in writing?
Pace isn't as easily seen or measured as other attributes of writing.
What term describes the decision to relexicalize while discussing the same topic?
Texts can have a certain number of ______ which can cause difference in statistics of style.
Flashcards
Idiolect
A person's individual and unique version of a language, manifested through distinctive and idiosyncratic choices in speech and writing.
Linguistic Fingerprinting
The idea that linguistic patterns can uniquely identify an author, similar to a physical fingerprint.
Idiolectal Coselection
The use of individual linguistic habits and patterns to attribute authorship.
CUSUM
Hapax Legomena
Specific Analyses
Mistakes
Errors
Qualitative Analysis
Quantitative Analysis
Resemblance Analysis
Significant Features
Elegant variation
Study Notes
- Every native speaker has an individual version of their language, termed an idiolect, which shapes their choices in speech and writing.
- Bloch is credited with the first use of the term idiolect in 1948.
- The concept traces back to Coleridge's early nineteenth-century work, Biographia Literaria.
- Speakers develop large vocabularies over time, leading to variations in word selection preferences.
- Speakers tend to make typical and individuating co-selections of preferred words.
- The concept of a linguistic fingerprint is a misleading metaphor in the context of authorship investigations.
- Even large linguistic samples provide only partial information about their creator.
- Forensic linguists often analyze short texts, like suicide notes and ransom demands.
- Texts often contain clues that restrict the number of possible authors.
- The task involves selecting or deselecting from a small number of candidate authors.
The Unabomber Case
- The Unabomber case provides an example of idiolectal coselection.
- From 1978 to 1995, the Unabomber sent bombs through the mail, targeting university and airline workers.
- In 1995, six national publications received a 35,000-word manuscript, Industrial Society and its Future, from the Unabomber along with an offer to stop sending bombs if the manuscript were published.
- The Washington Post published the manuscript, and a reader recognized the writing style as similar to his brother's.
- 'Cool-headed logician' was identified as a distinct idiolectal preference.
- The FBI found a 300-word newspaper article written a decade earlier with similar language, indicating common authorship.
- Robin Lakoff argued that shared vocabulary does not confirm authorship.
- Lakoff singled out 12 common words and phrases that could occur in any argumentative text.
- On the internet, the FBI found millions of documents containing one or more of the 12 items.
- Narrowing the search to documents with all 12 items resulted in only 69 documents.
- All 69 documents proved to be versions of the manifesto.
- This rebutted the view that the items were simply open authorial choices available to any writer; their co-occurrence is an example of idiolectal co-selection.
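The contrast between the "any item" search and the "all items" search above can be sketched in a few lines of Python. This is a minimal illustration that assumes simple case-insensitive substring matching. The marker phrases and the four mini-documents are hypothetical placeholders. They are not the 12 items or the documents from the case.

```python
# Hypothetical marker phrases and mini-corpus, for illustration only;
# these are NOT the 12 items from the Unabomber investigation.
MARKERS = ["at any rate", "in practice", "more or less"]

documents = {
    "doc_a": "At any rate, the plan failed in practice.",
    "doc_b": "More or less everyone agreed.",
    "doc_c": "In practice, at any rate, it worked more or less as planned.",
    "doc_d": "Nothing relevant here.",
}

def contains(text: str, item: str) -> bool:
    """Case-insensitive substring test for a word or phrase."""
    return item.lower() in text.lower()

def any_match(docs: dict[str, str], items: list[str]) -> list[str]:
    """Documents containing at least one marker item (the broad search)."""
    return [name for name, text in docs.items() if any(contains(text, i) for i in items)]

def all_match(docs: dict[str, str], items: list[str]) -> list[str]:
    """Documents containing every marker item (the co-selection filter)."""
    return [name for name, text in docs.items() if all(contains(text, i) for i in items)]

print(any_match(documents, MARKERS))  # ['doc_a', 'doc_b', 'doc_c']
print(all_match(documents, MARKERS))  # ['doc_c']
```

The substring matching here is deliberately naive. For example, "rate" would also match "accurate". A real search would tokenize the text and respect word boundaries.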
Early Interest in Authorship Attribution
- The question of authorship has been interrogated for over 2,000 years.
- Davis humorously reported an early unsuccessful attempt in Greece in the fourth century BC.
- Philosophers Heraklides and Dionysius had a falling out, and Dionysius wrote a tragedy and presented it as a recently discovered work.
- Heraklides, renowned for literary criticism, affirmed the play was written by Sophocles.
- Dionysius revealed his authorship, but Heraklides insisted on his initial judgment.
- Dionysius asked him to explain why the first letters of the first eight lines of the play formed an acrostic spelling the name of his lover.
- Dionysius then showed that the first letters of the next lines formed a couplet.
- The lines after that contained a further acrostic, reading 'Heraklides knows nothing whatsoever about literature'.
Linguistic Regularities
- In 1851, De Morgan proposed solving authorship questions by measuring individual linguistic regularities.
- Average word length, measured by letters per word, was hypothesized to be writer-specific and virtually constant.
- In 1887, Mendenhall counted word lengths in the Pauline letters and in works by Shakespeare, Marlowe and Bacon.
- Word-length scores for Marlowe's later plays correlated more closely with those for Shakespeare's plays.
- Neither Mendenhall nor anyone else re-used or developed the method.
- Word length was one of 11 authorship markers that survived Grant's reliability tests.
- In 1938, Yule proposed average sentence length as a marker likely to discriminate well.
- A study by Winter and Woolls in 1996 combined this measure with lexical richness.
- Both of these markers were among those approved by Grant.
- Winter and Woolls were challenged by a literature colleague to distinguish between the styles of two late-Victorian authors who had jointly written a novel.
- They were provided with 1,000-word extracts from each of the first five and last six chapters, along with 2,500 words from another novel.
- The markers analyzed were average sentence length and lexical richness.
- The rationale for sentence length was that the sentence boundary acts as a point of interaction, where the reader allows the writer time to clarify a potentially contentious point.
- Pace, the rate at which new content is introduced, is a feature under the writer's direct control and is reflected in the rate at which new vocabulary appears.
- This is amplified by elegant variation, which is the decision to relexicalize while discussing the same topic.
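The two earliest measures can be sketched as follows: De Morgan's average word length (letters per word) and Yule's average sentence length. The tokenization rules are simplifying assumptions, not the definitions used in the studies cited. Here a word is a run of letters, and a sentence ends at '.', '!' or '?'.

```python
import re

def tokenize(text: str) -> list[str]:
    # Assumption: a word is a run of letters, optionally with one internal apostrophe.
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)

def average_word_length(text: str) -> float:
    """Mean number of letters per word (apostrophes are not counted)."""
    words = tokenize(text)
    letters = sum(ch.isalpha() for w in words for ch in w)
    return letters / len(words) if words else 0.0

def split_sentences(text: str) -> list[str]:
    # Naive split at '.', '!' or '?'; abbreviations such as "Mr." will be mis-split.
    return [s for s in re.split(r"[.!?]+\s*", text) if s.strip()]

def average_sentence_length(text: str) -> float:
    """Mean number of words per sentence."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)

sample = "The cat sat. It was a very long day for the small cat!"
print(round(average_word_length(sample), 2))  # 3.08
print(average_sentence_length(sample))        # 6.5
```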
Honoré (1979) on Vocabulary Choice
- Honoré conducted an early study on vocabulary choice.
- Honoré stated that the frequency of hapax legomena is a measure of vocabulary richness.
- He created the formula R = 100 × log N / (1 − V₁/V), where N is the total text length in words, V₁ is the number of hapaxes and V is the number of vocabulary types.
- Honoré conflated open- and closed-set items by measuring lexical and grammatical items together.
- The four most frequent grammatical items account for 14% of written text.
- Winter and Woolls therefore resolved to measure lexical richness using lexical items only when comparing texts.
- Sentence length and lexical richness were then calculated for each individual 1,000-word extract.
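Honoré's formula can be sketched directly from the definitions above. This sketch assumes the natural logarithm and lower-cased whitespace tokens. The optional `exclude` parameter is a hypothetical addition. It drops grammatical items before counting, roughly in the spirit of the lexical-only measure Winter and Woolls adopted.

```python
import math
from collections import Counter

def honore_r(tokens: list[str], exclude: frozenset[str] = frozenset()) -> float:
    """Honoré's R = 100 * log(N) / (1 - V1/V), using the natural log (an assumption).

    N  = number of tokens kept, V = number of distinct word types,
    V1 = number of hapax legomena (types occurring exactly once).
    `exclude` optionally removes grammatical (closed-set) items first.
    """
    kept = [t.lower() for t in tokens if t.lower() not in exclude]
    counts = Counter(kept)
    n, v = len(kept), len(counts)
    v1 = sum(1 for c in counts.values() if c == 1)
    if v == 0 or v1 == v:
        # Denominator would be zero: every type is a hapax (or the text is empty).
        raise ValueError("R is undefined when every word type occurs once")
    return 100 * math.log(n) / (1 - v1 / v)

tokens = "the cat saw the dog and the cat ran".split()
print(round(honore_r(tokens), 1))                             # 659.2 (all items)
print(round(honore_r(tokens, frozenset({"the", "and"})), 1))  # 643.8 (lexical items only)
```

Removing "the" and "and" changes N, V and V₁ together. This is why the conflation of open- and closed-set items matters when comparing texts.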
Winter and Woolls
- Winter and Woolls' results indicated a stylistic difference between the two authors.
- The penultimate chapter was found to have scores comparable with those of chapters 2 and 4.
- The scores for chapter 33 fitted with those for chapters 1, 3 and 5.
- The scores for the remaining four chapters, 28–31, led Winter and Woolls to suggest that the authors had collaborated on writing them.
- The values for the single-author novel were comparable with those for one group of chapters, indicating that those chapters shared its author.
- The book was revealed to be Adrian Rome, by Arthur Moore and Ernest Dowson; Dowson also wrote Souvenirs of an Egoist.
- Letters confirmed chapter authorship and timeline.
- Coulthard reports using the methodology to compare the style of six 1,000-word extracts.
- Clemit and Woolls investigated the authorship of eighteenth-century pamphlets, considering lexical richness and hapax dislegomena.
- The analysis assigned the texts to William Godwin.
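Hapax legomena (words used once) and hapax dislegomena (words used twice) can both be read from a text's frequency spectrum. This is a minimal sketch that assumes lower-cased whitespace tokens and an invented sample sentence.

```python
from collections import Counter

def frequency_spectrum(tokens: list[str]) -> Counter:
    """Map each frequency i to the number of word types used exactly i times."""
    type_counts = Counter(t.lower() for t in tokens)
    return Counter(type_counts.values())

tokens = "the cat saw the dog and the cat ran".split()
spectrum = frequency_spectrum(tokens)
hapax_legomena = spectrum[1]     # types used once: saw, dog, and, ran
hapax_dislegomena = spectrum[2]  # types used twice: cat
print(hapax_legomena, hapax_dislegomena)  # 4 1
```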
Cusum
- Morton and Michaelson (1990) describe a statistical approach to authorship to be used in court.
- The approach revives de Morgan's claim that writers have consistent, measurable 'habits'.
- Counter-intuitively for linguists, the habits measured were the numbers of nouns, words beginning with a vowel, and words of two to four letters in each sentence.
- For sentence length and for each habit, the deviation of every sentence from the average for the whole text was calculated separately and added up cumulatively, giving a CUmulative SUM.
- Hence the method's name, CUSUM.
- The resulting scores are plotted as graphs and superimposed on one another.
- To exemplify the method, imagine a text with an average of 12 words per sentence and an average of five two- and three-letter words per sentence.
- The first three sentences of the text are 20, 12 and 6 words long and contain 7, 5 and 3 two- and three-letter words respectively.
- Assumption: because habits are constant, the graphs for the different measurements should shadow each other.
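The worked example above can be checked with a short CUSUM sketch. It computes only the raw cumulative deviations from the whole-text averages. In practice the series are typically rescaled before being superimposed, and that step is not shown here.

```python
from itertools import accumulate

def cusum(values: list[float], mean: float) -> list[float]:
    """Running total of each sentence's deviation from the whole-text average."""
    return list(accumulate(v - mean for v in values))

sentence_lengths = [20, 12, 6]  # words per sentence (worked example)
short_words = [7, 5, 3]         # two- and three-letter words per sentence

print(cusum(sentence_lengths, 12))  # [8, 8, 2]
print(cusum(short_words, 5))        # [2, 2, 0]
```

Both series stay level at sentence 2 and fall at sentence 3. Under the CUSUM assumption, this kind of shadowing is what a single author's text is expected to show.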
Method Limitations
- The method, which appeared to work without a linguistic basis, was disturbing for linguists.
- Farringdon in 1996 compared the method to fingerprint analysis.
- Academic psychologists tested the method, with devastating results, showing the initial assumption to be incorrect.
- Morton in 1995 rejected their observations.
- Canter and Chester in 1997 set out to test the ability of CUSUM to reliably detect whether a text had a single or multiple authors.
- Results initially seemed promising, as all unaltered texts were classified as written by a single author; unfortunately, so were all of the multiple-author texts.
- Hardcastle, a Home Office-trained document examiner, concluded that forensic scientists should turn their attention to other methods.
More Authorship Investigation Methods
- The analyses above center on markers that permeate all sections of a text, but specific analyses can focus on smaller sets of items, looking for differential use.
- Mosteller and Wallace in 1964 analyzed the Federalist Papers, 85 essays published anonymously in 1787-8.
- The 85 essays were written by Alexander Hamilton, James Madison, and John Jay.
- Assuming that each author had distinctive selection preferences, they analyzed texts of known authorship to find items whose usage differed between the authors.
- Hamilton used 'upon' more than Madison, and Madison used 'also' more than Hamilton.
- This analysis assigned all 12 disputed texts to Madison.
- The Federalist Papers have been a testing ground for the latest stylometric methods since then.
- Neural networks, modelled loosely on the human brain, 'learn' from their mistakes during training.
- Matthews and Merriam trained a neural network to distinguish between the plays of Shakespeare and Marlowe.
- Their results suggested that Shakespeare adapted Henry VI, Part 3 from an earlier work by Marlowe.
- Other studies provide evidence of differential usage: Kredens found that Smith and Morrissey differed in their use of 3 out of 5 discourse markers in press interviews.
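Differential marker use can be illustrated with rates per 1,000 words for 'upon' and 'also'. This is a deliberately simplified sketch, not Mosteller and Wallace's method, which was a Bayesian analysis. The hypothetical `nearest_author` helper simply picks the author profile at the smallest Euclidean distance. The sample texts are invented.

```python
import math
import re

MARKERS = ("upon", "also")

def rates_per_1000(text: str, markers: tuple[str, ...] = MARKERS) -> dict[str, float]:
    """Occurrences of each marker per 1,000 running words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return {m: 0.0 for m in markers}
    return {m: 1000 * tokens.count(m) / len(tokens) for m in markers}

def nearest_author(disputed: str, profiles: dict[str, dict[str, float]]) -> str:
    """Hypothetical helper: pick the author whose marker rates are closest (Euclidean)."""
    target = rates_per_1000(disputed)

    def distance(author: str) -> float:
        return math.dist([target[m] for m in MARKERS],
                         [profiles[author][m] for m in MARKERS])

    return min(profiles, key=distance)

# Invented sample texts, for illustration only.
profiles = {
    "Hamilton": rates_per_1000("upon reflection we rely upon the states upon this point"),
    "Madison": rates_per_1000("the states also hold power and also the people"),
}
print(nearest_author("the union also needs the people also", profiles))  # Madison
```

Real studies use far longer texts and many more markers. With only a few sentences, the rates per 1,000 words are very unstable.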
Error and Mistake
- All authorship studies work under the assumption that every speaker/writer is unique in their linguistic selections.
- Such investigations must therefore deal with variation, both intra-speaker and inter-speaker.
- For example, distinctive features of the author's own typed texts, when no spell-checker is used, include 'teh' for 'the' and '-ign' for '-ing'.
- Most such slips are caught, but checking is imperfect, so some errors remain.
- In 1973, Corder proposed categorizing the problems of language learners into mistakes and errors.
- Mistakes are deviations that the producer is aware of and could correct.
- Errors, unlike mistakes, occur when the producer has acquired a rule that differs from the standard system.
- Because unique markers are rare, authorship attribution depends on identifying a collection of markers.
- McMenamin identified 300 style markers that have been used in some 80 authorship cases.
- These are classified as: Text Format, Numbers and Symbols, Abbreviations, Punctuation, Capitalization, Spelling, Word Formation, Syntax, Discourse, Errors and Correction, and High Frequency Words and Phrases.
- McMenamin states there are two major authorship questions: looking for consistency, and ‘looking for resemblance'.
- The first determines whether a text or collection of texts was written by the same author, while the second investigates a text whose author is unknown.
- Qualitative traits are identified and described, while quantitative indicators are identified and then measured through relative frequency.
- McMenamin (2002:77) exemplifies the qualitative approach with a case in which the questioned author consistently spelled the name Mary Ann as two words.
- For the quantitative approach an assessment of rarity and significance is carried out.
- The results showed that the version Ca. as opposed to Ca., CA., Ca, ca and ca. occurred in 11 percent of the 686 addresses examined.
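McMenamin's quantitative step measures how often a variant form occurs in a population, which reduces to a relative frequency. The sketch below uses an invented sample of abbreviation spellings, not McMenamin's 686 addresses.

```python
from collections import Counter

def variant_share(observed: list[str], variant: str) -> float:
    """Proportion of observations that use exactly this variant form."""
    return Counter(observed)[variant] / len(observed) if observed else 0.0

# Invented sample of abbreviation spellings (not McMenamin's data).
sample = ["CA", "Ca.", "CA", "Ca", "CA", "ca.", "CA", "CA", "CA", "CA"]
print(variant_share(sample, "Ca."))  # 0.1
print(variant_share(sample, "CA"))   # 0.7
```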
Analysing Cases
- An ideal forensic world would have a substantial amount of known text to work with.
- In the real forensic world, however, texts are often unhelpfully short.
- McMenamin (2002: 181–205) exemplifies such an analysis with the 1996 JonBenét Ramsey murder case.
- McMenamin was asked to compare the ransom note with a set of writings.
- Analysis of the note revealed a series of idiosyncrasies.
- As the samples of comparable text from the suspects were limited, McMenamin decided to elicit richer data.
- He elicited two versions of the text from the suspects: one he dictated to them, and one they copied.
- He found differences between the ransom note and the father's writing, and between the note and the mother's writing, and observed that the same stylistic differences appeared in their pre-crime writings.
- Quantitative analysis: how many writers in a population share the profile of variables identified in the note?
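The quantitative question of how many writers share a given profile is often approached by multiplying the population frequency of each feature. This sketch assumes the features are independent, which real style markers may not be. If features tend to occur together, the product overstates how rare the profile is. The six frequencies are hypothetical.

```python
import math

def joint_chance(frequencies: list[float]) -> float:
    """Chance that all features co-occur in one text, ASSUMING they are independent."""
    return math.prod(frequencies)

# Hypothetical proportions of writers in a population showing each feature.
features = [0.3, 0.2, 0.5, 0.1, 0.4, 0.25]
p = joint_chance(features)
print(f"about 1 in {round(1 / p):,}")  # about 1 in 3,333
```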
Comparing Styles
- McMenamin compared style features of the ransom note with a corpus of handwritten texts from the American Writing Project.
- He isolated six variables, chosen because they were frequent in the corpus and because Mrs Ramsey's writing differed on them.
- The likelihood of the six co-occurring in the same text by chance was one in 10,000.
- Both qualitative and quantitative measures, thus, support the opinion that neither Mr nor Mrs Ramsey had written the ransom note.
- A similar analysis of a letter in another case is reported, which led to the husband being identified as the writer.
- It is not unusual for the expert to use more than one approach.
- The Derek Bentley case illustrates two further approaches to possible multiple authorship.
- Police arrived to arrest two teenagers, Derek Bentley, aged 19, and Chris Craig, when Craig started shooting and killed a policeman.
- Craig and Bentley were both tried for murder, Bentley on a joint charge.
- After a two-day trial, both were found guilty, even though it was Craig who had done the shooting.
- Bentley's relatives campaigned to have the verdict overturned.
- The prosecution case depended on showing that Bentley knew Craig had a gun.
- Extract 8.1 presents evidence relating to the events at that time.
- Police officers were not supposed to ask substantive questions while taking the statement.
- Three police officers stated that the statement was an unaided monologue.
Interpreting Narrative
- Narrative in a monologue normally reports what happened, along with the narrator's observations on what they knew and thought at the time.
- Parts of Bentley's statement instead form a kind of meta-narrative, which most likely arose from a series of clarificatory questions.
- Deducing the questions behind a narrative is an intricate process that depends on clear textual evidence.
- According to the police record, Bentley stated that he did not see anything more, alongside reports of events that clearly had occurred.
- The textual evidence alone shows that the context must be analysed, because the statement includes what did not happen as well as what did.
- Any such inference needs textual support; one example is the reference to Bentley telling his friends, a detail that must reflect an actual event.
- Bentley's statement about his knowledge of the gun most likely arose from police questioning about it.
- This knowledge is presented in Extract 8.2; had it been volunteered in an unaided narrative, the reference to the loaded gun would be expected to come earlier.
- The statement presents it with an explanation of the logic, sequence and information available to Bentley.
- A corpus analysis of register was then used to examine how the information in the statement had been expressed.
Corpus-Assisted Analysis
- One feature examined was the word 'then' in its temporal meaning.
- Its frequency stood out when compared with witness statements.
- Bentley's usage of 'then' appeared unusual.
- The investigation compared this temporal use of 'then' with smaller corpora of statements by witnesses and by police officers.
- The comparative results were expressed in running words, with 'then' reported as occurring once every 78 words.
- A reference corpus of over 1.5 million running words was also used.
- Against this evidence, the frequency of 'then' in Bentley's statement was all the more remarkable.
- The phrasing sounded odd, not like ordinary speech.
- The structure places 'then' between the subject and the verb, as in 'I then...', which is typical of police register.
- The same structure appears in the statement's references to Craig and in the officers' own statements.
- All are examples of police style.
- This lent added support to the claim that the police had contributed to Bentley's statement, making it jointly authored, and undermined the case built on it, including the alleged words 'Let him have it, Chris'.
- The Lord Chief Justice accepted the evidence, and Bentley's conviction was quashed.
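The two corpus measures discussed above can be sketched as follows. The first is how often 'then' occurs, expressed as running words per occurrence. The second is whether 'then' follows the subject ('I then') or precedes it ('then I'). The pronoun list and the sample statement are hypothetical, and only pronoun subjects are checked.

```python
import re
from typing import Optional

SUBJECTS = {"i", "we", "he", "she", "they"}  # assumption: pronoun subjects only

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def words_per_then(text: str) -> Optional[float]:
    """Running words per occurrence of 'then' (None if it never occurs)."""
    tokens = tokenize(text)
    hits = tokens.count("then")
    return len(tokens) / hits if hits else None

def then_order(text: str) -> dict[str, int]:
    """Count 'subject then' (e.g. 'I then') against 'then subject' (e.g. 'then I')."""
    tokens = tokenize(text)
    pairs = list(zip(tokens, tokens[1:]))
    return {
        "subject then": sum(1 for a, b in pairs if a in SUBJECTS and b == "then"),
        "then subject": sum(1 for a, b in pairs if a == "then" and b in SUBJECTS),
    }

statement = "I then went to the car. Then I saw him. We then left."  # invented
print(round(words_per_then(statement), 2))  # 4.33
print(then_order(statement))  # {'subject then': 2, 'then subject': 1}
```

A lower words-per-occurrence figure means 'then' is more frequent. Comparing this figure and the subject/'then' ordering across witness, police and questioned statements follows the kind of comparison described above.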