Introduction to Forensic Phonetics Applications 2024
2024
Dr. Hanady Mansour
Summary
This document is the Week 1 lecture of an introduction to forensic phonetics applications (2024). It covers speech recognition, AI, and multimodal technologies, and gives an overview of the course's content.
Full Transcript
Introduction to Forensic Phonetics Applications
Week 1, 2024
Dr. Hanady Mansour
Phonetics and Linguistics Forensic Program

Aims
▪ SPEECH AND LANGUAGE TECHNOLOGY
▪ SPEECH RECOGNITION AND AI
▪ INTRODUCTION: APPLICATIONS IN FORENSIC PHONETICS
▪ COURSE TOPICS
▪ COURSE RESOURCES
▪ WHO ARE THE MENTORS?
▪ WEEK 1 OUTCOME: SYLLABUS, ASSIGNMENTS AND GRADES
▪ NEXT WEEK'S LECTURE PRE-READ

Brainstorm
▪ In one sentence, describe the following picture:
Amazon Alexa voice services: https://www.youtube.com/watch?v=ofayz2v5zWM
Watch the video and notice how the man and the device communicate.
Your comments

Speech Recognition

AI and Speech Recognition

What are forensic phonetics analysis applications?
▪ Remember ➔ Forensic phonetics is the application of the knowledge, theories and methods of general phonetics to practical tasks that arise out of a context of police work or the presentation of evidence in court, as well as the development of new, specifically forensic-phonetic, knowledge, theories and methods.

How?!
▪ There are two main types of expert analysts: linguists and phoneticians. These experts use a combination of software, expertise and statistical approaches in their analyses.
▪ Computer scientists have developed technologies (AI) to automate linguistic and phonetic analyses.
✓ Try to find examples!
▪ These approaches do not require an expert to implement them, but they do need expert interpretation.

What is language technology?
Make a guess! Which areas, applications, and tools will you study in this course?

What are language technologies?

Multimodal and multimedia technologies
▪ In the context of speech and language technology, multimodal and multimedia technologies are systems that integrate multiple types of input and output data.
▪ These technologies often combine several sensory or communication modalities, such as speech, text, images, and gestures. Guess why?!
▪ These technologies aim to enhance the naturalness, accuracy, and usability of human-computer interaction by making systems more adaptable to human communication, which typically involves multiple modes.

Multimodal and multimedia technologies
▪ Multimodal and multimedia technologies are key advancements in speech and language technology.
▪ By combining various forms of input, output, and content, these technologies create more engaging, natural, and effective systems for communication, learning, and interaction. They are at the core of innovations like intelligent virtual assistants, language learning tools, and accessibility applications.

Examples:
▪ Multimodal AI Systems: Virtual assistants or customer service bots that use speech, visual data, and text inputs to interact with users more naturally.
▪ Interactive Language Tutors: Platforms that use speech recognition, visual feedback, and multimedia content (videos, images, quizzes) to teach languages interactively.
▪ Speech Recognition with Visual Context: Systems that not only transcribe speech but also incorporate visual context (e.g., objects in a room) to understand and respond accurately.

Multimodal Text-to-Speech
▪ The diagram on this slide represents a multimodal Text-to-Speech (TTS) system, illustrating how various inputs (speech, text, and images) are processed by a TTS engine to produce multimodal outputs (speech synthesis, text display, and facial animation).

What is Speech and Language Technology?
▪ SLT applies computational methods to process and understand human speech and written language.
▪ Subfields: Speech Processing & Natural Language Processing (NLP).
▪ Speech and Language Technology comprises numerous aspects of human-computer interaction, such as speech and voice recognition, predictive text, voice-command interfaces (Siri, Alexa, ...), spell and grammar checkers, document summarization, and text-to-speech synthesis.
✓ Students' comments ➔ Each relies on automated parsing and analysis of human language!
▪ Advances over the past decade have made computers, appliances, and communication devices more efficient, accessible, functional, and user-friendly.
✓ Students ➔ who will achieve those targets!

Key Applications
▪ Automatic Speech Recognition (ASR): Converting speech to text (e.g., Siri, Alexa).
▪ Text-to-Speech (TTS): Transforming text into speech (e.g., accessibility tools).
▪ Natural Language Understanding (NLU) & Machine Translation (e.g., Google Translate).
▪ Dialogue Systems and Chatbots: Virtual assistants and conversational agents.

Speech Processing Basics
▪ Acoustic Phonetics: Physical properties of speech sounds.
▪ Digital Signal Processing (DSP): Converting speech into digital data.
▪ Key Steps in ASR: Feature extraction, Acoustic modeling, Language modeling, Decoding.
▪ Students ➔ Can you propose a definition of speech processing?

1.1 Speech processing
▪ Speech processing is the science concerned with how speech communication works: how speech is produced by the speaker and understood by the listener.
▪ It is also concerned with how these processes can be analysed and modelled, and with how these models can be used to develop technologies that also produce and understand speech (synthetic voices, speech recognisers).
▪ The speech technologies involved are also fundamental to the understanding and remediation of disordered speech.
▪ The science of speech technology is thus at the intersection of many disciplines, particularly linguistics, psychology, acoustics, and engineering.

1.1.1 Speech processing systems
▪ Minimally, core speech technology refers to:
✓ Automatic Speech Recognition (ASR): The transcription of speech to words.
✓ Text-to-Speech (TTS): The generation of speech from written words.
▪ Language generation: The translation of concepts into words.
▪ Spoken language understanding: The understanding of what the words mean (akin to the parsing of written language. How does this work for Arabic?!).
▪ Speaker verification and voice print: A number of related technologies, these for example, are nearly as fundamental.
▪ Each of these core technologies is a research area within computer science. The main goal is to find the most efficient algorithms under strictly controlled circumstances.

Language Processing Basics
▪ Tokenization: Splitting text into words/phrases.
▪ Part-of-Speech Tagging: Identifying grammatical roles of words.
▪ Parsing: Analyzing grammatical structure.
▪ Named Entity Recognition (NER) & Semantic Analysis.
▪ Students ➔ Can you propose a definition of language processing? How does NER solve translation issues?

1.2 Language processing
▪ Language processing, in parallel, deals with computational theories of grammar and meaning, and provides access to the fundamentals of linguistics as a science and as an engineering discipline.
❖ What is grammar?
▪ As a science, it is concerned with the fact that language is used as a medium for thought as well as for communication.
▪ As an engineering discipline, it is concerned with tools that work in systems such as predictive text on phones, automated personal assistants, web search, sentiment analysis, and so on.

Key Technologies in Speech and Language Technology (SLT)
▪ ASR: How it works & major systems (e.g., Google Speech, DeepSpeech).
▪ TTS: Pipeline & popular engines (e.g., Amazon Polly, Festival, Phonetics students 3rd year 2024).

1.3 Conclusion
▪ This course is cross-listed between two sciences: linguistics and computer science.
▪ SLT is also known as Speech and Language Processing (SLP).
▪ SLT systems need an NLP engine built with models and algorithms.
▪ More details in the first chapter, next lecture!

Real-World Use Cases
▪ Speech Assistants (Siri, Alexa, Google Assistant).
▪ Real-time Transcription (e.g., Otter.ai).
▪ Machine Translation (DeepL, Google Translate).
▪ Healthcare and Accessibility.

Which areas, applications, and tools will you study in this course?
▪ Topics:
▪ Introduction to Forensic Phonetics and Linguistics Applications.
▪ Speaker Identification and Speaker Verification, Voice Print and the Document Examiner (Audacity software)
▪ Challenge of Spoken Arabic Processing:
▪ Speech synthesis (using Praat)
▪ Arabic Text-to-Speech
▪ Speech recognition system (using Google Cloud)
▪ Data in forensic phonetics

Syllabus
▪ Course Aim
▪ Course Topics
▪ Assignment
▪ Resources
▪ Microsoft Teams channels

Pre-read for the next lecture
Kindly read and summarize the following article; it has been uploaded to the Teams channel.
▪ Sinha, S. (2015). Forensic Linguistics and Forensic Phonetics: An Introduction. International Journal of Interdisciplinary and Multidisciplinary Studies (IJIMS), Vol. 2, No. 6, 153-157.