AI App Lec 4 PDF
Document Details
MITU
2024
Dr. Tarek Abdul Hamid
Summary
This document is a lecture on Natural Language Processing (NLP), part of an Artificial Intelligence Applications course. The slides cover the differences between formal and natural languages, ways of communication between users and computers, the major areas of NLP (understanding and generation), a range of NLP applications, and how NLP relates to other areas of Artificial Intelligence.
Full Transcript
DST301 Artificial Intelligence Applications
Fall 2024
Lecture 04 – Natural Language Processing (NLP)
Instructor: Dr. Tarek Abdul Hamid

What is Language?
A way of communication between a speaker and a listener.

Difference between Formal and Natural Languages

Natural languages are the languages that people speak, such as English, Spanish, and French. They were not designed by people; they evolved naturally. The grammar of a natural language is incredibly complex and is discovered through empirical investigation.

Formal languages are languages that are designed by people for specific applications. For example, the notation that mathematicians use is a formal language that is particularly good at denoting relationships among numbers and symbols. Chemists use a formal language to represent the chemical structure of molecules. Java is an example of a formal (artificial) language, and the grammar of an artificial language like Java is incredibly simple. We don't discover the grammar of an artificial language; we stipulate it, defining it however we want.

Although formal and natural languages have many features in common (tokens, structure, syntax, and semantics), there are many differences:
Ambiguity: Natural languages are full of ambiguity, which people deal with by using contextual clues and other information. Formal languages are designed to be nearly or completely unambiguous, which means that any statement has exactly one meaning, regardless of context.
Redundancy: In order to make up for ambiguity and reduce misunderstandings, natural languages employ lots of redundancy. As a result, they are often verbose. Formal languages are less redundant and more concise.
Literalness: Formal languages mean exactly what they say. On the other hand, natural languages are full of idiom and metaphor.

Difference Between Natural Language and Computer Language

Natural language      Computer language
Ambiguous             Non-ambiguous
Context sensitive     Context free
Informal              Formal
Descriptive           Prescriptive
Unstructured          Structured
Uncontrolled          Controlled

Types of languages
Natural languages, also called: informal language, unstructured language, non-regular language.
Computer languages, also called: formal language, structured language, regular language.

Ways of Communication between Users and Computers
Infancy: to know the language of the computer.
Youth: to know a third language (high-level languages).
Maturity: to communicate through a human natural language (Human-Computer Interaction (HCI), fifth-generation languages).

Natural Language Processing
Natural language processing (NLP) is a subfield of Artificial Intelligence and Computational Linguistics. It studies the problems of automated generation and understanding of natural human languages.
OR
Getting computers to understand and communicate in everyday language is known as Natural Language Processing.

Computational Linguistics
Computational linguistics is an interdisciplinary field dealing with the statistical and/or rule-based modeling of natural language from a computational perspective.

Major Areas of NLP
Natural Language Understanding (NLU)
Natural Language Generation (NLG)

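To make the "Ambiguous / Non-ambiguous" row of the comparison concrete, here is a minimal sketch using Python's standard `ast` module. Python stands in for the formal language (the lecture's own example is Java), and the helper names `bracket` and `parse_formal` are hypothetical. Because the grammar is stipulated, an expression gets one structure regardless of context. The English sentence and its two readings are written by hand for illustration, not computed.

```python
import ast

# Printed symbols for the arithmetic operators handled in this sketch.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def bracket(node):
    """Render an arithmetic expression tree with explicit parentheses."""
    if isinstance(node, ast.BinOp):
        return f"({bracket(node.left)} {OPS[type(node.op)]} {bracket(node.right)})"
    if isinstance(node, ast.Constant):
        return str(node.value)
    raise ValueError("only numbers and + - * / are handled in this sketch")

def parse_formal(expr):
    """Parse a formal-language expression; the grammar fixes a single structure."""
    return bracket(ast.parse(expr, mode="eval").body)

# Hand-listed readings of one English sentence (illustrative, not computed).
READINGS = {
    "I saw the man with the telescope": [
        "I used the telescope to see the man",
        "I saw the man who had the telescope",
    ]
}

print(parse_formal("2 + 3 * 4"))   # (2 + (3 * 4))
print(parse_formal("10 - 4 - 3"))  # ((10 - 4) - 3)
for sentence, readings in READINGS.items():
    print(sentence, "->", len(readings), "possible readings")
```

The parser never needs context to decide between structures, whereas a reader of the English sentence has to rely on contextual clues to pick a reading.
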
Natural Language Understanding (NLU)
Natural-language-understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.
Natural-language understanding is sometimes referred to as an AI-complete problem, because natural-language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it.

Natural Language Generation (NLG)
Natural-language-generation systems convert information from computer databases into normal-sounding human language.

Why is NLP Hard?

A Tiny Sample of NLP Applications

NLP in Industry

Chatbot

Applications of NLP
Automatic Summarization
Information Extraction
Information Retrieval
Machine Translation
Named Entity Recognition
Natural Language Understanding
Natural Language Generation
Optical Character Recognition
Question Answering
Speech Processing
Spoken Dialogue System
Text Simplification
Text to Speech

Automatic Summarization
Subfield of Machine Learning and Data Mining.
Processing a text document with software in order to create a summary with the major points of the original document.
Find a subset of data which contains the "information" of the entire set.
Possible for documents, image collections and videos.
Two general approaches: Extraction and Abstraction.

Automatic Summarization Types
Extractive methods work by selecting a subset of existing words, phrases, or sentences in the original text to form the summary.
Abstractive methods build an internal semantic representation and then use natural language generation techniques to create a summary that is closer to what a human might express. Such a summary might include verbal innovations.
Extractive methods are appropriate for image collection and video summarization.

Information Extraction
The task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents.
Recent activities in multimedia document processing, such as automatic annotation and content extraction from images, audio, and video, can be seen as information extraction.

Information Extraction (IE)
Taken from the Stanford Online lecture on Natural Language Processing with Deep Learning.

Information Retrieval (IR)
The activity of obtaining information resources relevant to an information need from a collection of information resources.
Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for metadata that describe data, and for databases of texts, images or sounds.
Automated information retrieval systems are used to reduce what has been called information overload.
Web search engines are the most visible IR applications.
Other IR systems: Solr, Lucene, Elasticsearch.

Machine Translation (MT)
Machine translation, sometimes referred to by the abbreviation MT, is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another.

MT Approaches

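Returning to information retrieval as defined above (obtaining the resources relevant to an information need from a collection), the idea can be sketched in a few lines. This is a toy illustration, not how Solr, Lucene, or Elasticsearch are implemented: the document collection, the ids, and the simple TF-IDF-style weighting are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(query, docs):
    """Return (doc_id, score) pairs for matching documents, best first.

    A document's score sums, over the query terms, the term's frequency in
    the document times a weight that grows as the term gets rarer in the
    collection (a simple TF-IDF variant, assumed for this sketch).
    """
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    terms = set(tokenize(query))
    scores = {}
    for doc_id, tf in counts.items():
        score = 0.0
        for term in terms:
            df = sum(1 for c in counts.values() if term in c)  # document frequency
            if df:
                score += tf[term] * math.log(1 + n_docs / df)
        scores[doc_id] = score
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(doc_id, score) for doc_id, score in ranked if score > 0]

# Hypothetical three-document collection.
DOCS = {
    "d1": "Machine translation converts text from one language to another.",
    "d2": "Search engines retrieve documents relevant to a query.",
    "d3": "Named entity recognition finds persons and organizations in text.",
}

print([doc_id for doc_id, _ in rank("translation of text", DOCS)])  # ['d1', 'd3']
```

Document d1 ranks first because it matches both "translation" (rare in the collection) and "text", while d3 matches only "text"; d2 matches nothing and is left out, which is the sense in which retrieval reduces information overload.
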
Named-Entity Recognition (NER)
Also known as entity identification, entity chunking and entity extraction.
A subtask of information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc.
Most research on NER systems has been structured as taking an unannotated block of text, such as:
Jim bought 300 shares of Acme Corp. in 2006.
and producing an annotated block of text that highlights the names of entities:
[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.

Optical Character Recognition (OCR)
A system that provides full alphanumeric recognition of printed or handwritten characters at electronic speed by simply scanning the form.

An OCR System

Question Answering
A type of information retrieval.
Given a collection of documents, the system should be able to retrieve answers to questions posed in natural language.
Find the answer to a question in a large collection of documents.
The main aim of QA is to present the user with a short answer to a question rather than a list of possibly relevant documents.
As it becomes more and more difficult to find answers on the WWW using standard search engines, question answering technology will become increasingly important.

Speech Recognition
Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, to a set of words.
The recognised words can be used as the input to a downstream system, for applications such as command and control, data entry, and document preparation.
They can also serve as the input to further linguistic processing in order to achieve speech understanding.

Spoken Dialogue System
An automated system that engages in a dialogue with a human user using spoken language as the medium of interaction.
1. Task-oriented: involves the use of dialogues to accomplish a task, e.g. making a hotel booking or planning a family holiday.
2. Non-task-oriented: engaging in conversational interaction, but without necessarily being involved in a task that needs to be accomplished, e.g. a conversational companion for the elderly.

Perspectivising NLP: Areas of AI and their inter-dependencies
Search, Logic, Knowledge Representation, Machine Learning, Planning, Expert Systems, NLP, Vision, Robotics.
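
Returning to the task-oriented dialogue system described above, the hotel-booking case can be sketched as a small slot-filling loop. This is a text-only stand-in under stated assumptions: speech recognition and speech output are left out (the utterances are treated as already transcribed), and the slot names, prompts, and city list are hypothetical choices made for the example.

```python
import re

# Hypothetical slots for a hotel-booking task, with the prompt used to request each.
PROMPTS = {
    "city": "Which city would you like to stay in?",
    "nights": "How many nights will you stay?",
}
KNOWN_CITIES = ("cairo", "paris", "london")  # toy city list, assumed for the example

def understand(utterance):
    """Extract slot values from one already-transcribed user utterance."""
    found = {}
    text = utterance.lower()
    for city in KNOWN_CITIES:
        if re.search(rf"\b{city}\b", text):
            found["city"] = city.title()
    match = re.search(r"\b(\d+)\s+nights?\b", text)
    if match:
        found["nights"] = int(match.group(1))
    return found

def next_turn(state, utterance):
    """Update the dialogue state and return the system's reply."""
    state.update(understand(utterance))
    for slot, prompt in PROMPTS.items():
        if slot not in state:
            return prompt  # ask for the first slot that is still missing
    return f"Booking a hotel in {state['city']} for {state['nights']} nights."

state = {}
print(next_turn(state, "I need a hotel"))   # Which city would you like to stay in?
print(next_turn(state, "In Cairo please"))  # How many nights will you stay?
print(next_turn(state, "3 nights"))         # Booking a hotel in Cairo for 3 nights.
```

The dialogue ends once every slot is filled, which is what makes it task-oriented; a non-task-oriented system would have no such completion condition.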