Lecture 7: Natural Language Processing (NLP)
Artificial Intelligence
Natural Language Processing (NLP)
Lecture # 7, Fall 2024
By Dr. Shahzad Ashraf, Associate Professor

Overview: Machine Learning, Deep Learning, Neural Networks, Generative AI

Machine Learning
The branch of artificial intelligence (AI) that focuses on developing algorithms that enable computers to learn patterns from data and make predictions or decisions without being explicitly programmed.
Types:
- Supervised Learning: the model is trained on labeled data (e.g., spam email detection).
- Unsupervised Learning: the model identifies patterns in unlabeled data (e.g., clustering customers).
- Reinforcement Learning: the model learns through trial and error to maximize rewards (e.g., gaming AI).
Applications: fraud detection, recommendation systems, and predictive maintenance.

Deep Learning
A subset of machine learning that uses artificial neural networks with multiple layers (hence "deep") to process and learn complex data patterns.
Key feature: it requires large datasets and significant computational power to train models effectively.
Popular architectures: Convolutional Neural Networks (CNNs) for image recognition and Recurrent Neural Networks (RNNs) for sequence data such as text and time series.
Applications: natural language processing, image and speech recognition, and autonomous vehicles.

Neural Networks
A type of machine learning that uses a layered structure of interconnected nodes (neurons) to teach computers to process data in a way that mimics the human brain.
Structure: the layers are composed of interconnected neurons.
- Input layer: takes in data.
- Hidden layers: process the data using weights, biases, and activation functions.
- Output layer: produces the final result.
Applications: disease diagnosis, stock market prediction, and gaming AI.

Generative AI
AI systems that can create new content, such as images, music, text, or videos, by learning from existing data.
Core mechanism: often based on models like Generative Adversarial Networks (GANs) or Variational Autoencoders (VAEs).
- GANs consist of a generator (creates data) and a discriminator (evaluates authenticity).
- Large Language Models (LLMs) are used for text generation, such as OpenAI's GPT.
Applications: content creation, drug discovery, game design, and synthetic data generation.

Natural Language Processing (NLP)
The field of artificial intelligence that focuses on the interaction between computers and human language. It involves developing algorithms and models that enable machines to understand, interpret, and respond to text or speech in ways that are both meaningful and useful.
Programming languages work on their own fixed principles, syntax, and keywords; NLP, by contrast, aims to develop systems that work with human natural language, both written and spoken. NLP combines linguistics and computer science to decipher language structure and rules, and to build models that can comprehend, break down, and extract significant details from text and speech.

Components of NLP
Natural Language Understanding (NLU)
- It understands human language and converts it into data.
- It is used for spoken or written language to provide a link between natural language inputs and what they represent.
- It analyses different aspects of language:
  -- morphological analysis
  -- syntactic analysis
  -- semantic analysis
  -- discourse analysis
  -- ...
Natural Language Generation (NLG)
- It uses structured data and generates meaningful narratives out of it.
- It helps to produce meaningful phrases and sentences through text planning, sentence planning, and text realization.
NL understanding is much harder than NL generation, but both are still hard (a toy sketch contrasting the two follows below).
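To make the NLU/NLG contrast concrete, here is a minimal, purely illustrative Python sketch. The function names (understand, generate), the keyword rules, and the example command are our own illustrative assumptions, not part of the lecture; real NLU and NLG systems use statistical or neural models rather than hand-written rules. The point is only the direction of the mapping: NLU goes from free text to a machine-readable structure, NLG goes the other way.

```python
# Toy illustration of the NLU/NLG split described above.
# NLU: text -> structured data; NLG: structured data -> text.
# All rules and names are hypothetical and far simpler than real systems.

def understand(text):
    """Very naive NLU: map a command to an intent plus entities."""
    words = text.lower().split()
    data = {"intent": None, "device": None, "location": None}
    if "turn" in words and "off" in words:
        data["intent"] = "turn_off"
    elif "turn" in words and "on" in words:
        data["intent"] = "turn_on"
    for device in ("lights", "fan", "heater"):
        if device in words:
            data["device"] = device
    if "in" in words:                      # e.g. "... in the kitchen"
        data["location"] = words[-1].strip(".")
    return data

def generate(data):
    """Very naive NLG: realize structured data as a sentence from a template."""
    action = "switched off" if data["intent"] == "turn_off" else "switched on"
    return f"OK, the {data['device']} in the {data['location']} have been {action}."

parsed = understand("Turn off the lights in the kitchen.")
print(parsed)            # {'intent': 'turn_off', 'device': 'lights', 'location': 'kitchen'}
print(generate(parsed))  # OK, the lights in the kitchen have been switched off.
```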
Why is NL Understanding Hard?
Natural language is extremely rich in form and structure, and very ambiguous:
- how to represent meaning, and
- which structures map to which meaning structures.
One input can mean many different things, and the ambiguity can arise at different levels:
- lexical (word-level) ambiguity: different meanings of words;
- syntactic ambiguity: different ways to parse the sentence;
- interpreting partial information: how to interpret pronouns;
- contextual information: the context of a sentence may affect its meaning.
Many inputs can mean the same thing, and the interaction among the components of the input is not always clear.

Knowledge of Language
- Phonology: how words are related to the sounds that realize them.
- Morphology: how words are constructed from more basic meaning units called morphemes. A morpheme is the primitive unit of meaning in a language.
- Syntax: how words can be put together to form correct sentences, what structural role each word plays in the sentence, and which phrases are subparts of other phrases.
- Semantics: what words mean and how these meanings combine in sentences to form sentence meanings; the study of context-independent meaning.
- Pragmatics: how sentences are used in different situations and how use affects the interpretation of the sentence.
- Discourse: how the immediately preceding sentences affect the interpretation of the next sentence, for example when interpreting pronouns or the temporal aspects of the information.
- World knowledge: general knowledge about the world, including what each language user must know about the other's beliefs and goals.

Phases of NLP

Lexical Analysis
Sentences and paragraphs are broken into tokens, the smallest units of text. The analyzer scans the entire source text and divides it into meaningful lexemes.
Example: "He goes to school." would be divided into ['He', 'goes', 'to', 'school', '.']
Key steps (a small sketch follows this list):
- Identification of word boundaries: determining where one word ends and another begins.
- Tokenization: separating the text into individual tokens, such as words, characters, or subword units like prefixes and suffixes.
- Normalization: converting tokens to a standard form to reduce ambiguity.
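Here is a minimal sketch of the three key steps above, using plain Python and a regular expression. It is an illustrative simplification (real lexical analyzers also handle contractions, subword units, and language-specific rules), and the helper names tokenize and normalize are ours, not from the lecture.

```python
import re

def tokenize(text):
    # Word-boundary identification + tokenization:
    # runs of word characters become tokens, and each punctuation
    # mark becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def normalize(tokens):
    # A common normalization step: lowercase every token to reduce ambiguity.
    return [token.lower() for token in tokens]

tokens = tokenize("He goes to school.")
print(tokens)             # ['He', 'goes', 'to', 'school', '.']
print(normalize(tokens))  # ['he', 'goes', 'to', 'school', '.']
```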
Syntactic Analysis / Parsing
Analyzes the grammatical structure of a sentence to determine whether it is well-formed. Parsing refers to converting a flat input sentence into a hierarchical structure that corresponds to the units of meaning in the sentence.
It identifies the syntactic relationships between words and phrases, such as subject-verb agreement, noun-verb relationships, and prepositional phrases.
Example: "The cat sat on the mat"
- "The" (Determiner - DT)
- "cat" (Noun - NN)
- "sat" (Verb - VBD)
- "on" (Preposition - IN)
- "the" (Determiner - DT)
- "mat" (Noun - NN)
(A short sketch at the end of these notes reproduces this tagging.)

Semantic Analysis
The process of extracting the meaning or significance of words, phrases, or sentences. It involves understanding the underlying concepts, relationships, and context of the text.
Example: sentences such as "I ate hot ice cream" or "colorless green idea" will be rejected by the semantic analyzer because they do not make sense.
Semantic analysis can also determine the emotional tone of a text, such as positive, negative, or neutral.
Challenges:
- Ambiguity: many words have multiple meanings, making it difficult to determine the correct interpretation.
- Contextual dependence: the meaning of a word can vary depending on the context in which it is used.

Discourse Integration
Analyzes the relationships between sentences within a larger text. It allows models to comprehend the overall meaning and context of a piece of text beyond the individual sentence level.
Examples:
- "He wanted that." The word "that" depends on the prior discourse context.
- "Bilal went to the store. He bought some milk." Here, "he" refers to Bilal.
Challenges:
- Many linguistic phenomena, such as pronouns and ellipsis, can be ambiguous.
- Discourse integration often requires knowledge of the broader context, including world knowledge and previous discourse.

Pragmatic Analysis
The process of interpreting language in its context, considering factors like speaker intent, world knowledge, and social cues. It is about understanding the implied meaning behind words, rather than just their literal definitions.
Example: "Close the window?" should be interpreted as a request rather than an order.
Challenges:
- Implied meaning can be highly subjective and vary depending on individual interpretations.
- Language can be ambiguous, leading to multiple possible interpretations.
- Pragmatic cues can vary across cultures, making it difficult to generalize models.

Applications of NLP

The End
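As referenced in the syntactic analysis section above, here is a minimal sketch that reproduces the "The cat sat on the mat" tagging with NLTK. It assumes NLTK is installed (pip install nltk) and that its tokenizer and tagger data have been downloaded; the exact resource names can vary between NLTK versions, and the tags shown are what the default Penn Treebank tagger is expected to produce, not guaranteed output.

```python
import nltk

# One-time downloads of tokenizer and tagger data
# (resource names may differ slightly across NLTK versions).
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

sentence = "The cat sat on the mat"

# Lexical analysis: split the sentence into tokens.
tokens = nltk.word_tokenize(sentence)

# Syntactic analysis: part-of-speech tagging with Penn Treebank tags.
print(nltk.pos_tag(tokens))
# Expected (may vary by model version):
# [('The', 'DT'), ('cat', 'NN'), ('sat', 'VBD'),
#  ('on', 'IN'), ('the', 'DT'), ('mat', 'NN')]
```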