HS1501 §2 Capabilities: language

AI is capable of analyzing, processing, and generating speech or text. Natural language processing (NLP) is the study of techniques that enable computers to use both written and spoken human languages the way human beings can. Similar techniques can be used for non-language applications, e.g., coding and music.

Here is a summary of the major NLP capabilities of AI nowadays.

- named entity recognition (NER): extracting the names of persons, places, companies, and more, and classifying them into predefined labels
- topic modelling: uncovering hidden topics from large collections of documents
- text categorization: sorting text into specific taxonomies
- text clustering: grouping text or documents based on similarities in content
- sentiment analysis: identifying, extracting, quantifying, and studying affective states and subjective information
- summarization: generating a short version of the input document that retains the important points
- information extraction: finding meaningful information in unstructured text
- entity resolution: identifying records in (internal or public) data sources that refer to the same real-world entities, and identifying relationships between these records
- translation: turning text from one language to another while retaining the meanings
- speech recognition: converting speech to text
- speech synthesis: converting text to speech
- natural language generation (NLG): transforming data into human language

References:
William D. Eggers, Neha Malik, and Matt Gracie. "Using AI to unleash the power of unstructured government data". Deloitte, 16 Jan. 2019. https://www2.deloitte.com/us/en/insights/focus/cognitive-technologies/natural-language-processing-examples-in-government-data.html. Last accessed: 18 Aug. 2024.
Multiple contributors. "Sentiment analysis". Wikipedia. https://en.wikipedia.org/wiki/Sentiment_analysis. Last accessed: 18 Aug. 2024.

We will look at some of these in more detail, check out the current level of the technology, explore a few applications, and discuss some challenges in this area.

2.1 Named entity recognition (NER)

See how good NER is these days by following the steps below.

1. Go to the "displaCy Named Entity Visualizer" at https://demos.explosion.ai/displacy-ent.
2. Enter (or copy-and-paste) some text in the text box.
3. Select, in the dropdown menu, the language the text is in.
4. Tick the kinds of entities you want labelled.
5. Click the ✓ button.
6. The model labels the selected kinds of entities below.

Try out different inputs. Does it make (m)any mistakes?
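If you prefer to run NER locally rather than in the browser, the displaCy demo above is built on the open-source spaCy library, so a few lines of Python reproduce the idea. The snippet below is a minimal sketch, assuming spaCy and its small English model (en_core_web_sm) have been installed; the example sentence is made up for illustration.

    # A minimal NER sketch in Python using spaCy (the library behind displaCy).
    # Assumes: pip install spacy  and  python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")   # small English pipeline with an NER component
    doc = nlp("Alice Tan joined DBS Bank in Singapore in March 2021.")

    for ent in doc.ents:
        # ent.label_ is one of the predefined labels, e.g. PERSON, ORG, GPE, DATE
        print(ent.text, "->", ent.label_)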
2.2 Sentiment analysis

Sentiment analysis often refers to the detection of polarity (e.g., positive or negative), emotion (e.g., angry, happy or sad), urgency, and intention (e.g., interested or not interested) in text or speech.

Automatically analyzing customer feedback, such as opinions in survey responses and social media conversations, using sentiment analysis allows brands to better understand their customers, so that they can tailor products and services to meet their needs.

Reference:
MonkeyLearn. "Sentiment Analysis: A Definitive Guide". https://monkeylearn.com/sentiment-analysis/. Last accessed: 20 Jan. 2024.

See a demonstration of AI sentiment analysis by following the steps below.

1. Go to Lexalytics's "NLP Demo" page at https://www.lexalytics.com/nlp-demo/.
2. Select a category you want demonstrated in the Industry Pack section.
3. Select a text sample in the category you want analyzed.
4. Click the "Show Analysis" button.
5. An analysis report is shown, where the words indicating sentiments are highlighted.
6. In the different tabs, one can also see an analysis of the degrees and the topics of the sentiments.
7. Repeat the steps above with different selections and evaluate the results.

Now try sentiment analysis on your own text by following the steps below.

1. Open Lettria's Customer Sentiment Analysis page on Hugging Face at https://huggingface.co/spaces/Lettria/customer-sentiment-analysis.
2. Clear the "Customer Review" box.
3. Enter a piece of text in the "Paragraph" box (or copy-and-paste there a review from your favourite online restaurant guide).
4. Click the "Submit" button and wait for the process to end.
5. The model determines whether the sentiment is "POSITIVE", "NEUTRAL" or "NEGATIVE".
6. Click the "Clear" button and repeat the steps above with a different input.
7. Evaluate the quality of the outputs.
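The same Hugging Face ecosystem that hosts the Lettria demo can be driven directly from Python through the transformers library. The sketch below is a minimal, assumed setup (pip install transformers plus a backend such as PyTorch); it uses the library's default English sentiment model rather than Lettria's, so it only distinguishes POSITIVE from NEGATIVE, and the two reviews are invented examples.

    # A minimal sentiment-analysis sketch using the Hugging Face transformers library.
    # Assumes: pip install transformers torch  (a default English model is downloaded on first use)
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")

    reviews = [
        "The laksa was fantastic and the staff were friendly.",
        "Forty minutes of waiting, and the food arrived cold.",
    ]
    for review in reviews:
        result = classifier(review)[0]   # e.g. {'label': 'POSITIVE', 'score': 0.99}
        print(result["label"], round(result["score"], 3), "-", review)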
2.3 Summarization

Extractive models perform "copy-and-paste" operations: they select relevant phrases of the input document and concatenate them to form a summary. They are quite robust since they use existing natural-language phrases that are taken straight from the input, but they lack flexibility since they cannot use novel words or connectors. They also cannot paraphrase.

Abstractive models generate a summary based on the actual "abstracted" content: they can use words that were not in the original input. This gives them a lot more potential to produce fluent and coherent summaries, but it is also a much harder problem, as the model is now required to generate coherent phrases and connectors.

Reference:
Romain Paulus. "Your TLDR by an ai: a Deep Reinforced Model for Abstractive Summarization". Salesforce, 11 May 2017. https://blog.salesforceairesearch.com/your-tldr-by-an-ai-a-deep-reinforced-model-for-abstractive-summarization/. Last accessed: 18 Aug. 2024.

Try out a commercial AI summarizer by following the steps below.

1. Open Intellexer's "Summarizer" at http://esapi.intellexer.com/Summarizer.
2. Click into the "Load Text" tab.
3. Enter some text you want to summarize into the text box.
4. Indicate how long you want the summary to be, in terms of a percentage of the original text or the number of sentences.
5. Click the "Summarize" button and wait for the model to load.
6. A summary is produced as requested.
7. Repeat the steps above with a different input and evaluate the quality of the outputs.

Here is a demonstration using the article from https://www.channelnewsasia.com/commentary/ai-jobs-universal-basic-income-unemployment-support-3589421. Is the summarization performed here extractive or abstractive? (It is extractive.)
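For contrast with the extractive output above, here is a minimal abstractive-summarization sketch using the transformers library. It assumes transformers and a backend such as PyTorch are installed and lets the library pick its default summarization model; the short article text is a made-up stand-in for something like the CNA commentary.

    # A minimal abstractive-summarization sketch using the transformers library.
    # Assumes: pip install transformers torch  (a default summarization model is downloaded on first use)
    from transformers import pipeline

    summarizer = pipeline("summarization")

    article = (
        "Artificial intelligence is changing how organizations handle unstructured text. "
        "Agencies and companies receive large volumes of free-text feedback, reports and "
        "case notes, and reading all of it manually is slow and expensive. NLP tools can "
        "extract entities, detect sentiment and produce short summaries for human reviewers."
    )
    result = summarizer(article, max_length=40, min_length=10, do_sample=False)
    print(result[0]["summary_text"])   # a short, possibly paraphrased version of the input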
2.4 Information extraction

AI NLP allows extraction of useful information from Big Data, i.e., extensive data sets, such as the Internet, that are too large to be analyzed using traditional methods.

For example, AI can analyze patent data to visualize the relationships between patents, and thus help investors make more informed decisions, as described in the video by Prof. Yu below.

[Video: 3 min 33 sec]

Question-answering AI can generate answers to given questions by querying a knowledge base.

Closed-domain question answering deals only with questions under a specific domain, e.g., medicine and law, while open-domain question answering deals with factual questions about nearly everything.

Prof. Yu demonstrates closed-domain question answering in the video below.

[Video: 2 min 30 sec]

Try out open-domain question-answering AI at https://www.perplexity.ai/ yourself. Does it give accurate answers to everyday questions? Does it give accurate answers to academic questions?

2.5 Entity resolution

Entity resolution can be challenging because similar records (e.g., of father and son) may refer to different entities, and different records (e.g., in different types of databases) may refer to the same person. The identification of two different records as referring to one entity sometimes requires a sequence of links extracted from multiple external sources.

Financial organizations and public-sector organizations can use entity resolution to detect fraud, improve risk assessment, improve investigative outcomes, help ensure compliance, improve customer insights, and reduce false positives and false negatives.

The company Senzing developed AI-based software that is capable of performing real-time entity resolution.

Reference:
Senzing. "Financial Services" and "Public Sector". https://senzing.com/industries/financial-services/ and https://senzing.com/industries/public-sector/. Last accessed: 18 Aug. 2024.

Watch Jeff Jonas, founder and CEO of Senzing, explain how AI (or machine learning) helps entity resolution.

[Video: 3 min 12 sec]

Video source:
Senzing (@senzinginc). "How Senzing Uses Machine Learning for Entity Resolution". YouTube, 2 Aug. 2023. https://youtu.be/x2viuGCcAks.

2.6 Translation

The classical approach to machine translation is rule-based, i.e., based entirely on dictionaries and grammars. It requires a great amount of manual effort.

Another approach is statistics-based: one picks out the most likely translation according to some given sample data.

In 2016, Google Translate started using translation models based on neural networks, which give superior performance compared to statistics-based models.

References:
Quoc V. Le and Mike Schuster. "A Neural Network for Machine Translation, at Production Scale". Google Research, 27 Sep. 2016. https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html. Last accessed: 20 Jan. 2024.
Yonghui Wu, et al. "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation". arXiv:1609.08144 [cs.CL], Sep./Oct. 2016.

Here is an English translation of a Chinese poem by Ji Zhang by the AI-based machine translator at https://www.deepl.com/. Is the output acceptable, despite the fact that the input is probably not of an intended type? Compare this machine translation with the following human translation by Yuanchong Xu.

At moonset cry the crows, streaking the frosty sky;
Dimly lit fishing boats 'neath maples sadly lie.
Beyond the city wall, from Temple of Cold Hill
Bells break the ship-borne roamer's dream and midnight still.
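Neural machine translation can also be tried locally. The sketch below is a minimal example, not the system behind DeepL or Google Translate: it assumes the transformers and sentencepiece packages are installed and uses the publicly available Helsinki-NLP/opus-mt-zh-en model; the input is the opening line of the poem translated above.

    # A minimal Chinese-to-English neural machine translation sketch using transformers.
    # Assumes: pip install transformers sentencepiece torch
    # Uses a publicly available MarianMT model, not the model behind DeepL or Google Translate.
    from transformers import pipeline

    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")

    line = "月落乌啼霜满天"   # opening line of the poem quoted above
    result = translator(line)
    print(result[0]["translation_text"])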
2.7 Speech recognition

AI Singapore, in collaboration with NUS and NTU, developed a speech recognition engine called Speech Lab, which is able to recognize conversations comprising words from different languages, e.g., Singlish.

Reference:
AI Singapore. "Speech Lab". https://aisingapore.org/aiproducts/speech-lab/. Last accessed: 18 Aug. 2024.

See how well it works in the video below.

[Video: 29 sec]

Video source:
AI Singapore (@AISingapore). "Speech Lab Product Demo". YouTube, 12 Nov. 2019. https://youtu.be/ZCqW7meCXFk.

2.8 Speech synthesis

The WaveNet model, which was created by Google's DeepMind in 2016, can generate realistic-sounding, human-like voices that were better than what Google had from its other speech synthesis systems at that time.

Speech can be synthesized to imitate a given person's voice. While scammers can use this technology for impersonation, singers can use it to sing in a language they do not speak, and people who suffer from voice disorders can use it to recreate their voices.

Check out how well this technology performs in the video below, where US congresswoman Jennifer Wexton delivers a House floor speech using an AI voice clone.

[Video: 1 min 35 sec]

References and video source:
Aäron van den Oord and Sander Dieleman. "WaveNet: A generative model for raw audio". DeepMind, 8 Sep. 2016. https://www.deepmind.com/blog/wavenet-a-generative-model-for-raw-audio. Last accessed: 18 Aug. 2024.
Lim Ruey Yan. "Irish boy band Westlife release Mandarin song with help of AI". The Straits Times, 18 Jul. 2024. https://www.straitstimes.com/life/entertainment/irish-boy-band-westlife-release-mandarin-song-with-help-of-ai. Last accessed: 18 Aug. 2024.
Associated Press (@AssociatedPress). "WATCH: Rep. Jennifer Wexton delivers House floor speech using AI voice clone". YouTube. https://youtu.be/UDwamEdbZk8.

2.9 Natural language generation (NLG)

In §1.2.3, we saw an application of NLG in writing job applications. Other use cases include suggesting replies to emails and complaints, and collating audit findings.

The complexity, the ambiguity, and the variety of expressions in human languages make NLG challenging.

The most powerful NLG AIs nowadays are large language models (LLMs), in the sense that they are massive programs that contain language information extracted from massive amounts of data. Currently, the most popular type of LLM is the so-called Transformer model, which we will look at in more detail in §6.8.

Reference:
Multiple contributors. "Natural language generation". Wikipedia. https://en.wikipedia.org/wiki/Natural_language_generation. Last accessed: 18 Aug. 2024.

Compare the quality of different LLMs by following the steps below.

1. Go to the "Chatbot Arena: Benchmarking LLMs in the Wild" page at https://chat.lmsys.org/.
2. Click into the "Arena (side-by-side)" tab.
3. Select two LLMs you would like to compare from the dropdown menus.
4. Ask the chosen LLMs to generate some text to your liking by entering a prompt into the "Enter your prompt and press ENTER" box. (Do not enter personal or private information.)
5. Click the "Send" button.
6. Wait for the text to be generated in the boxes above.
7. Compare the quality of the results.
8. Try out different LLMs and different prompts.
9. Evaluate the results.

How good are the LLMs at generating a coherent piece of text? How about sustaining a meaningful conversation? Answering factual questions? Reasoning?
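Besides the web arena, text generation can also be tried locally on a small scale. The sketch below is a minimal example, assuming transformers and PyTorch are installed; it uses GPT-2, a small and much older model than the chatbots compared in the Arena, so expect far weaker output, but the generate-a-continuation idea is the same. The prompt is an arbitrary example.

    # A minimal local text-generation sketch using the transformers library and GPT-2.
    # GPT-2 is a small, older model, far weaker than current chatbot LLMs.
    # Assumes: pip install transformers torch
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    prompt = "Natural language generation can help office workers by"
    outputs = generator(prompt, max_new_tokens=40, do_sample=True, num_return_sequences=2)
    for out in outputs:
        print(out["generated_text"])   # the prompt followed by a sampled continuation
        print("---")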
2.10 Further applications

2.10.1 Chatbots

Chatbots are software programs that conduct written or spoken conversations in natural languages.

In Nov. 2022, OpenAI launched a free "preview" of its text chatbot called ChatGPT. ChatGPT can adapt to the style and the content of the prompt. This allows the user to generate realistic and coherent continuations about a topic of their choosing. ChatGPT attracted widespread public interest and showed great potential.

In Jan. 2023, Microsoft extended its partnership with OpenAI through a multiyear, multibillion-dollar investment. Microsoft has since incorporated a new user interface called Copilot, which is based on natural languages and is powered by the GPT family of LLMs, in its 365 apps and in its Windows operating system.

As a direct response to ChatGPT, Google released Bard (now Gemini), Meta released LLaMA, Baidu released ERNIE Bot, and Anthropic released Claude, all of which have ChatGPT-like capabilities, in different capacities. These chatbots not only generate text, but also analyze sentiment, summarize text, translate text, etc. People have since found numerous applications of these chatbots, e.g., brainstorming, explaining complex topics, getting feedback or a second opinion, polishing write-ups, and rehearsing for interviews.

References:
Murray Shanahan. "Talking About Large Language Models". arXiv:2212.03551 [cs.CL], Dec. 2022/Feb. 2023.
Microsoft Corporate Blogs. "Microsoft and OpenAI extend partnership". Official Microsoft Blog, 23 Jan. 2023. https://blogs.microsoft.com/blog/2023/01/23/microsoftandopenaiextendpartnership/. Last accessed: 18 Aug. 2024.
Yusuf Mehdi. "Reinventing search with a new AI-powered Microsoft Bing and Edge, your copilot for the web". Official Microsoft Blog, 7 Feb. 2023. https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/. Last accessed: 18 Aug. 2024.
Jared Spataro. "Introducing Microsoft 365 Copilot – your copilot for work". Official Microsoft Blog, 16 Mar. 2023. https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/. Last accessed: 18 Aug. 2024.
[No named author]. "Welcome to Copilot on Windows". Microsoft Support. https://support.microsoft.com/en-us/windows/welcome-to-copilot-on-windows-675708af-8c16-4675-afeb-85a5a476ccb0. Last accessed: 18 Aug. 2024.

Examples of currently available virtual voice agents include the Google Assistant, Apple's Siri, and Amazon's Alexa. Virtual voice agents work seamlessly with smart speakers, e.g., Google Nest (formerly known as Google Home) and Amazon Echo. Through voice agents, one can control lights and devices, play music and videos, get answers to questions, place orders, and more.

Chatbots can also be used in shopping malls to provide concierge and navigation services and to create smarter digital signage.

Voice chatbots, e.g., Google's Duplex and Amazon Connect, can now carry out real-world tasks over the phone. Watch how well Duplex worked in 2018 in the following demonstration.

[Video: 1 min 55 sec]

Video source:
ZEM502 (@ZEM502). "Google Duplex Demo (Google I/O 2018)". YouTube, 11 May 2018. https://youtu.be/znNe4pMCsD4.

Some chatbots can now be developed without code.

Source:
LivePerson. "Conversation Builder". https://www.liveperson.com/products/conversation-builder/. Last accessed: 18 Aug. 2024.

2.10.2 Writing computer code

Computer languages are languages, so some language models apply to them as well. For example, ChatGPT (discussed in §2.10.1) can generate and translate computer code. Watch how this can be exploited to build apps in a low-code manner through Debuild.

[Video: 47 sec]

Video source:
Sharif Shameem (@sharifshameem1227). "Debuild.co – Creating a Todo List". YouTube, 3 Aug. 2020. https://youtu.be/WhPgZFsPLeE.

2.10.3 Producing music

In addition to text-to-speech capabilities, WaveNet, discussed in §2.8 above, can also be used to synthesize other audio signals such as music. The webpage linked there contains some demonstrations of this kind towards the bottom.

There are also music "chatbots". One example is A.I. Duet built by Yotam Mann and friends at Google. You can try it out at https://aiexperiments.withgoogle.com/ai-duet/view/.

2.10.4 Other examples

- grammar checkers
- natural language database query

2.11 Current challenges

As we saw in the demonstration in §2.7, current voice agents are still not very good at voice recognition, especially when different languages and dialects are involved. Multilingual text is also a challenge. ASEAN languages lack corpora. Singlish has the AI.SG corpus, but this is just a start and is not yet enough.

Currently, AI has no true understanding of language. For instance, this limits question-answering abilities: AI may not understand questions that have complex structure and may not find answers to slightly ambiguous questions.

While AI can now retrieve information from the Internet and databases to answer open-domain questions, many chatbots still have rather limited scopes.

Sentiment analysis is affected by many parts of a text, each of which may have different meanings and implications.

Many NLP systems require additional training and cannot work out of the box.

LLMs currently take lots of computing power to train and are slow.

Current LLMs may hallucinate, i.e., they may produce confident responses that do not seem to be justified by the source data used to build the model. The result can be plausible-sounding, but factually incorrect. The following is an example from the chatbot at https://deepai.com/chat.

References:
Multiple contributors. "Question answering". Wikipedia. https://en.wikipedia.org/wiki/Question_answering. Last accessed: 18 Aug. 2024.
Kyle Wiggers. "Salesforce's AI navigates Wikipedia to find answers to complex questions". VentureBeat, 25 Feb. 2020. https://venturebeat.com/ai/salesforces-ai-navigates-wikipedia-to-find-answers-to-complex-questions/. Last accessed: 18 Aug. 2024.
Multiple contributors. "Hallucination (artificial intelligence)". Wikipedia. https://en.wikipedia.org/wiki/Hallucination_(artificial_intelligence). Last accessed: 18 Aug. 2024.

2.12 Reflection

We saw some current NLP capabilities of AI and how powerful/restricted they are.

Will you start exploiting these capabilities in your work? If yes, then how? If no, then why not?

Can you think of some creative applications of these capabilities?

With the rapid development of NLG technologies, can you imagine a scenario in the future where AI-written text becomes the norm and text composition by humans becomes a niche handicraft?