Questions and Answers

What is the main advantage of open file formats over proprietary ones?

  • Open files are independent of the application that created them. (correct)
  • Open files are faster to open and read.
  • Open files are easier to create and modify with a text editor.
  • Open files can store more data than proprietary files.

    HTML files are binary files that create static web pages.

    False

    One of the advantages of LaTeX is that it is a WYSIWYG (What You See Is What You Get) language.

    False

    Which Perl syntax is used to match a string against a regular expression?

    =~ //

    Give an example of a regular expression that matches a string starting with an 'M', followed by a vowel, at least one letter, a space, and ending with at least one digit.

    ^M[aeiouy]\w+\s+\d+$

    Note that \w also matches digits and underscores, and \s+ accepts any run of whitespace; a stricter pattern is ^M[aeiouy][a-zA-Z]+ \d+$.

    What does the 'chop' instruction do in Perl?

    It removes the last character of a string.

    Explain the difference between the 'print' and 'return' instructions in Perl.

    'print' displays a value on the screen, whereas 'return' hands a value back to the caller of the function.

    The Chomsky hierarchy classifies grammars into 4 types: Type 0, Type 1, Type 2, and Type 3. Type 0 is the most complex and includes all the other types of grammars, while Type 3 is the simplest and corresponds to regular grammars.

    True

    Match each grammar type with the complexity of its recognition problem.

    Type 0 = Undecidable. Type 1 = Exponential. Type 2 = Cubic. Type 3 = Linear.

    Study Notes

    Introduction to Perl and Texts

    • Course: M1 ÉdNITL-LTTAC 2024-2025
    • Instructor: Fabien TORRE
    • Institution: Université de Lille

    Motivations and Context

    • Reasons for learning Perl:
      • Mastering data
      • Working with unstructured text
      • Exploring Big Data and open data
      • Discovering the hidden web and data journalism
      • Automatically creating text corpora from the web
      • Automating the creation of documents
      • Converting between different formats (text, HTML, LaTeX)

    Some Ideas (1)

    • Motivations:
      • Mastering data
      • Utilizing unstructured texts
      • Exploring big data, open data, etc.
      • Discovering hidden web and data journalism
      • Creating automatic corpora from the web
      • Automating document generation
      • Handling format changes (text, HTML, LaTeX)

    Some Ideas (2)

    • Examples of applications:
      • Text generation (prefixes/suffixes, conjugations, proper nouns, etc.)
      • Entity recognition and typing in text
      • Automatic text annotation
      • Discovering co-occurrences
      • Concordances
      • Anagram generators
      • Automatic text classification
      • Access to Medline, Wikileaks, Enron documents
      • Retrieving information from various sources

    Perl Overview (1)

    • Perl's strengths:
      • "Glue language"
      • Simple syntax for files
      • Easy handling of regular expressions
      • Useful for text processing
      • Turing complete

    Perl Overview (2)

    • Characteristics:
      • Natural language-like syntax
      • Multiple ways to express the same thing
      • Lets you write poems or complete programs in a single line
      • Semantics depend on context
      • No mandatory variable declaration
      • Default variables exist
    • Note: These characteristics run counter to the usual principles of good algorithmic practice.

    Perl and Algorithmics (1)

    • Good practices:
      • use strict;
      • use warnings;
    • affiche_tableau subroutine example showcasing these good practices.
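
The course's own affiche_tableau code is not reproduced in these notes. The sketch below is a hypothetical reconstruction of what such a subroutine could look like under use strict and use warnings; the body, the returned count, and the sample array are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical version of affiche_tableau: prints each element on its own
# line and returns how many elements were printed.
sub affiche_tableau {
    my (@tableau) = @_;              # copy the arguments into a lexical array
    foreach my $element (@tableau) {
        print "$element\n";
    }
    return scalar @tableau;          # number of elements
}

my @mots   = ('perl', 'texte', 'corpus');
my $nombre = affiche_tableau(@mots);
print "Elements: $nombre\n";
```

With use strict, every variable must be declared with my, so a misspelled variable name becomes a compile-time error instead of a silent bug.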

    Perl and Algorithmics (2)

    • Bad practices:
      • Example of poor subroutine affiche.
      • Highlights potential mistakes in Perl programming

    Work Environments

    • Required tools: Perl, console, and text editor
    • Installation methods:
      • Linux: Perl usually pre-installed.
      • MacOS: Perl might need installation.
      • Windows: Use Windows Subsystem for Linux, Strawberry Perl, or a virtual machine.
      • Online: Use a web-based interpreter.
    • Recommendation: prefer Linux on university machines (often dual-boot).

    Linux (Overview)

    • Linux topics:
      • Motivations and context
      • Free and open-source software, Linux distributions
      • Linux in practice
      • Open formats

    Linux (General Information)

    • Components of a Linux distribution:
      • Operating system
      • File system
      • Graphical environment
      • Applications (console, text editor, office suite, archivers/compressors, etc.)
      • Web browser, mail reader

    Linux (Distributions)

    • Examples of Linux distributions: Ubuntu, Mint, Debian, Mandriva, Gentoo, Fedora
    • Different graphical environments (Xfce, KDE, Gnome, Cinnamon)
    • Different strategies for software choices/updates

    Linux (File System)

    • Principles: Hierarchical structure of directories and files
    • Permissions: read (r), write (w), execute (x), for users, groups, and others.
    • Notations:
      • /: root directory
      • ~: user's home directory
      • ~user: another user's home directory
      • .: current directory
      • ..: parent directory
      • The specified file/folder

    Linux (Syntax and Commands)

    • Command syntax: parameters, options, background execution (&), redirection (> and <), the error channel (2>), pipes (|), and running programs

    Basic Linux Commands (1/2)

    • Navigation and management:
      • cd: Change directory
      • ls: List directory contents
      • pwd: Show current directory
      • cp: Copy files/directories
      • mv: Move files/directories
      • rm: Remove files/directories
      • mkdir: Create a directory
    • Archiving and compression:
      • tar: Archive a directory
      • bzip2: Compress a file

    Basic Linux Commands (2/2)

    • Information about commands
      • history: Previous commands
      • man: Manual of a command
      • apropos: Search for commands by keyword
      • which: Locate the executable behind a command
      • top: Active processes
    • Text editor/viewers
      • gnome-text-editor: Text editor (varies by distribution)
      • evince: PDF viewer
      • eog: Image viewer
    • Web access
      • wget: Retrieve files
      • firefox-esr: Web browser

    Text File Management

    • Basic text file utilities:
      • wc: Count lines, words, and characters
      • grep: Search within content
      • cut: Extract data
      • sort: Sort lines
      • uniq: Remove duplicate lines
      • more: View content page by page
      • head/tail: Display the first/last lines of a file
      • diff: Show differences between files

    Commands for Other Documents

    • Search and information: find, file
    • Document text extraction: pdftotext, tesseract
    • Conversion between formats: pandoc
    • PDF manipulation: pdfinfo, xournal, pdftk, pdfjam

    Terminal

    • Terminal shortcuts:
      • clear or Ctrl+L: Clear the console
      • Ctrl++ and Ctrl+-: Change character size
      • Tab: Complete commands and file names
      • Arrow keys (up/down): Browse the command history
      • !deb: Find and rerun the last command starting with 'deb'
      • Alt+Tab: Switch windows
    • No mouse needed

    Open Formats (Overview)

    • Overview of open formats
    • Discussion of closed formats, binary format, proprietary formats.
    • Advantages and disadvantages of open formats for automatic processing

    HTML Overview

    • HTML files are editable text files with tags (e.g., <html>, <body>, <h1>, <p>, <table>).
    • HTML has tags, attributes (e.g., align, size), nesting, and a tree-like structure. An example of HTML is provided.

    HTML plus Semantics Example

    • Example HTML with semantic structure highlighting the use of <head>, <link>, and semantic elements for improved organization.

    CSS

    • CSS (Cascading Style Sheets) styles HTML: it controls formatting without touching the underlying HTML structure. The example includes style definitions for the <body> and <h1> elements.

    Markdown

    • Introduction to Markdown formatting.
    • Usage of different levels of headings and lists with examples.

    LaTeX

    • LaTeX overview, principles, formatting, example document.

    LaTeX Software

    • Description of LaTeX as a program for transforming LaTeX files into PDF files (pdflatex).
    • Highlights of LaTeX's features, reliability (rare bugs), quality for printing, and its historical development.

    Number Encodings

    • Numbers are represented in binary form (0s and 1s).
    • Principles similar to the decimal system (base 10) but limited by binary/machine constraints.
    • Encoding and decoding, and addition are discussed.
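
As a minimal sketch of the encoding, decoding, and addition steps mentioned above, the following uses Perl's built-in sprintf and oct. The sample values and the 8-bit width are assumptions chosen for illustration, not taken from the course.

```perl
use strict;
use warnings;

my $n    = 13;
my $bits = sprintf('%b', $n);     # encode: decimal 13 -> binary string "1101"
my $back = oct("0b$bits");        # decode: binary string "1101" -> 13

# Addition under a machine constraint: on 8 bits only values 0..255 fit,
# so 200 + 100 = 300 wraps around to 300 - 256 = 44.
my $sum8 = (200 + 100) % 256;

print "$n in binary is $bits\n";
print "$bits in decimal is $back\n";
print "200 + 100 on 8 bits gives $sum8\n";
```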

    Text Encodings

    • Characters are numbered, and those numbers are then encoded in binary.
    • Number of bits used to encode each character.
    • ASCII, ISO-Latin1, UTF-16, UTF-8 are mentioned as encoding standards.
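
To make the difference between these encodings concrete, here is a small sketch using the core Encode module. It counts how many bytes the same three-character word occupies in each encoding; the sample word is an assumption, written with escapes so the example does not depend on how the script file itself is saved.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The French word "ete" with accents (e-acute, t, e-acute), as code points.
my $mot = "\x{e9}t\x{e9}";

my $nb_caracteres = length($mot);                          # 3 characters
my $nb_latin1     = length(encode('ISO-8859-1', $mot));    # 1 byte per character
my $nb_utf8       = length(encode('UTF-8', $mot));         # e-acute takes 2 bytes
my $nb_utf16      = length(encode('UTF-16BE', $mot));      # 2 bytes per character

print "characters: $nb_caracteres\n";
print "ISO-Latin1 bytes: $nb_latin1\n";
print "UTF-8 bytes: $nb_utf8\n";
print "UTF-16 bytes: $nb_utf16\n";
```

Plain ASCII has no code for accented letters, which is why the later standards exist.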

    Perl (Language Basics)

    • Core Perl language components:
      • Syntax elements
      • Control structures
      • Functions and procedures
      • Tables/Arrays
      • Files (input/output)

    Syntax and Comments

    • Basic Perl syntax elements
      • use strict;
      • use warnings;
      • Handling accents with use utf8
      • Handling of standard output and error streams in utf8
      • Example of displaying text

    Minimal Perl Syntax

    • Basic rules for Perl programs
    • Statements, curly braces, comments (#)
    • Character strings (single quotes or double quotes)
    • Backslashes, print statements, and newline characters

    Interpreted Text in Perl

    • Concept of interpreted vs. non-interpreted character strings in Perl
    • Examples demonstrating string handling and output
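
The distinction can be sketched as follows: double quotes interpolate variables and escape sequences, single quotes keep the text literally. The variable names are illustrative assumptions.

```perl
use strict;
use warnings;

my $nom = 'Perl';

my $interprete = "Bonjour $nom\n";   # $nom is replaced, \n becomes a newline
my $litteral   = 'Bonjour $nom\n';   # kept exactly as typed

print $interprete;                   # prints: Bonjour Perl
print $litteral, "\n";               # prints: Bonjour $nom\n
```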

    Variables in Perl

    • Data types (booleans, integers, reals, characters, strings).
    • Variables prefixed by $.
    • Variable assignment and equality tests.
    • Example demonstrating variable use.

    Perl Operators

    • Arithmetic operators (+, -, *, /).
    • Integer part (int), random number generation (rand).
    • String concatenation.
    • Logical operators (AND, OR, NOT).
    • Comparisons (numbers and strings).
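
A short sketch of these operators, with values chosen arbitrarily for illustration. Note that Perl uses different comparison operators for numbers (==, <) and for strings (eq, lt), and that concatenation is written with a dot.

```perl
use strict;
use warnings;

my $quotient = 7 / 2;                 # 3.5: division is not integer division
my $entier   = int(7 / 2);            # 3: integer part
my $de       = 1 + int(rand(6));      # random integer between 1 and 6

my $texte = 'data' . ' ' . 'journalism';   # string concatenation

my ($vrai, $faux) = (1, 0);
my $et  = ($vrai && $faux) ? 1 : 0;   # logical AND
my $ou  = ($vrai || $faux) ? 1 : 0;   # logical OR
my $non = (!$faux)         ? 1 : 0;   # logical NOT

my $egal_nombre = (10 == 10.0)     ? 1 : 0;   # numeric comparison: equal
my $egal_chaine = ('10' eq '10.0') ? 1 : 0;   # string comparison: different
my $avant       = ('abc' lt 'abd') ? 1 : 0;   # alphabetical order

print "$quotient $entier $texte\n";
```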

    String Manipulation

    • String functions (e.g., length, substr).
    • Demonstrating extraction/manipulation of the parts of a string
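
As an illustration of length and substr, the sketch below extracts parts of a sample string. The string and variable names are assumptions; index is an additional built-in used here only to locate a substring.

```perl
use strict;
use warnings;

my $phrase = 'Introduction to Perl';

my $longueur = length($phrase);          # 20 characters
my $debut    = substr($phrase, 0, 12);   # 12 characters from position 0
my $fin      = substr($phrase, -4);      # last 4 characters
my $position = index($phrase, 'Perl');   # position of the first occurrence

print "$longueur / $debut / $fin / $position\n";
```

Positions start at 0, and a negative offset counts from the end of the string.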

    Control Structures in Perl

    • Conditional structures (if...else statements).
    • Iterative structures (loops):
      • while loop
      • for loop
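
A minimal sketch combining the three structures; the counters and bounds are arbitrary values chosen for illustration.

```perl
use strict;
use warnings;

# while loop: sum the integers 0..4
my $i     = 0;
my $somme = 0;
while ($i < 5) {
    $somme += $i;
    $i++;
}

# for loop with an if...else inside: count even and odd numbers below 10
my ($pairs, $impairs) = (0, 0);
for (my $j = 0; $j < 10; $j++) {
    if ($j % 2 == 0) {
        $pairs++;
    }
    else {
        $impairs++;
    }
}

print "somme = $somme, pairs = $pairs, impairs = $impairs\n";
```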

    Functions in Perl

    • Defining and calling subroutines/functions.
    • Returning values.
    • Examples showing how to define and use simple functions.
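
The sketch below contrasts a function that returns a value with one that only prints it. The subroutine names carre and affiche_carre are hypothetical, chosen for this illustration.

```perl
use strict;
use warnings;

# Returns the square: the caller can store or reuse the result.
sub carre {
    my ($x) = @_;
    return $x * $x;
}

# Only prints the square: nothing useful comes back to the caller.
sub affiche_carre {
    my ($x) = @_;
    print $x * $x, "\n";
    return;
}

my $resultat = carre(4);           # 16, available for further computation
my $compose  = carre(carre(2));    # a returned value can feed another call
my $rien     = affiche_carre(3);   # prints 9, but $rien is undefined

print "$resultat $compose\n";
```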

    Arrays in Perl

    • Array characteristics.
    • Array creation and modification.
    • Iterating over arrays with for and foreach loops.
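
A small sketch of array creation, modification, and traversal; the array contents are assumptions borrowed from the formats mentioned earlier in these notes.

```perl
use strict;
use warnings;

my @formats = ('text', 'HTML');      # creation
push @formats, 'LaTeX';              # add an element at the end
$formats[0] = 'plain text';          # modify an element by index

# for loop: useful when the index itself is needed
for (my $i = 0; $i < @formats; $i++) {
    print "$i: $formats[$i]\n";
}

# foreach loop: visits each element directly
my $total = 0;
foreach my $format (@formats) {
    $total += length($format);
}

my $taille  = scalar @formats;       # number of elements
my $dernier = $formats[-1];          # negative index counts from the end
print "$taille elements, last is $dernier\n";
```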

    File Handling in Perl

    • Writing to files using open, print, close, and binmode for specific encodings.
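    • Reading from files using open, a while loop for reading lines, and chop to remove the trailing newline. Note that chop removes the last character whatever it is, whereas chomp removes only a trailing newline.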
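
The write-then-read cycle can be sketched as follows. The temporary file (created with the core File::Temp module) and the sample lines are assumptions made so the example is self-contained; the course's own file names are not given in these notes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: a throwaway file, deleted automatically when the script ends.
my ($tmp, $chemin) = tempfile(UNLINK => 1);
close $tmp;

# Writing: open in '>' mode, declare the encoding, print, close.
open(my $sortie, '>', $chemin) or die "Cannot write $chemin: $!";
binmode($sortie, ':encoding(UTF-8)');
print $sortie "premiere ligne\n";
print $sortie "seconde ligne\n";
close $sortie;

# Reading: open in '<' mode and loop over the lines.
open(my $entree, '<', $chemin) or die "Cannot read $chemin: $!";
binmode($entree, ':encoding(UTF-8)');
my @lignes;
while (my $ligne = <$entree>) {
    chop $ligne;                 # removes the last character, here the newline
    push @lignes, $ligne;
}
close $entree;

print scalar(@lignes), " lines read\n";
```

If the last line of a file has no final newline, chop would delete a real character; chomp avoids that by removing only a trailing newline.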

    Perl's Advantages

    • Perl can handle many text formats, including CSV, Markdown, HTML, XML (including TEI), and LaTeX.

    Regular Expressions Introduction

    • Introduction to regular expressions (regex) in the context of language theory.
    • Chomsky hierarchy, including regular languages.
    • Perl regex operators and functions are introduced.

    Regular Expressions in Perl

    • Basic regex concepts and notations for character classes, positions (start/end).
    • Quantifiers ?, *, and +.
    • Example using regex for string matching and extraction, showing how to find specific patterns in a string
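
As a sketch of these notations, the example below applies the pattern from the quiz above (^ and $ for positions, a character class, the + quantifier) to a few test strings, then shows the ? and * quantifiers. The test strings are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# 'M', a vowel, one or more word characters, whitespace, digits to the end.
my $motif = qr/^M[aeiouy]\w+\s+\d+$/;

my @essais   = ('Marie 42', 'Mia 7', 'Ma 7', 'Paul 12');
my @reconnus = grep { $_ =~ $motif } @essais;    # keeps the matching strings

# Parentheses capture parts of the match into $1, $2, ...
my ($nom, $nombre);
if ('Marie 42' =~ /^(M[aeiouy]\w+)\s+(\d+)$/) {
    ($nom, $nombre) = ($1, $2);
}

# ? means zero or one occurrence, * means zero or more.
my $optionnel = ('color' =~ /^colou?r$/ && 'colour' =~ /^colou?r$/) ? 1 : 0;
my $etoile    = ('a' =~ /^ab*$/ && 'abbb' =~ /^ab*$/) ? 1 : 0;

print "matched: @reconnus\n";
print "name = $nom, number = $nombre\n";
```

'Ma 7' is rejected because \w+ requires at least one more character after the vowel.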

    Regular Expression Operators in Perl

    • =~ (match operator), i (case-insensitive flag), and s/// (substitution operator).
    • g: global flag for multiple replacements, example demonstrating matching a particular pattern and capturing parts of the string.
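
A minimal sketch of these operators and flags on an arbitrary sample sentence (an assumption, not course material):

```perl
use strict;
use warnings;

my $texte = 'Perl is fun; perl is glue.';

# =~ with the i flag: case-insensitive match
my $trouve = ($texte =~ /PERL/i) ? 1 : 0;

# s/// without g replaces only the first occurrence
my $une_fois = $texte;
$une_fois =~ s/perl/PERL/i;

# s/// with g replaces every occurrence and returns how many were replaced
my $partout = $texte;
my $nb      = ($partout =~ s/perl/PERL/gi);

# A match in list context returns the captured groups
my ($premier, $second) = ($texte =~ /^(\w+) is (\w+)/);

print "$une_fois\n$partout\n$nb replacements\n";
```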

    Operators in Perl

    • Splitting a string into pieces with the split function, and how to use it.
    • Examples of extracting elements from a string around a delimiter.
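
The use of split can be sketched as follows; the delimiter-separated sample strings are assumptions chosen for illustration. The first argument is a regular expression describing the delimiter, and join is the reverse operation.

```perl
use strict;
use warnings;

# Split a semicolon-separated record into fields
my $ligne  = 'nom;prenom;ville';
my @champs = split(/;/, $ligne);

# The delimiter is a regex, so runs of whitespace can count as one separator
my @mots = split(/\s+/, 'un  deux   trois');

# join rebuilds a string from the pieces, here with a different delimiter
my $csv = join(',', @champs);

print scalar(@champs), " fields, second is $champs[1]\n";
print "$csv\n";
```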

    Regular Expression Summary

    • Recap of regular expression capabilities and limitations.
    • Strengths and weaknesses for different kinds of linguistic tasks.
    • Overview of practical use and limitations.
