Semi-Structured Data Overview

Questions and Answers

What is computer data?

Data is information that can be interpreted and used by computers. It is a collection of facts, such as numbers, words, measurements, observations, or even simple descriptions of things.

By nature, how many types of data are there?

  • four
  • three
  • one
  • two (correct)

What are the two types of data by nature?

Textual data and multimedia data.

What are the three types of data by structure?

Structured data, unstructured data, and semi-structured data.

What is a hypertext document?

A document that lets the reader move from one piece of information to another through a system of cross-references called hyperlinks, or hypertext links.

True or false: the notion of hypertext is necessarily tied to the presence of the internet.

False

What is a hypermedia document?

A document is called hypermedia if it lets users move between its different sections or to other documents.

What is a hypermedia system?

A system designed to hold hypermedia documents and present them to users. In such a system, the reader can directly reach other information related to what they are reading.

Define structured data.

Structured data is data that uses a predefined, expected format. It can come from many different sources, but the common factor is that the fields are fixed.

Define semi-structured data.

Semi-structured data is data that is not captured or formatted in a conventional way. It does not follow the format of a tabular data model or a relational database because it has no fixed schema. In addition, the same entity can have several structures within the same data source.

Which new technologies emerged to address the problem of semi-structured data?

XML (eXtensible Markup Language), JSON (JavaScript Object Notation), CSV (Comma-Separated Values), TSV (Tab-Separated Values), the Parquet format, and YAML (YAML Ain't Markup Language, originally Yet Another Markup Language).

What is the objective of the course?

The objective of this course is to familiarize students with semi-structured data and to teach them to manipulate it using XML technology.

Flashcards

What is computer data?

Information that can be interpreted and used by computers, including numbers, text, and observations.

What is text-based data?

Data that is written and stored in text format, including text and numbers.

What is multimedia data?

Data that includes text, images, audio, and video, combining multiple formats.

What is structured data?

Data that is predefined and formatted with a precise structure before being stored.

What is unstructured data?

Data without a defined structure or schema, such as text files and social media posts.

What is semi-structured data?

Data that possesses a format but does not have a rigid structure, offering flexibility.

What is a hypertext document?

A document that allows users to navigate from one piece of information to another via hyperlinks.

What is a hypertext system?

A system of documents linked by hyperlinks, allowing automatic navigation between related content.

What is a hypermedia document?

A document that allows users to navigate between its sections or to other documents (for example, a YouTube video).

What is a hypermedia system?

A system designed to contain and present hypermedia documents, allowing direct access to related information.

What is structured data?

Data that uses a predefined format, often from various sources, with fixed fields.

What is tabular structured data?

Data in table form with defined rows and columns that clearly indicate data attributes.

What is semi-structured data?

Data that is not captured or formatted in a conventional way and that lacks the fixed schema of a relational database.

How is semi-structured data represented?

With a hierarchical model, drawn as a graph of nodes and leaves.

What happens during semi-structured data updates?

The data structure changes as the data itself changes.

What are the benefits of semi-structured data?

It is hierarchical, scales easily, and makes complex data easier to represent.

Examples of semi-structured data formats?

XML, JSON, CSV, TSV, Parquet, and YAML.

What is the course objective?

To familiarize students with semi-structured data and manipulate it using XML.

Study Notes

  • These notes cover the topic of semi-structured data
  • They are divided into sections covering general concepts and an introduction to semi-structured data
  • They are intended as an overview of the subject

Computer Data

  • Computer data is information that can be interpreted and used by computers
  • Data consists of facts such as numbers, words, measurements, observations, or simple descriptions
  • It can take the form of numbers, text, images, audio, or video
  • Once collected and organized, data becomes the foundation of a computer system

Data Classification by Nature

  • By nature, data falls into two types: textual and multimedia
  • Textual data is written and stored in text format, including text and numbers
  • Multimedia data combines several formats: text, images, audio, and video

Data Classification by Structure

  • By structure, data falls into three types: structured, unstructured, and semi-structured
  • Structured data is predefined and formatted with a precise structure before being placed on physical media
  • Unstructured data has no defined structure or schema; examples include reports, text files, comments and opinions on social networks, and emails
  • Semi-structured data has a format, but its structure is not fixed or rigid

Hypertext

  • A hypertext document lets the reader move from one piece of information to another using hyperlinks
  • Example: a webpage
  • A hypertext system contains documents linked together by hyperlinks
  • Following a hyperlink takes the user directly to another related document
  • Hypertext navigation is a non-linear mode of consultation
  • Example: a web browser
  • Hypertext is not necessarily internet-based
  • Examples include local navigation in a web browser or in a PDF reader

Hypermedia

  • A hypermedia document allows users to navigate between its different sections or to other documents
  • Example: YouTube video
  • Hypermedia systems are designed to contain and present hypermedia documents to users
  • The reader can directly access other related information
  • Examples include YouTube and Google Earth

Structured Data

  • Structured data uses a predefined and expected format
  • It may come from different sources, but the common factor is that the fields are fixed

Structured Data Characteristics

  • Data is in table form, with rows and columns that clearly define the data attributes
  • A rigid structure is defined before the data is populated
  • All values of the same attribute (column) have the same type
  • The data is easy to search, process, and analyze
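
As an illustration of these characteristics, here is a minimal sketch using Python's built-in sqlite3 module. The `student` table, its columns, and the sample rows are hypothetical and are not taken from the course material.

```python
import sqlite3

# Hypothetical "student" table: the schema is fixed before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
    [(1, "Amina", 21), (2, "Karim", 23)],
)

# Every row has the same columns, so searching is a simple query.
older = conn.execute(
    "SELECT name FROM student WHERE age > ? ORDER BY name", (22,)
).fetchall()

# A row that omits a required column does not fit the schema and is rejected.
try:
    conn.execute("INSERT INTO student (id, name) VALUES (?, ?)", (3, "Lina"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

In this sketch the rigid structure is what makes the query trivial, and it is also what causes the incomplete row to be refused.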

Database Reminder

  • Review of database fundamentals, including the definitions of a (relational) database and a (relational) database management system (DBMS)
  • Covers data modeling of an information system (IS) using the entity-relationship model or UML, the relational model, schemas, primary and foreign keys, and normalization
  • The physical level covers tables, indexes, and SQL

Semi-Structured Data

  • Semi-structured data is not captured or formatted in a conventional way: it does not follow a tabular model or a relational database model because it has no fixed schema
  • The same entity can have several structures within the same data source
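
To make the idea of one entity with several structures concrete, here is a minimal sketch using JSON. The person records and their field names are hypothetical.

```python
import json

# Three hypothetical "person" records from one source, each shaped differently.
raw = """
[
  {"name": "Amina", "email": "amina@example.org"},
  {"name": "Karim", "phones": ["0550-000-001", "0550-000-002"]},
  {"name": "Lina", "email": "lina@example.org", "address": {"city": "Algiers"}}
]
"""
people = json.loads(raw)

# The set of field names differs from record to record: there is no fixed schema.
shapes = [sorted(person.keys()) for person in people]
distinct_shapes = {tuple(shape) for shape in shapes}

# Code that reads the data must tolerate missing fields explicitly.
emails = [person.get("email") for person in people]
```

Here `distinct_shapes` contains three different field sets for the same kind of entity, and `emails` holds `None` for the record that has no email field.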

Semi-Structured Data Representation

  • Semi-structured data is represented with a hierarchical model
  • The leaves of the model hold the data
  • The nodes and links represent the data structure
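
The split between leaves (data) and inner nodes (structure) can be sketched with Python's standard xml.etree.ElementTree module. The `library` document and the `walk` helper below are hypothetical illustrations, not part of the course material.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the second book has no author element.
doc = """
<library>
  <book>
    <title>XML Basics</title>
    <author>A. Writer</author>
  </book>
  <book>
    <title>JSON Basics</title>
  </book>
</library>
"""
root = ET.fromstring(doc)

def walk(node, path=""):
    """Yield (path, text) for each leaf; inner nodes only extend the path."""
    here = f"{path}/{node.tag}"
    children = list(node)
    if not children:
        # A leaf carries an actual data value.
        yield here, (node.text or "").strip()
    for child in children:
        # Inner nodes and their links describe the structure.
        yield from walk(child, here)

leaves = list(walk(root))
```

Each entry in `leaves` pairs a structural path such as `/library/book/title` with the value stored at that leaf.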

Structural Comparisons of Data

  • Structured data uses the relational model, with a rigid schema defined before the data is loaded
  • Updates do not affect the structure of the data, and data is easy to find
  • Data is represented as a flat table, which makes missing, multi-valued, or multi-order attributes difficult to manage
  • Scaling up a structured database schema is very difficult
  • Semi-structured data uses the hierarchical model, with a flexible, extensible schema defined implicitly in the data
  • Updates can change the data structure, which makes them more complicated
  • Complex data is easy to represent, and varied attribute values are easy to manage
  • Scaling is easy compared to structured data
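
The contrast can be sketched in a few lines of Python. The contact records and the `to_flat_rows` helper are hypothetical, and the flat layout shown is only one possible way to force nested data into fixed columns.

```python
# Hypothetical contact records held hierarchically (nested lists and dicts).
contacts = [
    {"name": "Amina", "phones": ["0550-000-001", "0550-000-002"]},
    {"name": "Karim", "phones": []},
]

def to_flat_rows(records):
    """Project nested records onto a fixed (name, phone) column layout."""
    rows = []
    for rec in records:
        # A multi-valued attribute repeats the name on several rows;
        # a missing attribute needs a None placeholder.
        for phone in rec["phones"] or [None]:
            rows.append((rec["name"], phone))
    return rows

flat = to_flat_rows(contacts)

# In the hierarchical form, an update can change the structure of one record only.
contacts[1]["email"] = "karim@example.org"
fields_per_record = [sorted(rec) for rec in contacts]
```

The flat projection needs duplication and a placeholder to cope with multi-valued and missing attributes, whereas the hierarchical records absorb a new field without any schema change. The price is that the two records no longer share the same structure.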

Semi-Structured Data Problems

  • The web is an important source of semi-structured data
  • Its importance and volume have grown with the development of the internet
  • New kinds of client data must be stored and manipulated, such as browsing history, cookies, and sessions
  • There is a need for adequate tools that can manipulate semi-structured data
  • Using a conventional DBMS is problematic because this data is completely different from structured data
  • Tools and methods are needed that manage this type of data efficiently

Solutions for Semi-Structured Data

  • Technologies that emerged to address this problem include XML (eXtensible Markup Language), JSON (JavaScript Object Notation), CSV (Comma-Separated Values), TSV (Tab-Separated Values), the Parquet format, and YAML (YAML Ain't Markup Language, originally Yet Another Markup Language)

Examples of XML and JSON

  • The source text shows a sample dataset represented in both XML and JSON
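
The original sample dataset is not reproduced in these notes. As a stand-in, the minimal sketch below serializes one hypothetical student record to both JSON and XML using Python's standard library; the record and the element names are assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, not the dataset from the course material.
student = {"name": "Amina", "courses": ["XML", "Databases"]}

# JSON: the nesting of objects and arrays mirrors the Python structure.
as_json = json.dumps(student)

# XML: the same hierarchy expressed as nested elements.
root = ET.Element("student")
ET.SubElement(root, "name").text = student["name"]
courses = ET.SubElement(root, "courses")
for course in student["courses"]:
    ET.SubElement(courses, "course").text = course
as_xml = ET.tostring(root, encoding="unicode")
```

Both outputs carry the same hierarchy: a student node with a name leaf and a list of course leaves.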

Course Objectives

  • The objective of the course is to familiarize students with semi-structured data
  • It teaches them to manipulate such data using XML technology
  • The module is organized into four chapters: generalities, the XML core, the XML galaxy, and XML with databases
