Data Management and Storage Lecture Notes PDF
Document Details
Universidade de Lisboa
2023
Tags
Summary
These lecture notes cover data management and storage topics, including ontologies, relational data models, and entity-relationship models. The notes are from 2023.
Full Transcript
Data Management and Storage Lesson 04 - Ontologies. Relational data model. Entity-relationship model Relational model Topics for today - Ontologies - Entity-Relationship - Relational model Data Management and Storage - 2022/2023 - Lesson 04 - 2 Diversit...
Data Management and Storage Lesson 04 - Ontologies. Relational data model. Entity-relationship model Relational model Topics for today - Ontologies - Entity-Relationship - Relational model Data Management and Storage - 2022/2023 - Lesson 04 - 2 Diversity is good, but... Photo by Louis Reed on Unsplash … requires an effort to link up! Data Management and Storage - 2022/2023 - Lesson 04 - 3 Hierarchies DIKW wisdom wisdom know know ledge ledge information information data data Domain 1 Domain 2 Data Management and Storage - 2022/2023 - Lesson 04 - 4 Linked data Data Management and Storage - 2022/2023 - Lesson 04 - 5 Semantic web (linked data) The web of today - web of documents The web of tomorrow - web of data “The ultimate goal of the Web of data is to enable computers to do more useful work and Photo by Adrien Converse on Unsplash to develop systems that can support trusted interactions over the network” (W3C) ➔ automatic search-response ➔ automatic use of data Data Management and Storage - 2022/2023 - Lesson 04 - 6 Semantic web (linked data) To be able to link data, we need: A framework to express the data ==> RDF (Resource Description Framework), XML (XSD), JSONLD Identify objects globally ==> GUID (Global Object Photo by Joshua Sortino on Unsplash Identifier) Give meaning to data ==> Ontologies Data Management and Storage - 2022/2023 - Lesson 04 - 7 Ontology In computer science and information science, an ontology encompasses a representation, formal naming, and definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains of discourse. The ontology typically includes the following components: color individuals classes purple attributes violet relations rgb(153, 0, 204) https://en.wikipedia.org/wiki/Ontology_(information_science) Data Management and Storage - 2022/2023 - Lesson 04 - 8 Stand Genomic Sci. 2014; 9: 17. Published online 2014 Dec 8. doi: 10.1186/1944-3277-9-17 Example of linked data Data Management and Storage - 2022/2023 - Lesson 04 - 9 Data standard Darwin Core https://dwc.tdwg.org/terms/ Darwin Core https://dwc.tdwg.org/terms/ E RM T T C EP C ON VO CA BU LA RY Agrovoc - Linked Open Data set on agriculture https://agrovoc.fao.org/browse/agrovoc/en/ Data Management and Storage - 2022/2023 - Lesson 04 - 13 Database development Reality (Universe or Discourse) Requirements Conceptual model (e.g. Entity-Relationship ) Design Logical model (e.g. Relational ) Data conversion and loading Implementation Data Management and Storage - 2022/2023 - Lesson 04 - 14 Entity - Relationship model Entity Relationship Attribute Name Name MUNICIPALI INCLU FREGUESIA EXISTS CROP TY DES Name of Area Type Mayor Data Management and Storage - 2022/2023 - Lesson 04 - 15 Relational model Relational database - set of relations Table name FARM Attribute A relation is defined by: F_ID Name Freguesia - one schema: name of the relation, name and datatype of each attribute, the primary key,... - one table Key attribute Tables - operational unit of the relation, containing data values. Tables are simple and intuitive. Data Management and Storage - 2022/2023 - Lesson 04 - 16 Relation Relations represent objects and/or relationships between objects - Objects sharing commons characteristics are represented using one given relation - every relation represents objects with a common object type - Objects with different characteristics are represented using different relations Data Management and Storage - 2022/2023 - Lesson 04 - 17 Table Table with data One table is a set of attribute values organised/structured into rows and columns Attribute - every object is described by the values FARM in one row F_ID Name Freguesia - every attribute corresponds to a F12 Quinta Grande Caniço column F9 Monte Amarelo Cercal do Alentejo - attributes are also called fields F229 Quinta Verde Espinho or properties F21 Monte Branco Caniço - every value in a table describes a given characteristic of a particular object - the values of one attribute must Record (row Data value belong to the same domain or tuple) Data Management and Storage - 2022/2023 - Lesson 04 - 18 Table Identification key - one attribute or a minimal combination of attributes whose values uniquely identify the records (called rows or Table with data tuples) within the table Attribute Two important properties: FARM Uniqueness: Each key value uniquely F_ID Name Freguesia identifies one record within the table, i.e., F12 Quinta Grande Caniço different tuples must not have identical F9 Monte Amarelo Cercal do Alentejo keys. F229 Quinta Verde Espinho F21 Monte Branco Caniço Minimality: If the key is a combination of attributes, this combination must be minimal, i.e., no attribute can be removed Record (row Data value or tuple) from the combination without eliminating the unique identification. Data Management and Storage - 2022/2023 - Lesson 04 - 19 Attributes - An attribute represents a characteristic of an object - An attribute Ai takes values in a set Di, the attribute domain - the domain specifies the set of values that the attribute can take - the attribute domain is defined by its data type - An attribute value describes one characteristic of a particular object Data Management and Storage - 2022/2023 - Lesson 04 - 20 Data types Each attribute is set to be of a specific data type String (all characters) Numeric Date and time CHAR TINYINT DATE VARCHAR BOOLEAN DATETIME BINARY INT TIMESTAMP VARBINARY INTEGER TIME TINYBLOB BIGINT YEAR TINYTEXT FLOAT TEXT DOUBLE BLOB DECIMAL … … The extended list can be checked at https://www.w3schools.com/sql/sql_datatypes.asp Data Management and Storage - 2022/2023 - Lesson 04 - 21 Primary key Primary key - minimum subset of attributes where each value is unique and identifies a single object (or row) of the relation (or table) - Sometimes natural attributes are good primary keys - often, an artificial attribute is added to the objects in order to identify each object uniquely - this kind of keys have no intrinsic meaning but they are useful to uniquely identify every instance of the relation (e.g., in a table of data about students at a school, they might all be assigned a student identification number) - one table may have several candidate keys but one and only one primary key must be defined Data Management and Storage - 2022/2023 - Lesson 04 - 22 Primary keys, foreign keys Example with NUTS - Nomenclature of Territorial Units for Statistics ID Country NUTS1 NUTS2 NUTS3 1 Portugal Continente Norte Alto Minho 2 Portugal Continente Norte Alto Minho 3 Portugal Continente Norte Alto Minho 4 Portugal Continente Centro Oeste 5 Portugal Continente Centro Oeste 6 Portugal Continente Centro Oeste 7 Portugal Continente Alentejo Alentejo Litoral 8 Portugal Continente Alentejo Alentejo Litoral 9 Portugal Continente Alentejo Alentejo Litoral Região Autónoma Região Autónoma Região Autónoma 10 Portugal dos Açores dos Açores dos Açores Data Management and Storage - 2022/2023 - Lesson 04 - 23 Primary keys, foreign keys Example with NUTS - Nomenclature of Territorial Units for Statistics NUTS MUNICIPALITIES NUTS3 ID Country NUTS1 NUTS2 NUTS3 MUN_ID Name NUTS3_ID NT3_ID Name 1 Portugal Continente Norte Alto Minho 1 Ponte de Lima 1 1 Alto Minho 2 Portugal Continente Norte Alto Minho 2 Valença 1 2 Oeste 3 Portugal Continente Norte Alto Minho 3 Viana do Castelo 1 3 Alentejo Litoral 4 Portugal Continente Centro Oeste 4 Caldas da Rainha 2 Região Autónoma dos 5 Portugal Continente Centro Oeste 5 Nazaré 2 4 Açores 6 Portugal Continente Centro Oeste 6 Óbidos 2 7 Portugal Continente Alentejo Alentejo Litoral 7 Odemira 3 8 Portugal Continente Alentejo Alentejo Litoral 8 Alcácer do Sal 3 9 Portugal Continente Alentejo Alentejo Litoral 9 Grândola 3 Região Região Região 10 Lagoa 4 Autónoma dos Autónoma dos Autónoma dos 10 Portugal Açores Açores Açores Data Management and Storage - 2022/2023 - Lesson 04 - 24 Primary keys, foreign keys Example with NUTS - Nomenclature of Territorial Units for Statistics NUTS MUNICIPALITIES NUTS3 ID Country NUTS1 NUTS2 NUTS3 MUN_ID Name NUTS3_ID NT3_ID Name 1 Portugal Continente Norte Alto Minho 1 Ponte de Lima 1 1 Alto Minho 2 Portugal Continente Norte Alto Minho 2 Valença 1 2 Oeste 3 Portugal Continente Norte Alto Minho 3 Viana do Castelo 1 3 Alentejo Litoral 4 Portugal Continente Centro Oeste 4 Caldas da Rainha 2 Região Autónoma dos 5 Portugal Continente Centro Oeste 5 Nazaré 2 4 Açores 6 Portugal Continente Centro Oeste 6 Óbidos 2 Primary key 7 Portugal Continente Alentejo Alentejo Litoral 7 Odemira 3 8 Portugal Continente Alentejo Alentejo Litoral 8 Alcácer do Sal 3 9 Portugal Continente Alentejo Alentejo Litoral 9 Grândola 3 Região Região Região 10 Lagoa 4 Autónoma dos Autónoma dos Autónoma dos 10 Portugal Açores Açores Açores Primary key Data Management and Storage - 2022/2023 - Lesson 04 - 25 Primary keys, foreign keys A foreign key is an attribute (or a set of attributes) in a relation that matches the primary key of another relation NUTS MUNICIPALITIES NUTS3 ID Country NUTS1 NUTS2 NUTS3 MUN_ID Name NUTS3_ID NT3_ID Name 1 Portugal Continente Norte Alto Minho 1 Ponte de Lima 1 1 Alto Minho 2 Portugal Continente Norte Alto Minho 2 Valença 1 2 Oeste 3 Portugal Continente Norte Alto Minho 3 Viana do Castelo 1 3 Alentejo Litoral 4 Portugal Continente Centro Oeste 4 Caldas da Rainha 2 Região Autónoma dos 5 Portugal Continente Centro Oeste 5 Nazaré 2 4 Açores 6 Portugal Continente Centro Oeste 6 Óbidos 2 Primary key 7 Portugal Continente Alentejo Alentejo Litoral 7 Odemira 3 8 Portugal Continente Alentejo Alentejo Litoral 8 Alcácer do Sal 3 9 Portugal Continente Alentejo Alentejo Litoral 9 Grândola 3 Região Região Região 10 Lagoa 4 Autónoma dos Autónoma dos Autónoma dos 10 Portugal Açores Açores Açores Primary key Foreign key Data Management and Storage - 2022/2023 - Lesson 04 - 26 Entity-Relationships Diagrams Diagrams - there are several notations: Arrow, Baker, Represent ER Chen, Crow. IDEF1X. These try to represent entities, types of relations, keys (primary, foreign), constraints. Text Notation Here an example of IDEF1X made wti DBeaver. MUNICIPALITIES ( MUN_ID, Name, NUTS3_ID) NUTS3 (NT3_ID, Name) name of relation outside the parentheses parentheses contain the attributes primary key - underline foreign(s) key(s) - dashed underline Data Management and Storage - 2022/2023 - Lesson 04 - 27 Tables - wrap up Table definition - each table or relation is a set of tuples - Has an unique table name - All attribute names are unique within the table and identify a specific column - Any number of attributes, and without any specific order - no column order - Any number of rows, without any specific order - no row order - One attribute or combination of attributes identifies a row within the table - identification key Relational model - represents both data and relationships between data as tables. A table is an unordered set of rows or tuples. Data Management and Storage - 2022/2023 - Lesson 04 - 28