Multi-databases & Federated Databases PDF

University of Malta

Prof Joseph G Vella

Slide 1: Multi-databases & Federated Databases
Prof. Joseph G. Vella, Dept. of Computer Information Systems
Prof Joseph G Vella ©, CIS, FICT, UM
DM & DWH – Multi & Federated DBs, J Vella

Slide 2: Databases & DBMS
• A database is a collection of consistent and structured facts from a domain of discourse that are accessible by a number of end users.
• The facts on the domain of discourse are of two types:
  • intensional: schema facts;
  • extensional: data.
• The database must remain consistent when changing from one state instance to another, e.g. during transactional updates.
• A DBMS is a generic software system that supports and maintains databases.
• Some of its important components include:
  • storage management;
  • query processing; and
  • transaction management.

Slide 3: DBMS and other thingies
• A DBMS uses the services of the operating system and networking facilities.
• A DBMS uses a number of specialised artefacts:
  • hardware
    • RAID units
    • cache memory
  • standards
    • connectivity
    • GUI

Slide 4: Multi Database
[Figure: multidatabase schema architecture.]
Legend: GES = Global External Schema; GCS = Global Conceptual Schema. In each component/local DB: LES = External Schema; LCS = Conceptual Schema; LIS = Internal Schema.

Slide 5: Classifying Databases - Data Models
• Each database has a schema which is expressed with a data model.
• There are a number of data models, for example:
  • the relational: a collection of tables;
  • the network: a collection of graphs;
  • the hierarchic: a collection of trees; and
  • the object-oriented: a pragmatic mix of modelling (structure & behaviour) capabilities!
• The semi-structured & NoSQL models: a hotchpotch of keyword-value pair, graph and tree structures.
• The data model affects the level of detail with which we can model a domain of discourse.
• The more detail we can express with the data model language, the less programming we need to insert into the front ends.

Slide 6: Classifying Databases - Users
• Each database instance can have a varying number of end users that access it.
• If a number of users access the database at the same "instant" we say we have a multi-user system.
• Otherwise we have a single-user system (long dead).
• Managing a multi-user system implies a need for sophisticated sharing protocols.
• The transaction model is based on the ACID principle:
  • Atomicity of transaction
  • Consistency preservation of database instance
  • Isolation from other transactions
  • Durability of committed transactions

Slide 7: Classifying Databases - Spread
• Is the database instance available to one "site" or spread over many sites and connected by a communication network?
  • One site: centralised system
  • Spread over many sites: distributed databases
• A number of themes and variations exist for distributed databases!
• Does each site have the same DBMS?
  • Yes: homogeneous solution
  • No: heterogeneous solution
• Does each site have the same schema?
  • Yes: tightly coupled distributed system (i.e. traditional);
  • No: loosely coupled system (multidatabases or federated systems; Hammer & McLeod, 1979).
• Does each site use the "same" data model to express the local site's schema?

Slide 8: Taxonomy of Multidatabase & Federated DBs
[Figure: taxonomy tree, reconstructed from its labels.]
• Multidatabase Systems (split on: autonomous control over a DB?)
  • Non-Federated Database Systems
  • Federated Database Systems (split on: independent vs central/common schema)
    • Loosely Coupled
    • Tightly Coupled (split on: single or multiple schemas)
      • Single Federation
      • Multiple Federation

Slide 9: Multidatabase: an example set-up
[Figure: example multidatabase set-up. Labelled components: Multidatabase/Global DBMS with a Global Data Dictionary, a Global Transaction Manager and a Global Access Layer; a Global Transaction split into Global Subtransaction 1 and Global Subtransaction 2; a Local Transaction; Local DBMS 1 ... Local DBMS n, each with a Local Access Layer, a Local Transaction Manager and a Local Database (1 ... n).]

Slide 10: Classifying Databases - Costs
• multi-user DBMS: euro 2,000 to euro 100,000
  • and/or plus user licences!
  • and/or plus maintenance agreement!

Slide 11: Classifying Databases - Scoping
• Commercial
• Scientific

Slide 12: Classifying Databases - Mode
• Batch
• OLTP
• OLAP

Slide 13: Homogeneous & Heterogeneous
• Distributed over a network
  • one logical database
  • one data model
  • one query model
  • n physical databases
• Heterogeneous over a network (federated)
  • n logical databases
  • p data models
  • q query models
  • m physical databases

Slide 14: Homogeneous vs Heterogeneous
• DDBs vs HDBs
• functional independence of the parts
  • DDBs -> none
  • HDBs -> total
• modalities
  • DDBs support QP, QO, SM & TM
  • HDBs support transaction monitors

Slide 15: Heterogeneous DBs
• We need standards - why?
• But can standards solve all our problems?
• Why do we need heterogeneity?
  • it is natural;
  • political;
  • decentralisation;
  • acquisitions, mergers, divisions and cross-entity collaboration.
• The research in the:
  • 70s: DBMSi to DBMSj mappings for integration at the communication and data exchange level;
  • 80s: operational issues started to get tackled, such as schema integration;
  • 90s: transaction processing issues;
  • 00s: scalability and openness;
  • 10s: variety in data sources and transaction models;
  • 20s: reducing the time lapse between data appearing in the OLTP and it being available in the OLAP.

Slide 16: MDBs: Scheme of things
[Figure with the labels: HDBs, CDBs, DBs, DBMS, QM, DM, TP.]

Slide 17: MDBs: Scheme of things - notes
• The key characteristic of HDBs is co-operation amongst the participants.
• The HDBs is controlled by the HDBMS - of course!?
• The HDBMS provides controlled and co-ordinated manipulation of the CDBs.
• CDBs accept two ontologically different operations:
  • logical (and local) aspect, controlled by a local DBMS;
  • federal aspect.

Slide 18: But!?
• Many DBMSs:
  • are available for many platforms;
  • have transaction monitors to major & competitors' DBMSs;
  • can scale up and down relatively easily.
Hemmm!?!?!?!?!?!
Yes; DBMSs are great for data models and systems-level support, but weak in semantic heterogeneity (for example, when there is disparate meaning to an attribute name)!

Slide 19: HDBs Characteristics
• Distribution
• Heterogeneity
• Autonomy

Slide 20: 1. Distribution
• Computer platforms
  • single
  • composite "site" with communications interconnections
    • LAN, or WAN
• Data placement policies; for example:
  • vertical vs horizontal partitioning;
  • single vs multiple copies.
• Variations of the above two points result in benefits along these lines:
  • higher data availability;
  • higher reliability; and
  • improved access time.
• BUT there are issues of sharing/synchronisation of values.

Slide 21: 2. Heterogeneity
• Semantic heterogeneity occurs when there is a disagreement about the meaning, interpretation, or intended use of the same or related data.
• For example:
  • Total time to run an experiment is found in a relation experiment of database micro_bio_lab;
  • Total time required to finish an experiment is found in a relation run of database pathology.
• But are these two "times" really comparable!?
• Detecting semantic heterogeneity is a difficult problem.

Slide 22: 3. Autonomy
• The institutions that manage the individual CDBs are often autonomous.
• What types of autonomy are we after?
  • design;
  • communications with other CDBs;
  • execution of local transactions w/o the interference of external transactions.
• The need to maintain the autonomy of CDBs and the need to share data are often conflicting requirements.

Slide 23: Reference Architecture for MDS (a)
... consists of various system components capable of describing a MDBs.
• Data: facts and relationships in the DBs.
• Database: a database!?
• Commands: end user or application generated requests for specific actions.
• Processors: software units that run the commands over the data (e.g. stored procedures).
• Schemas: a schema!?
• Mappings: functions that convert schema entities from one CDB schema to another.

Slide 24: Processor Classification (i)
• Transformer
  • translates commands from one language to another, or translates data from one format to another.
  • E.g. relational tuple to JSON, XML doc to JSON.
  • E.g. SQL SPJ query into a sequence of LINQ queries, complex CTE SQL query into a procedural program with basic SPJ queries.
  • Transformers enable a type of data model independence.
    • Remember ERMs!
  • Transformers enable a type of query model independence.
    • Remember Predicate Calculus!
• Filtering
  • constrains the commands and associated data that can be passed to another processor.
  • Each filtering processor has a set of mappings that describe the constraints on commands and data.
  • E.g. syntactic verification, checking i.c., access rights.
  • Addresses the "view update" problem (at local schemas).

Slide 25: Processor Classification (ii)
• Constructing
  • classify and replicate an operation submitted by a single processor into operations that are accepted by other processors.
  • A number of tasks can be given to constructor processors:
    • schema integration;
    • negotiation;
    • query decomposition;
    • global transaction management.
  • (more later!?)
• Accessing
  • accepts commands and produces data by running these on local databases.
  • Has to deal with local concurrency issues, etc.

Slide 26: Reference Architecture for MDS (b)
• Schema types (some revision)
• Remember the three-level schema architecture for databases (ANSI/X3/SPARC):
  • internal schema;
  • conceptual schema;
  • external schema.
• The rationale behind this was to achieve a high level of logical and physical data independence.

Slide 27: ANSI/X3/SPARC with the MDS reference architecture
[Figure: External Sch. 1, 2, ..., n, each with its Filtering Proc. 1, 2, ..., n; Conceptual Sch. with a Trans. Processor; Internal Sch. with an Accessing Proc.]

Slide 28: Reference Architecture for MDS (b): a 5-level schema architecture
• The 3-level architecture does not support distribution, heterogeneity and autonomy.
• We need something else!?
• The five-level architecture includes:
  • Local Schema Level;
  • Component Schema Level;
  • Export Schema Level;
  • Federated Schema Level; and finally
  • External Schema Level.

Slide 29: Federated Database Systems: Five Level Data Architecture
[Figure.]

Slide 30: Local Schema
• This is the conceptual schema of the CDB.
• It is expressed with the local DBMS's data model.
• Remember we can have different local schemas expressed with different data models!

Slide 31: Component Schema
• The local schema represented with an MDS canonical data model.
• We transform the CDB local schema into the federation's data modelling language.
• Rationale:
  • describe divergent local schemas using one representation;
  • missing semantics (in a CDB) could be built at this level.
• Heterogeneity is thus supported.

Slide 32: Export Schema
• Is a subset of the component schema that is available to the federation.
• The place where we place access control information.
• Autonomy can be supported if we augment filtering processors.

Slide 33: Federated Schema
• It is the integration of multiple export schemas.

Slide 34: External Schema
• Is a definition of the MDBs for a user or application.
• We need external schemas for:
  • customisation;
  • additional i.c.;
  • access control.
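
The five schema levels described above (local, component, export, federated, external) can be illustrated with a minimal Python sketch. It assumes that a schema is just a dictionary mapping relation names to attribute lists, and that the canonical data model is a shared vocabulary of names. Every identifier in it (the two library-style local schemas, `to_component`, `export`, `federate`, `external_view`, the rename tables) is hypothetical and does not come from the slides; a real federation needs far richer mappings.

```python
# Hypothetical sketch of the five schema levels; schemas are plain dicts
# {relation_name: [attribute, ...]}. All names are illustrative assumptions.

# 1. Local schemas: each CDB uses its own naming (and, in reality, its own data model).
local_a = {"item": ["isbn", "ttl", "pub_date"]}
local_b = {"books": ["ISBN", "title", "published", "shelf"]}

def to_component(local, renames):
    """2. Component schema: the local schema restated in the canonical vocabulary."""
    component = {}
    for relation, attrs in local.items():
        new_rel, attr_map = renames[relation]
        component[new_rel] = [attr_map.get(a, a) for a in attrs]
    return component

def export(component, allowed):
    """3. Export schema: the subset of the component schema offered to the federation."""
    return {
        rel: [a for a in attrs if a in allowed[rel]]
        for rel, attrs in component.items()
        if rel in allowed
    }

def federate(*exports):
    """4. Federated schema: a naive integration (union) of several export schemas."""
    federated = {}
    for exp in exports:
        for rel, attrs in exp.items():
            merged = federated.setdefault(rel, [])
            for a in attrs:
                if a not in merged:
                    merged.append(a)
    return federated

def external_view(federated, wanted):
    """5. External schema: what one user or application gets to see."""
    return {rel: [a for a in federated[rel] if a in attrs]
            for rel, attrs in wanted.items()}

comp_a = to_component(local_a, {"item": ("book", {"ttl": "title", "pub_date": "published"})})
comp_b = to_component(local_b, {"books": ("book", {"ISBN": "isbn"})})

exp_a = export(comp_a, {"book": ["isbn", "title", "published"]})
exp_b = export(comp_b, {"book": ["isbn", "title"]})  # 'published' and 'shelf' stay private

fed = federate(exp_a, exp_b)
ext = external_view(fed, {"book": ["isbn", "title"]})
print(fed)
print(ext)
```

In this sketch the export step is where a CDB keeps its autonomy (it decides what to withhold), while the component step is where heterogeneity is absorbed. The naive union in `federate` sidesteps the schema integration conflicts that the later slides discuss.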
Slide 35: MDBs Design
• Complex!!!!
• Have to deal with structure and semantics from many different perspectives and yet factor out the FDBs.
• Cautionary hint:
  • The 5-level architecture could be riddled with redundant information on the federation.
  • For example, between:
    • external & federated schemas;
    • external@CDB and export@CDB;
    • etc.

Slide 36: Introducing MDBs (or evolving toward a MDBs)
• Two main approaches:
  • radical: introduce a DDBMS;
    • in reality this entails:
      • changing and/or disrupting current systems;
      • probably requiring strategic planning at an organisation level;
      • abandoning some of the trusted functionality of the previous DBMS.
  • wraparound: build around the local DBs (i.e. CDBs) to create federal functionality;
    • this follows an evolutionary path with less dramatic changes. It is usually carried out by undergoing the following phases:
      • preintegration;
      • development; and
      • operations.

Slide 37: Methodology for MDBs development
• Both bottom-up and top-down methods exist.
• But then again methodologies are like forest mushrooms in autumn!

Slide 38: A Systems Development Task: Schema Integration
• There are a number of "unique" development activities associated with MDBs development. For example:
  • schema translation;
  • access control;
  • negotiation; and
  • schema integration.

Slide 39: Schema Integration (i)
• Batini et al (1986) discuss and compare many methodologies for schema integration. They use the following steps:
  • preintegration
    • translate a schema into a CDB schema
    • global naming and i.c. established
  • comparison
    1. analyse and compare the schema objects to be integrated
    2. specify the relationships between objects
  • conformation;
  • merging;
  • restructuring.

Slide 40: Schema Integration (ii)
• Since the CDBs operate independently, the CDBs may include structural and representational conflicts.
• These conflicts must be homogenised so that MDB end users can access the underlying CDBs.
• This is a crucial problem to solve!
• Let us assume that the canonical data model is the relational one (in reality the rdm is not really a good CDM!).
• See the example schemas from a number of libraries. Note:
  • there are four independent libraries;
  • look at the naming of library objects - item X 2, items, books;
  • look at the number of tables that make up the publisher entity;
  • look at the number of attributes in some "same" tables;
  • look at the date data format.

Slide 41: Schema Integration (iii) - Conflicts
• Schema conflicts result from the use of different schema definitions in different CDBs.
• Data conflicts are due to inconsistent data in the absence of schema conflicts.
The following two slides give a categorisation of the two generic types of conflicts.

Slide 42: Schema Integration (iv) - Schema Conflicts
• Table vs attribute
• Table vs table
  • 1-1 name conflicts
    • table name
    • table structure
    • table i.c. & triggers
  • M-N table conflicts
• Attribute vs attribute
  • 1-1 name conflicts
    • attribute name
    • default value
    • attribute level i.c. & triggers
  • M-N attribute conflicts

Slide 43: Schema Integration (iv) - Data Conflicts
• Wrong data:
  • incorrect entry data;
  • obsolete data.
• Different representation for the same data (or same representation for different data):
  • different expression;
  • different units;
  • different precision.

Slide 44: MDBs Operation: Global QP (i)
Blunt fact: there are few general-purpose query optimisation options in a loose federation! On the other hand, in the tight variants (i.e. DDBs) one has a number of interesting QP options.
• QP in MDBs involves converting a query over the federated (external) schema into several queries against the CDBs' export schemas.
• Each of these dispatched queries needs to be computed on the CDBs.
• There are a number of similarities between MDBs & DDBs QP.

Slide 46: MDBs Operation: Global QP (ii)
• Additional problems, that are more evident in MDBs, include:
  • the costs of the queries (spawned by a federal query) are significantly different from each other;
    • factors: local loading, comms, priority.
  • a CDBMS might not be able to do any QO!
  • most MDBMSs have no record of the QP primitives at each component!

Slide 47: MDBs Operation: Global TP
• The Global Transaction Manager (GTM) is responsible for maintaining database consistency while allowing concurrent updates across multiple databases.
• Two types of transaction:
  • global transactions submitted to the MDBMS;
  • local transactions submitted to the CDBMS.
• In a nutshell the basic problem is:
  • the GTM does not know about the local TM's autonomous doings (i.e. over each CDB)!
• With current algorithms it is difficult to identify whether the execution and serialisation order is different at any component site.
• Basic solutions (of a pragmatic nature) include:
  • unsynchronised retrieval;
  • off-line updates;
  • new transaction model.
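
The serialisation-order problem named above can be made concrete with a small Python sketch. It assumes, unrealistically and contrary to the autonomy the slides stress, that every component site would reveal the order in which it serialised the global transactions. The function `globally_serialisable` and the site and transaction names are hypothetical. The sketch builds a precedence graph from the per-site orders and checks it for a cycle; a cycle means two sites disagree on the order of some pair of global transactions.

```python
from collections import defaultdict

def globally_serialisable(site_orders):
    """Return True if the per-site serialisation orders of global transactions
    can be merged into one global order (the precedence graph is acyclic).

    site_orders: {site_name: [txn, txn, ...]} in local serialisation order.
    This input is an assumption: an autonomous CDBMS normally does not expose it.
    """
    edges = defaultdict(set)
    txns = set()
    for order in site_orders.values():
        txns.update(order)
        for i, earlier in enumerate(order):
            for later in order[i + 1:]:
                edges[earlier].add(later)

    # Depth-first search for a cycle: 0 = unvisited, 1 = on the stack, 2 = done.
    state = {t: 0 for t in txns}

    def has_cycle(t):
        state[t] = 1
        for nxt in edges[t]:
            if state[nxt] == 1 or (state[nxt] == 0 and has_cycle(nxt)):
                return True
        state[t] = 2
        return False

    return not any(state[t] == 0 and has_cycle(t) for t in sorted(txns))

# Both sites serialise G1 before G2: a single global order exists.
consistent = {"cdb1": ["G1", "G2"], "cdb2": ["G1", "G2"]}
# The sites disagree, so no single global order exists.
conflicting = {"cdb1": ["G1", "G2"], "cdb2": ["G2", "G1"]}

print(globally_serialisable(consistent))
print(globally_serialisable(conflicting))
```

The check itself is easy; the difficulty the slide points at is that the GTM cannot obtain `site_orders`, because local transactions that it never sees can influence each site's order.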
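
The global query processing described earlier (a federated query converted into several queries against the CDBs' export schemas) can also be sketched in Python. Here two in-memory SQLite databases stand in for the CDBs micro_bio_lab and pathology from the heterogeneity example. The column names, the units (minutes versus hours), the sample rows and the helper `federated_total_times` are all assumptions made for illustration; the slides specify none of them.

```python
import sqlite3

# Two in-memory SQLite databases stand in for the CDBs (an assumption).
micro_bio_lab = sqlite3.connect(":memory:")
micro_bio_lab.execute("CREATE TABLE experiment (exp_id TEXT, total_time_min REAL)")
micro_bio_lab.executemany("INSERT INTO experiment VALUES (?, ?)",
                          [("m1", 90.0), ("m2", 30.0)])

pathology = sqlite3.connect(":memory:")
pathology.execute("CREATE TABLE run (run_id TEXT, hours_to_finish REAL)")
pathology.execute("INSERT INTO run VALUES (?, ?)", ("p1", 2.0))

# Per-CDB mapping: connection, local subquery over its export schema, and a
# converter that brings the local time unit to hours (hypothetical units).
SUBQUERIES = {
    "micro_bio_lab": (micro_bio_lab,
                      "SELECT exp_id, total_time_min FROM experiment",
                      lambda minutes: minutes / 60.0),
    "pathology": (pathology,
                  "SELECT run_id, hours_to_finish FROM run",
                  lambda hours: hours),
}

def federated_total_times():
    """Answer 'all experiment times, in hours' by dispatching one subquery
    per CDB and merging the converted results."""
    results = []
    for source, (conn, sql, to_hours) in SUBQUERIES.items():
        for ident, value in conn.execute(sql):
            results.append((source, ident, to_hours(value)))
    return sorted(results)

rows = federated_total_times()
for row in rows:
    print(row)
```

The sketch performs no optimisation, and converting units only removes a data conflict: whether "time to run" and "time to finish" mean the same thing remains the semantic heterogeneity question raised in the slides.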
