Multi-databases & Federated Databases
University of Malta
Prof Joseph G Vella
Summary
This document is a presentation on multi-databases and federated databases. It discusses various concepts, including databases, DBMS, and data models, providing a comprehensive learning resource.
Full Transcript
Prof Joseph G Vella ©, CIS, FICT, UM

Multi-databases & Federated Databases
Prof. Joseph G. Vella
Dept. of Computer Information Systems
DM & DWH – Multi & Federated DBs

Databases & DBMS
A database is a collection of consistent & structured facts from a domain of discourse that are accessible by a number of end users.
The facts on the domain of discourse are of two types:
- intensional: schema facts
- extensional: data
The database must remain consistent when changing from one state instance to another; e.g. during transactional updates.
A DBMS is a generic software system that supports and maintains databases. Some of its important components include:
- storage management;
- query processing; and
- transaction management.

DBMS and other thingies
A DBMS uses the services of the operating system and networking facilities.
A DBMS uses a number of specialised artefacts:
- hardware
  - RAID units
  - cache memory
- standards
  - connectivity
  - GUI

MULTI DATABASE: GES = Global External Schema, GCS = Global Conceptual Schema.
In each COMPONENT/LOCAL DB: LES = Local External Schema, LCS = Local Conceptual Schema, LIS = Local Internal Schema.

Classifying Databases - Data Models
Each database has a schema which is expressed with a data model.
There are a number of data models, for example:
- the relational: a collection of tables;
- the network: a collection of graphs;
- the hierarchic: a collection of trees;
- the object-oriented: a pragmatic mix of modelling (structure & behaviour) capabilities!
- the semi-structured & NoSQL models: a hotchpotch of key-value pair, graph and tree structures.
The data model affects the level of detail at which we can model a domain of discourse.
The more detail we can express with the data model language, the less programming we need to insert into the front ends.
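
The last point (more detail in the schema, less programming in the front ends) can be illustrated with a minimal sketch. It assumes SQLite through Python's standard `sqlite3` module; the table `experiment` and its columns are hypothetical names chosen for the illustration. The integrity constraint is stated once as an intensional (schema) fact, so no front end has to re-implement the check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Intensional facts: the schema, including an integrity constraint.
# (Hypothetical table and column names, for illustration only.)
conn.execute(
    """
    CREATE TABLE experiment (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        total_minutes INTEGER NOT NULL CHECK (total_minutes > 0)
    )
    """
)


def try_insert(name, total_minutes):
    """Return True if the DBMS accepts the row, False if the schema rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO experiment (name, total_minutes) VALUES (?, ?)",
                (name, total_minutes),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Extensional facts: the data. The front end does no validation of its own.
accepted = try_insert("culture", 120)
rejected = try_insert("broken", -5)
row_count = conn.execute("SELECT COUNT(*) FROM experiment").fetchone()[0]
print(accepted, rejected, row_count)
```

With a weaker data model (say, untyped key-value pairs) the same rule would have to be coded into every application that writes to the store.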

Classifying Databases - Users
Each database instance can have a varying number of end users that access it.
If a number of users access the database at the same “instant” we say we have a multi-user system. Otherwise we have a single-user system (long dead).
Managing a multi-user system implies a need for sophisticated sharing protocols.
The transaction model is based on the ACID principle:
- Atomicity of transactions
- Consistency preservation of the database instance
- Isolation from other transactions
- Durability of committed transactions

Classifying Databases - Spread
Is the database instance spread over many sites connected by a communication network, rather than available at one “site”?
- No: centralised system
- Yes: distributed databases
A number of themes and variations exist for distributed databases!
Does each site have the same DBMS?
- Yes: homogeneous solution
- No: heterogeneous solution
Does each site have the same schema?
- Yes: tightly coupled distributed system (i.e. traditional);
- No: loosely coupled system (MultiDatabases or federated systems - Hammer & McLeod, 1979).
Does each site use the “same” data model to express the local sites’ schema?

Taxonomy of Multidatabase & Federated DBs
[Figure: taxonomy tree; the question asked at each branching is given in brackets]
Multidatabase Systems (autonomous control over a DB?)
- Non-Federated Database Systems
- Federated Database Systems (independent vs central/common schema)
  - Loosely Coupled
  - Tightly Coupled (single or multiple schemas)
    - Single Federation
    - Multiple Federation

Multidatabase: an example set-up
[Figure. Labels: Global Transaction; Multidatabase/Global DBMS (Global Data Dictionary, Global Transaction Manager, Global Access Layer); Global Subtransaction 1, Global Subtransaction 2; Local Transaction; Local DBMS 1 ... Local DBMS n (each with a Local Access Layer and a Local Transaction Manager); Local Database 1 ... Local Database n.]

Classifying Databases - Costs
multi-user DBMS: euro 2,000 - euro 100,000
And/or plus user licences!
And/or plus maintenance agreement!

Classifying Databases - Scoping
- Commercial
- Scientific

Classifying Databases - Mode
- Batch
- OLTP
- OLAP

Homogeneous & Heterogeneous
Distributed over a network:
- one logical database
- one data model
- one query model
- n physical databases
Heterogeneous over a network (federated):
- n logical databases
- p data models
- q query models
- m physical databases

Homogeneous vs Heterogeneous
DDBs vs HDBs
- functional independence of the parts
  - DDBs -> none
  - HDBs -> total
- modalities
  - DDBs support QP, QO, SM & TM
  - HDBs support transaction monitors

Heterogeneous DBs
We need standards - why? But can standards solve all our problems?
Why do we need heterogeneity?
- it is natural;
- political decentralisation;
- acquisitions, mergers, divisions and cross-entity collaboration.
The research in the:
- 70s: DBMSi to DBMSj mappings for integration at the communication and data exchange level;
- 80s: operational issues started to get tackled, such as schema integration;
- 90s: transaction processing issues;
- 00s: scalability and openness;
- 10s: variety in data sources and transaction models;
- 20s: reducing the time lapse between data appearing in the OLTP and it being available in the OLAP.

MDBs: Scheme of things
[Figure. Labels: HDBs, CDBs, DBs, DBMS, QM, DM, TP]

MDBs: Scheme of things - notes
The key characteristic of HDBs is co-operation amongst the participants.
The HDBs is controlled by the HDBMS - of course!?
The HDBMS provides controlled and co-ordinated manipulation of the CDBs.
CDBs accept two ontologically different operations:
- logical (and local) aspect, controlled by a local DBMS;
- federal aspect.

But!?
Many DBMSs:
- are available for many platforms;
- have transaction monitors to major & competitors’ DBMSs;
- can scale up and down relatively easily.
Hemmm!?!?!?!?!?!
Yes; DBMSs are great for data models and systems level support, but weak in semantic heterogeneity (for example when there is disparate meaning to an attribute name)!

HDBs Characteristics
- Distribution
- Heterogeneity
- Autonomy

1. Distribution
Computer platforms: single composite “site” with communications interconnections (LAN, or WAN).
Data placement policies; for example:
- vertical vs horizontal partitioning;
- single vs multiple copies.
Variations of the above two points result in benefits along these lines:
- higher data availability;
- higher reliability; and
- improved access time.
BUT there are issues of sharing/synchronisation of values.
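
The data placement policies above can be made concrete with a small sketch of horizontal and vertical partitioning. It is plain Python over in-memory rows; the relation, its attributes and all function names are hypothetical, and a real distributed DBMS would of course do this inside the storage layer.

```python
rows = [
    {"id": 1, "name": "Ann", "site": "Malta", "salary": 3000},
    {"id": 2, "name": "Ben", "site": "Gozo", "salary": 2800},
    {"id": 3, "name": "Cat", "site": "Malta", "salary": 3500},
]


def horizontal_partition(rows, key):
    """Split whole rows into fragments by the value of one attribute."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[key], []).append(row)
    return fragments


def vertical_partition(rows, pk, column_groups):
    """Split columns into fragments; every fragment keeps the primary key."""
    return [
        [{pk: row[pk], **{c: row[c] for c in cols}} for row in rows]
        for cols in column_groups
    ]


def rebuild_from_horizontal(fragments):
    """Reconstruct the relation as the union of its horizontal fragments."""
    merged = [row for frag in fragments.values() for row in frag]
    return sorted(merged, key=lambda r: r["id"])


def rebuild_from_vertical(fragments, pk):
    """Reconstruct the relation by joining vertical fragments on the key."""
    joined = {}
    for frag in fragments:
        for row in frag:
            joined.setdefault(row[pk], {}).update(row)
    return [joined[k] for k in sorted(joined)]


by_site = horizontal_partition(rows, "site")
by_cols = vertical_partition(rows, "id", [["name", "site"], ["salary"]])
```

Horizontal fragments are recombined with a union and vertical fragments with a join on the key, which is why every vertical fragment has to carry the key. Keeping multiple copies of a fragment is what raises availability, and it is also what creates the synchronisation issue noted above.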

2. Heterogeneity
Semantic heterogeneity occurs when there is a disagreement about meaning, interpretation, or intended use of the same or related data.
For example:
- Total time to run an experiment is found in a relation experiment of database micro_bio_lab;
- Total time required to finish an experiment is found in a relation run of database pathology.
But are these two “times” really comparable!?
Detecting semantic heterogeneity is a difficult problem.

3. Autonomy
The institutions that manage the individual CDBs are often autonomous.
What types of autonomy are we after?
- design;
- communications with other CDBs;
- execution of local transactions w/o the interference of external transactions.
The need to maintain the autonomy of CDBs and the need to share data are often conflicting requirements.

Reference Architecture for MDS (a)
... consists of various system components capable of describing a MDBs.
- Data: facts and relationships in the DBs.
- Database: a database!?
- Commands: end user or application generated requests for specific actions.
- Processors: software units that run the commands over the data (e.g. stored procedures).
- Schemas: a schema!?
- Mappings: functions that convert schema entities from one CDB schema to another.

Processor Classification (i)
Transformer
- translates commands from one language to another, or translates data from one format to another.
- E.g. relational tuple to JSON, XML doc to JSON.
- E.g. SQL SPJ query into a sequence of LINQ queries, complex CTE SQL query into a procedural program with basic SPJ queries.
- Transformers enable a type of data model independence. Remember ERMs!
- Transformers enable a type of query model independence.
  Remember Predicate Calculus!
Filtering
- constrains the commands and associated data that can be passed to another processor.
- Each filtering processor has a set of mappings that describe the constraints on commands and data.
- E.g. syntactic verification, checking i.c., access rights.
- Addresses the “view update” problem (at local schemas).

Processor Classification (ii)
Constructing
- classifies and replicates an operation submitted by a single processor into operations that are accepted by other processors.
- A number of tasks can be given to constructor processors: schema integration; negotiation; query decomposition; global transaction management. (more later!?)
Accessing
- accepts commands and produces data by running these on local databases.
- Has to deal with local concurrency issues, etc.

Reference Architecture for MDS (b)
Schema Types (some revision)
Remember the three level schema architecture for databases (ANSI/X3/SPARC):
- Internal schema;
- Conceptual schema;
- External schema.
The rationale behind this was to achieve a high level of logical and physical data independence.

ANSI/X3/SPARC with the MDS reference architecture
[Figure, layers from top to bottom:]
External Sch. 1 | External Sch. 2 | ... | External Sch. n
Filtering Proc. 1 | Filtering Proc. 2 | ... | Filtering Proc. n
Conceptual Sch.
Trans. Processor
Internal Sch.
Accessing Proc.

Reference Architecture for MDS (b)
A 5 level schema architecture
The 3 level architecture does not support distribution, heterogeneity and autonomy. We need something else!?
The five level architecture includes:
- Local Schema Level;
- Component Schema Level;
- Export Schema Level;
- Federated Schema Level; and finally
- External Schema Level.
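
Looking back at the Processor Classification slides, three of the processor types (filtering, transformer, accessing) can be sketched as plain functions chained together. This is only an illustration under stated assumptions: the command format (a small dict), the `EXPORTED` mapping, the table `run` and every function name are hypothetical, and SQLite via Python's `sqlite3` stands in for the local DBMS.

```python
import json
import sqlite3

# Hypothetical mapping used by the filter: which tables/columns may be read.
EXPORTED = {"run": {"id", "label", "minutes"}}


def filtering_processor(command, exported=EXPORTED):
    """Pass a command on only if it touches exported tables and columns."""
    table, columns = command["table"], command["columns"]
    if table not in exported:
        raise PermissionError(f"table {table!r} is not exported")
    hidden = set(columns) - exported[table]
    if hidden:
        raise PermissionError(f"columns not exported: {sorted(hidden)}")
    return command


def transformer_command_to_sql(command):
    """Translate a command from the dict language into SQL.

    Identifiers are interpolated only after the filter has checked them
    against the EXPORTED whitelist.
    """
    cols = ", ".join(command["columns"])
    return f"SELECT {cols} FROM {command['table']} ORDER BY 1"


def accessing_processor(conn, sql):
    """Run a command on the local database and return (columns, rows)."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()


def transformer_tuples_to_json(columns, rows):
    """Translate data from relational tuples into a JSON document."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


local_db = sqlite3.connect(":memory:")
local_db.execute(
    "CREATE TABLE run (id INTEGER PRIMARY KEY, label TEXT, minutes INTEGER, operator TEXT)"
)
local_db.executemany(
    "INSERT INTO run VALUES (?, ?, ?, ?)",
    [(1, "assay", 90, "jv"), (2, "culture", 120, "ab")],
)


def handle(command):
    """Chain the processors: filter, transform, access, transform."""
    checked = filtering_processor(command)
    sql = transformer_command_to_sql(checked)
    columns, rows = accessing_processor(local_db, sql)
    return transformer_tuples_to_json(columns, rows)


result = handle({"table": "run", "columns": ["label", "minutes"]})
print(result)
```

A request for the `operator` column is stopped by the filter with a `PermissionError` before any SQL is generated. A constructing processor would sit in front of several such chains and split one federal command into one command per chain.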

Federated Database Systems: Five Level Data Architecture
[Figure]

Local Schema
This is the conceptual schema of the CDB.
It is expressed with the local DBMS’s data model.
Remember we can have different local schemas expressed with different data models!

Component Schema
The local schema represented with an MDS canonical data model.
We transform the CDB local schema into the federation’s data modelling language.
Rationale:
- describe divergent local schemas using one representation;
- missing semantics (in a CDB) could be built at this level.
Heterogeneity is thus supported.

Export Schema
Is a subset of the component schema that is available to the federation.
The place where we place access control information.
Autonomy can be supported if we augment filtering processors.

Federated Schema
It is the integration of multiple export schemas.

External Schema
Is a definition of the MDBs for a user or application.
We need external schemas for:
- customisation;
- additional i.c.;
- access control.

MDBs Design
Complex!!!!
Have to deal with structure and semantics from many different perspectives and yet factor out the FDBs.
Cautionary Hint: The 5 level architecture could be riddled with redundant information on the federation. For example, between:
- external & federated schemas;
- external@CDB and export@CDB;
- etc.
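
The five schema levels described above can be sketched as a chain of small transformations. This is a minimal sketch, not a real schema language: schemas are plain dicts, the canonical model is assumed to be relational, and the library names (`lib_a`, `lib_b`), attributes and function names are all hypothetical.

```python
# Local schema: expressed in the CDB's own data model
# (here a document-style collection, purely as an example).
local_schema = {
    "collection": "books",
    "fields": {"isbn": "string", "title": "string", "shelf": "string"},
}


def to_component_schema(local):
    """Local -> component: restate the schema in the canonical model."""
    return {"table": local["collection"], "attributes": dict(local["fields"])}


def to_export_schema(component, exported_attributes):
    """Component -> export: keep only what is offered to the federation."""
    return {
        "table": component["table"],
        "attributes": {
            a: t for a, t in component["attributes"].items()
            if a in exported_attributes
        },
    }


def to_federated_schema(export_schemas, name_map):
    """Export -> federated: integrate export schemas under global names."""
    federated = {}
    for cdb, export in export_schemas.items():
        for attr in export["attributes"]:
            global_name = name_map.get((cdb, attr), attr)
            federated.setdefault(global_name, []).append(
                (cdb, export["table"], attr)
            )
    return federated


def to_external_schema(federated, visible):
    """Federated -> external: a customised view for one user/application."""
    return {name: src for name, src in federated.items() if name in visible}


component_a = to_component_schema(local_schema)
export_a = to_export_schema(component_a, {"isbn", "title"})
# A second CDB that already exports a relational schema, with a naming conflict.
export_b = {"table": "items", "attributes": {"isbn": "string", "name": "string"}}

federated = to_federated_schema(
    {"lib_a": export_a, "lib_b": export_b},
    name_map={("lib_b", "name"): "title"},
)
external = to_external_schema(federated, visible={"title"})
```

Note how `shelf` never leaves `lib_a` (autonomy through the export schema), and how the same mapping information appears at more than one level, which is the redundancy the cautionary hint warns about.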

Introducing MDBs (or evolving toward a MDBs)
Two main approaches:
- radical: introduce a DDBMS; in reality this entails:
  - changing and/or disrupting current systems;
  - probably strategic planning at an organisation level;
  - abandoning some of the trusted functionality of the previous DBMS.
- wraparound: build around the local DBs (i.e. CDBs) to create federal functionality;
  - this follows an evolutionary path with less dramatic changes.
  - It is usually carried out by undergoing the following phases: preintegration; development; and operations.

Methodology for MDBs development
Both bottom-up and top-down methods exist.
But then again methodologies are like forest mushrooms in autumn!

A Systems Development Task: Schema Integration
There are a number of “unique” development activities associated with MDBs development. For example:
- Schema translation;
- Access control;
- Negotiation; and
- Schema integration.

Schema Integration (i)
Batini et al (1986) discuss and compare many methodologies for schema integration. They use the following steps:
- preintegration
  - translate a schema into a CDB schema
  - global naming and i.c. established
- comparison
  1. analyse and compare the schema objects to be integrated
  2. specify the relationships between objects
- conformation;
- merging;
- restructuring.

Schema Integration (ii)
Since the CDBs operate independently, the CDBs may include structural and representational conflicts.
These conflicts must be homogenised so that MDB end users can access the underlying CDBs.
This is a crucial problem to solve!
Let us assume that the canonical data model is the relational one (in reality the rdm is not really a good CDM!).
See the example schemas from a number of libraries. Note:
- there are four independent libraries;
- look at the naming of library objects: item (twice), items, books;
- look at the number of tables that make up the publisher entity;
- look at the number of attributes in some “same” tables;
- look at the date data format.

Schema Integration (iii) - Conflicts
Schema conflicts result from the use of different schema definitions in different CDBs.
Data conflicts are due to inconsistent data in the absence of schema conflicts.
The following two slides give a categorisation of the two generic types of conflicts.

Schema Integration (iv) - Schema Conflicts
- Table vs attribute
- Table vs Table
  - 1-1 name conflicts
    - table name
    - table structure
    - table i.c. & triggers
  - M-N table conflicts
- Attribute vs Attribute
  - 1-1 name conflicts
    - attribute name
    - default value
    - attribute level i.c. & triggers
  - M-N attribute conflicts

Schema Integration (v) - Data Conflicts
- Wrong data:
  - incorrect entry data;
  - obsolete data.
- Different representation for the same data (or same representation for different data):
  - different expression;
  - different units;
  - different precision.

MDBs Operation: Global QP (i)
Blunt fact: there are few general purpose query optimisation options in a loose federation!
On the other hand, in the tight variants (i.e. DDBs) one has a number of interesting Q.P. options.
QP in MDBs involves converting a query over the federated external schema into several queries against the CDBs’ export schemas.
Each of these dispatched queries needs to be computed on the CDBs.
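
The conversion of one federated query into several queries over export schemas can be sketched as follows. It is an illustration under stated assumptions, not the slides' own example: two in-memory SQLite databases play the CDBs, the tables (`books`, `items`), the federated relation `publication(isbn, title, acquired)` and the `MAPPINGS` structure are hypothetical, and the two libraries deliberately disagree on names and on the date format.

```python
import sqlite3
from datetime import datetime


def make_cdb(ddl, insert_sql, rows):
    """Create one in-memory component database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


lib_a = make_cdb(
    "CREATE TABLE books (isbn TEXT, title TEXT, acquired TEXT)",  # ISO dates
    "INSERT INTO books VALUES (?, ?, ?)",
    [("111", "Databases", "2020-03-01"), ("222", "Networks", "2019-11-15")],
)
lib_b = make_cdb(
    "CREATE TABLE items (code TEXT, name TEXT, bought TEXT)",  # DD/MM/YYYY
    "INSERT INTO items VALUES (?, ?, ?)",
    [("333", "Compilers", "05/01/2021")],
)

# Mappings from the federated relation publication(isbn, title, acquired)
# to each CDB's export schema, including its date representation.
MAPPINGS = [
    {"conn": lib_a, "table": "books", "date_format": "%Y-%m-%d",
     "cols": {"isbn": "isbn", "title": "title", "acquired": "acquired"}},
    {"conn": lib_b, "table": "items", "date_format": "%d/%m/%Y",
     "cols": {"isbn": "code", "title": "name", "acquired": "bought"}},
]


def decompose(mapping):
    """Rewrite the federated query as a query over one export schema."""
    c = mapping["cols"]
    return f"SELECT {c['isbn']}, {c['title']}, {c['acquired']} FROM {mapping['table']}"


def run_federated_query(acquired_after):
    """Dispatch the sub-queries, homogenise the dates, filter and merge."""
    cutoff = datetime.strptime(acquired_after, "%Y-%m-%d")
    merged = []
    for m in MAPPINGS:
        for isbn, title, acquired in m["conn"].execute(decompose(m)):
            when = datetime.strptime(acquired, m["date_format"])
            if when >= cutoff:
                merged.append((isbn, title, when.strftime("%Y-%m-%d")))
    return sorted(merged)


result = run_federated_query("2020-01-01")
print(result)
```

The date predicate is evaluated at the federation rather than pushed down to the CDBs, because the components store dates in different formats. That is a small instance of the blunt fact above: in a loose federation the usual optimisations are often not available.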
There are a number of similarities between MDBs & DDBs QP.

MDBs Operation: Global QP (ii)
Additional problems, that are more evident in MDBs, include:
- costs of queries (spawned by a federal query) are significantly different from each other; factors: local loading, comms, priority.
- a CDBMS might not be able to do any QO!
- most MDBMSs have no record of the QP primitives at each component!

MDBs Operation: Global TP
The Global Transaction Manager (GTM) is responsible for maintaining database consistency while allowing concurrent updates across multiple databases.
Two types of transaction:
- global transactions submitted to the MDBMS;
- local transactions submitted to the CDBMS.
In a nutshell the basic problem is: the GTM does not know about the local TMs’ autonomous doings (i.e. over each CDB)!
With current algorithms it is difficult to identify whether the execution and serialisation order is different at any component site.
Basic solutions (of a pragmatic nature) include:
- unsynchronised retrieval;
- off-line updates;
- new transaction model.
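
The serialisation-order problem can be illustrated with a small sketch. It makes a strong simplifying assumption that a real GTM usually cannot rely on: that each site's local serialisation order is known. Given those orders, global consistency reduces to checking that their union contains no cycle. The function name, site names and transaction ids are hypothetical; local transaction ids are assumed unique across sites.

```python
def globally_serialisable(local_orders):
    """Check that all sites serialise shared transactions in a compatible order.

    local_orders maps a site name to the list of transaction ids in that
    site's serialisation order.
    """
    # Precedence graph: an edge earlier -> later for every pair at each site.
    edges = {}
    for order in local_orders.values():
        for i, earlier in enumerate(order):
            edges.setdefault(earlier, set())
            for later in order[i + 1:]:
                edges[earlier].add(later)
                edges.setdefault(later, set())

    # Depth-first search; reaching a GREY node again means a cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {t: WHITE for t in edges}

    def has_cycle(node):
        colour[node] = GREY
        for nxt in edges[node]:
            if colour[nxt] == GREY:
                return True
            if colour[nxt] == WHITE and has_cycle(nxt):
                return True
        colour[node] = BLACK
        return False

    return not any(colour[t] == WHITE and has_cycle(t) for t in edges)


# G1, G2 are global transactions; L7 is a local transaction at cdb2.
consistent = globally_serialisable(
    {"cdb1": ["G1", "G2"], "cdb2": ["G1", "L7", "G2"]}
)
conflicting = globally_serialisable(
    {"cdb1": ["G1", "G2"], "cdb2": ["G2", "G1"]}
)
print(consistent, conflicting)
```

In the second case each site is locally serialisable, yet no single global order exists. Because autonomous local TMs do not report their orders to the GTM, this check cannot simply be run in practice, which is why the pragmatic solutions listed above avoid the problem rather than solve it.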