GEG410_Lecture2_GeoDatabases.pdf
Document Details
Uploaded by GentleVerdelite
Full Transcript
GEG 410 GIS II Lecture 2: GeoDatabase Fall 2024 Instructor: Dr. Guimin Zhu Email: [email protected] Contents Data Management Relational database Entity-relationship model (E-R model) Join/relate & spatial join SQL & Selection Structured Query L...
GEG 410 GIS II Lecture 2: GeoDatabase Fall 2024 Instructor: Dr. Guimin Zhu Email: [email protected] Contents Data Management Relational database Entity-relationship model (E-R model) Join/relate & spatial join SQL & Selection Structured Query Language (SQL) Querying non-spatial data Querying spatial data An Example of Data Management A visitor book a hotel room and take one or several ski classes A customer is described by his/her name and ID A hotel has the name, address, and rooms A room is described by its room number and the hotel, also the check-in and out time A customer can book one or several rooms, and a room can have several customers A ski site is described by its boundary, level, and a unique site id A ski class has a unique class number and level How to store and manage these data? Computer-based Data Management File-processing system Different datasets are stored in different formats, e.g., text, Excel, Word, etc. Each dataset can be accessed, managed and queried using a particular program Problems of file-processing system: Data redundancy: each organization/company maintains visitor information Data inconsistency: various copy of the same data may not agree Difficulty in organizing and accessing data: depending on each dataset’s method Data isolation: difficult to share data Security: everybody may access the information without any control Concurrency control: two hotel agents simultaneously reserve a room for two visitors Atomicity: a data processing event must happen in its entirety or not at all Computer-based Data Management Databases A database is a collection of organized and interrelated data that together model a real-world phenomenon (e.g., an enterprise, a watershed) A database management system (DBMS) consists of a set of computer programs to manage and access data. A spatial database is a database containing spatial data for a particular area Components of DBMS Data model Querying language Security, backup and recovery, administration tools, application, … Design a Database Step 1: Conceptual database design Conceptualization of a phenomenon into a set of data and their relationships What things to model? How are things related? What constraints exist in the permitted values of things? How to achieve good designs? … Design a Database Step 2: Logical database design Construction of the database based on a specific data model Entity-Relationship model (ER model) is the most widely used conceptual data model. Objects are modeled as entities Entities are described by a set of attributes A relationship is an association among entities Cardinality of on-spatial relation: One-To-One (1:1), One-To- Many (1:M), Many-To-One(M:1), Many-To-Many (M:N) E-R Diagram Rectangle for entities (entity set) Ellipses for attributes Diamond for relationships (relationship set) Lines linking entity to attributes and entities to relationships Primary key is underscored. A primary key is the minimal set of attributes used to uniquely identify an entity The overall design of a database is called the schema of the database E-R Diagram Tips for mapping natural language descriptions into ER diagrams: Noun: entity Transitive verb: relationship Intransitive verb, adjective, & adverb: attribute There are no strict rules for naming your entities/relationships/attributes Make sure they are ONE word/phrase. No space allowed. Use proper capitals/underscores/dashes instead, e.g., SKICLASS, Ski_Class, ski_class, Ski-Class, ski-class, etc. Sometimes, to distinguish entities/relationships from attributes, you can use bolds or all capitals. For example: SKICLASS Time SkiClass Time Example – State Park Database The state park consists of several forests, each is a collection of forest-stands that correspond to different species of trees (e.g., maple, oak, etc.). The state park has several managers; each manager manages one or more forests. But one forest can only be managed by one manager. The database needs to store the manager information, including her/his name, employment ID (a 3-digit number between 000 and 999) that is unique for individual managers, and phone number. Each forest is described by its name, and geographic boundary (as a polygon). Each forest-stand is described by its name, the species, and its geographic boundary (as a polygon). A forest may have no or one playground, but a playground does not cover across multiple forests. One manager can manage several playgrounds, and one playground can be managed by more than one manager. A playground is described by its name, location (a point in the database), type of playground, capacity (i.e., how many people can use this playground at a time), the year when the campground was constructed, the size and the minimal age required to use the playground. name boundary name phone name ID species 1 M 1 M manager manage-forest forest has-stand forest-stand N 1 manage-playground has-playground boundary M M playground size year min_age location type name capacity Design a Database Step 3: Physical database design manager=(ID, name, phone) ID name phone MNG01 John Smith 305-284-3949 MNG02 Taylor Swift 305-284-3950 MNG03 Justin Bieber 305-284-3951 forest=(name, boundary) name boundary Riviera POLYGON Granada POLYGON ESRI Geodatabases File Geodatabases Databases stored as folders in a file system Each dataset is a file that can scale up to 1 TB in size Personal Geodatabases Stored in a Microsoft Access data file Each database is limited in size to 2 GB Geometry is stored as an OLE object (Object Linking and Embedding), which is a standard developed by Microsoft enabling us to create objects with one application and then link or embed them in a second application How a feature is stored by ArcGIS Pro, in Catalog and in the physical folder Contents Data Management Relational database Entity-relationship model (E-R model) Join/relate & spatial join SQL & Selection Structured Query Language (SQL) Querying non-spatial data Querying spatial data Structured Query Language (SQL) One problem of file-processing systems: difficulty in accessing data SQL is a standard computer language for accessing and manipulating relational or object-relational databases Example: PublicSchools = (school_nam, street, city, category) Query: find all public elementary schools. Basic Form of an SQL Statement SELECT FROM WHERE The SELECT clause defines the columns to be presented as output The FROM clause defines which table(s) to be queried The WHERE clause defines the constraints on the columns to be selected Example Find all information of public elementary schools in PublicSchools: SELECT * FROM PublicSchools WHERE category=‘public elementary sch’ Find the school names in PublicSchools: SELECT school_nam FROM PublicSchools The SELECT clause can present all columns or a subset of columns in query results Example Find the names of the public elementary schools in the City of Adelphi: SELECT school_nam, city FROM PublicSchools WHERE category = ‘public elementary sch’ AND city=‘ Adelphi’ Example SELECT schoolname Enrollment = (schoolname, studentnumber) FROM Enrollment SchoolName StudentNumber WHERE StudentNumber > 500 Apple Grove 400 Laurel 280 SELECT * Allenwood 150 FROM Enrollment WHERE StudentNumber < 200 OR StudentNumber > 350 Querying Non-spatial Data in Multiple Relations PublicSchools Find the cities having schools school_nam city street with less than 200 students, and list the city names, school Adelphi Adelphi 8802 Riggs Rd names and enrollment: Allenwood Temple Hills 6300 Harley Ln Beltsville Beltsville 4300 Wicomico Ave Enrollment schoolname studentnumber Apple Grove 400 Laurel 280 Allenwood 150 Querying Non-spatial Data in Multiple Relations Find the cities having schools with less than 200 students, and list the city names, school names and enrollment: SELECT p.school_nam, p.city, e.studentnumber FROM publicschools p, enrollment e WHERE p.school_nam = e.schoolname AND e.studentnumber < 200 The WHERE clause joins two tables The SELECT and WHERE clauses must specify which table each column belongs to Join Type Inner join: Only matched tuples are presented in the result Attributes of the left-hand-side table appear first followed by the attributes of the right-hand-side table school_nam city street schoolname studentnumber Allenwood Temple Hills 6300 Harley Ln Allenwood 150 Join Type Left outer join: The result of the inner join is computed For every tuple in the left-hand-side table that did not match any tuple in the right-hand-side table, the attributes values of tuple is added to the result of join and the values of the attributes from the right-hand-side relation are set null school_nam city street schoolname studentnumber Allenwood Temple Hills 6300 Harley Ln Allenwood 150 Adelphi Adelphi 8802 Riggs Rd null null Beltsville Beltsville 4300 Wicomico Ave null null Join Type Right outer join: The result of the inner join is computed Symmetric to the left outer join For every tuple in the right-hand-side table that did not match any tuple in the left-hand-side table, the attributes of the tuple are added to the result of the join, and the values of the attributes from the left-hand-side table are set null school_nam city street schoolname studentnumber Allenwood Temple Hills 6300 Harley Ln Allenwood 150 null null null Apple Grove 400 null null null Laurel 280 Spatial Join Comparing the spatial relationships among spatial objects’ from a source table and a target table (GIS layers). Join the objects in the source table to the target table if certain rules of spatial relationships are satisfied. Spatial Join Spatial join in ArcGIS Pro: Intersect Within Within a distance Contains Boundary touches Closest … Spatial relationships between 2 features Additional Resources SQL Tutorials http://www.sqlcourse.com/ http://www.w3schools.com/sql/