Ch4 PDF - Database Concepts
Arab Open University
Summary
This document provides a broad overview of database concepts, covering data and its types, flat databases, relational databases, normalization, and keys and their importance in database design. It uses simple language and examples to make the concepts easier to understand.
Full Transcript
Data can be in any form – numbers, characters, pieces of text, pictures, or sounds – but crucially data has no context. This lack of context means that it is very hard to work out the meaning of that data. Computer programs process data: the computer simply follows the instructions held in the program. Computer programs transform data into information by imposing structure on data to make it meaningful to human beings.

Data is one or more values that can be assigned to an object.
- Values are familiar from everyday life – names, prices, and titles of books or movies.
- Objects are not just physical things like mountains, people, or cities; they can be virtual, like a character in a novel or a weather forecast.

Database: a collection of related data organized to allow access for reading and updating by a computer. Databases were among the most important applications driving the computerization of big business and government. IBM created SABRE (Semi-Automatic Business Research Environment), the first computer database to provide real-time responses to enquiries: instead of taking hours, a query (such as flight availability) took only a few seconds, using Teletype terminals. The designers of a database could make data more or less visible simply by changing how data was retrieved.

Flat database relationships:
- One to one: manager and school – each school has only one manager.
- One to many: school and teachers – each school has many teachers.
- Many to many: students and teachers – each student has many teachers, and each teacher has many students.

A flat database is used like a spreadsheet to calculate totals, generate statistics, and process data, using equations to generate new values. A spreadsheet or flat database can store more than numeric information (text, date and time, logical values, etc.). A column in such a table is a field; a row is a record. It is easy to make errors when entering data into a database, so databases restrict the type of data that can be stored in each field.
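The field/record idea above can be sketched with Python's built-in sqlite3 module: each column (field) declares a type, and each row is one record. The table name, field names, and the sample record are invented for illustration, not taken from the source.

```python
import sqlite3

# A minimal flat database: one table, each column (field) restricted to
# a declared type, each row a record.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE student (
           name     TEXT,     -- text field
           fee      REAL,     -- numeric (currency) field
           enrolled TEXT,     -- date stored as ISO text
           active   INTEGER   -- logical field: 0 = False, 1 = True
       )"""
)
conn.execute("INSERT INTO student VALUES ('Colin Cherry', 1200.0, '2024-09-01', 1)")
rows = conn.execute("SELECT name, active FROM student").fetchall()
print(rows)  # [('Colin Cherry', 1)]
```

Declaring a type per field is exactly the restriction the text describes: it stops, for example, free text being entered where a number is expected.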
Types of data:
- Text: any combination of text, numbers, and symbols.
- Numbers: just numbers.
- Dates and times: calendar and clock values.
- Currency: monetary values.
- Logical: True or False.

Advanced databases support a wider range of data types, such as pictures, videos, audio, and formatted documents (e-books, links to data on the web). The database developer is responsible for choosing an appropriate type for each field.

We interact with databases using query languages. Why? Because a query language combines a very limited subset of English phrases with logical commands that allow items to be included in or excluded from the query. The most common query language is called Structured Query Language (SQL). Databases and query languages support most websites: rather than storing a web page for every product in an online store, a typical website stores the components of the page – the text, pictures, prices, and reviews.

Adding fields: we can add large numbers of fields, but we should not leave fields empty, because empty fields are a problem. Empty fields occupy memory, and queries run more slowly on large tables even if most of the fields are empty, because there is more data to search through.

Editing records: repeating data in a database is a problem because it uses extra memory, but there is a more serious issue: whenever repeated information changes, every copy must be updated, so inconsistencies can creep in.

Relational databases: relational databases solve the problems outlined previously by dividing data between two or more tables. This separation is known as normalization. Normalization follows rules, one of which is "one table per entity". Entity: an item about which we want to store information. An entity can be:
- Tangible: a person or an object.
- Intangible: an event (a sale, a registration).
- A concept: a bank account.

In the previous table, the most obvious entity is 'student'. A particular student (Colin Cherry) is an instance of this 'student' entity. Attribute: descriptive information about an entity.
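A short sketch of the include/exclude idea behind SQL queries, again using sqlite3; the table, its contents, and the price threshold are invented for the example.

```python
import sqlite3

# An SQL query mixes English-like keywords (SELECT, FROM, WHERE) with
# logical conditions (AND, <, =) that include or exclude records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, price REAL, in_stock INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [("SQL Basics", 15.0, 1), ("Big Data", 40.0, 0), ("Databases", 25.0, 1)],
)
# Include only in-stock books priced under 30; exclude everything else.
cheap = conn.execute(
    "SELECT title FROM book WHERE in_stock = 1 AND price < 30 ORDER BY title"
).fetchall()
print(cheap)  # [('Databases',), ('SQL Basics',)]
```

The WHERE clause is the "logical command" the text refers to: only records satisfying every condition appear in the result.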
Steps to creating the relational database: divide the data into separate tables, with each table representing a single entity. For example, if we have two entities in our college – students and courses – we need two tables: a Student table for the student entity and a Course table for the course entity (look at the table on slide 17). The next stage of the design is to re-establish the relationships between the Student and Course entities using the new key values. Relationships are stored in a so-called joining table, listing every relationship between every student and their course (or courses).

Keys: an important piece of information was lost during the transition from the original flat database to the relational database. The tables no longer show which students are registered on which courses – the entire point of the original database! This can be fixed by linking each student in Student_Table to one or more courses in Course_Table. Before the two can be linked, every entity in each table must be identified using a unique key field (look at the table on slide 19).

Big Data: databases were invented when data was rare and expensive. Big data is a type of data-handling technology for extracting useful information from huge pools of ill-defined data. Data processing used to be so complicated and time-consuming that only a subset of the available data, called a sample, was scanned. An existing application of big data is protecting credit card users from fraud.

The Three Vs:
- Volume: very large amounts of data need to be processed.
- Velocity: there is a need for data to be processed at an ever-increasing pace.
- Variety: the data being acquired can take any form.

Structured and unstructured data: databases hold information in the form of structured data. If we know the structure of the database, we can create a query to extract an attribute from a table. Big data deals with unstructured data.
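The normalized design described above can be sketched as follows. The table names Student_Table and Course_Table come from the text; the joining-table name (Registration), the key columns, and the sample data are assumptions made for the example.

```python
import sqlite3

# One table per entity, a unique key field in each, and a joining table
# that re-establishes the student-course relationships.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student_Table (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Course_Table  (course_id  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE Registration (            -- the joining table
        student_id INTEGER REFERENCES Student_Table(student_id),
        course_id  INTEGER REFERENCES Course_Table(course_id)
    );
    INSERT INTO Student_Table VALUES (1, 'Colin Cherry');
    INSERT INTO Course_Table  VALUES (10, 'Databases'), (11, 'Networks');
    INSERT INTO Registration  VALUES (1, 10), (1, 11);
""")
# A join over the key fields recovers which students are registered on
# which courses -- the information that was "lost" in normalization.
result = conn.execute("""
    SELECT s.name, c.title
    FROM Registration r
    JOIN Student_Table s ON s.student_id = r.student_id
    JOIN Course_Table  c ON c.course_id  = r.course_id
    ORDER BY c.title
""").fetchall()
print(result)  # [('Colin Cherry', 'Databases'), ('Colin Cherry', 'Networks')]
```

Each student's details are stored exactly once, yet the join reconstructs the original flat view on demand, which is precisely why normalization avoids the repeated-data problems described earlier.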
Unstructured data is produced by both computers and humans:
- Machine-acquired data: satellite and aerial photography images.
- Human-generated content: business documents, emails.

Note: "unstructured" does not refer to how the data is stored; rather, it refers to the contents of the file itself.

Data volume vs data quality: sampling trades volume for quality – rather than processing all the data, a random sample is chosen from it. The possibility that some data might be inaccurate led to a fourth 'V'. Veracity: ensuring the correctness and trustworthiness of the data.

Processing big data – the 'datafication' of our world. Big data is fuelled by two things:
- The increasing 'datafication' of the world.
- Our increasing ability to analyze large and complex sets of data.

Sets of data:
- Activity data: listening to music, reading a book.
- Conversation data: our conversations are digitally recorded, starting with emails.
- Photo and video image data: think about the photos we take using our smartphones or cameras.
- Sensor data: smartphone sensors.
- The Internet of Things (IoT): devices such as smart watches send data to a target.

The datafication of our world gives us large amounts of data in terms of Volume, Velocity, Variety, and Veracity. The latest technology, such as cloud computing and distributed systems, together with the latest software and analysis approaches, allows us to leverage all types of data to add value.

Data Science: what is data science? Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data. It is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
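The sampling idea discussed above – trading data volume for processing cost by examining only a random subset – can be sketched in a few lines. The data set and sample size here are synthetic, invented purely for illustration.

```python
import random

# Instead of scanning the full pool of data, process a random subset
# (a "sample") and estimate a quantity from it.
random.seed(42)                       # fixed seed so the run is repeatable
pool = list(range(1_000_000))         # the full (synthetic) data set
sample = random.sample(pool, 1000)    # a 0.1% random sample
estimate = sum(sample) / len(sample)  # mean estimated from the sample
true_mean = sum(pool) / len(pool)     # mean of the full data set
print(len(sample))                    # 1000
```

With a well-chosen random sample, the estimate lands close to the true value at a tiny fraction of the processing cost, which is exactly the volume-for-quality trade the text describes.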
Data science is a concept used to tackle big data; it includes data cleansing, preparation, and analysis, and it unifies statistics, data analysis, and machine learning. It employs techniques from mathematics, statistics, computer science, and information science.

Why is data science important? Making sense of data reduces the horrors of uncertainty for organizations. Data science is a rapidly growing function, and data mining for excavating insights has driven the demand to use data for business strategies. There are a few important stages for housing data science within businesses: doing business health checks, evaluating data, maintaining data through cleansing, warehousing, and processing, and then analyzing and finally visualizing and communicating.

What skills are needed to become a data scientist? Anyone who wants to start in the field should have skills in three areas: analytics, programming, and domain knowledge. These include:
- Strong knowledge of Python, SAS, R, or Scala.
- Experience in SQL database coding.
- The ability to work with unstructured data.
- Knowledge of mathematics.
- Understanding of multiple analytical functions.
- Knowledge of machine learning and software engineering.

Some important roles required in data science projects:
- Data scientist: produces mathematical models for the purposes of prediction.
- Data engineer: creates easily accessible data pipelines for consumption by data scientists.
- Machine learning engineer: deals with machine learning tools.
- Data architect: defines how the data will be stored in IT systems.
- Business analyst.
- Software engineer: secures more structure in the data science work.
- Domain expert: brings the technical understanding of their area of expertise.