Alan Turing, Computer Science, and Database History PDF

Introduction 1 Alan Turing Father of Computer Proposed the Turing Machine in 1936 during his Ph.D studies. Father of Artificial Intelligence Proposed the Turing...

Introduction 1 Alan Turing Father of Computer Proposed the Turing Machine in 1936 during his Ph.D studies. Father of Artificial Intelligence Proposed the Turing Test in 1950 to verify if a computer can have intelligence. Cryptanalyst Cracked the famous communications June 23 1912 code system used by the German military June 7, 1954 and shorten the Second World War by a few years Marathoner: Only 11 minutes slower than the 1948 Olympic Games gold modelist. 2 Alan Turing Prosecuted in 1952 for homosexual acts. ACM Turing Award the highest annual award in computer June 23 1912 science since 1966, now it comes with June 7, 1954 US$1 million from Google. Nobel Prize in Computer Science 3 Alan Turing Turing Machine A simple mathematical model that can represent any computer algorithms. Turing Completeness A programming language that is Turing complete is theoretically capable of expressing all tasks accomplishable by computer. For example, C++, Python, JavaScript, etc are Turing complete. Decidability A problem that cannot be solved by a computer program is said to be undecidable. 4 Computers A computer consists of: One or more CPSs Main memory Secondary memory Various input/output devices Managing all these components requires a layer of software – the operating system 5 The Unix/Linux System Structure Users Application Programs User Mode Software User Interface (Shell) Operating System Kernel Mode File System CPU Memory Disk I/O Hardware File System Calls: open(), close(), read(), write(), lseek(), stat(), fstat() 6 Memory 0 1 2........ Max-1 How to store and retrieve data in memory? 7 Disk Storage 8 Disk Structure Disk sector/block address:. 9 Disk Storage Disk address Disk access unit: Block a sector within a cylinder on a surface Disk A sequence of blocks: 0 to Max -1 Disk Access Disk address in ML 10 Computer Architecture  Page – A page is a block-sized area of main memory  Block modification – Reads the block into a page – Modifies the bytes in the page – Writes the page back to the block on disk 11 Data Management File Processing system Hierarchical Model (IMS) Network Model (IDMS) Relational Model Nested Relation Object-Oriented (OO) Data Model Object Relational Data Model XML NoSQL 12 Database System Reviews NoSQL Databases XML Databases Object Relational Databases Object-oriented Databases Nested Relational Databases Relational Databases Hierarchical & Network Databases File Processing Systems 1960 1970 1980 1990 2000 2010 2020 13 Turing Award for DB People Bachman E.F. Codd Jim Gray M. Stonebraker ec 11, 1924 Aug.23,1923 Jan 12, 1944 Oct 11, 1943 Apr.18,2003 Jan 28, 2007 uring Award Turing Turing Turing 973 (age 49) Award Award Award 1981 (age 1998 (age 2014 (age 14 File Systems … To simplify disk access, OS manages the blocks on disks and provide services (file system calls) to users and applications. File A sequences of not necessarily contiguous blocks on disk It has file name and contents File system calls create, remove open, close read, write lseek, etc. 15 Contiguous File Allocation Multiple blocks can be read in at a time to improve I/O External fragmentation will occur Difficult to find contiguous blocks Need to perform compaction 16 File Organizations Any problem? 17 Unix/Linux Inode and File Structure 18 System Calls for File Systems Call Description open() Open a file for reading, writing, or both close() Close an open file read() Read data from a file into a buffer lseek() Move the file pointer write() Write data from a buffer into a file stat() Get a file’s status information fstat() As stat() but works with a file descriptor 19 Basic Concepts Record A collection of related data eg. Name, Age, Address Fixed-length records records have the same length Variable-length records records have different lengths 20 Storing Records in Blocks 1. A record is bigger than a block the record is spanned to several blocks 2. A record is smaller than a block several records on a block and unused space is wasted 21 Elements of File Management create delete read write optimizing performance identify and locate the selected file 22 File Processing Systems (FPS) A file system is a method for storing and organizing files and providing system calls to data in them. A file is a collection of records stored on disk A record is a collection of fields, possibly of different data types, typically in fixed number and sequence. Programing languages supports storage and retrieval of records on the disk Programmers could write File Processing Systems to CURD records for various data management A file processing system is a collection of programs that store and manage files on computer hard-disk. 23 Problems with FPS Business … Registration Office Office They had identical way to store and retrieve data, all that differed were the details of the input and output Data Redundancy. Difficulty in Accessing data No Data Sharing No Concurrent Access Security Problems Such systems are difficult to modify 24 Sample Database Faculty Adam Gray Jack Tony Teach （ 1 ： N) Course CS Math Chem Math Phys Take （ M ： N) Student Ellen James Henry Sandy Terry 25 Hierarchical model Initially implemented in a joint effort by IBM and North American Rockwell around 1965. Resulted in the IMS family of systems and used on early IBM mainframe computers Dominated during 1970s. 26 Hierarchical model records Faculty pointers Course pointers Student 27 Hierarchical Model Data are organized into a tree-like structure using records and links on disk rather than in memory Records are collections of fields, with each field containing only one value. Records are connected to one another through links. Each record has one parent record and many children so that records' relationships form a tree-like model. All fields of a specific record are listed under an entity type. This structure is simple but inflexible because the relationship is confined to a one-to-many relationship 28 Hierarchical Model Advantages: Simple to construct and operate Corresponds to a number of natural hierarchically organized domains, e.g., organization (“org”) chart Language is simple: Uses constructs like GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN PARENT, etc. Disadvantages: Navigational and procedural nature of processing Individual fields cannot identified by the system; a record is simply treated as a number of bytes into which data could be placed Cannot represent many to many (M:N) relationships naturally No data independence 29 Network Model A bachelor’s and a master’s degrees in Mechanical Engineering in 1948 and 1950 Lead the implementation of the Integrated Data Store (IDS) in 1962 to automate the business processes of the General Electric Low Voltage Switch Gear Department in Philadelphia (DDL, DML, OLAP), the basis of the network model first direct-access DBMS, finished in 1964 Received ACM’s Turing Award in 1973 Charles Bachman without a Ph.D when 49 born Dec 11, 1924 30 Network Model records Faculty pointers records Course pointers records Student pointers records The data are organized into a graph (lattice) structure. each parent can have many children each child can also have many parents. 31 Network Model Advantages: Network Model is able to model complex relationships and represents semantics of add/delete on the relationships. Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND NEXT within set, GET, etc. 32 Network Model Disadvantages: Database contains a complex array of pointers that thread through a set of records. Record at a time access. Little scope for automated “query optimization” 33 Network Model Although it was widely implemented and used, it failed to become dominant for two main reasons. IBM chose to stick to the hierarchical model with semi- network extensions in their established products such as IMS and DL/I. It was eventually displaced by the relational model. 34 Relational Model Born in England Studied mathematics and chemistry at Exeter College, Oxford Worked for IBM as a mathematical programmer in 1948 Moved to Ottawa in 1953 for 10 years Returned to US and received his doctorate in Edgar F. Codd computer science from the University of Aug.23,1923 Michigan in Ann Arbor in 1965 Apr.18,2003 Then worked at IBM's San Jose Research Laboratory 35 Relational Model Wrote an internal IBM paper about Relational Model in 1969. Published the paper a year later. Edgar F. Codd Aug.23,1923 Apr.18,2003 36 Relational Model FACULTY COURSES STUDENT F# FNAME C# CNAME S# SNAME 2 Jackson 222 Math 1 Smith 9 Henry 223 Math 2 Jones 14 Schuh 302 CS 3 Doe 21 Lerner 302 Chem 4 Varda 542 Hist 5 Carey all data is represented in terms of tuples (records), grouped into relations (files) FC SC F# C# S# C# 2 542 1 223 9 222 2 223 9 223 2 542 14 302 3 304 21 304 4 222 4 304 5 302 37 Relational Model IBM refused to implement the relational model in order to preserve revenue from IMS/DB. Codd showed IBM customers the potential of the implementation of its model, and they in turn pressured IBM. Continued to develop and extend his relational model Edgar F. Codd IBM started the System R project in 1974, Aug.23,1923 but put in charge of it developers who were Apr.18,2003 not thoroughly familiar with Codd's ideas, and isolated the team from Codd. 38 Relational Databases They did not use Codd’s Algebra language but created a non-relational one SQL, which has since become the standard relational database language. System R with SQL started in 1974 and finished in 1977 as a prototype. Commercial products for its mainframe computers Edgar F. Codd SQL/DS for VM/CMS in 1981 Aug.23,1923 DB2 for VMS in 1983 Apr.18,2003 Received ACM’s Turing Award in 1981 when 58 39 Michael Stonebraker MSc and Ph.D in Computer Science from the University of Michigan in 1967 and 1971 Joined UC Berkeley as an assistant pro. Started to work on the relational database system Ingres in 1974 based on E.F. Codd’s paper using a rotating team of student programmers using Unix machine and C language for 5 years. Michael Stonebraker Along with System R of IBM, show that it is Oct 11, 1943 possible to build a practical and efficient RDB. Received ACM’s Turing Award in 2015 for his work on Ingres, Postgres. 40 Network vs Relational People thinks relational is an ideal model but not practical as its performance won’t be acceptable After 5 years, ACM organized a workshop in 1974 to debate on the two models: Network model: Bachman and his supporters Relational model: Codd and his supporters The debating improved the environment for relational model 41 Oracle attended the University of Chicago for one term, where he first encountered computer design. began his career as a computer programmer for different companies Larry Ellison one of his project was a database for CIA, Born Aug 17, 1944 called Oracle. The third wealthiest American citizen In 1977, inspired by E.F. Codd’s paper on relational database systems, and founded consultancy Software Development Laboratories (SDL) with his friends, former coworkers Bob Miner and Ed Oates 42 Oracle They implemented a relational database system called Oracle on Unix operating systems In 1978, Oracle Version 1 was finished but not released. Larry Ellison In 1979, changed to Relational Software, Inc. Born Aug 17, 1944 and released its Oracle 2, run on PDP-11. The third wealthiest American citizen In 1982, changed to Oracle Systems Corp. In 1995, changed to Oracle Corporation In 2024, he is listed him as the third-wealthiest man person in the world, with a net worth of US$208 billion. 43 Informix Founded as Relational Database Systems (RDS) in 1980 by Roger Sippl and Laura King Released their Relational database product Informix (INFORMation on unIX) in 1981. In 1995, purchased Illustra (a commercial version of Postgre), focused on object-relational databases. It released the first object-relational databases Informix Universal Server in 1996, making it the first big three DB company (Oracle, Sybase, Informix) to offer built-in object relation support. In late 1996, product releases began to fall behind schedule, with 10 key people joined Oracle in early 1997. In April 2001 IBM bought from Informix the database technology. 44 Sybase Founded in 1984 by Mark Hoffman, Robert Epstein (a student on the INGRES project), Jane Doughty and Tom Haggin in Epstein’s home in Berkeley, California In late 1986, Sybase shipped its first test programs, and in May 1987 formally released the SYBASE system, the first high- performance RDBMS for online applications. SYBASE was the first to provide a client/server computer architecture. The server is called Sybase SQL Server It sold the rights to its database system to Microsoft Corporation, markets SQL Server. It has changed to other business instead 45 Microsoft SQL Server In 1989, Microsoft started to sell Sybase system and call it SQL Server 1.0 for IBM OS/2 system In 1993, Microsoft released its operating system Windows NT, and it bought SQL Server code specific for Windows NT from Sybase and called it SQL Server 4.21 Gradually, it modified the system with its own code. In 2005, it completely rewrote SQL Server code and released its SQL Server 2005 46 Transaction Processing He entered into UC Berkeley in 1961 Failed the Chemistry course in the first year and gave up studies. Worked 6 months at General Dynamics Came back to school to study Data Analysis and Discrete Mathematics Graduated with both Mathematics and Engineering degree of bachelor. Jim Gray Jan 12, 1944 Then worked on Multics with Ken Thompson in Jan 28, 2007 Bell Labs. Studied again at UC Berkeley and obtained the first Ph.D in CS from UC Berkeley in three years in 1969. 47 Transaction Processing Worked in IBM on various database systems, IMS, System R, SQL/DS, DB2. Invented transaction processing to make relational database system possible in the paper “Granularity of Locks and Degrees of Consistency in a Shared Data Base” in 1976. i.e., the famous ACID properties. In 1993, Microsoft wanted to get into relational Jim Gray DB market and got him. Jan 12, 1944 Jan 28, 2007 His term released MS SQL server 7.0 Received ACM’s Turing Award in 1998 for his work on Transaction Processing when 54 Was missing during a short sol sailing on January 28, 2007. 48 Relational Database Wars IBM dominated the mainframe relational database market with its SQL/DS(1981) and DB2 (1983) database products, it delayed entering into mini and microcomputer. Oracle, Sybase, Informix dominated mini and microcomputers Oracle almost went bankrupt in 1990 Sybase was far ahead of Oracle and expanded rapidly, resulted in a loss of focus on DB and sold its DB software to Microsoft in 1993, which now markets it under SQL Server Informix overtook Sybase between 1994-1997 and competed with Oracle, but its CEO landed in Jail in 1997 and Informix relational DB division was taken by IBM in 2001 Since then, Oracle enjoyed years of industry dominance 49 MySQL Initially released in 23 May 1995 by the Swedish company MySQL The world second most widely used RDBMS It was acquired by Sun Microsystems in 2008 for $1 billion, which was in turn acquired by Oracle in 2010. The world's most popular open source database. With over 65,000 downloads per day Popular choice of database for use in web applications (Linux, Apache, MySQL, Perl/PHP/Python) 50 Relational DB History Database Year Company Name Released Oracle 1979 Oracle Informix 1981 Informix Db2 1983 IBM Sybase 1986 Sybase SQL Server 2005 Microsoft 51 Database Engine Ranking 52 Big Data Challenges Big data can be described by the following 5Vcharacteristics: Volume (huge large amount of data: terabytes, petabytes, exabytes) Velocity (speed of data in and out: real-time, streaming) Variety (range of data types and sources, non-relational data such as nested relation, documents, XML data, web, graph, multimedia) Veracity (correctness and accuracy of information: data quality and reliability) Value (use machine learning, data mining, statistics, visualization, decision analysis techniques to extract/mine/derive previously unknown insights from data and become actionable knowledge, business value) 53 Big Data Challenges Advance in computing technologies Processors Increased memory & low storage cost Parallel processing technologies Use clusters of commodity hardware, distributed storage Hadoop Distributed File System (HDFS) 54 Big Data Challenges Traditional Relational database management systems are inadequate to handle such big data applications efficiently. Big Data Technologies Hadoop Ecosystems NoSQL databases: Hadoop Hbase, MongoDB, Cassandra, Cloudera NewSQL databases: support ACID properties and SQL. E.g. Google spanner, VoltDB, MemSQL, NuoDB, Clustrix In-memory databases Big Data Warehousing (ETL (Extract, transform, load), ELT (extract, load, transform), data visualization, EDW (Enterprise Data Warehouse), LDW (Logical Data Warehouse), data integration) 55 Big Data Challenges 56

Alan Turing, Computer Science, and Database History PDF

Document Details

Tags

Related

Summary

Full Transcript