Lect 01 File (Midterm) PDF
Uploaded by DauntlessAestheticism
Sinai University
Almohammady S. Alsharkawy
Summary
This document is a lecture on file organization and processing, covering file design, file manipulation, and different types of storage devices. The lecture also explains the difference between data structures and file structures, and introduces computer architecture and the memory hierarchy.
Full Transcript
File Organization & Processing
CSW 241, Sinai University
Almohammady S. Alsharkawy, PhD, "[email protected]"

CSW 241 File Organization and Processing
Overview of Files: file design, file manipulation, blocking and buffering, single and double buffering. Types of storage devices: magnetic tape and disks. Space and time calculation. Sequential file, relative file, indexed sequential file, multiple-key file, and direct access file. External sort/merge algorithms. File systems: disk scheduling.

Introduction to File Organization
Data processing from a computer science perspective:
– Storage of data
– Organization of data
– Access to data
This course builds on your knowledge of data structures.

Data Structure vs. File Structure
Both involve:
– Representation of data and operations for accessing data
Difference:
– Data structures deal with data in main memory.
– File structures deal with data on secondary storage devices (files).

Computer Architecture
[figure]

Memory Hierarchy
There are four major storage levels:
1. Internal – processor registers and cache.
2. Main – the system RAM and controller cards.
3. On-line mass storage – secondary storage.
4. Off-line bulk storage – tertiary and off-line storage.
Levels 1 and 2 make up primary storage, level 3 is secondary storage, and level 4 is tertiary storage.

Primary Memory vs. Secondary Memory
Main memory:
– Fast (since electronic)
– Small (since expensive)
– Volatile (information is lost when a power failure occurs)
Secondary storage:
– Slow (since electronic and mechanical)
– Large (since cheap)
– Stable, persistent (information is preserved longer)

File System
A file is a collection of related information defined by its creator. Files are mapped by the operating system onto physical mass-storage devices. A file system describes how files are mapped onto physical devices, as well as how they are accessed and manipulated by both users and programs.
Accessing physical storage can often be slow, so file systems must be designed for efficient access. Other requirements may be important as well, including support for file sharing and remote access to files.

File
A file is organized logically as a sequence of records. These records are mapped onto disk blocks. Files are provided as a basic construct in operating systems, so we shall assume the existence of an underlying file system. We need to consider ways of representing logical data models in terms of files. Each file is also logically partitioned into fixed-length storage units called blocks, which are the units of both storage allocation and data transfer.

File
A block may contain several records; the exact set of records that a block contains is determined by the form of physical data organization being used. A file may have fixed-length records or variable-length records.

Record and Record Type
A record is the unit in which data is usually stored. Each record is a collection of related data items, where each item is formed of one or more bytes and corresponds to a particular field of the record. Records usually describe entities and their attributes.

Record and Record Type (Cont.)
A collection of field (item) names and their corresponding data types constitutes a record type. In short, a record type corresponds to an entity type, and a record of a specific type represents an instance of the corresponding entity type. An example of a record type and one of its records:
STUDENT(9901536, "James Bond", "1 Bond Street, London", "Intelligent Services", 9)

File Structure Terms
– Field: the basic element of data; contains a single value; fixed or variable length.
– Record: a collection of related fields that can be treated as a unit by some application program; fixed or variable length.
– File: a collection of similar records; treated as a single entity; may be referenced by name; access control restrictions usually apply at the file level.
– Database: a collection of related data; relationships among elements of data are explicit; designed for use by different applications; consists of one or more types of files.

Data-Processing Application
A data-processing application is likely to require some, or all, of the following facilities:
– Inserting records into a file (which may or may not be initially empty)
– Retrieving all the records from a file, one by one
– Retrieving a record with a given key
– Deleting a record with a given key
– Changing a record (possibly in a way that alters its length)
– Retrieving records one by one in some order

Definition: File Structures
File structures are the organization of data on secondary storage devices in a way that minimizes access time and storage space. A file structure is a combination of representations for data in files and of operations for accessing the data. A file structure allows applications to read, write, and modify data. It might also support finding the data that matches some search criteria or reading through the data in some particular order.

Definitions: File Organization
File organization generally refers to the organization of data into records, blocks, and access structures. It includes the way in which records and blocks are placed on disk and interlinked. Access structures are particularly important: they determine how records in a file are interlinked logically as well as physically, and therefore dictate which access methods may be used.

Why Study File Structure Design?
I. Data Storage
Computer data can be stored in three kinds of locations:
– Primary storage: memory (computer memory)
– Secondary storage: online disk/tape/CD-ROM that can be accessed by the computer
– Tertiary storage: archival data (offline disk/tape/CD-ROM not directly available to the computer)

Why Study File Structure Design?
II. Memory vs. Secondary Storage
Secondary storage such as disks can pack thousands of megabytes in a small physical location.
Computer memory (RAM) is limited. However, relative to memory, access to secondary storage is extremely slow. For example, getting information from slow RAM takes 120 × 10^-9 seconds (120 nanoseconds), while getting information from disk takes 30 × 10^-3 seconds (30 milliseconds).

Why Study File Structure Design?
III. How Can Secondary Storage Access Time Be Improved?
By improving the file structure. Since the details of the representation of the data and the implementation of the operations (Read, Write, …) determine the efficiency of the file structure for particular applications, improving these details can help improve secondary storage access time.

Overview of File Structure Design
I. General Goals
– Get the information we need with one access to the disk.
– If that is not possible, get the information with as few accesses as possible.
– Group information so that we are likely to get everything we need with only one trip to the disk.

Overview of File Structure Design
II. Fixed vs. Dynamic Files
It is relatively easy to come up with file structure designs that meet the general goals when the files never change. When files grow or shrink as information is added and deleted, it is much more difficult.

File Management System Objectives
– Meet the data management needs of the user.
– Guarantee that the data in the file are valid.
– Optimize performance.
– Provide I/O support for a variety of storage device types.
– Minimize the potential for lost or destroyed data.
– Provide a standardized set of I/O interface routines to user processes.
– Provide I/O support for multiple users in the case of multiple-user systems.

Good File Structure Design
– Fast access to great capacity.
– Reduce the number of disk accesses by collecting data into buffers, blocks, or buckets.
– Manage growth by splitting these collections.

Goal of the Course
Minimize the number of trips to the disk needed to get the desired information.
"Ideally, get what we need in one disk access, or get it with as few disk accesses as possible."
Group related information so that we are likely to get everything we need with only one trip to the disk (e.g., name, address, phone number, account balance).
Locality of reference in time and space.

Physical Files and Logical Files
Physical file: a collection of bytes stored on a disk or tape.
Logical file: a "channel" (like a telephone line) that connects the program to a physical file.
Example (C/C++, using the POSIX open() call):
    fd = open(filename, flags [, pmode]);
Here fd is the logical file, and filename names the physical file.

Physical Files and Logical Files
The program (application) sends bytes to (or receives bytes from) a file through the logical file. The program knows nothing about where the bytes go (or come from). The operating system is responsible for associating a logical file in a program with a physical file on disk or tape. Writing to or reading from a file in a program is done through the operating system.

Opening Files
Two options:
(1) Open an existing file: the position is set to the beginning of the file, ready to start reading and writing.
(2) Create a new file, ready for use after creation.
In C and C++:
    fd = open(filename, flags [, pmode]);
Example:
    FILE *outfile;
    outfile = fopen("myfile.txt", "w");

Opening Files
The first argument gives the physical name of the file. The second one determines the "mode", i.e., the way the file is opened. For example:
– "r" = open for reading
– "w" = open for writing (the file need not exist; an existing file is truncated)
– "a" = open for appending (the file need not exist)
among other modes ("r+", "w+", "a+").

Closing Files
This is like "hanging up" the line connected to a file. After a file is closed, its logical name is free to be associated with another physical file. Closing a file used for output guarantees that everything has been written to the physical file.
In C: fclose(outfile);
In C++: outfile.close();

Reading
Read data from a file and place it in a variable inside the program.
Generic Read function (not specific to any programming language):
    Read(Source_file, Destination_addr, Size)
– Source_file = logical name of a file that has been opened
– Destination_addr = first address of the memory block where the data should be stored
– Size = number of bytes to be read

Reading
In C++:
    char c;
    fstream infile;
    infile.open("myfile.txt", ios::in);
    infile >> c;   // Since c is a char variable, it is implicit that only 1 byte is to be transferred.

Writing
Write data from a variable inside the program into the file.
Generic Write function:
    Write(Destination_file, Source_addr, Size)
– Destination_file = logical name of a file that has been opened
– Source_addr = first address of the memory block where the data is stored
– Size = number of bytes to be written

Writing
In C++:
    char c = 'A';   // the character to be written
    fstream outfile;
    outfile.open("mynew.txt", ios::out);
    outfile << c;
…
    if (infile.fail())   // file has ended

Sample Programs for File Manipulation
Program to write to a text file:
    #include <fstream>
    using namespace std;

    int main() {
        ofstream fout;
        fout.open("out.txt");
        char str[] = "Time is a great teacher but unfortunately it kills all its pupils. Berlioz";
        // Write the string to the file.
        fout << str;
        fout.close();
        return 0;
    }