File Structures
Summary
This document provides an introduction to file structures, data processing, and computer architecture. It covers topics such as file structure design, data processing, memory hierarchy, and file management. The presentation explains the differences between data structures and file structures and the importance of effectively managing data in computer systems.
1/8/23

Contents
1. Introduction to File Structures
2. History of File Structures Design
3. File Basics
4. File Management

1. Introduction to File Structures

File Structure
- A file structure is a combination of representations for data in files and of operations for accessing the data.
- A file structure allows applications to read, write, and modify data.
- It might also support finding the data that matches some search criteria or reading through the data in some particular order.

Data Processing
- From a computer science perspective, data processing involves:
  - Storage of data
  - Organization of data
  - Access to data
- This will be built on your knowledge of:

Data Structures vs. File Structures
- Both involve:
  - Representation of data
  - Operations for accessing data
- Difference:
  - Data structures deal with data in main memory.
  - File structures deal with data on secondary storage devices (files).
[Figure: data structures correspond to main storage (memory); file structures correspond to secondary storage.]

Computer Architecture
[Figure: computer architecture]

Main Memory vs. Secondary Storage
- Main memory
  - Fast (since electronic)
  - Small (since expensive)
  - Volatile (information is lost when a power failure occurs)
- Secondary storage
  - Slow (since electronic and mechanical)
  - Large (since cheap)
  - Stable, persistent (information is preserved longer)

How Fast…?
- Typical times for getting information:
  - Main memory: ~120 nanoseconds = 120 × 10^-9 seconds
  - Magnetic disks: ~30 milliseconds = 30 × 10^-3 seconds
- An analogy that keeps the same time proportion as above (a disk access takes about 250,000 times as long as a memory access): looking at the index of a book takes 20 seconds, versus going to the library, which takes 58 days.

Memory Hierarchy
[Figure: memory hierarchy running from the CPU through cache, main memory, magnetic disks, and tapes; a data request moves down the hierarchy until a level can satisfy it.]

Main Goal of This Course
- Minimize the number of trips to the disk needed to get the desired information. Ideally, get what we need in one disk access, or otherwise with as few disk accesses as possible.
- Group related information so that we are likely to get everything we need with only one trip to the disk (e.g., name, address, phone number, account balance).
- Exploit locality of reference in time and space.
To achieve these goals, we need good file structure design.

Good File Structure Design
- Fast access to great capacity
- Reduce the number of disk accesses
  - by collecting data into buffers, blocks, or buckets
- Manage growth by splitting these collections

2. History of File Structures Design

1. In the beginning… it was the tape
  - Sequential access
  - Access cost proportional to the size of the file [analogous to sequential access to an array data structure]
2. Disks became more common
  - Direct access [analogous to accessing a position in an array]
  - Indexes were invented: a list of keys and pointers stored in a small file allows direct access to a large primary file. This works well if the index fits into main memory. As the file grows, however, the index runs into the same problem we had with a large primary file.
3. Tree structures emerged for main memory (1960s)
  - Binary search trees (BSTs)
  - Balanced, self-adjusting BSTs, e.g., AVL trees (1962)
4. A tree structure suitable for files was invented:
  - B-trees (early 1970s) and B+ trees
  - Good for accessing millions of records with 3 or 4 disk accesses
5. What about getting information with a single request?
  - Hash tables (theory developed over the 1960s and 1970s, but still a research topic): good when files do not change much over time
  - Expandable, dynamic hashing (late 1970s and 1980s): one or two disk accesses even if the file grows dramatically

3. File Basics

Computer File
- A computer file, or simply a file, is a named collection of data that exists on a storage medium, such as a hard disk, CD, DVD, or USB flash drive.
- A file can contain a group of records, a document, a photo, music, a video, an e-mail message, or a computer program.

Rules for Naming Files
- Every file has a name and might also have a file extension.
- When you save a file, you must provide a valid file name that adheres to specific rules, referred to as file-naming conventions.
- Each operating system has its own set of file-naming conventions.
[Table: file-naming conventions for Microsoft Windows and Mac OS]
- Some operating systems also have a list of reserved words that are used as commands or special identifiers. You cannot use these words alone as a file name.
- You can use spaces in file names. This differs from the rule for e-mail addresses, where spaces are not allowed.

File Extension
- A file extension (sometimes referred to as a file name extension) is an optional file identifier that is separated from the main file name by a period, as in Paint.exe.
- File extensions provide clues to a file’s contents. For example, .exe files (Windows) and .app files (Mac OS) contain computer programs.

File’s Location
- To determine a file’s location, you must first specify the device where the file is stored.
- You can store files on a hard drive, removable storage, a network computer, or cloud-based storage.
- When working with Windows, each local storage device is identified by a device letter. The main hard disk drive is referred to as drive C:.
- Macs do not use drive letters. Every storage device has a name; the main hard disk, for example, is called Macintosh HD.
- A disk partition is a section of a hard disk drive that is treated as a separate storage unit.
- Every storage device has a directory containing a list of its files.
- The main directory is referred to as the root directory.
  On a PC, the root directory is identified by the device letter followed by a backslash (C:\).
- A root directory can be subdivided into smaller lists. Each list is called a subdirectory.
- A computer file’s location is defined by a file path (sometimes called a file specification). On a PC, the path includes the drive letter, folder(s), file name, and extension.
- Suppose that you have stored an MP3 file called Marley One Love in the Reggae folder on your hard disk.

File Format
- The term file format refers to the organization and layout of data that is stored in a file.
- The format of a file usually includes a header, data, and possibly an end-of-file marker.
- A file header is a section of data at the beginning of a file that contains information about the file, such as the date it was created, the date it was last updated, its size, and its file type.
- Music files are stored differently than text files or graphics files, but even within a single category of data there are many file formats.
- For example, graphics data can be stored in file formats such as BMP, GIF, JPEG, or PNG.
- Although a file extension is a good indicator of a file’s format, it does not really define the format.

File Format: Executable File Extensions
- A Windows software program consists of at least one executable file with an .exe file extension. It might also include a number of support programs with extensions such as .dll, .vbx, and .ocx.

File Format: Data File Extensions
- The list of data file formats is long.

Why Can’t I Open Some Files?
- When a file doesn’t open, one of three things probably went wrong:
  - The file might have been damaged by a transmission or disk error.
  - Someone might have accidentally changed the file extension.
  - Some file formats exist in several variations, and your software might not be able to open a particular variation of the format.

4. File Management

- File management encompasses any procedure that helps you organize your computer-based files so that you can find and use them more efficiently.

Application-based File Management
- Applications generally provide a way to open files and save them in a specific folder on a designated storage device. Some applications also allow you to delete and rename files.
[Figure: creating a new folder while saving a file]

Saving Files on Windows
[Figure]

Saving Files on Macs
[Figure]

File Management Metaphors
- The operating system has a file management utility, such as Windows File Explorer or the Mac OS X Finder, to handle different file operations.
- File management utilities often use some sort of storage metaphor to help you visualize and mentally organize the files on your disks.
- Filing cabinet: each storage device corresponds to one of the drawers in a filing cabinet. The drawers hold folders and the folders hold files.
- Tree structure: a tree represents a storage device.
- Combined filing cabinet and tree structure: Microsoft programmers combined the two metaphors, using filing-cabinet folders to depict a tree structure in the Windows file management utility.

File Management Tips
- Use descriptive names.
- Maintain file extensions.
- Group similar files.
- Organize your folders from the top down.
- Consider using default folders.
- Use Public folders for files you want to share.
- Do not mix data files and programs.
- Don’t store files in the root directory.
- Access files from the hard disk.
- Follow copyright rules.
- Delete or archive files you no longer need.
- Back up!
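The tree-structure metaphor matches how a program sees a storage device: a root directory whose subdirectories can hold further subdirectories and files. The sketch below prints a folder the way a tree-view utility might. It assumes Python 3.9+ and uses only the standard library's pathlib and tempfile modules. The function name tree_lines is hypothetical. So are the folder and file names in the usage part (Documents, Music, Reports, budget.xlsx), except for the Reggae folder and the MP3 file borrowed from the File’s Location example.

```python
from pathlib import Path
import tempfile


def tree_lines(root: Path) -> list[str]:
    """Return an indented listing of the directory tree rooted at `root`.

    Hypothetical helper: folders are listed before files at each level and
    end with "/"; each nesting level adds two spaces of indentation.
    """
    lines = [root.name + "/"]

    def walk(folder: Path, depth: int) -> None:
        # Folders first (False sorts before True), then case-insensitive by name.
        entries = sorted(folder.iterdir(),
                         key=lambda p: (not p.is_dir(), p.name.lower()))
        indent = "  " * depth
        for entry in entries:
            if entry.is_dir():
                lines.append(f"{indent}{entry.name}/")
                walk(entry, depth + 1)  # descend into the subdirectory
            else:
                lines.append(f"{indent}{entry.name}")

    walk(root, 1)
    return lines


if __name__ == "__main__":
    # Build a small, hypothetical folder tree inside a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "Documents"
        (root / "Music" / "Reggae").mkdir(parents=True)
        (root / "Music" / "Reggae" / "Marley One Love.mp3").touch()
        (root / "Reports").mkdir()
        (root / "Reports" / "budget.xlsx").touch()
        print("\n".join(tree_lines(root)))
```

Run as a script, it prints Documents/ at the top. Below that come Music/, Reggae/, and Marley One Love.mp3, each indented one level deeper, and then Reports/ with budget.xlsx beneath it. Listing folders before files is only a display choice that mirrors a common tree-view convention. The recursion itself is what reflects the tree structure.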