Device and File Management (Unit 5)


Summary

This document provides an overview of device and file management in operating systems. It details the processes involved, key features, and how devices and operating systems interact.



UNIT-5: Device and File Management

Device Management in Operating Systems

Device management is the process by which an operating system implements, operates, and maintains the devices attached to a system. A computer typically has many devices connected to it, such as a mouse, keyboard, scanner, printer, and pen drives. The operating system acts as the interface that allows users to communicate with these devices, and it is responsible for establishing the connection between each device and the system. To establish that connection, the operating system uses device drivers.

How does it work?

Device management covers tasks such as managing the operation of devices and implementing and maintaining them. It is therefore valuable for businesses running IT infrastructure, since it supports all the hardware used in the computer system. Device management performs the following tasks:
- It helps install the related drivers and switches.
- It helps implement security measures.
- It makes device configuration easy, keeping IT hardware working well.
- It covers physical as well as virtual devices, such as virtual machines and virtual switches.

Key Features of Device Management
- Device drivers allow the operating system to interact with the device controller while multiple processes execute on the system.
- Device drivers are system software programs that help carry out these operations.
- A critical feature of device management is implementing the device API.
- The device controller uses three registers for device management operations: command, status, and data.

The operating system is responsible for managing device communication through the respective drivers.
The operating system keeps track of all devices using a program known as the I/O controller. It decides which process gets a device and for how long, fulfills requests from processes to access devices, connects devices to the various programs efficiently and without error, and deallocates devices when they are no longer in use.

Device Characteristics

Each device characteristic indicates whether:
- Accepts Third-Party Cookies: the device, by default, accepts a cookie set from a pixel in a page of a different domain.
- Fully Supports Flash: the device has full Flash support. Flash is a platform used for creating certain types of graphics, animations, and applications.
- Is Mobile: the request is coming from a known mobile device.
- Is Tablet: the device is a tablet computer, regardless of operating system. This is a subcategory of mobile device.
- Is Wireless Device: the device is, or is not, wireless. A mobile phone will return true; a desktop or laptop will return false.
- Supports AJAX: the device supports asynchronous JavaScript and XML (AJAX), a set of programming techniques.
- Supports Cookies: the device's browser supports cookies. This characteristic does not indicate whether the request includes cookies; if the value is false because the native browser does not support cookies, cookies may still be supported through other methods.

Functions of Device Management

Device management begins with installing devices along with their drivers and software. Another vital part is configuring each device to perform well enough to meet expectations, and implementing the related security measures and processes. The operating system keeps track of all devices through the program responsible for this task, the I/O controller. The four main functions of device management are:
- Monitoring the status of each device, such as storage drives, printers, and other peripheral devices.
- Enforcing preset policies and deciding which process gets the device, when, and for how long.
- Allocating the devices.
- Deallocating the devices efficiently.

Types of Devices

Peripheral devices can be categorized into three types: dedicated, shared, and virtual.

Dedicated devices: These devices are dedicated, or assigned, to only one job at a time until that job releases them. Devices like printers, tape drives, and plotters demand such an allocation scheme, since it would be difficult for several users to share them at the same time. The disadvantage of dedicated devices is the inefficiency that results from allocating the device to a single user for the entire duration of job execution, even though the device is not in use 100% of the time.

Shared devices: These devices can be allocated to several processes. A disk can be shared among several processes at the same time by interleaving their requests. The interleaving is carefully controlled by the device manager, and all conflicts must be resolved on the basis of predetermined policies.

Virtual devices: These devices combine the first two types: they are dedicated devices transformed into shared devices. For example, a printer can be converted into a shareable device via a spooling program that re-routes all print requests to a disk. A print job is not sent straight to the printer; instead, it goes to the disk (the spool) until it is fully prepared with all the necessary sequencing and formatting, and only then goes to the printer. This technique can transform one physical printer into several virtual printers, which leads to better performance and utilization.

Input/Output Devices

Input/output devices are the devices responsible for the input/output operations in a computer system.
There are two basic types of I/O devices: block devices and character devices.

Block Devices

A block device stores information in fixed-size blocks, each with its own address; disks are the classic example. Each block can be read or written independently of the others. In the case of a disk, it is always possible to seek to another cylinder and then wait for the required block to rotate under the head, regardless of where the arm currently is. A disk is therefore a block-addressable device.

Character Devices

A character device delivers or accepts a stream of characters without regard to any block structure; the individual characters are not addressable, and there is no seek operation. Many character devices are present in a computer system: printers, keyboards, mice, and network interfaces are common examples. A network interface, in particular, is used for transmitting data packets.

Device Controller

A device controller is a component that handles the incoming and outgoing signals of the CPU, acting as a bridge between the CPU and the I/O devices. A device is connected to the computer via a plug and socket, and the socket is connected to a device controller. Device controllers work with binary and digital codes. I/O devices do not usually communicate with the operating system directly; the operating system manages their tasks with the help of an intermediate electronic component, the device controller.

- Input devices generate data as input to the computer system. Examples: mouse, trackball, keyboard, CD-ROM.
- Output devices accept data from the computer system. Examples: printer, graphics display, plotter.
- Input/output (I/O) devices can both give data as input to and receive data as output from the computer system. Examples: disk, tape, writable CD.

The device controller knows how to communicate with the operating system as well as with the I/O devices, so it serves as an interface between the computer system (operating system) and the I/O devices. The device controller communicates with the rest of the system over the system bus.

Some I/O devices have Direct Memory Access (DMA) via their device controllers, and some do not. Devices with a DMA path can access memory directly and are much faster than devices without one. A device without DMA must go through the processor: its transfer is scheduled by the scheduler, loaded into RAM, and then executed on the CPU instruction by instruction to access memory, so it is slower than a device with DMA.

A device controller can generally control more than one I/O device, but most commonly it controls a single device. Device controllers are implemented as chips attached to the system bus, with a connecting cable running from the controller to each device it controls. The operating system communicates with device controllers, and the device controllers communicate with the devices, so the operating system communicates with I/O devices indirectly. Nowadays most controllers have DMA; because of DMA, concurrent execution of processes increases and overall performance improves.

Buffering

In operating systems, buffering is a technique used to enhance the performance of the system's I/O operations.
Buffering is a method of storing data temporarily in a buffer (or cache); this buffered data can then be accessed more quickly than the original source of the data. In a computer system, data is stored on several devices, such as hard disks, magnetic tapes, optical discs, and network devices. When a process needs to read or write data on one of these storage devices, it has to wait while the device retrieves or stores the data. This waiting time can be very high, especially for slow or high-latency devices. Buffering addresses this problem: it provides a temporary storage area, the buffer, which can hold data before it is sent to or after it is retrieved from the storage device. When the buffer is full, the data is sent to the storage device in a batch, which reduces the number of access operations required and hence improves system performance.

Reasons for Buffering

The three major reasons for buffering in operating systems are:
- Buffering synchronizes two devices with different processing speeds. For example, if a hard disk (the supplier of data) is fast and a printer (the acceptor of data) is slow, buffering is required.
- Buffering is required when two devices have different data block sizes.
- Buffering is required to support copy semantics for application I/O operations.

How Buffering Works

The operating system starts by allocating memory for one or more buffers; the size of each depends on the requirements. Data read from the input device is stored in a buffer, which acts as the intermediate stage between sender and receiver. The operating system keeps details about every buffer, such as the amount of data currently stored in it.
This information helps the operating system manage all the buffers. The CPU processes and retrieves the data from the buffer; with this technique the CPU works independently of the device, which improves effective device speed and allows the CPU to operate asynchronously. Once consumed, the data in a buffer is flushed (deleted) and the memory is freed for reuse.

Functions of Buffering
- Synchronization: buffering improves synchronization between the different connected devices, so the system's performance also improves.
- Smoothing: input and output devices have different operating speeds and data block sizes; buffering hides these differences and ensures smooth operation.
- Efficient usage: buffering reduces system overhead and the inefficient use of system resources.

Types of Buffering

Operating systems use three buffering techniques: single buffering, double buffering, and circular buffering.

Single Buffering

This is the simplest form of buffering. When a process issues an I/O request, the operating system assigns a single buffer in the system portion of main memory to the operation. Input transfers are made into this buffer and moved to user space when needed. The producer (the I/O device) places one block of data in the buffer for the consumer to receive; after each complete transfer of one block, the buffer is filled again.

Double Buffering

Double buffering is an extension of single buffering: a process can transfer data to or from one buffer while the operating system empties or fills the other.
Double buffering therefore uses two system buffers instead of one. The working is similar to single buffering, except that data is first moved into one buffer and then into the second before being retrieved by the consumer: while data is being inserted into one buffer, the data in the other buffer is being processed.

Circular Buffering

When more than two buffers are used, the scheme is called circular buffering. It addresses the shortcomings of double buffering, which can become insufficient when a process performs rapid bursts of I/O. In a circular buffer, each individual buffer acts as one unit in a ring: one buffer receives new data while the next one in the chain is being processed, the pattern continuing around the ring, with the consumer retrieving data from the buffer at the end of the chain. This mechanism is used where faster data transfers are needed and bulkier data is transferred.

Advantages of Buffering
- Buffering significantly reduces the time a process or application waits to access a device.
- It smooths I/O operations between the different devices connected to the system.
- It reduces the number of calls needed to carry out an operation, increasing overall system performance.
- It cuts down on the number of I/O operations needed to access the desired data, and reduces the time processes spend waiting for that data.
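The circular buffering scheme described above can be sketched as a fixed ring of buffer slots shared between a producer and a consumer. This is a minimal illustration, not any particular operating system's implementation; the class and method names are chosen for clarity.

```python
class CircularBuffer:
    """A ring of buffer slots: the producer fills one slot while the
    consumer drains another, wrapping around when the end is reached."""

    def __init__(self, slots):
        self.data = [None] * slots
        self.head = 0   # next slot the consumer reads
        self.tail = 0   # next slot the producer writes
        self.count = 0  # number of filled slots

    def put(self, block):
        if self.count == len(self.data):
            # All slots full: this is the buffer-overflow condition.
            raise BufferError("buffer overflow: all slots full")
        self.data[self.tail] = block
        self.tail = (self.tail + 1) % len(self.data)
        self.count += 1

    def get(self):
        if self.count == 0:
            raise BufferError("buffer underflow: no data available")
        block = self.data[self.head]
        self.head = (self.head + 1) % len(self.data)
        self.count -= 1
        return block


ring = CircularBuffer(3)
for block in ("b0", "b1", "b2"):
    ring.put(block)          # producer fills the ring
print(ring.get())            # consumer drains in FIFO order: b0
```

Blocks come out in the order they went in, and freed slots are immediately reusable by the producer, which is what lets the two sides run at different speeds.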
- Buffering improves the performance of I/O operations by allowing data to be read or written in large blocks instead of one byte or one character at a time.
- It can improve overall system performance by reducing the number of system calls and context switches required for I/O operations.

Disadvantages of Buffering
- The temporary memory used to hold data during transfers can consume a lot of memory over a long-running process.
- The exact size of the data to be transferred is usually unpredictable, so extra memory is assigned for each transfer and the surplus often goes to waste.
- Because of this unpredictability, more data may sometimes arrive than the buffer can hold, leading to buffer overflow and data corruption.
- In some situations, buffers introduce a delay between the reading or writing of data in memory and its processing.
- Large buffers consume a significant amount of memory, which can degrade system performance.
- Buffering may also affect real-time system performance and cause synchronization issues.

Disk Management

Disk management is the process of organizing and maintaining the storage on a computer's hard disk. It involves dividing the hard disk into partitions, formatting those partitions with file systems, and regularly maintaining and optimizing disk performance. The goal of disk management is to provide a convenient, organized storage system for users to store and access their data, and to keep the computer running smoothly and efficiently.

Disk Scheduling

Disk scheduling is done by operating systems to schedule the I/O requests arriving for the disk; it is also known as I/O scheduling.

How Disk Scheduling Algorithms Work
- Seek time: the time it takes for the disk arm to move to the track where the data is located.
- Rotational latency: once the disk arm is on the right track, the time for the disk platter to rotate the desired data under the read/write head.

Importance of Disk Scheduling
- Multiple I/O requests may arrive from different processes, but the disk controller can serve only one request at a time; the other requests must wait in a queue and be scheduled.
- Two or more requests may be far apart on the disk, which results in greater disk arm movement.
- Hard drives are among the slowest parts of a computer system and therefore need to be accessed efficiently.

Key Terms in Disk Scheduling
- Seek time: the time taken to position the disk arm over the track where data is to be read or written. A scheduling algorithm that gives a lower average seek time is better.
- Rotational latency: the time taken for the desired sector to rotate into position under the read/write head. An algorithm that gives lower rotational latency is better.
- Transfer time: the time to transfer the data; it depends on the rotational speed of the disk and the number of bytes to be transferred.
- Disk access time: Disk Access Time = Seek Time + Rotational Latency + Transfer Time. In the examples below, Total Seek Time = Total Head Movement x Seek Time (per track moved).
- Disk response time: the average time a request spends waiting to perform its I/O operation; the average response time is taken over all requests. Variance of response time measures how individual requests are serviced relative to that average, so an algorithm with lower variance of response time is better.

Disk Scheduling Algorithms

FCFS (First Come, First Served)

FCFS is the simplest of all disk scheduling algorithms: requests are addressed in the order they arrive in the disk queue.
FCFS operates on the principle of servicing disk access requests in the order in which they are received: the disk head moves to the first request in the queue and services it, then moves to the next request and services that one, and so on until all requests have been serviced.

Example 1: suppose the requests arrive in the order 20, 150, 90, 70, 30, 60 and the disk head is currently at track 50. The total seek time = (50-20) + (150-20) + (150-90) + (90-70) + (70-30) + (60-30) = 310.

Example 2: suppose the requests arrive in the order 82, 170, 43, 140, 24, 16, 190 and the read/write head is at 50. The total overhead movement (total distance covered by the disk arm) = (82-50) + (170-82) + (170-43) + (140-43) + (140-24) + (24-16) + (190-16) = 642.

Advantages of FCFS
- Every request gets a fair chance.
- No indefinite postponement.

Disadvantages of FCFS
- Does not try to optimize seek time.
- May not provide the best possible service.

SSTF (Shortest Seek Time First)

SSTF aims to minimize the total seek time required to service all disk access requests. The disk head moves to the pending request with the shortest seek time from its current position, services it, and repeats this process until all requests have been serviced. Requests are prioritized by their proximity to the current head position, so the head always moves the shortest possible distance to reach the next request.
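The FCFS totals above can be reproduced with a short sketch; the function names here are illustrative, and the "seek time" computed is simply the total head movement in tracks, as in the worked examples.

```python
def seek_total(head, order):
    """Total head movement when tracks are visited in the given order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


def fcfs(head, requests):
    """FCFS services requests exactly in arrival order."""
    return seek_total(head, requests)


print(fcfs(50, [20, 150, 90, 70, 30, 60]))        # 310
print(fcfs(50, [82, 170, 43, 140, 24, 16, 190]))  # 642
```

Both printed values match the hand-computed totals in the two examples above.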
The seek time of every pending request is calculated in advance, and requests are scheduled according to those calculated seek times; as a result, the request nearest the disk arm is executed first. SSTF is a clear improvement over FCFS, as it decreases the average response time and increases the throughput of the system.

Example 1: for the same requests (20, 150, 90, 70, 30, 60) with the head at 50, the service order is 60, 70, 90, 30, 20, 150, so the total seek time = (60-50) + (70-60) + (90-70) + (90-30) + (30-20) + (150-20) = 240.

Example 2: for requests 82, 170, 43, 140, 24, 16, 190 with the head at 50, the total overhead movement = (50-43) + (43-24) + (24-16) + (82-16) + (140-82) + (170-140) + (190-170) = 208.

Advantages of SSTF
- The average response time decreases.
- Throughput increases.

Disadvantages of SSTF
- Overhead of calculating seek times in advance.
- Can cause starvation of a request whose seek time stays higher than those of incoming requests.
- High variance of response time, since SSTF favors some requests over others.

SCAN

The SCAN algorithm moves the disk head in a single direction, servicing every request in its path until it reaches the end of the disk; it then reverses direction and services the remaining requests on the way back. Because it works like an elevator, it is also known as the elevator algorithm. As a result, requests in the mid-range of the disk are serviced more frequently, while requests arriving just behind the disk arm have to wait.
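The SSTF totals above can also be checked with a short sketch. One detail worth noting: when two pending requests are equidistant from the head (as 30 and 150 both are from track 90 in Example 1), this sketch breaks the tie toward the lower track, which matches the slide's worked order; the function names are illustrative.

```python
def seek_total(head, order):
    """Total head movement when tracks are visited in the given order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


def sstf(head, requests):
    """Repeatedly service the pending request closest to the head.
    Ties are broken toward the lower-numbered track."""
    pending, order, pos = sorted(requests), [], head
    while pending:
        nearest = min(pending, key=lambda t: abs(t - pos))
        pending.remove(nearest)
        order.append(nearest)
        pos = nearest
    return seek_total(head, order)


print(sstf(50, [20, 150, 90, 70, 30, 60]))        # 240
print(sstf(50, [82, 170, 43, 140, 24, 16, 190]))  # 208
```

Both printed values match the hand-computed totals in the two examples above.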
Example 1: for requests 20, 150, 90, 70, 30, 60 with the head at 50 moving left, the total seek time = (50-30) + (30-20) + (20-0) + (60-0) + (70-60) + (90-70) + (150-90) = 200.

Example 2: for requests 82, 170, 43, 140, 24, 16, 190 with the read/write arm at 50 and the disk arm moving toward the larger values (tracks 0-199), the total overhead movement = (199-50) + (199-16) = 332.

Advantages of SCAN
- High throughput.
- Low variance of response time.
- Acceptable average response time.

Disadvantages of SCAN
- Long waiting time for requests to locations the disk arm has just visited.

C-SCAN

The C-SCAN (Circular SCAN) algorithm operates like SCAN, but it does not service requests on the return journey: when the disk head reaches the end of the disk, it wraps around to the other end and continues servicing requests from there. In SCAN, the arm rescans the path it has just covered after reversing direction, even though few or no requests may be pending in the scanned area while many may be waiting at the other end; C-SCAN avoids these situations by sending the arm directly to the other end of the disk instead of reversing. The disk arm thus moves in a circular fashion, which gives the algorithm its name. C-SCAN can reduce the total distance the head travels, though requests made just behind the head may wait a long time while it wraps around; it is often used in modern operating systems for its ability to reduce disk access time and provide uniform service.
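The SCAN arithmetic in Example 2 above can be checked with a sketch. It assumes a disk with tracks 0-199 and a sweep toward the larger values, as the example states; after reaching the disk edge, any lower-numbered requests are serviced on the way back down. Function names are illustrative.

```python
def seek_total(head, order):
    """Total head movement when tracks are visited in the given order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


def scan_towards_larger(head, requests, disk_end=199):
    """SCAN: sweep up to the disk edge, then service the rest sweeping down."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    # The [disk_end] stop models SCAN touching the physical edge of the disk.
    return seek_total(head, up + [disk_end] + down)


print(scan_towards_larger(50, [82, 170, 43, 140, 24, 16, 190]))  # 332
```

The printed value matches the (199-50) + (199-16) = 332 total computed in the example.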
Example 1: for requests 20, 150, 90, 70, 30, 60 with the head at 50, the C-SCAN total seek time = (60-50) + (70-60) + (90-70) + (150-90) + (199-150) + (199-0) + (20-0) + (30-20) = 378.

Example 2: for requests 82, 170, 43, 140, 24, 16, 190 with the read/write arm at 50 moving toward the larger values, the total overhead movement = (199-50) + (199-0) + (43-0) = 391.

Advantages of C-SCAN
- Provides more uniform wait times than SCAN.

LOOK

The LOOK algorithm is similar to SCAN, except that the disk arm does not travel all the way to the end of the disk: it goes only as far as the last request to be serviced in the current direction and then reverses from there. This prevents the extra delay caused by unnecessary traversal to the end of the disk, reducing the total distance the head must travel and improving disk access time; for this reason it is often used in modern operating systems.

Example 1: for requests 20, 150, 90, 70, 30, 60 with the head at 50 moving right, the total seek time = (60-50) + (70-60) + (90-70) + (150-90) + (150-30) + (30-20) = 230.

Example 2: for requests 82, 170, 43, 140, 24, 16, 190 with the read/write arm at 50 moving toward the larger values, the total overhead movement = (190-50) + (190-16) = 314.

C-LOOK

C-LOOK is similar to the C-SCAN disk scheduling algorithm.
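The C-SCAN and LOOK totals above, and the C-LOOK figure worked out next, can all be checked with variants of the same sketch. Note one convention assumed here to match the slide arithmetic: C-SCAN counts the full wrap from track 199 back to track 0 as head movement (some texts ignore the wrap cost). Function names are illustrative.

```python
def seek_total(head, order):
    """Total head movement when tracks are visited in the given order."""
    total = 0
    for track in order:
        total += abs(track - head)
        head = track
    return total


def c_scan(head, requests, disk_end=199):
    """Sweep up to the disk edge, wrap to track 0, then continue upward.
    The wrap (disk_end -> 0) is counted as movement, as in the examples."""
    up = sorted(t for t in requests if t >= head)
    low = sorted(t for t in requests if t < head)
    return seek_total(head, up + [disk_end, 0] + low)


def look(head, requests):
    """Like SCAN, but reverse at the last request instead of the disk edge."""
    up = sorted(t for t in requests if t >= head)
    down = sorted((t for t in requests if t < head), reverse=True)
    return seek_total(head, up + down)


def c_look(head, requests):
    """Like C-SCAN, but jump from the last request to the lowest pending one."""
    up = sorted(t for t in requests if t >= head)
    low = sorted(t for t in requests if t < head)
    return seek_total(head, up + low)


reqs = [82, 170, 43, 140, 24, 16, 190]
print(c_scan(50, reqs))  # 391
print(look(50, reqs))    # 314
print(c_look(50, reqs))  # 341
```

The three printed values match the hand-computed totals for C-SCAN, LOOK, and C-LOOK on this request sequence.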
In C-LOOK, instead of the disk arm going to the end of the disk, it goes only as far as the last request to be serviced in the current direction, and from there it jumps to the last request at the other end and continues. Like LOOK, this prevents the extra delay that would otherwise occur due to unnecessary traversal to the ends of the disk.

Example 1: for requests 20, 150, 90, 70, 30, 60 with the head at 50, the C-LOOK total seek time = (60-50) + (70-60) + (90-70) + (150-90) + (150-20) + (30-20) = 240.

Example 2: for requests 82, 170, 43, 140, 24, 16, 190 with the read/write arm at 50 moving toward the larger values, the total overhead movement = (190-50) + (190-16) + (43-16) = 341.

RAID (Redundant Arrays of Independent Disks)

RAID is a technique that uses a combination of multiple disks, rather than a single disk, for increased performance, data redundancy, or both. The term was coined by David Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987. RAID is transparent to the underlying system: to the host, the array appears as a single big disk presenting itself as a linear array of blocks. This allows older storage technologies to be replaced by RAID without making too many changes to existing code.

Why Data Redundancy?

Data redundancy, although it takes up extra space, adds to disk reliability: if a disk fails and the same data is also backed up onto another disk, the data can be retrieved and operation can continue.
By contrast, if the data is spread across multiple disks without the RAID technique, the loss of a single disk can affect all of the data.

Key Evaluation Points for a RAID System
- Reliability: how many disk faults can the system tolerate?
- Availability: what fraction of the total session time is the system in uptime mode, i.e., how available is it for actual use?
- Performance: how good is the response time, and how high is the throughput (rate of processing work)? Note that performance involves many parameters, not just these two.
- Capacity: given a set of N disks each with B blocks, how much useful capacity is available to the user?

RAID Levels
- RAID-0 (Striping)
- RAID-1 (Mirroring)
- RAID-2 (Bit-Level Striping with Dedicated Parity)
- RAID-3 (Byte-Level Striping with Dedicated Parity)
- RAID-4 (Block-Level Striping with Dedicated Parity)
- RAID-5 (Block-Level Striping with Distributed Parity)
- RAID-6 (Block-Level Striping with Two Parity Blocks)

1. RAID-0 (Striping)

Blocks are "striped" across the disks; for example, blocks 0, 1, 2, 3 form one stripe. Instead of placing just one block on a disk at a time, two (or more) blocks can be placed on a disk before moving on to the next one.

Evaluation
- Reliability: 0. There is no duplication of data, so a block once lost cannot be recovered.
- Capacity: N*B. The entire space is used to store data; since there is no duplication, N disks of B blocks each are fully utilized.

Advantages
- Easy to implement.
- Utilizes the storage capacity fully.

Disadvantages
- A single drive loss can result in complete data loss.
- Not a good choice for a critical system.

RAID-1 (Mirroring)

More than one copy of each block is stored, each on a separate disk, so every block has two (or more) copies lying on different disks; a RAID-1 system with mirroring level 2 keeps exactly two copies of every block. RAID-0 was unable to tolerate any disk failure.
RAID-1, by contrast, provides reliability.

Evaluation (assume a RAID-1 system with mirroring level 2)
- Reliability: 1 to N/2. One disk failure can be handled for certain, because the blocks of that disk have duplicates on some other disk. With luck, more can be tolerated: if disks 0 and 2 fail, the blocks of those disks still have duplicates on disks 1 and 3. So, in the best case, N/2 disk failures can be handled.
- Capacity: N*B/2. Only half the space is used to store data; the other half is a mirror of the data already stored.

Advantages
- Provides complete redundancy.
- Can increase data security and read speed.

Disadvantages
- Highly expensive.
- Usable storage capacity is halved.

RAID-2 (Bit-Level Striping with Dedicated Parity)

In RAID-2, errors in the data are checked at the bit level, using a Hamming-code parity method with designated drives to store the parity. Its structure is complex: for each word, some drives store the data bits while additional drives store the error-correcting code. RAID-2 is not commonly used.

Advantages
- Uses Hamming codes for error correction.
- Uses designated drives to store parity.

Disadvantages
- Complex structure and high cost due to the extra drives.
- Requires extra drives for error detection.

RAID-3 (Byte-Level Striping with Dedicated Parity)

RAID-3 consists of byte-level striping with dedicated parity: parity information for each stripe is written to a dedicated parity drive. Whenever a drive fails, the parity drive is accessed and the data is reconstructed from it. For example, with four disks, Disk 3 holds the parity for Disks 0, 1, and 2; if data loss occurs, the data can be reconstructed using Disk 3.

Advantages
- Data can be transferred in bulk.
- Data can be accessed in parallel.

Disadvantages
- Requires an additional drive for parity.
It performs slowly for small files.

RAID-4 (Block-Level Striping with Dedicated Parity)
Instead of duplicating data, this level adopts a parity-based approach. In the figure, we can observe one column (disk) dedicated to parity.
Parity is calculated using a simple XOR function. If the data bits are 0, 0, 0, 1, the parity bit is XOR(0, 0, 0, 1) = 1. If the data bits are 0, 1, 1, 0, the parity bit is XOR(0, 1, 1, 0) = 0.
Put simply, an even number of ones results in parity 0, and an odd number of ones results in parity 1.

Contd…
Assume that in the above figure, C3 is lost due to a disk failure. We can then recompute each data bit stored in C3 by XOR-ing the values of all the other columns with the parity bit. This is how lost data is recovered.

Contd…
Evaluation
Reliability: 1
RAID-4 allows recovery from at most one disk failure (because of the way parity works). If more than one disk fails, there is no way to recover the data.
Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-1) disks are available for data storage, each disk having B blocks.

Contd…
Advantages
It can reconstruct the data if at most one disk is lost.
Disadvantages
It cannot reconstruct the data when more than one disk is lost.

RAID-5 (Block-Level Striping with Distributed Parity)
This is a slight modification of the RAID-4 system: the only difference is that the parity rotates among the drives. In the figure, we can see how the parity block "rotates".
This was introduced to improve random write performance, since writes are no longer bottlenecked by a single dedicated parity disk.

Contd…
Evaluation
Reliability: 1
RAID-5 allows recovery from at most one disk failure (because of the way parity works). If more than one disk fails, there is no way to recover the data. This is identical to RAID-4.
Capacity: (N-1)*B
Overall, space equivalent to one disk is used to store the parity. Hence, (N-1) disks are available for data storage, each disk having B blocks.
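The XOR parity scheme underlying RAID-4 and RAID-5 can be sketched as follows (a simplified bit-level model; real arrays XOR whole blocks, not single bits):

```python
from functools import reduce
from operator import xor

def parity(bits):
    """XOR parity: 1 if an odd number of ones, 0 if even."""
    return reduce(xor, bits, 0)

def recover(surviving_bits, parity_bit):
    """Recover the single lost bit: XOR of all survivors and the parity bit."""
    return reduce(xor, surviving_bits, parity_bit)

stripe = [0, 1, 1, 0]            # data bits across four data disks
p = parity(stripe)               # parity(0,1,1,0) = 0
lost = stripe[2]                 # suppose the disk holding C3's bit fails
survivors = stripe[:2] + stripe[3:]
assert recover(survivors, p) == lost   # the lost bit is reconstructed
```

The same identity explains the reliability limit: with two bits missing, the single XOR equation has two unknowns and cannot be solved, which is why RAID-4/5 tolerate only one disk failure.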
Contd…
Advantages
Data can be reconstructed using the parity bits.
It improves performance over RAID-4.
Disadvantages
Its technology is complex and extra space is required for parity.
If two disks fail, data is lost permanently.

RAID-6 (Block-Level Striping with Two Parity Blocks)
RAID-6 helps when more than one disk fails. At this level, two independent parities are generated and stored on multiple disks. Ideally, at least four disk drives are needed for this level.
There are also hybrid RAIDs, which nest more than one RAID level one after the other to fulfill specific requirements.

Contd…
Advantages
Very high data availability.
Fast read transactions.
Disadvantages
Due to double parity, write transactions are slow.
Extra space is required.

Advantages of RAID
Data redundancy: By keeping multiple copies of the data on several disks, RAID can protect data from disk failures.
Performance enhancement: RAID can enhance performance by distributing data over several drives, enabling several read/write operations to execute simultaneously.
Scalability: RAID is scalable, so the storage capacity may be expanded by adding more disks to the array.
Versatility: RAID is applicable to a wide range of systems, such as workstations, servers, and personal PCs.

Disadvantages of RAID
Cost: RAID implementation can be costly, particularly for arrays with large capacities.
Complexity: The setup and management of RAID can be challenging.
Decreased performance: The parity calculations required by some RAID configurations, including RAID-5 and RAID-6, may reduce write speed.
Single point of failure: While offering data redundancy, RAID is not a comprehensive backup solution. The array's whole contents could be lost if the RAID controller malfunctions.

File
A file is a collection of related information that is recorded on secondary storage; in other words, a file is a collection of logically related entities.
From the user's perspective, a file is the smallest allotment of logical secondary storage.
A file is a named collection of related information that is recorded on secondary storage such as magnetic disks, magnetic tapes, and optical disks.
In general, a file is a sequence of bits, bytes, lines, or records whose meaning is defined by the file's creator and user.
The name of the file is divided into two parts, separated by a period:
 Name
 Extension

What is a File System?
A file system is the method an operating system uses to store, organize, and manage files and directories on a storage device. Some common types of file systems include:
FAT (File Allocation Table): An older file system used by older versions of Windows and other operating systems.
NTFS (New Technology File System): A modern file system used by Windows. It supports features such as file and folder permissions, compression, and encryption.
EXT (Extended File System): A file system commonly used on Linux and Unix-based operating systems.
HFS (Hierarchical File System): A file system used by macOS.
APFS (Apple File System): A newer file system introduced by Apple for its Macs and iOS devices.

File Systems in Operating System
A computer file is a medium used for saving and managing data in the computer system. The data stored in the computer system is entirely in digital format, and various types of files help us store it.

Advantages of File System
Organization: A file system allows files to be organized into directories and subdirectories, making it easier to manage and locate files.
Data protection: File systems often include features such as file and folder permissions, backup and restore, and error detection and correction, to protect data from loss or corruption.
Improved performance: A well-designed file system can improve the performance of reading and writing data by organizing it efficiently on disk.
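The name/extension convention mentioned above can be inspected with Python's standard library (a minimal illustration; the filenames are made up):

```python
import os.path

# splitext splits on the final period; the extension keeps the period.
name, ext = os.path.splitext("report.txt")
print(name, ext)  # report .txt

# A file with no period has an empty extension.
print(os.path.splitext("README"))
```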
Disadvantages of File System
Compatibility issues: Different file systems may not be compatible with each other, making it difficult to transfer data between different operating systems.
Disk space overhead: File systems use some disk space to store metadata and other overhead information, reducing the space available for user data.
Vulnerability: File systems can be vulnerable to data corruption, malware, and other security threats, which can compromise the stability and security of the system.

File Structure
A file structure should follow a required format that the operating system can understand.
A file has a certain defined structure according to its type.
A text file is a sequence of characters organized into lines.
A source file is a sequence of procedures and functions.
An object file is a sequence of bytes organized into blocks that are understandable by the machine.
When an operating system defines different file structures, it also contains the code to support those file structures. Unix and MS-DOS support a minimal number of file structures.

File Type
File type refers to the ability of the operating system to distinguish different types of files, such as text files, source files, and binary files. Many operating systems support many types of files. Operating systems like MS-DOS and UNIX have the following types of files:
Ordinary files
Directory files
Special files

Contd…
Ordinary files
These are the files that contain user information. They may contain text, databases, or executable programs.
The user can apply various operations to such files, such as adding, modifying, or deleting content, or removing the entire file.
Directory files
These files contain lists of file names and other information related to those files.

Contd…
Special files
These files are also known as device files. They represent physical devices like disks, terminals, printers, networks, tape drives, etc.
These files are of two types:
Character special files − data is handled character by character, as with terminals or printers.
Block special files − data is handled in blocks, as with disks and tapes.

File Access Mechanisms
A file access mechanism refers to the manner in which the records of a file may be accessed. There are several ways to access files:
Sequential access
Direct/Random access
Indexed sequential access

Sequential access
Sequential access is access in which the records are processed in order, one record after the other. This access method is the most primitive one.
Example: Compilers usually access files in this fashion.

Direct/Random access
Random access file organization provides direct access to the records. Each record has its own address in the file, with the help of which it can be directly accessed for reading or writing.
The records need not be in any sequence within the file, and they need not be in adjacent locations on the storage medium.

Indexed sequential access
This mechanism is built on top of sequential access. An index is created for each file, containing pointers to various blocks.
The index is searched sequentially, and its pointer is used to access the file directly.

Space Allocation
Files are allocated disk space by the operating system. Operating systems deploy the following three main ways to allocate disk space to files:
Contiguous Allocation
Linked Allocation
Indexed Allocation

Contiguous Allocation
Each file occupies a contiguous address space on disk.
Assigned disk addresses are in linear order.
Easy to implement.
External fragmentation is a major issue with this allocation technique.

Linked Allocation
Each file carries a list of links to disk blocks.
The directory contains a link/pointer to the first block of the file.
No external fragmentation.
Effective for sequential-access files.
Inefficient for direct-access files.
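The linked allocation scheme described above can be simulated with a FAT-style next-block table: the directory stores only a file's first block, and each table entry points to the next block of that file (with a sentinel marking end-of-file). This is an illustrative sketch, not any real file system's on-disk layout.

```python
END = -1  # sentinel marking the last block of a file

def read_chain(fat, first_block):
    """Follow the linked-allocation chain starting at first_block."""
    blocks, b = [], first_block
    while b != END:
        blocks.append(b)
        b = fat[b]          # each entry names the next block of the file
    return blocks

# Next-block table for an 8-block disk: one file occupies blocks 2 -> 5 -> 7;
# the remaining blocks are free (chains of length zero, shown as END here).
fat = {0: END, 1: END, 2: 5, 3: END, 4: END, 5: 7, 6: END, 7: END}
directory = {"a.txt": 2}    # the directory holds only the first block
print(read_chain(fat, directory["a.txt"]))  # [2, 5, 7]
```

Note how the sketch also exposes the stated drawback: reaching block k of the file requires walking k table entries, which is why linked allocation is inefficient for direct access.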
Indexed Allocation
Provides a solution to the problems of contiguous and linked allocation.
An index block is created, holding all the pointers for a file.
Each file has its own index block, which stores the addresses of the disk blocks occupied by the file.
The directory contains the addresses of the index blocks of files.

File Sharing in OS
File sharing in an operating system (OS) denotes how information and files are shared between different users, computers, or devices on a network. Files are units of data stored on a computer in the form of documents, images, videos, or other types of information.

Primary Terminology Related to File Sharing
Folder/Directory: Essentially a container for files on a computer. A folder can contain files and even other folders, maintaining a hierarchical structure for organizing data.
Networking: Connecting computers or devices so that resources can be shared. Networks can be local (LAN) or global (Internet).
IP Address: A numerical identifier given to every device connected to the network.
Protocol: A set of rules that governs the communication between devices on a network. In the context of file sharing, protocols define how files are transferred between computers.
File Transfer Protocol (FTP): A standard network protocol used to transfer files between a client and a server on a computer network.

Various Ways to Achieve File Sharing
1. Server Message Block (SMB)
SMB is a network-based file sharing protocol mainly used in Windows operating systems. It allows a computer to share files and printers on a network. SMB is now the standard method for seamless file transfer and printer sharing between Windows-based computers.
Example: Imagine a company where employees have to share files on a particular project. Here SMB (or its dialect CIFS) is employed to share files among the Windows-based systems.
Users can access shared folders on a server and create, modify, and delete files.

2. Network File System (NFS)
NFS is a distributed file sharing protocol mainly used in Linux/Unix-based operating systems. It allows a computer to share files over a network as if they were stored locally. It provides an efficient way to transfer files between servers and clients.
Example: Many programmers, universities, and research institutions use Unix/Linux-based operating systems. An institute can host shared datasets on a central server using NFS; researchers and students can then access these shared directories and collaborate on them.

Contd…
3. File Transfer Protocol (FTP)
FTP is the most common standard protocol for transferring files between a client and a server on a computer network. FTP supports both uploading and downloading of files: we can download, upload, and transfer files from Computer A to Computer B over the internet or between computer systems.
Example: Suppose a developer makes changes to a website. Using FTP, the developer connects to the server, uploads the new website content, and updates the existing files there.

4. Cloud-Based File Sharing
This involves the popular approach of using online services like Google Drive, Dropbox, OneDrive, etc. Users can store files on these cloud services and share them with others, providing access to many users. It supports real-time collaboration on shared files and version control.
Example: Several students working on a project can use Google Drive to store and share their files. They can access the files from any computer or mobile device, make changes in real time, and track the changes.

What is File Management in OS?
File management in an operating system refers to the set of processes and techniques involved in creating, organizing, accessing, manipulating, and controlling files stored on storage devices such as hard drives, solid-state drives, or network storage. It encompasses a range of tasks and functionalities that ensure efficient handling of files, including their creation, deletion, naming, classification, and protection. File management serves as the intermediary layer between applications and the underlying storage hardware, providing a logical and organized structure for storing and retrieving data. It involves managing file metadata, which includes attributes such as file name, file size, creation date, access permissions, ownership and file type. Types of File Management in Operating System There are several types of file management in operating systems, including: Sequential File Management: In a sequential file management system, files are stored on storage devices in a sequential manner. Each file occupies a contiguous block of storage space, and accessing data within the file requires reading from the beginning until the desired location is reached. This type of file management is simple but can be inefficient for random access operations. Contd… Direct File Management: Direct file management, also known as random access file management, enables direct access to any part of a file without the need to traverse the entire file sequentially. It utilizes a file allocation table (FAT) or a similar data structure to keep track of file locations. This approach allows for faster and more efficient file access, particularly for larger files. Contd… Indexed File Management: Indexed file management utilizes an index structure to improve file access efficiency. In this system, an index file is created alongside the main data file, containing pointers to various locations within the file. 
These pointers allow for quick navigation and direct access to specific data within the file. Contd… File Allocation Table (FAT) File System: The FAT file system is commonly used in various operating systems, including older versions of Windows. It employs a table, known as the file allocation table, to track the allocation status of each cluster on a storage device. The FAT file system supports sequential and random access to files and provides a simple and reliable file management structure. Contd… New Technology File System (NTFS): NTFS is a more advanced file system commonly used in modern Windows operating systems. It offers features such as enhanced security, file compression, file encryption, and support for larger file sizes and volumes. NTFS utilizes a complex structure to manage files efficiently and provides advanced file management capabilities. Contd… Distributed File Systems: Distributed file systems allow files to be stored across multiple networked devices or servers, providing transparent access to files from different locations. Examples include the Network File System (NFS) and the Server Message Block (SMB) protocol used in network file sharing. These are some of the commonly used types of file management systems in operating systems, each with its own strengths and characteristics, catering to different requirements and use cases. File system reliability File system reliability refers to the ability of a file system to preserve the integrity and availability of data, even in the presence of errors or failures. To achieve this level of reliability, several mechanisms can be used, such as checksums that compute a value based on the data of a file or a block and compare it with a stored value to detect any corruption or tampering. Additionally, journaling can log changes made to a file or a block before applying them, allowing the file system to recover from an incomplete or interrupted operation. 
Replication is also an option, as it copies the data of a file or a block to another location, providing redundancy and fault tolerance in case of a disk failure or network outage. Lastly, backup creates a copy of the entire file system or a subset of it, allowing the file system to restore the data from a previous state in case of disaster.
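The checksum mechanism described above can be illustrated with a CRC-32 from Python's standard library: a value computed over a block is stored alongside it, and a mismatch on re-read signals corruption. (A sketch only; real file systems typically use stronger checksums such as CRC-32C or cryptographic hashes.)

```python
import zlib

def store(block: bytes):
    """Return a block together with its checksum, as a file system would store them."""
    return block, zlib.crc32(block)

def verify(block: bytes, stored_crc: int) -> bool:
    """Detect corruption by recomputing the checksum and comparing."""
    return zlib.crc32(block) == stored_crc

block, crc = store(b"important data")
assert verify(block, crc)              # an intact block passes verification
corrupted = b"imp0rtant data"          # one flipped byte simulates corruption
assert not verify(corrupted, crc)      # the mismatch reveals the corruption
```

A checksum like this detects corruption but cannot repair it; that is where the replication and backup mechanisms above come in.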
