Operating System Past Paper PDF 2019

Summary

This document is a collection of past operating systems exam papers from 2019 to 2023. It contains questions on the components of an operating system, concurrency, scheduling algorithms, memory management, the Java Virtual Machine, and distributed systems. Model solutions are provided for most questions.

Full Transcript


2019 1) An Operating System is built out of many components and has to provide concurrency to enable multiple applications to work at once. a) Provide a diagram which outlines the main components of an Operating System. Discuss each component in detail. (10 marks) Solution: 1. Process Management: Purpose: Manages process life cycles. Key Functions: Resource allocation, synchronization, deadlock management. 2. File Management: Purpose: Manages file operations. Key Functions: Organizes directories, handles permissions, maintains data integrity. 3. System Calls: Purpose: Interface for process-OS interactions. Key Functions: Process creation, file management, hardware communication. 4. Signals: Purpose: Handles asynchronous events. Key Functions: Notifies processes of system events like hardware failures. 5. Secondary Storage Management: Purpose: Manages external and internal storage devices. Key Functions: Disk scheduling, file system setup. 6. Main Memory Management: Purpose: Manages RAM. Key Functions: Allocates memory, handles paging and swapping. These components work together to ensure the OS functions efficiently and securely. b) One of the most important aspects of an operating system is how it deals with concurrency. In this context: i. from your answer to (a), discuss which component(s) are relevant to concurrency. (6 marks) OS Components and Concurrency: 1. Process Management: Relevance: Central to handling concurrency; manages, schedules, and terminates processes and threads. The scheduler allocates CPU time, facilitating multitasking. 2. Main Memory Management: Relevance: Allocates separate memory to processes, ensuring isolation and preventing interference, crucial for managing multiple concurrent processes. 3. System Calls: Relevance: Interface for processes to request kernel services like thread management and synchronization, essential for concurrency. 4. Signals: Relevance: Manages asynchronous events common in concurrent environments, allowing processes to respond promptly to events. ii. describe two scheduling strategies and discuss how they work and how they differ from each other. (14 marks) Two commonly used scheduling strategies in operating systems are Round Robin and Priority Scheduling. Both approaches have their own mechanisms and policies for deciding the order in which processes are allocated CPU time: 1. Round Robin Scheduling: How It Works: Assigns a fixed time slice to each process in a circular queue. Processes use their quantum and then move to the queue's end. Characteristics: Promotes fairness and responsiveness, but may have high context- switching overhead. 2. Priority Scheduling: How It Works: Processes are assigned priorities. Higher priority processes are executed first, with the scheduler selecting the highest priority process from the queue. Characteristics: Prioritizes urgent tasks, flexible, but may cause starvation of lower- priority tasks. iii. choose one of the scheduling strategies you have discussed in (ii) and discuss how the process manager would work to enable concurrency within an Operating System using this method. Include diagram(s) and example(s) to help you in your discussion. (12 marks) How Round Robin Enables Concurrency: Round Robin scheduling enables concurrency by allowing the operating system to manage multiple processes, giving them each a slice of CPU time in a cyclic manner. Process Manager Role: 1. Time Slice Allocation: The process manager assigns a fixed time slice (quantum) to every process in the ready queue. 
This time is typically in the range of 10-100 milliseconds. 2. Queue Management: Processes are managed in a circular queue. When a process's quantum expires, the process manager moves it to the end of the queue and the next process in line receives the CPU. 3. Context Switching: When switching from one process to another, the process manager saves the state of the current process (context saving) and loads the state of the next process (context loading). This context includes program counter, registers, and other process-specific data. Concurrency Example: Suppose three processes, P1, P2, and P3, are ready to run. Each process needs different amounts of CPU time, say P1 needs 20 ms, P2 needs 30 ms, and P3 needs 10 ms, and the quantum is set to 10 ms. In the first round, P1 runs for 10 ms, then it is moved to the end of the queue. P2 runs next for 10 ms, followed by P3 for 10 ms. In the second round, P1 runs for another 10 ms (total 20 ms) and completes. P2 runs again for 10 ms (total 20 ms), followed by P3 for 10 ms (total 20 ms, completes). P2 requires another 10 ms to complete, which it gets in the third round. This cyclic allocation allows all processes to advance their execution in a fair manner, ensuring that all get CPU time without long waits. iv. discuss what mechanisms are available to protect a shared resource between multiple processes. Include a diagram and an example to help you in your discussion. (8 marks) To ensure that this access does not lead to conflicts or data corruption, the operating system must employ specific mechanisms to manage and protect these resources. Below are some of the primary mechanisms used: 1. Mutexes (Mutual Exclusion Objects) Description: Consider two processes, A and B, both need to write data to the same log file. Process A acquires the mutex lock before starting the write operation, ensuring that Process B can only start writing to the log file after Process A has released the mutex. 2. Semaphores Description: If a database connection pool allows up to three connections at a time, a semaphore initialized to three can be used. Each process decrements the semaphore when taking a connection and increments it when releasing the connection. 3. Read/Write Locks Description: In a client-server application where a data structure is read frequently but updated infrequently, read/write locks ensure that multiple clients can read the data concurrently, but updates are safely executed without concurrent access. 2) You are a developer who is writing software for a large high-performance cluster. You are required to use Java to develop your system. a) Discuss the purpose of the JVM. Provide diagrams and examples to support your answer. (12 marks) Key Purposes of the JVM 1. Platform Independence: Description: JVM acts as an abstraction layer between the Java program and the underlying hardware and operating system. Java programs are compiled into platform-independent bytecode, which the JVM interprets and executes on the host machine. Example: A Java application developed on a Windows machine can be run on Linux or macOS without any modifications, provided that a JVM is installed. 2. Performance Optimization: Description: Modern JVMs incorporate advanced performance features like Just-In- Time (JIT) compilation, which compiles bytecode into native machine code at runtime. This improves performance by optimizing code execution based on the runtime context. 
Example: HotSpot, a widely used JVM, uses JIT compilation to speed up frequently executed sections of code, making Java applications faster over time. 3. Memory Management: Description: The JVM manages memory through an area called the heap for dynamic memory allocation to Java objects. It also handles garbage collection, automatically freeing memory by removing objects that are no longer in use. Example: In a Java application, developers don't need to manually allocate and free memory, reducing the risk of memory leaks and other related errors. 4. Security: Description: The JVM provides a secure execution environment by sandboxing Java applications. It enforces strict runtime constraints through its class loader and bytecode verifier, which ensure that code loaded over the network is not malicious and adheres to Java's safety standards. Example: Applets running in a web browser are confined by the JVM's security manager, preventing them from accessing local files and system resources. 5. Exception Handling: Description: The JVM helps manage runtime errors in Java applications by providing a robust exception handling mechanism. This allows developers to handle errors gracefully without crashing the system. Example: If a Java program tries to access a null object, the JVM throws a NullPointerException, which can be caught and handled by the program. 6. Class Loader: Purpose: Loads class files from the file system, network, or other sources into the JVM. It reads the.class files (bytecode) and loads them into the runtime data area. Subcomponents: Bootstrap loader, Extension loader, Application loader. 7. Runtime Data Areas: Purpose: Provides the memory necessary for storing data and objects used by the JVM during execution. Areas: 1. Method Area: Stores class structure (like the constant pool, method data, and field data). 2. Heap: All the objects and their associated instance variables and arrays are allocated from here. 3. Java Stacks: Each thread has a private JVM stack, created at the same time as the thread. A stack stores frames. 4. PC Registers: Each thread has a PC (Program Counter) register which stores the address of the JVM instruction currently being executed. 5. Native Method Stacks: It stores the state of native calls (calls to non-Java code). 8. Execution Engine: Purpose: Executes the bytecode loaded into the memory by the class loader. Components: 1. Interpreter: Reads bytecode stream then executes the instructions step-by- step. 2. Just-In-Time (JIT) Compiler: Improves the efficiency of Java programs by compiling bytecode into native machine code at runtime. 3. Garbage Collector: Reclaims memory used by objects that are no longer in use by the program. 9. Java Native Interface (JNI): Purpose: Interfaces Java code running in the JVM with libraries and applications that are written in other languages like C or C++. Functionality: Allows execution of methods written in non-Java languages which can be included in Java code. 10. Native Method Libraries: Purpose: Provide an interface to native system calls and libraries through JNI. Usage: Include platform-specific features and enhancements that cannot be accessed from Java directly. b) Discuss the main components you would use when designing a load-balancer to distribute work units to worker nodes. Provide diagrams and examples to support your answer. 
(12 marks) When designing a load balancer for distributing work units across multiple worker nodes in a high- performance cluster, several key components are crucial for its efficient and effective operation. Here’s a breakdown of these main components, their functionalities, and an example scenario: 1. Dispatcher Functionality: The dispatcher is the front-facing component of the load balancer that receives incoming requests or tasks. It is responsible for initially processing requests and passing them on to the appropriate internal processes. Example: In a web application environment, the dispatcher would handle incoming HTTP requests and decide which worker node should handle the request based on the current load and the scheduling algorithm. 2. Scheduling Algorithm Functionality: This component determines how tasks are distributed among the available worker nodes. Several strategies can be employed, such as Round Robin, Least Connections, or Hashing. Round Robin distributes tasks sequentially among the worker nodes. Least Connections chooses the worker node with the fewest active connections. Hashing (typically consistent hashing) distributes requests based on a hash key derived from the request (e.g., user ID or session ID) to maintain session affinity. Example: For a real-time data processing system, a hashing-based approach might be used to ensure that all data from a specific source is processed by the same node to maintain order and context. 3. Health Check Manager Functionality: It monitors the health of the worker nodes to ensure they are operational and can handle requests. If a node fails, the health check manager can trigger a reassignment of tasks and possibly initiate a failover procedure. Example: Periodic pings or heartbeat messages are sent to each worker node. If a node does not respond within a predefined timeout, it is considered down, and its tasks are redistributed among the remaining nodes. 4. Load Analyzer Functionality: This component continuously analyzes the workload and performance of each worker node. It provides data that can help the scheduling algorithm make more informed decisions. Example: The load analyzer tracks metrics such as CPU usage, memory usage, response time, and network latency. This data helps adjust the load distribution to optimize system performance. 5. Session State Manager Functionality: In scenarios where session persistence is important, this component ensures that a user's session data is consistently handled by the same node, or can be quickly accessed by any node that takes over the session. Example: In an e-commerce site, ensuring that a user’s shopping cart persists throughout their session, even if requests are handled by different nodes. c) When communicating with each node, which of the following approaches is best for transmitting information: (1) an event-based system; (2) UDP; (3) broadcasting. Outline the advantages and disadvantages for each when discussing your answer and state which method you would use for implementing your answer to (b). (16 marks) When designing a load balancer and deciding on the communication method to use for transmitting information between the load balancer and worker nodes, it's crucial to consider the specific requirements of the system regarding reliability, speed, and complexity. Below, I discuss the three mentioned communication approaches event-based systems, UDP, and broadcasting highlighting their advantages and disadvantages to determine the best fit for the load balancer architecture. 1. 
Event-Based System Advantages: Scalability: Efficient in handling a large number of connections with minimal resource usage as it reacts only when events occur. Responsiveness: Highly responsive to state changes, making it suitable for real-time applications. Decoupling: Promotes decoupling of components, which can enhance modularity and ease of maintenance. Disadvantages: Complexity: Can become complex to manage and debug because of its asynchronous nature. Overhead: Might involve more overhead to set up and manage the event handling and callback mechanisms. 2. UDP (User Datagram Protocol) Advantages: Speed: Offers fast data transmission by eliminating the overhead of connection setup and maintaining connection state as seen in TCP. Simplicity: Simpler to implement for scenarios where reliability can be compromised for speed. Disadvantages: Reliability: Does not guarantee delivery, order, or integrity of the data packets. Handling Loss: Requires additional mechanisms to handle data loss and reordering, if needed. 3. Broadcasting Advantages: Efficiency: Efficient in scenarios where the same data needs to be sent to multiple recipients simultaneously. Simplicity: Simple to implement as it does not require maintaining individual sessions or connections. Disadvantages: Network Load: Can lead to increased network load and congestion, especially in large networks. Scalability: Not scalable for large systems where not all nodes need the same data, leading to unnecessary data transmission and processing. Best Approach for the Load Balancer Given the need to efficiently distribute tasks among various worker nodes with potentially different capabilities and current loads, the best communication approach for the load balancer would be an event-based system. This choice is motivated by the following reasons: Dynamic Response: The load balancer can react dynamically to changes in the system state (e.g., node availability, load changes) via events, which is crucial for maintaining balance and performance. Scalability: As the number of nodes increases, an event-based system can efficiently manage communication without unnecessary overhead or complexity. Modular Design: This approach supports a more modular design, allowing for easier expansion and maintenance of the load balancing system. Implementation Consideration: For the event-based system, frameworks such as Node.js for JavaScript or asyncio in Python can be utilized to handle asynchronous event-driven communication effectively. This would allow the load balancer to quickly and efficiently respond to events such as node health checks, task completion, and system load changes. d) Outline the pseudo code you would use for your answer in (c). 
(10 marks) The pseudo code below outlines how the load balancer can manage incoming tasks, distribute them to worker nodes, and handle node status updates through events.
PSEUDO CODE:
Initialize:
    Create list of worker nodes with properties: IP address, status, current load
    Set up event listeners for "TaskReceived", "NodeStatusUpdate"
Event Listener on "TaskReceived":
    Input: Task details
    Procedure:
        Find the node with the least current load and status 'active'
        Assign task to this node
        Increment the load counter for the node
        Emit "TaskAssigned" event with node details and task details
        Log the task assignment
Event Listener on "TaskAssigned":
    Input: Node details, task details
    Procedure:
        Send task to the specified node via HTTP POST request or appropriate protocol
        Await confirmation from the node
        If confirmation received: Log success
        Else: Handle error (e.g., reassign the task, log failure, alert admin)
Event Listener on "NodeStatusUpdate":
    Input: Node IP, new status, (optional) current load
    Procedure:
        Update the corresponding node's status and current load in the node list
        If node status is 'inactive': Reassign any tasks from this node to other nodes
        Log the status update
Function to find least loaded active node:
    Return the node with the minimum load and status 'active'
Main Loop:
    While True:
        Await new events (TaskReceived, NodeStatusUpdate)
        Handle each event according to its type
2020 3) One of the most important components of an operating system is memory management. a) Describe the differences between physical memory and virtual memory. (8 marks) b) Explain the memory hierarchy within an operating system. Provide diagram(s) and example(s) to aid in your answer. (8 marks) The memory hierarchy within an operating system is designed to balance cost, speed, and capacity to optimize the overall performance of the system. This hierarchy is composed of several layers of memory storage, each varying in speed, size, and function. Below, I will describe these layers, their importance, and how they interact. Memory Hierarchy Layers 1. Registers Speed: Fastest Capacity: Smallest, typically only a few hundred bytes in total Function: Store the data that is currently being processed by the CPU. Example: CPU registers are used to store operands and results of CPU operations directly. 2. Cache Memory Speed: Very fast Capacity: Small, often a few megabytes Function: Acts as a buffer for the data used most frequently by the CPU, reducing the time needed to access data from main memory. Example: L1, L2, and L3 caches store copies of data from frequently accessed main memory locations. 3. Main Memory (RAM) Speed: Fast Capacity: Moderate, typically a few gigabytes Function: Holds the programs currently running and their data. Example: Running applications like a web browser or word processor use RAM to store their operational data and code. 4. Secondary Storage (Hard Disks, SSDs) Speed: Slower than RAM but faster than tertiary storage Capacity: Large, typically several terabytes Function: Used for long-term data storage and to hold data and programs not currently in use. Example: Files, databases, and installed applications are stored on secondary storage devices. 5. Tertiary Storage (Magnetic Tapes) Speed: Slowest Capacity: Very large, scalable to multiple petabytes Function: Used for backup and archival purposes. Example: Enterprises use tape drives for the archival of historical transaction data. c) Virtual memory is controlled by a paging scheme. 
Discuss and detail the purpose of each of the following and show how they relate to providing virtual memory: i. page tables (16 marks) ii. page replacement algorithms (12 marks) iii. the best-fit algorithm (6 marks) Virtual memory is a fundamental concept in modern operating systems, allowing the execution of processes that require more memory than is physically available. This is achieved by swapping data to and from the physical memory to a space on the disk, typically referred to as swap space or paging file. i. Page Tables Purpose: Mapping Virtual to Physical Memory: Page tables store the mapping between virtual addresses and physical memory addresses. Each entry in the page table corresponds to a page and contains the physical address of the frame where the page resides in physical memory. Access Control: Page tables include flags that provide essential data about access permissions (read, write, execute) and status (modified, referenced), which help in maintaining security and optimizing memory usage. Relation to Virtual Memory: Translation Lookaside Buffer (TLB): A cache used to reduce the time taken to access a memory location. It stores recent translations of virtual memory to physical memory addresses. If the virtual address is found in the TLB, accessing memory becomes faster. Swapping: When a process accesses a page that is not loaded in memory, a page fault occurs, and the operating system must load the page from disk. The page table determines where the page should be loaded into physical memory. Example: Imagine a system with a virtual address space larger than its physical memory. When a program accesses a virtual address, the page table is used to translate this address into a corresponding physical address. If the required page is not in memory (a miss in the TLB and no valid frame in the page table), the operating system needs to load the page into physical memory, potentially evicting another page. ii. Page Replacement Algorithms Purpose: Memory Management: Determines which pages to remove from physical memory when new pages need to be loaded but memory is full. Optimize Utilization: Aims to minimize page faults and optimize the use of the physical memory by intelligently selecting which pages to swap out. Relation to Virtual Memory: LRU (Least Recently Used): Pages that have not been accessed for the longest time are replaced first. FIFO (First In, First Out): The oldest loaded page is selected for replacement. Optimal: Replaces the page that will not be used for the longest period in the future. This is theoretical as it requires future knowledge. Example: In a busy system, when a user opens a new application while many others are running, the operating system uses a page replacement algorithm to decide which pages to swap out to disk to make room for the new application's pages. iii. Best-Fit Algorithm Purpose: Memory Allocation: Used to allocate a block of memory from a list of free blocks. Minimize Wastage: Chooses the smallest block of memory that is large enough to satisfy a request. Relation to Virtual Memory: While the best-fit algorithm is generally used in memory allocation within the heap (for dynamic memory allocation), not directly in paging or virtual memory management, it serves a similar purpose in efficiently utilizing resources. 
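To make the selection rule concrete, here is a minimal Java sketch of the best-fit step described above. The FreeBlock class, the free-list contents, and the 100-byte request are illustrative assumptions; the 120-byte block mirrors the worked example used in this answer.

import java.util.ArrayList;
import java.util.List;

// Minimal sketch of best-fit selection over a list of free blocks.
// FreeBlock and the block sizes are illustrative, not taken from the paper.
public class BestFitDemo {

    static class FreeBlock {
        final int start;   // offset of the block in the managed region
        final int size;    // size of the block in bytes
        FreeBlock(int start, int size) { this.start = start; this.size = size; }
    }

    // Return the smallest free block that is large enough, or null if none fits.
    static FreeBlock bestFit(List<FreeBlock> freeList, int request) {
        FreeBlock best = null;
        for (FreeBlock b : freeList) {
            if (b.size >= request && (best == null || b.size < best.size)) {
                best = b;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<FreeBlock> freeList = new ArrayList<>();
        freeList.add(new FreeBlock(0, 300));
        freeList.add(new FreeBlock(400, 120));
        freeList.add(new FreeBlock(600, 90));

        // A 100-byte request picks the 120-byte block: the smallest block that
        // still satisfies the request, leaving 20 bytes of internal fragmentation.
        FreeBlock chosen = bestFit(freeList, 100);
        System.out.println("Chosen block size: " + (chosen != null ? chosen.size : -1));
    }
}
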
In the context of virtual memory: Disk Space Management: Could hypothetically be used for allocating space in the swap file on disk, selecting the smallest suitable portion of the disk that fits the memory page needing to be swapped out, although this is less common. Example: Suppose a program requests memory allocation for a structure requiring 100 bytes. If the smallest available free block in the heap is 120 bytes, the best-fit algorithm would allocate this block to the request, aiming to minimize the leftover space. 4) You are a developer who is writing software for a large high-performance cluster (HPC). You have been requested to use Java to develop your system. a) Discuss, with the use of diagrams, the structure of the Java JVM. (10 marks) Repeated b) Discuss, with the use of diagrams and examples, how the client / server model works. (10 marks) The client/server model is a distributed application structure that partitions tasks or workloads between providers of a resource or service, called servers, and service requesters, called clients. Often used on a network, this model is one of the central ideas of modern computing, and many of the services offered on the internet are based on the client/server model. Key Components of the Client/Server Model 1. Client: Role: Initiates requests to servers for data or services. Characteristics: Clients are often applications that users interact with, such as web browsers, email clients, or mobile apps. Functionality: Clients provide the interface for users to interact with the server resources, handling user input and presenting the data received from the server. 2. Server: Role: Provides resources or services to clients. Characteristics: Servers are powerful computers or processes dedicated to managing disk drives (file servers), printers (print servers), or network traffic (network servers). Functionality: Servers wait for incoming requests from clients and serve them through responses. They manage shared resources and perform tasks requested by clients. 3. Network: Role: Connects clients and servers, enabling communication. Characteristics: Can be local or wide, including the Internet. Functionality: Transmits data between clients and servers. How the Client/Server Model Works When a client wants to retrieve data or initiate a transaction, it sends a request to the server over a network. The server then processes the request and sends back the appropriate response. This interaction is governed by protocols, sets of rules that define how the data is formatted and transmitted. c) Outline a way in which Java classes can be used to allow communication between clients and the server. Provide pseudo code outlining what classes need to be used in your answer. (10 marks) To Create communication between a client and a server in Java, you can use several core Java classes primarily from the java.net package. This package provides the functionality to implement network connections using sockets. Here, I'll outline a simple client-server communication model using TCP sockets, which are reliable and connection-oriented. Java Classes Involved 1. ServerSocket: This class is used by the server to listen on a specific port for client requests. 2. Socket: This class is used by both the client and server to send and receive data over the network. 3. InputStream and OutputStream: These classes (or their subclasses) are used to read from and write to the socket, respectively. Server Side Pseudo Code 1. Create a ServerSocket listening on a specific port. 2. 
Loop forever: a. Accept an incoming client connection. This returns a new Socket. b. Create input and output streams from the Socket. c. Read data from the InputStream. d. Process the received data. e. Write the response back to the client using the OutputStream. f. Close the connection. Client Side Pseudo Code 1. Create a Socket to connect to the server's IP address and port. 2. Obtain input and output streams from the Socket. 3. Write data to the OutputStream to send to the server. 4. Read the server's response from the InputStream. 5. Close the connection. d) Discuss whether a client/server approach is an appropriate method for controlling the functionality within a distributed system. What challenges and issues do you need to be aware of and how would this impact on your solution from 4(c)? (10 marks) The client/server model is a common architectural framework for distributed systems, providing a structured approach to resource sharing and service delivery across a network of machines. Advantages of the Client/Server Approach in Distributed Systems 1. Centralized Management: The server centralizes the control and management of resources, simplifying administration, updates, and scalability interventions. 2. Efficiency: Servers can be optimized for specific tasks, improving the efficiency of resource usage and service delivery. 3. Security: Centralized servers allow for better controlled access to resources and data, enabling more robust security measures. 4. Scalability: Adding more clients or upgrading server capabilities can extend the system’s capacity. Impact on the Solution from 4(c) In the context of implementing an event-based system for a load balancer (as discussed in part 4(c)), the client/server model brings both benefits and challenges: Benefit: The server (in this case, the load balancer) can efficiently manage the distribution of tasks to client nodes (servers) in the distributed system. It can dynamically adjust to changing loads and optimize task distribution based on current system states. Challenge: Reliance on a single load balancer can introduce a single point of failure. If the load balancer fails, the entire distributed system could experience disruptions or downtime. Network Sensitivity: The performance of the load balancer is highly sensitive to network conditions. Delays in communication between the load balancer and the worker nodes can lead to inefficiencies in task distribution. Mitigation Strategies Redundancy: Implement redundant load balancers to ensure that the failure of one does not bring down the entire system. Load Balancing the Load Balancer: Use a second layer of load balancing for the load balancers themselves, distributing requests among multiple load balancers to manage failover and maintain system availability. Performance Monitoring and Network Optimization: Regularly monitor system performance and optimize network configurations to handle increased traffic and reduce bottlenecks. e) From your answer in 4(d) outline an algorithm, or steps, which allows communication to occur without introducing any bottlenecks. (10 marks) To ensure efficient communication in a distributed system, particularly one using an event-based load balancer as outlined in 4(d), it's critical to design a communication algorithm that minimizes bottlenecks and maximizes throughput and responsiveness. Algorithm for Efficient Communication 1. Initialization: 2. Client Request Handling: 3. Load Balancer Operation: 4. Data Transmission: 5. 
Load Balancing Among Load Balancers: 6. Failure Handling: 7. Performance Optimization: 8. Feedback Loop: 2021 5) An Operating System is built out of many components and has to provide concurrency to enable multiple applications to work at once. a) Provide a diagram which outlines the main components of an Operating System. Discuss each component in detail. (10 marks) SAME AS QUESTION 1 b) One of the most important aspects of an operating system is how it deals with concurrency. In this context: i. explain the three-state model and state queues that are used in the process life cycle. (10 marks) The three states are: Ready, Running, and Blocked (also referred to as Waiting). Each of these states is associated with a queue that manages processes in that particular state. 1. Ready State Description: The Ready State contains all the processes that are prepared to run but are currently not running because the CPU is executing other processes. Processes in this state are waiting for CPU time to start or resume execution. Queue: Ready Queue 2. Running State Description: The Running State is where the process is currently being executed by the CPU. There can only be as many processes in the running state as there are CPUs or CPU cores; that is, a single-core CPU can have only one running process at any time, while a multi-core CPU can execute multiple processes simultaneously. Queue: There is no traditional queue for running processes because each CPU core can run only one process at a time. However, multi-core systems manage multiple running processes simultaneously. 3. Blocked (Waiting) State Description: The Blocked State contains processes that cannot continue until some external event occurs, such as the completion of an I/O operation or the release of a resource. Processes in this state are not using the CPU, even if the CPU is idle. Queue: Blocked Queue Process Transitions Ready to Running: A scheduler selects a process from the Ready Queue and assigns it to the CPU, changing its state from Ready to Running. Running to Ready: If a running process requires more CPU time than allowed in one quantum (in preemptive multitasking) or releases the CPU voluntarily (in cooperative multitasking), it returns to the Ready Queue. Running to Blocked: If a running process initiates an I/O request or needs to wait for another resource, it moves to the Blocked Queue. Blocked to Ready: Once the event that caused a process to be blocked (like I/O completion) occurs, it moves back to the Ready Queue, ready to be resumed. ii. describe two scheduling strategies and discuss how they work and how they differ from each other. (10 marks) SAME AS QUESTION 1 iii. choose one of the scheduling strategies you have discussed in (ii) and discuss how the process manager would work to enable concurrency within an Operating System using this method. Include diagram(s) and example(s) to help you in your discussion. (10 marks) SAME AS QUESTION 1 iv. consider three processes P1, P2, and P3 each with a duration of 25, 20, 10 milliseconds respectively. Draw a diagram which shows the CPU usage using the Round Robin scheduling algorithm with a time quantum equal to 5 milliseconds. Assume that all processes start consecutively with a 10 millisecond difference (i.e. P1 at 0ms, P2 at 10ms and P3 at 20ms). Understanding Round-Robin Scheduling Preemptive: Round Robin can interrupt a running process mid-execution if its time quantum expires. Time Quantum: A fixed time slice allocated to each process (in our case, 5 milliseconds). 
Circular Execution: Processes are executed in order, and if a process isn't finished within its time quantum, it goes to the back of the queue. Process Arrival: P1: Arrives at 0ms (Burst Time: 25ms) P2: Arrives at 10ms (Burst Time: 20ms) P3: Arrives at 20ms (Burst Time: 10ms) Scheduling Diagram 6) You are a developer who is writing software for a large high-performance cluster. You are required to use Java to develop your system. a) Discuss, with the use of diagram(s), the structure of the Java Virtual Machine (JVM). (10 marks) SAME AS QUESTION 4 b) Java provides packages that allow you to write network enabled applications. Assuming that UDP packets are used, outline the Java like pseudo code to get a coordinator to send, receive and control a set of nodes. Discuss the key points of your pseudo code and explain what it is doing and how it works. (30 marks) In Java, the java.net package includes classes that facilitate network communication, including the use of UDP (User Datagram Protocol). UDP is a connectionless protocol that allows sending and receiving datagrams without establishing a reliable connection, making it suitable for situations where speed is more critical than reliability, such as controlling a network of nodes where real-time response is essential. Below is a Java-like pseudo code for a coordinator program that uses UDP to send, receive, and control a set of nodes. The coordinator will send commands to the nodes and listen for their responses. CODE ON NEXT PAGE: Key Points of the Pseudo Code Algorithm: Coordinator for Sending, Receiving, and Controlling Nodes using UDP Initialize the coordinator: Create a socket for sending and receiving UDP datagrams. Define the multicast group address and port number. Send message to nodes: Prepare the message to be sent. Send the message to the multicast group address and port using the UDP socket. Receive messages from nodes: Wait to receive UDP datagrams from any node in the multicast group. Extract the received message from the datagram. Control the nodes based on received messages: Process the received message to determine the action to be taken. Control the behavior of the nodes accordingly, such as instructing them to perform specific tasks or sending commands. Repeat: Continue listening for incoming messages from the nodes. Process and control the nodes based on the received messages. Repeat this loop to maintain communication and control over the nodes. Discussion: The coordinator initializes a UDP socket to send and receive datagrams, allowing it to communicate with the nodes over the network. By specifying a multicast group address and port number, the coordinator can send messages to multiple nodes simultaneously. When a message is sent, it is broadcasted to all nodes in the multicast group, enabling efficient communication with the entire set of nodes. The coordinator continuously listens for incoming messages from the nodes, allowing for real- time communication and control over the distributed system. Upon receiving messages from the nodes, the coordinator processes them to determine the appropriate action to take, enabling dynamic control and coordination of the nodes' behavior. This approach facilitates the development of network-enabled applications where a central coordinator interacts with and manages a group of nodes, such as in distributed systems, IoT networks, or multiplayer games. 
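As a rough illustration of how the coordinator outlined above might be expressed with the java.net classes, here is a minimal sketch using DatagramSocket and DatagramPacket. The multicast group address 230.0.0.1, port 4446, and the command strings are assumptions made for illustration; error handling, timeouts, and the node-side MulticastSocket that joins the group to receive commands are omitted.

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Minimal sketch of the coordinator described above.
// The group address, port, and command strings are illustrative assumptions.
public class Coordinator {

    private static final String GROUP_ADDRESS = "230.0.0.1";
    private static final int PORT = 4446;

    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName(GROUP_ADDRESS);
        // One socket is used both to send commands to the multicast group and
        // to receive unicast replies from the individual nodes.
        try (DatagramSocket socket = new DatagramSocket()) {

            // Send a command to every node in the multicast group.
            byte[] command = "STATUS".getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(command, command.length, group, PORT));

            // Main loop: receive replies and react to them.
            byte[] buffer = new byte[1024];
            while (true) {
                DatagramPacket reply = new DatagramPacket(buffer, buffer.length);
                socket.receive(reply);   // blocks until a datagram arrives
                String message = new String(reply.getData(), 0, reply.getLength(),
                                            StandardCharsets.UTF_8);
                System.out.println("From " + reply.getAddress() + ": " + message);

                // Control a node based on its reply, e.g. tell an overloaded
                // node to pause (the protocol here is purely illustrative).
                if (message.startsWith("OVERLOADED")) {
                    byte[] pause = "PAUSE".getBytes(StandardCharsets.UTF_8);
                    socket.send(new DatagramPacket(pause, pause.length,
                                                   reply.getAddress(), reply.getPort()));
                }
            }
        }
    }
}

In this sketch the nodes are assumed to reply by unicast to the coordinator's socket; because UDP gives no delivery guarantees, a real coordinator would also need retransmission or acknowledgement logic, as discussed in part (c).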
c) Discuss, in general, what alternatives are there to your method detailed in (b) and highlight how it differs and any challenges that need to be considered. (10 marks) In the scenario described in (b) where UDP is used for coordinating a set of nodes in a distributed system, several alternative methods can be employed, each with its own set of characteristics, advantages, and challenges. Here, we'll explore some of these alternatives 1. TCP (Transmission Control Protocol) How It Differs: Connection-Oriented: TCP establishes a reliable connection between the sender and receiver before data transmission begins. This is in contrast to UDP's connectionless nature. Data Integrity and Order: TCP ensures that all packets are delivered accurately and in order. Lost packets are retransmitted. Challenges: Overhead: The need to establish a connection and the mechanisms for ensuring data integrity and order introduce significant overhead, which can lead to reduced performance compared to UDP, especially in high-latency environments. Scalability: Maintaining a large number of concurrent TCP connections can be resource- intensive, which might not be ideal for systems with many nodes. 2. HTTP/HTTPS How It Differs: Protocol Standards: Built on top of TCP, HTTP is a stateful protocol commonly used in client- server communications over the web. HTTPS adds a layer of security with SSL/TLS. Rich Feature Set: Supports a variety of methods (GET, POST, PUT, DELETE) and includes extensive header information for content negotiation, authentication, and more. Challenges: Latency and Overhead: The additional overhead of HTTP headers and the handshake processes of TCP and SSL/TLS can introduce latency and consume more bandwidth. Complexity: Managing HTTP sessions and securing communications with HTTPS can increase the complexity of system implementation. 2022 7) Concurrency within the operating system is handled by the kernel. However, there are a number of issues that need to be considered when dealing with concurrent processes. a) Discuss what is meant by a process and three bits of information stored in the process control block. (5 marks) a process is essentially a program in execution. It represents a single instance of a running program and includes the program code, its current activity, and a set of allocated resources. Process Control Block (PCB) The Process Control Block (PCB) is a crucial data structure used by the operating system to manage and keep track of all relevant information about a process. The PCB is sometimes called a task controlling block or process descriptor. It serves as the "storage center" for any information that may vary from process to process, which is essential for effective process management, especially in multitasking environments. Here are three important pieces of information stored in a PCB: 1. Process State: Description: The current state of the process. States can include "new" (being created), "ready" (waiting to be assigned to a processor), "running" (executing), "waiting" (waiting for some event to occur), and "terminated" (completed). Purpose: Helps the operating system manage transitions between these states and schedule processes appropriately. 2. Program Counter: Description: The address of the next instruction that the process will execute. This is crucial because it allows the CPU to keep track of where a process is in its execution. 
Purpose: When a process is resumed after being paused, the CPU knows exactly where to start/restart execution by referring to the program counter in the PCB. 3. CPU Registers: Description: The values of all process-critical CPU registers associated with the process at the time it was stopped or paused. This includes registers like the accumulator, index registers, stack pointers, and general-purpose registers. Purpose: Storing the registers in the PCB ensures that when a process is swapped out of the CPU, its execution state can be saved fully and restored when the process is swapped back in. b) Discuss how the three-state model works and how it used in the process life cycle. (4 marks) SAME AS QUESTION 5 c) Discuss the difference between non-pre-emptive and pre-emptive scheduling. Discuss a non-pre- emptive scheduling algorithm and a pre-emptive scheduling algorithm. Compare and discuss the differences between both algorithms. (12 marks) 8) You are a developer who is writing software for a large high-performance cluster (HPC). You have been requested to use Java to develop your system SAME AS QUESTION 4 2023 9) Concurrency within the operating system is handled by the kernel. However, there are a number of issues that need to be considered when dealing with concurrent processes. a) Discuss what scheduling criteria needs to be considered to enable process scheduling. (5 marks) When designing a process scheduler, several criteria need to be considered to ensure that the scheduling strategy aligns with the system’s goals and the needs of its users. 1. CPU Utilization Goal: To keep the CPU as busy as possible. High CPU utilization indicates that the system is doing more work and less time is wasted on idle processes. Importance: Maximizing CPU utilization is crucial in environments where resources are limited or costly. 2. Throughput Goal: To maximize the number of processes that complete their execution per time unit. Importance: High throughput is essential in batch processing environments where the focus is on completing a volume of similar jobs in the shortest amount of time. 3. Turnaround Time Goal: To minimize the average time it takes for a process to be completed after it has been submitted. Importance: Turnaround time is a critical measure of the effectiveness of a scheduling algorithm, particularly in batch systems. It includes the total time spent waiting to get into memory, waiting in the ready queue, executing on the CPU, and doing I/O. 4. Waiting Time Goal: To minimize the time that processes spend in the ready queue waiting for CPU time. Importance: Shorter waiting times generally lead to better responsiveness and user satisfaction in interactive systems. 5. Response Time Goal: To reduce the time it takes from when a request is submitted until the first response is produced, not output (for time-sharing systems). Importance: Minimizing response time is vital in interactive environments to ensure that the system feels responsive to users. 6. Fairness Goal: To ensure that all processes are given equitable CPU time. Importance: Fairness is important to prevent starvation, where a process never gets sufficient resources to proceed while others keep executing. 7. Predictability Goal: To minimize the variance in response time, making the system behavior consistent and predictable. Importance: Predictability is especially significant in real-time operating systems where processes often have strict execution deadlines. b) There are multiple scheduling algorithms to enable concurrency. 
Discuss, with the aid of diagrams, how the round-robin scheduling algorithm works within an operating system. (15 marks) Round Robin is one of the most widely used CPU scheduling algorithms. It is preemptive and allocates the CPU to each ready process in turn for a fixed Time Quantum (TQ). Advantages The advantages of Round Robin CPU scheduling are: 1. A fair share of CPU time is allocated to each job. 2. Because it does not rely on knowing burst times in advance, it can be implemented in a real system. 3. It is not affected by the convoy effect or the starvation problem that occur under the First Come First Served CPU scheduling algorithm. Disadvantages The disadvantages of Round Robin CPU scheduling are: 1. A very small time quantum reduces CPU throughput because of frequent context switches. 2. The Round Robin approach spends more time swapping contexts. 3. The choice of time quantum has a significant impact on its performance. 4. Processes cannot be assigned priorities. c) In table 1.1, six processes arrive at different times with different execution durations. Draw the temporal diagram using the round-robin scheduling algorithm for the information given in table 1.1.
Process  Arrival Time  Service Time
A        0             5
B        2             4
C        3             2
D        5             4
E        6             3
F        8             2
Table 1.1
(15 marks) We need to assume a time quantum; since the quantum isn't specified, a commonly selected time quantum of 1 time unit (1 ms for simplicity) is used. It is also assumed that a process arriving at a given instant joins the ready queue just ahead of a process preempted at that same instant. Temporal Diagram Using Round Robin Scheduling Here is the breakdown:
0-1 ms: Process A (only A has arrived)
1-2 ms: Process A
2-3 ms: Process B (Process B arrives at 2 ms)
3-4 ms: Process A (Process C arrives at 3 ms)
4-5 ms: Process C
5-6 ms: Process B (Process D arrives at 5 ms)
6-7 ms: Process A (Process E arrives at 6 ms)
7-8 ms: Process D
8-9 ms: Process C completes its 2 ms (Process F arrives at 8 ms)
9-10 ms: Process E
10-11 ms: Process B
11-12 ms: Process A completes its 5 ms
12-13 ms: Process F
13-14 ms: Process D
14-15 ms: Process E
15-16 ms: Process B completes its 4 ms
16-17 ms: Process F completes its 2 ms
17-18 ms: Process D
18-19 ms: Process E completes its 3 ms
19-20 ms: Process D completes its 4 ms
|A|A|B|A|C|B|A|D|C|E|B|A|F|D|E|B|F|D|E|D|
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
d) Round-robin can also be used in distributed systems as a way of distributing workload to worker nodes. i. Discuss how this would work in a distributed system and compare how this differs with 1(b). Round-Robin Scheduling in Distributed Systems In distributed systems, Round-Robin scheduling can be adapted to distribute workloads evenly across multiple worker nodes. This approach is often used to manage load balancing among servers, ensuring that no single node becomes a bottleneck, which helps in achieving better performance and reliability in a distributed environment. How It Works in Distributed Systems: 1. Load Distribution: Equal Allocation: Each new request or task is assigned to a node in a sequential, cyclic order. After the last node in the sequence has been assigned a task, the next task is assigned to the first node, and the cycle continues. Statelessness: The algorithm does not keep track of the current load on each node but merely passes each new request to the next node in line. 2. Fault Tolerance and Scalability: Adding/Removing Nodes: Nodes can be dynamically added or removed from the cycle. When a new node is added, it is simply inserted into the rotation. 
If a node fails or is removed, it is taken out of the rotation. 3. Implementation: Dispatcher: A central dispatcher or load balancer is typically responsible for implementing the Round-Robin algorithm, directing incoming requests to the next node in the sequence. Comparison with Process Scheduling (1b): In the context of CPU scheduling within an operating system (as discussed in part 1b), Round-Robin is used to allocate CPU time to processes. Here are the key differences and similarities when applied to distributed systems for workload distribution: Differences: Scope of Application: Process Scheduling: In an OS, Round-Robin allocates time slices of the CPU to processes to ensure each process gets equal CPU time. Distributed Systems: Round-Robin distributes network requests or tasks among multiple servers or nodes, not time slices on a single CPU. Complexity and Overhead: Process Scheduling: High context-switching overhead due to frequent switches between processes. Distributed Systems: Potentially lower overhead in managing requests across nodes compared to process context switching, but network latency and node performance can vary. State Awareness: Process Scheduling: No inherent requirement for maintaining state information about process conditions or previous CPU bursts. Distributed Systems: Often, the state of nodes (e.g., load, capacity) isn't considered in simple Round-Robin, which can lead to imbalances if nodes vary significantly in performance or current load. Similarities: Fairness: Both use cases aim to distribute resources or requests fairly to prevent starvation. Cyclic Order: Both follow a cyclic order in handling requests or allocating resources. 10) You are a developer who is writing software for a large high-performance cluster. You are required to use Java to develop your system. a) Discuss, including diagrams and examples, how the JVM works and its main components. (10 marks) SAME AS QUESTION 4 b) Discuss why Java is a good choice for implementing both the middleware and applications that will run on a cluster. (5 marks) Java is a popular choice for developing both middleware and applications, especially in clustered and distributed environments, due to several features and capabilities of the language and its runtime environment. 1. Platform Independence Write Once, Run Anywhere: Java applications are compiled into platform-independent bytecode, which can be executed on any system that has a Java Virtual Machine (JVM). This feature is particularly beneficial for clusters, which may comprise heterogeneous systems with different operating systems and hardware configurations. Ease of Deployment: The uniform execution environment simplifies the deployment and maintenance of applications across the various nodes of a cluster. 2. Robust Standard Libraries Networking and Concurrency: Java provides extensive support for network programming and concurrent processing through its standard libraries. Classes in java.net package simplify the implementation of network communication required for middleware, while utilities in java.util.concurrent provide robust structures for managing multi-threaded operations, crucial for maximizing the utilization of cluster resources. Rich Set of APIs: Java offers comprehensive APIs for database connectivity (JDBC), remote method invocation (RMI), messaging (JMS), and web services, which are essential for developing scalable middleware that can manage communication and data exchange between different applications and components in a cluster. 3. 
Scalability and Performance Memory Management: The JVM handles memory allocation and garbage collection, which helps in managing the large and complex data structures often used in clustered applications without the overhead of manual memory management. JVM Tuning: JVMs can be tuned to optimize performance specific to applications, which is vital for achieving high performance on cluster nodes. 4. Enterprise Support Enterprise Edition: Java Enterprise Edition (Java EE) provides built-in support for building scalable, distributed, multi-tier applications. Features like servlets, EJBs, and web services are integral for middleware applications that require reliability, security, and transaction management. Community and Frameworks: There is a vast community and many frameworks available for Java, such as Spring and Hibernate, which support the rapid development of robust and scalable cluster applications and middleware. 5. Security Built-in Security Features: Java provides a strong security model that defines access rights at a granular level (e.g., the Java security manager and class loaders). This is crucial for middleware, which often needs to securely manage sensitive operations between different applications and data sources across the cluster. Secure Communication: Support for SSL/TLS and secure authentication mechanisms further makes Java a strong candidate for scenarios where secure data transmission is necessary across the network. c) Java provides packages that allow you to write network enabled applications. Outline pseudo code to allow a client program to send and receive information from a server program. Discuss the key points of your pseudo code and explain how and what it is doing. (20 marks) Same as 4C d) Discuss what MPI is and how MPI could be used to build applications running on the cluster. (10 marks) What is MPI? MPI, or the Message Passing Interface, is a standardized and portable message-passing system designed to function on a wide variety of parallel computing architectures. MPI is a de facto standard for writing parallel applications, particularly in the context of high-performance computing (HPC). The MPI standard defines the syntax and semantics of library routines and allows programs to be written in C, C++, and Fortran. Key Features of MPI: 1. Communication Mechanism: MPI supports point-to-point and collective communication among processes. This allows for efficient data transfer and coordination between different parts of a parallel program. 2. Portability: MPI is designed to be portable across different computing architectures, including distributed and shared memory systems, making it widely usable in various environments. 3. Scalability: Designed for high performance on both massive-scale supercomputers and smaller clusters or multicore architectures. 4. Rich Functionality: Provides numerous functions for sending and receiving messages, collective communication (like broadcast, scatter, gather), synchronization (barrier synchronization), and other utilities that are essential for parallel computation. How MPI Could be Used to Build Applications Running on a Cluster Building applications on a cluster using MPI involves writing parallel code where tasks are divided among multiple processes running potentially on different nodes of a cluster. Here’s how MPI facilitates this: 1. Decomposing the Problem: Domain Decomposition: MPI is particularly effective in applications where the problem can be broken down into discrete parts that can be solved concurrently. 
For example, in a simulation involving a physical space, the space can be divided into sections, each handled by different processes. 2. Distributing Work Among Processes: Load Balancing: MPI can distribute workloads dynamically among processes during runtime. Using MPI, processes can send and receive pieces of data, balancing the load as required by the application demands. 3. Communication Between Processes: Data Exchange: Processes might need to exchange data with each other as part of the computation. MPI provides efficient point-to-point communication mechanisms, such as MPI_Send() and MPI_Recv(), to facilitate this. Collective Operations: MPI supports operations involving all processes, such as computing a global sum or broadcasting data from one process to all others, which are often used in algorithms that require a reduction step or synchronized data. 4. Optimizing Performance: Overlap Computation and Communication: MPI allows the design of non-blocking communication patterns where a process can continue computation while waiting for data from other processes, thus optimizing the overall runtime. Tuning Communication Protocols: MPI implementations often provide tunable parameters that control how messages are sent and received, which can be optimized based on the network characteristics of the cluster. 5. Scalability: Adding More Nodes: MPI programs can scale efficiently with the addition of more computational resources. As more nodes are added to a cluster, the same MPI program can utilize these additional resources without significant changes to the code. e) Discuss and justify which approach from 4(c) and 4(d) is the better way of developing software on a cluster. When choosing the best approach for cluster software development, it's crucial to tailor the decision to the specific requirements of the application and the operational environment of the cluster: Choosing the Best Approach for Cluster Software Development: 1. Consider the Application Requirements: Computation Intensity: Does the application require intense computation and data processing that benefits from parallel execution? If yes, MPI is likely more suitable because it is specifically designed for high-performance computing (HPC) where tasks can be efficiently divided across many nodes. General-Purpose Functionality: If the application involves more general-purpose tasks that include handling web requests, database management, or business logic, the Java client/server model is more appropriate. 2. Evaluate the Need for Scalability and Performance: MPI: Best for applications requiring scalable and efficient parallel processing. MPI excels in environments where performance and computation speed are critical, such as in scientific research, simulations, and complex calculations distributed across many cluster nodes. Java: While Java applications can be scaled, they may face performance limitations due to the overhead associated with the JVM and garbage collection, which might not be optimal for extremely high-performance requirements. 3. Determine the Suitability Based on System Interaction: Inter-Process Communication: If the application heavily relies on intricate inter- process communication, MPI provides more granular control over message passing and is optimized for such operations across distributed systems. 
Client/Server Interactions: For applications that fit the traditional client/server model where clients may be lightweight (like browsers or mobile apps) communicating with a server over a network, Java's network capabilities are more than sufficient. Justification for the Preferred Approach: Given these considerations, the decision hinges on the specific application's nature and the cluster's role: For High-Performance Computational Tasks: MPI should be chosen for its superior performance in handling large-scale parallel computations. For General Software Applications on a Cluster: Java should be chosen for its ease of use, broad applicability, and extensive support for diverse computing needs.
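To make the Java client/server option referred to above concrete, the following is a minimal sketch of the TCP pseudo code from question 4(c), using ServerSocket and Socket from java.net. The port number 5000, the host name localhost, and the message strings are illustrative assumptions, and the server here handles one request per connection without threading or error recovery.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal sketch of the client/server pseudo code from 4(c).
// Port, host, and message contents are illustrative assumptions.
public class EchoExample {

    // Server side: listen, accept a client, read a line, reply, close.
    static void runServer() throws Exception {
        try (ServerSocket server = new ServerSocket(5000)) {
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String request = in.readLine();          // read data from the client
                    out.println("Processed: " + request);    // write the response back
                }
            }
        }
    }

    // Client side: connect, send a request, print the server's response.
    static void runClient() throws Exception {
        try (Socket socket = new Socket("localhost", 5000);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println("work unit 42");
            System.out.println(in.readLine());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0 && args[0].equals("server")) {
            runServer();
        } else {
            runClient();
        }
    }
}

Run with the argument "server" on one machine and with no argument on another (or in a second terminal) to see a single request/response exchange; a production version would add per-client threads, timeouts, and exception handling.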
