Lecture Notes - Infrastructure and Virtualization PDF
Summary
These lecture notes cover infrastructure and virtualization concepts, including introductions to virtual machines (VMs), hypervisors, and containers. Diagrams and figures are included to illustrate the concepts. The notes also include a high-level overview of container technologies.
Full Transcript
Lecture 1 Infrastructure and Virtualization Infrastructure Overview Infrastructure comprises physical hardware like MacBooks, PC builds, and servers. Examples include Apple laptops, custom gaming rigs, and Dell servers. Figure Explanation: MacBook Pro, PC Build, and Dell Rack Server Images: o Represent varied computing environments from personal devices to enterprise-grade hardware. o Infrastructure forms the foundation for running operating systems and applications. Virtual Machines Definition: o A Virtual Machine (VM) emulates a computer system, running its own Operating System (OS) and applications on shared physical hardware. VM Structure: Infrastructure → Hypervisor → OS → Programs: o Hypervisor: Software layer that enables multiple VMs to run on a single physical machine. o Programs: Applications executed within isolated environments. Types of Hypervisors: 1. Type 1 (Bare Metal): o Runs directly on the hardware (e.g., VMware ESXi). 2. Type 2 (Hosted): o Runs on an existing operating system (e.g., VirtualBox). Advantages: Isolation of VMs from the host and each other. Ability to suspend and resume VMs. Supports legacy systems. Disadvantages: High resource usage. Slower startup times compared to containers. Containers Definition: o A lightweight, standalone, and portable software package that includes everything needed to run an application (e.g., code, dependencies, runtime). Container Runtime: A container runtime, like Docker, manages the container lifecycle. Docker Structure: o Dockerfile → Docker Image → Docker Container. Figures Explained: Container Components: o Program + Dependencies: Isolated environment for applications. o Shared Libraries: Reduces duplication of resources between containers. o Container Runtime: Abstracts the OS to manage multiple containers. Advantages: Isolation and portability. Faster startup times compared to VMs. Better resource efficiency. Limitations: Dependency on container runtimes like Docker. 
Limited to environments where container runtimes are available.

Docker Ecosystem

Docker Workflow
1. Dockerfile:
   o Blueprint for building Docker images.
   o Contains instructions like:
      ▪ FROM ubuntu: Sets the base image.
      ▪ COPY: Copies files into the image.
      ▪ CMD: Default command to execute when the container starts.
2. Docker Image:
   o A read-only template built from the Dockerfile.
   o Each Dockerfile instruction forms a layer in the image.
3. Docker Container:
   o A running instance of a Docker image.
   o Analogy: Docker images are like classes, and Docker containers are like objects.

Building a Docker image:
- The dot at the end of the build command specifies the build context, here the current directory.

Running a Docker container:

Docker Registry
Definition:
   o A repository for storing and distributing Docker images.
   o Example: Docker Hub.
Commands:
   o $ docker pull <image>: Download an image.
   o $ docker push <image>: Upload an image.

Docker Compose
Definition:
   o A tool for defining and managing multi-container Docker applications using YAML files.
Example YAML:

Containers in Production
Challenges:
1. Dead Application: A crashed container must be detected and restarted automatically.
2. Dead Host Machine: If the physical host fails, the containers on it are lost.
3. Full Host Machine: Resource constraints can cause containers to fail.

Scaling and Orchestration
Scaling:
Horizontal Scaling:
   o Adding more machines to handle load.
Vertical Scaling:
   o Adding more resources (CPU, RAM) to existing machines.
Figure Explained: Horizontal vs. Vertical Scaling:
   o Horizontal scaling spreads load across multiple systems, while vertical scaling increases the capacity of a single system.

Container Orchestration
Definition:
   o Automation of container management tasks like deployment, scaling, and lifecycle management.
Components:
   o A cluster of nodes, each running a container runtime like Docker.
   o A container orchestrator that manages these nodes and containers.
Examples:
   o Kubernetes, Docker Swarm.
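The production challenges and the orchestrator's job described above can be illustrated with a toy reconciliation loop. This is only a sketch under stated assumptions: `Node`, `reconcile`, the capacity numbers, and the "least-loaded node first" placement rule are all hypothetical and are not how Kubernetes or Docker Swarm are implemented. It shows the general idea of comparing a desired replica count with what is actually running, and how a dead host or a full host affects the result.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster node that can hold a limited number of containers."""
    name: str
    capacity: int
    alive: bool = True
    containers: list = field(default_factory=list)


def reconcile(nodes, app, desired):
    """Move the cluster toward `desired` replicas of `app`; return the running count."""
    # Dead host machine: containers on a failed node are lost.
    for node in nodes:
        if not node.alive:
            node.containers.clear()

    running = sum(node.containers.count(app) for node in nodes)

    # Scale up: place missing replicas on the least-loaded node with free capacity.
    while running < desired:
        candidates = [n for n in nodes if n.alive and len(n.containers) < n.capacity]
        if not candidates:
            break  # Full host machines: nothing left to schedule on.
        target = min(candidates, key=lambda n: len(n.containers))
        target.containers.append(app)
        running += 1

    # Scale down: remove surplus replicas from the most-loaded node.
    while running > desired:
        target = max((n for n in nodes if app in n.containers),
                     key=lambda n: len(n.containers))
        target.containers.remove(app)
        running -= 1

    return running


if __name__ == "__main__":
    cluster = [Node("node-a", capacity=2), Node("node-b", capacity=2)]
    print(reconcile(cluster, "web", desired=3))  # 3 replicas spread over both nodes
    cluster[0].alive = False                     # simulate a dead host
    print(reconcile(cluster, "web", desired=3))  # only 2 fit on the surviving node
```

Running the loop repeatedly is what gives the "automatic recovery" effect: each pass repairs whatever drifted from the desired state since the previous pass.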
Figures Explained: Cluster Diagram: o Nodes in a cluster each run containers with their own OS and infrastructure, managed by an orchestrator. o Demonstrates distributed container management across multiple machines. Lecture 2 1. Introduction to Operating Systems (OS) Definition of an OS An operating system (OS) is a program that serves as an intermediary between the user and the computer hardware. It enables efficient use of hardware while providing a convenient interface for users. Goals of an OS 1. Execute user programs: Ensures users' applications run properly. 2. Convenience: Simplifies interactions with the computer system. 3. Efficiency: Maximizes resource utilization and minimizes wastage. 3. Core Concepts and Architectures Computer System Structures Follows the von Neumann architecture, which separates: o Input/Output devices for user interaction. o Memory for data storage. o Processor for executing instructions. Unix Architecture 1. Hardware: The physical components of the system. 2. Kernel: Core part of the OS; manages resources like memory, CPU, and I/O devices. 3. Shell: Interface for users to communicate with the kernel. 4. Utilities: Tools and applications that perform specific tasks. Linux Architecture Similar to Unix but includes additional components such as: o Linux Kernel: Core OS program for hardware management. o GNU Tools & Libraries: Essential for development and running applications. o Window System: Graphical interface for interacting with the OS. o Desktop Environment: Complete user interface like GNOME or KDE. Unix Philosophy Programs should: o Perform one function well. o Be designed so that their output becomes the input of another program. o Be created iteratively for improvement. Focuses on reusability and simplicity. Linux file system Files are not just documents or directories; they represent any resource or entity that can be accessed and manipulated. 
Almost all system components, from hardware devices to inter-process communication endpoints, are treated as files with a unified interface. This approach provides a single, consistent way to interact with resources, whether you’re reading from a text file, writing to a device, or sending data between processes. 3. Operating Systems Perspectives From Different Stakeholders 1. User Perspective: o Prioritizes ease of use, responsiveness, and performance. 2. Mobile OS: o Focuses on usability and energy efficiency. 3. Superuser Perspective: o Handles Inter-Process Communication (IPC) and system-level control. 4. System Perspective: o Acts as a resource allocator and provides a uniform hardware interface via drivers and firmware. 4. Key Operating System Features Dual Mode Operation User Mode: Non-privileged mode for running applications. Kernel Mode: Privileged mode for executing critical OS instructions. Prevents users from directly accessing hardware, improving security and reliability. Virtual Memory Extends physical memory by using disk space, allowing programs to run as if more memory is available. Physical Memory: Actual hardware memory. Virtual Memory: Simulated memory space for running large applications efficiently. File Systems Divided into: o Physical File Systems: Handle storage mechanisms (e.g., FAT32, NTFS). o Logical File Systems: Provide abstraction for user interaction. o Virtual File Systems: Interface unifies various file systems. 5. Linux Advanced Features Linux: Control Groups (cGroups) Kernel feature that limits, accounts for, and isolates resource usage. Applies to: o CPU o Memory o Storage o Network Used by Docker to restrict container resources. Linux: Namespaces Another kernel feature to partition kernel resources. Each set of processes has isolated views of: o File systems o Network interfaces o Process IDs (PIDs) Allows multiple virtual environments on the same system. 6. 
Schedulers and Process Management

Concepts
Scheduling: Allocates CPU time to processes to ensure efficient execution.
CPU–I/O Burst Cycle: Programs alternate between CPU bursts (computation) and I/O bursts (waiting for data).

Purpose of Schedulers
To select the next process to run
To context-switch between processes
To ensure optimal use of resources

Scheduler Types
1. Preemptive Scheduling: Allows a running process to be interrupted so the CPU can be given to another process.
2. Non-preemptive Scheduling: A process keeps the CPU until it finishes or voluntarily gives it up.

Scheduler Algorithms
First Come First Serve (FCFS): Executes processes in order of arrival.
Shortest Job First (SJF): Runs the process with the shortest next CPU burst first.
Round-Robin Scheduling: Allocates a fixed time slice (quantum) to each process in a cyclic manner.
Priority Scheduling: Processes with higher priority run first.
Multilevel Queue Scheduling: Processes are categorized into separate queues based on priority.

Evaluation Metrics
Maximizing:
   o CPU Utilization: Keeping the CPU as busy as possible.
   o Throughput: Number of processes completed per unit of time.
Minimizing:
   o Turnaround Time: Total time from submitting a process to its completion.
   o Waiting Time: Time spent in the ready queue.
   o Response Time: Time from request submission to first output.

Task States
Ready: Waiting to be executed.
Running: Actively executing on the CPU.
Blocked: Waiting for I/O or another event.
Terminated: Finished execution.

7. Command Line and Jobs
Job Control Commands:
   o $ sleep 60: Runs sleep in the foreground, blocking the shell for 60 seconds.
   o Ctrl+C: Interrupts the process.
   o $ sleep 60 &: Runs the process in the background.
   o $ kill <PID>: Terminates a process using its Process ID.
   o Ctrl+Z: Suspends a running process.
   o $ fg %1: Resumes a suspended job (job 1) in the foreground.
Command Line Example

8. Figures and Their Explanations
Figure: Von Neumann Architecture
Illustrates the foundational design of modern computers:
   o Memory Unit stores data and instructions.
   o Control Unit fetches and decodes instructions.
   o Arithmetic Logic Unit (ALU) performs computations.
   o I/O Devices facilitate user interaction.

Virtual Memory Diagram
Shows how the system creates the abstraction of a larger memory space.
   o Maps virtual addresses to physical addresses so programs can run seamlessly.

Dual Mode Operation
Definition
Dual Mode Operation is a security mechanism used in operating systems to differentiate between two levels of operation:
1. User Mode: For running user-level applications and processes.
2. Kernel Mode (Supervisor Mode): For executing OS-level tasks and accessing critical hardware resources.
This separation ensures that critical system resources and hardware are protected from accidental or malicious misuse by user processes.

How It Works
1. Mode Bit:
   o A special hardware bit (called the mode bit) indicates the current mode:
      ▪ 0 for Kernel Mode
      ▪ 1 for User Mode
   o When the system boots, it starts in Kernel Mode to initialize hardware and system resources. Once the OS has loaded, it switches to User Mode for running applications.
2. Transitions Between Modes:
   o Transitions occur via system calls or interrupts:
      ▪ User Mode → Kernel Mode: When a user process requests an OS service (e.g., opening a file, allocating memory), it makes a system call, which switches the CPU to Kernel Mode.
      ▪ Kernel Mode → User Mode: Once the OS completes the service request, it switches back to User Mode.

Why Dual Mode Operation Is Necessary
1. Protection:
   o Prevents user processes from directly accessing critical system resources (e.g., memory, I/O devices).
   o Protects the kernel from accidental corruption or intentional misuse by malicious code.
2. Controlled Resource Access:
   o The kernel validates all user requests before granting access to hardware or sensitive operations (e.g., device drivers, file systems).
3. Stability:
   o Ensures that a crash in a faulty application does not bring down the entire system.
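The mode bit and the two transitions described above can be mimicked in a few lines of Python. This is a teaching sketch only: `ToyCPU`, `PrivilegeError`, and the method names are hypothetical, and real hardware enforces the check in the processor rather than in software. The sketch uses the same convention as the notes (0 for Kernel Mode, 1 for User Mode).

```python
KERNEL, USER = 0, 1  # mode-bit values as given in the notes


class PrivilegeError(Exception):
    """Raised when a privileged operation is attempted in user mode."""


class ToyCPU:
    def __init__(self):
        self.mode = KERNEL  # the system boots in kernel mode
        self.trace = []     # records what reached the (imaginary) device

    def start_user_program(self):
        # After boot, the OS drops to user mode before running applications.
        self.mode = USER

    def privileged_io(self, data):
        # Direct hardware access is only allowed in kernel mode.
        if self.mode != KERNEL:
            raise PrivilegeError("privileged instruction attempted in user mode")
        self.trace.append(f"device <- {data}")

    def syscall(self, data):
        # User Mode -> Kernel Mode: the system call traps into the kernel.
        self.mode = KERNEL
        try:
            self.privileged_io(data)
        finally:
            # Kernel Mode -> User Mode: control returns to the application.
            self.mode = USER


if __name__ == "__main__":
    cpu = ToyCPU()
    cpu.start_user_program()
    try:
        cpu.privileged_io("raw write")  # blocked: user code cannot touch hardware
    except PrivilegeError as err:
        print("blocked:", err)
    cpu.syscall("write via the kernel")  # allowed: goes through a system call
    print(cpu.trace)
```

The point of the design is that the only path from user code to the device runs through `syscall`, which is where a real kernel would validate the request.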
Task States Diagram Depicts the lifecycle of a process, including transitions between states like Ready, Running, and Blocked. CPU–I/O Burst Cycle Definition The CPU–I/O cycle describes the alternating phases of computation (CPU operations) and input/output operations (e.g., reading from disk, network communication) that occur during a program's execution. Key Components 1. CPU Burst: o A period where the process is actively using the CPU for computations. o Examples: Performing calculations, executing instructions, or modifying data. 2. I/O Burst: o A period where the process is waiting for input/output operations to complete. o Examples: Reading data from a disk, sending/receiving data over a network. Cycle Process 1. A process begins with a CPU burst, where it performs calculations or computations. 2. It reaches a point where it needs data or resources (e.g., from disk or network), initiating an I/O burst. 3. While the process waits for I/O to complete, the CPU is free to execute other processes. 4. Once I/O completes, the process transitions back to the CPU burst phase. Why It’s Important 1. Efficient CPU Utilization: o The OS can schedule other processes during an I/O burst, ensuring the CPU is not idle. o Example: While a process is waiting for a file to be read from disk, another process can use the CPU. 2. Process Performance: o The balance between CPU bursts and I/O bursts determines whether a process is CPU- bound or I/O-bound: ▪ CPU-bound Processes: Spend most of their time performing computations (e.g., simulations, data processing). ▪ I/O-bound Processes: Spend most of their time waiting for I/O operations (e.g., database queries, file transfers). CPU–I/O Burst Distribution A process's CPU burst distribution is critical for designing scheduling algorithms: o Short CPU Bursts: Frequent switching between processes (better suited for Round-Robin scheduling). o Long CPU Bursts: Benefit from fewer context switches (e.g., First Come First Serve scheduling). 
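To make the link between burst lengths and scheduling concrete, the following sketch compares waiting times under First Come First Serve and Round-Robin. It rests on simplifying assumptions that are not in the notes: all processes arrive at time 0, each has a single CPU burst, and context switches cost nothing. The function names and the burst values are illustrative only.

```python
from collections import deque


def fcfs_waiting(bursts):
    """Waiting time per process when bursts run to completion in arrival order."""
    waits, clock = [], 0
    for burst in bursts:
        waits.append(clock)  # a process waits for everything queued before it
        clock += burst
    return waits


def rr_waiting(bursts, quantum):
    """Waiting time per process under Round-Robin with a fixed time quantum."""
    remaining = list(bursts)
    finish = [0] * len(bursts)
    queue = deque(range(len(bursts)))
    clock = 0
    while queue:
        i = queue.popleft()
        run = min(quantum, remaining[i])
        clock += run
        remaining[i] -= run
        if remaining[i] > 0:
            queue.append(i)  # not finished: go to the back of the ready queue
        else:
            finish[i] = clock
    # Waiting time = turnaround time minus the time actually spent on the CPU.
    return [finish[i] - bursts[i] for i in range(len(bursts))]


if __name__ == "__main__":
    bursts = [24, 3, 3]  # one long CPU burst followed by two short ones
    print(fcfs_waiting(bursts))   # [0, 24, 27]
    print(rr_waiting(bursts, 4))  # [6, 4, 7]
```

With one long burst at the head of the queue, FCFS makes the short processes wait behind it, while Round-Robin lets them finish early at the price of more switches for the long process.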
Lecture 3

1. Network Models

TCP/IP Model
A specific model designed for the internet.
Composed of 5 layers, numbered from the bottom up: the Physical Layer is Layer 1, the Data Link Layer is Layer 2, and so on up to the Application Layer (Layer 5). Listed from top to bottom:
Layer 5, Application Layer: Includes protocols like HTTP, FTP, SMTP, DNS, RPC, etc., which handle application-specific tasks.
Layer 4, Transport Layer: Responsible for data transport between devices. Uses:
   ▪ TCP (Transmission Control Protocol): Reliable, connection-oriented communication.
   ▪ UDP (User Datagram Protocol): Faster, connectionless communication.
Layer 3, Network Layer: Manages addressing with IP addresses and handles routing.
Layer 2, Data Link Layer: Uses MAC addresses to address physical hardware and detects transmission errors.
Layer 1, Physical Layer: Handles physical transmission of data (e.g., wires, wireless signals).

OSI Model
A general model applicable to all networks.
Contains 7 layers (similar to TCP/IP, but splits the top of the stack further with separate Session and Presentation layers).

2. Encapsulation in the TCP/IP Model
Encapsulation: Process of wrapping data with protocol headers as it passes down the layers.
   o Application Layer: Produces the application-specific data.
   o Transport Layer: Adds a TCP/UDP header.
   o Network Layer: Adds an IP header (forming a Packet).
   o Data Link Layer: Adds an Ethernet header and trailer (forming a Frame).
   o Physical Layer: Converts the frame into electrical signals, optical pulses, or radio waves.
Decapsulation: Reverse process where headers are removed as data moves up the stack.

Figure Explanation
Each layer adds its specific information:
1. Raw DATA originates at the Application layer.
2. TCP/UDP headers are added at the Transport layer.
3. IP headers are added at the Network layer.
4. Ethernet headers and trailers are added at the Data Link layer.
5. At the Physical layer, data is transmitted as signals.

3. Physical Layer (Layer 1)
Focuses on transmitting raw data over physical media (e.g., cables, wireless).
Examples of hardware:
   o 100BASE-T (Ethernet standard for twisted-pair cables).
   o Cat5 cables: Twisted-pair cables used for Ethernet.
Line Coding:
   o 4B5B: Encodes 4 bits into 5 bits so the signal has enough transitions for clock synchronization.
   o MLT-3: Uses three voltage levels for efficient signaling.
Functions:
   o Encoding: Converts data into signals.
   o Error Detection/Correction: Identifies and corrects physical layer errors.

4. Data Link Layer (Layer 2)
MAC Address:
   o A unique identifier assigned to each network interface (e.g., 0a:42:33:3a:a0:b1).
   o Used for local addressing in LANs.
Switches:
   o Devices operating at Layer 2 that forward data frames based on MAC addresses.
Error Detection:
   o Ensures data integrity using techniques like CRC (Cyclic Redundancy Check).

5. Network Layer (Layer 3)
IP Addressing
IP Address:
   o Unique identifier for a device on a network.
   o Example:
      ▪ Routable IPs: Public addresses like 130.226.87.9.
      ▪ Private IPs (RFC1918): Used within local networks:
      ▪ 192.168.0.0/16, 10.0.0.0/8, 172.16.0.0/12.
   o CIDR (Classless Inter-Domain Routing): Efficient IP allocation (e.g., a /24 prefix covers 256 addresses).

Routing
Mimics sending a letter:
1. Write the destination address (IP of the receiver).
2. Send it to the nearest "post office" (router).
3. The router determines the next step until the letter (data packet) reaches its destination.
Dynamic Routing: Protocols like RIP, OSPF, and BGP dynamically determine the best path.

6. IP Address Assignment
Dynamic Host Configuration Protocol (DHCP):
   o Automatically assigns IP addresses to devices on a network.
   o Example:
      ▪ Server assigns IPs like 192.168.0.10 to a client.
Static IPs:
   o Manually assigned IPs (e.g., 10.0.0.1) for servers and critical devices.

7. Transport Layer (Layer 4)
Responsible for process-to-process communication.
Ports:
   o Logical communication endpoints.
   o Example:
      ▪ Port 80: HTTP traffic.
      ▪ Port 443: HTTPS traffic.
TCP vs. UDP:
   o TCP: Reliable, ensures all data is received and ordered correctly.
o UDP: Lightweight, suitable for real-time applications like video streaming. 8. Network Address Translation (NAT) Allows multiple devices on a private network to share a single public IP address. Types: o Source NAT (SNAT): Replaces the private source IP with the router's public IP. o Destination NAT (DNAT): Replaces the destination public IP with a private IP (used for port forwarding). 9. VLANs (Virtual LANs) IEEE 802.1Q standard. Allows segmentation of a physical network into multiple logical networks. Example: o VLAN 10: Employees. o VLAN 20: Guests. Improves security and reduces broadcast domain size. 10. IPv4 vs. IPv6 IPv4: o 32-bit addresses (e.g., 192.168.0.1). o Limit: 4.3 billion addresses. IPv6: o 128-bit addresses (e.g., 2001:0db8::1428:57ab). o Virtually unlimited addresses to accommodate future growth. Figures and Their Explanations 1. Encapsulation Process Figure: Shows how raw data moves through layers, gaining additional headers (TCP, IP, Ethernet) at each step. The resulting unit (frame) is transmitted across the network. 2. Routing Analogy (Letter Example) Figure: Compares routing to sending a letter via post offices. Each "post office" (router) examines the destination address and forwards the letter to the next appropriate stop until it reaches the recipient. 3. IP Assignment via DHCP Figure: Demonstrates how a DHCP server assigns IPs dynamically to clients on a network. 4. NAT Process Figure: Shows how a private IP (192.168.0.10) is replaced by the router's public IP (73.14.27.8) when sending data to the internet. 5. VLAN Segmentation Figure: Displays how VLANs segment a network into logical groups, allowing different traffic flows. How SNAT Works 1. Private Network: o Devices within the local network have private IP addresses (e.g., 192.168.0.x or 10.0.0.x), which cannot be routed directly on the internet. 2. 
SNAT Process: o When a device (e.g., 192.168.0.10) sends a request to an external server, the NAT device (router or gateway) replaces the device's private IP with the NAT device's public IP (e.g., 73.14.27.8) in the packet's header. 3. Response Handling: o When the external server responds, it sends the data back to the public IP of the NAT device. o The NAT device then translates the public IP back to the private IP of the original sender (192.168.0.10) and forwards the response. Key Features of SNAT IP Address Translation: o Converts multiple private IP addresses to a single public IP address for external communication. Port Tracking: o SNAT keeps track of outgoing connections using port numbers to ensure incoming responses are delivered to the correct internal device. o For example: ▪ Device A: 192.168.0.10 uses Port 50001. ▪ Device B: 192.168.0.11 uses Port 50002. ▪ Both use the same public IP (73.14.27.8), but the NAT distinguishes devices by their port numbers. Preservation of Internal Network: o The internal network's private IP addresses remain hidden from the external network, enhancing security. Diagram Explanation Imagine a private network with two devices: Device A: 192.168.0.10 Device B: 192.168.0.11 When Device A initiates a request to a server (e.g., 92.234.1.9), the following happens: 1. The router replaces 192.168.0.10 with the public IP 73.14.27.8 in the packet's source field. 2. The server responds to 73.14.27.8. 3. The router identifies the response as belonging to Device A (based on port mapping) and forwards it back to 192.168.0.10. How VLANs Work 1. Tagging: o VLANs are identified using VLAN tags inserted into Ethernet frames. The tag contains the VLAN ID (a unique identifier between 1 and 4094). o Example: ▪ VLAN 10: Devices for employees. ▪ VLAN 20: Devices for guests. o Tags are added by switches using the IEEE 802.1Q standard. 2. Switch Configuration: o Ports on a switch can be configured as: ▪ Access Ports: ▪ Assigned to a specific VLAN. 
      ▪ Used for connecting devices like PCs or printers.
   ▪ Trunk Ports:
      ▪ Allow traffic from multiple VLANs to pass through.
      ▪ Commonly used to connect switches or routers.
3. Broadcast Domains:
   o Each VLAN creates a separate broadcast domain, reducing unnecessary traffic.

Diagram Explanation
Example: VLAN Setup
Physical Setup:
   o A single switch connects devices from different departments (e.g., HR, IT, and Guests).
Logical Segmentation:
   o VLAN 10: Devices in HR (e.g., 192.168.10.x).
   o VLAN 20: Devices in IT (e.g., 192.168.20.x).
   o VLAN 30: Guest devices (e.g., 192.168.30.x).
Trunk Port:
   o A router connects to the switch's trunk port to enable inter-VLAN communication.

Lecture 4

2. Switches
What is a Switch?
A networking device that connects multiple devices within the same network.
Operates primarily at Layer 2 (Data Link Layer) and sometimes at Layer 3 (Network Layer).
Types of Switches:
1. Unmanaged Switch:
   o Operates at the Data Link Layer (Layer 2).
   o Forwards frames based on MAC addresses.
   o Simple, plug-and-play functionality.
   o Example: Expanding the number of devices in a small network.
2. Managed Switch:
   o Configurable; can operate at Layer 3 in addition to Layer 2 and provides advanced features:
      ▪ VLAN support.
      ▪ Traffic prioritization.
      ▪ Layer 3 models forward data using both MAC addresses and IP addresses.
   o Used in larger, more complex networks.
Figure Explanation:
Switch Diagram: Illustrates how devices connect to a switch. A switch forwards frames only to the device they are intended for, reducing unnecessary traffic compared to hubs.

3. Routers
What is a Router?
A networking device that connects multiple networks.
Operates at the Network Layer (Layer 3).
Forwards packets between networks using IP addresses.
Functions:
1. Routing Table:
   o Determines the best path for forwarding data packets.
2. Traffic Management:
   o Manages and prioritizes network traffic.
3. Network Segmentation:
   o Isolates different subnets for better security and performance.
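How a routing table "determines the best path" can be sketched with Python's standard `ipaddress` module. The rule shown is longest-prefix match, the usual way IP routers pick among overlapping routes. The table entries and next-hop names below are made up for illustration (the subnets echo the VLAN example above), and a real router's table also carries metrics and interfaces that are left out here.

```python
import ipaddress


def next_hop(routing_table, destination):
    """Return the next hop of the most specific route containing `destination`."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routing_table if dest in net]
    if not matches:
        return None  # no route, not even a default one
    # Longest-prefix match: the route with the largest prefix length wins.
    best_net, best_hop = max(matches, key=lambda m: m[0].prefixlen)
    return best_hop


# Hypothetical routing table: (destination network, next hop).
TABLE = [
    (ipaddress.ip_network("0.0.0.0/0"), "isp-gateway"),             # default route
    (ipaddress.ip_network("192.168.0.0/16"), "core-switch"),
    (ipaddress.ip_network("192.168.10.0/24"), "vlan10-interface"),
]

if __name__ == "__main__":
    print(next_hop(TABLE, "192.168.10.7"))  # vlan10-interface (the /24 beats the /16)
    print(next_hop(TABLE, "192.168.20.5"))  # core-switch
    print(next_hop(TABLE, "130.226.87.9"))  # isp-gateway (only the default matches)
```

The default route `0.0.0.0/0` matches every address, so it is chosen only when no more specific subnet applies.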
Figure Explanation:
Router Diagram: Shows how routers connect two or more networks and forward data between them.

4. Transmission Types
Unicast: Data sent to a single recipient.
Multicast: Data sent to a specific group of recipients.
Broadcast: Data sent to all devices in the network.
Figure Explanation:
Diagrams display the direction of data flow:
   o Unicast: One-to-one.
   o Multicast: One-to-many (specific).
   o Broadcast: One-to-all.

5. Address Resolution Protocol (ARP) [Layer 2]
Resolves IP addresses to MAC addresses for communication within the same local network.
How It Works:
1. A device sends an ARP request: "Who has IP 10.0.0.2? Tell 10.0.0.3."
   o Sent as a broadcast.
2. The target device (10.0.0.2) replies with its MAC address: "I am 10.0.0.2."
3. Devices store this information in the ARP Cache/Table for future use.
Figure Explanation:
Shows a device broadcasting an ARP request and the reply being sent back.

6. Transmission Control Protocol (TCP) [Layer 4]
Ensures reliable communication between devices.
Three-Way Handshake:
1. SYN: The client sends a request to establish a connection.
2. SYN/ACK: The server acknowledges the request and agrees to establish a connection.
3. ACK: The client confirms the connection is established.
Figure Explanation:
A sequence diagram illustrates the handshake between a client and a server.

7. Domain Name System (DNS) [Layer 5]
Purpose: Translates domain names (e.g., sdu.dk) into IP addresses (e.g., 20.105.224.27).
How It Works:
1. Client Request:
   o The client queries a DNS resolver (e.g., 1.1.1.1) for a domain name.
2. Recursive Resolution:
   o Unless the answer is already cached, the resolver asks a root server, which refers it to the Top-Level Domain (TLD) servers (e.g., .dk).
   o TLD servers point to the Authoritative Name Server for the domain.
3. Response:
   o The authoritative server returns the IP address for the domain.
DNS Record Types:
A: Maps a domain to an IPv4 address.
AAAA: Maps a domain to an IPv6 address.
CNAME: Points a domain to another domain.
MX: Specifies the mail server for the domain. TXT: Stores arbitrary text (e.g., for verification purposes). NS: Lists name servers for the domain. 8. HyperText Transfer Protocol (HTTP) [Layer 5] Purpose: Protocol used for transmitting web data. Components of a Request: 1. Method: o Common HTTP methods: GET, POST, PUT, DELETE. o Example: GET /sdu.dk/something. 2. Headers: o Metadata sent with the request (e.g., Authorization, Content-Type). 3. Response: o Status code (e.g., 200 OK, 404 Not Found). o Headers and body (e.g., HTML content). Figure Explanation: Illustrates a client making a GET request to a server and the server responding with a status code and content. 9. Autonomous Systems (AS) Definition: Large networks that make up the internet. Controlled by organizations like ISPs, universities, or governments. Functions: 1. IP Address Space Management: o Each AS controls a block of IP addresses. 2. Routing Policies: o Determines how data is routed between ASes. Routing Protocols: 1. Within AS: o Routing Information Protocol (RIP): ▪ Uses hop count to find the shortest path. o Open Shortest Path First (OSPF): ▪ Dynamically identifies the best path considering cost and state. 2. Between ASes: o Border Gateway Protocol (BGP): ▪ Announces which networks control which IP addresses. ▪ Connects different ASes together. Figure Explanation: Shows how BGP connects autonomous systems and advertises IP routes. Key Terms 1. Routing Table: o A database used by routers to store paths to different networks. 2. Hop Count: o The number of routers a packet passes through to reach its destination. 3. IP Address: o A unique identifier for a device on a network (e.g., IPv4: 192.168.0.1, IPv6: 2001:db8::). 4. MAC Address: o A hardware address unique to a device's network interface. Lecture 5 Batch vs. Stream Processing 1. Batch Processing: o Processes large blocks of data that are saved in storage units before being analyzed. 
   o Significant latency because the data is only analyzed after being stored.
   o Example: Advanced stock exchange processing where historical data is processed to identify trends.
   o Figure Explanation: The figure visually illustrates how data is collected in chunks (batches) and then processed collectively after being stored.
2. Stream Processing:
   o Processes a continuous stream of data without a defined start or end.
   o Data is analyzed in real time, even before hitting storage units.
   o Results in very low latency and allows near-immediate responses.
   o Example: Live stock exchange applications where price fluctuations are monitored in real time.
   o Figure Explanation: The figure compares stream processing with batch processing, showing that streaming handles live, flowing data continuously.

Key Design Differences Between Batch and Stream Processing
Implications:
Batch processing suits scenarios where accuracy and error correction are critical (e.g., financial audits). In contrast, stream processing is better for real-time applications where speed is prioritized, such as fraud detection.

Layer 4: Transport Protocols
1. Transmission Control Protocol (TCP):
   o Reliable delivery is the main goal.
   o Characteristics:
      ▪ Substantial overhead due to connection management.
      ▪ Uses a three-way handshake to establish a connection.
      ▪ Ensures in-order packet delivery and retransmits lost packets.
      ▪ Detects corrupted data with checksums and recovers by retransmission, favoring data integrity over speed.
   o Use Case: Banking transactions where data accuracy is vital.
2. User Datagram Protocol (UDP):
   o Prioritizes speed and timely delivery over data integrity.
   o Characteristics:
      ▪ Minimal overhead with no handshake.
      ▪ Does not guarantee packet ordering or retransmissions.
      ▪ Suits applications that care more about the most recent data than about recovering lost data.
   o Use Case: Live streaming or online gaming where latency is critical.
Comparison Summary:
TCP ensures accuracy and reliability, while UDP focuses on speed and efficiency.
Choose TCP for critical systems (e.g., file transfers) and UDP for real-time applications.

Protocol Stacks
Definition: Protocols are arranged in layers to handle specific communication tasks. Each layer builds on the functionalities of the layer below it.
Figure Explanation: A visual depiction of the layered communication model shows how data flows through these layers.

Publish-Subscribe Architecture
Definition: A messaging pattern where publishers send messages to topics, and subscribers express interest in receiving updates from these topics.
Figure Explanation: The figure depicts publishers broadcasting to topics and subscribers receiving messages of their interest. This structure supports scalability and decoupling in data delivery.
Practical Application: Used in systems like Apache Kafka, where streaming data is processed efficiently across distributed systems.

Video on Demand Protocols
Purpose: Protocols to deliver pre-recorded video content efficiently, focusing on minimizing latency and ensuring smooth playback.
Figure Explanation: Depicts how low-latency protocols are implemented to ensure real-time interaction, especially for applications like sports streaming.

Apache Kafka
1. What is Kafka?
   o A distributed, horizontally scalable, fault-tolerant commit log system.
   o Originally developed at LinkedIn for real-time data pipelines and streaming applications.
   o Capabilities:
      ▪ Publish-subscribe to streams.
      ▪ Store streams persistently.
   o Kafka is designed to handle high-throughput, low-latency data ingestion.
2. How Kafka Works:
   o Messages are produced by producers and categorized into topics.
   o Consumers subscribe to topics and retrieve messages.
   o Kafka ensures scalability by dividing topics into partitions distributed across multiple servers.
3. Figures Explanation:
   o Kafka Cluster Figure: Shows how Kafka operates as a distributed system, with multiple brokers managing partitions of data for scalability and fault tolerance.
o Consumer Groups Figure: Depicts how multiple consumers can read from a topic concurrently, ensuring parallel processing. 4. Example Use Case: o Counting People at Rosengårdscenteret: ▪ Cameras at entrances determine whether individuals are entering or exiting. ▪ Each camera sends data to Kafka, which categorizes it into specific streams for real- time analysis. Key Takeaways 1. Batch Processing: o Processes historical data in bulk, emphasizing accuracy and reliability. 2. Stream Processing: o Processes live data in real-time for timely decisions. 3. Publish-Subscribe Architecture: o Efficient and scalable messaging pattern for dynamic data delivery. 4. Transport Protocols: o TCP for reliability and data integrity; UDP for speed and low latency. Apache Kafka's Role: A robust tool for building real-time data pipelines, enabling scalable and fault-tolerant streaming applications. Lecture 6 Container Orchestration Overview Definition: o A system that manages the configuration, coordination, and lifecycle of containers. o Operates as a cluster, each running a container runtime (e.g., Docker). Containers Features: o Isolation o Portability o Dependency management o Maintainability o Deployability Example (docker-compose): Why Container Orchestration? Deployment to clusters High availability Deployment as code Automatic scaling Health checks Rollout & rollback strategies Zero downtime updates Automatic recovery Kubernetes Purpose: o Manages containerized workloads and services. o Features include automatic deployment, scaling, and operations of applications. Key Features: o Namespacing o Load balancing o Autoscaling o Health checks o Rollout and rollback strategies Core Functions: o Monitors containers. o Ensures deployments align with the desired state. Components 1. Worker-Plane: o kubelet: Manages individual node-level tasks. o kube-proxy: Handles networking. 2. Control-Plane: o Scheduler: Assigns workloads to nodes. o Controller Manager: Handles state maintenance. 
o Cloud Controller Manager: Integrates with cloud providers. o etcd: Key-value store for configuration data. o API Server: Central point of interaction. kubectl (Command-Line Interface) General Syntax: $ kubectl Imperative Commands: o $ kubectl create o $ kubectl delete o $ kubectl edit Declarative Commands: o $ kubectl apply -f o $ kubectl delete -f Kubernetes Objects 1. Nodes: o Server in the Worker-Plane. o Example commands: ▪ $ kubectl get nodes ▪ $ kubectl describe nodes 2. Pods: o Smallest Kubernetes object. o Features: ▪ Lifecycle-managed. ▪ Contains one or more containers. ▪ Addressable via IP. ▪ Sidecar Pattern: Containers share fate ("ride together, die together"). o Example YAML: 3. Namespaces: Virtual clusters within a Kubernetes cluster. 4. Deployments: Manages and scales pods. Example YAML: Networking Pod IPs: o Each pod and node has an IP. o Pods are ephemeral; new ones are assigned fresh IPs. Services: o Used to expose pod endpoints. Types of Services: 1. ClusterIP: Internal cluster communication. 2. NodePort: Accessible from external sources via specified port. 3. LoadBalancer: Provides a front-end load balancer (requires setup). Example YAML (ClusterIP): Secrets Securely inject confidential data into pods. o Can be mounted as a file or set as environment variables. Example YAML: Lecture 7 File Systems & Storage Block Devices Represented as files in /dev/ (e.g., /dev/sda, /dev/nvme0). Types include USB flash drives, solid-state drives (SSDs), and M.2 storage devices. Fragmentation Fragmentation refers to the process by which storage space on a disk or memory becomes inefficiently used over time, leading to gaps or fragmented sections where data is not stored contiguously. Fragmentation can occur in various contexts, including file systems, memory management, and network communication. It can lead to reduced performance and increased access time, as data becomes scattered rather than stored in a continuous block. 
There are two main types of fragmentation: 1. File System Fragmentation In the context of a file system, fragmentation occurs when files are split into pieces and stored non-contiguously across the storage medium. This happens as files are created, modified, and deleted over time. 2. Memory Fragmentation Memory fragmentation occurs in a computer's RAM (random access memory) when free memory is scattered in small, non-contiguous blocks. This often happens when memory is allocated and deallocated dynamically over time. Random vs. Sequential Access Performance differences based on data access patterns: o Random Access: Slower read/write speeds. o Sequential Access: Faster read/write speeds. Example: 1TB Seagate HDD performance o Random Read: 0.87 MB/s; Write: 1.53 MB/s. o Sequential Read: 173 MB/s; Write: 159 MB/s. Sectors & Blocks Sector: Physical disk division; typically 512 bytes on older drives or 4 KB on modern ones. Block: Logical file system division; typically 4 KB. Performance metrics include IOPS (I/O Operations Per Second): o HDD: 100–1500 IOPS. o SSD: 35,000+ IOPS. o M.2: 1,000,000+ IOPS. HDD vs. SSD vs. RAM Comparison File Systems Stores data as: o Binary data: File contents. o Metadata: Includes timestamps (created, edited, accessed), size, owner, and group. Types of File Systems Physical File Systems: FAT32, EXT4, EXT3, NTFS, APFS. Virtual File Systems: Interface for managing multiple file systems. Logical File Systems: Manage higher-level functionality like file types and directories. Physical File System (PFS) deals with the actual storage on the physical device and how data is written and read. Virtual File System (VFS) provides a common interface for accessing files across different file systems, abstracting away the details of the underlying physical file system. Logical File System (LFS) deals with the user-level operations, such as managing file names, metadata, and permissions, as well as handling file access patterns and logical organization of files. Simplified Example Flow: 1. 
A user or program requests to open a file using its name. 2. The logical file system checks the file's metadata (e.g., existence, permissions) and identifies its location. 3. The virtual file system interfaces with the appropriate physical file system based on the file’s location (e.g., local storage, network storage). 4. The physical file system handles reading or writing data blocks from/to the actual storage medium. Directories Linux: Filename and file index (inode). Windows: Filename, metadata, and data pointer. Inodes Data structure storing metadata about files/directories. Contains size, ownership, permissions, timestamps. Does not store filenames (managed by directories). Mounting File Systems Command examples: o Mount: mount /dev/sda /mnt/usbkey o Unmount: umount /dev/sda Journaling: o Logs uncommitted changes to reduce corruption risk during system crashes. o Examples: EXT3, EXT4. Next-Generation File Systems Address advanced features like: o Bit-rot prevention. o Snapshots, checksumming, and self-healing. Examples: btrfs, zfs, hdfs. RAID (Redundant Array of Independent Disks) Overview Combines multiple disks for improved performance and reliability. RAID Levels 1. RAID 0 (Striping): o Data split across multiple drives. o Advantages: Improved performance. o Disadvantages: No reliability (data loss on single drive failure). 2. RAID 1 (Mirroring): o Duplicate data across drives. o Advantages: High reliability. o Disadvantages: Inefficient storage use. 3. RAID 4 (Striping with Dedicated Parity): o Data split across drives, with one storing parity bits. o Advantages: Reliability with minimal space waste. o Disadvantages: Slight performance loss. 4. RAID 6 (Striping with Dual Parity): o Data and parity bits spread across drives. o Advantages: High reliability (survives two drive failures). o Disadvantages: Minor performance and space efficiency trade-offs. Key Concepts for Exam 1. Storage Types & Access Patterns: o Differences between HDDs, SSDs, and RAM. 
o Random vs. Sequential access implications. 2. File Systems: o Differences between physical, virtual, and logical file systems. o Purpose and structure of inodes. 3. RAID Configurations: o Key features, advantages, and trade-offs of RAID levels (0, 1, 4, 6). 4. Journaling and Advanced File Systems: o Benefits of journaling in modern file systems. o Features of btrfs, zfs, and hdfs. 5. Command Usage: o Basic commands for managing partitions and mounting file systems. 6. IOPS Metrics: o Comparison between HDDs, SSDs, and M.2 storage for performance benchmarks. Lecture 8 Memory Random Access Memory (RAM) Address Register: Holds the addresses of memory locations. Data stored in binary format, e.g., 10110111. Virtual Memory Provides the illusion of a larger memory space than physically available. Maps virtual addresses to physical memory. Process Memory Divided into segments: o Code Segment: Stores the program's executable code. o Data Segment: Stores global and static variables. o Heap: Dynamic memory allocation. o Stack: Function calls and local variables. Multitasking Processors Single Processor: Executes tasks sequentially. Multiprocessor Systems: o Parallel systems (tightly coupled). o Advantages: ▪ Increased throughput. ▪ Cost efficiency (economy of scale). ▪ Improved reliability (fault tolerance). Multicore Processors Multiple cores in a single processor enhance parallelism. Hyperthreading (Intel): Enables multiple threads to execute per core. Cache Levels: o Hardware Cache: Closest to the processor, fastest access. o OS Cache: Utilized to store frequently accessed data. o Software Cache: Temporary storage for repeated operations. Cache management is critical for performance optimization. Multithreading Multitasking vs. Multithreading Multitasking: o Process-based. o Programs run by the operating system. o Easier to implement but memory-intensive. Multithreading: o Thread-based (within a single program). 
o Faster development but prone to errors (e.g., race conditions). Concurrency Issues Race Condition: Two or more threads access shared resources simultaneously, leading to unpredictable results. Atomicity: Ensures operations are indivisible (either fully completed or not executed). Mutex (Mutual Exclusion): Used to lock resources to prevent concurrent access. o Example: Deadlocks Occurs when two or more processes are indefinitely waiting for events triggered by one another. Process States 1. New: Process is being created. 2. Ready: Waiting to be assigned to a processor. 3. Running: Executing instructions. 4. Waiting: Waiting for an event. 5. Terminated: Execution completed. Concurrency vs. Parallelism Concurrency: Multiple tasks make progress by sharing resources. Parallelism: Tasks are executed simultaneously. Inter-Process Communication (IPC) Definition Mechanisms for processes to communicate and synchronize with each other. Types of IPC 1. Shared Memory: o Processes share a memory region. o Faster than other IPC mechanisms. 2. Message Passing: o Data is exchanged between processes using messages. o Modes: ▪ Synchronous: Sender waits for receiver. ▪ Asynchronous: Non-blocking communication. Sync. Vs Async Linux IPC Mechanisms 1. Pipes: o Unidirectional communication. o Example: $ cat /etc/passwd | awk -F ":" '{print $1}' | sort | head -n 10 2. Named Pipes (FIFOs): o Persistent pipes with a name in the file system. o Commands: ▪ $ mkfifo pipe1 ▪ $ echo {Text} > pipe1 ▪ $ cat pipe1 3. UNIX Sockets: o Bi-directional communication. o Abstracts TCP/IP for local systems. o Example: Docker uses /var/run/docker.sock. 4. DBUS: o Systemwide bus for inter-process communication. o Used for GUI applications. IPC Between Computers Implemented using sockets. TCP Server Example (Python): TCP Client Example (Python): Key Topics for Exam 1. Memory: o Differences between physical and virtual memory. o Process memory segmentation. 2. 
Concurrency Issues: o Race conditions, atomicity, and deadlocks. o Mutex usage and thread management. 3. Multitasking and Multithreading: o Advantages of multicore systems and hyperthreading. o Differences between multitasking and multithreading. 4. IPC Mechanisms: o Shared memory vs. message passing. o Linux-specific IPC tools: pipes, named pipes, UNIX sockets, DBUS. 5. Python Sockets: o Understanding server-client interaction for network-based IPC. Lecture 9 Computer Devices 1. Computational Devices: o Servers 2. Networking Devices: o Access Points, Switches, Routers, Modems 3. Storage Devices: o Network Attached Storage (NAS) o Storage Area Network (SAN) Database-Centered Architecture Components: o Frontend: Webservers handling user interactions. o Backend: API servers for processing business logic. o Database: Persistent storage for data. Example Transaction: CAP Theorem Distributed systems can uphold only two of the following guarantees: 1. Consistency: Ensures the most recent write is visible to all reads. 2. Availability: Guarantees responses to every request (but data might not be the latest). 3. Partition Tolerance: System operates despite dropped messages between nodes. CAP Trade-offs: o Consistency & Partition Tolerance: Non-consistent nodes are unavailable during partitions. o Availability & Partition Tolerance: Nodes stay online but may return outdated data. o Consistency & Availability: Limited to a single machine without fault tolerance. ACID Properties of Database Transactions 1. Atomicity: Transactions execute completely or not at all. 2. Consistency: Transactions leave the database in a valid state. 3. Isolation: Concurrent transactions appear serialized. 4. Durability: Completed transactions are permanent. Quality Attributes Availability Ensures the system is operational when needed. Metrics: o MTBF (Mean Time Between Failures). o MTTR (Mean Time To Repair). o $\text{Availability} = \frac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$. 
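As a worked illustration of the availability metric above, the following sketch computes availability from MTBF and MTTR and converts it into expected downtime per year. The function names and the sample figures (1,000 hours between failures, 2 hours to repair) are made up for illustration and do not come from the lecture.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years; fine for a rough estimate


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as the fraction MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def expected_downtime_hours_per_year(avail: float) -> float:
    """Hours per year the system is expected to be unavailable."""
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical system: fails every 1,000 h on average, takes 2 h to repair.
    a = availability(1000, 2)
    print(f"availability: {a:.4%}")
    print(f"expected downtime: {expected_downtime_hours_per_year(a):.1f} h/year")
```

The formula shows why both tactics families matter: availability rises either by failing less often (larger MTBF) or by recovering faster (smaller MTTR).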
Tactics: o Fault Detection: Heartbeats, ping/echo. o Fault Recovery: Redundant spare systems, retries. o Fault Prevention: Predictive models, removing faulty services. Deployability Efficient transitions from development to production. Tactics: o Deployment pipeline management. o Scaled rollouts and rollback mechanisms. Energy Efficiency Balancing energy usage with performance. Tactics: o Monitor and allocate resources dynamically. o Reduce resource demand and prioritize critical tasks. Modifiability The effort required to implement changes. Tactics: o Increase cohesion: Split modules, redistribute responsibilities. o Reduce coupling: Encapsulation, intermediaries. o Defer binding: Flexible component replacement. Performance Ability to meet timing requirements. Tactics: o Control resource demand: Limit event response and computational overhead. o Manage resources: Introduce concurrency, increase resources. Safety Avoid unsafe states and minimize resulting harm. Tactics: o Detect unsafe states: Sanity checks. o Containment: Redundancy, replication. o Consequence mitigation: Recovery mechanisms like rollback. Security Protect data from unauthorized access while enabling access for legitimate users. CIA Triad: 1. Confidentiality: Prevent unauthorized access. 2. Integrity: Protect against unauthorized data modification. 3. Availability: Ensure system remains operational. Tactics: o Attack Detection: Intrusion detection systems. o Attack Resistance: Authentication mechanisms. o Attack Reaction: Revoking access, auditing. Testability Ease of identifying software faults during testing. Tactics: o Control system state with specialized interfaces. o Limit structural complexity. Key Exam Topics 1. CAP Theorem: Understand trade-offs in distributed systems. 2. ACID Properties: Importance of transaction guarantees in databases. 3. Quality Attributes: Key metrics and tactics for availability, deployability, and performance. 4. 
Database-Centered Architecture: Frontend, backend, and database interactions. 5. Energy Efficiency: Strategies for reducing resource usage. 6. Security: Focus on the CIA triad and mitigation tactics. Lecture 10 System Architectures 1. Monolithic Architecture Single codebase where all functionalities are tightly coupled. Advantages: o Simple deployment and testing. Disadvantages: o Difficult to scale and maintain as the application grows. 2. Client-Server Architecture Two components: o Client: Sends requests to the server. o Server: Processes and responds to client requests. 3. Publish-Subscribe Architecture Components: o Publisher: Sends messages. o Subscriber: Receives messages through a broker. Decouples producers and consumers of data. 4. Microservices Architecture Application broken into smaller, independent services. Advantages: o Easy to scale and maintain individual components. o Fault isolation. Disadvantages: o Increased complexity and overhead. 12-Factor Application Principles 1. Codebase: One repository per application. 2. Dependencies: Explicit declaration of dependencies. 3. Config: Stored in the environment. 4. Backing Services: Treat external services as attached resources. 5. Build, Release, Run: Strictly separate the build, release, and run stages. 6. Processes: Stateless, scalable processes. 7. Port Binding: Export services through ports. 8. Concurrency: Leverage concurrency to scale. 9. Disposability: Fast startup and graceful shutdown. 10. Dev/Prod Parity: Maintain similar environments for development and production. 11. Logs: Stream logs to stdout. 12. Admin Processes: Execute as one-off tasks. XaaS (Anything as a Service) Definition: Collective term for delivering IT services. Examples: 1. SaaS (Software as a Service): e.g., Google Docs. 2. PaaS (Platform as a Service): e.g., Heroku. 3. IaaS (Infrastructure as a Service): e.g., AWS EC2. 4. On-Premises: Locally hosted solutions. Raft Consensus Algorithm Purpose: Achieve consensus in distributed systems. 
Key Features: o Fault-tolerant; operational as long as a majority of servers are functional. o Guarantees finality of decisions. Leader Election: o Random election timeout triggers a candidate state. o Follower nodes vote for the candidate. o Majority votes elect a leader. Log Replication: o Leaders replicate logs across followers. o Logs are committed once a majority agrees. Certificates Definition: Prove the validity of public keys. Components: o Issuer, validity period, subject name, signature. Certificate Generation Process: 1. Generate a private-public key pair. 2. Create a Certificate Signing Request (CSR). 3. Submit CSR to a Certificate Authority (CA). 4. CA verifies and issues the certificate. Self-Signed Certificates: o Not issued by trusted authorities. o Enables encryption but lacks trust validation. Take-Home Assignment Overview Task: Create a three-container application (proxy, backend, database). Requirements: o Proxy: ▪ NGINX server enabling HTTPS via port 443. ▪ Serves HTML files and forwards API requests. ▪ Use self-signed certificates stored in /etc/nginx/ssl/. o Backend: ▪ Serve two endpoints via port 5000: ▪ POST /person/ ▪ GET /persons/ ▪ Handles data insertion and retrieval for the database. o Database: ▪ MySQL server with a table Person containing: ▪ PersonID (Auto-incrementing integer). ▪ Firstname (Text). ▪ Lastname (Text). Testing Checklist 1. Execute: docker compose exec -T newman newman run --env-var url=proxy od-2023-test.json 2. Validate certificates: docker compose exec -T proxy openssl verify -CAfile /etc/nginx/ssl/rootCA.pem /etc/nginx/ssl/site.crt7 3. Ensure no database exposure: wget --no-check-certificate --timeout 1 -qO /dev/null http://localhost:3306 || echo "Database not exposed" 4. Ensure backend is not exposed: wget --no-check-certificate --timeout 1 -qO /dev/null http://localhost:5000/ || echo "Backend not exposed" Key Topics for Exam 1. System Architectures: o Monolithic vs. Microservices vs. Client-Server vs. 
Publish-Subscribe. 2. Raft Consensus Algorithm: o Leader election and log replication mechanisms. 3. Certificates: o Process and use cases for self-signed and CA-issued certificates. 4. 12-Factor Application Principles: o Best practices for building scalable and maintainable applications. 5. XaaS: Understanding SaaS, PaaS, and IaaS distinctions. Lecture 11 Firewalls Purpose: Separate and protect networks by filtering incoming and outgoing traffic. Benefits: o Protect data, identity, and resources. o Defend against attacks like DoS/DDoS and smurf attacks. Firewall Policies: 1. Actions: o Accept o Drop o Reject 2. Default Policies: o Default Drop: Block all traffic except explicitly allowed. o Default Permit: Allow all traffic except explicitly denied. 3. Zones: o Physical or logical separation of network areas (e.g., DMZ). Load Balancing Definition: Distributes incoming traffic across multiple servers to ensure availability and reliability. Advantages: o Avoid single points of failure. o High availability and redundancy. Types: o Layer 4 (transport layer). o Layer 7 (application layer) in the OSI model. DNS (Domain Name System) Purpose: Acts as the internet's phonebook, mapping domain names to IP addresses. DNS Record Types: SSL/TLS 1. Overview of SSL/TLS SSL/TLS establishes an encrypted and secure channel for communication by: Encrypting data, so even if it's intercepted, it can't be read. Authenticating both parties (client and server), ensuring they are who they claim to be. Ensuring data integrity, so data cannot be altered during transmission. SSL has evolved into TLS, but the term "SSL" is still commonly used when referring to both protocols. 2. SSL/TLS Handshake Process The SSL/TLS handshake is the initial step in establishing a secure connection. It involves several key steps: Step 1: Client Hello The client (usually a web browser) initiates the handshake by sending a "Client Hello" message to the server. 
This message includes: o SSL/TLS version: The version of the protocol supported by the client (e.g., TLS 1.2 or TLS 1.3). o Cipher Suites: A list of cryptographic algorithms the client supports (e.g., AES, RSA). o Random Number: A random value generated by the client, used in creating session keys. o Session ID: A unique identifier for the session (used in session resumption). Step 2: Server Hello The server responds with a "Server Hello" message. This message includes: o Chosen Cipher Suite: The cipher suite selected by the server from the client’s list. o Random Number: Another random value generated by the server, used in key creation. o Server Certificate: The server’s SSL/TLS certificate, which contains the server’s public key and other identifying information. o Session ID: The server's session identifier if session resumption is used. Step 3: Server Authentication and Key Exchange The server sends its digital certificate (issued by a trusted Certificate Authority, or CA) to authenticate its identity. The certificate contains the server's public key and information about the server’s identity. The client verifies the certificate by checking its authenticity with the CA’s public key (usually bundled with the client’s software or operating system). o If the certificate is valid and matches the server, the client proceeds. o If the certificate is invalid, the client will raise an error (e.g., warning about an insecure connection). Step 4: Client Key Exchange The client generates a pre-master secret (a shared secret) and encrypts it with the server’s public key from the certificate. It then sends the encrypted pre-master secret to the server. Both the client and server will independently generate the same session keys based on the pre-master secret and the two random numbers exchanged earlier (one from the client and one from the server). 
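Step 4 says that both sides arrive at the same keys from the pre-master secret and the two random values. The sketch below illustrates that idea using the HMAC-SHA256 expansion that TLS 1.2 defines for its pseudorandom function (RFC 5246). It is a teaching sketch with randomly generated stand-in inputs, not a TLS implementation; the function names `p_sha256` and `derive_master_secret` are chosen here for illustration.

```python
import hashlib
import hmac
import os


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """HMAC-SHA256 data expansion.

    A(0) = seed, A(i) = HMAC(secret, A(i-1));
    output = HMAC(secret, A(1) + seed) + HMAC(secret, A(2) + seed) + ...
    truncated to `length` bytes.
    """
    out = b""
    a = seed
    while len(out) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        out += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return out[:length]


def derive_master_secret(pre_master: bytes,
                         client_random: bytes,
                         server_random: bytes) -> bytes:
    """48-byte master secret from the pre-master secret and both randoms."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(pre_master, seed, 48)


if __name__ == "__main__":
    # Stand-in values; in a real handshake these come from the Hello
    # messages and the Client Key Exchange.
    client_random = os.urandom(32)
    server_random = os.urandom(32)
    pre_master = os.urandom(48)

    # Each side runs the same computation on the same three inputs.
    client_side = derive_master_secret(pre_master, client_random, server_random)
    server_side = derive_master_secret(pre_master, client_random, server_random)
    print("both sides agree:", client_side == server_side)
```

Because the derivation is deterministic, no key material beyond the encrypted pre-master secret has to cross the network; mixing in both random values means a replayed handshake produces different keys.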
Step 5: Session Key Generation Both parties use the pre-master secret and the random values to generate symmetric session keys. These session keys will be used to encrypt and decrypt the data during the session. Symmetric encryption is faster than asymmetric encryption (public/private keys), so it’s used for the actual data exchange after the handshake. Step 6: Finished Messages The client sends a Finished message encrypted with the session key, indicating that the client’s part of the handshake is complete. The server sends its own Finished message encrypted with the session key, indicating that the server’s part of the handshake is complete. At this point, the secure connection is established, and both the client and server can begin exchanging encrypted data. Ensure trust: Purpose: 1. Private Connection: Encryption to prevent eavesdropping. 2. Reliable Connection: Ensures integrity by detecting tampering. 3. Authenticated Communication: Prevents impersonation (e.g., man-in-the-middle attacks). Challenges: Increased latency and bandwidth usage. Complexity due to protocol setup. Protocol Stack: HTTP over SSL/TLS becomes HTTPS. Certificate Generation Process: 1. Generate private-public key pair. 2. Create a Certificate Signing Request (CSR). 3. Submit CSR to a Certificate Authority (CA). 4. CA verifies and issues the certificate. Encryption Types: 1. Symmetric Encryption: o Single shared secret key for encryption and decryption. o Advantages: Fast and efficient. o Disadvantages: Key distribution challenges. 2. Asymmetric Encryption: o Public key for encryption; private key for decryption. o Used in SSL/TLS during the handshake to authenticate the server and establish the session keys. Key Exchange (Diffie-Hellman): Lets two parties derive a shared secret key over an insecure channel without ever transmitting the key itself. Provisioning Definition: Process of setting up and configuring infrastructure resources (e.g., servers, networks, users). Methods: o Manual or automated provisioning. o Includes resource configuration and maintenance post-setup. Automation Tools: 1. 
Ansible: o Automates server configurations using playbooks. o Alternatives: Chef, Puppet, SALT. 2. Terraform: o Automates infrastructure provisioning. o Supports Infrastructure as Code (IaC). o Works with cloud providers like AWS, Google Cloud, and Azure. o Example Terraform Configuration: Wake-on-LAN (WoL) Protocol to wake up a computer via its Network Interface Controller (NIC). How It Works: o NIC listens for a "magic packet." o Magic packet signals the system to turn on. Key Topics for Exam 1. Firewalls: o Policies (accept, drop, reject) and zones (DMZ). o Default permit vs. default drop configurations. 2. DNS: o Functionality and record types (A, AAAA, CNAME, MX, etc.). 3. SSL/TLS: o Encryption types, key exchange, certificate generation. o Differences between HTTPS and HTTP. 4. Provisioning Tools: o Terraform for IaC and Ansible for configuration management. 5. Load Balancing: o Concepts, types, and advantages. 6. Encryption: o Symmetric vs. asymmetric encryption and Diffie-Hellman key exchange. 7. Wake-on-LAN: o Mechanism and practical applications. Other key topics Proxy and reverse proxy A proxy and a reverse proxy are both intermediary devices or services that sit between clients (e.g., web browsers or users) and servers, but they serve different purposes in terms of data routing, privacy, and security. A proxy (or forward proxy) acts as an intermediary between a client (such as a user or a web browser) and the server from which the client requests resources. The client sends its requests to the proxy server, which then forwards them to the target server. When the target server responds, the proxy receives the response and sends it back to the client. o Key functions of a proxy: ▪ Privacy and Anonymity ▪ Content filtering ▪ Caching ▪ Bypass Geo-restrictions o How it works 1. The client sends a request to the proxy server for a web page or resource. 2. The proxy server forwards the request to the target web server. 3. 
The target server responds to the proxy with the requested data. 4. The proxy sends the data back to the client. A reverse proxy works in the opposite direction of a forward proxy. It acts as an intermediary between the client and one or more backend servers. When a client makes a request, the reverse proxy forwards it to the appropriate backend server, which processes the request and sends the response back to the reverse proxy. The reverse proxy then forwards the response back to the client. Unlike a forward proxy, which handles requests from clients, a reverse proxy handles requests on behalf of the server(s). This makes the server or servers behind the reverse proxy shielded from direct access by the clients. o Key functions of a reverse proxy ▪ Load balancing ▪ Security ▪ SSL Termination ▪ Caching ▪ Content compression o How it works: 1. The client sends a request to the reverse proxy for a web page or resource. 2. The reverse proxy determines which backend server should handle the request. 3. The reverse proxy forwards the request to the chosen backend server. 4. The backend server processes the request and sends the response back to the reverse proxy. 5. The reverse proxy sends the response to the client. NGINX Nginx (pronounced "engine-x") is a highly efficient and flexible web server and reverse proxy server that is also widely used for load balancing, HTTP caching, and as an application server for dynamic content. Originally developed by Igor Sysoev in 2004 to solve the problem of handling a large number of simultaneous connections, Nginx has grown to become one of the most popular and widely used web servers on the internet. Frame In the context of operating systems and memory management, a frame refers to a fixed-sized block of physical memory (RAM). Frames are a part of the paging mechanism used by the operating system to manage memory efficiently.
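To make the role of frames in paging more concrete, the sketch below translates a virtual address into a physical one using a toy page table. The 4 KB page size, the dictionary-based page table, and its entries are assumptions for illustration; on real hardware the translation is performed by the memory management unit using tables maintained by the operating system.

```python
PAGE_SIZE = 4096  # assumed page and frame size in bytes (4 KB)

# Hypothetical page table: virtual page number -> physical frame number.
page_table = {0: 5, 1: 2, 2: 7}


def translate(virtual_address: int) -> int:
    """Map a virtual address to a physical address via the page table."""
    # Split the address into the page it belongs to and the offset inside it.
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        # A real OS would handle this by loading the page or killing the process.
        raise LookupError(f"page fault: page {page_number} is not mapped")
    frame_number = page_table[page_number]
    # The offset is unchanged; only the page number is swapped for the frame.
    return frame_number * PAGE_SIZE + offset


if __name__ == "__main__":
    va = 4100  # lies in page 1 at offset 4
    print(f"virtual {va:#x} -> physical {translate(va):#x}")
```

Because pages and frames have the same fixed size, any free frame can hold any page, which is what lets the operating system avoid needing one contiguous region of RAM per process.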