TCP/IP Model Lecture Notes
Uploaded by PopularMachuPicchu6331
Summary
These lecture notes cover the TCP/IP model, detailing the five layers (application, transport, internet, network, and physical) and their functions in enabling reliable communication over networks. Key topics include Remote Procedure Call (RPC) and Remote Method Invocation (RMI), along with related protocols and concepts like parameter passing and error handling.
Full Transcript
Lecture 2

TCP/IP Model

The TCP/IP model, as presented in this lecture, consists of five layers, each serving a specific function to enable reliable communication over networks.

1. Application Layer:
o The highest layer; it provides network services to applications.
o Enables users and software to interact with the network.
o Includes protocols for different types of services, such as: HTTP/HTTPS (web browsing), FTP (file transfer), SMTP, IMAP, POP3 (email), DNS (Domain Name System), SSH, Telnet (remote access).

2. Transport Layer:
o Ensures end-to-end communication and reliability between devices.
o Uses port numbers to identify specific processes/services and to set up the communication channels.
o Supports segmentation, reassembly, and error handling.
o Provides two key transport protocols:
  TCP (Transmission Control Protocol): reliable, connection-oriented communication (used by HTTP, FTP, etc.).
  UDP (User Datagram Protocol): fast, connectionless communication (used by VoIP, DNS, etc.).

3. Internet Layer:
o Handles packet routing and addressing between different networks.
o Uses logical addressing (IP addresses) to identify source and destination hosts.
o Determines the best path for data transmission across networks.
o Protocols: IP (IPv4, IPv6), ICMP (Internet Control Message Protocol), and RIP (Routing Information Protocol).

4. Network Layer (in other texts often called the data link or network access layer):
o Ensures reliable data transfer between adjacent network nodes (delivery of messages).
o Follows the idea of a mesh network (a distributed system that has a certain structure).
o Typically, data stations are not directly connected to each other. The network layer handles each single step (hop) of the many steps that make up the delivery of a message.

5. Physical Layer:
o This is the lowest layer, responsible for the physical connection between devices.
o It defines how data is transmitted in terms of electrical signals, radio waves, or light pulses (signaling).
o Includes network hardware like cables, switches, and NICs (Network Interface Cards).
o Protocols: Ethernet, Wi-Fi (802.11), DSL, and Fiber Optics.

Note: When a computer sends or receives data over the network, the IP address identifies the device, while the port number identifies the specific application/service running on that device. When a client requests data from a server, it specifies both the IP address and the port number.

Procedure Call

The first lines are the procedure definition, which defines how the procedure operates in the computer. The definition is written once and does not have to be restated every time the procedure is called. The procedure name together with the required arguments is enough to run the computation.

The example distinguishes: 1) procedure name, 2) parameters, 3) return type, 4) procedure body, 5) procedure call, 6) argument, 7) assignment.

Procedure name: factorial
Procedure argument: (5)

Functions are created so they can be called (and reused) whenever the procedure is needed (procedure call).

Local Procedure Call

The caller (client program) and the callee (server program) run on the same machine. If two programs run on the same machine, exchanging data (parameters and results) between these programs is straightforward. Local procedure calls typically use a synchronous communication scheme.

In a local procedure call, data is exchanged via a shared local memory. There are two ways to exchange data:

Call by value (CBV): The parameter values of a procedure call are copied into an area of the memory that can be accessed by the callee. If the callee modifies this copy, such changes do not affect the original value in the caller program.
o Downside: Higher memory usage.
o Benefit: The original value is not changed.

Call by reference (CBR): The callee receives a pointer (i.e. a memory address) for each parameter of a procedure call. If the callee modifies a referenced parameter value, it changes the same data structure that is used by the caller program.
o Downside: The original values are overwritten and get lost.
o Benefit: Caller and callee work on the same value in memory, so less memory space is used.
o There is no shared memory in distributed systems, so CBR cannot be used directly, but it can be emulated.

Remote Procedure Call

The caller and the callee run on different machines. However, from a programmer's perspective (and from the perspective of the client and server components), a remote procedure call and a local procedure call look exactly the same. Remote calls take longer than local calls. Classical RPCs follow (or at least emulate) a synchronous communication scheme.

RPC middleware interacts with client and server to hide the distribution details. It is an important goal of RPC middleware to hide distribution details and to provide different types of transparency.

Remote Procedure Call (RPC) transparency refers to the ability of an RPC system to hide or minimize the complexities of distributed computing from the programmer and user. The goal is to make a remote procedure call look and behave as closely as possible to a local procedure call, even though it happens over a network. In practice, however, RPCs can never be fully transparent, because distributed systems introduce challenges such as network failures, latency, and security issues.

o Access transparency enables local and remote resources to be accessed using identical operations (realized via stubs that act as local proxies, see below).
o Location transparency enables resources to be accessed without knowledge of their physical or network location (requires a binding mechanism, see below).
o Failure transparency enables the concealment of faults (to a certain degree) by offering different types of RPC semantics (see below).
o Performance transparency allows the system to be reconfigured to improve performance as loads vary (because of the round-trip delay, RPCs are usually significantly slower than local procedure calls).
Type of Transparency: Meaning
Location Transparency: The user doesn't know (or need to know) where a resource is located.
Access Transparency: The user accesses local and remote resources the same way.
Failure Transparency: The system handles failures, so users and applications don't see them. The system automatically retries a failed request or redirects it to another available server.
Performance Transparency: The system dynamically adjusts to load changes to optimize performance. If a server is slow or overloaded, the system reroutes traffic dynamically to maintain good performance.

RPC Middleware: The middleware sits between computer 1 and computer 2. We have a client component (in our example, the Python script that contains the factorial call). We also have a server component that contains the definition of the procedure. In between sit the software components that enable transparency: the stubs (the server-side stub is also called a skeleton).

A stub is a program that is part of the RPC middleware and acts as a proxy for a remote procedure. A client stub provides the client component with a local interface for an RPC. It prepares the RPC requests that are sent to the server stub. A server stub receives RPC requests and performs the respective local procedure calls in the server component. Stubs are relevant because they hide the complexity of network communication, making remote calls appear as local calls. The stubs hide the remote procedure/program from the caller and the callee.

Parameter Passing

In a distributed system, client and server components run on different machines and do not have a shared local memory. This is the defining characteristic of a remote procedure call. In addition, further problems exist:
o Client and server often run on different software platforms that use different character encodings.
o Client and server may run on different hardware platforms.

Therefore, parameter passing in an RPC context is much more difficult than for local procedure calls.
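The two platform problems listed above can be made concrete with a small sketch. It assumes two hypothetical machines that disagree on byte order and on character encoding; the value 1024 and the sample text are chosen only for illustration.

```python
import struct

value = 1024

# The same 32-bit unsigned integer as laid out by two hypothetical platforms.
big_endian = struct.pack(">I", value)     # most significant byte first
little_endian = struct.pack("<I", value)  # least significant byte first

# A receiver that assumes the wrong byte order reads a different number.
misread = struct.unpack("<I", big_endian)[0]

# The same text produces different bytes under different character encodings.
text = "Größe"
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(big_endian.hex(), little_endian.hex(), misread)
print(len(utf8_bytes), len(latin1_bytes))
```

Without an agreed external representation, the receiver cannot tell which of these byte sequences it has been handed, which is why the stubs must follow a common exchange format.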
Client and server must agree on the representation of data structures and the exchange format of the RPC request and reply messages. The stubs use the agreed-upon format and procedure to assemble and disassemble RPC request and reply messages (marshalling).

Because of the difficulties discussed above, RPC parameter passing usually follows a call-by-value scheme. The client sends a copy of the parameter values to the server, but any modifications made by the server do not affect the original variables in the client.

It is possible to simulate a call-by-reference (CBR) scheme; this emulation is known as call-by-copy/restore (CBC/R). Since remote systems do not share memory, true call-by-reference is not possible. Instead, a copy of the data is sent, modified, and returned to the client, mimicking call-by-reference behavior:
o The client stub copies the CBR parameter(s) (i.e. the corresponding memory structure) into the request message.
o The server stub rebuilds the corresponding memory structure and passes a respective pointer to the server procedure.
o The server procedure can modify the memory structure.
o After the server procedure has finished, the server stub copies the changed memory structure into the reply message.
o Upon receipt of the reply, the client stub copies the changed values into the corresponding memory structure that is used by the client component.

Marshalling: the process of taking a collection of data items and assembling them into a form suitable for transmission in a message. It translates data items from a local in-memory representation into an external data representation.

Unmarshalling: the process of disassembling the corresponding message to produce an equivalent collection of data items. Unmarshalling rebuilds a corresponding in-memory representation from the external data representation.

Marshalling and unmarshalling are conducted by the stubs.
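The copy/restore steps and the role of the stubs in marshalling can be sketched as follows. This is a minimal illustration, not a real middleware: JSON is assumed as the external data representation, the network is replaced by a direct function call, and the names append_total, server_stub, and client_stub are hypothetical.

```python
import json

def append_total(numbers):
    """Server procedure: modifies its argument in place (call-by-reference style)."""
    numbers.append(sum(numbers))

def server_stub(request_bytes):
    """Unmarshal the request, run the local procedure, marshal the changed data."""
    request = json.loads(request_bytes.decode("utf-8"))
    params = request["params"]   # rebuilt in-memory structure on the server
    append_total(params)         # the server only ever changes its own copy
    reply = {"id": request["id"], "params": params}
    return json.dumps(reply).encode("utf-8")

def client_stub(numbers, transport=server_stub):
    """Copy the parameter into a request, then restore the changes from the reply."""
    request = {"id": 1, "procedure": "append_total", "params": numbers}
    # 'transport' stands in for sending bytes over the network (assumption).
    reply_bytes = transport(json.dumps(request).encode("utf-8"))
    reply = json.loads(reply_bytes.decode("utf-8"))
    numbers[:] = reply["params"]  # restore step: overwrite the caller's structure

data = [1, 2, 3]
client_stub(data)
print(data)  # the caller's list now contains the server's modification
```

From the caller's point of view the list was changed "by reference", although only copies ever crossed the (simulated) network.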
One-way direction = Computer 2 never requests anything

RPC Binding

In order to invoke remote procedures, the client component must be bound to the corresponding server component. A binding includes at least the name and the network address of the server component that offers the interface that the client needs. To hide the distribution details from the client, the RPC middleware establishes the binding for the client.

Static binding: Is compiled into the client stub and can only be changed by re-compiling the client stub. The client stub obtains this information once, and it remains unchanged for the entire lifetime of the stub (i.e. until computer 1 is shut down). The local procedure call is statically bound to the server component.
o Benefit: There is no time penalty.

Dynamic binding: Requires a global registry/discovery system. The binding must be established before sending an RPC request. Server components advertise their bindings with a discovery service. The client looks up the address it needs to deliver to in this directory/discovery service. It is dynamic because the information (the address) might change for each and every call.
o Benefit: The binding is changeable at run-time. You do not have to shut down the client or the server.
o Downside: You wait longer, because of the additional lookup.

Dynamic binding is much easier in a homogeneous system (written in a single programming language) than in a heterogeneous system (where we deal with two or more different programming languages).

Bindings provide location transparency.
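The difference between the two binding styles can be sketched with an in-memory stand-in for the discovery service. The Registry class, the interface name "factorial", and the addresses are hypothetical and only serve the illustration.

```python
class Registry:
    """Hypothetical discovery service mapping interface names to addresses."""

    def __init__(self):
        self._bindings = {}

    def advertise(self, interface, host, port):
        # A server component publishes (or updates) its binding.
        self._bindings[interface] = (host, port)

    def lookup(self, interface):
        if interface not in self._bindings:
            raise LookupError(f"no server offers {interface!r}")
        return self._bindings[interface]

# Static binding: fixed when the client stub is built.
STATIC_BINDING = ("10.0.0.5", 9000)

def static_resolve():
    return STATIC_BINDING  # no lookup cost, but cannot follow a moved server

def dynamic_resolve(registry, interface):
    return registry.lookup(interface)  # extra lookup before the call

registry = Registry()
registry.advertise("factorial", "10.0.0.5", 9000)
first = dynamic_resolve(registry, "factorial")

registry.advertise("factorial", "10.0.0.7", 9000)  # the server moved at run-time
second = dynamic_resolve(registry, "factorial")

print(static_resolve(), first, second)
```

The statically bound client keeps pointing at the old address after the move, while the dynamically bound client picks up the new one at the cost of one lookup per call.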
There are different RPC semantics that describe the reliability of an RPC:
o Maybe: The RPC may be executed once or not at all (one request message; the client may or may not receive a reply).
o At-least-once: The RPC is executed one or more times (possibly multiple request messages; the client either receives a reply or an exception if no reply is received).
o At-most-once: The RPC is either executed once or not at all.
o Exactly-once: The RPC is executed exactly once and one corresponding reply message is received by the client.

RPC middleware aims for exactly-once. However, because failures occur, exactly-once semantics might not always be possible.

RPC Failures

In a distributed system, each hardware and software component might fail independently, and messages may be lost.

1. Client-cannot-locate-server failure: The client stub cannot locate the server stub, for example because the binding details have changed and a binding cannot be established. The client stub should be able to recover from this failure (e.g. by raising an exception or via another type of compensating action, such as falling back to a cached or local implementation).

2. Lost Messages: The client cannot distinguish a lost request from a lost reply. Therefore, all lost messages are treated the same. The client sets a timeout t, and if it does not receive a reply before the timeout expires, it resubmits the corresponding request message. Each (request and reply) message includes a unique message identifier to enable the detection of duplicate messages. Different approaches exist to deal with a duplicate message.

Idempotent operation: A procedure that can be performed repeatedly with the same effect as if it had been performed once. In this case, the server can repeat the operation and submit a new reply message. The client can ignore quasi-duplicate (“unrequested”) reply messages.
Non-idempotent operation: In this case, the server resubmits the corresponding reply without repeating the corresponding RPC. This scheme requires the server to (temporarily) store the results of a request, and it requires some (implicit or explicit) acknowledgement message to tell the server when it can delete a stored result. A client can ignore duplicate reply messages.

What happens when the request message is lost? The server component is never triggered. The client does not perceive that the request message is lost; it just never receives a reply message and would wait endlessly. To deal with that, we need to ask “How long do we wait for the incoming data?”. The waiting time is based on an assumption about how long the operation takes to complete. The problem is that the client does not know how long the server component needs to execute. It can make assumptions, e.g. by looking at historical data, and calculate the timeout from them. After the timeout has expired, the client resends the corresponding request message. To enable the detection of duplicate messages, each request and reply message contains a unique message identifier. This identifier is globally unique (usually a string). On the server side, bookkeeping can be done with these identifiers.

3. Server Crash: From the client's perspective, a server crash cannot be differentiated from a lost message. Therefore, it is treated similarly to a lost-message failure. The client defines a second timeout to determine for how long/how often it will resubmit the request message.

4. Client Crash: In this case, the client crashes before it receives the reply message. However, the server does not know that the client crashed. When this occurs, the reply message or server operation that is lost is called an “orphan”. These orphans block resources of the middleware when client and server are in a constant communication scheme.
Solutions: If the client recovers, it can send a broadcast to all servers telling them to delete all orphans related to this particular client. Another option is that the server defines a timeout and deletes all (presumably orphaned) resources if the timeout expires without an acknowledgement (expiration).

In order to differentiate between lost messages and server crashes, you can open a second communication channel to check liveness. The client repeatedly pings the server (“Server, are you there?”). This is called pinging. The RPC is only performed if the last ping was successful. This makes it possible to differentiate between lost messages and crashes (because a crashed server would not react to the ping).
o Benefit Client: It is adaptable to each client’s situation.
o Downside Client: You have to wait for the ping result (90 milliseconds each way) before the RPC.
o Downside Server: The load on the data station doubles, because it has to process two requests (the ping and the RPC) for each call.

Alternatively, the server regularly sends a periodic heartbeat. If such a check is performed in regular intervals, a heartbeat is more efficient, because each ping requires a corresponding acknowledgement.
o Benefit: The server's resources are used less, because it can choose the point in time of the heartbeat and it sends one heartbeat to all clients.
o Downside Client: The client needs to listen all the time and has an active part.

Asynchronous RPC

If the client does not require a result, or if it is sufficient for the client to receive a result at some point in the future, or if a certain procedure does not return a result, we can use an asynchronous RPC. However, the client usually expects an acknowledgement for its request.
Asynchronous RPC behavior can be realized in different ways:
o The server ‘directly’ acknowledges the receipt of a request message.
o The middleware provides a corresponding synchronization option.

Asynchronous RPC avoids blocking the client, which can continue execution while waiting for the response. However, it introduces additional complexity in synchronization.

Callback RPC

In a callback RPC, the server does not send a reply message but ‘calls the client back’ via a corresponding RPC request.
o The client provides the server with a binding for the callback RPC.
o Thus, in a callback RPC, client and server (temporarily) switch roles.

For example, callback RPCs can be used to:
o simulate an asynchronous RPC via two ‘ordinary’ RPCs
o inform the client that a particular event occurred at the server
o request additional information from a client.

Context Handles

If client and server reside on different machines, they do not have a shared memory to maintain status information for a session (i.e. across multiple RPCs). In general, a context handle is a data structure that stores state information associated with a particular client/server session.

Context handles can be implemented in different ways:
o Via a structured document that is exchanged between client and server.
o Via a reference to a data structure that is stored at the server.

If the server does not trust the client to return the context handle unchanged, it provides the client with a reference only. If the “reference option” is not feasible (e.g. in an identity management context), the context handle can be protected cryptographically (e.g. via a digital signature or a message authentication code).

Remote Method Invocation

A remote method invocation (RMI) is essentially an RPC in the context of object-oriented programming languages and remote objects.
A remote object (also: distributed object) is an object which can be accessed from other objects that reside in a different (remote) address space.

In an object-oriented programming language, local objects are accessed via object references. An object reference provides the ID-string (fully-qualified object name) of a particular local object. Object references can, for example, be assigned to variables, passed as arguments, or returned as results of methods.

A remote object reference provides (at least) the network address of the remote machine, an end point (e.g. a port number), and an ID-string of the remote object. Similar to local object references, remote object references can also be assigned to variables, passed as arguments, or returned as results of methods. Thus, RMI enables a call-by-reference scheme for remote objects.

If an object is accessed via a remote object reference, the corresponding stub (i.e. the client-side proxy) marshals the call into a request message and sends it to the remote object (or, more precisely, to the server-side stub).

In a homogeneous software system, the client stub can either be downloaded from the server or generated by the client. Thus, the client stub can directly act as the remote object reference, i.e. the stub is used as a reference to the remote object (as in Java RMI, for example). RMI in heterogeneous systems is usually more complicated.

TCP/IP Model: a framework that defines how data is transmitted, using a layered architecture to ensure reliable communication.

Physical Layer (Layer 1): Handles the actual transmission of raw bits over cables or wireless links.
Network Layer (Layer 2): Responsible for local delivery of frames using MAC addresses.
Internet Layer (Layer 3): Handles the routing of datagrams using IP addresses to ensure they reach the correct destination across networks.
Transport Layer (Layer 4): Manages data segmentation and ensures reliable (TCP) or fast (UDP) communication between devices.
Application Layer (Layer 5): Exchanges messages between applications, enabling user-level communication through protocols like HTTP.
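The layering summarized above can be pictured as nested wrapping: each layer adds its own addressing information around the data handed down from the layer above. The sketch below is a simplified illustration using dictionaries instead of real headers; the function names, field names, and all addresses are hypothetical, and the physical layer (raw bit transmission) is left out.

```python
def encapsulate(message, src_port, dst_port, src_ip, dst_ip, src_mac, dst_mac):
    """Wrap an application message layer by layer (Layer 5 down to Layer 2)."""
    # Layer 4 (transport): ports identify the processes on each host.
    segment = {"src_port": src_port, "dst_port": dst_port, "payload": message}
    # Layer 3 (internet): IP addresses identify the hosts across networks.
    datagram = {"src_ip": src_ip, "dst_ip": dst_ip, "payload": segment}
    # Layer 2 (network): MAC addresses identify the adjacent nodes.
    frame = {"src_mac": src_mac, "dst_mac": dst_mac, "payload": datagram}
    return frame

def decapsulate(frame):
    """Unwrap a frame on the receiving side (Layer 2 up to Layer 5)."""
    datagram = frame["payload"]
    segment = datagram["payload"]
    return segment["payload"]

frame = encapsulate(
    "GET /index.html",
    src_port=50000, dst_port=80,
    src_ip="192.0.2.10", dst_ip="192.0.2.20",
    src_mac="aa:aa:aa:aa:aa:01", dst_mac="aa:aa:aa:aa:aa:02",
)
print(decapsulate(frame))
```

The receiver unwraps in the reverse order, so the application only ever sees the original message.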