Chapter 2 Application Layer Summary

Chapter 2: Application Layer

Dr. Eman Sanad, Assistant Prof., IT Department, Faculty of Computers and Artificial Intelligence

Network applications are the raison d'être of a computer network. They include text email, remote access to computers, file transfers, the World Wide Web (mid-1990s), web searching, e-commerce, Twitter/Facebook, Amazon, Netflix, YouTube, WoW...

2.1 Principles of Network Applications

At the core of network application development is writing programs that run on different end systems and communicate with each other over the network. The programs running on the end systems might be different (client-server architecture) or identical (peer-to-peer architecture). Importantly, we write programs that run on end systems (hosts), not on network-core devices (routers and link-layer switches).

2.1.1 Network Application Architectures

From the application developer's perspective, the network architecture is fixed and provides a specific set of services to applications. The application architecture, on the other hand, is chosen by the developer. In choosing the application architecture, a developer will likely draw on one of the two predominant architectural paradigms used in modern network applications:

Client-server architecture: there is an always-on host, called the server, which serves requests from many other hosts, called clients [Web browser and Web server]. Clients do not communicate directly with each other. The server has a fixed, well-known address, called an IP address, that clients use to connect to it. Often a single server host is incapable of keeping up with all the requests from its clients; for this reason, a data center housing a large number of hosts is often used to create a powerful virtual server (via proxying).

P2P architecture: there is minimal or no reliance on dedicated servers in data centers; the application exploits direct communication between pairs of intermittently connected hosts, called peers. Peers are end systems owned and controlled by users.
Examples: BitTorrent, Skype. P2P applications provide self-scalability (the network load is distributed). They are also cost-effective, since they do not require significant infrastructure and server bandwidth.

P2P applications face challenges:
i. ISP friendliness (asymmetric nature of residential ISPs)
ii. Security
iii. Incentives (convincing users to participate)

Some applications have hybrid architectures. In many instant messaging applications, for example, a server keeps track of the IP addresses of users, but user-to-user messages are sent directly between users.

2.1.2 Processes Communicating

In the jargon of operating systems, it is not programs but processes that communicate. A process can be thought of as a program that is running within an end system. Processes on two different end systems communicate with each other by exchanging messages across the computer network: a sending process creates and sends messages into the network; a receiving process receives these messages and possibly responds by sending messages back.

Client and Server Processes

A network application consists of pairs of processes that send messages to each other over a network. For each pair of communicating processes we label:
• the process that initiates the communication as the client [web browser]
• the process that waits to be contacted to begin the session as the server [web server]
These labels hold even for P2P applications, in the context of a single communication session.

The Interface between the Process and the Computer Network

A process sends messages into, and receives messages from, the network through a software interface called a socket.
A socket is the interface between the application layer and the transport layer within a host. It is also referred to as the Application Programming Interface (API) between the application and the network. The application developer has control of everything on the application-layer side of the socket but has little control of the transport-layer side. The only control the developer has over the transport layer is:
1. The choice of the transport protocol
2. Perhaps the ability to fix a few transport-layer parameters, such as maximum buffer and maximum segment sizes

Addressing Processes

In order for a process running on one host to send packets to a process running on another host, the receiving process needs to have an address. To identify the receiving process, two pieces of information need to be specified:
1. The address of the host. In the Internet, the host is identified by its IP address, a 32-bit quantity in IPv4 (128-bit in IPv6) that identifies the host uniquely.
2. An identifier that specifies the receiving process in the destination host: the destination port number. Popular applications have been assigned specific port numbers (a web server, for example, uses port 80).

2.1.3 Transport Services Available to Applications

What are the services that a transport-layer protocol can offer to applications invoking it?

Reliable Data Transfer

For many applications, such as email, file transfer, web document transfers and financial applications, packet drops and data loss can have devastating consequences. If a protocol guarantees that the data sent is delivered completely and correctly, it is said to provide reliable data transfer. The sending process can just pass its data into the socket and know with complete confidence that the data will arrive without errors at the receiving process.
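The socket interface and the (IP address, port number) addressing described above can be illustrated with a small sketch. It is only an illustration under stated assumptions: both processes are simulated inside one Python program on the loopback address, the function name echo_once is made up for this example, and TCP is chosen as the transport protocol so that the data transfer is reliable.

```python
import socket
import threading


def echo_once(message: bytes) -> bytes:
    """Send `message` to a local TCP server and return the server's reply."""
    # Server side: bind to the loopback address; port 0 asks the OS for a free port.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    # The (IP address, port number) pair that identifies the server process.
    host, port = server.getsockname()

    def serve() -> None:
        conn, _client_addr = server.accept()  # wait to be contacted by a client
        with conn:
            data = conn.recv(1024)
            conn.sendall(data.upper())        # reply with the upper-cased message

    worker = threading.Thread(target=serve)
    worker.start()

    # Client side: initiate contact using the server's IP address and port number.
    with socket.create_connection((host, port), timeout=5) as client:
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply
```

Calling echo_once(b"hello") passes the bytes into the client socket and reads the upper-cased reply back from it. The application code only chooses the transport protocol (SOCK_STREAM, i.e. TCP) and the destination address; everything below the socket is handled by the operating system.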
Throughput

In Chapter 1, we introduced the concept of available throughput, which, in the context of a communication session between two processes along a network path, is the rate at which the sending process can deliver bits to the receiving process. A transport-layer protocol could provide guaranteed available throughput at some specified rate. Applications that have throughput requirements are said to be bandwidth-sensitive applications. While bandwidth-sensitive applications have specific throughput requirements, elastic applications can make use of as much, or as little, throughput as happens to be available. Electronic mail, file transfer, and Web transfers are all elastic applications.

Timing

A transport-layer protocol can also provide timing guarantees. An example guarantee is that every bit the sender pumps into the socket arrives at the receiver's socket no more than 100 msec later. Such a guarantee is attractive for real-time applications such as telephony and virtual environments.

Security

Finally, a transport protocol can provide an application with one or more security services. For example, in the sending host, a transport protocol can encrypt all data transmitted by the sending process, and in the receiving host, the transport-layer protocol can decrypt the data before delivering it to the receiving process. Such a service would provide confidentiality between the two processes, even if the data is somehow observed between the sending and receiving processes. A transport protocol can also provide other security services in addition to confidentiality, including data integrity and end-point authentication.

2.1.4 Transport Services Provided by the Internet

The Internet makes two transport protocols available to applications: TCP and UDP.
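From the application's side, the choice between the two protocols is made when the socket is created. The following is a minimal sketch, assuming both endpoints live in the same Python program on the loopback address; the function name udp_roundtrip is hypothetical. It uses UDP, so no connection is set up before the datagram is sent.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram over loopback and return what the receiver got."""
    # SOCK_DGRAM selects UDP; SOCK_STREAM would select TCP instead.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
    receiver.settimeout(5)            # UDP gives no delivery guarantee, so never block forever
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # No connection setup: the destination address accompanies each datagram.
        sender.sendto(payload, receiver.getsockname())
        data, _sender_addr = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()
```

On a loopback interface the datagram normally arrives, but nothing in UDP guarantees it, which is why the receiver sets a timeout instead of waiting indefinitely.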
TCP Services

TCP includes a connection-oriented service and a reliable data transfer service:
• Connection-oriented service: the client and server exchange transport-layer control information before the application-level messages begin to flow. This so-called handshaking procedure alerts the client and server, allowing them to prepare for an onslaught of packets. After the handshake, a TCP connection is said to exist between the sockets of the two processes. When the application finishes sending messages, it must tear down the connection.
• Reliable data transfer service: the communicating processes can rely on TCP to deliver all data sent without error and in the proper order.

TCP also includes a congestion-control mechanism, a service for the general welfare of the Internet rather than for the direct benefit of the communicating processes. It throttles a sending process when the network is congested between sender and receiver.

SECURING TCP

Neither TCP nor UDP provides encryption. The Internet community has therefore developed an enhancement for TCP called Secure Sockets Layer (SSL; its successor is Transport Layer Security, TLS), which not only does everything that traditional TCP does but also provides critical process-to-process security services, including encryption, data integrity and end-point authentication. It is not a third transport protocol but an enhancement of TCP, implemented in the application layer on both the client and the server side of the application (highly optimized libraries exist). SSL has its own socket API, similar to the traditional one. The sending process passes cleartext data to the SSL socket, which encrypts it.

UDP Services

UDP is a no-frills, lightweight transport protocol, providing minimal services. It is connectionless: there is no handshaking. The data transfer is unreliable: there are no guarantees that a message sent will ever reach the receiving process.
Furthermore, messages may arrive out of order. UDP does not provide a congestion-control mechanism either.

Services Not Provided by Internet Transport Protocols

Neither TCP nor UDP provides timing or throughput guarantees; today's Internet transport protocols do not offer these services. We therefore design applications to cope, to the greatest extent possible, with this lack of guarantees.

2.1.5 Application-Layer Protocols

An application-layer protocol defines how an application's processes, running on different end systems, pass messages to each other. It defines:
• The types of messages exchanged (request/response)
• The syntax of the various message types
• The semantics of the fields (the meaning of the information in the fields)
• The rules for determining when and how a process sends messages and responds to messages

2.2 The Web and HTTP

In the early 1990s, a major new application arrived on the scene: the World Wide Web (Berners-Lee 1994), the first application that caught the general public's eye. The Web operates on demand: users receive what they want, when they want it. It is enormously easy for an individual to make information available over the Web, and hyperlinks and search engines help us navigate through the ocean of web sites.

2.2.1 Overview of HTTP

The HyperText Transfer Protocol (HTTP), the Web's application-layer protocol, is at the heart of the Web. It is implemented in two programs: a client program and a server program. The two programs talk to each other by exchanging HTTP messages. A Web page (or document) consists of objects. An object is simply a file (HTML file, JPEG image...) that is addressable by a single URL. Most Web pages consist of a base HTML file and several referenced objects. The HTML file references the other objects in the page with the objects' URLs.
Each URL has two components: the hostname of the server that houses the object and the object's path name. Web browsers implement the client side of HTTP. HTTP uses TCP as its underlying transport protocol. The server sends requested files to clients without storing any state information about the client: HTTP is a stateless protocol.

[Figure: PC running Firefox browser, iPhone running Safari browser, server running Apache Web server]

2.2.2 Non-Persistent and Persistent Connections

In many Internet applications, the client and server communicate for an extended period of time, with the client making a series of requests. Depending on the application and on how the application is being used, the series of requests may be made back-to-back, periodically at regular intervals, or intermittently. When this interaction takes place over TCP, the developer must make an important decision: should each request/response pair be sent over a separate TCP connection, or should all of the requests and their corresponding responses be sent over the same TCP connection? In the former approach, the application is said to use non-persistent connections; in the latter, it is said to use persistent connections. HTTP/1.0 uses non-persistent connections by default, whereas HTTP/1.1 defaults to persistent connections; clients and servers can be configured to use either.

To estimate the amount of time that elapses from when a client requests the base HTML file until the entire file is received by the client, we define the round-trip time (RTT), which is the time it takes for a small packet to travel from client to server and then back to the client.
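To make the role of the RTT concrete, here is a back-of-the-envelope sketch of the time needed to fetch a base HTML file plus a number of referenced objects. It rests on simplifying assumptions that are not part of the text above: objects are fetched one after another (no parallel connections, no pipelining), every object takes the same transmission time, and opening a TCP connection costs one RTT before the request can be sent. The function names are hypothetical.

```python
def non_persistent_time(rtt: float, n_objects: int, transmit: float) -> float:
    """Serial non-persistent HTTP: every object (base file included) pays
    one RTT for the TCP handshake plus one RTT for the request/response."""
    return (n_objects + 1) * (2 * rtt + transmit)


def persistent_time(rtt: float, n_objects: int, transmit: float) -> float:
    """Persistent HTTP without pipelining: one handshake RTT in total,
    then one request/response RTT per object (base file included)."""
    return rtt + (n_objects + 1) * (rtt + transmit)
```

With an RTT of 0.1 s, 10 referenced objects and 0.01 s of transmission time per object, the two estimates are roughly 2.31 s and 1.31 s: the persistent connection saves one RTT for every object after the first.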
HTTP with Non-Persistent Connections

For the base page and for each object it contains, a TCP connection must be opened (handshake request, handshake answer), which costs one additional RTT; each object then requires a request followed by the reply. This model can be expensive on the server side: a new connection needs to be established for each requested object, and for each connection TCP buffers must be allocated, along with memory to store TCP variables.

HTTP with Persistent Connections

The server leaves the TCP connection open after sending a response, so subsequent requests and responses between the same client and server are sent over the same connection. In particular, an entire web page (text and objects) can be sent over a single persistent TCP connection, and multiple web pages residing on the same server can be sent from the server to the same client over a single persistent TCP connection. These requests can be made back-to-back without waiting for replies to pending requests (pipelining). When the server receives back-to-back requests, it sends the objects back-to-back. If the connection is not used for a pre-decided amount of time, it is closed.

[Figure: Non-Persistent HTTP compared with Persistent HTTP]

2.2.3 HTTP Message Format

There are two types of HTTP messages: request messages and response messages.

HTTP Request Message

    GET /index.html HTTP/1.1\r\n                      (request line: GET, POST, HEAD commands)
    Host: www-net.cs.umass.edu\r\n                    (header lines)
    User-Agent: Firefox/3.6.10\r\n
    Accept: text/html,application/xhtml+xml\r\n
    Accept-Language: en-us,en;q=0.5\r\n
    Accept-Encoding: gzip,deflate\r\n
    Accept-Charset: ISO-8859-1,utf-8;q=0.7\r\n
    Keep-Alive: 115\r\n
    Connection: keep-alive\r\n
    \r\n                                              (carriage return, line feed at start of line indicates end of header lines)

Here \r is the carriage return character and \n is the line-feed character.

• Ordinary ASCII text
• First line: request line
• Other lines: header lines
• The request line has 3 fields: method field, URL field, HTTP version field
  o method field possible values: GET, POST, HEAD, PUT, DELETE

The majority of HTTP requests use the GET method, which is used to request an object. The entity body (empty with GET) is used by the POST method, for example when filling out forms. The user is still requesting a Web page, but the specific contents of the page depend on what the user entered into the form fields. When POST is used, the entity body contains what the user entered into the form fields. Form data can also be sent with GET by including the entered data in the requested URL. The HEAD method is similar to GET: when a server receives it, it responds with an HTTP message but leaves out the requested object. It is often used for debugging. PUT is often used in conjunction with web publishing tools, to allow users to upload an object to a specific path on the web server. Finally, DELETE allows a user or application to delete an object on a web server.

HTTP Response Message

A typical HTTP response message contains:
• Status line: protocol version, status code, corresponding status message
• Six header lines:
  o Connection: close, meaning the connection will be closed after sending the message
  o Date: date and time when the response was created (when the server retrieves the object from the file system, inserts the object in the message, and sends the response message)
  o Server: type of the server / software
  o Last-Modified: useful for object caching
  o Content-Length: number of bytes in the object
  o Content-Type
• Entity body: contains the requested object itself (data)

Some common status codes:
• 200 OK: request succeeded, information returned
• 301 Moved Permanently: the object has moved; the new location is specified in the header of the response
• 400 Bad Request: generic error code, request not understood
• 404 Not Found: the requested document doesn't exist on the server
• 505 HTTP Version Not Supported: the requested HTTP protocol version is not supported by the server

2.2.4 User-Server Interaction: Cookies

An HTTP server is stateless, which simplifies server design and improves performance. A website can nevertheless identify users by using cookies.

Cookie technology has 4 components:
1. Cookie header line in the HTTP response message
2. Cookie header line in the HTTP request message
3. Cookie file on the user's end system, managed by the browser
4. Back-end database at the website

When a user connects to a website that uses cookies:
1. The server creates a unique identification number and creates an entry in its back-end database, indexed by the identification number.
2. The server responds to the user's browser, including in the response the header line Set-cookie: identification number.
3. The browser appends to its cookie file a line containing the hostname of the server and the identification number.
4. Each time the browser requests a page from that site, it consults the cookie file, extracts the identification number for the site, and puts a Cookie header line including the identification number in the request.

The server can track the user's activity: it knows exactly which pages that identification number has visited, in which order and at what times.
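The cookie exchange just described can be sketched as a small simulation. This is a toy model, not a real HTTP implementation: requests and responses are reduced to header dictionaries, the header value is the bare identification number, and the names handle_request, browser_get, backend_db and cookie_file, as well as the starting number 1678, are assumptions made for illustration.

```python
import itertools

_next_id = itertools.count(1678)        # hypothetical starting identification number
backend_db: dict[str, list[str]] = {}   # identification number -> pages requested so far


def handle_request(path: str, headers: dict[str, str]) -> dict[str, str]:
    """Server side: return the response headers for one request."""
    response_headers: dict[str, str] = {}
    user_id = headers.get("Cookie")
    if user_id not in backend_db:
        # First visit (or unknown ID): create an ID and a database entry.
        user_id = str(next(_next_id))
        backend_db[user_id] = []
        response_headers["Set-cookie"] = user_id
    backend_db[user_id].append(path)    # the server records this user's activity
    return response_headers


# Browser side: the cookie file maps a server hostname to its identification number.
cookie_file: dict[str, str] = {}


def browser_get(host: str, path: str) -> None:
    """Browser side: attach the stored cookie, then store any new one."""
    headers: dict[str, str] = {}
    if host in cookie_file:
        headers["Cookie"] = cookie_file[host]
    response = handle_request(path, headers)
    if "Set-cookie" in response:
        cookie_file[host] = response["Set-cookie"]
```

After two calls such as browser_get("www.example.com", "/index.html") and browser_get("www.example.com", "/cart.html") (a made-up hostname), the cookie file holds one entry for the site and the back-end database lists both pages under the same identification number, even though each individual request was handled statelessly.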
Tracking of this kind is also why cookies are controversial: a website can learn a lot about a user and sell this information to a third party. Cookies can thus be used to create a user session layer on top of stateless HTTP.

2.2.5 Web Caching

A Web cache, also called a proxy server, is a network entity that satisfies HTTP requests on behalf of an origin Web server. It has its own disk storage and keeps copies of recently requested objects in this storage.
1. The browser establishes a TCP connection to the Web cache and sends an HTTP request for the object to the Web cache.
2. The Web cache checks to see if it has a copy of the object stored locally. If it does, it returns the object within an HTTP response message to the browser.
3. If it does not, the Web cache opens a TCP connection to the origin server and requests the object; the origin server responds with the requested object.
4. The Web cache receives the object, stores a copy in its storage and sends a copy, within an HTTP response message, to the browser over the existing TCP connection.

A cache is therefore both a server and a client at the same time. Usually caches are purchased and installed by ISPs.

Caching Example

Assumptions:
• avg object size: 100K bits
• avg request rate from browsers to origin servers: 15/sec
• avg data rate to browsers: 1.50 Mbps
• RTT from institutional router to any origin server: 2 sec
• access link rate: 1.54 Mbps
Consequences:
• LAN utilization: 0.15%
• access link utilization: about 97% (1.50 Mbps / 1.54 Mbps): problem!
• total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + usecs

Caching Example, Option 1: buy a faster access link

Assumptions: as above, except that the access link rate is raised from 1.54 Mbps to 154 Mbps.
Consequences:
• LAN utilization: 0.15%
• access link utilization: about 0.97% (1.50 Mbps / 154 Mbps), down from about 97%
• total delay = Internet delay + access delay + LAN delay = 2 sec + msecs + usecs
Cost: faster access link (expensive!)

A Web cache can substantially reduce the response time for a client request and substantially reduce traffic on an institution's access link to the Internet. Through the use of Content Distribution Networks (CDNs), Web caches are increasingly playing an important role in the Internet. A CDN installs many geographically distributed caches throughout the Internet, localizing much of the traffic.

2.2.6 The Conditional GET

Caches introduce a new problem: what if the copy of an object residing in the cache is stale? The conditional GET is used to verify that an object is up to date. An HTTP request message is a conditional GET if:
1. The request message uses the GET method
2. The request message includes an If-modified-since: header line

A conditional GET message is sent from the cache to the server, which includes the object in its response only if the object has been modified since the specified date; otherwise it replies with 304 Not Modified and an empty entity body.

2.3 Electronic Mail in the Internet

As with ordinary postal mail, e-mail is an asynchronous communication medium: people send and read messages when it is convenient for them, without having to coordinate with other people's schedules. In contrast with postal mail, electronic mail is fast, easy to distribute, and inexpensive.
Modern e-mail has many powerful features, including messages with attachments, hyperlinks, HTML-formatted text, and embedded photos. In this section, we examine the application-layer protocols that are at the heart of Internet e-mail. But before we jump into an in-depth discussion of these protocols, let's take a high-level view of the Internet mail system and its key components.

Figure 2.14 presents a high-level view of the Internet mail system. We see from this diagram that it has three major components: user agents, mail servers, and the Simple Mail Transfer Protocol (SMTP). We now describe each of these components in the context of a sender, Alice, sending an e-mail message to a recipient, Bob.

User agents allow users to read, reply to, forward, save, and compose messages. Examples of user agents for e-mail include Microsoft Outlook, Apple Mail, Web-based Gmail, the Gmail app running on a smartphone, and so on. When Alice is finished composing her message, her user agent sends the message to her mail server, where the message is placed in the mail server's outgoing message queue. When Bob wants to read a message, his user agent retrieves the message from his mailbox in his mail server.

Mail servers form the core of the e-mail infrastructure. Each recipient, such as Bob, has a mailbox located in one of the mail servers. Bob's mailbox manages and maintains the messages that have been sent to him. A typical message starts its journey in the sender's user agent, then travels to the sender's mail server, and then travels to the recipient's mail server, where it is deposited in the recipient's mailbox. When Bob wants to access the messages in his mailbox, the mail server containing his mailbox authenticates Bob (with his username and password). Alice's mail server must also deal with failures in Bob's mail server.
If Alice's server cannot deliver mail to Bob's server, Alice's server holds the message in a message queue and attempts to transfer the message later. Reattempts are often made every 30 minutes or so; if there is no success after several days, the server removes the message and notifies the sender (Alice) with an e-mail message.

SMTP
• uses TCP to reliably transfer e-mail messages from client (the mail server initiating the connection) to server, port 25
• direct transfer: sending server (acting as client) to receiving server
• three phases of transfer:
  o SMTP handshaking (greeting)
  o SMTP transfer of messages
  o SMTP closure
• command/response interaction (like HTTP):
  o commands: ASCII text
  o response: status code and phrase

SMTP is the principal application-layer protocol for Internet electronic mail. It uses the reliable data transfer service of TCP to transfer mail from the sender's mail server to the recipient's mail server. As with most application-layer protocols, SMTP has two sides: a client side, which executes on the sender's mail server, and a server side, which executes on the recipient's mail server. Both the client and server sides of SMTP run on every mail server. When a mail server sends mail to other mail servers, it acts as an SMTP client. When a mail server receives mail from other mail servers, it acts as an SMTP server.

    S: 220 hamburger.edu
    C: HELO crepes.fr
    S: 250 Hello crepes.fr, pleased to meet you
    C: MAIL FROM:
    S: 250 [email protected]... Sender ok
    C: RCPT TO:
    S: 250 [email protected]... Recipient ok
    C: DATA
    S: 354 Enter mail, end with "." on a line by itself
    C: Do you like ketchup?
    C: How about pickles?
    C: .
    S: 250 Message accepted for delivery
    C: QUIT
    S: 221 hamburger.edu closing connection

Today, there are two common ways for Bob to retrieve his e-mail from a mail server. If Bob is using Web-based e-mail or a smartphone app (such as Gmail), then the user agent will use HTTP to retrieve Bob's e-mail. This case requires Bob's mail server to have an HTTP interface as well as an SMTP interface (to communicate with Alice's mail server). The alternative method, typically used with mail clients such as Microsoft Outlook, is to use the Internet Message Access Protocol (IMAP) defined in RFC 3501. Both the HTTP and IMAP approaches allow Bob to manage folders, maintained in Bob's mail server. Bob can move messages into the folders he creates, delete messages, mark messages as important, and so on.

2.5 DNS - The Internet's Directory Service

One identifier for a host is its hostname [cnn.com, www.yahoo.com]. Hostnames are mnemonic and are therefore used by humans. Hosts are also identified by IP addresses.

2.5.1 Services provided by DNS

Routers use IP addresses. The Internet's domain name system (DNS) translates hostnames to IP addresses. The DNS is:
1. A distributed database implemented in a hierarchy of DNS servers
2. An application-layer protocol that allows hosts to query the distributed database

DNS servers are often UNIX machines running the Berkeley Internet Name Domain (BIND) software. DNS runs over UDP and uses port 53. It is often employed by other application-layer protocols (HTTP, FTP...) to translate user-supplied hostnames to IP addresses.

How it works:
• The user machine runs the client side of the DNS application.
• The browser extracts the hostname, www.xxxxx.xxx, from the URL and passes it to the client side of the DNS application.
• The DNS client sends a query containing the hostname to a DNS server.
• The DNS client eventually receives a reply, which includes the IP address for the hostname.
• The browser can then initiate a TCP connection to the server at that IP address.
DNS therefore adds an additional delay.

DNS provides other services in addition to translating hostnames to IP addresses:
• host aliasing: a host with a complicated hostname can have one or more alias names. The original hostname is said to be the canonical hostname.
• mail server aliasing: makes e-mail servers' hostnames more mnemonic. This also allows an e-mail server and a Web server to have the same hostname.
• load distribution: replicated servers can have the same hostname. In this case, a set of IP addresses is associated with one canonical hostname. When a client makes a DNS query for a name mapped to a set of addresses, the server responds with the entire set, but rotates the ordering within each reply.

2.5.2 Overview of How DNS Works

From the perspective of the invoking application in the user's host, DNS is a black box providing a simple, straightforward translation service. Having one single global DNS server would be simple, but it is not realistic: it would be a single point of failure, it would have to handle an impossible traffic volume, it would be geographically too distant from some querying clients, and its maintenance would be impossible.

A Distributed, Hierarchical Database

The DNS uses a large number of servers, organized in a hierarchical fashion and distributed around the world.
The three classes of DNS servers:
• Root DNS servers: in the Internet there are 13 root DNS servers, most hosted in North America. Each of these is in reality a network of replicated servers, for both security and reliability purposes (247 servers in total).
• Top-level domain (TLD) servers: responsible for top-level domains such as com, org, net, edu and gov, and for all of the country top-level domains such as uk, fr and jp.
• Authoritative DNS servers: every organization with publicly accessible hosts must provide publicly accessible DNS records that map the names of those hosts to IP addresses. An organization can choose to implement its own authoritative DNS server or to pay to have the records stored in an authoritative DNS server of some service provider.

Finally, there are local DNS servers, which are central to the DNS architecture. They are hosted by ISPs. When a host connects to an ISP, the ISP provides the host with the IP addresses of one or more of its local DNS servers. Requests can travel up to the root DNS servers and back down.

Queries can be recursive or iterative. In a recursive query, the user's host sends the request to its nearest DNS server, which asks a higher-tier server, which in turn asks a lower-tier server, and so on; the chain goes on until it reaches a DNS server that can reply, and the reply follows the inverse of the path taken by the request. In an iterative query, the same machine sends each request and receives each reply directly. Any DNS server can be iterative, recursive, or both.

DNS Caching

DNS extensively exploits caching in order to improve delay performance and to reduce the number of DNS messages ricocheting around the Internet. In a query chain, when a DNS server receives a DNS reply, it can cache the mapping in its local memory.
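A cached mapping should not be kept forever, since hostname-to-address bindings can change. The following is a minimal sketch of such a cache, assuming each entry is stored together with a lifetime in seconds (the TTL field of a resource record, described in Section 2.5.3). The class name DnsCache, its methods, and the explicit now argument (used instead of a real clock so the behaviour is easy to follow) are assumptions for illustration.

```python
from typing import Optional


class DnsCache:
    """Tiny hostname -> IP address cache with per-entry expiry."""

    def __init__(self) -> None:
        # name -> (address, expiry time in seconds)
        self._entries: dict[str, tuple[str, float]] = {}

    def put(self, name: str, address: str, ttl: float, now: float) -> None:
        # Remember the mapping until `now + ttl`.
        self._entries[name] = (address, now + ttl)

    def get(self, name: str, now: float) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None                 # never cached: a real server would query upstream
        address, expires_at = entry
        if now >= expires_at:
            del self._entries[name]     # stale entry: discard it
            return None
        return address
```

A DNS server using such a cache would call get before forwarding a query and call put whenever a reply arrives; a return value of None means the query still has to be sent up the chain.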
2.5.3 DNS Records and Messages

The DNS servers that implement the DNS distributed database store resource records (RRs), including RRs that provide hostname-to-IP address mappings. Each DNS reply message carries one or more resource records. A resource record is a four-tuple that contains the fields (Name, Value, Type, TTL). TTL is the time to live of the resource record (it determines when a resource should be removed from a cache). The meaning of Name and Value depends on Type:
• type=A: name is a hostname; value is its IP address
• type=NS: name is a domain (e.g., foo.com); value is the hostname of the authoritative name server for this domain
• type=CNAME: name is an alias name for some "canonical" (the real) name; value is the canonical name. For example, www.ibm.com is really servereast.backup2.ibm.com
• type=MX: value is the name of the mail server associated with name

DNS Messages

The only types of DNS messages are DNS query and reply messages. They have the same format:
• header section (first 12 bytes):
  o a 16-bit number identifying the query, which is copied into the reply message so that the client can match received replies with sent queries
  o a 1-bit query/reply flag (0 query, 1 reply)
  o a 1-bit authoritative flag, set in reply messages when the DNS server is an authoritative server for the queried name
  o a 1-bit recursion-desired flag, set when the client wants the server to perform recursion if it doesn't have the record
  o a 1-bit recursion-available field, set in the reply if the DNS server supports recursion
• question section: information about the query: a name field containing the name being queried, and a type field
• answer section: resource records for the name originally queried: Type, Value, TTL. Multiple RRs can be returned if the server has multiple IP addresses.
• authority section: records for other authoritative servers
• additional section: other helpful records: canonical hostnames...

Inserting Records into the DNS Database

Suppose we have created a new company. Next we register the domain name newcompany.com at a registrar. A registrar is a commercial entity that verifies the uniqueness of the domain name, enters it into the DNS database and collects a small fee for these services. When we register the domain name, we need to provide the registrar with the IP addresses of our primary and secondary authoritative DNS servers. The registrar then makes sure that a Type NS and a Type A record are entered into the TLD com servers for each of our two DNS servers.

Focus on security: DNS vulnerabilities
• DDoS bandwidth-flooding attack
• Man-in-the-middle (MITM) attack: the attacker answers queries with false replies, tricking the user into connecting to another server
• The DNS infrastructure can be used to launch a DDoS attack against a targeted host

To date, there hasn't been an attack that has successfully impeded the DNS service; DNS has demonstrated itself to be surprisingly robust against attacks. However, there have been successful reflector attacks; these can be addressed by appropriate configuration of DNS servers.

2.6 Peer-to-Peer Applications

2.6.1 File Distribution

In P2P file distribution, each peer can redistribute any portion of the file it has received to any other peers, thereby assisting the server in the distribution process. As of 2012, the most popular P2P file distribution protocol is BitTorrent, developed by Bram Cohen.

Scalability of P2P architectures

Denote the upload rate of the server's access link by u_s, the upload rate of the ith peer's access link by u_i, the download rate of the ith peer's access link by d_i, the size of the file to be distributed (in bits) by F, and the number of peers that want a copy of the file by N. We now compare the client-server and P2P approaches.
Client-Server
The server must transmit one copy of the file to each of the N peers, thus it transmits NF bits. The time to distribute the file is therefore at least NF/u_s. Denote d_min = min{ d_i }. The peer with the slowest download rate cannot obtain all F bits in less than F/d_min seconds. Therefore the time to distribute F to N clients using the client-server approach satisfies
$$D_{cs} \geq \max\{ NF/u_s,\ F/d_{min} \}$$
For large N, this bound increases linearly in N.

P2P
When a peer receives some file data, it can use its own upload capacity to redistribute the data to other peers.
• At the beginning of the distribution only the server has the file. It must send every bit at least once, which takes at least F/u_s seconds.
• The peer with the lowest download rate cannot obtain all F bits of the file in less than F/d_min seconds.
• The total upload capacity of the system is equal to the sum of the upload rates of the server and of all the peers. The system must upload F bits to each of the N peers, thus delivering a total of NF bits, which cannot be done faster than NF/(u_s + u_1 + ... + u_N).
Putting the three bounds together:
$$D_{P2P} \geq \max\{ F/u_s,\ F/d_{min},\ NF/(u_s + \sum_{i=1}^{N} u_i) \}$$

BitTorrent
In BitTorrent the collection of all peers participating in the distribution of a particular file is called a torrent. Peers in a torrent download equal-size chunks of the file from one another, with a typical chunk size of 256 KBytes. At the beginning a peer has no chunks; it accumulates more and more chunks over time. While it downloads chunks it also uploads chunks to other peers. Once a peer has acquired the entire file it may leave the torrent, or remain in it and continue to upload chunks to other peers (becoming a seeder). Any peer can leave the torrent at any time and later rejoin it at any time as well.
Each torrent has an infrastructure node called a tracker: when a peer joins a torrent, it registers itself with the tracker and periodically informs it that it is still in the torrent.
The tracker keeps track of the peers participating in the torrent. A torrent can have up to thousands of peers participating at any instant of time.
When a user joins the torrent, the tracker randomly selects a subset of peers from the set of participating peers. The user establishes concurrent TCP connections with all of these peers, called neighboring peers. The neighboring peers can change over time. The user asks each of its neighboring peers for the list of chunks they have (one list per neighbor).
The user starts by downloading the chunks that have the fewest copies among its neighbors (the rarest first technique). In this manner the rarest chunks get redistributed more quickly, roughly equalizing the number of copies of each chunk in the torrent.
Every 10 seconds the user measures the rate at which it receives bits and determines the four peers that are sending to it at the highest rate. It then reciprocates by sending chunks to these same four peers. These four peers are said to be unchoked. Every 30 seconds it also chooses one additional neighbor and sends it chunks. This peer is said to be optimistically unchoked.

2.6.2 Distributed Hash Tables (DHTs)
How can a simple database be implemented in a P2P network? In the P2P system each peer holds only a small subset of the totality of the (key, value) pairs. Any peer can query the distributed database with a particular key; the database then locates the peers that have the corresponding pair and returns the pair to the querying peer. Any peer can also insert a new pair into the database. Such a distributed database is referred to as a distributed hash table (DHT).
In a P2P file sharing application, a DHT can be used to map each chunk to the IP addresses of the peers that hold it.

Peer Churn
In a P2P system, a peer can come or go without warning.
To keep the DHT overlay in place in the presence of such peer churn, we require each peer to keep track of (that is, to know the IP address of) its predecessor and its first and second successors, and to periodically verify that its two successors are alive. If a peer abruptly leaves, its successor and predecessor need to update their information. The predecessor replaces its first successor with its second successor and asks it for the identifier and IP address of its immediate successor.
What if a peer joins? If it knows only one peer, it asks that peer to find out what its predecessor and successor will be. The message is forwarded until it reaches the future predecessor, which sends the newcomer its predecessor and successor information. The newcomer can then join the DHT by making its predecessor's successor its own successor and by notifying its predecessor to update its successor information.

2.6 Video Streaming and Content Distribution Networks
By many estimates, streaming video (including Netflix, YouTube and Amazon Prime) accounted for about 80% of Internet traffic in 2020 [Cisco 2020]. In this section we provide an overview of how popular video streaming services are implemented in today’s Internet. We will see that they are implemented using application-level protocols and servers that function in some ways like a cache.

2.6.1 Internet Video
In streaming stored video applications, the underlying medium is prerecorded video, such as a movie, a television show, a prerecorded sporting event, or a prerecorded user-generated video (such as those commonly seen on YouTube). These prerecorded videos are placed on servers, and users send requests to the servers to view the videos on demand. Many Internet companies today provide streaming video, including Netflix, YouTube (Google), Amazon, and TikTok.
But before launching into a discussion of video streaming, we should first get a quick feel for the video medium itself.
A video is a sequence of images, typically displayed at a constant rate, for example at 24 or 30 images per second. An uncompressed, digitally encoded image consists of an array of pixels, with each pixel encoded into a number of bits to represent luminance and color. An important characteristic of video is that it can be compressed, thereby trading off video quality against bit rate. Today’s off-the-shelf compression algorithms can compress a video to essentially any bit rate desired. Of course, the higher the bit rate, the better the image quality and the better the overall user viewing experience.
From a networking perspective, perhaps the most salient characteristic of video is its high bit rate. Compressed Internet video typically ranges from 100 kbps for low-quality video to over 4 Mbps for streaming high-definition movies; 4K streaming envisions a bit rate of more than 10 Mbps. This can translate to a huge amount of traffic and storage, particularly for high-end video.

2.6.2 HTTP Streaming and DASH

2.6.3 Content Distribution Networks
Challenge: how to stream content (selected from millions of videos) to hundreds of thousands of simultaneous users?
• Option 1: a single, large “mega-server”
  - single point of failure
  - point of network congestion
  - long path to distant clients
  - multiple copies of video sent over the outgoing link
  Quite simply, this solution doesn’t scale.
• Option 2: store/serve multiple copies of videos at multiple geographically distributed sites (CDN)
  - enter deep: push CDN servers deep into many access networks, close to users; used by Akamai (1700 locations)
  - bring home: a smaller number (10’s) of larger clusters in POPs near (but not within) access networks; used by Limelight

Case study: Netflix

2.7 Socket Programming: Creating Network Applications
socket: door between the application process and the end-to-end transport protocol
Two socket types for two transport services:
• UDP: unreliable datagram
• TCP: reliable, byte stream-oriented

Socket programming with UDP
UDP: no “connection” between client and server:
• no handshaking before sending data
• sender explicitly attaches the IP destination address and port # to each packet
• receiver extracts the sender IP address and port # from the received packet
UDP: transmitted data may be lost or received out of order.
Application viewpoint:
• UDP provides unreliable transfer of groups of bytes (“datagrams”) between client and server processes
[Figure: client/server socket interaction, UDP]

Socket programming with TCP
Client must contact server:
• server process must first be running
• server must have created a socket (door) that welcomes the client’s contact
Client contacts server by:
• creating a TCP socket, specifying the IP address and port number of the server process
• when the client creates the socket, the client TCP establishes a connection to the server TCP
• when contacted by a client, the server TCP creates a new socket for the server process to communicate with that particular client
  - allows the server to talk with multiple clients
  - source port numbers are used to distinguish clients
[Figure: client/server socket interaction, TCP]
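The two interaction patterns above can be tried out with Python's standard socket module. The sketch below is only an illustration under stated assumptions: the function names udp_demo and tcp_demo, the loopback address 127.0.0.1, the 2048-byte buffer and the "reply with the uppercase message" behaviour are all choices made for this example. Each demo runs a one-shot server in a background thread so that client and server fit in a single script.

```python
import socket
import threading


def udp_demo(message: bytes) -> bytes:
    """One UDP exchange: no handshake, every datagram carries its destination address."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(5)
    server.bind(("127.0.0.1", 0))            # port 0: let the OS pick a free port
    server_addr = server.getsockname()

    def serve_once():
        # The sender's IP address and port arrive together with the datagram.
        data, client_addr = server.recvfrom(2048)
        server.sendto(data.upper(), client_addr)

    worker = threading.Thread(target=serve_once)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5)
    client.sendto(message, server_addr)      # destination attached explicitly to the packet
    reply, _ = client.recvfrom(2048)

    worker.join()
    client.close()
    server.close()
    return reply


def tcp_demo(message: bytes) -> bytes:
    """One TCP exchange: the client connects first, then bytes flow over the connection."""
    welcoming = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    welcoming.settimeout(5)
    welcoming.bind(("127.0.0.1", 0))
    welcoming.listen(1)                      # the "door" that welcomes client contact
    server_addr = welcoming.getsockname()

    def serve_once():
        # accept() returns a NEW socket dedicated to this particular client.
        connection, _ = welcoming.accept()
        with connection:
            data = connection.recv(2048)     # fine for a short message; TCP is a byte stream
            connection.sendall(data.upper())

    worker = threading.Thread(target=serve_once)
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)
    client.connect(server_addr)              # the TCP connection is established here
    client.sendall(message)                  # no address needed: the connection identifies the peer
    reply = client.recv(2048)

    worker.join()
    client.close()
    welcoming.close()
    return reply


if __name__ == "__main__":
    print(udp_demo(b"hello over udp"))
    print(tcp_demo(b"hello over tcp"))
```

Note the difference the bullets describe: the UDP client passes the server address with every sendto call, while the TCP client names the server once in connect and the server answers on the per-client socket returned by accept.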
