HTTP Protocol Explained
Summary
This document provides a comprehensive overview of the HTTP protocol. It explains how HTTP works as a client-server protocol for data exchange on the web, including details about requests, responses, and message types. The document also explores the connection between HTTP and TCP, and differences between protocol versions, such as HTTP/1.1 pipelining and HTTP/2 framing.
Full Transcript
1. HTTP is a protocol used to fetch resources such as HTML documents, forming the basis of data exchange on the Web. It operates on a client-server model, where requests are initiated by the recipient, typically a Web browser. A complete document is assembled from the various sub-documents retrieved, such as text, layout descriptions, images, videos, and scripts.
2. Clients and servers communicate by exchanging individual messages (as opposed to a stream of data). The messages sent by the client are called requests, and the messages sent by the server in answer are called responses.
3. ***HTTP messages*** are how data is exchanged between a ***server*** and a ***client***. There are two types of messages: ***requests***, sent by the ***client*** to trigger an action on the ***server***, and ***responses***, the answer from the ***server***.
4. HTTP is transmitted over TCP or a TLS-encrypted TCP connection, though it can theoretically use any reliable transport protocol.
5. HTTP's extensibility allows it to fetch not only hypertext documents but also images and videos, and to post content to servers, such as HTML form results. HTTP can also retrieve parts of documents to update web pages on demand.
6. HTTP is a client-server protocol: requests are sent by one entity, the user-agent (or a proxy on its behalf). Most of the time the user-agent is a web browser, but it can be anything, for example a bot or spider that crawls the Web to populate and maintain a search engine index.
7. A web server processes each request from a client and provides a response. Between the client and server, various entities called proxies perform different operations and act as gateways or caches. There are also other computers involved, such as routers and modems, but these are hidden in the network and transport layers of the OSI model. HTTP operates at the application layer; the underlying layers, though important for diagnosing network issues, are mostly irrelevant to the description of HTTP.
8. The browser initiates requests, not the server. To display a web page, the browser sends a request for the HTML document. After receiving the HTML, it parses the file and makes additional requests for scripts, layout information (CSS), and sub-resources like images and videos. The browser then combines these resources to present the complete web page (see the sketch below). Scripts executed by the browser can fetch more resources later, updating the web page accordingly.
9. A server, which serves documents requested by the client, may appear as a single machine but can be a collection of servers sharing the load (load balancing), or complex software interacting with other computers (such as caches, a database server, or e-commerce servers), generating documents on demand. Multiple server software instances can be hosted on the same machine, and with HTTP/1.1 and the Host header they can even share the same IP address.
10. HTTP messages are relayed between the Web browser and the server by numerous computers and machines, which can significantly impact performance. These intermediaries operate at the transport, network, or physical levels and are transparent at the HTTP layer. Proxies, operating at the application layer, can be either transparent, forwarding requests without alteration, or non-transparent, modifying requests before passing them to the server.
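The fetch-parse-fetch flow of item 8 can be sketched in a few lines. The following is a minimal illustration using Python's standard library, not code from the document; the URL is a placeholder, and a real browser additionally handles caching, parallel connections, script execution, and much more.

```python
# Minimal sketch of the browser-style flow from item 8: request the HTML
# document, parse it for sub-resources, then request each of those too.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class SubResourceCollector(HTMLParser):
    """Collects URLs of scripts, stylesheets, and images found in the HTML."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and "src" in attrs:
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and "href" in attrs:
            self.resources.append(attrs["href"])
        elif tag == "img" and "src" in attrs:
            self.resources.append(attrs["src"])


page_url = "https://example.com/"          # hypothetical page
with urlopen(page_url) as response:        # first request: the HTML document
    html = response.read().decode("utf-8", errors="replace")

collector = SubResourceCollector()
collector.feed(html)

# Additional requests for each sub-resource, just as a browser would issue them.
for relative in collector.resources:
    resource_url = urljoin(page_url, relative)
    with urlopen(resource_url) as sub:
        print(sub.status, resource_url, len(sub.read()), "bytes")
```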
11. ***HTTP*** is generally designed to be simple and human readable, even with the added complexity introduced in ***HTTP/2*** by encapsulating ***HTTP messages*** into ***frames***. ***HTTP messages*** can be read and understood by humans, providing easier testing for developers and reduced complexity for newcomers.
12. ***HTTP*** is ***stateless***: there is no link between two ***requests*** carried out successively on the same connection. This can be problematic for users attempting to interact with certain pages coherently, for example when using e-commerce shopping baskets; sessions built on cookies (item 18) address this.
13. ***HTTP*** doesn't require the underlying transport protocol to be connection-based; it only requires it to be reliable, or not lose messages (at a minimum, presenting an error in such cases).
14. Before a ***client*** and ***server*** can exchange an ***HTTP*** ***request***/***response*** pair, they must establish a ***TCP*** connection, a process which requires several round-trips.
15. Opening a separate ***TCP*** connection for each ***request***/***response*** pair is less efficient than sharing a single ***TCP*** connection when multiple ***requests*** are sent in close succession.
16. ***HTTP/1.1*** introduced ***pipelining*** (which proved difficult to implement) and persistent connections: the underlying ***TCP*** connection can be partially controlled using the **Connection** header.
17. HTTP/2 went further by multiplexing messages over a single connection, helping keep the connection warm and more efficient.
18. HTTP controls various features such as caching, which allows servers to instruct proxies and clients on what to cache and for how long, and clients to instruct proxies to ignore stored documents. HTTP headers can also relax the same-origin constraint, which browsers enforce for security, so that information can be shared across different domains. HTTP also provides basic authentication using headers or cookies, supports proxying and tunneling to hide IP addresses, and facilitates sessions through cookies, linking requests with the server's state, which is useful for e-commerce websites (see the session sketch below).
19. When a client wants to communicate with a server, it opens a TCP connection to send requests and receive responses. The client can open a new connection, reuse an existing one, or open multiple connections. It then sends an HTTP message, which is human-readable before HTTP/2 but encapsulated in frames with HTTP/2, making it unreadable directly. Finally, the client reads the server's response and either closes the connection or reuses it for further requests (see the socket sketch below).
20. If ***HTTP pipelining*** is activated, several requests can be sent without waiting for the first response to be fully received.
21. ***HTTP messages*** are composed of textual information encoded in ***ASCII*** and span multiple lines. In ***HTTP/1.1*** and earlier versions of the ***protocol***, these messages were openly sent across the connection. In ***HTTP/2***, the once human-readable message is divided into ***HTTP frames***, providing optimization and performance improvements.
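Item 19, together with the human-readable messages of items 11 and 21, can be illustrated with a raw TCP exchange. This is a minimal sketch, not code from the document; the host is a placeholder, and it deliberately asks the server to close the connection instead of keeping it persistent.

```python
# Minimal sketch of item 19: open a TCP connection, send a human-readable
# HTTP/1.1 request, then read the server's response.
import socket

HOST = "example.com"   # placeholder host

request = (
    "GET / HTTP/1.1\r\n"          # start-line: method, request target, version
    f"Host: {HOST}\r\n"           # mandatory in HTTP/1.1; lets servers share an IP (item 9)
    "Connection: close\r\n"       # Connection header from item 16: close after this response
    "\r\n"                        # blank line ends the head of the request
)

with socket.create_connection((HOST, 80)) as conn:
    conn.sendall(request.encode("ascii"))

    # Read until the server closes the connection, as requested above.
    chunks = []
    while data := conn.recv(4096):
        chunks.append(data)

response = b"".join(chunks)
head, _, body = response.partition(b"\r\n\r\n")
print(head.decode("ascii", errors="replace"))   # status line and headers: readable text
print(f"--- body: {len(body)} bytes ---")
```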
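The statelessness of item 12 and the cookie-based sessions of item 18 can be sketched as follows, using Python's standard-library cookie handling. The URL is a placeholder; a real application would talk to a server that actually issues a session cookie.

```python
# Sketch of items 12 and 18: HTTP itself is stateless, so continuity is carried
# by the client, typically as cookies sent back with each subsequent request.
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

jar = CookieJar()
opener = build_opener(HTTPCookieProcessor(jar))   # stores any Set-Cookie headers

# First request: a real server would typically answer with Set-Cookie,
# e.g. a session identifier.
opener.open("https://example.com/")

# A second, otherwise unrelated request: the opener automatically sends the
# stored cookies back, which is what links independent requests into a session.
opener.open("https://example.com/")

for cookie in jar:
    print(cookie.name, cookie.value)
```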
22. Web developers rarely craft these textual ***HTTP messages*** themselves. Software such as a ***web browser***, ***proxy***, or ***web server*** performs this action. They provide ***HTTP messages*** through config files (for ***proxies*** or ***servers***), ***API***s (for ***browsers***), or other interfaces.
23. HTTP requests and responses share a similar structure (see the message sketch below). Each begins with a start-line describing the request to be performed, or its status (success or failure); this start-line is always a single line.
24. Next comes an optional set of HTTP headers specifying the request or describing the body included in the message, followed by a blank line indicating that all meta-information for the request has been sent.
25. Finally, an optional body contains data related to the request (such as the content of an HTML form) or the document associated with a response. The presence and size of the body are determined by the start-line and the HTTP headers.
26. The *start-line* and ***HTTP headers*** of the ***HTTP message*** are collectively known as the head of the request, whereas its payload is known as the body.
27. HTTP requests are client messages that initiate actions on a web server. They consist of three elements: an HTTP method (like GET, PUT, POST, HEAD, or OPTIONS) that describes the action to be performed; the request target, usually a URL or absolute path, which varies between methods and can take the origin, absolute, authority, or asterisk form; and the HTTP version, which defines the structure of the message and the expected version of the response. For example, GET fetches a resource, POST pushes data to the server, and OPTIONS with the asterisk ('*') target addresses the server as a whole.
28. The status line of an HTTP response includes the protocol version (usually HTTP/1.1), a status code (such as 200, 404, or 302) indicating the success or failure of the request, and a status text providing a brief description of the status code to help humans understand the HTTP message. A typical status line is: HTTP/1.1 404 Not Found.
29. The last part of a response is the body. Not all responses have one: responses whose status code sufficiently answers the request without the need for a corresponding payload (like 201 Created or 204 No Content) usually don't.
30. Bodies fall into three categories: single-resource bodies consisting of a single file of known length, defined by the Content-Type and Content-Length headers; single-resource bodies consisting of a single file of unknown length, encoded in chunks with Transfer-Encoding set to chunked (see the chunked-decoding sketch below); and multiple-resource bodies, which are multipart bodies containing different sections of information, though these are relatively rare.
31. HTTP/2 introduces an extra step: it divides HTTP/1.x messages into frames which are embedded in a stream. Data and header frames are separated, which allows header compression. Several streams can be combined together, a process called multiplexing, allowing more efficient use of underlying TCP connections.
32. HTTP/1.x messages have several performance drawbacks: headers are uncompressed and often repetitive across requests, and there is no multiplexing, so multiple connections to the same server are needed, which is less efficient than keeping a single warm TCP connection.
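The anatomy described in items 23-28 can be seen by building a request and parsing a response by hand. The messages below are illustrative (the 404 status line comes from item 28); the host, path, and form fields are placeholders.

```python
# Sketch of the message anatomy from items 23-28: start-line, headers,
# blank line, optional body.

# Request: start-line (method, target, version), headers, blank line, body.
form_body = "say=Hi&to=Mom"
request = (
    "POST /contact_form.php HTTP/1.1\r\n"                 # start-line
    "Host: developer.mozilla.org\r\n"                      # headers ...
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(form_body)}\r\n"
    "\r\n"                                                  # blank line ends the head
    f"{form_body}"                                          # body
)
print(request)   # readable text, exactly what a client writes to the connection

# Response: status line (version, code, text), headers, blank line, body.
raw_response = (
    "HTTP/1.1 404 Not Found\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Content-Length: 24\r\n"
    "\r\n"
    "<h1>Page not found</h1>\n"
)

head, _, body = raw_response.partition("\r\n\r\n")
status_line, *header_lines = head.split("\r\n")
version, code, reason = status_line.split(" ", 2)
headers = dict(line.split(": ", 1) for line in header_lines)

print(version, code, reason)          # HTTP/1.1 404 Not Found
print(headers["Content-Type"])        # text/html; charset=utf-8
print(len(body), "byte body")
```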
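The "unknown length" case from item 30 uses chunked transfer encoding. Below is a minimal sketch of decoding such a body; the sample data is made up for illustration, and a real parser would also handle trailers and malformed input.

```python
# Sketch of item 30: a chunked body arrives as hex-length-prefixed chunks,
# terminated by a zero-length chunk.

def decode_chunked(raw: bytes) -> bytes:
    """Reassemble a chunked HTTP/1.1 message body."""
    body, pos = b"", 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        size = int(raw[pos:line_end], 16)        # chunk size in hexadecimal
        if size == 0:                            # zero-length chunk ends the body
            return body
        start = line_end + 2
        body += raw[start:start + size]
        pos = start + size + 2                   # skip the chunk's trailing CRLF


sample = b"7\r\nMozilla\r\n9\r\nDeveloper\r\n7\r\nNetwork\r\n0\r\n\r\n"
print(decode_chunked(sample))                    # b'MozillaDeveloperNetwork'
```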
33. HTTP frames are now transparent to Web developers. This is an additional step in HTTP/2, between HTTP/1.1 messages and the underlying transport protocol. No changes are needed in the APIs used by Web developers to utilize HTTP frames; when available in both the browser and the server, HTTP/2 is switched on and used (see the sketch below).
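Item 33's point, that adopting HTTP/2 requires no change in calling code, can be illustrated with a client that negotiates the protocol automatically. This is a sketch assuming the third-party httpx library (not mentioned in the document), installed with HTTP/2 support via `pip install httpx[http2]`; the URL is a placeholder, and the negotiated version simply shows up on the response.

```python
# Sketch of item 33: the request code is the same whether HTTP/1.1 or HTTP/2
# is negotiated; only the framing on the wire differs.
import httpx

with httpx.Client(http2=True) as client:            # allow HTTP/2 if the server supports it
    response = client.get("https://example.com/")
    print(response.http_version, response.status_code)   # e.g. "HTTP/2 200" or "HTTP/1.1 200"
```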