Web Engineering Lecture Notes PDF
Document Details
Uploaded by SleekTroll
University of Groningen
2025
Summary
This document is a set of lecture notes on web engineering. It covers the Internet, the TCP/IP model with its layers and protocols, URIs, HTTP, caching, authentication, and content delivery networks.
Full Transcript
Re-2

Notebook: Web Engineering
Last edited time: January 9, 2025 12:19 AM
Created time: January 7, 2025 1:21 PM

The Internet

What is the web?
Information system for retrieving resources which are:
- Identified by URLs
- Connected to each other by hyperlinks
- Accessible over the Internet

The Internet is not a single network, but a collection of heterogeneous networks.
- Began with ARPANET in the 60s
- The Internet as such dates from the late 80s

TCP/IP Model

Layer        | Role                                                       | Protocols
Application  | high-level protocols                                       | HTTP, DNS
Transport    | allows sources and targets to communicate                  | TCP, UDP
Internet     | hosts inject packets into any network to travel across it  | IP, ICMP
Link         | interface between hosts and transmission links             | DSL, SONET, Ethernet

Internet Architecture
- ISP (Internet Service Provider): offers connectivity; may have to pay for transit to other networks
- PoP (Point of Presence): where user packets enter the ISP's network
- IXP (Internet eXchange Point): point that routes traffic between different networks

URIs

Resource Identification
Early systems could only identify documents within one system. The web uses Uniform Resource Identifiers (URIs) to solve this.
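To make the idea of a uniform identifier concrete, here is a minimal sketch that splits a URI into its named components using Python's standard `urllib.parse` module. The example URI and the helper name `describe` are assumptions made up for illustration; they do not come from the lecture.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split a URI into its main components (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,      # how to interpret the rest, e.g. https
        "host": parts.hostname,      # where the resource lives (may be None)
        "port": parts.port,          # explicit port, if any
        "path": parts.path,          # hierarchical path to the resource
        "query": parts.query,        # parameters after '?'
        "fragment": parts.fragment,  # part after '#', resolved by the client
    }


# Hypothetical URI, chosen only for illustration.
info = describe("https://www.example.org:8080/courses/web-engineering?week=2#http")
for name, value in info.items():
    print(f"{name}: {value}")
```

The scheme comes first and tells the client how to read everything after it, which is what lets one identifier format work across many different systems.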
A URI is a reference identifying an abstract or physical resource. It is usually hierarchical.

Schemes
- URL: identifies a resource by its location on the system/network
- URN: uniquely identifies a resource regardless of its primary storage location, like a logical address

HTTP - 1

HyperText Transfer Protocol (HTTP) is a request/response protocol.
- Relies on TCP, which is reliable and connection-oriented
- Stateless: there is no reference to past request/response pairs
- Connections are persistent (the default since HTTP/1.1)
- Client submits the request
- Server processes and executes it

- Origin server: where a certain resource resides
- Proxy: acts on behalf of clients
  - the client explicitly addresses it
  - receives the request from the client
  - forwards it to the origin server (if it cannot be processed locally)
- Gateway: an intermediary for another server
  - unlike with a proxy, the client interacts with the gateway as if it were the server itself
- Tunnel: acts as a blind relay between two connections
  - routes messages without inspecting them
  - used when information needs to pass through a firewall

Connection Management
- HTTP/1.0: one transaction per connection
- HTTP/1.1: multiple transactions over a persistent connection (but each waits for the previous request/response pair to complete)
- HTTP/2: multiplexed requests over the same connection (can send and receive multiple requests simultaneously)

Content Negotiation
When a resource is available in multiple representations or formats, a mechanism must decide which one is served to the client.

Server-driven
The client includes its preferences in request headers and the server does its best to respect them in the response.
- Advantage: faster, because it avoids asking the client to decide
- Disadvantage: the server cannot know what the client actually intends (text/html and application/pdf serve different purposes)

Client-driven
The server responds with a list of options and the client chooses the best one.
- Advantage: puts the client's needs first
- Disadvantage: two round trips per request

Transparent
Mixes both approaches, but HTTP itself does not provide a mechanism for it.

HTTP - 2

Headers

General headers
Describe the message itself.
- Cache-Control: directives for caching
- Connection: set to close if a persistent connection is not wanted
- Transfer-Encoding: encoding applied to the message
- Via: every intermediary that a message passes through adds a Via field

Entity headers
- Content-Encoding: encoding applied to the encapsulated entity
- Content-MD5: fingerprint of the complete message body (integrity check)
- Expires: timestamp after which the response is outdated
- Last-Modified: when the entity was last changed

Request headers
- Host: mandatory header containing the host name (and optionally the port) of the server hosting the resource
- If-Match: performs the requested action only if the entity tag matches that of the resource we want
- If-Modified-Since: used for conditional GET

Request URIs
- * : the request applies to the server itself
- absoluteURI : required if the request is made to a proxy
- abs_path : requests a resource directly from the origin server

Request methods
- CONNECT : used for tunneling
- HEAD : the response has no message body, only headers
- OPTIONS : determines the properties of the server
- TRACE : the server echoes the received request back; useful when we are interested in the Via headers

Request properties
- Safe: the user is not held accountable for the result of the action; users do not commit themselves to anything by querying a resource
- Idempotent: repeating the request has the same effect as sending it once; safe methods are also idempotent

- POST: the URI identifies the resource (the service) that will handle the entity
- PUT: the URI identifies the entity enclosed in the request

- GET: safe interactions
- POST: state-altering interactions; the user is accountable for the result

Response headers
- Age: estimated age of the entity
- Server: type and version of the server
- Accept-Ranges: whether the server accepts requests for parts of the resource
- Retry-After: estimated period after which the service is up and running again
- WWW-Authenticate: name of the authentication scheme and the parameters to use, sent with status 401

Status Codes

Categories
- 1xx: Informational
- 2xx: Success
- 3xx: Redirection
- 4xx: Client error
- 5xx: Server error

Caching

Temporary storage for response messages.
Caching aims to speed up the response time when the same response message is requested again.
- Forward proxy: external-facing stand-in for origin servers (N:1)
- Reverse proxy: internal-facing proxy; single entry point, usually with load balancing between origin servers (1:N)
- Web accelerator: proxy with prefetching and compression, speeds up encryption
- Edge caching: between network boundaries

Benefits of caching:
- reduces network bandwidth usage
- decreases perceived delays on the user side
- removes load from the origin server

Challenges of caching:
- if the resource on the origin is changed, the cached copy is invalidated
- the cache must have a mechanism to synchronise copies
- HTTP cache consistency is based on the assumption that everyone operates on the same time basis

Semantic Transparency enforces:
- usage of a cache must have no impact on the client or the origin server
- each request produces the same response as if it had been directed to the server itself
- a warning must be produced on deviations

Each response has a lifetime:
- fresh if within the lifetime
- stale (outdated) if past the lifetime
- responses with a lifetime of 0 are never fresh
- responses with an infinite lifetime are never stale

The cache stores all fresh responses (freshness control). Unless the client explicitly accepts outdated responses, they must not be used for serving requests.
- The age of a response can be determined from the Date field, which states when it was generated
- The server can set max-age or Expires to invalidate the response after a given time
- The Cache-Control field can be set to no-store so that no copy of the response is recorded (no-cache, by contrast, allows a copy to be stored but requires revalidation before it is reused)
- The cache can also set a freshness period itself. Unless a lifetime is specified, the cache uses Last-Modified to determine validity

Authentication

Basic Scheme
Transmits the username and password in base64, which is an encoding, not encryption.

Content Delivery Networks
- An edge proxy copies a resource onto a CDN node
- The CDN copies it onto other nodes, and when it is requested, it is delivered to the node that needs it

Questions
- What do the acronyms ISP, POP, and IXP stand for and what is their purpose?
- How are the terms URI, URL, and URN defined, what is their purpose, and what is the relation between them?
- What is the difference between a proxy, a gateway, and a tunnel server?
- Under which conditions are HTTP requests safe/idempotent? Which HTTP methods are considered safe/idempotent?
- What are the differences between PUT and POST in terms of request URI semantics, and pragmatics (i.e. how they are to be used)?
- Which proxy types are common on the Web, and what is their function?
- What are the benefits of caching?
- How is the principle of Semantic Transparency for HTTP defined? Under which conditions is it violated by the use of caches?
- How can a server stop a proxy or client from caching a response?
- How can a cache determine if a response is fresh in the absence of an Expires header? If Cache-Control is missing too?
- How does the Basic authentication scheme for HTTP work? Under which conditions should it be used?
- How do CDNs work and what benefits do they offer?
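
As a worked illustration of the freshness rules from the Caching section, the following minimal sketch decides whether a stored response is still fresh. It assumes the headers are available as a plain dictionary, that max-age takes precedence over Expires when both are present, and that a response without either is treated as stale (a real cache might instead apply a heuristic based on Last-Modified, as noted above). The function names `freshness_lifetime` and `is_fresh` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the lifetime in seconds from max-age or Expires, else None."""
    # Look for a max-age directive inside Cache-Control first.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    # Otherwise fall back to Expires, measured relative to Date.
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        generated = parsedate_to_datetime(headers["Date"])
        return max(0, int((expires - generated).total_seconds()))
    return None


def is_fresh(headers, now):
    """A response is fresh while its age is below its lifetime."""
    lifetime = freshness_lifetime(headers)
    if lifetime is None:
        return False  # assumption: unknown lifetime is treated as stale
    age = (now - parsedate_to_datetime(headers["Date"])).total_seconds()
    return age < lifetime


# Hypothetical stored response, generated at 12:00:00 GMT, valid for 60 s.
stored = {
    "Date": "Tue, 07 Jan 2025 12:00:00 GMT",
    "Cache-Control": "public, max-age=60",
}
generated_at = datetime(2025, 1, 7, 12, 0, 0, tzinfo=timezone.utc)

print(is_fresh(stored, generated_at + timedelta(seconds=30)))  # True
print(is_fresh(stored, generated_at + timedelta(seconds=90)))  # False
```

Because the comparison is `age < lifetime`, a response with a lifetime of 0 is never fresh, which matches the rule stated in the notes. The sketch also relies on the cache and the server agreeing on the current time, the same assumption the notes list as a challenge of HTTP cache consistency.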