How the Web Works: Fundamentals by Randy Connolly and Ricardo Hoar PDF
2015
Randy Connolly and Ricardo Hoar
Summary
These slides accompany Chapter 1, "How the Web Works," of Fundamentals of Web Development by Randy Connolly and Ricardo Hoar (Pearson, 2015). They cover topics from Internet history and protocols to web servers and the client-server model, using explanations and diagrams to illustrate the concepts.
Full Transcript
How the Web Works
Chapter 1
Randy Connolly and Ricardo Hoar, Fundamentals of Web Development
© 2015 Pearson
http://www.funwebdev.com

Objectives
1 Definitions and History
2 Internet Protocols
3 Client-Server Model
4 Where is the Internet?
5 Domain Name System
6 Uniform Resource Locators (URL)
7 Hypertext Transfer Protocol
8 Web Servers

Section 1 of 8
DEFINITIONS AND HISTORY

Internet = Web?
The answer is no
The World-Wide Web (WWW or simply the Web) is certainly what most people think of when they see the word “internet.” But the WWW is only a subset of the Internet.
Figure: the Internet contains email, the Web, online gaming, and FTP.

Short History of the Internet
Perhaps not short enough
The early ARPANET network was funded and controlled by the United States government, and was used exclusively for academic and scientific purposes. The early network started small, with just a handful of connected campuses in 1969, and grew to a few hundred by the early 1980s.

TCP/IP
Rides to the rescue
To promote growth and unify the disparate networks, a suite of protocols was invented. By 1981, new networks built in the US began to adopt the TCP/IP (Transmission Control Protocol / Internet Protocol) communication model (discussed in the next section), while older networks were transitioned over to it.

Tim Berners-Lee
I meant Sir Tim Berners-Lee
The invention of the WWW is usually attributed to the British Tim Berners-Lee, who, along with the Belgian Robert Cailliau, published a proposal in 1990 for a hypertext system while both were working in Switzerland.
Core Features of the Web
Shortly after that initial proposal, Berners-Lee developed the main features of the web:
1. A URL to uniquely identify a resource on the WWW.
2. The HTTP protocol to describe how requests and responses operate.
3. A software program (later called web server software) that can respond to HTTP requests.
4. HTML to publish documents.
5. A program (later called a browser) to make HTTP requests from URLs and that can display the HTML it receives.

What is an “Intranet”?
A short digression
One of the more common terms you might encounter in web development is the term “intranet” (with an “a”), which refers to an internet network that is local to an organization or business. Intranet resources are often private, meaning that only employees (or authorized external parties such as customers or suppliers) have access to those resources. Thus Internet (with an “e”) is a broader term that encompasses both private (intranet) and public networked resources.

What is an “Intranet”?
Intranets are typically protected from unauthorized external access via security features such as firewalls or private IP ranges. Because intranets are private, search engines such as Google have limited or no access to content within a private intranet. Due to this private nature, it is difficult to accurately gauge, for instance, how many web pages exist within intranets, and what technologies are more common in them. Some especially expansive estimates guess that almost half of all web resources are hidden in private intranets.
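As a rough illustration of the “private IP ranges” mentioned above, the sketch below checks whether an IPv4 address falls inside one of the three address blocks reserved for private networks by RFC 1918. It uses Python's standard ipaddress module. The function name is_private_ipv4 and the sample addresses are illustrative assumptions, and a real intranet may rely on other protections as well.

```python
import ipaddress

# The three IPv4 blocks set aside for private networks by RFC 1918.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private_ipv4(text):
    """Return True if the address lies in one of the RFC 1918 blocks.

    Hypothetical helper for illustration only.
    """
    address = ipaddress.ip_address(text)
    return any(address in block for block in PRIVATE_BLOCKS)


if __name__ == "__main__":
    for sample in ("192.168.123.254", "10.1.2.3", "8.8.8.8"):
        print(sample, is_private_ipv4(sample))
```

Addresses in these blocks are not routed on the public Internet, which is one reason outside parties such as search engines cannot reach machines that use them.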
Figure: Intranet versus Internet

Intranets and the Job Market
Being aware of intranets is also important when one considers the job market and market usage of different web technologies. If one focuses just on the public internet, it will appear that, for instance, PHP, MySQL, and WordPress are absolutely dominant in their market share. But when one adds in the private world of corporate intranets, other technologies such as ASP.NET, JSP, SharePoint, Oracle, SAP, and IBM WebSphere are just as important.

Static Web Sites
Partying Like It’s 1995
In the earliest days of the web, a webmaster (the term popular in the 1990s for the person who was responsible for creating and supporting a web site) would publish web pages, and periodically update them. In those early days, the skills needed to create a web site were pretty basic: one needed knowledge of the HTML markup language and perhaps familiarity with editing and creating images. This type of web site is commonly referred to as a static web site, in that it consists only of HTML pages that look identical for all users at all times.

Figure: Static Web Sites

Dynamic Web Sites
Within a few years of the invention of the web, sites began to get more complicated as more and more sites began to use programs running on web servers to generate content dynamically.

Figure: Dynamic Web Sites

Dynamic Web Sites
What are they?
These server-based programs would read content from databases, interface with existing enterprise computer systems, communicate with financial institutions, and then output HTML that would be sent back to the users’ browsers. This type of web site is called a dynamic web site in this book because the page content is created at run-time by a program written by a programmer; this page content can vary from user to user.

Web 2.0 and Beyond
In the mid 2000s, a new buzzword entered the computer lexicon: Web 2.0. This term had two meanings, one for users and one for developers.

Web 2.0
Its meaning for users
For the users, Web 2.0 referred to an interactive experience where users could contribute and consume web content, thus creating a more user-driven web experience.

Web 2.0
Its meaning for developers
For software developers, Web 2.0 also referred to a change in the paradigm of how dynamic web sites are created. Programming logic, which previously existed only on the server, began to migrate to the browser. This required learning JavaScript, a programming language that runs in the browser, as well as mastering the rather difficult programming techniques involved in asynchronous communication.

Section 2 of 8
INTERNET PROTOCOLS

What’s a Protocol?
The internet exists today because of a suite of interrelated communications protocols. A protocol is a set of rules that partners in communication use when they communicate.

A Layered Architecture
The TCP/IP Internet protocols were originally abstracted as a four-layer stack. Later abstractions subdivide it further into five or seven layers. Since we are focused on the top layer anyhow, we will use the earliest and simplest four-layer network model.

Figure: Four Layer Network Model

IP Addresses
Two types
IPv4 addresses are the IP addresses from the original TCP/IP protocol. In IPv4, up to 12 decimal digits are used (implemented as four 8-bit integers), written with a dot between each integer. Since an unsigned 8-bit integer's maximum value is 255, four integers together can encode 2^32, or approximately 4.3 billion, unique IP addresses.

IP Addresses
Two types
To future-proof the Internet against the 4.3 billion limit, a new version of the IP protocol was created, IPv6. This newer version uses eight 16-bit integers for 2^128 unique addresses, over a billion billion times the number in IPv4. These 16-bit integers are normally written in hexadecimal, due to their longer length.

IPv4: 4 8-bit components (32 bits), 2^32 addresses, for example 192.168.123.254
IPv6: 8 16-bit components (128 bits), 2^128 addresses, for example 3fae:7a10:4545:9:291:e8ff:fe21:37ca

IP Addresses
Inside of networks is different
Your IP address will generally be assigned to you by your Internet Service Provider (ISP). In organizations, large and small, purchasing extra IP addresses from the ISP is not cost effective. In a local network, computers can share a single external IP address between them.

Figure: TCP Packets

Section 3 of 8
CLIENT-SERVER MODEL

Client-Server Model
What is it?
The web is sometimes referred to as a client-server model of communications. In the client-server model, there are two types of actors: clients and servers. The server is a computer agent that is normally active 24 hours a day, 7 days a week (or simply 24/7), listening for queries from any client that makes a request. A client is a computer agent that makes requests and receives responses from the server, in the form of response codes, images, text files, and other data.

Request-Response Loop
Within the client-server model, the request-response loop is the most basic mechanism on the server for receiving requests and transmitting data in response. The client initiates a request to a server and gets a response that could include some resource like an HTML file, an image, or some other data.

The Peer-to-Peer Alternative
Not actually illegal
In the peer-to-peer model, where each computer is functionally identical, each node is able to send data to and receive data from the others directly. In such a model each peer acts as both a client and a server, able to upload and download information.

Figure: Peer-to-Peer Model

Server Types
A server is rarely just a single computer
Earlier, the server was shown as a single machine, which is fine from a conceptual standpoint. Clients make requests for resources from a URL; to the client, the server is a single machine. However, most real-world web sites are not served from a single server machine, but by many servers. It is common to split the functionality of a web site between several different types of server.
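The request-response loop from this section can be sketched in a few lines with Python's standard library. This is a minimal illustration rather than how a production server is built: the handler class HelloHandler, the function fetch_once, and the page content are all hypothetical, and the server listens only on the local machine on a port chosen by the operating system.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Server side: answer every GET request with a small HTML page."""

    def do_GET(self):
        body = b"<html><body>Hello from the server</body></html>"
        self.send_response(200)                      # response code
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                       # the requested resource

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start a local server, make one request as a client, return the reply."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        port = server.server_address[1]
        # Client side: initiate the request and read the response.
        connection = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        connection.request("GET", "/")
        response = connection.getresponse()
        result = (response.status, response.read())
        connection.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = fetch_once()
    print(status, body.decode())
```

The server thread waits for requests, while the client initiates the exchange and receives a response code plus data, which mirrors the two roles described above.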
Figure: Server Types

Real-World Server Installations
Not only are there different types of servers, there is often replication of each of the different server types. A busy site can receive thousands or even tens of thousands of requests a second; globally popular sites such as Facebook receive millions of requests a second.

Server Farms
Have no cows
A single web server that is also acting as an application or database server will be hard-pressed to handle more than a few hundred requests a second, so the usual strategy for busier sites is to use a server farm.

Server Farms
The goal behind server farms is to distribute incoming requests between clusters of machines so that any given web or data server is not excessively overloaded. Special routers called load balancers distribute incoming requests to available machines.

Figure: Server Farm

Server Farms
Even if a site can handle its load via a single server, it is not uncommon to still use a server farm because it provides failover redundancy. That is, if the hardware fails in a single server, one of the replicated servers in the farm will maintain the site’s availability.

Server Racks
In a server farm, the computers do not look like the ones in your house. Instead, these computers are more like the plates stacked in your kitchen cabinets. That is, a farm will have its servers and hard drives stacked on top of each other in server racks. A typical server farm will consist of many server racks, each containing many servers.
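The load balancing and failover ideas from the Server Farms slides can be sketched as a simple round-robin rotation. This is only one possible strategy, chosen here for brevity, and the slides do not say which strategy real load balancers use. The class RoundRobinBalancer and the server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any that are marked as failed."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        """Record a hardware failure so the server stops receiving requests."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Return a repaired server to the pool."""
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        """Pick the next healthy server in the rotation."""
        # One full pass over the rotation is enough to find a healthy server.
        for _ in range(len(self.servers)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["web1", "web2", "web3"])
    print([balancer.next_server() for _ in range(4)])
    balancer.mark_down("web2")  # simulate a failed machine
    print([balancer.next_server() for _ in range(3)])
```

Marking a server as down removes it from the rotation without interrupting the others, which is the failover redundancy the slides describe.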
Figure: Server Rack

Data Centers
Server farms are typically housed in special facilities called data centers.

Figure: Hypothetical Data Center

Data Centers
Where are they?
To prevent the potential for site down times, most large web sites will exist in mirrored data centers in different parts of the country, or even the world. As a consequence, the costs for multiple redundant data centers are quite high, and only larger web companies can afford to create and manage their own. Most web companies will instead lease space from a third-party data center.

Commercial Web Hosting
It is also common for the reverse to be true: a single server machine may host multiple sites. Large commercial web hosting companies such as GoDaddy, Bluehost, DreamHost, and others will typically host hundreds or even thousands of sites on a single machine (or mirrored on several servers).

Section 4 of 8
WHERE IS THE INTERNET?

From the Computer to the Local Provider
Our main experience of the Internet's hardware is the equipment we use in our homes.

Figure: Routers and Routing Tables

From the Local Provider to the Ocean
Eventually your ISP has to pass on your requests for Internet packets to other networks. This intermediate step typically involves one or more regional network hubs. Your ISP may have a large national network with optical fiber connecting most of the main cities in the country. Some countries have multiple national or regional networks, each with their own optical network.

Across the Oceans
Eventually, international Internet communication will need to travel underwater. The amount of undersea fiber optic cable is quite staggering and is growing yearly.

Figure: Undersea fiber optic lines (courtesy TeleGeography)

Section 5 of 8
DOMAIN NAME SYSTEM (DNS)

Domain Name System
Why do we need it?
As elegant as IP addresses may be, human beings do not enjoy having to recall long strings of numbers. Instead of IP addresses, we use the Domain Name System (DNS).

Figure: DNS Overview

Domain Levels
Example: server1.www.funwebdev.com
Top-Level Domain (TLD): com (most general)
Second-Level Domain (SLD): funwebdev
Third-Level Domain: www
Fourth-Level Domain: server1 (most specific)

DNS Address Resolution
While domain names are certainly an easier way for users to reference a web site, eventually, your browser needs to know the IP address of the web site in order to request any resources from it. The Domain Name System provides a mechanism for software to discover this numeric IP address. This process is referred to here as address resolution.
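The domain levels described in this section can be read off a host name by splitting it on dots and reading from right to left. The sketch below does only that. It does not contact any DNS server, and the function names domain_levels and describe are hypothetical helpers introduced for illustration.

```python
LEVEL_NAMES = [
    "Top-Level Domain (TLD)",
    "Second-Level Domain (SLD)",
    "Third-Level Domain",
    "Fourth-Level Domain",
]


def domain_levels(hostname):
    """Return the labels of a host name, most general first."""
    labels = hostname.rstrip(".").lower().split(".")
    return list(reversed(labels))


def describe(hostname):
    """Pair each label with the level name used in the slides."""
    pairs = []
    for index, label in enumerate(domain_levels(hostname)):
        if index < len(LEVEL_NAMES):
            name = LEVEL_NAMES[index]
        else:
            name = f"Level {index + 1}"  # deeper levels have no special name here
        pairs.append((name, label))
    return pairs


if __name__ == "__main__":
    for name, label in describe("server1.www.funwebdev.com"):
        print(f"{name}: {label}")
```

Actual address resolution needs a network connection. In Python, a call such as socket.gethostbyname("www.funwebdev.com") asks the operating system's resolver to perform the lookup, and the address returned depends on the DNS records at the time of the query.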
Figure: Domain name address resolution process

Section 6 of 8
UNIFORM RESOURCE LOCATORS (URL)

URL Components
In order to allow clients to request particular resources from the server, a naming mechanism is required so that the client knows how to ask the server for the file. For the web that naming mechanism is the Uniform Resource Locator (URL).
Example: http://www.funwebdev.com/index.php?page=17#article
Protocol: http
Domain: www.funwebdev.com
Path: index.php
Query String: page=17
Fragment: article

Query String
Query strings will be covered in depth when we learn more about HTML forms and server-side programming. They are the way of passing information such as user form input from the client to the server. In URLs they are encoded as key-value pairs delimited by “&” symbols and preceded by the “?” symbol.
Example: ?username=john&password=abcdefg
Keys: username, password
Values: john, abcdefg
Delimiter: &

Section 7 of 8
HYPERTEXT TRANSFER PROTOCOL (HTTP)

HTTP
The HTTP protocol establishes a TCP connection on port 80 (by default). The server waits for the request, and then responds with a response code, headers, and an optional message (which can include files).

Figure: HTTP

Web Requests
While we as web users might be tempted to think of an entire page being returned in a single HTTP response, this is not in fact what happens. In reality the experience of seeing a single web page is facilitated by the client's browser, which requests the initial HTML page, then parses the returned HTML to find all the resources referenced from within it, like images, style sheets, and scripts. Only when all the files have been retrieved is the page fully loaded for the user.

Figure: Browser parsing HTML and making subsequent requests

Browser Tools for HTTP
Modern browsers provide the developer with tools that can help us understand the HTTP traffic for a given page.

HTTP Request Methods
The HTTP protocol defines several different types of requests, each with a different intent and characteristics. The most common requests are the GET and POST requests, along with the HEAD request. Other requests, such as PUT, DELETE, CONNECT, TRACE, and OPTIONS, are seldom used and are not covered here.

Figure: GET versus POST requests

Section 8 of 8
WEB SERVERS

Web Servers
A web server is, at a fundamental level, nothing more than a computer that responds to HTTP requests.

Web Stack
Regardless of the physical characteristics of the server, one must choose an application stack to run a website. This stack will include an operating system, web server software, a database, and a scripting language to process dynamic requests.
The LAMP software stack refers to the Linux operating system, Apache web server, MySQL database, and PHP scripting language.
The WISA software stack refers to the Windows operating system, IIS web server, SQL Server database, and the ASP.NET server-side development technologies.
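Returning to the URL components and query strings from Section 6, the sketch below pulls the two example URLs from the slides apart with Python's standard urllib.parse module. Note that the module uses its own names for the parts (scheme for the protocol, netloc for the domain) and reports the path with its leading slash. The function name split_url is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlsplit


def split_url(url):
    """Return the five URL components named in the slides as a dict."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "domain": parts.netloc,
        "path": parts.path,
        "query_string": parts.query,
        "fragment": parts.fragment,
    }


if __name__ == "__main__":
    components = split_url("http://www.funwebdev.com/index.php?page=17#article")
    for name, value in components.items():
        print(f"{name}: {value}")

    # parse_qs maps each key to a list, because a key may appear more than once.
    print(parse_qs("username=john&password=abcdefg"))
    # prints {'username': ['john'], 'password': ['abcdefg']}
```

Placing a password in a query string, as the slide example does, is shown only to illustrate the key-value syntax.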