Web Application Technologies PDF

Web Application Technologies
Release 0.01
Patrick Ingham and Tony Jenkins
Nov 08, 2024

CONTENTS:
1 Welcome
2 Getting Started
3 How the Web Works
4 Web Documents and HTML
5 Styling with CSS
6 CSS Toolkits
7 Javascript
8 The MVC Design Pattern
9 Intermission: Good Web Design
10 Web Frameworks
11 Django: Basics
12 Django: Models
13 Django: Endpoints, Views, and Templates
14 Django: Building a Website
15 Summary and Review
16 Web Technologies
Index

CHAPTER ONE
WELCOME

Welcome to Web Application Technologies, also known as WAT. Our aim in this module is to introduce you to some of the key technologies that underpin modern web development. We will take a broad view, so that you meet a number of tools that are widely used by web developers. You will have the opportunity to take this deeper in the Level 6 Module Advanced Web Engineering, in your Production Project, or (best of all) on a placement next year!

1.1 Pre-Requisites

This module builds on topics that you have studied in your first year. Specifically, we assume that:

- You have a good working knowledge of Python programming.
- You understand object-oriented programming, although maybe not in Python.
- You are familiar with a modern IDE, such as PyCharm.
- You are comfortable with source code control with Git, and using GitHub as a remote repo.
- You understand the basics of relational databases (but we are not much bothered about SQL).
- You have a basic idea of networking concepts.

1.2 Approach

The approach in this module could be described as broad rather than deep. We aim to show you several technologies that are used in web development, and to give you practical experience of some. Web development is a massive, rapidly changing area, so we feel this is preferable to a “deep dive” into one specific technology. Our chosen techs are among those currently in use in industry, and are those where demand for skills is high.
Where there are several options, we have chosen an example.

1.3 Outcomes

At the end of the module you will be able to:

- Explain the workings of the web, and web architectures, at a level appropriate for a developer.
- Create a static web page to a provided design using HTML and CSS.
- Apply CSS toolkits and frameworks.
- Recognise good, and bad, web design.
- Create a data-driven website using a modern web framework that displays records drawn from a database.

The framework we will use is Django, which is based on Python. We will also mention many other concepts, tools, and frameworks as we go. Hopefully, most are defined in the final section of these notes!

1.4 Scope

Web Development is a huge topic, covering a huge range of technologies. At a very high level it can be split into two parts:

Front-End Development
    This relates to what the user sees, and how the user interacts with the website. UX and UI issues are very important here, as is everything surrounding “good” web design. Technologies used here are HTML, CSS, JavaScript, and (especially) JavaScript frameworks and libraries.

Back-End Development
    This is creating the side of the system that accepts requests from a front-end, and returns responses. This usually involves interacting with data stores (databases, for example). The key idea is to code the business logic of the application. Technologies here are basically programming languages (modern examples include Python, Java, PHP, GoLang, Rust).

Some technologies act as “glue” to allow the front- and back-ends to communicate. The most popular one these days is JSON, a simple text-based method of encoding data. Others (that you should have heard of) are XML and YAML.

Note: Traditionally, a developer would focus on front- or back-end development. A few years back the concept of a full stack developer gained a lot of impetus - a developer who can cover all the components.
Anyone who can do this, and do it well, can expect to be paid very well!

In this module, we will cover both front- and back-end. With only 12 weeks, this explains why we are taking a broad approach! One programming language we will leave out is JavaScript, although we will use it as part of Django. JavaScript, and especially its frameworks, move fast, and so are not easy to cover in a University course. The techs we will stick to are those that are tried and tested, and guaranteed to be current when you arrive in the workplace.

Probably the most important technology we will leave out is Docker, and containerisation.

1.5 Tools

All the tools you need to complete the practical work for the module are available in the James Graham labs, and are also freely available for you to download onto your own machines.

On your own machine you can use whatever operating system you prefer. Most of the examples you will see will use Linux.

Important: If you have serious aspirations to work in web development, and have not looked at Linux so far, we strongly recommend that you do. Windows is in decline as an operating system for developers. As the stats show, Mac is a worthy, but rather more expensive, alternative to Linux.

We recommend PyCharm Professional Edition as this has tools that make working with the Django web framework a more pleasant experience. JetBrains offer full versions of all their tools free of charge to students; you just need to create an account with an academic email address to confirm your status.

To work on your own machine you would need:

- PyCharm Professional 2024.
- Python 3.12 or later.
- A web browser, ideally two.
- An Internet connection.

Nothing we will do will make any significant demands on machine resources.

1.6 A Note on AI

The University has its own regulations on the use of AI tools. You should make sure you are familiar with them, and you should obviously abide by them.
Program code written solely by AI is very easy to spot, and is not allowed in any assessment. Rest assured that we have fed the assignment specs into AI tools, and we know exactly what the output looks like!

That said, working with AI tools is an important part of web development, and will be even more important when you start your career as a developer. So it is senseless for us to forbid the use of AI tools as you work on your project for this module.

These notes, and the example apps we will show you, were built using PyCharm with the GitHub Copilot plugin installed and very much enabled. Assuming you have a free student account on GitHub, Copilot is free; we recommend you install it and get familiar with what it does.

1.7 About these Notes

This is the first year we have run this module, so feedback is welcome. These notes will be updated as we go - you can see the date of the most recent update at the bottom of the page.

The notes were created with Sphinx, which was originally created to maintain the Python documentation.

CHAPTER TWO
GETTING STARTED

This section covers what you need to know before starting serious work.

We recommend PyCharm for the module, and that is what will be used in the following sections. You are welcome to use any other IDE, but that is not supported, and we will not be able to answer questions about it.

Specifically, we recommend the PyCharm Professional Edition. This is what you will find in the labs, and is available for free download as long as you have a free JetBrains account with a student email associated. The Professional Edition has some useful features that make working with the Django Web Framework a little more straightforward.

We are going to make extensive use of Python and Python packages later in the module. We will also use these as we explore how HTTP and the web work.
You have met the idea of packages and the Python Package Index (PyPi) before [1], but now we need to work with it properly.

[1] You may remember that PyPi is often referred to as The Cheese Shop.

2.1 Python: Reminders

First, some reminders about good practice when programming with Python.

It should go without saying that all your code should follow the standard conventions for Python, which are set out in a document called PEP-8. PyCharm will enforce most of these guidelines, and will warn you in the usual way if any of your code does not conform.

Important conventions that you should follow include:

- Variable and function names should be in lower_snake_case.
- Constant names should be in UPPER_SNAKE_CASE.
- Class names should be in CapWords style: each word starts with a capital letter, with no underscores.
- Functions should be defined at the top of any program using them.
- Blank lines should be used in the code to split functions, mark out program structure, and so on.

Since you have now met at least two programming languages, you should remember:

Important: Python is NOT Java.

Java has its own set of conventions, that you will have followed when you studied object-oriented programming. OO Programming is fully supported in Python - in fact almost every value in Python is an object. There are differences in how OO concepts are conventionally applied, however. For example, the most important (and sometimes surprising) one is probably:

Warning: Python classes do NOT typically include getter and setter methods. Programs just access the instance variables directly.

There are other differences, but most aspects of OOP that you will remember are supported in Python.

Later on, when we come to define the data that sits behind our websites, we will do this by defining Python classes. So you will need to check the syntax for classes, constructors, “magic methods”, and so on. Here, for example, is a good introduction.
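
As a reminder of what this looks like in practice, here is a minimal sketch. The Module class, its attributes, and the values used are invented purely for illustration; they are not taken from the module materials. It shows a constructor, one “magic method”, and direct access to an instance variable with no getter or setter.

```python
class Module:
    """A hypothetical class, used only to illustrate Python conventions."""

    def __init__(self, code, title, credits=20):
        # The constructor: instance variables are created by assignment.
        self.code = code
        self.title = title
        self.credits = credits

    def __str__(self):
        # A "magic method": Python calls this for str() and print().
        return f"{self.code}: {self.title} ({self.credits} credits)"


wat = Module("WAT", "Web Application Technologies")
wat.credits = 40  # no setter: the instance variable is assigned directly
summary = str(wat)
print(summary)
```

In Java the same class would probably carry getCredits() and setCredits() methods. In Python the attribute is simply read and assigned, and a property can be added later if validation ever becomes necessary.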
In summary, remember you are writing Python, with all its conventions, and not Java (with all its!). This is something that you need to become accustomed to as you work with more different programming languages.

2.2 Managing Packages

Another difference between Python and Java is in how they manage packages. The Java JDK provides many, many classes that cover many use cases. You should remember that Python provides a relatively small core (the standard library), which can be added to for specific needs or projects.

Packages that add to the core are housed in the Python Package Index (usually PyPi), a resource to which anyone can contribute. PyPi provides a vast array of packages for many applications.

Remember that Python has a standard library that covers most everyday applications, and PyPi adds packages for specific applications. Sometimes PyPi packages are so standard in use that they almost replace the standard package. It is not unknown for packages from PyPi to find their way into the standard library.

An example of a package in PyPi that has almost, but not quite, got into the standard library will follow shortly, but first, why do we worry about managing packages?

Any reasonably complex system written in any programming language will require packages. Hardly any system is ever written from scratch; most systems are built by combining and customising packages. So any system will probably require a bunch of packages, which are called its dependencies. Now, some of the required packages will in turn require other packages, and so on. This produces a network of dependencies, which can be quite complex.

This quickly becomes a problem, and beyond most people to manage. So developers use a software tool to keep track of things. In PHP, for example, they would use composer, in Java maven or gradle. In Python, we use pip.

For example, suppose we install package A, that requires packages B and C.
It would be really neat if our package manager automatically installed the dependencies along with the main package. So installing package A will magically cause packages B and C to be installed too. And if package B requires package D, then that will be installed too, and so on until all the dependencies are satisfied. This is what pip and friends will do for us.

Usually these package managers can either be used to install a specific package (in which case they will also install its dependencies), or they can take a file listing the required packages as their input. Most can do either.

There is more. It is not unusual for a package to depend on a specific version of another, most often just because it has not been tested with an updated version. Given that the second package might also depend on specific versions of other packages, this really shows why we need an automatic tool!

Note: It follows from this that we would not necessarily use the latest version of every package. We would use a version that is known to work with all the specific versions of the other packages we are using. That said, we would expect our package manager to install the latest version that is compatible with all the other packages, and to be able to update versions as appropriate.

It gets better. Suppose a developer has multiple Python projects on their trusty laptop. It is quite likely that each project will have its own dependencies on versions of packages. So it is impossible to install all the packages globally on the system; they must be installed separately for each project. This need has the happy side effect that the developer will be able to include a list of the required packages along with the code for their application.

This whole business leads us to the need for Python Virtual Environments.
These allow developers to work with different collections of packages, and even different versions of Python, on the same machine.

Note: Some package managers (most, actually) have a specific name for the file that contains a list of the dependencies. Python doesn’t, but there is a convention that every Python developer follows. This is what is described below.

Before moving on, it is worth noting that we are describing how Python solves this problem. But the problem exists for all other languages. So if you understand how Python’s pip works, you should have no trouble working with maven or composer in the future.

2.3 Virtual Environments

Important: This section describes the basics of how to create and use a virtual environment. The exact commands depend on your operating system, command shell, and more. It is not possible to cover all the possibilities here, so you should refer to the docs for full information. There is also a full tutorial in the Python docs.

As usual, the examples below use Linux. These commands should work on Mac. On Windows, they should work in GitBash.

A virtual environment contains a copy of the Python interpreter, some admin tools (like pip), and any packages that are then installed. It is commonly stored in a folder called venv in the main project folder, but this is only a convention, and the folder name has no special meaning.

Tip: The venv folder should not be stored in Git, so should be listed in the .gitignore file. This prevents confusion when, for example, developers are using different operating systems.

To create a virtual environment in a folder called venv below the current folder, the command is:

    $ python -m venv venv

Tip: The command to invoke the Python interpreter could be python3 or even py. You can find out what is correct on your system by experimenting in a terminal window.

This command has just made local copies of the files in the current Python installation. To use the virtual environment it must be activated.
It will be obvious when this has been done because the name of the environment folder will be shown in the prompt.

Here is an example on a Linux system, using the bash shell (remember that alternatives are listed in the tutorial linked above):

    tony@myrddin:~/wat$ which python3
    /usr/bin/python3
    tony@myrddin:~/wat$ python3 -m venv venv
    tony@myrddin:~/wat$ source venv/bin/activate
    (venv) tony@myrddin:~/wat$ which python3
    /home/tony/wat/venv/bin/python3

The Linux which command reports, ah, which file will be executed when a command is entered. If you check the second and last lines you will see that this has changed from the copy of Python installed globally in /usr/bin/ to a copy stored in the virtual environment. So, effectively, once the virtual environment has been activated, the python command points to a different copy of Python.

Important: The commands you need on your system MAY WELL BE DIFFERENT. Refer to the full tutorial in the Python docs for details.

This all means that now the virtual environment (often called a v-env, or venv) is activated, any packages installed will be installed in the virtual environment, and not globally.

Tip: If this all seems very complicated, do not panic. PyCharm will do most of this for you! But it’s important to understand what it’s doing behind the scenes, and to be able to do this from the command line if needed.

2.3.1 Installing a Package

As a simple example, the tabulate package is a widely used solution for printing out neat tables of data. So much easier than coding it by hand, working out the best column widths, and so on. Check its docs at PyPi for the details and examples.
The package can be added to the virtual environment with some swift use of pip:

    (venv) tony@myrddin:~/wat$ pip install tabulate
    Collecting tabulate
      Downloading tabulate-0.9.0-py3-none-any.whl (35 kB)
    Installing collected packages: tabulate
    Successfully installed tabulate-0.9.0

This package requires no other packages, so the installation is very quick. See that we are told what version has been installed.

Later in the module we will reach for the requests package to allow us to send HTTP (web) requests from a Python program. If we were to install that, we would see some dependencies being pulled down along with the package itself:

    (venv) tony@myrddin:~/wat$ pip install requests
    Collecting requests
      Using cached requests-2.32.3-py3-none-any.whl (64 kB)
    Collecting urllib3<3,>=1.21.1
      Using cached urllib3-2.2.2-py3-none-any.whl (121 kB)
    Collecting certifi>=2017.4.17
      Using cached certifi-2024.6.2-py3-none-any.whl (164 kB)
    Collecting charset-normalizer<4,>=2
    Collecting idna<4,>=2.5
      Using cached idna-3.7-py3-none-any.whl (66 kB)
    Installing collected packages: urllib3, idna, charset-normalizer, certifi, requests
    Successfully installed certifi-2024.6.2 charset-normalizer-3.3.2 idna-3.7 requests-2.32.3 urllib3-2.2.2

Again, we are told the specific versions that have been installed, and we can be confident that all the dependencies have been met.

Note: pip uses a cache, so it will download a package only if there is no copy locally. In the above, all the packages were already on the system, being used in another project, so there was no download. If you try it, you will probably see the packages being downloaded.

2.3.2 Recording Packages

It is important to document what packages are needed by a project, so that another developer can take it on with the minimum fuss. The conventional way to do this with Python is to include a file called requirements.txt in the top folder of the project repo. The file does belong in Git!
The file is simply generated by pip, like so:

    (venv) tony@myrddin:~/wat$ pip freeze > requirements.txt

The generated file contains the package names, and versions, that are currently in use. Here it will include the two packages that were explicitly installed, along with the others that requests needed:

    (venv) tony@myrddin:~/wat$ cat requirements.txt
    certifi==2024.6.2
    charset-normalizer==3.3.2
    idna==3.7
    requests==2.32.3
    tabulate==0.9.0
    urllib3==2.2.2

So by checking this file, any developer (and any IDE) can see what packages are needed. PyCharm will detect such a file and offer to install the dependencies, for example.

2.3.3 Installing Required Packages

The requirements.txt file also allows a developer to quickly install the required packages. This is the command (in a new virtual environment):

    (venv) tony@myrddin:~/wat$ pip install -r requirements.txt

So the complete set of commands to create a new virtual environment, and install all the packages is as below. Again, the packages will have been cached from the earlier commands, so no fresh downloads are needed.

    tony@myrddin:~/wat$ python3 -m venv new-venv
    tony@myrddin:~/wat$ source new-venv/bin/activate
    (new-venv) tony@myrddin:~/wat$ pip install -r requirements.txt
    Collecting certifi==2024.6.2
      Using cached certifi-2024.6.2-py3-none-any.whl (164 kB)
    Collecting charset-normalizer==3.3.2
      Using cached charset_normalizer-3.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)
    Collecting idna==3.7
      Using cached idna-3.7-py3-none-any.whl (66 kB)
    Collecting requests==2.32.3
      Using cached requests-2.32.3-py3-none-any.whl (64 kB)
    Collecting tabulate==0.9.0
      Using cached tabulate-0.9.0-py3-none-any.whl (35 kB)
    Collecting urllib3==2.2.2
      Using cached urllib3-2.2.2-py3-none-any.whl (121 kB)
    Installing collected packages: urllib3, tabulate, idna, charset-normalizer, certifi, requests
    Successfully installed certifi-2024.6.2 charset-normalizer-3.3.2 idna-3.7 requests-2.32.3 tabulate-0.9.0 urllib3-2.2.2

2.3.4 More Package Management

This is enough to get by in managing packages with pip. Obviously pip offers more, including commands to delete packages, and update versions. There is even, confusingly, a command to update pip itself. Full docs on pip have all the details.

2.4 PyCharm and Virtual Environments

It is possible to manage packages entirely within PyCharm, and probably in many other IDEs. But, as usual, it is very, very important to know the actual commands, or at least to know where to find them. It really is much easier and quicker to type a command than it is to hunt through a bunch of possibly unfamiliar menus.

You may remember that PyCharm provides the option to create a virtual environment when a new project is created. We avoided this before, because it adds some complexity, and was more than we needed for introductory programming. But now you might want to use it - PyCharm will even take care of activating it for you (remember to check the command prompt).

In addition, if you clone a new Python repo from a remote, PyCharm will spot a requirements.txt file and will automatically offer to download and install the required packages in a new virtual environment.

2.5 Worksheet: A Python Refresher

This exercise will get you back up to speed on using Python, and its associated tools. By the end you should understand how to create a Python virtual environment, and how to use Git in conjunction with one.
As you might expect, we have a set of recommended tools for you to use, but you are welcome to use any others, provided you understand that we are not able to answer questions about them. Specifically, we recommend:

- PyCharm Professional Edition: This is installed in the labs, and is available to you if you create a JetBrains account with a student email. (The Professional Edition has some features for web development that will come in handy later.)
- GitHub: You should be used to Git and GitHub by now, and you should have an account already set up. (You should also now take another look at the GitHub Education Pack, which includes many free things that will be useful later, as well as for your future career.)

If working on your own system, you will also need Python. Install the latest version, which will probably be a little more up to date than that installed in the labs.

2.5.1 Creating and Cloning

Start by creating a new repo on GitHub. The button is quite well hidden, but you are looking for the + symbol to the left of your avatar on the Dashboard page. Give it any name you like (just accept the default) as you will be throwing this away. Choose the .gitignore file for Python, and include a README.md file.

Tip: A good GitHub presence is essential for getting a placement, and for your career. If you have not already done so, now is a really good time to upload an avatar image, and to complete the rest of the information in your profile. This matters much, much more than your CV. Really.

Once the repo is created, open up PyCharm, and clone the repo. Precisely how you do this depends on a few things - it will likely be different at home and in the labs due to the locked-down nature of the lab machines.

2.5.2 Setting the Interpreter

Once cloned, you can see the Python interpreter that will be used for the project at the bottom right of the PyCharm window.
Either PyCharm will have picked a default (based on previous projects), or it will be indicating that no interpreter has been chosen. In either case, that needs to be fixed so that the project uses a virtual environment.

Click the name of the interpreter, and you will be shown the options. We want a new venv, so pick Add New Interpreter, and then Add Local Interpreter on the sub-menu that appears.

On the new window, a Virtualenv Environment should be chosen by default, based on the most recent interpreter on the system. The New box should be ticked. Make sure that the Inherit Global Site Packages box is not ticked, and click OK.

If all goes well, you should see a new folder called venv appear in the project. Its name will be coloured to indicate that it will not be included in Git (the colour will depend on what colour scheme you have chosen for PyCharm). Open a terminal window in PyCharm, and you should see that the prompt is prefixed with (venv) to show that the environment is active.

Important: You will see that PyCharm has done most of the work for us here. Be sure you review the notes from above so that you understand what commands have been issued behind the scenes. One day you will have to do this without hand-holding from the IDE.

You should also see that the name of the interpreter for the project (bottom-right of PyCharm) now includes the name of the project. So this interpreter, and all its packages, are specific to this project.

Note: You may also see a pop-up offering to add project IDE settings to Git. It is up to you whether you do this. On a single-person project this is probably OK (it remembers which files are open, and so on), but on a project with several developers it is not the best idea. To make sure the settings are not stored in Git, open up the .gitignore file, and uncomment the line for the .idea folder (probably at the very bottom of the file).

Now we have an environment, we can install a package.

2.5.3 Git Revision

But first, your README file is a chance to refresh your knowledge of Git. So make some changes to the file, perhaps just by adding a brief summary of the purpose of the repo. Then add the new version of the file, make a new commit (with a meaningful message, of course), and push the new version to the remote.

Important: We strongly recommend that you do this from the command-line. It is very important to know the (very few and simple) commands needed, and not to rely on a specific IDE. If you don’t, it will be very awkward later on when you claim knowledge of Git, but only really know how to hunt out menu items!

2.5.4 Installing tabulate

Programs often output tables of data, and formatting the tables neatly can be a chore. For this reason there are several packages available in PyPi that format data neatly in columns, provide grid lines, and so on. One such is tabulate, all the details of which are on its PyPi page.

To install, make sure you have activated the virtual environment, and:

    $ pip install tabulate

The package will be downloaded (if needed), and installed. Remember that pip uses a cache, so if there is already an up-to-date version on your machine, and pip can find it, no download is needed.

Note: On your first use of pip you may well see a notice that pip itself can be updated. If you do, go ahead and do it - the command is in the notice.

If you open up the venv folder, and then lib within it, you will see the new folder that contains the tabulate package.

To check that the package works, try a short program. You can see from the docs that the package takes a list of lists (or some other suitable data structure), and displays it. There are various options for headers, grid lines, and so on.
Here is a short program that uses it:

Listing 1: Using the tabulate Package

    #!/usr/bin/env python3
    from tabulate import tabulate


    def f2c(f):
        # Convert Fahrenheit to Celsius, rounded to two decimal places.
        return round((f - 32) * 5 / 9, 2)


    if __name__ == '__main__':
        # One [F, C] row for each whole Fahrenheit value from 30 to 59.
        temps = [[f, f2c(f)] for f in range(30, 60)]
        print(tabulate(temps, headers=['F', 'C']))

Refer to the docs and tinker with the program to see what the package can do.

2.5.5 Time to Commit

With the program complete, it is time to commit the source code to the repo.

Important: Remember that we do not commit the virtual environment. There is no point - for one thing a venv built on Windows is no use at all to a developer who uses a proper operating system.

We need to commit the code, along with the information needed to rebuild the required environment. The conventional way to do this is to add a requirements.txt file into the repo.

To build the requirements.txt file, in the top folder of the project, issue the command:

    $ pip freeze > requirements.txt

For your simple project, there should be just one line in this file, indicating that tabulate is used, and specifying the version. Obviously for more complex projects, this file will be much larger.

Finally, the code can be committed, along with the requirements.txt file. Do this from the command line, using the usual commands. Then push to the remote repo.

2.5.6 Checking the Repo

Finally, check the repo on GitHub. If all has gone to plan it should:

- Contain the program that uses the tabulate package.
- Contain a requirements.txt file.
- Not contain the virtual environment from the venv folder.

Finally, delete the work from your local machine. Then clone the repo again. Create a virtual environment as before. If you run the program now, it will fail. And PyCharm will highlight errors. Why?

If you issue the command:

    $ pip install -r requirements.txt

all will be well. Make sure you understand why!
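
If you want to investigate the question from inside Python, the short sketch below may help. It uses only the standard library: sys.prefix differs from sys.base_prefix when a virtual environment is active, and importlib.util.find_spec reports whether the import system can find a package. The helper names in_virtual_environment and is_installed are our own, chosen for illustration.

```python
import importlib.util
import sys


def in_virtual_environment():
    """Return True if the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


def is_installed(package_name):
    """Return True if the import system can find a top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == '__main__':
    print('Interpreter:', sys.executable)
    print('Virtual environment active:', in_virtual_environment())
    # json is in the standard library; tabulate has to come from PyPi.
    for name in ('json', 'tabulate'):
        print(f'{name} installed:', is_installed(name))
```

Try running it in the freshly created environment, before and after the pip install -r requirements.txt command, and compare the output.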
2.6 Summary

This section has introduced a key concept when working with advanced projects in Python - the virtual environment. Python provides a core of packages that can be used across many projects, and types of project. The venv allows you to include some of the many additional special-purpose packages that are available. This will become very important as we come to work with those packages with specific applications on the web.

We have also revised the basics of Git. Using Git to manage source code is a key part of working on the web, especially when working on a project of any size, where multiple developers will be involved. And we have touched on some of the issues that can arise when working with others, specifically how to share the setup of the virtual environment.

Now we can move on to understand how the web works, and to build programs that send requests over web protocols.

CHAPTER THREE
HOW THE WEB WORKS

Before we can develop applications for the web we need to have some understanding of how everything works. This will then allow us to fit all the different technologies into their correct places.

Technologies change over time, of course. Some gain and lose popularity, new techs appear, and others disappear as their use dwindles. However, the basic principles of the web have remained the same since its inception.

The web was originally intended to be an interconnected collection of documents, and it was created with mainly scientific ideas in mind. The basic idea was that related documents could be linked together, and that hyperlinks would allow a reader to move quickly between them. This was a revolutionary idea at the time, and it has since evolved into the web we know today.

The web we know now, with interactive websites, video, games, and so on is very far from this original idea, but the underlying technologies are much the same.
3.1 Client-Server

The web is based on a client-server model. A client sends a request to a server, and receives a response.

A client was originally a web browser, but these days it might be an application running on a phone, a tablet, a desktop computer, or even a server. The term user agent is used to cover all the possible client types.

A server is a computer that listens for requests and sends responses. The server is usually a powerful computer that is always available, and is connected to the Internet. The client is usually a device that is not always on, and is not always connected to the Internet.

Note: A server is a single computer, but only conceptually. A business like Amazon, for example, does not run its website on a single computer. Rather it uses hundreds of computers, all working together, and controlled by other servers that manage traffic, allocate work, and balance the load. But conceptually, it is one server.

A client request might be something simple, such as a request for the contents of a specific document. Or it might be more complex, for example when user credentials and passwords need to be sent. If a user has completed a form on the website, the request will also transmit the content of that form.

The server will send back a document. This will be preceded with a code that tells the client how the request went - basically whether it succeeded or failed.

3.2 HTTP

The communication between client and server uses a protocol called HTTP (HyperText Transfer Protocol). We don’t need to concern ourselves with the details of how HTTP requests are transmitted, but we do need to understand how the requests are used.

Important: We will assume here that our client knows the address of the server, and is able to transmit HTTP. Obviously there is a lot to the transmission, but that is lower level, and is why we have SysAdmins. A whole load of other protocols are involved, but they are beyond our scope here.
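The request/response cycle described in section 3.1 can be seen in miniature without leaving your own machine. The sketch below is only an illustration, using nothing but the Python standard library: a toy server that knows about exactly one document, and a client function that asks for it. The names HelloHandler, fetch and demo, and the path /hello.html, are made up for this example. The codes 200 and 404 that it returns are explained later in this chapter.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import urlopen


class HelloHandler(BaseHTTPRequestHandler):
    """A toy server: one known document, everything else is 'not found'."""

    def do_GET(self):
        if self.path == '/hello.html':
            body = b'<p>Hello from the server!</p>'
            self.send_response(200)
            self.send_header('Content-Type', 'text/html')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # Keep the demo quiet instead of logging every request.


def fetch(url):
    """Act as the client: send a GET, return (status code, document)."""
    try:
        with urlopen(url, timeout=5) as response:
            return response.status, response.read().decode()
    except HTTPError as error:
        # urllib reports 4xx and 5xx codes as exceptions.
        return error.code, ''


def demo():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(('127.0.0.1', 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f'http://127.0.0.1:{server.server_port}'
    try:
        return fetch(base + '/hello.html'), fetch(base + '/nosuchpage/')
    finally:
        server.shutdown()
        server.server_close()


if __name__ == '__main__':
    print(demo())
```

Here the "server" and the "client" live in the same program purely for convenience. Conceptually they are separate, and the client only ever sees the code and the document that come back.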
To use the correct jargon, HTTP operates at the Application Layer of the Internet Protocol Suite, and we are not going to concern ourselves with anything below that.

HTTP is a stateless protocol. This means that every request happens in isolation, with no overall session state maintained. The client sends a request, gets a response, and it is up to the client software to decide what to do next.

HTTP typically uses TCP port 80 on the server (which is why this number often crops up in the names of web development and hosting companies). If the traffic is encrypted (that’s HTTPS), port 443 is usually used [1]. For local development (more on this below), commonly used ports are 8000, 8080, 8888, which you will probably see in docs and tutorials.

3.2.1 URLs

These days, URLs (“web addresses”) are a common concept. They are the main part of an HTTP request. In the early days of the web, a URL was usually a reference to a specific file on the server, so:

    http://www.example.com/reports/report001.html

would be pointing to a file called report001.html in a folder called reports on the www.example.com server, and that file was to be accessed over HTTP.

These days it is not quite so simple. The change is quite a big one. The first step was to have URLs that pointed to some program code. Long ago, this would have been in Perl, more recently it would probably be PHP:

    http://www.example.com/cgi-bin/forecasts.php

So here the request is effectively asking for the PHP script forecasts.php to be run, and is expecting the output to be returned.

The growing importance of search engines, among other things, pushed developers to create sites with URLs that didn’t necessarily point to a file. So now we might see URLs like:

    http://www.example.com/products/posters/turing/

The client still sends this, but it is less clear what the server will do. But the important thing is that the URL is clear, and can stay the same, even if the way in which the return data is created can change.
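To see the pieces of a URL separately, Python's standard urllib.parse module can split one up. The following is a small sketch rather than anything definitive. The function name describe and the DEFAULT_PORTS table are invented for this example, and the table simply encodes the usual ports mentioned above.

```python
from urllib.parse import urlsplit

# The conventional ports described above; a URL can override them.
DEFAULT_PORTS = {'http': 80, 'https': 443}


def describe(url):
    """Break a URL into the parts a client needs to make the request."""
    parts = urlsplit(url)
    return {
        'scheme': parts.scheme,
        'host': parts.hostname,
        # parts.port is None unless the URL names a port explicitly.
        'port': parts.port or DEFAULT_PORTS.get(parts.scheme),
        'path': parts.path,
    }


if __name__ == '__main__':
    print(describe('http://www.example.com/reports/report001.html'))
    print(describe('http://localhost:8000/products/posters/turing/'))
```

Note that nothing in the result says whether the path is a file, a script, or something else entirely. That decision belongs to the server.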
We will think more about good URL schemes later.

[1] These ports can be changed, and sometimes are in systems that need to be very secure. Changing from the default port can make an attacker’s job slightly harder.

3.2.2 HTTP Requests

An HTTP request starts with a method that indicates the type of request being made.

Note: HTTP methods are often called HTTP verbs. No-one knows why, but it’s become so common that everyone assumes it must be correct, even if it isn’t! We’ll call them verbs from now on, because it really is the more common name.

At the last count there are nine HTTP verbs, but for most purposes two are sufficient:

- A GET request asks for the content at the specified URL.
- A POST request sends some data (the payload) to whatever is at the specified URL.

So, for example, if you click a link to view a new page on Wikipedia, that is a GET. If you have filled in a form, such as a login or sign-up form, and submit it, that is usually a POST.

It is possible to send data along with a GET request. You have probably seen URLs that do it, like this:

    http://www.example.com/products?query=turing&size=a4

This is fine, and is still widely used (even if it is being replaced by the more friendly URL schemes above). The drawback is that the data is visible in the URL; this can cause issues with saving bookmarks, and is obviously not an ideal way to send login credentials!

3.2.3 HTTP Responses

After the request has been processed, the server sends back the relevant data (usually the HTML code that makes up a web page), along with a status code to indicate what has happened. These codes can be picked up by the client in order to determine what should be displayed. For example, the server might indicate that there is nothing at the requested URL, or that the client does not have permission to view what was requested.
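Picking up the query-string URL from section 3.2.2 for a moment: the part after the ? is just a list of name=value pairs, and Python's urllib.parse module can build and read it. This is a minimal sketch, and build_get_url and read_query are names invented here for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_get_url(base, params):
    """Attach data to a GET request by encoding it into the URL."""
    return f'{base}?{urlencode(params)}'


def read_query(url):
    """Recover the name/value pairs from a URL's query string."""
    pairs = parse_qs(urlsplit(url).query)
    # parse_qs gives a list per name; keep the first value of each.
    return {name: values[0] for name, values in pairs.items()}


if __name__ == '__main__':
    url = build_get_url('http://www.example.com/products',
                        {'query': 'turing', 'size': 'a4'})
    print(url)
    print(read_query(url))
```

Letting urlencode do the work also takes care of characters, such as spaces, that are not allowed to appear in a URL as they are.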
The two most common codes are:

    200  Everything worked, and the request has been fulfilled successfully.
    404  There was nothing to be found at the requested URL.

Obviously a web app (or any other client) will need to behave differently in these two cases. There are many other codes, but it is usually enough to remember that they fall into categories:

    100 - 199  Informational Messages
    200 - 299  Successful Messages
    300 - 399  Redirection Messages
    400 - 499  Client Error Messages
    500 - 599  Server Error Messages

The correct action for an app to take can be determined by the type of message. A Client Error means that it may be worth sending the request again, possibly after correcting it. But if there is a Server Error there is something more fundamentally wrong!

3.2.4 HTTP and APIs

We tend to think of HTTP as the protocol that “gets” web content and displays it in a browser, but it is often used for more than that. A common use of HTTP is to query an API.

It works like this. A server advertises a number of end-points (basically URLs) which can receive requests. A client can send HTTP requests to these end-points, and will receive responses. Optionally, the client can send a payload of data that forms a query for the API. The server sends back a response, consisting of an HTTP code (above) and more data.

For example, many bus companies provide an API to allow querying of current bus locations. Armed with a unique number for a bus stop, the API can respond with the next buses due at that stop, and even other information such as how busy they are. If this API is made public, anyone can write apps that use it.

Note: Check the app store for your phone, and you will find many apps using APIs like this. Common applications are public transport, weather, currency exchange rates, sports scores, and more. There is even an API from Leeds City Council that will tell you when your bins are due to be emptied!
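Whether a response comes from a web page or an API, the status code categories listed in section 3.2.3 are simple enough to express in a few lines of Python. This is a sketch only. The function names are invented for this example, while HTTPStatus comes from the standard library's http module and supplies the official short phrase for each standard code.

```python
from http import HTTPStatus

CATEGORIES = {
    1: 'Informational',
    2: 'Successful',
    3: 'Redirection',
    4: 'Client Error',
    5: 'Server Error',
}


def status_category(code):
    """Map a status code to its category using the hundreds digit."""
    return CATEGORIES.get(code // 100, 'Unknown')


def describe_status(code):
    """Combine the code, its standard phrase, and its category."""
    return f'{code} {HTTPStatus(code).phrase} ({status_category(code)})'


if __name__ == '__main__':
    for code in (200, 301, 404, 500):
        print(describe_status(code))
```

A client that only looks at the category can already make a sensible decision, such as showing the page, following a redirect, or reporting an error.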
If an API is public its results can be included in a web app. So this is a handy way to include relevant and possibly useful data on a website - the local weather is a common one.

When clients and servers interact like this there is obviously a need for them to send messages. And this is a case where some standards will make life much easier - it would be very dull if every API defined its own format for exchanging messages. The most common “language” at the moment is JSON, a simple text-based format for encoding data. (Python has a module that will encode and decode JSON with limited effort.)

We will use a simple API in the worksheet below.

3.2.5 API Tools: Postman

If a web app is going to make a large number of API calls it is often useful to have a tool that can create API requests, send them, and display the results. Using such a tool aids debugging as well as giving confidence that the API calls used in the app are correct.

One of the most widely used tools is Postman. Postman exists as a downloadable tool, a web browser plug-in, and a website. As with many tools in this marketplace, it offers a “freemium” version where most of its features are available, with restricted usage.

With Postman, a developer can specify the end-point (URL), the type of HTTP request, and supply any payload. Postman will then submit the request, and display the results in human-friendly format. We will use Postman to test API calls later.

3.2.6 API Tools: curl

We will just mention one other tool commonly used for API testing. curl is a command-line utility that is often used for quick and easy testing of APIs and URLs. It should be installed on most Unix systems, so is something that Linux users might want to check out. A related command-line tool, wget, can be used to download resources from the web without the need for a web browser session.
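Whichever tool is used, an API request boils down to the same three ingredients named in section 3.2.5: an end-point, a verb, and an optional payload. The sketch below builds (but deliberately does not send) such a request with the standard library, encoding the payload as JSON. The end-point https://api.example.com/stops and the stop_id field are purely hypothetical, as is the function name build_api_request.

```python
import json
from urllib.request import Request


def build_api_request(endpoint, payload=None):
    """Prepare an API request: a GET if there is no payload, else a POST."""
    if payload is None:
        return Request(endpoint, method='GET')
    # The payload travels as bytes, so encode the JSON text.
    body = json.dumps(payload).encode('utf-8')
    return Request(endpoint, data=body, method='POST',
                   headers={'Content-Type': 'application/json'})


if __name__ == '__main__':
    # Hypothetical end-point and payload, for illustration only.
    request = build_api_request('https://api.example.com/stops',
                                {'stop_id': '45010123'})
    print(request.get_method(), request.full_url)
    print(json.loads(request.data))
```

Passing the finished object to urllib.request.urlopen would actually send it. Inspecting it first, as here, is roughly what a tool like Postman lets you do on screen.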
3.3 Clients (User Agents)

So, a client in the context of HTTP is anything that can send a request. The most common type of client is probably a web browser, and it has been for some time. But increasingly apps also need to communicate over HTTP (although many modern apps are just web browsers in disguise!). So it is useful to think of a client as anything - even, as we will see, a short Python program - that can send HTTP to a server.

As part of an HTTP request a client will usually identify itself in the header of the request. The information sent usually includes the browser type (Firefox, Chrome, etc) and the operating system, along with versions of both. This is how the server can, for example, direct a client to appropriate downloads for the user’s operating system.

Note: In the early days of the web, it was common for sites to require specific browsers, usually some flavour of Microsoft’s Internet Explorer. A common workaround for users of other browsers was to spoof the header to make the site think it was dealing with MS IE, while really the client was a proper browser.

This information is mildly useful for sites that need statistics on their users. It should not be used to serve different versions of a site to different browsers, unless there is a very good reason!

A long time ago, developing for the web was a nightmare in terms of creating apps that would work under any browser. Thankfully, this is now mostly a thing of the past, but you may still encounter sites that have hacks to deal with the shortcomings of early browsers. Microsoft Internet Explorer 8 was particularly troublesome.

Important: When working with the web, you probably think in terms of using a web browser to access web pages on a server. It is worth remembering that many of your users think of “The Internet” and simply have no concept of what is really going on.
This is annoying, but we need to remember that it’s the users as pays the bills.

The big shift over the past few years has been away from browsers running on desktops and laptops towards mobile. In 2022, over 40% of clients were running Android, and 17% Apple iOS. This points to mobile first being an important concept in development. This is not trivial, especially if we consider that most desktop and laptop PCs present a landscape display, whereas most mobiles are used in portrait. Web development has rotated by 90 degrees!

3.4 Servers

While a web server is a computer, it obviously needs some software. The software needs to receive the HTTP requests, find the resources requested, and send these back. It also needs to be able to handle multiple clients making requests at the same time, and needs to be able to serve multiple websites from the same server.

For development purposes, we don’t need to know very much about how the server software works, but it is useful to be able to recognise the names of common solutions. For a long time, Apache was the dominant software choice. More recently, nginx (pronounced “engine-x”) has taken over as the main player. There are others, including an offering from Microsoft, but these two are the dominant choices.

As regards an operating system, a web server is probably running Linux, or another Unix-like operating system. The figures are something like 70% of servers running these, around 25% Windows, and 5% others. Linux scores here on being free, well tested, and ultra-reliable.

Hint: Most modern web development frameworks provide a built-in web server that can be used to test the site. These are usually very lightweight, not fit for production environments, but they “do the job”.

Python has its own built-in web server that “just works”, and can be used when developing.
The command:

    $ python -m http.server --directory www

will serve the contents of the www folder on port 8000 of the local machine. (The port can be changed using command-line arguments, as can the folder.) We will use this when we come to create web pages.

3.5 Data-Driven Sites

Most websites are driven by data. Originally, the web was a collection of static documents, but things have moved on. There is also a need for websites to provide more interactive content, along with multimedia. But behind most websites is a good data store.

Note that we don’t say database. The data store for a website could be a database, but it could also be anything else that can provide data, such as a separate server that accepts HTTP requests and returns data via some API.

The most common type of database to sit behind a website is still one that uses the relational model. MySQL used to be the most common choice, but since its acquisition by Oracle, MariaDB has gained a lot of users. PostgreSQL is another solid choice, and is arguably more powerful than MariaDB. It is equally free.

You will know that relational databases have been around since the 1970s, and over 50 years of development work mean that they are now very powerful and very reliable. They are a sound choice for many web apps, but maybe not for all.

Relational databases make some assumptions about the data; mainly that it can be made to fit neatly into tables, but also that the nature of the data is known and unchanging. These assumptions can break with modern web apps.

Take a website like Google Maps. Clearly this is using a data store, but could that mapping data fit into tables and columns? Even if that could be managed, how could traffic data (that changes every few minutes) be included? And exactly how much data is there in mapping the whole world?

Solutions to problems like this are grouped under the heading of NoSQL databases.
Basically, they are just that - databases that do not use the SQL language, and so are not restricted to all the rigid rules of tabular data. This is not the place to go into them, but it is worth noting that they exist, and will become more and more important.

3.6 Tech Stacks

Web developers often refer to stacks of technology. This commonly occurs when we talk about a LAMP stack - Linux, Apache, MySQL, PHP - as a deployment platform. (These days the “P” can be Python, in older times it could be Perl.) LEMP is the same, except that nginx replaces Apache.

The idea here is that certain techs work well together, and gain traction as a unit. Cloud providers will often offer servers with a LAMP/LEMP stack ready installed, for example, as they know it will fit the needs of many users.

There are many other stacks out there, each featuring different technologies. For example, a MEAN stack is MongoDB, Express.js, Angular.js, and Node.js. This is a popular stack for developing web applications, as it allows developers to use JavaScript for both the front-end and back-end of the application.

Some products have been developed that package up a tech stack into a single convenient download. XAMPP ships a stack of Apache, MariaDB [2], PHP, and Perl, and is available for all common desktop operating systems. Laragon offers something similar. Using a tool like this can be much easier than installing the components separately.

[2] For many years, XAMPP shipped with MySQL, but the Oracle takeover ended this. Remember that MariaDB is a drop-in replacement for MySQL, with the advantage of being free, as in freedom.

3.7 Worksheet: Querying an API with Python

In this exercise you will develop a Python program that queries an API. The example will be finding a local weather forecast.
This will be based on using geolocation on the client’s IP address, and then querying a site that provides free basic weather information.

Warning: The forecast will be quite accurate if we run it in the labs, but may well be less so if you run it from home. Do not be surprised if you get odd results from home. You might want to think about why this is!

As part of this exercise you will need to create an account with two API providers: one provides geolocation based on IP address, and the other is our chosen weather provider. This is so that they will provide you with an API Key (sometimes called an access token), which is basically a token that identifies your requests. This allows the site to manage the volume of traffic it serves for free. Your traffic will be very limited, so you could probably get away with sharing a key with a couple of friends.

Important: You will also be asked to create other accounts. These should all be free, and were indeed all free when this document was written. At the time of writing, none of them required any payment details. If you are asked for credit card details, please stop, and tell us!

Tip: Some sites we will use, for example Postman, are likely to be useful to you in your future career. We are using them specifically for this reason! You might therefore prefer to create your account using a personal email address, so that you will not lose access after Uni. This is absolutely fine.

3.8 A Note on Keys

Keys do not belong in Git!

For obvious reasons, API keys and login details should never be stored in Git, and certainly never stored online in GitHub. In our current example, little damage would be done if the keys were compromised, but think about what would happen if, say, Amazon Web Services credentials were leaked. Keys can give evildoers access to your account, and they can run up charges on your credit card.
The usual workaround for this is to store these details together in one file (often called conf.py or secrets.py) and to include in the Git repo a file that indicates what needs to be provided. That is what we will do here.

3.9 Preamble: Postman

We will use Postman later to check the API calls before we include them in a program. This should help debugging!

Create a free account on Postman, and have a look at the available help. There is a bit of a learning curve (this is actually a very powerful tool!), but make sure you know how to send HTTP requests and see the resulting response code and JSON payload. There is a lot going on in Postman, but we really do only want one screen.

Tip: Postman is a good name to drop into your CV if you plan to work in web development. For this reason, as we have said before, you might want to create your account with a personal email. This is fine.

To get started you need to create a Workspace in Postman. All the defaults (at the time of writing, anyway) should be fine.

As a very simple test, use Postman to send a GET request to http://www.leedsbeckett.ac.uk/. After a short pause, you should see a 200 code (success!) along with the code that makes up the main Uni web page. Try a page that doesn’t exist (http://www.leedsbeckett.ac.uk/nosuchpage/, say) and you will see a 404. (Note that if the domain doesn’t exist, Postman will fail to send the request with a DNS error, which is a totally separate problem.)

We will use Postman shortly to check our first API call.

3.10 Part 1: Where am I?

3.10.1 Creating the API Call

Our first program will use our IP address to find our current geographical location.

Important: Remember that this needs to be the IP address as it appears to the Internet. This may not be the IP address that an individual machine has, and on a home network it almost certainly won’t be.
We will do this by sending a request to a site that provides this service. Have a browse of https://ipinfo.io/ and you will see what this site offers. Essentially it is a reliable way to customise web content based on the location of the user; so prices could be displayed in the local currency, for example.

The program we will write will follow the common pattern of reading some data, getting some results, and displaying them. It will:

1. Form an HTTP request to the IPinfo API, and send it.
2. Receive a response, in JSON format.
3. Parse the JSON to extract the required result, and display it.

IPinfo follows a similar pricing model to Postman, and offers a “freemium” account which will be quite enough for our purposes here. You need to create an account (a quick way to do this is via your GitHub account).

Tip: On your account page in IPinfo (probably https://ipinfo.io/account/home) you will see a Your IP Address button at the top. Click and see what IPinfo can work out from your IP address. This is the information we are after.

IPinfo charges business users for API accesses, so it needs to identify who is sending the requests. This is done via a key, or Access Token. You can see yours at the bottom of the account page, along with an example API call. We will send the token as part of a GET for simplicity, and because this is what is shown!

IPinfo allows us to specify an IP address, in which case it will give information about that IP. For example, the IP address range for Leeds Beckett is 160.9.0.0 to 160.9.255.255, so if we pick a random address in the Uni (if you’re in a lab, use that of the PC in front of you):

    tony@myrddin /home/tony/wat$ curl "ipinfo.io/160.9.16.21?token=yourtokenhere"
    {
      "ip": "160.9.16.21",
      "city": "Leeds",
      "region": "England",
      "country": "GB",
      "loc": "53.7965,-1.5478",
      "org": "AS786 Jisc Services Limited",
      "postal": "LS1",
      "timezone": "Europe/London"
    }

Tip: Windows users can get hold of curl quickly by installing GitBash. Starting the bash command prompt provides access to command-line curl.

So here curl (you can use GitBash or Postman) has retrieved some JSON that gives us basic information about that host. If we omit the IP address, we get information about the current machine. So a typical home connection might yield:

    tony@myrddin /home/tony/wat$ curl "ipinfo.io?token=yourtokenhere"
    {
      "ip": "2.219.249.169",
      "hostname": "02dbf9a9.bb.sky.com",
      "city": "London",
      "region": "England",
      "country": "GB",
      "loc": "51.5085,-0.1257",
      "org": "AS5607 Sky UK Limited",
      "postal": "E1W",
      "timezone": "Europe/London"
    }

This also shows how geolocation is difficult if we just rely on IP addresses, especially from home connections. The home where this command was run was in Leeds, but the IP is registered as being from London. Networks of businesses (especially large ones) are likely to be more accurate, though.

Ignoring the probable inaccuracy, we now have all that we need to know in order to write our program:

- The endpoint is ipinfo.io?token=yourtokenhere.
- The API call can be a GET.
- We receive a JSON response that contains, inter alia, city and location.

3.10.2 Programming

So, the required program will work something like this:

    # Read the access token from a separate file.
    # Construct the API call.
    # Send the API call as a GET.
    # Receive the results.
    # Parse the JSON results to get the required information.
    # Display the results.

The standard way to send HTTP requests in Python is by using the urllib module from the standard library. This can be quite cumbersome, so most developers reach for the requests module from PyPI instead.
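Before any requests are sent, the parsing step in the outline above can be tried offline against the sample response shown earlier. The sketch below is only an illustration. SAMPLE_RESPONSE is the Leeds Beckett example copied from the curl output, parse_ipinfo is a name invented here, and the order of the two numbers in loc is assumed to be latitude then longitude, which is what the Leeds and London samples suggest.

```python
import json

# The sample response from the curl example, as a string of JSON text.
SAMPLE_RESPONSE = '''{
  "ip": "160.9.16.21",
  "city": "Leeds",
  "region": "England",
  "country": "GB",
  "loc": "53.7965,-1.5478",
  "org": "AS786 Jisc Services Limited",
  "postal": "LS1",
  "timezone": "Europe/London"
}'''


def parse_ipinfo(text):
    """Turn the JSON text into a dictionary and pull out what we need."""
    facts = json.loads(text)
    # Assumption: "loc" holds "latitude,longitude" in that order.
    latitude, longitude = (float(part) for part in facts['loc'].split(','))
    return facts['city'], latitude, longitude


if __name__ == '__main__':
    print(parse_ipinfo(SAMPLE_RESPONSE))
```

Working from a saved sample like this means that the parsing code can be written and checked without using up any API calls.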
Start by creating a new repo and project, with a virtual environment, and include requests in the usual way via pip.

There needs to be a way to manage the API keys we will use. In the project, create secrets.py to include your access token. It could look something like:

Listing 1: Template for secrets.py

    #!/usr/bin/env python3
    IPINFO_KEY = ''

Tip: Obviously you can call this file anything you like, but secrets.py is a common name. In a real application you would probably also include a file called something like secrets_template.py that would be checked in to Git, and would contain dummy values. This would be a good idea for this project, too.

Remember that this new secrets file should not be stored in Git, so it will need to be listed in the .gitignore.

Now we can write a function that will build the API endpoint. It will be a one-liner once the access token is imported. There is a small tweak from the examples so far to state explicitly that we want a JSON response.

Listing 2: Creating the Endpoint

    def build_query_url():
        from secrets import IPINFO_KEY
        return f'https://ipinfo.io/json?token={IPINFO_KEY}'

The next step is to send the request. This is easy with the requests module. The result comes back as JSON, and we can use the Python json module to transform this into a dictionary. Another short function does the job (note that the function calls the function above to get the URL - this keeps the code portable).

Listing 3: Getting the Response

    def get_all_from_ipinfo():
        response = requests.get(build_query_url())
        facts_dict = json.loads(response.text)
        return facts_dict

You can check the docs for the two modules being used (remember to import them) if you want details.

Finally, we just need to look in the dictionary for the location. For our final aim, the location (longitude and latitude) will be useful, but for the moment we can just display the city.
You could print out the dictionary to see the key values, or the code will be something like:

Listing 4: Printing the Results

    if __name__ == '__main__':
        try:
            facts = get_all_from_ipinfo()
            # Double quotes outside, single quotes inside, so that this
            # also works on Python versions before 3.12.
            print(f"Your current IP address is {facts['ip']}, "
                  f"which is probably located in {facts['city']}.")
        except (requests.ConnectionError, requests.HTTPError, requests.Timeout,):
            print('Error accessing the IPInfo server.')

The code here includes some error-handling in case the IPinfo server is inaccessible.

You can find a complete version of this program at the end of this section of the module.

Before we move on, you might have spotted that IPinfo also provide a Python package, and that it is available on PyPI. This simplifies the code, and hides all the JSON. It requires some use of Python OOP, so is out of scope here, but there is a version at the end of this section for those who are interested.

3.11 Part 2: What’s the Weather?

You should have the idea by now, so should be able to complete this with a few hints. There are actually many ways to do this, but this is how the solution at the end of this section works:

1. Create an account at OpenWeather.
2. Generate an API key for OpenWeather. Include it in secrets.py.
3. Use your existing IPInfo program to get the probable longitude and latitude for the IP address.
4. Plug the API key, the longitude, and latitude into an API request, and fire it off!
5. Convert the JSON response into a Python dictionary, as before.
6. Find the weather in the dictionary, and print. (Hint: Print out the whole dictionary, and hunt!)

A complete program is at the end. Note that it uses two functions that have been added into the first program we created. These are not needed for the program they’re in, but they are useful elsewhere. This is proper OOP thinking!

Make sure you have copies of all your programs in your GitHub repo, and commit. Then, it’s time for a coffee.
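If the hunt through the dictionary in step 6 proves tricky, it can help to practise on a small saved sample first. The sketch below is an illustration only. SAMPLE is a hand-written, heavily trimmed stand-in for an OpenWeather response, not real data, and its shape (a weather list containing one dictionary, plus a main dictionary holding temp) is an assumption based on OpenWeather's published format, so do check it against what your own request returns. The name summarise is invented here.

```python
import json

# Hand-written sample, trimmed to the two fields used below.
# Assumption: a real response has this shape, with many more fields.
SAMPLE = '''{
  "weather": [{"main": "Clouds", "description": "overcast clouds"}],
  "main": {"temp": 11.3}
}'''


def summarise(text):
    """Extract a one-line summary from the JSON text of a response."""
    data = json.loads(text)
    # "weather" is a list, so take its first entry before the lookup.
    description = data['weather'][0]['description'].title()
    temperature = data['main']['temp']
    return f'{description}, {temperature} C'


if __name__ == '__main__':
    print(summarise(SAMPLE))
```

The detail most likely to catch you out is that one level of the structure is a list rather than a dictionary, which is easy to miss when reading a long printed response.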
3.12 Summary

You should now have a fair idea of what goes on when a web application is running. There is a lot to take in, but it will all become clear as we explore how web apps are created. You need to know:

- That HTTP is the protocol of the web, and that it is stateless.
- That HTTP requests are sent from a client to a server, sometimes with a payload.
- These requests can be GET or POST (and some others, which we do not need at the moment).
- An HTTP response consists of a code, and response data.
- Data is usually exchanged using JSON.
- Many websites provide an API that can be queried using HTTP.

Phew! Now let’s get on with creating some simple web pages. It’s time for some HTML.

3.13 Code Listings

3.13.1 Finding Location with IPInfo

    #!/usr/bin/env python3
    import json

    import requests


    def build_query_url():
        from secrets import IPINFO_KEY
        return f'https://ipinfo.io/json?token={IPINFO_KEY}'


    def get_all_from_ipinfo():
        response = requests.get(build_query_url())
        facts_dict = json.loads(response.text)
        return facts_dict


    def get_location_from_ipinfo():
        return get_all_from_ipinfo()['city']


    def get_long_and_lat_from_ipinfo():
        # 'loc' is a string such as '53.7965,-1.5478'; the tuple keeps
        # the two values in the order in which they appear there.
        location = get_all_from_ipinfo()['loc']
        return tuple(location.split(','))


    if __name__ == '__main__':
        try:
            facts = get_all_from_ipinfo()
            print(f"Your current IP address is {facts['ip']}, "
                  f"which is probably located in {facts['city']}.")
        except (requests.ConnectionError, requests.HTTPError, requests.Timeout,):
            print('Error accessing the IPInfo server.')
3.13.2 Finding Location with IPInfo and ipinfo Package

    #!/usr/bin/env python3
    import ipinfo
    import requests


    def ipinfo_query():
        from secrets import IPINFO_KEY
        return ipinfo.getHandler(access_token=IPINFO_KEY).getDetails()


    def get_location_from_ipinfo(ip_info):
        return ip_info.city


    def get_long_and_lat_from_ipinfo(ip_info):
        return tuple(ip_info.loc.split(','))


    if __name__ == '__main__':
        try:
            facts = ipinfo_query()
            print(f'Your current IP address is {facts.ip}, '
                  f'which is probably located in {facts.city}.')
        except (requests.ConnectionError, requests.HTTPError, requests.Timeout,):
            print('Error accessing the IPInfo server.')

3.13.3 Getting Current Weather Conditions from IP Address

    #!/usr/bin/env python3
    import json

    import requests

    from my_ip import get_long_and_lat_from_ipinfo


    def build_openweather_url(location):
        from secrets import OPENWEATHER_KEY as api_key
        lat, lon = location
        return (f'https://api.openweathermap.org/data/2.5/weather'
                f'?lat={lat}&lon={lon}&appid={api_key}&units=metric')


    def get_current_conditions(openweather_response):
        response_dict = json.loads(openweather_response)
        # 'weather' is a list of conditions; use the first entry.
        return response_dict['weather'][0]['description'].title()


    def get_current_temp(openweather_response):
        response_dict = json.loads(openweather_response)
        return response_dict['main']['temp']


    if __name__ == '__main__':
        try:
            location = get_long_and_lat_from_ipinfo()
            query = build_openweather_url(location)
            response = requests.get(query).text
            print(f'The current weather at your location is '
                  f'"{get_current_conditions(response)}".')
            print(f'The current temperature at your location is '
                  f'{get_current_temp(response)} C.')
        except (requests.ConnectionError, requests.HTTPError, requests.Timeout,):
            print('Error accessing the IPInfo or OpenWeather server.')
CHAPTER FOUR
WEB DOCUMENTS AND HTML

4.1 A brief history of HTML

Table 1: HTML Specification (by Year)

HTML Specification | Year (approx) | Additional Notes
HTML 1.x | 1992 |
HTML 2.x | 1994 |
HTML 3.x | 1997 | Short lived - browser wars and IE looking to implement own tags
HTML 4.x | 1997 | Introduced 3 variants: strict, transitional and frameset
XHTML 1.x | 2000 | Focused on rewriting HTML to conform to XML, no longer being developed as a separate standard
HTML 5.x | 2011 |

There are several ways of viewing the history of HTML depending on the purpose of that review. In this document, the focus is on how the specification evolved to match new features that web developers and designers sought. However, a fuller review would also take into account the changing technology landscape of the time. One should not forget the browser wars of the late 1990s, when ownership of the internet was under threat. From our perspective a better representation of this table and HTML specifications is:

Table 2: HTML Specification (by Requirement)

HTML Specification | User Requirement
HTML 1.x | Text / terminal type browsing (see Lynx)
HTML 2.x | Graphical browsers (Netscape, IE)
HTML 3.x | See browser wars
HTML 4.x | Interactive browsing, dynamic pages and javascript
XHTML 1.x | Encouraging good practice and strict use of elements (tags)
HTML 5.x | Learning from all of the above and pragmatism

Internet usage and browsing have changed significantly since the early 1990s. Back then, most browsing focused on information gathering and accessing static pages, that is, pages whose content does not change. As bandwidth and network speed increased from 14.4 Kbps to 56 Kbps and above, the type of content and what users expected from the internet changed. With these changes came the requirement to update specifications.
As we delve into HTML further in the next sections, what we will see is that specification changes predominantly expand the size of HTML and lay foundations for adding interactivity (and presentation) to the pages that are viewed. This can easily be seen by contrasting what a text-based browser like Lynx would have provided in the early 1990s with the rich interactive pages that we expect to view these days.

4.2 Terminology

There is a language to web development. Some terms are specific to HTML, others to the field as a whole. A non-exhaustive list of key terms used in this section is included below.

Table 3: Some Key Terms

NAME | DEFINITION
element or tag | Defines a portion of a document that is marked up. This provides the semantic meaning, e.g. being a top-level header.
DOM (Document Object Model) | The tree-like structure that a document retains in memory. A requirement for a webpage to be interactive.
Browser | Firefox / Chrome / Opera etc. The full piece of software including the page and extra features (menu bar, history, bookmarks).
Browser or Rendering Engine | The part of the browser responsible for rendering / drawing the received text (HTTP response) into a webpage viewable in the browser. A comparison of rendering engines is linked in the further reading section.
CSS (style sheets) | HTML is often referred to as the semantic markup of a document. CSS provides the visual presentation (style) for the document. Browsers include default style sheets.
JS (javascript) | Javascript provides interactivity and allows pages to be dynamic (updateable) on the client side.
Attributes | Elements have attributes. What attributes an element has is defined in the specification document for HTML. As an example, <img> has a src attribute, which is the image to be displayed.
class | An element can belong to multiple classes. class is an attribute defined in the HTML specification. A class can be referred to in CSS rules or in javascript programming.
id | An id is similar to a class in that it can be referred to in CSS and/or JS, but has the important distinction that it can only occur once on a page. This is very important in dynamic pages when you want to update a particular paragraph with new text or change an image.
block (level element) | Some HTML elements are referred to as block level as they represent an area or block. <div>, <p>, and <h1> would be examples of these.
inline (element) | These are elements that are contained inside a block. <span>, <a>, and <em> are some examples.

4.3 Elements

What is markup then? Let’s consider the following text. (Please note this is part of a larger tutorial on MDN.)

Subjects of study
At the Awesome Science Faculty, we have a pretty open-minded research facility — as long as the subjects fall somewhere in the realm of science and technology. You seem like an intelligent, dedicated researcher, and just the kind of person we'd like to have on our team. Saying that, of the ideas you submitted we were most intrigued by are as follows, in order of priority:
Turning H2O into wine, and the health benefits of Resveratrol (C14H12O3.)
Measuring the effect on performance of funk bassplayers at temperatures exceeding 30°C (86°F), when the audience size exponentially increases (effect of 3 × 103 increasing to 3 × 104.)
HTML and CSS constructs for representing musical scores.

Markup identifies elements inside the provided text. Some of these will be block level, others inline.
In the above:

- "Subjects of study" is a header
- "At the Awesome" -> "priority" is a paragraph
- "Turning H2O" -> "HTML" form a list, with no particular weighting to each item in the list
- H2O, C14, H12, O3: here the numbers are part of chemical compounds and should be subscript
- 3 × 103 and 3 × 104 contain superscript characters, 10 to the power of 3 and 10 to the power of 4
- HTML and CSS are abbreviations

Markup means that these features can be identified. As HTML markup the text is:

Listing 1: Marked Up as HTML

Subjects of study
At the Awesome Science Faculty, we have a pretty open-minded research facility — as long as the subjects fall somewhere in the realm of science and technology. You seem like an intelligent, dedicated researcher, and just the kind of person we'd like to have on our team. Saying that, of the ideas you submitted we were most intrigued by are as follows, in order of priority:
Turning H2O into wine, and the health benefits of Resveratrol (C14H12O3.)
Measuring the effect on performance of funk bassplayers at temperatures exceeding 30°C (86°F), when the audience size exponentially increases (effect of 3 × 103 increasing to 3 × 104.)
HTML, Hypertext Markup Language, and CSS, Cascading Style Sheets, constructs for representing musical scores.

The browser (rendering) engine would then take this text and decide how it would be presented back to the user, based on the presentational (style) rules attached to each element.

As can be seen, at the basic level markup corresponds to an element, which will open and close, with content in between.

If this is matched to the HTML 5 specification document, you will start to see more complexity in what elements can have in their definition. Do compare that with its definition for HTML 2. This is a great example of how the web and HTML have evolved to meet the requirements expected by users as they interact with the WWW medium.
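To see markup the way a program does, the sketch below feeds a small marked-up fragment to Python's built-in html.parser and records each element it meets. The fragment is an assumed reconstruction of part of the example text (the choice of h2, ul, li, sub, and abbr tags is ours, not necessarily the original's), and ElementLister and list_elements are hypothetical helper names.

```python
from html.parser import HTMLParser

# Assumed markup for part of the example text; the tag choices are illustrative.
FRAGMENT = (
    '<h2>Subjects of study</h2>'
    '<ul>'
    '<li>Turning H<sub>2</sub>O into wine</li>'
    '<li><abbr title="Hypertext Markup Language">HTML</abbr> constructs</li>'
    '</ul>'
)


class ElementLister(HTMLParser):
    """Collect the name of every opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        self.elements.append(tag)


def list_elements(html):
    """Return the element names found in the given HTML text."""
    parser = ElementLister()
    parser.feed(html)
    parser.close()
    return parser.elements


if __name__ == '__main__':
    print(list_elements(FRAGMENT))
```

The block-level elements (h2, ul, li) and the inline ones (sub, abbr) all arrive through the same callback; the distinction between them is a matter of how the browser lays them out, not of how they are parsed.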
Revisiting the HTML5 Specification - key features in the specification are highlighted:

What will often be seen is:

Listing 2: HTML Sample

<h1 id="section1_header" class="my_h1_style">Sample page</h1>

Listing 3: CSS Sample

.my_h1_style {
    color: blue;
}

Listing 4: Javascript Sample

document.getElementById("section1_header").innerHTML = "My new header text";

Modern web development is a combination of all of these:

- elements to semantically give meaning to content
- style rules to set visual and layout properties across devices
- javascript via the DOM to dynamically change the content of a page, either through remote API calls or interactivity

But back to basics - what is the minimum expectation of a well-formed web page?

4.4 A basic example

A well constructed webpage requires the following structure. It is a webpage in its simplest (valid) format.

Listing 5: A Basic HTML Document

<!DOCTYPE html>
<html>
  <head>
    <title>Sample page</title>
  </head>
  <body>
    <h1>Sample page</h1>
    <p>This is a <a href="demo.html">simple</a> sample.</p>
  </body>
</html>

Note the elements <!DOCTYPE html>, <html>, <head>, and <body>. While a browser may be forgiving and still render a page, it should as a minimum contain these elements, in the order shown above. Future browser versions may insist upon them, especially if the range of viewable documents becomes wider.

- <!DOCTYPE html> indicates to the browser the type of document that is being read.
- <html> marks where the HTML markup begins, and </html> where it ends.
- <head> holds information about the document.
- <body> holds what will be viewable in the browser render window.

So in the case above, the browser would display:

Sample page
This is a simple sample. (The linked page will return a 404 as it is not present.)

4.5 An extended example

Listing 6: A Larger HTML Document

Sample page
This page demonstrates some of the HTML5 structural elements
HTML5 Structural elements
HTML5 Elements
HTML5 Structural Elements
HTML structural elements include: header, nav, section, article, aside, and footer
A header defines the header of a page or a page section
HTML5 introduced the nav to replace: div id="nav"
Element section is a grouped area of content
HTML5 Common Elements
© no rights what so ever!

Just viewing the element names gives a semantic breakdown of the page:

- header content
- navigational content
- main section content containing
  - subsection 1
  - subsection 2
- footer

Using styles (CSS) will give control over the presentation and layout (positioning) of these areas. Add some javascript and interactivity can be included. The web quickly becomes a powerful medium.

MDN provides an excellent browsable list of HTML elements. Special characters and their associated syntax are best sourced directly from the specification document.

4.6 Browsers and Compatibility

The MDN Web Docs provide a full list of elements and their compatibility with different browsers. As an example of the element, a core element available from the HTML 1.0 specification, the page on MDN has the following.

Browser compatibility is still a major issue, especially if looking to incorporate newer CSS rules and/or javascript. If you are building for maximum adoption across multiple devices, then running sites through test environments is a key feature of modern builds.

4.7 Validation

HTML is a markup language - well written HTML should be well formed. Well-formedness means that a document is structured so that it can be parsed unambiguously. While browsers can be forgiving in parsing a document, if the intention is for the page to be dynamic and interactive, then leaving no ‘room for error’ in how behaviours will be interpreted will make a website more robust.
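One way to picture what parsing unambiguously involves is to track the open elements on a stack and flag anything that closes out of turn. The sketch below does this with Python's built-in html.parser. It is deliberately stricter than HTML itself, which allows some end tags to be omitted, so treat it as an illustration rather than a replacement for a real validator; NestingChecker, nesting_problems, and the short VOID_ELEMENTS set are our own hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a subset, for illustration).
VOID_ELEMENTS = {'br', 'hr', 'img', 'input', 'link', 'meta'}


class NestingChecker(HTMLParser):
    """Track open elements on a stack and record nesting problems."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # e.g. <br/> is reported as a start tag plus an end tag
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f'unexpected </{tag}>')


def nesting_problems(html):
    """Return a list of problems found; an empty list means none."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    # Anything still on the stack was opened but never closed.
    return checker.problems + [f'unclosed <{tag}>' for tag in checker.stack]


if __name__ == '__main__':
    print(nesting_problems('<p>Hello <em>world</em></p>'))
    print(nesting_problems('<p>Hello <em>world</p>'))
```

A stack fits here because elements must close in the reverse of the order in which they opened, which is exactly the nesting rule.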
Unambiguous means that elements open and close correctly, and conform to how they are defined in the specification document. In our earlier snippet of HTML code, elements open and close correctly, with nested elements being enclosed inside their containing elements.

If webpages misbehave (behaviour and/or styling), then checking their validity is a good place to start. The web developer tools that are part of modern browsers (see next section) are one starting place. Another is the markup validation service provided by W3C. Here files can be uploaded or HTML copied and pasted directly, which will then be checked and a report passed back. Have a go by copying the earlier provided code and pasting it in. Check the report that is provided back.

W3C HTML validation service

If you are still unsure of the significance of validation, have a read through the following document.

Why Validate?

4.8 Web Developer Tools (Browsers)

Most modern browsers come with a suite of tools that aid in web development and are useful for checking what the website is doing. They are referred to as web development tools and are usually accessed through the Tools menu on a browser. These tools are useful for inspecting HTML and CSS, as well as providing a javascript console window and a listing of network calls (HTTP requests and responses).

Below is an example of the HTML inspector using the code snippet from above. As you can see, the source window displays the HTML in a tree-like structure, and selecting a node will highlight its content in the browser window.

If you were to inspect the current page - this is what you would see. It is a treasure trove of useful information for a developer.

4.9 Future Directions

The web has changed considerably over the last 30 years and will no doubt see many changes over the next 30 years.
From a developer’s point of view, building for multiple devices and screen sizes has formed a large part of the work since the advent of mobile phones and tablets. Alongside this, building sites that are data driven through back-end APIs has aided the separation of front end from back end.

Security and data protection are major concerns, with ownership of data leading many business conversations. The big 4 (Apple, Google, Facebook and Microsoft) want data, and centralise services, but this paradigm might well shift with further EU regulations and Web 3.0.

While the mid 90s saw the Browser Wars and an attempt to own the Internet with browser-specific elements, the new challenges look to be about data and where it resides.

4.10 Worksheet: HTML Elements

In this exercise you will review HTML elements and familiarise yourself with the specification documents. You will find the following websites useful for completion of this exercise.

Tip:
- Mozilla Developer Network
- HTML Elements (from specification document HTML5)
- HTML Elements (from specification document HTML2)

Below are a series of questions - please look to answer.

- When did html tags become lowercase, and what led to this change?
- Find 2 elements from HTML2 that are not present in HTML5. Why do you think they have been removed?
- <audio> and <video> are elements in HTML5. Audio and video were presented to users on the web before this; how was this managed?
- Using the HTML 5 specification section for elements, identify what the following elements are used to represent; , , , ,

4.11 Source / Suggested Reading

- HTML Specification
- Comparison of Browser Engines
- Mozilla Developer Network
- W3C HTML validation service
CHAPTER FIVE
STYLING WITH CSS

5.1 A brief history of CSS

The previous chapter looked at HTML, which is the structure of a web document (DOM). The presentational layer, how this document appears to a user, is defined through CSS (Cascading Style Sheets).

As with HTML, CSS has gone through revisions as the web has evolved. In the early days, web documents appeared in terminal windows as pure text. Now, with GUIs and graphical browsers, how pages are viewed has changed. In addition, the range of screen resolutions, devices, and accessibility needs has greatly increased the variety of ways content is accessed. CSS, like HTML, has evolved to meet these challenges.

Table 1: CSS Specification (by Year)

CSS Specification | Year (approx) | Additional Notes
CSS1 | 1996 | First proposed in 1994, with the specification being formalised in 1996. Internet Explorer was the first browser to support it.
CSS2 | 1998 | Improved layout control and the ability to specify how content would appear on different devices (screen, print, tv)
CSS3 (CSS2.1) | 2009 | You will often hear CSS referred to now as 3 or 2.1, but in reality CSS is a yearly snapshot. This is because it now consists of 40+ separate modules that each have their own specifications and timeline.

A list of specifications and modules is available on W3.

It is important to remember that pages created as just HTML are styled, but that these styles are default styles built into the browser. HTML is just the structure. Really what should be seen is shown below. This uses the example in the earlier chapter.

Listing 1: Removing the Browser’s default styling

Sample page
body * {
    all: unset;
}

As can be seen - the structure still exists in the document. Without style rules, there is no layout, font size, weighting etc. The rule that was added in the above snippet informed the browser to unset all styles for elements inside the body.
There is therefore no visual (presentational) distinction between lists, headers, and paragraphs.

Pictorially, CSS works through the DOM tree. (Image sourced from MDN)

5.2 Terminology

There is a language to CSS. A non-exhaustive list of key terms used in this section is included below.

Table 2: Some Key Terms

NAME | DEFINITION
selector | This refers to the element that is being styled.
declarations | These are the styles themselves and will consist of a property and value (e.g. color: red).
modules | Modules form part of the current specification; for example, there is a module for the box model.
box-model | CSS represents block level elements as boxes (margin, padding, borders).
grid-layout | Current versions of CSS allow for grids to be defined which will contain the content.
flex | This is relatively new to CSS and allows for more intuitive control of how elements are spaced out in rows.
inline | This refers to where the styling is contained directly in the html document. The above example is inline.
linking (external sheets) | This refers to css files being loaded through the <link> tag (i.e. a separate css file).
cascading | The word cascading in CSS refers to the fact that a style declaration will cascade down to child definitions for the element until explicitly overwritten.

5.3 A basic example

In our example we will work with external sheets (separate css files). This is generally considered best practice as it separates presentation from structure, and allows style sheets to be reused.

As a simple HTML file use the following:

Listing 2: Basic CSS Example

Sample page
Web Application Technologies
This is my first CSS example
With an id selector
With a class selector

You will need to create a styles.css file. This will be in the same location as the html file.
Keep the selectors free from declarations at this point.

Listing 3: Basic CSS Example (styles.css)

h1 {
}

p {
}

#selector-id {
}

.selector-class {
}

li em {
}

Viewing the page in the browser will use the default styling as no additional declarations have been made.

Now add the following declarations to styles.css

Listing 4: Basic CSS Example (styles.css)

h1 {
    color: #1f76cf;
}

p {
    background: #00A000;
    color: #d0d0d0;
}

#selector-id {
    font-size: 1.2em;
}

.selector-class {
    font-size: 0.8em;
}

li em {
    font-weight: bolder;
}

The output in the browser will be:

5.4 The Box Model - An extended example

CSS blocks are described as having a box layout. Boxes have several attributes as can be seen: (image taken from browser developer tool panel)

While not apparent from this screenshot - the selected element in the browser window is the full. Knowing this to be the case, this means that my browser is currently sized at 920x1200 pixels and there is NO padding, margin or border on the.

Consider the following HTML:

Listing 5: Box Layout Example

Sample page
This is Box 1
This is Box 2
This is Box 3

with the CSS:

Listing 6: Basic CSS Box Layout Example (styles.css)

body {
    margin: 0;
    padding: 0;
}

div {
    background:
