Software Installation and Package Management
Mohammad Zunnun Khan
Summary
This document provides an overview of software installation and package management, discussing the general principles, the different types of software, the file system layout, and the steps involved in an OS installation. It also covers how operating systems and add-on software are used in real-world situations, such as running a web server.
Having covered the details of file systems and storage devices in the previous chapter, we can now focus on installing an operating system and adding software according to the intended purpose of the system. Most people never have to actually perform an OS installation, but system administrators are not most people. We routinely set up new systems, create new services, and need to be able to fine-tune the software from the very lowest layers, defining exactly what software is installed, what drivers the kernel needs to support, what modules it should load, and what services should be started at boot time. In addition, even within a single host, some subsystems that control components on a lower layer than even the OS kernel, such as a RAID controller or a Fibre Channel HBA, are driven by their own firmware, accessed by the OS via its kernel drivers and exposed to the administrative users via additional software tools. All of these types of software have specific requirements, yet all of them also have in common the general mechanisms to install, update, maintain, and remove the software.

In this chapter we will discuss in some detail the general principles underlying all software installations, beginning with a distinction between the different types of software we commonly encounter, most notably the two main categories of OS components and add-on or third-party software. We will look at how an operating system is installed on a server and what installation mechanisms are commonly used. Since different operating systems install different components into different locations, we will also take a close look at the file system layout and explain some of the reasons behind the common hierarchy found on most Unix systems.

In its most general definition, “software” is just another term for a program telling a computer to perform a certain task. Practically, however, there are a number of different types of software. In this section, we will attempt to categorize these types, even though the distinctions or differentiating factors are far from clear-cut.

In the previous chapter, we already identified a specific component that happens to be implemented in software: the file system. Instinctively, we categorize the file system as being in a lower layer than certain applications, such as a web server, for example. But the file system is only one component of the operating system, which in turn comprises regular application software (such as a text editor or a compiler), software libraries used by many of these applications (such as the resolver library, used to turn hostnames into IP addresses), device drivers providing access to and control of certain hardware components, and the most central component of the OS itself, the kernel.

Looking at software from this perspective quickly makes it clear that the term “operating system” itself requires some definition, as different providers include different components under this umbrella term. Recall from our discussion of Unix history that the term “Linux”, for example, may refer to the Linux kernel as well as any number of “distributions”, each bundling a variety of components to make up a version of the GNU/Linux operating system. But before we attempt to tackle the question of what, exactly, defines an operating system, let us take a step back and attempt to better categorize software based on its proximity to the hardware or the end user.
One of the most common systems a system administrator may be in charge of is a generic HTTP server, offering a web service of some sort. Such a service nowadays consists of a perhaps surprising number of components and software dependencies running – in our example here, anyway – on virtual hardware within the AWS cloud service. From a “top down” perspective, it might:

◦ require an HTTP server – the actual daemon handling the network connections on port 80 and 443
◦ require e.g. PHP, Perl, Python, Ruby, or Java – entirely static pages are seldom found on the internet; instead, most web servers provide dynamic content via integration with other programming languages and modules
◦ which use generic library functions – the web server and the programming languages or frameworks are utilizing the standard libraries available on the system
◦ which make various system calls
◦ which the kernel handles for the OS
◦ which is running in a virtual machine – in the case of AWS, the OS our server is running on does not actually sit on top of the hardware itself
◦ which is running on top of a hypervisor – the host system managing the hardware for the different guest OSs, such as, for example, the Xen hypervisor
◦ which uses firmware to manage various hardware components
◦ which, finally, is running on some hardware in a data center somewhere

Going down this stack, we can already identify a number of different interacting and overlapping categories of software. As such a Unix system boots up, it goes through a number of phases, each of which is handled by a different type of software. The typical boot sequence begins at a very low level close to the hardware, and with each step adds another layer of abstraction, bringing the system closer to interaction with the user. In order not to complicate things more than absolutely necessary, we will skip over the layers involving the hypervisor in the following. Pretending that the OS runs on actual, not virtual, hardware, the boot process might then include these steps:

◦ Power-On Self Test (POST) – a few very basic routines intended to ensure the hardware is not obviously faulty; Figure 5.1a shows a typical POST screen
◦ execution of the primary boot loader – a simple program stored in read-only memory; examples include the BIOS (as shown in Figure 5.1b), UEFI, and OpenBoot/Open Firmware
◦ access of the MBR – a special boot sector found on the primary boot device; the code found in this location allows the system to access the file system(s) and transfer control to a second-stage boot loader
◦ execution of a secondary or second-stage boot loader – a small program allowing the user to interactively choose an operating system or kernel to boot; examples include GRUB (Figure 5.2a) or boot(8) (Figure 5.2b)
◦ loading of the hypervisor and starting of the privileged domain (dom0)
◦ initialization of the virtual hardware provided by the dom0 for the guest OS (domU)
◦ loading of the guest OS kernel – booting, initializing (virtual) hardware, loading modules
◦ init(8) – one of the few processes created explicitly by the kernel, init(8) spawns all other processes of the running OS and bootstraps the final system
◦ starting of system services – started by init(8), commonly via a number of shell scripts using e.g. the rc(8) or /etc/init.d frameworks
◦ the interactive system runs, presenting a login prompt; the web server accepts network connections and begins serving traffic

Most Unix systems display diagnostic messages from the boot process on the system console and may retain a copy on the file system under e.g. /var/run/dmesg.boot. The dmesg(8) command can be used to display these messages. On Amazon’s EC2 systems, you can use the aws ec2 get-console-output command to retrieve the information displayed on the (virtual) console. Listings 5.1 and 5.2 show the output on the serial console displayed during the boot process of a NetBSD Amazon EC2 instance, showing information from the hypervisor and the virtualized guest OS, respectively.
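For illustration, here is a minimal sketch of how one might inspect these boot messages from a shell; the EC2 instance ID below is a placeholder, and the exact paths and options vary between Unix versions:

    dmesg | less                      # show the kernel's boot-time diagnostic messages
    cat /var/run/dmesg.boot           # some BSD systems retain a copy of the messages on disk
    aws ec2 get-console-output --instance-id i-0123456789abcdef0
                                      # retrieve the (virtual) console output of an EC2 instance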
Our attempt to draw the software stack as described in the previous section is illustrated in Figure 5.3. As we review these layers, it quickly becomes obvious that, unfortunately, the distinctions are not as well-defined as we would like them to be. It has become increasingly difficult to clearly identify any given piece of software as being an “add-on package”, as OS providers include more and more software in their distributions in order to make their product suitable for the widest variety of uses. Many Linux distributions, for example, include the canonical examples cited above (web browsers, database management systems, web servers, etc.), as they target both the server and desktop markets, while the BSD-derived systems tend to favor a leaner setup, explicitly targeting the server market.

In Section 4.6, we briefly summarized the hierarchical tree-like structure of the Unix file system layout and noted that the purpose or common use of the different subdirectories is described on many Unix versions in the hier(7) manual page. Unfortunately, this layout is not standardized, and different versions of Unix have, for historical reasons as well as purely by coincidence or developers’ preferences, diverged. But almost all Unix versions do still share a number of common directories, especially as far as the base OS is concerned.

In the early days of Unix, disk space was still expensive, and it was not uncommon to split the files of the operating system across multiple devices. As the system boots up, it obviously requires a number of files to be present and services to run before it can mount any other partitions or disks. For this reason, Unix systems used to have a small root partition (mounted at /, pronounced “slash”), sufficient to boot the OS, with additional software stored on a separate disk, commonly mounted under /usr. / contained all the system binaries (found in /bin) and libraries (found in /lib) needed to perform basic system startup and maintenance tasks.
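To make this common layout concrete, the following sketch lists typical top-level directories roughly as documented in hier(7); exact contents and additional directories vary between Unix versions:

    /        root file system, required to boot the system
    /bin     essential user command binaries (sh, ls, cp, ...)
    /sbin    essential system binaries used during boot and repair
    /lib     shared libraries needed by the programs in /bin and /sbin
    /etc     host-specific configuration files
    /dev     device special files
    /usr     the bulk of userland programs and libraries; historically a separate disk
    /var     variable data such as logs and spool directories
    /tmp     temporary files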
System administrators often maintain large numbers of hosts, and so it comes as no surprise that new machines – both physical and virtual – have to be brought up and integrated into the infrastructure on a regular basis. The more systems are maintained and created, the more this process needs to be automated, but at the end of the day, each one follows the same set of common steps. In an ideal world, these steps could be summarized as requiring the installation of a base operating system with subsequent adjustments for the specific task at hand. As we will see in this section, each step depends on a large number of variables and site-specific customizations, and so most large-scale organizations have written their own automation framework around a few common tools to accomplish the goal of quick and scalable system and service deployment. In fact, the topic of automated deployment tools is too large to adequately cover here (though we will revisit it in later chapters), and we shall instead focus on the essential concepts needed to understand the requirements for such systems.

Installing a new system is a process unique to each OS; installation methods range from unattended deployment systems, using information determined at runtime to create the correct configuration, to interactive graphical installers (such as the one seen in Figure 5.4) allowing users to select amongst common options to create a general-purpose Unix server. Before a system is installed, a number of important choices have to be made:

◦ What file system will be used?
◦ What are the requirements of the final system with respect to file I/O?
◦ How many partitions will be created?
◦ What OS will be installed?
◦ What add-on software?
◦ What is the final purpose of the machine?

Many of these questions are interdependent and answering one may restrict possible answers to other questions. For example, if the purpose of the server is to run a specific piece of software that is only available for a given OS, then this might also influence a very specific partitioning schema or other file system considerations. In fact, the final purpose of the machine and the software it needs to run will likely dictate your hardware choices – processor architecture, amount of memory and disk space, number or types of network interfaces – which in turn restrict your OS choices. On a high level, the OS installation process itself consists of a few distinct phases:

◦ Hardware identification, provisioning and registration. Before a new system can be installed, suitable hardware needs to be identified, physically installed, and registered with the inventory system. Depending on the size of your organization, this may be a manual step performed immediately before the OS installation is started, or it might be done continuously in a data center where hardware is racked, asset tags scanned, and information automatically entered into a pool of “ready to install” systems.

◦ Base OS installation. The installation of the software itself. Even though any reasonable deployment system combines these steps, we distinguish between the basic OS and additional software added (or unused components removed) to more clearly illustrate at which point in the process customization to our needs tends to happen.

◦ Installation of add-on applications. In this step, we transform the generic OS into a server with a specific purpose. The installation of the right add-on applications may entail fetching software updates over the network from a remote system or even building binaries from sources on the fly.

◦ Initial minimum system configuration. After all software has been installed, a number of very basic configuration steps have to be performed. This phase may include setting a hostname, applying a specific network configuration, adding user accounts or enabling certain system daemons. Note that in most cases a configuration management system is installed, which will perform this (and then ongoing) customization.

◦ System registration. When all software is installed and configured and the system is ready to run and fulfill its purpose, we need to integrate it into our larger infrastructure. Our inventory of which systems perform which function needs to be updated, our monitoring system needs to be made aware of this host, other components with which this server interacts may need to be updated, etc. It is important to consider this “paperwork” to be part of the installation, lest it be deprioritized or forgotten.
◦ System restart. Finally, the system needs to be rebooted at least once before it is put into service. This ensures that it can boot on the right media, all daemons start up, the network is functional, the system can be monitored and, in general, everything behaves exactly as it should. It is important to always include this step: during the installation process the system is in a very different state than under normal circumstances, and it is possible to forget to enable or disable a service. Placing the system into production use without a fresh reboot might lead to unexpected results when the system is rebooted at a later point.

The list of phases outlined in the previous section can be further refined. It is clear that a number of these steps are likely (and necessarily) dependent on and intricately intertwined with a number of your organization’s infrastructure components, such as an inventory system or a database keeping track of configuration types or service roles, for example. This integration into the larger infrastructure ecosystem of a company tends to be complicated, which is why most of these organizations end up writing their own custom software for this purpose. However, eventually all of these systems – be they custom, proprietary, public, open source, developed by the OS provider or by a third party – have to perform the same basic steps to actually get the OS onto the disks. Most installers hide the details of these steps from the ordinary end-user, but, hey, we’re system administrators – let’s take a look under the hood!

The first step is booting the installation environment. This entails a decision of which media to boot from. Manual installations tend to be performed by booting from a CD/DVD, while automated, unattended installs have for years relied on booting the system from the network using the Preboot eXecution Environment (PXE), which utilizes a combination of protocols such as DHCP, TFTP, NFS and/or memory-based file systems to load a small version of the OS – such as, for example, the Linux initial ramdisk (initrd(4)) – which then performs all the steps to actually install the final system.

Once booted, the install process needs to identify all available disks. This requires the “miniroot” to include or be able to load any drivers required to access the storage devices in question.

Before the system can be installed, the disks identified in the previous step need to be partitioned. This includes the creation of a suitable BIOS partition table (via e.g. fdisk(8)) and/or of the OS-specific disk label (via e.g. disklabel(8)). At this step, we also need to decide which of the available disks will become the target root device, i.e. which will contain the root file system, and where the other disks and partitions will be mounted later on.

Each partition defined in the previous step needs to be formatted for its respective file system. It is important to pass the correct flags to mkfs(8)/newfs(8) since, as we discussed in Section 4.7, most file system features cannot (easily) be tuned at run time.

At some point after the disks have been partitioned and before the host is rebooted for the first time, it needs to be made bootable. The details depend again on the OS in question, but generally involve the installation of the disk bootstrap software in the MBR and the configuration of the first- and second-stage boot loaders, for example via installboot(8) or grub(1).
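For concreteness, here is a minimal sketch of what these manual steps might look like on a NetBSD-style system; the disk name (wd0) and the bootstrap path are illustrative, and the exact commands and flags differ between operating systems, so consult your system’s manual pages:

    fdisk -u wd0                                     # create or update the BIOS (MBR) partition table
    disklabel -i -I wd0                              # interactively edit the OS-specific disklabel
    newfs /dev/rwd0a                                 # create a file system on the root partition
    installboot /dev/rwd0a /usr/mdec/bootxx_ffsv1    # install the primary bootstrap code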
Finally, we have reached the point where the actual OS is installed. This generally requires the retrieval of the system’s base packages or archive files from a CD or DVD, via NFS from another server, or (perhaps via FTP) from a remote host – an approach which brings with it a number of security implications (how do you verify source authenticity and integrity?) and in which case initial network configuration of the miniroot system is a prerequisite. After the data files have been retrieved, they are extracted or the necessary packages installed into the target root device. Many interactive installers allow the user to select different sets of software to install and/or will automatically identify additional packages as required.

After the base OS has been installed, any optional software may be added, based on either the system’s configuration or interactive user input. Depending on the installer, this step may be combined with the previous step. We explicitly note this as a separate step, since “add-on” software here may not only include optional packages from our OS vendor, but also your own software, such as your configuration management system, third-party applications licensed to your organization, or your software product or serving stack.

As we mentioned before, any server requires ongoing maintenance that, in large part, is performed by automated systems performing configuration management. But regardless of whether or not such a (complex) system is integrated into the OS installation process, there are a few things that need to be defined at install time. This usually includes a host’s network configuration and hostname, timezone, NTP and syslog servers, root password, and the services started at boot time, to name just a few examples.

Finally, after the system has been installed and basic configuration performed, it needs to be rebooted. At this time, the host boots the way it would under normal circumstances and enters its “first boot”. This stage in a host’s deployment process tends to be significantly different from a normal boot: a number of services are run for the first time, and further initial system configuration and system registration (as noted above) is likely to take place. Since this initial boot sequence may install software upgrades and change the runtime configuration, it is advisable to reboot the host another time after this “first boot”. This ensures that the system does in fact come up in precisely the same state as it would in the future.

Even though software installation across different Unix versions can generally be accomplished using the same set of manual commands, most third-party software is added using the operating system’s native package management tools to ensure a tight integration with the OS as well as a consistent and predictable file system hierarchy. As we discussed earlier, almost all Unix systems have a distinction between what they consider to be part of the “base OS” and what is considered an “add-on” application. For some versions, such as the BSD-derived systems, this distinction is clear and extends to the way that the software is installed and maintained: the core operating system is provided in software archives or built from source, while all other software is installed through the use of a set of specific, native tools managing a system-wide package inventory.
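As a brief illustration of such native tools (the package name is a placeholder, and the exact commands depend on the OS and the package system in use):

    # On a BSD system using the pkg_* tools, assuming a package repository or PKG_PATH is configured:
    pkg_add nginx            # install a binary package and its dependencies
    pkg_info                 # list the packages recorded in the system-wide inventory
    pkg_delete nginx         # remove the package again

    # On a Debian-based Linux distribution:
    apt-get install nginx    # install a package from the configured repositories
    dpkg -l                  # list installed packages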
Other Unix versions take a different approach: in order to better express software dependencies on the OS and its capabilities, as well as to make it easier for system administrators to manage all software – including OS or core-component upgrades – using the same set of tools, they break all software into individual packages. This provides the benefit of a more consistent user experience in controlling software across all layers of the stack, but it is easy to lose the distinction of which parts are essential components of the operating system and which are optional. As many vendors strive to provide an OS suitable for many different purposes, including as a general Unix workstation, a desktop system, or a web server, more and more software is included in a given OS release – a lot of it unnecessarily so.

Regardless of which package management system is provided by the OS vendor, across all Unix versions system administrators may at times choose to install software “by hand”, that is, by manually downloading and possibly building the software before copying it into the desired location. With a plethora of available open source software, it is the rule rather than the exception that an application you wish to install is available only in source form, not as a binary package. The sequence of commands to install most such software (./configure; make; make install) has become a ubiquitous part of a system administrator’s life, but unfortunately this approach does not scale (a sketch of this sequence appears at the end of this section). Nevertheless, at times there are good reasons to choose a manual installation method, if only for evaluation purposes of the software or perhaps as a first step prior to deployment on a larger scale. These include:

◦ Availability. There are tens of thousands of applications and libraries available, but not all software providers package their software. In other words, you may simply have no choice but to (initially) build the software from source by hand. Nowadays a lot of software developers target almost exclusively specific Linux versions for their releases, so even if a package exists for that platform, you still have to build your own if you are running any other system.

◦ Customization and optimization. By using pre-compiled binary packages, you give up a certain amount of control over the installed software. When building from source, you have the option to specify configuration options (such as including support for certain features normally disabled) or to create executables that are specifically optimized for your target platform.

◦ Adding support for your OS. Sometimes software provided to you will not work correctly, not work at all, or not even build on your specific OS. With access to the source code, you can – often with minimal effort – port the software to your platform.

◦ Adding/removing features. Binary packages are built to target the largest user base possible. As a result, they tend to include built-in support for all possible features, including those that you may not have any use for or that may actually be undesirable in your environment. Along the same lines, you may be able to fix or eliminate security vulnerabilities that have been discovered but not yet included in an official package.

◦ Use of the latest version. Sometimes you may be interested in using the latest release of a given piece of software, for example to take advantage of a new feature or a bug fix. If binary packages are not yet provided for this version, then building the software yourself may yet be an option.
◦ Security. If you download and install binaries from the internet, you are inherently trusting the sites from which you retrieve the executables as well as the transport mechanisms by which you retrieved them. You are implicitly trusting the build system and the access restrictions of the software provider.
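As referenced above, here is a minimal sketch of the canonical manual build-and-install sequence for an autoconf-based source package; the package name, version and --prefix are illustrative placeholders, and many packages require additional configure options:

    # fetch foo-1.0.tar.gz from the provider and verify the checksum they publish,
    # e.g. with sha256(1) or sha256sum(1), before unpacking
    tar xzf foo-1.0.tar.gz
    cd foo-1.0
    ./configure --prefix=/usr/local    # run ./configure --help to see the available options
    make                               # build the software
    make install                       # install into the chosen prefix; usually requires superuser privileges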