US Private Sector Privacy Chapter 03 PDF

MGT 6727 (Spring Semester 2024) at Georgia Tech
Chapter 3 – as of 01/15/2024 © IAPP

Chapter 3 Introduction to Technological Aspects of Privacy

3.1 Overview

This chapter introduces the technological aspects of privacy protection. 1 An organization’s overall privacy program is often shaped initially by legal requirements and by the organization’s policy decisions about which personal data to collect and use. Multiple chapters in this book discuss such laws and policies, including for specific sectors such as health care or financial services. To implement the relevant laws and policies, organizations need to manage the processing of personal data – the next chapter introduces the relevant management concepts. In addition, organizations need to build and operate a technical infrastructure so that the laws, policies, and management objectives of the organization are actually implemented in practice.

In addressing the technological aspects of privacy, this chapter seeks to be understandable to the non-technical reader, while providing insights and links that will be helpful for both technical and non-technical readers. This chapter draws extensively on the IAPP book, “An Introduction to Privacy for Technology Professionals,” edited by Travis D. Breaux. We recommend that book (or any update) for those wishing to dig deeper into the technological aspects of privacy. We have drawn from its text, sometimes closely, in writing this chapter.

This chapter discusses the basics of the internet, including definitions of numerous terms relevant to privacy protection. It next addresses computing architectures, such as client/server and cloud systems. The chapter provides a bit more detail on types of digital surveillance and tracking, including cookies and other types of web tracking. Along with online tracking, smartphones, home assistants, and other technologies are deploying sensors that can track individuals, including location, audio, and video tracking.
Because technology creates the possibility of so much tracking, we next turn to the realm of privacy-enhancing technologies, including encryption and the de-identification of data. The chapter concludes with some key components of cybersecurity – even the best privacy policy will not protect data if it is easy for malicious attackers to grab the data.

3.2 Basics of the Internet

3.2.1 The Internet

The internet is a network of networks. The precursor of the internet we know today is the ARPAnet, a computer network developed in the 1960s by the U.S. military, which expanded to scientific research in the 1970s. 2 Commercial activity did not become substantial on the internet until the early 1990s. 3 Distant in time and size from its origins, the internet today has the same basic architecture as when it was first designed. The open and dynamic nature of the internet enables its speed, functionality and continued growth but, as will be described later in this chapter, also exposes it to information privacy vulnerabilities.

NOT FOR DISSEMINATION The materials in this course are provided only for the personal use of students in this class in association with this class.

The internet uses the internet protocol suite known as TCP/IP to communicate among networks and devices. Transmission control protocol (TCP) enables two devices to establish a reliable data connection. Before it transmits data, TCP establishes a connection between a source and its destination, which it keeps live for the duration of the communication. It then breaks large amounts of data into smaller packets, while ensuring that data integrity is maintained throughout the process. 4 The internet protocol (IP) specifies the format of data packets that travel over the internet and also provides the appropriate addressing protocol.
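
For readers who find code helpful, the idea of breaking data into packets while preserving integrity can be illustrated with a short Python sketch. This is a toy model, not real TCP: the function names `to_packets` and `reassemble`, the 8-byte packet size, and the use of a CRC-32 checksum are all assumptions chosen for illustration, and real TCP uses its own header format and checksum.

```python
import random
import zlib


def to_packets(data: bytes, size: int = 8):
    """Split data into numbered packets, each carrying a checksum of its payload."""
    packets = []
    for seq, start in enumerate(range(0, len(data), size)):
        payload = data[start:start + size]
        packets.append({"seq": seq, "checksum": zlib.crc32(payload), "payload": payload})
    return packets


def reassemble(packets):
    """Verify every checksum, then rebuild the data in sequence-number order."""
    for p in packets:
        if zlib.crc32(p["payload"]) != p["checksum"]:
            raise ValueError(f"packet {p['seq']} is corrupted")
    ordered = sorted(packets, key=lambda p: p["seq"])
    return b"".join(p["payload"] for p in ordered)


message = b"Privacy depends on reliable delivery."
packets = to_packets(message)
random.shuffle(packets)  # simulate packets arriving out of order
restored = reassemble(packets)
```

The sequence number lets the receiver restore the original order, and the checksum lets it notice a damaged packet; both are simplified stand-ins for the mechanisms TCP actually uses.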
An IP address is a unique number assigned to each connected device. It is similar to a phone number because the IP address shows where data should be sent, such as when a website transmits text, videos, and other information to a user’s IP address. The most recent version of the protocol is IPv6, which is in the process of replacing the earlier IPv4 version. 5

To move information from the source to the destination, TCP/IP relies on packet switching. 6 Data in the header of each packet directs the packet to its destination. The payload of the packet – such as text or video – is extracted upon arrival. To arrive at the destination, different packets in the same communication, such as parts of a video, may travel on different routes, meaning different sets of nodes on the internet. Upon arrival, the receiving device re-assembles the packets in the correct order, essentially putting all the pieces of a video back together correctly. Any packets that do not arrive are retransmitted.

3.2.2 Web Infrastructure

Although the terms are often used interchangeably, the “internet” has a broader meaning than the “web” or “World Wide Web.” The internet carries a wide range of information types, including both web traffic and non-web traffic such as electronic mail, IP telephony, file sharing, and many communications for the Internet of Things. The most familiar way of accessing the internet, however, is through the web.

Historically, the web functioned based on two key technologies, both of which have received updates:

Hypertext transfer protocol (HTTP) is an application protocol that manages data communications over the internet and defines how messages are formatted and transmitted over a TCP/IP network for websites. Further, it defines what actions web servers and web browsers take in response to various commands.

Hypertext markup language (HTML) is a content-authoring language used to create web pages.
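
To make the two definitions above concrete, the following Python sketch shows what an HTTP exchange carrying an HTML page might look like as text. The request and response strings are hypothetical examples written for this illustration (they were not captured from any real server), and the helper names `split_response` and `TitleParser` are assumptions rather than part of any standard.

```python
from html.parser import HTMLParser

# A hypothetical HTTP request a browser might send for a page.
request = (
    "GET /news HTTP/1.1\r\n"
    "Host: iapp.org\r\n"
    "Accept: text/html\r\n"
    "\r\n"
)

# A hypothetical HTTP response: status line, headers, blank line, HTML body.
response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "\r\n"
    "<html><head><title>News</title></head>"
    "<body><h1>Privacy news</h1><p>Hello, web.</p></body></html>"
)


def split_response(raw):
    """Separate the status line, the headers, and the body of a raw response."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    headers = dict(line.split(": ", 1) for line in header_lines)
    return status_line, headers, body


class TitleParser(HTMLParser):
    """Collect the text inside the <title> tag of an HTML document."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


status_line, headers, body = split_response(response)
parser = TitleParser()
parser.feed(body)
```

HTTP governs the envelope (the request line, status line, and headers), while HTML is the content inside it that the browser renders.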
Today, HTML is used together with other computer languages including JavaScript, CSS, and JSON, the details of which we do not discuss in this chapter. A key task for these languages is to enable the web browser to determine how the content on the page should be rendered. Document tags can be used to format and lay out a web page’s content and to hyperlink, that is, to jump to other web content. Forms, links, pictures and text may all be added with minimal commands. Headings are also embedded into the text and are used by web servers to process commands and return data with each request. 7

Sir Tim Berners-Lee, a British computer scientist working out of the Switzerland-based particle physics laboratory known as CERN, invented HTTP and HTML in the early 1990s. Berners-Lee recognized the inherent limitations of the early internet and advanced the HTML language as a means for research scientists such as himself to dynamically tie documents and files together, a capability he referred to as hyperlinking.

3.2.2.1 Developments in the Technology of the Web

Numerous advancements have occurred since the web’s creation. The development of Hypertext Transfer Protocol Secure (HTTPS) allows the transfer of data from a browser to a website over an encrypted connection. 8 By 2016, the total amount of HTTPS traffic sent over the internet was greater than HTTP traffic. 9

HTML has continually evolved since it was first developed in the 1990s. Today, many browsers support features of HTML5, the fifth and most recent version of the HTML standard.
10 HTML5 has new capabilities and features, such as the ability to run video, audio and animation directly from websites without the need for a plug-in (a piece of software that runs in the browser and renders media such as audio or video). Another feature of HTML5 is an increased ability to store information offline, in web applications that can run when not connected to the internet. 11

Extensible markup language (XML) is another language that facilitates the transport, creation, retrieval and storage of documents. Like HTML, XML uses tags to describe the contents of a web page or file. HTML describes the content of a web page in terms of how it should be displayed. Unlike HTML, XML describes the content of a web page in terms of the data that is being produced, enabling automatic processing of data in large volumes and consequently requiring additional attention to privacy issues. 12

The web browser software is considered a “web client” application in that it is used by the computer or other device (the “client”) to navigate the web and retrieve web content from web servers for viewing. Some web server firewalls also function as a web client. 13 To protect the inner system, the firewall will interact with the inner web proxy as a client and then relay the same request out to the web server.

Two of the more common web-browser-level functions are uniform resource locators (URLs) and hyperlinks.

A Uniform Resource Locator (URL) is the address of documents and other content that are located on a web server. An example of a URL is https://www.iapp.org. This URL contains (1)
an HTTPS prefix to indicate its use of the protocol; (2) often, “www” to signify a location on the World Wide Web; (3) a domain name (e.g., “iapp”); and (4) an indicator of the top-level domain. Today there are over 1,500 top-level domains. 14 Some of the most familiar include “com” for a commercial organization, “org” for an organization, “gov” for government, “edu” for an educational institution, or a two-letter country code, such as “uk” for United Kingdom or “jp” for Japan. 15 A URL can also include a “deep link” to a specific page within the domain, such as “news” in https://iapp.org/news, or even to a specific paragraph within a page.

URLs are a subset of a larger class of identifiers called Uniform Resource Identifiers (URIs), which are formatted like URLs but may not include information to locate the resource on a network. You may also encounter a Uniform Resource Name (URN). The distinctions are confusing and not relevant in most usages, but you should be aware of them because some documents use these terms interchangeably.

A hyperlink is used to connect an end user to other websites, parts of websites, and/or web-enabled services. The URL of another site may be embedded in the HTML code of a site so that when a user clicks on the link in the web browser, the end user is transported to the destination website or page.

3.2.3 Key web infrastructure definitions

The web is built from a conglomeration of hardware and software technologies that include server computers, client applications (such as browsers, discussed above) and various networking protocols.

A web server is a computer that is connected to the internet, hosts web content, and is configured to share that content. Documents that are viewed on the web are actually located on individual web servers and accessed by a browser.

A proxy server is an intermediary server that provides a gateway to the web.
Employee access to the web often goes through an organization’s proxy server. A proxy server may mask what is happening behind the organization’s firewall, so that an outside website may see only the IP address and other characteristics of the proxy server and not detailed information about which part of an organization is communicating with the outside website. A proxy server generally logs each user interaction, filters out malicious software downloads, and improves performance by caching popular, regularly fetched content.

Virtual private networks (VPNs) are similar to proxy servers, widely used in the United States for employee web access but not used as widely by consumers. VPNs can encrypt the information from the user to the organization’s proxy server, potentially masking both the content and the final destination from the internet service provider (ISP).

Caching occurs when a server saves a copy of content, reducing the need to download the same content again from the web server.

A web server log is sometimes automatically created when a visitor requests a web page. Examples of the information automatically logged include the IP address of the visitor, the date and time of the web page request, the URL of the requested file, the URL visited immediately prior to the web page request, and the visitor’s web browser type and computer operating system. Depending on how the web server is configured, it is possible for personal information such as a username and password to appear in web server logs. IP addresses themselves, and thus web server logs containing them, are considered personal information by some regulators but not by others.
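
As an illustration of the log fields listed above, the following Python sketch parses one log line and redacts two of the sensitive values. It assumes the server writes the widely used Combined Log Format; the sample line is invented, the names `parse_line` and `redact` are hypothetical, and zeroing the last part of an IPv4 address is only one possible masking choice, not a statement of what any regulator requires.

```python
import re

# Assumed layout: Combined Log Format (IP, identity, user, time, request,
# status, size, referrer, user agent).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# An invented log line in which a password leaked into the requested URL.
sample = (
    '203.0.113.7 - - [15/Jan/2024:10:12:01 -0500] '
    '"GET /login?user=alice&password=hunter2 HTTP/1.1" 200 512 '
    '"https://www.example.com/home" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"'
)


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def redact(entry):
    """Return a copy with any password value removed and the IPv4 address coarsened."""
    cleaned = dict(entry)
    cleaned["url"] = re.sub(
        r"(password=)[^&\s]*", r"\1[REDACTED]", entry["url"], flags=re.IGNORECASE
    )
    cleaned["ip"] = re.sub(r"^(\d+\.\d+\.\d+)\.\d+$", r"\1.0", entry["ip"])
    return cleaned


entry = parse_line(sample)
safe = redact(entry)
```

Here `entry` holds the visitor’s IP address, the request time, the requested URL, the referring URL, and the browser and operating system string, which are the same categories named in the paragraph above.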
In general, passwords should not be logged, but they have been on many occasions, causing privacy and security problems.

The following additional terms are essential to understanding the online privacy concepts to be addressed in this chapter.

An internet service provider (ISP) connects users and their devices to the internet. Common examples of ISPs are cable or wireless providers. ISPs can dedicate a specific IP address to a specific user or business. This is called a static IP address. ISPs can also assign IP addresses “dynamically” as needed. This can happen on a session-by-session basis, but some ISPs assign the same dynamic IP address to a particular customer for months or even longer. When an IP address does not change, a website can use the IP address as a way to recognize a device that returns to the site. 16 This persistent link to a device is a basis for the European Union (EU) and some other regulators considering an IP address to be personal information, because of the greater likelihood that data can be linked to a specific user. 17

Transport layer security (TLS) is a protocol that ensures privacy between a user and a web server. When a server and client communicate, TLS secures the connection to ensure that no third party can eavesdrop on or corrupt the message. TLS is a successor to secure sockets layer (SSL). 18

3.3 Computing Architectures

In order to perform tasks such as a Privacy Impact Assessment and Privacy by Design, privacy professionals often need to understand the key elements of IT architecture. This section explains client-server architecture, cloud computing, and edge computing. It also explains the basics of how emails and text messages work.

3.3.1 Client-server architecture

Figure 2-14 portrays common elements of the client-server architecture.
[insert Figure 2-14 from 2020 version of Introduction to Privacy for Technology Professionals]

Individual consumers are most familiar with the perspective of the client, a piece of computer hardware or software that accesses a service made available by a server in the client-server model of computer architecture. Familiar clients include a desktop, laptop, or mobile phone. The term also applies generally to a computer or program that relies on sending a request to a server in order to access a service. A thick client is capable of performing many data processing actions itself, such as a computer or mobile phone that can run many programs even when not connected to the internet. A thin client relies predominantly on processing performed remotely, such as a cloud-based application that the user can only operate when connected to that application. An example of running as a thin client is a computer that only runs a web browser and uses web-based editing tools such as Google Docs or Microsoft’s web-based Office 365.

The client often uses a web browser to interact with the server. In addition to a browser, the client may interact using a different protocol, such as the Simple Mail Transfer Protocol (SMTP) used in sending and receiving e-mail.

The client communicates with the server, which is the computer process that responds to client requests. For instance, the server on a news website will make a news story available, and the server on an e-commerce site will enable the customer to place an order. These interactions of the client and the server are called the front end of the organization’s computer system. At present, the term “front end” is generally used to describe a web-based interface coded in HTML, CSS and JavaScript.
The organization often chooses to have other devices and software operate largely separately from the web server. These other devices and software are the back end, including databases and other computing activity not essential for operating the server. Separation of the front end and back end can help protect privacy and security. Suppose, for instance, that a hacker is able to put malicious code on the server, on the front end. If the front end and back end interact only in limited ways, the organization may be able to carefully monitor those interactions, preventing the malicious code from infecting the back end. A database in the back end thus becomes better protected, reducing the likelihood of a data breach of sensitive personal information.

3.3.2 Cloud computing

Organizations have been shifting a growing fraction of their data processing to cloud computing, defined as on-demand availability of computing resources. Multiple reasons have driven the shift away from computing on resources owned and managed by the organization itself, sometimes called on-premises computing. These reasons include cost savings, the ability to scale without costly capital investment, and increased demand for anytime, anywhere remote access. 19

For many organizations, there are cybersecurity and privacy reasons to select cloud computing. 20 Cloud architectures tend to be homogeneous and easier to manage, in contrast to legacy on-premises systems. For smaller organizations, it can be challenging to hire top IT talent in-house, while cloud providers can have dedicated in-house staff for security and privacy. Each organization must analyze its particular situation, but the scale of a cloud provider can often assist in meeting the complexity of security and privacy compliance.

Although vendors and experts differ on precisely what is covered by each category, 21 there are three commonly used categories of cloud computing.
Software as a Service (SaaS). SaaS uses the internet to deliver applications, which are managed by a third-party vendor, to its users. Many SaaS applications run directly through a web browser, which means they do not require any downloads or installations on the client side.

Platform as a Service (PaaS). PaaS provides cloud components to certain software while being used mainly for applications. PaaS delivers a framework for developers to build upon and use to create customized applications.

Infrastructure as a Service (IaaS). IaaS is self-service for the user, for accessing and monitoring computers, networking, storage, and other services. Although the infrastructure is provided by the third party, such as physical control over data storage, the company buying the service retains complete control over what is done with the infrastructure, including the management of databases of personal information.

3.3.3 Edge computing

Edge computing is a distributed IT architecture where data is processed at the periphery of the network, often as close to the originating source as possible. 22 One driver for edge computing is the Internet of Things, discussed in Section 3.4. The IoT means that the number of Internet-connected sensors is growing exponentially, raising the costs of having all data processed through a centralized data center. Processing data close to the origin reduces the cost of connecting to a centralized data center. By keeping data closer to the edge, it also reduces latency, or the delay in communicating over a network.

3.3.4 How Emails and Texts Work

Privacy professionals should be aware of key technologies used for emails and texts. Here we highlight the various protocols used to implement these communications.
For sending emails, the most common protocol is Simple Mail Transfer Protocol (SMTP). 23 For receiving emails, Internet Message Access Protocol (IMAP) has become more common over time, with Post Office Protocol (POP, or POP3 when referring to the latest version) becoming less used. IMAP typically leaves messages on the server, enabling synchronization of multiple devices, while POP typically deletes the mail from the server once it has been downloaded. 24 The key difference is that IMAP allows management of multiple mailboxes and executing search commands on the server. Commercial email services such as Gmail and Outlook support all of these protocols, but with different degrees of fidelity. Privacy professionals should be aware of the CAN-SPAM Act, discussed in Chapter 11, which requires commercial email marketing to honor customer opt-outs.

Text messages became pervasive using SMS – the short message service that uses the Short Message Peer-to-Peer protocol. 25 SMS is “short” because the maximum size of a text message is 160 characters – longer messages need to be split into multiple short messages. SMS can operate through cell service, without the need for an Internet connection. 26 More recently, many text messages have shifted to what are called over-the-top (OTT) services, which are services that stream content using the Internet. 27 (The service is delivered “over the top” of another service, such as cell or cable service.) Many users now use OTT messaging services, such as Apple’s iMessage, Signal, Telegram, or WhatsApp. These OTT services do not have the limitations of SMS and can provide for end-to-end encryption.
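
The 160-character limit mentioned above can be illustrated with a few lines of Python. This sketch makes simplifying assumptions: the function name `split_sms` is hypothetical, every character is treated as taking one unit of space, and each piece gets the full 160 characters. In practice, multi-part messages reserve some room in each part for reassembly information, and some characters take more space, so real segments are usually shorter.

```python
def split_sms(text, limit=160):
    """Cut a long message into consecutive pieces of at most `limit` characters."""
    return [text[i:i + limit] for i in range(0, len(text), limit)]


message = "x" * 350           # a 350-character message
parts = split_sms(message)    # pieces of 160, 160, and 30 characters
```

An OTT messaging service would not need this step, because it sends the message over the Internet rather than through the SMS channel.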
Privacy professionals should be aware of laws such as the Telephone Consumer Protection Act, discussed in Chapter 11, which regulates how businesses can use text messages for marketing purposes.

3.4 Digital Surveillance and Tracking

The smartphone in your pocket likely has greater processing power than the world’s largest mainframe computers in the 1970s. In computer science, the rapid progress in computing is often symbolized by “Moore’s Law,” the observation first made in 1965 by Intel co-founder Gordon Moore, and revised by him in 1975, that the number of transistors on an integrated circuit doubles roughly every two years. 28 More generally, it is roughly correct to say that the power and speed of computers have doubled every two years going back multiple decades.

As computers improve, they can handle far larger databases of personal information, process the data more quickly in unprecedented ways, and thus pose new threats to personal privacy. With faster and cheaper computers, it becomes economical to place new Internet-connected sensors into more and more devices, a phenomenon known as the Internet of Things. Your smartphone now takes incredibly detailed pictures measured in megapixels, while many homes and businesses increasingly create and store video footage in quantity and quality that would have been impossible or unaffordable in the recent past.

These advances in computing technology thus create the possibility, and often the reality, of new types of digital surveillance and tracking. The discussion here first explains key aspects of tracking over the Internet, including for online advertising purposes. It then turns to the variety of sensors that increasingly collect personal information. The discussion here relies in many places on the chapter on “Tracking and Surveillance” by Cranor, Ur, and Sleeper in the IAPP text “An Introduction to Privacy for Technology Professionals.”

3.4.1 Internet monitoring

The Internet provides many opportunities for tracking and surveillance.
These include protective measures, such as detecting malicious software, and criminal measures, such as stealing passwords and account information. This section describes deep packet inspection, Wi-Fi and other wireless eavesdropping, internet monitoring by those in control of systems, such as employers, and spyware and phishing attacks. Each section explains the techniques for monitoring as well as measures that can address privacy concerns.

3.4.1.1 Deep packet inspection

As discussed in Section 3.2 above, communications from sender to receiver on the Internet are split into packets. Only the header of the packet is needed to route the packet to the correct IP address. Those administering a node on the Internet, however, can also examine some or all of the packet for a variety of purposes. When a node looks at this additional data, it is called deep packet inspection.

Deep packet inspection can serve useful purposes. For instance, examining packets before they enter a company network can help determine whether the packets contain known viruses or other malicious content. Similarly, examining packets before they leave a network can help detect and prevent data leaks of sensitive information.

On the other hand, deep packet inspection can also be used for purposes that raise serious privacy issues. A company might use this inspection to track all of a user’s online behavior, such as to target ads. A government might use it to censor or track citizens’ online behaviors, as has occurred in China as part of the “Great Firewall.”

In the early years of the Internet, deep packet inspection was technically possible for most communications.
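
The difference between routing on the header and inspecting the payload can be sketched in Python. This is a toy model under stated assumptions: the `Packet` class, the `route` and `inspect` functions, the addresses, and the pattern for a Social Security number-like string are all invented for illustration and do not describe any particular inspection product.

```python
import re
from dataclasses import dataclass


@dataclass
class Packet:
    src: str        # header field: sender address
    dst: str        # header field: destination address
    payload: bytes  # the content being carried


def route(packet):
    """Ordinary routing reads only the header to find the destination."""
    return packet.dst


# Hypothetical rule: flag payloads containing something shaped like 123-45-6789.
SSN_LIKE = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")


def inspect(packet):
    """Deep packet inspection also reads the payload and checks it against a rule."""
    return bool(SSN_LIKE.search(packet.payload))


outbound = Packet("10.0.0.5", "198.51.100.20", b"name=Pat&ssn=123-45-6789")
harmless = Packet("10.0.0.5", "198.51.100.20", b"q=weather+today")
flagged = [p for p in (outbound, harmless) if inspect(p)]
```

The same payload-reading step that lets a company catch an outbound data leak is what allows a network operator to observe the content of a user’s communications.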
When communications are effectively encrypted, however, deep packet inspection can no longer see the contents of the communication. Major email providers shifted to encrypted emails after the Snowden disclosures in 2013. The email content is protected from deep packet inspection as it moves over the network, but it can still be reviewed by the email provider on the mail server, a capability that providers use for spam filtering. For web browsing, recent years have seen rapid growth in the use of HTTPS, the encrypted version of the HTTP protocol. Going forward, deep packet inspection will only operate on the diminished fraction of Internet traffic that fails to use encryption, or where encryption is broken such as by use of stolen keys.

3.4.1.2 Wireless eavesdropping

In the absence of effective encryption, it is possible to eavesdrop on data sent through a wireless network, including Wi-Fi networks. Packet-sniffing systems can capture packets sent over such networks. This risk is often present in Wi-Fi hotspots in public places, such as hotels or coffee shops, where many users share a network that is either unprotected or protected with a password known to a large group of users.

Several defenses can address this risk. Using any password to access a modern encrypted Wi-Fi network forces the network to negotiate a per-user encryption key, which makes it nearly impossible for an eavesdropper to gain access to the contents of wireless communications. The communications can still be monitored by the wireless router, however. Virtual private networks encrypt communications from a user’s device to the VPN server, such as where an employer deploys a VPN for its employees, providing a much higher degree of protection against not just the wireless access point, but also the local ISP. This is especially important when traveling internationally.
In addition, regardless of the security of the network itself, encrypting web requests using HTTPS can prevent eavesdroppers from intercepting passwords and other sensitive information, although a system can be misconfigured so that sensitive information leaks out even when HTTPS is being used.

3.4.1.3 Internet Monitoring by Employers, Schools, and Parents
