
INTRODUCTION TO COMPUTER SCIENCE DLBCSICS01 INTRODUCTION TO COMPUTER SCIENCE MASTHEAD Publisher: IU Internationale Hochschule GmbH IU International University of Applied Sciences Juri-Gagarin-Ring 152 D-99084 Erfurt Mailing address: Albert-Proeller-Straße 15-19 D-86675 Buchdorf [email protected] www.iu.de DLBCSICS01 Version No.: 001-2023-1016 Stephen Weese © 2023 IU Internationale Hochschule GmbH This course book is protected by copyright. All rights reserved. This course book may not be reproduced and/or electronically edited, duplicated, or dis- tributed in any kind of form without written permission by the IU Internationale Hoch- schule GmbH (hereinafter referred to as IU). The authors/publishers have identified the authors and sources of all graphics to the best of their abilities. However, if any erroneous information has been provided, please notify us accordingly. 2 TABLE OF CONTENTS INTRODUCTION TO COMPUTER SCIENCE Introduction Signposts Throughout the Course Book............................................. 6 Basic Reading.................................................................... 7 Further Reading.................................................................. 8 Learning Objectives.............................................................. 10 Unit 1 Basic Concepts of Data Processing 11 1.1 Data, Information, and Messages.............................................. 12 1.2 Software, Firmware, and Hardware............................................ 18 1.3 Languages, Syntax, and Semantics............................................ 21 1.4 Historical Overview of Computers............................................. 23 Unit 2 Information Representation 29 2.1 Number Representation: Formats............................................. 31 2.2 Representation of Non-Numerical Information.................................. 36 2.3 Data Types.................................................................. 40 2.4 Redundancy and Error Tolerance.............................................. 43 Unit 3 Algorithms and Data Structures 49 3.1 Algorithms and Flowcharts................................................... 51 3.2 Simple Data Structures....................................................... 53 3.3 Searching and Sorting........................................................ 60 3.4 Quality of Algorithms......................................................... 65 Unit 4 Propositional Logic, Boolean Algebra and Circuit Design 71 4.1 Propositions and Logical Conclusions.......................................... 73 4.2 Conjunctive and Disjunctive Normal Form...................................... 77 4.3 Digital Circuit Design......................................................... 80 3 Unit 5 Hardware and Computer Architectures 87 5.1 Computer Types and their Architecture........................................ 90 5.2 Processors and Memory...................................................... 92 5.3 Input and Output............................................................ 95 5.4 Interfaces and Drivers........................................................ 96 5.5 High-Performance Computing............................................... 100 Unit 6 Networks and the Internet 105 6.1 Wired and Wireless Networks and Topologies.................................. 106 6.2 The OSI Model and TCP/IP................................................... 114 6.3 Internet Structure and Services.............................................. 
119 6.4 The Internet of Things....................................................... 124 Unit 7 Software 129 7.1 BIOS and Operating Systems................................................. 130 7.2 Application Software and Information Systems................................ 135 7.3 Applications................................................................ 137 7.4 Embedded Systems......................................................... 141 7.5 Software Development...................................................... 145 Unit 8 Computer Science as a Discipline 151 8.1 The Role and Subdisciplines of Computer Science............................. 152 8.2 Artificial Intelligence, Data Science, and Computer Science..................... 155 8.3 Ethical Aspects of Computer Science......................................... 158 8.4 The ACM Code of Ethics and Professional Conduct............................. 160 Appendix List of References............................................................... 166 List of Tables and Figures........................................................ 171 4 INTRODUCTION WELCOME SIGNPOSTS THROUGHOUT THE COURSE BOOK This course book contains the core content for this course. Additional learning materials can be found on the learning platform, but this course book should form the basis for your learning. The content of this course book is divided into units, which are divided further into sec- tions. Each section contains only one new key concept to allow you to quickly and effi- ciently add new learning material to your existing knowledge. At the end of each section of the digital course book, you will find self-check questions. These questions are designed to help you check whether you have understood the con- cepts in each section. For all modules with a final exam, you must complete the knowledge tests on the learning platform. You will pass the knowledge test for each unit when you answer at least 80% of the questions correctly. When you have passed the knowledge tests for all the units, the course is considered fin- ished and you will be able to register for the final assessment. Please ensure that you com- plete the evaluation prior to registering for the assessment. Good luck! 6 BASIC READING Dale, N., & Lewis, J. (2020). Computer science illuminated (7th ed.). Jones & Bartlett Learn- ing. http://search.ebscohost.com.pxz.iubh.de:8080/login.aspx?direct=true&db=cat051 14a&AN=ihb.48072&site=eds-live&scope=site Downey, A. B., & Mayfield, C. (2019). Think Java: how to think like a computer scientist (Sec- ond edition). O’Reilly. http://search.ebscohost.com.pxz.iubh.de:8080/login.aspx?direc t=true&db=cat05114a&AN=ihb.48073&site=eds-live&scope=site Filho, W. F. (2020). Computer science distilled: Learn the art of solving computational prob- lems. Code Energy LLC. (Available on the Internet) 7 FURTHER READING UNIT 1 Balakrishnan, H., Terman, C., & Verghese, G. (2012). Bits, signals, and packets: An introduc- tion to digital communications and networks [Course notes]. MIT Open Courseware. htt p://search.ebscohost.com.pxz.iubh.de:8080/login.aspx?direct=true&db=edsbas&AN=e dsbas.F2B3E5D&site=eds-live&scope=site Kapko, M. (2015, October 7). History of Apple and Microsoft: 4 decades of peaks and valleys. CIO. (Available on the Internet) Rojas, R., & Hashagen, U. (2000). The first computers: History and architectures. MIT Press. h ttp://search.ebscohost.com.pxz.iubh.de:8080/login.aspx?direct=true&db=cat05114a& AN=ihb.47516&site=eds-live&scope=site UNIT 2 Henderson, H. 
(2017). Data types. In Encyclopedia of computer science and technology. Facts On File. (Available on the Internet) UNIT 3 Henderson, H. (2017). Data structures. In Encyclopedia of computer science and technology. Facts On File. (Available on the Internet) Mehlhorn, K., Sanders, P., Dietzfelbinger, M., & Dementiev, R. (2019). Sequential and paral- lel algorithms and data structures: The basic toolbox. Springer. http://search.ebscohost.com.pxz.iubh.de:8080/login.aspx?direct=true&db=edshbz&AN=edshbz.DE.605.HBZ01.036392268&site=eds-live&scope=site Nakov, S. (2013). Fundamentals of computer programming with C#: The Bulgarian C# pro- gramming book. Faber. Chapter 19 (Available on the Internet) UNIT 4 Harris, D. M., & Harris, S. L. (2007). Digital design and computer architecture: From gates to processors. Elsevier. http://search.ebscohost.com.pxz.iubh.de:8080/login.aspx?direct= true&db=cat05114a&AN=ihb.47513&site=eds-live&scope=site Rautenberg, W. (2010). A concise introduction to mathematical logic (3rd ed.). Springer. http ://search.ebscohost.com.pxz.iubh.de:8080/login.aspx?direct=true&db=cat05114a&AN =ihb.47515&site=eds-live&scope=site 8 Read, C. (1909). Logic: Deductive and inductive. Alexander Moring Limited. http://search.eb scohost.com.pxz.iubh.de:8080/login.aspx?direct=true&db=edsbas&AN=edsbas.2105A AAC&site=eds-live&scope=site UNIT 5 Henderson, H. (2009). Neumann, John von (b. 1903—d. 1957). In Encyclopedia of computer science and technology. Facts On File. (Available on the Internet) Upton, E., Duntemann, J., Roberts, R., Mamtora, T., & Everard, B. (2016). Learning computer architecture with Raspberry Pi. Wiley. http://search.ebscohost.com.pxz.iubh.de:8080/l ogin.aspx?direct=true&db=cat05114a&AN=ihb.47518&site=eds-live&scope=site Viswanath, D. (2017). Scientific programming and computer architecture. The MIT Press. htt p://search.ebscohost.com.pxz.iubh.de:8080/login.aspx?direct=true&db=edsors&AN=e dsors.e7a2504f.08b9.4c44.b9f9.5cc42096d21c&site=eds-live&scope=site UNIT 6 Bonaventure, O. (2012). Computer Networking : Principles, Protocols and Practice. http://s earch.ebscohost.com.pxz.iubh.de:8080/login.aspx?direct=true&db=edsotl&AN=edsotl.OTLid0000352&site=eds-live&scope=site Kurose, J. F., & Ross, K. W. (2017). Computer networking : a top-down approach (seventh edition, global edition). Pearson. http://search.ebscohost.com.pxz.iubh.de:8080/login.aspx?direct=true&db=cat05114a&AN=ihb.47514&site=eds-live&scope=site UNIT 7 Arpaci-Dusseau, R. H., & Arpaci-Dusseau, A. C. (2016). Operating Systems: Three Easy Pieces. (Available on the Internet) Tanenbaum, A. S., & Bos, H. (2015). Modern operating systems (4th ed.). Pearson. http://sea rch.ebscohost.com.pxz.iubh.de:8080/login.aspx?direct=true&db=cat05114a&AN=ihb.4 7517&site=eds-live&scope=site Salihun, D. M. (2007). BIOS disassembly ninjutsu uncovered (1st ed.). Unpublished manu- script. (Available on the Internet) UNIT 8 North, M. (2012). Data mining for the masses. Global Text Project. (Available on the Inter- net) 9 LEARNING OBJECTIVES Computers were initially created to assist humanity in the calculation of data. The idea was simple: computers will perform these tasks with greater certainty and speed than a human being. A computer does not have a human being’s propensity to become tired or bored. It does not make simple mathematical errors, no matter how many times it exe- cutes the same equation—that is, as long as the computer is programmed and designed correctly. 
You will begin your Introduction to Computer Science by looking at the most basic con- cept of computing: data. Interpreting numerical data as an abstract human concept is the goal of man-to-machine cooperation. Data are processed by hardware, firmware, and soft- ware working together. To create a computing machine, it is not only essential to assemble the parts, but also to program them. This programming involves a complex language that was created to be understood by humans and machines alike—you will be introduced to the concepts of programming languages, data types, syntax, and semantics. From hardware to advanced software, computer logic and structure permeates the essence of computer science. You will encounter algorithms, data structure, circuit design, propositional logic, and basic computer architecture. The invention of networks revolu- tionized computing, and we will be examining these concepts along with TCP/IP, fault tol- erance, and network topologies. This overview of data, hardware, software, programming, logic, and networking will equip you, the student, with a foundational knowledge of the vocabulary and concepts neces- sary for studying the discipline of computer science. 10 UNIT 1 BASIC CONCEPTS OF DATA PROCESSING STUDY GOALS On completion of this unit, you will have learned … – about the concepts of data, information, and computer messaging. – the difference between hardware, firmware, and software. – the basics of binary and data interpretation. – what syntax and semantics refer to in programming languages. – the history of computers and data processing. 1. BASIC CONCEPTS OF DATA PROCESSING Introduction In 1977, a high school student named Richard Garriott took a summer course in computer programming at the University of Oklahoma. An avowed fan of Tolkien’s Lord of the Rings, he became enamored of a pencil and paper game called Dungeons & Dragons (D&D), which is set in a similar fantasy world. Not long after, Garriott started taking the rules of the game and programming them into a computer. The game of D&D is a nearly perfect match for the computer. It involves ability scores, numbers that are assigned to different human attributes such as strength, intelligence, and dexterity (data). These numbers are then used to calculate modifiers to determine chances of success (probability) to perform imaginary actions in the game, such as hitting an orc with a sword, deciphering ancient mystical ruins, or climbing up the side of a steep tower. As a player attempts to take certain actions, dice are rolled, with these rolls—modi- fied by said attributes—determining success or failure. Simple computer equations and algorithms could easily simulate dice rolling (randomization) and data calculation. Garriott created his own fantasy world based on the concepts of D&D and completed the game Alakebeth: World of Doom while still in high school. He then got a job working at a software store where the owner proposed to sell Richard’s game, saying that it was actually better than the games in the store. Eventually, the game sold 30,000 copies. With Garriott’s $5 per game share, he earned $150,000, which was more than his father, a NASA astronaut, earned in a year at that time (Bebergal, 2020). Any type of programming involves taking human concepts and ideas and boiling them down to numbers, logic, rules, and, finally, output that can be interpreted by a human user. To begin our study of Computer Science, we are going to take a look at the core con- cept of data. 
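To make the idea of simulating dice rolls concrete, here is a minimal Java sketch of such a calculation. It is written for this summary rather than taken from Garriott's game, and the modifier and difficulty values are invented purely for illustration.

import java.util.Random;

public class DiceRoll {
    public static void main(String[] args) {
        Random rng = new Random();
        // Simulate a 20-sided die roll (1 to 20), as used in D&D-style games.
        int roll = rng.nextInt(20) + 1;
        int strengthModifier = 2; // an ability-score bonus, chosen here just for illustration
        // The modified roll decides success or failure against an assumed difficulty of 15.
        boolean success = roll + strengthModifier >= 15;
        System.out.println("Rolled " + roll + " + " + strengthModifier
                + (success ? ": the sword hits the orc!" : ": the swing misses."));
    }
}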
1.1 Data, Information, and Messages

Data consists of raw numbers and symbols that can be arranged into meaningful information, which is then used for specific research or analysis purposes. To a computer, data are numerical, and the basic data storage system is the binary numbering system (a base 2 numbering system consisting of only ones and zeroes). While human beings are used to using many symbols to represent information, including letters of the alphabet and decimal numbers, computers are restricted to representing every single kind of information imaginable as ones and zeroes. If you asked a computer to remember how you liked your coffee, it would store this information as something like this: 01011100. To you, this would be meaningless, but this combination of zeroes and ones could easily represent the sugar, cream, size, and temperature of the first drink you have in the morning. Your favorite song, to a computer, would be a longer string of ones and zeroes, as would a picture of your grandma, your résumé, and that cat video you love so much.

The reason for this is simple: when computers were first designed nearly a century ago, the only way to store different states that represent information was to turn switches ON or OFF, which corresponded to the numerical values 1 and 0. In fact, this is still represented today in most digital devices: the power button is a zero intersected by a one.

Figure 1: Modern Power Switch (Source: Vincentg, 2007)

If we put a series of switches together, we can represent more combinations and therefore more information. For instance, placing eight binary digits together gives us a way to represent 256 different combinations; these 8 digits together are called a byte. A byte could then be used to represent a letter of the alphabet, a specific color, or different application preferences. The type of information represented to a computer depends on the context.

Context is Everything

Without context, even the letters you are reading right now are meaningless. However, once the rules of the English language are applied, these letters form meaning. We can use the same alphabet to represent other information in other languages or codes. Without context, data are completely meaningless, especially binary data, which are merely a long progression of zeroes and ones.

As mentioned earlier, binary can represent letters of the alphabet. Using a byte for each letter, the English alphabet can be encoded into a data file. Fortunately, we don't have to create our own encoding rules for this—someone already did that years ago when they formed ASCII, the American Standard Code for Information Interchange. For example, the letter "A" in ASCII is represented as 01000001. Since a capital "A" is distinct from a lowercase "a", there is a separate code for it: 01100001. This way, a computer can represent all of the letters of the English alphabet along with various punctuation symbols. Using this context, we can now communicate with letters.

You may have realized at this point that some data translation must take place. Humans do not input data in binary (at least, they probably would not find it very efficient), so computers must translate to binary to store the data—and then translate them back to show them to humans again! This is a good introductory concept of how computers work: they are constantly translating back and forth between human context and binary.

Figure 2: Full ASCII Table (Source: ZZT32, 2007)
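These ASCII codes can be inspected directly in Java, the programming language used for the code examples later in this course book. The following minimal sketch (written for this summary, not taken from the course book) prints the numeric codes and bit patterns behind the characters 'A' and 'a'.

public class AsciiDemo {
    public static void main(String[] args) {
        char upper = 'A';
        char lower = 'a';
        // Casting a char to int reveals the numeric code the computer stores for it.
        System.out.println((int) upper); // 65, which is 01000001 in binary
        System.out.println((int) lower); // 97, which is 01100001 in binary
        // Integer.toBinaryString shows the raw bit pattern (leading zeroes are omitted).
        System.out.println(Integer.toBinaryString(upper)); // 1000001
        System.out.println(Integer.toBinaryString(lower)); // 1100001
    }
}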
Another form that data can take is a graphical image. In the 1950s, Russell Kirsch invented the pixel (Ehrenberg, 2010): a single square which is assigned a color and used, together with many others, to assemble a larger image. He decided that the simplest way to divide up the data in a photo or image was to demarcate it into discrete squares, and that is what we have used to this day. It was a simple solution and worked well enough that humans could recognize photographs on a computer screen.

Figure 3: Pixels Used in a Photo (Source: Kissel, 2012)

The concept behind pixel images is to break any image up into small squares, so that each square has one (and only one) color. If the squares are small enough, the human eye cannot distinguish them individually, and so our brain sees a photographic image. Pixels are used in all types of images including fonts, icons, and graphics, along with photos.

How are these data stored? The basic idea is that each pixel has a sequence; the image could start at the top left for the first pixel, go across the first row in order, and then jump down to the next row of pixels. Since the order is implicit, there is no need to number the pixels. Each pixel has a color as its basic data, so we must represent each separate color as a binary number. If we use eight bits (a byte), then there are 256 different colors we can represent uniquely. This is called 8-bit color. In earlier computers, the VGA graphics system used 8-bit color schemes where the first three bits represented red, the next three green, and the last two blue. This is where the term RGB comes from for color displays—these three colors of light, in combination, can create any color visible to the human eye. However, with only 8 bits, there are only 256 possible color combinations to represent out of the potentially infinite grades of colorization (a short code sketch at the end of this passage illustrates this 3-3-2 bit layout).

Figure 4: Digital Representation of Color (Source: Stephen Weese, 2020)

If we combine the most intense red and the most intense green with no blue, we will get a bright yellow color.

Music can also be stored digitally (which simply means "as digits") in binary. In the same way that we chop an image into little squares (pixels), we can cut a sound or song into tiny slices of time and then string them back together. If the slices are small enough, our ear cannot tell the difference, in much the same way that the eye cannot see very tiny pixels.

A document in Microsoft Word, for instance, is also stored completely as ones and zeroes. The context for these data includes binary codes that stand for formatting, such as bold and italic, others that stand for page margins, font color, and other settings, and, of course, binary code that stands for letters, numbers, and punctuation. MS Word, in a sense, has invented its own code for storing data in a .docx file. When you open a Word document, the binary gets translated back into what you view on the screen so you can begin to read and edit it.

As a programmer, your job will be to decide how to encode and store data and to give these data meaning and context so that both a human user and the computer system can understand them.

Data Messaging

Computers use different hardware components to perform tasks and must communicate using binary messages. Computer components have a special list of commands that indicate information and instructions to each other; these are sent primarily through the motherboard or through data cabling.
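Here is the promised sketch of the 3-3-2 bit layout described above. It is a minimal Java illustration written for this summary; the pack method and the example values are invented and are not part of the course book.

public class Rgb332 {
    // Pack a 3-bit red, 3-bit green, and 2-bit blue intensity into one 8-bit value,
    // following the layout described above (red in the highest bits, blue in the lowest).
    static int pack(int red3, int green3, int blue2) {
        return (red3 << 5) | (green3 << 2) | blue2;
    }

    public static void main(String[] args) {
        // Most intense red (7) and green (7) with no blue gives a bright yellow, as noted above.
        int yellow = pack(7, 7, 0);
        System.out.println(Integer.toBinaryString(yellow)); // 11111100
    }
}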
Computers also send messages to each other through networking. Now that we have seen how to store letters, images, music, and documents as binary data, we will examine the process of sending these data over a network. The actual process of sending data is complex—there are many stages that include establishing and verifying a connection, agreeing on the mode of communication between computers, preparing the data to be sent, and then, finally, sending it and performing error checking to make sure it has arrived properly.

For our purposes, let's say you're going to send a picture of a cat over the internet to your grandma. The cat picture, of course, is just zeroes and ones. But we have to make sure they get to grandma's computer. Here is a simplified demonstration of that process.

Figure 5: Sending a Cat Photo (Source: Stephen Weese, 2020)

You see the photo of the cat on your screen and save it to a file. Then you send it to your grandma, say, through email. But to send it to her, you have to have two addresses; both TO and FROM fields need to exist for delivery. Once you fill this in, everything is converted to binary: To, From, and all the data of the cat picture. It is sent out over the network to Grandma's computer, which then translates it back so she can see that she has received a cat photo from you. Once the photo data are received, the To and From data are stripped away, as they are no longer needed. Grandma opens up the file, and the computer translates the binary data into a viewable photo once again.

1.2 Software, Firmware, and Hardware

Three types of components work together to enable a computer to function. The most visible component is the hardware; it consists of all the physical components of a device. For our purposes in this unit, we will examine the modern personal computer. Other types of computers also use a similar hardware architecture.

Hardware

The Central Processing Unit (CPU) is often referred to as the "brain" of the computer. It does all the processing, which includes doing calculations and sending instructions to other hardware. It is a square computer chip which is inserted into a motherboard—a circuit board designed for computer components to connect to and communicate through.

Computers need memory storage. A personal computer needs both long-term and short-term memory storage. One analogy to use is actually human consciousness. We process with our brain, but we also store things in our memory. We have short-term and long-term memory.

Figure 6: Analogy: Computer to Human Thinking (Source: Stephen Weese, 2020)

We all have short-term memory; it contains the things you are thinking about right now. How many things can you think of at once? It may seem like a lot, but let's compare it to your long-term memory. Your long-term memory stores everything that you know. You can't possibly think of all of those things at once; they won't fit into your short-term memory. A computer works the same way. Everything that the computer knows (data) is stored on the hard drive. Whatever you are currently working on is moved into RAM (Random Access Memory, meaning the memory can be read at any point in the data), the short-term memory, while it is being used.

Let's say you are working on your résumé. At that moment, the computer is using an application such as MS Word to edit it. While you're working on the document, the data in your résumé as well as the Word application are taking up RAM (short-term memory).
Once you save the document, a copy is taken from the RAM and placed on the hard drive, which is long-term storage (this process of converting data from short-term memory to a file stored in long-term memory is called saving). When you're done working, you close Word, which means RAM is freed up to perform the next task. Since you've saved the file to long-term storage, you know it will be there when you want it again. Just like in real life, when you finish the résumé you stop thinking about it and start thinking about the next thing you are going to do. Your short-term memory clears out space, and you think about the current task. Most modern computer systems follow this model.

You might wonder what the need for RAM is when the hard drive works perfectly well as a storage medium. The short answer is that RAM is a lot faster than the hard drive in terms of memory access speed and reading and writing data. It serves as a middle stage between the CPU and the hard drive to provide data quickly for calculations and other uses.

Another comparison is a library. The hard drive is the entire library, whereas the RAM is your desk. It has a smaller data size than the hard drive. You may want to look at one book, or a few at a time, but only so many will fit on your desk. Besides, you can only really look at a few books at a time due to the way your brain (CPU) is structured. When you're done with the current books, you put them back into the library, freeing up the desk (RAM space) so you can put some different books on it.

Firmware

Before we discuss firmware, we must briefly explain software. Software is data; it is represented in binary. The difference between normal data and software is that software consists of instructions to the computer—in other words, it is a specialized type of data. Some software is optional or can vary, but other software is required by the hardware for essential functionality. This type of software is known as firmware. When you first power on a computer, it uses special firmware chips to boot up.

Boot up process

RAM is volatile—this means that it only preserves data while it has electrical power. This is why, in years past (before laptops with batteries became more popular), if you lost power while working on a computer project, it was gone unless you had saved it to the hard drive. When a computer first starts up (the boot process, meaning the startup sequence of a computer; the term comes from the phrase "pull yourself up by your bootstraps"), the RAM is empty and the computer is waiting for instructions on what to do. It finds those instructions (software) in a special chip called a BIOS, which stands for Basic Input Output System (Tarnoff, 2007). It is also referred to as ROM, or Read Only Memory. This means it is static and cannot normally be written to (there is a special process to update these chips). This is how firmware is designed to be: stable and not easily changed. These instructions are what the computer follows when it first starts up. They are always the same, so they are programmed into a stable chip that retains the data even without electrical power (a non-volatile chip).

A modern computer first runs the instructions on the BIOS chip for booting up, which includes identifying all of the hardware attached to the machine (hard drives, amount of RAM, graphics card, etc.). Once that is done, it finds the operating system (OS) on the hard drive or other long-term storage device and begins the process of copying it into RAM.
That is what you are waiting for when a computer is "booting" up. You stare at the logo for your computer until this process is finished and you can interact with it.

Firmware describes computer chips that hold instructional software that remains the same. They are used either for booting up a computer or in electronic devices that always perform the same task, such as your smart refrigerator or a child's toy. A single-task device does not need RAM to run different applications. Advanced personal computers can perform many tasks and need the RAM and extra hard drive storage to do so.

Software

Software, as mentioned above, is data that specifically tells the computer to do something—in other words, instructions. (Remember, all computer data are binary.) There are two main types of software: operating systems and applications. Both types of software are updatable, and applications are optional; this means neither is stored on permanent ROM chips, but on long-term storage such as hard drives that can be written to.

Operating systems

An operating system (OS) is the first software loaded into the RAM during the boot process. An OS is essential for a computer to function (assuming a standard personal computer). Examples include Windows, Linux, and MacOS. This software provides three fundamental functions for the computer:

1. Provide an interface
2. Control hardware
3. Run applications (apps)

The interface is how a human talks to the computer; most modern interfaces include a mouse pointer and icons on a desktop. An interface is required so a human being can interact with a computer. The hardware in a computer system would sit, dormant, without the OS to give it instructions. Applications need an operating system to run. They are a second layer of functionality. This is why, when you get an application, you must get the application for Windows or for MacOS; applications are designed to work with a specific set of instructions that are part of the unique OS.

Applications

Applications (apps) are used to perform specific tasks. If you want to write a document, you load an app. If you want to play a game, listen to a song, edit photos, or create music—all these are tasks performed on a computer that rely on apps. You open the app (loading it into RAM) and use it for your task. When you're done, you close or quit the app. Apps provide specific instructions to the hardware to complete your task, which is why they are also considered software.

1.3 Languages, Syntax, and Semantics

Now that we understand that machines work on a binary level, the next step is to find a way for humans to communicate with a computer. We've learned that an OS and applications provide instructions so people can use the computer, but where does this software come from? The answer is: programmers. Programmers (or coders) have to use a special language (or languages) that both humans and computers can understand. To this end, programming languages were invented. There are many languages, but they all perform a similar function: they are words and symbols that translate directly to binary commands for the computer. Fortunately, one simple word in English can stand for a long series of binary commands, making it much easier for the programmer to specify what they want to do (Eck, 2009). One such language is Java.

Syntax

Besides coming up with words and symbols which will translate to computer instructions (or machine language, meaning a set of binary instructions that can be executed directly by a CPU), the correct ordering of the symbols and words must also be established.
Just like the words in an English sentence, programming instructions only make sense in the correct order. Otherwise, it is just a jumble. If you took the words of this sentence and put them in a random order, it would be ignoring proper syntax and make little sense. Syntax also includes rules on when certain symbols are allowed to be used, just like rules for English as to where punctuation can go in a sentence.

foodItem = JOptionPane.showInputDialog("Your choices are: \n1.Hamburger\n2.Cheeseburger\n3.Fish Burger\n4.Veggie Burger");

The above line of code in Java is an assignment statement—it takes a variable, foodItem, and stores in it the result of a dialog box that appears on the screen. The text to be displayed is in quotes; however, there is a special code, "\n", which means: insert an end of line here. To humans, the "\n" probably looks odd, but in Java, the computer knows it means it should go to the next line. Syntax is crucial in computer programming. If just one character or symbol is out of place, the code will not work. Computers are terrible at guessing—if you don't say exactly what you want according to the rules, they will just stop working.

Semantics

After you're sure you've followed all the rules, you have to make sure that you've got the correct meaning—the semantics. If I very clearly describe how to make a peanut butter sandwich, but actually wanted a chicken salad, then I have a semantics problem. It's possible to create perfectly readable code for a computer, yet have it do something different than your intention. Experienced coders learn many skills to prevent that kind of thing from happening; it is best if you are aware of this from the beginning, since it is difficult to unlearn bad programming habits. One such skill is adding comments to code. These are notes to yourself (and other coders) that describe what the code is doing in a certain section. Comments are especially helpful if you haven't looked at that section of the program for a day or longer, so you know what the code was intended to do instead of trying to figure it out all over again. Advanced programs consist of thousands of lines of code; without comments, they are very difficult for any coder to work with.

Speaking computerese

In many ways, a computer language is similar to human language. There are words and symbols that have meaning, and these elements have a specific set of rules about the order in which they can be used and when they can be used at all. Finally, following these rules, you must assemble a message that makes sense and that explains the concept you intend to convey. Just like with learning a new human language, you have to start with the basics, and it definitely takes time. The good thing is, once you have learned one computer language, the concepts and functionality are similar enough that most others are also easily learned.

At first, the semantics of programming languages may seem pedantic and difficult to understand. However, if we "think like a computer," it becomes easier. In English, we have several ways of conveying the same concept: for humans, this is artistic expression, but for a computer this is ambiguous and unnecessary. The more precise and exact you are with a computer, the better results you can expect.
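To tie these ideas together, here is the dialog example from above placed inside a complete program, with comments recording its intent. This is a sketch written for this summary, not code from the course book; the class name FoodChooser is invented for illustration.

import javax.swing.JOptionPane;

public class FoodChooser {
    public static void main(String[] args) {
        // Comments like this one are ignored by the computer; they record the
        // intended meaning (semantics) of the code for human readers.
        String foodItem = JOptionPane.showInputDialog(
                "Your choices are:\n1. Hamburger\n2. Cheeseburger\n3. Fish Burger\n4. Veggie Burger");
        // Leaving out the semicolon above or misspelling showInputDialog would
        // break the syntax rules, and the program would not compile at all.
        System.out.println("You chose: " + foodItem);
    }
}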
Look at the following sentence: “One day I ate a hot dog in my underwear.” You’d probably imagine someone sitting in their boxers eating a hot dog. But you might also imagine that the hot dog is wearing the underwear. The sentence is grammatically correct, but semantically it could be clearer. Computer code should be used precisely to get the proper results. 1.4 Historical Overview of Computers There’s no machine that we can point to and say “here is the first computer.” It was more of an evolution of advanced machines into something that eventually resembles the com- puters we have today. In 1801, Joseph Maria Jacquard invented a loom that used punch cards made of wood to create fabric designs automatically (Zimmermann, 2017). Techni- cally, this was programming—a pre-made set of instructions changed into a “machine lan- guage” that tells a machine what to do. However, this machine wasn’t doing any comput- ing, simply weaving cloth. Later, during World War II, machines were used to encode secret messages. These machines used gears that would align with symbols to create a coded text from regular or “plaintext” messages. Breaking those codes, however, required actual computing power. The 2014 film The Imitation Game tells the story of Alan Turing; his Turing machine was able to take in encrypted messages, process them, and output an answer. However, his machine had no keyboard or monitor, let alone a mouse. To look at the history of modern computing machines, we will break it down into four eras. Figure 7: Eras of Computing Source: Stephen Weese, 2020. 23 Behemoths These enormous computers were built with vacuum tubes and wires and took up entire rooms. The input was usually entered by flipping switches and turning dials. The output was often given via lights or punched holes on paper. They were basically number-crunch- ing machines that were used to perform large mathematical computations, or to work with large amounts of numerical data. The U.S. Census Bureau purchased one of these huge computers, the UNIVAC, in 1951 to help with counting the population of the United States (Fabry, 2016). The ENIAC (Electronic Numerical Integrator and Computer), the first general-purpose computer, which began operation in 1946 and was programmed by a team of six female operators, should also be mentioned in this context (Schwartz, 2019). Figure 8: ENIAC Computer 1945 Source: TexasDex, 2006. Business Eventually, these large behemoths got a little smaller and were able to be produced more quickly. They eventually became affordable for medium to large-sized businesses (instead of requiring an entire government’s GDP to purchase). 24 Figure 9: IBM System/360 Mainframe 1964 Source: Sandstein, 2011. The output of these machines was a teletype terminal, where text would come automati- cally out of a typewriter like device. Eventually, in the 1970’s, “dumb terminals” were cre- ated with monochrome CRT screens. Unlike the personal computers of today, these Monochrome screens with a keyboard did not have their own CPU or RAM, they were connected via an This means using only a single color. Early moni- electrical signal to the mainframe which did all of the calculations. Several “dumb termi- tors were monochrome; nals” could be connected to a mainframe at the same time, sharing the central computer’s green, orange, or white processing power. Universities also became major purchasers of these mainframes at this on a black background. Mainframe time. 
Many of these mainframes ran the Unix operating system, the predecessor of Linux. A large, powerful, central- It wasn’t until the 1980s that the idea of taking a computer home for personal use became ized computer, a main- widespread. Early home computers were text only—there were no graphics yet and no frame is what terminals connect to for processing need for a mouse, so only technical people and hobbyists owned them. requests. 25 Graphical User Interfaces The invention of Apple’s Macintosh computer in the 1980s started a personal computer revolution. Though there were previously computers with graphics capabilities (e.g., the Commodore 64 in 1982), the Macintosh included a mouse and a Graphical User Interface (GUI). Instead of having to type in precise and obscure text commands to use a computer, users could now point-and-click. This made them much more accessible and led to a new, booming market for home computers. They were not only used for business applications, home budgeting, and the like—they were also used for gaming. Richard Garriott’s Ultima II was released for the Macintosh in 1985. At first, the Mac’s display was only black and white, but as computers became more powerful, more colors and pixels were available. The main competitor to Macintosh was the PC, a computer originally made by IBM using an Intel CPU. Even though the Mac was technically also a personal computer, “PC” became synonymous with the Intel-based IBM system (and later other similar “clones”). In the 1990s, Microsoft released Windows, a GUI operating system for the PC. This was the begin- ning of the legendary “Mac versus PC” divide that still exists today. Modern Macs and PCs now both use Intel CPUs and have very similar hardware designs. Portable The next generation of personal computing was driven by display technology—the inven- tion and widespread use of flat screens. If you were born in the mid-1990s, you may never CRT have known a world filled with clunky CRT monitors giving off waves of heat. Once flat- A cathode ray tube (CRT) screen technology was refined, it enabled truly portable computers to be mass-produced. was a type of monitor or television that used a In the past, some enterprising computer manufacturers had attempted to create “porta- large tube to display ble” computers such as the Osborne, which weighed over 24 pounds and had a five-inch images. monochrome CRT built-in. Unsurprisingly, these did not become popular. The flat screens of the 1990s were used to make easy-to-carry laptops and also replaced the bulky CRT monitors used for desktops. As the technology developed, more pixels and colors were made to fit on ever-shrinking screens for tablets and smartphones. Today’s smartphone is still a computer—it has a CPU, as well as long-term and short-term storage. SUMMARY Data, in reference to computing, is stored in the binary numbering sys- tem. This base 2 system only allows for 0 and 1. Large strings of binary numbers, or bits, can be used to represent all different types of human data. Text, documents, photographs, music, and video data can all be transla- ted to binary; however, this means that there must be context for the digital information. The bits have to have meaning beyond just the val- ues of zero and one. 26 Once we have translated and stored ideas from the human world into a computer with binary, they must be translated back again so humans can understand them. That is the job of the computer. A basic computer consists of a brain, the CPU, and memory storage. 
Some storage is short-term, like RAM, and other types are long-term, like the hard drive. Software is binary instructions to the computer. It takes the form of either an operating system or an application. The OS provides basic functionality including an interface, while applications (apps) are availa- ble to perform specific tasks. To program a computer, languages had to be invented that were some- where between natural language and binary. These provide a way for humans to give instructions to a computer without having to write everything in binary. Programming languages have rules, including syn- tax, which dictate how to use the words and symbols of that language. Computers were designed to automate complex tasks and process data. Some of the earliest machines were used by governments to decode encrypted messages during World War II. Eventually, the size and price of computers was reduced, and businesses became able to purchase them. The invention of CRT monitors allowed people to have “personal computers” in their homes. The invention of the GUI made computer use easier, causing its popularity to explode. 27 UNIT 2 INFORMATION REPRESENTATION STUDY GOALS On completion of this unit, you will have learned … – about the binary numbering system and how to convert decimal to binary. – about data storage sizes, such as kilobyte, megabyte, and terabyte. – how graphics, documents, music, and other data are represented in memory. – about the different data types used in programming such as char, string, int, and float. – how computers perform error checking on stored and network data. 2. INFORMATION REPRESENTATION Introduction In 1936, kids from all over the United States would tune their radios in to the Little Orphan Annie radio show. They knew that sometime during the show, it would be time to break out their decoder pin—the one they ordered after sending in proof of purchase of the chocolate drink Ovaltine, a sponsor of the show. This pin had two concentric rings: the inside was a ring of letters and the outside a ring of numbers. The kids would listen for what letter to turn to the top of the inner ring, and then be given a sequence of numbers which were the secret code for that week. The kids eagerly wrote down these numbers and began the work of decoding the secret message, which was usually a preview of what would happen on the show next week. It was brilliant marketing as it not only encouraged the children to ask their parents to buy more Ovaltine, but also made kids look forward to tuning in the next week, keeping up the audience numbers. Figure 10: Ring Cipher Source: Stephen Weese, 2020. The children probably didn’t realize that they were using something called a ring cipher: a simple device used to encode and decode messages. 30 The Orphan Annie membership pin they used is an apt representation of how computers deal with non-digital data. The 26 letters of the English alphabet were converted into deci- mal numbers. You may notice that this conversion is not symmetric—there are only ten different symbols for numbers (0—9), not nearly enough to have one for each letter. There- fore, some of the letters would require two-digit numbers to symbolize them. Since all data in computerized machines are stored digitally (as numbers), conversion has to occur from all types of non-numeric data, not unlike the decoder pin. In the case of computers, the data are not necessarily a “secret” message, but they certainly are in code, a code that computers are fluent in: the binary numbering system. 
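The decoder-pin idea can be expressed in a few lines of Java. Since the radio show's actual letter-to-number assignments are not given here, the sketch below (written for this summary) simply numbers the letters A = 1 through Z = 26; it illustrates the principle rather than the historical cipher.

public class DecoderPin {
    public static void main(String[] args) {
        String message = "OVALTINE"; // assumed to contain only uppercase letters
        // Encode each letter as its position in the alphabet: A = 1 ... Z = 26.
        for (char letter : message.toCharArray()) {
            int code = letter - 'A' + 1;
            System.out.print(code + " "); // prints: 15 22 1 12 20 9 14 5
        }
        System.out.println();
        // Decoding simply reverses the mapping: 15 becomes O, 22 becomes V, and so on.
    }
}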
2.1 Number Representation: Formats

Human beings have ten fingers. This is a common reason given for why we use a base 10 numbering system: decimal. Decimal uses ten symbols, 0 through 9, to represent all possible numbers. Computers, however, don't have fingers. What early computers did have were switches. These could be turned on or off, giving two possibilities and a base 2 numbering system: binary. There are two symbols, 0 and 1, that are used in binary. Since humans and machines use different numbering systems, translation must occur.

Figure 11: Decimal Table (Source: Stephen Weese, 2020)

In decimal, the number 806,412 is easily decipherable to a human. We recognize that each successive digit to the left represents the next power of ten. In other words, 800,000 + 6,000 + 400 + 10 + 2 = 806,412. Most of us learned this very early in school. Each time a digit is added, there are ten times as many possible combinations, which is why we multiply by a power of ten. In binary, there are only two possibilities for each digit, which means adding a digit multiplies the number of possibilities by two.

Figure 12: Binary Table (Source: Stephen Weese, 2020)

This eight-digit binary number represents a numeric value to a computer. A human would have to translate 00101101 this way: 32 + 8 + 4 + 1 = 45. There is a 1 in the 32's place, the 8's place, the 4's place, and the 1's place; there are no 16s or 2s. This demonstrates how to translate binary into decimal. Since a byte (eight bits, or binary digits, grouped together, often used to measure data sizes) is a common unit of data in computing, it is useful to know that 2^8 is 256, meaning that 8 digits in binary can represent 256 combinations, or 0 to 255. Just as we can keep adding digits in decimal to get larger numbers, we can add more digits in binary to increase the value.

Figure 13: Binary to Decimal Examples (Source: Stephen Weese, 2020)

It is simple to convert from decimal to binary—the process is to keep dividing the decimal number by 2 while recording the remainder. This will reveal the binary number (in reverse order of digits). For instance, let's look at the number 97 in decimal and change it to binary.

Figure 14: Decimal to Binary Examples (Source: Stephen Weese, 2020)

Going in reverse order of the remainders, the binary result is 1100001 for a decimal value of 97. We can quickly check our math by going back to decimal:

0   1    1    0   0   0   0   1
0 + 64 + 32 + 0 + 0 + 0 + 0 + 1 = 97

Bytes and Data Measurement

Since a single bit only represents two possibilities, a larger grouping, the byte, is often used for measuring data. A byte can represent simple data like a letter from a certain alphabet, a color, or a musical note. Different types of data can take up varying amounts of space, which are measured in bytes or multiples of bytes.

Figure 15: Data Storage Quantification (Source: Stephen Weese, 2020)

For instance, a Word document might take up around 28 KB of storage, a JPG photograph about 3 MB, and a DVD movie around 4 GB. Most modern hard drives are capable of storing around 1 TB of data. Video, applications (especially video games), and operating systems take up the largest amounts of data storage. Notice that each naming convention above is 1,000 times the previous.

Octal

You may have observed that it takes quite a large number of digits to represent numbers in binary in comparison to decimal. A good method to abbreviate binary is to use the octal number system, which, as you may have surmised from the name, is a base 8 system.
This gives us eight possible values for each digit, 0 through 7. Each digit's place value is a power of 8, and since 8 is itself a power of 2, the mapping is clean—one digit of octal represents exactly three digits in binary.

Table 1: Octal Numbers (Source: Stephen Weese, 2020)

Binary   Octal   Decimal
001      1       1
010      2       2
011      3       3
100      4       4
101      5       5
110      6       6
111      7       7
1000     10      8
1001     11      9
1010     12      10

As you can see, though we are using some of the same symbols in all three systems, they mean different things. When we write "10" in decimal, to us humans it means ten. In octal, "10" means eight, and in binary, "10" means two. In fact, there is an old joke which illustrates this concept.

Figure 16: Binary Humor (Source: Stephen Weese, 2020)

Hexadecimal

Though octal may be useful in some situations, hexadecimal is the preferred intermediate between decimal and binary. One digit of hexadecimal (or "hex") can represent four digits of binary. Hexadecimal is base 16—that means there are sixteen possible symbols per digit. Obviously, we will run out of decimal numbers for symbols, so hex recruits letters from the alphabet.

Table 2: Hexadecimal Numbers (Source: Stephen Weese, 2020)

Binary   Hexadecimal   Decimal
1000     08            8
1001     09            9
1010     0A            10
1011     0B            11
1100     0C            12
1101     0D            13
1110     0E            14
1111     0F            15
10000    10            16
10001    11            17

In computer applications, hexadecimal numbers are usually written in groups of two, since two hex digits represent eight digits in binary (a byte). You will often see a leading zero before a hex number for this reason, as in the table above. Hex is often used to view raw data in a file; it is a sort of shorthand for binary. It represents 16 different possibilities with one digit, 0 to F: "A" represents decimal 10, "B" represents 11, and so on. Often, computer scientists may want to analyze the contents of a file. Viewing it in binary is very difficult and nearly meaningless for a human. "Hex editors" are often used that show a view of the content translated to hex, as are ASCII decoders (ASCII is a binary code scheme in which letters, numbers, and symbols are each represented by exactly one byte of data). The short code sketch below shows how the same values look in binary, octal, and hexadecimal.
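Here is that sketch: a minimal Java program (written for this summary, not taken from the course book) that converts 97 to binary by repeated division and then lets the standard library display one byte's largest value in each base.

public class NumberBases {
    // Convert a positive decimal number to binary by repeatedly dividing by 2
    // and collecting the remainders in reverse order, as described above.
    static String toBinary(int n) {
        StringBuilder bits = new StringBuilder();
        while (n > 0) {
            bits.insert(0, n % 2); // the remainder becomes the next digit, prepended
            n = n / 2;
        }
        return bits.length() == 0 ? "0" : bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinary(97)); // 1100001, matching the worked example

        int value = 255; // the largest value a single byte can hold
        System.out.println(Integer.toBinaryString(value)); // 11111111
        System.out.println(Integer.toOctalString(value));  // 377
        System.out.println(Integer.toHexString(value));    // ff
    }
}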
This, too, is broken down into zeroes and ones and then displayed to you as light on a monitor screen. From Handel’s greatest works to The Rolling Stones, from DaVinci to Dali, a computer sees everything, stores everything, as binary numbers. The Bible, the writings of Goethe, the sayings of Confucius—all become digital data when stored on a computer. How is this done? It all depends on context. Data Contextualization One of the simpler ways to represent data is to convert alphabetical letters into numbers. Looking back to our decoder ring from the 1930s, we can see that it is a simple matter to assign a specific number to a letter. But what about upper and lowercase? Then we create separate numbers for that. Punctuation? Once again, they get unique numbers. Once we have decided how many letters and symbols we have to represent (including things like spaces) then we can create a 1-to-1 assignment of numbers. It goes without saying, of course, that these numbers will all be in binary. We have discussed ASCII before, but this concept was expanded for more modern computers and international languages with an encoding standard called Unicode. Digital Documents ASCII can only display English characters (with a few exceptions), while Unicode was designed to represent most of the alphabets across the entire globe. It includes world cur- rency symbols and italics. Since it uses 16 bits, it can represent a total of 216 or 65,536 different symbols. Unicode is often used for internet communication, from email to web- pages. With Unicode we are able to represent raw, unformatted characters and symbols in many languages, but to create something like a Word document, another layer of com- plexity is needed. In 2007, Microsoft released an update of their Office software suite which included a new way to store its data files. This new method was based on XML, or Extensible Markup Lan- guage. A document, Word or otherwise, must have other information besides letters and symbols: this includes margins, page size, fonts, sizes, and colors, for example. Put simply, Word documents and other documents use binary codes to “tag” different sections of text with different attributes such as bold or a certain indent. This, of course, is also all stored in binary in a very specific order. When a Word document is read by an application, the binary is interpreted according to the rules for the file type and it is displayed on your screen. A Word document (or any other data file) is a meaningless string of zeroes and ones with- out context. When you use the Word application to open a Word document, it knows the specific context, and therefore exactly what all those numbers mean. 37 Images Photos and other graphics that are displayed on a screen are broken down into pixels, or very small squares (sometimes rectangles). For instance, right now you are viewing a document that is stored as text and formatting; these must be converted into pixels on your screen. When a human uses a computer, several layers of conversion and translation are continually occurring, carried out by the operating system and the application. Once we developed large enough data storage, humans were able to design computers with very small pixels so that millions of them could fit on your display screen, giving you a Resolution high resolution, which makes photographic images look realistic and text very readable. The number of pixels able to be displayed is called the resolution. This is Many displays now use 24-bit color. 
This means there are 8 bits each for red, blue, and usually measured as green. These colors of light combined together can represent the full spectrum of visual width by height, e.g., light. To store any image as binary, it is broken up into pixels which are each given a binary 1920 x 1280. value that shows how much red, blue, and green light intensity to display it with. Figure 18: Digital Pixel Colorization Source: Stephen Weese, 2020. Since 8 bits (a byte) can store values from 0—255, each color can have that many degrees of intensity. Absence of a color would be zero, while the most intense is 255. Using 255 for all colors will display white, while using 0 for all colors would be black. This requires enough memory to store 24 bits multiplied by the number of pixels on the screen. This means that to store an HD image (1920 x 1080 pixels) with 24-bit (3 byte) color, you would need: 1920 · 1080 · 3 = 6,220,800bytes Which, in other words, is about 6MB. Older computers from the 1990s, with only 1MB of memory, had no means of displaying anything with close to this many pixels and colors, Pixelated which is why images on those screens looked very pixelated. This term describes an image represented on a computer with large enough pixels that they are individually visible. 38 Audio and Music In the real world, light and sound exist in waves. Information represented in waves (e.g., an audio cassette tape) is referred to as analog. When we store it on a computer it must be converted into digital data. Does this mean that some of the information is lost? Definitely. Analog waves have an infinite continuum of values, whereas digital representations of waves must simulate them with sampling. Sampling This is the representation of analog data by taking Analog data are sliced into very small time samples and a digital value is assigned for that samples of the wave val- time slice. The smaller the slices, the more accurate a representation. However, we cannot ues over small time inter- have infinitely small samples, just as we cannot have infinite pixels. vals. When we look at the world with our eyes, we are not seeing pixels, but continuous waves of light. However, when a computer makes the pixels on the screen small enough our eye simply cannot tell the difference. This is the same principle with audio sampling. If we cut up an audio wave into small enough slices and reassemble it, the human ear cannot tell the difference. Figure 19: Digital Audio Sampling Source: Aquegg, 2013. Even before digital video, the same concept was used for film. A movie viewed in the thea- ter is really a series of still images shown rapidly enough that to the human eye (and brain) it seems like real motion. Whether the audio is music, singing, sound effects, or someone speaking, it is sampled and stored as binary data, where the context tells the computer to reproduce a sound through a speaker so humans can listen. 39 Context is Key Computers need assistance in deciphering the message we are sending with binary. There are two primary ways that a file (a discrete unit of data) is labeled to contextualize the data. The first, simple way is through the file extension. The three or four letters appended to the filename tell the operating system what type of application is able to read the data in this file properly. Not only that, but files also have headers (or file signatures) at the beginning with specific binary sequences that indicate that the data are an image, a docu- ment, an executable program, or something else. 
For any specialized type of data, a standard must first be created that defines how all of the information is stored in binary. That standard will then be used by any application that wants to read or write these types of data. This is where we get file types such as .jpg, .docx, .exe, .dmg, etc. New file types and data contexts are created every day.

2.3 Data Types

In this unit, we have learned that files, or large blocks of data, need to have context. On a more granular level, individual units of data must also have context. When we work with a data point, it must be defined to the computer as a number, a character, a memory location, or something else. For instance, a Word document contains all kinds of different data: some numerical, some character, some formatting. Each separate character would need a data type to identify it correctly.

Variables

In programming, the coder creates variables to store individual units of data. A variable is a container for information. It has three important defining factors: a name, a type, and a memory location. The name is what it will be referred to by in code, just as if you wrote a label on a box. The memory location is where the computer can find this box in RAM.

Figure 20: Variable as a Box
Source: Stephen Weese, 2020.

Let's say you create a variable called "AGE". The computer will label that box with the name and store it somewhere in memory. But there's nothing in the box yet. We need to store a value in the box; a variable is a container for a data value. The value of 37 is assigned to the variable "AGE". Now when the computer goes to find the box and opens it, the value 37 is discovered within. Just like in real life, a box can be reused; we can put other values in the box at different times. This is why it is called a variable: the value inside it can vary. It is different from a constant, which is declared at the beginning of the program and cannot ever change. The data inside must also have a "type". Just as a paper bag is made to contain solids rather than liquids, variables are designed to contain specific types of data.

Integers

An integer is a whole number; this means it contains no decimal component. Integers include all the natural numbers (1, 2, 3, etc.), their negative versions, and zero. This definition is important to a computer since it does not have to assign any memory to decimals, only whole numbers. In the previous case of the variable "AGE", an integer would be a good data type to use, since it is a standard convention to list someone's age as a whole number without a fraction or decimal on many forms and records. In Java, the integer data type is abbreviated "int". To declare (create) a variable for age, it would look like this:

int age = 0;

When declaring a variable, in some languages a starting value must be given; this is referred to as initializing. In this case, the variable is initialized to zero. It presumes that an actual age will be calculated or input by the user later.

Numerical data types

Other standard data types include longer integers and decimals. The following are some examples of numerical data types in the programming language Java.
– Byte: integer from -128 to 127 (8 bits)
– Int: integer from -2,147,483,648 to 2,147,483,647 (32 bits)
– Float: smallest positive value 1.4E-45, largest value 3.4028235E38 (32 bits)
– Double: smallest positive value 4.9E-324, largest value 1.7976931348623157E308 (64 bits)

As you can see, data types are specified in bits; this is important to programmers. You don't need to use a huge box to store a pair of sunglasses; it's a waste of space. Coders also shouldn't use large variables such as int when a byte will do. Note that a "byte" data type can store 256 different values: some of them are used for negative numbers, some for positive, and one is zero.

To declare these variables in Java, you'd do something like this:

byte age = 37;
int jellybeans = 2243;
float fluidOunces = 28.73f;
double solarDistance = 932842983.2340273;

The reason a decimal is called a "float" is because the decimal point can float around to different places in the number. (The f suffix on 28.73f tells Java to store the literal as a float rather than as the larger double type.) This is an important distinction for a computer: where is the decimal point? That must be stored somewhere in the variable itself. It is something a human would take for granted, but something that has to be specified for a computer. Many programming languages are also case-sensitive; this means that the variable "age" is not the same as "Age" or even "AGE". Good coders should follow a naming convention for all of their variables for consistency.

Non-numeric data types

Most languages, including Java, have a basic data type to store one "character." This represents one symbol in a language, so it includes letters, numbers, spaces, and punctuation. You might think that we have already covered numbers, but remember that computers are very specific. The numeric value of "2" is a separate concept from the character "2". Humans automatically deal with the different contexts in which we see the symbol "2", but computers are more literal. Consider that displaying the number 2 on the screen as a specific group of pixels is a completely different operation from multiplying a value by 2.

In English and many other languages, words and sentences are represented by groups of characters strung together; this is the reason that computers have a data type called a string. A string is a group of characters strung together. Therefore, it is actually a complex data type consisting of several simple (or "primitive") variables combined together. Here are some non-numeric data types from Java (Oracle, 2020):

– Char (a character): letter, number, or symbol (16 bits)
– Boolean: represents "true or false" (1 bit)
– String: a sequence of character information (varies in size)
– Array: an ordered sequence of a primitive data type (varies in size)

A Boolean variable (named after mathematician George Boole) represents either a TRUE or FALSE value, and therefore only has two possible states, which technically can be stored as one bit; zero is used for false and one for true.

An array is an ordered arrangement of a simpler data type. You could, for instance, have an array of integers, doubles, or even Booleans. However, an array of characters is so common that the data type string usually exists in most languages for storing language text. Let's look at how these data types would be declared as Java variables:

char middleInitial = 'A';
String firstName = "Joseph";
int[] lottoNumbers = {5, 22, 11, 7, 16};

The array contains elements that represent different lottery number picks. An element is a single value in an array, addressed by its ordinal number; numbering starts at 0 for the first element. To reference the second number, you would type lottoNumbers[1], and it would return the value 22.
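As a brief illustration (a sketch building on the lottoNumbers example above, not part of the original text), elements are read by their index in square brackets:

int[] lottoNumbers = {5, 22, 11, 7, 16}; // the array declared above

int secondPick = lottoNumbers[1]; // indices start at 0, so this is the value 22

// Visit every element in order; lottoNumbers.length is 5 for this array.
for (int i = 0; i < lottoNumbers.length; i++) {
    System.out.println("Pick " + (i + 1) + ": " + lottoNumbers[i]);
}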
Arrays are often used when you have a collection of data of the same type that you want to store in order. You can search through them sequentially, pick out singular items that you want to work with, and even sort the list of elements in order. Uses of arrays include lists of options to choose from, arranging data groupings, or collecting user input, to name a few.

Using Variables Correctly

When creating variables, you should consider the correct data type as well as the memory usage. Not only that, but descriptive names should be given that also follow the standard naming convention that is used for that language. Many new programmers begin by naming their variables "x" and "y" for simple programs, but these names are non-descriptive and will be problematic in code of any reasonable length. In code with hundreds of variables, non-descriptive names will leave the program unreadable to anyone, including the original coder. Variables should not be a mystery; they should be named in a way that describes their use in code.

Computers have very specific data types and expect them to be used in the correct context. You cannot "add" together two char data types and expect the same result as adding two integers. The computer will see these as two different things. If you have the characters (char data type) "4" and "2" and combine them, in many computer languages you will get "42" instead of 6. The characters are seen as symbols and simply placed together; if they were numeric variables (such as int), then adding them would produce a numeric answer rather than a character-based answer. Computers do exactly what they are told. This is why data types and variables must be used with precision.

2.4 Redundancy and Error Tolerance

Imagine writing a 100,000-word novel. You carefully read the manuscript after your first draft and check for mistakes, but did you really find all of them? There is still a chance that errors slipped through. You might do a second check or hire an editor. Are you sure you got all of them this time? It is very difficult to be sure of perfection in such a large volume of words. Computers process words and information at a much higher rate than human beings. A simple image on your screen of 800 x 600 pixels has 480,000 pixels. Are they all exactly the right color? Did some bits get confused in transmission?

A computer, though very accurate in its calculations, can also suffer from errors in data transmission. Whenever data is moved from one location to another, there is a chance that the electrical signal may be garbled, interfered with, or interrupted. Data can move from RAM to the hard drive, from the hard drive to the internet, then from the internet to your friend's house in Argentina. So just sending a photo of a cat to Mari Rubi in Buenos Aires has several points where errors can occur. Therefore, computers must have built-in precautions against data corruption.

Storage Error Checking

Some errors are not caused by transmission, but by media corruption. Hard drives typically last about three to five years. Some drives experience a gradual corruption, where parts of the physical surface become unstable and will no longer store data. An early method to protect files in storage from corruption is the checksum.
Recall that any file on a computer is a series of binary numbers; in fact, it could be considered one very long binary number. One way to check if it has been changed (corrupted) would be to store an exact copy somewhere and then compare it. But because some files are megabytes, or even gigabytes, in size, this would be very inefficient. Instead, since the file is one large binary number, it is placed into an algorithm. This algorithm is like a mathematical function: it has a numeric solution. The checksum is that solution; it is calculated and appended to the file when it is saved. When the file is opened, the calculation is run again to see if it matches the number stored in the file. If it does, then the chances of it being uncorrupted are incredibly high. (There is a small chance that changes to a file might give the same checksum, but it is remote enough that the error checking is considered reliable.) Usually corruption in files is very minor and easy to detect with a checksum.

Figure 21: Checksum Examples
Source: Helix84, 2008.

Notice in this example that the text is only changed slightly in the later texts, but the checksum is very different. Since the number is 10 digits, there are 10,000,000,000 different possibilities, making the random chance that two texts have the same checksum 1 in 10 billion.

Transmission Data Fault Tolerance

The problem of corrupted transmission of data has existed since before computer networks were invented. Radio transmission of secret codes by different militaries risked this problem, as did communication by telegram, and even a letter written by hand could be damaged by water or other environmental hazards.

Parity

One of the earliest methods of error-checking for binary data is called parity. Communication with telephone-based modems was serial, or one bit at a time. (A modem, a portmanteau of "modulator-demodulator," is a device used to translate one type of signal to another.) For every seven bits, an eighth parity bit was added for error checking. These basically worked like a very simple checksum; in this case we can even do the math in our heads. Communications via modem were set to either "even" or "odd" parity, which meant that for every 8 bits there must be an even or odd number of ones in the data. If this did not match, then it was assumed there was an error, and a request was made by the receiving computer for retransmission.

Table 3: Odd Parity

01101011  11101001  10011000  01101011  00000100

Source: Stephen Weese, 2020.

The parity bit is sent first in a group of eight bits, and its value must make the total number of ones in the byte odd. In the example above, you can see that the values are 5, 5, 3, 5, and 1 when the ones are tallied. This first bit is not counted as primary data; it is used for error-checking and discarded, just like the checksum on the end of a data file. If a computer set to odd parity received 8 bits with an even number of ones, it would assume there was an error. Of course, if two bits happened to flip during transmission, it could still pass parity. The assumption here is that the most common error is losing one bit in an 8-bit sequence. This also does not account for the possibility that the parity bit itself is corrupted, but this was used as a basic method of error-checking for many years in telecommunications.
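As a minimal sketch of the arithmetic (the helper name is hypothetical, not from the course material), an odd-parity check in Java only needs to count the one-bits in each received byte:

// Returns true if the 8-bit value (parity bit included) contains an odd number of ones.
static boolean passesOddParity(int receivedByte) {
    int ones = Integer.bitCount(receivedByte & 0xFF); // count the 1-bits in the low 8 bits
    return ones % 2 == 1;
}

// passesOddParity(0b01101011) returns true: five ones, like the first byte in Table 3.
// passesOddParity(0b01101010) returns false: four ones, so a retransmission would be requested.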
Redundancy Check

A more advanced type of error-detection is the Cyclic Redundancy Check, or CRC. It works on a similar principle to the checksum. A block of data is processed through a mathematical algorithm and a result is appended to the data. After the data is sent, the calculation is repeated to check the integrity. The CRC can be applied to any length of binary data and will always return a code of the exact same length. This makes CRC a "hashing" algorithm, one that returns a value of consistent length in digits.

TCP/IP Error Detection and Correction

The main protocol, or set of rules, for internet communication is TCP/IP, which stands for Transmission Control Protocol/Internet Protocol, a specification for internet communications. This network standard has several layers of protection against data corruption. The first one is one we're already familiar with: the checksum.

TCP/IP divides the data to be sent over a network into pieces called segments or datagrams. These segments have a "header" at the beginning which defines the sending computer, the receiving computer, and transmission settings, and also contains a 16-bit checksum to verify the data. The maximum size of a segment is 65,536 bytes (Stevens, 2004). When each segment arrives at the destination computer, it is checked against the checksum value; if the values don't match, a retransmit request is sent to the source computer.

Along with a checksum, TCP/IP also checks for transmission errors on other levels. The receiving computer must also send an acknowledgement (ACK) for each datagram received. This is accomplished by giving each datagram a sequence number in the header. The recipient machine sends an ACK for each sequence number; if one of the numbers is not acknowledged after a certain period of time, the source computer will retransmit.

At another level, TCP/IP will also detect broken routes over the network and re-route to new ones. This is one reason why TCP/IP has been used for decades on the internet: it is fault-tolerant. Part of the network can stop working without taking the entire network offline. Routers and devices that stop forwarding network traffic can be worked around by checking for alternate routes to the destination.
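To make the 16-bit checksum idea concrete, here is a simplified sketch in the style of the classic Internet checksum (an illustration only; a real TCP implementation also covers a pseudo-header and other details):

// Sums the data as 16-bit words, folds any carry back in, and returns the ones' complement.
static int checksum16(byte[] data) {
    int sum = 0;
    for (int i = 0; i < data.length; i += 2) {
        int high = data[i] & 0xFF;
        int low = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0; // pad an odd final byte with zero
        sum += (high << 8) | low;
        sum = (sum & 0xFFFF) + (sum >>> 16); // fold the carry above 16 bits back into the sum
    }
    return (~sum) & 0xFFFF;
}

// The sender stores checksum16(segmentData) in the header; the receiver recomputes it
// and requests a retransmission if the two values do not match.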
SUMMARY

All data on computers is stored as binary. There must be a method of translating these numbers into useful human concepts. One of these is to give this data context. This means assigning the binary numbers a significance beyond their numeric value. For images, that means pixels and colors. Documents are stored as symbols, punctuation, formatting, margins, and other settings. Audio is sampled digitally and saved as binary numbers to be reproduced as sound waves. A pre-formed code or set of rules is determined for each context to give the binary data an extra layer of meaning. This is done just as different letter combinations in English can be used to create words and paragraphs with more meaning than just their original symbology as individual letters.

In computer programming, all information must be given a specific data "type" which defines how the data will be used and their limitations. A good coder knows which variable types to use and how to name them properly for maximum code readability. There are many standard data types. Some are numeric and used to represent integers and decimals. Others can represent letters, symbols, and strings of characters. We can group simple data types together into arrays to arrange and search them.

The volume of data that computers deal with necessitates some type of error checking. It is possible that during data storage or transmission some of the information can be corrupted. To detect this, various error checking systems have been created. Checksums and CRC take the binary data through a mathematical function that returns a specific value for that data stream. If the sending value matches the receiving value, then the data is assumed to be unchanged. TCP/IP has various error-prevention technologies built in, including a checksum in the header of its data segments. It also has a system of acknowledgement, where the recipient computer confirms the receipt of data sent. If some data is not confirmed as received, or is found to be corrupted, a retransmission request is sent to the source computer.

The more we understand how digital data is used by computers, the better we can utilize computing and code to create software applications.

UNIT 3
ALGORITHMS AND DATA STRUCTURES

STUDY GOALS

On completion of this unit, you will have learned …
– how the term algorithm is used in applications of computer science.
– how to create flowcharts.
– the details of data structures, such as arrays and stacks.
– how sorting algorithms work, including quick and bubble sort.
– how the quality of different algorithms is evaluated.

3. ALGORITHMS AND DATA STRUCTURES

Introduction

For the last 20 years, Google has been the main search engine used on the internet, accounting for over 90 percent of the market share internationally (StatCounter.com, 2020). But how does it work? A Google search works by a set of rules called an algorithm. An algorithm is a set of rules or instructions for completing a task. This applies to human beings as well as computers. The vast amount of data that is available on the internet poses a challenge to anyone who would want to search through it. Google has established a method to search through this data with excellent results.

According to Google, there are actually several algorithms working together to achieve the results for an internet search. One of these is used to determine meaning: Google can substitute synonyms for words in its search, such as "move" and "relocate," to find similar matches. Another algorithm is used to analyze relevance. The keywords typed in the search are matched to the page; the better the match, the higher the search result. The search algorithms for Google also incorporate a quality assessment of the website itself in terms of expertise, authoritativeness, and trustworthiness on a given topic (Google, n.d.).

This seems like a good start, but the algorithms go even further. The page will also be ranked for "usability," which means how well the page displays in different browsers and how quickly the pages load. Finally, the algorithm includes personal information about the user that is searching. If that user lives in Barcelona and searches for "fútbol," Google will prioritize local teams, and FC Barcelona would be a higher result than it would be for someone searching in Berlin.

Of course, the exact details of Google's algorithms are secret, like the secret spice recipe for Kentucky Fried Chicken. It is, after all, the most popular search engine in the world, and Google would not want anyone stealing their market share by implementing the same algorithms. Is the algorithm perfect? No. New sites take a while to start to appear in search results. There is also a chance that a good site match for a particular search does not come up in the first page of results. However, it is a good enough algorithm that it is the top choice of most web searchers around the world.
Often in data processing, there will be a data space that is extremely large, like the internet, and that poses a challenge. Certain mathematical problems also have a vast number of solutions and can be difficult to manage. Often, it is impossible even for a computer to search through every possibility (at least, if quick results are desired), so an algorithm will be designed to give a "good" result quickly instead of a "perfect" result that takes much longer.

3.1 Algorithms and Flowcharts

An algorithm, at its most simple definition, is a set of instructions. This works for computing applications as well as human beings. Let's say that you want to tell someone how to fix a broken lamp. You'd probably ask the usual questions like, "Is it plugged in?" and "Did you replace the light bulb?" However, to put these into a set of instructions, you'd have to organize things a bit more. You'd do something like this:

1. Check to see if the lamp is plugged in. If it isn't, plug it in. If it works, you're done. If not, go to step 2.
2. Check the lightbulb. Is it burned out? If so, replace it. If it works, you're done. If not, go to step 3.
3. The lamp is broken. Buy a new lamp.

Flowcharts (Flow Diagrams)

To translate human instructions into computer instructions, coders have used flowcharts for many years. These charts make it easy to observe the structure of an algorithm and create it with a programming language. They use standard symbols to indicate different types of instructions, as shown below.

Figure 22: Flowchart Symbols
Source: Stephen Weese, 2020.

A terminator symbol (oval or rectangle) is used at the beginning of a flowchart and at all of its endings; terminators represent the external nodes of a flowchart. A rectangle represents a process, for instance, some type of data calculation or filtering. The diamond is a decision; a decision has multiple pathways that can be taken. A parallelogram in a flowchart represents input or output, where data are sent into or out of the process. Finally, the arrows represent the flow from beginning to end of the sequence. There are many other symbols that can be used in a flowchart, but these will suffice for now.
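To show how a flowchart's decisions map onto code, here is a minimal Java sketch of the lamp algorithm above (the method and parameter names are hypothetical, chosen only for illustration):

// Each if-statement corresponds to a decision diamond; each return is a terminator.
static String troubleshootLamp(boolean isPluggedIn, boolean bulbIsBurnedOut) {
    if (!isPluggedIn) {
        return "Plug the lamp in and test it.";       // step 1
    }
    if (bulbIsBurnedOut) {
        return "Replace the bulb and test it.";       // step 2
    }
    return "The lamp is broken. Buy a new lamp.";     // step 3
}

Writing the steps this way makes the flowchart's structure explicit: every path through the decisions ends at exactly one terminator.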
