Bits & Bytes
Uploaded by SimplifiedIodine
Summary
This presentation explains how computers store and represent data using binary. It details the concept of bits, bytes, and various data representation methods, including how different data types, like text and color, are translated into binary code.
https://njctl.org/video/?v=1S5Va3pcug8

Storing Data

Computers store all data in binary, which is why it is so crucial to understand. So far, however, we have only learned how numbers translate to binary. When we are on a computer, we see many more characters on the screen than just numbers.

Computer Readable Data

Bits are how a computer reads all data. It has taken many individuals and companies years to work out how to translate human data such as pictures, text messages, voice messages, and colors into computer-readable data.

We will learn how to convert the decimal number 145 into binary digits: 145 becomes 10010001.

Any color can be converted into binary digits based on the color's composition of red, green, and blue values, for example 11110101 00011111 00011111.

By understanding how to convert integers into bits, computers can also understand letters. For example, the letter A is 01000001.

When you see the letter "a" on the screen, the computer does not interpret it the same way you do. The computer reads the "a" as 0110 0001. In other words, at the hardware level the computer doesn't store a symbol that looks like "a". Instead it stores a combination of 0s and 1s that represents the "a". More accurately, a combination of on and off transistors in the computer encodes the "a" so that we see its representation on screen.

So why not just store the "a" at the hardware level? The reason for this is SIMPLE: at the hardware level, simplicity is key. It would be impractical to come up with a symbol for every character you could possibly use, and quite difficult for a computer to interpret them all. Think of this like Braille or Morse Code. Braille uses combinations of raised and flat dots to represent language. Morse Code uses combinations of short and long beeps.
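The conversions mentioned above (the number 145, the red/green/blue color, and the letters A and a) can be sketched in a few lines of Python, assuming Python 3. `to_binary` is a hypothetical helper written for this example, not something from the original. It uses the repeated-division-by-2 method. The color values 245, 31, 31 are simply the decimal values of the three bytes shown for the color. The built-in `ord()` returns a character's numeric code, and for A and a that code matches ASCII.

```python
def to_binary(n: int, width: int = 8) -> str:
    """Convert a non-negative integer to a binary string by repeated
    division by 2, padded with leading 0s to `width` digits."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    remainders = []
    while n > 0:
        remainders.append(str(n % 2))  # each remainder is the next bit, lowest first
        n //= 2
    bits = "".join(reversed(remainders)) or "0"  # read the remainders bottom-up
    return bits.zfill(width)


# The decimal number from the slide
print(to_binary(145))  # 10010001

# A color as three 8-bit channels: red, green, blue
red, green, blue = 245, 31, 31
print(" ".join(to_binary(c) for c in (red, green, blue)))  # 11110101 00011111 00011111

# Letters: look up the character's numeric code, then convert that number
print(to_binary(ord("A")))  # 01000001
print(to_binary(ord("a")))  # 01100001

# Cross-check against Python's built-in binary formatting
assert to_binary(145) == format(145, "08b")
```

The final line compares the hand-written method with Python's own `format(n, "08b")`. In everyday code the built-in is the simpler choice.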
Data Representation

It is much easier for someone to decipher a pattern of short and long beeps in Morse Code than it would be to tell apart one sound that represents the letter "a", another sound that represents the letter "b", and so on.

The concept of data (information) being stored and sent through binary representation is an ancient one! Smoke signals, for example, were used to communicate quickly between villages. Short puffs (0s, off) and long puffs (1s, on) in different combinations sent different messages.

In more modern times, switchboards were used to connect telephones. A particular combination of on and off switches connected a caller to a specific telephone.

New ways of representing data using binary are being invented all the time. Here are just some:
1) On and off voltage
2) High and low frequency radio waves
3) Open and closed vacuum tubes

With each new binary representation invented, more and more technological advances become possible. Regardless of how 0s and 1s are represented at the hardware level, the result at the software level is the same.

Binary Data Representation

So, how much data can 0s and 1s hold in a computer? That all depends on how many you have! Remember that a "bit" is a single binary digit holding a value of either 1 or 0. That gives only two possible options: yes or no, true or false. There are 4 possible combinations of 2-bit binary numbers (00, 01, 10, 11).

When we learned about the Octal Number System, we learned that each single digit of that system corresponds to a 3-bit binary number. That is because there are a total of 8 possible combinations of 3 binary digits: 000, 001, 010, 011, 100, 101, 110, 111.

Likewise, the single digits of the Hexadecimal Number System correspond to the 16 combinations of 4-bit binary numbers, because that is the total number of combinations possible with 4 bits.
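To check the combination counts above, a short Python sketch can list every pattern using `itertools.product`. It assumes Python 3.9 or later for the `list[str]` annotation. `bit_patterns` is a hypothetical helper name chosen for this example. The sketch prints the 2-bit and 3-bit patterns, counts the patterns for 1 to 4 bits, and lines the 4-bit patterns up with the hexadecimal digits.

```python
from itertools import product


def bit_patterns(n_bits: int) -> list[str]:
    """Return every n-bit binary pattern, in counting order."""
    # product("01", repeat=n) yields tuples in lexicographic order,
    # which for the digits 0 and 1 is the same as counting up in binary
    return ["".join(bits) for bits in product("01", repeat=n_bits)]


print(bit_patterns(2))  # ['00', '01', '10', '11']
print(bit_patterns(3))  # ['000', '001', '010', '011', '100', '101', '110', '111']

# Count the patterns for 1 to 4 bits
for n in range(1, 5):
    print(f"{n} bit(s): {len(bit_patterns(n))} combinations")

# Each 4-bit pattern lines up with exactly one hexadecimal digit
print("".join(format(int(p, 2), "X") for p in bit_patterns(4)))  # 0123456789ABCDEF
```

The counting loop prints 2, 4, 8, and 16 combinations for 1, 2, 3, and 4 bits.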
Have you noticed that the number of possible combinations doubles with every bit that is added?

Binary is base 2 because there are only two possible numerals: 0 and 1. This means that:
one bit has $2^1 = 2$ combinations
two bits have $2^2 = 4$ combinations
three bits have $2^3 = 8$ combinations
four bits have $2^4 = 16$ combinations
$n$ bits have $2^n$ combinations

So how do all these bits translate to the letters and text you see on a computer screen?

Encoding is another form of abstraction, similar to the legend of a map. For example, you could create a table in which each binary number represents a piece of data you are encoding, like this:
000 represents "A"
001 represents "B"
010 represents "C"
011 represents "D"
100 represents "E"
101 represents "F"
110 represents "G"
111 represents "H"

Can you see anything that might go wrong with this? Any problems that we will run into?

The first issue is that we have quickly run out of 3-bit binary numbers and still have many more characters to encode. The second issue is that only we know this encoding, because we just made it up. For a table like this to work for computers in general, everyone would need to agree on the same one. In other words, the entire world needs to know it. Morse Code is again an example of a "mapping" that is globally recognized: the same combinations of dots and dashes represent the same letters regardless of where you are from.

Let's work on the first issue. How many bits is enough? Can you guess? It's a byte's worth! A byte has $2^8 = 256$ possibilities, and 8 bits provide enough combinations to store all the single characters on a keyboard. This is why a byte is so important: in a one-byte character encoding, a byte stores exactly one character. Many parameters in computers are designed for a range from 0 to 255 (256 values), for instance the red, green, and blue values of a color.

Computer Memory

To review, computer memory is measured in bytes. A kilobyte (kB) is equal to 1024 bytes (256 x 4) and can hold about 1 paragraph of text. (Strictly speaking, 1024 bytes is a kibibyte, or KiB. The SI kilobyte is 1000 bytes.)
A megabyte (MB) is equal to 1024 kB and can hold a song or audio clip. A gigabyte (GB) is equal to 1024 MB. This unit is one of the most familiar, as it is currently used to describe the storage size of most devices such as laptops, phones, and iPads. In the 1990s and early 2000s, kilobytes and megabytes were more typically used because storage capacities were not that large yet. Capacities have grown over the years with need.

There are two larger, less commonly used units. In the future, however, they could become the norm, and the days of gigabytes could be history. A terabyte (TB) is equal to 1024 GB. It is possible to purchase external hard drives and cloud storage plans with 1 TB of space. To show you how large this unit is, the Library of Congress is often said to need about 10 TB to store all its information. A petabyte (PB) is equal to 1024 TB and would be used to store social media data, such as the data for the enormous number of TikTok accounts.

Binary Data Representation

A very interesting math phenomenon occurs when using 8-bit combinations: two-digit base 16 numbers also have a total of 256 possibilities ($16^2 = 256$). This works because each hexadecimal digit stands for exactly 4 bits ($16 = 2^4$), so two hexadecimal digits stand for exactly 8 bits. That means a 2-digit hexadecimal number can describe every value that can be stored in one byte of data. Using 2-digit hexadecimal numbers to describe a byte (8 bits) is a key abstraction that simplifies many things on a computer, like IP addresses, which will be discussed in later units.

ASCII Chart

We now have enough combinations (0000 0000 to 1111 1111) to cover every character on a keyboard, so let's focus on the second issue: assigning them. What assignment can be globally recognized? It turns out this problem was solved a long time ago, and a standard already exists. It is called the American Standard Code for Information Interchange, or ASCII. Standard ASCII actually defines 128 characters (codes 0 to 127), so it needs only 7 of the 8 bits in a byte. Its assignments can look arbitrary, although they do have some structure. For example, each lowercase letter's code is exactly 32 more than its uppercase version. What matters more is that it has been globally agreed to and is now widely recognized.

Computer Character Sets

ASCII is a computer character set.
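To make the ASCII assignment concrete, here is a minimal Python sketch, assuming Python 3. For a few characters it prints the decimal, two-digit hexadecimal, octal, and 8-bit binary values that an ASCII chart lists. `ascii_row` is a hypothetical helper for illustration. It relies on the built-in `ord()`, which returns a character's Unicode code point. Unicode keeps the ASCII codes for its first 128 characters.

```python
def ascii_row(ch: str) -> str:
    """Format one ASCII-chart-style row for a single character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    # decimal, two-digit hex, octal, and 8-bit binary (leading 0s kept)
    return f"{ch!r:>5}  dec={code:<3}  hex={code:02X}  oct={code:03o}  bin={code:08b}"


for ch in ["A", "B", "a", "0", " "]:
    print(ascii_row(ch))

# Uppercase and lowercase letters differ by exactly 32 (binary 0010 0000)
print(ord("a") - ord("A"))  # 32

# Every byte value from 0 to 255 fits in exactly two hexadecimal digits
print(format(0, "02X"), format(255, "02X"))  # 00 FF
```

The last two lines check two points made above. Letter case differs by a single bit worth 32, and any byte can be written as two hex digits.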
There are other character sets and encodings, including:
Unicode Transformation Format, or UTF, which has a few versions. One of them is UTF-8, which is fully backwards-compatible with ASCII. UTF-8 is the dominant encoding scheme for the World Wide Web and is used by about 98% of all websites. It encodes each character using 1 to 4 bytes (8-bit code units).
Latin-1 (ISO 8859-1), one of several encodings known as "Extended ASCII", is an 8-bit character encoding that extends the 7-bit ASCII encoding scheme. It is widely used because it covers most common Western European languages, such as German, Italian, Spanish, and French.

ASCII Chart

The full ASCII Table is too large to display on one slide. It encodes numbers, letters, symbols, and punctuation, as well as nonprintable control commands such as ENTER and ESC. There are a couple of things to note about the table.

The chart is based on English letters because it was originally designed for American use. Letters from other languages, such as accented letters or other alphabets, are not in the chart at all. When such text has to be squeezed into plain ASCII, those letters are typically replaced with English look-alikes or lost, which is one reason character sets like Latin-1 and UTF-8 were created.

The binary values listed on the chart always start with a 1. Any leading 0s that would complete the 8-bit binary number are cut off and implied. The character 'A' is represented by 0100 0001. On the chart, however, it appears as 100 0001.

Hexadecimal and octal equivalents are also provided. In practice, you will more often need to convert to and from hexadecimal than to and from raw binary.

An 8-bit binary number has a base 10 (decimal) value. On the ASCII Chart, however, it also represents a character. For example, the number 65 and the letter 'A' are both represented by 0100 0001.

Data Representation

There are many ways to display data that has been organized into useful information.
These ways include reports, charts, graphs, illustrations, frequency distribution tables, histograms, scatter plots, and more. Choosing the correct way to show the information is an important skill for a data scientist. Different representations are appropriate for different data sets and need to be selected wisely.
https://differencecamp.com/piechartvsbargraph/