Computer Science Fundamentals PDF

Summary

This document provides an overview of computer science fundamentals, focusing on data representation techniques. It discusses different data types and their corresponding codes, like ASCII, ISO, and EBCDIC. The document also covers numeric data representation, such as zoned and packed decimal numbers, and their implementations in various computer systems.

Full Transcript

Computer Science Fundamentals - Basic Theory on Discrete Mathematics (Part 2) - AWS Internal Use Only Representation Form of Data All data is represented with 0 or 1 inside the computers. This representation form can be classified as follo...

Computer Science Fundamentals - Basic Theory on Discrete Mathematics (Part 2) - AWS Internal Use Only Representation Form of Data All data is represented with 0 or 1 inside the computers. This representation form can be classified as follows: Page ▪ 2 AWS Internal Use Only 1 Representation Form of Data ❑ CHARACTER DATA Inside computers, character data is represented with combinations of 0 and 1. In computers during the early stage of their development, 1 character was linked to a bit pattern of 8 bits (1 byte). Moreover, bit patterns linked to characters were referred to as character codes. When data between computers is exchanged by using different character codes, character corruption (garbled characters) may occur where characters that are different from the original data are displayed. 1) ASCII (American Standard Code for Information Interchange) code. It is the character code defined by ANSI (American National Standards Institute) in 1962. It is composed of 8 bits, which include code bits (7 bits) that represent an alphabetic character or a number, and a parity bit (1 bit) that detects an error. 2) ISO code. It is a 7-bit character code defined by the standardization agency ISO (International Organization for Standardization) on the basis of ASCII code in 1967. This character code forms the basis of character codes used in different countries all over the world. 3) EBCDIC (Extended Binary Coded Decimal Interchange Code). It is an 8-bit character code developed by IBM Corporation of the United States. This code was originally developed by IBM for its own computers. However, 3rd generation (1960s) computers were mostly IBM computers, and therefore this code became industry standard (de facto standard) in the area of large computers. Page ▪ 3 AWS Internal Use Only Representation Form of Data 4) Unicode. It is the character code that was developed and proposed by U.S.-based companies like Apple, IBM, and Microsoft as a 2-byte universal uniform code for smooth exchange of computer data. It supports characters of several countries such as alphabets, kanji, and hiragana/katakana, Hangul characters, and Arabic letters. This was standardized by ISO as an international standard, and at present, UCS-2 (2 bytes) and UCS-4 (4 bytes) are defined. 5) EUC (Extended Unix Code). It is the character code that was defined by the AT&T Corporation for internationalization support of UNIX (an OS (Operating System), which is a software program for controlling computers). An alphanumeric character is represented with 1 byte, and a kanji or kana character is represented with 2 bytes. Page ▪ 4 AWS Internal Use Only 2 Representation Form of Data ❑ NUMERIC DATA ❖ Decimal Notation. This is a numeric representation method of introducing the way of thinking about decimal numbers that are easy-to-understand for humans. In specific terms, BCD code (Binary-Coded Decimal code) where each digit of a decimal number is converted into a 4-bit binary number is used. Page ▪ 5 AWS Internal Use Only Representation Form of Data ❖ Decimal Notation. 1. Zoned (Unpacked) decimal number. This is a format that represents 1 digit of a decimal number with 1 byte. A binary-coded decimal code is stored in the lower 4 bits of 1 byte corresponding to each digit, while zoned bits are stored in the higher 4 bits. However, a sign bit indicating a sign is stored in the higher 4 bits of the lowest digit. Zoned bits and sign bits differ depending on the character code used in computers. When EBCDIC is used, “1111 (F)” is stored in the zoned bit, while “1100 (C)” is stored as a sign bit if it is positive (+) and “1101 (D)” is stored as a sign bit if it is negative (-). Example: Represent (+672)10 and (-1905)10 with zoned decimal numbers (EBCDIC) 1 byte 1 byte 1 byte 1111 0110 1111 0111 1100 0010 zoned bits 6 zoned bits 7 + 2 1 byte 1 byte 1 byte 1 byte 1111 0001 1111 1001 1111 0000 1101 0101 zoned bits 1 zoned bits 9 zoned bits 0 - 5 Page ▪ 6 AWS Internal Use Only 3 Representation Form of Data Zoned (Unpacked) decimal format +78910 = F7F8C916 -78910 = F7F8D916 Page ▪ 7 AWS Internal Use Only Representation Form of Data 2. Packed decimal number. This is a format that represents 2 digits of a decimal number with 1 byte. Each digit of a decimal number is represented with a binary- coded decimal code and this is the same as the zoned decimal number, however, it differs because the sign is shown in the lowest 4 bits, and when it falls short of a byte unit, 0 is inserted to make it a byte unit. Page ▪ 8 AWS Internal Use Only 4 Representation Form of Data Packed decimal format ✓ 1 byte represents a numeric value of 2 digits ✓ the least significant 4 bits represent the sign ✓ bit pattern for the sign is the same as per zoned (unpacked) decimal format +78910 = 789C16 -78910 = 789D16 Page ▪ 9 AWS Internal Use Only Representation Form of Data If we compare zoned decimal numbers and packed decimal numbers, by packing after the omission of the zoned bits (higher 4 bits) of zoned decimal numbers, the packed decimal numbers can represent more information (digits) with fewer bytes. Moreover, it is called unpacking to represent packed decimal numbers by using zoned decimal numbers, and therefore the zoned decimal numbers are also referred to as unpacked decimal numbers. While zoned decimal numbers cannot be used in calculations, packed decimal numbers can be used in calculations. However, when decimal numbers from an input unit are entered, or when decimal numbers are sent to an output unit, it is preferable to use zoned decimal numbers because the unit (1-digit numerical value) of input/output and the unit (1 byte) of information correspond to each other. Therefore, conversion between zoned decimal numbers and packed decimal numbers is performed inside computers. Page ▪ 10 AWS Internal Use Only 5 Representation Form of Data ❖ Binary Notation. Forms for representing numerical values as binary numbers that are easy to handle inside computers include fixed point numbers where the number of digits of integer part and fractional part are decided in advance, and floating point numbers where the position of radix point is changed according to the numerical value to be represented. 1. Fixed point numbers. In this form of representation, numerical values are handled by fixing the radix point at a specific position. Generally, it is mostly used for handling integer numbers by fixing the position of the radix point in the least significant bit. In this form of representation, after numerical values are converted into binary numbers, they are used without modification according to the position of the radix point. Page ▪ 11 AWS Internal Use Only Representation Form of Data In fixed point numbers, methods of representing negative numbers (representation forms of negative numbers) are important. The two main methods are described below: A: Signed absolute value notation In this method, the most significant 1 bit is used as a sign bit, and 0 is stored in it if the numerical value is positive, while 1 is stored in it if the numerical value is negative. In the remaining bits, the binary bit string is stored as it is. Page ▪ 12 AWS Internal Use Only 6 Representation Form of Data B: Complement notation In this method, when a negative number is represented, the complement of the positive number is used. [Complement] By using a certain number as the reference value, the shortfall is shown with respect to this reference value. In n-adic number, there is a complement of n and a complement of n-1. Complement of n-1… Reference value is the maximum value that has the same number of digits. Complement of n … Reference value is the minimum value that has one more additional digit. The complement can be determined by subtracting from the reference value the number for which the complement is to be determined. Page ▪ 13 AWS Internal Use Only Representation Form of Data Page ▪ 14 AWS Internal Use Only 7 Representation Form of Data Shortcut method of getting the "1's complement" and "2's complement” of a given numeric value 1’s complement of a given numeric value is the result of the subtraction of each of the digits of this numeric value from 1, as a result, all the “0” and “1” bits of the original bit string are switched. 2’s complement is “1’s complement” + 1 Another shortcut for 2’s complement Page ▪ 15 AWS Internal Use Only Representation Form of Data Page ▪ 16 AWS Internal Use Only 8 Representation Form of Data Generally, 2’s complement notation is often used in fixed point numbers for the following reasons: 1. Subtraction can be substituted for addition. When signed absolute values are used, addition or subtraction must be appropriately selected according to the first bit of each numerical value to be calculated. However, when 2’s complement is used, calculation can be performed without any selection. By using this, subtraction can be represented with addition. By using this concept, a feature for subtraction is not required if computer has the feature for addition and the feature for calculating complement. Moreover, if multiplication is done with iterations of addition, and division is done with iterations of subtractions, four basic arithmetic operations (addition, subtraction, multiplication, division) can be done with addition only. Page ▪ 17 AWS Internal Use Only Representation Form of Data 2. A wide range of numerical values can be represented. When signed absolute values are used, two types of 0s, namely (+0) and (-0), occur. However, there is only one type of 0 when 2’s complement is used. Since the information amount of identical numbers of bits is the same, 2’s complement enables the representation of a wider range of numerical values. o Range of represented numeric values when n-bit binary number is represented by adopting the 1’s complement” method: -(2n-1 – 1) to (2n-1 – 1) o Range of represented numeric values when n-bit binary number is represented by adopting the 2’s complement” method: -(2n-1) to (2n-1 – 1) Page ▪ 18 AWS Internal Use Only 9 Representation Form of Data ❖ Binary Notation. 2) Floating point numbers. If the number of bits that can be used for representing numerical value is decided, the range of numerical values that can be represented with fixed point numbers is limited. Therefore, when a very large numerical value is handled, the number of bits should also be increased accordingly. However, in reality, the number of bits used for recording data cannot be increased infinitely. Therefore, floating point numbers are used to represent extremely large numbers or extremely small numbers below the radix point. Before the explanation of the representation of numbers inside computers, the mechanism of floating point numbers is explained by using decimal numbers. +456,000,000,000 For the ease of explanation, when 1 unit is used for recording a sign or one number, 13 units are required for recording this numerical value as it is. Here, if we convert the numerical value and then record only required information in the following manner, we can represent this numerical value with a total of 6 units. +456,000,000,000 = +0.456x1012 = (-1)0x0.456x1012 *(-1)0 = +1, (-1)1 = -1 Page ▪ 19 AWS Internal Use Only Representation Form of Data This is the concept of floating point numbers. Using this concept, from extremely large numerical values to extremely small numerical values can be represented with the same number of digits. “10” in power of 10 used in representing exponent is called radix. Since a decimal number was used in this explanation, “10” was used as it is easy to understand in terms of moving the radix point. However, since binary numbers are used in computers, either “2” or “16” is used. In reality, the form of representation of floating point numbers used inside computers differ depending on the model. Here, we explain the single-precision floating point number (32-bit format) and double-precision floating point number (64-bit format) standardized as IEEE 754 format. A double-precision floating point number can handle a wider range of numerical value with higher precision. Page ▪ 20 AWS Internal Use Only 10 Representation Form of Data There are several forms of representation of information in each part of the floating point number format. Main forms of representation of each part are as follows: A: Sign (S). This is used for representing the sign of a numerical value. It is 0 for positive values, and 1 for negative values. B: Exponent (E). This is used for representing exponent with respect to radix. The two main representation methods below are used: 2’s complement: In this method, exponent is recorded as a binary number, and 2’s complement is used if it is negative (-). Excess method: This method records a value (i.e., bias value) that is obtained by adding a certain value to the exponent. The value (i.e., bias value) to be added is decided according to the number of bits of exponent. Below is an example of the excess method used with single-precision floating point numbers. Format of exponent Intended format Method name Number of Information Bias value (Typical example) bits amount Excess 127 IEEE 754 format 8 bits 256 types 127 Excess 64 IBM format 7 bits 128 types 64 Page ▪ 21 AWS Internal Use Only Representation Form of Data C: Fraction (mantissa) (M). This is used for representing a numerical value after the radix point. In order to represent the part after the radix point, it is common to perform normalization. In this case, the integer part is either set to 0 or 1 (storing after the omission of 1 from the integer part). When the integer part is set to 0: 0.101101 → Store “101101” in fraction (mantissa) When the integer part is set to 1: 1.011010 → Store “011010” in fraction (mantissa) Page ▪ 22 AWS Internal Use Only 11 Representation Form of Data NORMALIZATION This operation is for maintaining the precision of numerical value by increasing digits that can be used in fraction (mantissa). In most cases, this is done by reducing extra 0s after the radix point. For example, in the case of the decimal number (0.000123456789)10 represented by using a 7-digit fraction (mantissa), if the 7-digit fraction is registered in fraction (mantissa) as is, the following result is obtained. In contrast, if the 7-digit fraction is registered in fraction (mantissa) after normalization, the following result is obtained. In other words, more digits can be represented if normalization is performed (the number of significant digits increase). Therefore, the precision of numerical values becomes high. Page ▪ 23 AWS Internal Use Only Representation Form of Data Example 1: Express the decimal number (1234.625)10 with the single-precision floating point number (32 bits) of IEEE 754 format shown below. Sign (1 bit): Show a positive value with 0, and a negative value with 1. Exponent (8 bits): Show with excess 127 where radix is 2. Fraction (23 bits): Show with normalized expression where integer part is 1. 1) Convert the decimal number (1234.625)10 into a binary number (1234.625)10 = (10011010010.101)2 2) Normalize the binary number determined in 1). Normalized form: +(1.0011010010101)2 x 210 3) Show exponent as a binary number (8 bits) of excess 127. Power of 10 → 10 + 127 = 137 → (10001001)2 4) Describe according to the form of representation. Page ▪ 24 AWS Internal Use Only 12 Representation Form of Data Example 2: Which is the normalized representation of the decimal number 0.375? Here, the numeric value is represented using the 16-bit floating point format and the format is indicated in the figure. The normalization performed is an operation that adjusts the exponent portion in order to eliminate the 0s of higher digit rather than the significant values of the mantissa portion. Page ▪ 25 AWS Internal Use Only Representation Form of Data NORMALIZATION Example 2: Page ▪ 26 AWS Internal Use Only 13 Representation Form of Data ❑ ERROR Error refers to the difference between the actual value and the value represented inside the computer. For example, when (0.1)10 is converted into a binary number, it is represented as follows: (0.1)10 = (0.00011001100110011…)2 The conversion result becomes a recurring fraction where “0011” is repeated an infinite number of times. However, since there is a limit for the number of bits that can be used for representing numerical values inside computers, the only option for storing in fraction (mantissa) is to store after the loop is cut in the middle. If it is 8 bits, only up to (0.00011001)2 can be stored. When this binary number is converted into a decimal numbers, (0.09765625)10 is obtained. In other words, this is different from the original numerical value (0.1)10 by only (0.00234375)10. This is the error. Page ▪ 27 AWS Internal Use Only Representation Form of Data ❖ ROUNDING ERROR Rounding error occurs when the part smaller than the least digit is rounded off, rounded up, or rounded down in order to represent the real number with the effective number of digits in the computer. When a decimal number is converted into a binary number, this kind of error occurs to represent the resulting binary number with the effective number of digits. As a measure against rounding error, there are methods such as minimizing the value of error as much as possible by changing the single precision (32 bits) into double precision (64 bits). ❖ LOSS of TRAILING DIGITS Loss of trailing digits is the error that occurs when an extremely small value is ignored at the time of computing two values: one absolute value is extremely large and the other is extremely small. Page ▪ 28 AWS Internal Use Only 14 Representation Form of Data ❖ CANCELLATION of SIGNIFICANT DIGIT Cancellation of significant digit is the error that occurs because of decline in the number of significant digits that can be trusted as numerical values when calculation is performed between almost equal numerical values. Page ▪ 29 AWS Internal Use Only Representation Form of Data ❖ OVERFLOW, UNDERFLOW Inside a computer, the range of numerical values that can be represented is already decided, because the limited number of bits is used for representing them. Flow is the error that occurs when the calculation results exceed this range of representation. When the calculation results exceed the maximum value of range of representation, it is referred to as overflow, and when it exceeds the minimum value, it is referred to as underflow. When it simply referred to as “flow”, it mostly means overflow. The following 2 indexes are used as the concept of evaluating these errors (whether precision is high or low). Absolute error: This is an index that evaluates on the basis of how large the actual error is. Absolute error = | True value – Computed value (including error) | Relative error: This is an index that evaluates on the basis of the proportion (ratio) of error with respect to true value Relative error = | True value – Computed value (including error) | Page ▪ 30 |True value| 15 Representation Form of Data ❑ SHIFT OPERATION Shift operation is the operation of shifting the position of bit to left or right. Shift operation is used in computation of numerical values, and in changing the position of bits. ❖ Arithmetic shift. It is used when numerical values are computed. It is mainly used in fixed point numbers that represent negative values in 2’s complement. [Rules of arithmetic shift] Do not shift the sign bit. Truncate extra bits that are shifted out as a result of the shift. Store the following bit in the empty bit position that is created as a result of the shift. In the case of left shift: 0 In the case of right shift: Same as the sign bit Page ▪ 31 AWS Internal Use Only Representation Form of Data Binary numbers have weight of power of 2 in each digit. Therefore, even for the same numeral 1, 1 as the second digit and 1 as the third digit have different meanings. 1 as second digit: (10)2 → 21 = 2 1 as third digit: (100)2 → 22 = 4 Because of this, computation rules of arithmetic shift can be summarized as follows: With an arithmetic shift to left by n bits, the value becomes 2n times of the original number. With an arithmetic shift to right by n bits, the value becomes 2-n times of the original number. (It is the value that is obtained by dividing the original value by 2n) Page ▪ 32 AWS Internal Use Only 16 Representation Form of Data ❖ Logical shift. It is used when the position of bits are changed. Its main difference from the arithmetic shift is that it does not treat the sign bit in a special manner. Rules of logical shift: Shift (move) the sign bit as well. Truncate extra bits that are shifted out as a result of the shift. Store 0 in the empty bit position that is created as a result of shift. Page ▪ 33 AWS Internal Use Only Representation Form of Data Rotation shift (circular shift) is a type of logical shift. In rotation shift, the bits shifted out are circulated to the empty positions. Page ▪ 34 AWS Internal Use Only 17 END OF 2ND PART Page ▪ 35 AWS Internal Use Only 18

Use Quizgecko on...
Browser
Browser