Communication Concepts

This first chapter deals with the fundamentals of data communications. We primarily probe and discuss the factors that affect all communications from a "big picture" perspective, so that when the details are presented in later chapters, you will know which niche is being filled. A portion of this chapter—the discussion of data organization, sometimes called a coding (ASCII is an example)—is almost a technical history in itself (more on this subject is provided in appendix B).

Elements

Communication has particular elements. In any communication there must be a source (in data communications, often called the transmitter) and one or more destinations (typically called receivers). The purpose of communication is to transmit data from the source to the destination(s). The data is transmitted through a medium of one kind or another, varying according to the technology used. When we speak, we use audio waves to project sound data through the medium of air. In data communications, the media we use are electrical conductors (usually referred to as copper connections), light pipes (often referred to as fiber-optic cabling), and electromagnetic transmission (usually referred to as wireless or radio).

This book is all about how we move and organize data. (Data itself is useless unless organized, at which time it becomes information.) Data communications topologies are organized as one-to-one (point-to-point), one-to-many (multi-drop), or many-to-many (networked). Figure 1-1 illustrates these three forms of data communications organization. Note that while data communications terms can be couched in technical symbology, the concepts behind them are relatively simple. Members of a network may have defined relationships: they may have no ranking (be peers, all having the same communications value) or they may be ranked (such as master/slave).

Point-to-point means just that: from one point to another, directly from source to destination (whether workstation-to-server or peer-to-peer). Multi-drop topology more closely resembles a network than it does point-to-point topology. In general, multi-drop involves a master station of some kind with slave stations, as opposed to peer stations. (According to the following definition of a network, a multi-drop system can also be classified as a network.) For now, we will define a network simply as three or more stations (whether peers, master/slave, or some other ranking) connected by a common medium through which they may share data. Later in this book we will tighten our definition, dividing networks into wide area, local area, and so on.

Figure 1-1. Forms of Data Communications Organization

Modes

In data communications, there are three modes of transmission: simplex, half-duplex, and duplex (see figure 1-2). These terms may be used to describe all types of communications circuitry or modes of transmission, whether point-to-point, multi-drop, or networked. It is important to understand these three terms because almost all descriptive language pertaining to data communications uses them.

Figure 1-2. Modes of Data Communications

A communications channel may consist of a communications circuit, which can be either a hardware configuration (consisting of hardware components and wiring) or a "virtual" circuit (a channel that consists of software programming for communication channels that are not physically connected). "Virtual circuit" refers more to the process of communication than to the hardware configuration.
NOTES

The differences between a "mode" and a "circuit" are rather arbitrary. However, the reader needs to be aware that not all channels are hardware, and even though a channel is physically capable of operating in a certain mode, it does not mean the channel is being utilized in that mode. For example, a duplex channel could be operated in half-duplex mode. In many cases, the literature still uses the term "full-duplex" when referring to duplex mode.

You should be aware that constraints on the mode a communication channel is capable of using may be due to hardware or software. For example, if the hardware is duplex, the software may constrain it to half-duplex. However, if the hardware cannot support a mode, no amount of software will cause it to support that mode, although it may appear to do so to human observation. An example is a half-duplex system that appears (due to the speed and message attributes) to be duplex to the human user.

The three data communications modes are as follows:

Simplex or Unidirectional Mode. In this mode, communication occurs in one direction only—never in the opposite direction; in figure 1-2 it is from Station A to Station B. The circuit that provided this mode of operation was originally called simplex (in the 1960s telephone industry), but this led to confusion with more current telephony terminology. "Unidirectional" is a more descriptive name for this mode of transmission; however, old habits (and names) are hard to change, so we will use the term simplex in this text (even though we would prefer unidirectional) so that the reader will not be confused when referencing technical data.

Half-Duplex Mode. In this mode, communication may travel in either direction, from A to B or from B to A, but not at the same time. Half-duplex communication functions much like meaningful human conversation: one speaker at a time, with one (or more) listeners.

Duplex Mode. In duplex mode, communication can travel in both directions simultaneously: from A to B and from B to A at the same time.

Serial and Parallel Transmission

Figure 1-3. Serial and Parallel Transmission Concepts

Serial transmission (see figure 1-3) uses one channel (one medium of transmission), and every bit (binary digit, defined below) follows one after the other, much like a group of people marching in single file. Because there is only one channel, the user has to send bits one after the other at a much higher speed in order to achieve the same throughput as parallel transmission.

NOTE

A nibble is 4 bits (a small byte; a byte is 8 bits).

In parallel transmission (regardless of the media employed), the signal must traverse more than one transmission channel, much like a group of people marching in four or more columns abreast. For a given message, four parallel channels can transmit four times as much data as a serial channel running at the same data rate (bits per second). However, parallel transmission over any appreciable distance (based on data rate—the faster the data rate, the greater the effects for a given distance) encounters two serious problems. First, the logistics of having parallel media is sure to increase equipment costs.
Second, ensuring the data's simultaneous reception over some distance (based on data rate—the higher the data rate, the shorter the distance) is technically quite difficult, along with ensuring that cross-talk (a signal from one transmission line being coupled—electrostatically or electromagnetically—onto another) is kept low. Cross-talk increases with signaling rate, so attempting to obtain a faster data rate by using additional parallel conductors becomes increasingly difficult.

Figure 1-4 illustrates what the signals would look like in serial and parallel transmission. Note that for the two 4-bit combinations, it took only two timing periods (t0–t2) to transmit all 8 bits by parallel transmission, whereas the serial transmission took eight timing periods (t0–t8). The reason is that the serial transmission used one media channel, while the parallel transmission used four media channels (Channel 1 for bits A and E; Channel 2 for bits B and F; Channel 3 for bits C and G; and Channel 4 for bits D and H).

Figure 1-4. Serial and Parallel Signals

For these reasons, most data transmissions outside the computer cabinet, with the exception of those at very low speeds, are by serial transmission. In the past, inside the computer cabinet (and indeed up to about 1 meter outside it), the buses used parallel transmission, since the necessity for high-speed data transfer outweighed the cost. However, as bus speeds (the bus data rate) continue to increase, newer technologies, such as Peripheral Component Interconnect Express (PCIe) and Serial AT Attachment (SATA, a serial form of the Integrated Drive Electronics [IDE] interface), use serial transmission. This is because the problem of maintaining transition synchrony over a parallel bus increases drastically as speed increases. Over the past two decades, PC printers have been parallel (they started out as serial) because the signal bit lengths are long enough in duration (the signal is slow enough) to permit parallel transmission over a limited distance. Parallel printer transmission is now being replaced by USB and other technological advances in serial transmission.

Data Organization: Signals

Digital Signals

A digital signal is defined as one with defined discrete states; no other state can exist. Binary digital signals have only two states: a binary digit is either a "1" or a "0." Do not confuse a "0" with nothing. Zero conveys half of the information in a binary signal; think of "1" as "true" or "on" and "0" as "false" or "off," with no allowance for "maybe." All number systems are (by definition) digital, and we use not only binary (base 2) and decimal (base 10) but octal (base 8) and hexadecimal (base 16) as well. Octal and hexadecimal (hex) are used mostly to present numbers in a human-readable form (humans apparently dislike long rows of 1s and 0s when coding or performing data analysis). Binary is used as the signaling means in contemporary data transmission systems because it can be represented by a simple on-off. (A binary digit is contracted to the term "bit.")

Now that we have defined channels and bits, what are we going to send as a signal over our channel? It is usually a pattern of bits. Bits alone are just data. We must organize our data into some form so that it becomes information.
When this is done at higher levels of organization, we may call this organization "protocols" or even "application programming interfaces."

One of the bit patterns to be organized has to represent text, for in order for data organization to yield anything usable to humans, it is necessary to store and present information in a human-readable form. The patterns used for this form of information are called codings. A coding is a generally understood "shorthand" representation of a signal. Codings are not used for secrecy—ciphers are. American Standard Code for Information Interchange (ASCII) is an example of a text coding. A "standard signal" could be called a coding and in fact is referred to as such in the daily work of industry. A standard signal is one that has the approval of the users and/or a standardization agency; it specifies a way to organize data.

This book focuses, in many respects, on how data is organized for different functional tasks. There are many different organizations of digital signals in use, classified as standard (approved by standards organizations; may be open or proprietary, depending on the standardizing agency), open (in general use), and proprietary (limited to a specific organization; i.e., owned wholly by a specific organization). In this book, we will touch on only a few.

Analog Standard Signals

In all areas of data communications, a number of standard signals exist. An analog (or analogue) signal is any continuous signal for which the time-varying feature (variable) of the signal is a representation of some other time-varying quantity; i.e., it is analogous to another time-varying signal (Wikipedia). In short, an analog signal is a model (usually electrical or pneumatic) representing a quantity at any value between a specified set of upper and lower limits (whether temperature, pressure, level, flow, etc.). This contrasts with a binary digital signal, which has only one of two values: 1 or 0.

Perhaps the easiest of these standard signals to visualize is the standard 4–20 mA (milliamp) current loop signal, long used in process measurement and control (see figure 1-5).

Figure 1-5. Process Analog Signals

These signals represent 0% to 100% of the range (full scale) specified. All instrumentation readings could be likened to a meter face, in which any value is allowed between 0 and 100%, perhaps specified in engineering value measurements. An example is a range of 200°C to 500°C. The 4–20 mA standard instrument signal would represent 200°C as 4 mA and 500°C as 20 mA.

These are electrical values for the quantity of energy used in signaling. If you are not familiar with these units, you may not fully understand some of the signal standards and their implications. This text will try to draw the conclusions for you. However, any basic electrical text, particularly one for non-technical people, will provide more than enough background to help you understand the terms used in this book. The 4–20 mA signal is one of the most commonly used industrial communications standards, and it is used in the two-wire current loop that connects devices in an instrument circuit. The reader needs to become familiar with the terms voltage, current, resistance, impedance, capacitive reactance, and inductive reactance. While this is not an electrical manual, these terms are further defined in the appendices. What follows is an abbreviated glossary:

Definitions

Voltage (measured in volts) is the difference in electrical potential (charge) that is necessary prior to any work being performed.
Current (measured in amperes or amps, nominally thousandths of an amp, or milliamps) is what performs the work, as long as there is a difference in potential.

Resistance (measured in ohms) is the opposition a conductor offers to current.

NOTE

The relationship between voltage, current, and resistance is known as Ohm's Law: E (volts) = I (amps) × R (ohms).

Capacitive and inductive reactance are opposite in action and dependent upon the frequency of change.

Impedance (measured in ohms) is the opposition a conductor offers to a changing current value, as well as the steady-state (static) resistance. It is represented as follows: Z = √(R² + (XL − XC)²).

The ISA50 committee (ISA-50.01 standard) established a current loop as a standard because it has low impedance, it has greater immunity to noise (electrical signals that are unwanted) than a high-impedance voltage circuit, and it can power the loop instruments. The receiving devices themselves are high impedance (voltage input) and acquire their input across a 250-ohm resistor (1–5 VDC). This allows a known loading for each receiver in the loop. Conversion between current and voltage is provided by a 250-ohm resistor in the current loop. This simple arrangement allows the two-wire loop both to power the loop instrument and to provide the measurement signal (a numerical sketch of this scaling follows at the end of this section).

Standard signals allow users to pick and choose among vendors with the confidence that the inputs and outputs will be compatible. Standard signals also allow a manufacturer to build instruments for a larger group of users than if each instrument had its own defined set of signals (non-standard).
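To make the loop arithmetic concrete, here is a minimal sketch in Python (the function names are illustrative, not part of any standard): it scales an engineering value onto the 4–20 mA span and applies Ohm's Law to show the 1–5 VDC developed across the 250-ohm receiver resistor.

```python
# A minimal sketch (not from the text) of 4-20 mA current-loop arithmetic:
# scaling an engineering value to loop current, and converting that current
# to the 1-5 VDC seen across the receiver's 250-ohm resistor (Ohm's Law).

def value_to_ma(value, lo, hi):
    """Map an engineering value in [lo, hi] onto the 4-20 mA span."""
    percent = (value - lo) / (hi - lo)      # 0.0 at lo, 1.0 at hi
    return 4.0 + 16.0 * percent             # 4 mA offset plus 16 mA of span

def ma_to_volts(ma, resistor_ohms=250.0):
    """E = I x R: loop current (in mA) across the receiver resistor."""
    return (ma / 1000.0) * resistor_ohms

# The chapter's example range: 200 C reads as 4 mA, 500 C as 20 mA.
for temp_c in (200.0, 350.0, 500.0):
    ma = value_to_ma(temp_c, 200.0, 500.0)
    print(f"{temp_c:6.1f} C -> {ma:5.2f} mA -> {ma_to_volts(ma):4.2f} V")
# Prints 4.00 mA / 1.00 V, 12.00 mA / 3.00 V, and 20.00 mA / 5.00 V.
```

Note that 0% of span is 4 mA, not 0 mA; this "live zero" lets a receiver distinguish a valid minimum reading from a dead (broken) loop.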
Digital Standard Signals

There have been, and still are, both open and proprietary standard digital signals in the telephone and digital data communications areas (we will cover these as we go along). However, in the measurement and control areas, no open, all-digital standard signal has been established as an alternative to the analog 4–20 mA signal described in the last section. The international standard for an industrial fieldbus (IEC 61158) defines eight different fieldbuses, four of which—including PROFIBUS PA and FOUNDATION Fieldbus—are primarily used in process control. (A fieldbus is generally a local area network with its defined signaling protocols and field instruments—control being accomplished in the field instruments rather than in a control room cabinet.) Although all these fieldbuses are set as standards, they are not necessarily compatible with each other. While it appears that some of the fieldbuses have gained a majority of sales in certain niches, the marketplace has yet to determine which one will become the de facto fieldbus standard throughout automation. While not a fieldbus itself, but a physical and signal standard, Ethernet continues to gain market share and acceptance in automation areas; indeed, almost all fieldbuses and even proprietary systems now provide networking services via Ethernet.

Process measurement and control has used, and still uses, several digital communications standards—such as EIA/TIA-232(F) or EIA/TIA-485(A)—that detail placing data on (and removing it from) the media, which is a requirement if one is going to signal via an electrical channel. We will discuss these in chapter 3, "Serial Communications."

NOTE

The Electronic Industries Alliance (formerly the Electronic Industries Association, formerly the Radio & Television Manufacturers Association, formerly … all the way back to 1924) is an association of manufacturers that develops standards. The TIA (Telecommunications Industry Association) is a subdivision of the EIA.

Data Organization: Communications Codes

Communications codes are the components of data organization that are designed for interface with humans. These codes represent letters, numerals, and control actions that are stored and recalled, printed, sorted, and, in general, processed. IBM designed one of the first digital communications codes: the IBM 4 of 8 Code (circa 1947), which allowed error detection on a character-by-character basis. This section presents a cursory review of this and other communications codes.

The IBM 4 of 8 Code was a proprietary code. Other manufacturers typically had their own proprietary codes, all of which were incompatible with each other. In 1963, the U.S. government released the American Standard Code for Information Interchange (ASCII). Several months later, IBM released its own code, an extended version of the 4 of 8 Code: EBCDIC, short for "Extended Binary Coded Decimal Interchange Code" (more on this later). EBCDIC is mathematically related to the 4 of 8 Code. The 4 of 8 Code was uppercase only and allowed just 64 of the 256 bit combinations available with an 8-bit code. EBCDIC had all 256 combinations available to it because it used a cyclic redundancy check character (CRCC) for error detection instead of the parity bit (bit 7) used with the 8-bit (ASCII) character. The U.S. government was (and is) a large buyer of data equipment; thus, ASCII gradually gained acceptance with many vendors (other than IBM), who relinquished their proprietary codings and adopted ASCII.

The IBM 4 of 8 Code

The IBM 4 of 8 Code is illustrated in table 1-1. In this code, there are four 1s (ones) and four 0s (zeros)—that is, 8 bits, or an octet—for each character. This arrangement was used to detect character errors, yet it carried a large overhead; that is, the ratio of bit patterns used for error detection to those used for data transmission is high. Of the 256 patterns available, only about 70 were legal characters; the rest served error detection.

International Telegraphic Alphabet #5

Also known as ASCII, the International Telegraphic Alphabet (ITA) #5 (see table 1-2) does not make use of letter frequencies, as did the Morse code, the ITA2 Telegraph Code, and the IBM 4 of 8 Code. It does, however, use a numerically arranged code, in which the alphabet is in ascending order, which permits sorting operations and the like. ASCII is a 7-bit code. Its arrangement is well thought out, particularly in light of the technology available at the time it was designed. It was usually (and is now) transmitted as an 8-bit signal with the "most significant bit" (B7) reserved for parity—an error-detection scheme. The parity bit was used for a number of years; however, the technological advantages of using an 8-bit data byte (particularly for sending program code) eliminated the need to use the B7 bit for parity.
Today when parity is used, another bit is added to the 8 bits as the parity bit, making the signal 9 bits in length (11 bits when the start and stop bits are added). For our present purposes, we will set B7 to 0 and concern ourselves only with B0 through B6.

Reading table 1-2 is simple, though it takes some getting used to. First, notice the representations of the bit order: B4, B5, and B6 are grouped as three bits and so have a decimal value between 0 and 7 (same as hex); the lower four bits (B0 through B3) are a group of 4 bits and have a decimal value of 0 to 15 (or hex 0 through F). (For a thorough discussion of number systems, including hexadecimal, refer to appendix A.) The most significant bit (MSB) is B6. B6, B5, and B4 are read vertically. Bits 3 through 0 are read horizontally.

As an example, what is the coding for the uppercase C? Locate the uppercase C; trace the column vertically and find that the first three bits are (B6) 1, (B5) 0, and (B4) 0: 100. Locate the uppercase C again and travel horizontally to obtain the value of bits 3 through 0, or (B3) 0, (B2) 0, (B1) 1, and (B0) 1: 0011. Put the bits together (B6 through B0) and you have 1000011 for an uppercase C (hex 43). So, what would a lowercase c be? Follow the same procedure and you will determine that it is 1100011 (hex 63).

The difference between uppercase and lowercase is bit 5. If B5 is a 0, the character will be an uppercase letter; if B5 is a 1, the character will be lowercase. When B6 is a 0 and B5 is a 1, the characters are numerals and punctuation. In hex, the same rule reads: to obtain the lowercase letter from an uppercase value, add hex 20; to obtain an uppercase letter from a lowercase value, subtract hex 20. If both B6 and B5 are 0, it is a non-printing (control) character. Control characters were based on the technology of the time. Table 1-3 gives the assigned meaning of these characters. The one exception to these rules is DEL (delete), which is composed of all 1s. The reason for using all 1s can be traced back to ASCII's beginnings, when the main storage medium was paper tape. If you made a mistake punching a paper tape (a 1 was a hole in the tape, a 0 was no hole), you backed the tape up to the offending character and pressed the delete key, which punched all holes in the tape, masking whatever punching had been there.
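The bit arithmetic just described is easy to verify. The following minimal sketch (in Python; not from this text) prints the bit patterns for C and c and shows that toggling B5, that is, adding or subtracting hex 20, switches case.

```python
# A minimal sketch (not from the text) of the ASCII bit arithmetic above:
# bit 5 (hex 20) is the only difference between uppercase and lowercase.

def show_bits(ch):
    code = ord(ch)                 # 7-bit ASCII value of the character
    print(f"{ch!r}: binary {code:07b}, hex {code:02X}")

show_bits("C")                     # 'C': binary 1000011, hex 43
show_bits("c")                     # 'c': binary 1100011, hex 63

# Toggling B5 switches case: C -> c and c -> C.
print(chr(ord("C") | 0x20))        # set B5 (add hex 20)      -> 'c'
print(chr(ord("c") & ~0x20))       # clear B5 (subtract hex 20) -> 'C'
```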
Extended Binary Coded Decimal Interchange Code

As described previously, the Extended Binary Coded Decimal Interchange Code (EBCDIC) was developed in the early 1960s by IBM from its 4 of 8 Code, and it is proprietary to IBM. ASCII has only seven bits, the eighth bit being reserved for parity. One problem in using only 7 bits plus 1 bit for parity arises when computers transmit program instruction coding. Computers normally operate using an 8-bit octet or a multiple of 8 (i.e., a "word" of 16 bits, or a "double word" of 32 bits). All 256 possible 8-bit combinations may not be used, but it is likely that the computer's instruction set would use the eighth bit. With the "7 for information + 1 for parity" bit scheme, the eighth bit isn't available. Most computers using ASCII transmit 8 bits and use a ninth bit for parity, if parity is used.

In EBCDIC, the blank spaces are used for special or graphic characters particular to the device using them. If required, this code can transmit "object" code—that is, the combinations of 1s and 0s used to program a computer in 8-bit increments—with little difficulty. EBCDIC did not require a parity bit for error detection but instead used a different error-detection scheme—the CRCC.

One of the first interface problems of the PC age was how to perform PC-to-mainframe communications. By and large, personal computers (PCs) and other devices transmitted in ASCII, while many of IBM's minicomputers and mainframe computers used EBCDIC. This meant that they could not talk directly without some form of translator, generally a software program or firmware in a protocol converter. While such a conversion was not difficult to perform, it was another step that consumed both memory and CPU cycles.

Unicode

Character coding by electrical means originally became established with the Morse code, in which letter frequencies and a form of compression were used for efficiency. Fixed-length character representations, of which ASCII and EBCDIC are examples, came about in the early 1960s and are still in wide use today. However, ASCII was designed for English (American English at that), and, as communications are now global, there are many languages that require more than the 128 patterns (7 bits) it provides. Today, Unicode is an industry standard for encoding, representing, and handling text as expressed in the majority of writing systems. The latest version of Unicode has more than 100,000 characters, covering multiple symbol sets. As of June 2015, the most recent version is Unicode 8.0, and the standard is maintained by the Unicode Consortium (Wikipedia). Instead of representing a character with 7 or 8 bits, Unicode uses 16 bits. For users of ASCII, this is not a problem (as long as the upper 8, more significant, bits of the 16 total are accounted for) because the Unicode conversion for ASCII is hex 00 + ASCII. In other words, the upper eight bits are set to 0 and the ASCII character set follows.

The following is excerpted from Microsoft Corporation's Help file for Visual Studio 6 (MSDN Library, April 2000): "Code elements are grouped logically throughout the range of code values, which is called the codespace. The coding begins at U+0000 with standard ASCII characters, and then continues with Greek, Cyrillic, Hebrew, Arabic, Indic, and other scripts. Then symbols and punctuation are inserted, followed by Hiragana, Katakana, and Bopomofo. The complete set of modern Hangul appears next, followed by the unified ideographs. The end of the codespace contains code values that are reserved …"

Data Organization: Error Coding

How we organize our data has a lot to do with how we recognize and correct transmission errors. These errors are alterations to our intended signal and are caused by noise or other media properties. For example, when listening to a music selection on a DVD, a scratch on the DVD surface (greater than the DVD player's error-correcting abilities) will change the music (not necessarily improving it). For the purposes of this discussion, we will assume that the data to be transmitted is correct; the matter of erroneous data is not within the scope of this book.

Any time you place data on a medium and take it off, there is a probability of error. Whether the probability of an error is high or low depends on the media and signal properties. After the data has been taken off of the medium, we can detect errors only if the data was organized to detect errors prior to transmission. The IBM 4 of 8 Code used an early form of error detection: if any character had more or fewer than four 1s in its 8 bits, it was in error.
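As a quick sketch of this character-level check (in Python; the function name is illustrative, not from this text), a received octet is accepted only if exactly four of its bits are 1s:

```python
# A minimal sketch (not from the text) of 4 of 8 Code error detection:
# a received octet is legal only if exactly four of its 8 bits are 1s.

def four_of_eight_ok(octet):
    """Return True if the 8-bit value has exactly four 1 bits."""
    return bin(octet & 0xFF).count("1") == 4

print(four_of_eight_ok(0b11001010))   # True: four 1s, a legal character
print(four_of_eight_ok(0b11011010))   # False: five 1s, detected as an error

# Only 70 of the 256 possible octets pass this test (8 choose 4 = 70).
print(sum(four_of_eight_ok(n) for n in range(256)))   # 70
```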
The entire block (84 characters) would then be retransmitted. Parity is an extension of this concept.

Parity

Parity is a means of error detection. Using 7 bits of an 8-bit structure, as ASCII does, leaves 1 bit extra (in most cases B7 is set to 0); when the full 8 bits carry data, the parity bit is added to the 8-bit character, making it 9 bits. Parity (in the telecommunications sense of the word) means counting the number of 1s in a character (the 0s could be counted, but traditionally only the 1s were). The agency determining the system's operating specifications will have decided which parity to use: "odd" or "even." If "odd" parity is used (see table 1-4), then the parity bit will be whatever value is required to ensure an odd number of 1s in the character (including the parity bit).

Block Parity

The vertical character format mentioned previously is based on the punched cards used in early data entry and programming. Because punched cards (at least IBM's cards) contained 80 columns, and historically most data was on these cards, transmissions were in an 80-character "block." (The 80 columns of data were known as a record.) Two framing characters were added at the start of the data block and one framing character was added at the end. An 84th character, called the block parity character, was added to the end of the transmitted block; it was computed from the other 83 characters. Table 1-6 illustrates the 84-column vertical format.

Parity was first determined vertically for each of the block's 83 characters. Next, parity was determined horizontally for each bit position by counting the number of 1s in each row across the 83 counted columns (the 84th column is not counted). The horizontal parity bits were placed in the 84th column, thus creating the block parity character. The 1s in the 84th column were then counted to determine the vertical parity bit for the block parity character, the same process used for every other character. To verify the block, parity was determined horizontally for the parity row by counting the 1s in the 83 counted columns, and the result was compared to the vertical parity bit of the block parity character. The two must match; if not, there was an error. This scheme is known as vertical and horizontal parity checking, or block parity (a sketch of these computations follows at the end of this subsection).

With block parity, fewer than one in 10,000 errors remain undetected. However, this scheme exacts a heavy penalty in overhead: one parity bit for each character adds up to 84 bits, or more than ten 8-bit characters.
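Here is the promised sketch of these computations (in Python; simplified, and not from this text). A short message stands in for the 80-character record, odd parity is assumed, and the framing characters are omitted.

```python
# A minimal sketch (not from the text) of odd parity and vertical/horizontal
# block parity. A short message stands in for the 80-character punched-card
# record, and the framing characters are omitted for brevity.

def odd_parity_bit(bits):
    """Parity bit value that makes the total number of 1s odd."""
    return 0 if sum(bits) % 2 == 1 else 1

def to_bits(ch):
    """A 7-bit ASCII character as a list of bits, B6 first."""
    return [(ord(ch) >> i) & 1 for i in range(6, -1, -1)]

columns = [to_bits(ch) for ch in "HELLO"]

# Vertical parity: one bit per character (per column).
vertical = [odd_parity_bit(col) for col in columns]

# Horizontal parity: one bit per bit row, across all counted columns;
# together these bits form the block parity character.
block_char = [odd_parity_bit([col[row] for col in columns]) for row in range(7)]

# A single flipped bit breaks both its column parity and its row parity,
# so the receiver can detect (and locate) it.
columns[2][4] ^= 1
print(odd_parity_bit(columns[2]) == vertical[2])                 # False
print(odd_parity_bit([c[4] for c in columns]) == block_char[4])  # False
```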
Error Correction

In addition to having bits devoted to assisting with error detection, what means could be used to correct an error, once one was detected? The answer is normally an automatic retransmission (repeat) query (reQuest), known as ARQ (explained later in this chapter). This method had significant ramifications in many applications, since the transmitting device had to store at least the last transmitted block. (When using horizontal and vertical parity, data has to be transmitted in blocks.) There also had to be some scheme to notify the transmitter that the receiver had successfully received the transmission or that it was necessary to retransmit the block due to error.

In most cases, half-duplex would be too inefficient in terms of transmission time versus line-setup time (i.e., the time it takes for the transmitting device and the receiving device to establish connection and synchronization). Over time (less than 60 years), several schemes have been devised to increase the "throughput" (defined here as the number of bits correctly received at the end device). Today, many schemes use a block that varies in length depending on the number of errors detected. In transmissions that have few errors, such as those over optical fiber or a local area network (LAN), the block can be made longer. In media that have frequent errors, such as shortwave radio, the block can be made shorter.

Blocks are generally of a fixed length, or a multiple thereof. Packets, on the other hand, have a fixed minimum and maximum length but can vary in length between these two limits. We normally speak of packet transmission in modern data communications, although the difference between a block and a packet is now more one of semantics than of practice. (It is worth noting that the longer the block/packet, the greater the chance of an undetected error, and that parity and block parity check schemes were far less effective against multiple errored bits than against single-bit errors. This is the reason the CRCC method is superior in terms of efficient use of bits for error detection, particularly for multiple errored bits.)

Cyclic Redundancy Checks

Any time data is placed on a medium (e.g., wireless, unshielded twisted pair [UTP], magnetic or optical rotating media, and fiber cable), there is a probability of error; thus, an error-detection scheme must be used. Most modern devices use a far more efficient scheme of error detection than parity checking. Though ASCII was designed for a vertical and horizontal (at times called longitudinal) block-checking code, it can also be used with a cyclic redundancy check (CRC) scheme that creates a check character. A CRC character (CRCC) is the check character, nominally one or more 8-bit octets, developed by the CRC. A cyclic code divides the text bits by a binary polynomial (the CRC), resulting in a check character (today, nominally 16, 32, or 64 bits; that is, 2, 4, or 8 characters). This is done by processing every bit in a serial bit stream (the bits that make up a packet) and combining data from selected (and fixed) bit positions. Figure 1-6 illustrates the generation of a simple CRCC, which is sometimes (incorrectly) called a checksum.

A number of different CRCCs are used today. The primary difference between them is the "pick-off points" (the powers of x in the representative equation; the CRCC-CCITT [International Telegraph and Telephone Consultative Committee, or Comité Consultatif International Téléphonique et Télégraphique], for example, uses G(x) = x^16 + x^12 + x^5 + 1). The advantage of one set of pick-off points or another depends on the application. For example, a communications channel may have different error behavior than a magnetic medium, such as a hard drive. In any event, the transmission protocols used in industrial data communications will probably use the CRCC-32 scheme (a local area network CRC of 32 bits). Within a given size block (packet or frame), a check character or characters of a given size will be generated. A common type (CRC-CCITT) uses two 8-bit octets, forming a 16-bit check character. The CRCC characters are generated during transmission and added to the transmitted packet.
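To show what "dividing the text bits by a binary polynomial" looks like in practice, here is a minimal bitwise sketch (in Python; not from this text) of a 16-bit CRC using the CCITT generator G(x) = x^16 + x^12 + x^5 + 1. The initial register value of 0xFFFF is one common convention; real implementations are usually table-driven or done in hardware for speed.

```python
# A minimal sketch (not from the text) of a bitwise CRC-16 using the CCITT
# generator polynomial x^16 + x^12 + x^5 + 1, written as the mask 0x1021.

def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Compute a 16-bit CRC over the data, one bit at a time."""
    for byte in data:
        crc ^= byte << 8                      # bring the next octet into the register
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift, then "divide" by G(x)
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

frame = b"123456789"
fcs = crc16_ccitt(frame)
print(hex(fcs))                               # 0x29b1 for this well-known test input

# The receiver recomputes the CRCC over the received octets and compares;
# a mismatch signals a detected error.
assert crc16_ccitt(frame) == fcs
```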
The receiving end receives the bit stream and computes its own CRCCs. These must match the transmitted CRCCs, or there is a detected error. Note that even on a block of 80 characters, this scheme uses only 16 bits, in comparison to the 88 bits used with the vertical and horizontal parity check. The CRCC method is used for writing to a disk drive or to almost any magnetic medium, and it is the method most often used (whether the CRC is 16-bit or 32-bit) in modern data communications.

Checksum

Any of a number of error-detection codes may be called a checksum. Many times the CRC-16 or CRC-CCITT is called a checksum; however, these are actually cyclic codes, whereas a checksum was originally designed as a linear code. For instance, all of the 1 states in a block may be totaled to obtain a checksum. This usually happens through a process known as modulo (mod) addition, where a number is divided by the modulo number but only the remainder is kept. An example is 11 mod 4: 11 divided by 4 equals 2 with a remainder of 3, so the answer is 3. This checksum is tacked on as a block check character or characters (a short sketch follows at the end of this subsection). The efficiency of a checksum in detecting errors is not as high as that of a cyclic code, yet the circuitry to produce the checksum is less complex.

NOTE

The abbreviation for modulo varies depending on the programming language. In COBOL and older programming languages, modulo is abbreviated MOD (11 MOD 4). In C-style languages, the abbreviation is the percent symbol (11 % 4).

There are other error-detecting codes besides the checksum and the CRCC. These codes exist specifically for detecting and (sometimes) correcting errors. Some of these codes are cyclic, some are linear, some are quasi-cyclic, and some are polynomial. Some are sufficiently long in pattern length to be used for both error detection and correction.
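The short sketch promised above (in Python; not from this text) computes a modulo-256 checksum and illustrates the weakness of a linear code: compensating errors can cancel and escape detection, something a cyclic code is far better at catching.

```python
# A minimal sketch (not from the text) of a linear checksum: total the
# octets of a block, keep only the remainder modulo 256, and append that
# value as the block check character.

def checksum_mod256(block: bytes) -> int:
    """Modulo-256 sum of all octets in the block."""
    return sum(block) % 256

block = b"PROCESS DATA"
bcc = checksum_mod256(block)
print(bcc)                                # block check character, 0-255

# The receiver totals the received octets the same way and compares.
# The weakness relative to a CRC: compensating errors cancel out.
corrupted = bytes([block[0] + 1, block[1] - 1]) + block[2:]
print(checksum_mod256(corrupted) == bcc)  # True: this error goes undetected
```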
ARQ

What hasn't yet been explained is what happens when an error is detected (at least in general terms). The following is a simplified discussion of a generalized automatic retransmission (repeat) query (reQuest), the ARQ method of error correction.

Before beginning our discussion, it is important to note a change in terminology, as multiple terms are used to refer to the data being transmitted. Prior to HDLC, data sent (typically asynchronously) was delimited by control characters and was referred to as data blocks (usually an 80-character [IBM punched card] record). There would be a set of delimiters at the start (DLE or STX) and an ETX and block parity character at the end of the data, for a total of 84 characters. As time went by and data began to be sent by bit-oriented protocols (HDLC), these blocks of data were called packets (about 1970). Since 1990, the terminology has changed again (but certainly not by everybody); the Layer 2 octets are called frames (framed by the protocol standard), and the arrangement of data in Layers 3 and above is referred to as protocol data units (PDUs). Of course, some still refer to the Layer 2 octets as "packets" when they mean "frames."

In normal operation, the transmitter sends a frame (a multiple set of octets) to the receiver. The frame is received (we are using a CRCC in this example as the error-detection method; methods may vary, but the ARQ system is basically the same). If the CRCCs match, the receiver sends an ACK (acknowledgment). If they do not match, the receiver sends a NAK (negative acknowledgment), and the transmitter then retransmits the frame. This is illustrated in figure 1-7.

If an occasion should occur where the frame is so badly corrupted that the receiver does not recognize it as a frame, the receiver will send nothing. After sending a frame, the transmitter waits for a response (in many systems, this wait is a software-adjustable period of time). If a response does not arrive within the expected time, the transmitter assumes the worst and resends the frame. This condition (not getting a response in time) is called a timeout and is illustrated in figure 1-8.

On the other hand, suppose that the frame is received correctly and an ACK is returned. What if the ACK is "bombed" (or lost)? If this happens, the transmitter will time out and resend the frame. The receiver will detect that it has two frames with the same frame number and that both were received correctly. It will then drop one of the frames and send another ACK, as shown in figure 1-9.

There are many ways to enhance error correction via ARQ, and they are accomplished in individual protocols. Determining how many frames can be transmitted prior to receiving an ACK/NAK has ramifications for throughput. The more frames transmitted prior to an ACK/NAK, the greater the throughput; however, more frames must be stored after being received, so the memory requirements are greater.

Forward Error Correction

The ARQ method of error correction discussed above detects an error (by whatever means) and then retransmits the portion of the message that was in error. But what happens when there is simplex (unidirectional) transmission? The ARQ method will never receive an ACK/NAK because the circuit is unidirectional, so ARQ cannot be used on a simplex circuit. In this case, or in cases where the medium is too noisy to allow any significant throughput, forward error correction (FEC) may be employed. FEC requires that the communications network's pattern of errors be analyzed and an error-correcting algorithm be designed to fit the identified error patterns. This can be done for most real-world circuits without extraordinary difficulty. The main problem is that the error-detection and location overhead can be as much as 1 parity bit (or error-detecting symbol) for each data bit. Using this approach would immediately cut throughput in half.

In some cases, however, the data must be 100% error-free (without regard to capital expense or throughput) because the transmission may be the only meaningful transmission sent or the only time given to the circuit for any throughput to occur. (Think of real-time process data, satellite tracking data, nuclear plant critical monitoring, and so on.) For these applications, FEC is appropriately employed. A thorough discussion of error-correcting algorithmic codes and associated theory is beyond the scope of this text; however, many good references, at many different knowledge levels, are available on the subject. Error correction is integral to data communications. Knowing that an error exists is not the end goal; the goal is to detect an error and determine how to correct it while ensuring data integrity.

Data Organization: Protocol Concepts

A protocol is an accepted procedure. We use protocols in personal interactions all the time. When you are introduced to someone in the United States, the protocol is to shake hands, unless there is a ceremony requiring another form of protocol or you are being introduced to a large number of people.
In a class, or throughout a conversation, there is a protocol that tells you to wait until another person is finished speaking before speaking yourself. There is a physiological reason for this protocol: humans are effectively half-duplex—they typically cannot talk and listen very well at the same time.

Terminologies

Asynchronous generally means that something may occur at any time and its occurrence is not tied to any other event. The old "start-stop" teletypewriter signal (see appendix B), with its 1 start bit and 1 stop bit, is a good example of an asynchronous transmission method. The teletypewriter signal started out using motor speed (see appendix B) as the main means of synchronizing the start-stop bits within each character. In today's vernacular, any start-stop signal (which is seldom used in today's world) is assumed to be asynchronous. In actuality, a start-stop signal may be transmitted synchronously or asynchronously.

Synchronous generally means "tied to a common clock." The clock signal is typically transmitted along with the data, usually in the transitions of the data from one state to another. Since synchronous transmission uses bit timing (the device clocks [receive and transmit] are in synchrony with the changes in bit value [from 0 to 1, or 1 to 0]), each bit of data must be accounted for. Almost all modern data communications are synchronous transmissions.

The terms baud rate and bits per second are often used interchangeably; this is incorrect. Baud rate is the line modulation rate; that is, the number of signal changes (decisions such as a one to a zero, a zero to a one, or a number of degrees leading or lagging the reference phase) placed on the communication medium in a given time (the maximum number of changes supported by the medium is called the bandwidth of the channel). Bits per second (bps), on the other hand, is the data transmission rate of the device that is transmitting, or the rate a device is capable of receiving. Depending on the type of signaling and encoding used, the baud rate (the line modulation rate) and the bit rate (generally the throughput speed) may be quite different. Baud describes changes in a transmission medium's signaling state, not a line speed.

Ads for a "33.6 Kbaud modem" are in error; such a modem would require a line with a bandwidth great enough for 1/33,600-second rectangular pulses (not possible with a voice-grade wireline). What this specification means in reality is that these are 33,600-bit-per-second (33.6 Kbps) modems. Such modems use a line carrying 1,200 signal changes per second in total (600 baud in each direction) and achieve their higher data rate by having each signal state represent 56 bits of data. That is, they make one line state change (baud) for each 56 bits of data (one change of phase or amplitude for each 56-bit combination). This is accomplished through a high-level coding technique called trellis encoding. In other words, modems running at 33.6 Kbps (kilobits per second) require a 600 baud line in each direction (a total of 1,200), rather than one that supports 33,600 signal changes per second in each direction. A standard dial-up telephone line can support a total of 1,200 signal changes per second (about 3.3 kHz bandwidth). Some refer to the baud rate as symbols per second (see appendix C).
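The modem arithmetic above reduces to a one-line relationship: bit rate equals baud rate times the number of bits each signal change represents. A minimal sketch (in Python; not from this text):

```python
# A minimal sketch (not from the text) of the baud-versus-bps arithmetic:
# bit rate = baud rate x bits encoded per signal change.

def bits_per_second(baud_rate, bits_per_symbol):
    """Data rate given the line modulation rate and bits per line state."""
    return baud_rate * bits_per_symbol

# The chapter's modem example: 600 baud per direction, 56 bits per change.
print(bits_per_second(600, 56))      # 33600 bps (a "33.6 Kbps" modem)

# A signal with only two line states (one bit per change) is the special
# case where baud rate and bits per second happen to be equal.
print(bits_per_second(1200, 1))      # 1200 bps on a 1200 baud line
```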
Protocols

As electronic message handling became the norm, it became imperative to build as much of the communications link control into the terminals as possible. To this end, "communications protocols" arose. Most protocols were developed by individual vendors for their own systems. Since these protocols are generally incompatible with the protocols and equipment of other vendors, the customer is generally locked into one manufacturer. All data communications efforts involve protocols of one kind or another. A protocol is a different organization of data, as opposed to shaping groups of 1s and 0s into characters. Communications protocols are either character-based or bit-oriented. This means that the information we are looking for will take the form either of characters telling us information or of bit patterns (other than characters) telling us information.

Character-Based Protocol

One example of a character-based protocol is Binary Synchronous Communication (Bi-Sync); another is the teletypewriter system (see appendix B). The IBM Bi-Sync protocol, one of the first and most widely used of the proprietary protocols, was developed to link the IBM 3270 line of terminals to IBM computers in a synchronous manner. (This protocol may also be used in a system with asynchronous signaling; that is, with start-stop characters, provided the text mode is used.) The Bi-Sync protocol is character-based; control depends on certain character combinations, rather than on bit patterns. It also requires that the transmitting terminal be able to store at least one block of data while transmitting another.

When transmitting under Bi-Sync, the hardware is responsible for avoiding long strings of 1s or 0s. In a synchronous system, if a significant length of time is used to transmit only one state or the other, the receiver loses its bit synchrony. As a result, communication is disrupted or, at least, is in error. Most modern hardware uses a scrambler to ensure data state transitions. The scrambler's contribution is removed at the receiver, since the scrambler pattern is applied on a scheduled (known) basis. In addition, most phase-continuous systems use Manchester encoding, which moves the state of the bit (1 or 0) into a transition (a transition from 1 to 0 indicates a 1 state, while a transition from 0 to 1 indicates a 0 state) rather than a level (a voltage state representing 1 or 0). Manchester encoding is explained in detail later.

Table 1-7 lists some of the Bi-Sync control characters. Note that the ASCII control characters are used if Bi-Sync is used in an ASCII-based system. Bi-Sync normally uses duplex transmission (both directions simultaneously). Although it could use half-duplex, it was not primarily intended for that type of operation. Before duplex Bi-Sync was used, a block (packet or frame) would be transmitted, the line would be turned around (the transmit and receive modems would swap functions and be resynchronized), and then the original transmitter (now a receiver) would wait for the original receiver (now a transmitter) to send either an ACK or a NAK. If a NAK was received, the line would be turned around again (swapping the transmit and receive functions) and the original transmitter would retransmit the block. Duplex operation would not have been faster, except for eliminating the turnaround time, because the transmitter could do nothing else until it received a response to its transmitted block. To speed things up a bit, the Bi-Sync protocol had the transmitter store two blocks, so it could wait for an ACK while transmitting the second block. ACK1 and ACK2 signals were used to differentiate between the ACKs.
The primary benefit in this case was that transmission could take place without line turnaround (duplex) or without halting transmission until an ACK was received. It should not be forgotten that when Bi-Sync was developed (1967), the cost of data storage was approximately one US dollar per bit in 1980 dollars—a significant expense.

Because the control characters are intertwined with the text, they must be sent in pairs to ensure that they are identified. How would you transmit a machine or computer program in object (machine-executable) form if that code were composed of 8-bit octets, some of which may very well be the same as the control codes? To make this possible, Bi-Sync allows a transparent mode, in which control characters are ignored until the receiver detects several data link escape (DLE) characters.

Bi-Sync is dependent upon character-oriented codes. In modern communications, there is a need to make the protocol independent of the transmitted message type. In other words, it should make no difference to the protocol what bit patterns the message consists of, or even what language it is composed in, as long as it is in 8-bit octets. The Link Access Procedure-Balanced (LAP-B) protocol (described later in this chapter) and other bit-oriented protocols provide that functionality rather gracefully.

Bit-Oriented Protocol

Bit-oriented protocols use a concept called framing, in which there are bit patterns before and (in some schemes) after the information field (which is why it is called a frame now, rather than a packet or block). In framing, there is a binary pattern (which the protocol ensures cannot occur elsewhere in the bit stream) that is the start delimiter (starting point). There are also binary patterns that indicate the addressing and what type of frame it is (e.g., the frame contains information or is of a supervisory nature), as well as some method of sequence numbering, followed by the user data. A frame check sequence (FCS), normally a CRCC, follows the user data, which is typically a variable number of octets. The user data is surrounded by the protocol; that is, the protocol "frames" the user data (see figure 1-10). There may also be a stop delimiter, or the frame may use the CRCC as the delimiter.

Link Access Procedure-Balanced (LAP-B) is a bit-oriented protocol. It is very similar in both structure and format to other bit-oriented protocols: High-Level Data Link Control (ISO HDLC), Advanced Data Communications Control Procedure (ANSI ADCCP), and IBM's Synchronous Data Link Control (SDLC). IBM uses SDLC, which is a subset of HDLC, in its Systems Network Architecture (SNA). LAP-B, ADCCP, and HDLC are quite similar; for that reason, only LAP-B will be discussed here.

Figure 1-11 illustrates a LAP-B frame. Notice that it is bounded by "flag" characters; in other words, the flags "frame" the data. The flag is an 8-bit octet, starting with a 0, followed by six 1s, and ending with a 0 (01111110). It is inserted at the beginning and end of each transmitted frame. The protocol allows the frame to have this pattern only at its start and end. It does so by using a technique called zero insertion, or bit stuffing.

Zero Insertion. In normal transmission, any time the protocol detects five consecutive 1 bits in the data stream, the rule is to insert a 0. The receiver protocol, upon detecting five consecutive 1s, knows to remove the 0 that follows. It is that simple. Figure 1-12 illustrates 0 insertion and removal.
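A minimal sketch of zero insertion and removal (in Python; simplified—real HDLC-family hardware stuffs only between the flags—and not from this text):

```python
# A minimal sketch (not from the text) of zero insertion (bit stuffing):
# after five consecutive 1s the transmitter inserts a 0, so the flag
# pattern 01111110 can never appear inside the frame's bit stream.

def stuff(bits):
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 5:                 # five 1s in a row: insert a 0
            out.append(0)
            run = 0
    return out

def unstuff(bits):
    out, run, i = [], 0, 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 5:                 # the bit after five 1s is a stuffed 0
            i += 1                   # skip it
            run = 0
        i += 1
    return out

data = [0, 1, 1, 1, 1, 1, 1, 0]      # user data that looks like a flag
sent = stuff(data)
print(sent)                          # [0, 1, 1, 1, 1, 1, 0, 1, 0]
print(unstuff(sent) == data)         # True: the receiver restores the data
```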
The LAP-B protocol will allow reception of up to 7 or 128 frames, depending on the system requirements, before it must have the first frame acknowledged. Medium-induced errors are detected through the frame check sequence (FCS) and identified by frame number. An error will destroy the frame number: the destination receiver ignores an errored frame, and when the next good frame arrives, it notifies the transmitter that this frame is not the one it anticipated. The receiver will request retransmission of the damaged frame, and perhaps all subsequent frames, depending on what type of system is in operation. Note that the data carried in the frame has no bearing on the protocol, as the protocol is concerned only with the individual bits in a frame, not with what their ultimate meaning will be. We will discuss bit-oriented protocols in more detail in chapters 2 and 7.

Protocol Summary

We have looked at two protocols, one character-based and one bit-oriented. We left out much of the inner workings and subsequent details in order to illustrate only the salient points of some protocol concepts. As you proceed through the next chapter, you will find that the protocols discussed here were specifically Data Link protocols; these bit-oriented protocols will be discussed again in chapter 2. For this chapter, it is only important that you see how 1s and 0s are organized to frame the characters you are transmitting.

Character-based protocol commands are directly related to the data within the frame. Certain characters must be sent twice, or some other method must be used to distinguish control characters sent as text. It is an older method, and bit-oriented protocols have generally replaced character-oriented protocols. Bit-oriented protocols do not depend on the data stream contents; they operate independently of them. Such a protocol recognizes certain bit patterns that it does not allow to occur in the data (e.g., the LAP-B flag) as it frames and error-checks packets (frames). LAP-B may be used in any system, but it is primarily used in point-to-point and multi-drop systems.

Framing the data usually means having some form of start delimiter (flag), some sort of addressing and control process, the actual user data, an error check on the frame, and a stop delimiter (if the error check is not the stop delimiter itself). In lieu of a stop delimiter, some protocols count the octets in the frame and provide a frame length count (for variable-length frames); other protocols send frames of a fixed length. As the user data is bookended by the framing, the user data is said to be encapsulated. Figure 1-10 illustrated the general frame; take note of its similarity to the LAP-B frame in figure 1-11. In the end, the user data is a series of organized 1s and 0s, and a bit-oriented protocol knows only how many bits it requires, not what they represent.

Summary

In this chapter we have reviewed the organization of data, as organized into characters (for transmission, storage, and presentation to humans) and as organized to detect errors in transmission. The first data transmission code (IBM's 4 of 8 Code) devoted more overhead to the error-detection scheme than to the data transmitted. The use of cyclic redundancy codes has minimized the necessary overhead while enhancing the accuracy of error detection.
More than anything, this chapter has served to introduce you to the foundational concepts of data communications and how communications are organized into modes of transmission, into character codes, and into protocols. In the following chapters, we will organize data further into information (protocols and such) and implement what we have discussed in this chapter. The key point to take from this chapter is that data—1s and 0s—means nothing unless it is organized into some accepted structure that enables it to become useful information.

Bibliography

Please note that when Internet references are shown in this book, the address was valid at the time the chapter was created. Websites come and go, so occasionally one of the references will no longer work. It is then best to use a search engine to locate the topic. The web addresses are shown to provide credit for the information referenced.

Keogh, J. Essential Guide to Networking. Upper Saddle River: Prentice Hall, 2001.
Microsoft Corp. Unicode. MSDN Library, April 2000.
Peterson, W. W., and E. J. Weldon, eds. Error Correcting Codes. 2nd ed. Boston: MIT Press, 1988.
Sveum, M. E. Data Communications: An Overview. Upper Saddle River: Prentice Hall, 2001.
Thompson, L. Electronic Controllers. Research Triangle Park: ISA, 1989.
Thompson, L. Industrial Data Communications. 4th ed. Research Triangle Park: ISA, 2006.
Wikipedia. Various pages.