Communication Concepts, Chapter 1 (Condensed, Part 2)
Anytime you place data on a medium and take it off, there is a probability of error. Whether the probability of an error is high or low depends on the medium and signal properties. After the data has been taken off the medium, we can only detect errors if the data was organized in a way that allows errors to be detected prior to transmission. The IBM 4 of 8 Code used an early form of error detection: if any character had more or fewer than four 1s in its 8 bits, it was in error. The entire block (84 characters) would then be retransmitted.

Parity is an extension of this concept and is a means of error detection. Using 7 bits of an 8-bit structure, as ASCII does, leaves 1 bit extra (in most cases B7 is set to 0); the parity bit may occupy that spare bit or, when a full 8-bit character is transmitted, be added as a ninth bit. Parity means counting the number of 1s in a character. The agency determining the system's operating specifications will have decided which parity to use: "odd" or "even." If "odd" parity is used, the parity bit will be whatever value is required to ensure an odd number of 1s in the character (including the parity bit).

Block Parity

The vertical character format mentioned previously is based on the punched cards used in early data entry and programming. Because punched cards (at least IBM's cards) contained 80 columns and historically most data was on these cards, transmissions were in an 80-character "block." (The 80 columns of data were known as a record.) Two framing characters were added at the start of the data block and one framing character was added at the end of the data block. An 84th character, called the block parity character, was added to the end of the transmitted block; it was computed from the other 83 characters. Table 1-6 illustrates the 84-column vertical format.

Parity was first determined vertically for each of the block's 83 characters. Next, parity was determined horizontally for each bit position by counting the number of 1s in each row across the 83 counted columns (the 84th column is not counted). The horizontal parity bits were placed in the 84th column, thus creating the block parity character. The 1s in the 84th column were then counted to determine the vertical parity bit for the block parity character, the same process used for every other character. To determine whether parity was met for all the characters in the block, parity was determined horizontally for the Parity row by counting the 1s in the 83 counted columns, and the result was compared to the vertical parity bit of the block parity character. The two must match; if they do not, there was an error. This scheme is known as vertical and horizontal parity checking, or block parity.

With block parity, fewer than one in 10,000 errors remain undetected. However, this scheme exacts a heavy penalty in overhead: one parity bit for each character adds up to 84 bits, or more than ten 8-bit characters.

Error Correction

In addition to having bits devoted to assisting with error detection, what means could be used to correct an error once one was detected? The answer is normally an automatic repeat request (also expanded as automatic retransmission query), known as an ARQ. This method had significant ramifications in many applications, since the transmission device had to store at least the last transmitted block. (When using horizontal and vertical parity, data has to be transmitted in blocks.)
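Before moving on, the block parity computation can be made concrete with a short sketch. The Python example below computes a parity bit for each character of a block and then the block parity character; the 5-character block, 7-bit character width, and odd parity are illustrative assumptions, not requirements from the text, and an 84-character block would be handled the same way.

    # Illustrative sketch of vertical and horizontal (block) parity.
    # A short block, 7-bit characters, and odd parity are assumed here.

    def odd_parity_bit(bits):
        """Return the bit that makes the total number of 1s odd."""
        return 0 if sum(bits) % 2 == 1 else 1

    def to_bits(ch, width=7):
        """The character's data bits, low-order bit first."""
        return [(ord(ch) >> i) & 1 for i in range(width)]

    block = "BLOCK"                       # stand-in for an 80-character record
    columns = [to_bits(c) for c in block]

    # Vertical parity: one parity bit per character (per column).
    vertical = [odd_parity_bit(col) for col in columns]

    # Horizontal parity: one bit per bit position (per row), counted across
    # all characters in the block; these bits form the block parity character.
    horizontal = [odd_parity_bit([col[row] for col in columns]) for row in range(7)]

    # The block parity character gets its own vertical parity bit, computed like
    # any other character's. The check described above compares it with the
    # horizontal parity of the Parity row; with odd parity and an odd number of
    # bit rows and counted columns (7 and 83 in the text, 7 and 5 here) the two agree.
    bpc_parity = odd_parity_bit(horizontal)
    parity_row_check = odd_parity_bit(vertical)
    print("block parity character bits:", horizontal)
    print("check agrees:", bpc_parity == parity_row_check)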
There also had to be some scheme to notify the transmitter that the receiver had successfully received the transmission or that it was necessary to retransmit the block due to error. In most cases, half-duplex would be too inefficient in terms of transmission time versus line-setup time. Over time, schemes have been devised to increase the "throughput" (defined here as the number of bits correctly received at the end device). Today, many schemes use a block that varies in length depending on the number of errors detected. In transmissions that have few errors, such as those over optical fiber or a local area network, the block can be made longer. In media with frequent errors, such as short-wave radio, the block can be made shorter. Blocks are generally of a fixed length, or a multiple thereof. Packets, on the other hand, have a fixed minimum and maximum length but can vary in length between these two limits. We normally speak of packet transmission in modern data communications, although the difference between a block and a packet is now more one of semantics than of practice. (It is worth noting that the longer the block or packet, the greater the chance of an undetected error, and that parity and block parity schemes were far less effective against multiple errored bits than against single-bit errors. This is the reason the CRCC method is superior in terms of efficient use of bits for error detection, particularly for multiple errored bits.)

Cyclic Redundancy Checks

Any time data is placed on a medium there is a probability of error; thus, an error-detection scheme must be used. Most modern devices use a far more efficient scheme of error detection than parity checking. Though ASCII was designed for a vertical and horizontal (at times called longitudinal) block-checking code, it can also be used with a cyclic redundancy check (CRC) scheme that creates a check character. The cyclic redundancy check character (CRCC) is the check character developed by the CRC. A cyclic code divides the text bits by a binary polynomial, resulting in a check character. This is done by checking every bit in a serial bit stream and combining data from selected (and fixed) bit positions. Figure 1-6 illustrates the generation of a simple CRCC, which is sometimes (incorrectly) called a checksum.

A number of different CRCCs are used today. The primary difference between them is the "pick-off point," and the advantages of one pick-off point or another depend on the application. For example, a communications channel may have different error behavior than a magnetic medium, such as a hard drive. In any event, the transmission protocols used in industrial data communications will probably use the CRCC-32 scheme. Within a given size block, a check character or characters of a certain size will be generated. A common type uses two 8-bit octets, forming a 16-bit check character. The CRCC characters are generated during transmission and added to the transmitted packet. The receiving end receives the bit stream and computes its own CRCCs; these must match the transmitted CRCCs or there is a detected error. Note that even on a block of 80 characters, this scheme uses only 16 bits, in comparison to the 88 bits used with the vertical and horizontal parity check. The CRCC method is used for writing to a disk drive or to almost any magnetic medium and is the method most often used in modern data communications.
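The division that produces the check character can be visualized with a short sketch. The text does not name a specific generator polynomial, so the example below assumes the common CCITT polynomial x^16 + x^12 + x^5 + 1 and a 0xFFFF starting value purely for illustration; it produces the two-octet (16-bit) check character described above.

    # Bit-serial generation of a 16-bit check character. The generator
    # polynomial 0x1021 and the 0xFFFF starting value are assumed here as
    # an example; real protocols fix these choices in their standards.

    def crc16(data: bytes, poly: int = 0x1021, init: int = 0xFFFF) -> int:
        crc = init
        for octet in data:
            crc ^= octet << 8                  # bring the next octet into the register
            for _ in range(8):                 # process one bit at a time
                if crc & 0x8000:               # high bit set: "divide" by the polynomial
                    crc = ((crc << 1) ^ poly) & 0xFFFF
                else:
                    crc = (crc << 1) & 0xFFFF
        return crc                             # two octets: the 16-bit check character

    block = b"The same computation handles an 80-character block."
    fcs = crc16(block)

    # The transmitter appends these two octets; the receiver recomputes the CRC
    # over what it received and compares. Any mismatch is a detected error.
    print("check characters: %02X %02X" % (fcs >> 8, fcs & 0xFF))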
Checksum

Any of a number of error-detection codes may be called a checksum. Many times the CRC-16 or CRC-CCITT is called a checksum; however, they are actually cyclic codes, whereas a checksum was originally designed as a linear code. For instance, all of the 1 states in a block may be totaled to obtain a checksum. (This usually happens through a process known as modulo (mod) addition, where a number is divided by the modulo number but only the remainder is kept. An example is 11 mod 4: 11 divided by 4 equals 2 with a remainder of 3, so the answer is 3.) This checksum is tacked on as a block check character or characters. The efficiency of a checksum in detecting errors is not as high as with a cyclic code, yet the circuitry to produce the checksum is less complex.

NOTE: The abbreviation for modulo varies depending on the programming language. In COBOL and older programming languages, modulo is abbreviated MOD (11 MOD 4). In C-style languages, the abbreviation is the percent symbol (11 % 4).

There are other error-detecting codes besides the checksum and the CRCC. These codes exist specifically for detecting and (sometimes) correcting errors. Some of these codes are cyclic, some are linear, some are quasi-cyclic, and some are polynomial. Some of these codes are sufficiently long in pattern length to be used for both error detection and correction.

ARQ

What hasn't yet been explained is what happens when an error is detected (at least in general terms). The following is a simplified discussion of the generalized automatic repeat request (ARQ) method of error correction. Before beginning our discussion, it is important to note the change in terminology, as multiple terms are used to refer to the data being transmitted. Prior to HDLC, data sent (typically asynchronously) was defined by control characters and was referred to as data blocks. There would be a set of delimiters at the start and an ETX and block parity character at the end of the data, for a total of 84 characters. As time went by and data began to be sent by bit-oriented protocols (HDLC), these blocks of data were called packets (about 1970). Since 1990, the terminology has changed again (but certainly not for everybody): the Layer 2 octets are called frames (framed by the protocol standard), and the arrangement of data in Layers 3 and above is referred to as protocol data units (PDUs). Of course, some still refer to the Layer 2 octets as "packets" when they mean "frames."

In normal operation, the transmitter sends a frame to the receiver. The frame is received. If the CRCCs match, the receiver sends an ACK. If they do not match, the receiver sends a NAK and the transmitter then retransmits the frame. This is illustrated in figure 1-7. If an occasion should occur where the frame is so badly corrupted that the receiver does not recognize it as a frame, the receiver will send nothing. After sending a frame, the transmitter waits for a response. If a response does not arrive within the expected time, the transmitter assumes the worst and resends the frame. This condition (not getting a response in time) is called a timeout and is illustrated in figure 1-8. On the other hand, suppose that the frame is received correctly and an ACK is returned. What if the ACK is "bombed"? If this happens, the transmitter will time out and resend the frame. The receiver will detect that it has two frames with the same frame number and that both were received correctly. It will then drop one of the frames and send another ACK, as shown in figure 1-9.
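The ACK/NAK/timeout behavior just described can be put into a small simulation. The sketch below is a minimal stop-and-wait ARQ; the alternating one-bit frame number, the modulo-256 checksum standing in for a CRCC, and the deliberately lossy channel are illustrative assumptions, not details of any particular protocol.

    # A minimal stop-and-wait ARQ sketch of the behavior in figures 1-7 through 1-9.
    import random

    def block_check(payload: bytes) -> int:
        return sum(payload) % 256             # simple checksum; a CRCC in practice

    def make_frame(seq: int, payload: bytes) -> dict:
        return {"seq": seq, "data": payload, "check": block_check(payload)}

    def channel(frame):
        """Randomly drop or corrupt frames to exercise the timeout and NAK paths."""
        roll = random.random()
        if roll < 0.2:
            return None                       # frame lost: receiver sends nothing
        if roll < 0.4:
            bad = dict(frame)
            bad["data"] = bytes(b ^ 0x01 for b in frame["data"])  # corrupted in transit
            return bad
        return frame

    class Receiver:
        def __init__(self):
            self.expected = 0
            self.delivered = []

        def accept(self, frame):
            if frame is None:
                return None                   # nothing recognizable arrived: no reply
            if frame["check"] != block_check(frame["data"]):
                return "NAK"                  # detected error: ask for retransmission
            if frame["seq"] == self.expected: # new frame: deliver it
                self.delivered.append(frame["data"])
                self.expected ^= 1
            return "ACK"                      # duplicates are dropped but ACKed again

    def send(payloads, max_tries=20):
        rx, seq = Receiver(), 0
        for payload in payloads:
            frame = make_frame(seq, payload)
            for _ in range(max_tries):        # keep the stored frame until it is ACKed
                reply = rx.accept(channel(frame))
                if reply == "ACK":
                    break
                # reply is None (timeout) or "NAK": resend the stored frame
            seq ^= 1
        return rx.delivered

    # The ACK/NAK return path is treated as error-free here; the duplicate branch
    # in Receiver.accept shows how a lost ACK (figure 1-9) would be handled.
    print(send([b"frame one", b"frame two", b"frame three"]))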
There are many ways to enhance error correction via ARQ, and they are accomplished in individual protocols. Determining how many frames can be transmitted prior to receiving an ACK/NAK has ramifications for throughput. The more frames transmitted prior to an ACK/NAK, the greater the throughput; however, more frames must be stored after being received, so the memory requirements are greater.

Forward Error Correction

Although the ARQ method of error correction discussed above detects an error (by whatever means) and then retransmits the portion of the message that was in error, what happens when there is simplex (unidirectional) transmission? The ARQ method will not receive an ACK/NAK because the circuit is unidirectional, so ARQ cannot be used on a simplex circuit. In this case, or in cases where the medium is too noisy to allow for any significant throughput, forward error correction (FEC) may be employed. FEC requires that the communications network's pattern of errors be analyzed and an error-correcting algorithm be designed to fit the identified error patterns. This can be done for most real-world circuits without extraordinary difficulty. The main problem is that error-detection and location overhead can be as much as 1 parity bit (or error-detecting symbol) for each data bit; using this approach would immediately cut throughput in half. In some cases, however, the data must be 100% error-free (without regard to capital expense or throughput) because this transmission may be the only meaningful transmission sent or the only time given to the circuit for any throughput to occur. For these applications, FEC is appropriately employed. A thorough discussion of error-correcting codes and the associated theory is beyond the scope of this text; however, many good references, at many different knowledge levels, are available on this subject. Error correction is integral to data communications. Knowing that an error exists is not the end goal; the goal is to detect an error and determine how to correct it while ensuring data integrity.

Data Organization: Protocol Concepts

A protocol is an accepted procedure. We use protocols in personal interactions all the time. When you are introduced to someone in the United States, the protocol is to shake hands, unless there is a ceremony requiring another form of protocol or you are being introduced to a large number of people. In a class, or throughout a conversation, there is a protocol that tells you to wait until another person is finished speaking before speaking yourself. There is a physiological reason for this protocol: humans are effectively half-duplex; they typically cannot talk and listen very well at the same time.

Terminologies

Asynchronous generally means that something may occur at any time and its occurrence is not tied to any other event. The old "start-stop" teletypewriter signal (see appendix B), with its 1 start bit and 1 stop bit, is a good example of an asynchronous transmission method. The teletypewriter signal started out using motor speed as the main means of synchronizing the start-stop bits within each character. In today's vernacular, any start-stop signal is assumed to be asynchronous. In actuality, a start-stop signal may be transmitted synchronously or asynchronously.
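As a concrete picture of a start-stop signal, the sketch below frames a single ASCII character the way a teletypewriter-style asynchronous link would: a start bit, the data bits sent low-order first, a parity bit, and a stop bit. The 7-data-bit, even-parity, one-stop-bit layout is only one common convention, assumed here for illustration.

    # Illustrative start-stop ("asynchronous") framing of one ASCII character.

    def async_frame(ch: str) -> list:
        data = [(ord(ch) >> i) & 1 for i in range(7)]    # 7 data bits, LSB first
        parity = sum(data) % 2                           # even parity bit
        return [0] + data + [parity] + [1]               # start, data, parity, stop

    # The line idles in the stop (mark, 1) state; the transition to the start
    # (space, 0) bit is what tells the receiver a character is beginning.
    print("A ->", async_frame("A"))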
Synchronous generally means "tied to a common clock." The clock signal is typically transmitted along with the data, usually in the transitions of the data from one state to another. Since synchronous transmission uses bit timing, each bit of data must be accounted for. Almost all modern data communications are synchronous transmissions.

The terms baud rate and bits per second are often used interchangeably; this is incorrect. Baud rate is the line modulation rate; that is, the number of signal changes per second that are placed on the communication medium. Bits per second, on the other hand, is the data transmission rate of the device that is transmitting, or the rate a device is capable of receiving. Depending on the type of signaling and encoding used, the baud rate and the bit rate may be quite different. Baud describes changes in the signaling state of a transmission medium, not a line speed. Ads for a "33.6 Kbaud modem" are in error; such a modem would require a line with a bandwidth great enough for 1/33600-second rectangular pulses (not possible with a voice-grade wireline). What this specification means in reality is that these are 33,600-bit-per-second (33.6 Kbps, the data bit rate) modems. Such modems use a 1200 baud line and achieve their higher data rate by sending a signal state representing 56 bits of data per baud. That is, they make one line state change (baud) for each 56 bits of data (one change of phase or amplitude for each 56-bit combination). This is accomplished through a high-level coding technique called trellis encoding. Modems running at 33.6 Kbps (kilobits per second) require a 600 baud line in each direction (a total of 1200), rather than one that supports 33,600 signal changes per second in each direction. A standard dial-up telephone line can support a total of 1200 signal changes per second (about 3.3 kHz bandwidth). Some refer to the baud rate as symbols per second (see appendix C).

Protocols

As electronic message handling became the norm, it became imperative to build as much of the communications link control into the terminals as possible. To this end, "communications protocols" arose. Most protocols were developed by individual vendors for their own systems. Since these protocols are generally incompatible with the protocols and equipment of other vendors, the customer is generally locked into one manufacturer. All data communications efforts involve protocols of one kind or another. A protocol is a further organization of data, beyond shaping groups of 1s and 0s into characters. Communications protocols are either character-based or bit-oriented. This means that the information we are looking for will take the form either of characters telling us information or of bit patterns telling us information.

Character-Based Protocol

One example of a character-based protocol is Binary Synchronous Communication. The IBM Bi-Sync protocol, one of the first and most widely used of the proprietary protocols, was developed to link the IBM 3270 line of terminals to IBM computers in a synchronous manner. The Bi-Sync protocol is character-based; control depends on certain character combinations rather than on bit patterns. It also requires that the transmitting terminal be able to store at least one block of data while transmitting another.

When transmitting under Bi-Sync, the hardware is responsible for avoiding long strings of 1s or 0s. In a synchronous system, if a significant length of time is used to transmit only one state or the other, the receiver loses its bit synchrony. As a result, communication is disrupted or, at the least, it is in error. Most modern hardware uses a scrambler to ensure data state transitions; the scrambler's contribution is removed at the receiver, since the scrambling pattern is applied on a scheduled (known) basis. In addition, most phase-continuous systems use Manchester encoding, which moves the state of the bit into a transition of the data. Manchester encoding is explained in detail later.
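Since Manchester encoding comes up again later, a tiny sketch may help fix the idea now: each data bit is sent as two half-bit line states, so there is a transition in the middle of every bit cell and the receiver can recover its clock even from long runs of identical bits. The particular mapping used below (0 as low-then-high, 1 as high-then-low) is one of the two conventions in common use and is assumed only for illustration.

    # Manchester encoding sketch: every data bit becomes a mid-bit transition.

    def manchester(bits):
        out = []
        for b in bits:
            out += [1, 0] if b else [0, 1]    # two half-bit line states per data bit
        return out

    # A long run of identical data bits still produces a transition every bit time.
    print(manchester([1, 1, 1, 1, 0, 0, 0, 0]))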
Table 1-7 lists some of the Bi-Sync control characters. Note that the ASCII control characters are used if Bi-Sync is used in an ASCII-based system. Bi-Sync normally uses half-duplex transmission; although it could use duplex, it was not primarily intended for that type of operation. Before duplex Bi-Sync was used, a block would be transmitted, the line would be turned around, and the original transmitter would wait for the original receiver (now a transmitter) to send either an ACK or a NAK. If a NAK was received, the line would be turned around again and the original transmitter would retransmit the block. Duplex operation would not have been faster, except for eliminating the turnaround time, because the transmitter could do nothing else until it received a response to its transmitted block. To speed things up a bit, the Bi-Sync protocol had the transmitter store two blocks, so it could wait for an ACK while transmitting the second block. ACK1 and ACK2 signals were used to differentiate between the ACKs. The primary benefit in this case was that transmission could take place without line turnaround (duplex) or without halting transmission until an ACK was received. It should not be forgotten that when Bi-Sync was developed (1967), the cost of data storage was approximately one US dollar per bit in 1980 dollars, a significant expense.

Because the control characters are intertwined with the text, they must be sent in pairs to ensure that they are identified. How would you transmit a machine or computer program in object form, if that code were composed of 8-bit octets, some of which may very well be the same as the control codes? To make this possible, Bi-Sync allows a transparent mode, in which control characters are ignored until the receiver detects several data link escape (DLE) characters. Bi-Sync is dependent upon character-oriented codes. In modern communications, there is a need to make the protocol independent of the transmitted message type. In other words, it should make no difference to the protocol what bit patterns the message consists of, or even in what language it is composed, as long as it is in 8-bit octets. The Link Access Procedure-Balanced protocol and other bit-oriented protocols provide that functionality rather gracefully.

Bit-Oriented Protocol

Bit-oriented protocols use a concept called framing, in which there are bit patterns before and (in some schemes) after the information frame (which is why it is called a frame now, rather than a packet or block). In framing, there is a binary pattern that serves as the start delimiter. There are also binary patterns that indicate the addressing and what type of frame it is (e.g., the frame contains information or is of a supervisory nature), as well as some method of sequence numbering, followed by the user data. A frame check sequence (FCS), normally a CRCC, follows the user data, which is typically a variable number of octets. The user data is surrounded by the protocol; that is, the protocol "frames" the user data. There may also be a stop delimiter, or the frame may use the CRCC as the delimiter.
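The general layout just described can be shown in a few lines of code. The sketch below builds a generic bit-oriented frame in the order start delimiter, address, control, user data, check sequence, stop delimiter; the one-octet address and control fields, the 0x7E delimiter value (the flag pattern discussed next), and the simple stand-in check function are assumptions for illustration only, and zero insertion (described below) is omitted.

    # A generic bit-oriented frame: flag, address, control, data, FCS, flag.

    FLAG = 0x7E                               # the bit pattern 01111110

    def fcs16(body: bytes) -> bytes:
        """Stand-in check sequence; a real frame carries a CRCC here."""
        return (sum(body) % 65536).to_bytes(2, "big")

    def build_frame(address: int, control: int, payload: bytes) -> bytes:
        body = bytes([address, control]) + payload
        return bytes([FLAG]) + body + fcs16(body) + bytes([FLAG])

    frame = build_frame(address=0x03, control=0x00, payload=b"user data")
    print(frame.hex())
    # The payload is simply octets to be framed and checked; the protocol never
    # looks at what those octets mean. That is the sense in which the user data
    # is "encapsulated."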
Link Access Procedure-Balanced (LAP-B) is a bit-oriented protocol. It is very similar in both structure and format to other bit-oriented protocols: High-Level Data Link Control (HDLC), Advanced Data Communications Control Procedure (ADCCP), and IBM's Synchronous Data Link Control (SDLC). IBM uses SDLC, which is a subset of HDLC, in its Systems Network Architecture (SNA). LAP-B, ADCCP, and HDLC are quite similar; for that reason, only LAP-B will be discussed here.

Figure 1-11 illustrates a LAP-B frame. Notice that it is bounded by "flag" characters; in other words, the flags "frame" the data. The flag is an 8-bit octet, starting with a 0, followed by six 1s, and ending with a 0. It is inserted at the beginning and end of each transmitted frame. The protocol allows this pattern to occur only at the frame's start and end. It does so by using a technique called zero insertion, or bit stuffing.

Zero Insertion

In normal transmission, any time the protocol detects five consecutive 1 bits in the data stream, the rule is to insert a 0. The receiver protocol, upon detecting five consecutive 1s, knows to remove the 0 that follows. It is that simple. Figure 1-12 illustrates 0 insertion and removal.

The LAP-B protocol will allow up to 7 or 127 unacknowledged frames (modulo-8 or modulo-128 sequence numbering), depending on the system requirements, before it must have the first frame acknowledged. Medium-induced errors are detected through the frame check sequence and identified by frame number. An error may destroy the frame number itself; the transmitter receives notice because the destination receiver ignores an errored frame and, upon receiving the next good frame, notifies the transmitter that it is not the one anticipated. The receiver will request retransmission of the damaged frame and perhaps all subsequent frames, depending on what type of system is in operation. Note that the data carried in the frame has no bearing on the protocol, as it is concerned only with the individual bits in a frame, not what their ultimate meaning will be. We will discuss bit-oriented protocols in more detail in chapters 2 and 7.

Protocol Summary

We have looked at two protocols, one character-based and one bit-oriented. We left out much of the inner workings and subsequent details in order to illustrate only the salient points of some protocol concepts. As you proceed through the next chapter, you will find that the protocols discussed were specifically Data Link protocols; these bit-oriented protocols will be discussed again in chapter 2. For this chapter, it is only important that you see how 1s and 0s are organized to frame the characters you are transmitting. Character-based protocol commands are directly related to the data within the frame; certain characters must be sent twice, or some other method must be used to distinguish control characters from text. It is an older method, as bit-oriented protocols have generally replaced character-oriented protocols. The bit-oriented protocols do not depend on the data stream contents; they operate independently. This means that such a protocol recognizes certain bit patterns that it does not allow to occur in the data (e.g., LAP-B addresses) as it frames and error-checks packets (frames). LAP-B may be used in any system, but it is primarily used in point-to-point and multi-drop systems. Framing the data usually means having some form of start delimiter (flag), some sort of addressing and control process, the actual user data, an error check on the frame, and a stop delimiter. In lieu of a stop delimiter, some protocols count the octets in the frame and provide a frame length count.
Other protocols send frames with a fixed length. Because the user data is bookended by the framing, the user data is said to be encapsulated. Figure 1-10 illustrated the general frame; take note of its similarity to the LAP-B frame in figure 1-11. In the end, the user data is a series of organized 1s and 0s, and a bit-oriented protocol knows only how many bits it requires, not what they represent.
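Finally, the zero-insertion rule described above is simple enough to show directly. The short sketch below stuffs a 0 after every run of five consecutive 1s on the transmit side and removes it on the receive side; operating on lists of bits rather than a real serial stream is purely an implementation convenience.

    # Zero insertion (bit stuffing) as described for LAP-B: the transmitter
    # inserts a 0 after any five consecutive 1s so the flag pattern 01111110
    # cannot appear inside the frame body; the receiver removes the 0 that
    # follows any five consecutive 1s.

    def stuff(bits):
        out, run = [], 0
        for b in bits:
            out.append(b)
            run = run + 1 if b == 1 else 0
            if run == 5:
                out.append(0)                 # inserted zero
                run = 0
        return out

    def unstuff(bits):
        out, run, i = [], 0, 0
        while i < len(bits):
            b = bits[i]
            out.append(b)
            run = run + 1 if b == 1 else 0
            i += 1
            if run == 5:
                i += 1                        # skip the inserted zero
                run = 0
        return out

    data = [0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0]
    sent = stuff(data)
    print("stuffed:  ", sent)
    print("recovered:", unstuff(sent) == data)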