COMP90007 FINAL QUIZ.pdf
The University of Melbourne
COMP90007 Lecture 11

Sliding Window Protocol
- Adds buffers at both the sender and the receiver.
- The sender keeps each frame in its buffer until the receiver acknowledges it; if the receiver asks for retransmission, the frame can be resent from the buffer immediately, which is much simpler than fetching the frame again.
- Maintains a set of frames, not just one: the receiver also buffers out-of-order frames instead of dropping them.
- Benefit: the sender can keep transmitting, increasing the proportion of time spent on frame transmission and hence the link utilization.
- Tradeoff: frames must be buffered at the receiver before being processed.

Two ways to implement the sliding window:

Go-Back-N
- The sender has a window of size N and can send up to N frames without acknowledgement.
- The receiver window size is 1, so this improves the sender side but not the receiver side.
- In the example, the receiver has to stop at the error: with a window no larger than 1 it cannot buffer the following frames, so it discards them instead of processing them.
- The sender does not know this and keeps sending until the timeout interval passes, then retransmits.
- The problem with this method: whenever there is an error, the following frames must be discarded because they cannot be processed. The sender has to go back to the first frame the receiver has not processed, which is why it is called Go-Back-N.
- Long transmission times (low bandwidth or long distance) must be considered when programming timeouts.
- Further improvements to sliding window technology were therefore needed.

Selective Repeat
- Increases the receiver window size to N as well (it does not need to equal the sender's window size).
- The receiver accepts frames anywhere in its receive window.
- A negative acknowledgement (NAK) triggers retransmission of a missing frame before a timeout, so after an error the sender is notified earlier than in Go-Back-N.
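A minimal Python sketch of the Go-Back-N behaviour described above. The function name, the loss model (a frame is lost only on its first transmission), and the returned transmission log are all invented for illustration; a real implementation is driven by timers and ACK arrivals rather than a single loop.

```python
# Hypothetical sketch of a Go-Back-N sender (names and loss model invented).
# The sender may have up to N unacknowledged frames in flight; after a loss
# it times out and retransmits everything from the oldest unACKed frame on.

def go_back_n_send(frames, N, lossy=frozenset()):
    """Return the order in which frames are (re)transmitted.

    `lossy` holds sequence numbers whose FIRST transmission is lost,
    forcing a go-back from that point.
    """
    sent_log = []
    base = 0                          # oldest unacknowledged frame
    lost_once = set(lossy)
    while base < len(frames):
        # Fill the window: transmit up to N frames starting at `base`.
        for seq in range(base, min(base + N, len(frames))):
            sent_log.append(seq)
            if seq in lost_once:
                lost_once.discard(seq)   # lost only on the first attempt
                break                     # receiver discards later frames
        else:
            base = min(base + N, len(frames))
            continue
        # Timeout: frames before `seq` were cumulatively ACKed; go back.
        base = seq
    return sent_log
```

With 6 frames, a window of 3, and frame 2 lost once, the sender transmits 0, 1, 2, then goes back and resends from frame 2 onward: the log is [0, 1, 2, 2, 3, 4, 5].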
- A cumulative acknowledgement indicates the highest in-order frame received. Here the receiver does not send individual acknowledgements for the frames before 5, just one cumulative acknowledgement for 5. No bandwidth is wasted on extra acknowledgements, increasing the link utilization (U).

Go-Back-N vs Selective Repeat
- With large buffer space (Selective Repeat) we can store out-of-order frames in the buffer and increase link utilization.

Examples of Data Link Protocols

Point-to-Point Protocol (PPP)
- Used over different kinds of links: over fibre optics via Packet over SONET (SONET is the protocol used for fibre optics), and over telephone lines via PPP over ADSL.
- PPP is a data link layer protocol, so we can examine its framing, error control and flow control.
- Framing uses a flag and byte stuffing: a flag byte is defined, and it is escaped whenever it appears inside the payload.
- The flag is defined as six consecutive 1s: 01111110. Another special byte is defined as the escape for byte stuffing.
- Inside the frame there are several fields. The payload is the IP packet from the network layer that we want to encapsulate into the frame; the other fields are added by the data link layer for services such as error control and type-of-service information.

Packet over SONET
- PPP is a general data link layer service, so it does not care about the transmission medium.
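The cumulative-acknowledgement rule described at the start of this section (acknowledge only the highest frame received in order) can be sketched as a small helper; the function name is invented for illustration.

```python
# Illustrative sketch (name invented): a receiver that tracks the highest
# in-order frame and would send one cumulative ACK for it, instead of one
# ACK per frame.

def cumulative_ack(received_seqs):
    """Return the highest n such that all frames 0..n have arrived."""
    got = set(received_seqs)
    ack = -1                      # -1 means "nothing in order yet"
    while ack + 1 in got:
        ack += 1
    return ack
```

If frames 0-5 have all arrived, a single ACK for 5 covers them; if frame 2 is missing, the cumulative ACK stays at 1 no matter how many later frames arrive.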
- The big picture of this method: frames are split into several pieces of around 800 bytes, based on the requirements, and sent at regular time intervals.

ADSL
- Asymmetric: a wider frequency band is reserved for downstream (downloading) and a smaller band for upstream.
- ADSL sits on the physical layer and IP on the network layer; the other protocols in the stack are all on the data link layer.
- AAL5 and ATM are the protocols used to support PPP over ADSL. ATM stands for Asynchronous Transfer Mode.
- PPP frames are passed to AAL5, an adaptation layer, which passes them to ATM to add more information before they are sent over ADSL. These are intermediate steps before the data reaches the ADSL link.
- The customer side and the ISP side of the ADSL link must use the same protocol stack to understand each other.
- ATM defines fixed-size cells of 53 bytes each: a 5-byte header plus a 48-byte payload. PPP frames are not fixed-size, which is why the AAL5 frame is needed as an adaptation layer, preparing the data for ATM to create those fixed-length cells.
- Essentially, the PPP payload is encapsulated in an AAL5 frame, and this frame is further encapsulated in ATM cells, which carry the information to the other side.
- Padding in AAL5 is used to create the fixed-length ATM cells: the AAL5 payload plus padding must come out as a multiple of 48 bytes.
- Advantage: easier management, since we can always assume 53-byte cells. Drawback: inflexibility, which is part of why ATM was not more successful.

Medium Access Control
- Several factors distinguish different networks; one is transmission technology, where we discussed point-to-point and broadcast networks.
- Medium access control is unnecessary for point-to-point networks and is mainly used for broadcast networks.
- It is a sublayer because it is part of the data link layer, sitting below the rest of it.
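The AAL5/ATM packaging just described can be sketched numerically. The 8-byte AAL5 trailer is part of the real AAL5 format; the function name is invented for illustration.

```python
# Illustrative padding calculation for PPP over AAL5/ATM: the payload plus
# the 8-byte AAL5 trailer is padded up to a multiple of 48 bytes, then cut
# into 48-byte cell payloads; each ATM cell adds a 5-byte header (53 total).

AAL5_TRAILER = 8     # bytes (length field, CRC, etc.)
CELL_PAYLOAD = 48
CELL_HEADER = 5      # each cell is CELL_PAYLOAD + CELL_HEADER = 53 bytes

def aal5_cells(ppp_frame_len):
    """Return (padding_bytes, number_of_53_byte_cells) for a PPP frame."""
    total = ppp_frame_len + AAL5_TRAILER
    pad = (-total) % CELL_PAYLOAD          # bytes needed to reach a multiple of 48
    cells = (total + pad) // CELL_PAYLOAD
    return pad, cells
```

A 40-byte frame plus the trailer is exactly 48 bytes: no padding, one cell. One byte more forces 47 bytes of padding and a second cell, which illustrates the inflexibility mentioned above.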
- It sits between the (rest of the) data link layer and the physical layer, but is still part of the data link layer.
- It is not about error control or flow control, but about how to use the channel.
- Once we have a frame from the data link layer, it is passed to the MAC sublayer, which competes for the channel with the other stations, determining who can use it; the frame then goes to the next step, the rest of the data link layer, to complete flow control and error control.

Types of Channel Allocation Mechanisms
- A channel can be shared in a static or a dynamic way.

Static channel allocation
- Uses time division multiplexing (TDM) and frequency division multiplexing (FDM).
- Both divide the channel into N segments, depending on the number of users, and each user is allocated a dedicated segment for transmission.

Time division multiplexing
- A fixed schedule: the channel is split based on the number of users, who transmit in turn.
- If a user has no data to send, their slot goes empty and must be skipped, which can waste bandwidth.

Frequency division multiplexing
- The frequency band is split and assigned to the users.
- Each user has continuous access to their part of the channel.

COMP90007 Lecture 10

Summary of error detection code methods
- Parity bit (1 bit, Hamming distance 2): assumes the number of 1s stays even (or odd, depending on the variant) if there are no errors. If an error occurs, the count changes from even to odd and the other side can detect it.
- Internet checksum: assumes the sum of the values will not change if there are no errors; any transmission error makes the sum different.
Cyclic Redundancy Check (CRC)
- The method used in practice on the Internet; stronger than the other two.
- It assumes that if any error occurs, the remainder of a long division will be different.
- The dividend is the original data plus some appended 0s reserved for the check bits.
- The divisor comes from the coefficients of the generator polynomial G(x).
- The number of check bits we keep is determined by the highest power of G(x): if the highest power is 4, we keep 4 check bits.
- The remainder is then subtracted from the padded data before sending.
- These three methods make different assumptions at different levels, which determines their complexity.

Error correction: Hamming Code
- To correct errors we have to count them and locate them, which is more complicated than detection, where the answer is just yes or no.
- One error correction method is the Hamming code, for which we need to know how many check bits are required for m bits of data.
- The check bits are the redundancy, used by the receiver to correct errors.
- The relationship is m + k + 1 ≤ 2^k, where k is the number of check bits: the number of data bits must be at most the right-hand side minus k plus 1.
- For 4 bits of data, 3 check bits are required (4 + 3 + 1 = 8 ≤ 2^3); with 4 check bits we can protect up to 11 bits of data (11 + 4 + 1 = 16 ≤ 2^4).
- In the example with 4 data bits and 3 check bits we get 7 bits in total, and must determine which positions hold original data and which hold check bits. The positions are p1, p2, p3, …, p7, starting from position 1.
- Check bits sit at positions that are powers of 2: p1 (2^0), p2 (2^1), and p4 (2^2). The other 4 positions hold the original data.
- The check bits are written as ?
because their values have not yet been determined.
- Next we decide which kind of parity we are using (even or odd).
- To determine which positions each check bit covers, write each position's index in binary and see which powers of 2 it contains.
- The example assumes even parity to find the check bits and work out the final data to send.
- On the other side, after the data has been transmitted, the receiver identifies the same groups following the same steps, counting the number of 1s; if there is an error, the total number of 1s in some group will not be even.
- The example shows a 0 instead of a 1 at p7. The receiver counts the 1s in the three groups and gets odd parity where it should be even; it then takes the check bits of the failing groups and sums their indices: p1 + p2 + p4 = 1 + 2 + 4 = 7, locating the error at p7.
- In the second example, where p2 has the error, only the p2 group fails, which means the error is at p2 itself.
- With two errors, at p1 and p7, the first group still checks out but the second and third groups fail, making the receiver think p6 is incorrect.
- The Hamming code can therefore correct only 1 error; with 2 errors the correction fails. And because of its Hamming distance, with 3 errors the method cannot even detect them.

Error control discussion
- Error correction is more efficient on noisy transmission media, e.g. wireless.
- Generally, error control methods use an algorithm to convert the m bits of actual data into a codeword of m + k bits, where k is the amount of redundancy. The system then has 2^(m+k) possible codewords, containing both valid and invalid ones. Since there are only m bits of original data, only 2^m valid codewords are needed, one for each combination of original data.
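The 7-bit worked example above, assuming even parity, can be reproduced in Python; the function names are invented for illustration.

```python
# Sketch of the Hamming(7,4) scheme walked through above, with even parity.
# Positions are 1-based; check bits sit at the powers of two (p1, p2, p4)
# and each covers every position whose binary index includes it.

def hamming74_encode(data_bits):
    """data_bits: the 4 data bits, placed at positions p3, p5, p6, p7."""
    code = [0] * 8                       # index 0 unused; positions 1..7
    code[3], code[5], code[6], code[7] = data_bits
    for p in (1, 2, 4):
        # even parity over all positions whose index has bit p set
        code[p] = sum(code[i] for i in range(1, 8) if i & p) % 2
    return code[1:]

def hamming74_locate_error(codeword):
    """codeword: 7 received bits. Returns the failing position (0 = none)."""
    code = [0] + list(codeword)
    syndrome = 0
    for p in (1, 2, 4):
        if sum(code[i] for i in range(1, 8) if i & p) % 2:
            syndrome += p                # sum the indices of failing checks
    return syndrome
```

For data 1011 the codeword is 0110011; flipping position 7 makes all three checks fail, and 1 + 2 + 4 = 7 locates the error, exactly as in the walkthrough.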
- The strength of an algorithm is determined by the distance between valid codewords. If the minimum distance is just 1, the algorithm does not work: if there is even a single pair of valid codewords at distance 1, the method cannot be used for error detection or correction.
- The quality of the channel decides between correction and detection.
- Error detection is more efficient on transmission media with low error rates, e.g. quality wires.
- All methods require an assumption about the number of errors occurring in transmission, and errors can occur in the check bits themselves.
- Every method has some computational cost: stronger methods need higher time complexity.
- The Hamming code, the example we went through, can detect errors both within the data and in the check bits.

Flow Control
- Error control is about sending single frames reliably; flow control cares about a series of frames.
- It consists of strategies to control when the sender may send the next frame.
- If sender and receiver can process frames at the same speed there is no problem. If the sender sends more frames than the receiver can accept, and the receiver has no buffer to store them, it becomes overloaded and starts dropping frames.
- Flow control tells the sender when to send the next frame through positive or negative feedback.
- There are different methods, including feedback-based flow control and rate-based flow control (which limits the rate at which the sender can transmit, and happens at the transport layer).
- The figure on the right shows a receiver that cannot process everything sent from the sender, so it puts frames in a buffer and processes one frame at a time; after accepting a frame it removes it from the buffer and processes the next one. Each processed frame is delivered to the network layer.
- This is an ideal case: the sender always has frames ready and the receiver has a buffer. The channel also only goes one way, with no complicated situations; a noiseless channel does not need flow control.
- For transmission with a fast sender and a slow receiver, we introduce acknowledgements.
- This is still somewhat ideal, since frames are not lost: acknowledged frames are neither damaged nor lost.
- The type of channel needed is half duplex: at a single point in time only one direction is in use, but the channel supports both directions.

Noisy Channels
- If an acknowledgement is lost, the sender would keep waiting for it and be blocked. We need a timeout function to avoid being blocked by a missing acknowledgement: when it fires, the sender assumes the acknowledgement will never arrive and resends the frame.
- We must also distinguish duplicate frames from new ones so duplicates can be rejected and lost frames recovered; this is what sequence numbers are for.

Stop-and-Wait Protocol
- Has acknowledgements, a timeout function, and sequence numbers.
- Stop-and-wait means the sender has to stop and wait for the acknowledgement after each frame.
- The normal process is similar to acknowledged transmission, but the figure on the right shows an acknowledgement signal being lost and a frame being lost.
- If an acknowledgement is lost, the sender keeps waiting until the timer runs out, then resends. The resent frame has to be recognised as a duplicate by the receiver, ignored, and acknowledged again.
- If a frame is lost, there is also a timeout and the sender resends the frame, but this time it must not be treated as a duplicate; the receiver cannot tell on its own whether a frame was sent once or more, so it uses the sequence number to decide.
- The only sequence numbers needed to tell the current frame from the previous one are 0 and 1. The expected frame number should be initialised to 0.
- After accepting frame 0, the next expected frame is 1; a duplicate carrying 0 again is ignored.
- When the buffer size is 1, sequence numbers 0 and 1 suffice to tell the current frame from the next or previous one.

Link utilization in stop-and-wait protocols
- Measures the efficiency of communication: defined as the proportion of total time spent transmitting a frame.
- Stop-and-wait is bottlenecked by the waiting for acknowledgements; if we could keep transmitting frames while waiting, link utilization would increase.
- Increasing the bandwidth of the channel decreases the utilization: with a good channel, blocking to wait for an acknowledgement makes utilization even lower.
- Two protocols keep sending frames while waiting for acknowledgements; both come under the sliding window protocol.

COMP90007 Lecture 9

Error control: a simple mechanism
- Repeat the bits: if one copy differs from another, there is an error. To send 0, send 000; to send 1, send 111.
- The 2 extra bits added are known as redundancy, and they are the overhead.
- Given the 3 bits received, the receiver can check whether they are all the same; with 3 bits, up to 2 errors can be detected.
- The receiver can correct only 1 bit of error; beyond that, correction fails.
- The minimum number of errors that defeats the algorithm is 3, since we can detect 2 errors and fix 1.

Error bounds: Hamming distance
- A code turns data of n bits into codewords of n + k bits: the actual data plus the redundancy (k extra bits).
- Hamming distance measures the distance between two sequences: the minimum number of bit flips needed to turn one valid codeword into any other valid one.
- Example: 4 codewords of 10 bits each (n = 2, k = 8), so to send the two bits 00 you send a 10-bit codeword instead.
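A quick Python check of the 10-bit example above, assuming the codewords are the standard textbook set (0000000000, 0000011111, 1111100000, 1111111111); the helper names are invented for illustration.

```python
# Minimum pairwise Hamming distance of the four 10-bit codewords
# (2 data bits, 8 check bits). With distance 5, up to 4 errors can be
# detected and up to 2 corrected.

from itertools import combinations

def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))

CODEWORDS = ["0000000000", "0000011111", "1111100000", "1111111111"]

min_dist = min(hamming_distance(a, b)
               for a, b in combinations(CODEWORDS, 2))
```

Here min_dist comes out as 5, matching the "minimum bit flips is 5" claim below: the closest pair of valid codewords is the system's weakest point.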
- Each data bit effectively carries four bits of redundancy.
- The minimum number of bit flips between valid codewords in this system is 5, since it takes five flips to reach another valid codeword; this minimum is the weakest point of the system.
- To detect errors, the receiver compares the data it receives against the codewords to see whether it is valid.
- The system fails when errors turn one valid codeword into another valid codeword, so the receiver thinks the received data is valid.
- With 5 errors the system can fail, since we can only detect up to 4 errors: when the distance is 5, d is 4, so up to d = 4 errors can be detected.
- We can only claim errors were successfully corrected if the final result equals the original.

More efficient error detection and control

Parity bit
- Redundancy of 1 extra bit, added so that the total number of 1s (including the parity bit itself) is even or odd depending on the algorithm: even parity means an even number of 1s, odd parity an odd number.
- A parity bit can only detect an odd number of errors.
- It cannot be used for error correction, since it gives no location information.

Internet checksum
- A group of check bits for a message, placed at the end; there are different variations of checksum, all based on attaching a sum of the numbers at the end.
- The Internet checksum (16-bit words): sum modulo 2^16, adding any overflow of the high-order bits back into the low-order bits.
- The assumption of this method: if there are any errors, the numbers change, so the sum changes, and based on the sum the receiver can detect that.
- To make detection easier for the receiver, we can instead attach the negative of the sum at the end; the receiver then just adds everything up and checks whether the result is 0.
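The checksum procedure just described, sum with end-around carry and then send the complement, can be sketched as below; the function names are invented, and the 16-bit word values in the usage note are arbitrary examples.

```python
# Sketch of the 16-bit Internet checksum: add the 16-bit words with
# end-around carry (ones' complement addition), then transmit the
# complement of the sum; the receiver sums everything and expects all 1s.

def ones_complement_sum(words):
    """Ones' complement sum of 16-bit words (overflow folded back in)."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back
    return total

def internet_checksum(words):
    """Checksum to append: the complement of the ones' complement sum."""
    return ~ones_complement_sum(words) & 0xFFFF
```

For the words 0x1234 and 0xF234 the checksum is 0xFB96; summing the words together with the checksum gives 0xFFFF (all 1s), which is the receiver's acceptance condition.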
- Now consider how to do this in binary. When adding bits, 0 + 1 = 1 and 1 + 1 = 10; on overflow, the carried bit is added back at the low end, which is how overflow is handled.
- Since we want the negative of the sum to make the receiver's job easier, we simply flip the resulting bits, e.g. from 11100 into 00011, and send that as the final checksum attached to the end of the data.
- At the receiver side we repeat the computation, summing all values including the checksum bits, and check whether the result is all 1s, which (since this is ones' complement arithmetic with a negated sum) is equivalent to all 0s.
- The method fails if bits are flipped without changing the sum, e.g. flipping a 1 to 0 and a 0 to 1 in the same column, which keeps the final checksum unchanged so the receiver cannot detect it. It can also fail if 0s are inserted: the sum stays the same while the length changes. The limitation is essentially that any error which does not change the sum goes undetected.

CRC (cyclic redundancy check)
- The state-of-the-art error detection method, used in local area and wide area networks in several protocols; stronger than the previous methods.
- Based on a generator polynomial G(x), i.e. on division and remainders. The assumption: if any random errors occur, the remainder will be different.
- The principle of CRC is long division, using exclusive OR (XOR) for binary subtraction: equal bits give 0, different bits give 1.
- The final bits left over are the remainder. With a degree-4 generator we care about the 4 bits of the remainder, which become the redundancy (the extra bits) added to the data to send.
- It looks like we appended 0101, but the actual computation is subtracting the remainder: when the remainder is subtracted from the original padded dividend, the number we send is divisible by the divisor.
- The appended 0000 are placeholders for the remainder.
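The XOR long division described above can be sketched as below. The data and generator in the usage note are arbitrary short bit strings chosen for a hand-checkable example, not the lecture's figures, and the function name is invented.

```python
# Sketch of CRC computation by XOR long division. The generator is given as
# a bit string; the remainder has one bit fewer than the generator and
# replaces the appended placeholder zeros before transmission.

def crc_remainder(data_bits, generator):
    """data_bits, generator: strings of '0'/'1'. Returns the remainder."""
    k = len(generator) - 1                 # number of check bits
    reg = list(data_bits + "0" * k)        # dividend: data plus k zeros
    for i in range(len(data_bits)):
        if reg[i] == "1":                  # XOR the generator in here
            for j, g in enumerate(generator):
                reg[i + j] = str(int(reg[i + j]) ^ int(g))
    return "".join(reg[-k:])               # last k bits are the remainder
```

For data 1101 and generator 1011 the remainder is 001, so 1101001 is transmitted; dividing 1101001 by 1011 with the same XOR steps leaves remainder 000, which is how the receiver accepts it.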
- If we are using a 32-bit CRC, we attach 32 bits at the end, and after computing the remainder we replace the 0s with the 32 bits of remainder.

COMP90007 Lecture 8 – relationship between packets and frames
- The network layer can request different services: connection-oriented vs connectionless, acknowledged vs unacknowledged. All these services are provided to the network layer.
- The job is transferring data from the network layer on the source host to the network layer on the destination host.
- When selecting a service we need to consider reliability, capacity, and cost (time and money). On an unreliable channel we may need more mechanisms to provide a high-quality service, and the network layer can request a particular class of service considering these factors.

Unacknowledged connectionless service
- No confirmation from the receiver; if a frame is lost, the data link layer does not try to recover it (recovery, if any, is left to higher levels).
- The host transmits independent frames to the recipient host without acknowledgement.
- No logical connection establishment or release.
- Suitable applications include Ethernet LANs and real-time traffic, e.g. audio and video data, where speed matters most.
- Ethernet has a physical connection but no logical connection, so no connection needs to be set up.

Acknowledged connectionless service
- The source host transmits independent frames to the recipient host with acknowledgement.
- No logical connection establishment or release.
- Each frame is individually acknowledged, and retransmitted if lost or damaged.
- Applications include wireless: wireless is not as reliable as Ethernet, so it needs this confirmation from the receiver.
- If the correct frame has not been received, it must be retransmitted.
- On the transport layer the unit of transmission is called a segment; on the data link layer it is a frame.
- Essentially, on a reliable channel we do not need acknowledgement; on an unreliable one we do.

Acknowledged connection-oriented service

Framing
- Breaks the raw bit stream into discrete units.
- Its primary purpose is to provide some level of reliability over the unreliable physical layer.
- We have a packet from the network layer, but the physical layer has only very limited capacity to control flow and errors, and channels can carry a lot of noise; this is why we need the data link layer, and frames are what let it provide that reliability.
- Example: one of the error-control functions is the checksum; if there were any errors the sum would differ, and if the sums match the data is accepted.
- When we send a frame, raw bits arrive at the other side, and the challenge is how the receiver can tell the start and end of the frame.
- A simple strategy is a fixed length: require that all frames be 100 bytes (800 bits) and the other side just counts to 100. This, however, is not flexible.
- Asynchronous Transfer Mode requires every unit to be 53 bytes; if we need flexibility, the frame must carry some marker so the other side can tell its start and end.
- Three methods, all helping the receiver tell start from end:
  1. Character count (byte count)
  2. Flag bytes with byte stuffing
  3. Start and end flags with bit stuffing

Character count (byte count)
- To send a count of 5, the 5 is converted into 8 bits, each bit representing a power of 2 starting from 2^0; five is 101, where the leading 1 represents 4.
- In the example the 5 was mistaken for 7 because it was interpreted as 111 instead of 101.
- 111 adds up to 7. It is very easy to get out of sync after an error, so other methods must be explored.

Flag bytes with byte stuffing
- A special character is defined, a percent sign in this example. If a percent sign occurs in the payload field it should not be interpreted as the end of the frame, so we need an escape character (\) to escape it and tell the receiver to keep reading.
- An escape byte is inserted before each flag byte in the message, marking it as a normal byte rather than the end of the frame. This is called byte stuffing.
- The escape itself must also be escaped, since each escape is only valid for the next byte; the receiver removes the inserted escapes accordingly.

Start and end flags with bit stuffing
- Works at the bit level instead of the byte level, so frames do not have to be a multiple of 8 bits; it is more flexible.
- There again has to be a special pattern for the start and end of the frame, and we must keep it from appearing in the message so the other side does not interpret payload bits as the end of the frame.
- With the special bit pattern 01111110, the strategy is to insert a 0 after every five consecutive 1s. On the other side, destuffing must occur: after every five consecutive 1s the receiver removes the following 0.

Error control
- Check bits are added so the other side can recover the correct message, since the physical layer is affected by interference and other external factors.
- Two types of service: 1. detecting the error and retransmitting; 2. correcting the error.
- Errors can occur randomly (single-bit) or in bursts (burst errors). A burst error is caused by external noise, and its extent depends on the data rate and the duration of the noise. For example, we might be transmitting at 1 Mbps while external noise lasts 1 ms.
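The two stuffing schemes described above can be sketched as below. The '%' flag and '\' escape follow the lecture's illustrative choice of characters, and the function names are invented.

```python
# Byte stuffing: escape every flag or escape byte inside the payload.
# Bit stuffing: insert a 0 after every run of five 1s, so the flag
# pattern 01111110 can never appear inside the payload.

def byte_stuff(payload, flag="%", esc="\\"):
    out = []
    for ch in payload:
        if ch in (flag, esc):
            out.append(esc)        # escape flags, and escapes themselves
        out.append(ch)
    return "".join(out)

def bit_stuff(bits):
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")        # stuffed 0; the receiver removes it
            run = 0
    return "".join(out)
```

A payload containing the flag pattern itself, "01111110", is transmitted as "011111010": the stuffed 0 after the fifth 1 guarantees six 1s in a row only ever mark a real frame boundary.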
- Noise lasting 1 ms may affect about a thousand bits, depending on the data rate.
- Detecting errors is just a yes/no question; correcting them is more challenging.
- To correct a single-bit error in an 8-bit unit there are 8 possible cases, one per bit position; for 2 errors there are 28 possibilities, since the count grows quadratically with the number of bits.
- In this example, the message is passed to an algorithm that adds the extra bits, also called redundancy, so that we have message + redundancy. This redundancy is what lets the other side tell whether there are errors.
- After transmission, the received information is passed through a checker, which applies the same algorithm as the encoder side to check whether there are errors.
- If they only want to detect errors, they discard the message and ask for retransmission; if they want to correct it, they do so themselves without asking for retransmission and send the result to the upper layer.
- Error control can happen in other layers too. Redundancy is the extra bits computed by the algorithm.

Internet Technologies Lecture 7

How the receiver handles the signal: data communication
- The process of using signals: assume two computers A and B, and we would like to send 1010.
- We modulate signals, deciding to use amplitude: higher amplitude means 1 and zero amplitude means 0.
- Because of noise and interference during transmission, the signal at the other side might be different, so the receiver samples the values and maps each to the closest symbol.
- The required sampling frequency is given by the sampling theorem.
- If the highest frequency in the received signal is H, the minimum sampling rate is 2H, twice the highest frequency. The more samples we get, the more accurately we can recover the signal; sampling below this rate, e.g. only one point per cycle, makes the signal unrecoverable.
- Example: codes are sent in groups of 2 bits, and the number of distinct patterns N determines how many bits each code can carry, via log2 N. There is a limit to how many patterns we can add, though: increasing N lets us send more bits per signal and hence transmit more efficiently, but makes the patterns harder to distinguish and identify under noise and interruptions.

Symbol rate
- A symbol is one unit of signal; one symbol can represent multiple bits (data elements).
- Symbol rate (baud rate): the number of signal changes per second.
- Data rate: the number of bits per second. Data rate = log2 N × symbol rate, where N is the number of different symbols.

Maximum data rate of a channel
- The Nyquist theorem relates the data rate of a channel without noise to its bandwidth (B) and the number of signal levels (V).
- The data rate is measured in bits per second: the number of 0s and 1s we can send per second. It is related to the bandwidth B, measured in hertz.
- Hertz measures cycles per second. 2B is the maximum number of symbols per second we can send over this channel, and log2 V is the number of bits per symbol; putting them together gives the number of bits per second: C = 2B log2 V.
- A second theorem, proposed by Claude Shannon, takes into account noise in the channel. It links the signal strength (S) to the noise strength (N) through the ratio S/N.
- With a high signal-to-noise ratio, as in the first figure, it is still easy to distinguish the higher and lower levels and map them back to the defined raw bits; lower signal strength makes it hard for the other side to recover the original signal.
- This theorem does not care about the number of symbols used, only about the capacity of the channel: a high signal-to-noise ratio means a higher achievable data rate, C = B log2(1 + S/N).
- The maximum symbol rate is still 2B; taking the square root of the number of distinguishable levels cancels the factor of 2 in 2B, leaving B log2(1 + S/N). An S/N of 0 means the maximum data rate is 0.
- Putting the two theorems together, the limit on the data rate is determined by three factors: bandwidth (Hz), the number of signal levels, and the signal-to-noise ratio (channel quality). If the signal-to-noise ratio is low, no matter how many levels we use we cannot achieve a high data rate, because the other side cannot distinguish them.

Example 1 (considering Nyquist first): if a binary signal is sent over a 3 kHz channel, what is the maximum data rate?
Ans: 2B log2 V = 2 × 3000 × log2 2 = 6 kbps.

Example 2: an SNR of 30 dB is equivalent to a ratio of 1000; an SNR of 10 dB is equivalent to 10.

Example 3:
- First convert the SNR to a ratio: 20 dB means S/N = 100.
- The Shannon limit is therefore 3 × log2(101) ≈ 19.975 kbps.
- To calculate the maximum data rate we must also check the Nyquist limit, since a binary signal is used: 2B log2 V = 2 × 3 × log2 2 = 6 kbps.
- The bottleneck is always the lower limit, so the maximum capacity here comes from the Nyquist limit; considering only the Shannon limit gives only half the picture.
- The Nyquist limit is not only relevant to noiseless channels but also to noisy ones, so we should always check it.
- A lower Nyquist limit tells us to increase the number of signal levels, since the channel has the capacity for it and we are wasting the channel otherwise; when the Shannon limit is lower than the Nyquist limit, we should increase the S/N ratio, i.e. the quality of the channel.
- If an assessment has not told you which limit to consider, consider both.

Sharing a Channel
- The maximum data rates above concern one sender-receiver pair, e.g. how to use different signal levels to maximise the data rate; it is also possible to share a channel.
- Channels are categorised by whether they let multiple users transmit at the same time, and by their direction: full-duplex, half-duplex, and simplex links.
- If multiple users want to access the channel at the same time, we use multiplexing. The basic strategy involves queues, first come first served; better approaches include time division multiplexing and frequency division multiplexing.
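The two capacity limits used in the worked examples above can be computed directly; the function names are invented for illustration, and the SNR is taken as a plain ratio (convert from dB with 10 ** (dB / 10)).

```python
# Nyquist limit (noiseless channel) and Shannon limit (noisy channel),
# both in bits per second. bandwidth_hz is B, levels is V, snr_ratio is S/N.

import math

def nyquist_limit(bandwidth_hz, levels):
    """C = 2B log2 V for a noiseless channel with V signal levels."""
    return 2 * bandwidth_hz * math.log2(levels)

def shannon_limit(bandwidth_hz, snr_ratio):
    """C = B log2(1 + S/N) for a noisy channel."""
    return bandwidth_hz * math.log2(1 + snr_ratio)
```

For the 3 kHz binary channel, nyquist_limit(3000, 2) gives 6000 bps, and for 20 dB SNR, shannon_limit(3000, 100) gives about 19,975 bps, matching the example: the lower of the two is the actual maximum.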
- Both divide the channel so multiple users can use it.

Time division multiplexing
- Users send according to a fixed schedule, with slotted access to the full speed of the channel.
- Between timeslots are guard times, to avoid collisions by accommodating small timing variances.

Frequency division multiplexing
- Users can only use specific frequencies to send their data: continuous access at a lower speed.
- The frequency bands are divided and each user gets a part of the channel (a band), which they can access continuously.
- In both schemes the channel is divided among the users according to the schedule, and any unused share is wasted.

Data link layer
- The second layer of the hybrid model, above the physical layer and below the network layer; also the second layer within OSI, and part of the host-to-network layer (the first layer) of the TCP/IP model.
- Supports reliable, efficient communication of "frames" between two adjacent machines; handles transmission errors and flow control to prevent overflow.
- The frame is the unit of transmission in the data link layer.

Typical implementation
- The physical layer and part of the data link layer are implemented in hardware on the network interface (link), which allows a computer to be connected to the network.
- The driver contains the interface, allowing the operating system to use new hardware without knowing all the details.
- Different layers sit at different levels and perform different functions; this solution is not unique, it is a design choice.

Functions of the data link layer
- Provide a well-defined service interface to the network layer; handle transmission errors; regulate data flow.
- The network layer does not need to know the details; it has an interface.

Relationship between packets and frames
- The link layer accepts packets from the network layer, encapsulates them into frames, and sends them using the physical layer; reception is the opposite process.
- Data link layer peers think they are talking to each other, but this is virtual communication; the actual physical communication happens in the physical layer.
- The packet from the network layer is
encapsulated into the frame as the payload field. On the other side, the data link layer reads the frame, completes error control and flow control, then extracts the packet and delivers it to the network layer. All these services are provided by the data link layer to the network layer.
Internet technology lecture 6
In twisted pairs, higher quality means more twists: more twists give less interference and can support higher bandwidth.
Fibre optic connectors
Connectors and fibre sockets lead to a 10-20% loss; a mechanical splice leads to about 10% loss; fusion leads to less than 1% loss. We can't have a single unbroken line running all the way to the end, which is why we need the above options. These all serve to keep the light in the fibre; lower cost usually means higher loss.
Fibre optic networks
Fibre optic cable is a scalable network medium. A fibre optic network can be organised either as a ring or as a bus network. We study the ring topology, even though it is rarely used in practice, because it is easy to manage.
Comparison: wires and fibre
Wireless transmission
Mobile users require a mobility-enabled network, in contrast with wired networks. Wireless uses electromagnetic wave propagation. Wireless signals are broadcast over a region; broadcasting may mean less security, so some mechanism is needed to manage security. Potential signal collisions mean regulation is needed.
Basics of electromagnetic waves
Frequency: the number of oscillations per second of a wave, measured in Hertz (Hz). Wavelength: the distance between two consecutive minima or maxima. Speed: all EM waves travel at the same speed, the speed of light, 3 × 10^8 m/s. Fundamental relationship: wavelength × frequency = speed of light (units: m × 1/s = m/s). Wavelength and frequency are inversely proportional to each other.
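The inverse relationship between wavelength and frequency follows directly from λ = c/f; a quick sketch (the example frequencies are hypothetical):

```python
C = 3e8  # speed of light, m/s

def wavelength_m(frequency_hz):
    """lambda = c / f: wavelength and frequency are inversely proportional."""
    return C / frequency_hz

# e.g. a hypothetical 100 MHz radio signal vs a 2.4 GHz microwave signal
fm = wavelength_m(100e6)    # 3.0 m
wifi = wavelength_m(2.4e9)  # 0.125 m: higher frequency, shorter wavelength
```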
Higher frequency means shorter wavelength.
Electromagnetic spectrum
Different bands have different uses. Radio: wide-area broadcast; a log scale is used since the variance is very wide. Higher frequencies can carry more information per unit of time. Microwave: LANs and 3G/4G. Infrared/light: line of sight; even easier to block, by physical matter such as objects and walls.
Communication satellites
Use EM waves (microwaves); effective for broadcast distribution and anywhere-anytime communications. Types of satellites include Geostationary (GEO), Medium-Earth Orbit (MEO), and Low-Earth Orbit (LEO), distinguished by their altitude above Earth. The highest sits at about 35,000 km altitude (GEO) and has the longest orbital period (24 hours); it keeps a fixed location in the sky from our perspective. Medium-Earth Orbit (around 10,000 km) is used for GPS. Low-Earth Orbit ranges from 0 to 5,000 km altitude. Higher altitude means higher latency; it also means fewer satellites are needed for global coverage.
Geostationary satellites
Orbit 35,800 km above a fixed location. VSATs (computers) can communicate with the help of a hub. Different bands (L, S, C, Ku, Ka) in GHz are in use but may be crowded or susceptible to rain. Uses 4-way transmission.
Low-Earth Orbit satellites
Altitude ranges from 550 to 750 km; short latency; many of them are needed (around 50); supports high-speed internet service, especially to places that are difficult to reach with fibre optics.
Data communication using signals
Information is transmitted by varying a physical property, e.g.
voltage or current. Continuous signals can be represented by the function f(t) = c × sin(a·t + b), where c is the amplitude, a/(2π) is the frequency, and b is the phase. Amplitude is the absolute value of the signal at its highest intensity and is proportional to the energy of the signal. Amplitude decreases over distance due to attenuation. We care about these signal properties because we can use different signals to represent bits (0 and 1).
Digital modulation
The modem we use at home performs modulation and demodulation. There are two types of transmission: baseband and passband. Baseband signals run from 0 up to a maximum frequency; passband transmission occupies a higher range of frequencies. The signals used in passband transmission are called carrier signals, and their properties can be changed to represent 0s and 1s. Microwave is a passband: we can take a certain frequency in this band and change its amplitude to send information. Higher amplitude is 1 and lower is 0; higher frequency is 1 and lower is 0; a phase shift of 180 degrees is 0 and 90 degrees is 1. Changing the properties of these carrier signals to represent binary digits is known as digital modulation. Of the modulation types, only the NRZ signalling of bits uses baseband transmission; the others use passband transmission. The left-hand side of the ADSL spectrum is the transmission band for telephony; the other frequency bands are used for upstream and downstream. The A in ADSL stands for asymmetric, referring to wider frequency bands for downstream than upstream.
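Keying the amplitude of a carrier, as described above, can be sketched in a few lines (all parameters here are illustrative, not from the notes):

```python
import math

def ask_modulate(bits, carrier_freq=1000.0, sample_rate=8000, samples_per_bit=8):
    """Amplitude-shift keying: bit 1 -> full-amplitude carrier, bit 0 -> low amplitude."""
    samples = []
    for i, bit in enumerate(bits):
        amp = 1.0 if bit else 0.2
        for k in range(samples_per_bit):
            t = (i * samples_per_bit + k) / sample_rate
            samples.append(amp * math.sin(2 * math.pi * carrier_freq * t))
    return samples

wave = ask_modulate([0, 1, 0, 1])  # carrier is louder during the 1-bits
```

Frequency- and phase-shift keying work the same way, varying a different property of the carrier.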
The downstream band is made wider so that download speed is faster. Note that symmetric DSL has the same width for upstream and downstream. ADSL2+ uses even higher frequencies for downstream to further improve download speed.
Example
Assume computer A wants to send data such as 0101 to computer B. We use digital modulation to convert the bits to signals, since we can't send raw bits to the other side. The signal should arrive in the same shape as it was sent, but in practice, if attenuation or noise occurs, we may receive something different. To combat this, the receiving computer regularly samples the values of the incoming signal to recover it. The more samples we collect, the more accurately we can recover the signal, but there are practical limits such as cost. The sampling theorem says that to recover a signal correctly, we need to sample at at least 2 times the maximum frequency of the signal.
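The sampling theorem can be illustrated by showing what goes wrong below the required rate: a 3 Hz sine sampled at only 4 Hz (less than 2 × 3 Hz) produces exactly the same samples as a sign-flipped 1 Hz sine, i.e. aliasing. A sketch:

```python
import math

def sample(freq_hz, rate_hz, n):
    """Take n samples of sin(2*pi*f*t) at the given sampling rate."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]

# 3 Hz sampled at 4 Hz (< the required 2 * 3 = 6 Hz): the samples are
# identical to those of a sign-flipped 1 Hz sine, so the receiver cannot
# tell the two signals apart. Sampling at >= 2x the max frequency avoids this.
undersampled = sample(3, 4, 8)
alias = [-s for s in sample(1, 4, 8)]
```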
That is, at least two samples per signal cycle are needed for proper recovery.
Internet technologies lecture 5
Hybrid model
Removes the session and presentation layers, keeping 5 layers: application, transport, network, data link, and physical.
Origins of the internet
The telephone system was not reliable: if a toll office failed, the whole network did too. To avoid this, Baran proposed a distributed switching system in which users have alternative paths.
ARPANET
Based on Baran's distributed switching system.
Architecture of the internet
Home internet connects to a local area network, which is then connected to tier-1 ISP backbone networks. Tier-1 service providers include Telstra, Optus, etc. These service providers are connected through internet exchange points. Backbone networks support high-speed, long-distance transmission.
Physical layer
In the OSI model, the physical layer is the lowest of the 7 layers. In the TCP/IP model, the physical layer is inside the host-to-network division; there is no separate physical layer.
However, in both models the physical layer sits at the lowest level. The physical layer is concerned with the electrical, timing, and mechanical interfaces of the network. The electrical perspective covers how signals are used (voltage levels, signal strength); the mechanical perspective covers materials and cable lengths.
Link model
An abstract model of the physical channel as a link, ignoring the mechanical details. Consider the network as a link connecting computers. Bandwidth: the rate of transmission in bits/second. Delay: the time required for the first bit to travel from computer A to computer B.
Message latency
Latency is the time delay associated with sending a message over a link, composed of two parts: transmission delay and propagation delay. Transmission delay = message size in bits / rate of transmission. Propagation delay is the time spent travelling from one side to the other: length of the channel / speed of signals. The speed of signals is around 2/3 of the speed of light (c) in a wire; c = 300,000 km/sec, a constant. Latency = T-delay + P-delay.
Example: in the first exercise, transmission delay is dominant due to low bandwidth. In the second example, with higher bandwidth, propagation delay becomes the bottleneck. When transmitting at high bandwidth over long distances, propagation delay is dominant; at low bandwidth, transmission delay is the bottleneck.
The growth of bandwidth
CPU speeds increase by a factor of about 20 per decade and depend on the number of transistors on the circuit. However, we cannot keep adding transistors: there is a physical limit to the granularity of engraving on silicon. Bandwidth increases by a factor of about 125 per decade and has no comparable physical limit when using fibre optics; we convert electrical impulses to optical impulses to keep increasing bandwidth. Transmission media can be categorised into wired and wireless. Wired transmission includes twisted pair, coaxial cable, and fibre optics; wireless
includes electromagnetic waves and satellites. The performance of different physical media is affected by their physical properties, which impact signal strength.
Signal attenuation
Ideally signal strength would remain constant through travel, but in reality signal attenuation happens: the loss or reduction in the amplitude (strength) of a signal as it passes through a medium. Attenuation limits how far and how much data a medium can carry. Copper cable has higher loss than fibre optics.
Twisted pair
Used by telephone lines: two insulated copper wires twisted in helical form. Twisting reduces interference by cancelling out electromagnetic interference from external sources: with two lines, both wires are impacted equally by external sources, so the difference between them carries the useful information. Distance is up to about 5 km; repeaters can extend this, though there is a limit to how many repeaters we can use.
Properties and types of twisted pair
Different types of twisted pair give different bandwidths; a higher category number corresponds to higher bandwidth. Bandwidth here is measured in Hz.
Coaxial cable
A copper core with insulation, mesh, and sheath. Better shielding than twisted pair means higher speeds over greater distances; bandwidth approaches 1 GHz. Still widely used for cable TV/internet.
Fibre optics
Enormous bandwidth (THz) and tiny signal loss; data is transmitted over a fibre of glass; common for high rates and long distances.
Transmission of light through fibre
Three components: a light source, the transmission medium, and a detector. Signalling uses LEDs or semiconductor lasers; the semantics are light = 1, no light = 0 (a basic binary system). A detector generates an electrical pulse when light hits it. Converting from electrical signal to light pulse and back again is the bottleneck. Keeping the light in the fibre depends on refraction between air and silica. There are different types of cable, including single-mode and multi-mode.
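The latency formula from earlier in this lecture (latency = transmission delay + propagation delay) can be checked with a short sketch; the message sizes and bandwidths below are hypothetical stand-ins for the lecture's exercises:

```python
SIGNAL_SPEED_KM_S = 200_000  # ~2/3 of c in a wire (c = 300,000 km/s)

def latency_s(message_bits, bandwidth_bps, distance_km):
    """Return (transmission delay, propagation delay) in seconds."""
    transmission = message_bits / bandwidth_bps       # time to push all bits out
    propagation = distance_km / SIGNAL_SPEED_KM_S     # time for a bit to travel
    return transmission, propagation

# Hypothetical numbers: low bandwidth -> transmission delay dominates
t1, p1 = latency_s(1_000_000, 56_000, 5000)           # ~17.9 s vs 0.025 s
# High bandwidth over a long distance -> propagation delay dominates
t2, p2 = latency_s(1_000_000, 1_000_000_000, 5000)    # 0.001 s vs 0.025 s
```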
Single-mode and multi-mode fibres are distinguished by their diameters and by how many rays of light can be transmitted.
Internet technologies lecture 4
Analysis using Wireshark
Networking is set up as layers: every piece of functionality used to make sure data is communicated from point A to point B consists of multiple layers, where each layer performs one specific function, and the layers work together.
Transmission of data across a network
The item I want to transfer to another person is called user data. The data goes from my computer down through the layers: application, presentation, session, transport, network, data link, and then physical. On entering a layer, a header is appended to the user data; it tells the system what should be done at that specific layer. After the header is appended, the unit is passed down as the payload of the layer below. The data link layer is a bit special: it has a header but also an FCS (frame check sequence), used to verify that the transmitted data hasn't been modified. Once that's done, the data reaches the physical layer, the tangible cables or media over which the data is transmitted, such as wires or wireless media. This whole process is known as encapsulation.
Receiving data
The exact opposite process, known as decapsulation. Each layer looks at the header, checks how to process the information, and works upward from the physical layer to the application layer and on to the application that wants to receive the data.
How do these devices know who to send the data to? Addresses exist: an IP address (network layer) helps route information from one network to another; it can be static or dynamic, public or private. A MAC address (data link layer) helps identify a specific device among the multitude of devices out there.
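The FCS mentioned above is, in practice, usually a CRC. A minimal sketch using Python's standard binascii.crc32 as the check value (the framing format is invented for illustration):

```python
import binascii

def make_frame(payload: bytes) -> bytes:
    """Append a CRC-32 as a stand-in for the frame check sequence (FCS)."""
    fcs = binascii.crc32(payload).to_bytes(4, "big")
    return payload + fcs

def frame_ok(frame: bytes) -> bool:
    """Receiver recomputes the CRC over the payload and compares it to the FCS."""
    payload, fcs = frame[:-4], frame[-4:]
    return binascii.crc32(payload).to_bytes(4, "big") == fcs

frame = make_frame(b"hello")
corrupted = b"jello" + frame[5:]   # a byte flipped in transit fails the check
```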
A MAC address is unique for each device. There are different versions of IP address (IPv4 and IPv6), but we will look primarily at IPv4.
Transmission Control Protocol (TCP)
A protocol at the transport layer. It provides a connection-oriented approach to transmitting data between two devices and ensures reliability through retransmission of lost/corrupted data. It treats data as segments. The transport layer is highly complex and failures can occur; this is what TCP exists for. It does all of its magic through the TCP 3-way handshake.
3-way handshake
Host A sends a SYN request; the SYN is received by Host B; Host B sends SYN, ACK (acknowledgment); the SYN, ACK is received by Host A; the connection is established on Host A, and then Host A sends its ACK to Host B.
User Datagram Protocol (UDP)
Provides a connectionless approach to transmitting data between devices. If data loss happens, it happens: no reliability. Very widely used, since it is fast due to no retransmissions. One-way messages; no full-duplex connections outright. Treats data as datagrams.
What is Wireshark
A packet capture and analysis tool that provides a GUI for packet analysis. It can capture live packets or review previously captured network data.
Wireshark example
When capturing data about accessing a certain website, we want to limit the amount of information Wireshark is capturing. A filter that makes Wireshark capture only HTTP traffic is "tcp port http", in Berkeley packet filter format, a very popular format used to programmatically specify restrictions on the capture of any given flow of network traffic on your device. Pretty much everything on the internet can be identified using an IP address; computers understand these IP addresses to locate things.
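The 3-way handshake happens automatically when a TCP socket connects; a self-contained localhost sketch (the echo behaviour is just for demonstration):

```python
import socket
import threading

def echo_server(state):
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # TCP: connection-oriented
    srv.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
    srv.listen(1)
    state["port"] = srv.getsockname()[1]
    state["event"].set()
    conn, _ = srv.accept()            # accept() completes the 3-way handshake
    conn.sendall(conn.recv(1024))     # echo one message back
    conn.close()
    srv.close()

state = {"event": threading.Event()}
threading.Thread(target=echo_server, args=(state,), daemon=True).start()
state["event"].wait()

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(("127.0.0.1", state["port"]))  # SYN / SYN-ACK / ACK happen here
cli.sendall(b"ping")
reply = cli.recv(1024)
cli.close()
```

A UDP version would use SOCK_DGRAM and skip connect/accept entirely: no handshake, no delivery guarantee.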
Computers do not understand human-readable names; these names are converted to IP addresses. TCP is a connection-oriented protocol, so it first needs to form a connection before it can do anything. In Wireshark, each row is an individual packet, and the first 3 TCP packets usually form the 3-way handshake. To check the network settings on your device, run "ipconfig" (Windows) in the command terminal. One way to validate whether an IP belongs to the particular web service hosting the receiver is simply to paste the address into Google (the most convenient way); another is to use the command "dig (name of actual thing)". Once the IP address is validated, we send an HTTP request from our device to the web service (a GET request). All subsequent packets are involved in getting access to the web service. Once we receive the data we want, the connection is terminated. One way is an RST (reset) message, a brute-force approach where one side sends an RST; the other way is the FIN, ACK method, where both sides send FIN, ACK to mutually disconnect.
COMP90007 lecture 3
The choice of service type has a corresponding impact on the reliability and quality of the service. Connection-oriented services: connect, use, disconnect. To implement different services we need building blocks known as service primitives: a formal set of operations for services. The number and type of primitives depends on the nature of the service; in general, more complex services require more service primitives. Six service primitives for implementing a simple connection-oriented service are LISTEN, CONNECT, ACCEPT, RECEIVE, SEND, and DISCONNECT. The service can be a program, and the primitives can be functions inside that program that implement the service.
Relationship of services and protocols
Protocols define the details of how to provide a service, as a set of rules.
Services are abstract and don't contain details; a service can be implemented using the primitives. Essentially: services = a set of primitives that a layer provides to the layer above it; protocols = a set of rules governing the format and meaning of packets exchanged by peers within a layer. A service is abstract; a protocol contains the details of what the service does.
Reference model
Concepts and their relationships: a conceptual framework for understanding how those concepts relate. Examples include business models. It can be presented to others to explain complex systems to a non-specialist.
Why do we need a reference model?
Different companies may have their own implementations, but a reference model provides a common baseline for development. It is engineering best practice to have an abstract reference model, and corresponding implementations are always required for validation purposes. Networks are very complex systems, and a reference model serves to simplify the design process. The internet has 2 main reference models: the OSI model and the TCP/IP model.
OSI model
Stands for Open Systems Interconnection; proposed by ISO; 7 layers. Layer divisions are based on principled decisions: there are 5 layering principles, and each layer corresponds to a distinct group of tasks.
OSI reference model
The protocol data units of the layers are: bit, frame, packet, segment, SPDU, PPDU, and APDU. An upper layer always uses the layer below, building upon it.
Lowest layer: physical layer. Sends a stream of 0s and 1s (raw data). It cares about things like transmission rates (sending 1000 bits means the other side agrees to 1000 bits being sent) and about the different transmission media and materials being used.
Data link
Works on error detection, correction, and flow control; tries to provide reliable and efficient transmission to the upper layer. Access control: controlling who can use which channel when there are multiple users.
Network
Cares about how packets are sent from source to destination across different
networks. Along the path there are intermediate steps, and the network layer cares about the optimal path from sender to destination; it determines where the next hop is. A network layer cannot send information directly to another network layer; it has to go down through a path to the physical layer.
Transport layer
Doesn't care about the individual steps; only cares about delivering the message from source to destination (the network layer provides that service). Controls reliability through flow control and error control, similar to the data link layer but at a different level (segment level). It can split messages, depending on the protocol, and recombine them at the other side.
Session
Controls the session; its functions are more like matching the audio and video of multimedia data; contains checkpoints for crash management.
Presentation
Cares about how data is presented and its syntax: encryption and decryption, and compression, to ensure the application can accept and correctly use the data.
Application
There are different applications for different uses; provides user interfaces and services. The users of the application layer are software applications.
TCP/IP reference model
Transmission Control Protocol / Internet Protocol; 4 layers; Vint Cerf and Bob Kahn (1974). It doesn't have the session and presentation layers, moving all their functions to the application layer, and it didn't split the physical and data link layers, instead using a host-to-network layer. Its strength lies in its protocols.
Link layer: cares about how to use different transmission media, e.g. DSL, SONET, 802.11, Ethernet; each has its own protocols.
Internet layer: the main protocol is the Internet Protocol (IP), plus ICMP.
Transport: TCP (connection-oriented) and UDP (connectionless).
Application: multiple application-level protocols for different purposes: HTTP, SMTP, RTP, DNS.
OSI is more complex and distinguishes the following three concepts explicitly: services, interfaces, and protocols. TCP/IP has successful protocols.
Critique of OSI
The critique of the OSI model covers bad technology, bad implementations, and bad timing. It does not
use TCP or UDP. Some layers have only a few functions while others are crowded. The timing of developing a standard is also important.
Critique of TCP/IP
Not a general model; service, interface, and protocol are not distinguished; it did not split the physical and data link layers; minor protocols are deeply entrenched and hard to replace.
Hybrid model
5 layers: physical, data link, network, transport, and application.
COMP90007 internet technologies lecture 2
The Internet is not a single network but a network of networks. It is the infrastructure that supports distributed systems like the World Wide Web. The WWW is a distributed system that runs on top of the internet; they complete a common task together, but they are two different things.
Uses of computer networks
Business and personal applications; Internet of Things (parking, smart meters, vending machines). The client-server model involves requests and replies.
Types of transmission technology
Broadcast links: the network has a single communication channel shared by all machines; packets sent by any machine are received by all others; the intended recipients process the packet contents, the others simply ignore it.
Point-to-point links: data from the sender machine is not seen or processed by other machines; 1-to-1 conversations; the network consists of many connections between individual pairs of machines. Unicasting is the term for point-to-point networks where a single sender and receiver pair exchange data. Multicasting is transmission to a subset of the machines.
Networks can also be differentiated by interprocessor distance.
Topology is the shape of the network: how the nodes connect. Mesh: each device has a dedicated point-to-point link to every other device. Increasing the number of computers in a full mesh increases the number of connections quadratically. A full mesh is very robust, but it scales badly. Bus: all devices are attached to a shared medium.
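The quadratic growth of a full mesh comes from needing one dedicated link per pair of devices, n(n-1)/2 in total; a quick sketch:

```python
def full_mesh_links(n):
    """A full mesh needs a dedicated link between every pair: n*(n-1)/2."""
    return n * (n - 1) // 2

# Doubling the number of devices roughly quadruples the number of links.
links = [full_mesh_links(n) for n in (2, 4, 8, 16)]  # 1, 6, 28, 120
```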
On a bus network, only a single device can transmit at any point in time. A central cable runs as the backbone and all computers are connected to that backbone. Signals can travel in both directions, but only one message can be sent at a time without conflict/collision. Ethernet is the most common bus network. It is very easy to add new machines.
Star topology: all devices attach to a central device. The clients are not connected to each other, and information needs to be sent through the hub. It is very easy to add or remove devices; weaknesses include collisions when there are multiple simultaneous messages, and if the hub fails the whole network fails.
Ring topology: each device sits on a ring, receives data from the previous device, and forwards it to the next. If one device is slow, it bottlenecks the speed of the whole network. Requires access control to resolve contention.
What makes the internet work: protocols, layers, and services.
Layers
Each layer has its own task. Consider the network as a stack of layers building on top of each other; each layer offers services to the layers above it through an interface. A protocol is an agreement between communicating parties on how communication is to proceed. The overall objective is to support communication at the highest layer. With an interface, the user doesn't have to know the details of the technology used. Layers add headers, which may carry addresses and other information needed to transfer data. To prevent bottlenecks and overloading the network, big messages are split into smaller ones. Data flows from the upper layers down to the bottom layer, while services and protocols are provided from the bottom layers upward.
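The header-per-layer idea from this lecture can be sketched as string encapsulation and decapsulation (the header markers and three-layer stack are simplified inventions):

```python
LAYERS = ["transport", "network", "datalink"]  # simplified stack, top to bottom

def encapsulate(user_data: str) -> str:
    """Going down the stack, each layer prepends its header;
    the data link layer also appends an FCS trailer."""
    pdu = user_data
    for layer in LAYERS:
        pdu = f"<{layer}>" + pdu
    return pdu + "<fcs>"

def decapsulate(frame: str) -> str:
    """The receiving side strips headers in the opposite order: decapsulation."""
    pdu = frame.removesuffix("<fcs>")
    for layer in reversed(LAYERS):
        pdu = pdu.removeprefix(f"<{layer}>")
    return pdu
```

Round-tripping any payload through both functions returns it unchanged, mirroring how peer layers see each other's data.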
COMP90007 L30
Digital video compression
The main application is streaming multimedia data. Video is represented by a sequence of frames; one frame is one image, a rectangular grid of pixels. Each pixel is represented by bits storing colour information: with 2 bits there are 2^2 colours; 24-bit pixels are the standard, giving 2^24 colours. A video contains many frames played each second, which can reach around 200 Mbps uncompressed, so video is sent compressed due to its large bandwidth requirements. We use lossy compression that exploits the limits of human perception, including spatial redundancy and temporal redundancy; it can typically achieve a compression ratio of about 50:1. The most popular standards are proposed by MPEG, in different versions: MPEG-1 about 40:1, MPEG-2 about 200:1, MPEG-4 about 1200:1. Video and audio are compressed separately, as different algorithms are used.
Streaming live media
Data is produced in real time; the most significant difference is that we can't skip forward during playback. A buffer at the client side smooths out jitter. It is desirable to use multicasting with RTP over UDP, because multiple users are watching at the same time; however, TCP is used in practice. It is better to set up multiple servers for distribution: a content distribution service.
Real-time conferencing
Requires low latency and should be interactive, so the buffer should be small. Approaches: increase bandwidth; compress video; use QoS offered by the network layer, marking packets for differentiated services. An example protocol we could use for this is the Session Initiation Protocol (SIP).
Network security
Essential properties include confidentiality, integrity, and availability; authentication and non-repudiation can also be used. It is essentially about protecting our network. All layers are responsible for network security, but they have different tasks. Physical layer: enclosing transmission lines to avoid wiretapping.
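The uncompressed-bitrate claim earlier in this lecture can be sanity-checked; the resolution and frame rate below are assumptions, since the notes don't specify them:

```python
def uncompressed_bitrate_bps(width, height, bits_per_pixel, fps):
    """Raw video: every pixel of every frame is sent as-is."""
    return width * height * bits_per_pixel * fps

# Hypothetical 720p stream with 24-bit colour at 30 frames/s
raw = uncompressed_bitrate_bps(1280, 720, 24, 30)   # ~663 Mbps uncompressed
compressed = raw / 50                               # with the ~50:1 lossy ratio
```

Even at a modest resolution the raw rate is hundreds of Mbps, which is why compression is mandatory for streaming.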
Fibre optics resists wiretapping better than other media. Network layer: using firewalls to monitor traffic. Transport layer: encrypting connections; the transport layer cares about end-to-end communication, so we have end-to-end encryption. Most implementations are based on cryptography and authentication.
Cryptography
Algorithms that can create secrets. A cipher is the algorithm performing encryption and decryption; ciphertext is produced from a key and plaintext. The key is a string that allows the selection of one of many potential encryptions. Kerckhoffs's principle.
COMP90007 L28
Streaming multimedia data
Refers to audio and video data, made practical by increased bandwidth and popular with the spread of capable devices. Focuses on 3 categories of task: streaming stored media, streaming live media, and interactive audio and video conferencing; the difficulty of these applications increases in that order. Jitter impacts the user experience directly, unlike in other programs, which is why the main challenges are high bandwidth requirements and high QoS requirements.
Streaming stored media
Playing media over the web via simple downloads is the simplest solution to jitter and bandwidth: the client sends an HTTP request for the media file, and the web server responds using HTTP; after the download finishes, the client browser saves the media to disk, where it can be played with a media player. The weaknesses of this simple model are the long delay at the start (you cannot start watching immediately after accessing it) and that it supports only point-to-point data distribution, meaning no broadcasting, which can waste network resources. Specialised multimedia software fixes this by decoupling the web server, which stores a metafile describing the media, from the media server, which stores the actual file. The metafile is very short and can include the duration and other information like this.
After getting the metafile, the browser hands it to the media player, which then requests the actual media (via RTSP), downloads part of the data, streams it, and downloads more: this is streaming. Once started, the browser is no longer involved, and the media goes straight to the media player. We can essentially start playing the media right after downloading the metafile. RTSP (Real Time Streaming Protocol) sits at the application layer, mainly for the media player, and can use TCP. RTP (Real-time Transport Protocol) sits on the transport layer, is UDP-based, and allows multicasting and time-stamping; it is also responsible for reliability. MPEG-4 is a standard protocol used for compressing large files, which is important for handling media data.
Specialised multimedia software
The user interface allows functions such as volume control, playback, next, etc. Behind the scenes, we want to ensure a good user experience: handle transmission errors in conjunction with the transport protocols (using RTP/UDP, the playback software must manage errors gracefully; using TCP, the task is completed by the transport layer, but UDP is usually used when speed is a concern); eliminate jitter using a buffer; compress and decompress the multimedia files to reduce their size.
Challenge: handling errors
FEC (forward error correction) uses redundant bits for error correction; parity bits, checksums, CRCs, and Hamming codes all belong to this family. For every X data packets, we add Y new packets of redundancy to check for errors. A limit of the simple schemes is that they can only handle one error. Other approaches include designing methods that are error resilient, reducing the impact of errors on the data.
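A single even-parity bit is the simplest member of the redundancy family above; it detects (but cannot correct or locate) a single flipped bit. A sketch:

```python
def add_parity(bits):
    """Even parity: append one redundant bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def parity_ok(codeword):
    """A single flipped bit makes the count of 1s odd, so the error is detected;
    parity alone cannot say *which* bit flipped, i.e. it cannot correct."""
    return sum(codeword) % 2 == 0

word = add_parity([1, 0, 1, 1])
corrupted = word.copy()
corrupted[1] ^= 1   # one bit flipped in transit
```

Hamming codes extend this idea with several parity bits placed so that the failing checks pinpoint, and thus correct, the flipped bit.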
Examples of error-resilient methods include adding markers for resynchronisation, error concealment, and retransmission.
Jitter management
A longer delay doesn't mean higher jitter: consistency of delay is what matters for jitter. A buffer can be used. Inside the buffer, we can define a low and a high water mark, used to prevent emptying the buffer and overflowing it. In high-jitter network environments, setting the low water mark higher than in other applications helps solve the problem of that environment. Without a low water mark, there is a risk that the buffer empties and causes a gap in playback. The high water mark is defined because we don't want to overflow: we can tell the media source to stop sending when the buffer reaches the high water mark. Overflowing data leads to retransmission, so this needs to be managed. In short: the low water mark prevents the buffer from emptying and thus avoids jitter (an unsmooth experience); the high water mark prevents overflow.
Compression
For dealing with large files. The original input of multimedia data is usually an analog signal; we need an analog-to-digital converter (ADC) to convert it to digital before we can compress. We sample data points during conversion; the sampling frequency is 2 × the maximum frequency within the signal (if the signal has a maximum frequency of 1 Hz, we sample at 2 Hz, i.e. collect 2 points per cycle). Compression methods include those based on sequence patterns and repetition. The general process of compression and decompression can be summarised as encoding and decoding, and can be symmetric or asymmetric. Symmetric: compression and decompression are reverses of each other; you can fully recover the data during decompression, and both take a similar amount of time. Asymmetric: the encoding algorithm is usually slow and complicated, while decoding should be simple and fast.
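Symmetric, lossless compression as just described can be demonstrated with Python's standard zlib; repetition-heavy input compresses well:

```python
import zlib

# Lossless compression: decompression exactly inverts compression.
original = b"abc" * 100            # 300 bytes of pure repetition
packed = zlib.compress(original)   # far smaller than the input
restored = zlib.decompress(packed)
ratio = len(original) / len(packed)
```

Lossy codecs give much higher ratios precisely by dropping the requirement that `restored == original`.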
Compression can also be lossy, where the decoded output is not equal to the original input; ideally this does not impact our experience. Lossy compression achieves much higher compression ratios and is faster. Lossless compression corresponds to the symmetric case. For emails we need lossless compression; for multimedia we use lossy compression, which takes advantage of the limits of human perception, since we can't notice some small changes. Modern standard compression algorithms for multimedia are lossy.
Audio compression
Perceptual coding is based on how people perceive sound: some data can be masked by other data. Temporal masking: human ears can miss soft sounds immediately after loud sounds. Frequency masking: a loud sound in one frequency band hides a soft sound in another. Both can be exploited to save space. MP3 is a standard for audio compression: MPEG audio layer 3, which is lossy and can reduce audio to about 1/9 of its original size.
Email
One of the most important applications. Three main components: the user agent, the message transfer agent, and the message transfer protocol. The user agent is the software application (email client) providing the user interface, e.g. Gmail or Microsoft Outlook. Message transfer agents are the mail servers that transport messages from source to destination, e.g. Microsoft Exchange. Message transfer protocols are the protocols that help us send email, such as the Simple Mail Transfer Protocol (SMTP). The three main steps in the architecture are mail submission, message transfer (which uses SMTP), and final delivery. Final delivery uses POP3 or IMAP. IMAP (Internet Message Access Protocol) is designed with multiple devices in mind: the user can access the same mail on the server from multiple devices. POP3 considers only a single device: it downloads mail to that device and removes it from the server.
User agent
Basic functions such as compose, display, and reply. It also supports functions for
manipulating mailboxes. We must provide the message in a standard format: recipient, CC, body, and so on.

Message format
Messages are sent as an envelope in which related information is encapsulated for transport. To, CC, BCC, From, Sender, and Subject are all provided inside the header; then we provide the full message body, which can then be sent to the transfer agent. Addressing is similar. The original design used the RFC 822 format, which could only handle alphabetic letters in English. To handle other cases, the standard was replaced by a newer RFC; this solution is called MIME (Multipurpose Internet Mail Extensions). MIME is a standard that adds structure to the message body and defines encoding rules for non-ASCII messages. The MIME header is an indicator telling the other side that the message contains multimedia data. There are also 4 more additional message headers: Content-Description, Content-Id, Content-Transfer-Encoding, and Content-Type. The message type wraps a message inside another message. The multipart type defines messages with multiple attachments; its subtypes include mixed, alternative, parallel, and digest.
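Assuming Python, a multipart message of the kind described here can be assembled with the standard library's email package (the addresses below are made up for illustration):

```python
# Building a MIME multipart/mixed message with Python's standard email package.
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

msg = MIMEMultipart("mixed")                  # multipart subtype: mixed
msg["From"] = "alice@example.com"             # header fields of the envelope
msg["To"] = "bob@example.com"
msg["Subject"] = "MIME demo"
msg.attach(MIMEText("Hello Bob", "plain"))        # first body part (plain text)
msg.attach(MIMEText("<b>Hello Bob</b>", "html"))  # second body part (HTML)

print(msg["MIME-Version"])                    # 1.0
print(msg["Content-Type"].split(";")[0])      # multipart/mixed
```

The library adds the MIME-Version header and a unique boundary string automatically; the whole object serializes to the on-the-wire message with `msg.as_string()`.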
The parallel subtype is for parts viewed simultaneously, such as video played together with audio. The digest subtype packs multiple messages into a single message.

SMTP
SMTP is the key protocol for email. It is involved in 2 steps: from the sender's user agent to the sender's mail server, and from the sender's mail server to the receiver's mail server. It reports the delivery status and any errors. All users share the queue of the mail server. To send messages, we need services from the transport layer, and the transport layer needs the network layer to find the routers on the path. Let's see how this application-layer protocol uses the transport layer. Both mail submission and message transfer involve the transport layer and use TCP. In mail submission, the client is the user agent and the server is its own mail server; a TCP connection is made on port 587 and the user agent submits mail to that port. IP address + port number identify the correct server to send to. The AUTH extension is used to verify the credentials of the client. After submission, the server puts the mail in an outgoing queue. In message transfer, the client is our mail server and the recipient's mail server is the server. The server has to check that the recipient is a valid email address. A line containing only a full stop (".") signals the end of the message. The transfer happens in 1 single hop: we send directly from the sending MTA to the receiving MTA. This can repeat multiple times for automatic forwarding or to implement a mailing list.

Final delivery
Final delivery is when the recipient (Bob) requests the mail from the mail server. Bob's user agent is now the client, and the mail server is still the server. Commands after forming the TCP connection include LOGIN, LIST, CREATE, DELETE, COPY, etc.; clients contact the server using these commands.
These commands are IMAP. POP3 is a simpler protocol: mail is downloaded to the user agent's computer and does not remain on the server.

Webmail
Webmail uses a website to provide the interface for managing email. It plays the same role as a user agent, but instead of software downloaded onto your PC it runs in a web browser, like how Gmail is usually used. The other side can use its own user agent or webmail. Because the webpage already provides a login, webmail uses HTTPS to POST the data to the mail server in the mail submission step instead of using SMTP; this is because the mail is submitted as a web form. The server still listens on port 25.

SPAM
Spam is unwanted email. The main countermeasures are checking the sending domain, blacklisting, collecting spam to build a knowledge base, and parking email from unknown servers. This is done automatically with these kinds of identification.

COMP90007 L26
In a recursive query, when you ask for the IP address of a specific domain, the DNS server does everything on your behalf to find the IP address and send it back to you. In an iterative query, for every query the server gives you the next DNS server's IP address, which you then need to query yourself. Once a name server learns a mapping, it caches that mapping; the IP addresses of top-level domain servers are typically cached in local name servers. Root servers are not visited often, as they are very limited: there are only 13 of them.

World Wide Web
The web is one of the most famous applications. Browsing the web involves many components: client and server software, a web markup language, web scripting languages, and protocols for how to transfer content, e.g. HTTP. We access the web using a URL, which has a host name (handled by the Domain Name System) and a path name. A web page consists of a base HTML file that references several objects; an object can be an HTML file, but also a JPEG image, an audio file, or a Java applet. Sometimes you see a "?" in the URL, which introduces a query string for dynamically generated pages (like PHP).
The query string allows you to pass some information to the web server. HTML is a markup language for building web pages.

HTTP
HTTP is responsible for the communication and transfer of web pages. It is a request-response protocol: you send a request ("give me a specific page") and you get a response (the content you want). One server can handle many clients. Browsers do all of this behind the scenes. Telnet, although a general-purpose remote-terminal protocol rather than something designed for HTTP, can be used to manually send HTTP requests to web servers and read the responses. There are 2 types of HTTP connections: non-persistent and persistent. DNS is required to be fast and usually uses UDP; HTTP needs connections, so it uses TCP. With non-persistent connections, you make a new connection for every object in the page: for each single object there is a connection, a three-way handshake, data transfer, and a disconnect. This is usually not ideal, so sometimes we keep the connection open, which is what persistent HTTP does. In persistent HTTP, multiple objects can be sent over a single TCP connection between client and server, which is good; however, keeping connections open can also waste server resources. Some servers work non-persistently and some persistently. The total response time for a non-persistent request is 2 RTT (1 to set up the TCP connection, 1 for the request and response) plus the file transmission time. Issues with non-persistent connections include the overhead of many connections and disconnections. With a persistent connection we can also pipeline: the client sends a request as soon as it encounters a referenced object, which is much faster than working sequentially; as soon as we see the URL of the next object, we send the request. (Whether non-persistent connections could also be pipelined was flagged in the notes as something to check.) The most common HTTP request is GET. The POST request is also important and similar to GET.
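A sketch of the raw request a client sends, composed by hand (host and path are placeholders); the Connection header is what asks for persistent versus non-persistent behaviour:

```python
# Composing a minimal HTTP/1.1 request by hand; host/path are examples only.
def http_get(host, path, persistent=True):
    conn = "keep-alive" if persistent else "close"
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"          # Host is mandatory in HTTP/1.1
            f"Connection: {conn}\r\n"    # keep-alive = persistent connection
            "\r\n")                      # blank line ends the header block

print(http_get("example.com", "/index.html", persistent=False))
```

Sending these bytes over a TCP socket to port 80 (which is what telnetting to a web server amounts to) would produce a response with a status line, headers, and the body.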
The only difference is that a POST can have a body: you send some content with the request. An example usage is a password, which you don't want to send with GET since everyone can see it in the URL. In the Connection header, "keep-alive" means the connection is persistent.

Cookies
When talking about the web, cookies are an integral part. HTTP is known as stateless because it doesn't record what the user is doing; trying to record everyone's activity would lead to an overly large database. Since the server doesn't save this state, it is saved on the client: the client saves a file on the computer known as a cookie. This allows the server to keep a database that relates content to the type of cookie you have saved, and to bring information from the database based on your needs. Uses of cookies include authorization, user session state, shopping carts, recommendations, etc. They are a somewhat questionable mechanism for tracking users. A cookie maintains client information until it is deleted. Cookie size is small and limited. Cookies are stored on both client and server and transferred between them. The cookie isn't actually storing the data itself; it is a link/relationship to pieces of data in the databases of other websites.

Session
Session information about user interaction is stored at the server side for up to some hours. When the user closes the website, the session ends. Session information can be large.

Web caching
Web caching is similar to DNS caching on a local DNS server. When a user sets the browser to access the web via a cache, the browser sends all HTTP requests to the cache. If the object is in the cache, the cache returns the object; otherwise the cache requests the object from the origin server, then returns it to the client. This saves communication and bandwidth and is very common. Essentially, the cache saves a copy of a web page to give to the next user, after a previous user has already gone through the process of fetching it from the origin server. It cannot, however, account for frequently changing data, which needs other update mechanisms.

COMP90007 L25
Application layer
The application layer is where we
run processes within the operating system. These processes send data through the transport layer to be delivered to an end point. There are many application-layer protocols, but we focus on the famous ones: 4 main things, namely the Domain Name System, the World Wide Web, email, and streaming multimedia.

We know hosts by their IP addresses, so when you want to access websites or web services, we need IP addresses. We don't actually know them, though; instead we use domain names. The Domain Name System (DNS) maps between IP addresses and names. DNS is a core application of the internet, responsible for when you want to open a website, send an email, etc. It associates an IP address with a name, and can be used to find an IP address from a domain name. There is a hierarchical naming convention for domain names. There are over 250 top-level domains (TLDs). Domain names are case-insensitive. Each component between the dots can be up to 63 characters, and a name can be up to 255 characters overall. Names can be internationalized to use different alphabets. Naming conventions follow either organizational or physical (geographic) schemes. Absolute domain names end in a ".", which specifies the exact location in the tree hierarchy; the "." is the root. Other than address translation from hostname to IP, DNS can also be used for host aliasing, shortening canonical names with alias names. A mail server can be associated with a domain, which is also handled by DNS. DNS is not only used for these naming purposes; it can also be used for load distribution, to handle busy sites replicated over multiple servers. The set of replicated IP addresses is associated with 1 canonical name, and the DNS server rotates the order of the addresses to distribute the load. The DNS name space is divided into non-overlapping zones, where each zone is associated with some DNS service. DNS is not centralized, because that would create a single point of failure.
It is also not centralized because so many users going to 1 place would lead to high traffic volume; maintenance would be hard; and some locations would be a long distance from the centralized database. Instead, we use a hierarchical model. At the top of the hierarchy are the root servers; there are 13 root servers globally. The role of these root servers is to give us the IP addresses of the TLD servers: send a request to one of these root servers asking for the address of a certain TLD server, and they will have it. Whenever we do address translation, we cache the IP addresses of those top-level servers in the local DNS server. There is a time-to-live (TTL) for keeping those cached records alive, so we don't need to keep going to the root servers for the addresses. Other than the root, there are authoritative DNS servers responsible for a zone, with 1 primary and a secondary per zone. Sometimes the secondary can be outside the zone, acting as a backup: failure in 1 network is possible, but simultaneous failure in 2 networks is rare. Everything we keep in DNS is stored as resource records (RRs), which hold the information DNS needs to function efficiently. An RR includes domain name, TTL, class, type, and value. Most of the time the class is IN, which means internet. Famous types include A, CNAME, MX, NS, and SOA. SOA has information about the zone; MX is for mail service. There are some public domain name servers that everyone can use. For every computer and host registered under a specific domain, records should be in the zone of that domain. Every ISP has a DNS server. The end of a subdomain is a leaf; if what you're searching for isn't there, it isn't anywhere: if it isn't found there, then it isn't a subdomain of what you're searching. In any of these steps, there is a chance that a cached record contains what we are looking for. There are 2 types of queries: recursive and iterative. In a recursive query, the server obtains the mapping on the client's behalf, instead of returning partial answers.
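The chain of referrals an iterative resolver follows can be sketched with a toy, invented zone table (none of this is real DNS data; the answer address is the documentation example 93.184.216.34): each server either answers or names the next server to ask, which a recursive server would do on your behalf.

```python
# Toy iterative resolution over a made-up hierarchy of referrals.
ROOT = {"com": "tld-com"}                       # root servers know the TLD servers
SERVERS = {
    "tld-com": {"example.com": "ns.example.com"},            # TLD refers to the
    "ns.example.com": {"www.example.com": "93.184.216.34"},  # authoritative server
}

def resolve_iteratively(name):
    server = ROOT[name.split(".")[-1]]          # first referral, from a root server
    while True:
        zone = SERVERS[server]
        for suffix, answer in zone.items():
            if name.endswith(suffix):
                if answer in SERVERS:           # another referral: ask that server
                    server = answer
                    break
                return answer                   # authoritative answer (an A record)

print(resolve_iteratively("www.example.com"))   # 93.184.216.34
```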
When you ask that server, it will find the IP address on your behalf, do everything required, and return it to you. The local DNS server is usually recursive, but the root server is usually iterative. Iterative query: the contacted server replies with the name of the server to contact; it gives you instructions. Usually only the local server is recursive, since it should handle the bulk of the job for your computer instead of your computer doing it. There is a hierarchy, and in the iterative case each level of the hierarchy gives you the IP address of the DNS server at the next level.

COMP90007 L23
The way TCP works is that the sender sends some data based on the buffer size of the receiver side. If the buffer size is 4 KB and we have sent 2 KB, then that 2 KB sits in the buffer and we have 2 KB left over. Whenever the receiver acknowledges data it has received, it advertises the remaining buffer size as well: the receiver sends ACK = 2048 to acknowledge 2 KB of data, and also sends WIN = 2048 to signal that 2 KB of space is available in the buffer. WIN = 0 means the receiver doesn't want any more data and couldn't consume what was sent so far; the sender will not send anything more and waits for another packet announcing that WIN has space. The sender is not obliged to send as much as the available window; it can send less, and the size the sender sends may vary. One problem that can occur is silly window syndrome, where small amounts of buffer space are advertised as soon as they become available, so the buffer frequently becomes full again and the sender has to keep waiting for new WIN signals. The solution is for the receiver to wait for a specific amount of buffer to be available before advertising availability. A fast sender and a slow receiver can cause this. We need to set a timer for every packet we send; if we don't receive an ACK within that timer, we assume the packet is lost and retransmit. TCP chooses a good timer by measuring the round-trip time of acknowledgements, maintaining SRTT (smoothed round-trip time).
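The SRTT smoothing can be sketched with the classic Jacobson/Karels update (gains of 1/8 and 1/4, as in RFC 6298); the starting estimates and samples below are arbitrary, in milliseconds:

```python
# Jacobson/Karels-style RTT estimation behind RTO = SRTT + 4 * RTTVAR.
ALPHA, BETA = 1 / 8, 1 / 4                # classic smoothing gains (RFC 6298)

def update(srtt, rttvar, sample):
    # RTTVAR tracks the deviation of samples from the smoothed mean...
    rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
    # ...and SRTT is an exponentially weighted moving average of samples.
    srtt = (1 - ALPHA) * srtt + ALPHA * sample
    return srtt, rttvar

srtt, rttvar = 100.0, 25.0                # assumed starting estimates (ms)
for sample in [110, 90, 300, 105]:        # one jittery outlier at 300 ms
    srtt, rttvar = update(srtt, rttvar, sample)
    print(round(srtt, 1), round(srtt + 4 * rttvar, 1))   # SRTT and resulting RTO
```

Note how the single 300 ms outlier inflates RTTVAR, pushing the timeout well above the average RTT, which is exactly the safety margin the notes describe.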
This can vary based on the situation of the network. Because of this, TCP dynamically sets the RTO using an averaging mechanism: it looks at the history and gradually changes the timeout based on the history and the most recent RTT samples. Setting the timeout equal to the RTT is risky, because jitter and variation in network delay can affect it. So there is also a standard-deviation term: if network discrepancies are high, RTTVAR increases, and we usually add a multiple (commonly 4) of it:
RTO = SRTT + 4 * RTTVAR
We essentially set our timeout based on the average RTT plus its deviation; a lower deviation means a timeout closer to the average RTT.

Quality of service
QoS is important for networking. Its dimensions include connection reliability, delay, bandwidth (how much data we can send), and jitter. Jitter is the standard deviation of delay. Low variation in delay is good, because many applications rely on consistent delay to perform their purpose; variation in packet arrivals causes freezing, or lots of packets arriving at once. Overall, it causes inconsistency. The 4 main dimensions: bandwidth, reliability, delay, jitter.

Jitter
Jitter is the variation in packet arrival times; high variation is high jitter and low variation is low jitter. Control of jitter is very important for some applications. Approaches to jitter control include buffering packets at the receiver and shuffling transmission: slower packets are sent first, faster packets wait in a queue.

Techniques for achieving QoS
Over-provisioning: more than adequate buffers, router CPU, and higher bandwidth. Buffering: increased delay but reduced jitter. Traffic shaping: regulate the average rate and burstiness of transmission.
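One classic shaper is the leaky bucket; a minimal sketch with invented rates, where bursts queue in the bucket and the output drains at a constant rate regardless of input spikes:

```python
# Leaky bucket traffic shaping: bursty arrivals in, smooth fixed-rate output.
def leaky_bucket(arrivals, rate, capacity):
    level, out = 0, []
    for burst in arrivals:                     # packets arriving this tick
        level = min(level + burst, capacity)   # anything over capacity is dropped
        sent = min(level, rate)                # drain at a constant output rate
        level -= sent
        out.append(sent)
    return out

print(leaky_bucket([10, 0, 0, 5, 0], rate=3, capacity=8))   # [3, 3, 2, 3, 2]
```

The input spikes of 10 and 5 come out as a steady trickle of at most 3 per tick; the 2 packets exceeding the bucket capacity in the first burst are lost, which is the trade-off of bounding the buffer.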
An example is the leaky bucket: large bursts of traffic are buffered and smoothed while sending, which can be done at the sender. This guarantees smooth output rates rather than spiky ones. Resource reservation. Admission control: routers can decide, based on traffic patterns, whether to accept new flows or reject/reroute them. Proportional routing: instead of sending data over a single route, divide it over multiple routes. Packet scheduling: create queues based on priority, done through fair queueing or weighted fair queueing.

What happens when congested
Congestion causes packet loss, duplicates, delay, etc. Congestion results when too much traffic is offered. Goodput = the rate of useful packets. Ideally we would send at capacity, but we usually don't get to it, since networks are bursty and 1 packet loss can cause significant problems. We need a bit of headroom so this variation can be absorbed, or else it could become a problem in special circumstances. Flow control is end-to-end control of traffic, primarily concerned with preventing the sender from transmitting data faster than the receiver can receive it. Congestion control is about the ability of the subnet to actually carry the offered traffic, in a global context: it's not about a single sender or receiver, but about congestion inside the network. Ways to avoid congestion include provisioning, traffic-aware routing, admission control, and load shedding.

Congestion control in TCP
The sender maintains two windows, each regulating the number of bytes the sender can transmit: one window for flow control and one for congestion control. There are 2 types of signals from the network that tell us there may be congestion. One is duplicate acknowledgements, which we get when a packet is lost: the receiver only sends an ACK for the last correctly received in-order packet. The receiver keeps sending duplicate ACKs for packet 1 because TCP uses cumulative acknowledgements.
This means the receiver acknowledges the last in-order packet it got (packet 1) and keeps requesting the next expected packet (packet 2) until it arrives. The other signal is a timeout: if we don't get an acknowledgement after sending something, that also signals congestion. Based on timeouts and duplicate ACKs, we manage the window using additive increase, multiplicative decrease (AIMD): when we increase the window size, we add to it; when we decrease it, we decrease multiplicatively. There are 2 versions of TCP that behave slightly differently. TCP Tahoe: slow start with additive increase. TCP Reno: TCP Tahoe + fast recovery.

Slow Start
Instead of adding 1 by 1, we grow the window exponentially until we see 1 loss, then we decrease. We are trying to reach the optimal window size: it sends 1 packet, then the window goes up to 2, then 4, then 8, then 16, and so on. The sender defines a slow-start threshold: below the threshold we grow exponentially, then linearly. We set this threshold very high in the beginning; as soon as a timeout occurs, the threshold is halved and slow start begins again. This is what TCP Tahoe does, and it is called additive increase with slow start.

Reno
Reno is more intelligent. When we get duplicate ACKs, we don't need to wait for a timeout; duplicates already indicate congestion. Instead of waiting for the timeout, we halve the threshold when this happens and go linearly from there (additive increase from that point). If a timeout occurs, Reno behaves the same as Tahoe; the shortcut only applies when duplicate ACKs arrive. When we get 3 duplicate ACKs in TCP, instead of waiting for the timeout we immediately resend that packet; this is called fast retransmission. Immediately after fast retransmission, fast recovery happens. In wireless networks there is a lot of packet loss due to environmental factors; if we run TCP in that kind of environment, we are in trouble.
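The slow-start and AIMD dynamics above can be traced with a toy Tahoe-style simulation (window in packets, a single timeout injected at a chosen round; the numbers are illustrative, not from any real trace):

```python
# Toy TCP Tahoe-style congestion window: exponential slow start up to
# ssthresh, additive increase after, reset to 1 on timeout.
def step(cwnd, ssthresh, loss):
    if loss:                              # timeout: multiplicative decrease
        return 1, max(cwnd // 2, 2)       # halve the threshold, restart slow start
    if cwnd < ssthresh:
        return cwnd * 2, ssthresh         # slow start: exponential growth per RTT
    return cwnd + 1, ssthresh             # congestion avoidance: additive increase

cwnd, ssthresh = 1, 16
trace = []
for rtt in range(10):
    loss = (rtt == 6)                     # pretend a timeout happens at round 6
    cwnd, ssthresh = step(cwnd, ssthresh, loss)
    trace.append(cwnd)
print(trace)                              # [2, 4, 8, 16, 17, 18, 1, 2, 4, 8]
```

Reno would differ on 3 duplicate ACKs: instead of dropping cwnd to 1, it would halve the threshold and continue linearly from there.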
One solution is for the network itself to handle retransmission, which is the common way. The other is to change TCP so it does not react so sensitively to such losses.

COMP90007 L22
TCP is connection-oriented, with increased reliability. Here we are not just sending packets like UDP; we make a connection and send data streams. The main difference is that with UDP you send a specific chunk of data in a segment, whereas with TCP you send a stream with no clear beginning or end of a message. TCP sits in the kernel (most likely), in a library, or in a user process. TCP builds a reliable transport layer on top of the unreliable network layer (IP). On the recipient side, an entity transfers the data into a buffer, and the recipient puts the pieces together to rebuild the byte string. TCP has 2 pairs of sockets: sender and receiver both create sockets consisting of the IP address of the host and a port number. For TCP to work, a connection must be explicitly established between the socket at the sending host and the socket at the receiving host. TCP can handle multiple connections at the same time. TCP has a 20-byte header; the data you're sending is added to this header and everything goes as the payload of an IP packet. An IP packet can be up to 65 KB, so a segment shouldn't be bigger than this. The frame size of the data link layer is also a bottleneck: Ethernet generally has a 1.5 KB frame size. If your segments are bigger than the frame, they are broken down, but keeping a segment in 1 frame gives the best result. TCP has a mechanism that detects the MTU (maximum packet size) and then sends segments of that size to achieve the best performance. TCP uses a sliding-window strategy, sending data into a window that acts as a buffer. Usually in TCP, acknowledgement happens cumulatively.
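The stream (rather than message) abstraction described in this lecture can be demonstrated with a local socket pair; two separate send calls arrive as one undifferentiated run of bytes, because TCP preserves the byte stream, not message boundaries:

```python
# TCP delivers a byte stream, not messages: two send() calls, one blob received.
import socket

a, b = socket.socketpair()         # a connected local stream-socket pair
a.sendall(b"hello ")               # first "message"
a.sendall(b"world")                # second, logically separate "message"
a.close()                          # close = end of stream for the receiver

data = b""
while chunk := b.recv(4096):       # read until end of stream
    data += chunk
b.close()
print(data)                        # b'hello world' -- the boundary is gone
```

An application that needs message boundaries on top of TCP must add its own framing, for example length prefixes or delimiters.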
The destination acknowledges the next byte it is expecting to receive, along with the remaining window size. REVIEW SLIDING WINDOW STRATEGY FOR STUDY LATER. Selective repeat sends an acknowledgement for each segment individually; go-back-N acknowledges segments in bulk: the window slides and the acknowledgement covers everything up to the latest in-order segment. TCP works like go-back-N, where you acknowledge what you have received so far and indicate what you are expecting. In TCP the window size changes dynamically based on the situation of the network and the buffer size of the other party. The 2 mechanisms are flow control and congestion control. If TCP detects lots of duplicates or packets getting lost, it slows down: congestion control. Flow control: sometimes a client is a bit slow and only gradually consumes the data you are sending; it communicates with the other side so the sender adjusts to the capacity of the other side. Flow c