CS 533 - HOMEWORK 5 SOLUTION
----------------------------
Version 2, Apr. 27, 2004


Problem 1 (0 Points)
---------

a) A LAN router must determine whether the destination is on an attached
   bus and, if it is not, the IP address of the best router leading to the
   destination. Each LAN has a subnet number. If an incoming destination
   address "matches" one of the attached LANs, the router will create a
   packet (with the destination MAC address) and send the packet out the
   appropriate output port. If not, the router does the same thing, but
   must use the MAC address of the appropriate attached router interface.
   The IP address must represent the subnet(s) to which the router
   provides access.

   B's routing table must have an entry for each attached LAN (N1, N3, N4)
   and for each reachable router, with the following fields:

      1) The destination address
      2) Hop count
      3) Output interface

   So, for our problem:

      Destination     Hop Count     Output Interface
      ------------------------------------------------
      A (N2)          1             b1 (access to N2)
      C               1             b4 (access to ???)
      D (N5)          1             b3 (access to N5)
      N1              0             b1
      N4              0             b4
      N3              0             b3

b) A distance-vector protocol must deal with the count-to-infinity
   problem. There has to be a definition of infinity. We represent
   infinity by 'Inf'. For example, in the RIP protocol, Inf = 16.

   The global cost matrix is the aggregation of the distance vectors from
   all routers. For our problem, the cost matrix before the failure is:

          |   A      B      C      D
      ----|------------------------------
        A |   0      1      2      2
        B |   1      0      1      1
        C |   2      1      0      2
        D |   2      1      2      0

   When B discovers that D has failed, the global cost matrix will be:

          |   A      B      C      D
      ----|------------------------------
        A |   0      1      2      2
        B |   1      0      1      Inf
        C |   2      1      0      2
        D |   2      1      2      0

   But when B exchanges distance vectors with its remaining neighbors A
   and C, B will think that D is still reachable through the other routers
   and will pick the best one. Since B is 1 hop from A and C, and each of
   them advertises a cost of 2 to D, it will think that the cost to D is
   3 = 1 + 2.
   Since A and C must go through B to get to D, they will know D is
   unreachable after they exchange distance vectors with B. Another
   exchange will reset the cost of B-D to Inf for B, but now A and C will
   think that they can reach D through B with cost equal to 4 = 3 + 1.
   This continues for a total of (Inf - 2) exchanges until all routers
   determine that reaching D will take Inf hops.

      -------------------------------------------------------------------
      |                        Inf - 2 exchanges                        |
      -------------------------------------------------------------------

c) We first describe split horizon and then split horizon with poisoned
   reverse:

   - Split horizon can prevent the count-to-infinity problem for two
     adjacent routers. The rule: Router Z should never send its neighbor X
     information about a route to Y that was learned through X, since X is
     nearer to Y. For example, A and C should not send B information about
     destination D (paths A-D and C-D respectively). So, B will leave its
     cost to D at Inf.

   - Split horizon with poisoned reverse can accelerate convergence, but
     it can't prevent 3-way count-to-infinity. The rule: Router Z sends
     its neighbor X a cost of Inf for a route to Y that was learned
     through X, since X is nearer to Y. For example, A (and C) should send
     B a cost of Inf for its cost to destination D. So, B will leave its
     cost to D at Inf.

d) Since a path vector gives the entire sequence of routers from source to
   destination, loops can be avoided by searching the path for the current
   router's entry.


Problem 2 (0 Points)
---------

a) The router starts with the oldest sequence number (the most negative
   number in a lollipop sequence space) and sends an LSP to its neighbors.
   A router receiving an older LSP replies with its idea of the sequence
   number (which will be higher/newer). The booting router takes its
   neighbor's idea of the sequence number S as the value before it crashed
   and jumps the LSP sequence number to S+1 so that it becomes the newest
   number.
   The LSP database can be synchronized by getting the LSP database from
   its neighbor (or a designated master).

b) This is usually done by a HELLO protocol that periodically tests the
   link status. If a link fails, the new LSP entry will be flooded to the
   entire network.

   A distance vector approach exchanges distance vectors between
   neighbors. A failed link causes an infinite cost between the two end
   nodes to appear in the distance vectors of the two neighbors. The new
   link metric is propagated to the network through a sequence of neighbor
   exchanges (updates).


Problem 3 (0 Points)
---------

a) The window doubles every RTT. Initially, cwnd = 1. The receiver's
   window is 64 segments (= 64 KB / 1 KB segments). So, we need to find n
   such that:

      2^n = 64     or     n = log_2(64) = 6

   i.e., it will take 6 RTTs or 240 ms before the sender is allowed to
   send 64 KB.

b) In general, if the RTT is 40K ms, it will take 240K ms
   (240K = 6 x 40K).


Problem 4 (0 Points)
---------

a) After the timeout, TCP sets ssthresh to cwnd/2 = 9 KB and enters
   slow-start. We expect:

   o The window (cwnd) should double every RTT and increment by one
     segment for each ACK.
   o Each ACK at the sender is followed by two packet transmissions
     (assuming a ready sender).

      ACK#     cwnd after ACK recvd     Seq# to Send
      --------------------------------------------
      1        2                        2, 3
      2        3                        4, 5
      3        4                        6, 7
      4        5                        8, 9

      cwnd
      (segment#)
         1      |-------------->| p1
         2      |<--------------| ACK 1
                |-------------->| p2
                |-------------->| p3
         3      |<--------------| ACK 2     Actually, the packet and
                |-------------->| p4        ACK curves cross
                |-------------->| p5
         4      |<--------------| ACK 3
                |-------------->| p6
                |-------------->| p7
         5      |<--------------| ACK 4
                |               |

b) 5 segments or 5 KB


Problem 5 (4 Points)
---------

[ Presented in class. ]


Problem 6 (4 Points)
---------

[ Presented in class. ]


Problem 7 (6 Points)
---------

[ You can assume either form of fast recovery (Reno or New Reno). ]

Short Answer:

   Fast retransmit/recovery handles multiple packet losses poorly. The
   retransmission of p1 is triggered by 3 duplicate ACKs.
   As the retransmitted p1 moves towards the receiver, the sender
   continues to get the same duplicate ACKs. The duplicate ACK generated
   by pkt p256 will arrive at the sender before the ACK generated by the
   retransmitted p1, and R's buffers will be full (indicated by R's
   advertised window). When the ACK generated by p1 gets to S, Reno will
   exit fast recovery and enter congestion avoidance. But there is only
   1 pkt in the pipe. So, only an RTO will trigger a retransmission of p2.
   But because cwnd will still be too small, another timeout will occur.
   This continues for pkts p3, ... , p8.

   This is the original fast recovery algorithm as implemented in Reno.
   The newer fast recovery algorithm implemented in New Reno will
   retransmit p2, ... , p8 in 1 RTT each (assuming no ACK drops) since an
   ACK of only 1 more pkt (not all outstanding pkts) indicates that the
   next pkt was also dropped.

Long Answer:

a) The details of fast retransmit and Reno fast recovery are summarized
   below. The situation is:

   o The sender S should not send any more packets because cwnd is equal
     to the bandwidth-delay product (256 KB = 32 Mbps x 64 msec / 8).

   The loss of p1 will be signalled by 3 duplicate ACKs wanting p1 when
   packets p9, p10, p11, and p12 arrive at the receiver R. S enters fast
   retransmit and will resend p1. Then, S will continue to:

   o Receive the same duplicate ACKs wanting p1 until the ACK for p1
     arrives at S.
   o Send 1 new packet for each duplicate ACK until R's buffers fill. This
     occurs after pkt 256 is sent since the advertised window is 256 pkts.

   But S must wait 1 RTT for p1' (p1's ACK). After S receives p1', S:

   o Leaves fast recovery since it has received p1', a non-duplicate ACK.
   o Cannot send any more pkts since there are no more ACKs coming back.

b) Then only an RTO will cause S to resend packets p2, p3, ... , and p8.

   Fast Retransmit:

   o Retransmission Policy: Retransmit packet X when the sender receives
     3 duplicate ACKs for packet X-1.
   o ssthresh <-- floor(cwnd/2)
   o cwnd <-- ssthresh + 3
   o Enter linear phase of cwnd

   Fast Recovery:

   o Send a packet for every duplicate ACK received if allowed by the new
     cwnd value.
   o cwnd <-- ssthresh when the ACK for retransmitted packet X arrives at
     the sender.
   o Exit the fast recovery phase when the sender receives the first
     non-duplicate ACK and enter the linear phase (congestion avoidance).

   But now cwnd is too small and there are no more ACKs coming back except
   for the retransmitted pkt. New Reno's fast recovery improves upon this
   by recognizing that a partial ACK(n) (i.e., there are still unACKed
   pkts) means that pkt n is also missing.

c) Window inflation allows transmission to proceed if there are enough
   ACKs still coming back from the receiver. Recovery is followed by
   reentry into a congestion avoidance phase instead of slow-start. If
   there is only a small number of random packet drops, fast retransmit
   and fast recovery work well. But when consecutive packets are dropped
   in a long fat pipe, the receiver's buffers can quickly become filled,
   preventing enough duplicate ACKs from being generated to recover all
   lost packets. Unless the receiver's advertised window is larger than
   the bandwidth-delay product, the receiver's window will close after
   around 1 RTT.

d) Slow-start after a timeout allows the network to drain its congested
   queues.

e) Clustered packet drops are often caused by tail drops at congested
   queues. By entering slow-start, TCP allows these queues to drain; i.e.,
   leave the congested state. By chance (not design), Tahoe will fill in
   the missing pkts during slow-start.

   The paper "Simulation-Based Comparisons of Tahoe, Reno, and SACK TCP"
   by Fall and Floyd (www-nrg.ee.lbl.gov/papers/sacks.pdf) discusses this
   consecutive drop issue. Note the results of the 3-packet-drop
   simulation. In particular, New-Reno TCP tries to keep the ACKs flowing
   by inflating the window and staying in fast recovery until a new packet
   (new since entering fast recovery) is ACKed.
   New-Reno also adds a new wrinkle to a retransmission due to 3 duplicate
   ACKs: Retransmit packet n if the returning ACK n packet does not ACK
   the entire send buffer. The reasoning is that packet n must have been
   dropped since the retransmission due to 3 duplicate ACKs only filled in
   the packet sequence up to and including packet n-1.

   In SACK TCP, ACK packets contain information about the receiver's
   buffer state that can be used by the sender in retransmission. It also
   maintains a variable 'pipe' that attempts to estimate the number of
   outstanding packets in the pipeline.


Problem 8 (0 Points)
---------

There is no solution.


Problem 9 (6 Points = 1 + 1 + 1 + 2 + 1)
---------

a) The 'trace.txt' file shows:

      "throughput: 238071 Bps"

   i.e., approximately 1.8 Mbps (= 8 x 238071 Bps), which includes idle
   time but not duplicate bytes. In particular, the value is computed in
   the following manner:

      throughput = (unique bytes sent) / (elapsed time)
                 = 16,774,275 / (1 min, 10.459102 sec)
                 = 238,071 Bps

   [ Note: The connection was attempting to get a 16 MB object from the
     San Diego Supercomputer Center. ]

b) The 'trace.txt' file shows:

      "idletime max: 20682.4 ms"

   This is not the total idle time but the maximum time period in which
   'snoop' did not see a packet. This period occurs at the front of the
   session and is probably due to the application doing something else
   besides pumping data. In fact, the application is communicating with
   another server during this period to get a catalogue entry (you can't
   tell this from the 'xplot' output though). If you could remove this
   time, you would increase the throughput to:

      throughput' = throughput x (elapsed time) / (elapsed time - idle time)
                  = 1.8 Mbps x 70.5 sec / (70.5 sec - 20.7 sec)
                  = 2.5 Mbps

   If you also removed the remaining idle time, the throughput would be
   even higher.

c) The ramps indicate forward progress, and the plateaus indicate either
   idle or retransmission periods. (See Part d for more details.)
   In particular, if you zoom in on a plateau, every "R" indicates a
   retransmitted packet.

d) The longest plateau (excluding the 20 sec leading idle time period and
   the period that terminates the connection) occurs around the interval
   [53.54, 57.67]. This is a 4.13 sec interval in which not much is
   happening. The two continuous staircase lines that form an envelope
   around the packet transmissions indicate the TCP window. The bottom
   line moves up when ACKs come in from the receiver (ghidorah.sdsc.edu),
   and the top line moves up when the receiver opens up its advertised
   window.

   If we look at the left end of this interval, the sequence numbers are
   near the end of the window and the window closes momentarily. This
   means that the sender (brainmap.arl.wustl.edu) is doing a good job of
   filling the pipe. Since there are no retransmissions at the beginning
   of this idle period, the idleness must be due to the application. Note
   also that the ACK line "catches up" with the sequence number curve,
   indicating that ghidorah is ACKing packets.

   At the right end of the idle interval, brainmap begins to transmit
   again, trying to fill the window. The window begins to close, but then
   opens up again as the ACKs come back on the return path. But at around
   57.81 sec things stall, and we see 3 duplicate ACKs (actually 2
   duplicate ACKs!!!) which trigger a retransmission. Note also the
   downward ticks on the ACK line, which indicate the receipt of ACKs with
   ACK numbers that duplicate the preceding ACK packet. So, in this long
   retransmit interval, ghidorah is sending ACKs saying it is getting
   packets but not the right ones. Fast retransmit is now running, but
   brainmap is at the end of the window and can't do much until it gets
   the right ACK. About 120 msec later it does.
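
   The throughput arithmetic in parts a) and b) can be rechecked with a
   few lines of Python. This is only a sketch: the helper names are made
   up for illustration, and the inputs are the numbers quoted from
   'trace.txt'. Note that 8 x 238071 is about 1.9 x 10^6 bits/sec; the
   1.8 Mbps figure is reproduced if a megabit is taken as 2^20 bits,
   which is an assumption about how the value was rounded.

```python
# Sketch only: helper names are hypothetical; inputs come from 'trace.txt'.

def throughput(unique_bytes, elapsed_sec):
    """Average throughput in bytes/sec over the whole connection."""
    return unique_bytes / elapsed_sec


def without_idle(rate, elapsed_sec, idle_sec):
    """Rescale a rate as if idle_sec of the elapsed time were removed."""
    return rate * elapsed_sec / (elapsed_sec - idle_sec)


elapsed = 60 + 10.459102                       # 1 min, 10.459102 sec
rate_Bps = throughput(16_774_275, elapsed)     # part a), bytes/sec
rate_Mbps = 8 * rate_Bps / 2**20               # ASSUMPTION: 1 Mb = 2^20 bits
adjusted_Mbps = without_idle(1.8, 70.5, 20.7)  # part b), rounded inputs

print(f"{rate_Bps:.0f} Bps, {rate_Mbps:.1f} Mbps, {adjusted_Mbps:.1f} Mbps")
```

   The same 'without_idle' scaling could be applied again if the remaining
   idle periods were measured.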

   [ Aside: The next plateau that occurs around 57.8 sec is interesting
     since it repeats a phenomenon seen in other plateaus as well: a
     retransmission due to 3 duplicate ACKs followed by 2 retransmissions
     spaced about 80 msec (1 RTT) apart. This is due to New-Reno TCP's
     fast recovery algorithm. The fact that the ACK due to the first
     retransmission did not ACK all of the packets in the window indicates
     that there is at least one other packet that was dropped. In fact,
     there were 3 consecutive packets dropped (look at the ACK numbers).
     New-Reno TCP responds by retransmitting the lowest unACKed packet
     even though it has room in the send window. ]

e) At a coarse level, the three steepest ramps occur at (actually, the
   third ramp is about the same as all of the other remaining ramps):

         Time Interval     Size     Effective Throughput
      --------------------------------------------------
      1) [32, 34.5]        2 MB     6.4 Mbps = 2 MB / 2.5 sec
      2) [22.3, 25.2]      2 MB     5.5 Mbps = 2 MB / 2.9 sec
      3) [40.3, 44.0]      2 MB     4.3 Mbps = 2 MB / 3.7 sec


Problem 10 (6 Points = 2+2+1+1)
----------

a) Not shown.

b) The output below shows the RFC 793 RTO as beta*Rold and Van Jacobson's
   estimate as RTO. The "under" column shows how many times the RTO
   estimate was below the sample value (Rn). The RTOerr column is the
   value of RTO-Rn.

     n       Rn     Rnew    Vnew beta*Rold      RTO   RTOerr under
   ---------------------------------------------------------------
     1    40.00   40.000   0.000    80.000   40.000    0.000     0
     2    40.10   40.013   0.025    80.000   40.000   -0.100     1
     3    40.20   40.036   0.066    80.025   40.113   -0.087     2
     4    40.30   40.069   0.115    80.072   40.298   -0.002     3
     5    40.40   40.110   0.169    80.138   40.530    0.130     3
     6    40.50   40.159   0.224    80.221   40.787    0.287     3
     7    40.60   40.214   0.278    80.318   41.056    0.456     3
     8    40.70   40.275   0.330    80.428   41.328    0.628     3
     9    40.80   40.341   0.379    80.550   41.596    0.796     3
    10    40.90   40.410   0.424    80.681   41.857    0.957     3
    11    41.00   40.484   0.465    80.821   42.107    1.107     3
    12    41.10   40.561   0.503    80.968   42.346    1.246     3
    13    41.20   40.641   0.537    81.122   42.573    1.373     3
    14    41.30   40.723   0.568    81.282   42.789    1.489     3
    15    41.40   40.808   0.595    81.447   42.993    1.593     3
   ... 16-85 omitted ...
    86    48.50   47.800   0.800    95.400   50.900    2.400     3
    87    48.60   47.900   0.800    95.600   51.000    2.400     3
    88    48.70   48.000   0.800    95.800   51.100    2.400     3
    89    48.80   48.100   0.800    96.000   51.200    2.400     3
    90   148.90   60.700  25.800    96.200   51.300  -97.600     4
    91   149.00   71.738  41.425   121.400  163.900   14.900     4
    92   149.10   81.408  50.409   143.475  237.437   88.337     4
    93   149.20   89.882  54.755   162.816  283.045  133.845     4
    94   149.30   97.309  55.921   179.764  308.902  159.602     4
    95   149.40  103.820  54.963   194.618  320.992  171.592     4
    96   149.50  109.530  52.642   207.641  323.674  174.174     4
    97   149.60  114.539  49.499   219.061  320.100  170.500     4
    98   149.70  118.934  45.915   229.078  312.536  162.836     4
    99   149.80  122.792  42.152   237.868  302.593  152.793     4
   100    49.90  113.681  49.837   245.585  291.402  241.502     4
   101    50.00  105.721  53.298   227.362  313.031  263.031     4
   102    50.10   98.768  53.879   211.442  318.914  268.814     4
   103    50.20   92.697  52.551   197.536  314.284  264.084     4
   104    50.30   87.398  50.013   185.394  302.902  252.602     4
   105    50.40   82.773  46.759   174.795  287.448  237.048     4
   106    50.50   78.739  43.137   165.546  269.808  219.308     4
   107    50.60   75.221  39.388   157.477  251.288  200.688     4
   108    50.70   72.156  35.671   150.443  232.772  182.072     4
   109    50.80   69.487  32.092   144.312  214.841  164.041     4
   110    50.90   67.163  28.716   138.973  197.856  146.956     4
   111    51.00   65.143  25.578   134.327  182.027  131.027     4
   112    51.10   63.388  22.694   130.286  167.454  116.354     4
   113    51.20   61.864  20.067   126.775  154.164  102.964     4
   114    51.30   60.544  17.692   123.728  142.134   90.834     4
   115    51.40   59.401  15.555   121.087  131.310   79.910     4
   ... 116-185 omitted ...
   186    58.50   57.801   0.799   115.402   60.895    2.395     4
   187    58.60   57.901   0.799   115.601   60.995    2.395     4
   188    58.70   58.001   0.799   115.801   61.096    2.396     4
   189    58.80   58.100   0.799   116.001   61.196    2.396     4
   190   158.90   70.700  25.799   116.201   61.297  -97.603     5
   191   159.00   81.738  41.424   141.401  173.897   14.897     5
   192   159.10   91.408  50.409   163.476  247.435   88.335     5
   193   159.20   99.882  54.755   182.816  293.043  133.843     5
   194   159.30  107.309  55.920   199.764  318.900  159.600     5
   195   159.40  113.821  54.963   214.619  330.991  171.591     5
   196   159.50  119.531  52.642   227.641  333.673  174.173     5
   197   159.60  124.539  49.499   239.061  330.099  170.499     5
   198   159.70  128.934  45.914   249.079  322.535  162.835     5
   199   159.80  132.793  42.152   257.869  312.592  152.792     5
   200    59.90  123.681  49.837   265.585  301.401  241.501     5

c) f_1(n) is almost flat. f_2(n) adds 2 peak RTT periods from samples
   90-99 and 190-199. Since g_R = 0.125, alpha_R = 0.875, which gives more
   weight to old values; i.e., the RTO algorithm acts like a low-pass
   filter that filters out spikes like those in f_2(n).

   The RTO plot should show a slowly rising estimate that is slightly
   larger than f_1(n) until n = 90. Then, the RTO estimate rises sharply
   when the RTT jumps to about 149. Then, the estimate drops
   asymptotically for about 35 samples. At that time, the estimate is
   about 10% higher than the RTT values. This gradual decrease continues
   until sample 190, when the RTT suddenly jumps up to about 159. A large
   change (variance) increases the RTO. So, the RTO responded quickly to
   a sudden increase in the RTT, but remained high for 20-40 samples after
   the peak RTT period.

   RFC 793, though, has no adjustment for variance but inherently
   multiplies the RTT estimate by beta = 2 as a margin of safety. This
   means that it substantially overestimates the RTT during the low
   variance periods.

   g_v = 0.25 also weights the additive term (eta x V) by the past. This
   tends to hold the RTO high after the peak RTT period.

d) The g values are appropriate values.
   One could attempt to increase g_v to perhaps 0.9 to give more weight to
   the present and so decrease the RTO more quickly after peak RTT
   periods, but doing so will also cause the RTO during the peak period to
   be even higher, perhaps much too high. Also, you want a conservative
   estimate of the RTO when there is high variability since overly
   aggressive timeouts will decrease performance with extraneous
   retransmissions.
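
   The table in part b) can be regenerated with a short script, which
   makes it easy to try other g values. This is a sketch under assumptions
   inferred from the table rather than stated in the problem: the samples
   follow R_n = 40 + 0.1(n - 1) with 100 added for n in 90-99 and 190-199,
   the estimators start at R = R_1 and V = 0, eta = 4, and each RTO is
   computed from the estimates held before sample n is folded in. The
   function names are illustrative only.

```python
# Sketch only: sample model, initial values and eta = 4 are inferred
# from the table; function names are hypothetical.

def rtt_sample(n):
    """RTT sample R_n: slow ramp f_1(n) plus two 10-sample spikes."""
    base = 40.0 + 0.1 * (n - 1)
    spike = 100.0 if (90 <= n <= 99 or 190 <= n <= 199) else 0.0
    return base + spike


def simulate(samples, g_r=0.125, g_v=0.25, eta=4.0, beta=2.0):
    """Return rows (n, Rn, Rnew, Vnew, beta*Rold, RTO, RTOerr, under)."""
    r_est, v_est, under = samples[0], 0.0, 0
    rows = []
    for n, r_n in enumerate(samples, start=1):
        rto_793 = beta * r_est           # RFC 793 style: beta * Rold
        rto_vj = r_est + eta * v_est     # Jacobson: Rold + eta * Vold
        if rto_vj < r_n:                 # timer would have fired too early
            under += 1
        err = r_n - r_est
        r_est += g_r * err               # smoothed RTT
        v_est += g_v * (abs(err) - v_est)  # smoothed deviation
        rows.append((n, r_n, r_est, v_est, rto_793, rto_vj,
                     rto_vj - r_n, under))
    return rows


rows = simulate([rtt_sample(n) for n in range(1, 201)])
for row in rows[88:92]:                  # samples 89-92, around the spike
    print("%3d %7.2f %8.3f %7.3f %8.3f %8.3f %8.3f %d" % row)
```

   Calling simulate(..., g_v=0.9) is one way to explore the trade-off
   discussed in part d).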