PROGRAM B FAQ (FREQUENTLY ASKED QUESTIONS)
------------------------------------------
Last Updated: Mon, 255 PM, May 10, 2004

ERRATA
------
o The local mode (-l) flag is optional, and I suspect, useless since
  you have already tested your code in Program A using dad.
o The -d flag at the receiver is optional. I do not plan to use it.
o The last update to the documentation template was April 28.

INDEX
-----
Q1) I get the confirmation from netsim that it is entering 354 mode...
    Then, it goes on to send empty packets to my rcvr. ...
Q2) 'netsim -d' is still printing this for the first 20 packets:
        outQueue: Got header
        outQueue: Hex NBO header
    Then, it prints something that is non-zero. I don't know what to
    do. Is it a netsim bug?
Q3) Do I need to implement slow start and VJ RTO calculation?
Q4) What fields in packet.h are really important; i.e., how can I
    convert my Program A header to the Program B with the minimum of
    effort?
Q5) What if I really want the IP address of my host and the receiver?
Q6) I noticed that with W=1 netsim looks like dad but with larger RTT
    (even with '-d 20') but then seems to decrease its RTT with W=2.
    But with W=64, it sometimes runs just like dad but other times my
    program never finishes. Is this right?
Q7) I noticed that netsim seems to enter congestion almost
    deterministically. Can I assume that?
Q8) With dad you used a+1 as the BW-delay product. But when I used it
    with netsim, it seemed to be way off.
Q9) I can see that the first pkt gets to my receiver, but only the ACK
    shows up inside netsim (I am running locally using './netsim -d').
    What can be wrong?
Q10) I am almost done with my experiments. What do you expect as an
    explanation?
Q11) For programB, you want us to do the RTT estimation using VJ
    method. I thought we want to use the past RTT samples to estimate
    the current RTO value, it looks like the RTO is adapted based on
    the past samples. I ...
Q12) My VJ RTOs seem weird.
    From the initial 200msec, the computed values go up as high as
    330msec and then fall to about 83msec to 100msec even though my
    RTT sample is between 25msec and 60msec. I am using the same
    computation I used in HW5. I am currently using the default
    gv = 0.1 and gr = 0.1 values. Is this normal? ... Because my ETR
    becomes quite low because of the large RTOs.
Q13) I can't think of any simple way to implement congestion avoidance
    using my ACK policy. Currently (with no congestion avoidance), I
    send W number of frames then check until I have received all the
    acks for the frames sent before advancing the frame. When I
    timeout, I retransmit that last unacked frame until I have acked
    all the frames in the window, then I advance. So I guess I can set
    W = cwnd and when I am receiving the ACKS, if I receive a timeout,
    I should set cwnd = W/2 and when I am sending new frames I do not
    send W but cwnd.
Q14) My program was working fine until I tried P1 = 0.0002. Now, I
    notice my ETR really drop and also I get about 10 pkt drops. Since
    P1 = 0.0002, shouldn't I see just 1 or 2 drops in the forward
    direction?
Q15) Is the netsim that is running on sarlacc.arl and ncp.arl
    different from the one that is posted on the Web site?
Q16) My receiver is hung without receiving any pkts. But netsim says
    it sent something to the receiver. What do I do?
Q17) For the ProgramB:RTO estimation, I can only run the program many
    times to get the best gV and gR. Is there a better way to estimate
    the gV and gR by some special RTT samples?
Q18) In the documentation template:5.1 experiments, you said "how did
    you measure the time", here you mean how did you measure the RTO?
Q19) In the documentation template:5.2 results, you said "compare your
    error free results with the theory". We only know the link is
    1Mbps, if there is no dropping and reordering and the window size
    is big enough, the ETR will be close to 1Mbps. However if there
    are drops and reordering or the window size is not big enough, how
    can we calculate the theoretical ETR? Would you please give me
    some hints?
Q20) How do I get getopt to understand 'gR' (and 'gV')?
Q21) Would it be possible to post a general baseline of the kind of
    performance it seems like we should be getting with specific
    command-line arguments?
Q22) I get a "Connection Refused" after I Connect() and then call my
    netsim config function. I connect to my netsim ok but not to the
    public one. What does that mean?
Q23) I want to write a script that will work when you test it. What
    advice do you have?

QUESTIONS/ANSWERS
-----------------
Q1) I get the confirmation from netsim that it is entering 354 mode...
    Then, it goes on to send empty packets to my rcvr. My sender
    shows:

        SENDER: Header size : 20 Total size :101 Size of the Mesage : 80
        SENDER: The Byte count is : 101

    When I run netsim in -d mode it states that it receives a pkt with
    dlen 0 and seq 0. The netsim debug output is:

        outQueue: Got header
        outQueue: Hex NBO header

    Notice that it thinks the dlen is 0. What is going on?

A1) The two most common mistakes are:
      1) You do not call htons() and htonl() on the packet header.
      2) You do not follow the netsim discipline of making the byte
         stream header-body-header-body-... where dlen indicates
         EXACTLY how many bytes are in the body.
    It looks like you violated #2 since your byte count is 101 instead
    of 100 (=20+80).

Q2) 'netsim -d' is still printing this for the first 20 packets:

        outQueue: Got header
        outQueue: Hex NBO header

    Then, it prints something that is non-zero. I don't know what to
    do. Is it a netsim bug?

A2) It's unlikely that it is a netsim bug since I have been running
    some version of it for the last 2 years. Basically, you are
    sending it unexpected bytes. You may be able to get a clue by
    modifying netsim's nsRoute.c file: print out in hex each byte in
    the header. Even more brutal is to find netsim's xread() code and
    add code to print each byte in hex as it reads them. Remember to
    fflush() after each printf. In fact, I should add in calls to
    fflush() in netsim's code. But here is another possibility.
    Since your first packet seems broken, it might be your preceding
    message to netsim that is wrong. That message is the configuration
    message which should be constructed as follows:

        sprintf(buf, "%s %s %d %f %f %f %f %d %d\n",
                user, rhst, rprt, p1, p2, p3, p4, delay, xdelay);
        write(sdNetsim, buf, strlen(buf));

    Note that this must be very precise because netsim assumes that as
    soon as it sees the newline character that the packet header will
    begin!!!

Q3) Do I need to implement slow start and VJ RTO calculation?

A3) You can do whatever you want if you think you need it. But I
    suggest the following:

    o Make some RTT and maybe interpacket time measurements for at
      least 2 cases: 1) Light load (W=1); 2) Heavy load (W >> 1). This
      is for the basic case of Pi = 0 and D = 20 msec. You will get a
      feel for RTT and interpkt times. Also, note the volume of pkt
      drops ... should be 0 but not if your fixed RTO is too small.
      You should be able to do this with your Program A (with maybe
      some more instrumentation code).

    o Think about what algorithms might be worthwhile based on your
      measurements. I would think that the following is in
      highest-payback-first order when Pi = 0:
        - Congestion avoidance
        - Window size
      Note that you want to make sure RTO is not too small ... it
      doesn't have to be perfect. In fact, at this stage, if you just
      make it really big, that's good enough because there are no
      drops.

    o Examine your protocol's retransmission policy for reordered
      pkts. Improve it and test it. This is E2E, so you really don't
      want to congest the fake router with too many retransmitted
      pkts.

    o Examine your algorithm with respect to dropped pkts. Improve it
      if necessary and test it. Here, RTO may be a little more
      important but I suspect not enough to be anal about it. It will
      depend on your measurements.
    If you have been writing scripts to run your experiments, you can
    repeat all of the experiments in the Appendix of the Doc Template
    to see if you get repeatable results and see if things make sense.

Q4) What fields in packet.h are really important; i.e., how can I
    convert my Program A header to the Program B with a minimum of
    effort?

A4) All netsim cares about is the length of the header and length of
    the body. If you plan to run a local netsim and use -d, then you
    want to make sure the sequence number is correct. But it doesn't
    care about the sequence number although it prints it out for -d.
    It does what you expect: reads 20 bytes which it expects to be the
    header; looks at the length field; and reads the body. Since it
    doesn't even care what you call them (it is looking at memory),
    the easiest thing to do is to keep your same field names for the
    sequence number and body size and pad everything else out with 0s;
    e.g.,

        typedef struct hdr {
            ulong  da;
            ulong  sa;
            ushort sp;
            ushort dp;
            ushort pad1;      // well, maybe 'ushort cntrl;'
            ushort pad2;
            ushort mySeqNum;
            ushort myBodySz;
        } hdr_t;

    (This is 20 bytes when ulong is 4 bytes and ushort is 2 bytes.)
    Then, zero the irrelevant fields out:

        da = sa = sp = dp = pad1 = pad2 = 0;

    But the header length and the myBodySz have to be correct. You can
    fill in the other fields later. In fact, you could even use:

        typedef struct hdr {
            ulong  pad[4];
            ushort mySeqNum;
            ushort myBodySz;
        } hdr_t;

    although I suspect that would be overly simplified.

Q5) What if I really want the IP address of my host and the receiver?

A5) gethostname() will get you your own host's name (ASCII), and
    getpeername() (after connecting) will get you the peer's socket
    address, which already holds its IP address (NBO). gethostbyname()
    will return a struct hostent entry that will contain the IP
    address (NBO) for a given name. If you want it in dotted decimal
    notation then use inet_ntoa(). But the receiver can pull the IP
    addresses off of your header if your sender filled them in.

Q6) I noticed that with W=1 netsim looks like dad but with larger RTT
    (even with '-d 20') but then seems to decrease its RTT with W=2.
    But with W=64, it sometimes runs just like dad but other times my
    program never finishes. Is this right?

A6) In theory, netsim should look like dad with W=1 except for the
    extra delay that seems to go with running between CEC and ARL and
    the fact that now you have this heavyweight process netsim shared
    by perhaps other users ("#active" tells you how many other users
    are using netsim at the same time you are). The reason W=2 may
    have smaller RTT is that netsim uses 2 methods to decide when to
    transmit another pkt: 1) the unix clock (select); and 2) when it
    does wake up from select, it will send out ALL pkts that have
    expired. But this apparent "acceleration" only lasts for small
    W > 1 and becomes insignificant for larger W. For very large W,
    netsim will simulate transient queue build up (e.g., other
    traffic). In fact, if things get bad enough, netsim will even drop
    a few pkts. And if you persist, it will drop all of your pkts if
    you continue to inflate the window or retransmit pkts repeatedly
    within 1 RTT.

Q7) I noticed that netsim seems to enter congestion almost
    deterministically. Can I assume that?

A7) Well, no, since I will test your code using the same netsim but
    the onset of congestion may be slightly different.

Q8) With dad you used a+1 as the BW-delay product. But when I used it
    with netsim, it seemed to be way off.

A8) dad only had 1-way propagation delay. netsim has 2-way propagation
    delay. Furthermore, it has the "features" described in A6 which
    make the RTT load dependent and have transient congestive periods.

Q9) I can see that the first pkt gets to my receiver, but only the ACK
    shows up inside netsim (I am running locally using './netsim -d').
    What can be wrong?

A9) Since you see the ACK header reach netsim, look at the dlen
    field. Typically, people think that netsim ignores it ... but that
    is wrong. If you set it to say 10, netsim will expect a 10-byte
    body.
    If you don't send it, then netsim will get totally confused and
    who knows what it will do since it will consume your next header
    as data.

Q10) I am almost done with my experiments. What do you expect as an
    explanation?

A10) ... the following (among other things):
      a) Your results make sense.
      b) How far away your ETR is from what you expected (quantify)
         based on the various experiment conditions.
      c) How sensitive your protocol is to the exact details of the
         netsim characteristics. For example, will your protocol get
         the same high ETRs if I change the netsim parameters a
         little? i.e., how widely applicable is your protocol?

Q11) For programB, you want us to do the RTT estimation using VJ
    method. I thought we want to use the past RTT samples to estimate
    the current RTO value, it looks like the RTO is adapted based on
    the past samples. I wrote the program this way to make the RTO
    adapted, however with more and more packets sent the estimated RTO
    is always too small, so there are lots of retransmissions. If I
    simply set the RTO to 2000 and use fast retransmission, the ETR is
    very good. So I am not sure how to apply the RTT estimation into
    our program, would you please give me some hints.

A11) Is the high ETR for all of the cases in Appendix B or just the
    simple ones? By definition RTT estimation is ALWAYS based on the
    present and past, never the future because no one knows how the
    future will turn out. Now, it may be true that a conservative RTO
    will yield good ETR since the drop probabilities are so small. But
    the RTO is based on the RTT estimate and the variability which are
    in turn a function of the alpha values and the nu value:

        RTO    = rttEst + nu * varEst
        rttEst = f(alpha_R)
        varEst = g(alpha_V)

    You need to select the alpha values and nu values. If you select
    alpha values biased toward the present and a large nu value
    (larger than the standard 4), then you should get RTOs that will
    react to present conditions and be overly conservative.

Q12) My VJ RTOs seem weird.
    From the initial 200msec, the computed values go up as high as
    330msec and then fall to about 83msec to 100msec even though my
    RTT sample is between 25msec and 60msec. I am using the same
    computation I used in HW5. I am currently using the default
    gv = 0.1 and gr = 0.1 values. Is this normal? ... Because my ETR
    becomes quite low because of the large RTOs.

A12) I would have to see the individual RTT samples, but if they are
    highly variable, then it is possible that the RTO could be quite
    high. If you record the RTTs and run them through your program for
    HW5, and plot: sample RTT, RTTest, and VARest ... you might be
    able to see if things look ok. Also, do a few hand calculations
    around the points where the RTO looks really big to see if the
    results make sense. I should also note that there is some
    burstiness over a few interpkt times due to the implementation
    (select). Also, make sure that the RTT values are only for
    unretransmitted pkts (Karn). So, it is possible to see an RTT of
    200 or 300 msec if your sender doesn't back off.

Q13) I can't think of any simple way to implement congestion avoidance
    using my ACK policy. Currently (with no congestion avoidance), I
    send W number of frames then check until I have received all the
    acks for the frames sent before advancing the frame. When I
    timeout, I retransmit that last unacked frame until I have acked
    all the frames in the window, then I advance. So I guess I can set
    W = cwnd and when I am receiving the ACKS, if I receive a timeout,
    I should set cwnd = W/2 and when I am sending new frames I do not
    send W but cwnd.

A13) Suppose W = max window size. Then, you need to allow
    1 <= cwnd <= W. If you are using your own buffering system and it
    is a linked finite length list, that should be quite easy. If you
    are using my ring buffer code, then you could fake it where W is
    the ring buffer size but now you only fill up to cwnd and allow
    cwnd to vary based on network conditions.
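    A minimal sketch of the bookkeeping A13 describes, assuming cwnd
    starts at W, is halved on a timeout, and grows by one each time a
    full window is acknowledged. That growth rule and every name below
    (cwnd_t, cwnd_init, cwnd_on_timeout, cwnd_on_window_acked,
    cwnd_can_send) are illustrative assumptions, not part of the ring
    buffer code or of netsim:

```c
#include <assert.h>

/* Hypothetical congestion-window state (names are not from the course code).
 * W is the fixed maximum window, e.g. the ring buffer size. */
typedef struct {
    int W;     /* maximum window size, fixed for the whole run */
    int cwnd;  /* current congestion window, always kept in 1..W */
} cwnd_t;

/* Start with the full window, as the question in Q13 suggests. */
static void cwnd_init(cwnd_t *c, int W)
{
    assert(W >= 1);
    c->W = W;
    c->cwnd = W;
}

/* Timeout: halve the window, but never let it fall below 1. */
static void cwnd_on_timeout(cwnd_t *c)
{
    c->cwnd /= 2;
    if (c->cwnd < 1)
        c->cwnd = 1;
}

/* A whole window was acknowledged: grow by one frame, capped at W.
 * (Assumed additive-increase rule; pick whatever your measurements support.) */
static void cwnd_on_window_acked(cwnd_t *c)
{
    if (c->cwnd < c->W)
        c->cwnd++;
}

/* The sender may put another frame on the wire only while the number of
 * unacknowledged frames is below cwnd (not below W). */
static int cwnd_can_send(const cwnd_t *c, int outstanding)
{
    return outstanding < c->cwnd;
}
```

    The sender would consult cwnd_can_send() wherever it currently
    compares the number of outstanding frames against W; the buffer
    itself stays sized for W.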
Q14) My program was working fine until I tried P1 = 0.0002. Now, I
    notice my ETR really drop and also I get about 10 pkt drops. Since
    P1 = 0.0002, shouldn't I see just 1 or 2 drops in the forward
    direction?

A14) First, P1 = 0.0002 means theoretically, that if you ran for
    infinite time then the fraction of drops is 0.0002. But there is
    always a small probability that in any interval of pkt
    transmissions you could get multiple drops; e.g., in 1000 pkts,
    the number of drops N has a binomial distribution with probability

        C(1000,N) x P1^N x (1-P1)^{1000-N}.

    Yes, it is a small probability but not 0. Second, when P1 > 0,
    netsim 2.2 reacts differently to congestion than when P1=0: If you
    don't back off it will get angry and drop some number of pkts ...
    but with some non-zero probability; i.e., sometimes it gets really
    angry and sometimes it doesn't get very angry at all.

Q15) Is the netsim that is running on sarlacc.arl and ncp.arl
    different from the one that is posted on the Web site?

A15) Yes, but only slightly. The one on the Web site always reacts to
    a congestion state the same way: dropping a few pkts. Note,
    however, that the congestion state is still non-trivial. The one
    running on sarlacc and ncp still reacts to a congestion state but
    randomly; i.e., it may decide to ignore it or drop some number of
    pkts ... its anger level is random. In all cases though, there is
    added congestive delay that varies over time.

Q16) My receiver is hung without receiving any pkts. But netsim says
    it sent something to the receiver. What do I do?

A16) Put in some code before your first Xread call in the receiver
    that reads 1 byte at a time and prints it out in hex. See if you
    get anything. Also, print out the Xread parameter that is the
    number of bytes to read to see if it is correct.

Q17) For the ProgramB:RTO estimation, I can only run the program many
    times to get the best gV and gR. Is there a better way to estimate
    the gV and gR by some special RTT samples?
A17) Just pick gR and gV based on whether you want to weight towards
    the past or present. This is not an exact science.

Q18) In the documentation template:5.1 experiments, you said "how did
    you measure the time", here you mean how did you measure the RTO?

A18) ... measured the RTT sample and the time required to compute the
    ETR.

Q19) In the documentation template:5.2 results, you said "compare your
    error free results with the theory". We only know the link is
    1Mbps, if there is no dropping and reordering and the window size
    is big enough, the ETR will be close to 1Mbps. However if there
    are drops and reordering or the window size is not big enough, how
    can we calculate the theoretical ETR? Would you please give me
    some hints?

A19) All we have is the case when all probabilities are 0. But the ETR
    depends on the bandwidth-delay product ... doesn't it? So, it
    depends on the window size you used ... but then your window size
    is probably changing, so maybe you need to use the average window
    size. Also, maybe most of the time the channel looks error free
    ... then, the error-free results can be used but must be weighted
    by the fraction of time it looks error-free. Now if the intervals
    where drops occur have a simple ETR, you can weight this case with
    the fraction of time that this case occurs:

        ETR = w1 * (Error-Free ETR) + (1-w1) * (Non-Error-Free ETR).

Q20) How do I get getopt to understand 'gR' (and 'gV')?

A20) Try this in your getArgs() function:

        case 'g':
            /* With "g:" in the getopt string, "-gR 0.1" gives
               optarg = "R" and leaves optind at the value "0.1". */
            printf("Got -g arg = %s\n", optarg);
            printf("optarg = %p, optind = %d\n", (void *)optarg, optind);
            if (*optarg == 'R') {
                gR = atof(argv[optind]);
                printf("gR = %8.4f\n", gR);
            } else if (*optarg == 'V') {
                gV = atof(argv[optind]);
                printf("gV = %8.4f\n", gV);
            } else {
                fprintf(stderr, "*** Error processing -g\n");
            }
            optarg = argv[++optind];   /* step past the value just read */
            break;

    This assumes that your getopt string contains "g:".
Q21) Would it be possible to post a general baseline of the kind of
    performance it seems like we should be getting with specific
    command-line arguments?

A21) Well, the whole point is that you should already know this
    (approximately) from theory for some cases. And if you can't
    attain these limits, you should collect data to show why you
    can't.

Q22) I get a "Connection Refused" after I Connect() and then call my
    netsim config function. I connect to my netsim ok but not to the
    public one. What does that mean?

A22) netsim could not connect to your receiver ... Look at the
    hostname and port that you are sending to netsim. Is it the host
    and port of your receiver? Note that "localhost" is not going to
    work since localhost to a public netsim is the host it is running
    on, not your receiver. Also, something like "chigger" will work on
    a CEC host, but the public netsims have no idea who chigger is.
    This is a DNS thing. The full name works; i.e.,
    chigger.int.cec.wustl.edu. That's because the DNS server for
    sarlacc.arl and ncp.arl understands the full name (and some short
    names), but not chigger.

Q23) I want to write a script that will work when you test it. What
    advice do you have?

A23) I will run my own script. I will resort to your script only if my
    script fails. And, primarily, I will edit your script to run what
    I want. So, if you don't already have a script, don't spend too
    much time on this.