PROGRAM A FAQ
Last Revision: 12:15 PM, Mar 30, 2004

ERRATA
------
o The assignment mentions a 'Seed'. It's a typo.
o RTO is the Retransmission TimeOut. It is in msec and is what you set on the sender side.
o The receiver drop probability applies to both frame reads and writes (i.e., data and ACKs).
o Omit parts 2c and 2d of the README since dad.c doesn't drop ACK pkts.

INDEX
-----
Q1) Since there is no frame reordering, can I eliminate the timeout if I ACK or NAK every frame?
Q2) What do you mean by FIN algorithm?
Q3) In the notes, you suggested using "writev", or packing the header and body into a buffer and using "write", in order to avoid putting the header and body in more than one TCP segment. I usually use "send" to send out data. What makes "write" better than "send"? Because it is a system call? These two functions have very similar arguments.
Q4) What is Nagle's Algorithm about?
Q5) The argument to rcvr has drop prob. In dad, we have rdrop and wdrop. Are they both the same as the drop prob?
Q6) Is there any reason to do two reads, header and body, or can this be done in 1 read?
Q7) With regards to RTO, how can I find the right or appropriate RTO? Can I initialize this by sending a number of frames to the sender and finding the average RTT? Is that a good RTO indicator?
Q8) I am trying to use, without redefining them, the variables (verbose, delay, dropStride, rdrop, wdrop, etc.) defined in your dad.c inside my Receiver.cpp. I thought I had to stick extern declarations on each variable in my Receiver.cpp and link Receiver.o and dad.o. However, I keep getting a linker error "undefined reference to [xxxx]" where [xxxx] is each of the variable names. What is the easiest way to get around this? I was going to move those static declarations and other method prototypes into dad.h and include the file in my Receiver.cpp instead of linking.
Q9) Do you have a suggested development sequence?
Q10) My exit logic at Sender is: Send 'n' frames, wait for ACK for nth frame; send n+1th frame to denote EOT to the rcvr with REXMIT bit set to avoid dropping at rcvr. I send this EOT frame with the REXMIT bit set such that it's not dropped at sndr. rcvr sends the ACK with the REXMIT set such that ... Are we allowed to send this extra packet to denote EOT?
Q11) Is it guaranteed at the rcvr's dad module that any packet (data, ACK) with REXMIT bit set will not be dropped in any mode (STRIDE, PROB)?
Q12) The assignment says: ... 2.b - I guess we need to use Stride Drop? 2.c - Drop even no'd ACKs? How do we do this?
Q13) Since seqNum in the frame header can hold only up to 255, and since Prog. A requires us to run tests up to 1000 frames, are we supposed to write wraparound logic at the sender?
Q14) Consider the following scenario: Use dad_read() to read the header of a frame. Suppose this reading action drops the header. Since we cannot get the header, we don't know the length of the payload. How can we skip the payload of this frame so that we read from the beginning of the next frame when we call dad_read() again?
Q15) Another question is about the seq. number. In this project, you require us to send 1000 frames. Since the maximum seq. number is 255, can you change the seq. number field to be an unsigned short integer? Because the maximum seq. number is determined by the window size of sender and receiver, in this project, the sender just knows its window size and so does the receiver. Therefore, the sender and receiver must have a protocol to determine their maximum seq. number based on their supplied window size. That may make the project more complex.
Q16) Can I just use the upper 4 bits of the cntrl field of the header to extend the sequence number field to 12 bits and therefore handle the 1,000 frame case?
Q17) I plan to use my own receiver windowing code (not your RcvWinBuf) and I think I don't have to worry about virtual sequence numbers ...
just use the integers from 0 to 255. Does that sound right?
Q18) Could you expand on what you said in class about virtual sequence numbers? It seemed like you left out the hardest part: the receiver-side stuff.
Q19) Since dad_read() fails on EOF, is it O.K. for rcvr to exit during this condition?
Q20) For the project, the ACK policy you are referring to is a policy that simply recovers by timeout, right? Or are we supposed to experiment with different policies, such as Tahoe, Reno, New Reno policies?
Q21) How do we ensure we are getting the right ETRs? At least within the ballpark. And not way off by order of ? For e.g.: I get ETRs only @ around 300-500 Kbps running both Sndr & Rcvr on clarion? I used:
    rcvr -w 64 -v 0 -D 0
    sndr -h clarion -w 64 -v 0 -R 30 -n 1000 -b 1000
Q22) Are there any tools that I could use to verify against the no.'s that I should be getting?
Q23) On Solaris/Linux, how do we (command) figure out the NIC's B.W.?
Q24) Although there are no delays/drops at the Recv'r there's still at least 1-3 retransmits, which varies randomly. How do we verify when we should see retransmits and when we shouldn't? For e.g.: with smaller window sizes and with W=W'=1, with No Drop I don't see any retransmits. With W=W'=64, I do see retransmits.
Q25) Do I need to run the dad stuff at the sender?
Q26) I am setting the cntrl field in the header to 'R' and dad doesn't seem to recognize that it is a retransmission.
Q27) When I compile, I get an error message about an "unsigned short" parameter in the call to dad_init() being undefined; i.e., it doesn't understand what uint16_t means. It happens under Linux and when I try compiling and linking.
Q28) In my FIN phase, your dad routine returns an error message ... something about an incorrect body size of 7680 bytes. Looks like a dad bug.
Q29) Do you mean in A28 that select() will return with a ready file descriptor even if there is no data on the channel (i.e., after the sender has closed down the socket)?
Q30) My program works except I have not been able to take care of the limited sequence number field in the header. So, I can only send 256 frames. What should I do?
Q31) A number of emails have been asking me if they can have a README file that is not plain text because they want to show me some nifty graphs of their wonderful results.
Q32) Somehow I still don't understand it, but if I turned logging off the performance numbers seemed to be okay, so I don't know why I was getting numbers greater than 1 Mbps with logging turned on.
Q33) Also, when you say coordinated FIN phase, is this a valid way of doing things? ...
Q34) I have a bug in my code using dad_read. Hopefully you can clear up some questions for me. Here is how I am using dad_read. First read header and then body ... I have modified your dad_read ...
Q35) I understand that we're supposed to test our programs with body sizes of either 10 or 1000 bytes. However, in stdinc.h, you specify that the body size is limited to 20 bytes. How are we supposed to get around this limitation to make a body size larger than 20 bytes? I assume we shouldn't be changing anything in stdinc.h.
Q36) I wanted to verify, as discussed in class, can we change Sndr's RTO to be 60 ms instead of the default 20 or 30 ms for Parts 2 and 3? Reason: RTO = twice * Worst case 1 way delay + Rcvr's Delay ...
Q37) While running the test cases, should the two diff. hosts be diff. arch?
Q38) W.r.t. theoretical ETR calc'n to compare against practical values: is the following right? Assume there's no drop (P=0.0) and just D=20 ms, so RTO=60 ms ...
Q39) When I did the experiment with the parameters VerboseLevel(0), Nframes(1000), Delay(20), RTO(30), Win(64), DropProb(0), Body(1000), I got the "Full Queue" problem. Here is my explanation for this problem. Initially, the sender will ...
Q40) I have a few qns regarding submission. The test outputs for 2 a and b are around 4 * 8 K files.
Since it's been explicitly stated that we should not attach files > 1K, what do we do, since it's verbose output and the files are larger?
Q41) I don't know if anyone else had this problem, but whenever I tried to compile my receiver on Hilton (Solaris), I would get the following compiler error in dad.c: ...
Q42) If I don't need dad_dump(), can I still use the previous version of dad?
Q43) I found that the behaviors of select on Linux and Solaris are different. With Win = Win' = 1, RTO = 30 msec and Delay = 20 msec, running sender on chigger with receiver on calrissian.arl gave me a lot of timeouts, while if I did it the other way around (sender on calrissian with receiver on chigger), I got quite a few less timeouts. Therefore, if I test my code in the second scenario, I get pretty good performance (about 884 kbps when Win = Win' = 64, Body = 1000, Delay = 20, RTO = 30, and no drop occurs). Should I give additional explanation of this in my report?
Q44) I found that even if my sender has received an ACK of the EOT, it continues to receive pkts. Does that mean I need a FIN phase like the receiver where it flushes the channel?
Q45) What should I do if some cases work but some just don't work by Tuesday?
Q46) I am getting ETR = 500 Kbps when W=8, L=1000, P=0.0 and RTO=30 msec. But I get ETR = 700 Kbps for the same parameters except P=0.2; i.e., a higher ETR when there is dropping??? How can that be?
Q47) How much is the virtual sequence number capability worth?
Q48) Everything works except my FIN phase has a problem whenever my ACK gets dropped. This makes collecting data really hard. What should I do given that I don't have much time left?
Q49) How do I get a double precision number from the command line ... like the drop probability? Everything is working except I haven't been able to figure out how to do that.

QUESTIONS/ANSWERS
-----------------

Q1) Since there is no frame reordering, can I eliminate the timeout if I ACK or NAK every frame?
A1) No.
As long as there is a chance that any ACK or NAK can be completely lost, you need a timeout. In the worst case, every ACK/NAK gets lost. Then there is no chance for the sender to ever send another packet unless the receiver spontaneously sends ACKs (or NAKs).

Q2) What do you mean by FIN algorithm?
A2) Both the sender and the receiver have to determine when they are actually done and close down the communication channel in a graceful manner. How do the sender and receiver do this?

Q3) In the notes, you suggested using "writev", or packing the header and body into a buffer and using "write", in order to avoid putting the header and body in more than one TCP segment. I usually use "send" to send out data. What makes "write" better than "send"? Because it is a system call? These two functions have very similar arguments.
A3) When I say 'write', I also mean 'send' since 'write' is just 'send' with flags = 0. So, wherever I say 'write' you can use 'send'. Unfortunately, there is no 'sendv'. writev does a gather-write, where you can send a single segment composed from several memory areas.

Q4) What is Nagle's Algorithm about?
A4) Nagle's algorithm is normally turned on for TCP flows. It has a rule that attempts to delay a write/send with a byte count less than 1 MTU in size when there is already 1 outstanding unACKed segment. So, turning it off means that a TCP segment will be sent even if you request a very small send.

Q5) The argument to rcvr has drop prob. In dad, we have rdrop and wdrop. Are they both the same as the drop prob?
A5) Ugh! I will make the drop probability passed into the receiver apply to all packets (both data and ACKs) ... Hmmm, that also means there is a bug in dad.c. I'll fix it today and send out email.

Q6) Is there any reason to do two reads, header and body, or can this be done in 1 read?
A6) In general, you don't know the body size until you read the header ... although in this assignment you do. But dad assumes the general case ... so, you MUST use 2 reads.
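As a rough illustration of the gather-write from A3 and the Nagle setting from A4, here is a minimal sketch. The struct demo_hdr layout and the names send_frame() and disable_nagle() are made up for this example; they are not the assignment's hdr_t and not part of dad. Only writev(), setsockopt() and TCP_NODELAY are standard calls.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical 4-byte header; the real hdr_t may be laid out differently. */
struct demo_hdr {
    uint8_t  cntrl;   /* control bits */
    uint8_t  seq;     /* 8-bit sequence number */
    uint16_t length;  /* body length, in network byte order */
};

/* Hand header and body to the kernel in a single writev() call.
 * Returns the number of bytes written, or -1 on error. */
ssize_t send_frame(int fd, uint8_t cntrl, uint8_t seq,
                   const void *body, uint16_t len)
{
    struct demo_hdr h;
    struct iovec iov[2];

    h.cntrl  = cntrl;
    h.seq    = seq;
    h.length = htons(len);

    iov[0].iov_base = &h;
    iov[0].iov_len  = sizeof h;
    iov[1].iov_base = (void *)body;  /* iov_base is not const-qualified */
    iov[1].iov_len  = len;

    /* Skip the body entry when there is no body (e.g., an ACK). */
    return writev(fd, iov, len > 0 ? 2 : 1);
}

/* Turn off Nagle's algorithm on a TCP socket; returns 0 on success. */
int disable_nagle(int sock)
{
    int on = 1;
    return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}
```

A real sender would also have to handle a short write (a return value smaller than the header plus body) by writing the remainder.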

Q7) With regards to RTO, how can I find the right or appropriate RTO? Can I initialize this by sending a number of frames to the sender and finding the average RTT? Is that a good RTO indicator?
A7) You know the theoretical lower bound on the RTO since you get to choose the delay. So, you might choose the RTO to be twice the delay or something like that.

Q8) I am trying to use, without redefining them, the variables (verbose, delay, dropStride, rdrop, wdrop, etc.) defined in your dad.c inside my Receiver.cpp. I thought I had to stick extern declarations on each variable in my Receiver.cpp and link Receiver.o and dad.o. However, I keep getting a linker error "undefined reference to [xxxx]" where [xxxx] is each of the variable names. What is the easiest way to get around this? I was going to move those static declarations and other method prototypes into dad.h and include the file in my Receiver.cpp instead of linking.
A8) All of those variables (e.g., verbose) inside dad.c are supposed to be invisible. So, there should be NO extern declarations except for the 3 interface functions ... and dad.h should take care of those. You MUST compile dad.c separately and link it in. 'static' means local to the file. So, if you put all of the stuff in dad.c into Receiver.cpp, then you have exposed all of those variables that are supposed to be private! All you should have to do is include the original dad.h in your Receiver.cpp. Compile dad.c as stated:

    g++ -c dad.c
    g++ -o Receiver Receiver.cpp dad.o

Q9) Do you have a suggested development sequence?
A9) Here is one:

    Rudimentary Step:
      1) Read the Errata in this FAQ.
      2) Compile the dad.c file to produce the dad.o object file.
      3) Call dad_init() in your receiver code with mode = NO_DROP, replace all calls to read() and write() with dad_read() and dad_write(), and recompile and link.
      4) Run the case W=4, W'=1, Nframes=8 or 10 to see if things still work with NO_DROP. Use verbose output.
      5) Optional: Add a call to sleep(2) before the call to dad_write() to slow down receiver ACKing so you can see the interaction in real time.

    Test Simple Dropping/Receiver:
      6) Test response to data pkt dropping and timeouts (W=4, W'=1):
           Use STRIDE_DROP with stride = 4.
           Receiver only ACKs in-order pkts; silent otherwise.
           Sender times out (i.e., select) and retransmits.
      7) Improve receiver by sending ACKs and buffering out-of-order pkts (W=4, W'=4):
           Use STRIDE_DROP with stride = 4.
           Receiver buffers and ACK/NAKs out-of-order pkts.
      8) Test response to ACK drops:
           Hard code a kludge to drop 1 ACK pkt once.
           Pick an arbitrary ACK to drop (e.g., SN=3).

    More Difficult Cases:
      9) Try PROB_DROP.
      10) Try non-zero delay.
      11) Turn off verbose. Measure ETR and compare with theory.
      12) Improve algorithm. Repeat some of the above steps.

Q10) My exit logic at Sender is: Send 'n' frames, wait for ACK for nth frame; send n+1th frame to denote EOT to the rcvr with REXMIT bit set to avoid dropping at rcvr. I send this EOT frame with the REXMIT bit set such that it's not dropped at sndr. rcvr sends the ACK with the REXMIT set such that ... Are we allowed to send this extra packet to denote EOT?
A10) Yes (or mark the last data pkt with the EOT flag on). However, your algorithm should not depend on this REXMIT bit.

Q11) Is it guaranteed at the rcvr's dad module that any packet (data, ACK) with REXMIT bit set will not be dropped in any mode (STRIDE, PROB)?
A11) The REXMIT bit is used only for testing and only affects STRIDE_DROP mode. In probabilistic dropping, any pkt is subject to dropping.

Q12) The assignment says: ... 2.b - I guess we need to use Stride Drop? 2.c - Drop even no'd ACKs? How do we do this?
A12) The Errata says the dad module doesn't support stride dropping of ACKs. Therefore, you don't need to do 2c and 2d.

Q13) Since seqNum in the frame header can hold only up to 255, and since Prog. A requires us to run tests up to 1000 frames, are we supposed to write wraparound logic at the sender?
A13) Sequence numbers can wrap around, although that is one of the last things I would worry about since you can test most of the code with a small number of frames. So, what we need is logic for testing two sequence numbers sn and sn' such that sn < sn' is well-defined and meaningful. Is there anything in the homework that might help us?

Q14) Consider the following scenario: Use dad_read() to read the header of a frame. Suppose this reading action drops the header. Since we cannot get the header, we don't know the length of the payload. How can we skip the payload of this frame so that we read from the beginning of the next frame when we call dad_read() again?
A14) dad drops WHOLE pkts, NEVER partial pkts. Of course, this assumes that you are setting the header fields properly. dad will always read the header and body. If it decides to drop the pkt, it will drop the header AND body and read another header-body pair. When you call dad_read(), all of this should be invisible to you except that some pkts that were sent may seem to be missing ... but never partial pkts.

Q15) Another question is about the seq. number. In this project, you require us to send 1000 frames. Since the maximum seq. number is 255, can you change the seq. number field to be an unsigned short integer? Because the maximum seq. number is determined by the window size of sender and receiver, in this project, the sender just knows its window size and so does the receiver. Therefore, the sender and receiver must have a protocol to determine their maximum seq. number based on their supplied window size. That may make the project more complex.
A15) No, the seq number field is 8 bits. It is sufficient (as long as your window size is 128 pkts or less). You just need a sequence number wraparound function. Since you will be debugging most of the time and therefore using a small number of pkts, the limitation won't even matter if you don't have a wraparound function.
We will talk about this in class on Tues.

Q16) Can I just use the upper 4 bits of the cntrl field of the header to extend the sequence number field to 12 bits and therefore handle the 1,000 frame case?
A16) You could do that as a temporary fix, but it is just avoiding the finite sequence number set problem.

Q17) I plan to use my own receiver windowing code (not your RcvWinBuf) and I think I don't have to worry about virtual sequence numbers ... just use the integers from 0 to 255. Does that sound right?
A17) Yes. Since the sequence number field is only 8 bits, our sequence number set is limited to N = {0..255}. That's ok given the constraint that window sizes will be no more than 64. Therefore, W+W' <= 128 <= M = 256, where M is the size of the sequence number set; i.e., M = |N|. This constraint allows the receiver to know whether a sequence number is an old number or not.

The basic idea is that the infinite sequence of integers is formed from the finite set of 256 integers: 0..255, 0..255, etc. Note that the integers wrap around from 255 to 0. The receiver slides a window of size W' through this sequence. At any time there is a subsequence of W' integers that are the "in-window" sequence numbers. The receiver accepts only in-window packets and rejects all others. You probably already know that you will need the boolean inwin() function defined below.

The in-window sequence has two numbers, lo and hi, that define the low and high ends of the number sequence. Note that the face value of hi may be less than lo. But in reality, hi is the sequence number of the last in-window packet. Let n be a sequence number. Then, the boolean function inwin(n) is defined as follows:

                (lo <= n) and (n <= hi),   if (hi >= lo)
    inwin(n) =
                (lo <= n) or  (n <= hi),   otherwise

For example, suppose lo = 254 and hi = 1. Then the in-window sequence number sequence is 254, 255, 0, 1.
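The inwin() definition in A17 translates almost directly into C. The following is only a sketch: the helper win_hi() and the constant SEQ_MOD are names introduced here for illustration, and passing lo and hi as arguments (rather than keeping them in a window object) is an assumption.

```c
#define SEQ_MOD 256u   /* M: size of the 8-bit sequence number set */

/* Sequence number of the last in-window slot for a window that
 * starts at lo and holds w slots (assumes 1 <= w <= SEQ_MOD). */
unsigned win_hi(unsigned lo, unsigned w)
{
    return (lo + w - 1u) % SEQ_MOD;
}

/* Returns 1 if sequence number n lies in the window lo..hi,
 * where the window may wrap around from 255 to 0. */
int inwin(unsigned lo, unsigned hi, unsigned n)
{
    if (hi >= lo)
        return lo <= n && n <= hi;   /* window does not wrap */
    return lo <= n || n <= hi;       /* window wraps past 255 */
}
```

With lo = 254 and W' = 4, win_hi() gives hi = 1, and inwin() accepts 254, 255, 0 and 1 while rejecting 2 and 253.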

Q18) Could you expand on what you said in class about virtual sequence numbers? It seemed like you left out the hardest part: the receiver-side stuff.
A18) If we do use the RcvWinBuf.[hc] files, we need to work with the infinite sequence number set {0, 1, ... , Inf} even though the actual sequence numbers are still from N = {0..255}. The basic idea is to associate a virtual sequence number v with each actual sequence number s.

The sender code is straightforward. Let the ith sequence number at the sender be s(i). Then,

    v    = i
    s(i) = i % 256

that is, the sender just increments the virtual sequence number, computes s(i), puts s(i) in the pkt header, and inserts the pkt into the send buffer using the virtual sequence number v (= i).

The receiver code is a little more complicated since the receiver gets a sequence number n and must determine its virtual sequence number v. The RcvWinBuf class defines two variables, lo and hi, that are the low and high virtual sequence numbers that are in-window. Let s(x) be the actual sequence number of the pkt that would be in slot x. Then, the boolean function inwin(n) is:

                (s(lo) <= n) and (n <= s(hi)),   if (s(hi) >= s(lo))
    inwin(n) =
                (s(lo) <= n) or  (n <= s(hi)),   otherwise

Note that s(lo) and s(hi) are the actual sequence numbers of the pkts that would be in slots lo and hi. We need to define the function offset(n), which is the distance into the window for a pkt with sequence number n:

                 (n - s(lo)),         if (n >= s(lo))
    offset(n) =
                 (256 - s(lo) + n),   otherwise

Note that the definition holds only if inwin(n). Then, the virtual sequence number associated with the actual sequence number n is:

    v(n) = lo + offset(n),   if inwin(n)

which can be rewritten as:

            lo + (n - s(lo)),         if inwin(n) and (n >= s(lo))
    v(n) =  lo + (256 - s(lo) + n),   if inwin(n) and (n <= s(hi))
            Undefined,                otherwise

Now, we can insert the pkt using the RcvWinBuf class since we know the virtual sequence number.
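As a sketch of the receiver-side mapping just described, the function below computes v(n) from the virtual window bounds. It assumes s(x) = x mod 256, matching the 256 in offset(n). The names vsn(), s_of() and SEQ_MOD are illustrative, and lo and hi are passed in directly because the actual RcvWinBuf interface is not shown here.

```c
#define SEQ_MOD 256UL   /* size of the actual sequence number set */

/* s(x): actual sequence number carried by virtual sequence number x. */
static unsigned long s_of(unsigned long x)
{
    return x % SEQ_MOD;
}

/* Map an actual sequence number n (0..255) to its virtual sequence
 * number, given the in-window virtual bounds lo..hi.
 * Returns -1 when n is not in-window (v(n) is undefined). */
long vsn(unsigned long lo, unsigned long hi, unsigned long n)
{
    unsigned long slo = s_of(lo);
    unsigned long shi = s_of(hi);
    unsigned long off;
    int in;

    /* inwin(n), written in terms of s(lo) and s(hi) */
    if (shi >= slo)
        in = (slo <= n && n <= shi);
    else
        in = (slo <= n || n <= shi);
    if (!in)
        return -1;

    /* offset(n): distance of n into the window */
    off = (n >= slo) ? (n - slo) : (SEQ_MOD - slo + n);
    return (long)(lo + off);
}
```

For example, with lo = 510 and hi = 513 (so s(lo) = 254 and s(hi) = 1), an arriving pkt with n = 0 maps to virtual sequence number 512, while n = 2 is rejected.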
So, you can either extend RcvWinBuf to handle a finite sequence number set, or write the auxiliary function vsn(n) that computes the virtual sequence number (using information from an RcvWinBuf object) and use the RcvWinBuf class as is.

Q19) Since dad_read() fails on EOF, is it O.K. for rcvr to exit during this condition?
A19) You will have to do that, but it is considered a minor receiver or sender bug: either your receiver should not still be reading or the sender forgot to send enough information; i.e., your sender and receiver should be going through a coordinated FIN phase. If that is your only problem, then you should try to fix it. If you have other more pressing problems, then fix those first.

Q20) For the project, the ACK policy you are referring to is a policy that simply recovers by timeout, right? Or are we supposed to experiment with different policies, such as Tahoe, Reno, New Reno policies?
A20) Since your grade depends on how high your ETR is and your ability to clearly explain how good your measured ETR is, I doubt that timeout alone will be good enough ... although it is a good starting point (base case). That was the whole point of problem 4, homework 3! It looks like you have taken cs423 and know about Tahoe, Reno, etc. Those are end-to-end protocols. They do contain some good ideas, but we are dealing with a single-hop link level protocol. So, you should not waste your energy exploring approaches that have little chance of increasing the ETR. For example,
    o Why would you want to have anything to do with TCP's slow start?
    o There are no intermediate routers and therefore no cross traffic congestion. So, the RTT is not variable except for receiver responsiveness.

Q21) How do we ensure we are getting the right ETRs? At least within the ballpark. And not way off by order of ? For e.g.: I get ETRs only @ around 300-500 Kbps running both Sndr & Rcvr on clarion?
I used:

    rcvr -w 64 -v 0 -D 0
    sndr -h clarion -w 64 -v 0 -R 30 -n 1000 -b 1000

A21) dad simulates a 1 Mbps link with delay. I will talk more about this in class today, but basically, this should be an upper bound. Looks like you have delay = 0 msec and a body size of 1000 bytes. So, it should take 8128 bits/1 Mbps = 8 msec to transmit 1 pkt if there is 0 overhead. But pkts will likely get copied 4 or more times in/out of main memory. Depending on all sorts of parameters, from cs422s, each byte memory copy could take as much as 10 nsec (give or take a few nsec) and every system call could take 0.5 msec. So, it could take another 1-3 msec of overhead per pkt. But this is all speculation. For example, there is a huge time difference if you use memset and bcopy. Anyway, let's say it takes 10 msec per 1016 byte pkt ... then, if you are able to fill a window, you will run at 8128 bits/10 msec = 812 Kbps ... and that is if you are the only user on the system! Also, the select() call that is used to implement the delay uses at best a 10 msec timer. So, it could be off by as much as 10 msec; e.g., if it tries to delay 8 msec, it may actually delay 10 msec (or 11 msec). If it tries to delay 11 msec, it may actually delay 20 msec. So, maybe 500 Kbps is not unreasonable.

Q22) Are there any tools that I could use to verify against the no.'s that I should be getting?
A22) No.

Q23) On Solaris/Linux, how do we (command) figure out the NIC's B.W.?
A23) That will do you no good since dad simulates a 1 Mbps link.

Q24) Although there are no delays/drops at the Recv'r there's still at least 1-3 retransmits, which varies randomly. How do we verify when we should see retransmits and when we shouldn't? For e.g.: with smaller window sizes and with W=W'=1, with No Drop I don't see any retransmits. With W=W'=64, I do see retransmits.
A24) You should have no retransmits if there are no drops/delays unless your code is either buggy or configured to time out too soon. Sounds like you have a bug.
You need to find out why there are retransmits because there shouldn't be any without drops. Your sender timeout value should be chosen to take care of the worst-case receiver delay.

Q25) Do I need to run the dad stuff at the sender?
A25) No. In fact, if you do that, things will act strangely.

Q26) I am setting the cntrl field in the header to 'R' and dad doesn't seem to recognize that it is a retransmission.
A26) You need to set the REXMIT bit to 1 in the cntrl field. For a data pkt, you need something like:

    hdr_t h;
    h.cntrl = REXMIT | DATA;    // or h.cntrl = 0x09;

What is important to dad is that the REXMIT bit is on. But note that the bit is ONLY necessary for retransmissions in STRIDE_DROP mode.

Q27) When I compile, I get an error message about an "unsigned short" parameter in the call to dad_init() being undefined; i.e., it doesn't understand what uint16_t means. It happens under Linux and when I try compiling and linking.
A27) uint16_t is ISO_C. In Linux, defines it but it may have been defined earlier in something like . is included in dad.c when the OS is Linux. So, it is weird that you have that error. Furthermore, I have compiled dad.c on numerous Linux machines and Solaris without a problem. So, this points to the error message being misleading ... the error is really something totally unrelated. [Epilogue: the error was due to trying to link dad.o with the sender when it should be linked ONLY with the receiver!]

Q28) In my FIN phase, your dad routine returns an error message ... something about an incorrect body size of 7680 bytes. Looks like a dad bug.
A28) No, it is a bug in your program ... actually 2 bugs. The first bug is that your Xread() is ignoring the fact that xread() can return 0, indicating an end of file. Note that select will return with a ready file descriptor even when the other end has closed down, to indicate EOF. Your Xread() (which is my original Xread()) has no return code. So, if xread() returns 0 (EOF), Xread() will return also.
Then, your code tries to read a body, but dad_read fails because there is nothing there: the channel has been closed. But dad_read uses the Xread() that is internal to dad.c. Solution: replace your Xread in stdinc.h with the one inside dad.c (you need to leave that one inside dad.c though because it is not exported or imported).

Bug 2 is that you have not done byte swapping on the header length field. Note that 7680 is 30 in NBO: 7680 is 0x1E00, and with its two bytes swapped that is 0x001E = 30. Here is a Unix command sequence to figure that out (the indented lines are the output lines from bc; note that once ibase=16 is in effect, the output base has to be given as A to mean decimal 10):

    > bc
    obase=16
    7680
        1E00
    ibase=16
    obase=A
    1E
        30

This value is really left over from the preceding pkt since nothing was really read. You need to call ntohs() on the length field on the receiver side. dad assumes that the length field is in NBO. Although it may convert it for its own use, it will pass it to you just like you sent it, which means that the sender needs to call htons() and the receiver needs to call ntohs() on the length. Note that you do not need to do that on any of the other header fields ... do you know why?

Q29) Do you mean in A28 that select() will return with a ready file descriptor even if there is no data on the channel (i.e., after the sender has closed down the socket)?
A29) Yes.

Q30) My program works except I have not been able to take care of the limited sequence number field in the header. So, I can only send 256 frames. What should I do?
A30) Just report using 256 frames and note the problem ... unless you have time to fix it.

Q31) A number of emails have been asking me if they can have a README file that is not plain text because they want to show me some nifty graphs of their wonderful results.
A31) Ok, but if you want to do that, use PDF ... I do NOT want a Word document, PostScript, etc. Furthermore, call it README.pdf so that I know that it is PDF. Reminder: I do NOT want binary files, etc. The shar file you send me should not be large ... something is wrong if the file is a MB or more.

Q32) Somehow I still don't understand it, but if I turned logging off the performance numbers seemed to be okay, so I don't know why I was getting numbers greater than 1 Mbps with logging turned on.
A32) I have been running my code all weekend and I have never encountered a rate higher than 900 Kbps. I suspect that you are either incorrectly printing something or incorrectly measuring something. Even if you used the time command on the sender side you would probably see that something is in error:

    time sndr ...

Then, compute the ETR by hand to see if it makes sense.

Q33) Also, when you say coordinated FIN phase, is this a valid way of doing things?

    Sender:   Wait until the ACK for the EOT is received.
    Receiver: Send an ACK for the EOT, then select with a timeout to see if anything has been written on the wire by the sender. If select times out or returns something less than 0, I am done. If on the other hand I read 0 bytes, then I know it's an EOF and I quit.

If this is a valid thing to do, why do I get "connection reset by peer" or "dad_alarm: rbuf = 0" in the FIN phase?
A33) You need to make sure that the dad queues are empty. A simplified FIN approach is to just keep calling dad_read() and responding until it returns 0. Note though that you cannot say you are in the FIN phase until the receiver window has shifted past the EOT, NOT just that the EOT was received.

Q34) I have a bug in my code using dad_read. Hopefully you can clear up some questions for me. Here is how I am using dad_read. First read header and then body ... I have modified your dad_read ...
A34) I am not sure what you are saying and I am not sure what BLconversion() does, but here is what I do on the receiver side in most cases:

    if (s->Recv(hdrptr, sizeof(hdr_t)) == 0) ... premature fin error ...
    bodySz = ntohs(h->length);
    if (bodySz > 0) {
        if (s->Recv(bdyptr, bodySz) == 0) ... error ...
    }

dad_read should work as long as you have hdr-body pairs (assuming bodySz > 0), and there is no special thing you need to do.
There is a small case that might be a problem on EOF. But the most recent dad file fixes that. You can do a diff to see what was changed in addition to defining dad_dump(). Yes, it does have slightly different code for when Xread() returns 0. I have been running this latest version of dad all weekend and have not had any problems ... well, the FIN phase has to be correct on both sides.

Q35) I understand that we're supposed to test our programs with body sizes of either 10 or 1000 bytes. However, in stdinc.h, you specify that the body size is limited to 20 bytes. How are we supposed to get around this limitation to make a body size larger than 20 bytes? I assume we shouldn't be changing anything in stdinc.h.
A35) You should not leave that variable in stdinc.h. You can change anything you want. In fact, you will have to.

Q36) I wanted to verify, as discussed in class, can we change Sndr's RTO to be 60 ms instead of the default 20 or 30 ms for Parts 2 and 3? Reason:

    RTO = twice * Worst case 1 way delay + Rcvr's Delay
    RTO = 2 * (1 Mbps link simulation + select) + Rcvr's Delay

Reason for twice is to include sys. call overhead, etc.

    RTO = 2 * (10 ms + 10 ms) + Rcvr's Delay
    RTO = 40 ms + 20 ms = 60 ms

A36) You can change the RTO to whatever you want. There is a delay going from sender to receiver, but none coming back ... it's an artifact of using dad. So, for 1000 byte pkts, the RTT will be about 20 msec + (8*1000 bits)/(1 Mbps) = 20 msec + 8 msec = 28 msec. Add to that some noise, and you will probably need at least 40 msec. So, 60 msec is ok.

Q37) While running the test cases, should the two diff. hosts be diff. arch?
A37) Yes.

Q38) W.r.t. theoretical ETR calc'n to compare against practical values: is the following right?
Assume there's no drop (P=0.0) and just D=20 ms, so RTO=60 ms.

    For W=W'=1, L=10 bytes:   ETR = L/(L/R + RTO) = 80/(80 us + 60 ms) = 80/60.080 ms = 1.33 Kbps
    For W=W'=1, L=1000 bytes: ETR = L/(L/R + RTO) = 8000/(8 ms + 60 ms) = 8000/68 ms = 117.64 Kbps

A38) If there is no dropping, what is RTO doing in the equation? Do you mean RTT? For L=10 bytes, the RTT will be close to the delay, which is 20 msec, and the ETR will be close to 80/20 msec = 4 Kbps. All of your equations are wrong because you use RTO instead of RTT. For L=1000 bytes, the effect of the larger L is to increase both the RTT and the L. So, approximately, ETR = 8000/28 msec = 285 Kbps. Your calculations make no sense. Didn't we derive the results:

    efficiency = W/(2a+1)
    ETR = R x efficiency

?

Q39) When I did the experiment with the parameters VerboseLevel(0), Nframes(1000), Delay(20), RTO(30), Win(64), DropProb(0), Body(1000), I got the "Full Queue" problem. Here is my explanation for this problem. Initially, the sender will send out 64 frames to the receiver and start to wait for an ACK to come back. Because each frame will be delayed for 20 ms, the receiver should spend at least 1280 ms reading all 64 frames. Therefore, after the receiver reads a few frames (say, 3 or 4) and sends back the ACKs, the sender starts to retransmit the frames that are not ACKed, and the number of retransmitted frames will be large. So, before the receiver reads all 64 frames in, the buffer of the dad library will queue so many frames that it overflows. Is it possible for you to modify the dad library so that when its buffer is full, it starts to drop incoming frames? Or you could make the RTO longer, since 30 ms is a little short for this experiment.

A39) The dad queueing system will indeed start dropping pkts if you overwhelm it.
But you should not be using a retransmit policy that overwhelms it; e.g., GBN with too small an RTO will definitely be a really bad choice. You should not set the RTO to 30 msec if it causes spurious timeouts.

Q40) I have a few questions regarding submission. The test outputs for 2a and 2b are around 4 * 8 K files. It has been explicitly stated that we should not attach files > 1K. What do we do, given that verbose output makes the files larger?

A40) Snip out just enough to explain what is going on.

Q41) I don't know if anyone else had this problem, but whenever I tried to compile my receiver on Hilton (solaris), I would get the following compiler error in dad.c:

    implicit declaration of function `int bcopy(...)'

On the man page, it says that this function was previously defined in before it was moved to . Currently, dad.c includes When I changed this include to , everything compiled as expected. I never received a compiler error on the linux machines (when using ). Just thought you might want to know - maybe I'm the only one who encountered this.

A41) This should NOT be a problem with dad.c since version 2c1 (maybe even earlier).

Q42) If I don't need dad_dump(), can I still use the previous version of dad?

A42) Yes, but if you encounter problems in the FIN phase, you might try the newer version, which fixes a very small problem with EOF. Save the old dad version just in case you need it.

Q43) I found that the behaviors of select on linux and Solaris are different. With Win = Win' = 1, RTO = 30 msec and Delay = 20 msec, running the sender on chigger with the receiver on calrissian.arl gave me a lot of timeouts, while the other way around (sender on calrissian with receiver on chigger) gave far fewer timeouts. Therefore, if I test my code in the second scenario, I get pretty good performance (about 884 Kbps when Win = Win' = 64, Body = 1000, Delay = 20, RTO = 30, and no drop occurs). Should I give additional explanation of this in my report?
A43) You do know that linux select is different from solaris select, right? In fact, you need to reset the timeout variable every time you use select in Linux, or else you will not get the behavior you want. Do you do that?

Q44) I found that even after my sender has received an ACK of the EOT, it continues to receive pkts. Does that mean I need a FIN phase like the receiver's, where it flushes the channel?

A44) That might be possible, depending on your retransmission policy.

Q45) What should I do if some cases work but some just don't work by Tuesday?

A45) Present what you have and try to document what you think the problem with the other cases is.

Q46) I am getting ETR = 500 Kbps when W=8, L=1000, P=0.0 and RTO=30 msec. But I get ETR = 700 Kbps for the same parameters except P=0.2; i.e., a higher ETR when there is dropping??? How can that be?

A46) You need to collect some auxiliary variables (e.g., #drops, interpkt times, ...) to see if you can understand what is going on. But I suspect that your RTO is TOO SMALL. Here is a possible explanation. Since your RTO is so small, you are getting extraneous retransmissions when the drop probability is 0.0. I have been able to get close to 900 Kbps using RTO = 55 msec. When P=0.2, you are still sending spurious retransmissions. But now, these retransmissions could be filling in missing pkts at the receiver. At the very least, they are generating more ACKs, which may make it back to the sender, giving it a more up-to-date view of the receive buffer state. If you change the RTO to 50 msec or 60 msec, I suspect you will find that your ETRs will match theory better.

Q47) How much is the virtual sequence number capability worth?

A47) My suggestion is that it is better to do the experiments using 256 frames and get a good explanation for the results than to spend time doing the virtual sequence numbers. If your results made sense and if this were last week, I would say try to make your code handle 1000 frames.
You should note that changing my WinBuf stuff (ring buffers) to use actual sequence numbers between 0 and 255 won't work unless a lot of changes are made. It is almost better to use your own window buffering structure (e.g., a list). Alternatively, you could do the actual-to-virtual sequence number functions, but you have to do it carefully. It is discussed in A18, which had a typographical error until today. You would need the inwin(asn) and vsn(asn) functions, where asn is the actual sequence number.

Q48) Everything works except that my FIN phase has a problem whenever my ACK gets dropped. This makes collecting data really hard. What should I do, given that I don't have much time left?

A48) In the FIN phase, call write() instead of dad_write(); your ACK is then guaranteed to get through, since it won't get dropped. Finish the experiments and note in your writeup that you had to do that.

Q49) How do I get a double precision number from the command line ... like the drop probability? Everything is working except I haven't been able to figure out how to do that.

A49) Try 'man atof'.
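As a supplement to A49: atof() does the job, but it cannot report bad input (it simply returns 0.0 for a string like "abc"). A minimal sketch using the standard strtod() instead is shown below. The function name parse_prob() is hypothetical, and the assumption that a drop probability must lie in [0, 1] is mine, not a stated requirement of the assignment.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a probability in [0, 1] from a command-line string.
 * Returns 0 and stores the value in *out on success;
 * returns -1 (leaving *out untouched) on malformed or out-of-range input. */
int parse_prob(const char *s, double *out)
{
    char *end;
    errno = 0;
    double v = strtod(s, &end);

    if (end == s || *end != '\0')    /* no digits, or trailing junk */
        return -1;
    if (errno == ERANGE)             /* overflow or underflow */
        return -1;
    if (!(v >= 0.0 && v <= 1.0))     /* also rejects NaN */
        return -1;

    *out = v;
    return 0;
}
```

In main() this would typically be called on the relevant argv entry, e.g. parse_prob(argv[i], &dropProb), printing a usage message when it returns -1. Which argv index holds the drop probability depends on your own argument layout.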