Discussion Summary of "Sizing Router Buffers"

 

CSE 770x of Fall 2004

 

This paper got a pretty good rating from the summaries I received from all the students in our class. Out of the 14 students, 9 students rated this paper among top 3 (and 5 of them rate this paper No. 1), 5 students rated this paper in the middle. Very similar results are collected in the poll right after the presentation. However through discussion especially some comments from the professors, the overall reaction to this paper may go down to some extent.

Overall, this paper may or may not become the best paper chosen for this semester because of the controversies and the relative quality and strength of the other papers we are going to evaluate.

This paper argued that the backbone routers with thousands of flows do not need the current buffer size based on the rule-of-thumb, which states that a link needs a buffer of size B=RTT * C, where RTT is the average round-trip time of a flow passing across the link, and C is the data rate of the link. In fact, through the authors’ modeling, mathematical analysis, simulations and experiments, they claim that a link with n flows requires no more than B= (RTT * C)/sqrt(n), which would enormously affect the routers’ design and cost.

The rule of thumb comes from a single TCP long flow or multiple synchronized long flows passing through the link and to keep the link busy by provide enough buffer. Through some citations and the authors’ simulation, they conclude that when the number of flows are over 500, the can treat flows as being not synchronized at all. Therefore by using central limit theorem, the authors analyzed the distribution of the queue occupancy and further concluded that they can use the buffer size showed earlier to achieve full utilization for flows with number of n = 10000 or larger. Another part of the paper is to evaluate the affect from short-lived flows in the mixes of short- and long-lived flows, and the conclusion here is that the average queue length peaks when the probability of large burst is highest, but not when the average burst size is highest, so the bandwidth has no impact on the buffer requirement. The analysis also attributes to the shape of hillt! ops observed on Figure 9.  

The possible points for this paper to be selected as the best paper are:

·        Well written, address the problem clearly, provide accurate analysis.

·        Novel idea, attack practical and vital issues about routers

·        Acknowledge the related areas not covered by the paper

The potential reasons NOT for this paper to be selected as the best paper are:        

·        Made several assumptions that are not fully proved before their mathematical modeling and analysis

·        Simulation and experiments are not enough to support the viewpoints.

All the above summaries are also reflected in some degree in the discussion after the presentation. || ( this symbol ‘||’ is used to separate different topics appeared in the discussion session of this paper) If the buffer size can be reduced by the factor sqrt(n) comparing to what is widely used now at backbone routers, the buffer can be even implemented using on-chip SRAM, reducing the amount of pins needed to access them. The benefit will be more efficient performance through reducing the access time, less cost and power because of the support, and also provide a feasible solution as the line speed and the routers’ capacity increase much faster than the technology of memory access speedup. The cost saving can be compound if this could change the design architecture of backbone router.     ||With having the current buffer size, other than providing a peace of mind to the router operators if we can optimize the memory access methods through MPU or FPGA dynamic programming, maybe we can make better use the current buffer size.    || TCP with no buffers, the Utilization of bottleneck link is 75%.   || In order to provide quality of service the advanced queuing requires certain amount of buffer for each flow, therefore the relationship between buffer size and the numbers of flows may not be as the authors claimed.  || In their simulation, it seems like the authors should test with multiple number of flows and observe the performance and throughput, which may produce different result since in reality the number of flows going through a link can be varied. || The factor affecting the buffer requirement can be other than the number of flows, and window size might affect the desynchronized character of multiple flows which is the assumption the paper has for their analysis of buffer size. The detailed analysis of this observation can be found from the Appendix. || Misformed TCP flows can also affect the performance, such as FIFO queue with tail drop. || For this type of boundary analysis of the key parameter of backbone routers, I feel that assumptions could not be like “very rare” for synchronization taking place, “assume” that they are desynchronized enough that the window size processes are independent of each other.

 

Messages posted to the newsgroup: (The purpose is that we can easily find the related comments right here instead of going to the newsgroup)

1. Message posted by Professor Turner to newsgroup wu.cse.class.770 on Thursday, September 30, 2004 5:39 PM

After the seminar this morning, I thought a bit more
carefully about my comment that flow synchronization is
a function of windows size, and I now don't think that it is.

Here's the reasoning. Assume that we have n backlogged TCP
flows feeding a bottleneck link. Also, assume for the moment
that they are synchronized and that just before the buffer
overflows, they all have a window size of w packets. That is,
with a window size of w, the n flows can send a total of
n*w packets during an RTT and just fill the link and the buffer.
During the next RTT, each flow is going to increase its
window size to w+1, meaning that there are n "extra" packets
being sent that must be discarded. That is, the average
number of packets lost per flow is 1. Or looking at it from
the perspective of one flow, it will send w+1 packets
and it can expect 1 of these w+1 packets to be lost.

So, if we assume that the packet losses for a single flow are
independent, then the probability that the flow experiences
no losses is

(1-1/(w+1))^{w+1)

If w=1, this is .25. As w gets large, it approaches 1/e which
is about .37. So that says that the average number of flows
that experience a lost packet during the RTT when losses are
occuring is .63*n when w is large and about .75*n when w=1.
So, we would expect a high degree of synchronization in
both cases, and a little bit less synchronization when
less synchronization when w is large than when w is small.
This is the opposite of what I was saying in the seminar
this morning.

Now, you could challenge my assumption that the losses
that a single flow are independent, given the burstiness
of TCP transmissions. But, even if you assume dependent
losses, it can't change the main conclusion that there
should be a fairly high degree of synchronization when
w is small, since if w=1, each flow sends 2 packets
during the RTT when losses occur, which means that at
least n/2 flows must experience a lost packet (recall
that n packets must be discarded altogether).

So I am now feeling somewhat more skeptical about the
paper. I can see no reason to think that synchronization
won't occur on congested links with lots of long-lived
flows. Certainly it can occur if RTTs are equal, and
I don't see how variations in RTT would by itself,
keep flows from getting synchronized.

I believe that it is true that there is very little
synchronization of flows on backbone links, but that
may just be because the backbone links are almost
never the bottleneck links for the flows passing
through them. You can only get synchronization among
the flows that share the same bottleneck link.
The real question the paper is trying to address
is how big buffers should be for congested links.
And, I am now not so sure they really have made
a compelling case.

Jon

 

2. Message posted by Professor Turner to newsgroup wu.cse.class.770 on Thursday, September 30, 2004 9:46 PM

More followup on this. I looked at reference 14 from the
"Sizing" paper. This is the principal paper that the authors
used to support their assertion that synchronization is rare.
Ref 14 (by Qiu, et al) is a nice paper and sheds considerable
light on the question, although it does not provide unqualified
support for the authors' case.

Qiu shows that congestion synchronization does occur even
when the congestion window during the congestion period (w) is
as small as 3 or 4. This is consistent with my previous
analysis. However, if w is even smaller, then synchronization
gets destroyed. The reason for this is subtle. It has to do
with the fast retransmit mechanism. With a window size of 4,
a source can lose a packet during one RTT and still get
three duplicate acks, allowing it to continue transmission
without waiting for a timeout. If the window size is smaller,
a source must wait for a timeout. For really small w, a large
fraction of sources are forced to wait for a timeout. Since
different sources are forced to wait for a timeout during
successive loss periods, the synchronization breaks down.
Also, I suspect that the coaresness of the timeout mechanism
contributes to this, since timeouts can differ by hundreds
of milliseconds.

Qiu et. al. also look at some effects caused by RTT variation,
but they don't really focus on the synchronization effects,
so it's hard to draw any clear conclusion. I suspect that
synchronization can occur with a range of RTT values so long
as the window sizes are not very small.

Jon

 

3. Message posted by Manfred Georg to newsgroup wu.cse.class.770 on Friday, October 01, 2004 1:59 AM

Hi,

I have some thoughts regarding the stability of the synchronization
described by Dr. Turner.

Point 2 is probably more interesting.

POINT 1:

My argument is that assuming different flows are coming in
on different ports on the router.  We can imagine that the flows
will occasionally arrive simultaneously from different input ports,
destined for the same congested output port.  Alternately, to
make up for this, there will be times when no input port has
traffic destined for the congested output port.  This will
cause a small amount of variation in the output port's buffer.
When the buffer is close to overflowing, this variation will
be the difference between making it into the queue and being
dropped.  If a large enough number of flows receive a dropped
packet and react fast enough (I obviously haven't done enough
analysis to make a quantitative claim here), then no other
flows will lose packets.  In the long run, this sort of
effect will desynchronized the majority of flows.

Basically, my claim is that the equation
> (1-1/(w+1))^{w+1)
for not receiving a dropped packet is only valid if we assume
no variation in the queue's length.  With a variation in queue
length of just a few packets, however, I don't believe this
equation will hold anymore.  The question then becomes, is
this variation enough to cause instability in the synchronization?

The more I think about it as I write this, the more I think
maybe the original analysis was correct.  The main problem with
my thoughts in this post is that a variation of just a few
packets does not make a large difference, when compared to
the fact that n "extra" packets arrive within that RTT.

comments?

POINT 2:

What is perhaps more compelling is that the 37% of flows which
are unscathed by the drop event, will continue to increase their
window size until the next drop event.  At this point they will
have a different window size than the 63% that did receive a
dropped packet.  Perhaps this difference will destabilize the
synchronization.  For example, notice that the 37% of flows would
pause for a longer period if they received a dropped packet
in the second drop event.  More importantly, they
receive more throughput and hence a higher probability of being
in the next drop event.  This might mean that more than 37% of
flows survive the second drop event without a loss.  If this
behavior continues to escalate, then resynchronization might occur.
Different RTTs for flows will also cause differences in window
size.

Can anyone think of a reason why this would, or would not
destabilize the synchronization?

I'll go to bed before I write even more of a novel.  Thanks
for reading this far.

Manfred