Report on: A Comparison of Application-Level and Router-Assisted Hierarchical Schemes for Reliable Multicast by Radoslavov et. al.

 

This paper compares ALH and RHS recovery schemes for reliable multicast. Given that it’s very hard to compare ALH and RAH from all aspects, this paper uses specific scenario analysis and extensive simulations to demonstrate that the loss recovery performance of ALH and RAH are close, though not identical. The performance metrics used are recovery latency, recovery data overhead, recovery control overhead and receiver exposure. Sensitivity analysis of simulation results was done across parameters such as group size, depth of recovery hierarchy, underlying network topology, receiver placement and link loss.

 

Authors acknowledge that their work is by no means complete or conclusive and is best used (only) as a starting point for comparing ALH and RAH. This is relevant because by using ALH, we could reduce router complexity. Also, application deployment might be easier than router upgrades, as it is significantly easier to distribute new software for multicasting rather than to change a large number of routers all throughout the Internet. But to do that, we first need to validate their assumptions, both for analysis and simulation, and also the metrics that have used to evaluate performance of the two schemes.

 

And that is where a serious problem of the paper surfaces. Although the authors have inundated us with simulation results, there don’t discuss the big picture (i.e.) what the metrics really are, what their (absolute) values signify, when is one more important than the other, etc. A tangential problem here is the lack of precise explanations for some of their equations and more seriously, some of their graphs like the one on data overhead when the receivers are placed extremely clustered. This is important because in the real Internet, this is the more likely placement. Further, use of phrases such as “may be”, “a possible explanation”, etc. weakens some of the explanations and conclusions.

 

Although the above-mentioned problem supersedes other problems in the paper, for the sake of completeness, I’m enumerating here the rest of the discussion, followed by some that were not discussed because of time constraints:

 

·        NACKs can take a shorter path than the multicast tree but this is not discussed in the paper.

·        Although many simulation results are presented, they are not analyzed for typical case, extreme cases, etc.

·        Loss of control/ data retransmission packets are not considered in this paper.

·        There are a range of other solutions for reliable multicasting such as using router buffer to cache, to reduce latency, etc. A comparison with these would have given us a better perspective of the solutions analyzed in this paper.

·        Assumption of packet loss on a single link at a time for both analysis and simulation may be too simplifying. Yes, it is an assumption for simulation also and is stated on page 477, left column, 2nd para, 1st sentence.

The packet loss that happens on different links will have different effect on the recovery efficiency, and the link loss probability is different at different links. But the metrics only evaluate the average. When they set up the test configurations, the extreme cases they use are not necessary and enough to reflect the real case performance. In real world, the clustering receiver configuration is more typical. Actually, in the experiment, the results show that the ALH-heuristic is bad (4 times) in this case comparing with RAH scheme but the authors fail to analyze this result at all.

·        There was also a discussion on whether parameters such as size of recovery group, router fan-out will have an impact on latency and other metrics discussed in the paper but a conclusion was not reached.

 

·        The authors’ ALH approach uses RMTP which severely limits adaptability[2].

C. Papadopoulos, G. Parulkar, and G. Varghese, “An error control scheme for large-scale multicast applications,” in Proc. IEEE INFOCOM, San Francisco, CA, Mar. 1998, pp. 1188–1196

·        Simulation should also have considered relative placement of senders and receivers and NOT just relative placement of receivers.

·        In conclusion, authors say that the difference between the two schemes is significant only when losses occur near the root and the number of levels in the hierarchy is large but they don’t seem to directly support this conclusion in the paper.

·        The main point of the authors being surprised that ALH typically does not show more than a 2x in data recovery latency over RAH does not appear to be such a surprise. Given the constraints the authors have placed on their evaluation environment, it seems like the only factor that would cause a significant difference between ALH and RAH is the placement and number of parent nodes in a system.

·        In order to really get a realistic view of how great a data recovery latency improvement can be achieved with RAH over AHL, it seems like it would be crucial to take into account the amount of time spent going up and down the network protocol stack to interact with the Application implementing the AHL parent nodes.  Without this delay factor one would expect AHL to perform extremely close to RAH.  Especially in situations where the parent/receiver groupings coincide exactly with a networks hierarchical topology.

·        Implementation complexity should have been considered for ALH.

 

·        Next step in this analysis could be to simulate how each of these multicast schemes perform under conditions where other network traffic is traversing the network (i.e.) model load on the network.

·        Simulation assumes all links have same propagation latency and that sending a single packet over any link creates the same overhead to the network.

·        There are several areas that are not clear they have accounted for all the overhead. One example is the hypothesis that all participants have somehow obtained information about the distance to each other. They use CBT yet don’t account for the overhead to select the core. This is very important to creating a “well constructed ALH” according to Perlman [1].

a.       Perlman “Interconnections” p464

 

 

Despite these many shortcomings, more than half of the class ranked this paper in the middle 1/3 at the end of the discussion.