11.10.06
Review of “ROFL: Routing on Flat Labels”
It has been generally recognized that using IP addresses for both identity and location is not a good thing. Many proposals for separating the two exist, most often in the context of disruptive changes to the Internet. In this paper, the authors give a different view point on the problem. Instead of merely separating identity and location, they discard the notion of location entirely. More accurately, they propose a system where there is no explicit representation of location and routing is done on identity. Of course, in many ways this is similar to what is done with IP, as addresses are used for identity. The real departure from the current system is that the routing label is flat, i.e., there is no structure to the labels (identities). This challenges the long held belief that the structure of routing labels (in this case, IP addresses) allows the act of routing to be scalable, and that is really what this paper is all about.
The idea is certainly interesting and worth exploring at least a little just for the sake of considering an alternative to the current thinking. A system with no explicit location representation does have some immediate benefits. User mobility becomes a natural expectation, there is less confusion when one machine has many addresses (locations), and identities can easily move around the network. Of course these depend on efficient mechanisms to actually provide such behavior, but the model lends itself to such ideas rather than ignoring them. These particular properties, though, are true for any system that separates location and identity. What does getting rid of location specifically add to that list? The authors suggest four distinct advantages. First, no new infrastructure for resolution is needed because identities aren’t resolved to anything. I find this mildly misleading because identities will have to be numeric for routing purposes and so there will have to be some mapping to human readable strings. It could be that the identities are built from a hashing algorithm, or something similar, that can always be done at the local machine, but that is not necessarily the case. This ties in with their second stated advantage which is that packet delivery does not depend on anything outside of the actual data path. Again, there has to be a way to map from ‘google’ to ‘id something’, and that could rely on an outside system. The point is that neither of these first two advantages are something inherent to systems without locations but properties that could possibly be included in such systems. The third advantage is also a bit odd. It states that allocation of identities is simpler in location-less systems because only uniqueness of identities is needed as opposed to IP addresses where there is also a constraint on allocation based on network topology. In other words, identities need not have any structure to them. This benefit really comes from having a flat identifier space rather than the lack of location. The last advantage is that network access controls can now be targeted at identities instead of the particular location that that identity is current at. This makes a lot of sense, but this could also be done in any system that splits identity and location and is thus not really a specific advantage of what is being proposed here. The reason all this is important is that I believe the motivation for this work is severely lacking. There are only three possible advantages that apply, and each of those is dependent on implementation choices.
Now, let’s consider the instantiation of this idea being presented in the paper. Routing On Flat Labels, as the name suggests, utilizes a flat identifier space. This means that allocation of identities should be simpler than in the IP case. Indeed, they assume that identities are specifically tied to the public-private keys for a host or router. The identity of a node is a hash of its public key (hash collisions could be a problem). The problem with this choice is that there does need to be a resolution-type step for this to work. When someone types ‘google’ into their web browser they have to obtain the public key for google, or at least its hash, before the machine can stamp the packet with the appropriate destination identifier. The only motivation for using ROFL, then, is having simple assignment of identifiers (again, that is only assuming that no hash collisions occur and there is no need for some other management scheme). I understand the desire to explore different architectural options, but there still needs be either some compelling motivation to do so or, lacking that, compelling results.
Moving on to results, then, I don’t believe too much needs to be said. The authors go out of their way to acknowledge that their results are not stellar. Moreover, the results the supply are not as useful or as clear as they should be. For instance, figure 5a claims to show the total number of packets sent during start-up as a function of the number of IDs in the AS. As each host in an AS is given an ID, I can only assume that to produce these numbers for each of the four AS topologies they have, they take some subset of the hosts and measure the join overhead then repeat for more nodes. One of the odd things about this result is that the topologies mentioned have millions of hosts but the chart only shows up to ten thousand. Extrapolating to ten million (the largest AS) would mean one hundred billion packets. Assuming 64 byte packets, this means around 6.5 terabytes of data goes across that AS network during start-up. Do understand, however, that this is unlikely to happen over a short time period as joining will occur as new hosts come on-line rather than all at once (returning after a power outage could be a problem). The second odd thing is that figure 5b seems to suggest that it takes no more than 50 packets per node to join. If that is so, I have no idea why figure 5a shows it taking nearly 10,000 packets to join in a single node AS. I suspect there are reasonable answers for this, but I couldn’t find them.
There are few other things about ROFL that I feel it is worth mentioning. The first is about using a Chord-like routing scheme for ROFL. Chord and other similar ideas were designed for overlay routing. ROFL uses it directly on the nodes and routers in the Internet. This is sort of a funny idea as nodes are being mapped to a routing ring (really, multiple rings) that will not, in general, reflect the underlying topology. Of course, that is one of the gains of using such a system, but it is also one of the drawbacks. When a packet is routed to the next router on the ring, it may have to pass over multiple other routers to get there. It is then possible that it could go back to one of those routers again, if that is the next hop on the ring. The results indicate that path stretch is not too bad, but it is a strange way to operate.
Another issue I have with ROFL is that a host joining or leaving/failing has far reaching consequences. The actual number of packets sent out per event is tolerable, but the way that predecessors and successors are dealt with over the AS hierarchy means that an AS a very long way away will have to update its state. In their results, they say that there are often 75-100 ASes above an AS in the hierarchy. Since nodes in ROFL attempt to find successors at each level of that hierarchy, one host going up or down can effect many other nodes across a wide area. This is unacceptable even without getting into worries about attacks on routing stability.
Lastly, I just want to point out something that I was unsure about. In the discussion on exploiting reference locality, they say that ASes that cache pointers will maintain a bloom filter containing all hosts joined below that AS. It seems to me that an AS that is even just a few steps up the hierarchy could potentially have millions of entries (note that this is per-router). In addition, the churn of hosts leaving and joining might make maintaining a bloom filter of that size difficult, but then again I am not a bloom filter expert.
To be honest, I am not sure that this was a good SIGCOMM paper. It seems much more fitted to a conference like HotNets. The core idea is interesting, but everything goes down hill from there. There seems to be little motivation outside of the ‘neat idea’ one, and the results are, as they say, not pretty. The actual system is a conglomeration of existing ideas modified to better fit the model and in some cases it appears that all of the consequences were not completely thought through. It doesn’t help that some of the writing and flow of the paper is poor and it is too often difficult to really understand what was going on. I am hoping that later work will improve ROFL because it is nice to see widely held beliefs challenged occasionally, but at this point I am unconvinced both of the reason and the feasibility of the work.
jon.turner said,
November 11, 2006 at 2:24 pm
Charlie mentions the use of multiple rings in ROFL. This is actually not quite right, although it’s easy to get that impression from the paper. I strongly urge everyone to take a look at the first couple sections of the Canon paper http://www.mpi-sws.mpg.de/~gummadi/papers/hierarchical_dhts.pdf
which explains very clearly how this works. It’s a very clever idea and is really the key to understanding inter-domain routing in ROFL.