next up previous
Next: Summary Up: Placing Servers in Overlay Previous: Server Load


Related Work

The application of the set cover problem in network design has two variants: the facility location problem and the $k$-median problem. The facility location problem minimizes the joint cost of server installation and the cost of connecting each client to its designated server. This problem has been applied to designing and placing network concentrators. The $k$-median problem minimizes the cost of connections between clients and servers under the constraint that no more than $k$ servers can be used. Both problems are NP-Hard. The best known approximation algorithms can achieve constant ratio [#!facility-const-approx!#,#!guha-improved!#,#!facility-lp!#], if the connection cost is symmetric and satisfies the triangle inequality. For arbitrary cost, the worst case bound is $O(\log n)$. However, neither of the problems can be applied directly to the design of overlay networks, since in the overlay model the communication channels between clients and servers are over the commodity Internet and do not incur any cost to service providers. Rather, the major cost is the number of servers needed to service all clients and the access bandwidth required at each server's network interface.

Our model of server placement more closely resembles the set cover problem. The classic greedy algorithm for solving the set cover problem [#!greedy-setcover-johnson!#,#!greedy-setcover-lovasz!#] achieves an $O(\log n)$ performance ratio. In geometric spaces, the problem is easier. In [#!shifting-lemma!#], Hochbaum proposed a shifting strategy that gives an $O(1+\varepsilon)$ performance ratio. Unfortunately, the interconnections between networks dictate that the network propagation delay no longer exhibits the geometric properties of distance.

References [#!qiu-infocom01!#,#!wcw01!#] studied the problem of placing cache replicas in the network and formulated it as the $k$-median problem: given a specific number of servers, what is the best placement that achieves the highest average service level to clients, where service level is indicated by access delay from a client to its nearest replica. In [#!qiu-infocom01!#], Qiu et al. proposed several placement strategies including: a greedy strategy that incrementally places replicas to achieve highest service quality; a hot-spot strategy that places replicas near the clients that generate the greatest load. In [#!wcw01!#], the authors also proposed a max degree strategy by placing replicas in decreasing order of nodes' degrees. By simulating over several synthetic and real network graphs, they concluded that the greedy strategy performs remarkably well, achieving within 1.1 to 1.5 of the lower bound.

Our approach to network design is from a different angle. we are more interested in examining the necessary cost, in this case the number of servers, if we want to provide all clients the guaranteed service. This gives service providers insight into the relation of network cost and the achievable service quality. Then, they can make adjustments to reflect their revenue stream, such as eliminating servers that are only serving small numbers of clients. Contrarily, the work in [#!qiu-infocom01!#,#!wcw01!#] measures the average service quality which masks the number of unhappy customers. Additionally, the performance of our approach, which is the number of required servers, is not susceptible to the cost metric of connections, since we only use it to categorize clients as serviceable or not by a server; while theirs is achieved for a specific cost metric, namely the access delay. Since the connection cost metric depends heavily on the application, it is not known if the same ratio could be achieved.

Another difference is that we model the network geographically and consider server co-locations. As networks overlap geographically, the number of potential server locations is much fewer in number than the number of network nodes that need to be considered. In [#!qiu-infocom01!#,#!wcw01!#], they used network graphs consisting of tens of thousands nodes for router-level graphs and thousands of nodes for AS-level graphs. Consequently, the optimal algorithm based on LP relaxation is too expensive for their models. We think considering the geographical locations of servers is important and corresponds to the current trend in the Internet, where servers are typically located in data centers in different geographic areas. This reduces the problem size and enables us to solve it more optimally. In Section 5.3, we compare the performance of our algorithm both with co-location and without, and show that with co-location we can reduce the number of required servers to approximately half of that with no network co-locations.


next up previous
Next: Summary Up: Placing Servers in Overlay Previous: Server Load
© Sherlia Shi 2002
sherlia@acm.org
2002-7-25