CSE 770 Paper Review

Reviewer: Manfred Georg
Date: 9-22-2005

How would you rate this paper, relative to others we have read? top 50%, but not top 25%

How would you rate your knowledge of the topic of this paper? novice

What problem or issue does the paper address? Why is it important?

The paper discusses peer-to-peer decentralized searching methods. Searching is an increasingly important problem, due to the growth of worldwide data. Currently it is mainly accomplished through centralized services. Decentralized searching methods have been repeatedly shown to be less effective, and sometimes even less scalable than centralized search. Unfortunately, it is not always feasible to create a centralized source for search. This leads to the development of decentralized searching algorithms which attempt to be as effective as centralized searches.

What are the main contributions of the paper and why are they important?

This paper presents a new decentralized algorithm for searching in a peer-to-peer network. As far as I can tell, the main contribution is the merging of the LSI (Latent Semantic Indexing) algorithm currently in widespread use and CAN (Content-Addressable Network) routing. The idea being that documents are projected into a semantic space as in LSI. The resulting vector is then searched for in the CAN. Neighboring results, which should be similar in character to what is being searched for are returned.

How significant are these contributions relative to previous work?

The observation that CAN can be applied to LSI generated vectors has its merits. Furthermore, the insight that nearest neighbors in the CAN will have similar documents in the LSI decomposition brings the approach together. The rest of the paper, for the most part just follows these observations.

Give detailed comments justifying your view of the paper.

I find it interesting that a couple dimensions in a static semantic space (50-350) can capture a multitude of topics. One would expect that there are more than 350 distinct topics in the world; however, decomposing documents to so few dimensions is adequate for successfully searching.

Last year we read a proposal to use a flat namespace instead of domain names, and to implement this namespace using DHTs. Among other problems, the issue with this approach was that similar names could be completely unrelated (unlike arl.wustl.edu and cse.wustl.edu). I think a very important insight in the paper is that by making similar key values address similar objects, one can find what one is looking for without having an exactly matching key. This works particularly well, where the search is for a general solution instead of a particular location. The paper, leverages this insight by searching nearest neighbors in the CAN to try to locate documents related to the query.