CSE 770 Paper Review

Reviewer: Sailesh Kumar
Date: 9-22-2005

How would you rate this paper, relative to others we have read? top 50%, but not top 25%

How would you rate your knowledge of the topic of this paper? novice

What problem or issue does the paper address? Why is it important?

This paper addresses the problem of scalable content search and indexing over the web. The proposed information retrieval (IR) techniques are decentralized and are presented in the context of peer-to-peer (P2P) systems.

What are the main contributions of the paper and why are they important?

The authors have built a prototype IR system, pSearch, which organizes content around its semantics in a P2P network. They also propose a rolling index, which resolves the dimensionality mismatch between the semantic space and the CAN (content-addressable network) and takes advantage of the higher importance of the low-dimensional elements of semantic vectors. Load is balanced using content-aware bootstrapping. The scheme achieves index and query locality and distributes document indices evenly.
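
To make the rolling-index idea concrete, here is a minimal sketch of one way to read it: the semantic vector is rotated by a multiple of the CAN dimensionality, and the leading elements of each rotation serve as a routing key in the CAN. The function name, parameters, and sample values are hypothetical and chosen only for illustration; the paper's actual construction may differ in its details.

```python
from typing import List


def rolling_index_keys(
    semantic_vector: List[float], can_dims: int, num_rotations: int
) -> List[List[float]]:
    """Return one CAN routing key per rotated space (illustrative only).

    Assumption: rotation i shifts the vector left by i * can_dims elements
    and keeps the first can_dims elements as the key.
    """
    if can_dims <= 0 or can_dims > len(semantic_vector):
        raise ValueError("can_dims must be between 1 and len(semantic_vector)")
    n = len(semantic_vector)
    keys = []
    for i in range(num_rotations):
        shift = (i * can_dims) % n
        # Rotate left by `shift`, wrapping the leading elements to the end.
        rotated = semantic_vector[shift:] + semantic_vector[:shift]
        keys.append(rotated[:can_dims])
    return keys


# A 6-element semantic vector mapped onto a 2-dimensional CAN.
keys = rolling_index_keys([0.9, 0.7, 0.4, 0.3, 0.1, 0.05], can_dims=2, num_rotations=3)
print(keys)
```

Under this reading, the first rotation uses the lowest-dimensional (most important) elements of the vector, which matches the observation that those elements carry the most weight.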

How significant are these contributions relative to previous work?

Previous work has concentrated on simple keyword matching and has ignored the advanced relevance-ranking algorithms that the IR community has devised over decades of development and evaluation. Without effective ranking, a query may return far more results than a user can reasonably examine. In this regard, the paper makes several contributions, as described above.

Give detailed comments justifying your view of the paper.

Content-based text search and information retrieval have been of research interest for a long time. However, as the scale and amount of data to be searched grow, the problem is becoming quite challenging. It is equally important to rank query results appropriately. This paper presents ideas and algorithms that concentrate on the ranking and efficiency of IR systems, and it describes several techniques for building a large, self-organizing P2P IR system. The proposed algorithms are decentralized and have been evaluated quantitatively with respect to bandwidth and the number of nodes that must be searched.