Search in Peer-to-Peer File-Sharing Systems: Like Metasearch Engines, But Not Really
Wai Gen Yee, Dongmei Jia, Linh Thai Nguyen
{yee, jiadong, nguylin}@iit.edu
Information Retrieval Laboratory
Illinois Institute of Technology
Chicago, IL USA


Yee, Jia, Nguyen OSWIR, 2005 Workshop, Compiegne, France


Goal

To motivate research in peer-to-peer information retrieval (P2P IR).

To model P2P IR in terms of a metasearch engine.


Model

Peers share data objects, each described with a descriptor (bag of terms).
Peers are connected in a random graph.
Queries (bag of terms) are routed to peers (servers) that return references to data objects $O$ such that $D_O \supseteq Q$.
• $D_O$ is the descriptor of $O$.
Each descriptor also contains the hash value of the data object.
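The matching rule above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes terms are compared case-insensitively and that bag containment respects term multiplicity, and the names `matches` and `search` and the sample hash keys are hypothetical.

```python
from collections import Counter


def matches(descriptor_terms, query_terms):
    """Return True if the descriptor bag contains the query bag (D_O contains Q).

    Assumption: terms are lowercased before comparison, and a term that
    appears n times in the query must appear at least n times in the descriptor.
    """
    d = Counter(t.lower() for t in descriptor_terms)
    q = Counter(t.lower() for t in query_terms)
    return all(d[term] >= count for term, count in q.items())


def search(shared, query_terms):
    """Server side: return the hash keys of all shared objects whose descriptor matches.

    `shared` maps a hash key (hypothetical values below) to a descriptor,
    given as a list of terms.
    """
    return [h for h, descriptor in shared.items() if matches(descriptor, query_terms)]


# Hypothetical data, for illustration only.
shared = {
    "h1": ["Mozart", "Clarinet", "Concerto"],
    "h2": ["Mozart", "Sonata"],
}
print(search(shared, ["mozart", "concerto"]))
```

Because matching requires every query term to be present, adding terms to a query can only shrink the result set.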


Metadata Distribution Example

Assume Q = {Mozart, Concerto}. Ungrouped results:

[Figure: ungrouped results, labeled with the hash key and the sources. All descriptors contain Q.]


Motivation for Model

Peer-to-peer file-sharing.
• Millions of users.
• Petabytes of data.
  • Data objects are replicated.
  • A replica’s descriptor is independently maintained.


Metasearch Engines

Search other search engines.
• dogpile.com
• askjeeves.com


Main Metasearch Engine Activities

Source selection.
• Which search engines to search.
Query dispatching.
• Translating a query to a local format.
Result selection.
• Picking from the multiple result sets.
Result merging.
• Unifying/ranking the selected results.
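The four activities can be read as stages of a pipeline. The sketch below only shows how the stages chain together; the function `metasearch`, its parameters, and the toy engines are hypothetical, and each stage is passed in as a callable because the slides do not prescribe any particular policy.

```python
def metasearch(query_terms, engines, select_sources, translate, select_results, merge):
    """Run the four metasearch activities in order.

    `engines` maps an engine name to a callable that takes a translated query
    and returns a list of results. All stage callables are supplied by the caller.
    """
    # 1. Source selection: which search engines to search.
    chosen = select_sources(query_terms, engines)
    # 2. Query dispatching: translate the query to each engine's local format.
    result_lists = {name: engines[name](translate(query_terms, name)) for name in chosen}
    # 3. Result selection: pick from the multiple result sets.
    kept = select_results(result_lists)
    # 4. Result merging: unify/rank the selected results.
    return merge(kept)


# Toy stand-ins, for illustration only.
engines = {
    "a": lambda q: ["x", "y"],
    "b": lambda q: ["y", "z"],
}
ranked = metasearch(
    ["mozart", "concerto"],
    engines,
    select_sources=lambda q, e: sorted(e),      # search every engine
    translate=lambda q, name: " ".join(q),      # same format everywhere
    select_results=lambda lists: lists,         # keep every list
    merge=lambda lists: sorted({r for rs in lists.values() for r in rs}),
)
print(ranked)
```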


Source Selection

Metasearch engine.
• Employs profiles of each search engine to make its decision.
P2P File-Sharing System.
• Routing:
  • Flooding.
  • Use of statistics of neighbors.
  • Distributed hash tables.
• Cost related to peer autonomy.
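Of the routing options listed, flooding is the simplest to illustrate. The sketch below is a breadth-first flood over an adjacency-list graph. The hop limit (`ttl`) is an assumption borrowed from Gnutella-style flooding rather than something stated on the slide, and the message count is simplified: it charges one message per neighbor, including the neighbor the query arrived from.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Flood a query from `origin` for at most `ttl` hops.

    `graph` maps each peer to a list of its neighbors.
    Returns (set of peers reached, number of messages sent).
    """
    reached = {origin}
    frontier = deque([(origin, ttl)])
    messages = 0
    while frontier:
        peer, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # hop limit exhausted: do not forward further
        for neighbor in graph[peer]:
            messages += 1  # every forwarded copy costs one message
            if neighbor not in reached:  # peers ignore queries they have already seen
                reached.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return reached, messages


# Hypothetical four-peer overlay.
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
print(flood(graph, "a", 2))
```

The message count grows with the number of edges explored, not with the number of useful answers, which is why inexpensive routing is listed as a cost concern.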


Query Dispatching

Metasearch Engine.
• One search engine may use a vector space model, and another might use a Boolean model.
P2P File-Sharing System.
• Some search engines, such as eMule, access multiple networks.
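For the metasearch case, dispatching means turning one query into whatever each engine expects. The sketch below is illustrative only: the `AND` syntax and the term-frequency vector are assumed local formats, not the formats of any real engine, and the function names are hypothetical.

```python
from collections import Counter


def to_boolean(query_terms):
    """Translate a bag of terms into a conjunctive Boolean query string.

    Assumption: the target engine accepts terms joined by the keyword AND.
    Duplicate terms are dropped, since repetition carries no meaning in a Boolean model.
    """
    return " AND ".join(sorted(set(query_terms)))


def to_vector(query_terms):
    """Translate a bag of terms into a term-frequency vector (term -> count)."""
    return dict(Counter(query_terms))


query = ["mozart", "concerto", "mozart"]
print(to_boolean(query))
print(to_vector(query))
```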


Result Selection

Metasearch Engine.
• Some result lists might be pruned if they come from less relevant search engines.
• Uses search engine profiles.
P2P File-Sharing System.
• Generally, all results are sent to the client.


Result Merging

Metasearch Engine.
• Rankings from individual lists.
• Profiles of search engines.
P2P File-Sharing System.
• Group results.
• Rank based on likelihood of successful download:
  • Group size.
  • Connection quality.
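Grouping and ranking on the client can be sketched as follows. The sketch assumes that results are grouped by the hash key carried in each descriptor and ranks by group size only; connection quality is left out, and the function name, tuple layout, and sample data are hypothetical.

```python
from collections import defaultdict


def group_and_rank(results):
    """Group results by hash key and rank the groups by size, largest first.

    `results` is an iterable of (hash_key, source_peer, descriptor_terms).
    Returns a list of (hash_key, [(source_peer, descriptor_terms), ...]).
    """
    groups = defaultdict(list)
    for hash_key, source, descriptor in results:
        groups[hash_key].append((source, descriptor))
    # A larger group means more replicas, hence more potential download sources.
    # sorted() is stable, so equally sized groups keep their arrival order.
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical ungrouped results from three peers.
results = [
    ("h2", "peer1", ["mozart", "piano", "concerto"]),
    ("h1", "peer2", ["mozart", "clarinet", "concerto"]),
    ("h1", "peer3", ["concerto", "mozart", "k622"]),
]
for hash_key, replicas in group_and_rank(results):
    print(hash_key, len(replicas))
```

Note that replicas in one group can carry different descriptors, since each replica's descriptor is maintained independently.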


Example Search on Limewire’s Gnutella

[Figure: example search, with callouts for the query (number of results), the descriptors, and the group size.]


Basic Difference

Metasearch engines assume a fixed and reliable set of search engines.

Can collect statistics on search engines to improve query processing and results.


P2P File Sharing Research Areas (1/2)

Source selection:
• Inexpensive routing with autonomous peers.
Query dispatching:
• Translating queries to maximize precision and recall of final result set.


P2P File Sharing Research Areas (2/2)

Result selection:
• Usage of queries and local statistics to prune returned results.
Result merging:
• Usage of replication and distributed metadata to improve rankings.
• Recall: link analysis for Web search.


Goals of Open Source in P2P File-Sharing Systems

Allow the communal development of the technology.
• New routing techniques.
• New ranking functions.
Disclose all functionality.
• Better security.
• No spyware.


Examples of Openness in P2P File-Sharing

Gnutella is an open protocol.
• Limewire, Bearshare, Kazaa.

Limewire publishes an open-source implementation of the Gnutella protocol.

eMule is another open-source project built on a competing protocol.


Conclusion

Many research areas.
• Can be modeled as a form of metasearch engine.
High impact.
• Many users and petabytes of data.
There already exists an active open-source community.
• A large community of users and much source code exist.


Questions and Contact Information

Wai Gen Yee
• yee@iit.edu
• …/~waigen
  • Recent results and publications.
Information Retrieval Laboratory, Illinois Institute of Technology
• ir.iit.edu