
Search in Peer-to-Peer File-Sharing Systems: Like Metasearch Engines, But Not Really

Wai Gen Yee, Dongmei Jia, Linh Thai Nguyen
{yee, jiadong, nguylin}@iit.edu
Information Retrieval Laboratory
Illinois Institute of Technology
Chicago, IL USA

Yee, Jia, Nguyen, OSWIR 2005 Workshop, Compiegne, France


Goal

To motivate research in peer-to-peer information retrieval (P2P IR).

To model P2P IR in terms of a metasearch engine.


Model

Peers share data objects, each described with a descriptor (bag of terms).

Peers are connected in a random graph. Queries (bags of terms) are routed to peers (servers), which return references to data objects $O$ such that $Q \subseteq D_O$,

where $D_O$ is the descriptor of $O$. Each descriptor also contains the hash value of the data object.
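The matching rule can be sketched in a few lines of Python. This is a minimal illustration, assuming a descriptor is simplified to a set of lowercase terms (ignoring term frequencies in the bag) and that each object is identified by an opaque hash string; the class `SharedObject`, the function `match`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class SharedObject:
    hash_key: str               # hypothetical content hash identifying the data object
    descriptor: FrozenSet[str]  # bag of terms, simplified here to a set


def match(query: Set[str], objects: Iterable[SharedObject]) -> List[str]:
    """Return references (hash keys) to objects whose descriptor contains every query term."""
    q = {term.lower() for term in query}
    return [obj.hash_key for obj in objects if q <= obj.descriptor]


shared = [
    SharedObject("h1", frozenset({"mozart", "piano", "concerto", "21"})),
    SharedObject("h2", frozenset({"mozart", "requiem"})),
]
print(match({"Mozart", "Concerto"}, shared))  # ['h1']
```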


Metadata Distribution Example

Assume Q = {Mozart, Concerto}. Ungrouped results:

[Figure: ungrouped result list for Q, labeled with each result's hash key and sources; all descriptors contain Q.]


Motivation for Model

Peer-to-peer file-sharing:
• Millions of users.
• Petabytes of data.
• Data objects are replicated.
• A replica’s descriptor is independently maintained.


Metasearch Engines

Search other search engines, for example:
• dogpile.com
• askjeeves.com


Main Metasearch Engine Activities

• Source selection: which search engines to search.
• Query dispatching: translating a query to a local format.
• Result selection: picking from the multiple result sets.
• Result merging: unifying/ranking the selected results.
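The four activities can be strung together as a pipeline. The sketch below is hypothetical throughout: each engine is a plain function returning `(doc_id, score)` pairs, a profile is reduced to a single number per engine, and merging uses a profile-weighted score sum, one common heuristic chosen only for illustration rather than the method of any particular metasearch engine.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical engine interface: local-format query in, (doc_id, score) pairs out.
Engine = Callable[[str], List[Tuple[str, float]]]
Translator = Callable[[List[str]], str]


def metasearch(query_terms: List[str],
               engines: Dict[str, Engine],
               profiles: Dict[str, float],
               translators: Dict[str, Translator],
               k_sources: int = 2,
               min_profile: float = 0.5) -> List[str]:
    # 1. Source selection: keep the k engines with the highest profile scores.
    chosen = sorted(engines, key=lambda e: profiles[e], reverse=True)[:k_sources]
    # 2. Query dispatching: translate the query into each chosen engine's local format.
    result_lists = {e: engines[e](translators[e](query_terms)) for e in chosen}
    # 3. Result selection: prune lists from engines whose profile falls below a threshold.
    selected = {e: r for e, r in result_lists.items() if profiles[e] >= min_profile}
    # 4. Result merging: sum profile-weighted scores per document, then rank.
    merged: Dict[str, float] = {}
    for e, results in selected.items():
        for doc, score in results:
            merged[doc] = merged.get(doc, 0.0) + profiles[e] * score
    return sorted(merged, key=lambda d: merged[d], reverse=True)


engines = {
    "A": lambda q: [("d1", 0.9), ("d2", 0.4)],
    "B": lambda q: [("d2", 0.8), ("d3", 0.7)],
    "C": lambda q: [("d4", 1.0)],
}
profiles = {"A": 0.9, "B": 0.6, "C": 0.3}
translators = {e: (lambda terms: " AND ".join(terms)) for e in engines}
print(metasearch(["mozart", "concerto"], engines, profiles, translators, k_sources=3))
# ['d2', 'd1', 'd3']
```

Here engine C is chosen as a source, but its list is pruned at result selection because its profile falls below `min_profile`, so `d4` never appears; `d2` edges out `d1` because its weighted scores from two engines add up.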


Source Selection

Metasearch engine:
• Employs a profile of each search engine to make the decision.
P2P file-sharing system:
• Routing:
  • Flooding.
  • Use of neighbor statistics.
  • Distributed hash tables.
• Cost is related to peer autonomy.
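Flooding, the first routing option, can be sketched as a breadth-first traversal with a hop limit, in the style of Gnutella's TTL. This is an assumption-laden sketch: the adjacency-list graph, the `flood` function, and the rule that a peer does not send the query back to the peer it received it from are illustrative choices. The message count makes the cost visible: flooding needs no knowledge of other peers' contents, but it pays for that in messages, including duplicates.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def flood(graph: Dict[str, List[str]], source: str, ttl: int) -> Tuple[Set[str], int]:
    """Flood a query from `source`; return the peers reached and the number of messages sent."""
    reached = {source}
    messages = 0
    frontier = deque([(source, None, ttl)])  # (peer, peer it came from, hops left)
    while frontier:
        peer, sender, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # hop limit exhausted: do not forward
        for neighbor in graph[peer]:
            if neighbor == sender:
                continue  # do not send the query straight back
            messages += 1  # sent even if the neighbor has already seen the query
            if neighbor not in reached:  # each peer forwards a given query only once
                reached.add(neighbor)
                frontier.append((neighbor, peer, hops_left - 1))
    return reached, messages


graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
for ttl in (2, 3):
    reached, sent = flood(graph, "A", ttl)
    print(ttl, sorted(reached), sent)
# 2 ['A', 'B', 'C', 'D'] 4
# 3 ['A', 'B', 'C', 'D', 'E'] 6
```

Raising the hop limit from 2 to 3 reaches peer E at the price of two more messages, one of which (D to C) is a duplicate.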


Query Dispatching

Metasearch engine:
• One search engine may use a vector space model, and another a Boolean model.
P2P file-sharing system:
• Some clients, such as eMule, access multiple networks.
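To see why query dispatching matters, the sketch below evaluates the same bag-of-terms query under a Boolean model and a vector space model. The three documents, the whitespace tokenizer, and the raw term-frequency weighting (no IDF) are simplifying assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

docs = {
    "d1": "mozart piano concerto no 21",
    "d2": "mozart requiem",
    "d3": "beethoven violin concerto",
}


def boolean_and(query: List[str], docs: Dict[str, str]) -> List[str]:
    """Boolean model: a document matches only if it contains every query term."""
    return sorted(d for d, text in docs.items() if set(query) <= set(text.split()))


def cosine_rank(query: List[str], docs: Dict[str, str]) -> List[str]:
    """Vector space model: rank documents by cosine similarity of raw term-frequency vectors."""
    q = Counter(query)
    q_norm = math.sqrt(sum(c * c for c in q.values()))
    scores = {}
    for d, text in docs.items():
        v = Counter(text.split())
        dot = sum(q[t] * v[t] for t in q)
        if dot > 0:  # keep only documents sharing at least one term with the query
            v_norm = math.sqrt(sum(c * c for c in v.values()))
            scores[d] = dot / (q_norm * v_norm)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


print(boolean_and(["mozart", "concerto"], docs))  # ['d1']
print(cosine_rank(["mozart", "concerto"], docs))  # ['d1', 'd2', 'd3']
```

The Boolean engine returns only `d1`, while the vector space engine also ranks the partial matches `d2` and `d3`, so a metasearch engine has to account for each engine's model when translating the query and interpreting its results.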


Result Selection

Metasearch engine:
• Some result lists might be pruned if they come from less relevant search engines.
• Uses search engine profiles.
P2P file-sharing system:
• Generally, all results are sent to the client.


Result Merging

Metasearch engine:
• Rankings from individual lists.
• Profiles of search engines.
P2P file-sharing system:
• Group results.
• Rank based on likelihood of successful download:
  • Group size.
  • Connection quality.
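The P2P merging step can be sketched as grouping results by the content hash carried in each descriptor and then ranking the groups. The sketch assumes group size is the primary sort key and that connection quality, reduced here to a hypothetical per-peer speed figure, breaks ties; the `Result` type, its field names, and the data are illustrative and do not reflect how Limewire or any other client actually scores results.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Result(NamedTuple):
    hash_key: str     # content hash carried in the descriptor
    descriptor: str   # the replica's own, independently maintained descriptor
    peer: str         # peer that returned the reference
    speed_kbps: int   # hypothetical stand-in for connection quality


def merge(results: List[Result]) -> List[Dict]:
    """Group replicas by hash key; rank by group size, breaking ties by the fastest peer."""
    groups = defaultdict(list)
    for r in results:
        groups[r.hash_key].append(r)
    merged = [
        {
            "hash_key": h,
            "group_size": len(rs),
            "best_speed_kbps": max(r.speed_kbps for r in rs),
            "descriptors": sorted({r.descriptor for r in rs}),
        }
        for h, rs in groups.items()
    ]
    merged.sort(key=lambda g: (g["group_size"], g["best_speed_kbps"]), reverse=True)
    return merged


results = [
    Result("h1", "mozart piano concerto 21", "p1", 56),
    Result("h1", "mozart concerto no 21 andante", "p2", 1500),
    Result("h2", "mozart violin concerto", "p3", 3000),
    Result("h1", "mozart concerto elvira madigan", "p4", 384),
]
for g in merge(results):
    print(g["hash_key"], g["group_size"], g["best_speed_kbps"])
# h1 3 1500
# h2 1 3000
```

Because group size dominates, the three replicas of `h1` outrank the single, faster copy of `h2`; swapping the order of the sort key would favor connection quality instead.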


Example Search on Limewire’s Gnutella

[Screenshot: Limewire search results, labeled with the query (and its number of results), the result descriptors, and each group's size.]


Basic Difference

Metasearch engines assume a fixed and reliable set of search engines.

Because the set is fixed, they can collect statistics on those search engines to improve query processing and results.


P2P File-Sharing Research Areas (1/2)

Source selection:
• Inexpensive routing with autonomous peers.
Query dispatching:
• Translating queries to maximize the precision and recall of the final result set.


P2P File-Sharing Research Areas (2/2)

Result selection:
• Use of queries and local statistics to prune returned results.
Result merging:
• Use of replication and distributed metadata to improve rankings.
• Analogous to link analysis for Web search.


Goals of Open Source in P2P File-Sharing Systems

Allow the communal development of the technology:
• New routing techniques.
• New ranking functions.
Disclose all functionality:
• Better security.
• No spyware.


Examples of Openness in P2P File-Sharing

Gnutella is an open protocol.
• Limewire, Bearshare, Kazaa.

Limewire publishes an open-source implementation of the Gnutella protocol.

eMule is another open-source project built on a competing protocol.


Conclusion

Many research areas.
• Can be modeled as a form of metasearch engine.
High impact.
• Many users and petabytes of data.
There already exists an active open-source community.
• There is a large community of users, and much source code already exists.


Questions and Contact Information

Wai Gen Yee
yee@iit.edu
ir.iit.edu/~waigen
• Recent results and publications.

Information Retrieval Laboratory, Illinois Institute of Technology
ir.iit.edu