Search in Peer-to-Peer File-Sharing Systems: Like Metasearch Engines, But Not Really Wai Gen Yee,...
Search in Peer-to-Peer File-Sharing Systems: Like Metasearch Engines, But Not Really
Wai Gen Yee, Dongmei Jia, Linh Thai Nguyen
{yee, jiadong, nguylin}@iit.edu
Information Retrieval Laboratory
Illinois Institute of Technology
Chicago, IL USA
Yee, Jia, Nguyen OSWIR, 2005 Workshop, Compiegne, France
Goal
To motivate research in peer-to-peer information retrieval (P2P IR).
To model P2P IR in terms of a metasearch engine.
Model
Peers share data objects, each described with a descriptor (bag of terms).
Peers are connected in a random graph.
Queries (bags of terms) are routed to peers (servers) that return references to data objects O such that D_O ⊇ Q.
  D_O is the descriptor of O.
Each descriptor also contains the hash value of the data object.
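The matching rule in this model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `content_hash`, `matches`, and `Peer` are hypothetical, SHA-1 stands in for whatever hash a real system uses, and case-insensitive term comparison is an assumption the slides do not make.

```python
import hashlib
from collections import Counter


def content_hash(data: bytes) -> str:
    """Hash of the object's bytes; replicas of one object share this key.

    SHA-1 is an assumption made for this sketch.
    """
    return hashlib.sha1(data).hexdigest()


def matches(query, descriptor):
    """Bag containment: each query term occurs in the descriptor
    at least as often as it occurs in the query."""
    q = Counter(term.lower() for term in query)
    d = Counter(term.lower() for term in descriptor)
    return all(d[term] >= count for term, count in q.items())


class Peer:
    """Hypothetical peer that shares objects and answers queries locally."""

    def __init__(self, name):
        self.name = name
        self.shared = []  # list of (hash_key, descriptor) pairs

    def share(self, data, descriptor):
        self.shared.append((content_hash(data), list(descriptor)))

    def search(self, query):
        """Return (hash_key, descriptor, source) references to matching objects."""
        return [(key, desc, self.name)
                for key, desc in self.shared
                if matches(query, desc)]
```

A peer sharing one object described as {Mozart, Piano, Concerto} and another described as {Mozart, Sonata} would return only the first for the query {Mozart, Concerto}.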
Metadata Distribution Example
Assume Q = {Mozart, Concerto}. Ungrouped results:
[Figure: table of ungrouped results, annotated "Hash Key", "Sources", and "All descriptors contain Q."]
Motivation for Model
Peer-to-peer file-sharing.
  Millions of users.
  Petabytes of data.
    • Data objects are replicated.
    • A replica’s descriptor is independently maintained.
Metasearch Engines
Search other search engines.
  dogpile.com
  askjeeves.com
Main Metasearch Engine Activities
Source selection.
  Which search engines to search.
Query dispatching.
  Translating a query to a local format.
Result selection.
  Picking from the multiple result sets.
Result merging.
  Unifying/ranking the selected results.
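The four activities above can be read as stages of one pipeline. The following Python sketch is a toy illustration, not any real metasearch engine's design: the function name `metasearch`, the per-engine `translate` and `search` callables, the numeric `profiles`, and the profile-over-rank scoring rule are all assumptions made for this example.

```python
def metasearch(query, engines, profiles, top_engines=2, min_profile=0.5):
    """Toy four-stage metasearch pipeline (all names are hypothetical).

    engines:  name -> {"translate": callable, "search": callable}
    profiles: name -> estimated usefulness of that engine, in [0, 1]
    """
    # 1. Source selection: choose the engines with the best profiles.
    selected = sorted(engines, key=lambda name: profiles[name],
                      reverse=True)[:top_engines]

    # 2. Query dispatching: translate the query to each engine's local format.
    result_sets = {}
    for name in selected:
        engine = engines[name]
        local_query = engine["translate"](query)
        result_sets[name] = engine["search"](local_query)

    # 3. Result selection: drop lists from engines judged less relevant.
    kept = {name: results for name, results in result_sets.items()
            if profiles[name] >= min_profile}

    # 4. Result merging: combine ranks, weighted by engine profile.
    scores = {}
    for name, results in kept.items():
        for rank, doc in enumerate(results, start=1):
            scores[doc] = scores.get(doc, 0.0) + profiles[name] / rank
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

A document returned by several well-regarded engines accumulates score from each list, so it tends to rise in the merged ranking.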
Source Selection
Metasearch engine.
  Employs profiles of each search engine to make decision.
P2P File-Sharing System.
  Routing:
    • Flooding.
    • Use of statistics of neighbors.
    • Distributed hash tables.
  Cost related to peer autonomy.
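Of the routing options listed, flooding is the simplest to illustrate. The sketch below assumes a hop limit (time-to-live), as is common in Gnutella-style flooding; the function name `flood`, the adjacency-dict graph representation, and the message counter are hypothetical choices for this example.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Flood a query from `origin` with a hop limit of `ttl`.

    graph: peer -> list of neighboring peers (adjacency lists).
    Returns (set of peers that saw the query, number of messages sent).
    """
    reached = {origin}
    frontier = deque([(origin, ttl)])
    messages = 0
    while frontier:
        peer, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # this peer does not forward the query further
        for neighbor in graph[peer]:
            messages += 1  # every forward costs a message, even a duplicate
            if neighbor not in reached:
                reached.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return reached, messages
```

The message count grows faster than the number of new peers reached, since duplicates are still sent; that gap is one way to see the routing cost that the other two options try to reduce.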
Query Dispatching
Metasearch Engine.
  One search engine may use a vector space model, and another might use a Boolean model.
P2P File-Sharing System.
  Some search engines, such as eMule, access multiple networks.
Result Selection
Metasearch Engine.
  Some result lists might be pruned if they come from less relevant search engines.
  Uses search engine profiles.
P2P File-Sharing System.
  Generally, all results are sent to the client.
Result Merging
Metasearch Engine.
  Rankings from individual lists.
  Profiles of search engines.
P2P File-Sharing System.
  Group results.
  Rank based on likelihood of successful download:
    • Group size.
    • Connection quality.
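Grouping and ranking on the P2P side can be sketched as follows. This is a minimal illustration assuming results arrive as (hash key, descriptor, source) triples and that replicas are grouped by hash key; the function name `group_and_rank` is hypothetical, and connection quality is left out because the slides do not say how it is measured.

```python
from collections import defaultdict


def group_and_rank(results):
    """Group results by hash key and rank groups by size.

    results: iterable of (hash_key, descriptor, source) triples.
    Returns a list of (hash_key, group_size, merged_terms), largest group first.
    Group size is the number of distinct sources offering the object.
    """
    groups = defaultdict(lambda: {"sources": set(), "terms": set()})
    for hash_key, descriptor, source in results:
        groups[hash_key]["sources"].add(source)
        groups[hash_key]["terms"].update(descriptor)

    # Larger groups first; ties broken by hash key for a stable order.
    ranked = sorted(groups.items(),
                    key=lambda item: (-len(item[1]["sources"]), item[0]))
    return [(key, len(group["sources"]), sorted(group["terms"]))
            for key, group in ranked]
```

Because each replica's descriptor is maintained independently, the merged term set of a group can be richer than any single descriptor in it.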
Example Search on Limewire’s Gnutella
[Figure: Limewire search results, annotated "Query (number of results)", "Descriptors", and "Group Size".]
Basic Difference
Metasearch engines assume a fixed and reliable set of search engines.
Can collect statistics on search engines to improve query processing and results.
P2P File Sharing Research Areas (1/2)
Source selection:
  Inexpensive routing with autonomous peers.
Query dispatching:
  Translating queries to maximize precision and recall of final result set.
P2P File Sharing Research Areas (2/2)
Result selection:
  Usage of queries and local statistics to prune returned results.
Result merging:
  Usage of replication and distributed metadata to improve rankings.
  Recall: link analysis for Web search.
Goals of Open Source in P2P File-Sharing Systems
Allow the communal development of the technology.
  New routing techniques.
  New ranking functions.
Disclose all functionality.
  Better security.
  No spyware.
Examples of Openness in P2P File-Sharing
Gnutella is an open protocol.
  Limewire, Bearshare, Kazaa.
Limewire publishes an open-source implementation of the Gnutella protocol.
eMule is another open-source project built on a competing protocol.
Conclusion
Many research areas.
  Can be modeled as a form of metasearch engine.
High impact.
  Many users and petabytes of data.
There already exists an active open-source community.
  A large community of users and much source code exist.
Questions and Contact Information
Wai Gen [email protected]/~waigen
• Recent results and publications.
Information Retrieval Laboratory, Illinois Institute of Technology
ir.iit.edu