Divide and Conquer: Challenges in Scaling Federated Search
Presented by Abe Lederman, President and CTO
Deep Web Technologies, LLC
Search Engine Meeting, 24 April 2006, Boston, MA
SEARCH ALL OF THESE SOURCES ONE AT A TIME
OR SEARCH THEM ALL AT ONCE
Finding the Gold Hidden in the World Wide Web
“Google-type” search engines “pan” the surface web for gold
“Deep Web” search engines go mining for gold
Challenges Overview
• Managing a large number of sources
• Searching a large number of sources in parallel
• Organizing and ranking the results returned
Challenges of Managing Thousands of Data Sources
• Locate Reliable Sources
• Categorize Sources by Content
• Configure Sources for Searching
• Maintain Sources
Challenges in Searching Thousands of Sources
• Automatically Select Sources to Search
• Retrieve Results from Cache
• Perform Many Searches in Parallel
• Bring Back Best Results
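
As a rough illustration of performing many searches in parallel, the Python sketch below fans one query out to several sources with a thread pool and keeps whatever comes back before a deadline. It is not the presenter's implementation: `search_source`, the simulated latencies, and the timeout value are hypothetical placeholders chosen so the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout
import time


def search_source(name, query, delay):
    """Hypothetical connector stub: sleeps to mimic latency, then returns one fake hit."""
    time.sleep(delay)
    return [f"{name}: result for '{query}'"]


def federated_search(sources, query, timeout=1.0):
    """Query all sources concurrently; keep whatever arrives before the deadline.

    `sources` maps a source name to its simulated latency in seconds
    (an assumption made only so the example is self-contained).
    """
    results = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(search_source, name, query, delay): name
               for name, delay in sources.items()}
    try:
        for future in as_completed(futures, timeout=timeout):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                # A failing source should not sink the whole search.
                continue
    except FuturesTimeout:
        # Deadline reached: slow sources are left out of this response.
        pass
    finally:
        pool.shutdown(wait=False)
    return results
```

Calling `federated_search({"A": 0.0, "B": 0.0, "Slow": 1.0}, "fusion", timeout=0.3)` would return hits from `A` and `B` only. Dropping slow sources is one possible policy; a real system might instead stream late results to the user as they arrive.
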
Source Selection Optimizer
[Diagram labels: Search Conductor; Source Selection Optimizer; Source Descriptions; Previous Results]
Caching of Search Results
Reduces the load (cost) of accessing sources
CHALLENGES
• Requires a large database
• Need to determine how often to update the cache
• Works best with lots of users doing similar searches
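
The cache-freshness question above can be made concrete with a small time-to-live (TTL) cache. This is only a sketch under stated assumptions: the class name `ResultCache`, the fixed TTL policy, and the in-memory dictionary (standing in for the large database the slide mentions) are all hypothetical, not details from the talk.

```python
import time


class ResultCache:
    """In-memory cache of search results with a fixed time-to-live (TTL)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # (source, normalized query) -> (stored_at, results)

    @staticmethod
    def _key(source, query):
        # Normalize case and whitespace so similar searches share one entry.
        return (source, " ".join(query.lower().split()))

    def get(self, source, query):
        """Return cached results, or None if missing or older than the TTL."""
        key = self._key(source, query)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, results = entry
        if self.clock() - stored_at > self.ttl:
            del self._store[key]    # stale: force a fresh search of the source
            return None
        return results

    def put(self, source, query, results):
        self._store[self._key(source, query)] = (self.clock(), list(results))
```

Normalizing the query before using it as a key reflects the third bullet: the hit rate rises when many users issue similar searches. Choosing the TTL is the trade-off named in the second bullet, since a longer TTL saves more source accesses but serves older results.
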
We Address Scalability Through a Grid-Based Solution
• Uses open standards (Web Services, WSDL, SOAP, XML)
• Runs on distributed nodes
• Is platform independent (Java based)
• Is very flexible, providing a framework for integrating various filtering and analysis tools
Distributing the Workload as Grid Services
• Information Services
• Filtering Services
• Aggregation Services
• Presentation Services
[Diagram: grid nodes labeled IS0, IS1, IS2, IS3, F0, A0, A1, P0]
Search Conductor
[Flowchart. Steps: Select sources to search; Perform Search; Get Next Results; Deliver results to user. Decisions (YES/NO): Enough good results? Can I get more results from “good” sources?]
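
The Search Conductor flowchart names its steps and decisions, but the transcript does not preserve how they connect. The Python sketch below shows one plausible reading, and the control flow is an assumption: search the selected sources, stop once there are enough good results, and otherwise keep asking only those sources that are still returning good results. The function name, the `score` field, the 0.5 threshold, and the paging interface are hypothetical.

```python
def conduct_search(sources, query, wanted=10, is_good=lambda r: r["score"] >= 0.5):
    """Collect up to `wanted` good results from the selected sources.

    `sources` maps a name to a callable (query, page) -> list of result dicts;
    an empty list means the source is exhausted. This interface is assumed.
    """
    good = []
    active = dict(sources)                     # "Select sources to search"
    page = 0
    while active and len(good) < wanted:       # "Enough good results?"
        for name in list(active):
            batch = active[name](query, page)  # "Perform Search" / "Get Next Results"
            hits = [r for r in batch if is_good(r)]
            good.extend(hits)
            if not hits:
                # "Can I get more results from good sources?" -> not from this one.
                del active[name]
        page += 1
    # "Deliver results to user": best first, trimmed to the requested count.
    return sorted(good, key=lambda r: r["score"], reverse=True)[:wanted]
```

Under this reading, a source that returns a page with no good results is dropped, so later pages are requested only from sources that have been productive so far.
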
Searching a large number of sources can lead to a flood of results
Challenges in Organizing and Ranking Results
• Multi-tier Relevance Ranking
• User-driven Ranking
• Clustering of Results
Multi-tier Relevance Ranking
• QuickRank – Ranks results based on occurrence of search terms in title, author, and snippet
• MetaRank – Ranks results utilizing custom algorithms applied to meta-data
• DeepRank – Downloads and indexes full-text documents
HEAVY LIFTING REQUIRED!
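
To make the first tier concrete, here is a minimal sketch of a QuickRank-style scorer in Python. The slide says only that results are ranked by occurrences of search terms in the title, author, and snippet; the field weights, the tokenizer, and the function names below are assumptions for illustration, not the product's algorithm.

```python
import re

# Hypothetical weights: the slide names the fields but not their relative importance.
FIELD_WEIGHTS = {"title": 3.0, "author": 2.0, "snippet": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def quick_score(result, query):
    """Weighted count of query-term occurrences across the three fields."""
    terms = set(tokenize(query))
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        tokens = tokenize(result.get(field, ""))
        score += weight * sum(1 for token in tokens if token in terms)
    return score


def quick_rank(results, query):
    """Return results ordered from highest to lowest quick_score."""
    return sorted(results, key=lambda r: quick_score(r, query), reverse=True)
```

This tier needs only the fields a source already returns in its result list, which is why it is cheap compared with DeepRank, where full-text documents must be downloaded and indexed first.
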
User-driven Ranking
• Credibility of source
• Date range
• Document length
• Document type
• Geographic proximity
• Popularity of document
• Reading level
• Relevance
Desired: Blending (weighting) of above criteria
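
One simple way to blend such criteria is a weighted average of per-criterion scores. The sketch below assumes each criterion has already been normalized to the range 0 to 1 and that the user supplies non-negative weights; the function names and the example criteria keys are hypothetical, and the talk does not say which blending method was intended.

```python
def blended_score(scores, weights):
    """Weighted average of criterion scores.

    `scores` maps criterion -> value in [0, 1] (assumed pre-normalized);
    `weights` maps criterion -> non-negative user preference.
    Criteria missing from `scores` count as 0.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * scores.get(c, 0.0) for c, w in weights.items()) / total


def rank_by_preferences(results, weights):
    """Order results by blended score, best first; each result carries a 'scores' dict."""
    return sorted(results, key=lambda r: blended_score(r["scores"], weights), reverse=True)
```

For example, weights of `{"relevance": 2, "credibility": 1, "popularity": 1}` make relevance count twice as much as either of the other two criteria, and changing the weights reorders the same result set without searching again.
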
Clustering
A Grand Challenge for Federated Search
Source: Walter Warnick, Ph.D., DOE OSTI. Global Discovery: Increasing the Pace of Knowledge Diffusion to Increase the Pace of Science. Presented at the Annual Meeting of the American Association for the Advancement of Science, February 16-20, 2006.
Knowledge Diffusion in Action
[Diagram labels: Global Discovery Search Portal; Math Community, Biology Community, Physics Community; Mathematician’s Scientific Discovery, Biology Researcher’s Scientific Discovery, Physics Scientific Discovery]
Math Databases, Biology Databases, Physics Databases (each):
• Research Papers
• Correspondence
• Conferences
Grid of Grids
• Each circle = a portal with 10-100 sources
• End result is thousands of sources in 2 hops
Scaling to the Next Level
Abe Lederman
122 Longview Drive
Los Alamos, NM 87544
www.deepwebtech.com
Thank You!