Internet Search Engines: Past and Future
Jan Pedersen
Chief Scientist, Yahoo! Search
11 April 2005
Outline
• Compare and contrast
– 2005 vs 1998
• Some underlying trends
• What’s New
Search Landscape 2005
• Three major players (market capitalization)
– Google ($52B)
– Yahoo! ($42B)
– MSN (Microsoft, $271B)
• $6+B in paid search revenues
• >400M searches daily
• 5-8B claimed index size
• Excellent relevance
Source: Search Engine Watch
Search Landscape in 1998
Infoseek circa 1998
• IPO in 1996
– Same year as Excite and Yahoo!
• 5th ranked destination site
• Market cap of $1B
• Competed as a portal
– Content was king
• Ultimately sold to Disney
– Go network
• Decommissioned in 2001
Infoseek Search Technology
• 1.5-generation search
– Tuned for relevance: proximity, anchor text (ESP), inlink counting
– Small but competent index: 60M pages (AltaVista was 140M at that time)
• Still exists as Ultraseek Server
– Sold to Inktomi in 2000, later resold to Verity
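The slide names proximity, anchor text, and inlink counting as relevance signals but gives no formula for combining them. The following is a minimal Python sketch, assuming a simple weighted sum; the `Page` type, the `score` and `min_window` functions, and the weights `w_prox`, `w_anchor`, `w_links` are all hypothetical and are not Infoseek's actual ranking function.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Page:
    body: list[str]         # tokenized page text
    anchor_text: list[str]  # tokens from the anchor text of links pointing at this page
    inlinks: int            # number of pages linking to this page


def min_window(tokens: list[str], terms: set[str]) -> int | None:
    """Length of the shortest token span containing every query term (None if absent)."""
    if not terms:
        return None
    counts: dict[str, int] = {}
    covered = 0  # distinct query terms inside the current window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in terms:
            counts[tok] = counts.get(tok, 0) + 1
            if counts[tok] == 1:
                covered += 1
        # Shrink from the left while the window still covers every term.
        while covered == len(terms):
            width = right - left + 1
            if best is None or width < best:
                best = width
            drop = tokens[left]
            if drop in terms:
                counts[drop] -= 1
                if counts[drop] == 0:
                    covered -= 1
            left += 1
    return best


def score(page: Page, query: list[str],
          w_prox: float = 1.0, w_anchor: float = 2.0, w_links: float = 0.5) -> float:
    """Hypothetical weighted sum of proximity, anchor-text, and inlink signals."""
    terms = set(query)
    if not terms:
        return 0.0
    window = min_window(page.body, terms)
    # 1.0 when the terms are adjacent, smaller as they spread apart, 0 if any is missing.
    prox = len(terms) / window if window else 0.0
    # Fraction of query terms that appear in incoming anchor text.
    anchor = len(terms & set(page.anchor_text)) / len(terms)
    # Log damping so extra links do not swamp the other signals.
    links = math.log1p(page.inlinks)
    return w_prox * prox + w_anchor * anchor + w_links * links


q = ["search", "engine"]
tight = Page("infoseek search engine tuned for relevance".split(), ["search", "engine"], 12)
loose = Page("search results from an engine".split(), ["search", "engine"], 12)
print(score(tight, q) > score(loose, q))  # True: same anchors and links, tighter proximity
```

The log on inlink counts is one common way to keep a popular page from outranking a closer textual match on link volume alone; the specific weights are arbitrary.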
Infoseek UI
Comparison to the State of the Art
What was missing?
• Business model
– Monetized via untargeted banner ads
– Need for increased inventory (portalitis)
• Clutter
• Lack of focus
– Search was deprioritized
Some Underlying Trends
Moore’s Law
Index Size
• ~150M in 1998
• ~5B in 2005
– 33x increase
– Moore would predict 25x
• Monthly refresh in 1998
• Daily refresh in 2005
• What about 2010?
– 40B?
• Where is the content?
– Public Web?
– Personal Web?
Source: Search Engine Watch and Search Engine Showdown
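The 25x figure is consistent with reading Moore's Law as one doubling every 18 months over the seven years from 1998 to 2005. A quick check in Python; the 18-month doubling period is an assumption, not something the slide states:

```python
import math

docs_1998 = 150e6  # ~150M pages (slide figure)
docs_2005 = 5e9    # ~5B pages (slide figure)
years = 2005 - 1998

# Observed growth factor of the index.
observed = docs_2005 / docs_1998
# Assumption: Moore's Law read as one doubling every 18 months (1.5 years).
moore = 2 ** (years / 1.5)
# Doubling time, in months, implied by the observed growth.
implied_doubling_months = 12 * years / math.log2(observed)

print(f"observed growth: {observed:.1f}x")                          # → 33.3x
print(f"Moore (18-month doubling): {moore:.1f}x")                   # → 25.4x
print(f"implied doubling time: {implied_doubling_months:.1f} months")  # → 16.6 months
```

In other words, index size grew slightly faster than an 18-month doubling would predict.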
Meaning of Hit Counts
• Hit counts are estimated
– Indices are tiered
– Estimates can be non-linear
Source: http://aixtal.blogspot.com/2005/01/web-googles-counts-faked.html
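To illustrate how tiering can make estimates non-linear, here is a hypothetical Python sketch; it is not any engine's documented method. It assumes the engine searches a small first tier, extrapolates linearly when that tier has enough matches, and otherwise scans every tier for an exact count. The `TieredIndex` class and its `cutoff` parameter are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TieredIndex:
    tiers: list[list[set[str]]]  # tier 0 first; each document is a set of terms
    cutoff: int = 3              # hypothetical: tier-0 matches needed to stop early

    @staticmethod
    def _matches(tier: list[set[str]], terms: set[str]) -> int:
        # Conjunctive (AND) matching: a document matches if it contains every term.
        return sum(1 for doc in tier if terms <= doc)

    def estimated_hits(self, query: str) -> int:
        terms = set(query.split())
        first = self.tiers[0]
        hits = self._matches(first, terms)
        if hits >= self.cutoff:
            # Enough matches in tier 0: extrapolate linearly to the whole index.
            total_docs = sum(len(tier) for tier in self.tiers)
            return round(hits * total_docs / len(first))
        # Too few matches in tier 0: scan every tier and report an exact count.
        return sum(self._matches(tier, terms) for tier in self.tiers)


# Tier 0 (10 docs) under-represents "a"; tier 1 (90 docs) holds most of its matches.
tier0 = [{"a", "b"}] * 2 + [{"a"}] + [{"x"}] * 7
tier1 = [{"a", "b"}] * 40 + [{"a"}] * 10 + [{"y"}] * 40
index = TieredIndex([tier0, tier1])

print(index.estimated_hits("a"))    # → 30 (extrapolated from tier 0)
print(index.estimated_hits("a b"))  # → 42 (exact, since tier 0 had only 2 matches)
```

The narrower query "a b" reports more hits than the broader query "a", even though every "a b" document also contains "a" (53 documents contain "a" in total). Switching between extrapolated and exact counts at a threshold is enough to produce this kind of inconsistency.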
Meaning of Document Counts
• Claimed index size
– Google: 3B
– FAST: 3B
– AV: 1B
• Not all documents are equal
– Thin docs
• Disparity between claimed and reported sizes
Source: Search Engine Showdown
Online Advertising
• The Internet accounts for 30+% of viewing time
– Yet only 4% of ad spend
– $370B overall; $10B online
• Fastest-growing advertising segment
• Steady shift toward online advertising
Source: The Economist
The Keyword Marketplace
• The great, unsung search problem
– Matching relevant ads to user intent
• Example of distributed authorship
– Advertisers bid on keywords
– Discounts for good performance (relevance)
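The slide does not specify the auction mechanism. One well-known design that gives "discounts for good performance" is a generalized second-price auction ranked by bid times predicted click-through rate (CTR). The Python sketch below assumes that design; the `Ad` type, `run_auction`, and the `reserve` price are hypothetical and are not presented as Yahoo!'s actual marketplace rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    bid: float  # maximum cost per click the advertiser will pay
    ctr: float  # predicted click-through rate, standing in for "performance"


def run_auction(ads: list[Ad], slots: int, reserve: float = 0.05) -> list[tuple[str, float]]:
    """Generalized second-price auction ranked by bid x CTR (hypothetical sketch)."""
    eligible = [ad for ad in ads if ad.bid >= reserve and ad.ctr > 0]
    ranked = sorted(eligible, key=lambda ad: ad.bid * ad.ctr, reverse=True)
    results = []
    for i, ad in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            # Smallest CPC at which this ad's bid x CTR still ties the ad below it;
            # a higher CTR divides the price down, the "discount" for relevance.
            price = nxt.bid * nxt.ctr / ad.ctr
        else:
            price = reserve  # nobody below: pay the reserve
        results.append((ad.advertiser, round(max(price, reserve), 2)))
    return results


ads = [Ad("A", bid=2.00, ctr=0.05), Ad("B", bid=3.00, ctr=0.02), Ad("C", bid=1.00, ctr=0.04)]
print(run_auction(ads, slots=3))  # → [('A', 1.2), ('B', 2.0), ('C', 0.05)]
```

Advertiser A bids less than B but wins the top slot because its ads are clicked more often, and it pays $1.20 per click rather than its $2.00 bid. Because each ad's score is at least the next ad's score, the computed price never exceeds the ad's own bid.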
What’s New?
Verticals
Image Search
Product Search
Local
Personal Search
Contextual Search
Desktop Search