Pedersen

Internet Search Engines: Past and Future. Jan Pedersen, Chief Scientist, Yahoo! Search. 11 April 2005

Page 1: Pedersen

Internet Search Engines: Past and Future

Jan Pedersen

Chief Scientist, Yahoo! Search

11 April 2005

Page 2: Pedersen

Outline

• Compare and contrast

– 2005 vs 1998

• Some underlying trends

• What’s New

Page 3: Pedersen

Search Landscape 2005

• Three major players (market cap)

– Google ($52B)

– Yahoo ($42B)

– MSN ($271B)

• $6+B in Paid Search Revenues

• >400M searches daily

• 5-8B claimed index size

• Excellent relevance

Source: Search Engine Watch

Page 4: Pedersen

Search Landscape in 1998

Page 5: Pedersen

Infoseek circa 1998

• IPO in 1996

– Same year as Excite and Yahoo!

• 5th ranked destination site

• Market cap of $1B

• Competed as a portal

– Content was king

• Ultimately sold to Disney

– Go network

• Decommissioned in 2001

Page 6: Pedersen

Infoseek Search Technology

• 1.5 Generation Search

– Tuned for relevance

• Proximity
• Anchor text (ESP)
• Inlink counting

– Small but competent index

• 60M pages
• Alta Vista was 140M at that time

• Still exists as Ultraseek server

– Sold to Inktomi in 2000, later resold to Verity
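
Of the relevance signals listed on this slide, inlink counting is the easiest to illustrate. The following is a minimal sketch, not Infoseek's implementation: the function name `inlink_counts`, the `(source, target)` pair format, and the choice to ignore duplicate links and self-links are all assumptions made for the example.

```python
from collections import Counter

def inlink_counts(links):
    """Count distinct pages linking to each target.

    `links` is an iterable of (source_url, target_url) pairs (assumed format).
    Duplicate links and self-links are ignored (an assumption of this sketch).
    """
    distinct = set(links)  # each source -> target pair counts once
    return Counter(target for source, target in distinct if source != target)

# Hypothetical toy link graph.
example_links = [
    ("a", "b"),
    ("c", "b"),
    ("a", "b"),  # duplicate link, counted once
    ("b", "b"),  # self-link, ignored
    ("b", "c"),
]
counts = inlink_counts(example_links)
ranked = [url for url, _ in counts.most_common()]
```

In this toy graph, page "b" has two distinct inlinks and ranks ahead of "c", which has one. A real engine would combine such a count with the other signals (proximity, anchor text) rather than rank on it alone.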

Page 7: Pedersen

Infoseek UI

Page 8: Pedersen

Comparison to State-of-the-art

Page 9: Pedersen

What was missing?

• Business Model

– Monetized via untargeted banner ads

– Need for increased inventory

• Portalitis

• Clutter

• Lack of focus

• Search was deprioritized

Page 10: Pedersen

Some Underlying Trends

Page 11: Pedersen

Moore’s Law

Page 12: Pedersen

Index Size

• ~150M in 1998

• ~5B in 2005

– 33x increase

– Moore would predict 25x

• Monthly refresh in 1998

• Daily refresh in 2005

• What about 2010?

– 40B?

• Where is the content?

– Public Web?

– Personal Web?

Source: Search Engine Watch and Search Engine Showdown
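
The comparison with Moore's Law on this slide can be checked with a few lines of arithmetic. The sketch below assumes an 18-month doubling period (the slide does not state one) and uses hypothetical helper names; the index sizes are the slide's own figures.

```python
import math

def growth_factor(years, doubling_months=18.0):
    """Growth predicted by doubling every `doubling_months` months (assumed period)."""
    return 2.0 ** (years * 12.0 / doubling_months)

def implied_doubling_months(factor, years):
    """Doubling period, in months, that yields `factor` growth over `years`."""
    return years * 12.0 / math.log2(factor)

index_1998 = 150e6  # ~150M pages (slide figure)
index_2005 = 5e9    # ~5B pages (slide figure)

observed = index_2005 / index_1998      # about 33x
predicted = growth_factor(2005 - 1998)  # about 25x under the 18-month assumption

# Extrapolating the same assumed doubling period to 2010.
projected_2010 = index_2005 * growth_factor(2010 - 2005)
```

Under these assumptions the observed 33x growth corresponds to a doubling period of roughly 16 to 17 months, slightly faster than the assumed Moore's Law rate. An 18-month doubling from 5B gives roughly 50B by 2010; the slide's tentative 40B corresponds to a 20-month doubling (`implied_doubling_months(8, 5)`).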

Page 13: Pedersen

Meaning of Hit Counts

• Hit counts are estimated

– Indices are tiered

– Estimates can be non-linear

Source: http://aixtal.blogspot.com/2005/01/web-googles-counts-faked.html
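
One way a tiered index can produce estimated, non-linear hit counts is sketched below. This is a toy model, not a description of any real engine: it assumes the engine scans tiers in order, stops once it has found enough matches, and extrapolates from the fraction of the index scanned. The names `estimate_hits`, `exact_hits`, and the `enough` threshold are hypothetical.

```python
def exact_hits(tiers, term):
    """True number of documents containing `term` across all tiers."""
    return sum(1 for tier in tiers for doc in tier if term in doc)

def estimate_hits(tiers, term, enough=2):
    """Scan tiers in order, stop once `enough` matches are found,
    then extrapolate linearly from the fraction of the index scanned."""
    total_docs = sum(len(tier) for tier in tiers)
    scanned = 0
    found = 0
    for tier in tiers:
        scanned += len(tier)
        found += sum(1 for doc in tier if term in doc)
        if found >= enough:
            break  # early exit: deeper tiers are never consulted
    return round(found * total_docs / scanned)

# Hypothetical two-tier index; each document is a set of terms.
tiers = [
    [{"web", "search"}, {"web"}],                     # tier 1: 2 docs
    [{"search"}, {"index"}, {"index"}, {"ads"}],      # tier 2: 4 docs
]

web_estimate = estimate_hits(tiers, "web")      # stops after tier 1
index_estimate = estimate_hits(tiers, "index")  # has to scan both tiers
```

Here "web" and "index" each occur in exactly two documents, yet the estimate for "web" is three times larger because its matches are concentrated in the first tier. The same kind of distortion is why estimated counts need not add up across related queries.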

Page 14: Pedersen

Meaning of Document Counts

• Claimed index size

– Google: 3B

– FAST: 3B

– AV: 1B

• Not all documents are equal

– Thin docs

• Disparity between claimed and reported

Source: Search Engine Showdown

Page 15: Pedersen

Online Advertising

• Internet accounts for 30+% of viewing time

– Yet only 4% of spend

– $370B overall

• $10B online

• Fastest growing advertising segment

• Steady shift toward online advertising

Source: The Economist

Page 16: Pedersen

The Keyword Marketplace

• The great, unsung search problem

– Matching relevant ads to user intent

• Example of distributed authorship

– Advertisers bid on keywords

– Discounts for good performance (relevance)
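
The idea of bidding on keywords with a discount for good performance can be sketched as a quality-weighted second-price auction. This is a generic textbook mechanism chosen for illustration, not a description of Yahoo!'s actual marketplace: the `Ad` type, the use of click-through rate as the relevance measure, and the `reserve` price are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    advertiser: str
    bid: float  # maximum cost per click the advertiser will pay
    ctr: float  # estimated click-through rate (assumed relevance proxy)

def run_auction(ads, reserve=0.05):
    """Rank ads by bid * ctr; each ad pays just enough to keep its rank."""
    ranked = sorted(ads, key=lambda ad: ad.bid * ad.ctr, reverse=True)
    results = []
    for i, ad in enumerate(ranked):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            # Minimum price per click that still beats the next ad's score.
            price = nxt.bid * nxt.ctr / ad.ctr
        else:
            price = reserve
        price = min(ad.bid, max(price, reserve))  # never above own bid
        results.append((ad.advertiser, round(price, 2)))
    return results

# Hypothetical bids on a single keyword.
ads = [
    Ad("A", bid=1.00, ctr=0.02),
    Ad("B", bid=0.60, ctr=0.05),
    Ad("C", bid=0.50, ctr=0.01),
]
outcome = run_auction(ads)
```

In this example advertiser B takes the top slot despite bidding less than A, and pays 0.40 per click rather than its 0.60 bid: the higher click-through rate acts as the discount for good performance.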

Page 17: Pedersen

What’s New?

Page 18: Pedersen

Verticals

Image Search

Product Search

Page 19: Pedersen

Local

Page 20: Pedersen

Personal Search

Page 21: Pedersen

Contextual Search

Page 22: Pedersen

Desktop Search

Page 23: Pedersen
