How Search Engines Work?


Page 1: How Search Engines Work?

How Search Engines Work?

Ziv Bar-Yossef

Department of Electrical Engineering Technion

Page 2: How Search Engines Work?

What is the Internet?

A global network of computers connected to each other

Computers “talk” to each other using standard protocols (TCP/IP)

Page 3: How Search Engines Work?

What is the World-Wide Web (WWW)?

Collection of pages available via the Internet

Internet users can view pages with web browsers

WWW is only one application of the Internet

Other applications: email, messengers, VoIP, newsgroups, FTP

Page 4: How Search Engines Work?

Web Pages

Various formats: pdf, word, excel, images, mp3, video, text

Most popular format: HTML

HTML pages point to each other using hyperlinks

Users “surf the web” by clicking hyperlinks

Page 5: How Search Engines Work?

What are Search Engines?

Users have “information needs”:

Where can I find solutions to my math homework problem?

Where can I find mp3s of Miri Messika’s latest album?

What is the weather in Eilat during Hanukkah?

Which other Sharons are famous, besides our prime minister?

Search engines enable us to find web pages that match our information needs

Page 6: How Search Engines Work?

Search Engines

[Diagram: the user has an “information need” (“What other Sharons are famous, except for our prime minister?”) and sends the query “sharon -ariel” to the search engine; the search engine draws on web pages from the Web and returns a ranked list of matching pages: 1. Sharon Creech, 2. Sharon Stone, 3. Sharon, Massachusetts]

Page 7: How Search Engines Work?

How Search Engines (don’t) Work?

Common misconception: when a user submits a query, the search engine scans all web pages to find the relevant matches

[Diagram: the user sends the query “sharon -ariel”; the search engine reads web pages directly from the Web and returns a ranked list of matching pages: 1. Sharon Creech, 2. Sharon Stone, 3. Sharon, Massachusetts]

Page 8: How Search Engines Work?

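How Search Engines Work?

What do you do when you look for a term in an encyclopedia? Use the index!

[Diagram: web pages from the Web are collected into an index inside the search engine; the user sends the query “sharon -ariel”, and the search engine answers from the index with a ranked list of matching pages: 1. Sharon Creech, 2. Sharon Stone, 3. Sharon, Massachusetts]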

Page 9: How Search Engines Work?

Search Engine Architecture

[Diagram: the search engine consists of four components: Crawler, Index, Ranking Algorithm, Query Processor]

Page 10: How Search Engines Work?

Web Crawler (a.k.a. Spider)

Fetches web pages and stores them in a local repository

Tries to get as many web pages as possible

Follows hyperlinks to learn about new pages

Refetches pages that change frequently
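
The fetch-and-follow loop above might be sketched as follows. This is only an illustration: the `FAKE_WEB` dictionary, the `.example` addresses, and the `fetch` helper are hypothetical stand-ins for real HTTP requests, and refetching of frequently changing pages is left out.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing hyperlinks).
FAKE_WEB = {
    "a.example": ("home page", ["b.example", "c.example"]),
    "b.example": ("about page", ["a.example"]),
    "c.example": ("news page", ["d.example"]),
    "d.example": ("archive page", []),
}

def fetch(url):
    """Stand-in for an HTTP fetch; returns (text, links) or None."""
    return FAKE_WEB.get(url)

def crawl(seed_urls, max_pages=100):
    """Breadth-first crawl: store fetched pages, follow links to new ones."""
    repository = {}              # local repository: URL -> page text
    frontier = deque(seed_urls)  # URLs waiting to be fetched
    seen = set(seed_urls)        # URLs already queued, to avoid repeats
    while frontier and len(repository) < max_pages:
        url = frontier.popleft()
        page = fetch(url)
        if page is None:
            continue
        text, links = page
        repository[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return repository

repository = crawl(["a.example"])
print(sorted(repository))  # ['a.example', 'b.example', 'c.example', 'd.example']
```

Starting from a single seed page, the crawler discovers the other three pages only through hyperlinks.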

Page 11: How Search Engines Work?

The Index

Page at www.cnn.com (word positions in parentheses):

Ariel(1) Sharon(2), the(3) prime(4) minister(5) of(6) Israel(7) founded(8) a(9) new(10) political(11) party(12).

Page at www.hollywood.com (word positions in parentheses):

Sharon(1) Stone(2) dressed(3) a(4) new(5) Jean(6) Paul(7) Gaultier(8) gown(9) at(10) the(11) Oscars(12) after(13) party(14).

Index (each term is followed by its list of (page, position) pairs):

ariel: (cnn.com,1)
dress: (hollywood.com,3)
found: (cnn.com,8)
gaultier: (hollywood.com,8)
gown: (hollywood.com,9)
israel: (cnn.com,7)
jean: (hollywood.com,6)
minister: (cnn.com,5)
new: (cnn.com,10), (hollywood.com,5)
oscar: (hollywood.com,12)
party: (cnn.com,12), (hollywood.com,14)
paul: (hollywood.com,7)
political: (cnn.com,11)
prime: (cnn.com,4)
sharon: (cnn.com,2), (hollywood.com,1)
stone: (hollywood.com,2)
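
An index of this kind could be built roughly as follows. This is a minimal sketch, not the method of any particular search engine: the `STOP_WORDS` set and the tiny `STEMS` table are assumptions chosen only so that the output matches the two example sentences.

```python
import re
from collections import defaultdict

# Assumed stop-word list and toy stemming table (illustration only).
STOP_WORDS = {"the", "of", "a", "at", "after"}
STEMS = {"founded": "found", "dressed": "dress", "oscars": "oscar"}

def build_index(pages):
    """Map each term to a posting list of (page, word position) pairs."""
    index = defaultdict(list)
    for url, text in pages.items():
        words = re.findall(r"[a-z]+", text.lower())
        for position, word in enumerate(words, start=1):
            if word in STOP_WORDS:
                continue  # stop words keep their position but are not indexed
            index[STEMS.get(word, word)].append((url, position))
    return dict(index)

pages = {
    "cnn.com": "Ariel Sharon, the prime minister of Israel founded a new political party.",
    "hollywood.com": "Sharon Stone dressed a new Jean Paul Gaultier gown at the Oscars after party.",
}
index = build_index(pages)
print(index["sharon"])  # [('cnn.com', 2), ('hollywood.com', 1)]
print(index["party"])   # [('cnn.com', 12), ('hollywood.com', 14)]
```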

Page 12: How Search Engines Work?

Index by “Anchor Text”

Anchor text: what’s written inside a link

Example: Ariel Sharon, the prime minister…

Usually succinctly describes what’s written in the linked page

Under which terms is a page listed in the index?

Terms that appear in the page

Terms that appear in the anchor text of links to the page
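
One way to picture the two sources of index terms is the sketch below. The page addresses, page texts, and the `(source, anchor text, target)` link format are all hypothetical, made up for illustration.

```python
from collections import defaultdict

def index_terms(page_texts, links):
    """Collect, for each page, its own words plus the anchor text of links to it.

    links is a list of (source_url, anchor_text, target_url) triples.
    """
    terms = defaultdict(set)
    for url, text in page_texts.items():
        terms[url].update(text.lower().split())
    for _source, anchor_text, target in links:
        # Anchor text is credited to the page the link points to.
        terms[target].update(anchor_text.lower().split())
    return terms

page_texts = {
    "news.example": "the prime minister spoke today",
    "bio.example": "biography and career",
}
links = [("news.example", "Ariel Sharon", "bio.example")]

terms = index_terms(page_texts, links)
print("sharon" in terms["bio.example"])   # True
print("sharon" in terms["news.example"])  # False
```

Here the hypothetical page `bio.example` is listed under "sharon" even though the word never appears in its own text.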

Page 13: How Search Engines Work?

Query Processor

Gets a user query

Fetches relevant posting lists from the index

Extracts relevant matches from the lists

Example: Query = “sharon -ariel”

L1 = posting list of sharon: (cnn.com,2), (hollywood.com,1)

L2 = posting list of ariel: (cnn.com,1)

Return all pages in L1 that do not occur in L2: hollywood.com
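
The example can be traced with a short sketch. The function names and the query syntax handling (a leading "-" marks an excluded term) are assumptions for illustration; the index holds only the two posting lists used in the example.

```python
def pages_of(term, index):
    """Set of pages whose posting list contains the term."""
    return {page for page, _position in index.get(term, [])}

def process_query(query, index):
    """Return pages containing every required term and no excluded term."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-"):
            excluded.append(token[1:])
        else:
            required.append(token)
    if not required:
        return set()
    matches = pages_of(required[0], index)
    for term in required[1:]:
        matches &= pages_of(term, index)
    for term in excluded:
        matches -= pages_of(term, index)
    return matches

index = {
    "sharon": [("cnn.com", 2), ("hollywood.com", 1)],
    "ariel": [("cnn.com", 1)],
}
print(process_query("sharon -ariel", index))  # {'hollywood.com'}
```

Only the posting lists of the query terms are read; no page is scanned at query time.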

Page 14: How Search Engines Work?

Ranking Algorithm

Many queries have many matching pages

472 million matches for “London” in Google

Cannot return all of them to the user

User needs the most relevant results anyway

Need to order results by relevance

Most relevant results are at the top

Ranking algorithm: a method of ordering matches

The “heart” of a search engine

The reason why Google is the most preferred search engine today

Page 15: How Search Engines Work?

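Google’s PageRank

Ranking Elections

Candidates: all web pages

Voters: all web pages

p votes for q if p has a hyperlink to q

Favorites(p) = all the pages p votes for

Fans(p) = all the pages that vote for p

$$\mathrm{PageRank}(p) = \begin{cases} 1 & \text{if } p \text{ has no fans} \\ \sum_{q \in \mathrm{Fans}(p)} \dfrac{\mathrm{PageRank}(q)}{|\mathrm{Favorites}(q)|} & \text{otherwise} \end{cases}$$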

Page 16: How Search Engines Work?

Google’s PageRank

Underlying principles:

A page is “important” if it has important fans

A page splits its “importance” evenly among its favorite pages

[Figure: example link graph whose pages carry the PageRank values 1, 1, 1, 1, 1.5, 2.5, and 4]
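
The two principles can be tried out on a small graph. The sketch below reads them as: a page with no fans gets rank 1, and every other page receives from each fan that fan's rank divided by the number of the fan's favorites. The link graph is hypothetical, chosen so that the ranks come out as 1, 1, 1, 1, 1.5, 2.5, and 4. The direct recursion assumes the graph has no cycles; the real web does, so an actual computation would have to iterate instead.

```python
from functools import lru_cache

# Hypothetical link graph: favorites[p] lists the pages p links to ("votes for").
favorites = {
    "a": ["x"],
    "b": ["x", "y"],
    "c": ["y"],
    "d": ["y"],
    "x": ["z"],
    "y": ["z"],
    "z": [],
}

# fans[p] lists the pages that link to ("vote for") p.
fans = {page: [] for page in favorites}
for page, targets in favorites.items():
    for target in targets:
        fans[target].append(page)

@lru_cache(maxsize=None)
def rank(page):
    """Rank of a page in an acyclic graph, following the two principles."""
    if not fans[page]:
        return 1.0  # a page with no fans gets rank 1
    # Each fan splits its own rank evenly among its favorites.
    return sum(rank(fan) / len(favorites[fan]) for fan in fans[page])

for page in favorites:
    print(page, rank(page))
# a 1.0, b 1.0, c 1.0, d 1.0, x 1.5, y 2.5, z 4.0
```

Page `b` has two favorites, so it passes 0.5 to each; `x` therefore collects 1 + 0.5 = 1.5 and `y` collects 0.5 + 1 + 1 = 2.5, and `z` receives both in full.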

Page 17: How Search Engines Work?

Google’s PageRank

Ranking algorithm:

Find pages that match the given query

Order them by their PageRank

Return top 10 matches
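
A minimal sketch of these three steps, assuming the matching pages and their PageRank scores are already available; the page names and scores below are made up for illustration.

```python
def rank_matches(matches, pagerank, top_k=10):
    """Order matching pages by PageRank, highest first, and keep the top k."""
    ordered = sorted(matches, key=lambda page: pagerank.get(page, 0.0), reverse=True)
    return ordered[:top_k]

# Hypothetical scores for three matching pages.
pagerank = {"x.example": 1.5, "y.example": 2.5, "z.example": 4.0}
matches = {"x.example", "y.example", "z.example"}

print(rank_matches(matches, pagerank, top_k=2))  # ['z.example', 'y.example']
```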

Page 18: How Search Engines Work?

But… PageRank Does Not Always Work

SPAM

Page 19: How Search Engines Work?

Conclusions

Search engines use an index to answer user queries

Ranking is the most important component

Spam is a problem

Page 20: How Search Engines Work?

Thank You