Working of search engine

Working Of “Search Engine” Nikhil D-1 14BTCSERS033 Maths Assignment

Transcript of Working of search engine

Page 1: Working of search engine

Working Of “Search Engine”

Nikhil D-1

14BTCSERS033 Maths Assignment

Page 2: Working of search engine

What is a Search Engine?

“A web search engine is a software system that is designed to search for information on the World Wide Web.”

Page 3: Working of search engine

Purpose of Search Engines

Helping people find what they’re looking for:

• Starts with an “information need”

• Converts it to a query

• Gets results

Page 4: Working of search engine

Types of Search Engines

• Search by keywords (e.g. AltaVista, Google)

• Search by categories (e.g. Yahoo)

Page 5: Working of search engine

The Parts of a Search Engine

Spider (or “crawler”)

Index

Search software (an algorithm)

Page 6: Working of search engine

The “spider” or “crawler”

The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled". This is also known as “harvesting”. The spider returns to the site on a regular basis, such as every month or two, to look for changes.
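The visit, read, and follow-links loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PAGES` dictionary is a hypothetical in-memory site standing in for real HTTP fetches, and the names `LinkExtractor` and `crawl` are invented for this example.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "site" (URL path -> HTML); a real spider would fetch pages over HTTP.
PAGES = {
    "/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/">Home</a>',
    "/blog": '<a href="/blog/post-1">Post</a> <a href="/about">About</a>',
    "/blog/post-1": "<p>No links here.</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, pages):
    """Visit pages breadth-first from `start`, following links within the site."""
    visited = []            # pages in the order the spider read them
    seen = {start}          # pages already queued, so none is read twice
    queue = deque([start])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:    # link points outside the known site; skip it
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("/", PAGES))
```

Running the crawl again at a later date, and comparing each page with the stored copy, would correspond to the spider returning to look for changes.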

Page 7: Working of search engine

The Indexer

Everything the spider finds goes into the second part of a search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with the new information.
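A small Python sketch of such an index follows. It keeps a copy of each page, as described above, and additionally maps each word to the pages containing it (an inverted index). The inverted index is an assumption made here because it is a common way to make lookups fast; the slides do not say how the index is stored. The class and method names are hypothetical.

```python
import re
from collections import defaultdict


class Index:
    """Stores a copy of each page plus a word -> pages lookup table."""

    def __init__(self):
        self.pages = {}                    # url -> stored copy of the page text
        self.postings = defaultdict(set)   # word -> set of urls containing it

    @staticmethod
    def tokenize(text):
        # Lowercase the text and split it into alphanumeric words.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_or_update(self, url, text):
        # If the page was indexed before, remove its old words first,
        # so the index reflects only the new version of the page.
        if url in self.pages:
            for word in set(self.tokenize(self.pages[url])):
                self.postings[word].discard(url)
                if not self.postings[word]:
                    del self.postings[word]
        self.pages[url] = text
        for word in self.tokenize(text):
            self.postings[word].add(url)

    def lookup(self, word):
        # Return the pages containing the word, in sorted order.
        return sorted(self.postings.get(word.lower(), set()))


index = Index()
index.add_or_update("/a", "Search engines crawl the web")
index.add_or_update("/b", "The web is large")
print(index.lookup("web"))    # both pages contain "web"

index.add_or_update("/b", "Pages change often")   # the page changed
print(index.lookup("web"))    # only "/a" still contains "web"
```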

Page 8: Working of search engine

Search engine software

It is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant.

Page 9: Working of search engine

TF-IDF Ranking Algorithm

Term frequency–inverse document frequency (tf–idf) is a numerical statistic intended to reflect how important a word is to a document in a collection.

Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance to a user query.

• $w_{ij}$ = weight of term $T_j$ in document $D_i$

• $tf_{ij}$ = frequency of term $T_j$ in document $D_i$

• $N$ = number of documents in the collection

• $n$ = number of documents in which term $T_j$ occurs at least once
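The slide's own formula is not part of this transcript. A standard tf–idf weight that matches the symbols defined above is w_ij = tf_ij × log(N / n), and the Python sketch below assumes that form. The function names, the whitespace tokenizer, and the sample documents are all invented for illustration, and scoring a query by summing the weights of its terms is just one simple choice among many.

```python
import math
from collections import Counter


def tf_idf_weights(documents):
    """Return one {term: weight} dict per document, using w = tf * log(N / n)."""
    N = len(documents)
    # tf: how often each term occurs in each document (naive whitespace split)
    term_counts = [Counter(doc.lower().split()) for doc in documents]
    # n: in how many documents each term occurs at least once
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())
    return [
        {term: tf * math.log(N / doc_freq[term]) for term, tf in counts.items()}
        for counts in term_counts
    ]


def rank(query, documents):
    """Return document positions ordered from most to least relevant."""
    weights = tf_idf_weights(documents)
    terms = query.lower().split()
    scores = [sum(w.get(t, 0.0) for t in terms) for w in weights]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)


docs = [
    "the spider crawls the web",
    "the index stores pages",
    "the spider returns to the site",
]
print(rank("spider web", docs))
```

In this sample, "the" occurs in every document, so log(N / n) = log(1) = 0 and the word contributes nothing to any score. A rarer word such as "web" carries more weight than "spider", which appears in two of the three documents.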

Page 10: Working of search engine

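PageRank

• The equation: $PR(A) = (1-d) + d\left(\frac{PR(t_1)}{C(t_1)} + \dots + \frac{PR(t_n)}{C(t_n)}\right)$, where $t_1, \dots, t_n$ are the pages that link to page $A$, $C(t)$ is the number of links going out of page $t$, and $d$ is a damping factor between 0 and 1

• Used by WebQuery and Google

• Google models a user who follows links from page to page in order to rank documents

• Google uses a citation graph (518 million links)

• Google computes PageRank for 26 million pages in a few hours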

Page 11: Working of search engine

PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.
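As a rough illustration, the PageRank equation PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn)) can be applied repeatedly until the values settle. The Python sketch below does this on a tiny hypothetical link graph. The function name, the starting value of 1.0 per page, the default d = 0.85, and the fixed number of iterations are assumptions made for this example, not details taken from the slides.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(t) / C(t)) over pages t linking to A.

    `links` maps each page to the list of pages it links to. Every link target
    is assumed to appear as a key in `links`.
    """
    pages = list(links)
    pr = {p: 1.0 for p in pages}                      # assumed starting value
    out_count = {p: len(links[p]) for p in pages}     # C(t): outbound links of t
    inbound = {p: [] for p in pages}                  # pages t that link to p
    for source, targets in links.items():
        for target in targets:
            inbound[target].append(source)

    for _ in range(iterations):
        # Build all new values from the previous round's values.
        pr = {
            p: (1 - d) + d * sum(pr[t] / out_count[t] for t in inbound[p])
            for p in pages
        }
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(page, round(score, 3))
```

In this graph C receives links from both A and B, so it ends up with the highest score, while B, which is linked only from A and shares A's vote with C, ends up lowest.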

Page 12: Working of search engine

The End

Thank you for listening patiently.