Working of search engine

Post on 16-Apr-2017


Working Of “Search Engine”

Nikhil D-1

14BTCSERS033

Maths Assignment

What is a Search Engine?

“A web search engine is a software system that is designed to search for information on the World Wide Web.”

Purpose of Search Engines

Helping people find what they’re looking for:

• Start with an “information need”

• Convert it to a query

• Get results

Types of Search Engines

• Search by keywords (e.g. AltaVista, Google)

• Search by categories (e.g. Yahoo)

The Parts of a Search Engine

Spider (or “crawler”)

Index

Search software (an algorithm)

The “spider” or “crawler”

The spider visits a web page, reads it, and then follows links to other pages within the site. This is what it means when someone refers to a site being "spidered" or "crawled". This is also known as “harvesting”. The spider returns to the site on a regular basis, such as every month or two, to look for changes.
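The visit, read, and follow-links loop described above can be sketched in a few lines of Python. This is only an illustration, not how any real spider is built: the `SITE` dictionary is a made-up stand-in for a website (path to HTML), and the names `LinkParser` and `crawl` are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(site, start):
    """Visit pages breadth-first from `start`, following links found in `site`.

    `site` is a hypothetical stand-in for a website: a dict of path -> HTML.
    Returns the paths in the order they were visited.
    """
    visited = []
    seen = {start}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        html = site.get(path)
        if html is None:
            continue  # broken link: there is no page to read
        visited.append(path)
        parser = LinkParser()
        parser.feed(html)  # "read" the page and collect its links
        for link in parser.links:
            if link not in seen:  # queue each page only once
                seen.add(link)
                queue.append(link)
    return visited


# Made-up site used only for this example.
SITE = {
    "/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/">Home</a>',
    "/blog": '<a href="/blog/post-1">Post</a> <a href="/missing">Gone</a>',
    "/blog/post-1": "<p>No links here.</p>",
}

print(crawl(SITE, "/"))
```

A real spider would fetch pages over the network and come back later to look for changes; both are left out here to keep the sketch short.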

The Indexer

Everything the spider finds goes into the second part of a search engine, the index. The index, sometimes called the catalog, is like a giant book containing a copy of every web page that the spider finds. If a web page changes, then this book is updated with the new information.
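A minimal sketch of such an index, assuming pages arrive as plain text. Keeping a copy of each page follows the description above; the extra word-to-pages table (usually called an inverted index) is an assumption added here to make lookups fast. The class name `Index` and its methods are hypothetical.

```python
class Index:
    """Keeps the latest copy of each page plus a word -> pages lookup table."""

    def __init__(self):
        self.pages = {}     # url -> latest copy of the page text
        self.postings = {}  # word -> set of urls containing that word

    def remove(self, url):
        """Forget a page and every word entry that pointed to it."""
        old_text = self.pages.pop(url, None)
        if old_text is None:
            return
        for word in old_text.lower().split():
            urls = self.postings.get(word)
            if urls is not None:
                urls.discard(url)
                if not urls:
                    del self.postings[word]

    def update(self, url, text):
        """Store a new or changed page, replacing any older copy."""
        self.remove(url)
        self.pages[url] = text
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(url)

    def lookup(self, word):
        """Return the urls of all pages that contain `word`."""
        return sorted(self.postings.get(word.lower(), set()))


index = Index()
index.update("/a", "search engines crawl pages")
index.update("/b", "engines rank pages")
print(index.lookup("pages"))

# The page at /a changes, so the index is updated with the new text.
index.update("/a", "spiders crawl the web")
print(index.lookup("pages"))
```

Removing the old copy before storing the new one is what keeps stale words from pointing at a page that no longer contains them.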

Search engine software

It is the third part of a search engine. This is the program that sifts through the millions of pages recorded in the index to find matches to a search and rank them in order of what it believes is most relevant.

Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query.

Term Frequency–Inverse Document Frequency (tf–idf) is a numerical statistic that is intended to reflect how important a word is to a document in a collection.

TF-IDF Ranking Algorithm

wij = weight of Term Tj in Document Di

tfij = frequency of Term Tj in Document Di

N = number of Documents in collection

n = number of Documents where term Tj occurs at least once
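The symbols above fit the commonly used weighting wij = tfij × log(N / n). The sketch below assumes that form with the natural logarithm and a simple whitespace tokenizer; search engines use many variations, and the function name `tf_idf` and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using w = tf * log(N / n).

    Assumes the common tf-idf form with the natural logarithm.
    """
    tokenized = [doc.lower().split() for doc in documents]
    N = len(tokenized)  # number of documents in the collection

    # n[term] = number of documents where the term occurs at least once
    n = Counter()
    for tokens in tokenized:
        n.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # frequency of each term in this document
        weights.append({term: tf[term] * math.log(N / n[term]) for term in tf})
    return weights


docs = [
    "the spider crawls the web",
    "the index stores pages",
    "the spider returns",
]
weights = tf_idf(docs)
for term, weight in sorted(weights[0].items()):
    print(f"{term}: {weight:.3f}")
```

A word that appears in every document, such as "the" here, gets weight 0 because log(N / n) = log(1) = 0, while a word found in only one document gets the largest boost.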

PageRank

PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites.

• The equation: PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn)), where t1 … tn are the pages that link to page A, C(t) is the number of links going out of page t, and d is a damping factor between 0 and 1

• Used by WebQuery and Google

• Google simulates users using the search engine to rank documents

• Google uses a citation graph (518 million links)

• Google computes PageRank for 26 million pages in a few hours
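The PageRank equation can be applied repeatedly until the scores settle. The sketch below is a small illustration under stated assumptions, not Google's implementation: the three-page graph is made up, d = 0.85 is a commonly quoted damping factor, every page is assumed to have at least one outgoing link, and the function name `pagerank` is hypothetical.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(t) / C(t)) over pages t linking to A.

    `links` maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # start every page at the same score
    for _ in range(iterations):
        new_pr = {}
        for a in pages:
            # Each page t that links to A passes on PR(t) / C(t),
            # where C(t) is the number of links going out of t.
            incoming = sum(pr[t] / len(links[t]) for t in pages if a in links[t])
            new_pr[a] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Made-up graph: A links to B and C, B links to C, C links to A.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

Page C comes out on top because it receives links from both A and B, and A comes second because its only incoming link is from the highly ranked C. This matches the idea that both the number and the quality of links matter.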

The End

Thank you for listening patiently.