Search Engine Techniques


  • 8/9/2019 Search Engine Techniques


    Terminology

    • Search Engine
    • Web crawler


    Search Engine

    • A Web search engine is a tool or program designed to search for information on the WWW on the basis of specified keywords; it returns a list of the documents in which the keywords were found.
    • The search results are usually presented in a list and are commonly called hits. The information may consist of web pages, images, and other types of files.


    Web Crawler

    • A Web crawler is a computer program that browses the WWW in a methodical, automated manner.
    • Other terms for Web crawlers are ants, automatic indexers, bots, worms, Web spiders, Web robots, and Web scutters.
    • This process is called Web crawling or spidering.
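As an illustration of "methodical, automated" browsing, here is a minimal sketch of a breadth-first crawler. It is an assumption-laden toy, not how any particular search engine crawls: `PAGES` is a hypothetical in-memory stand-in for the Web so the example runs without network access, and `LinkExtractor` and `crawl` are illustrative names.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web" so the sketch runs without network access.
PAGES = {
    "a.html": '<a href="b.html">B</a> <a href="c.html">C</a>',
    "b.html": '<a href="a.html">A</a> <a href="d.html">D</a>',
    "c.html": "no links here",
    "d.html": '<a href="c.html">C</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, fetch=PAGES.get):
    """Visit pages breadth-first from `start`, each page at most once."""
    visited, queue, order = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page could not be fetched; skip it
            continue
        order.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in visited:
                visited.add(link)
                queue.append(link)
    return order
```

The `visited` set is what keeps the crawl from looping when pages link back to each other. A real crawler would replace `fetch` with an HTTP client and would also need politeness rules, which this sketch leaves out.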


    Why a search engine?

    • The Internet holds a huge amount of textual information, yet without help it was practically impossible to find anything.
    • Thus there is a need for a program or tool that keeps track of all this information, and this is what a search engine provides.


    Search Engine Working

    • A search engine operates in the following order:

    1. Web crawling: document gathering step
    2. Indexing: document arrangement step
    3. Information extraction and storing in a DB
    4. User request: request for a specific keyword
    5. Searching: query building and execution
    6. Response to the user
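Steps 2 to 6 above can be sketched with an inverted index, a common structure for this purpose (the slides do not say which structure is used, so treat this as one possible reading). The names `tokenize`, `build_index`, and `search` are illustrative, and a plain dictionary stands in for the DB.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing and storing: map each word to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, keyword):
    """User request, searching, and response: look the keyword up in the index."""
    return sorted(index.get(keyword.lower(), set()))
```

For example, with `docs = {1: "Web crawler basics", 2: "Indexing the Web"}`, calling `search(build_index(docs), "web")` returns the ids of both documents.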


    History of Search Engines

    • Archie
    • Veronica and Jughead
    • Excite
    • Yahoo
    • Lycos
    • AltaVista


    Keyword Searching

    • Most common form of text search on the Web
    • The keyword specified by the user is searched for
    • Those keywords would actually tell a user something about the subject and content of the page.
    • It's up to the search engine to determine the type of keyword
    • They may refer to the words specified as the title of the documents or to their first-line content, for matching purposes


    Keyword Searching

    Problems with keyword searching:

    • Same-spelled keyword
    • Stemming problem
    • Synonym problem


    Stemming Problem

    Search Engine XYZ, query: BIG_ (GO!)

    "Should I also check for BIGGER, BIGGEST...?"
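One way an engine could answer that question is to reduce both the query and the indexed words to a common stem before comparing them. The sketch below uses a deliberately tiny, hypothetical rule set (`naive_stem`) chosen to cover the BIG / BIGGER / BIGGEST example; it is not a real stemming algorithm and will mangle many English words.

```python
def naive_stem(word):
    """Strip a few English suffixes; a toy rule set, not a real stemmer."""
    word = word.lower()
    for suffix in ("est", "er", "s"):
        # Only strip when at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Undo consonant doubling left behind by stripping: "bigg" -> "big".
    if len(word) >= 3 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


def matches(query, word):
    """Two words match when they reduce to the same stem."""
    return naive_stem(query) == naive_stem(word)
```

With these rules, `matches("BIG", "bigger")` and `matches("BIG", "Biggest")` both hold, so a search for BIG would also reach documents containing the longer forms.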


    Synonym Problem

    Search Engine XYZ, query: BIG_ (GO!)

    "I am not going to return documents that contain a synonym of HEART, such as CARDIAC."
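A common workaround for the synonym problem is query expansion: the engine adds known synonyms to the query before matching. The sketch below assumes a hand-written synonym table (`SYNONYMS`, hypothetical and containing only the heart/cardiac pair from the slide); real systems would draw on a thesaurus.

```python
# Hypothetical synonym table; only the pair from the slide is included.
SYNONYMS = {"heart": {"cardiac"}, "cardiac": {"heart"}}


def expand_query(keyword):
    """Return the keyword together with its known synonyms."""
    keyword = keyword.lower()
    return {keyword} | SYNONYMS.get(keyword, set())


def search(documents, keyword):
    """Return ids of documents containing the keyword or any synonym."""
    terms = expand_query(keyword)
    return [
        doc_id
        for doc_id, text in documents.items()
        if terms & set(text.lower().split())
    ]
```

A plain keyword match for "heart" would miss a document titled "cardiac surgery basics"; with expansion, that document is returned as well.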


    Same-Spelled Keyword

    Search Engine XYZ, query: BIG_ (GO!)

    It will return the following:

    Hard drive    100 KB    www.Hddve.com
    Hard Exam     115 KB    www.Hdexm.cm
    Hard stone    105 KB    www.Hrdsto.com

    Most of these are irrelevant to the user. There is also the problem of case sensitivity.
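The case-sensitivity part of this problem can be illustrated with the three result titles above. In this sketch (the function name `match_titles` and its `case_sensitive` flag are illustrative), a case-sensitive comparison misses every title, while a case-folded comparison returns all three, relevant or not.

```python
TITLES = ["Hard drive", "Hard Exam", "Hard stone"]


def match_titles(keyword, titles, case_sensitive=False):
    """Return the titles that contain the keyword as a whole word."""
    if case_sensitive:
        return [t for t in titles if keyword in t.split()]
    key = keyword.casefold()
    return [t for t in titles if key in t.casefold().split()]
```

Case folding fixes the missed matches but does nothing about relevance: the keyword alone cannot tell the engine which sense of "hard" the user meant.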


    Refined Searching

    • Advanced search
    • Criteria of searching are given by the user
    • Uses Boolean operators
    • Allows the user to:
        • search an entire phrase
        • use field searching
        • specify what form he would like his results to appear in
        • restrict his search to certain fields on the Internet (i.e., Usenet or the Web)


    Boolean Operators

    • Boolean AND: FCC AND WIRELESS AND COMMUNICATION
    • Boolean OR: FCC OR WIRELESS OR COMMUNICATION


    Boolean Operators

    • Boolean AND NOT
    • Boolean +/-: "+" means AND, "-" means AND NOT
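The Boolean operators map naturally onto set operations over the documents that contain each term. The sketch below assumes a small hand-built index (`INDEX`, with made-up document ids) for the FCC / WIRELESS / COMMUNICATION example; the function names are illustrative.

```python
from functools import reduce

# Hypothetical index: term -> ids of the documents that contain it.
INDEX = {
    "fcc": {1, 2},
    "wireless": {2, 3},
    "communication": {2, 4},
}


def postings(term):
    """Return the set of document ids for a term (empty if unknown)."""
    return INDEX.get(term.lower(), set())


def boolean_and(*terms):
    """Documents containing every term (at least one term is required)."""
    return set(reduce(set.intersection, (postings(t) for t in terms)))


def boolean_or(*terms):
    """Documents containing at least one of the terms."""
    return set(reduce(set.union, (postings(t) for t in terms)))


def boolean_and_not(include, exclude):
    """Documents containing `include` but not `exclude`."""
    return postings(include) - postings(exclude)


def plus_minus(query):
    """Evaluate a '+term -term' query: '+' means AND, '-' means AND NOT.

    Assumes the query has at least one '+' term.
    """
    tokens = query.split()
    required = [t[1:] for t in tokens if t.startswith("+")]
    excluded = [t[1:] for t in tokens if t.startswith("-")]
    result = boolean_and(*required)
    for term in excluded:
        result -= postings(term)
    return result
```

AND narrows the result (only document 2 contains all three terms here), OR widens it to every document containing any term, and AND NOT subtracts one set from another.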


    Field Search

    • Most effective technique for narrowing results
    • A web page has a number of fields, such as title, domain, host, URL, and link. Searching effectiveness increases as you combine field searches with phrase searches and Boolean logic.

    +title:Thailand" - Image will return the indicated result
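A field search restricts matching to one named part of the page. The sketch below assumes each page is stored as a dictionary of fields and that queries take the form `field:value`; the sample pages, hosts, and the function name `field_search` are all hypothetical.

```python
# Hypothetical pages, each stored as a dictionary of fields.
PAGES = [
    {"title": "Thailand travel guide", "host": "example.org",
     "url": "http://example.org/guide"},
    {"title": "Asian cuisine", "host": "example.com",
     "url": "http://example.com/thailand-food"},
]


def field_search(pages, query):
    """Return pages whose named field contains the value, ignoring case."""
    field, _, value = query.partition(":")
    value = value.strip('"').lower()
    if not value:  # no 'field:value' pair was given
        return []
    return [p for p in pages if value in p.get(field, "").lower()]
```

Here `field_search(PAGES, "title:Thailand")` returns only the first page, whereas `field_search(PAGES, "url:thailand")` returns only the second, which shows how the choice of field narrows the result.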


    Concept-Based Searching

    • Semantic search
    • Concept-based search systems try to determine what the user means
    • Returns hits on documents that are "about" the subject/theme the user is exploring, even if the words in the document don't precisely match the words you enter into the query.


    Concept-Based Searching

    • Builds clustering systems and notes the number of occurrences (frequency) in the document
    • The higher the frequency, the higher the ranking of the document
    • Example


    Concept-Based Searching

    Search Engine XYZ (GO!)

    It will return the documents related to medical/health science.


    Concept-Based Searching

    Search Engine XYZ (GO!)

    A concept-oriented search engine returns hits on the subject of romance.
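The frequency-based ranking described for concept-based searching can be sketched as follows. This is a simplification under stated assumptions: the cluster of terms that make up a concept is supplied by hand (`concept_terms`), whereas a real system would have to derive it, and the function name `rank_by_frequency` is illustrative.

```python
import re
from collections import Counter


def rank_by_frequency(documents, concept_terms):
    """Rank documents by how often the concept's terms occur in them."""
    terms = {t.lower() for t in concept_terms}
    scores = {}
    for doc_id, text in documents.items():
        counts = Counter(re.findall(r"[a-z]+", text.lower()))
        score = sum(counts[t] for t in terms)
        if score:  # leave out documents with no concept terms at all
            scores[doc_id] = score
    # Highest frequency first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

A document can rank for the concept without containing the user's exact query word, as long as it contains other terms from the same cluster.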


    Popularity

    • The chart shows the percentage of online searches done by US home and work web surfers in July 2006 that were performed at a particular search engine.


    Current Status

    • The graph shows how the share of searches has changed over the past few months, for those search sites with a share of 5 percent or higher: