Hyper-Searching the Web. Search Engines Basic Search (index) Cluster Search (themes) Meta-search...

Hyper-Searching the Web

Search Engines

Basic Search (index)

Cluster Search (themes)

Meta-search (outsource)

“Smarter” meta-search (themes + outsource)

Basic search engine

• Examples: AltaVista, InfoSeek, HotBot, Lycos, Excite, Google, etc.

• Maintains an index of every word found

• Works in three stages: crawling pages, indexing them, and returning results
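The index-per-word idea can be illustrated with a small inverted index. This is only a sketch: the page URLs and texts are made up, the "crawl" is a hard-coded dictionary, and the names `build_index` and `search` are hypothetical rather than taken from any real engine.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical "crawled" pages; a real crawler would fetch these over HTTP.
pages = {
    "a.example": "jaguar is a big cat",
    "b.example": "the jaguar car company",
    "c.example": "used car listings",
}
index = build_index(pages)
print(search(index, "jaguar car"))
```

The lookup only tells us which pages match; deciding their order is the job of a separate ranking step.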

Basic search engine

• Different ranking systems are used
  - Most use heuristics (the easiest solution), e.g. counting the number of keywords that appear
  - Google uses PageRank
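The keyword-counting heuristic can be sketched in a few lines. The page texts and the function names `keyword_score` and `rank_by_keywords` are assumptions made for illustration; real engines combine many more signals.

```python
def keyword_score(text, query):
    """Count how many words of the text are query keywords."""
    words = text.lower().split()
    keywords = set(query.lower().split())
    return sum(1 for word in words if word in keywords)

def rank_by_keywords(pages, query):
    """Return URLs with a nonzero score, highest keyword count first."""
    scores = {url: keyword_score(text, query) for url, text in pages.items()}
    hits = [url for url in scores if scores[url] > 0]
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(hits, key=lambda url: (-scores[url], url))

# Hypothetical page texts.
pages = {
    "a.example": "car car car for sale",
    "b.example": "history of the car",
    "c.example": "gardening tips",
}
ranking = rank_by_keywords(pages, "car")
print(ranking)
```

A page that simply repeats the keyword rises to the top here, which says nothing about its quality; link-based methods such as PageRank try to fill that gap.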

Basic search engine

• No idea of the searcher’s intent, so the “best” result is hard to achieve

• Problems with synonymy and polysemy
  - Synonymy, e.g. “car” and “automobile”
  - Polysemy, e.g. “jaguar”

• One solution: store semantic relations
  - Can only help with synonymy

• Can’t identify concepts or author intent, e.g. the IBM site does not say “computer”
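To see why stored semantic relations help with synonymy but not polysemy, here is a minimal sketch of query expansion. The `SYNONYMS` table, the documents, and the function names are all hypothetical.

```python
# Hypothetical hand-made table of semantic relations (synonyms only).
SYNONYMS = {
    "car": {"automobile"},
    "automobile": {"car"},
}

def expand_query(query):
    """Return the query words plus any stored synonyms."""
    expanded = set()
    for word in query.lower().split():
        expanded.add(word)
        expanded |= SYNONYMS.get(word, set())
    return expanded

def matches(text, query):
    """True if the text contains any word of the expanded query."""
    return bool(expand_query(query) & set(text.lower().split()))

# Hypothetical documents.
docs = {
    "d1": "automobile repair manual",
    "d2": "jaguar habitat in the rainforest",
    "d3": "jaguar dealership opening hours",
}
car_hits = sorted(d for d, text in docs.items() if matches(text, "car"))
jaguar_hits = sorted(d for d, text in docs.items() if matches(text, "jaguar"))
print(car_hits, jaguar_hits)
```

The query "car" now finds the "automobile" document, but "jaguar" still returns both the animal and the dealership: the table has no way of knowing which sense the searcher meant.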

Cluster search engine

• Example: Clusty

• Clusters results into categories/themes

• Can show results that would be ranked lower in another search engine
  - Because words have different meanings, it can surface the less searched-for senses
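Grouping results into themes might look roughly like the toy below. It assumes a fixed, hand-written theme vocabulary (`THEMES`), which is a simplification: an actual cluster engine would derive its themes from the result text. All names and snippets are invented.

```python
from collections import defaultdict

# Hypothetical theme vocabulary; a real cluster engine would derive themes
# from the result text instead of using a fixed table.
THEMES = {
    "animal": {"habitat", "cat", "rainforest"},
    "car": {"dealership", "engine", "sedan"},
}

def cluster_results(results):
    """Group (url, snippet) results under the theme sharing the most words."""
    clusters = defaultdict(list)
    for url, snippet in results:
        words = set(snippet.lower().split())
        best_theme, best_overlap = "other", 0
        for theme, vocabulary in THEMES.items():
            overlap = len(words & vocabulary)
            if overlap > best_overlap:
                best_theme, best_overlap = theme, overlap
        clusters[best_theme].append(url)
    return dict(clusters)

# Hypothetical result snippets for the query "jaguar".
results = [
    ("r1", "jaguar dealership near you"),
    ("r2", "jaguar sedan engine review"),
    ("r3", "jaguar habitat in the rainforest"),
    ("r4", "jaguar operating system release"),
]
clusters = cluster_results(results)
print(clusters)
```

Even if the car pages dominate a flat ranking, the animal page gets its own visible category here, which is how a less searched-for sense can surface.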

Meta-search engine

• Examples: Dogpile, Surfwax, Copernic, etc.

• Sends the searcher’s query to a database of search engines

• Claimed to be no better than its database; the referenced search engines are often small, free, commercial

• Users can create their own on Google, with up to 5,000 URLs as the “database”
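The "outsourcing" step can be sketched as fanning a query out to several engines and merging what comes back. The engines below are stand-in functions that return fixed lists, and round-robin interleaving is just one plausible merge rule chosen for illustration; it is not how any named meta-search engine is known to work.

```python
def metasearch(query, engines, limit=10):
    """Send the query to every engine and interleave the ranked results.

    `engines` maps a name to a callable returning a ranked list of URLs.
    A duplicate URL is kept only at its first (best) position.
    """
    result_lists = [engine(query) for engine in engines.values()]
    merged, seen = [], set()
    depth = max((len(results) for results in result_lists), default=0)
    for rank in range(depth):
        for results in result_lists:
            if rank < len(results) and results[rank] not in seen:
                seen.add(results[rank])
                merged.append(results[rank])
    return merged[:limit]

# Hypothetical stand-ins for real engines; no network calls are made.
engines = {
    "engine_a": lambda q: ["x.example", "y.example"],
    "engine_b": lambda q: ["y.example", "z.example", "w.example"],
}
merged = metasearch("cat", engines)
print(merged)
```

The merged list can only contain what the underlying engines return, which is the sense in which a meta-search engine is no better than its database.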

“Smarter” meta-search engine

• Example: Clever project (not yet available online)

• Includes clustering and linguistic analysis

[Diagram: the query “cat” is passed to AltaVista, Yahoo, and Google; Clever groups the “cat” results by theme: Cat – feline, Cat – power, Cat – equipment, Cat – scans, etc.]

The Clever Project

• Uses hyperlinks to locate hubs and authorities

“a respected authority is a page that is referred to by many good hubs; a useful hub is a location that points to many valuable authorities”

The Clever Project

• Obtains a list of webpages from a standard index and follows hyperlinks to enlarge its own database
  - The resulting collection is the “root set”
  - Each page gets a numerical hub score and authority score
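Growing the starting list into the root set could be sketched as follows. The link graph is invented, `build_root_set` is a hypothetical name, and following links in both directions (pages that point to a seed page as well as pages a seed page points to) is an assumption based on the hub-and-authority idea.

```python
def build_root_set(seed_pages, links):
    """Add every page that links to or from a seed page.

    `links` maps a page to the list of pages it points to.
    """
    seed_pages = set(seed_pages)
    root_set = set(seed_pages)
    for source, targets in links.items():
        for target in targets:
            if source in seed_pages or target in seed_pages:
                root_set.add(source)
                root_set.add(target)
    return root_set

# Hypothetical link graph; the seed would come from a standard text index.
links = {
    "hub1": ["auth1", "auth2"],
    "auth1": ["other"],
    "unrelated": ["elsewhere"],
}
root_set = build_root_set({"auth1"}, links)
print(sorted(root_set))
```

Only one link step is followed, so `auth2` stays out even though `hub1` points to it: `hub1` joined the set as a neighbour, not as a seed.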

The Clever Project

• Similar to PageRank in how scores are determined: initial guesses refined by repeated calculation
  - Useful by-product: clusters sites

• Adds to competition, because competitors don’t have to acknowledge each other through hyperlinks
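The "guesses and repeated calculation" can be made concrete with a small hub-and-authority iteration, a scheme commonly known as HITS. This is a sketch under assumptions, not Clever's actual code: the link graph, the iteration count, and the normalization choice are all illustrative.

```python
import math

def hubs_and_authorities(links, iterations=20):
    """Iteratively score pages as hubs and authorities.

    `links` maps a page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}    # initial guess
    auth = {p: 1.0 for p in pages}   # initial guess
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages pointing to it.
        new_auth = {p: 0.0 for p in pages}
        for source, targets in links.items():
            for target in targets:
                new_auth[target] += hub[source]
        # Hub score: sum of the authority scores of the pages it points to.
        new_hub = {p: sum(new_auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores stay bounded between rounds.
        auth_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        hub_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / auth_norm for p, v in new_auth.items()}
        hub = {p: v / hub_norm for p, v in new_hub.items()}
    return hub, auth

# Hypothetical link graph: three hub pages pointing at two authorities.
links = {
    "hub1": ["auth1", "auth2"],
    "hub2": ["auth1", "auth2"],
    "hub3": ["auth1"],
}
hub, auth = hubs_and_authorities(links)
print(max(auth, key=auth.get))
```

Note that `auth1` and `auth2` never link to each other, yet both receive authority scores because the hubs point to them; the scores depend on who is pointed at, not on the targets acknowledging one another.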

Clever vs. Google

GOOGLE
- gives initial rankings
- keeps pages independent of queries
- faster
- looks forward, “link to link”

CLEVER
- root sets per keyword
- page priority through query context
- looks forwards and backwards, “hub and authority”
- sometimes too broad, e.g. Fallingwater
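For contrast with the per-query root sets, a query-independent ranking in the style of PageRank can be computed once over the whole link graph. The sketch below is a textbook power iteration, not Google's production method; the graph is invented and the damping value 0.85 is a conventional choice assumed here.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by following links forward, independent of any query.

    `links` maps a page to the list of pages it points to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}   # initial guess: uniform
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank forward, split among its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = pagerank(links)
print(sorted(rank, key=rank.get, reverse=True))
```

Because no query enters the calculation, the scores can be stored and reused for every search, which fits the "faster" and "independent of queries" points in the Google column.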