Best Web Directories and Search Engines
Order Out of Chaos on the World Wide Web
How People Search on the Web
• Input URLs, surf links
• Subject directories
• Search engines
• Metasearch engines
Web Directories
• Small, selective databases
• Created by humans, not machines
• Editors select and place sites into categories for easy retrieval
• User browses categories and links to sites
Why Use Directories?
• Identify quality, major sites
• Get an overview and general information on a topic
• Serendipity in discovery, as a result of manipulating a smaller, more focused file
High-Quality Directories
• Librarians’ Index to the Internet
• Infomine
• Academic Info
• WWW Virtual Library
Top Directories – Less Selective Top Directories – Less Selective
• Yahoo!: 1,800,000+
• Open Directory: 2,600,000+
• LookSmart: 2,500,000+
HyperResearch Guide
How Directories Work
• Browse subject categories
• Funnel: category to topic, web site to page
Example (Yahoo!, LookSmart, Open Directory): Health > Fitness > Yoga > Most popular sites > yogaclass.com (http://www.yogaclass.com/)
Directory Search Boxes
When to use Yahoo Search
• Subject categories don’t match your topic
• You want a broad search of the Web
Why results are different
• A directory search covers only Yahoo’s selected sites
• The search box combines Yahoo directory sites and full Web search results (from a search engine)
Web Search Engines
What Are Search Engines?
Software
• Captures web sites, pages
• Indexes the full text of web pages
• Provides an interface to search web pages
Database
• Large: billions of pages (unlike directories)
• Computer built (robots, spiders)
• No selectivity, no evaluation
Why Use Search Engines?
• You have already identified major sites from a directory
• You could find very little in a directory
• You want everything: comprehensive information on a topic
• Note: you need to judge the quality of sites, since engines are NOT selective
How Search Engines Work
• Spiders comb, “capture” web pages
• Software builds database
• Words from web pages “indexed”
• Search interface finds words on pages
• Engine ranks, describes results
• How engines and directories differ
Spiders Comb, Capture Web Pages
• Software decides which web pages to collect
• Spiders check for updated pages
• Spiders remove dead sites
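The three spider tasks listed on this slide (choosing pages to collect, checking for updates, removing dead sites) can be sketched in Python. This is a minimal illustration under stated assumptions: the `WEB` dictionary is a hypothetical stand-in for the live Web (no HTTP requests are made), a page's "last modified" value is just an integer, and the `crawl` and `recrawl` names are invented for this example. Real spiders are far more elaborate.

```python
from collections import deque

# Hypothetical stand-in for the Web: URL -> (last_modified, outgoing links).
# A real spider would fetch pages over HTTP; this sketch models only the bookkeeping.
WEB = {
    "a.example": (3, ["b.example", "c.example"]),
    "b.example": (1, ["a.example"]),
    "c.example": (5, ["d.example"]),  # d.example does not exist: a dead link
}


def crawl(seeds, max_pages):
    """Breadth-first crawl that collects at most max_pages live pages."""
    collected = {}  # URL -> last_modified value seen at crawl time
    queue = deque(seeds)
    seen = set(seeds)
    while queue and len(collected) < max_pages:
        url = queue.popleft()
        page = WEB.get(url)
        if page is None:  # dead link: nothing to collect
            continue
        modified, links = page
        collected[url] = modified
        for link in links:  # queue links the spider has not seen yet
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected


def recrawl(database):
    """Revisit stored pages: refresh updated ones and drop dead ones."""
    refreshed, updated = {}, []
    for url, stored_modified in database.items():
        page = WEB.get(url)
        if page is None:  # site has disappeared: remove it from the database
            continue
        if page[0] != stored_modified:  # page changed since the last visit
            updated.append(url)
        refreshed[url] = page[0]
    return refreshed, updated


db = crawl(["a.example"], max_pages=10)
print(db)  # {'a.example': 3, 'b.example': 1, 'c.example': 5}
```

Here the `max_pages` cap stands in for the software's decision about which pages to collect; real engines use much richer selection policies.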
Spider Software Builds Database
• Current web size: over 15 billion pages
• No engine’s database covers it all
• Google covers 29% (4.3 billion+)
• AlltheWeb covers 21% (3.2 billion+)
• HotBot covers 20% (3 billion+)
• Teoma covers 10% (1.5 billion)
Words from Web Pages “Indexed”
• The “index” is a list of words in the database, linked to the words in Web pages
• Some engines index the full text of a document
• Some index only part of the text:
  • The first 100 words in the document
  • Words in the abstract or title of the document
• How an engine indexes affects search results
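To see why indexing choices change results, here is a small inverted index sketched in Python. It is an illustration, not any particular engine's method: the tokenizer, the `build_index` name, and the `max_words` parameter (used to mimic an engine that indexes only the first 100 words) are assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages, max_words=None):
    """Build an inverted index: each word maps to the set of pages containing it.

    max_words=None indexes the full text; max_words=100 mimics a hypothetical
    engine that indexes only the first 100 words of each document.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        words = tokenize(text)
        if max_words is not None:
            words = words[:max_words]  # partial indexing: ignore the rest
        for word in words:
            index[word].add(url)
    return index


pages = {
    "p1": "yoga " + "filler " * 150 + "meditation",  # "meditation" is word 152
    "p2": "meditation retreat",
}
full = build_index(pages)
partial = build_index(pages, max_words=100)
print(sorted(full["meditation"]))     # ['p1', 'p2']
print(sorted(partial["meditation"]))  # ['p2']
```

A search for "meditation" finds page p1 only in the full-text index, because in p1 the word appears after the 100-word cutoff.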
Search Interface Finds Web Pages
• Provides keyword search box
• Offers simple or advanced searching
• Offers search options to affect results:
  • Most assume AND between words: Russian mafia
  • Most accept “quotes” to search a PHRASE: “Russian mafia”
  • Most allow FIELD searches: ti:Russian mafia
AlltheWeb
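The three search options on this slide (implicit AND, quoted phrase, field search) can be modeled with a toy matcher in Python. This is a hedged sketch: the `PAGES` data and the `search` function are hypothetical, and the `ti:` prefix simply follows the slide's example. Real engines use their own field syntax (Google, for instance, uses `intitle:`) and differ on whether a field restriction covers one word or the whole query; here it covers every word.

```python
import re

# Hypothetical pages, each with a title and body text.
PAGES = {
    "p1": {"title": "Russian mafia history", "body": "The Russian mafia grew after 1991."},
    "p2": {"title": "Mafia films", "body": "Russian cinema and Italian mafia movies."},
    "p3": {"title": "Travel in Russia", "body": "Russian food and culture."},
}


def words(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text, phrase):
    """True if the words of phrase appear consecutively, in order, in text."""
    tokens, target = words(text), words(phrase)
    n = len(target)
    return any(tokens[i:i + n] == target for i in range(len(tokens) - n + 1))


def search(query):
    """Match plain words (implicit AND), a "quoted phrase", or ti: (title only)."""
    field = None
    if query.startswith("ti:"):
        field, query = "title", query[3:]
    is_phrase = len(query) > 1 and query.startswith('"') and query.endswith('"')
    query = query.strip('"')
    hits = []
    for url, page in PAGES.items():
        text = page["title"] if field == "title" else page["title"] + " " + page["body"]
        if is_phrase:
            match = contains_phrase(text, query)
        else:
            text_words = set(words(text))
            match = all(w in text_words for w in words(query))  # implicit AND
        if match:
            hits.append(url)
    return hits


print(search("Russian mafia"))     # ['p1', 'p2']  both words, anywhere
print(search('"Russian mafia"'))   # ['p1']  exact phrase only
print(search("ti:Russian mafia"))  # ['p1']  both words in the title
```

Page p2 contains both words but never side by side, so the phrase search drops it; the title search drops it because "Russian" is only in its body.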
Engine Ranks, Describes Results
• Software lists the most “relevant” items first
  • Word popularity: word repetitions, location
  • Site popularity: visits to the web site
  • Link popularity: how often other pages link to (cite) the site
• Results are described
  • A few words to a paragraph
  • Sometimes stars or other indicators of relevancy
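A toy relevance score combining two of the factors on this slide (word popularity, meaning repetitions and location, plus link popularity) might look like the Python sketch below. Real engines' ranking formulas are proprietary and far more complex; the weights, the `score` and `rank` names, and the inbound-link counts are hypothetical. Site popularity (visits) is left out for brevity.

```python
import re


def words(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(page, query, inbound_links, title_weight=3, link_weight=0.5):
    """Toy relevance score; the weights are arbitrary illustrative assumptions.

    - word repetitions: how often each query word occurs in the body
    - word location: a bonus for each query word that appears in the title
    - link popularity: how many other pages link to this one
    """
    terms = words(query)
    body, title = words(page["body"]), words(page["title"])
    repetitions = sum(body.count(t) for t in terms)
    location = sum(title_weight for t in terms if t in title)
    return repetitions + location + link_weight * inbound_links


def rank(pages, inbound, query):
    """Return page IDs, highest score first; inbound maps page ID -> link count."""
    return sorted(pages, key=lambda p: score(pages[p], query, inbound.get(p, 0)), reverse=True)


pages = {
    "a": {"title": "Yoga basics", "body": "yoga yoga poses"},
    "b": {"title": "Fitness", "body": "yoga once"},
    "c": {"title": "Cooking", "body": "recipes"},
}
print(rank(pages, {}, "yoga"))         # ['a', 'b', 'c']
print(rank(pages, {"b": 20}, "yoga"))  # ['b', 'a', 'c']
```

With no inbound links, page a wins on repetitions and title placement; giving page b 20 inbound links lifts it to the top, which shows how link popularity can outweigh on-page word counts.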
How Engines and Directories Differ
Computers vs. people
• Engine spiders, not editors, select documents
Quantity vs. quality
• Engines big: want all, accept anything
• Directories small: want “best,” “important”
Technology vs. human judgment
• Engine software ranks; no human evaluation
Top Search Engines Top Search Engines
• Google: 4.2 billion+
• AlltheWeb: 3.2 billion+
• HotBot (Inktomi): 3 billion+
• Teoma: 1.5 billion+
HyperResearch Guide
Metasearch Engines
Technologies that search several search engines at the same time
Pros
• Increase results when a search engine produces little
• Save time by searching several engines at once
• Show results of several engines on one page
Cons
• Retrieve too many hits
• Retrieve less relevant results
• Do not individualize search syntax for all the engines they search
  • Do not know whether an engine expects and, AND, or +, or expects or vs. OR; cannot interpret phrase searches, title searches, etc.
• Exclude certain large engines like Google
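The core job of a metasearch engine, merging several engines' ranked lists into one page, can be sketched with a simple Borda-style rank fusion in Python. This is an illustrative assumption, not how Dogpile, Vivisimo, or Ez2find actually merge results, and the engine names and URLs are hypothetical. The sketch starts from result lists each engine has already returned; it does not translate query syntax per engine, which is the limitation listed under Cons.

```python
from collections import defaultdict


def merge_results(result_lists, depth=10):
    """Fuse ranked URL lists from several engines into one list (Borda-style count).

    A URL at position i (0-based) in an engine's list earns depth - i points;
    URLs returned by several engines add up their points, and duplicates
    collapse into a single entry.
    """
    points = defaultdict(int)
    for results in result_lists.values():
        for i, url in enumerate(results[:depth]):
            points[url] += depth - i
    # Highest total first; ties broken alphabetically so the output is stable.
    return sorted(points, key=lambda url: (-points[url], url))


# Hypothetical result lists already returned by two engines for the same query.
engines = {
    "engine_a": ["x.example", "y.example", "z.example"],
    "engine_b": ["y.example", "w.example"],
}
print(merge_results(engines))  # ['y.example', 'x.example', 'w.example', 'z.example']
```

The URL returned by both engines rises to the top, and the `depth` parameter limits how far down each engine's list is counted.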
Top Metasearch Engines
• Dogpile: refines results, covers major engines
• Vivisimo: categorizes results, narrows topics
• Ez2find: includes most major engines
A Few Words About the Web and Search Engines
What’s in Search Engines?
• Business, commercial information
• Organizational publications
• Government resources
• Some magazine, newspaper articles
• Some scholarly information
• Teaching materials, unpublished articles
• Books, articles whose copyright has expired
What’s Not in Search Engines
• Books under copyright
  • Most fiction, nonfiction in existence
• Journal, magazine, newspaper articles
  • Most current and past research
• Reference materials
  • Recent, quality, expensive encyclopedias, handbooks, business advisory services, etc.
In short: the bulk of human knowledge and research
Search Tips
• Check “advanced” search and options
• Learn about AND, OR, ANY, ALL, PHRASE
• Know how to search in titles, URLs
• Spell it right
• Switch engines, get different results
• Keep up to date about search engines:
  • Newspapers and magazines
  • Library web sites
Evaluating Web Sites
Accuracy
• Is the information reliable? What does the URL tell you (.com, .org, .gov, .edu)?
Authority
• Author’s credentials? Address, email given?
Content and Currency
• Purpose of site: inform, sell, propagandize? Date?
Documentation
• Are sources given, footnotes?
Find and Evaluate
• Use Google to find the Web site titled: The Burmese Mountain Dog
• Evaluate this site for:
  • Accuracy
  • Authority
  • Content and Currency
  • Documentation
• Is it a trustworthy Web site?