The Invisible Web

When Google Isn’t Enough! Finding Information on the Invisible Web

Yaacov Taube

Transcript of The Invisible Web

Page 1: The Invisible Web

When Google Isn’t Enough!

Finding Information on the Invisible Web

Yaacov Taube

Page 2: The Invisible Web

What is the Visible (Surface) Web?

“It’s made up of HTML Web pages that the search engines have chosen to include in their indices. It’s no more complicated than that.”

Sherman and Price.

Page 3: The Invisible Web

What is the Visible (Surface) Web?

• A collection of webpages
• Searchable with “search engines”
• What you and I think of as the “Internet” is actually only a small portion of the Internet

Page 4: The Invisible Web

What is the Visible (Surface) Web?

•High volume

•Mass appeal

•High value

•Small percentage of web content
  – Exception: Google Books and Google Scholar

Page 5: The Invisible Web

What is the Invisible Web?

• What search engines do not search
• Searchable Databases
  – Tens of thousands
  – Accessible and searchable via the Internet
  – Results often dynamically generated in specific response to your request (eBay, MapQuest, etc.)

Page 6: The Invisible Web

What is the Invisible Web?

• Excluded Pages
  – Excluded per search engine
  – Excluded per webpage by the owner of the site
• Typically databases
  – Businesses
  – Governments
  – Schools
  – Libraries
  – Associations

Page 7: The Invisible Web

What is the Invisible Web?

• Academic
• Never been indexed or linked
• Uniquely generated pages
• Proprietary
• Confidential
• Protected by username & password
• Constitutes the majority of the webpages on the Internet

Page 8: The Invisible Web

Visible vs. Invisible Web

• The Invisible Web is about 550 times larger than the visible Web and is growing much faster
• The deep Web consists of about 91,000 terabytes
• The surface Web is only about 167 terabytes [1]
• The Library of Congress contains about 11 terabytes
• Quality content is 1,000 to 2,000 times greater than that of the surface Web
• 95% of the Deep Web is accessible to the public (no fees or subscription required)

[1] Based on extrapolations from a study done at University of California, Berkeley
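
As a quick sanity check on the figures quoted on this slide, the size ratios can be worked out directly. The sketch below only restates the slide's own numbers (91,000, 167 and 11 terabytes); the constant and function names are illustrative and not part of the original presentation.

```python
# Sizes in terabytes, exactly as quoted on the slide.
DEEP_WEB_TB = 91_000
SURFACE_WEB_TB = 167
LIBRARY_OF_CONGRESS_TB = 11


def times_larger(bigger_tb: float, smaller_tb: float) -> float:
    """Return how many times larger the first size is than the second."""
    return bigger_tb / smaller_tb


deep_vs_surface = times_larger(DEEP_WEB_TB, SURFACE_WEB_TB)
deep_vs_library = times_larger(DEEP_WEB_TB, LIBRARY_OF_CONGRESS_TB)

print(f"Deep Web vs. surface Web: about {deep_vs_surface:.0f}x")
print(f"Deep Web vs. Library of Congress: about {deep_vs_library:.0f}x")
```

The first ratio comes out near 545, which is consistent with the slide's rounded claim of "about 550 times larger".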

Page 9: The Invisible Web

What is on the Invisible Web

Opaque Web

Private Web

Proprietary Web

Pay per click

Page 10: The Invisible Web

Why can’t Google find it?

• Requires payment

• Requires registration

• Dynamically generated

• Very new

• Website specifically stops spiders

Page 11: The Invisible Web

Opaque Web

• Fixed, or could be indexed, but is not

• Deemed not important enough

• Too new and therefore not linked

• Never makes max results cutoff

• No one ever linked or submitted URL

Page 12: The Invisible Web

Private Web

• Deliberately excluded
  – Password
  – Special coding in website stops spiders
• Only for select individuals
  – Employees
  – Students
  – Researchers
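
The "special coding" that stops spiders is most commonly a robots.txt file placed at the root of a site. The sketch below uses Python's standard urllib.robotparser module to show how a well-behaved crawler would interpret such a file. The robots.txt contents, the example.org URLs and the "ExampleBot" user agent are made up for illustration and do not come from the presentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /staff/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staff/
"""


def is_crawlable(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


for page in ("https://example.org/catalog/index.html",
             "https://example.org/staff/payroll.html"):
    status = "may be indexed" if is_crawlable(ROBOTS_TXT, "ExampleBot", page) else "excluded"
    print(f"{page}: {status}")
```

Pages under /staff/ remain reachable by anyone who knows the address, but a search engine that honors the file will never add them to its index, which is how they end up on the Private Web.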

Page 13: The Invisible Web

Proprietary Web

• Protected
  – Password
  – Registration (N.Y. Times, eBay, banks, etc.)
  – Terms of Use
• Anyone can access if you
  – Pay
  – Register
  – Agree to terms

Page 14: The Invisible Web

Pay per click

Search Engine Marketing tools

Ex: overture.com, FindWhat.com

Page 15: The Invisible Web

When do I use ….

• Portal or Directory?

• Search Engine?

• Invisible Web?

Page 16: The Invisible Web

Portal or Directory

• You have a general topic
• You know little about the subject
• You do not know keywords
• You want someone or something to have sorted out the junk
• You need an exploratory overview

Page 17: The Invisible Web

Search Engine

• You are looking for something specific
• You have keywords
• You are pretty sure the information is
  – advertised or
  – otherwise generally disseminated

Page 18: The Invisible Web

Tips for search engines

• Use a toolbar
• Determine the key words/phrases most likely to be in your document and nowhere else
• Learn and use Boolean Operators
• Scan results
• Question the results
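
To make the Boolean operators tip concrete, here is a minimal sketch of how AND, OR and NOT narrow or widen a result set. It runs against three made-up documents held in memory; the document texts, the search function and its parameter names are illustrative assumptions, and real search engines each have their own query syntax.

```python
# Three made-up documents standing in for indexed web pages.
DOCS = {
    "a": "patent law database for inventors",
    "b": "patent search tips and tricks",
    "c": "family law reference guide",
}


def search(docs, all_of=(), any_of=(), none_of=()):
    """Return ids of documents matching AND (all_of), OR (any_of), NOT (none_of)."""
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        if not all(term.lower() in words for term in all_of):
            continue  # AND: every required term must be present
        if any_of and not any(term.lower() in words for term in any_of):
            continue  # OR: at least one of the optional terms must be present
        if any(term.lower() in words for term in none_of):
            continue  # NOT: none of the excluded terms may be present
        hits.append(doc_id)
    return sorted(hits)


print(search(DOCS, all_of=["patent", "law"]))           # patent AND law
print(search(DOCS, any_of=["patent", "law"]))           # patent OR law
print(search(DOCS, all_of=["patent"], none_of=["tips"]))  # patent NOT tips
```

AND and NOT shrink the result list, while OR widens it. That is why picking words "most likely to be in your document and nowhere else" pairs well with AND.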

Page 19: The Invisible Web

Invisible Web

• You are pretty sure the information is in a specific database
• You need something authoritative
• You need speed
• The information is dynamically generated
• You are familiar with the database
  – Search techniques
  – Protocols
  – Access requirements

Page 20: The Invisible Web

Searching the Invisible Web

• Directories – subject guides compiled by human editors
• Specialized Search Engines
  – http://library.albany.edu/internet/choose.html
• Special Databases (Library of Congress, LookSmart’s Find Articles, National Science Digital Library, Singing Fish)

Page 21: The Invisible Web

Special Databases

• Library of Congress
  – http://catalog.loc.gov
• LookSmart’s Find Articles (over 900 publications)
  – http://www.findarticles.com
• National Science Digital Library
  – http://www.nsdl.org
• Singing Fish – audio and video
  – http://www.singingfish.com

Page 22: The Invisible Web

Types of Databases

Information stored in tables (Access, Oracle, SQL Server, DB2) and accessible only by query.

Examples:
• Phone books, people finders
• Patents, laws
• Items for sale in a Web store or Web-based auctions
• Digital exhibits
• Multimedia and graphical files
• Stock and bond prices
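
The point that table-stored information is "accessible only by query" can be illustrated with a small sketch. It uses Python's built-in sqlite3 module in place of the products named on the slide, and the table, its rows and the function names are invented for the example. The results "page" exists only as the answer to a specific request, so a spider that merely follows links never sees it.

```python
import sqlite3


def build_catalog() -> sqlite3.Connection:
    """Create a tiny in-memory store catalog (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (title TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [("Antique map", 40.0), ("Road atlas", 12.5), ("Star chart", 18.0)],
    )
    return conn


def results_page(conn: sqlite3.Connection, keyword: str) -> str:
    """Generate a results 'page' on demand for one visitor's keyword."""
    rows = conn.execute(
        "SELECT title, price FROM items WHERE title LIKE ? ORDER BY title",
        (f"%{keyword}%",),
    ).fetchall()
    return "\n".join(f"{title}: ${price:.2f}" for title, price in rows)


catalog = build_catalog()
print(results_page(catalog, "map"))
```

Nothing here is stored as a fixed HTML file. Each keyword produces a different page, which is the "dynamically generated" content described on the earlier slides.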

Page 23: The Invisible Web

Types of Hidden Info

• Pages in searchable databases: medical (WebMD.com), patent, scientific, legal (Lexis and Westlaw), reference
• Pages requiring login or registration: social sites, New York Times, web-based applications, calendars, Google Docs, etc.
• Government publications or databases: ERIC, usa.gov
• Online databases: Gale Research
• PDF files, audio, video, any new format

Page 24: The Invisible Web

More hidden stuff

• Dictionaries and thesauri
• Sites that require forms to be filled out (ex: travel directions, job hunting)
• Product catalogs and library catalogs
• Newspaper and magazine archives
• Dynamic web pages (ex: airline flight checkers, MapQuest)
• Interactive tools (ex: calculators & measurement converters)

Page 25: The Invisible Web

Access to invisible web is improving …

Google Books http://books.google.com/

Google Scholar http://scholar.google.co.il/

Page 26: The Invisible Web

Maybe Consider …

• Specialized Databases such as Dialog, LexisNexis, Factiva, etc. (not cheap)

• Use an Information Professional www.aiip.org

Page 27: The Invisible Web

To Conclude …

Focus and continue doing what you do best and what you have been trained for and let an Information Professional find the info you need.

He is trained to do it faster, more effectively and efficiently than you or one of your employees. (www.aiip.org)