Search engines and digital libraries
8/14/2019 Search engines and digital libraries
http://slidepdf.com/reader/full/search-engines-and-digital-libraries 1/42
© Tefko Saracevic 1
part 1: search engines
part 2: digital libraries
dictionary definitions
search: COMPUTING (transitive verb) to examine a computer file, disk, database, or network for particular information
engine: something that supplies the driving force or energy to a movement, system, or trend
search engine: a computer program that searches for particular keywords and returns a list of documents in which they were found, especially a commercial service that scans documents on the Internet
about definition of search engines
• oh well …
  - search engines do not search only for keywords, some search for other stuff as well
• and they are really not “engines” in the classical sense
  - but then mouse is not a “mouse”
use of search engines… among others
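How Search Engines Work (Sherman 2003)
[Diagram: the Web (URL1, URL2, URL3, URL4) feeds a Crawler and an Indexer, which build the Search Engine Database; Your Browser sends the query “Eggs?” and gets back ranked matches: Eggs - 90%, Eggo - 81%, Ego - 40%, Huh? - 10%; sample page: “All About Eggs” by S. I. Am]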
how do search engines work? elaboration
• crawlers, spiders: go out to find content
  - in various ways go through the web looking for new & changed sites
  - periodic, not for each query
    - no search engine works in real time
  - some search engines do it for themselves, others not
    - buy content from companies such as Inktomi
  - for a number of reasons crawlers do not cover all of the web – just a fraction
  - what is not covered is “invisible web”
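A minimal sketch of the crawling idea above, assuming a toy in-memory web instead of real HTTP requests. The names `TOY_WEB` and `crawl` are made up for illustration; no actual engine is claimed to work exactly this way. It shows why a page that nothing links to stays "invisible" to a link-following crawler.

```python
from collections import deque

# Hypothetical toy web: each URL maps to the URLs it links to.
TOY_WEB = {
    "url1": ["url2", "url3"],
    "url2": ["url3"],
    "url3": ["url1", "url4"],
    "url4": [],
    "hidden": ["url1"],  # nothing links here, so link-following never finds it
}

def crawl(web, seeds, max_pages=100):
    """Breadth-first crawl: follow links from the seeds, visiting each page once."""
    seen = set(seeds)
    queue = deque(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in web.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

crawled = crawl(TOY_WEB, ["url1"])
invisible = sorted(set(TOY_WEB) - set(crawled))
print(crawled)    # pages reached by following links
print(invisible)  # pages the crawl never saw
```

A real crawler would also re-visit sites periodically to pick up changes, which this sketch leaves out.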
elaboration …
• organizing content: labeling, arranging
  - indexing for searching – automatic
    - keywords and other fields
    - arranging by URL popularity - PageRank as Google
  - classifying as directory
    - mostly human handpicked & classified
• as a result of different organization we have basically two kinds of search engines:
  - search – input is a query that is then searched & displayed
  - directory – classified content – a class is displayed
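Automatic keyword indexing is commonly done with an inverted index: a map from each term to the documents that contain it. The sketch below is an illustration only; the documents in `DOCS` and the helpers `tokenize` and `build_index` are assumptions, not taken from any particular engine.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map every term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

# Hypothetical documents standing in for crawled pages.
DOCS = {
    "d1": "All about eggs",
    "d2": "Eggs and ham",
    "d3": "Digital libraries",
}

index = build_index(DOCS)
print(sorted(index["eggs"]))       # documents that mention "eggs"
print(sorted(index["libraries"]))  # documents that mention "libraries"
```

A directory, by contrast, would store a human-assigned class for each page rather than deriving terms automatically.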
elaboration (cont.)
• databases, caches: storing content
  - humongous files usually distributed over many computers
• query processor: searching, retrieval, display
  - takes your query as input
    - engines have differing rules how handled
  - displays ranked output
    - some engines also cluster output and provide visualization
• at the other end is your browser
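A rough sketch of what a query processor does: take a query, score each document, and return ranked output. The scoring rule here (count of query-term occurrences) is a deliberately simple assumption; real engines use their own, mostly unpublished, rules. `DOCS` and `search` are hypothetical names.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def search(docs, query):
    """Return (doc_id, score) pairs, best first; score = query-term occurrences."""
    query_terms = tokenize(query)
    scored = []
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by document id for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical stored content.
DOCS = {
    "d1": "eggs eggs and more eggs",
    "d2": "green eggs and ham",
    "d3": "digital libraries",
}

results = search(DOCS, "eggs ham")
print(results)
```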
elaboration … similarities, differences
• all search engines have these basic parts in common
• BUT the actual processes – methods how they do it – are based on various algorithms & they differ
  - most are proprietary with details kept mostly secret but based on well known principles from information retrieval or classification
  - to some extent Google is an exception – they published their method
case of Google
• developed by Sergey Brin and Lawrence Page while students at Stanford
  - in the beginning run on Stanford computers
• basic approach has been described in their famous paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine”
  - well written, simple language, has their pictures
  - in acknowledgement they cite the support by NSF’s Digital Library Initiative, i.e. initially, Google came out of government sponsored research
  - describe their method PageRank - based on ranking hyperlinks as in citation indexing
  - “We chose our system name, Google, because it is a common spelling of googol, or 10^100”
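The link-ranking idea behind PageRank can be sketched with a short power iteration. This is one common normalized formulation (ranks sum to 1), not a reproduction of Google's production system; the damping value 0.85, the handling of pages without outlinks, and the toy graph `LINKS` are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively estimate rank; `links` maps each page to the pages it links to.

    Every link target must also appear as a key in `links`.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank on in equal shares to the pages it cites.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outlinks spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical link graph: "c" is linked to by three pages, "d" by none.
LINKS = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

As in citation indexing, a page linked to by many (and by highly ranked) pages ends up on top, while a page nobody links to gets only the baseline share.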
coverage differences
• no engine covers more than a fraction of WWW
  - estimates: none more than 16%
  - hard (even impossible) to discern & compare coverage, but they differ substantially in what they cover
• in addition:
  - many national search engines
    - own coverage, orientation, governance
  - many specialized or domain search engines
    - own coverage geared to subject of interest
  - many comprehensive sources independent of search engines
    - some have compilations of evaluated web sources
searching differences
• substantial differences among search engines on searching, retrieval, display
  - need to know how they work & differ in respect to
    - defaults in searching a query
    - searching of phrases, case sensitivity, categories
    - searching of different fields, formats, types of resources
    - advanced search capabilities and features
    - possibilities for refinement, using relevance feedback
    - display options
    - personalization options
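To see why the defaults matter, here is a small sketch in which the same document matches or fails to match depending on whether the query terms are combined with AND or OR, treated as a phrase, or compared case-sensitively. The function `matches` and its `mode` parameter are hypothetical, not any engine's actual interface.

```python
import re

def tokenize(text, case_sensitive=False):
    """Split text into word tokens, lower-casing unless case matters."""
    if not case_sensitive:
        text = text.lower()
    return re.findall(r"[A-Za-z0-9]+", text)

def matches(doc, query, mode="and", case_sensitive=False):
    """Check one document against a query under a given matching rule."""
    doc_tokens = tokenize(doc, case_sensitive)
    query_tokens = tokenize(query, case_sensitive)
    if mode == "and":      # every query term must occur
        return all(t in doc_tokens for t in query_tokens)
    if mode == "or":       # at least one query term must occur
        return any(t in doc_tokens for t in query_tokens)
    if mode == "phrase":   # query terms must occur next to each other, in order
        n = len(query_tokens)
        return any(doc_tokens[i:i + n] == query_tokens
                   for i in range(len(doc_tokens) - n + 1))
    raise ValueError("unknown mode: " + mode)

DOC = "Digital Libraries and search engines"
and_hit = matches(DOC, "digital archives", mode="and")
or_hit = matches(DOC, "digital archives", mode="or")
phrase_hit = matches(DOC, "digital engines", mode="phrase")
case_hit = matches(DOC, "digital", case_sensitive=True)
print(and_hit, or_hit, phrase_hit, case_hit)
```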
business model differences
several business models
• public good - have independent budget
  - e.g. PubMed, Librarians’ Index to Internet
• earn revenue from provision of information
  - all commercial search engines
• using search engines to promote their other activities
  - e.g. telephone directories
sponsorship differences
• need to understand treatment of sponsorship – they influence what they search & how they display results
  - some list separately results from sponsored sites so you are reasonably clear what is there because it is sponsored & what is not
  - some have display-per-pay - showing first sites that paid most & do not even tell you that
  - some have pay per update of sites
• imperative to find sources that explain these models for different engines to
limitations
• every search engine has limitations as to
  - coverage
    - meta engines just follow coverage limitations & have more of their own
  - search capabilities
  - finding quality information
• some have compromised search with economics
  - becoming little more than advertisers
• but search engines are also many times victims of spamindexing
  - affecting what is included and how ranked
spamming a search engine
• use of techniques that push rankings higher than they belong is also called spamdexing
  - methods typically include textual as well as link-based techniques
  - like e-mail spam, search engine spam is a form of adversarial information retrieval
    - the conflicting goals of accurate results of search providers & high positioning by content providers
meta search engines
• meta engines search multiple engines
  - getting combined results from a variety of engines
• do not have their own databases
  - but have their own business models affecting results
• a number of techniques used
  - interesting ones: clustering, statistical analyses
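One way a meta engine could combine ranked lists from several engines is reciprocal rank fusion. This technique is swapped in here purely as an illustration; the slides do not say which merging method any particular meta engine uses. The engine names, result lists, and the constant `k=60` are assumptions.

```python
from collections import defaultdict

def fuse(result_lists, k=60):
    """Reciprocal rank fusion: each engine adds 1 / (k + rank) to a URL's score."""
    scores = defaultdict(float)
    for results in result_lists.values():
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Best combined score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical ranked results returned by three underlying engines.
ENGINE_RESULTS = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u1", "u4"],
    "engine_c": ["u2", "u5"],
}

merged = fuse(ENGINE_RESULTS)
print(merged)
```

URLs returned near the top by several engines rise in the merged list, which also makes the overlap between engines easy to see.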
how to find a search engine?
• variety of resources that list or categorize engines
• SearchEngines.com
  - search for engines by topic, geography, reference
• Search Engine Guide
  - engines categorized by topic; other engine information
• Search Engine Colossus
  - international directory of search engines by country, topic
  - from 198 countries and 61 territories; engines in choice of languages
• Phil Bradley’s country based search engines
  - over 2000 search engines from countries all over the globe
sample of meta engines - with organized results
Dogpile
  - results from a number of leading search engines; gives source, so overlap can be compared; (has also a (bad) joke of the day)
Surfwax
  - gives statistics and text sources & linking to sources; for some terms gives related terms to focus
Teoma
  - results with suggestions for narrowing; links resources derived; originated at Rutgers
Turbo10
  - provides results in clusters; engines searched can be edited
meta search engines (cont.)
• Large directory
  - Complete Planet
    - directory of over 70,000 databases & specialty engines
• Results with graphical displays
  - Vivisimo
    - clusters results; innovative
  - Webbrain
    - results in tree structure – fun to use
  - Kartoo
    - results in display by topics of query
domain engines & catalogs
• cover specific subjects & topics
• important tool for subject searches
  - particularly for subject specialists
  - valued by professional searchers
• selection mostly hand-picked rather than by crawlers, following inclusion criteria
  - often not readily discernible
  - but content more trustworthy
domain engines … sample
Open Directory Project
  - large edited catalog of the web – global, run by volunteers
BUBL LINK
  - selected Internet resources covering all academic subject areas; organized by Dewey Decimal System – from UK
Profusion
  - search in categories for resources & search engines
Resource Discovery Network – UK
  - “UK's free national gateway to Internet resources for the learning, teaching and research”
domain engines … sample
Think Quest – Oracle Education Foundation
  - education resources, programs; web sites created by students
All Music Guide
  - resource about musicians, albums, and songs
Internet Movie Database
  - treasure trove of American and British movies
Genealogy links and surname search engines
  - well.. that is getting really specialized (and popular)
Daypop
  - searches the “living web”: “The living web is composed of sites that update on a daily basis: newspapers, online magazines and weblogs”
science, scholarship engines … sample free access
Psychcrawler - Amer Psychological Association
  - web index for psychology
Entrez PubMed – Nat Library of Medicine
  - biomedical literature from MEDLINE & health journals
CiteSeer - NEC Research Center
  - scientific literature, citations index; strong in computer science
Scholar Google
  - searches for scholarly articles & resources
Infomine
  - scholarly internet research collections
Scirus
  - scientific information in journals & on the web
science, scholarship engines … sample commercial access
• an addition to freely accessible engines
  - many provide search free but access to full text paid
    - by subscription or per item
  - RUL provides access to these & many more:
ScienceDirect
  - Elsevier: “world's largest electronic collection of science, technology and medicine full text and bibliographic information”
ACM Portal
  - Assoc. for Computing Machinery: access to ACM Digital Library & Guide to Computing
where to find out?
• information about search engines in sources that have updates, news, tips for searching and more – a MUST for searchers:
Search Engine Watch
  - ratings, news, statistics, charts, explanations, tutorials
Search Engine Showdown
  - “The users’ guide to web searching” - run by a librarian, news links, ratings
Virtual Chase
  - a site about “Teaching Legal Professionals How To Do Research”; this section has very good tips and links for consideration of quality on the web
where? ….
SiteLines
  - a blog, written by Rita Vine, a professional librarian, & web search trainer; many evaluations in archive
ResourceShelf
  - “Resources and News for Information Professionals,” edited by Gary Price, a librarian & author of Invisible Web – has extensive archive
WebsearchAbout
  - not evaluative, but provides news, capabilities, sources, articles about web searching
art of searching search engines
part 2: digital libraries
definition
• digital libraries are viewed from several perspectives
  - technical: “Digital library is a managed collection of information, with associated services, where information is stored in digital format and accessible over a network.” (Arms, 2000)
  - institutional: “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are
a bit of context
• short but volatile history
  - research & development took off by start/mid 1990’s
  - in the next decade phenomenal growth worldwide
  - large investment in research & building
• number of communities involved
  - computer science, primarily in research
  - many subjects: digital libraries in their domain
  - library & information science: operations, studies of users, use, usability
libraries & digital resources
• libraries (particularly research, academic & special) directed massive funding toward such resources
  - electronic journals
  - databases
  - catalogs
  - digitization of parts of collection
• thus becoming in effect digital libraries – or more accurately hybrid libraries
  - with graphic and digital versions or types of resources
emphasis here
• on large academic or research digital libraries that also are related to searching
  - provide search capabilities or access to search engines
  - provide electronic journals that provide full text of articles after a search
• such libraries have become also search portals of sort, essential for their users in education, research & related activities
sample
New York Public Library Digital
  - “NYPL Digital is your gateway to The New York Public Library’s rare and unique collections in digitized form.” Includes access to searchable databases
U California Berkeley Digital Library SUNsite
  - “builds digital collections and services while providing information and support to digital library developers worldwide.”
The British Library
  - “The world’s knowledge.” Includes “Services for library and information Professionals.”
Los Angeles Public Library Kids’ Path
  - resources for children; search through directory
sample …
New Zealand Digital Library
  - searching of a number of digital collections, including humanity development library
Research Library Group
  - “RLG is a not-for-profit organization of over 150 research libraries, archives, museums, and other cultural memory institutions.” Includes links to a number of searchable collections
Public Library of Science
  - “PLoS is a nonprofit organization of scientists and physicians committed to making the world's scientific and medical literature a public resource.” Publishes open access journals
Rutgers libraries – digital components
• strategic planning in developing digital access
• rich & complex content of digital resources
  - several hundred indexes & databases for searching
  - some 20,000 electronic journals
  - thousand & more digital reference sources
  - subject research guides
  - Searchpath & other tutorials
  - electronic reserve
• affected teaching, learning, research b
some critical issues for searching
• no way yet to do federated searching in digital libraries
  - to search several indexes at the same time
  - each source has to be searched separately
  - most have very different search features, capabilities
• finding items in indexes does not mean that always able to get full text
• thus, searching time-consuming, chaotic
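For illustration only, this is roughly what "searching several indexes at the same time" would involve: one query is translated into each source's own convention and sent to all sources concurrently. The two sources, their query conventions, and the function `federated_search` are invented for the sketch; they do not describe any existing library system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources, each expecting the query in a different form.
def search_index_a(query):
    """Pretend index that stores its subject headings in upper case."""
    catalog = {"EGGS": ["A: Egg chemistry"]}
    return catalog.get(query.upper(), [])

def search_index_b(query):
    """Pretend index that requires a field prefix such as 'title:'."""
    catalog = {"title:eggs": ["B: Eggs in art", "B: Egg farming"]}
    return catalog.get("title:" + query.lower(), [])

SOURCES = {"index_a": search_index_a, "index_b": search_index_b}

def federated_search(query, sources):
    """Send one query to every source at the same time; group hits by source."""
    names = list(sources)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        hit_lists = list(pool.map(lambda name: sources[name](query), names))
    return dict(zip(names, hit_lists))

hits = federated_search("Eggs", SOURCES)
for name, found in hits.items():
    print(name, found)
```

The hard part in practice is the per-source adapter, since each source has different search features and capabilities.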
where to find out?
• information about digital libraries
LibWeb U California, Berkeley
  - “lists currently over 7200 pages from libraries in over 125 countries”
Digital Library Federation
  - “a consortium of libraries and related agencies that are pioneering the use of electronic-information technologies to extend their collections and services”
D-Lib Magazine
  - “a solely electronic publication with a primary focus on digital library research and development, including but not limited to new technologies, applications, and contextual social and economic issues”
where? …
Ariadne (UK)
  - “to report on information service developments and information networking issues worldwide, keeping the busy practitioner abreast of current digital library initiatives”
Information Technology and Libraries
  - ALA publication; “related to all aspects of libraries and information technology, including digital libraries”
Journal of Digital Information
  - “Publishing papers on the management, presentation and uses of information in digital environments”
Biblio Tech Review
  - “Information Technology for Libraries” – monthly news and review magazine
in conclusion
• search engines are great but you have to KNOW what is under the hood
  - as to coverage, business model, search features, outputs …
  - they are NOT for every kind of information need
• digital libraries are great for searching but you have to KNOW requirements for searching different resources that are included
  - there is no federated searching as yet, or for the time to come
art of searching digital libraries