Finding Resources On Your Web Site
Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath, BA2 7AY
Email: [email protected]
http://www.ukoln.ac.uk/
UKOLN is funded by the Library and Information Commission, the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.
Aims of Talk:
• Review approaches taken by UK HE and Public Library communities to indexing web sites
• Discussion of findings
• Describe future developments
UKOLN and UK Web Focus
UKOLN:
• UK Office for Library and Information Networking
• Small research and advisory group based at University of Bath
• Funded by JISC and LIC (MLAC from 1 April) to advise Higher Education and Library (and Museums & Archives from 1 April) communities on digital networking issues
UK Web Focus:
• JISC-funded post to advise HE community on web matters
Contents
• Background
• A Survey of Two Communities
• Comparisons
• Interesting Examples
• Other Developments
• Conclusions
Importance of Indexing
Design and browsing tend to be given priority
But:
• Users will search as well as browse
• Users may not understand navigation structures / metaphors which are obvious to members of the organisation
• Searching becomes more important as a web site grows
Which To Choose?
• Alkaline (Vestris)
• AltaVista - Search Intranet
• ASTAWare SearchKey
• atomz Search (remote)
• BooleanSearch
• BBDBot
• BRS/Search (Dataware)
• Compass Server (Netscape)
• Cybotics
• DataWare BRS/Search
• DocFather (formerly SiteSearch)
• dtSearch Web
• Excalibur RetrievalWare
• EWS (Excite)
• Excerpt (Obsolete)
• Extense
• FAST Search Server
• Findex (code library)
• Folio siteDirector
• FreeFind (remote)
• Fulcrum
• Glimpse
• Harvest
• ht://Dig
• ICE
• iHound (ICATT)
• Index Search (Xavatoria)
• Index Server (Microsoft)
• IndexMySite (remote)
• Infoseek - Ultraseek
• Intermediate Search
• intraSearch (remote)
• I-Search
• Isearch
• ITMS
• Isys:web
• Java Applets
• JHLSearch
• JObjects QuestAgent
• Lycos / InMagic
• Magnifi Enterprise Server
• Matt's SimpleSearch
• Microsoft Index Server
• Microsoft Site Server
• MiniSearch (remote)
• MondoSearch
• Muscat
• NetResults (now SearchKey Plus)
• Netscape - Compass Server
• OpenText - LiveLink
• Perl Scripts
• Perlfect Search
• Phantom (Maxum)
• PicoSearch (remote)
• Etc.
Indexing software from <http://searchtools.com/tools/tools.html>
Which to choose? What software may be obsolete? What does remote mean?
Can choose by reading reviews, web sites, etc. or by looking at usage in the community
Two Surveys
Two surveys have been carried out:
• Summer 1999: a survey of search engines used on institutional UK University web sites (updated recently)
• January 2000: a survey of search engines used on UK Public Library web sites
Characteristics of HE Community
The UK Higher Education community:
• Long-standing involvement in Internet and Web
• Much technical expertise available (e.g. PhD students)
• Early involvement in web by enthusiasts
• Initially little finance available, so interest in public domain and open source software
• More financial resources becoming available as senior managers become aware of strategic importance of Web
Findings: UK HE Web Sites
Main findings of two surveys:
[Bar chart: numbers of UK HE web sites using each search engine (ht://Dig, eXcite, Microsoft, Harvest, Ultraseek, Other, None)]
Software     Nos. (Jul)   Nos. (Mar)
ht://Dig     25           32
eXcite       19           17
Microsoft    12           15
Harvest      8            6
Ultraseek    7            8
Other        29           29
None         60           51
Totals       160          163
• Article published in Ariadne issue 21 - <http://www.ariadne.ac.uk/issue21/webwatch/>
• Results (including update on survey) available from: <http://www.ukoln.ac.uk/web-focus/surveys/uk-he-search-engines/>
Popular Products: ht://Dig
ht://Dig:
• Now used at 32 (up from 25) UK HEIs
• Freely available
• New version released in December 1999
• Own domain with well-designed web site
• Robot to index multiple servers
• See <http://www.htdig.org/>
Oxford Case Study:
• 131 servers
• 438,500 resources
• Indexes MS Office, PDF, etc. files (external parser)
Case Studies produced by Helen Sargan (Cambridge)
Popular Products: eXcite
eXcite:
• Now used at 17 (down from 19) UK HEIs
• By-product of the eXcite Internet search engine
• Bug announced in January 1998. Notice not updated since!
• Time to change?
• See <http://www.excite.com/navigate/>
Popular Products: Microsoft
Microsoft:
• Several Microsoft indexing tools available (FrontPage, Index Server, SiteServer, …)
• Most powerful is the SiteServer indexer
• Now used at 15 (up from 12) UK HEIs
Essex Case Study:
• 16 servers indexed
• 11,500 resources
• Constrained searches possible
• Indexes MS Office, PDF, etc. files
Popular Products: Ultraseek
Ultraseek:
• Used at 8 (up from 7) UK HEIs
• Powerful but expensive
• See <http://software.infoseek.com/>
Cambridge Case Study:
• 232 servers
• 188,000 resources
• Weightings given to meta tags
• Useful logs and reports
Popular Products: Harvest
Harvest:
• Now used at 6 UK HEIs (down from 8)
• For IR research use?
• See <http://www.tardis.ed.ac.uk/harvest/>
Other Popular Products
SWISH / SWISH-E:
• Used at 5 HEIs
• Dated?
Webinator:
• Used at 4 HEIs
• Useful functionality
• See <http://www.thunderstone.com/webinator/>
[Screenshot: output from SWISH]
[Screenshot: output from Webinator]
Use of Third Party Services
Small usage of third parties to provide indexes: FreeFind (used at 2 HEIs) and AltaVista (used at 1 HEI)
Why not more use by the 50+ institutions with no search facility?
Advantages:
• Benefits from services provided by popular large-scale search engine
• Low cost (free?)
Disadvantages:
• Incomplete coverage?
• Loss of control, advertising, …
Characteristics of Public Library Community
Public Library Community:
• Relatively new to Internet and Web
• Less technical expertise available
• Large OPACs available
• Often part of Council's web site
Note: "Well Connected: A Snapshot of Local Authority Websites" (Society of Information Technology Management report) found that in 1999, 69% of local authority websites did not have a search facility
Results
Survey carried out on 4-5 January 2000
Results for 137 web sites:
• 49% have no search facility?!
• Of those that do:
– 45% (18) use Microsoft
– 7.5% (3) use Domino
– 7.5% (3) use Muscat
– 40% (16) use another solution
Comments:
• Some sites use the general Council search facility and in some sites the Council search facility can be used to search areas (e.g. Library)
• Some sites very small (1 page with opening hours)
• See <http://www.ukoln.ac.uk/web-focus/surveys/pub-lib-search-jan-2000/survey.html>
[Bar chart: search facilities on UK Public Library web sites (None, Microsoft, Muscat, Domino, ht://Dig, eXcite, Other)]
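The percentages quoted for the public library sites that do offer searching are shares of the 40 sites counted in that breakdown (18 + 3 + 3 + 16), not of all 137 sites surveyed. A minimal Python sketch of that arithmetic, using only the counts given in the results; the names COUNTS and shares are illustrative, not part of the survey:

```python
# Counts of public library sites per search product, as quoted in the results.
COUNTS = {"Microsoft": 18, "Domino": 3, "Muscat": 3, "Another solution": 16}


def shares(counts):
    """Return each product's percentage share of the sites in `counts`."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in shares(COUNTS).items():
        print(f"{name}: {pct}%")
```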
Popular Products: Microsoft
Microsoft:
• Several Microsoft options available
• Used in 18 public libraries
• Sometimes can restrict searches to selected areas
• Popularity indicative of use of Windows NT in public libraries
Popular Products: Muscat
Muscat Empower:
• Powerful licensed product
• Agent technology
• Email alerting of changed resources
• Foreign language support
• Used in 2 Public Libraries (full Council web site only)
• Muscat FX also used (1 site)
• See <http://www.muscat.com/>
Popular Products: Domino
Lotus Domino (Notes):
• Powerful, licensed web server system
• Used at 3 Public Libraries
• See <http://www.lotus.com/home.nsf/welcome/domino>
Home-Grown Solution
A small number of Public Libraries have developed their own indexing software. Leeds Public Library have a good example:
• Various areas can be searched
• Multiple search terms
• Boolean operators
• Attractive interface
Software:
• Written in C++
• Interrogates files when they are live
• Directories can be excluded
• Operational for 3 years
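The features listed for the home-grown solution (multiple search terms, Boolean operators, excluded directories) can be illustrated with a small sketch. This is not the Leeds software, which is written in C++ and is not described in any detail here; it is a hypothetical Python illustration in which the function names, the in-memory pages dictionary and the example paths are all assumptions.

```python
from pathlib import PurePosixPath


def is_excluded(path, excluded_dirs):
    """Return True if `path` lies inside any of the excluded directories."""
    parts = PurePosixPath(path).parts
    for directory in excluded_dirs:
        dir_parts = PurePosixPath(directory).parts
        if parts[:len(dir_parts)] == dir_parts:
            return True
    return False


def search(pages, terms, operator="AND", excluded_dirs=()):
    """Return sorted paths whose text matches the terms.

    `pages` maps a path to the page text (hypothetical in-memory stand-in
    for reading live files). `operator` is "AND" (all terms must occur)
    or "OR" (any term may occur).
    """
    combine = all if operator.upper() == "AND" else any
    hits = []
    for path, text in pages.items():
        if is_excluded(path, excluded_dirs):
            continue
        words = set(text.lower().split())
        if combine(term.lower() in words for term in terms):
            hits.append(path)
    return sorted(hits)


# Hypothetical example content
PAGES = {
    "/library/hours.html": "Library opening hours",
    "/library/events.html": "Library events and talks",
    "/private/staff.html": "Library staff hours",
}
```

For example, search(PAGES, ["library", "hours"], excluded_dirs=["/private"]) matches only the opening hours page, because the staff page sits in an excluded directory.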
Try Them For Yourself
• Interfaces to UK University search engines are available, providing a single location for evaluation
• The page also provides a link to organisational search pages
• The resources are grouped in alphabetical order and by search engine
Questions to try:
• What functionality do libraries using Domino provide?
• What does Aberdeen's search facility provide?
See <http://www.ukoln.ac.uk/web-focus/surveys/>
Other Developments
What else is happening with indexing in these communities?
• eLib Hybrid Libraries
• National search engines
• Local initiatives
eLib Hybrid Libraries
eLib Phase 3 includes "Hybrid Library" projects:
• Help users find electronic (web, OPAC, etc.) and "real world" resources
• Includes regional and subject-specific approaches
Examples:
• MusicOnline search of Music Catalogues
• BUILDER search of eLib Phase 3 web sites
National Search Engines
ACDC (Academic Directory):
• (Unfunded) pilot of index of ac.uk domain based on distributed approach using Harvest
• Set up in March 1996
• Lack of development effort resulted in degraded service (e.g. indexer not aware of JavaScript code)
• No longer being developed?
• See <http://acdc.hensa.ac.uk/>
Institutional Developments
Maestro robot (Dundee):
• Indexes Scottish resources
• Volunteer effort
North East Universities (UNIS4NE):
• Appearance of cross-searching
• Actually interface to HotBot / AltaVista
Other Possibilities
What other developments may we expect?
• Increased indexing in institutions of other web sites (opposition / friends)
• Development of a HE (or public sector?) national search engine
• "Surface-scraping" of institutional search engines
• Leave it to commercial sector
• European developments
• New developments (XML / RDF / etc.)
Indexing Remote Sites
May see increased indexing of remote sites within institutions. Examples provided by Dundee and BUILDER
Advantages:
• Feeling of ownership
• Easily done
• Can develop enhancements locally
Disadvantages:
• Increased server load locally
• Increased server load remotely
• Increased network load
• Not scalable
• Unnecessary duplication
"Meta-Search" Possibility
A collection of interfaces to search engines for UK HEIs is available
This could be used as the basis of a "meta-searcher":
Advantages:
• Indexes aren't duplicated
• Local site responsible for content of its index
Disadvantages:
• A hack
• Problems with maintenance
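A rough idea of what such a "meta-searcher" might look like is sketched below in Python. It is an illustration only: the site names and URL templates are hypothetical placeholders (real institutional search pages each have their own query format, which is where the maintenance problem comes from), and the fetch function is passed in by the caller so the sketch makes no assumption about how pages are retrieved.

```python
from urllib.parse import quote_plus

# Hypothetical query URL templates, one per institutional search engine.
# Real entries would have to be collected and maintained by hand.
ENGINES = {
    "Example University A": "http://search.uni-a.example/cgi-bin/htsearch?words={query}",
    "Example University B": "http://www.uni-b.example/search?q={query}",
}


def build_query_urls(query, engines=ENGINES):
    """Return a mapping of site name to the URL that runs `query` there."""
    encoded = quote_plus(query)
    return {site: template.format(query=encoded)
            for site, template in engines.items()}


def meta_search(query, fetch, engines=ENGINES):
    """Send `query` to every engine and collect the raw responses.

    `fetch` is any callable that takes a URL and returns the response
    text. A site that cannot be reached is reported rather than
    stopping the whole search.
    """
    results = {}
    for site, url in build_query_urls(query, engines).items():
        try:
            results[site] = fetch(url)
        except OSError as err:
            results[site] = f"unavailable: {err}"
    return results
```

Each local index stays where it is and is queried on demand, so nothing is duplicated. The results still come back in each site's own page format, which is why the approach is described as a hack.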
Commercial Solutions
Could leave searching to commercial world:
Advantages:
• No costs to institution / HE community
Disadvantages:
• Results too broad
• Distracting interface
• Little scope for tailoring
• Not integrated with non-Web services
European Developments (1)
DESIRE project:
• EU-funded project with resource discovery component
• Nordic Web Index provides index across Nordic countries (but partly discontinued due to lack of funding)
• See <http://www.desire.org/html/services/resourcediscovery/indexing/>
REIS:
• Pilot project on Research & Education Indexing Service for Europe
• See <http://www.terena.nl/projects/reis/>
European Developments (2)
Surfnet:
• Dutch Research network service
• Use of AltaVista search software for national index
• But how widely used is it?
• Is there a user demand for this type of service?
• See <http://www.surfnet.nl/en/surfnet-searchtools/>
What About Metadata?
Metadata can:
• Improve search results
• Provide structured information (for automated processing) which can provide richer services:
– Fielded searches
– Limit searches (e.g. only Library pages on Council web site)
– Web site administration
– Alternative browsing interfaces
Tools, standards, etc. becoming available
Expected growth area
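As an illustration of how embedded metadata could support a fielded search, the Python sketch below reads the meta tags from each page and matches a query against one named field only. It is a sketch under assumptions: the field name DC.Subject follows the common Dublin Core naming convention but no particular scheme is prescribed here, and the function names and example pages are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content:
            # Field names are stored in lower case; a field may repeat.
            self.fields.setdefault(name.lower(), []).append(content)


def extract_metadata(html):
    """Return a dict mapping lower-cased field names to lists of values."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.fields


def fielded_search(pages, field, value):
    """Return sorted URLs of pages whose `field` metadata contains `value`."""
    wanted = value.lower()
    matches = []
    for url, html in pages.items():
        values = extract_metadata(html).get(field.lower(), [])
        if any(wanted in v.lower() for v in values):
            matches.append(url)
    return sorted(matches)


# Hypothetical pages on a Council web site
PAGES = {
    "/library/index.html": '<head><meta name="DC.Subject" content="Library"></head><p>Council tax leaflets available</p>',
    "/council/tax.html": '<head><meta name="DC.Subject" content="Council Tax"></head><p>Pay at the library</p>',
}
```

Here fielded_search(PAGES, "DC.Subject", "library") returns only the Library page, even though the word "library" also appears in the body text of the other page. That is the kind of limited search a plain full-text index cannot offer.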
Example
Exploit Interactive web magazine (www.exploit-lib.org) is using metadata to provide enhanced searching. Search for foo in:
• Issue 2, or in issues 2 and 4 (this is possible using directory structure)
• Feature Articles (needs metadata)
• Articles about EU-funded projects
• Etc.
• Combinations of above
Also provides alternative browsing structures
JISC Developments
DNER (Distributed National Electronic Resource):
• Seamless access to national resources
• What about local resources?
• Need for "institutional portals"
RDN:
• Resource Discovery Network
• Builds on work of eLib subject gateways
• Based on standards (Z39.50, whois++, LDAP, etc.)
• Lessons for institutions
Conclusions
To conclude:
• No clear "best buy" for indexing software
• Probably some to avoid
• In 2 years' time are you likely to:
– Still be using same software?
– Have changed software / architecture?
• If changes are likely, need to think about migration strategies, interoperability issues, etc.
• Need for user studies (not covered)
Useful Resources:
• http://SearchTools.com/
• http://www.searchenginewatch.com/
• http://www.builder.com/Servers/AddSearch/
Questions welcome