
Page 1: Finding Resources On Your Web Site Brian Kelly UK Web Focus UKOLN University of Bath Bath, BA2 7AY Email: B.Kelly@ukoln.ac.uk URL:

Finding Resources On Your Web Site

Brian Kelly
UK Web Focus
UKOLN
University of Bath
Bath, BA2 7AY
Email: B.Kelly@ukoln.ac.uk
URL: http://www.ukoln.ac.uk/

UKOLN is funded by the Library and Information Commission, the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath where it is based.

Aims of Talk:
• Review approaches taken by UK HE and Public Library communities to indexing web sites
• Discussion of findings
• Describe future developments


UKOLN and UK Web Focus

UKOLN:
• UK Office for Library and Information Networking
• Small research and advisory group based at University of Bath
• Funded by JISC and LIC (MLAC from 1 April) to advise Higher Education and Library (and Museums & Archives from 1 April) communities on digital networking issues

UK Web Focus:
• JISC-funded post to advise HE community on web matters


Contents

• Background
• A Survey of Two Communities
• Comparisons
• Interesting Examples
• Other Developments
• Conclusions


Importance of Indexing

Design and browsing tend to be given priority

But:
• Users will search as well as browse
• Users may not understand navigation structures / metaphors which are obvious to members of the organisation
• Searching becomes more important as the web site grows


Which To Choose?

• Alkaline (Vestris)
• AltaVista - Search Intranet
• ASTAWare SearchKey
• atomz Search (remote)
• BooleanSearch
• BBDBot
• BRS/Search (Dataware)
• Compass Server (Netscape)
• Cybotics
• DataWare BRS/Search
• DocFather (formerly SiteSearch)
• dtSearch Web
• Excalibur RetrievalWare
• EWS (Excite)
• Excerpt (Obsolete)
• Extense
• FAST Search Server
• Findex (code library)
• Folio siteDirector
• FreeFind (remote)
• Fulcrum
• Glimpse
• Harvest
• ht://Dig
• ICE
• iHound (ICATT)
• Index Search (Xavatoria)
• Index Server (Microsoft)
• IndexMySite (remote)
• Infoseek - Ultraseek
• Intermediate Search
• intraSearch (remote)
• I-Search
• Isearch
• ITMS
• Isys:web
• Java Applets
• JHLSearch
• JObjects QuestAgent
• Lycos / InMagic
• Magnifi Enterprise Server
• Matt's SimpleSearch
• Microsoft Index Server
• Microsoft Site Server
• MiniSearch (remote)
• MondoSearch
• Muscat
• NetResults (now SearchKey Plus)
• Netscape - Compass Server
• OpenText - LiveLink
• Perl Scripts
• Perlfect Search
• Phantom (Maxum)
• PicoSearch (remote)
• Etc.

Indexing software from <http://searchtools.com/tools/tools.html>

Which to choose? What software may be obsolete? What does remote mean?

Can choose by reading reviews, web sites, etc. or by looking at usage in community


Two Surveys

Two surveys have been carried out:
• Summer 1999: a survey of search engines used on institutional UK University web sites (updated recently)
• January 2000: a survey of search engines used on UK Public Library web sites


Characteristics of HE Community

The UK Higher Education community:
• Long-standing involvement in Internet and Web
• Much technical expertise available (e.g. PhD students)
• Early involvement in web by enthusiasts
• Initially little finance available, so interest in public domain and open source software
• More financial resources becoming available as senior managers become aware of strategic importance of Web


Findings: UK HE Web Sites

Main findings of two surveys:

[Bar chart: number of UK HE web sites using each search engine (ht://Dig, eXcite, Microsoft, Harvest, Ultraseek, Other, None); vertical axis "Nos.", 0 to 60]

Software     Nos. (Jul)   Nos. (Mar)
ht://Dig         25           32
eXcite           19           17
Microsoft        12           15
Harvest           8            6
Ultraseek         7            8
Other            29           29
None             60           51
Totals          160          163
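The movement between the two surveys is easier to see as a per-product difference. The following is a minimal sketch, assuming the July and March figures listed for each product on this slide; the variable names are illustrative only.

```python
# Figures copied from the slide (number of UK HE web sites per product).
july = {"ht://Dig": 25, "eXcite": 19, "Microsoft": 12, "Harvest": 8,
        "Ultraseek": 7, "Other": 29, "None": 60}
march = {"ht://Dig": 32, "eXcite": 17, "Microsoft": 15, "Harvest": 6,
         "Ultraseek": 8, "Other": 29, "None": 51}

# Change in the number of sites per product between the two surveys.
change = {name: march[name] - july[name] for name in july}

# Print the products from largest gain to largest loss.
for name, delta in sorted(change.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10} {july[name]:>3} -> {march[name]:>3}  ({delta:+d})")
```

On these figures the largest gain is for ht://Dig and the largest fall is in the number of sites with no search facility.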

• Article published in Ariadne issue 21 - <http://www.ariadne.ac.uk/issue21/webwatch/>

• Results (including update on survey) available from: <http://www.ukoln.ac.uk/web-focus/surveys/uk-he-search-engines/>


Popular Products: ht://Dig

ht://Dig
• Now used at 32 (up from 25) UK HEIs
• Freely available
• New version released in December 1999
• Own domain with well-designed web site
• Robot to index multiple servers

See <http://www.htdig.org/>

Oxford Case Study:
131 servers
438,500 resources
Indexes MS Office, PDF, etc. files (external parser)

Case Studies produced by Helen Sargan (Cambridge)


Popular Products: eXcite

eXcite
• Now used at 17 (down from 19) UK HEIs
• By-product of the eXcite Internet search engine
• Bug announced in January 1998. Notice not updated since!

Time to change?

See <http://www.excite.com/navigate/>


Popular Products: Microsoft

Microsoft
• Several Microsoft indexing tools available (FrontPage, Index Server, SiteServer, …)
• Most powerful is the SiteServer indexer
• Now used at 15 (up from 12) UK HEIs

Essex Case Study:
16 servers indexed
11,500 resources
Constrained searches possible
Indexes MS Office, PDF, etc. files


Popular Products: Ultraseek

Ultraseek:
• Used at 8 (up from 7) UK HEIs
• Powerful but expensive
• See <http://software.infoseek.com/>

Cambridge Case Study:
232 servers
188,000 resources
Weightings given to meta tags
Useful logs and reports
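The slides do not say how Ultraseek weights meta tags. As a rough illustration of the general idea only, the sketch below scores a page higher when a query term appears in its meta keywords or description than when it appears only in the body text. The weights, field names and sample pages are all assumptions made for this example.

```python
# Hypothetical weights: a hit in a meta field counts more than a hit in the body.
WEIGHTS = {"keywords": 3.0, "description": 2.0, "body": 1.0}

def score(page, term):
    """Add up the weight of every whole-word occurrence of `term` in each field."""
    term = term.lower()
    total = 0.0
    for field, weight in WEIGHTS.items():
        words = page.get(field, "").lower().split()
        total += weight * words.count(term)
    return total

# Made-up pages, each with two meta fields and some body text.
pages = [
    {"url": "/library/", "keywords": "library opac",
     "description": "Library home page", "body": "Opening hours"},
    {"url": "/news/", "keywords": "news",
     "description": "Campus news", "body": "The library is closed on Monday"},
]

# Rank pages for the query "library", highest score first.
ranked = sorted(pages, key=lambda p: score(p, "library"), reverse=True)
```

Here the page that names the term in its meta fields ranks above the page that only mentions it in passing, which is the behaviour meta tag weighting is meant to produce.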


Popular Products: Harvest

Harvest:
• Now used at 6 UK HEIs (down from 8)
• For IR research use?
• See <http://www.tardis.ed.ac.uk/harvest/>


Other Popular Products

SWISH / SWISH-E
• Used at 5 HEIs
• Dated?

Webinator
• Used at 4 HEIs
• Useful functionality
• See <http://www.thunderstone.com/webinator/>

[Screenshots: output from SWISH; output from Webinator]


Use of Third Party Services

Small usage of third parties to provide indexes:

FreeFind (Used at 2 HEIs) and AltaVista (Used at 1 HEI)

Why not more use by 50+ institutions with no search facility?

• Benefits from services provided by popular large-scale search engine
• Low cost (free?)
• Incomplete coverage?
• Loss of control, advertising, …


Characteristics of Public Library Community

Public Library Community:
• Relatively new to Internet and Web
• Less technical expertise available
• Large OPACs available
• Often part of Council's web site

Note: "Well Connected: A Snapshot of Local Authority Websites" (Society of Information Technology Management report) found that in 1999 69% of local authority websites did not have a search facility


Results

Survey carried out on 4-5th January 2000

Results for 137 web sites:
• 49% have no search facility?!
• Of those that do:
  – 45% (18) use Microsoft
  – 7.5% (3) use Domino
  – 7.5% (3) use Muscat
  – 40% (16) another solution

Comments
• Some sites use the general Council search facility and in some sites the Council search facility can be used to search areas (e.g. Library)
• Some sites very small (1 page with opening hours)
• See <http://www.ukoln.ac.uk/web-focus/surveys/pub-lib-search-jan-2000/survey.html>

[Bar chart: search facilities on UK Public Library web sites (None, Microsoft, Muscat, Domino, ht://Dig, eXcite, Other); vertical axis 0 to 100]


Popular Products: Microsoft

Microsoft:
• Several Microsoft options available
• Used in 18 public libraries
• Sometimes can restrict searches to selected areas
• Popularity indicative of use of Windows NT in public libraries


Popular Products: Muscat

Muscat Empower:
• Powerful licensed product
• Agent technology
• Email alerting of changed resources
• Foreign language support
• Used in 2 Public Libraries (full Council web site only)
• Muscat FX also used (1 site)
• See <http://www.muscat.com/>


Popular Products: Domino

Lotus Domino (Notes):
• Powerful, licensed web server system
• Used at 3 Public Libraries
• See <http://www.lotus.com/home.nsf/welcome/domino>


Home-Grown Solution

A small number of Public Libraries have developed their own indexing software. Leeds Public Library have a good example:

• Various areas can be searched
• Multiple search terms
• Boolean operators
• Attractive interface

Software:
• Written in C++
• Interrogates files when they are live
• Directories can be excluded
• Operational for 3 years
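The Leeds software itself is written in C++ and is not reproduced here. The following Python sketch only illustrates the general approach the slide describes: scanning the live files at query time rather than building an index, combining several search terms with a Boolean AND or OR, and skipping excluded directories. The function name, its parameters and the restriction to .html/.htm files are assumptions made for this example.

```python
import os

def search_files(root, terms, mode="AND", exclude_dirs=()):
    """Scan files under `root` at query time and return the paths that match.

    mode="AND" requires every term to be present; mode="OR" requires at least one.
    Directory names listed in `exclude_dirs` are skipped entirely.
    """
    terms = [t.lower() for t in terms]
    combine = all if mode == "AND" else any
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if not name.endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                text = handle.read().lower()
            if combine(term in text for term in terms):
                hits.append(path)
    return sorted(hits)
```

A call such as `search_files("/var/www", ["library", "hours"], exclude_dirs=("private",))` would return pages containing both words. Because every query re-reads the files, results are always current, but the cost grows with the size of the site.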


Try Them For Yourself

• Interfaces to UK University search engines are available providing a single location for evaluation
• The page also provides a link to organisational search pages
• The resources are grouped in alphabetical order and by search engine

What functionality do libraries using Domino provide?

What does Aberdeen's search facility provide?

See <http://www.ukoln.ac.uk/web-focus/surveys/>


Other Developments

What else is happening to indexing of these communities?

• eLib Hybrid Libraries
• National search engines
• Local initiatives


eLib Hybrid Libraries

eLib Phase 3 includes "Hybrid Library" projects:
• Help users find electronic (web, OPAC, etc.) and "real world" resources
• Includes regional and subject-specific approaches

MusicOnline search of Music Catalogues

BUILDER search of eLib Phase 3 web sites


National Search Engines

ACDC (Academic Directory)
• (Unfunded) pilot of index of ac.uk domain based on distributed approach using Harvest
• Set up in March 1996
• Lack of development effort resulted in degraded service (e.g. indexer not aware of JavaScript code)
• No longer being developed?

http://acdc.hensa.ac.uk/


Institutional Developments

Maestro robot (Dundee):
• Indexes Scottish resources
• Volunteer effort

North East Universities (UNIS4NE):
• Appearance of cross-searching
• Actually interface to HotBot / AltaVista


Other Possibilities

What other developments may we expect:
• Increased indexing in institutions of other web sites (opposition / friends)
• Development of a HE (or public sector?) national search engine
• "Surface-scraping" of institutional search engines
• Leave it to commercial sector
• European developments
• New developments (XML / RDF / etc.)


Indexing Remote Sites

May see increased indexing of remote sites within institutions: Examples provided by Dundee and BUILDER

• Feeling of ownership
• Easily done
• Can develop enhancements locally
• Increased server load locally
• Increased server load remotely
• Increased network load
• Not scalable
• Unnecessary duplication


"Meta-Search" Possibility

A collection of interfaces to search engines for UK HEIs is available

This could be used as the basis of a "meta-searcher":
• Indexes aren't duplicated
• Local site responsible for content of its index
• A hack
• Problems with maintenance
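A meta-searcher of this kind could be sketched roughly as follows. This is an illustration under stated assumptions, not an existing service: the site names and query-URL templates are hypothetical (real institutional search pages each use their own URL and parameter names), and fetching and parsing each site's result page is left to a caller-supplied function, since that is the site-specific part that makes the approach a hack and hard to maintain.

```python
from urllib.parse import quote_plus

# Hypothetical query-URL templates; real institutional search pages will differ.
ENGINES = {
    "Example University A": "http://uni-a.example.org/search?q={query}",
    "Example University B": "http://uni-b.example.org/cgi-bin/search?words={query}",
}

def build_queries(query, engines=ENGINES):
    """Return {site name: URL} with the encoded query substituted into each template."""
    return {name: template.format(query=quote_plus(query))
            for name, template in engines.items()}

def meta_search(query, fetch_results, engines=ENGINES):
    """Send `query` to every engine and merge the hits, dropping duplicate URLs.

    `fetch_results(url)` is supplied by the caller and must return a list of
    result URLs scraped from that site's result page.
    """
    merged, seen = [], set()
    for name, url in build_queries(query, engines).items():
        for hit in fetch_results(url):
            if hit not in seen:
                seen.add(hit)
                merged.append((name, hit))
    return merged
```

No index is held centrally: each local search engine answers from its own index, and the meta-searcher only forwards the query and merges what comes back.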


Commercial Solutions

Could leave searching to commercial world:
• No costs to institution / HE community
• Results too broad
• Distracting interface
• Little scope for tailoring
• Not integrated with non-Web services


European Developments (1)

DESIRE project:
• EU-funded project with resource discovery component
• Nordic Web Index provides index across Nordic countries (but partly discontinued due to lack of funding)
• See <http://www.desire.org/html/services/resourcediscovery/indexing/>

REIS:
• Pilot project on Research & Education Indexing Service for Europe
• See <http://www.terena.nl/projects/reis/>


European Developments (2)

Surfnet:
• Dutch Research network service
• Use of AltaVista search software for national index
• But how widely used is it?
• Is there a user demand for this type of service?

http://www.surfnet.nl/en/surfnet-searchtools/


What About Metadata?

Metadata can:
• Improve search results
• Provide structured information (for automated processing) which can provide richer services:
  – Fielded searches
  – Limit searches (e.g. only Library pages on Council web site)
  – Web site administration
  – Alternative browsing interfaces

Tools, standards, etc. becoming available

Expected growth area
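As an illustration of a fielded search driven by metadata, the sketch below reads `<meta name="..." content="...">` tags from each page and restricts a search to pages whose chosen field contains a given value, for example only Library pages on a Council web site. It uses Python's standard `html.parser` module; the `department` field name and the sample pages are assumptions made for the example, not a metadata scheme taken from the talk.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attr = dict(attrs)
            if attr.get("name") and attr.get("content") is not None:
                # Field names are compared case-insensitively.
                self.fields[attr["name"].lower()] = attr["content"]

def extract_meta(html):
    """Return the metadata fields of one page as a dictionary."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.fields

def fielded_search(pages, field, value):
    """Return URLs of pages whose metadata `field` contains `value`."""
    value = value.lower()
    return [url for url, html in pages.items()
            if value in extract_meta(html).get(field, "").lower()]

# Hypothetical pages using a made-up "department" metadata field.
pages = {
    "/library/hours.html": '<html><head><meta name="department" content="Library">'
                           '<title>Opening hours</title></head></html>',
    "/finance/tax.html": '<html><head><meta name="department" content="Finance">'
                         '</head></html>',
}

library_pages = fielded_search(pages, "department", "library")
```

The same extracted fields could equally feed web site administration reports or an alternative browsing interface, since the structure is available to any program rather than only to the search engine.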


Example

Exploit Interactive web magazine (www.exploit-lib.org) is using metadata to provide enhanced searching: Search for foo in:
• Issue 2 or in issues 2 and 4 (this is possible using directory structure)
• Feature Articles (needs metadata)
• Articles about EU-funded projects
• Etc.
• Combinations of above

Also provides alternative browsing structures
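The difference between the two kinds of restriction can be sketched as follows. This is not Exploit Interactive's implementation: the URL paths, the `type` values and the sample text are made up. The point is only that limiting a search to an issue needs nothing beyond the directory part of the URL, while limiting it to feature articles needs a metadata value, and that the two limits combine freely.

```python
# Hypothetical records: the URL paths and the "type" metadata values are made up.
ARTICLES = [
    {"url": "/issue2/desire/", "type": "feature", "text": "foo and resource discovery"},
    {"url": "/issue2/news/", "type": "news", "text": "foo in brief"},
    {"url": "/issue4/metadata/", "type": "feature", "text": "metadata in practice"},
    {"url": "/issue3/portal/", "type": "feature", "text": "foo again"},
]

def search(term, issues=None, article_type=None, articles=ARTICLES):
    """Find `term`, optionally limited by issue (URL directory) and type (metadata)."""
    hits = []
    for article in articles:
        if issues is not None and not any(
                article["url"].startswith(f"/issue{n}/") for n in issues):
            continue  # the directory structure alone identifies the issue
        if article_type is not None and article["type"] != article_type:
            continue  # the article type is only known from metadata
        if term.lower() in article["text"].lower():
            hits.append(article["url"])
    return hits
```

For example, `search("foo", issues=[2, 4])` limits the search by directory only, while `search("foo", issues=[2], article_type="feature")` combines both kinds of limit.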


JISC Developments

DNER (Distributed National Electronic Resource):
• Seamless access to national resources
• What about local resources?
• Need for "institutional portals"

RDN
• Resource Discovery Network
• Builds on work of eLib subject gateways
• Based on standards (Z39.50, whois++, LDAP, etc.)
• Lessons for institutions


Conclusions

To conclude:
• No clear "best buy" for indexing software
• Probably some to avoid
• In 2 years' time are you likely to:
  – Still be using same software?
  – Have changed software / architecture?
• If changes likely, need to think about migration strategies, interoperability issues, etc.
• Need for user studies (not covered)

Useful Resources
http://SearchTools.com/
http://www.searchenginewatch.com/
http://www.builder.com/Servers/AddSearch/

Questions welcome