9 December 2004
International Conference on Developing Digital Institutional Repositories: Experiences and Challenges
December 9-10, 2004, Hong Kong
DSpace in Action
Implementing the HKUST Institutional Repository System
Presented by K.T. Lam
Head of Library Systems
The Hong Kong University of Science and Technology Library
[email protected]@ust.hk
Implementing the HKUST Institutional Repository System / K.T. Lam
Table of Contents
• From Idea to Creation
  • Why have an IR?
  • IR Software Selection
• Major Features
• Future Improvements
• Conclusions
From Idea to Creation
• The idea of establishing an IR originated from a staff development workshop at HKUST Library on 26 November 2002, where Kimberly Douglas was invited to speak on “E-prints, OAI and Institutional Repository”.
• After the workshop, a Task Force was formed to investigate the idea.
• After two months of software evaluation, DSpace was selected to build the Repository.
From Idea to Creation (cont.)
• The IR System at HKUST was brought to life in February 2003, with the following configuration and data content:
  • DSpace Version 1.01
  • Server with Intel Pentium III 733 MHz, 512 MB RAM, and RedHat Linux Release 7.3
  • 105 Computer Science Technical Reports
From Idea to Creation (cont.)
Background / Experience Facilitating the Creation
• HKUST Library is an early supporter of the Open Access concept - joined SPARC (Scholarly Publishing & Academic Resources Coalition) in 2001
• Experience of conducting digital library projects with CJK capabilities
  • Electronic Course Reserve - 1993
  • Digital University Archives and Electronic Theses - 1997
  • etc.
From Idea to Creation (cont.)
Why have an IR?
• To create a permanent record of the scholarly output of HKUST
  • No available access to some scholarly works published by our own faculty
  • Collections of working papers, technical reports, research reports floating around
  • Some of our scholarly works are in the public domain
From Idea to Creation (cont.)
Why have an IR? (cont.)
• To make HKUST’s scholarly output more globally and openly accessible
• To support the international Open Access effort.
  “[T]he mission of disseminating knowledge is only half complete if it is not widely and readily available to society” - Berlin Declaration (http://www.zim.mpg.de/openaccess-berlin/berlindeclaration.html)
From Idea to Creation (cont.)
IR Software Selection
• The July/August 2004 issue of Library Technology Reports provides a very detailed discussion on institutional repository systems and functional requirements
From Idea to Creation (cont.)
IR Software Selection (cont.)
• Decision in the first meeting of the IR Task Force in mid-December 2002:
  • follow Caltech's model, i.e. base our IR on open source software with an OAI-PMH interface.
• We therefore evaluated two IR systems: EPrints and DSpace
From Idea to Creation (cont.)
IR Software Selection (cont.)
• EPrints
  • Developed by University of Southampton
  • The very first open source IR software; since 2000
  • Written in Perl, with MySQL database and Apache Web server
From Idea to Creation (cont.)
IR Software Selection (cont.)
• DSpace
  • Jointly developed by MIT Libraries and Hewlett-Packard Company
  • Open source software
  • Released on Sourceforge during our system evaluation period in late December 2002
  • Written in Java, with PostgreSQL database, Lucene search engine, and a Tomcat web servlet container
From Idea to Creation (cont.)
IR Software Selection (cont.)
• We chose DSpace (almost two years ago) because:
  • DSpace began its development with the experience gained from EPrints - the very first and most popular open source IR software at that time
  • EPrints did not have full support for Unicode and is not Java- and servlet-based
  • Both EPrints and DSpace are open source software, fulfill our functional requirements, and follow state-of-the-art library standards
Current Configuration of IR at HKUST
As of 4 December 2004:
• Home URL: http://repository.ust.hk/
• IR Software: DSpace Version 1.2
• System Software: Fedora Core 2 Linux; Tomcat 5.0; JDK 1.4.2
• Server: Intel Pentium 4 2.4GHz, 1GB RAM
• Content: 1650 documents from 38 Departments
• Usage: Documents were accessed 9,051 times in the previous month
Growth (May 2003 to September 2004)
Major Features
This section covers the following topics:
• Data structure
• Document submission form
• Add item form
• CJK support
• OAI data provider
• SRW/U interface
• Google pilot project
• Authentication and authorization
Major Features (cont.)
Data Structure
• Document Types
  • Preprints, technical reports, working papers, conference papers, journal articles, presentations, book chapters, patents, theses, etc.
• Document Formats
  • Mainly PDF files; also contains PowerPoint files
Major Features (cont.)
Data Structure (cont.)
• DSpace data model
  • Communities (and sub-communities)
  • Collections
  • Items
    • Metadata
    • Bundles of bitstreams
• HKUST implementation: Items are grouped by Departments (i.e. communities) and then by Document Types (i.e. collections).
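The nesting described on this slide can be sketched in a few lines of Python. This is only an illustration of the community, collection, item, bundle and bitstream hierarchy; the class and field names are hypothetical and are not DSpace's Java API, and the department, handle and file names are made-up sample values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Bitstream:
    name: str  # a stored file, e.g. a PDF


@dataclass
class Item:
    handle: str  # persistent identifier (placeholder value in this sketch)
    metadata: Dict[str, str] = field(default_factory=dict)
    # Each bundle is a named group of bitstreams.
    bundles: Dict[str, List[Bitstream]] = field(default_factory=dict)


@dataclass
class Collection:
    name: str  # at HKUST: a document type
    items: List[Item] = field(default_factory=list)


@dataclass
class Community:
    name: str  # at HKUST: a department
    collections: List[Collection] = field(default_factory=list)

    def item_count(self) -> int:
        # Total number of items across all collections of this community.
        return sum(len(c.items) for c in self.collections)


def build_example() -> Community:
    # Made-up sample data following the Department -> Document Type grouping.
    item = Item(
        handle="0000/1",
        metadata={"title": "An example technical report"},
        bundles={"ORIGINAL": [Bitstream("report.pdf")]},
    )
    reports = Collection("Technical Reports", [item])
    preprints = Collection("Preprints")
    return Community("Computer Science", [reports, preprints])
```

Calling `build_example().item_count()` walks both collections of the sample community and counts the single item.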
[Figure: Community and Collections]
[Figure: CNRI Handle (Persistent Identifier); Document in PDF]
Major Features (cont.)
Document Submission Form
• Faculty are apathetic about self-submission
• DSpace’s submission and workflow functions are too lengthy; might scare off faculty
• In need of a simple and effortless submission form - as a quick medium for submitting documents
Major Features (cont.)
Document Submission Form (cont.)
• Decided to develop our own form
  • Requires only very minimal data entry
  • Non-exclusive distribution license agreement
  • Library IR staff enhance the metadata of the submissions and then add them to DSpace
  --------------
  • Written in Perl
  • Submitted data stored in DSpace “Simple Archive Format”
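For readers unfamiliar with the Simple Archive Format mentioned on the submission form slide: each item is a directory holding a `dublin_core.xml` file, a `contents` file that lists the bitstream file names, and the files themselves. The sketch below writes the two control files. It is written in Python for illustration, whereas the HKUST form is written in Perl; the function name `write_saf_item` and the sample field values are hypothetical, and copying the actual document files into the directory is left out.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_saf_item(item_dir, dc_fields, bitstream_names):
    """Write dublin_core.xml and contents for one item directory.

    dc_fields is a list of (element, qualifier, value) tuples.
    bitstream_names is a list of file names expected in item_dir.
    """
    os.makedirs(item_dir, exist_ok=True)

    # One <dcvalue> per metadata field, all under a <dublin_core> root.
    root = ET.Element("dublin_core")
    for element, qualifier, value in dc_fields:
        node = ET.SubElement(root, "dcvalue", element=element, qualifier=qualifier)
        node.text = value
    ET.ElementTree(root).write(
        os.path.join(item_dir, "dublin_core.xml"),
        encoding="utf-8",
        xml_declaration=True,
    )

    # The contents file lists one bitstream file name per line.
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as fh:
        for name in bitstream_names:
            fh.write(name + "\n")
```

Writing the metadata as UTF-8 keeps CJK titles intact when the item is later imported.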
Major Features (cont.)
Add Item Form
• Locally developed JSP application to add items to DSpace by Library IR staff
• Allows IR staff to:
  • Create new item from scratch
  • Enhance the metadata from faculty submission and then add the item to DSpace
Major Features (cont.)
CJK (Chinese, Japanese, Korean) Support
• DSpace supports Unicode
• Problem - Lucene search engine is unable to search by CJK characters
  • Solved by replacing DSpace’s Tokenizer with a CJKTokenizer - but has an interesting side effect
• Problem - URL of query containing CJK characters is not properly encoded
  • Solved by setting Tomcat URIEncoding="UTF-8" and adding URLEncode() to one line of the Java source code
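The two CJK fixes can be illustrated outside DSpace. The sketch below is Python rather than the Java used in DSpace, and it rests on stated assumptions: `cjk_bigrams` mimics the overlapping two-character tokens that a bigram tokenizer such as Lucene's CJKTokenizer is generally understood to produce, and `encode_query` shows UTF-8 percent-encoding of a query string. Both function names are hypothetical, and the character test covers only the main CJK Unified Ideographs block.

```python
from urllib.parse import quote, unquote


def is_cjk(ch):
    # Simplification: only U+4E00..U+9FFF; a real tokenizer covers more blocks.
    return "\u4e00" <= ch <= "\u9fff"


def cjk_bigrams(text):
    """Split each run of CJK characters into overlapping bigrams.

    Non-CJK text is ignored here to keep the sketch short.
    """
    tokens = []
    run = ""
    for ch in text + " ":  # the trailing space flushes the last run
        if is_cjk(ch):
            run += ch
            continue
        if len(run) == 1:
            tokens.append(run)
        else:
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        run = ""
    return tokens


def encode_query(query):
    # Percent-encode the query as UTF-8 so it survives inside a URL.
    return quote(query, safe="", encoding="utf-8")
```

For instance, `cjk_bigrams("香港科技大學")` yields five overlapping pairs, so a search for 科技 can match inside the longer name. The server must decode the URL with the same encoding, which is what the Tomcat `URIEncoding="UTF-8"` setting addresses.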
So, ….
Sorting problem. Can you figure out the logic behind it?
Major Features (cont.)
OAI Data Provider
• DSpace is OAI-compliant
• This means that OAI harvesters can easily collect the metadata (in Dublin Core format) from various IRs (including HKUST’s) for their value-added indexing/searching services.
  • For example: OAIster
• OAI Path to IR at HKUST: http://repository.ust.hk/dspace-oai/request?
http://repository.ust.hk/dspace-oai/request?verb=GetRecord& ... 1783.1/1805
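A harvester talks to the OAI path by appending a `verb` and its arguments to the base URL. The Python sketch below only builds such request URLs and makes no network call. The verbs `Identify` and `ListRecords` and the `metadataPrefix=oai_dc` argument come from the OAI-PMH protocol; the helper name `oai_request_url` is hypothetical, and the base URL is the one on the slide, which may no longer be live.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL as given on the slide (without the trailing "?").
OAI_BASE = "http://repository.ust.hk/dspace-oai/request"


def oai_request_url(verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return OAI_BASE + "?" + urlencode(query)


# Ask the repository to describe itself.
identify_url = oai_request_url("Identify")

# Ask for all records in unqualified Dublin Core.
list_records_url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
```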
Major Features (cont.)
SRW/U Interface
• Search and Retrieval for the Web (or by URL)
• Retains core functionality of Z39.50 but in the form of web services
• This means search service providers can broadcast a search to various IRs and deliver the search results in their own GUI
• SRW/U Interface for the IR at HKUST
  • Based on OCLC’s SRW/U software
  • URL: http://repository.ust.hk/SRW/
The results of a SRW/U search, with XSLT transformation
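A search by URL (SRU) is a plain HTTP GET whose parameters carry the query. The Python sketch below builds such a URL using the standard SRU `searchRetrieve` parameters. The helper name `sru_search_url` is hypothetical, the CQL query is a made-up example, and the base URL is the one on the slide; the real service may expect a database path beneath it, which is not assumed here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL as given on the slide; any database path below it is unknown.
SRU_BASE = "http://repository.ust.hk/SRW/"


def sru_search_url(cql_query, start_record=1, maximum_records=10, base=SRU_BASE):
    """Build an SRU searchRetrieve URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base + "?" + urlencode(params)


example_url = sru_search_url('"vapor generator"')
```

The response is XML, which is why an XSLT stylesheet can turn it into a readable result page.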
Major Features (cont.)
Google Pilot Project
• Initiated in March 2004 by the DSpace user community under the leadership of MacKenzie Smith
• To improve access to DSpace IRs from within Google
• HKUST is a participant in this project
• Result - created a restrict=dspace search filter for use in the Google URL. For example:
  http://www.google.com/search?restrict=dspace&q=collaboration
http://www.google.com/search?restrict=dspace&q=collaboration
Major Features (cont.)
Authentication and Authorization
• Authentication - by EPerson record created through user registration
• Authorization - based on the policy settings on the object (community, collection, item, bitstream, etc.)
• A&A are not a big concern to our IR
  • We do not use DSpace’s submission and workflow functions
  • It is open to the public
  • A&A only required when our library IR staff access DSpace’s administration functions
Major Features (cont.)
DSpace Authentication and Authorization (cont.)
• We have however customized DSpace to allow for campus-wide LDAP authentication
  • Mainly for a different project that also uses DSpace (Digital University Archives).
  • Transparent creation of EPerson record on-the-fly during authentication
• We have also investigated the feasibility of hooking DSpace with Yale’s Central Authentication Services
  • With only little success - due to cumbersome stage transfer from authentication to authorization
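The idea of creating an EPerson record on the fly can be sketched as follows. This is a generic Python illustration of the pattern, not the HKUST customization or DSpace's Java code: the `Directory` class, the `ldap_check` callback and the `example.org` address are all hypothetical, and the LDAP lookup itself is injected as a function so the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class EPerson:
    netid: str
    email: str


class Directory:
    """Holds local user records and delegates password checks to LDAP."""

    def __init__(self, ldap_check: Callable[[str, str], bool]):
        self._ldap_check = ldap_check  # hypothetical stand-in for an LDAP bind
        self._epersons: Dict[str, EPerson] = {}

    def authenticate(self, netid: str, password: str) -> Optional[EPerson]:
        # Step 1: let the campus directory decide whether the login is valid.
        if not self._ldap_check(netid, password):
            return None
        # Step 2: create the local record the first time this user logs in.
        if netid not in self._epersons:
            self._epersons[netid] = EPerson(netid, netid + "@example.org")
        return self._epersons[netid]

    def known_users(self) -> int:
        return len(self._epersons)
```

The user never fills in a registration form; the local record exists only so that authorization policies have something to refer to.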
https://archives.ust.hk/
Login to see more…
Future Improvements
• Flat community + collection structure - 2 levels only, not deep enough
• Linked collection - a collection that belongs to more than one community
• Unable to search across multiple collections from multiple communities
• Query syntax not apparent to users, e.g.
  +water +rapid [for exact word match]
  "vapor generator" [for phrase search]
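One way to hide the query syntax from users is to let a search form assemble the string for them. The Python helper below is a hypothetical sketch of that idea, not part of DSpace: it prefixes each required word with `+` and wraps each phrase in double quotes, following the two examples on the slide.

```python
def build_query(required_words=(), phrases=()):
    """Assemble a query string from plain form input.

    required_words: words that must each appear, written as +word
    phrases: multi-word phrases, written inside double quotes
    """
    parts = ["+" + word for word in required_words]
    parts += ['"' + phrase + '"' for phrase in phrases]
    return " ".join(parts)


words_only = build_query(required_words=["water", "rapid"])
phrase_only = build_query(phrases=["vapor generator"])
```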
Future Improvements (cont.)
• Insufficient capability for sorting search results
• Unable to display the number of items in a community and in a collection
  • We have developed a JSP page to display the size of the Repository
• Does not have the capability of transferring an item from one collection to another; nor a collection from one community to another
• DSpace is open source software; its success depends on contributions from its user community
Conclusions
• DSpace was selected about two years ago to build the HKUST IR.
• Make HKUST's scholarly research more openly and globally accessible.
• Installing DSpace is straightforward, but tailoring it to work effectively in your institutional environment is not trivial.
Conclusions (cont.)
• Customization:
  • CJK support with UTF-8 encoding
  • Driven by the fact that faculty are apathetic about self-submission, a simple document submission form was developed.
  • Developed the “Add Item Form” to allow IR staff to add items to DSpace without the need of batch importing
Conclusions (cont.)
• By having the following implementations:
  • DSpace's built-in OAI support
  • OCLC's SRW/U on DSpace
  • Google’s DSpace search filter
  documents in the Repository are more fully exposed on the Internet for easy harvesting, searching and discovery
Conclusions (cont.)
• Finally, many many thanks to the DSpace team from MIT and HP for developing this high quality open source product!
Thank you!
謝 謝 !