Enhancing Workflow through Batch Import from Excel to DSpace

Sai Deng, Susan Matveyeva, Baseer Khan, Wichita State University Libraries

Outline

“Book in hand” cataloging

Traditionally, original and copy cataloging workflows are based on a “book in hand” approach and do not require batch processing of cataloging records:

Copy cataloging: Have item in hand – Search OCLC – Identify the matching copy – Import record from OCLC to a local ILS – Edit record as needed – Save to ILS
Original cataloging: Have item in hand – Create record in OCLC – Import record from OCLC to a local ILS – Save to ILS
Database maintenance: Find and correct errors manually, one by one

Batch processing in early days of library automation

Batchloading: “A process by which records to be processed are collected into batches. The records in a batch are loaded all at once”. http://www.oclc.org/support/documentation/glossary/batchprocessing/default.htm#BatchloadingOrBatchProcessing

Batchloading was introduced in the early days of automation (it appears in the library literature of the 1980s);

Major batchload projects at WSU in the 1980s and 1990s: loading OCLC records into NOTIS; migrating from NOTIS to Voyager; loading Marcive records. Tapes were used before the Internet era.

Batch processing today

Batchloading has become a standard part of the cataloging workflow due to:

Increased granularity of cataloging in order to improve access to “hidden collections” (article level; separate poems from a collection, etc.);
Availability of records from vendors and OCLC;
Repurposing of metadata for use in different systems (e.g. from IR to ILS and vice versa).

WSU batchloading services

Regular batch load:
Acquisitions records from jobbers (e.g. YBP, Blackwell's);
Marcive files (monthly);
Serials Solutions records for e-journals;
LTI authority processing (weekly; quarterly; and yearly).

One-time load of purchased sets of records:
Anti-Slavery Collection (microforms)
Early English Books (microforms)
net-Library (e-books)

Workflow for batch import of MARC records to a library catalog

A sample file is received and reviewed by the cataloging administrator (may include subject librarians);
Record quality is evaluated;
A decision is made on local customization (do we need it; who will do it if needed: vendor or library catalogers);
A decision is made to proceed or not proceed with the purchase of records.

Workflow for batch import of MARC records to a library catalog (Cont.)

The cataloging administrator creates a profile (template) for test loads;
Testing (sample is loaded and reviewed; may be repeated as needed);
Production load (done in University Computing Center);
Review and clean up (done by cataloging staff and coordinated by the cataloging administrator).

Workflow for batch import of spreadsheet data to DSpace

The IR Librarian works with faculty to prepare data;
The IR Librarian and the Metadata Cataloger enhance and enrich the spreadsheet data;
The DSpace Technician and the Metadata Cataloger install and set up the Java-based add-on that facilitates the DSpace batch import procedure;
The Metadata Cataloger customizes the program, tests sample data and adjusts source data, with assistance from a student programmer when needed;
Submission Information Packages (SIPs) are generated for individual collections;
The DSpace Technician uploads the SIPs to the DSpace server.

Typical roles in batch import workflow

Professional catalogers/metadata experts create profiles (templates) for batch import/export;
TS administrator coordinates batch loading to ILS;
IR manager coordinates teamwork in batch loading to DSpace.

Typical roles in batch import workflow (Cont.)

In MARC/ILS projects, copy catalogers perform quality control (reviewing records for duplicates, diacritics, errors and omissions) and local customization (notes, subjects, etc.);
In DC/DSpace projects, the metadata cataloger performs data standardization (field value formats such as date format, escaping special characters in XML, etc.) with assistance from a student programmer.

The collections, team and workflow

The Waconda Lake collection
Digital images of Kansas archaeological artifacts;
Source data provided by the Anthropology Dept.

Graduate Research and Scholarly Projects (GRASP) collection
Conference papers presented at the annual GRASP symposiums. Started in 2005;
The Graduate School delivers a book and all the digital files of the papers to the library;
Records were cataloged and manually added to DSpace from 2005 to 2009.

The collections, team and workflow (Cont.)

The WSU Herbarium collection
Contains over 4,500 specimens, dating back as far as 1895;
Samples were collected from Kansas, Massachusetts, Colorado, New Mexico, and Arizona;
Faculty and interns in the Biological Science Department are responsible for digitizing the specimens and editing the collected data.

WSU Libraries provide long-term preservation of and access to these collections through DSpace:
Collaborate with faculty, departments and the Graduate School;
Consult on digitization specifications and department workflow;
Compile, enrich, enhance and standardize source data;
Batch load data to the DSpace server;
Additional services: creating an image zoom site…

Library team: the Institutional Repository Librarian, the Metadata Cataloger and the DSpace Technician, with assistance from two WSU students

An add-on to facilitate DSpace batch import procedure

Google Summer of Code Project 2008;
Java-based program;
Transforms data prepared in Excel to DSpace batch import format;
Systems and environments (different packages created for):
Windows environment/Microsoft Excel
Windows environment/OpenOffice.org 2.4 Calc (spreadsheet)
Linux environment/OpenOffice.org 2.4 Calc (spreadsheet)

Written by Blooma Mohan John, Nanyang Technological University;
It has been implemented at Nanyang (Singapore) and at other universities in France, India, China and the U.S.

An add-on to facilitate DSpace batch import procedure (Cont.)

What is in the Metadata Import Windows_Excel package?
MetaDataImport (Java file)
cEJ (sample spreadsheet)
Documentation (in PDF)

Program Wiki: http://wiki.dspace.org/index.php/Google_Summer_of_Code_2008_Batch_Import

Program download: http://code.google.com/p/dspace-gsoc/downloads/list

Submission Information Packages

Algorithm of MetaDataImport

1. Start
2. Input the Main Submission Information Package (SIP) folder name
3. Create SIP folder with the name mentioned in Step 2
4. Store digital object metadata and location details in a Resultset
5. For each record in Resultset do Step 6 to Step 13
6. Create individual SIP folder
7. Create an xml file named dublin_core inside the SIP folder
8. Create a file named contents inside the SIP folder
9. Check the type of digital object
10. Add comments about digital object to dublin core file
11. Add digital object file name to contents file
12. Copy digital object from an external location to individual SIP folder
13. Add metadata details to Dublin Core file
14. Stop

(From http://wiki.dspace.org/index.php/Google_Summer_of_Code_2008_Batch_Import)
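
The loop in Steps 3 and 5 to 13 can be pictured with a small sketch. This is not the MetaDataImport source: the class names SipRecord and SipBuilder are hypothetical, only a title field is written, the input is a plain list rather than a Resultset, and the title is assumed to be XML-escaped already.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical record type: one spreadsheet row (a title plus the file location).
final class SipRecord {
    final String title;
    final Path resourceLocation;

    SipRecord(String title, Path resourceLocation) {
        this.title = title;
        this.resourceLocation = resourceLocation;
    }
}

final class SipBuilder {
    // Creates the main folder, then one numbered sub-folder per record.
    static void build(Path mainFolder, List<SipRecord> records) throws IOException {
        Files.createDirectories(mainFolder);                       // Step 3
        String baseName = mainFolder.getFileName().toString();
        int n = 1;
        for (SipRecord r : records) {                              // Step 5
            Path sip = Files.createDirectories(mainFolder.resolve(baseName + "_" + n++)); // Step 6
            String fileName = r.resourceLocation.getFileName().toString();
            // Steps 7, 10 and 13: dublin_core.xml with a comment and the metadata.
            String xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n"
                    + "<!-- title of file " + fileName + " -->\n"
                    + "<dublin_core>\n"
                    + " <dcvalue element=\"title\" qualifier=\"none\">" + r.title + "</dcvalue>\n"
                    + "</dublin_core>\n";
            Files.write(sip.resolve("dublin_core.xml"), xml.getBytes(StandardCharsets.ISO_8859_1));
            // Steps 8 and 11: the contents file lists the digital object's file name.
            Files.write(sip.resolve("contents"), (fileName + "\n").getBytes(StandardCharsets.ISO_8859_1));
            // Step 12: copy the digital object into the item folder.
            Files.copy(r.resourceLocation, sip.resolve(fileName), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Calling SipBuilder.build with a folder named GRASP and two records would produce GRASP/GRASP_1 and GRASP/GRASP_2, each holding dublin_core.xml, contents and the copied file.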

Program installation and running

Program installation and setup:
Install Java SE 6 for Windows (see Appendix 1);
Install a Java IDE, JCreator LE version for Windows (see Appendix 2);
Download the program at http://code.google.com/p/dspace-gsoc/downloads/list. The Metadata_Import_Windows_Excel package is used in the WSU cases;
Create a DSN (Microsoft Excel Driver) to link the Java program to the source spreadsheet (see Appendix 3).

Run the program (see appendix 4)

Tips

What is defined in the DSN needs to match the name and location of the spreadsheet, and

What is described in the resourceLocation field of the spreadsheet needs to match the actual locations of the PDF or image files.
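
The second tip can be checked before running the program. The sketch below is an illustration only; the class ResourceLocationCheck is hypothetical and assumes the resourceLocation values have already been read from the spreadsheet into a list.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ResourceLocationCheck {
    // Returns the resourceLocation values that do not point to an existing regular file.
    static List<String> missing(List<String> resourceLocations) {
        List<String> notFound = new ArrayList<>();
        for (String location : resourceLocations) {
            if (location == null || location.trim().isEmpty()
                    || !Files.isRegularFile(Paths.get(location.trim()))) {
                notFound.add(String.valueOf(location));
            }
        }
        return notFound;
    }
}
```

An empty result means every listed PDF or image file was found at the stated location.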

Data preparation in Excel

An example: Graduate Research and Scholarly Projects (GRASP) papers

GRASP fields expressed as Dublin Core fields: title, contributor.author, identifier.citation, date.issued, description, description.abstract

Several common fields were added: publisher, language.iso, type, relation.ispartofseries;
Values in these fields can be filled down (dragged and copied) in a spreadsheet.

Naming fields in the Excel source file: contributorAuthor, dateIssued…
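
One way to read this naming convention is that the lowercase prefix is the DC element and the capitalized remainder is the qualifier. The sketch below illustrates that reading; it is an assumption, not how MetaDataImport works (the program names each field explicitly), and DcFieldName is a hypothetical class.

```java
final class DcFieldName {
    final String element;
    final String qualifier;

    private DcFieldName(String element, String qualifier) {
        this.element = element;
        this.qualifier = qualifier;
    }

    // Splits a camelCase column name at its first uppercase letter;
    // a name with no uppercase letter gets the qualifier "none".
    static DcFieldName fromColumn(String column) {
        for (int i = 1; i < column.length(); i++) {
            if (Character.isUpperCase(column.charAt(i))) {
                String qualifier = Character.toLowerCase(column.charAt(i)) + column.substring(i + 1);
                return new DcFieldName(column.substring(0, i), qualifier);
            }
        }
        return new DcFieldName(column, "none");
    }
}
```

Under this assumption, contributorAuthor yields element contributor with qualifier author, and title yields element title with qualifier none.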

Program customization

Customize the section which includes all the metadata fields: change the field strings, the output of the DC elements and qualifiers, and the messages, e.g.

    String titleDC = rs.getString("title");
    if (titleDC == null || titleDC.trim().length() == 0) {
        System.out.println("You have given null value for the title Dublin Core element of "
                + archiveFolderName + "_" + initialDocumentNo);
    } else {
        outDC.write(" <dcvalue element=\"title\" qualifier=\"none\">");
        outDC.write(titleDC);
        outDC.write("</dcvalue>\n");
    }

Data needs to be standardized and validated to guarantee correct display and program execution.
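
Because the same null check and write pattern recurs for every field, it could be factored into one helper. The following is a sketch of that idea, not part of the distributed program; DcValueWriter and its parameters are hypothetical names, and the value is assumed to be standardized and XML-escaped before it is passed in.

```java
import java.io.IOException;
import java.io.Writer;

final class DcValueWriter {
    // Writes one <dcvalue> line, or prints a warning and writes nothing when the value is blank.
    // Returns true when a line was written.
    static boolean write(Writer outDC, String element, String qualifier,
                         String value, String itemName) throws IOException {
        if (value == null || value.trim().length() == 0) {
            System.out.println("No value given for " + element + "." + qualifier + " of " + itemName);
            return false;
        }
        outDC.write(" <dcvalue element=\"" + element + "\" qualifier=\"" + qualifier + "\">");
        outDC.write(value);
        outDC.write("</dcvalue>\n");
        return true;
    }
}
```

With such a helper, adding a field to the program becomes one call per spreadsheet column instead of a copied block.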

Tips

Fields in the Java program need to be correctly defined to reflect the columns in the Excel source file;
The names of the metadata fields and their qualifiers (if any) are case-sensitive and need to match exactly;
Dates need to be in a parseable format;
It is important to save the source file before re-running the program.

Data standardization

Date (e.g. “dateIssued”): Dates are formatted as “yyyy-mm-dd” in the Java program.

Special characters: These appear especially in the paper abstract field, such as “&”, “<”, “>”, “'” and “"”; these XML reserved characters were replaced in the Java program.

Results: By running the customized MetaDataImport program, the SIPs were generated for DSpace batch import. Each record package includes a dublin_core xml file, a PDF file and a contents list file.
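
The two standardization steps above can be sketched as follows. This is an illustration rather than the code used at WSU: DataStandardizer is a hypothetical class, and the month/day/year input pattern is an assumption that should be changed to whatever the spreadsheet actually contains. Note that in Java date patterns the month is written “MM” (“mm” means minutes), so the sketch relies on the built-in ISO formatter for the year-month-day output.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DataStandardizer {
    // Assumed input pattern (month/day/year); adjust to the spreadsheet's actual format.
    private static final DateTimeFormatter INPUT = DateTimeFormatter.ofPattern("M/d/yyyy");

    // Converts a date such as 6/16/1981 to the year-month-day form.
    static String toIsoDate(String raw) {
        return LocalDate.parse(raw.trim(), INPUT).format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    // Replaces the five XML reserved characters with their predefined entities.
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '\'': sb.append("&apos;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

A date that does not match the assumed pattern makes LocalDate.parse throw a DateTimeParseException, which points at the spreadsheet cell that needs correcting.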

DSpace specification

DSpace version 1.4.2
Java 1.4.2_17
Apache Ant 1.6.5
Tomcat 4.1.31
Postgres 8.1.4

We tested batch import on our DSpace server (version 1.4.2) and also on a DSpace test instance (version 1.5.2).

DSpace directories

Source Directory Layout: [dspace-source]

Installed Directory Layout: [dspace]

assetstore/ - asset store files
bin/ - shell and Perl scripts
config/ - configuration, with sub-directories as above
handle-server/ - Handles server files
history/ - stored history files (generally RDF/XML)
lib/ - JARs, including dspace.jar, containing the DSpace classes
log/ - Log files
reports/ - Reports generated by statistical report generator
search/ - Lucene search index files
upload/ - temporary directory used during file uploads etc.
webapps/ - location where DSpace installs all Web Applications:
JSPUI Web Application
XMLUI Web Application (for version 1.5.2)

Server upload

Transfer generated data to the Linux server via flash drive, e.g.

[root@localhost mnt]# fdisk -l
[root@localhost mnt]# mount /dev/sde1 /mnt/usbflash/

Run the DSpace ItemImport command, e.g.

[root@localhost bin]# ./dsrun org.dspace.app.itemimport.ItemImport -a -e [email protected] -c 10057/2239 -s /mnt/usbflash/GRASP -m mapfileGRASP

10057/2239 is the collection handle in DSpace and mapfileGRASP is the name of the map file to be generated;
The e-person (e.g. [email protected]) should have rights to the collection (e.g. 10057/2239).

What if the process stops?
Check the place where the program stopped running;
Adjust the fields in the Java program, and standardize and validate the data, as needed;
Make sure all fields are spelled correctly and registered in the DSpace metadata registry.
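
Some stops can be avoided by checking the generated folders before they are moved to the server. The sketch below is a suggestion under assumptions, not part of DSpace or MetaDataImport: SipPreflight is a hypothetical class, and it assumes each line of the contents file begins with a file name (anything after a tab is ignored).

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class SipPreflight {
    // Lists problems found in the item folders directly under the main SIP folder.
    static List<String> check(Path mainFolder) throws IOException {
        List<String> problems = new ArrayList<>();
        try (DirectoryStream<Path> items = Files.newDirectoryStream(mainFolder)) {
            for (Path item : items) {
                if (!Files.isDirectory(item)) {
                    continue; // only item folders are inspected
                }
                if (!Files.isRegularFile(item.resolve("dublin_core.xml"))) {
                    problems.add(item.getFileName() + ": missing dublin_core.xml");
                }
                Path contents = item.resolve("contents");
                if (!Files.isRegularFile(contents)) {
                    problems.add(item.getFileName() + ": missing contents");
                    continue;
                }
                // Every file named in contents should exist inside the item folder.
                for (String line : Files.readAllLines(contents)) {
                    String name = line.split("\t")[0].trim();
                    if (!name.isEmpty() && !Files.isRegularFile(item.resolve(name))) {
                        problems.add(item.getFileName() + ": missing file " + name);
                    }
                }
            }
        }
        return problems;
    }
}
```

An empty list does not guarantee a clean import (unregistered metadata fields, for example, are only detected by DSpace itself), but it rules out incomplete packages.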

New challenges of the herbarium project

DarwinCore (DwC): DwC is a biodiversity information standard designed to facilitate the exchange of information about species and specimens in collections (Biodiversity Information Standards, Taxonomic Database Working Group, 2009).

Debates:
Use DC elements only, DwC only, mostly DC, or mostly DwC;
Use DSpace-based SOAR, or “Specify”, a database system designed for museum and herbarium data processing, or both.

Because of the faculty's time constraints, SOAR was chosen as the repository, and two student interns were recruited to assist with specimen digitization and source data editing.

To deposit data in SOAR, the DC standard needs to be followed, while an additional DwC metadata registry can be added as a supplement.

Specimen fields and mapping

The specimen-level information:
family, scientific name, common name
locality, country, state, county, elevation, latitude, longitude, habitat
collector, identifier, date collected
WSU collection number, image name

Most of the elements were mapped to DC where appropriate:
“scientific name” is mapped to the DC element “title”
“common name” to “title.alternative”
“locality”, “country”, “state” and “county” all to “coverage.spacial”
“collector” to “contributor.author”
“date collected” to “date.created”
“WSU collection number” to “identifier”

Where no DC element seemed suitable, a DwC element was added.
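
The mapping above can be held in a single lookup table so that the program and the documentation stay in step. The sketch below is illustrative: SpecimenFieldMap is a hypothetical class, the “spacial” spelling follows the deck's own records, and the pairing of elevation, latitude and longitude with the DwC verbatim elements is an assumption based on the element names used in the project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SpecimenFieldMap {
    // Specimen field -> "schema.element[.qualifier]".
    static final Map<String, String> MAPPING = new LinkedHashMap<>();

    static {
        MAPPING.put("scientific name", "dc.title");
        MAPPING.put("common name", "dc.title.alternative");
        MAPPING.put("locality", "dc.coverage.spacial");
        MAPPING.put("country", "dc.coverage.spacial");
        MAPPING.put("state", "dc.coverage.spacial");
        MAPPING.put("county", "dc.coverage.spacial");
        MAPPING.put("collector", "dc.contributor.author");
        MAPPING.put("date collected", "dc.date.created");
        MAPPING.put("WSU collection number", "dc.identifier");
        // Fields without a suitable DC element go to the local DwC registry (assumed pairing).
        MAPPING.put("family", "dwc.family");
        MAPPING.put("habitat", "dwc.habitat");
        MAPPING.put("elevation", "dwc.verbatimElevation");
        MAPPING.put("latitude", "dwc.verbatimLatitude");
        MAPPING.put("longitude", "dwc.verbatimLongitude");
    }

    // Returns "dc" or "dwc" for a mapped field, or null when the field is not mapped.
    static String schemaOf(String specimenField) {
        String target = MAPPING.get(specimenField);
        return target == null ? null : target.substring(0, target.indexOf('.'));
    }
}
```

The schema prefix tells the program which of the two XML files a value belongs in.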

DarwinCore registry and data preparation

A local DarwinCore registry was added to DSpace;

A few DwC elements were included: family, habitat, verbatimElevation, verbatimLatitude and verbatimLongitude;
To make these fields searchable, local indexes need to be built.

The library further enriched the data by adding DC elements such as subject, rights, source, publisher, relation and relation.uri.

The locations of the access images were also specified in the spreadsheet. Only the access image is shown on the record homepage; the large herbarium image is used as a Zoomify source file on an external website;
The zoomed image page links back to its corresponding DSpace record through the dc.identifier.uri field.

MetaDataImport modification

The MetaDataImport Java program was modified to generate the SIP packages.

Following the same steps used to create the DC XML file (dublin_core.xml), additional code was added to the program to generate a separate XML file (metadata_dwc.xml).
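
A minimal sketch of what such additional code might look like is given here. It is not the modified WSU program: DwcXml is a hypothetical class, every qualifier is written as “none”, and the values are assumed to be XML-escaped already. The root element carries schema="dwc", which is how the file is tied to the local DarwinCore registry.

```java
import java.util.Map;

final class DwcXml {
    // Builds the text of metadata_dwc.xml from DwC element -> value pairs.
    static String build(Map<String, String> dwcValues) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>\n");
        sb.append("<dublin_core schema=\"dwc\">\n");
        for (Map.Entry<String, String> e : dwcValues.entrySet()) {
            String value = e.getValue();
            if (value == null || value.trim().isEmpty()) {
                continue; // skip empty spreadsheet cells
            }
            sb.append(" <dcvalue element=\"").append(e.getKey())
              .append("\" qualifier=\"none\">").append(value.trim()).append("</dcvalue>\n");
        }
        sb.append("</dublin_core>\n");
        return sb.toString();
    }
}
```

When the returned text is written to disk, the file should be encoded in the character set named in the XML declaration.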

dublin_core.xml

<?xml version="1.0" encoding="iso-8859-1"?>
<!-- title of jpg HERBARIUM_3.jpg-->
<dublin_core>
<dcvalue element="identifier" qualifier="none">2040.0</dcvalue>
<dcvalue element="title" qualifier="none">Asclepias stenophylla A. Gray</dcvalue>
<dcvalue element="title" qualifier="alternative">Narrow-leaved milkweed</dcvalue>
<dcvalue element="coverage" qualifier="spacial">Sedgwick Co., KS</dcvalue>
<dcvalue element="coverage" qualifier="spacial">North America</dcvalue>
<dcvalue element="coverage" qualifier="spacial">United States</dcvalue>
<dcvalue element="coverage" qualifier="spacial">Kansas</dcvalue>
<dcvalue element="contributor" qualifier="author">Raugust, Barry M.</dcvalue>
<dcvalue element="type" qualifier="none">image</dcvalue>
<dcvalue element="date" qualifier="created">1981-06-16</dcvalue>
<dcvalue element="date" qualifier="issued">2009-10-20</dcvalue>
<dcvalue element="subject" qualifier="none">Asclepidaceae</dcvalue>
<dcvalue element="subject" qualifier="none">Asclepias stenophylla A. Gray</dcvalue>
<dcvalue element="subject" qualifier="none">Narrow-leaved milkweed</dcvalue>
<dcvalue element="subject" qualifier="none">United States -- Kansas -- Sedgwick </dcvalue>
<dcvalue element="rights" qualifier="none">Copyright Wichita State University, 2010</dcvalue>
<dcvalue element="source" qualifier="none">WSU herbarium</dcvalue>
<dcvalue element="identifier" qualifier="uri"> http://library.wichita.edu/techserv/herbarium/browse.asp?id=2040</dcvalue>
<dcvalue element="publisher" qualifier="none">Wichita State University. Dept. of Biological Sciences</dcvalue>
<dcvalue element="relation" qualifier="none">U.S. Dept. of Agriculture Natural Resources Conservation Service</dcvalue>
<dcvalue element="relation" qualifier="uri">http://plants.usda.gov/</dcvalue>
</dublin_core>

metadata_dwc.xml

<?xml version="1.0" encoding="iso-8859-1"?>
<!-- title of jpg HERBARIUM_3.jpg-->
<dublin_core schema="dwc">
<dcvalue element="family" qualifier="none">Asclepidaceae </dcvalue>
<dcvalue element="habitat" qualifier="none"> Dry, rocky prairies, glades, ledges of bluffs. </dcvalue>
<dcvalue element="verbatimElevation" qualifier="">1400’</dcvalue>
</dublin_core>

A sample record (in detailed view)

DC Field | Value | Language
dc.contributor.author | Raugust, Barry M. [collector] | en_US
dc.contributor.author | Raugust, Barry M. [cataloger] | en_US
dc.date.accessioned | 2009-11-30T20:44:25Z | -
dc.date.available | 2009-11-30T20:44:25Z | -
dc.date.created | 1981-06-16 | en_US
dc.date.issued | 2009-10-20 | en_US
dc.identifier | 2040.0 | en_US
dc.identifier.uri | http://library.wichita.edu/techserv/herbarium/browse.asp?id=2040 | en_US
dc.identifier.uri | http://hdl.handle.net/10057/2337 | -
dc.format.extent | 81447 bytes | -
dc.format.mimetype | image/jpeg | -
dc.language.iso | en_US | -
dc.publisher | Wichita State University. Dept. of Biological Sciences | en_US
dc.relation | U.S. Dept. of Agriculture Natural Resources Conservation Service | en_US
dc.relation.uri | http://plants.usda.gov/ | en_US
dc.rights | Copyright Wichita State University, 2010 | en_US
dc.source | WSU herbarium | en_US
dc.subject | Asclepidaceae | en_US
dc.subject | Asclepias stenophylla A. Gray | en_US
dc.subject | Narrow-leaved milkweed | en_US
dc.subject | United States -- Kansas -- Sedgwick County | en_US
dc.title | Asclepias stenophylla A. Gray | en_US
dc.title.alternative | Narrow-leaved milkweed | en_US
dc.type | image | en_US
dc.coverage.spacial | Sedgwick Co., KS | en_US
dc.coverage.spacial | North America | en_US
dc.coverage.spacial | United States | en_US
dc.coverage.spacial | Kansas | en_US
dwc.family | Asclepidaceae | -
dwc.verbatimElevation | 1400' | -
dwc.habitat | Dry, rocky prairies, glades, ledges of bluffs. | en_US

Appears in Collections: Vascular Plants

A sample record as shown in DSpace

The external zoom site

Some reflections

Data mapping and transformation challenges;
Excel as source data and tool;
“Mashing up” data from various sources;
Adding an additional metadata registry to DSpace;
System upgrades, program compatibility and the library’s reaction;
Collaborations.

Conclusion

Metadata repurposing and batch processing at WSU Libraries promote the sharing and reuse of data resources and significantly improve the cataloging workflow;

While there are many challenges in each new metadata related project in a library, standards, XML technology, data processing strategies and metadata management all help pave the way to possibility and success;

Tools and programs can play an important role in metadata batch processing;

Collaboration and coordination within the library and beyond put the pieces together and make a project possible.

Appendices

Appendix 1: Install Java SE 6 for Windows
Appendix 2: Install Java IDE, JCreator LE version for Windows
Appendix 3: Create DSN (Microsoft Excel Driver)
Appendix 4: Run the MetaDataImport program
Appendix 5: Upload SIP to DSpace Server

Appendix 1: Install Java SE 6 for Windows

To install Java, which is the prerequisite for the DSpace batch import program, go to the link below and get Java SE 6 for Windows:

http://java.sun.com/javase/downloads/index.jsp

Download the installer to your computer and run it.

Appendix 1: Install Java SE 6 for Windows

Read the rules carefully; select the radio button at the top which says “yes” and click on “next” to go to the next step.

Appendix 1: Install Java SE 6 for Windows

Select the directory where you want to install Java. Remember the location, because it has to be set as the default home directory for Java later in the process.

Appendix 1: Install Java SE 6 for Windows

It will ask you to create a new directory. Select the option carefully, because this directory has to be chosen later for the JCreator program.

Appendix 1: Install Java SE 6 for Windows

If you have a secure environment, it will generate a security alert. Click on “allow access” to add an exception to your firewall.

Appendix 1: Install Java SE 6 for Windows

Type in the admin credentials; you can leave the default ports.

Appendix 1: Install Java SE 6 for Windows

If you need any of the additional services, you can select from this menu or you can leave them as default.

Appendix 1: Install Java SE 6 for Windows

Now it has everything configured; click on “install now” to install the program.

Appendix 1: Install Java SE 6 for Windows

Now you should be able to see the progress of the installation.

Appendix 1: Install Java SE 6 for Windows

You can either register with Sun or skip the registration process.

Appendix 1: Install Java SE 6 for Windows

Once the installation is done, click on “finish”. This completes the Java installation, which is the prerequisite for the next program installation.

Appendix 2: Install Java IDE, JCreator LE version for Windows

You can download this at http://www.jcreator.com/download.htm

Appendix 2: Install Java IDE, JCreator LE version for Windows

In our scenario, we have selected Windows XP. Click on the “Download” button to start the installation.

Appendix 2: Install Java IDE, JCreator LE version for Windows

Read the license agreements carefully and click on “I accept the agreement” to agree and click on “next” to go to the next step.

Appendix 2: Install Java IDE, JCreator LE version for Windows

Choose the directory where you want to install this program and click on “next” to go to the next step.

Appendix 2: Install Java IDE, JCreator LE version for Windows

If you have selected the default option from the previous step, it will ask you to create a new folder; click “yes” to create a new folder.

Appendix 2: Install Java IDE, JCreator LE version for Windows

Select the option which you want to have and click on “next” to go to the next step.

Appendix 2: Install Java IDE, JCreator LE version for Windows

Once the program is installed, click on “finish” to launch the program.

Appendix 2: Install Java IDE, JCreator LE version for Windows

Here are the settings for JCreator; follow the steps carefully and in sequence.

Appendix 2: Install Java IDE, JCreator LE version for Windows

Click on “next” with the following selections.

Appendix 2: Install Java IDE, JCreator LE version for Windows

This is the most important step, where you select the path of Java (the prerequisite installed earlier). Refer to Appendix 1 for the default Java home and click on “new” to browse and add the path.

When you are done, click on “apply”. This completes the prerequisites of the batch import program. If you are not familiar with the JCreator IDE, go through the help section “Creating your first application”.

Appendix 3: Create DSN (Microsoft Excel Driver)

Go to “Start-Control Panel-Administrative Tools-Data sources (ODBC)”, and click “Add…”

Appendix 3: Create DSN (Microsoft Excel Driver)

Select “Microsoft Excel Driver (*.xls)” and click “Finish”.

Appendix 3: Create DSN (Microsoft Excel Driver)

Fill in “Data Source Name” (DSN) as “connExcelJava”;
Click “Select Workbook” to choose the Excel source file.

Appendix 3: Create DSN (Microsoft Excel Driver)

The DSN for the Excel driver is created.

Appendix 4: Run the MetaDataImport program

Run MetaDataImport using Java IDE.

Appendix 4: Run the MetaDataImport program

Appendix 4: Run the MetaDataImport program

Results (with only Dublin Core elements)

Appendix 4: Run the MetaDataImport program

Results (with Darwin Core elements)

Appendix 4: Run the MetaDataImport program

Individual record package

Generated SIP

Appendix 5: Upload SIP to DSpace Server

[root@localhost bin]# ./dsrun org.dspace.app.itemimport.ItemImport -a -e [email protected] -c 10057/1996 -s /mnt/usbflash/HERBARIUM/ -m mapfileHERBARIUM

Destination collections:
Owning Collection: HERBARIUM
Adding items from directory: /mnt/usbflash/HERBARIUM/
Generating mapfile: mapfileHERBARIUM
Adding item from directory HERBARIUM_1
…
Adding item from directory HERBARIUM_2
…
Adding item from directory HERBARIUM_3
Loading dublin core from /mnt/usbflash/HERBARIUM//HERBARIUM_3/dublin_core.xml
Schema: dc Element: identifier Qualifier: none Value: 2040.0
Schema: dc Element: title Qualifier: none Value: Asclepias stenophylla A. Gray
Schema: dc Element: title Qualifier: alternative Value: Narrow-leaved milkweed
…
Loading dublin core from /mnt/usbflash/HERBARIUM/HERBARIUM_3/metadata_dwc.xml
Schema: dwc Element: family Qualifier: none Value: Asclepidaceae
Schema: dwc Element: habitat Qualifier: none Value: Dry, rocky prairies, glades, ledges of bluffs.
Schema: dwc Element: verbatimElevation Qualifier: none Value: 1400’
Processing contents file: /mnt/usbflash/HERBARIUM//HERBARIUM_3/contents
Bitstream: HERBARIUM_3.jpg
Processing handle file: handle
…

Acknowledgements

Susan Matveyeva, Catalog and Institutional Repository Librarian, [email protected]

Sai Deng, Metadata Cataloger, [email protected]

Baseer Khan, Technology Support Consultant II and DSpace Technician, [email protected]

Levi Wang, WSU Graduate, Java Programming, [email protected]

We thank Nancy Deyoe and the Cataloging Department at WSU Libraries for providing information on historical projects during Susan’s informal interview for the introductory part of this presentation.

Thank you!