WAS to Archive-It Metadata Migration March 11, 2015.


Page 1: WAS to Archive-It Metadata Migration March 11, 2015.

WAS to Archive-It Metadata Migration

March 11, 2015

Page 2: WAS to Archive-It Metadata Migration March 11, 2015.

WAS -> Archive-It

WAS Project/Archive

• 3 levels of hierarchy
– Project
– Site (can contain 1 or more Seed URLs)
– Seed URL

Archive-It Collection

• 2 levels of hierarchy
– Collection
– Seed URL

Page 3: WAS to Archive-It Metadata Migration March 11, 2015.

[Screenshot: example WAS sites, each labeled as having 1 or 2 Seed URLs per Site]

Page 4: WAS to Archive-It Metadata Migration March 11, 2015.

Multiple seeds: the hierarchy flattens out, and each Seed URL gets all of its Site's metadata.
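
As a rough illustration of the flattening, here is a minimal Python sketch. The `Site` class, the `flatten_sites` function, the `title` field, and the example URLs are all hypothetical names made up for this example; they are not part of WAS or Archive-It.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical stand-in for a WAS Site: shared metadata plus 1+ Seed URLs."""
    metadata: dict
    seed_urls: list = field(default_factory=list)


def flatten_sites(sites):
    """Return one record per Seed URL, each carrying a copy of its Site's metadata."""
    records = []
    for site in sites:
        for url in site.seed_urls:
            # Unpacking builds a new dict, so every seed owns its own copy.
            records.append({"seed_url": url, **site.metadata})
    return records


sites = [
    Site({"title": "Example Site A"}, ["http://a.example/", "http://www.a.example/"]),
    Site({"title": "Example Site B"}, ["http://b.example/"]),
]
flat = flatten_sites(sites)
for record in flat:
    print(record)
```

A Site with two Seed URLs becomes two seed records with identical metadata, which is why duplicated metadata is expected after the move.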

Page 5: WAS to Archive-It Metadata Migration March 11, 2015.

BEFORE starting, you should…

Delete sites (seeds) that you have never captured, or that you captured but then deleted all the captures for. These are probably sitting under ‘never captured’ or ‘inactive sites’.

Page 6: WAS to Archive-It Metadata Migration March 11, 2015.

How to move

• Move project (collection) by project (collection).

• When you sit down, start and finish the move of a project.

• You don’t have to do all projects/collections in one day

Page 7: WAS to Archive-It Metadata Migration March 11, 2015.

Run two reports (Administration > Project Admin):
1. Click “Archive-It Seed Export” > Export Seeds
2. Click “Archive-It Seed Metadata Export” > Export Metadata

Coming Soon in your accounts

Page 8: WAS to Archive-It Metadata Migration March 11, 2015.

Export Seeds

Page 9: WAS to Archive-It Metadata Migration March 11, 2015.

Seeds export from WAS

• It is in .txt format; open it with Notepad.

• Your seeds will be segmented by crawl frequency.

• E.g., “Seeds with custom schedule of 1x per year”

• You will copy and paste URLs from the .txt document and upload them in chunks by frequency.
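
If you would rather not pick out each chunk by hand, the grouping can be sketched in Python. This is only a sketch under assumptions: it assumes each frequency group starts with a heading line (such as the one quoted above) and that every seed line starts with http:// or https://. The actual layout of the export may differ, and the sample text below is invented.

```python
def group_seeds_by_heading(text):
    """Group URL lines under the most recent non-URL heading line."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.lower().startswith(("http://", "https://")):
            if current is None:
                current = "(no heading)"  # URLs that appear before any heading
            groups.setdefault(current, []).append(line)
        else:
            current = line  # assumption: any non-URL line is a frequency heading
            groups.setdefault(current, [])
    return groups


# Invented sample; not taken from a real WAS export.
sample = """Seeds with custom schedule of 1x per year
http://a.example/
http://b.example/

Seeds with custom schedule of 4x per year
http://c.example/
"""

groups = group_seeds_by_heading(sample)
for heading, urls in groups.items():
    print(heading)
    print("\n".join(urls))
```

Each printed group can then be pasted into Archive-It as one upload for that frequency.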

Page 10: WAS to Archive-It Metadata Migration March 11, 2015.

Example text file

Page 11: WAS to Archive-It Metadata Migration March 11, 2015.

Consult the WAS - Archive-It mapping document to decide on the equivalent frequency:

https://wiki.library.ucsf.edu/download/attachments/351243364/MappingofWAStoArchive-It.pdf?version=1&modificationDate=1422304077000&api=v2

Page 12: WAS to Archive-It Metadata Migration March 11, 2015.

Create a Collection (project) in Archive-It

Page 13: WAS to Archive-It Metadata Migration March 11, 2015.

Create Collection (aka Project)

Page 14: WAS to Archive-It Metadata Migration March 11, 2015.

Select frequency: for now leave at “One-Time”, then click Next

Page 15: WAS to Archive-It Metadata Migration March 11, 2015.

Enter Collection-level metadata. This metadata displays in the public site. You can go back and fully enter this later.

Page 16: WAS to Archive-It Metadata Migration March 11, 2015.

Topics will appear in the public site (along with any Subjects you have)

Page 17: WAS to Archive-It Metadata Migration March 11, 2015.

Example display

Page 18: WAS to Archive-It Metadata Migration March 11, 2015.

In order to create a collection, you must upload seeds.

Page 19: WAS to Archive-It Metadata Migration March 11, 2015.

If you have Historical seeds, upload those FIRST(!)

• Historical sites/seeds are seeds where the seed URL has changed over the life of the captures.

• They will be at the top of your seeds .txt document.

• Do these first because it is easiest to do a ‘bulk edit’ and select ‘deactivate’.

Page 20: WAS to Archive-It Metadata Migration March 11, 2015.

Example seeds list with Historical Seeds

Page 21: WAS to Archive-It Metadata Migration March 11, 2015.

Copy and paste seeds from the .txt file into the box. Leave ‘Default’ selected > Next

Page 22: WAS to Archive-It Metadata Migration March 11, 2015.

VERY important:
1. Ignore this error for ALL your seed uploads.
2. “URL is correct; use as is” MUST be checked regardless of the error you see. If it is not selected for any seeds, go through now and change it for all instances.

Page 23: WAS to Archive-It Metadata Migration March 11, 2015.

Another example: click “URL is correct; use as is” for all

Page 24: WAS to Archive-It Metadata Migration March 11, 2015.

Collection created

Page 25: WAS to Archive-It Metadata Migration March 11, 2015.

Bulk Edit Historical Seeds (where applicable)

Page 26: WAS to Archive-It Metadata Migration March 11, 2015.

Under “Seed Management” click “All”

Page 27: WAS to Archive-It Metadata Migration March 11, 2015.

Click the top box to select all. Note: ‘select all’ applies only to what is displayed. If there are more than 400 items, the rest are on another page, and you will have to repeat this for each page.

Page 28: WAS to Archive-It Metadata Migration March 11, 2015.

Click “bulk edit”

Page 29: WAS to Archive-It Metadata Migration March 11, 2015.

Choose “Deactivate”

Page 30: WAS to Archive-It Metadata Migration March 11, 2015.

Go back to bulk edit > Add Metadata

• Suggestion: add a Notes field, if you don’t already have one, where you note that these are historical seeds. You will most likely never want to crawl these again, so you may want to keep track of them.

Page 31: WAS to Archive-It Metadata Migration March 11, 2015.
Page 32: WAS to Archive-It Metadata Migration March 11, 2015.

Add a custom field

Page 33: WAS to Archive-It Metadata Migration March 11, 2015.

Go back to Collection management and repeat for the next frequency in your seed list

Page 34: WAS to Archive-It Metadata Migration March 11, 2015.

Back to Seeds .txt file

Leave as ‘one-time’; they will not crawl until you say “crawl now”

Page 35: WAS to Archive-It Metadata Migration March 11, 2015.

Copy and paste seeds into box. Leave ‘Default’ selected > Next

Page 36: WAS to Archive-It Metadata Migration March 11, 2015.

For this case, choose Quarterly

Page 37: WAS to Archive-It Metadata Migration March 11, 2015.

Import metadata

Page 38: WAS to Archive-It Metadata Migration March 11, 2015.

Click “ALL seeds” > Import metadata

Page 39: WAS to Archive-It Metadata Migration March 11, 2015.

Upload the metadata file > Upload File (leave default setting)

Page 40: WAS to Archive-It Metadata Migration March 11, 2015.

You could stop here and do the cleanup at a later date

Page 41: WAS to Archive-It Metadata Migration March 11, 2015.

Metadata cleanup

• If there is a WAS field that is not in Archive-It, Archive-It creates a custom field on import.

• All fields will display in the public interface by default.

• The following fields may be in your upload, but they should ALL be made private: Note, Scope, Robots honored, Max crawl seconds, Capture frequency, Seed type, Site ID

Page 42: WAS to Archive-It Metadata Migration March 11, 2015.

How to make fields private in Archive-It:

1. Go to Admin (link in the upper right corner).
2. Click Account Settings.
3. In the text box toward the bottom of the page called 'Private Metadata Fields', enter all these fields: Note, Scope, Robots honored, Max crawl seconds, Capture frequency, Seed type, Site ID
4. NB: Enter each field name on a separate line, in all lowercase letters.
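
To avoid typing mistakes, the text to paste into that box can be generated. A minimal Python sketch follows; the function name `private_fields_text` is made up for this example, and the field names are the ones listed above.

```python
PRIVATE_FIELDS = [
    "Note",
    "Scope",
    "Robots honored",
    "Max crawl seconds",
    "Capture frequency",
    "Seed type",
    "Site ID",
]


def private_fields_text(names):
    """Return the names in all lowercase, one per line, ready to paste."""
    return "\n".join(name.strip().lower() for name in names)


text = private_fields_text(PRIVATE_FIELDS)
print(text)
```

Copy the printed lines into the 'Private Metadata Fields' box as they are.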

Page 43: WAS to Archive-It Metadata Migration March 11, 2015.
Page 44: WAS to Archive-It Metadata Migration March 11, 2015.
Page 45: WAS to Archive-It Metadata Migration March 11, 2015.

Scope –> Seed Type

• What about Directory only?

• What about Page only?

Page 46: WAS to Archive-It Metadata Migration March 11, 2015.

NB: Archive-It offers a lot of additional scoping options for crawls. View: Help Documentation (linked at the top right of the collection page)

Page 47: WAS to Archive-It Metadata Migration March 11, 2015.

Directory is not a separate scoping option in Archive-It (it is handled through the ending slash, /)

NO action needed by you, except to QA

WAS – Directory crawls

• Rosalie.com/presentations
– We will add the ending slash for you if you didn’t

• Rosalie.com/presentations/
– It moves over as is

• Rosalie.com/presentations.html
– It will crawl as host
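
For QA, the three cases above can be checked with a small Python sketch. It is not the migration's actual logic: the function name is made up, and treating "a dot in the last path segment" as the sign of a file path (like presentations.html) is an assumption made for this example.

```python
def directory_seed_outcome(seed):
    """Predict (seed after migration, what happened) for a WAS directory-scoped seed."""
    rest = seed.split("://", 1)[-1]  # drop an optional scheme such as http://
    _host, _, path = rest.partition("/")
    if seed.endswith("/"):
        return seed, "moves over as is"
    if not path:
        return seed, "no path; not a directory seed"
    if "." in path.rsplit("/", 1)[-1]:
        # Assumption: a dot in the last segment means a file, not a directory.
        return seed, "file path; crawls as host"
    return seed + "/", "ending slash added"


for seed in (
    "Rosalie.com/presentations",
    "Rosalie.com/presentations/",
    "Rosalie.com/presentations.html",
):
    print(directory_seed_outcome(seed))
```

Any seed reported as a file path is worth a second look in Archive-It, since it will crawl more broadly than the directory you had in WAS.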

Page 48: WAS to Archive-It Metadata Migration March 11, 2015.

What about ‘page only’ crawls?

• For ‘Page only’ you will have to manually go back and change the crawl scope (seed type).

• You can find these by opening the metadata export. It is in .ods format, which you can open in Google Docs or in most versions of Excel, or you can download OpenOffice.

• Do NOT edit the .ods file before doing the metadata upload; make a copy.

• Then sort the “scope” column to find the relevant URLs.

How to change it:

• Page: click on Settings > Crawl one page only (can also be bulk edited)
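
As an alternative to sorting by hand, the page-only seeds can be listed with a short Python sketch. It rests on several assumptions: that you saved your copy of the export as CSV, that the columns are headed `scope` and `url`, and that the scope value reads "Page only". Check these against your own file; the sample rows are invented.

```python
import csv
import io


def page_only_urls(csv_text, scope_col="scope", url_col="url"):
    """Return the URLs of rows whose scope is 'Page only' (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row[url_col]
        for row in reader
        if row[scope_col].strip().lower() == "page only"
    ]


# Invented sample rows; column names and scope values are assumptions.
sample = (
    "url,scope\n"
    "http://a.example/,Host\n"
    "http://b.example/page.html,Page only\n"
)

urls = page_only_urls(sample)
print(urls)
```

The resulting list is the set of seeds to change to "Crawl one page only" in Archive-It.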

Page 49: WAS to Archive-It Metadata Migration March 11, 2015.

Change Frequency under Settings > Seed Type

Page 50: WAS to Archive-It Metadata Migration March 11, 2015.

When will my crawls start? When you start them.

Page 51: WAS to Archive-It Metadata Migration March 11, 2015.

When do I shut off WAS crawls?

• FIRST set up your crawls in Archive-It

• Make sure daily crawls are running

• Then you can stop your WAS crawls

Page 52: WAS to Archive-It Metadata Migration March 11, 2015.

VERY important: Do NOT make any edits to WAS data, crawls, ANYTHING once you have moved a project to Archive-It!

Page 53: WAS to Archive-It Metadata Migration March 11, 2015.

Batch shut off crawling in WAS

Page 54: WAS to Archive-It Metadata Migration March 11, 2015.

Sites > Manage Sites > “all” > “select all” > “Reschedule Selected”

Page 55: WAS to Archive-It Metadata Migration March 11, 2015.

Select “off” and click “Reschedule”

Page 56: WAS to Archive-It Metadata Migration March 11, 2015.

Send CDL your info

Page 57: WAS to Archive-It Metadata Migration March 11, 2015.

After you have created all your collections,

1. Send Rosalie this info for each collection:
a) CollectionId
b) AccountId

AND

2. Add Rosalie as a user to your account (for now)

Page 58: WAS to Archive-It Metadata Migration March 11, 2015.

CollectionId and AccountId in URL

Page 59: WAS to Archive-It Metadata Migration March 11, 2015.

Where’s my data?

• Archive-It will work with CDL staff to move over your data.

• Timeline: May/June 2015

Page 60: WAS to Archive-It Metadata Migration March 11, 2015.

Resources

WAS – Archive-It Migration wiki: https://wiki.library.ucsf.edu/display/UCLCKG/WAS+-%3E+Archive-it+Migration

Mapping of terms and metadata: WAS - Archive-It: https://wiki.library.ucsf.edu/download/attachments/351243364/MappingofWAStoArchive-It.pdf?version=1&modificationDate=1422304077000&api=v2