Creating A Database of Ship Citations

Post on 13-Jul-2015


CREATING A DATABASE OF SHIP CITATIONS: THE CHALLENGES ENCOUNTERED IN SHIPINDEX.ORG

The Charleston Conference, 3 Nov 2010

Peter McCracken

Co-Founder & Director of Content and Business Development, ShipIndex.org

What kinds of ships are these?

Bark (or barque); Ship; Brigantine; Barquentine; Topsail Schooner; Schooner

Serials :: Ships

Publication pattern (or format?) :: Vessel type

Serial title :: Ship name

ISSN :: IMO

Ship research :: Any other historical research

Ships :: Other historical research

Problems with ships are the same as problems with personal names, geographic descriptors, etc.

Can also apply to concepts, as well as things

Also ‘non-unique’ items, like a car model

Data challenges – personal names

Innumerable works by “Anonymous”

Names are often shortened

Pablo Picasso’s full name was Pablo Diego José Francisco de Paula Juan Nepomuceno María de los Remedios Cipriano de la Santísima Trinidad Ruiz y Picasso

Names have strange limitations

Some must be unique – Consider Michael J. Fox

Some are very common – Consider Adam Smith

Data challenges – geographic names

Numerous variations: Köln; Cologne; Keulen; Colonia; Colònia; Kolín nad Rýnem; Cwlen; Κολωνία; Kolonjo; كولونيا; Кьолн; Ķelne; Кёльн

Name changes

Hot Springs, NM -> Truth or Consequences, NM

Halfway, OR -> Half.com, OR

Clark, TX -> DISH, TX

St. Petersburg -> Petrograd -> Leningrad -> St. Petersburg (“Petersburg,” or “Piter”)

A “meaning-less” identifier

Regardless of the topic, some meaning-less identifier can provide significant assistance

“Meaning-less” in the sense of a one-to-many relationship between the identifier and the data

The identifier doesn’t change, but the data can
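The idea on this slide can be sketched in a few lines of Python. This is only an illustration: the `Registry` and `Record` classes and the use of a plain counter as the identifier are assumptions made for the example, not something the slides specify. The point is that the identifier encodes nothing about the item, so names can be added or changed without touching it.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Record:
    """Everything attached to an identifier; all of it may change."""
    names: list = field(default_factory=list)


class Registry:
    """Hypothetical in-memory registry of opaque identifiers."""

    def __init__(self):
        self._counter = count(1)
        self._records = {}

    def mint(self, first_name):
        # The identifier is just the next integer: it says nothing about the item.
        ident = next(self._counter)
        self._records[ident] = Record(names=[first_name])
        return ident

    def add_name(self, ident, name):
        # The data grows or changes; the identifier stays the same.
        self._records[ident].names.append(name)

    def find(self, name):
        # Any known name leads back to the same identifier.
        return sorted(i for i, rec in self._records.items() if name in rec.names)


registry = Registry()
city = registry.mint("St. Petersburg")
for later_name in ("Petrograd", "Leningrad"):
    registry.add_name(city, later_name)
cologne = registry.mint("Köln")
registry.add_name(cologne, "Cologne")

# Both historical names resolve to one identifier.
print(registry.find("Leningrad") == registry.find("Petrograd"))
```

One identifier maps to many names, which is the one-to-many relationship described above.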

Overview of ShipIndex.org

A database of citations –

>1.42 million citations, from >200 resources

>140,000 citations are freely available

Changes how one does maritime research

Far more content can be researched more quickly

Opens up maritime research to everyone

No need for inside knowledge on where to start searching

Uncovers many hidden resources

Locates free, but hidden, web resources

Maritime access points

Vessel name

Vessel number

IMO numbers are new; hull numbers change

Captain name

They change between voyages, and die during them

Rig or vessel type

Ships are rebuilt; definitions change; “ship” is both a general term and a specific rig

ALSO: Port of registration; crew members; others

Sources of errors – transcribers, indexers, OCR operators, etc.

Transcription errors are very easy to make – whether through incorrect assumptions, or just mistakes

“Earnets” for “Earnest”; “Elizaneth” for “Elizabeth”, etc.

Some files are much tougher to manage than others
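One way such slips might be caught is by comparing each transcribed name against a list of names already accepted. The sketch below uses `difflib.get_close_matches` from the Python standard library; the `KNOWN_NAMES` list, the `suggest_correction` helper, and the 0.8 cutoff are assumptions chosen for illustration, not a description of how ShipIndex.org handles errors.

```python
import difflib

# Hypothetical authority list of names already accepted into the database.
KNOWN_NAMES = ["Earnest", "Elizabeth", "Emma", "Maria", "Mary"]


def suggest_correction(name, known=KNOWN_NAMES, cutoff=0.8):
    """Return the closest known name, or None if nothing is similar enough."""
    if name in known:
        return name
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for transcribed in ("Earnets", "Elizaneth", "Mary"):
    print(transcribed, "->", suggest_correction(transcribed))
```

A match like this is only a suggestion for a human to review: "Mary" and "Maria" are also similar strings, yet they are usually different ships, which is why the cutoff is kept fairly strict here.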

More challenges

How do we locate Elizabeth? Or Mary?

Elizabeth = 1899 citations

Mary = 2614 citations

Top ten ship names, for no good reason: Mary, Maria, Elizabeth, Anna, Union, Victoria, Hope, Flora, Emma, America

Try to limit result sets?

by time period

by vessel rig (maybe?)

by location(?)

by nationality
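A rough sketch of what limiting a result set could look like follows. The `Citation` fields, the `narrow` function, and the sample records are all invented for illustration. One design choice is assumed here: a citation with a missing value is kept rather than discarded, since many sources record no date, rig, or nationality at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Citation:
    vessel_name: str
    source: str
    year: Optional[int] = None         # None means the source gives no date
    rig: Optional[str] = None
    nationality: Optional[str] = None


def narrow(citations, name, years=None, rig=None, nationality=None):
    """Filter citations for one name; unknown values are kept, not discarded."""
    def matches(value, wanted):
        return wanted is None or value is None or value == wanted

    def in_period(year):
        return years is None or year is None or years[0] <= year <= years[1]

    return [
        c for c in citations
        if c.vessel_name == name
        and in_period(c.year)
        and matches(c.rig, rig)
        and matches(c.nationality, nationality)
    ]


# Invented sample records, for illustration only.
sample = [
    Citation("Mary", "Source A", year=1802, rig="Brigantine", nationality="British"),
    Citation("Mary", "Source B", year=1875, rig="Schooner", nationality="American"),
    Citation("Mary", "Source C"),  # no metadata at all
    Citation("Elizabeth", "Source D", year=1870, rig="Schooner"),
]

hits = narrow(sample, "Mary", years=(1850, 1900), rig="Schooner")
print([c.source for c in hits])
```

Filters of this kind shrink the list but cannot say which of the remaining citations refer to the same vessel.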

Changing vessel names

What do we do when a vessel changes its name?

A person researching a vessel wants to know the life of a ship; at present they need to know its previous or subsequent names

This can only be done when we have unique vessel identifiers – otherwise, how do you know which Elizabeth became Hogwarts Belle?
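To make the problem concrete, here is a minimal sketch of name periods keyed by a vessel identifier. The `NamePeriod` structure, the identifiers 1 and 2, and all the years are invented for the example. With an identifier attached, the full life of one ship can be listed, and a bare name plus a year yields a short list of candidate vessels instead of a guess.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NamePeriod:
    vessel_id: int                  # opaque identifier with no built-in meaning
    name: str
    start_year: int
    end_year: Optional[int] = None  # None: still in use, or end unknown


# Invented example data: two different vessels were both called Elizabeth.
PERIODS = [
    NamePeriod(1, "Elizabeth", 1850, 1860),
    NamePeriod(1, "Hogwarts Belle", 1860),
    NamePeriod(2, "Elizabeth", 1855),
]


def life_of_ship(vessel_id):
    """All names one vessel carried, in chronological order."""
    own = [p for p in PERIODS if p.vessel_id == vessel_id]
    return [p.name for p in sorted(own, key=lambda p: p.start_year)]


def candidates(name, year):
    """Identifiers of every vessel that carried `name` in `year`."""
    return sorted(
        p.vessel_id for p in PERIODS
        if p.name == name
        and p.start_year <= year
        and (p.end_year is None or year <= p.end_year)
    )


print(life_of_ship(1))
print(candidates("Elizabeth", 1857))
print(candidates("Elizabeth", 1870))
```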

Existing vessel identifiers

Hull Identification Number – Only US; any powered boat

USCG Documentation Number – Only US; >5 net tons

IMO Number – Assigned by Lloyd’s/Fairplay; international; passenger ships >100 gross tons, and cargo ships >300 gross tons; mandatory from 1996

Naval Identifiers – e.g., PT-109, CV-42, BB-18, DD-793, D118, etc.

Lloyd’s numbers, and many more…
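Of the identifiers listed, the IMO number is a convenient one to illustrate because it carries its own check digit: it has seven digits, and the last equals the final digit of the sum of the first six multiplied by 7, 6, 5, 4, 3 and 2 in turn. The helper names below are made up for this sketch.

```python
import re

WEIGHTS = (7, 6, 5, 4, 3, 2)


def imo_check_digit(first_six):
    """Check digit for the first six digits of an IMO number."""
    total = sum(int(d) * w for d, w in zip(first_six, WEIGHTS))
    return total % 10


def is_valid_imo(text):
    """Accept '9074729' or 'IMO 9074729'; verify the seventh digit."""
    match = re.fullmatch(r"(?:IMO\s*)?(\d{6})(\d)", text.strip())
    if match is None:
        return False
    return imo_check_digit(match.group(1)) == int(match.group(2))


print(is_valid_imo("IMO 9074729"))
print(is_valid_imo("9074728"))
```

The check digit catches many typing errors, but it does nothing for vessels that predate or fall outside the IMO scheme, which is the gap the following slides address.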

Unique historical vessel identifiers

Need an easy way to differentiate between “Mary,” “Mary,” and “Mary”

Needs to be unique and unchanging (unlike name, naval identifier, etc.)

Identifier itself has no meaning – no indication within it of size, nationality, etc.

Identifier is quickly & automatically assigned

Identification is coordinated with multiple organizations

Creating an identifier

Could be done through a standards-creation process, via NISO or another organization

Or informally, with publicly-defined guidelines, such as (just as examples):

Nine-digit number; ddd-ddddd-c (c=check digit)

Allow individuals to easily request identifiers for their vessels or their citations

Need ability to easily combine/split/modify

User-managed is likely most cost-effective solution
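The `ddd-ddddd-c` layout mentioned above could be produced and checked along the following lines. The slides give only the layout, not the check-digit method, so the use of the Luhn algorithm here is purely an assumption, as are the function names and the idea of formatting a plain serial number.

```python
import re

ID_PATTERN = re.compile(r"(\d{3})-(\d{5})-(\d)")


def luhn_check_digit(payload):
    """Luhn check digit for a string of decimal digits."""
    total = 0
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:   # double every second digit, starting at the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def format_identifier(serial):
    """Turn a serial number (0 to 99,999,999) into ddd-ddddd-c."""
    if not 0 <= serial < 10**8:
        raise ValueError("serial must fit in eight digits")
    payload = f"{serial:08d}"
    return f"{payload[:3]}-{payload[3:]}-{luhn_check_digit(payload)}"


def is_valid_identifier(text):
    """True if the text has the right layout and a matching check digit."""
    match = ID_PATTERN.fullmatch(text)
    if match is None:
        return False
    payload = match.group(1) + match.group(2)
    return luhn_check_digit(payload) == int(match.group(3))


print(format_identifier(1))
print(is_valid_identifier(format_identifier(12345678)))
```

Because the eight leading digits are just a serial number, the identifier says nothing about the vessel's size, nationality, or type, and it can be assigned automatically on request.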

More on creating an identifier

Must have buy-in from many groups

Should be easy to implement

Should be easy to use; available to many individuals and resources

Pre-populate as much as possible, open editing to all

Maintain advisory group to address concerns, disagreements, etc.

Defining <ShipIdentifier>

<OtherIdentifiers>

<IdentifierType>

<IdentifierNumber>

<ShipName>

<DateNameStartedInUse>

<DateNameEndedInUse>

<PreviousShipName>

<SubsequentShipName>

<RigType> - defined list of types, & “other”

<VoyageIdentifier> - multiple

More <ShipIdentifier>

<MilitaryUsage?> - yes/no/unclear

<Nationality>

<ServiceBranch>

<HullIdentifier>

<VesselMeasurements>

<MeasurementType> - list of options

<MeasurementValue>
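As a rough illustration of how some of these elements might fit together, the sketch below builds one record with Python's standard `xml.etree.ElementTree` module. The nesting (dates and neighbouring names placed inside each `<ShipName>`), the `id` attribute, and all the values are assumptions made for the example; the slides list the element names but not a full schema. Note that the "?" in `<MilitaryUsage?>` is not allowed in an XML element name, so it is dropped here.

```python
import xml.etree.ElementTree as ET


def add_child(parent, tag, text):
    """Append a child element holding a text value (None leaves it empty)."""
    child = ET.SubElement(parent, tag)
    child.text = text
    return child


def build_ship_record(identifier, names, rig_type, military_usage="unclear"):
    """Build a <ShipIdentifier>; `names` holds (name, start, end) tuples in order."""
    ship = ET.Element("ShipIdentifier", {"id": identifier})
    for index, (name, start, end) in enumerate(names):
        entry = add_child(ship, "ShipName", name)
        add_child(entry, "DateNameStartedInUse", start)
        add_child(entry, "DateNameEndedInUse", end)
        if index > 0:
            add_child(entry, "PreviousShipName", names[index - 1][0])
        if index < len(names) - 1:
            add_child(entry, "SubsequentShipName", names[index + 1][0])
    add_child(ship, "RigType", rig_type)
    # "?" is not legal in an XML name, so the element is called MilitaryUsage.
    add_child(ship, "MilitaryUsage", military_usage)
    return ship


record = build_ship_record(
    "000-00001-8",                        # made-up identifier
    [("Elizabeth", "1850", "1860"),       # made-up dates
     ("Hogwarts Belle", "1860", None)],
    rig_type="Bark",
)
xml_text = ET.tostring(record, encoding="unicode")
parsed = ET.fromstring(xml_text)
print([n.text for n in parsed.findall("ShipName")])
```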

Defining <VoyageIdentifier>

<ShipIdentifier>

<Captain>

<Crew> - multiple positions, multiple names

<CrewPosition>

<CrewmemberName>

<OtherVoyageIdentifiers>

<OtherVoyageDatabase>

<OtherVoyageDbId>
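The voyage record could equally be modelled as plain data classes that point back to a ship by its identifier. Everything concrete below (class names, field names, voyage and ship identifiers, people, and the external database name) is invented for illustration; only the element names in the comments come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class CrewMember:
    position: str       # <CrewPosition>
    name: str           # <CrewmemberName>


@dataclass
class ExternalVoyageId:
    database: str       # <OtherVoyageDatabase>
    db_id: str          # <OtherVoyageDbId>


@dataclass
class Voyage:
    voyage_id: str                                   # <VoyageIdentifier>
    ship_id: str                                     # <ShipIdentifier>
    captain: str                                     # <Captain>
    crew: list = field(default_factory=list)         # <Crew>
    other_ids: list = field(default_factory=list)    # <OtherVoyageIdentifiers>


def voyages_of(ship_id, voyages):
    """All voyages recorded for one ship."""
    return [v for v in voyages if v.ship_id == ship_id]


def captains_of(ship_id, voyages):
    """Captains in voyage order; they can differ from one voyage to the next."""
    return [v.captain for v in voyages_of(ship_id, voyages)]


# Invented records for illustration.
voyages = [
    Voyage("V-1", "000-00001-8", "Captain A", crew=[CrewMember("Mate", "Sailor B")]),
    Voyage("V-2", "000-00001-8", "Captain C",
           other_ids=[ExternalVoyageId("Example voyages database", "12345")]),
    Voyage("V-3", "000-00002-6", "Captain A"),
]
print(captains_of("000-00001-8", voyages))
```

Keeping the captain on the voyage rather than on the ship reflects the earlier observation that captains change between voyages.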

Expanding to other fields

Makes discovery more manageable

Makes linking possible

Use the same concept for other areas of research, linking everything together

People

Places

Manufactured items

Artwork

Everything

Thoughts, questions, more?

Thank you –

Peter McCracken

peter@shipindex.org