Toro 1 EMu on a Diet Yale campus Peabody Collections Counts & Functional Cataloguing Unit...


Toro 1

EMu on a Diet


Yale campus


Peabody Collections: Counts & Functional Cataloguing Unit

• Anthropology 325,000 Lot
• Botany 350,000 Individual
• Entomology 1,000,000 Lot
• Invertebrate Paleontology 300,000 Lot
• Invertebrate Zoology 300,000 Lot
• Mineralogy 35,000 Individual
• Paleobotany 150,000 Individual
• Scientific Instruments 2,000 Individual
• Vertebrate Paleontology 125,000 Individual
• Vertebrate Zoology 185,000 Lot / Individual

2.7 million database-able units => ~11 million items


Peabody Collections: Functional Units Databased

• Anthropology 325,000 90 %
• Botany 350,000 1 %
• Entomology 1,000,000 3 %
• Invertebrate Paleontology 300,000 60 %
• Invertebrate Zoology 300,000 25 %
• Mineralogy 35,000 85 %
• Paleobotany 150,000 60 %
• Scientific Instruments 2,000 100 %
• Vertebrate Paleontology 125,000 60 %
• Vertebrate Zoology 185,000 95 %

990,000 of 2.7 million => 37 % overall
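As a cross-check on the overall figure, the per-collection counts and percentages can be combined as a weighted sum. The sketch below is only an illustration in Python; the dictionary simply restates the slide values, and the function name is made up for this example. With the rounded slide figures it comes out at roughly 954,000 of 2.77 million units (about 34 %), a little under the 990,000 and 37 % quoted above, presumably because the per-collection values on the slides are rounded.

```python
# Functional cataloguing units and percent databased, restated from the slides.
COLLECTIONS = {
    "Anthropology": (325_000, 90),
    "Botany": (350_000, 1),
    "Entomology": (1_000_000, 3),
    "Invertebrate Paleontology": (300_000, 60),
    "Invertebrate Zoology": (300_000, 25),
    "Mineralogy": (35_000, 85),
    "Paleobotany": (150_000, 60),
    "Scientific Instruments": (2_000, 100),
    "Vertebrate Paleontology": (125_000, 60),
    "Vertebrate Zoology": (185_000, 95),
}


def databased_summary(collections):
    """Return (total units, databased units, overall percent databased)."""
    total = sum(units for units, _ in collections.values())
    # Integer arithmetic: sum units * percent first, divide by 100 once.
    databased = sum(units * pct for units, pct in collections.values()) // 100
    return total, databased, 100 * databased / total


if __name__ == "__main__":
    total, databased, pct = databased_summary(COLLECTIONS)
    print(f"{databased:,} of {total:,} units databased ({pct:.1f} %)")
```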


The four YPM buildings

• Peabody (YPM)
• Environmental Science Center (ESC)
• Geology / Geophysics (KGL)
• 175 Whitney (Anthropology)

• VZ: Kristof Zyskowski (Vert. Zool. - ESC) and Greg Watkins-Colwell (Vert. Zool. - ESC)
• HSI: Shae Trewin (Scientific Instruments – KGL)
• VP: Mary Ann Turner (Vert. Paleo. – KGL / YPM)
• ANT: Maureen DaRos (Anthro. - YPM / 175 Whitney)

[Figure: % Databased vs. Collection Size (in 1000s of items). Vertical axis 0 to 100 %; horizontal axis with ticks at 1, 10, 100 and 1000. Labelled points: Botany, Entomology, Invertebrate Paleontology, Invertebrate Zoology.]

Peabody Collections: Approximate Digital Timeline

• 1991 Systems Office created & staffed
• 1992 Argus collections databasing initiative started
• 1994 Gopher services launched for collections data
• 1997 Gopher mothballed, Web / HTTP services launched
• 1998 Physical move of many collections “begins”
• 2002 Physical move of many collections “ends”
• 2003 Search for Argus successor commences
• 2003 Informatics Office created & staffed
• 2004 KE EMu to succeed Argus, data migration begins
• 2005 Argus data migration ends, go-live in KE EMu

Big events

• Physical move in '98-'02 (primarily neontological disciplines)
• EMu migration in '05 (all disciplines went live simultaneously)


What do you do …


… when your EMu is out of shape & sluggish ?


The Peabody Museum Presents


What clued us in that we should put our EMu on a diet ?


Area of Server Occupied by Catalogue

• 980 megabytes in Argus
• 10,400 megabytes in EMu

Default EMu “cron” maintenance job schedule

[Figure: weekly grid with columns Mo Tu We Th Fr Sa Su and rows late night, workday, evening. Legend: emulutsrebuild, emumaintenance batch, emumaintenance compact.]

Three Fabulously Easy Steps !

• 1. The Legacy Data Burnoff ( best quick loss plan ever ! )
• 2. The Darwin Core Binge & Purge ( eat the big enchilada and still end up thin ! )
• 3. The Validation Code SlimDing ( your Texpress metabolism is your friend ! )


1. The Legacy Data Burnoff

Anatomy of the ecatalogue database

File Name                      Function
~/emu/data/ecatalogue/data     the actual data
~/emu/data/ecatalogue/rec      indexing (part)
~/emu/data/ecatalogue/seg      indexing (part)

The combined size of these was 10.4 GB: roughly 4 GB in data and roughly 3 GB in each of rec and seg.

980 MB in Argus, 10,400 MB in EMu
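Sizes like these are easy to track over time with a small script. The following is a minimal sketch, assuming (as the table above suggests) that data, rec and seg are plain files inside a table directory; the function names are invented for this example and are not part of EMu or Texpress.

```python
from pathlib import Path


def table_file_sizes(table_dir, names=("data", "rec", "seg")):
    """Return {file name: size in bytes} for the named files found in table_dir."""
    table_dir = Path(table_dir)
    return {
        name: (table_dir / name).stat().st_size
        for name in names
        if (table_dir / name).is_file()
    }


def format_report(sizes):
    """Render one line per file plus a total, in megabytes."""
    total = sum(sizes.values())
    lines = [f"{name:<6}{size / 1e6:>12,.1f} MB" for name, size in sizes.items()]
    lines.append(f"{'total':<6}{total / 1e6:>12,.1f} MB")
    return "\n".join(lines)


if __name__ == "__main__":
    import sys

    # Default path taken from the slide; pass another directory as an argument.
    target = sys.argv[1] if len(sys.argv) > 1 else Path.home() / "emu/data/ecatalogue"
    print(format_report(table_file_sizes(target)))
```

Run before and after each cleanup pass, a report like this shows which of the three files is actually shrinking.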


The ecatalogue database was a rate limiter

typical EMu data directory: 23 files, 2 subdirs


Closer Assessment of Legacy Data

In 2005 we initially adopted many of the existing data-element formats from the USNM’s EMu client, so that KE could develop the Peabody’s modules rapidly before migration. The Legacy Data fields were among them.


sites – round 2

• constant data
• lengthy prefixes
• data of temporary use in migration

catalogue – round 2

[Figure labels: data, rec, seg]

How did we do the Legacy Data Burnoff in 2005 ?

• Repetitive scripting of texexport & texload jobs
• Conducted around a million updates of records
• Manually adjusted nightly cron jobs to accommodate
• Did the work at night over a six-month-long period
• Watched the process closely to keep from filling server disks
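The general pattern in that list (many updates, done in batches at night, with an eye on free disk space) can be sketched as follows. This is a sketch under stated assumptions, not the scripts that were actually used: `apply_batch` is a hypothetical callable standing in for one scripted export-and-reload cycle, no texexport or texload options are assumed, and the batch size and free-space threshold are arbitrary illustrative values.

```python
import shutil


def run_in_batches(record_ids, apply_batch, batch_size=5000,
                   min_free_bytes=5 * 1024**3, disk_path="/", free_bytes=None):
    """Apply updates batch by batch, stopping early if free disk space runs low.

    apply_batch is a hypothetical stand-in for one export-and-reload cycle.
    free_bytes may be supplied for testing; by default it asks the operating
    system how much space is free on disk_path.
    Returns the number of records processed.
    """
    if free_bytes is None:
        def free_bytes():
            return shutil.disk_usage(disk_path).free

    done = 0
    for start in range(0, len(record_ids), batch_size):
        if free_bytes() < min_free_bytes:
            break  # leave the remainder for another night
        batch = record_ids[start:start + batch_size]
        apply_batch(batch)
        done += len(batch)
    return done
```

Checking free space before every batch, rather than once at the start, reflects the point that the process had to be watched continuously while it ran.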

ecatalogue

[Figure series: the ecatalogue data, rec and seg files shown at each crunch]

• Crunch 2: delete nulls from AdmOriginalData
• Crunch 3: shorten labels on AdmOriginalData
• Crunch 4: delete prefixes on AdmOriginalData

Wow ! 55 % reduction !
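The three crunches amount to simple text transformations. The slides do not show how AdmOriginalData is laid out, so the sketch below assumes a hypothetical layout of one "label: value" pair per line with a constant prefix on every label; the prefix, the labels and the function name are all invented for illustration.

```python
def crunch_legacy_text(text, prefix="LEGACY_", short_labels=None):
    """Shrink a block of 'label: value' lines (hypothetical layout).

    - entries with an empty value (nulls) are dropped
    - the constant prefix is stripped from each label
    - long labels are swapped for short ones via the short_labels mapping,
      which is keyed by the label after prefix removal
    Lines without a colon are kept unchanged so that no data is lost.
    """
    short_labels = short_labels or {}
    kept = []
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep:
            kept.append(line)  # not a label/value pair; leave untouched
            continue
        value = value.strip()
        if not value:
            continue  # null entry
        label = label.strip()
        if label.startswith(prefix):
            label = label[len(prefix):]
        label = short_labels.get(label, label)
        kept.append(f"{label}: {value}")
    return "\n".join(kept)
```

For example, with `short_labels={"CollectorName": "Coll"}`, the input line `LEGACY_CollectorName: Smith` becomes `Coll: Smith`, and a line such as `LEGACY_Remarks:` disappears entirely.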


2. The Darwin Core Binge & Purge

Charles Darwin, 1809-1882

“DwC”: Natural History Metadata Standard

• Affords interoperability of different database systems
• Widely used in collaborative informatics initiatives
• Circa 40-50 fields depending on particular version
• Directly analogous to the Dublin Core standard


Populate DwC fields at 3.2.02 upgrade in 2006… so what ?

• IZ Department: total characters existing data 43,941,006
• IZ Department: est. new DwC characters 20,000,000
• IZ Department: est. expansion factor 45 %
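The estimated expansion factor is simply the ratio of new characters to existing characters. A small illustrative calculation (the function name is made up for this example) gives about 45.5 % from the two figures above, in line with the rounded 45 % on the slide.

```python
def expansion_factor(existing_chars, added_chars):
    """Return the growth, in percent, when added_chars are added to existing_chars."""
    return 100 * added_chars / existing_chars


if __name__ == "__main__":
    # Figures for the IZ Department, taken from the slide.
    print(f"{expansion_factor(43_941_006, 20_000_000):.1f} %")
```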


We’re about to gain back most of the pounds we just lost in the Legacy Data Burnoff !

catalogue – round 2

[Figure series: data, rec and seg, shown for the action in ecollectionevents, the action in eparties and the action in ecatalogue, then before actions and after actions]

ExtendedData / SummaryData

The ExtendedData field is a full duplication of the IRN + SummaryData fields… delete the ExtendedData field, and use SummaryData when in “thumbnail mode” on records

Populate DwC fields at 3.2.02 upgrade… so what ?

• IZ Department: total characters modified data 43,707,277
• IZ Department: total new DwC characters 22,358,461
• IZ Department: actual expansion factor -0.1 %

Some pain, but NO weight gain !


3. The Validation Code SlimDing

We’ve taken off the easiest pounds… any other fields to trim ?

Some sneakily subversive texpress tricks


Can history of query behavior by users help identify some EMu soft spots ?

If so, can we slip EMu a “dynamic diet pill” into its computer code ?

texadmin

EMu actions in the background you don’t see

• …you make certain common types of changes to any record in any EMu module
• …and automatic changes then propagate via “emuload” to numerous records in linked modules
• …those linked modules can grow a lot and slow EMu significantly between maintenance runs


Why not harness EMu’s continuously ravenous appetite for pushing local copies of linked fields into remote modules… and put it to work slimming for us !


Need to first understand how different EMu queries work

Drag and Drop Query

checks the link field

Straight Text Entry Query

instead checks a local copy of the SummaryData from the linked record that has been inserted into the catalogue
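The difference between the two query styles can be illustrated with a toy model. The records, field names and functions below are hypothetical and only mimic the behaviour described on the slides: one query matches on the link itself, the other searches the text copied into the catalogue record.

```python
# Hypothetical catalogue rows: a link (the accession IRN) plus a local text
# copy of the linked accession's summary.
CATALOGUE = [
    {"irn": 1, "accession_irn": 501,
     "accession_summary": "ACC.1998.12 Gift of J. Smith, 40 bird skins"},
    {"irn": 2, "accession_irn": 502,
     "accession_summary": "ACC.2001.07 Field collection, Ecuador"},
    {"irn": 3, "accession_irn": 501,
     "accession_summary": "ACC.1998.12 Gift of J. Smith, 40 bird skins"},
]


def drag_and_drop_query(rows, accession_irn):
    """Match on the link field only; the local text copy is never read."""
    return [r["irn"] for r in rows if r["accession_irn"] == accession_irn]


def text_entry_query(rows, keyword):
    """Match on the local copy of the linked record's summary text."""
    keyword = keyword.lower()
    return [r["irn"] for r in rows if keyword in r["accession_summary"].lower()]
```

In this model the summary text is repeated in every catalogue row that links to the same accession, which is where the extra bulk comes from.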

EMu’s audit log - gigantic activity trail

How often do users employ these two very different query strategies, on what fields, and are there distinctly divergent patterns ?

catalogue audit

In this one-week sample, only 7 of 52 queries for accessions from inside the catalogue module used text queries; the other 45 were drag & drops.

Of those 7 text queries, every one asked for a primary id number for the accession, or the numeric piece of that number, but not for any other type of data from within those accessions.
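A tally of this kind is straightforward once audit entries have been reduced to (target module, query style) pairs. The sketch below assumes that reduction has already been done; the real audit log format is not shown in the slides, so the pair representation and the function names are assumptions made for illustration.

```python
from collections import Counter


def tally_query_styles(events):
    """Count occurrences of each (target_module, style) pair."""
    return Counter(events)


def share(counts, module, style):
    """Percentage of queries against `module` that used `style`."""
    total = sum(n for (m, _), n in counts.items() if m == module)
    return 100 * counts[(module, style)] / total if total else 0.0


if __name__ == "__main__":
    # The one-week sample from the slide: 7 text queries, 45 drag & drops.
    sample = [("accessions", "text")] * 7 + [("accessions", "drag_and_drop")] * 45
    counts = tally_query_styles(sample)
    print(f"text queries: {share(counts, 'accessions', 'text'):.1f} %")
```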

Over a full year of catalogue audit data, fewer than 1% of all the queries into accessions used anything other than the primary id of the accession record as the keyword(s).

This is where we gain our SlimDing advantage !

We don’t need more than the primary id of the accession record in the local copy of the accession module data stored in the catalogue module.

This pattern also held true for queries launched from the catalogue against the bibliography and loans modules !
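The effect of keeping only the primary id in each local copy can be estimated with a few lines of code. This is a rough illustration, not the Texpress mechanism itself: it assumes, hypothetically, that the primary id is the first whitespace-delimited token of the copied summary text, and both function names are invented.

```python
def trim_to_primary_id(summary):
    """Keep only the first whitespace-delimited token (assumed to be the primary id)."""
    return summary.split(maxsplit=1)[0] if summary.strip() else ""


def characters_saved(summaries):
    """Return (characters before, characters after) trimming every local copy."""
    before = sum(len(s) for s in summaries)
    after = sum(len(trim_to_primary_id(s)) for s in summaries)
    return before, after
```

Because the same trimmed text is repeated in every linking record, even a modest saving per copy adds up across a large catalogue.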

Catalogue Database

Catalogue module lost another 19% of its bulk over a couple months !

Internal Movements Database

Internal movements dropped from 550 MB down to 200 MB… a reduction of roughly 65% !

Default EMu “cron” maintenance job schedule

[Figure: weekly grid with columns Mo Tu We Th Fr Sa Su and rows late night, workday, evening. Legend: emulutsrebuild, emumaintenance batch, emumaintenance compact. The later views of the grid carry three * markers.]


Quick backup


A Happy EMu Means Happy Campers


finis