Toro 1 EMu Hacking at the Peabody Museum. Yale campus.

Post on 15-Dec-2015


Toro 1

EMu Hacking at the Peabody Museum

Yale campus

Peabody Collections: Counts & Functional Cataloguing Unit

• Anthropology: 325,000 (Lot)
• Botany: 350,000 (Individual)
• Entomology: 1,000,000 (Individual)
• Invertebrate Paleontology: 300,000 (Lot)
• Invertebrate Zoology: 300,000 (Lot)
• Mineralogy: 35,000 (Individual)
• Paleobotany: 150,000 (Individual)
• Scientific Instruments: 2,000 (Individual)
• Vertebrate Paleontology: 125,000 (Individual)
• Vertebrate Zoology: 185,000 (Lot / Individual)

2.7 million database-able units => ~11 million items

Peabody Collections: Functional Units Databased

• Anthropology: 325,000 (90 %)
• Botany: 350,000 (1 %)
• Entomology: 1,000,000 (1 %)
• Invertebrate Paleontology: 300,000 (55 %)
• Invertebrate Zoology: 300,000 (20 %)
• Mineralogy: 35,000 (85 %)
• Paleobotany: 150,000 (60 %)
• Scientific Instruments: 2,000 (100 %)
• Vertebrate Paleontology: 125,000 (60 %)
• Vertebrate Zoology: 185,000 (95 %)

940,000 of 2.7 million => 37 % overall

Big events

• EMu migration in '05 (all disciplines went live simultaneously)
• Physical move in '00-'02 (primarily neontological disciplines)

The four YPM buildings

• Peabody (YPM)
• Environmental Science Center (ESC)
• Geology / Geophysics (KGL)
• 175 Whitney (Anthropology)

• VZ: Kristof Zyskowski (Vert. Zool. - ESC)
• VZ: Greg Watkins-Colwell (Vert. Zool. - ESC)
• HSI: Shae Trewin (Scientific Instruments - KGL)
• VP: Mary Ann Turner (Vert. Paleo. - KGL / YPM)
• ANT: Maureen DaRos (Anthro. - YPM / 175 Whitney)

EMu Hacking at Peabody

Hacking – in a laudatory programming sense, not a criminal sense

Mitnick

Often we tend to think of “hackers” in this mode

Mitnick modified: “cracker”

A better moniker

Mitnick modified w/ EMu: “cracker”

Crackers often have unnamed accomplices…

3 Vignettes of YPM EMu “hacks”

• An issue of functionality (background script)

• An issue of performance (tweaking the catalogue)

• An issue of user behavior & cost (another script…)

Hack Vignette #1

Multimedia module - JPEG 2000 support

http://www.jpeg.org/jpeg2000

- non-proprietary compression standard
- lossless mode (much smaller files)
- lossy mode (vastly smaller files)
- potential space/bandwidth savings

http://www.fnordware.com/j2k

JP2 spicebush with J2K and tail target

JP2 spicebush tails with file sizes

1.54 MB (native TIFF) vs. 15 kB (heavily squeezed JP2)

HERBIS images

• 261 kB – <1%
• 1,302 kB – 2%
• 5,166 kB – 12%
• 62,640 kB – 100%

JP2 – no thumbnail

In EMu, oops… no thumbnail

JP2 – script coding

find imagedir -name '*.jp2' -mtime -2 -print

Loop over the matches and test which recently loaded JP2 files are missing a thumbnail JPG, or which have been modified more recently than their existing thumbnail JPG. Then build filenames for any qualifying target JPGs. The script is executed several times per hour from cron.

jasper -f match -F tempfile

convert tempfile -resize 90x90 target
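The selection step described above can be sketched in Python. This is only an illustration, not the YPM script: the rule that a thumbnail sits next to its JP2 with a `.jpg` suffix, and the names `thumbnail_path`, `needs_thumbnail`, `find_targets` and `build_commands`, are assumptions. The two command lines simply mirror the `jasper` and `convert` invocations shown on the slide and are built but not executed here.

```python
import time
from pathlib import Path


def thumbnail_path(jp2: Path) -> Path:
    # Hypothetical naming rule: thumbnail lives beside the JP2 as <name>.jpg
    return jp2.with_suffix(".jpg")


def needs_thumbnail(jp2: Path) -> bool:
    # True when the thumbnail is missing or older than its JP2 source
    thumb = thumbnail_path(jp2)
    if not thumb.exists():
        return True
    return jp2.stat().st_mtime > thumb.stat().st_mtime


def find_targets(image_dir: Path, max_age_days: float = 2.0, now=None):
    # Roughly mirrors `find imagedir -name '*.jp2' -mtime -2`:
    # only JP2 files modified within the last max_age_days are considered.
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    targets = []
    for jp2 in sorted(image_dir.rglob("*.jp2")):
        if jp2.stat().st_mtime >= cutoff and needs_thumbnail(jp2):
            targets.append((jp2, thumbnail_path(jp2)))
    return targets


def build_commands(jp2: Path, tempfile_path: Path, thumb: Path):
    # Command lines as given on the slide; they would be run with subprocess.run
    return [
        ["jasper", "-f", str(jp2), "-F", str(tempfile_path)],
        ["convert", str(tempfile_path), "-resize", "90x90", str(thumb)],
    ]
```

Because the check compares modification times, rerunning it several times per hour is harmless: a JP2 whose thumbnail is already up to date is skipped.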

JP2 – prior, without: script wakes up every 20 minutes…

JP2 – now, with: makes the thumbnail…

JP2 – Tiled View

JP2 files now behave just like all other standard multimedia

JP2 – Photoshop opens

Double click and the Photoshop handler kicks in

JP2 – V1

V. 1 – simply generated thumbnails in the background

JP2 – V2

V. 2 – also inserted suitable metadata into records via texload

(next version, script to be called directly in validation code at file time)

Hack Vignette #1

Moral #1 = EMu is extensible, you may be able to implement significant changes yourself in whole or in part, without delay

Hack Vignette #2

Catalogue module - performance issues

Default EMu “cron” job configuration

[Chart: weekly grid of days (Mo Tu We Th Fr Sa Su) by time bands (late night, workday, evening). Legend: emulutsrebuild, emumaintenance batch, emumaintenance compact.]

Orange is time EMu is busy running background jobs, interfering with workday work and leaving Sunday processing time idle/unused.

The ecatalogue database is a rate limiter

File Name                        Function
~/emu/data/ecatalogue/data       the actual data
~/emu/data/ecatalogue/rec        indexing (part)
~/emu/data/ecatalogue/seg        indexing (part)

At YPM, the combined size of these was >10 GB, with 4 GB in data and 3 GB in each of rec and seg

Touch many types of records in EMu…

e.g., Party record: add middle name
e.g., Bibliography record: add author
e.g., Collecting Events record: add collector

…automatic changes subsequently propagate to numerous records in the ecatalogue database

…ecatalogue can grow a lot and slow EMu to varying degrees between maintenance runs

How to make ecatalogue go faster ?

maybe save 20+% ?

Make it smaller - trim nulls from Legacy Data ?

Repetitive scripting of texexport & texload jobs

Conducting around a million re-imports of records

Manual adjustment of nightly cron jobs to accommodate

Do the work at nighttime over a month-long period

Watched ecatalogue closely to keep from exploding disk
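Keeping an eye on ecatalogue during a long run of re-imports could look something like the following Python sketch. It assumes that `data`, `rec` and `seg` are plain files under the table directory, as the file listing on the earlier slide suggests; the function names and the idea of a byte budget are hypothetical and not taken from the YPM scripts.

```python
import shutil
from pathlib import Path


def ecatalogue_sizes(table_dir: Path, names=("data", "rec", "seg")) -> dict:
    # Size in bytes of each file; a missing file counts as 0
    sizes = {}
    for name in names:
        path = table_dir / name
        sizes[name] = path.stat().st_size if path.exists() else 0
    return sizes


def over_budget(sizes: dict, budget_bytes: int) -> bool:
    # True when the combined size exceeds a budget chosen by the operator
    return sum(sizes.values()) > budget_bytes


def free_fraction(path: Path) -> float:
    # Fraction of the filesystem holding `path` that is still free
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

A wrapper around each batch of re-imports could call `ecatalogue_sizes(Path.home() / "emu/data/ecatalogue")` and pause the next batch whenever `over_budget` is true or `free_fraction` drops below a chosen floor.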

Make it smaller - trim nulls from Legacy Data ?

[Chart: ecatalogue file sizes for data, rec and seg (GB on y axis). Starting situation at YPM.]

[Chart: data, rec and seg after “delete nulls from AdmOriginalData”.]

sites – round 2

… not satisfied with just that… here are some other things to possibly trim!

• constant data
• lengthy prefixes

[Chart: data, rec and seg after each step: delete nulls from AdmOriginalData; shorten prefix on AdmOriginalData; selectively delete AdmOriginalData. Annotation: >55 % !]

[Chart: catalogue – round 2 (data, rec, seg).]

What ecatalogue AdmOriginalData looks like post scripting

BEFORE: Default EMu “cron” job configuration

[Chart: weekly grid of days (Mo Tu We Th Fr Sa Su) by time bands (late night, workday, evening). Legend: emulutsrebuild, emumaintenance batch, emumaintenance compact.]

AFTER: Modified EMu “cron” job configuration

[Chart: weekly grid of days (Mo Tu We Th Fr Sa Su) by time bands (late night, workday, evening), with three days marked by asterisks. Legend: emulutsrebuild, emumaintenance batch, emumaintenance compact.]

Can now squeeze all maintenance into wee hours of night, use Sunday, and fully compact ecatalogue every other day (asterisks)!

Quick backup

Also, all of YPM EMu can now be squeezed onto a thumb drive

Hack Vignette #2

Moral #2 = know your data, you can put aspects of EMu on a diet and your computer system is likely to thank you

Hack Vignette #3

EMu sessions - licensing and user behavior

Dreaded email

WARNING! 2 KE EMu user(s) are currently being denied access because all 10 of your KE EMu licenses are in use. For license upgrades, please contact info@kesoftware.com

Dreaded email for sysadmins

Museum Director: "Go license shopping at KE!"

Systems Admin: "VISA or MasterCard?"

The conversation you dream of but of course never have…

What do you need ?

• Guaranteed license seat for every potential user ?

• Cover maximal number of expected concurrent users ?

• Minimize expenses by minimizing license seats ?

Jess & Lourdes fight (2)

My turn to log in !

%}&$

Dream on, loser !

#@^*

3rd option is dangerous… if you have this you probably have too few licenses

Even with a moderate number of licenses…

… inactive EMu sessions can and will accumulate

Critical research

VARIANT 1: critical research needed, EMu session put on hold

VARIANT 2: both people and computers crash…

Life intervenes

Mon cherie IRN

View > Attachments

…enter the EMu Grim Reaper Script

seeks out inactive EMu sessions

reaper – script coding

texlicstatus

ps -ef

- Grim Reaper wakes up frequently throughout the day
- keeps a running table of statistics about each texserver
- compares each texserver against a countdown timer
- adjusts timer based on activity since last wake up
- if some new activity, resets the countdown timer
- if no activity, increments the countdown timer
- if countdown timer max is reached, kill the texserver

kill -9 texserver_process_id
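The countdown logic in the list above can be sketched as a small state machine in Python. This is a minimal illustration under stated assumptions, not the actual Grim Reaper script: the slides do not say which statistic stands for "activity", so the sketch takes an opaque per-process activity counter (for instance something parsed from `ps -ef` output), and the names `SessionState`, `Reaper` and `wake_up` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    last_activity: int      # activity counter seen at the previous wake-up
    idle_checks: int = 0    # consecutive wake-ups with no new activity


@dataclass
class Reaper:
    max_idle_checks: int                        # countdown timer maximum
    table: dict = field(default_factory=dict)   # pid -> SessionState

    def wake_up(self, snapshot: dict) -> list:
        """snapshot maps texserver pid -> activity counter (assumed metric).

        Returns the pids whose countdown has reached the maximum.
        """
        # Forget sessions that have ended since the last wake-up
        for pid in list(self.table):
            if pid not in snapshot:
                del self.table[pid]

        doomed = []
        for pid, activity in snapshot.items():
            state = self.table.get(pid)
            if state is None:
                # New session: start tracking, nothing to compare yet
                self.table[pid] = SessionState(last_activity=activity)
                continue
            if activity != state.last_activity:
                # Some new activity: reset the countdown
                state.last_activity = activity
                state.idle_checks = 0
            else:
                # No activity: move the countdown along
                state.idle_checks += 1
            if state.idle_checks >= self.max_idle_checks:
                doomed.append(pid)
        return doomed
```

The caller would then terminate each returned pid, for example with `os.kill(pid, signal.SIGKILL)`, which corresponds to the `kill -9` on the slide. Keeping the decision separate from the kill makes the policy easy to dry-run before it is trusted with real sessions.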

Tuning the EMu Grim Reaper Script

Change time between wakeup checks

Change number of wakeup check intervals

Tell reaper to ignore certain users

Amend reaper behavior by time of day

Alter how much inactivity is considered bad
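One way to picture these tuning knobs is as a small configuration object, sketched here in Python. Every field name and default value below is a hypothetical placeholder chosen for illustration; the slides do not give the actual YPM settings, and treating off-hours sessions more strictly is just one possible policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReaperConfig:
    wakeup_minutes: int = 5                 # time between wake-up checks
    max_idle_checks: int = 6                # idle intervals tolerated in the workday
    off_hours_idle_checks: int = 2          # idle intervals tolerated outside it
    workday_hours: range = range(8, 17)     # 0800-1700
    ignored_users: frozenset = frozenset()  # users the reaper never touches
    min_activity_delta: int = 1             # smallest change that counts as activity


def idle_limit(cfg: ReaperConfig, user: str, hour: int) -> Optional[int]:
    # None means "never reap this user's sessions"
    if user in cfg.ignored_users:
        return None
    if hour in cfg.workday_hours:
        return cfg.max_idle_checks
    return cfg.off_hours_idle_checks


def idle_minutes_allowed(cfg: ReaperConfig, user: str, hour: int) -> Optional[int]:
    # Total idle time tolerated = number of intervals x interval length
    limit = idle_limit(cfg, user, hour)
    return None if limit is None else limit * cfg.wakeup_minutes


def is_active(cfg: ReaperConfig, previous: int, current: int) -> bool:
    # How much change in the activity counter counts as real activity
    return (current - previous) >= cfg.min_activity_delta
```

With these placeholder defaults, an ordinary user gets 6 x 5 = 30 idle minutes during the workday and 10 minutes outside it, while anyone listed in `ignored_users` is exempt.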

32 regular YPM users, 13 runtime licenses

[Chart: New sessions started per hour, 0800-1700 (y axis 0 to 25).]

[Chart: Cumulative new sessions started, 0800-1700 (y axis 0 to 80).]

[Chart: Active sessions, 0800-1700: three slow days (y axis 0 to 12).]

[Chart: Active sessions, 0800-1700: three fast days (y axis 0 to 12).]

All four charts: real data, prior two weeks in October 2006

Cope on phone

It’s telling me, “Licenses Exceeded?!”

No more worries

Hack Vignette #3

Moral #3 = find a licensing balance, but also consider training your users and EMu system

Happy Scripting, Happy Campers