CERN Document Server Software - blogs.epfl.ch

CDSware: CERN Document Server Software. Jean-Yves Le Meur (Jean-Yves.Le.Meur@cern.ch), 15 June 2004

Page 1

CDSware
Jean-Yves.Le.Meur@cern.ch

CERN Document Server Software

Jean-Yves Le Meur
15 June 2004

Page 2

CDSware > Introduction > History

Since the creation of CERN 50 years ago, the library's mission has remained the same: the dissemination and long-term keeping of High Energy Physics results.

Only the means have changed:
1965: 1st computer ever used by the CERN library
1990: 1st computer ever used by the WWW
1993: Preprint
1996: Library
1999: Document Servers
2002: 1st release of CDSware, to promote the OAI movement

Page 3

CDSware > Description > General

General description

CERN Document Server Software (CDSware) is the software developed by, maintained by, and used at the CERN Document Server (a team of 5 to 10 people).

It allows you to run your own electronic preprint server, your own online library catalogue or a document system on the web.

It has been deployed through an incremental, organic-growth software development model.

It uses free software technology: MySQL RDBMS; CDSware indexes; Apache/Python; XML MARC.

CDSware is free software, licensed under the GNU General Public License (GPL). It complies with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and uses MARC 21 as its underlying bibliographic standard.

Page 4

CDSware > Description > Architecture

CDSware Global View

[Figure: architecture diagram. Labelled components: OAI/non-OAI data providers; OAI data providing to OAI services/applications; CDSware metadata + data; modules BibHarvest, BibConvert, BibUpload, BibSched, BibWords, BibFormat, BibData, WebSearch, WebPerso and WebSubmit; actors: system librarian, user, author, admin.]

Page 5

CDSware > Features > Interoperability

To interoperate or not to?

Cataloguers are becoming “batchers”!

CDSware can harvest metadata from:
– OAI sources, with BibHarvest
– non-OAI sources, which can be “transformed” with BibConvert, allowing, for example, the loading of a backlog of records

CDSware provides its metadata via:
– OAI
  • Records can be private, public and “OAI-public”
  • OAI sets can be defined using any search criteria
  • XML MARC, XML Dublin Core and more…
– any query that is “OAI-ready”
  • E.g.: an OAI harvester could harvest only papers written by Ellis, J.
  • E.g.: an OAI harvester could harvest only title fields
– the Command Line Interface (APIs)
  • E.g.: weekly bulletin, e-journal site and many other applications

At CERN, 92% of acquisition is done through automatic import from more than 80 different sources, in most cases not (yet) using OAI.
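The harvesting and providing features above rest on OAI-PMH. As a rough illustration (not CDSware's own BibHarvest code), the sketch below builds a `ListRecords` request, optionally restricted to a set or a start date, and reads one response page, including the `resumptionToken` a harvester uses to page through large result sets. The base URL, the `marcxml` metadata prefix, the set name and the sample response are hypothetical; `oai_dc` is the only format every OAI-PMH repository is required to support.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           oai_set=None, from_date=None,
                           resumption_token=None):
    """Return an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument in OAI-PMH:
        # it is sent together with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if oai_set is not None:
            params["set"] = oai_set        # selective harvesting by set
        if from_date is not None:
            params["from"] = from_date     # incremental harvesting by date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # A missing or empty resumptionToken means this was the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical one-page response from a repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""
```

Calling `parse_list_records(SAMPLE)` returns the two identifiers and the token `page2`; a harvester would then call `build_list_records_url` again with that token until no token comes back.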

Page 6

CDSware > Features > Submission

From Author Desktop to Long Term Archive

Each collection can have its own submission policy:
– Direct submission
– Submission with monitoring
– Submission with simple approval
– Submission with peer review/refereeing and an editorial board

Each collection can have its own record definition:
– Metadata fields (mandatory, optional, controlled at input time…)
– Fulltext formats
– Revised versions

Each submission has its own process management:
– With an HTML administration interface
– To define submission screens
– To define actions to be applied
– E.g.:
  • When the video service finishes submitting a videotape, a label (PDF) is created to be stuck on the tape
  • When a note is submitted by the ATLAS collaboration, collaboration members are offered the option of sending comments to its author

At CERN, across 450 different collections, up to 50 different rules are used for submitting documents.
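The per-collection submission rules above can be pictured as configuration data plus a list of actions run when a submission is completed. The sketch below is entirely hypothetical: the class names, the policy labels and the `create_tape_label` action are illustrative stand-ins for the videotape example, not WebSubmit's actual configuration format.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List


class Policy(Enum):
    DIRECT = "direct submission"
    MONITORED = "submission with monitoring"
    SIMPLE_APPROVAL = "submission with simple approval"
    REFEREED = "submission with peer review and editorial board"


@dataclass
class SubmissionType:
    """Hypothetical per-collection submission definition."""
    collection: str
    policy: Policy
    mandatory_fields: List[str]
    # Actions run, in order, once the submitter completes the form.
    end_actions: List[Callable[[Dict[str, str]], str]] = field(default_factory=list)

    def submit(self, record: Dict[str, str]) -> List[str]:
        # Enforce the collection's own record definition first.
        missing = [name for name in self.mandatory_fields if not record.get(name)]
        if missing:
            raise ValueError(f"missing mandatory fields: {missing}")
        return [action(record) for action in self.end_actions]


def create_tape_label(record: Dict[str, str]) -> str:
    # Stand-in for generating the PDF label to stick on the videotape.
    return f"label-{record['tape_id']}.pdf"


videotapes = SubmissionType(
    collection="Videotapes",
    policy=Policy.DIRECT,
    mandatory_fields=["title", "tape_id"],
    end_actions=[create_tape_label],
)
```

Keeping policies and actions as data rather than code is one way to let an administrator define up to dozens of different rules without touching the submission engine itself.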

Page 7

CDSware > Features > Search

From the Archive to the end-user

Google-like speed for up to 1,000,000 records
– Web application server / DB server
– DB insufficient: in-house, performance-driven index design
– Fast marshalling & fast set intersections:

    query               no. hits   search time
    cern                 223,843   0.07 sec
    of                   439,793   0.07 sec
    of cern              109,635   0.10 sec
    of cern the this      11,940   0.17 sec

Combined metadata/fulltext/reference search
– E.g.: title:higgs or reference:higgs or fulltext:higgs

Multi-stage search guidance system

Personalization: baskets, email alerts

Navigable collection trees
– Primary and virtual orthogonal views

Internationalization: multi-language interface

“When it was proclaimed that the Library contained all books, the first impression was one of extravagant happiness.” (Borges)
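The search figures above come from an in-house index rather than from the database's own indexes. The sketch below illustrates the general technique under stated assumptions: one inverted index per field mapping each word to the record IDs that contain it, multi-word (AND) queries answered by intersecting hit sets starting from the smallest, and field-qualified OR queries such as `title:higgs or reference:higgs or fulltext:higgs` answered by a union. Storing hit lists with Python's `marshal` module is an assumption meant to echo the "fast marshalling" bullet; the toy records and function names are hypothetical.

```python
import marshal
from collections import defaultdict
from typing import Dict, Set

# Hypothetical toy corpus: record ID -> {field: text}
RECORDS = {
    1: {"title": "Search for the Higgs boson at CERN", "fulltext": "of the higgs"},
    2: {"title": "CERN annual report", "fulltext": "report of cern"},
    3: {"title": "Detector physics", "reference": "higgs mechanism"},
}


def build_index(records) -> Dict[str, Dict[str, bytes]]:
    """Return field -> word -> marshalled sorted list of record IDs."""
    raw: Dict[str, Dict[str, Set[int]]] = defaultdict(lambda: defaultdict(set))
    for recid, fields in records.items():
        for fld, text in fields.items():
            for word in text.lower().split():
                raw[fld][word].add(recid)
    return {
        fld: {word: marshal.dumps(sorted(ids)) for word, ids in words.items()}
        for fld, words in raw.items()
    }


def hits(index, fld: str, word: str) -> Set[int]:
    """Unmarshal the hit list for one word in one field (empty if absent)."""
    blob = index.get(fld, {}).get(word.lower())
    return set(marshal.loads(blob)) if blob is not None else set()


def search_all_words(index, fld: str, query: str) -> Set[int]:
    """AND query: intersect hit sets smallest first, stopping once empty."""
    sets = sorted((hits(index, fld, w) for w in query.split()), key=len)
    if not sets:
        return set()
    result = sets[0]
    for s in sets[1:]:
        if not result:
            break
        result = result & s
    return result


def search_any_field(index, query: str) -> Set[int]:
    """OR of 'field:word' terms, e.g. 'title:higgs or fulltext:higgs'."""
    result: Set[int] = set()
    for term in query.split(" or "):
        fld, word = term.strip().split(":", 1)
        result |= hits(index, fld, word)
    return result


INDEX = build_index(RECORDS)
```

Intersecting from the smallest set keeps each step no larger than the rarest word's hit list, which is consistent with the table: adding words narrows the result (439,793 hits for "of" versus 109,635 for "of cern") while search time grows only moderately.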

Page 8

CDSware > Preservation

Long Term Archive?

CDSware at CERN
– “Certified Information System” (CIS)
– Considered a long-term electronic archive
– Hosts the official CERN Archives

MARC21-based: LOC standard
– XML MARC is the internal representation of CDSware records

Records deletion policy
– Record IDs never change

Fulltext automatically converted to PDF
– The CERN conversion server can be plugged in (GNU GPL)

Digital content disseminated… via OAI!

“What the Internet needs is an old fashioned librarian” (hp)
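Since XML MARC is described as the internal representation of CDSware records, a short illustration of that format may help. The sketch below reads a MARC 21 record in the Library of Congress MARCXML format (namespace `http://www.loc.gov/MARC21/slim`) using only Python's standard library: control field 001 holds the record's control number, datafield 100 subfield a the main personal author, and datafield 245 subfield a the title. The sample record and helper names are hypothetical, and this is not CDSware's own parser.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical sample record in MARCXML.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">12345</controlfield>
  <datafield tag="100" ind1=" " ind2=" ">
    <subfield code="a">Ellis, J.</subfield>
  </datafield>
  <datafield tag="245" ind1=" " ind2=" ">
    <subfield code="a">An example title</subfield>
  </datafield>
</record>"""


def get_subfields(record, tag, code):
    """Return every value of subfield `code` in datafields with `tag`."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [sf.text for sf in record.findall(path, MARC_NS)]


def summarize(xml_text):
    """Extract record ID (001), authors (100 $a) and title (245 $a)."""
    record = ET.fromstring(xml_text)
    recid = record.find("marc:controlfield[@tag='001']", MARC_NS)
    return {
        "recid": recid.text if recid is not None else None,
        "authors": get_subfields(record, "100", "a"),
        "title": next(iter(get_subfields(record, "245", "a")), None),
    }
```

Because MARCXML is a plain, namespaced XML serialisation of MARC 21, any standard XML toolkit can read such records, which is part of what makes the format attractive for long-term archiving.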

Page 9

- 650,000 different records
- 320,000 fulltexts
- 450 different collections
- 125,000 distinct hosts/clients in 2003
- 12,000 distinct hosts/clients per month
- 120,000 searches per month
- 5,000 OAI harvesting requests per month

Page 10

CDSware > Conclusion > Future

Future: a Full Open Digital Library system

Extending traditional library systems
Designed to evolve
Suitable for mid- to large-size repositories (1M records)
Dedicated support from the CERN CDS team

Used in more and more places (or considered for use) by:
– University of Missouri-Columbia, USA
– Fundação Oswaldo Cruz (Ministry of Health), Rio de Janeiro, Brasilia
– ISDN-ENSSIB, France
– Montreal International
– Bologna University, Italy
– UN Population Fund, New York, USA
– Instituto de Investigaciones Eléctricas, Mexico
– Casalini Libri, Italy
– HBZ-NRW, Germany
– SDSC, USA
– Aristotle University of Thessaloniki, Greece
– RERO: Consortium des bibliothèques publiques de Suisse Romande, Switzerland
– and: EPF Lausanne, Switzerland

Package downloaded 510 times in 2004 (as of mid-May)

Page 11

CDSware > Conclusion > Future

Towards the Paperless Office for researchers?

We still have some work left…

Page 12

Conclusions > Questions?

Questions?

Distribution site: http://cdsware.cern.ch
At CERN: http://cdsweb.cern.ch
Support email: cds.support@cern.ch
Send me an e-mail: Jean-Yves.Le.Meur@cern.ch