RMLL visits at CERN – July 2012


Two Examples of Open Source Software Developed at CERN: Invenio and Indico


Invenio: Digital Library Software

http://invenio-software.org/


What is it used for?
• Depositing
• Archiving
• Organizing
• Disseminating
• Any type of document
  • ~350 GB of PDFs at CERN
  • ~20 TB of images and videos
  • 1M records

What is Invenio?

‣ Integrated Digital Library / Repository software

‣ A platform of choice for managing documents in HEP

‣ Also adopted in other fields (medium to large repositories)

‣ Web application

‣ Open-source GPL-2 project

‣ LAMP stack: Python (mostly), MySQL and Apache

‣ Based on open standards: MARCXML, OAI-PMH, OpenURL, OpenSearch, etc.

‣ Flexible, scriptable
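To illustrate what "based on open standards" means in practice, here is a minimal sketch of how a harvester might build an OAI-PMH request. The base URL is a hypothetical placeholder, not an address from the slides. The `ListRecords` verb and the `oai_dc` metadata prefix come from the OAI-PMH specification; a repository may offer further formats such as MARCXML under its own prefix name.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; a real repository
# documents its own OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai2d"


def build_oai_request(verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return BASE_URL + "?" + urlencode(query)


# oai_dc (Dublin Core) is the one format every OAI-PMH repository must support.
url = build_oai_request("ListRecords", metadataPrefix="oai_dc")
print(url)
```

Because the protocol is plain HTTP plus XML, any client that can issue such a request can harvest records, whatever software runs the repository.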


Invenio’s gears
• Lots of Python, with a sprinkle of C and Lisp(!)
• 630K lines of Python code
• MySQL (MyISAM tables) for storing data
• Native indexing engine
• Apache + mod_wsgi + mod_xsendfile
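For readers unfamiliar with the Apache + mod_wsgi pairing, the sketch below shows the general shape of a WSGI application. It is not Invenio's actual entry point, only an assumed minimal example: mod_wsgi imports a script file and, by default, calls the object named `application` for each request.

```python
def application(environ, start_response):
    """Minimal WSGI callable: report the requested path as plain text."""
    path = environ.get("PATH_INFO", "/")
    body = ("You requested %s\n" % path).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # The server supplies start_response; the app passes status and headers.
    start_response("200 OK", headers)
    # The return value is an iterable of byte strings forming the body.
    return [body]
```

The same callable can be served locally with `wsgiref.simple_server.make_server` from the standard library, which is convenient for testing without Apache.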


Invenio’s History

1954 CERN library starts paper dissemination of preprints (early Open Access)

1965 First computers at CERN library to help with cataloging

1990 Electronic distribution of preprints via FTP

1993 CERN Preprint Server, web front-end of electronic preprint catalogue. Institutional repository

1996 CERN Library Server (weblib): added books, periodicals and "other material".

2000 CERN Document Server: multimedia material, internal notes

2002 First public release of the software under the GNU GPL. Worldwide installations and collaborations

Open Access at CERN
• "Consistent with the stated position of the Collaborations and the General Conditions applicable to Experiments at CERN, every effort will be made to publish papers under Open Access conditions, as defined by the SCOAP3 initiative. As at the date of this document, the Creative Commons Attribution ("cc by") license meets these conditions."

• OA at CERN has a long history. The CERN Convention of 1953 states: "...the results of its experimental and theoretical work shall be published or otherwise made generally available".


Our Development Environment
• Git distributed version control system
• Trac for ticket tracking
• VirtualBox + Vagrant for testing deployment
• We develop on SLC5/6 (based on RHEL5/6), on Ubuntu, on Debian…


Quality Assurance
• Coding standards
  • E.g. PEP 8 (Style Guide for Python Code), etc.
• Documentation
  • "If the code and the comments disagree, then both are probably wrong." – attributed to Norm Schryer
• Test suite
  • ~1,000 unit/regression/web tests
• Security
  • XSS, CSRF, SQL injection, etc.
• Code review
• Kwalitee check: "measuring" quality
  • "It looks like quality, it sounds like quality, but it’s not quite quality." – CPAN Testing Service (quoting Michael Schwern)
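The security item can be made concrete with a small sketch of two common defences: bound query parameters against SQL injection, and output escaping against XSS. This is an illustration under stated assumptions, not code from Invenio. An in-memory SQLite database stands in for MySQL, and the table, column, and function names are hypothetical.

```python
import html
import sqlite3

# In-memory SQLite stands in for the production database so the sketch
# runs anywhere; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE record (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO record (title) VALUES (?)", ("Higgs <b>search</b>",))


def find_titles(connection, pattern):
    """Look up titles with a bound parameter instead of string formatting."""
    rows = connection.execute(
        "SELECT title FROM record WHERE title LIKE ?", (pattern,)
    )
    return [row[0] for row in rows]


def render_title(title):
    """Escape stored text before placing it in an HTML page."""
    return "<li>%s</li>" % html.escape(title)


# A classic injection attempt is treated as plain data and matches nothing.
assert find_titles(conn, "x' OR '1'='1") == []
print([render_title(t) for t in find_titles(conn, "Higgs%")])
```

A regression test can then pin this behaviour down, so that a later refactoring cannot silently reintroduce string-built queries or unescaped output.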


Our community

• 30 institutions worldwide
• CERN + DESY + Fermilab + SLAC
• EPFL …
• ADS and arXiv joining forces
• Translated so far into 26 languages
• 45 committers (in the last year)
• Free + Paid support


An example installation


• 1 Load balancer (HAProxy + Apache mod_proxy + mod_evasive)

• 5 Worker nodes:
  • 2 VMs for static files
  • 3 Real machines for Python-handled requests
• 2 DB nodes (MySQL master + MySQL replica)
• AFS distributed FS for backups and file storage
• Sustained recent Higgs announcement load (230 requests per second with peaks of 800 req/s)
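As a rough back-of-the-envelope reading of these figures, the sketch below estimates the average load per Python worker. It assumes, purely for illustration, that every request reaches the three Python machines and is spread evenly. Static files are in fact served by the two VMs, so the results are upper bounds rather than measured values.

```python
# Figures quoted on the slide.
SUSTAINED_RPS = 230
PEAK_RPS = 800
PYTHON_WORKERS = 3  # real machines handling Python requests


def per_node(requests_per_second, nodes):
    """Average load per node under an assumed even split."""
    return requests_per_second / nodes


print("sustained: %.1f req/s per node" % per_node(SUSTAINED_RPS, PYTHON_WORKERS))
print("peak:      %.1f req/s per node" % per_node(PEAK_RPS, PYTHON_WORKERS))
print("peak/sustained ratio: %.2f" % (PEAK_RPS / SUSTAINED_RPS))
```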

What’s next?
• Werkzeug/Flask + Jinja2 + WTForms for the web framework
• SQLAlchemy for DB abstraction
• Twitter Bootstrap + jQuery for the style
• Optional Solr indexing

Indico: Conference Management Software

http://indico-software.org

• History and Features
• Technologies
• Development

What is Indico?
• Web-based event organization
• Archive of events metadata and related documents (minutes, slides, etc.)
• Booking service and collaboration hub
  • Rooms
  • Videoconference
  • Webcast


What is Indico?
• Started as a European project in 2002
• First used in 2004
• In production at CERN: http://indico.cern.ch
• And in >100 institutions around the world
  • GSI, DESY, Fermilab,…
  • http://indico-software.org/wiki/IndicoWorldWide

• Free and Open Source

Indico @ CERN
• > 170,000 events
• > 700,000 presentations
• > 900,000 files

Event Management with Indico
• All kinds of events


Managing Simple Events


Managing Meetings


Managing Conferences


Managing Conferences
• Full Lifecycle


Collaboration Hub
• Room Booking


Collaboration Hub
• Collaboration service requests: Videoconference, webcast, recording


Technology
• Python >2.6 + WSGI
  • babel, webassets, pytz, zope.index, zope.interface, simplejson, suds, lxml, zc.queue, python-dateutil, pypdf, pyatom, reportlab, etc.
• Mako 0.4.1+ as template engine
• ZODB as underlying database (http://www.zodb.org/)
• Web frameworks:
  • jQuery
  • Backbone.js


Infrastructure


Compatibility
• Compatible with many browsers: IE8+, FF3.6+, Google Chrome, Safari, etc.
• Working on mobile version


Development Tools
• Git as version control system
• ~ Eclipse + PyDev
• Unit and Selenium tests + Jenkins (continuous integration server)
• Sphinx for documentation
• Trac as project site
• GitHub: http://github.com/indico
• Transifex for i18n: https://www.transifex.com/projects/p/indico/

What’s Next?
• Enhance the software: v1.0 end of 2012
• Enlarge the community: more advertising


Questions?
