


New Value from the DSpace Foundation and Fedora Commons

Michele Kimpton and Sandy Payette
Executive Directors

DuraSpace

Social and Technical Forces (2000-present)
Waves of Repository-Enabled Applications

• Institutional Repositories
• Digital Collections
• Digital Libraries
• Collaborative Spaces and “Web 2.0”
• Scholarly and Scientific Infrastructure
• E-Research
• Data (archiving, linking, sharing)

Implications for our future work

• more distributed
• more collaborative
• more web-oriented
• more open
• more interoperable

Emergence of Infrastructure

Source: Understanding Infrastructure: Lessons for New Scientific Infrastructure, http://deepblue.lib.umich.edu/handle/2027.42/49353

Systems

• Integrate components
• Central control
• Dedicated/specialized gateways
• More closed
• More preconceived

Networks

• Integrate systems
• Distributed control
• Generic gateways
• More open
• More reconfigurable

Source: Francine Berman, Got Data? A Guide to Data Preservation in the Information Age, pp 50-56, December 2008 (figures from pages 53 and 55)


History: DSpace and Fedora

• Two open source repository systems
  – DSpace:
    • End-user application and repository
    • Turnkey system, easy out of the box
  – Fedora:
    • Web services (repository and supporting services)
    • Flexible, modular, and scalable

• Enabling technology supporting…
  – scholarship, science, culture, education
  – open access
  – preservation and archiving

DSpace and Fedora Installations

Largest share of open repositories worldwide… over 700 institutions tracked in our registries

• Universities
• Research Centers
• Libraries
• Archives
• Cultural Heritage
• Government
• More…

DSpace Foundation and Fedora Commons
501(c)(3) non-profit organizations

Progression of Partnership

• Architecture Strategy: Web APIs, Storage Abstraction
• SWORD Deposit, MS Word Plug-In
• Business Strategy, Communication/Outreach
• DuraSpace, Future Joint Offerings

http://blogs.the451group.com/opensource/

Goals of Strategic Partnership

• Stewardship:
  – Support and align open source development communities for DSpace and Fedora
  – Keepers of the cause (durability + access)
• Innovation:
  – Think beyond existing platforms
  – New strategic directions for repositories
  – New products and services
• Sustainability:
  – Devise business models that fit our sector
  – Services that generate revenue for non-profits

What About the Cloud?

An emerging architecture in which data and applications reside in cyberspace, allowing users to access them via the internet (Pew Internet, 9/08).

A style of computing where massively scalable IT-related capabilities are provided “as a service” using Internet technologies to multiple external customers (Gartner, 6/08).

Types of Cloud Services

• Software as a Service (SaaS)
  – e.g., Google Apps
• Cloud Computing
  – e.g., Amazon Elastic Compute Cloud (EC2)
• Cloud Storage
  – e.g., Amazon Simple Storage Service (S3)

Cloud Services

Vision: Federated Repositories and Cyberinfrastructure

DuraSpace

Heaven

DuraSpace Proposition
Trust and durability in the cloud

What have we learned from our users?

Focus Groups

Site Visits

Forums

Problems

• Tools and processes unproven
• Limited IT support
• Capital expenditures limited
• Task can be overwhelming (replication, migration, emulation, etc.)

Preservation important but difficult to implement

Problems

• Systems not interoperable
• Heterogeneous applications/platforms
• Lack of common standards
• Inelastic compute capability

Barriers to making content more accessible and useful to researchers

Advantages – Cloud Services

• Flexibility
• Scalability
• Pay for use
• Easy to implement
• Cost

Cost

Public cloud providers drive cost down through scale, location and virtualization technology

Large data centers (50k+ servers) can achieve 5 to 7 times cost savings over medium data centers (1,000 servers)

Technology*   Cost in medium DC       Cost in large DC
Network       $95 per Mbit/sec/mo     $13 per Mbit/sec/mo
Storage       $2.20 per GByte/mo      $0.40 per GByte/mo
Admin         140 servers/admin       >1000 servers/admin

* Hamilton, J., Internet-Scale Service Efficiency (Sept 08)
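
The "5 to 7 times" claim can be checked against the table with simple division. The sketch below is only illustrative arithmetic on the figures quoted above; the variable names are made up for this example, and the admin ratio treats ">1000 servers/admin" as exactly 1000, so it is a lower bound.

```python
# Figures copied from the cost table (Hamilton, Sept 08).
# Each tuple is (medium data center, large data center).
NETWORK_USD_PER_MBIT_SEC_MONTH = (95.0, 13.0)
STORAGE_USD_PER_GBYTE_MONTH = (2.20, 0.40)
# Assumption: ">1000 servers/admin" is taken as 1000, giving a lower bound.
SERVERS_PER_ADMIN = (140, 1000)


def cost_ratio(medium: float, large: float) -> float:
    """How many times cheaper the large data center is for a per-unit cost."""
    return medium / large


def admin_ratio(medium: int, large: int) -> float:
    """How many times more servers one admin handles in the large data center."""
    return large / medium


network_factor = cost_ratio(*NETWORK_USD_PER_MBIT_SEC_MONTH)
storage_factor = cost_ratio(*STORAGE_USD_PER_GBYTE_MONTH)
admin_factor = admin_ratio(*SERVERS_PER_ADMIN)

print(f"Network: {network_factor:.1f}x")
print(f"Storage: {storage_factor:.1f}x")
print(f"Admin:   at least {admin_factor:.1f}x")
```

Under these assumptions the three factors come out at roughly 7.3, 5.5, and 7.1, which is consistent with the 5 to 7 times range on the slide.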

Issues

• Security
• Transparency
• Data lock-in
• SLAs
• Trust

DuraSpace
Trusted management of and access to durable digital assets in the cloud

DuraSpace
Mediating Service

DuraSpace: Notional Architecture

Architectural view

Core services: preservation based

• Replicate to multiple storage providers
• Replicate to multiple geographic areas
• Be able to manage content and services through web based “Dashboard”
• Includes integrity checking and monitoring
• “Pay for use” for services and storage
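
As a rough illustration of how replication and integrity checking fit together, here is a minimal sketch. It is not DuraSpace code: `InMemoryStore`, `replicate`, and `check_integrity` are hypothetical names invented for this example, the stores are in-memory stand-ins for real storage providers, and SHA-256 is an assumed choice of checksum.

```python
import hashlib


class InMemoryStore:
    """Hypothetical stand-in for one storage provider (local or cloud)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]  # raises KeyError if the copy is missing


def checksum(data: bytes) -> str:
    """SHA-256 digest used as the fixity value (an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def replicate(key: str, data: bytes, stores: list[InMemoryStore]) -> str:
    """Write the same content to every store and return its checksum."""
    digest = checksum(data)
    for store in stores:
        store.put(key, data)
    return digest


def check_integrity(key: str, expected: str,
                    stores: list[InMemoryStore]) -> dict[str, bool]:
    """Re-read each copy and compare its checksum with the recorded one."""
    report = {}
    for store in stores:
        try:
            report[store.name] = checksum(store.get(key)) == expected
        except KeyError:
            report[store.name] = False  # a missing copy counts as a failure
    return report


# Example: three copies, one of which is later corrupted.
stores = [InMemoryStore("local"), InMemoryStore("cloud-a"), InMemoryStore("cloud-b")]
digest = replicate("item-1", b"example content", stores)
stores[1].put("item-1", b"corrupted")
report = check_integrity("item-1", digest, stores)
print(report)
```

A monitoring job could run a check like `check_integrity` on a schedule and surface failures in the dashboard; a failed copy could then be repaired from any replica whose checksum still matches.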

Technology Services

• Build and run services on top of content stored in the cloud
  – Search
  – Aggregation
  – Streaming
  – Migration
  – Hosting
• Enable others to build services/apps on top of content

Use cases: DuraSpace with Cloud Storage

• Online backup for text, images, datasets, video, audio
• Preservation: multiple copies, geographies, administrations
• Temporary or permanent project storage

Use cases: DuraSpace with Cloud Compute

• Streaming service for video
• JPEG2000 image engine
• Indexing and other processing-heavy jobs
• Staging area for repository ingest
• Repositories in cloud
• Data and text mining over open data
• Aggregation and web 2.0 tools on open content and collections

DuraSpace software

• Open source: Apache license
• Open core
• Run Your Own: private clouds, university consortia
• Extensible: research partners

Critical success factors

• Ease of use: simplicity
• Trusted partner for end user
• Cost effective
• Scalable/Flexible
• Can establish key partnerships with service providers
• Can build community of developers and users

Timeline

• Identified initial cloud partners
• Identified initial pilot partners
• Defined initial requirements
• Initial open source release: Q3 2009
• Begin pilot: Fall 2009
• Extensions available for repository platforms: Q1 2010
• Roll out to repository community: Q1 2010
• Launch production service: Q2 2010

Initial capabilities

• Replication, up to three providers (including local store)
• Web based “Dashboard”
• Data integrity checking and monitoring
• Can push content from DSpace/Fedora repository platform
• Integrated billing
• Compute capability
• A few initial compute services TBD

Listen…

Sandy and Michele’s DuraSpace webinar

http://www.education-webevents.com/