Long-Term Archive and Digital Preservation at TACC
Donna Harland
Oracle Optimized Solutions: Solutions Architect
June 20, 2011
CHALLENGES OF TODAY’S ARCHIVE
Challenges of Today’s Archive
Challenge: Results
• Bit Rot: data loss; data corruption
• Obsolescence: can no longer access or read the data
• Natural Disaster: data loss
• Economic Failure: loss of data access; data loss
• Organizational Failure: loss of data access; data loss; inappropriate use
• Information Attack: data corruption or loss
• Human Error: data loss or loss of data access
Challenges of Today’s Archive
Challenge: Results
• Lack of context: data is available, but with no access, pointers, or metadata
• Ambiguous IP State (copyright, licensing): loss of data access
• Distribution and Dissipation: loss of data access
• Migrations and Transitions (people 2-20 yrs, software 5-10 yrs, hardware 3-5 yrs): data loss and loss of data access
CHARACTERISTICS OF ARCHIVE SOLUTIONS
Availability
• Searchable
• Retrievable
– Dynamic access
– What went in comes out
• Deliverable to new environments, in new contexts
• Over time… a VERY long time
Integrity
• Fixity of the original object
– No data loss
– No data corruption
– No data “augmentation”
• Wholeness
– Contains all of its essential bits
– Transformed content is documented
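One common way to check fixity is to record a cryptographic digest when an object is ingested and to recompute it during later audits. The sketch below is illustrative only and is not taken from the presentation: the function names `sha256_of` and `verify_fixity` are hypothetical, and SHA-256 is an assumed choice of digest.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Return True if the file still matches the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest
```

A mismatch only signals that the bits changed; it cannot say whether the cause was loss, corruption, or "augmentation", so a failed check would normally trigger recovery from another copy.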
Authenticity
• Assure that an object is what it purports to be…
• Include a description of the object in its original state as well as transformations
• Include provenance – where an object came from and the chain of custody and processes from its point of origin
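As a rough illustration of what a provenance record might hold, the sketch below keeps a description of the original object plus an append-only list of custody events. The schema, class names, and field names are all hypothetical; real archives typically follow an established preservation metadata standard rather than an ad hoc structure like this one.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class CustodyEvent:
    """One step in an object's chain of custody (hypothetical schema)."""
    timestamp: str  # ISO 8601, UTC
    agent: str      # person or system that acted on the object
    action: str     # for example "ingest" or "format-migration"
    note: str = ""


@dataclass
class ProvenanceRecord:
    """Original-state description plus every later event."""
    object_id: str
    origin: str
    original_description: str
    events: List[CustodyEvent] = field(default_factory=list)

    def record(self, agent: str, action: str, note: str = "") -> CustodyEvent:
        """Append an event; earlier events are never edited or removed."""
        stamp = datetime.now(timezone.utc).isoformat()
        event = CustodyEvent(stamp, agent, action, note)
        self.events.append(event)
        return event

    def transformations(self) -> List[CustodyEvent]:
        """Return every event other than the initial ingest."""
        return [e for e in self.events if e.action != "ingest"]
```

Usage might look like `record.record("migrator", "format-migration", "TIFF to JPEG 2000")`, after which `record.transformations()` lists what has been done to the object since it arrived.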
Reusability
• Collaboration
• May require the object in its original form or format
• May require a derived form, suitable for a specific purpose
– Case study: which is more useful, an image of a newspaper page or the full text of a newspaper page?
• Requires clear understanding of business purpose
Security
• Secure against leakage
• Secure against tampering
• A primary design consideration
• A vital element in trust
Sustainability
• Technically feasible & maintainable
• Economically viable and maintainable
• Organizational alignment and commitment
• Able to adapt
– Technically: changes in technology and scale; have a migration plan that is non-disruptive
– Economically: changes in costs, funding (recessions…)
– Organizationally: layoffs, staff changes, mergers, strategy shifts
Trustworthiness
• Perception of competence, security, long-term commitment
• Prerequisite for confidence by
– Depositors
– Funders
– Content Consumers
ARCHITECTING AN ARCHIVE SOLUTION
Data Archive Layers
[Diagram: data archive layers]
• Data Preservation and Content Management Applications (manage content)
• Storage Archive Manager
• Flash, Disk, Tape
Preservation Mindset & Strategies
• Resist the temptation to think of preserved objects as “static”
– Migrations, versions, audits, and disseminations all require constant attention
– New access to old data, old access to new data
– The content will not change, but its home will
– Awareness of retention requirements
• Remember that preservation is a journey, not a destination
Technological Considerations
• Minimize dependencies
– Encapsulate your metadata with your objects
– Storage preservation should not depend on specific storage
– Applications should not depend on specific storage
• Minimize the effect of errors
– Embrace redundancy
– Embrace diversity
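To make these two ideas concrete, the sketch below writes each object together with a JSON sidecar holding its metadata and digest, and places a copy under several independent storage roots. Everything here is an assumption for illustration: the function names, the `.metadata.json` sidecar convention, and the use of plain directories to stand in for diverse storage are not from the presentation.

```python
import hashlib
import json
from pathlib import Path
from typing import List, Optional


def store_with_metadata(data: bytes, name: str, metadata: dict,
                        roots: List[Path]) -> str:
    """Write the object and a JSON sidecar to every root; return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    sidecar = json.dumps({**metadata, "sha256": digest}, indent=2, sort_keys=True)
    for root in roots:
        root.mkdir(parents=True, exist_ok=True)
        (root / name).write_bytes(data)
        (root / f"{name}.metadata.json").write_text(sidecar, encoding="utf-8")
    return digest


def read_first_intact(name: str, roots: List[Path]) -> Optional[bytes]:
    """Return the first copy whose content matches its own sidecar digest."""
    for root in roots:
        obj = root / name
        meta = root / f"{name}.metadata.json"
        if not (obj.exists() and meta.exists()):
            continue
        data = obj.read_bytes()
        expected = json.loads(meta.read_text(encoding="utf-8"))["sha256"]
        if hashlib.sha256(data).hexdigest() == expected:
            return data
    return None  # no intact copy found under any root
```

Because the metadata travels with each copy, any single root can be read without the application or database that wrote it, and a damaged copy under one root is skipped in favor of an intact one elsewhere.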
Design
• Don’t overspec; don’t overbuild
– Design a scalable architecture
– Build in the ability to grow non-disruptively with customer demand
• Monolithic systems don’t meet requirements
– Complex, expensive, inflexible
– Migration costs can capsize you
• Components should not depend on each other but should be proven to work together
• Keep it simple; have an exit plan for every component
Know Your Designated Community
• Who will be using the content?
• Are there data connectivity requirements?
• How will they be using the data?
– Latency
– Delivery formats
– Security
• Offer (appropriate) access from the start
• Remain flexible as the community changes and grows
Basic Architecture of an Unstructured Data Archive Solution
• Application
– Captures data
– Creates content metadata; optionally stored in DB
– Stores content in a file store
– Provides search engine
– Provides data preservation features
• Database Server
– Content metadata
– Security
– Improved search performance
• File Store
[Diagram: Application, Database Server (Metadata), File Store]
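The division of labor among the three components can be sketched as follows. This is a toy model under stated assumptions, not any product's design: the class `ArchiveApplication`, the `content_metadata` table, and the use of SQLite and a local directory to stand in for the database server and file store are all hypothetical.

```python
import sqlite3
import uuid
from pathlib import Path
from typing import List


class ArchiveApplication:
    """Toy application layer: content goes to a file store, metadata to a database."""

    def __init__(self, file_store: Path, db: sqlite3.Connection) -> None:
        self.file_store = file_store
        self.db = db
        self.file_store.mkdir(parents=True, exist_ok=True)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS content_metadata ("
            "object_id TEXT PRIMARY KEY, title TEXT, path TEXT)"
        )

    def capture(self, data: bytes, title: str) -> str:
        """Store the content in the file store and its metadata in the database."""
        object_id = uuid.uuid4().hex
        path = self.file_store / object_id
        path.write_bytes(data)
        self.db.execute(
            "INSERT INTO content_metadata VALUES (?, ?, ?)",
            (object_id, title, str(path)),
        )
        self.db.commit()
        return object_id

    def search(self, term: str) -> List[str]:
        """Find object IDs by title using the database, not the file store."""
        rows = self.db.execute(
            "SELECT object_id FROM content_metadata WHERE title LIKE ?",
            (f"%{term}%",),
        )
        return [row[0] for row in rows]

    def retrieve(self, object_id: str) -> bytes:
        """Look up the stored path in the database and read the content."""
        row = self.db.execute(
            "SELECT path FROM content_metadata WHERE object_id = ?",
            (object_id,),
        ).fetchone()
        if row is None:
            raise KeyError(object_id)
        return Path(row[0]).read_bytes()
```

Searches touch only the database, which is where the "improved search performance" on the slide would come from; the file store is read only when content is actually retrieved.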
SAM-QFS as the File Store
• SAM-QFS
– Dynamically maintains data on defined tiers of storage
– Dynamically stages data for access when requested by the application
– Standard file access via FC, NFS, CIFS
[Diagram: Application, Database Server, File Store (SAM-QFS managed tiered storage)]
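The idea of staging on demand can be modeled in a few lines. The class below is a conceptual toy and is not the SAM-QFS interface or configuration: the name `ToyTieredStore`, its methods, and the in-memory dictionaries standing in for storage tiers are all hypothetical.

```python
from typing import Dict


class ToyTieredStore:
    """Conceptual model of tiered storage; not the SAM-QFS interface."""

    def __init__(self) -> None:
        self.disk_cache: Dict[str, bytes] = {}    # stands in for the online tier
        self.archive_tier: Dict[str, bytes] = {}  # stands in for disk or tape archive
        self.stage_count = 0                      # number of stage-ins performed

    def write(self, name: str, data: bytes) -> None:
        """Keep an online copy and make an archive copy."""
        self.disk_cache[name] = data
        self.archive_tier[name] = data

    def release(self, name: str) -> None:
        """Free the online copy, but only if an archive copy exists."""
        if name in self.archive_tier:
            self.disk_cache.pop(name, None)

    def read(self, name: str) -> bytes:
        """Return the data, staging it back online first if it was released."""
        if name not in self.disk_cache:
            self.disk_cache[name] = self.archive_tier[name]
            self.stage_count += 1
        return self.disk_cache[name]
```

The point of the model is that the caller of `read` never needs to know which tier currently holds the data, which is what lets the application treat the file store as an ordinary file system.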
Oracle Storage Appropriate for an Archive
• SAM-QFS File System and Metadata
– High-speed FC drives
– FC access
– High availability
– FC array storage: S6180, S6580, S6780
• Disk Archive
– SATA drives
– FC or IP access
– High capacity
– High availability
– High-capacity disk storage: S6180, S6580, S6780, 7120, 7320, 7420, 7720
• Tape Archive
– T10KC
– Highest capacity
– DIV
– LTO 5
– Tape and libraries: SL3000, SL8500, LTO, T10K
Oracle Enterprise Content Management
• Content Management
– Geared toward business data and workflow
– Customizable for different data types
• Oracle Optimized Solution
• Fully tested, integrated solution (HW, SW, Storage SW)
• Expanding into industry data
– Health Sciences
– Media and Entertainment
What Solutions Integrate SAM-QFS?
• Third-Party Applications and SAM-QFS
− Scalable On-line Archive Repository (S.O.A.R.) from Moca/Arrow and their channel partners (see Mark Legott presentation)
  − Sun tested and partner marketed
  − Uses the open source software Drupal and Fedora
  − Fully supported by Oracle partner for implementation and first call
− Ex Libris
  − New Zealand National Library implementation and validated solution
− Storage Resource Broker (SRB)
  − Customer implementation at DOD
  − Tight integration with SAM
− PACS Applications
  − In production at many sites since STK was STK
− Home-Grown Application
  − Norwegian National Library: “it just works”
  − 6 PB under SAM management (1 on disk archive, 2 on tape archive)
Questions?