Post on 12-Jan-2016
ISMDR:BEIT:VIII:chap 6:Madhu N PIIT 1
Content Addressed Storage
Chapter 9
Chapter Objective
Upon completion of this chapter, you will be able to:
• Describe CAS, fixed content and archives, and traditional storage solutions for archives
• Describe the features and benefits of a CAS based storage strategy
• List the physical and logical elements of CAS
• Describe the storage and retrieval process for CAS data objects
• Describe the best suited operational environments for CAS solutions
Lesson: CAS Overview
Upon completion of this lesson, you will be able to:
• Define fixed content
• Describe traditional archival solutions and their shortcomings
• Define Content Addressed Storage (CAS)
• List benefits of CAS
What are Fixed Content and Archives
Electronic Documents
• Contracts, claims, etc.
• E-mail and attachments
• Financial spreadsheets
• CAD/CAM designs
• Presentations
Digital Records
• Documents
– Checks, securities trades
– Historical preservation
• Photographs
– Personal / professional
• Surveys
– Seismic, astronomic, geographic
Rich Media
• Medical
– X-rays, MRIs, CT scans
• Video
– News / media, movies
– Security surveillance
• Audio
– Voicemail
– Radio
Digital Assets Retained For Active Reference And Value
• Leverage Historical Value
• Improve Service Levels
• Generate New Revenues
Challenges of Storing Fixed Content
• Fixed content is growing at more than 90% annually
– Significant amount of newly created information falls into this category
– New regulations require retention and data protection
• Often, long-term preservation is required (years-decades)
• Simultaneous multi-user online access is preferable to offline storage
• Need faster access to fixed content
• Need for location independent data, enabling technology refresh and migration
• Traditional storage methods are inadequate
Traditional storage solutions for Archive
• Three categories of archival solutions are:
– Online, nearline, and offline, based on the means of access
• Traditional archival solutions were offline
– Traditional archival process used optical disks and tapes as media for archival
– An archive is often stored on a Write Once Read Many (WORM) device, such as a CD-ROM
Shortcomings of Traditional Archiving Solutions
• Tape is slow, and standards are always changing
• Optical is expensive, and requires vast amounts of media
• Recovering files from tape and optical is often time consuming
• Data on tape and optical is subject to media degradation
• Both solutions require sophisticated media management
CAS has emerged as an alternative to traditional archiving solutions
What is Content Addressed Storage (CAS)
• Object-oriented, location-independent approach to data storage
• Repository for the “Objects”
• Access mechanism to interface with repository
• Globally unique identifiers provide access to objects
Benefits of CAS
• Content authenticity
• Content integrity
• Location independence
• Single-instance storage (SiS)
• Retention enforcement
• Record-level protection and disposition
• Technology independence
• Fast record retrieval
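Single-instance storage and location independence both follow from addressing objects by their content. The sketch below is a minimal illustration, not a vendor API: the `SingleInstanceStore` class is hypothetical, and SHA-256 is an assumed stand-in for whatever hash function a real CAS product uses.

```python
import hashlib


class SingleInstanceStore:
    """Toy object repository keyed by content address (hypothetical, not a vendor API)."""

    def __init__(self):
        self._objects = {}  # content address -> bytes

    def put(self, data: bytes) -> str:
        # SHA-256 is an assumption; the slides do not name a hash function.
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is kept only once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def object_count(self) -> int:
        return len(self._objects)


store = SingleInstanceStore()
first = store.put(b"signed contract, revision 1")
second = store.put(b"signed contract, revision 1")  # same bytes arriving a second time
other = store.put(b"signed contract, revision 2")   # different content, different address
```

Here `first` and `second` are the same address and the store holds two objects in total, because the duplicate never consumes additional space.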
Lesson Summary
Key points covered in this lesson:
• CAS Definition
• Challenges of Storing Fixed Content
• Shortcomings of Traditional Archiving Solutions
• Benefits of CAS
Lesson: CAS Architecture
Upon completion of this lesson, you will be able to:
• Describe CAS architecture
• Describe physical and logical elements of CAS
• Describe data storage and retrieval process in CAS environment
• Describe CAS examples
Physical Elements of CAS
• Storage devices (CAS based)
– Storage node
– Access node
• Servers (to which storage devices get connected)
• Client
[Figure: Server (API) connected over IP to the CAS System, in which Access Nodes and Storage Nodes share a Private LAN]
CAS Terminology
• Application Programming Interface (API)
– A set of function calls that enables communication between applications or between an application and an operating system
• Binary Large Object (BLOB)
– The Distinct Bit Sequence (DBS) of user data represents the actual content of a file and is independent of the filename and physical location
CAS Terminology (Cont.)
• C-Clip
– A package containing the user's data and associated metadata
– C-Clip ID (C-Clip handle or C-Clip reference) is the CA that the system returns to the client application
• Content Address (CA)
– An identifier that uniquely addresses the content of a file and not its location. Unlike location-based addresses, content addresses are inherently stable and, once calculated, they never change and always refer to the same content
• C-Clip Descriptor File (CDF)
– The additional XML file that the system creates when making a C-Clip. This file includes the content addresses for all referenced BLOBs and associated metadata
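The relationship between BLOB, CDF, and C-Clip ID can be sketched in a few lines. This is an illustration under stated assumptions only: the XML element and attribute names are invented for the example and do not reproduce the real CDF schema, SHA-256 stands in for the unspecified hash function, and deriving the C-Clip ID by hashing the CDF is one plausible reading of "the CA that the system returns".

```python
import hashlib
import xml.etree.ElementTree as ET


def content_address(data: bytes) -> str:
    # Assumed hash function; the address depends only on the bytes,
    # not on any filename or storage location.
    return hashlib.sha256(data).hexdigest()


def build_cdf(blobs: list, metadata: dict) -> bytes:
    """Build a toy descriptor file listing BLOB addresses and metadata."""
    root = ET.Element("cdf")  # hypothetical element names throughout
    meta = ET.SubElement(root, "metadata")
    for key in sorted(metadata):
        ET.SubElement(meta, "attribute", name=key, value=metadata[key])
    for blob in blobs:
        ET.SubElement(root, "blob", address=content_address(blob), size=str(len(blob)))
    return ET.tostring(root, encoding="utf-8")


xray = b"...pixel data of one image..."
cdf = build_cdf([xray], {"patient": "hypothetical-0001", "modality": "x-ray"})

# In this sketch the C-Clip ID is the content address of the CDF itself.
clip_id = content_address(cdf)
```

Because the CDF embeds the address of every BLOB, changing either the user data or the metadata produces a different CDF and therefore a different `clip_id`.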
How CAS Stores a Data Object
1. Client presents data to the API to be archived
2. Unique Content Address is calculated
3. Object is sent to Centera via the Centera API over IP
4. Centera validates the Content Address and stores the object
5. Acknowledgement is returned to the application
6. C-Clip ID is retained and stored for future use
[Figure: Client and Application Server (API) send the C-Clip (object, with its CDF) to the CAS System]
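The store sequence can be sketched as follows. This is a toy model and not the Centera API: `ToyCasSystem`, `archive`, and the `catalog` dictionary are hypothetical names, and SHA-256 is an assumed hash function.

```python
import hashlib


def calculate_address(data: bytes) -> str:
    # SHA-256 is an assumption; the slides do not name the hash function.
    return hashlib.sha256(data).hexdigest()


class ToyCasSystem:
    """Hypothetical stand-in for the CAS side of the API."""

    def __init__(self):
        self._objects = {}

    def store(self, claimed_address: str, data: bytes) -> bool:
        # Validate: recompute the address from the bytes actually received.
        if calculate_address(data) != claimed_address:
            return False  # no acknowledgement for damaged or mislabeled content
        self._objects[claimed_address] = data
        return True  # acknowledgement returned to the application

    def contains(self, address: str) -> bool:
        return address in self._objects


def archive(cas: ToyCasSystem, catalog: dict, name: str, data: bytes) -> str:
    address = calculate_address(data)  # unique content address is calculated
    if not cas.store(address, data):   # object is sent, validated, and stored
        raise ValueError("CAS rejected the object: address does not match content")
    catalog[name] = address            # the ID is retained for future use
    return address


cas = ToyCasSystem()
catalog = {}
check_id = archive(cas, catalog, "check-000123.tif", b"scanned check image bytes")
rejected = cas.store(check_id, b"bytes corrupted in transit")
```

The second `store` call returns `False`: the validation step is what lets the system refuse content that no longer matches the address it was sent under.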
ISMDR:BEIT:VIII:chap 6:Madhu N PIIT - 18
How CAS Retrieves a Data Object
1. Object is needed by an application
2. Application finds Content Address of object to be retrieved
3. Retrieval request is sent to the CAS via CAS API over IP
4. CAS authenticates the request and delivers the object
[Figure: Client, Application Server (API, C-Clip ID), CAS System]
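A retrieval can be sketched in the same spirit. Everything here is an assumption made for illustration: the `ToyArchive` class, the shared-secret credential used for authentication, and the SHA-256 hash. The slides only state that the CAS authenticates the request and delivers the object.

```python
import hashlib
import hmac


class ToyArchive:
    """Hypothetical CAS stand-in that authenticates requests before delivery."""

    def __init__(self, access_secret: bytes):
        self._access_secret = access_secret
        self._objects = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def retrieve(self, address: str, credential: bytes) -> bytes:
        # Authenticate the request (constant-time comparison of the credential).
        if not hmac.compare_digest(credential, self._access_secret):
            raise PermissionError("request not authenticated")
        data = self._objects[address]
        # Integrity check: delivered bytes must still hash to their address.
        if hashlib.sha256(data).hexdigest() != address:
            raise IOError("stored object no longer matches its content address")
        return data


archive = ToyArchive(access_secret=b"demo-secret")
# The application keeps its own mapping from business keys to content addresses.
catalog = {"patient-0001/chest.dcm": archive.put(b"x-ray pixel data")}

wanted = catalog["patient-0001/chest.dcm"]                    # application finds the address
image = archive.retrieve(wanted, credential=b"demo-secret")   # request is sent, object delivered
```

Note that the application never supplies a path or volume: the content address alone identifies the object.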
CAS Features
• Features available with most CAS systems are:
– Integrity checking
– Data protection
  • Local replication
  • Remote replication
– Load balancing
– Scalability
– Self-diagnosis and repair
– Report generation and event notification
– Fault tolerance
– Audit trails
Example 1: CAS Healthcare Solution
• Each X-ray image ranges from about 15 MB to over 1 GB
• Patient record is stored online for a period of 60-90 days
• Beyond 90 days patient records are archived
[Figure: Hospital patient studies are stored locally for short-term use (60 days); data is then stored on the CAS System through the Application Server (API)]
Example 2: CAS Financial Solution
• Check image size is about 25 KB
• Check imaging service provider may process 50–90 million check images per month
• Checks are stored online for a period of 60 days
• Beyond 60 days data is archived
[Figure: Bank, Application Server (API), CAS System]
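The figures above imply a data volume that is easy to work out. The short calculation below assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and treats the 60-day online window as roughly two months of intake; both are assumptions, not statements from the slides.

```python
CHECK_IMAGE_BYTES = 25 * 1000  # about 25 KB per check image, decimal units assumed
BYTES_PER_TB = 1000 ** 4


def monthly_volume_tb(images_per_month: int) -> float:
    """Raw image volume per month in terabytes, ignoring metadata and mirroring."""
    return images_per_month * CHECK_IMAGE_BYTES / BYTES_PER_TB


low = monthly_volume_tb(50_000_000)   # 1.25 TB per month
high = monthly_volume_tb(90_000_000)  # 2.25 TB per month

# Approximation: a 60-day online window holds about two months of images.
online_low, online_high = 2 * low, 2 * high  # 2.5 TB and 4.5 TB
```

So a provider at this scale adds roughly 1.25 to 2.25 TB of check images to the archive each month, before any replication overhead.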
Lesson Summary
Key points covered in this lesson:
• CAS architecture
• Physical and logical elements of CAS
• CAS storage and retrieval process
• CAS solution examples
Concept in Practice – EMC Centera
• Centera Architecture
– Based on RAIN (Redundant Array of Independent Nodes)
  • Access Node
  • Storage Node
[Figure: Access/Storage Nodes and Storage Nodes on a Private LAN, linked by two Ethernet switches and power rails; each numbered content object (1 to 6) has a mirrored copy on another node; a LAN connection leads to the server]
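The idea of keeping content and mirrored content on different nodes can be sketched as below. This is not how Centera places data: the `MirroredNodes` class, the "next node" placement rule, and the SHA-256 addressing are all assumptions chosen to keep the example small.

```python
import hashlib


class MirroredNodes:
    """Toy node array that keeps each object on two different storage nodes."""

    def __init__(self, node_count: int):
        if node_count < 2:
            raise ValueError("mirroring needs at least two storage nodes")
        self._nodes = [{} for _ in range(node_count)]
        self._failed = set()

    def placement(self, address: str) -> tuple:
        # Hypothetical rule: primary node chosen from the address, mirror on the next node.
        primary = int(address, 16) % len(self._nodes)
        mirror = (primary + 1) % len(self._nodes)
        return primary, mirror

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        for index in self.placement(address):
            self._nodes[index][address] = data  # content and mirrored content
        return address

    def fail(self, index: int) -> None:
        self._failed.add(index)  # simulate losing a node

    def get(self, address: str) -> bytes:
        # Read from the first healthy node that holds a copy.
        for index in self.placement(address):
            if index not in self._failed and address in self._nodes[index]:
                return self._nodes[index][address]
        raise KeyError(address)


cluster = MirroredNodes(node_count=6)  # node count is illustrative
address = cluster.put(b"archived record")
primary, mirror = cluster.placement(address)
cluster.fail(primary)
survivor = cluster.get(address)  # still readable from the mirror
```

With one copy per node pair, any single node failure leaves every object readable; losing both nodes of a pair would not, which is why real systems also regenerate missing copies.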