Posted on 07-Jul-2020
Data Security: Leveraging Information Dispersal as an Alternative to RAID and Replication
Chris Gladwin, CEO & Founder
Cleversafe
Data Storage is Transforming
• Methods used in the past 50 years won’t be adequate for the next 50 years
• Over 90% of future storage = unstructured digital content*
• Data storage is growing 10x every 5 years*
Size per item, from traditional data to new data:
Numbers: 5 KB / record
Text: 500 KB / record
Images: 1,000 KB / picture
Audio: 5,000 KB / song
Video: 5,000,000 KB / movie
Hi-Res: 50,000,000 KB / HD movie, CT scan, etc.
*IDC Digital Universe Report
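Growing 10x every 5 years implies a compound annual growth rate of roughly 58%. The small Python sketch below checks that arithmetic; the function names and the 1 PB starting capacity are hypothetical, chosen only for illustration.

```python
def annual_growth_rate(factor: float, years: float) -> float:
    """Compound annual growth rate implied by growing `factor`-fold over `years`."""
    return factor ** (1.0 / years) - 1.0


def projected_capacity(start: float, factor: float, years_per_factor: float, years: float) -> float:
    """Capacity after `years`, assuming steady exponential growth."""
    return start * factor ** (years / years_per_factor)


rate = annual_growth_rate(10, 5)
print(f"Implied annual growth: {rate:.1%}")  # about 58.5%

# Hypothetical example: 1 PB today, growing 10x every 5 years, for 10 years
print(projected_capacity(1.0, 10, 5, 10))  # 100.0 (PB)
```

Two 5-year periods compound to 100x, which is why methods sized for today's capacity fall behind quickly.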
Security Breaches Increasing
Sources: Identity Theft Resource Center, Bank Info Security, Cisco Security Expert
Selected Examples
• Chase Bank, JPMorgan Chase - 2009
• Chase Bank is notifying customers that a tape used as a backup for system information is missing from a secure offsite storage unit. It may have included names, addresses, and SSNs. The information "can be read only with special equipment and software…"
• BlueCross BlueShield - 2009
• Between 57 and 68 hard drives are missing from the BlueCross BlueShield office in Eastgate, TN. BCBS announced that the theft affects about 2 million clients.
• Virginia State Prescription Monitoring Program Records - 2009
• Hackers stole 8.3 million patient records and 35 million drug prescriptions, erased the originals, and created an encrypted backup of VPMP's database.
[Chart: Industries represented by percentage of breaches]
Data Security & Threats
Information Security CIA model:
• Confidentiality: Data is never accessed by unauthorized parties.
– Example threats: key or credential mismanagement; accidental loss of media or devices; malicious access; remote compromise or theft; interception of packets.
• Integrity: Data is always accurate and cannot be modified without authorization.
– Example threats: bit errors in drives, memory, connections, or flash; physical read and write errors; accidental data corruption; malicious data tampering.
• Availability: Data is always available to authorized parties.
– Example threats: drive, location, server, and connection failures; maintenance operations; denial-of-service attacks.
Replication increases availability, but making copies of data increases the risk of attack
Challenges with Replicated Storage
1. Because systems often fail, you need multiple copies in different places.
• Total bits stored = 3.5 times the original data (assumes RAID arrays)
• Total bandwidth consumed = 3.5 times the original data
• Requires 3.5 times the equipment, cooling, power, and floor space
2. Multiple copies also significantly decrease security.
• 3 copies = seven security vulnerabilities (3 copies + 4 data moves)
• Hundreds of millions of personal records lost
3. Replication does not protect against silent data corruption, and results in lower data integrity.
RAID is failing at petabyte scale
• Growing drive sizes decrease reliability
• The number of bits read per expected unrecoverable read error (URE) is approaching the size of a drive
• Rebuild times are increasing
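To see why UREs matter during a rebuild, assume each bit read fails independently at a spec-sheet rate of 1 in 10^14 bits (a common consumer-drive rating, used here as an assumption rather than a figure from this deck). The chance of at least one URE while reading N bits is then 1 - (1 - r)^N. The sketch below computes it; `ure_probability` and the 6-drive RAID 5 array are hypothetical.

```python
import math


def ure_probability(bytes_read: float, ure_rate_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading
    `bytes_read` bytes, assuming independent bit errors at `ure_rate_per_bit`."""
    bits = bytes_read * 8
    # 1 - (1 - r)^bits, computed stably for tiny r and huge bit counts
    return -math.expm1(bits * math.log1p(-ure_rate_per_bit))


TB = 1e12  # decimal terabyte, as drive vendors use

# Reading one full 2 TB drive end to end
print(f"{ure_probability(2 * TB):.1%}")  # about 14.8%

# Rebuilding a hypothetical 6-drive RAID 5 array of 2 TB drives:
# all 5 surviving drives must be read end to end.
print(f"{ure_probability(5 * 2 * TB):.1%}")  # about 55.1%
```

Under these assumptions, a single-parity rebuild is more likely than not to hit a URE, and the odds worsen as drives grow.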
Cloud Storage Presents New Challenges
• Multi-terabyte to petabyte scale
• Distributed across geographies
• Housing unstructured content
– CT scans, HD movies, photo libraries, etc.
• Storage may be accessible via the public internet
– Can’t put a firewall around it
• Must deliver information security (confidentiality, integrity, and availability) in order to aid adoption
[Diagram: Traditional Storage → Copies & Replication → Information Dispersal]
Applying the Internet to Storage
Packet switching applied to storage
Increasing scale drove a transformation in data communications:
• Telephony: circuit switching
• Internet: packet switching
[Figure: System Growth, showing a bit stream (1011010101 010 1100111011010101) carried as separate packets]
Dispersal is to storage what packet switching is to networking
History of Information Dispersal: Deep Academic Roots
1960 Reed-Solomon codes developed by Irving S. Reed and Gustave Solomon at the MIT Lincoln Laboratory.
1969 Elwyn Berlekamp and James Massey develop the Berlekamp-Massey decoding algorithm.
1979 Adi Shamir (MIT) publishes "How to Share a Secret" in Communications of the ACM
1989 Michael Rabin (Harvard) publishes “Efficient dispersal of information for security, load balancing, and fault tolerance”
1997 Ron Rivest (MIT) publishes “All-Or-Nothing Encryption and The Package Transform”
Information Dispersal 101
[Diagram: digital content passes through an IDA and is dispersed to Site 1, Site 2, Site 3, and Site 4]
• Digital assets are divided into slices using Information Dispersal Algorithms (IDAs):
8h$1 vD@- fMq& Z4$’ >hip )aj% l[au T0kQ %~fa Uh(k My)v 9hU6 >kiR &i@n pYvQ 4Wco
• Slices are distributed to separate storage devices
• Real-time data retrieval is always bit-perfect as long as a threshold number of slices is available:
8h$1 vD@- >hip )aj% l[au %~fa 9hU6 >kiR pYvQ 4Wco
Dispersal is packet switching applied to data storage
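The slicing and threshold retrieval above can be sketched with a simple k-of-n erasure code in the spirit of Rabin's IDA. This is a minimal illustration, not Cleversafe's implementation: each group of k bytes is treated as the coefficients of a polynomial over the prime field GF(257) (an assumption chosen for simplicity; production codes commonly use GF(2^8)), and slice i (counting from 1) stores that polynomial evaluated at x = i. Any k evaluations determine the polynomial, so any k of the n slices rebuild the data. Slice values can reach 256, so slices are kept as integer lists; `disperse` and `recover` are hypothetical names. On its own this addresses availability only; it is not a confidentiality mechanism.

```python
P = 257  # prime just above 255, so every byte value is a field element


def disperse(data: bytes, k: int, n: int) -> list[list[int]]:
    """Split `data` into n slices; any k of them suffice to rebuild it."""
    assert 0 < k <= n < P
    padded = data + bytes(-len(data) % k)  # zero-pad to a multiple of k
    slices = [[] for _ in range(n)]
    for start in range(0, len(padded), k):
        coeffs = padded[start:start + k]
        for i in range(n):
            x = i + 1  # evaluation points 1..n (distinct and nonzero)
            # Horner's rule: c0 + c1*x + ... + c_{k-1}*x^{k-1} mod P
            y = 0
            for c in reversed(coeffs):
                y = (y * x + c) % P
            slices[i].append(y)
    return slices


def _solve_mod_p(matrix: list[list[int]], rhs: list[int]) -> list[int]:
    """Solve a square linear system mod P by Gauss-Jordan elimination."""
    k = len(matrix)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] % P)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], P - 2, P)  # inverse via Fermat's little theorem
        aug[col] = [v * inv % P for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(a - f * b) % P for a, b in zip(aug[r], aug[col])]
    return [row[k] for row in aug]


def recover(available: dict[int, list[int]], k: int, length: int) -> bytes:
    """Rebuild the original `length` bytes from any k slices.

    `available` maps a 0-based slice index to that slice's values.
    """
    if len(available) < k:
        raise ValueError("below threshold: need at least k slices")
    chosen = sorted(available)[:k]
    # Vandermonde matrix for the chosen evaluation points x = index + 1
    matrix = [[pow(i + 1, j, P) for j in range(k)] for i in chosen]
    out = bytearray()
    for pos in range(len(available[chosen[0]])):
        rhs = [available[i][pos] for i in chosen]
        out.extend(_solve_mod_p(matrix, rhs))  # recovered coefficients = bytes
    return bytes(out[:length])


message = b"Dispersal is packet switching applied to storage"
slices = disperse(message, k=10, n=16)
# Simulate losing 6 of the 16 slices (e.g., outages at some sites)
survivors = {i: slices[i] for i in (0, 2, 3, 5, 7, 8, 10, 12, 13, 15)}
assert recover(survivors, k=10, length=len(message)) == message
```

The 10-of-16 parameters mirror the slide, where 16 slices are stored and 10 are retrieved.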
Information Dispersal Configuration
• Better reliability: tolerates the loss or unavailability of up to n-k slices
• Better security: tolerates up to k-1 compromised stores
5-of-9 configuration (126 combinations):
• 5 stores needed to break confidentiality or integrity
• 5 stores needed to break availability
10-of-16 configuration (8008 combinations):
• 10 stores needed to break confidentiality or integrity
• 7 stores needed to break availability
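The figures for both configurations follow from k and n alone: k slices are needed to break confidentiality or integrity, an outage must take out n - k + 1 slices to break availability, and C(n, k) counts the distinct sets of k slices that can rebuild the data. The sketch below reproduces them and adds the raw-storage expansion n / k; `dispersal_profile` is an illustrative name.

```python
from math import comb


def dispersal_profile(k: int, n: int) -> dict:
    """Security and availability thresholds for a k-of-n dispersal."""
    return {
        "stores_to_break_confidentiality": k,  # k slices reconstruct the data
        "stores_to_break_availability": n - k + 1,  # leaves fewer than k slices
        "reconstruction_combinations": comb(n, k),  # ways to choose k of n slices
        "storage_expansion": n / k,  # raw bytes stored per byte of data
    }


for k, n in [(5, 9), (10, 16)]:
    print(f"{k}-of-{n}:", dispersal_profile(k, n))
```

For 10-of-16, the expansion of 1.6x matches the 960 TB raw / 600 TB usable figures in the cost comparison.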
Scale out capacity and performance independently
Information Dispersal Seamless Access
[Diagram: storage nodes at Site 1, Site 2, Site 3, ... Site n, fronted by an access layer]
• Access layer: Info. Dispersal routers; Java SDK for direct application integration; edge clients for massive content distribution
• Protocols: object access (REST/HTTP, FTP); file access (NAS protocols)
• Stores built on dispersal: Object Store, Block Store
Object storage delivers scalability, efficiency, and mobility
Information Dispersal for Cloud Storage
• Typically deployed in a multi-site configuration, with slices residing across 3 to 4 data centers
• Geographic redundancy and availability are achieved without the overhead of replication
[Diagram: an access device connected to Data Center 1, Data Center 2, Data Center 3, and Data Center 4]
Slices are stored on each node, not copies of the data
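Geographic redundancy without replication can be checked by counting slices per site. Assuming, for illustration, a 10-of-16 configuration with its 16 slices spread evenly over 4 data centers (4 slices each), losing any one data center leaves 12 slices (at least 10), while losing two leaves only 8. The sketch below enumerates worst-case site outages for a given placement; the layout and the function name are hypothetical.

```python
from itertools import combinations


def site_outage_tolerance(slices_per_site: list[int], k: int) -> int:
    """Largest number of simultaneous site outages that always leaves
    at least k slices readable, for the given slice placement."""
    sites = range(len(slices_per_site))
    total = sum(slices_per_site)
    tolerated = 0
    for down in range(1, len(slices_per_site) + 1):
        # Worst case over every choice of `down` failed sites
        worst_left = min(
            total - sum(slices_per_site[s] for s in failed)
            for failed in combinations(sites, down)
        )
        if worst_left < k:
            break
        tolerated = down
    return tolerated


# Hypothetical layout: 10-of-16, 4 slices in each of 4 data centers
print(site_outage_tolerance([4, 4, 4, 4], k=10))  # 1
```

For comparison, surviving a full site outage through replication requires at least one complete extra copy (2x raw capacity), whereas this layout stores 16/10 = 1.6x.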
Examining IDA methods
• Information dispersal uses forward error correction techniques (e.g., Reed-Solomon codes) that form n slices, of which any k are sufficient to recreate the data (k-of-n)
• To guarantee security, look for approaches that transform the data so that the slices don't represent the original data
– Example: a credit card number in a 4-of-6 configuration
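[Figure: the card number 5466 1610 4539 4439 dispersed two ways. Visible Data IDA Method: slices still show digit groups of the original (e.g., 5466, 1610, 4439). Secure IDA Method: slices are unrecognizable (e.g., «þTE, NIy^, d6=W, SQí&).]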
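One way to obtain the "secure IDA" behavior is to apply an all-or-nothing transform before dispersal, in the spirit of Rivest's package transform from the history list. The sketch below is a simplified, hypothetical variant, not Cleversafe's method and not vetted cryptography: it uses SHA-256 in counter mode as a stand-in for a block cipher to mask the data under a random key, then hides the key by XORing it with a hash of all the masked bytes. Without every byte of the package, the key (and therefore the data) cannot be recomputed. The names `aont_package` and `aont_unpackage` are illustrative.

```python
import hashlib
import secrets

KEY_LEN = 32  # bytes of random key, equal to the SHA-256 digest size


def _keystream(key: bytes, length: int) -> bytes:
    """SHA-256 in counter mode: H(key || 0) || H(key || 1) || ..., truncated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def aont_package(data: bytes) -> bytes:
    """All-or-nothing transform: masked data followed by a masked key."""
    key = secrets.token_bytes(KEY_LEN)
    masked = _xor(data, _keystream(key, len(data)))
    # The key can be recovered only from a hash over ALL of the masked bytes
    key_block = _xor(key, hashlib.sha256(masked).digest())
    return masked + key_block


def aont_unpackage(package: bytes) -> bytes:
    """Invert aont_package; any change to the masked part yields a wrong key."""
    masked, key_block = package[:-KEY_LEN], package[-KEY_LEN:]
    key = _xor(key_block, hashlib.sha256(masked).digest())
    return _xor(masked, _keystream(key, len(masked)))


card = b"5466 1610 4539 4439"
package = aont_package(card)
print(package.hex())  # random-looking; differs on every run
assert aont_unpackage(package) == card
```

The package, rather than the raw data, would then be fed to the k-of-n dispersal step, with the aim that an attacker holding fewer than k slices cannot rebuild the whole package and so cannot recover the key.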
Information Dispersal Improves Reliability
[Chart: Annual Chance of Data Loss in a 1,000 Disk System (y-axis: Probability of Data Loss)]
[Chart: Storage Efficiency – 1 PB usable example]
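The reliability advantage rests on a threshold argument: a k-of-n dispersal loses data only if more than n - k of its slices are lost before they can be rebuilt. Assuming, purely for illustration, that each slice (or copy) is lost independently with the same probability p during some window, the chance of data loss is a binomial tail. Triple replication is the 1-of-3 case, since any one copy suffices. The value of p below is hypothetical and is not the model behind the chart.

```python
from math import comb


def loss_probability(k: int, n: int, p: float) -> float:
    """P(more than n - k of n independent slices are lost), each with prob p."""
    return sum(comb(n, lost) * p**lost * (1 - p) ** (n - lost)
               for lost in range(n - k + 1, n + 1))


p = 0.01  # hypothetical per-slice loss probability over the window
print(f"3 copies (1-of-3):  {loss_probability(1, 3, p):.2e}")
print(f"5-of-9 dispersal:   {loss_probability(5, 9, p):.2e}")
print(f"10-of-16 dispersal: {loss_probability(10, 16, p):.2e}")
```

In this model, 10-of-16 stores 1.6x the data yet tolerates 6 simultaneous losses, while triple replication stores 3x and tolerates 2.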
Information Dispersal Improves Costs: Capacity Optimized
Metric | Traditional Storage (low-cost RAID 5) | Information Dispersal System
Number of extra copies | 2 plus 20% RAID overhead | None (10-of-16 dispersal)
Expected data integrity | 6 nines over 5 years | 12 nines over 5 years
Raw storage capacity | 2,873 TB | 960 TB
Usable storage capacity | 600 TB | 600 TB
Pricing | IDC average end-user price | Info. dispersal end-user price
$ per TB raw capacity | $1,090* | $732 (incl. commodity hardware)
$ per TB usable capacity | $5,219** | $1,172
Price as configured | $3,131,352 | $702,933
Electrical power | $97,606 | $14,709
Space | $124,488 | $18,547
Total cost of ownership (year 1) | $3,353,446 | $736,189
* Traditional storage $/TB from IDC 2009 cost for capacity-optimized storage
** Assumes 33% physical storage increase from non-virtualized storage containers
Roughly 1/5 the cost (year-1 TCO)
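The capacity and cost rows can be reproduced from the table's own assumptions: replication keeps 3 copies (the original plus 2 extra) with 20% RAID overhead and the 33% physical storage increase noted in the footnote, while 10-of-16 dispersal stores 16/10 of the usable data. Year-1 TCO is the configured price plus power and space. The sketch below recomputes those rows from the table's figures; the function names are illustrative.

```python
USABLE_TB = 600


def replicated_raw_tb(usable: float, copies: int = 3, raid_overhead: float = 0.20,
                      container_overhead: float = 0.33) -> float:
    """Raw TB for full copies on RAID, per the table's assumptions."""
    return usable * copies * (1 + raid_overhead) * (1 + container_overhead)


def dispersed_raw_tb(usable: float, k: int = 10, n: int = 16) -> float:
    """Raw TB for k-of-n dispersal: every byte expands by n / k."""
    return usable * n / k


# Year-1 TCO = price as configured + electrical power + space (from the table)
tco_traditional = 3_131_352 + 97_606 + 124_488
tco_dispersal = 702_933 + 14_709 + 18_547

print(round(replicated_raw_tb(USABLE_TB)))  # 2873
print(round(dispersed_raw_tb(USABLE_TB)))  # 960
print(tco_traditional, tco_dispersal)  # 3353446 736189
print(f"{tco_dispersal / tco_traditional:.1%}")  # 22.0%
```

The dispersal TCO works out to about 22% of the traditional figure, consistent with "roughly 1/5 the cost".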
IDA – Ideal for Cloud Storage
• Scales to Petabytes with distributed architecture
• End users stay in control of their data, since it exists only where and when they want it to
• Works over the public internet, since slices are transformed from the actual data into an unrecognizable form
Thank you
cgladwin@cleversafe.com