(STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Transcript of (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Page 1: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Henry Zhang, Senior Product Manager, Amazon Glacier

October 2015

Amazon Glacier Deep Dive

STG312

Page 2: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Audio archives – SoundCloud

• World’s leading social sound platform

• Audio files transcoded and stored in multiple formats

• Stores PBs of data

• Transcoded files served from Amazon S3

• Originals moved to Amazon Glacier for long-term retention

Page 3: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Video archives – Sony Media Cloud (Ci)

[Diagram label: Amazon Glacier]

Page 4: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Tape replacement – King County

• Most populous county in Washington State

• Replace tape solution for backup from 17 agencies

• Meet compliance requirement

• Saved $1MM in first year, no more tape refresh or management churn

Page 5: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Archive: Data retained for the long term, for compliance or potential future reference

Data archiving needs are growing everywhere

• Media assets, 4K, 8K

• Health care / Life sciences

• Financial services

• Regulated industries

• Oil and gas / Geospatial

• Digital preservation

• Long-term backups

• Logs

Page 6: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Traditional archiving approaches

• Tape silos / Tape libraries

• Tape drives (LTO-X / DLT / etc.)

• Virtual tape libraries (VTLs)

• Tape out / Vaulting

• Specialized software & personnel

Page 7: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

How can Amazon Glacier help with your archival?

• Metered usage: Pay as you go

• No capital investment

• No commitment

• No risky capacity planning

• Avoid risks of physical media handling

• Control your geographic locality for performance and compliance

Page 8: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier is a low-cost storage service for archival data with long-term retention requirements.

$0.007/GB per month

3-5 hour data retrieval

Financial records

Medical PACS images

High-res media assets

Page 9: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

How can Amazon Glacier help with your archival?

Extremely low-cost archive storage service, starting at $0.007/GB per month

Allows you to retrieve data within 3-5 hours

99.999999999% durability (7 orders of magnitude higher than 2 copies of tape)

No data migration, no hardware/infrastructure investments

Infinite scale and pay for what you use

Access to on-demand compute resources on AWS

Page 10: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Getting started – key concepts

• Account – Access AWS services, view billing/usage, manage security

• Vaults – Container for archives, up to 1000 vaults per account

• Archives – Files and records, write-once, 40TB max, unlimited archives

• Inventory – Cold index of archive properties refreshed every 24 hours

Page 11: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier – 3 ways to Access

• Direct Glacier API/SDK

• S3 lifecycle integration

• Third-party tools and gateways

Page 12: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier concepts: Uploading data

1. Create vault (Films)

2. Configure access policies

ArchiveApp user policy:

Effect: Allow

Resource: arn:aws:glacier:<accountId>:vaults/Films

Action: glacier:UploadArchive

3. Upload archives: UploadArchive(data) -> Archive ID
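
The policy fragment on this slide can be written out as a complete IAM policy document. The following is a minimal sketch, not the presenter's code: the helper name `build_upload_policy`, the region, and the account ID are placeholders chosen for illustration, and the ARN includes a region segment that the slide leaves out.

```python
import json


def build_upload_policy(region: str, account_id: str, vault_name: str) -> dict:
    """Return an IAM policy that allows only glacier:UploadArchive on one vault."""
    # Full vault ARN; the slide abbreviates this by omitting the region.
    vault_arn = f"arn:aws:glacier:{region}:{account_id}:vaults/{vault_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["glacier:UploadArchive"],
                "Resource": [vault_arn],
            }
        ],
    }


# Placeholder region and account ID, for illustration only.
policy = build_upload_policy("us-west-2", "123456789012", "Films")
policy_json = json.dumps(policy, indent=2)
```

With a policy like this attached to the ArchiveApp user, step 3 would typically be a single SDK call that sends the data and returns the archive ID, which the application then has to store itself.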

Page 13: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier concepts: Retrieving data

1. Initiate job (ArchiveId: AE99F…, Vault: Films) -> Job ID

2. 3-5 hours for job completion

3. Job completion notification

4. Download output
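
The four retrieval steps can be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive client: `archive_retrieval_params` and `wait_for_job` are hypothetical helper names, the archive ID is a placeholder, and the "describe job" call is replaced by a stand-in so the sketch runs without AWS access. The parameter keys (`Type`, `ArchiveId`, `SNSTopic`) follow the job parameters accepted by the Glacier InitiateJob API.

```python
import time
from typing import Callable


def archive_retrieval_params(archive_id: str, sns_topic_arn: str = "") -> dict:
    """Job parameters for an archive-retrieval job (step 1)."""
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id}
    if sns_topic_arn:
        # Optional: request a notification when the job completes (step 3).
        params["SNSTopic"] = sns_topic_arn
    return params


def wait_for_job(is_completed: Callable[[], bool],
                 poll_seconds: float,
                 max_polls: int,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the job reports completion (step 2); return False on timeout."""
    for _ in range(max_polls):
        if is_completed():
            return True
        sleep(poll_seconds)
    return False


# Stand-in for a real "describe job" call: reports completion on the third poll.
_answers = iter([False, False, True])
done = wait_for_job(lambda: next(_answers), poll_seconds=0.0, max_polls=5,
                    sleep=lambda seconds: None)
```

In practice a notification (step 3) is usually preferable to polling, since jobs take hours; the output is downloaded (step 4) only after the job has completed.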

Page 14: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier – Amazon S3 lifecycle archival

• Seamlessly move data from Amazon S3 to Amazon Glacier

• Automated lifecycle rules

• Transition based on object age or predefined date
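
A lifecycle rule of the kind described above can be sketched as a plain configuration document. This is a minimal example under assumptions: the helper name `glacier_transition_rule`, the rule IDs, prefixes, 30-day age, and date are all illustrative, while the key names follow the S3 lifecycle configuration format.

```python
from datetime import datetime, timezone


def glacier_transition_rule(rule_id, prefix, days=None, on_date=None):
    """Build one lifecycle rule that moves matching objects to Amazon Glacier.

    Exactly one of `days` (object age) or `on_date` (predefined date) is required.
    """
    if (days is None) == (on_date is None):
        raise ValueError("give exactly one of days or on_date")
    transition = {"StorageClass": "GLACIER"}
    if days is not None:
        transition["Days"] = days
    else:
        transition["Date"] = on_date
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [transition],
    }


# Illustrative rules: one age-based, one date-based.
lifecycle = {
    "Rules": [
        glacier_transition_rule("archive-originals", "originals/", days=30),
        glacier_transition_rule("archive-2015-logs", "logs/2015/",
                                on_date=datetime(2016, 1, 1, tzinfo=timezone.utc)),
    ]
}
```

A document of this shape is what an SDK call such as boto3's `put_bucket_lifecycle_configuration` expects as its `LifecycleConfiguration` argument.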

Page 15: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier – Backup software integration

• CommVault – Native integration with Amazon S3 & Amazon Glacier

• Deduplication & encryption

• Single console management

[Diagram labels: Amazon S3, Amazon Glacier]

Page 16: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier – Third-party tools and gateways

• Consumer grade: less than $50

– Example: CloudBerry, FastGlacier, Arq (Haystack Software)

• Small / medium business: $500 - $1,000

– Example: Synology, Veeam, QNAP

• Enterprise-grade gateway (price varies)

– Example: NetApp AltaVault

Page 17: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Best practices – Prepare your data

Page 18: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Use Archive descriptions

• Use the archive description field for metadata.

• If the local index is corrupted or destroyed, use archive descriptions to reconstruct critical mappings.

• For example, create an index entry and add its primary key to the archive description on upload.
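
One way to apply this practice is to store a small JSON record in the description and parse it back out of the vault inventory when the local index is lost. The sketch below is illustrative only: the helper names and the `pk`/`path` fields are assumptions, not part of the talk. It relies on two documented properties of Glacier: descriptions are limited to 1,024 printable ASCII characters, and inventory entries carry `ArchiveId` and `ArchiveDescription` fields.

```python
import json

MAX_DESCRIPTION_LEN = 1024  # Glacier's limit on archive description length


def make_description(primary_key: str, path: str) -> str:
    """Encode the local index key and original path as an ASCII-only JSON string."""
    desc = json.dumps({"pk": primary_key, "path": path},
                      ensure_ascii=True, sort_keys=True)
    if len(desc) > MAX_DESCRIPTION_LEN:
        raise ValueError("archive description too long")
    return desc


def rebuild_index(archive_list: list) -> dict:
    """Map primary key -> archive ID from vault inventory entries."""
    index = {}
    for entry in archive_list:
        try:
            meta = json.loads(entry.get("ArchiveDescription") or "")
        except ValueError:
            continue  # description is not JSON; skip it
        if isinstance(meta, dict) and isinstance(meta.get("pk"), str):
            index[meta["pk"]] = entry["ArchiveId"]
    return index


# Hypothetical inventory entries, shaped like a vault inventory's archive list.
inventory = [
    {"ArchiveId": "id-1", "ArchiveDescription": make_description("film-42", "a.mov")},
    {"ArchiveId": "id-2", "ArchiveDescription": "free-text description"},
]
recovered = rebuild_index(inventory)
```

Because the inventory is refreshed only about once a day, this recovery path suits disaster recovery rather than routine lookups.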

Page 19: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Small objects and object size overhead

• Every archive has 32KB of associated overhead, and some operations are charged per request

• For a 3.2MB archive, overhead is ~1% of the storage cost

• For a 1KB archive, 97% of the storage cost would go to overhead

• Solution is aggregation: the recommended minimum archive size is on the order of megabytes
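
The percentages above follow from dividing the fixed 32KB overhead by the total billed size (overhead plus data). A short sketch of that arithmetic, assuming 1MB = 1,000KB and ignoring per-request charges; the function name is illustrative:

```python
OVERHEAD_KB = 32  # per-archive overhead stated on the slide


def overhead_share(archive_kb: float) -> float:
    """Fraction of billed storage that is overhead rather than data."""
    return OVERHEAD_KB / (OVERHEAD_KB + archive_kb)


# Archive sizes in KB: 1KB, 100KB, 3.2MB, 100MB.
for size_kb in (1, 100, 3200, 100000):
    print(f"{size_kb:>7} KB archive: {overhead_share(size_kb):.1%} overhead")
```

For a 1KB archive this gives 32/33, about 97%; for 3.2MB it gives 32/3232, about 1%.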

Page 20: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Archive aggregation

[Diagram: Archive aggregation. Labels: Archive (File 1, File 2, . . ., each with checksum & metadata: Checksum 1, Checksum 2, Checksum 3); Index/directory (File 1 offset, File 2 offset, File 3 offset); Local index.]
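
A minimal sketch of the aggregation idea: concatenate small files into one archive body and keep an index of offsets, lengths, and checksums so that each file can be located and verified later. The layout and the names `aggregate` and `extract` are assumptions for illustration; the talk does not prescribe a format.

```python
import hashlib


def aggregate(files: dict) -> tuple:
    """Concatenate files into one blob; return (blob, index).

    `files` maps file name -> bytes. The index records where each file
    starts, how long it is, and its SHA-256 checksum.
    """
    blob = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = {
            "offset": len(blob),
            "length": len(data),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
        blob += data
    return bytes(blob), index


def extract(blob: bytes, index: dict, name: str) -> bytes:
    """Slice one file back out of the blob and verify its checksum."""
    entry = index[name]
    data = blob[entry["offset"]: entry["offset"] + entry["length"]]
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError(f"checksum mismatch for {name}")
    return data


# Illustrative data only.
archive_body, local_index = aggregate({"a.txt": b"hello", "b.txt": b"glacier"})
restored = extract(archive_body, local_index, "b.txt")
```

Storing a copy of the index inside the archive as well as locally means the mapping can be recovered from the archive itself if the local copy is lost.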

Page 21: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Best practices – Optimize upload

Page 22: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Best practices: Multipart uploads

Improve throughput, reliability, and get idempotency with multipart uploads

1. InitiateMultipartUpload(partSize) → uploadId

2. UploadPart(uploadId, data)

3. CompleteMultipartUpload(uploadId) → archiveId

[Diagram: an archive split into parts, with the parts uploaded in parallel]
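
Two pieces of client-side logic sit behind these three calls: splitting the archive into byte ranges, and computing the SHA-256 tree hash that Glacier uses to verify uploads. The sketch below shows both under stated assumptions: the function names are illustrative, no network calls are made, and the part-size rule (1 MiB times a power of two, up to 4 GiB) reflects the documented Glacier constraint.

```python
import hashlib

MiB = 1024 * 1024


def is_valid_part_size(size: int) -> bool:
    """Part size must be 1 MiB multiplied by a power of two, at most 4 GiB."""
    if size < MiB or size > 4096 * MiB or size % MiB:
        return False
    n = size // MiB
    return n & (n - 1) == 0


def part_ranges(total: int, part_size: int) -> list:
    """Inclusive (start, end) byte ranges; only the last part may be shorter."""
    return [(start, min(start + part_size, total) - 1)
            for start in range(0, total, part_size)]


def tree_hash(data: bytes) -> str:
    """SHA-256 tree hash: hash 1 MiB chunks, then combine pairwise to a root."""
    level = [hashlib.sha256(data[i:i + MiB]).digest()
             for i in range(0, len(data), MiB)] or [hashlib.sha256(b"").digest()]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            # An unpaired hash at the end is promoted to the next level unchanged.
            nxt.append(hashlib.sha256(pair[0] + pair[1]).digest()
                       if len(pair) == 2 else pair[0])
        level = nxt
    return level[0].hex()
```

Because each part carries its own byte range and checksum, a failed part can be retried on its own, and parts can be sent concurrently in any order before the upload is completed.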

Page 23: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Best practices: Data ingestion options

• AWS Direct Connect: Dedicated bandwidth between your site and AWS

• Internet: Transfer data in a secure SSL tunnel over the public Internet

• AWS Import/Export Snowball: Physical transfer of media into and out of AWS

Page 24: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Best practices – Cost management

Page 25: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier – Data retrieval policies

• Provides transparency and cost control for data retrievals

• Governs all retrieval activities for an account in a region

• Synchronously accept/reject each retrieval request

• Accounts for inflight retrieval operations
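
A data retrieval policy is a small document with a single rule. The sketch below builds one and adds a toy admission check to illustrate what synchronous accept/reject with in-flight accounting means. The strategy names follow the Glacier API (`FreeTier`, `BytesPerHour`, `None`); the helper names and the `admit` logic are assumptions for illustration and do not represent Glacier's actual accounting.

```python
VALID_STRATEGIES = {"FreeTier", "BytesPerHour", "None"}


def retrieval_policy(strategy: str, bytes_per_hour: int = 0) -> dict:
    """Build a data retrieval policy document with one rule."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    rule = {"Strategy": strategy}
    if strategy == "BytesPerHour":
        if bytes_per_hour <= 0:
            raise ValueError("BytesPerHour needs a positive limit")
        rule["BytesPerHour"] = bytes_per_hour
    return {"Rules": [rule]}


def admit(request_bytes: int, inflight_bytes: int, limit_bytes_per_hour: int) -> bool:
    """Toy model: accept only if in-flight plus new retrieval stays within the limit."""
    return inflight_bytes + request_bytes <= limit_bytes_per_hour


# Illustrative limit of 10 GiB per hour.
policy = retrieval_policy("BytesPerHour", 10 * 1024 ** 3)
```

The point of the synchronous check is that a request exceeding the limit is rejected immediately, so the caller can retry later rather than incur an unexpected charge.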

Page 30: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Cost allocation with vault tags

Page 31: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Best practices – Security and compliance

Page 32: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier – Audit logging with AWS CloudTrail

• Enable AWS CloudTrail in console

• Control plane events – Vault activities

• Data plane events – Archive activities

Page 33: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Vault access policies

• Manage access to a Vault in a single location – single IAM policy

– Grant/revoke access to internal business units/teams

– “Marketing_Vault” has a different access policy from “DevOps_Vault”

• Easily manage cross-account access for your business partner

– Simply add a section for your business partner in the same policy
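
A vault access policy with one statement for an internal team and one for a business partner might look like the sketch below. The account IDs, statement IDs, vault name, and the helper `vault_access_policy` are placeholders for illustration; the document structure follows the standard IAM resource-policy format.

```python
def vault_access_policy(vault_arn: str, grants: dict) -> dict:
    """Build a vault access policy.

    `grants` maps a statement ID to (principal ARNs, allowed actions).
    """
    statements = []
    for sid, (principals, actions) in grants.items():
        statements.append({
            "Sid": sid,
            "Effect": "Allow",
            "Principal": {"AWS": list(principals)},
            "Action": list(actions),
            "Resource": [vault_arn],
        })
    return {"Version": "2012-10-17", "Statement": statements}


# Placeholder ARNs: 111111111111 is the vault owner, 999999999999 the partner.
marketing_vault = "arn:aws:glacier:us-west-2:111111111111:vaults/Marketing_Vault"
policy = vault_access_policy(marketing_vault, {
    "MarketingTeamUpload": (["arn:aws:iam::111111111111:role/marketing"],
                            ["glacier:UploadArchive"]),
    "PartnerReadOnly": (["arn:aws:iam::999999999999:root"],
                        ["glacier:InitiateJob", "glacier:GetJobOutput"]),
})
```

Revoking the partner's access then amounts to removing the `PartnerReadOnly` statement from this one document.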

Page 34: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Compliance Storage with Vault Lock

Amazon Glacier Vault Lock allows you to easily set compliance controls on individual vaults and enforce them via a lockable policy.

• Time-based retention

• MFA authentication

• Controls govern all records in a vault

• Immutable policy

• Two-step locking

Page 35: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Vault Lock for compliance storage

• Non-overwrite, non-erasable records

• Time-based retention with “ArchiveAgeInDays” control

• Policy lockdown (strong governance)

• Legal hold with vault-level tags

• Configure optional designated third-party access and grant temporary access

Page 36: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Example control: 1 year record retention
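
The policy shown on this slide is not preserved in the transcript. As a sketch of what a one-year retention control could look like, the function below builds a Vault Lock policy that denies archive deletion until an archive is 365 days old, using the `glacier:ArchiveAgeInDays` condition key. The function name, statement ID, and vault ARN are placeholders.

```python
def retention_lock_policy(vault_arn: str, days: int) -> dict:
    """Deny glacier:DeleteArchive for archives younger than `days`."""
    if days <= 0:
        raise ValueError("retention period must be positive")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDeleteBeforeRetentionEnds",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": [vault_arn],
                "Condition": {
                    # Applies while the archive's age is below the threshold.
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(days)}
                },
            }
        ],
    }


# Placeholder vault ARN, for illustration only.
one_year = retention_lock_policy(
    "arn:aws:glacier:us-west-2:123456789012:vaults/Records", 365)
```

Because the statement is a deny with `Principal` set to `*`, it applies to every caller, including the account owner, once the policy is locked.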

Page 38: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Vault Lock: Two-step locking
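
The slide's diagram is not preserved here. In outline, step one attaches the policy in an in-progress state and returns a lock ID, leaving a 24-hour window to test the policy or abort; step two completes the lock with that ID, after which the policy can no longer be changed. The class below is a small in-memory model of that flow, offered as an illustration only: it makes no AWS calls, and its names are hypothetical.

```python
import secrets

LOCK_WINDOW_SECONDS = 24 * 60 * 60  # time allowed for testing before the attempt expires


class VaultLockSim:
    """In-memory model of two-step locking; it does not call AWS."""

    def __init__(self):
        self.state = "Unlocked"
        self.lock_id = None
        self.started_at = None

    def _expire_if_stale(self, now):
        if self.state == "InProgress" and now - self.started_at > LOCK_WINDOW_SECONDS:
            self.abort()

    def initiate(self, now):
        """Step 1: attach the policy in a testable, abortable state."""
        self._expire_if_stale(now)
        if self.state != "Unlocked":
            raise RuntimeError("lock already " + self.state)
        self.state, self.lock_id, self.started_at = "InProgress", secrets.token_hex(8), now
        return self.lock_id

    def abort(self):
        """Back out while in progress; a locked policy cannot be removed."""
        if self.state == "Locked":
            raise RuntimeError("policy is locked and cannot be removed")
        self.state, self.lock_id, self.started_at = "Unlocked", None, None

    def complete(self, lock_id, now):
        """Step 2: make the policy immutable, using the ID from step 1."""
        self._expire_if_stale(now)
        if self.state != "InProgress" or lock_id != self.lock_id:
            raise RuntimeError("no matching lock in progress")
        self.state = "Locked"
```

The testing window exists because the final step is irreversible: a mistake in a locked policy cannot be corrected afterwards.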

Page 39: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Legal hold with vault-level tags

Page 40: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Example control: Legal hold
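
The slide's policy text is not preserved in this transcript. A legal-hold control of this kind is commonly expressed as a deny statement conditioned on a vault tag, so that setting the tag freezes deletions and clearing it releases the hold. The sketch below assumes a tag named `LegalHold` and the `glacier:ResourceTag/` condition-key prefix; the function name, statement ID, and ARN are placeholders.

```python
def legal_hold_statement(vault_arn: str, tag_key: str = "LegalHold") -> dict:
    """Deny archive deletion while the vault carries the legal-hold tag."""
    return {
        "Sid": "DenyDeleteWhileOnLegalHold",
        "Principal": "*",
        "Effect": "Deny",
        "Action": "glacier:DeleteArchive",
        "Resource": [vault_arn],
        "Condition": {
            # Matches when the tag value is "true" or empty.
            "StringLike": {f"glacier:ResourceTag/{tag_key}": ["true", ""]}
        },
    }


# Placeholder ARN; the tags are what an operator would set to start the hold.
statement = legal_hold_statement(
    "arn:aws:glacier:us-west-2:123456789012:vaults/Records")
hold_tags = {"LegalHold": "true"}
```

Since tags can be changed after the policy is locked, this gives a switchable control inside an otherwise immutable policy.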

Page 41: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Vault lock best practices

Page 42: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Using vault lock policy with vault access policy

Compliance/Governance and Flexibility

Vault access policy

• Can be updated/deleted

Use vault access policy to:

• Designate third-party access

• Grant temporary read permissions when necessary

Vault lock policy

• Lockable/Immutable policy

• Cannot be updated/deleted after lockdown

Use vault lock policy to:

• Deploy regulatory controls such as records retention

• Enforce data access through multi-factor authentication only

Page 43: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Vault Lock in the Glacier Console

Page 60: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Amazon Glacier received a third-party assessment from Cohasset Associates on how Amazon Glacier with Vault Lock can be used to meet the requirements of SEC 17a-4(f) and CFTC 1.31(b)-(c).

Page 61: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Thank you!

Page 62: (STG312) Amazon Glacier Deep Dive: Cold Data Storage in AWS

Remember to complete your evaluations!