Cloud storage with · 2017-12-14 · Cloud storage with Apache jclouds Andrew Gaul 19 November 2014...

Post on 26-Jun-2020

3 views 0 download

Transcript of Cloud storage with · 2017-12-14 · Cloud storage with Apache jclouds Andrew Gaul 19 November 2014...

Cloud storage withApache jclouds

Andrew Gaul

19 November 2014

http://jclouds.apache.org/http://gaul.org/

1 / 29

OverviewWhat is Apache jcloudsWhat is object storageBasic conceptsHow to put and get objectsAdvanced topics

2 / 29

What is Apache jcloudsInterface with public andprivate cloud services via JavaProvider-specificimplementations, e.g.,Amazon S3, OpenStack SwiftCross-provider, portableabstractions, e.g., BlobStoreSupports all major providers

3 / 29

Why object storageScale out to billions of objectsand petabytes of dataLower cost, higherperformance, better reliabilityApplications and tenants canshare one object storeMany competitive public andprivate cloud offerings

4 / 29

Why not object storageMust rewrite existingapplications against newinterfacesDistributed systems introducecomplexity and additionalfailure modesLacks features that databasesand filesystems have

5 / 29

What is object storageDistributed system of storagenodesSupports narrow interface:

Create and deletecontainersPut, get, and delete objectsList objects in a containerModify and retrievemetadata 6 / 29

Example applicationsStore photos and videos for asocial networking siteArchive large data sets, e.g.,scientific measurements

7 / 29

Basic conceptsAPIs and ProvidersRegionsContainersObjects

8 / 29

APIs and Providers (1)API defines how tocommunicate with a cloudProvider specializes a givenAPI, e.g., regionsSupport major APIs: Atmos,Azure, Google, S3, SwiftSupport many providers:Glacier, HP, Rackspace

9 / 29

APIs and Providers (2)

10 / 29

RegionsMost public providers offermultiple geographic regionsUsually associate a containerwith a given regionSome providers can replicatebetween regionsautomatically, others offer aper-object cross-region copy

11 / 29

ContainersContainers can contain manynamed objectsSome providers have partialdirectory supportSome providers allow public-read access containers

12 / 29

ObjectsObjects have a name, streamof bytes, and metadata mapMust write entire stream ofbytes but can read rangesMetadata map includesstandard HTTP headers, e.g.,Content-Type: text/plain andarbitrary user headers, e.g., x-amz-meta-foo: bar 13 / 29

Example put and gettry (BlobStoreContext context = ContextBuilder .newBuilder("transient") .credentials("identity", "credential") .buildView(BlobStoreContext.class)) { BlobStore blobStore = context.getBlobStore(); ByteSource payload = ByteSource.wrap( new byte[] { 1, 2, 3, 4 }); Blob blob = blobStore.blobBuilder(blobName) .payload(payload) .contentLength(payload.size()) .build(); blobStore.putBlob(containerName, blob);

blob = blobStore.getBlob( containerName, blobName); try (InputStream is = blob.getPayload().openStream()) { ByteStreams.copy(is, os); }} 14 / 29

Consistency (1)Some blobstores have weaksemantics and can returnstale data for some timeputBlob(name, data1),putBlob(name, data2),getBlob(name) can returndata1 or data2

15 / 29

Consistency (2)Applications may need retryloops or other fallback logicS3 (standard region) and Swifthave various forms ofeventual consistencyAtmos, Azure, and GCS havestrong consistency, more likea file system

16 / 29

Data integrity (1)Provider guarantees objectintegrity at-rest via contenthashApplication can guaranteeobject integrity on-the-wire byproviding and verifyingContent-MD5 headerGuava has useful helpers

17 / 29

Data integrity (2)Guarantee integrity duringputBlob:

ByteSource payload = ByteSource.wrap( new byte[] { 1, 2, 3, 4 });Blob blob = blobStore.blobBuilder(blobName) .payload(payload) .contentLength(payload.size()) .contentMD5(payload.hash(Hashing.md5())) .build();blobStore.putBlob(containerName, blob);

This reads InputStream twice 18 / 29

Data integrity (3)Guarantee integrity duringgetBlob:

Blob blob = blobStore.getBlob(containerName, blob);HashCode md5 = HashCode.fromBytes(blob.getMetadata() .getContentMetadata().getContentMD5());try (HashingInputStream his = new HashingInputStream( blob.getPayload().openStream()), Hashing.md5()) { ByteStreams.copy(his, os); if (md5.equals(his.hash())) { throw new IOException(); }}

19 / 29

Large object sizesSome objects may not fit inmemory or in a byte[]Non-repeatable payloads, e.g.,InputStreamRepeatable payloads, e.g.,Guava ByteSource, can retryafter network failure

20 / 29

Multi-part uploadSome objects are too large fora single put operation, e.g.,AWS S3 > 5 GB, Azure > 64 MBjclouds abstracts these detailsvia:

Blob blob = ... blobStore.putBlob(containerName, blob, new PutOptions().multipart());

21 / 29

Many objectsAmazon S3 supports anarbitrary number of objectsper containerSwift recommends max500,000 objects per containerAtmos recommends max65,536 objects per directorySolution: shard over multiplecontainers or directories 22 / 29

URL signing (1)Most object stores allowcreating a URL which yourapplication can vend toexternal clients

client

jclouds1. request URL

blobstore3. access object

2. vend signed URL

23 / 29

URL signing (2)URL is cryptographicallysigned; allows time-limitedaccess to a single objectClients interact directly withthe object store, removingyour application as thebottleneck

24 / 29

Amazon GlacierOptimized for storage, notretrievalRead requests can takeseveral hours to completeLeast expensive public cloudprovider, often cheaper thanprivate cloud

25 / 29

Local blobstoresIn-memory (transient)appropriate for unit testingFilesystem has productionuses with limited number ofobjects or quantity of dataRemote clients requirerunning a HTTP server andimplementing authorization

26 / 29

jclouds-cliCommand-line tool useful foradministrative and debuggingtasks:

jclouds blobstore container-list \ --provider aws-s3 \ --identity $IDENTITY \ --credential $CREDENTIALjclouds blobstore write \ --provider aws-s3 \ --identity $IDENTITY \ --credential $CREDENTIAL \ $CONTAINER_NAME $OBJECT_NAME $FILE_NAME

27 / 29

Referenceshttps://github.com/jclouds/jclouds-exampleshttps://github.com/andrewgaul/s3proxyhttp://gaul.org/object-store-comparison/

28 / 29

Thank you!http://jclouds.apache.org/

http://gaul.org/

29 / 29