
Page 1: Hosting huge amount of binaries in JCR

Hosting Huge Amount of Binaries in JCR

Woonsan Ko
Solution Architect, Hippo USA
August 29, 2016


Page 2: Hosting huge amount of binaries in JCR

What’s this about?

● You wouldn’t want to store a huge amount of binaries in JCR, for various reasons (e.g. storage size, traffic concerns, etc.).

● In some projects, Hippo has implemented custom UI plugins to store binaries in external storage backends (e.g. S3, SFTP).

● I’d like to introduce a new alternative: a standard Jackrabbit DataStore component (VFSDataStore) that stores binaries in WebDAV, SFTP, etc.


Page 3: Hosting huge amount of binaries in JCR

Problems when storing huge binaries in JCR

● Can lead to a huge database that is hard to maintain and back up, and costly to operate.

● Big binary data can cause blocking on writes, and there is no offloading of traffic.

○ This might not be solved solely by a repository-level solution, though.

○ Application-level handling or traffic offloading will be needed as well.


Page 4: Hosting huge amount of binaries in JCR

Jackrabbit DataStore*

* https://wiki.apache.org/jackrabbit/DataStore

Page 5: Hosting huge amount of binaries in JCR

Jackrabbit DataStore (cont.)

A large binary store for performance and reduced disk usage.

● Fast copy: only the identifier is stored.
● Storing and reading do not block other operations.
● Objects in the DataStore are immutable.
● Hot backup is supported.
● Clustering: all cluster nodes use the same DataStore.
● Oak BlobStore supports Jackrabbit DataStores.
● JackrabbitValue.getContentIdentity() exposes the content identifier (see the sketch below).
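For illustration, a minimal sketch (the node paths and the jcr:data property name are assumptions) of how JackrabbitValue.getContentIdentity() can be used to check whether two binary properties share the same DataStore record, which is what makes "fast copy" possible:

// Minimal sketch: detect that two binary properties are backed by the same
// DataStore record by comparing their content identities.
import javax.jcr.Session;
import javax.jcr.Value;
import org.apache.jackrabbit.api.JackrabbitValue;

public class ContentIdentityCheck {

    public static boolean sameDataStoreRecord(Session session, String path1, String path2)
            throws javax.jcr.RepositoryException {
        Value v1 = session.getNode(path1).getProperty("jcr:data").getValue();
        Value v2 = session.getNode(path2).getProperty("jcr:data").getValue();
        if (v1 instanceof JackrabbitValue && v2 instanceof JackrabbitValue) {
            String id1 = ((JackrabbitValue) v1).getContentIdentity();
            String id2 = ((JackrabbitValue) v2).getContentIdentity();
            // Equal identities mean both properties point to the same record: only the
            // identifier was copied, not the binary content itself.
            return id1 != null && id1.equals(id2);
        }
        return false;
    }
}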


Page 6: Hosting huge amount of binaries in JCR

Jackrabbit DataStore (cont.)

Existing Jackrabbit DataStores

● FileDataStore
● S3DataStore
● DbDataStore (the default option in Hippo projects)


[Class diagram] AbstractDataStore is extended by FileDataStore, DbDataStore, and CachingDataStore; CachingDataStore is in turn extended by S3DataStore and VFSDataStore. DbDataStore is database-backed, while the others are file-system-based DataStores.

Page 7: Hosting huge amount of binaries in JCR

Jackrabbit DataStore (cont.)


[Class diagram] CachingDataStore, extended by S3DataStore and VFSDataStore, exposes the following configuration setters:

+ setPath(String)
+ setConfig(String)
+ setCacheSize(long)
+ setSecret(String)
+ setCachePurgeTrigFactor(double)
+ setCachePurgeResizeFactor(double)
+ setMinRecordLength(long)
+ setContinueOnAsyncUploadFailure(boolean)
+ setConcurrentUploadsThreads(int)
+ setAsyncUploadLimit(int)
+ setUploadRetries(int)
+ setTouchAsync(boolean)
+ setProactiveCaching(boolean)
+ setRecLengthCacheSize(int)

<!-- repository.xml -->

<DataStore class="org.apache.jackrabbit.vfs.ext.ds.VFSDataStore">
  <param name="config" value="${catalina.base}/conf/vfs2.properties" />
  <param name="secret" value="123456789" />
  <param name="path" value="/data/datastore" />
  <param name="cacheSize" value="68719476736" />
  <param name="minRecordLength" value="1024" />
  <!-- ... -->
</DataStore>
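For illustration, a minimal sketch of applying the same parameters programmatically through the CachingDataStore setters listed above (the repository home path is an assumption; in a Hippo project the DataStore is normally configured via repository.xml as shown):

// Minimal sketch: configure VFSDataStore in code with the same values as the
// repository.xml example above (all values are illustrative).
import org.apache.jackrabbit.vfs.ext.ds.VFSDataStore;

public class VFSDataStoreSetup {

    public static VFSDataStore createDataStore(String catalinaBase) throws Exception {
        VFSDataStore dataStore = new VFSDataStore();
        dataStore.setConfig(catalinaBase + "/conf/vfs2.properties"); // Commons VFS2 backend settings
        dataStore.setSecret("123456789");                            // secret value, as in the XML above
        dataStore.setPath("/data/datastore");                        // local cache directory
        dataStore.setCacheSize(68719476736L);                        // 64 GB local cache
        dataStore.setMinRecordLength(1024);                          // binaries smaller than this stay inline
        dataStore.init("/data/repository");                          // repository home directory (assumed path)
        return dataStore;
    }
}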

Page 8: Hosting huge amount of binaries in JCR

Current Approaches

● DbDataStore (the default in Hippo)

● Custom Upload/Picker UI Plugins with a different backend (e.g. SFTP or S3)


[Architecture diagram] Both the CMS and SITE applications use Jackrabbit, while custom CMS plugins store binaries directly in S3 or SFTP storage.

Page 9: Hosting huge amount of binaries in JCR

New Approaches

● Jackrabbit S3DataStore

● Jackrabbit VFSDataStore (JCR-3975✦)


✦ https://issues.apache.org/jira/browse/JCR-3975
✢ http://commons.apache.org/proper/commons-vfs/filesystems.html

[Architecture diagram] Jackrabbit’s S3DataStore uses an S3Backend to store binaries in S3 storage, while VFSDataStore uses a VFSBackend to store binaries in SFTP, WebDAV, HDFS, and other Commons VFS file systems.✢

Page 10: Hosting huge amount of binaries in JCR

Comparisons


Criterion | JR DbDataStore (default) | Custom UI Plugin | JR S3DataStore | JR VFSDataStore
Reducing DB size? | No | Yes | Yes | Yes
Secure? | Yes | Yes (S3, SFTP) | No for some (e.g. .gov), Yes for others | Yes (SFTP, WebDAV, etc.)
Indexable? | Yes | No | Yes | Yes
JCR API based? | Yes | No | Yes | Yes
Cost effective? | Yes | No | Yes | Yes
Offloading in traffic? | No | Yes | Possibly | Possibly

Page 11: Hosting huge amount of binaries in JCR

Demo*

● https://github.com/woonsanko/hippo-davstore-demo

○ Just follow README.md to build/run.

● VFSDataStore with WebDAV backend

● VFSDataStore with SFTP backend


Page 12: Hosting huge amount of binaries in JCR

FAQ

“You would only solve part of the problem. Writes will still block, no offloading of traffic, etc.”

● Right. Uploading and downloading also depend on the application, not only on the repository and its backend.

● It is possible to handle reads and writes in separate IO channels directly from/to the backend (see the sketch below).

○ The DataStore can provide a content identifier via JackrabbitValue.getContentIdentity(), which allows binary content paths to be inferred.

○ https://issues.onehippo.com/browse/CMS-10204
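For illustration, a minimal sketch of offloading downloads by mapping the content identity to a direct URL on the binary storage. The base URL, the jcr:data property name, and the directory fan-out (which mirrors FileDataStore's xx/yy/zz convention) are all assumptions; the actual record layout depends on the DataStore backend in use.

// Minimal sketch: derive a (hypothetical) direct backend URL from a binary's
// content identity so the download can bypass the application/repository.
import javax.jcr.Node;
import javax.jcr.Value;
import org.apache.jackrabbit.api.JackrabbitValue;

public class DirectDownloadUrlResolver {

    private final String backendBaseUrl; // e.g. the WebDAV base URL of the binary storage (assumption)

    public DirectDownloadUrlResolver(String backendBaseUrl) {
        this.backendBaseUrl = backendBaseUrl;
    }

    public String resolve(Node binaryNode) throws javax.jcr.RepositoryException {
        Value value = binaryNode.getProperty("jcr:data").getValue();
        if (!(value instanceof JackrabbitValue)) {
            return null; // fall back to streaming the binary through the application
        }
        String id = ((JackrabbitValue) value).getContentIdentity();
        if (id == null || id.length() < 6) {
            return null;
        }
        // Assumed record layout; verify against the actual DataStore backend before relying on it.
        return backendBaseUrl + '/' + id.substring(0, 2) + '/' + id.substring(2, 4)
                + '/' + id.substring(4, 6) + '/' + id;
    }
}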

Page 13: Hosting huge amount of binaries in JCR

Summary

● VFSDataStore can be the most feasible, cost-effective, and secure option in most projects for hosting a huge amount of binaries, with various backends such as WebDAV, SFTP, HDFS, etc.

● No custom UI plugins are needed. Just use the default Hippo CMS gallery/asset folder tree and picker UI, since binary storage is handled by the Jackrabbit DataStore backend.
