Hosting Huge Amount of Binaries in JCR
Woonsan Ko
Solution Architect, Hippo USA
August 29, 2016
What’s this about?
● You wouldn’t want to store a huge amount of binaries in JCR, for various reasons (e.g. storage size and traffic concerns).
● In some projects, Hippo has implemented custom UI plugins that store binaries in external storage (e.g. S3, SFTP).
● I’d like to introduce a new alternative: a standard Jackrabbit DataStore component (VFSDataStore) that stores binaries in WebDAV, SFTP, etc.
Problems when storing huge binaries in JCR
● The database can grow very large, which makes it hard and costly to maintain and back up.
● Large binaries can block writes, and their traffic is not offloaded.
○ This might not be solved by a repository-level solution alone, though.
○ Application-level handling or traffic offloading will be needed as well.
Jackrabbit DataStore*
* https://wiki.apache.org/jackrabbit/DataStore
Jackrabbit DataStore (cont.)
A store for large binaries that improves performance and reduces disk usage:
● Fast copy: only the identifier is stored.
● Storing and reading do not block other operations.
● Objects in the DataStore are immutable.
● Hot backup is supported.
● Clustering: all cluster nodes use the same DataStore.
● The Oak BlobStore supports Jackrabbit DataStores.
● JackrabbitValue.getContentIdentity() exposes the content identifier of a binary.
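The fast-copy and immutability bullets follow from content addressing: a binary's identifier is derived from a digest of its bytes, so copying a property copies only the identifier, and identical content is stored once. The class below is a hypothetical in-memory sketch of that idea, not Jackrabbit code. It assumes SHA-1 as the digest; the algorithm a real DataStore uses depends on the Jackrabbit version.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory sketch of content-addressed, immutable binary storage. */
public class ContentAddressedStore {
    private final Map<String, byte[]> records = new HashMap<>();

    /** Identifier = hex-encoded digest of the content (SHA-1 is an assumption here). */
    public static String identifierOf(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Stores content once; adding identical content again returns the same identifier. */
    public String add(byte[] content) {
        String id = identifierOf(content);
        records.putIfAbsent(id, content.clone()); // an existing record is never overwritten
        return id;
    }

    /** Returns a defensive copy, so callers cannot mutate the stored record. */
    public byte[] get(String id) {
        byte[] data = records.get(id);
        return data == null ? null : data.clone();
    }

    public int size() {
        return records.size();
    }

    public static void main(String[] args) {
        ContentAddressedStore store = new ContentAddressedStore();
        String original = store.add("hello".getBytes(StandardCharsets.UTF_8));
        String copy = original; // a "fast copy" duplicates only the identifier, not the bytes
        String again = store.add("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(original); // aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
        System.out.println(copy.equals(again) + " " + store.size()); // true 1
    }
}
```

Because the identifier depends only on the bytes, two nodes referencing the same binary share one record, and a record never needs to change in place.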
Jackrabbit DataStore (cont.)
Existing Jackrabbit DataStores:
● FileDataStore
● S3DataStore
● DbDataStore (the default option in Hippo projects)
Jackrabbit DataStore (cont.)
Class hierarchy (AbstractDataStore is the common base class):
● << DbDataStore >>
○ DbDataStore
● << File System based DataStores >>
○ FileDataStore
○ CachingDataStore, extended by S3DataStore and VFSDataStore
CachingDataStore (extended by S3DataStore and VFSDataStore) configuration setters:
+ setPath(String)
+ setConfig(String)
+ setCacheSize(long)
+ setSecret(String)
+ setCachePurgeTrigFactor(double)
+ setCachePurgeResizeFactor(double)
+ setMinRecordLength(long)
+ setContinueOnAsyncUploadFailure(boolean)
+ setConcurrentUploadsThreads(int)
+ setAsyncUploadLimit(int)
+ setUploadRetries(int)
+ setTouchAsync(boolean)
+ setProactiveCaching(boolean)
+ setRecLengthCacheSize(int)
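<!-- repository.xml -->
<DataStore class="org.apache.jackrabbit.vfs.ext.ds.VFSDataStore">
  <!-- properties file configuring the VFS backend -->
  <param name="config" value="${catalina.base}/conf/vfs2.properties" />
  <!-- secret key used to sign binary references -->
  <param name="secret" value="123456789" />
  <!-- local cache directory -->
  <param name="path" value="/data/datastore" />
  <!-- local cache size in bytes: 68719476736 = 64 GiB -->
  <param name="cacheSize" value="68719476736" />
  <!-- binaries shorter than this many bytes are kept inline, not in the DataStore -->
  <param name="minRecordLength" value="1024" />
  <!-- ... -->
</DataStore>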
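Jackrabbit configures repository.xml components in JavaBean style: each `<param name="x">` is applied through the matching `setX(...)` setter from the CachingDataStore list. The sketch below imitates that mapping with plain reflection; it is not Jackrabbit's own configuration code, and `ParamBinder` and `DataStoreBean` are hypothetical names mirroring only a few setters. It also shows what two of the values mean: `cacheSize` is a byte count (64 GiB here), and a binary shorter than `minRecordLength` bytes is kept inline rather than in the DataStore.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: apply <param> name/value pairs to a bean through its setters. */
public class ParamBinder {

    /** Hypothetical stand-in for a DataStore bean; mirrors only a few real setter names. */
    public static class DataStoreBean {
        private String path;
        private String secret;
        private long cacheSize;
        private long minRecordLength;

        public void setPath(String path) { this.path = path; }
        public void setSecret(String secret) { this.secret = secret; }
        public void setCacheSize(long cacheSize) { this.cacheSize = cacheSize; }
        public void setMinRecordLength(long minRecordLength) { this.minRecordLength = minRecordLength; }

        public String getPath() { return path; }
        public long getCacheSize() { return cacheSize; }

        /** Binaries shorter than minRecordLength are kept inline instead of in the DataStore. */
        public boolean isStoredInline(long length) { return length < minRecordLength; }
    }

    /** Calls set<Name>(value) for each param, converting the string to the setter's type. */
    public static void bind(Object bean, Map<String, String> params) throws ReflectiveOperationException {
        for (Map.Entry<String, String> e : params.entrySet()) {
            String name = e.getKey();
            String setter = "set" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
            Method target = null;
            for (Method m : bean.getClass().getMethods()) {
                if (m.getName().equals(setter) && m.getParameterCount() == 1) {
                    target = m;
                    break;
                }
            }
            if (target == null) {
                throw new NoSuchMethodException(setter + " on " + bean.getClass().getName());
            }
            target.invoke(bean, convert(e.getValue(), target.getParameterTypes()[0]));
        }
    }

    private static Object convert(String value, Class<?> type) {
        if (type == String.class) return value;
        if (type == long.class || type == Long.class) return Long.parseLong(value);
        if (type == int.class || type == Integer.class) return Integer.parseInt(value);
        if (type == double.class || type == Double.class) return Double.parseDouble(value);
        if (type == boolean.class || type == Boolean.class) return Boolean.parseBoolean(value);
        throw new IllegalArgumentException("Unsupported setter type: " + type);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("path", "/data/datastore");
        params.put("secret", "123456789");
        params.put("cacheSize", "68719476736");
        params.put("minRecordLength", "1024");

        DataStoreBean bean = new DataStoreBean();
        bind(bean, params);
        System.out.println(bean.getCacheSize() / (1024L * 1024 * 1024) + " GiB cache"); // 64 GiB cache
        System.out.println(bean.isStoredInline(1023) + " " + bean.isStoredInline(1024)); // true false
    }
}
```

An unknown parameter name fails fast with `NoSuchMethodException`, which is usually preferable to silently ignoring a typo in repository.xml.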
Current Approaches
● DbDataStore (the default in Hippo)
● Custom upload/picker UI plugins with a different backend (e.g. SFTP or S3)
[Diagram: CMS with custom plugins, S3 Storage, SFTP Storage, Jackrabbit, SITE]
New Approaches
● Jackrabbit S3DataStore
● Jackrabbit VFSDataStore (JCR-3975✦)
Jackrabbit S3DataStore → S3Backend → S3 Storage
Jackrabbit VFSDataStore → VFSBackend → SFTP Storage, WebDAV Storage, HDFS Storage, …✢
✦ https://issues.apache.org/jira/browse/JCR-3975
✢ http://commons.apache.org/proper/commons-vfs/filesystems.html
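Both new approaches share one layout: the DataStore keeps a local cache (the `path` and `cacheSize` settings) in front of a pluggable backend that talks to S3 or to a Commons VFS file system. The sketch below illustrates that layering with a hypothetical `Backend` interface and a write-through, byte-bounded LRU cache. It is a simplification: judging by the setter names, the real CachingDataStore also supports asynchronous uploads and purges its cache in batches (`cachePurgeTrigFactor`, `cachePurgeResizeFactor`) rather than entry by entry.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a local cache in front of a pluggable storage backend. */
public class CachingStoreSketch {

    /** Stand-in for a remote backend such as S3 or a VFS file system (WebDAV, SFTP, ...). */
    public interface Backend {
        void write(String id, byte[] data);
        byte[] read(String id);
    }

    /** In-memory backend that counts reads, so cache hits and misses are observable. */
    public static class InMemoryBackend implements Backend {
        private final Map<String, byte[]> files = new HashMap<>();
        private int reads = 0;

        @Override public void write(String id, byte[] data) { files.put(id, data.clone()); }

        @Override public byte[] read(String id) {
            reads++;
            byte[] data = files.get(id);
            return data == null ? null : data.clone();
        }

        public int getReads() { return reads; }
    }

    private final Backend backend;
    private final long maxCacheBytes;
    private long cachedBytes = 0;
    // accessOrder = true: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, byte[]> cache = new LinkedHashMap<>(16, 0.75f, true);

    public CachingStoreSketch(Backend backend, long maxCacheBytes) {
        this.backend = backend;
        this.maxCacheBytes = maxCacheBytes;
    }

    /** Write-through: the backend always receives the data; the cache keeps a local copy. */
    public void write(String id, byte[] data) {
        backend.write(id, data);
        putInCache(id, data);
    }

    /** Serves from the local cache when possible, otherwise reads from the backend. */
    public byte[] read(String id) {
        byte[] hit = cache.get(id); // also marks the entry as most recently used
        if (hit != null) return hit.clone();
        byte[] data = backend.read(id);
        if (data != null) putInCache(id, data);
        return data;
    }

    public boolean isCached(String id) { return cache.containsKey(id); }

    private void putInCache(String id, byte[] data) {
        byte[] previous = cache.put(id, data.clone());
        if (previous != null) cachedBytes -= previous.length;
        cachedBytes += data.length;
        // Evict least recently used entries until the cache fits its byte budget.
        Iterator<Map.Entry<String, byte[]>> it = cache.entrySet().iterator();
        while (cachedBytes > maxCacheBytes && it.hasNext()) {
            cachedBytes -= it.next().getValue().length;
            it.remove();
        }
    }

    public static void main(String[] args) {
        InMemoryBackend backend = new InMemoryBackend();
        CachingStoreSketch store = new CachingStoreSketch(backend, 10); // 10-byte cache budget
        store.write("a", new byte[4]);
        store.write("b", new byte[4]);
        store.read("a");               // cache hit; "a" becomes most recently used
        store.write("c", new byte[4]); // 12 bytes > 10, so least recently used "b" is evicted
        store.read("b");               // cache miss: fetched from the backend
        System.out.println("backend reads: " + backend.getReads()); // backend reads: 1
    }
}
```

Swapping the backend (S3, WebDAV, SFTP) leaves the caching logic untouched, which is why S3DataStore and VFSDataStore can share CachingDataStore as a base class.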
Comparisons
| | JR DbDataStore (default) | Custom UI Plugin | JR S3 DataStore | JR VFS DataStore |
|---|---|---|---|---|
| Reducing DB size? | No | Yes | Yes | Yes |
| Secure? | Yes | Yes (S3, SFTP) | No to some (e.g. .gov), yes to some | Yes (SFTP, WebDAV, etc.) |
| Indexable? | Yes | No | Yes | Yes |
| JCR API based? | Yes | No | Yes | Yes |
| Cost effective? | Yes | No | Yes | Yes |
| Offloading traffic? | No | Yes | Possibly | Possibly |
Demo*
● https://github.com/woonsanko/hippo-davstore-demo
○ Just follow README.md to build and run it.
● VFSDataStore with WebDAV backend
● VFSDataStore with SFTP backend
FAQ
“You would only solve part of the problem. Writes will still block, no offloading of traffic, etc.”
● Right. Uploading and downloading depend on the application as well, not only on the repository and backend.
● It is possible to handle reads and writes through separate I/O channels, directly from/to the backend.
○ The DataStore can provide a content identifier through JackrabbitValue.getContentIdentity(), which makes it possible to infer binary content paths.
○ https://issues.onehippo.com/browse/CMS-10204
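The FAQ answer relies on inferring where a binary lives in the backend from its content identity, so the application can serve or upload it over a separate channel. A minimal sketch, assuming the backend mirrors FileDataStore's layout (the first three pairs of hex characters as nested folders, then the full identifier); verify the layout against the VFSDataStore version you deploy. `ContentPathResolver` and the base URI are hypothetical. In a real application the identifier would come from `JackrabbitValue.getContentIdentity()`, which can return null when no identifier is available.

```java
/** Hypothetical helper: infer a backend location from a DataStore content identifier. */
public class ContentPathResolver {
    private final String baseUri;

    public ContentPathResolver(String baseUri) {
        // Normalize so exactly one '/' separates the base URI and the relative path.
        this.baseUri = baseUri.endsWith("/") ? baseUri.substring(0, baseUri.length() - 1) : baseUri;
    }

    /** Assumed layout: three nested two-character folders, then the full identifier. */
    public static String relativePath(String contentIdentity) {
        if (contentIdentity == null || contentIdentity.length() < 6) {
            throw new IllegalArgumentException("Unexpected content identity: " + contentIdentity);
        }
        return contentIdentity.substring(0, 2) + "/"
                + contentIdentity.substring(2, 4) + "/"
                + contentIdentity.substring(4, 6) + "/"
                + contentIdentity;
    }

    public String uriFor(String contentIdentity) {
        return baseUri + "/" + relativePath(contentIdentity);
    }

    public static void main(String[] args) {
        ContentPathResolver resolver = new ContentPathResolver("https://dav.example.com/datastore/");
        // Example identifier; in practice it would come from JackrabbitValue.getContentIdentity().
        String id = "aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d";
        System.out.println(resolver.uriFor(id));
        // https://dav.example.com/datastore/aa/f4/c6/aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
    }
}
```

With such a URI, a front-end or download servlet could stream the binary directly from the WebDAV or SFTP server instead of pulling it through the repository, which is the kind of traffic offloading the FAQ refers to.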
Summary
● VFSDataStore can be the most feasible, cost-effective and secure option in most projects that host a huge amount of binaries, with various backends such as WebDAV, SFTP and HDFS.
● No custom UI plugins are needed. Just use the default Hippo CMS gallery/asset folder tree and picker UI, since storage is handled by the Jackrabbit DataStore backend.