HDFS and S3 plugins. Andrea Manzi, Martin Hellmich. 13/12/2013.
IT-SDC : Support for Distributed Computing
HDFS and S3 plugins
Andrea Manzi, Martin Hellmich
13/12/2013
DPM Workshop
Plugins functionalities
Frontends: NFS, HTTP/DAV, XROOT, GridFTP, RFIO
Namespace Management: Legacy DPM, MySQL, Oracle, HDFS, Memcache
Pool Management: Legacy DPM, MySQL, Oracle
Pool Driver: Legacy DPM, HDFS, S3
I/O: Legacy DPM, HDFS
HDFS plugin
A dmlite plugin implementing I/O, pool driver and namespace functionality on top of Apache Hadoop HDFS, which provides:
- Automatic data replication
- Fault tolerance for client reads (death of a Datanode or Namenode)
- Scalability
Deployment with Lcgdm-dav
DPM Head Node: Lcgdm-dav + dmlite HDFS plugin
HDFS Namenode
HDFS Datanode(s): Lcgdm-dav + dmlite HDFS plugin
Some details
- The HDFS C API (libhdfs) does not implement functions to retrieve the available (LIVE) datanodes
- A patch has been implemented and submitted to Hadoop
- hadoop-libhdfs rpm available from our repo
- A first version of the Puppet installation is available; it still has to be adapted to recent dav/dmlite module changes
On-going issues
Tested with the new dmlite-based GridFTP plugin
- Same deployment model as the HTTP/DAV frontend, or a single node writing to HDFS
- But HDFS does not support multiple write streams or random writes
- OSG developed in-memory stream reordering in GridFTP to work around this limitation (gridftp-hdfs DSI, also available in the Globus Toolkit)
- Integration still to be tested and understood
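The idea behind in-memory stream reordering can be illustrated with a minimal Python sketch. This is not the gridftp-hdfs code: the class name `ReorderingWriter`, its methods and the buffer limit are hypothetical, and an `io.BytesIO` object stands in for an append-only HDFS file. Blocks that arrive ahead of the current write position are held in memory and flushed as soon as the gap before them is filled, so the sink only ever sees sequential writes.

```python
import io


class ReorderingWriter:
    """Accept blocks at arbitrary offsets and pass them to an append-only sink in order.

    Hypothetical illustration only; assumes blocks never overlap.
    """

    def __init__(self, sink, max_buffered_bytes=64 * 1024 * 1024):
        self.sink = sink              # any object with write(bytes); stands in for an HDFS file
        self.next_offset = 0          # first byte not yet written to the sink
        self.pending = {}             # offset -> bytes, blocks that arrived early
        self.buffered_bytes = 0
        self.max_buffered_bytes = max_buffered_bytes

    def write_block(self, offset, data):
        if offset < self.next_offset or offset in self.pending:
            raise ValueError("block at offset %d was already received" % offset)
        # Early blocks must fit in the reorder buffer; in-order blocks pass straight through.
        if offset != self.next_offset and self.buffered_bytes + len(data) > self.max_buffered_bytes:
            raise MemoryError("reorder buffer limit exceeded")
        self.pending[offset] = data
        self.buffered_bytes += len(data)
        # Flush every block that is now contiguous with what the sink already holds.
        while self.next_offset in self.pending:
            chunk = self.pending.pop(self.next_offset)
            self.buffered_bytes -= len(chunk)
            self.sink.write(chunk)
            self.next_offset += len(chunk)

    def close(self):
        # Any block still pending means a gap was never filled.
        if self.pending:
            raise IOError("missing data at offset %d" % self.next_offset)


if __name__ == "__main__":
    sink = io.BytesIO()
    writer = ReorderingWriter(sink)
    writer.write_block(4, b"efgh")   # arrives early, held in memory
    writer.write_block(0, b"abcd")   # fills the gap, both blocks are flushed
    writer.close()
    print(sink.getvalue())
```

The memory cost grows with how far streams drift apart, which is why the sketch caps the buffer rather than letting it grow without bound.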
On-going issues
- The SRM frontend does not speak dmlite
- SRM calls go through the old dpm daemons, which do not properly handle new pool types (such as HDFS)
- A patch to the dpm daemon is to be implemented
Future steps
- Distribution: need to understand how to distribute the plugin
- HDFS client available only in Fedora 20 and Rawhide: https://apps.fedoraproject.org/packages/libhdfs
- Support for security-enabled HDFS clusters (Kerberos)
Performance
Tests through Lcgdm-dav:
- HDFS namespace: stat rate (stat/s) is half that of the MySQL namespace plugin; to be optimized with Memcached in front
- ROOT analysis with massive vector I/O and TTreeCache: comparable performance with standard disk pools
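Putting a cache in front of the namespace can be sketched as follows. This is only an illustration of the caching idea, not the dmlite Memcache plugin: `CachedNamespace`, `backend_stat` and the TTL value are hypothetical names, and an in-process dictionary stands in for a Memcached server. Repeated `stat` calls for the same path are served from the cache until the entry expires or is invalidated.

```python
import time


class CachedNamespace:
    """Serve repeated stat() calls from a TTL cache placed in front of a slower backend.

    Hypothetical illustration; a dict stands in for a Memcached server.
    """

    def __init__(self, backend_stat, ttl_seconds=60.0, clock=time.monotonic):
        self._backend_stat = backend_stat   # callable(path) -> stat result, e.g. an HDFS lookup
        self._ttl = ttl_seconds
        self._clock = clock                 # injectable so expiry can be tested
        self._entries = {}                  # path -> (expiry time, cached result)

    def stat(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]                 # cache hit, backend is not contacted
        result = self._backend_stat(path)   # cache miss or expired entry
        self._entries[path] = (now + self._ttl, result)
        return result

    def invalidate(self, path):
        # Call after any operation that changes the entry (write, rename, delete).
        self._entries.pop(path, None)
```

The trade-off is staleness: a longer TTL reduces backend load but lets clients see outdated metadata unless every modifying operation invalidates the entry.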
S3 plugin
Key Facts
Data directly to the cloud
HTTP/HTTPS only
DPM provides the namespace
Data in the Cloud
Diagram labels: GET, REDIRECT, GET, DATA
- No data through DPM
- Inherits all capabilities from the S3 provider
- Amazon: range header, no multi-range, multi-stream download only, no 3rd party copy, HTTP access only
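One way such a redirect can work is for the head node to answer the client's GET with a time-limited signed URL pointing at the S3 provider. The sketch below assumes Amazon's legacy query-string authentication (Signature Version 2), which was current at the time of the talk; newer S3 deployments generally expect Signature Version 4. The function names, the path-style URL and the 302 status code are assumptions made for illustration and are not taken from the plugin.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlencode


def presign_get_url(bucket, key, access_key_id, secret_access_key,
                    expires_at, host="s3.amazonaws.com"):
    """Build a time-limited GET URL using S3 query-string authentication (Signature V2).

    expires_at is an absolute Unix timestamp in seconds.
    """
    resource = "/" + bucket + "/" + quote(key, safe="/")
    # V2 string to sign: verb, Content-MD5, Content-Type, Expires, canonical resource.
    string_to_sign = "GET\n\n\n" + str(expires_at) + "\n" + resource
    digest = hmac.new(secret_access_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode("ascii")
    query = urlencode({
        "AWSAccessKeyId": access_key_id,
        "Expires": str(expires_at),
        "Signature": signature,   # urlencode percent-encodes '+', '/' and '='
    })
    return "https://" + host + resource + "?" + query


def redirect_response(url):
    """Return the status code and headers of a redirect (302 is an assumption)."""
    return 302, {"Location": url}


if __name__ == "__main__":
    # Placeholder credentials, not real ones.
    url = presign_get_url("mybucket", "dir/file.root", "<ID>", "<SK>", 1386936000)
    print(redirect_response(url))
```

Because the secret key stays on the head node and the URL expires, the client can fetch the data directly from the provider without ever holding S3 credentials.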
How to install an S3 pool
yum install dmlite-plugins-s3
dmlite-shell
> pooladd poolaws s3
> poolmodify poolaws bucketsalt xFVlsrg
> poolmodify poolaws s3accesskeyid <ID>
> poolmodify poolaws s3secretaccesskey <SK>
<create an s3 bucket on your storage>
More info
HDFS plugin: https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite/Plugins/HDFS
S3 plugin: https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite/Plugins/S3
Thanks!
Questions?