NFS and ODBC

Using Standard File-Based Applications and SQL-Based Tools with Hadoop (© MapR Technologies - Confidential)

Using Standard File-Based Applications and SQL-Based Tools with Hadoop.

Transcript of NFS and ODBC

  • 1. Using Standard File-Based Applications and SQL-Based Tools with Hadoop

  • 2. http://info.mapr.com/Japan-HUG-8-2012
      - Tomer Shiran [email protected]
      - Director of Product Management, MapR Technologies

  • 3. The MapR Distribution for Apache Hadoop
      - The open, enterprise-grade distribution for Apache Hadoop
      - Open source components: Hive, Pig, Cascading, HBase, ZooKeeper, Oozie, Flume, Sqoop, Whirr, ...
      - Enhancements to make Hadoop more open and enterprise-grade
      - Fastest growing distribution
      - Thousands of clusters deployed
      - Now available as a service with Amazon Elastic MapReduce (EMR): http://aws.amazon.com/elasticmapreduce/mapr

  • 4. Recent News
      - Amazon selects MapR to provide the enterprise-grade Hadoop distribution in EMR
      - Google selects MapR to provide Hadoop on Google Compute Engine
      - MapR launched open source Apache Drill project inspired by Google Dremel
          - Low latency queries

  • 5. MapR
      - Make Hadoop more open
      - Make Hadoop enterprise-grade
      - (diagram label: This presentation)

  • 6. Not All Applications Use the Hadoop APIs
      - Applications and libraries that use files and/or SQL: 30 years, 100,000s applications, 10,000s libraries, 10s programming languages
      - Applications and libraries that use the Hadoop APIs

  • 7. Hadoop Needs Industry-Standard Interfaces
      - Hadoop API: MapReduce and HBase applications; mostly custom-built
      - NFS: file-based applications; supported by most operating systems
      - ODBC: SQL-based tools; supported by most BI applications and query builders

  • 8. NFS

  • 9. Your Data is Your Data
      - HDFS-based Hadoop distributions do not (cannot) support NFS
      - Your data is your data: make sure you can access it
      - Why store your data in a system which cannot be accessed by 95% of the world's applications and libraries?

  • 10. The NFS Protocol (RFC 1813)
      - Very simple protocol
      - Random reads/writes
          - Read "count" bytes from offset "offset" of file "file"
          - Write buffer "data" to offset "offset" of file "file"
      - HDFS does not support random writes, so it cannot support NFS

        WRITE3res NFSPROC3_WRITE(WRITE3args) = 7;
        struct WRITE3args {
            nfs_fh3    file;
            offset3    offset;
            count3     count;
            stable_how stable;
            opaque     data<>;
        };

        READ3res NFSPROC3_READ(READ3args) = 6;
        struct READ3args {
            nfs_fh3 file;
            offset3 offset;
            count3  count;
        };

  • 11. Hadoop Was Designed to Support Multiple Storage Layers
      - MapReduce runs on top of the o.a.h.fs.FileSystem interface
      - Implementations of the interface:
          - S3: o.a.h.fs.s3native.NativeS3FileSystem
          - HDFS: o.a.h.hdfs.DistributedFileSystem
          - Local File System: o.a.h.fs.LocalFileSystem
          - FTP: o.a.h.fs.ftp.FTPFileSystem
          - MapR storage layer: com.mapr.fs.MapRFileSystem
      - Other diagram labels: Hadoop FileSystem API, NFS interface

  • 12. One NFS Gateway

  • 13. Multiple NFS Gateways

  • 14. Multiple NFS Gateways with Load Balancing

  • 15. Multiple NFS Gateways with NFS HA (VIPs)

  • 16. Customer Examples: Import/Export Data
      - Network security vendor
          - Network packet captures from switches are streamed into the cluster
          - New pattern definitions are loaded into online IPS via NFS
      - Online measurement company
          - Clickstreams from application servers are streamed into the cluster
      - SaaS company
          - Exporting a database to Hadoop over NFS
      - Ad exchange
          - Bids and transactions are streamed into the cluster

  • 17. Customer Examples: Productivity and Operations
      - Retailer
          - Operational scripts are easier with NFS than DFS + MapReduce: chmod/chown, file system searches/greps, make, tab-complete
          - Consolidate object store with analytics
      - Credit card company
          - User and project home directories on Linux gateways: local files, scripts, source code, ...
          - Administrators manage quotas, snapshots/backups, ...
      - Large Internet company
          - Web servers serve MapReduce results (item relationships) directly from cluster
      - Email marketing company
          - Object store with HBase and NFS

  • 18. (no text in transcript)

  • 19. MapR Roadmap: What?

  • 20. ODBC

  • 21. ODBC
      - ODBC: Open DataBase Connectivity
      - Open standard API for accessing a SQL-based backend
      - Developed by Microsoft and Simba Technologies in 1992
      - Flagship API for SQL-based BI and reporting: Excel, Tableau, MicroStrategy, Crystal Reports, ...
      - Advanced ODBC drivers use the latest 3.52 specification

  • 22. MapR ODBC Driver
      - MapR provides a Hive ODBC 3.52 driver
          - Developed in partnership with ODBC inventor Simba Technologies
          - Compliant with latest ODBC 3.52 specification
          - 32- and 64-bit platform support: Windows and Linux
          - Enables direct SQL access to MapR-stored data by translating SQL to HiveQL
      - SQLizer enables seamless connectivity
          - Provides ANSI SQL-92 front-end
          - Targeted for existing apps that generate standard SQL queries
          - Transforms SQL query into HiveQL query

  • 23. Example: Tableau

  • 24. Example: Tableau

  • 25. Example: Open source query builder (Kaimon)

  • 26. Example: Microsoft Excel

  • 27. Time for Questions
      - Download slides or send me an email: http://info.mapr.com/Japan-HUG-8-2012
      - Download MapR to learn more: www.mapr.com/download
      - Contact EMC Greenplum Japan: Yoshiaki Hirabayashi [email protected], Akihiko Kusanagi [email protected]
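
Example (not from the slides): slide 10 says NFS needs random reads and writes, that is, reading "count" bytes at "offset" and writing a buffer at "offset". A minimal Python sketch of what that looks like to an ordinary file-based application follows. The helper names read_at and write_at are hypothetical, and a temporary local file stands in for a file under an NFS mount point, which is an assumption made so the sketch runs anywhere.

```python
import os
import tempfile


def write_at(path: str, offset: int, data: bytes) -> int:
    """Write `data` at byte `offset`, analogous to WRITE3args(file, offset, count, data)."""
    with open(path, "r+b") as f:  # r+b: update in place, do not truncate
        f.seek(offset)
        return f.write(data)


def read_at(path: str, offset: int, count: int) -> bytes:
    """Read up to `count` bytes starting at `offset`, analogous to READ3args(file, offset, count)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(count)


def demo() -> bytes:
    # Assumption: a temp file stands in for a file on an NFS-mounted cluster path.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"0123456789")
        # Random write: overwrite three bytes in the middle of the file.
        write_at(path, 3, b"abc")
        # Random read: fetch the whole ten-byte range back.
        return read_at(path, 0, 10)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())  # b'012abc6789'
```

The in-place overwrite in write_at is the operation the slide says an append-only store cannot serve, which is why the storage layer behind the NFS gateway has to support random writes.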
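
Example (not from the slides): slides 21 and 22 describe ODBC as the standard API that lets existing SQL tools reach Hive tables. A hedged sketch of how a script might issue a standard SQL query through an ODBC data source is shown below. It assumes the third-party pyodbc package and an already configured data source; the DSN name "MapRHive", the user name, and the table name are all hypothetical, and nothing here is taken from the MapR driver's own documentation.

```python
def build_connection_string(dsn: str, user: str = "", password: str = "") -> str:
    """Assemble an ODBC connection string from the standard DSN/UID/PWD keywords."""
    parts = [f"DSN={dsn}"]
    if user:
        parts.append(f"UID={user}")
    if password:
        parts.append(f"PWD={password}")
    return ";".join(parts)


def run_query(conn_str: str, sql: str):
    """Run one SQL statement over ODBC and return all rows."""
    # pyodbc is a third-party package; it is imported lazily so the
    # connection-string helper above works without it installed.
    import pyodbc

    # autocommit=True is an assumption: drivers for non-transactional
    # backends commonly need it.
    conn = pyodbc.connect(conn_str, autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical DSN, user, and table; replace with a configured data source.
    conn_str = build_connection_string("MapRHive", user="alice")
    print(conn_str)
    # rows = run_query(conn_str, "SELECT COUNT(*) FROM web_logs")
```

The application only writes standard SQL; per slide 22, translating that SQL into HiveQL is the driver's job, so the script itself contains no Hive-specific code.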