Cloudera Developer Kit (CDK)

Headline Goes HereSpeaker Name or Subhead Goes Here

Cloudera Developer Kit:Hadoop Application Development Made Easier

E. Sammer | Engineering ManagerBig Data Gurus - 2013/09/16 - @esammer

“[I]t’s not enough to just build a scalable and stable system; the system also has to be easy enough for thousands of internal developers of all types and all skill levels to use.”

http://gigaom.com/data/how-disney-built-a-big-data-platform-on-a-startup-budget/

Hadoop is incredibly powerful

Hadoop is incredibly flexible

Hadoop is incredibly low-level

Hadoop is incredibly complex

A typical system (zoom 100:1)

What you actually care about

Getting data from A to BUsing it later

Infrastructure details

Serialization, file formats, and compressionMetadata capture and maintenanceDataset organization and partitioningDurability and delivery guaranteesWell-defined failure semanticsPerformance and health instrumentation

Cloudera Development Kit

Make Hadoop accessible to the enterprise developerCodify expert patterns and practicesMake the “right thing” easy and obviousAddress the most common cases

Let developers focus on business logical, not infrastructure

Cloudera Development Kit

An open source set of libraries, guides, and examples for building data-oriented systems and applicationsProvides higher level APIs atop existing components of CDHSupports piecemeal adoption via loosely coupled modules

CDK Data Module

High level APIs for interacting with datasets in HDFSConfiguration-based format and schema managementConsistent data model and serialization semanticsMetadata system integration and supportAutomatic dataset partitioning and file management

DatasetRepository repo = new FileSystemDatasetRepository.Builder() .fileSystem(FileSystem.get(new Configuration())) .directory(new Path(“/data”)) .get();

Dataset events = repo.create(“events”, new DatasetDescriptor.Builder() .schema(new File(“event.avsc”)) .partitionStrategy( new PartitionStrategy.Builder().hash(“userId”, 53).get() ).get());

DatasetWriter<GenericRecord> writer = events.getWriter();writer.open();writer.write( new GenericRecordBuilder(schema) .set(“userId”, 1) .set(“timeStamp”, System.currentTimeMillis()) .build());writer.close();

/data /events /.metadata /schema.avsc /descriptor.properties /userId=0 /10000000.avro /10000001.avro /userId=1 /20000000.avro /userId=2 /30000000.avro

CDK Morphlines Module

Pluggable, configuration-driven data transform libraryBorn out of Cloudera Search, but general purposeConfigure record transform stages in a container libraryUse the library in Flume, MapReduce jobs, Storm, and other Java applications

Other Modules

Maven pluginPackage, deploy, and execute “apps”Execute dataset operations

ExamplesPOJO, generic, and generated entity ingestDataset administrative operationsCrunch and MR integration...

Future

HBaseExtending data APIs to support random accessSame automatic serialization, schema management, etc.

Higher-order data managementCommon tasksThink background compaction, conversion, etc.

Integration with existing middleware frameworksGive us all your good ideas (and code)!

Getting started

CDK code repo: github.com/cloudera/cdkCDK example repo: github.com/cloudera/cdk-examplesBinary artifacts available from Cloudera’s Maven repositoryCommunity forums: community.cloudera.comMailing list: groups.google.com/a/cloudera.org/d/forum/cdk-devJIRA: issues.cloudera.org/browse/CDK

Questions?

I also wrote a book.We’re going to give a few copies away.

Cloudera Developer Kit (CDK)

Technology

Transcript of Cloudera Developer Kit (CDK)

CDK Machinery Catalogue - CDK Stone - Natural Stone ...

CDK 4.1.x Powered By Apache Kafka®ImportantNotice ©2010-2019Cloudera,Inc.Allrightsreserved. Cloudera,theClouderalogo,andanyotherproductor ...

Fraud Detection Using Deep Learning - CyberCamp · Cloudera Developer Training for Apache Spark Cloudera Developer Training for Apache Hadoop

RichFaces CDK Developer Guide - JBoss...RichFaces CDK Developer Guide iv 10.12. ..... 56 Chapter 1. Introduction 1 Introduction The major benefit of the JSF framework

~AWS CDK IaC~

Cloudera Developer Training for Apache Hadoop: Hands · PDF fileHadoop!cluster,!theonlykeydifference!(apart!fromspeed,!of!course!)!being!that!the! blockreplicationfactorissetto1,sincethereisonlyasingleDataNodeavailable.

CONTINUOUS DRYING KILNS - CDK - Automation & Elec · CONTINUOUS DRYING KILNS - CDK . ... CDK control is via a PC/PLC kiln ... • All CDK kiln options are fully automated in terms

Cloudera CCA175 CCA Spark and Hadoop Developer Exam

Cloudera CCD-410 Cloudera Certified Developer for Apache ...

Building the Enterprise Data Lake with Cloudera & Cisco · cloudera introduces the enterprise data hub and cloudera enterprise 5 2015 cloudera includes kafka, kudu and record service

CDK 4.0.x Powered By Apache Kafka® · ImportantNotice ©2010-2020Cloudera,Inc.Allrightsreserved. Cloudera,theClouderalogo,andanyotherproductor ...

Cloudera User Group Chicago - Cloudera Manager: APIs & Extensibility

Cdk 134 masalah_anak

Cdk 067 Kardiovaskuler

Cdk 118 Malaria

INSTALLATION: SAS UNIVERSITY EDITION · CLOUDERA HADOOP AND SPARK CERTIFICATION (AVAILABLE) x CCA 131 : Hadoop Administrator x CCA -175 Cloudera® (Hadoop and Spark Developer) x CCP:DE575

Cdk 131 Malaria

Cdk 085 Hepatitis

Apache Atlas Reference - Cloudera · Cloudera, Cloudera Altus, HUE, Impala, Cloudera Impala, and other Cloudera marks are registered or unregistered trademarks in the United States

HueGuide - Cloudera