Cloudera Developer Training for Apache Hadoop
Cloudera's four-day developer training course delivers the key concepts and expertise necessary to create robust data processing applications using Apache Hadoop.
Through lecture and interactive, hands-on exercises, attendees will navigate the Hadoop ecosystem, learning topics such as:
MapReduce and the Hadoop Distributed File System (HDFS) and how to write MapReduce code
Best practices and considerations for Hadoop development, debugging techniques and implementation of workflows and common algorithms
How to leverage Hive, Pig, Sqoop, Flume, Oozie and other projects from the Apache Hadoop ecosystem
Optimal hardware configurations and network considerations for building out, maintaining and monitoring your Hadoop cluster
Advanced Hadoop API topics required for real-world data analysis
Upon completion of the course, attendees are able to attempt the Cloudera Certified Developer for Apache Hadoop (CCDH) exam. Certification is a great differentiator; it helps establish individuals as leaders in their field, providing customers with tangible evidence of their skills.
Audience
This course is intended for experienced developers who wish to write, maintain, and/or optimize Apache Hadoop jobs. A background in Java is preferred, but experience with another programming language such as PHP, Python or C# is sufficient.
TRAINING SHEET
Take your knowledge to the next level with Cloudera's Apache Hadoop Training and Certification
Cloudera has true expertise in their ranks, offering intimate insight and experience with the Apache Hadoop ecosystem.
Justin Hancock, Director
Developer Training for Apache Hadoop
Cloudera, Inc. 210 Portage Avenue, Palo Alto, CA 94306 USA | 1-888-789-1488 or 1-650-362-0488 | cloudera.com
© 2011 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.
Course Outline: Cloudera Developer Training for Apache Hadoop

The Motivation For Hadoop
  o Problems with traditional large-scale systems
  o Requirements for a new approach

Hadoop Basic Concepts
  o An Overview of Hadoop
  o The Hadoop Distributed File System
  o Hands-On Exercise
  o How MapReduce Works
  o Hands-On Exercise
  o Anatomy of a Hadoop Cluster
  o Other Hadoop Ecosystem Components

Writing a MapReduce Program
  o The MapReduce Flow
  o Examining a Sample MapReduce Program
  o Basic MapReduce API Concepts
  o The Driver Code
  o The Mapper
  o The Reducer
  o Hadoop's Streaming API
  o Using Eclipse for Rapid Development
  o Hands-On Exercise

Integrating Hadoop Into The Workflow
  o Relational Database Management Systems
  o Storage Systems
  o Creating workflows with Oozie
  o Importing Data from RDBMSs With Sqoop
  o Hands-On Exercise
  o Importing Real-Time Data with Flume
  o Accessing HDFS Using FuseDFS and Hoop

Delving Deeper Into The Hadoop API
  o Using Combiners
  o Using LocalJobRunner Mode for Faster Development
  o Reducing Intermediate Data with Combiners
  o The configure and close methods for MapReduce Setup and Teardown
  o Writing Partitioners for Better Load Balancing
  o Directly Accessing HDFS
  o Using The Distributed Cache
  o Hands-On Exercise

Using Hive and Pig
  o Hive Basics
  o Pig Basics
  o Hands-On Exercise

Common MapReduce Algorithms
  o Sorting and Searching
  o Indexing
  o Machine Learning with Mahout
  o Term Frequency - Inverse Document Frequency
  o Word Co-Occurrence
  o Hands-On Exercise

Practical Development Tips and Techniques
  o Testing with MRUnit
  o Debugging MapReduce Code
  o Using LocalJobRunner Mode for Easier Debugging
  o Eclipse development techniques
  o Retrieving Job Information with Counters
  o Logging
  o Splittable File Formats
  o Determining the Optimal Number of Reducers
  o Map-Only MapReduce Jobs
  o Implementing Multiple Mappers using ChainMapper
  o Hands-On Exercise

More Advanced MapReduce Programming
  o Custom Writables and WritableComparables
  o Saving Binary Data using SequenceFiles and Avro Files
  o Creating InputFormats and OutputFormats
  o Hands-On Exercise

Joining Data Sets in MapReduce Jobs
  o Map-Side Joins
  o The Secondary Sort
  o Reduce-Side Joins
  o Hands-On Exercise

Graph Manipulation in Hadoop
  o Introduction to graph techniques
  o Representing Graphs in Hadoop
  o Implementing a sample algorithm: Single Source Shortest Path

Creating Workflows with Oozie
  o The Motivation for Oozie
  o Oozie's Workflow Definition Format
  o Hands-On Exercise

Cloudera Certified Developer for Apache Hadoop (CCDH) exam
Cloudera Certified Developer for Apache Hadoop (CCDH)
Establish yourself as a trusted and valuable resource by completing the online certification exam for Apache Hadoop developers. The exam is demanding and is designed to test your fluency with concepts and terminology in the following areas:
Computing Environment: The current mix of computing resources and demands that motivates use of a technology like Apache Hadoop.
Hadoop Distributed File System: How files are stored and managed in HDFS; the infrastructure that supports HDFS.
MapReduce: The phases of execution and framework for running a MapReduce job. Expected properties of job runs based on number of mappers, number of reducers and distribution of data.
Hadoop API: The Java classes that make up the API for developers who wish to write Apache Hadoop MapReduce jobs.
Hadoop Platform: The basic purpose, design and operation of tools that augment the Apache Hadoop core to make a comprehensive platform, including Hadoop Streaming, fuse-dfs, Apache Hive, Apache Pig, Apache Flume, Apache Sqoop, Apache HBase, Apache Oozie and HUE.