Big Data Course Outline

  • 7/28/2019 Big Data Course Outline

    Big Data Concepts for Executives and Senior Management

    Objective

    Understand big data and how it can be applied to store, manage, process and analyze massive amounts of unstructured and poly-structured data

    Explore the technologies underpinning big data including Hadoop and NoSQL

    Determine how big data systems can complement traditional data warehousing and business intelligence solutions and processes

    Utilize big data to differentiate your business and provide better service to your customers

    Examine case studies of how big data is influencing society and businesses

    Topics

    Understanding Big Data concepts

    Developing the business case for a big data solution

    Maintaining a technology ecosystem

    Examining how big data is influencing society and businesses

    The Emerging Role of a Data Scientist

    Social Media, the Quest for Real-Time and the Future

    Hadoop Concepts for Executives, Business Leaders, IT Managers, Technical Staff, Developers & Administrators

    Objective

    Understand the Hadoop technology stack, including MapReduce, HDFS, Hive, Pig and HBase, with an initial introduction to Mahout and other common utilities

    What is Hadoop?

    The essential components of a Hadoop-based data management solution

    Pros and cons of implementing Hadoop

    How does Hadoop fit into our existing environment and architecture?

    The differences between various Hadoop distributions

    Examine case studies of how big data is influencing society and businesses

    Topics

    Why Hadoop?

    History & background

    Real-world use cases and case studies

    The Hadoop Platform

    Introduction to MapReduce and the Hadoop Distributed File System (HDFS)

    Data warehousing with Hive

    Parallel processing with Pig

    Data mining with Mahout

    Data storage with HBase

    Common utilities - Sqoop, Flume, Hue, Scribe, ZooKeeper, HCatalog

    Hadoop distributions - Apache Software Foundation, Cloudera, Hortonworks, MapR

    Determine hardware needs

    Monitor Hadoop clusters

    Recover from NameNode failure

    Handle DataNode failures

    Manage hardware upgrade processes including node removal, configuration changes, node installation and rebalancing clusters

    Manage log files

    Install, configure, deploy, verify and maintain Hadoop clusters including:

    MapReduce

    HDFS

    Pig

    Hive (and MySQL)

    HBase (and ZooKeeper)

    HCatalog

    Mahout

    Day 1

    Overview of Hadoop

    Cluster Hardware and Installation of HDFS and MapReduce

    Rack Topology

    Setting up a Multi-user Environment

    Using Schedulers

    Hadoop Security with Kerberos

    Logs and Log Rotation

    Monitor, Maintain and Troubleshoot

    HDFS and MapReduce

    NameNode Failure and Recovery

    JobTracker Restarting

    Day 2

    Hardware Upgrade Process

    Rebalancing

    Data Management

    Install, Configure, Deploy and Verify Pig

    Install, Configure, Deploy and Verify Hive

    Install, Configure, Deploy and Verify MySQL

    Install, Configure, Deploy and Verify HBase and ZooKeeper

    Install, Configure, Deploy and Verify Other Hadoop Ecosystem Components (HCatalog, Mahout)