Big Data Course Outline
Uploaded by anujnet2002
7/28/2019 Big Data Course Outline
Big Data Concepts for Executives and Senior Management
Objective
Understand big data and how it can be applied to store, manage, process and analyze massive amounts of unstructured and poly-structured data
Explore the technologies underpinning big data including Hadoop and NoSQL
Determine how big data systems can complement traditional data warehousing and business intelligence solutions and processes
Utilize big data to differentiate your business and provide better service to your customers
Examine case studies of how big data is influencing society and businesses
Topics
Understanding Big Data concepts
Developing the business case for a big data solution
Maintaining a technology ecosystem
Examining how big data is influencing society and businesses
The Emerging Role of a Data Scientist
Social Media, the Quest for Real-Time and the Future
Hadoop Concepts for Executives, Business Leaders, IT Managers, Technical Staff, Developers & Administrators
Objective
Gain an understanding of the Hadoop technology stack, including MapReduce, HDFS, Hive, Pig and HBase, with an initial introduction to Mahout and other common utilities
What is Hadoop?
The essential components of a Hadoop-based data management solution
Pros and cons of implementing Hadoop
How does Hadoop fit into our existing environment and architecture?
The differences between various Hadoop distributions
Examine case studies of how big data is influencing society and businesses
Topics
Why Hadoop?
History & background
Real-world use cases and case studies
The Hadoop Platform
Introduction to MapReduce and the Hadoop Distributed File System (HDFS)
Data warehousing with Hive
Parallel processing with Pig
Data mining with Mahout
Data storage with HBase
Common utilities - Sqoop, Flume, Hue, Scribe, ZooKeeper, HCatalog
Hadoop distributions - Apache Software Foundation, Cloudera, Hortonworks, MapR
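
As a small illustration of the MapReduce topic listed above, the following is a minimal sketch of the map, shuffle and reduce steps for a word count, written in plain Python and run in a single process. It does not use Hadoop itself; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all values by key, standing in for the step a framework
    performs between the map and reduce phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Collapse the grouped values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence (illustrative only)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

On a real cluster the map and reduce functions would run in parallel across many nodes, with the input read from HDFS; the sketch only shows the shape of the computation.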
Determine hardware needs
Monitor Hadoop clusters
Recover from NameNode failure
Handle DataNode failures
Manage hardware upgrade processes including node removal, configuration changes, node installation and rebalancing clusters
Manage log files
Install, configure, deploy, verify and maintain Hadoop clusters, including:
MapReduce
HDFS
Pig
Hive (and MySQL)
HBase (and ZooKeeper)
HCatalog
Mahout
Day 1
Overview of Hadoop
Cluster Hardware and Installation of HDFS and MapReduce
Rack Topology
Setting up a Multi-user Environment
Using Schedulers
Hadoop Security with Kerberos
Logs and Log Rotation
Monitor, Maintain and Troubleshoot HDFS and MapReduce
NameNode Failure and Recovery
JobTracker Restarting
Day 2
Hardware Upgrade Process
Rebalancing
Data Management
Install, Configure, Deploy and Verify Pig
Install, Configure, Deploy and Verify Hive
Install, Configure, Deploy and Verify MySQL
Install, Configure, Deploy and Verify HBase and ZooKeeper
Install, Configure, Deploy and Verify Other Hadoop Ecosystem Components (HCatalog, Mahout)