Big Data and Hadoop Introduction

Post on 15-Apr-2017

BIG DATA AND HADOOP INTRODUCTION

NGUYEN PHAN DZUNG

SEPTEMBER 2015

AGENDA

- Objectives
- Contents:
  • Big data
  • Apache Hadoop
  • Examples using Hadoop
- Demo
- Q&A
- References

Security Classification: Internal

Objectives

• Big data overview.
• Apache Hadoop common architecture:
  – Read/write a file in the Hadoop File System
  – How Hadoop MapReduce tasks work
  – Differences between Hadoop 1 and Hadoop 2
• Develop a MapReduce job using Hadoop
• Apply Hadoop in the real world

Big data introduction

Big data – Information explosion

Big data – Definition

“Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation” - Gartner

Big data – The 3Vs

• Volume:
  – Google receives over 2 million search queries every minute
  – Transactional and sensor data are stored every fraction of a second
• Variety:
  – YouTube and Facebook generate video, audio, image and text data
  – Over 200 million emails are sent every minute
• Velocity:
  – Experiments at CERN generate colossal amounts of data.
  – Particles collide 600 million times per second.
  – The CERN data center processes about one petabyte of data every day.

Big data – Challenges

• Difficulty identifying the right data and determining how best to use it.
• Struggling to find the right talent.
• Data access and connectivity obstacles.
• The data technology landscape is evolving extremely fast.
• Finding new ways of collaborating across functions and businesses.
• Security concerns.

Big data – Landscape

Big data – Plays a part in a firm’s revenue

Apache Hadoop introduction

Apache Hadoop – What?

• It is a software platform that:
  – allows us to easily write and run data-related applications
  – facilitates processing and manipulating massive amounts of data
  – makes that processing conveniently scalable

Apache Hadoop – Brief history

Apache Hadoop – Characteristics

• Reliable shared storage (HDFS) and analysis system (MapReduce).
• Highly scalable.
• Cost effective, as it can work with commodity hardware.
• Highly flexible: can process both structured and unstructured data.
• Built-in fault tolerance.
• Write once and read multiple times.
• Optimized for large and very large data sets.

Apache Hadoop – Design principles

• Moving computation is cheaper than moving data
• Hardware will fail; manage it
• Hide execution details from the user
• Use streaming data access
• Use a simple file system coherency model
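
The first principle, moving computation to the data, can be illustrated with a toy scheduler. This is only a sketch under stated assumptions, not Hadoop's actual scheduling code: the function name `pick_node`, the node names, and the slot bookkeeping are all hypothetical, and a real cluster also weighs rack locality, which is ignored here.

```python
from typing import Dict, Optional, Set


def pick_node(block_replicas: Set[str], free_slots: Dict[str, int]) -> Optional[str]:
    """Choose where to run a task that reads one block.

    block_replicas: nodes holding a copy of the block (hypothetical names).
    free_slots: free task slots per node (hypothetical bookkeeping).
    """
    # Prefer a node that already stores the block, so no data crosses the network.
    local = [n for n in sorted(block_replicas) if free_slots.get(n, 0) > 0]
    if local:
        return local[0]
    # Otherwise fall back to any node with capacity; the block must then be copied.
    remote = [n for n in sorted(free_slots) if free_slots[n] > 0]
    return remote[0] if remote else None


if __name__ == "__main__":
    slots = {"node1": 1, "node2": 0, "node3": 2}
    print(pick_node({"node2", "node3"}, slots))  # a replica holder with a free slot
    print(pick_node({"node2"}, slots))           # no local slot, so a remote node
```

The task goes to a replica holder whenever one has capacity, and the data is shipped only as a fallback.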

Apache Hadoop – Core architecture (1)

Apache Hadoop – Core architecture (2)

Apache Hadoop – HDFS architecture

Apache Hadoop – HDFS architecture – Replication

Apache Hadoop – HDFS architecture – Secondary namenode

Apache Hadoop – HDFS – Read a file

Apache Hadoop – HDFS – Write a file (1)

Apache Hadoop – HDFS – Write a file (2)

How MapReduce pattern works
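
The original slide here was a diagram that did not survive in the transcript. As a stand-in, the pattern can be sketched in plain Python using the classic word-count example. This is a single-process illustration, not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this sketch, and a real Hadoop job would run the map and reduce steps as distributed tasks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (the framework's job in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    sample = ["big data and hadoop", "hadoop stores big files"]
    print(word_count(sample))
```

The developer writes only the map and reduce functions. Grouping by key between the two is what the framework provides, which is why the pattern parallelizes well.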

Apache Hadoop – Running jobs in Hadoop 1

Apache Hadoop – Running jobs in Hadoop 1 – How it works

Apache Hadoop – Running jobs in Hadoop 2

Apache Hadoop – Running jobs in Hadoop 2 – How it works

Apache Hadoop – Using

• When to use Hadoop. Hadoop can be used in various scenarios, including:
  – Analytics
  – Search
  – Data retention
  – Log file processing
  – Analysis of text, image, audio and video content
  – Recommendation systems, such as those on e-commerce websites
• When not to use Hadoop:
  – Low-latency or near real-time data access.
  – A large number of small files to be processed.
  – Multiple-writer scenarios, or scenarios requiring arbitrary writes or writes in the middle of files.
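
The small-files point can be made concrete with a rough estimate of NameNode memory. The sketch below relies on a commonly quoted rule of thumb (about 150 bytes of NameNode heap per file or block object) and a 128 MiB block size. Both figures are assumptions for illustration rather than values from these slides, and the function name `namenode_bytes` is hypothetical.

```python
BYTES_PER_OBJECT = 150          # assumption: rule-of-thumb heap cost per file or block object
BLOCK_SIZE = 128 * 1024 ** 2    # assumption: 128 MiB HDFS block size


def namenode_bytes(num_files: int, file_size: int) -> int:
    """Estimate NameNode heap used to track num_files files of file_size bytes each."""
    # Every file occupies at least one block, however small it is.
    blocks_per_file = max(1, -(-file_size // BLOCK_SIZE))  # integer ceiling division
    # One object for the file itself plus one per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    total = 1024 ** 4  # the same 1 TiB of data, stored two different ways
    small = namenode_bytes(total // 1024 ** 2, 1024 ** 2)   # as 1 MiB files
    large = namenode_bytes(total // BLOCK_SIZE, BLOCK_SIZE)  # as 128 MiB files
    print(small, large, small // large)
```

Under these assumptions the same volume of data costs the NameNode 128 times more memory when stored as 1 MiB files, which is why many small files are a poor fit for HDFS.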

Apache Hadoop – Ecosystem

Examples using Hadoop

Examples using Hadoop – A retail management system

Examples using Hadoop – SQL Server and Hadoop

Real world applications/solutions using Hadoop – MS HDInsight

Real world applications/solutions using Hadoop – Case studies

Demo

Q & A

References

- http://hadoop.apache.org
- Hadoop in Action – Chuck Lam
- Hadoop: The Definitive Guide – Tom White
- http://www.bigdatanews.com/
- http://stackoverflow.com
- http://codeproject.com
- Hadoop 2 Fundamentals – LiveLessons

Thank you for your attention!