Tackling Big Data with the Elephant in the Room
Uploaded by bti360
WHAT’S THE PROBLEM WITH BIG DATA?
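– Volume: more data than a single machine can store or process
– Variety: data arrives in many formats, both structured and unstructured
– Velocity: new data arrives quickly and continuously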
WHAT’S THE SOLUTION TO BIG DATA?
“In pioneer days they used oxen for heavy pulling, and when one ox couldn’t budge a log, they didn’t try to grow a larger ox. We shouldn’t be trying for bigger computers, but for more systems of computers.” – Grace Hopper
HADOOP’S SOLUTION
Hadoop Core Components:
– Hadoop Distributed File System (HDFS)
– MapReduce
Hadoop Ecosystem, built on top of the core:
– Sqoop, Pig, Hive, HBase, Mahout, Flume, Oozie, …
WHAT IS HDFS?
HOW DOES HDFS WORK?
A large data file is split into blocks (here, Block #1 and Block #2).
Each block is then copied to three different nodes, so the cluster holds three replicas of Block #1 and three replicas of Block #2.
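The split-and-replicate step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not HDFS code. The functions `split_into_blocks` and `place_replicas` are hypothetical. Blocks are numbered from 0, so block 0 corresponds to Block #1. The round-robin placement is a stand-in for the NameNode's real rack-aware policy. The 128-byte block size is a toy value in place of HDFS's much larger blocks.

```python
from typing import Dict, List

def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = 3) -> Dict[int, List[str]]:
    """Give each block `replication` copies, each on a different node.

    Round-robin is a simplification (hypothetical policy): it guarantees that
    no node holds two copies of the same block, but real HDFS placement is
    decided by the NameNode.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }

# Toy sizes: 128-byte blocks stand in for HDFS's much larger blocks.
data = b"x" * 250
blocks = split_into_blocks(data, block_size=128)   # 128 + 122 bytes
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
print(placement)
# prints {0: ['node1', 'node2', 'node3'], 1: ['node2', 'node3', 'node4']}
```

Because every block lives on three nodes, losing any single node still leaves at least two copies of each block.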
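WHAT IS MAP-REDUCE?
Core Ideas:
– Data Locality: send the computation to the nodes that already store the data, instead of moving the data
– Parallelism: many nodes process their own blocks at the same time
– Block Independence: each block can be processed without reading any other block
Three Stages:
1. Map
2. Swap & Sort (usually called shuffle and sort)
3. Reduce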
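The three core ideas can be made concrete with a toy task scheduler. This is a minimal sketch and not Hadoop's actual scheduler. The function `assign_map_tasks`, its least-loaded greedy policy, and the node names are all hypothetical. The block locations reuse the three-replica layout from the HDFS slide.

```python
from typing import Dict, List

def assign_map_tasks(block_locations: Dict[str, List[str]]) -> Dict[str, str]:
    """Run each block's map task on a node that already stores that block.

    Hypothetical greedy policy: among the nodes holding a replica, pick the
    one with the fewest tasks so far. Work spreads across nodes (parallelism),
    no block crosses the network (data locality), and each block is scheduled
    on its own (block independence).
    """
    load: Dict[str, int] = {}
    assignment: Dict[str, str] = {}
    for block in sorted(block_locations):
        replicas = block_locations[block]
        # Ties on load are broken by node name so the result is deterministic.
        node = min(replicas, key=lambda n: (load.get(n, 0), n))
        assignment[block] = node
        load[node] = load.get(node, 0) + 1
    return assignment

block_locations = {
    "Block #1": ["node1", "node2", "node3"],
    "Block #2": ["node2", "node3", "node4"],
}
assignment = assign_map_tasks(block_locations)
print(assignment)  # prints {'Block #1': 'node1', 'Block #2': 'node2'}
```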
WORD COUNT MAP
Input lines, stored on two nodes:
Node 1: the cat sat on the mat / The aardvark sat on the …
Node 2: The mahout drove the …
A Mapper runs on each node and calls map() once per line, emitting the pair (word, 1) for every word it reads:
Node 1, map() on line 1: the 1, cat 1, sat 1, on 1, the 1, mat 1
Node 1, map() on line 2: the 1, aardvark 1, sat 1, on 1, the 1
Node 2, map() on line 1: the 1, mahout 1, drove 1, the 1
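A word-count map() can be sketched as a generator that yields one (word, 1) pair per word. The following is a minimal sketch and not Hadoop's Mapper API. The names `map_line`, `node_inputs`, and `map_output` are hypothetical. Lower-casing is an assumption, chosen so that "The" and "the" count as the same word, as on the slide. The trailing "…" is left out of the sample lines.

```python
from typing import Dict, Iterator, List, Tuple

def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Word-count map(): emit (word, 1) for every word in one input line.

    Lower-casing (an assumption) makes "The" and "the" the same key.
    """
    for word in line.lower().split():
        yield word, 1

# Hypothetical layout: each node's Mapper sees only the lines stored locally.
node_inputs: Dict[str, List[str]] = {
    "Node 1": ["the cat sat on the mat", "The aardvark sat on the"],
    "Node 2": ["The mahout drove the"],
}

# One map() call per line; pairs stay on the node that produced them.
map_output = {
    node: [pair for line in lines for pair in map_line(line)]
    for node, lines in node_inputs.items()
}

for node, pairs in map_output.items():
    print(node, pairs)
```

The Mapper never counts anything itself. It only tags each word with a 1, which keeps every map() call independent of every other line.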
WORD COUNT SWAP & SORT
Each node sorts its map output by word, so that all values for the same word sit together:
Node 1: aardvark 1; cat 1; mat 1; on 1,1; sat 1,1; the 1,1,1,1
Node 2: drove 1; mahout 1; the 1,1
The sorted pairs are then swapped across the network. Every value for a given word ends up on the same reducer node (Node 3, Node 4, or Node 5):
aardvark 1
cat 1
mat 1
mahout 1
sat 1,1
drove 1
on 1,1
the 1,1,1,1,1,1
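The swap and sort stage can be sketched as three steps. First, each node sorts its output locally. Second, a partitioner picks a reducer for each word. Third, the values for each word are merged across nodes. This is a minimal in-memory sketch. `local_sort`, `partition`, and `swap_and_sort` are hypothetical names. The byte-sum partitioner is a stand-in: Hadoop's default partitioner hashes the key instead. Because of this, the word-to-reducer split printed here need not match the slide's.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]

def local_sort(pairs: List[Pair]) -> List[Tuple[str, List[int]]]:
    """Sort one node's map output by word and collect that word's values."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in sorted(pairs):
        grouped[word].append(count)
    return sorted(grouped.items())

def partition(word: str, num_reducers: int) -> int:
    """Hypothetical partitioner: a deterministic byte-sum checksum of the word.

    Python's built-in hash() is avoided because it is randomized per process
    for strings, which would send a word to different reducers on each run.
    """
    return sum(word.encode("utf-8")) % num_reducers

def swap_and_sort(node_outputs: Dict[str, List[Pair]],
                  num_reducers: int) -> List[List[Tuple[str, List[int]]]]:
    """Send every word's values to one reducer and merge them across nodes."""
    reducers: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in node_outputs.values():
        for word, values in local_sort(pairs):
            reducers[partition(word, num_reducers)][word].extend(values)
    # Each reducer receives its own keys in sorted order.
    return [sorted(groups.items()) for groups in reducers]

# Map output from the word count example (trailing "…" omitted).
node_outputs = {
    "Node 1": [(w, 1) for w in "the cat sat on the mat the aardvark sat on the".split()],
    "Node 2": [(w, 1) for w in "the mahout drove the".split()],
}
for i, groups in enumerate(swap_and_sort(node_outputs, num_reducers=3)):
    print(f"Reducer {i}:", groups)
```

Routing by key is what guarantees that one reducer sees every 1 emitted for "the", whichever node produced it.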
WORD COUNT REDUCER
Three Reducers (Reducer 0, Reducer 1, and Reducer 2) run on Node 3, Node 4, and Node 5. Each one sums the list of 1s it received for every word:
aardvark 1
cat 1
mat 1
mahout 1
sat 2
drove 1
on 2
the 6
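The reduce step is a sum over each word's list of 1s. This is a minimal sketch, not Hadoop's Reducer API. `reduce_word` and `run_reducer` are hypothetical names. For brevity, the input combines all three reducers' keys into one sorted list.

```python
from typing import Iterable, List, Tuple

def reduce_word(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Word-count reduce(): add up the 1s collected for one word."""
    return word, sum(counts)

def run_reducer(groups: List[Tuple[str, List[int]]]) -> List[Tuple[str, int]]:
    """Call reduce() once per word, in the sorted order the reducer receives."""
    return [reduce_word(word, counts) for word, counts in groups]

# Grouped values after swap & sort (all reducers' keys shown together).
grouped = [
    ("aardvark", [1]), ("cat", [1]), ("drove", [1]), ("mahout", [1]),
    ("mat", [1]), ("on", [1, 1]), ("sat", [1, 1]), ("the", [1] * 6),
]
print(run_reducer(grouped))
# prints [('aardvark', 1), ('cat', 1), ('drove', 1), ('mahout', 1), ('mat', 1), ('on', 2), ('sat', 2), ('the', 6)]
```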
TAKE-AWAYS
Hadoop Core Components:
– Hadoop Distributed File System (HDFS)
– MapReduce
Hadoop Ecosystem, built on top of the core:
– Sqoop, Pig, Hive, HBase, Mahout, Flume, Oozie, …
QUESTIONS?