Big Data Integration - SnapLogic

www.company.com

Defining Big Data

• Volume
– MB / GB / TB / PB
• Variety
– Table / Database / Photo, Web, Audio / Social, Video, Unstructured, Mobile
• Velocity
– Batch / Periodic / Near Real Time / Real Time

Unstructured Data Growing Rate

[Chart: Structured Data vs. Unstructured Data, 2004 to 2012; vertical axis 0 to 16 (units not given). Unstructured data is growing exponentially.]

Hadoop

• Hadoop is an open-source framework.
• HDFS – Hadoop Distributed File System
• MapReduce – Batch Processing
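To make the MapReduce batch model concrete, here is a minimal single-process sketch in Python of the classic word-count job. It only imitates the map, shuffle, and reduce phases in memory; it is not Hadoop code, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an in-memory batch of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    sample = ["big data big cluster", "data node"]
    print(word_count(sample))
```

On a real cluster the input lines would be blocks of files stored in HDFS, and the map and reduce steps would run in parallel on different nodes.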

Big Data Analytics

Hadoop vs. SnapLogic

– Hadoop usually runs on Linux; it is written in Java, and Linux is its primary deployment platform.
– SnapLogic offers cloud-based integration, so it can be used from any operating system.

Big Data Analytics - Hadoop
• Must be familiar with Linux commands
• Understanding the architecture of HDFS & MapReduce
• Configuration files
• Dependencies
• Managing the cluster and each node
• Understanding and managing Hadoop ecosystem components
• HiveQL, Pig Latin, Java, Python, Scala, etc.

Big Data Analytics - SnapLogic
• Basic understanding of HDFS & MapReduce
• Linux commands are not required – drag & drop
• Not much programming needed
• Configuration files are already set up
• No dependency issues
• No Hadoop ecosystem components
• Good compatibility with other tools such as Tableau, Redshift, and many others

Cloud-Based Integration

Example – Twitter Analysis

• Hadoop
• Hadoop Ecosystem Components
• Download Flume – extract the file
• Configure variables in ~/.bashrc – setting directories
• Set up the Twitter API: channel, host name, file format, batch size, write format, transaction capacity, etc.
• Start streaming the Twitter data into HDFS –
• Download Hive – extract the file
• Configure variables in ~/.bashrc…
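To give a sense of how much manual configuration the Flume step involves, the following Python sketch renders the kind of agent properties text that step requires (channel, HDFS path with host name, file format, batch size, write format, transaction capacity). Everything specific here is an assumption made for illustration: the agent and component names, the example host, the default values, and the helper render_flume_config itself. The property keys follow the Flume User Guide as commonly used and should be checked against the installed Flume version; the Twitter credentials are left as placeholders.

```python
def render_flume_config(agent, hdfs_path, batch_size=1000,
                        capacity=10000, transaction_capacity=100):
    """Return Flume properties text for a Twitter source, memory channel, HDFS sink."""
    # A memory channel cannot hand over more events per transaction than it holds.
    if transaction_capacity > capacity:
        raise ValueError("transaction_capacity must not exceed capacity")

    src, ch, sink = "Twitter", "MemChannel", "HDFS"  # hypothetical component names
    props = {
        f"{agent}.sources": src,
        f"{agent}.channels": ch,
        f"{agent}.sinks": sink,
        # Source: Twitter API credentials stay as placeholders.
        f"{agent}.sources.{src}.type": "org.apache.flume.source.twitter.TwitterSource",
        f"{agent}.sources.{src}.channels": ch,
        f"{agent}.sources.{src}.consumerKey": "<consumer key>",
        f"{agent}.sources.{src}.consumerSecret": "<consumer secret>",
        f"{agent}.sources.{src}.accessToken": "<access token>",
        f"{agent}.sources.{src}.accessTokenSecret": "<access token secret>",
        # Sink: where and how events are written into HDFS.
        f"{agent}.sinks.{sink}.type": "hdfs",
        f"{agent}.sinks.{sink}.channel": ch,
        f"{agent}.sinks.{sink}.hdfs.path": hdfs_path,
        f"{agent}.sinks.{sink}.hdfs.fileType": "DataStream",
        f"{agent}.sinks.{sink}.hdfs.writeFormat": "Text",
        f"{agent}.sinks.{sink}.hdfs.batchSize": batch_size,
        # Channel: in-memory buffer between source and sink.
        f"{agent}.channels.{ch}.type": "memory",
        f"{agent}.channels.{ch}.capacity": capacity,
        f"{agent}.channels.{ch}.transactionCapacity": transaction_capacity,
    }
    return "\n".join(f"{key} = {value}" for key, value in props.items())


if __name__ == "__main__":
    # namenode.example.com is a made-up host used only for this sketch.
    print(render_flume_config(
        "TwitterAgent", "hdfs://namenode.example.com:8020/user/flume/tweets"))
```

Each of these values has to be chosen and maintained by hand in the Hadoop approach, which is the overhead this slide is pointing at.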

Example – Twitter Analysis

SnapLogic

Hospital Evaluation Data Example

• Used Hadoop ecosystem components: Flume and Hive. Used a SerDe for quoted values. Dependencies such as JDK 1.7 were needed.
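The reason quoted values need special handling can be shown with a small Python sketch. It is an analogy only: the slide does not say which Hive SerDe was used, and the column names and sample row below are invented for illustration. The point is that a field containing a comma inside quotes breaks a naive split, while a quote-aware parser keeps it intact.

```python
import csv
import io

# Hypothetical sample: the hospital name contains a comma inside quotes.
RAW = (
    "Provider ID,Hospital Name,City\n"
    '10001,"Example Medical Center, East",Springfield\n'
)


def naive_parse(text):
    """Split on every comma, ignoring quotes (what a plain delimiter does)."""
    return [line.split(",") for line in text.splitlines()]


def quoted_parse(text):
    """Parse with a quote-aware CSV reader, keeping quoted commas in the field."""
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    print(len(naive_parse(RAW)[1]))   # field count with the naive split
    print(len(quoted_parse(RAW)[1]))  # field count with quote handling
    print(quoted_parse(RAW)[1][1])    # the hospital name as one field
```

With the naive split the data row has one field more than the header, so every later column shifts; a quote-aware reader plays the role that the SerDe plays in Hive.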

Hospital Evaluation Data Example

• With SnapLogic, this can be done easily.

However…

• Not an open source tool.

• Less variety compared to Apache projects.

Thank you

• Contact Information:

• Hyun Kim, Practice Head for Big Data
