Open-Source Frameworks and Big-Data Processing

Post on 12-Jul-2015

Linux and Ubuntu 14.10 Release Conf 1

Big-Data Processing utilizing Open-Source Technology Stack

By

Amir Sedighi

http://www.linkedin.com/in/amirsedighi

@amirsedighi

Data Explosion

● Big Data is the idea that everything we do increasingly leaves a digital trace, which we (or others) can gather, use, and analyze.

– Data Providers

  ● Business companies

  ● People

Volume, Velocity, Variety

● “There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing.” (Eric Schmidt)

Big-Data Processing

How to provide a Big-Data processing platform using commodity machines?

Vertical or Horizontal?

Scale Up vs Scale Out

Big-Data Processing Open-Source Technology Stack

Map-Reduce
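
As a rough illustration of the Map-Reduce model named on this slide, the sketch below counts words in a single Python process. It is not Hadoop code and not taken from the talk. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions chosen for the example. In a real cluster the map and reduce steps would run in parallel on many machines, and the framework would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key (hypothetical helper)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over all documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Illustrative input only; each string stands in for one input split.
    print(word_count(["big data big stack", "open source stack"]))
```

Because each map call sees only one document and each reduce call sees only one key, both steps can be spread across commodity machines without sharing state, which is the property a scale-out platform relies on.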

Hadoop Framework

Apache Hadoop Main Projects

Data Stores

● Data Stores

– Key-Value

– Graph

– Columnar

– Document Store

– In-Memory

Data Transfer

● Apache Flume

● Apache Sqoop

Search

● Elasticsearch

● Apache Solr

Messaging and Queuing

● Apache Kafka

● ZeroMQ

Log Management

● ELK (Elasticsearch, Logstash, Kibana)

● Logstash

● Fluentd

Stream Processing

● Apache Storm

● Apache Samza

● Apache Spark

Machine Learning

● Apache Mahout

● MLlib

● GraphX

Questions?