Big Data Processing Utilizing Open-source Technologies - May 2015
Big-Data Processing Utilizing Open-Source Technologies
32 Slides
Amir Sedighi
Rayanesh Dadegan Data Solutions Ltd.
May 2015
Amir Sedighi - May 2015 2
References
● http://www.slideshare.net/BernardMarr/140228-big-data-slide-share?qid=017848e2-9e2a-4dc3-963c-52b6a90fba2a&v=default&b=&from_search=1
● http://www.forbes.com/fdc/welcome_mjx.shtml
● ZYMR Spark Your Real-Time Big Data Analytics
● http://dataconomy.com
● https://datakulfi.wordpress.com/2013/03/27/big-data-open-source-technology-landscape/
● http://www.slideshare.net/andrefaria/big-data-abc?qid=1ac97e4a-4acc-460a-b3f8-9122f7210440&v=qf1&b=&from_search=12
● https://wiki.apache.org/hadoop/PoweredBy
● Making Sense Of Streaming Processing by Martin Kleppmann
Data Explosion
● Big Data means that everything we do increasingly leaves a digital trace, which we (or others) can gather, use, and analyze.
– Data Providers
● Businesses
● People
Volume, Velocity, Variety
● “There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days, and the pace is increasing.” (Eric Schmidt)
Big-Data Processing
How to set up a Big-Data processing platform using commodity machines?
Vertical or Horizontal?
Scale Up vs Scale Out
Big-Data Processing Open-Source Technology Stack
Map-Reduce
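The Map-Reduce model named on this slide can be illustrated with a minimal single-process sketch. It is not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration, and word counting is assumed as the example job. A real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big open data"]))
```

Because each map call sees only one document and each reduce call sees only one key, both steps can be spread over commodity machines without shared state, which is what makes the model fit a scale-out setup.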
Hadoop Framework
Apache Hadoop Main Projects
SQL on Hadoop
● Apache Hive
● Apache Drill (Dremel)
● Cloudera Impala
● Facebook Presto
● Apache Kylin
More Map-Reduce (YARN)
● Apache Spark
● Apache Flink (Stratosphere)
● Apache Hama
● Apache Tez (DAG, Complex Data Processing)
Service Programming
● Apache Thrift
● Apache ZooKeeper
● Apache Avro
● Kryo (Esoteric Software)
Data Stores
● Data Stores
– Key-Value
– Graph
– Columnar
– Document Store
– In-Memory
Data Transfer
● Apache Flume
● Apache Sqoop
Search
● Elasticsearch
● Apache Solr
Log Management
● ELK (Elasticsearch, Logstash, Kibana)
● Logstash
● Fluentd
Machine Learning
● Apache Mahout
● MLlib (Apache Spark)
● GraphX (Apache Spark)
Messaging and Queuing
● Apache Kafka
● ZeroMQ
Stream Processing
● Apache Storm
● Apache Samza
● Apache Spark
Data Processing
● Transient Query
– Issued once, then forgotten
● Persistent Data
– Stored until deleted by user or apps
Stream Processing
● Transient Data
– Deleted as the window slides forward
● Persistent Queries
– Generate up-to-date answers as time goes on
● Windows
– Time Based
– Count Based
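The two window types named on this slide can be sketched as follows. This is a minimal illustration, not the API of Storm, Samza, or Spark: the class names CountWindow and TimeWindow are hypothetical, a running average is assumed as the persistent query, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class CountWindow:
    """Count-based window: keeps only the most recent `size` events."""

    def __init__(self, size):
        # deque with maxlen discards the oldest event automatically.
        self.events = deque(maxlen=size)

    def add(self, value):
        """Add one event and return the up-to-date average over the window."""
        self.events.append(value)
        return sum(self.events) / len(self.events)


class TimeWindow:
    """Time-based window: keeps events newer than `span` time units."""

    def __init__(self, span):
        self.span = span
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        """Add one event, drop expired ones, and return the current average."""
        self.events.append((timestamp, value))
        # Transient data: delete events that slid out of the window.
        while self.events[0][0] <= timestamp - self.span:
            self.events.popleft()
        return sum(v for _, v in self.events) / len(self.events)


if __name__ == "__main__":
    by_count = CountWindow(size=2)
    for reading in (10, 20, 30):
        print("count-based average:", by_count.add(reading))

    by_time = TimeWindow(span=5)
    for ts, reading in ((0, 10), (3, 20), (6, 30)):
        print("time-based average:", by_time.add(ts, reading))
```

In both classes the query stays registered and produces a fresh answer on every event, while the data itself is discarded once it leaves the window. This is the reverse of the batch case, where data persists and each query runs once.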
● http://recommender.ir
● http://helio.ir
Thank You!
Find these slides here:
http://www.slideshare.net/AmirSedighi
LinkedIn:
http://www.linkedin.com/in/amirsedighi
Blog:
http://hexican.com
Email:
Twitter:
@amirsedighi