Big Trends in Big Data
![Page 1: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/1.jpg)
Big Trends in Big Data
Naresh Chintalcheru
2013 AITP Region-5 Technical Conference
![Page 2: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/2.jpg)
● Batch to Real Time
● SQL, SQL, SQL …
● Cloud Platform Support
● Apache Hadoop 2.0
  ○ Improved Performance
  ○ Improved Scalability
  ○ Improved Security
● Applications
  ○ Pattern Discovery Analytics
  ○ Sophisticated Visualization
● BI & Data Warehouse
● Big Data Vision
Agenda - Big Data Trends
![Page 4: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/4.jpg)
The image of Big Data is changing from Batch to Real Time
Hadoop + MapReduce = Batch Processing
Batch to Real Time
![Page 5: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/5.jpg)
● Companies need real-time processing of Big Data for applications such as online fraud detection, CEP (Complex Event Processing) and more.
● Emerging frameworks, architectures and tools are making real-time processing a reality.
Batch to Real Time
![Page 6: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/6.jpg)
● Twitter’s Storm is an open-source, distributed, fault-tolerant, real-time computation system.
  ○ Storm is a stream processing system.
  ○ Unlike Hadoop jobs, Storm jobs never stop; they continue to process data as it arrives.
● Other real-time systems include StreamBase, HStreaming, Apache S4, Dempsy and Esper.
Big Data Real-Time Computing Systems
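The batch-versus-stream contrast can be sketched in plain Python. This is not Storm's API (Storm jobs are built from its own spout and bolt components); the hypothetical `batch_counts` and `stream_counts` helpers below only illustrate the model the slide describes: a batch job reads a complete, finite input and then answers once, while a stream job keeps running state and emits an updated result for every event as it arrives.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def batch_counts(events: Iterable[str]) -> Counter:
    """Batch style (hypothetical helper): consume the whole finite input, then answer once."""
    return Counter(events)


def stream_counts(events: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Stream style (hypothetical helper): update running state and emit a result
    per event. With an unbounded source this generator never finishes, just like
    a long-running stream job."""
    counts: Counter = Counter()
    for event in events:
        counts[event] += 1
        yield event, counts[event]


if __name__ == "__main__":
    clicks = ["login", "view", "view", "purchase", "view"]
    print(batch_counts(clicks))          # one answer after all input is read
    for event, running_total in stream_counts(clicks):
        print(event, running_total)      # an answer after every single event
```

Because `stream_counts` is a generator, it works the same whether the source is a list or an endless feed; the consumer simply pulls results as events arrive.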
![Page 8: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/8.jpg)
Big Data processing includes ...
● Writing complex Java MapReduce jobs
● Apache Pig Latin scripting
● Slow SQL processing with Apache Hive
Big Data SQL Tools
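To show what a hand-written MapReduce job computes, here is a single-process Python sketch of the map, shuffle and reduce phases of the classic word count. It is not Hadoop's API: a real Java job implements Mapper and Reducer classes and the framework performs the shuffle across the cluster. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical single-process stand-in for a Hadoop word-count job.


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group emitted values by key (Hadoop does this between phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Big Data big trends", "big data SQL"]))
```

In a SQL-based tool the same aggregation is essentially a single GROUP BY query, which is a large part of the appeal of SQL on Big Data.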
![Page 9: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/9.jpg)
Inspired by Google’s Dremel paper, many vendors now offer faster SQL-based tools:
● Google BigQuery
● Cloudera Impala
● IBM Big SQL
● Greenplum HAWQ
● Hortonworks Stinger (aims to make Hive SQL 100x faster)
● Apache Drill
Big Data SQL Tools
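One reason Dremel-style engines are fast is columnar storage: a query touching two columns reads only those columns rather than every full row. The sketch below is a deliberately simplified illustration for flat records; Dremel's real format also encodes nested data (with repetition and definition levels), which is omitted here. The records and column names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical row-oriented records, as a row store would keep them.
ROWS: List[dict] = [
    {"user": "a", "country": "US", "bytes": 120, "agent": "mobile"},
    {"user": "b", "country": "IN", "bytes": 300, "agent": "desktop"},
    {"user": "c", "country": "US", "bytes": 80, "agent": "desktop"},
]


def to_columns(rows: List[dict]) -> Dict[str, list]:
    """Pivot rows into one list per column (assumes every row has the same keys)."""
    columns: Dict[str, list] = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return dict(columns)


def sum_by(columns: Dict[str, list], key: str, measure: str) -> Dict[str, int]:
    """Like SELECT key, SUM(measure) ... GROUP BY key.
    Only the two referenced columns are scanned; the others are never read."""
    totals: Dict[str, int] = defaultdict(int)
    for k, v in zip(columns[key], columns[measure]):
        totals[k] += v
    return dict(totals)


if __name__ == "__main__":
    print(sum_by(to_columns(ROWS), "country", "bytes"))
```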
![Page 11: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/11.jpg)
Big Data needs many computing nodes for data storage and data processing, and that demand is elastic …
● Cloud VM-based computing is a natural fit for Big Data infrastructure.
● Public cloud megastar Amazon AWS supports Hadoop through Amazon Elastic MapReduce (EMR), which means you can spin up VMs with Hadoop installed and a basic configuration in about 10 minutes.
Big Data And Cloud
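Elasticity is easiest to see with a back-of-the-envelope sizing calculation. This sketch assumes HDFS's default block replication factor of 3, plus a hypothetical per-node disk capacity and a hypothetical fraction of disk kept free for temporary data; real sizing must also consider compression, intermediate data and compute needs.

```python
import math


def nodes_needed(raw_data_tb: float,
                 usable_tb_per_node: float,
                 replication: int = 3,
                 headroom: float = 0.25) -> int:
    """Estimate the number of storage nodes for a Hadoop cluster (rough sketch).

    raw_data_tb        -- logical data volume to store, in TB
    usable_tb_per_node -- disk per node available to HDFS (hypothetical figure)
    replication        -- HDFS block replication factor (HDFS default is 3)
    headroom           -- assumed fraction of disk reserved for temp/shuffle data
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    stored_tb = raw_data_tb * replication
    effective_per_node = usable_tb_per_node * (1 - headroom)
    return math.ceil(stored_tb / effective_per_node)


if __name__ == "__main__":
    # 100 TB of data, 24 TB usable disk per node:
    # 100 * 3 = 300 TB stored; 24 * 0.75 = 18 TB per node; 300 / 18 -> 17 nodes.
    print(nodes_needed(100, 24))
```

When the data volume doubles, the estimate roughly doubles too; in a cloud those extra nodes can be added as VMs on demand instead of being purchased up front.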
![Page 13: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/13.jpg)
New in Hadoop 2.x
● Improved Performance with YARN aka MapReduce 2.0
● Improved Scalability with HDFS Federation
● Support for Microsoft Windows
● Improved Security
● HDFS Snapshots
Hadoop 2.0
![Page 14: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/14.jpg)
Improved Performance with YARN aka MapReduce 2.0
● Previously, the MapReduce JobTracker handled both resource management and the job life cycle.
● YARN divides these two functions into separate components: a global ResourceManager and a per-application ApplicationMaster.
● Each ApplicationMaster negotiates with the global ResourceManager for the resources its job needs.
Hadoop 2.0 - Performance
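The YARN division of labor can be modelled as a toy, single-process sketch. It is not the YARN API (real ApplicationMasters talk to the ResourceManager over RPC, for example via the Java `AMRMClient`); the classes, the memory-only resource model and the numbers are assumptions for illustration. The point is that the ResourceManager only tracks capacity and grants containers, while each ApplicationMaster owns its own job and asks for what it needs.

```python
from dataclasses import dataclass, field
from typing import List

# Toy model of YARN roles; not the YARN API.


@dataclass
class Container:
    app_id: str
    memory_mb: int


@dataclass
class ResourceManager:
    """Global scheduler: tracks cluster capacity and hands out containers."""
    total_memory_mb: int
    allocated: List[Container] = field(default_factory=list)

    def free_memory_mb(self) -> int:
        return self.total_memory_mb - sum(c.memory_mb for c in self.allocated)

    def allocate(self, app_id: str, count: int, memory_mb: int) -> List[Container]:
        """Grant as many of the requested containers as currently fit."""
        granted: List[Container] = []
        for _ in range(count):
            if self.free_memory_mb() < memory_mb:
                break
            container = Container(app_id, memory_mb)
            self.allocated.append(container)
            granted.append(container)
        return granted


@dataclass
class ApplicationMaster:
    """Per-job manager: owns the job's life cycle and requests resources from the RM."""
    app_id: str
    tasks: int
    memory_per_task_mb: int
    running: List[Container] = field(default_factory=list)

    def negotiate(self, rm: ResourceManager) -> int:
        pending = self.tasks - len(self.running)
        self.running += rm.allocate(self.app_id, pending, self.memory_per_task_mb)
        return len(self.running)


if __name__ == "__main__":
    rm = ResourceManager(total_memory_mb=8192)
    job_a = ApplicationMaster("job-a", tasks=3, memory_per_task_mb=2048)
    job_b = ApplicationMaster("job-b", tasks=3, memory_per_task_mb=2048)
    print(job_a.negotiate(rm), job_b.negotiate(rm), rm.free_memory_mb())
```

Here job-b gets only one container at first; in this model it would call `negotiate` again once capacity frees up.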
![Page 15: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/15.jpg)
HDFS Federation
● No longer limited to a single NameNode (NN) and Secondary NameNode (SNN).
● HDFS Federation supports multiple independent NameNodes and namespaces.
● Each DataNode (DN) registers with all the NameNodes in the cluster. DNs send periodic heartbeats and block reports to all NNs and handle commands from all of them.
Hadoop 2.0 - Scalability
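Two ideas from this slide can be sketched together: independent NameNodes that each own part of the namespace, and DataNodes that report to every NameNode because they store blocks for every namespace (block pool). Clients in a federated cluster typically route paths through a client-side mount table (ViewFs); the longest-prefix lookup below imitates that idea, but the classes, mount points and names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical toy model of HDFS Federation.


@dataclass
class NameNode:
    name: str
    # DataNode id -> heartbeats received (a stand-in for liveness tracking)
    heartbeats: Dict[str, int] = field(default_factory=dict)

    def receive_heartbeat(self, datanode_id: str) -> None:
        self.heartbeats[datanode_id] = self.heartbeats.get(datanode_id, 0) + 1


@dataclass
class DataNode:
    node_id: str
    namenodes: List[NameNode]

    def send_heartbeats(self) -> None:
        # Each DataNode reports to *all* NameNodes in a federated cluster.
        for nn in self.namenodes:
            nn.receive_heartbeat(self.node_id)


def resolve(mount_table: Dict[str, NameNode], path: str) -> NameNode:
    """Pick the NameNode whose mount point is the longest prefix of `path`."""
    matches = [m for m in mount_table
               if path == m or path.startswith(m.rstrip("/") + "/")]
    if not matches:
        raise KeyError(f"no mount point for {path}")
    return mount_table[max(matches, key=len)]


if __name__ == "__main__":
    nn1, nn2 = NameNode("nn-user"), NameNode("nn-logs")
    mounts = {"/user": nn1, "/data/logs": nn2}
    for dn_id in ("dn1", "dn2"):
        DataNode(dn_id, [nn1, nn2]).send_heartbeats()
    print(resolve(mounts, "/data/logs/2013/01.log").name)
    print(nn1.heartbeats, nn2.heartbeats)
```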
![Page 16: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/16.jpg)
Improved Security
● Enforcement of HDFS file permissions by the NN, plus Access Control Lists (ACLs) for users and groups
● Block access tokens to control access to data blocks
● Job tokens to enforce task authorization
● Network encryption and Kerberos RPC: HDFS data transfer can now be configured to be encrypted
Hadoop 2.0 - Security
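Block access tokens act like signed capabilities: the NameNode issues a token naming the block and the permitted operation, signed with a secret key it shares with the DataNodes, and a DataNode checks the signature and expiry before serving the block. The sketch below illustrates that idea with Python's `hmac` module; the token layout, key handling and lifetime are simplified assumptions, not HDFS's actual token format.

```python
import hashlib
import hmac
import time
from typing import Optional

SHARED_KEY = b"demo-secret"  # hypothetical key shared by NameNode and DataNodes


def _sign(payload: str) -> str:
    return hmac.new(SHARED_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(user: str, block_id: int, mode: str, ttl_s: int = 600,
                now: Optional[float] = None) -> str:
    """NameNode side: sign 'user|block|mode|expiry' (assumes no '|' in fields)."""
    expiry = int((time.time() if now is None else now) + ttl_s)
    payload = f"{user}|{block_id}|{mode}|{expiry}"
    return f"{payload}|{_sign(payload)}"


def check_token(token: str, block_id: int, mode: str,
                now: Optional[float] = None) -> bool:
    """DataNode side: verify signature first, then block, mode and expiry."""
    try:
        user, blk, tok_mode, expiry, sig = token.split("|")
    except ValueError:
        return False
    if not hmac.compare_digest(sig, _sign(f"{user}|{blk}|{tok_mode}|{expiry}")):
        return False
    current = time.time() if now is None else now
    return int(blk) == block_id and tok_mode == mode and current < int(expiry)


if __name__ == "__main__":
    token = issue_token("alice", 42, "READ")
    print(check_token(token, 42, "READ"), check_token(token, 42, "WRITE"))
```

The DataNode never needs to ask the NameNode whether a client may read a block; possession of a valid, unexpired signature is the proof.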
![Page 17: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/17.jpg)
Improved Backup & Disaster Recovery
● HDFS Snapshots are read-only point-in-time copies of the file system.
● Snapshots can be taken on a subtree or on the entire file system.
● Useful for data backup, protection against user errors and disaster recovery
Hadoop 2.0 - HDFS Snapshots
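The key property on this slide, a read-only point-in-time copy that later writes do not change, can be illustrated with a toy in-memory namespace. This only models the semantics: real HDFS snapshots are taken with `hdfs dfs -createSnapshot` on a directory an administrator has made snapshottable with `hdfs dfsadmin -allowSnapshot`, are exposed under a `.snapshot` directory, and do not copy data blocks. The class and method names below are hypothetical.

```python
from types import MappingProxyType
from typing import Dict, Mapping


class ToyNamespace:
    """Hypothetical in-memory file tree mapping path -> file contents."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}
        self.snapshots: Dict[str, Mapping[str, str]] = {}

    def write(self, path: str, data: str) -> None:
        self.files[path] = data

    def delete(self, path: str) -> None:
        del self.files[path]

    def create_snapshot(self, root: str, name: str) -> Mapping[str, str]:
        """Capture the subtree under `root` ("/" = whole file system) as it is now."""
        prefix = root.rstrip("/") + "/"
        frozen = {p: d for p, d in self.files.items() if p.startswith(prefix)}
        # Contents are immutable strings, so a shallow copy suffices;
        # MappingProxyType makes the snapshot view itself read-only.
        self.snapshots[name] = MappingProxyType(frozen)
        return self.snapshots[name]


if __name__ == "__main__":
    ns = ToyNamespace()
    ns.write("/data/a.csv", "v1")
    snap = ns.create_snapshot("/data", "s0")
    ns.write("/data/a.csv", "v2")   # a later change (or a user error)...
    print(snap["/data/a.csv"])      # ...does not affect the snapshot
```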
![Page 19: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/19.jpg)
● The infrastructure layer of Big Data is largely solved (the secret: Hadoop)
● Future innovation is now focused on applications and analytics
Big Data Applications
![Page 20: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/20.jpg)
Analytic applications based on pattern discovery and sense-making:
● WibiData: Lessons learned and predictive apps
● Recorded Future: Web intelligence for business decisions
● Nutonian: Uncovers relationships hidden within complex data
● RStudio: Data analysis tool
Big Data Analytic Applications
![Page 21: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/21.jpg)
Sophisticated Big Data Visualization tools.
● IBM BigSheets
● D3.js
● Fathom
● Processing.org
Big Data - Visualization Applications
![Page 23: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/23.jpg)
BI vendors such as IBM Cognos, SAP BusinessObjects and Oracle Hyperion now support connecting directly to Hadoop data through Apache Hive connectors.
Big Data & Business Intelligence
![Page 24: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/24.jpg)
Multiple new unstructured data sources, such as clickstreams, social media, mobile, sensors and web logs, require massive processing, and scaling a traditional data warehouse to handle them is costly.
The big question: will the data warehouse survive Big Data? More on this in my next presentation :)
Big Data & Data Warehouse
![Page 26: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/26.jpg)
Big Data requires a Big Vision
Big Data Vision
![Page 27: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/27.jpg)
● Unlike Business Intelligence, Big Data is an innovation that originated on the IT side.
● Business departments, which should be the ones defining Big Data usage requirements, need constant coaching on the potential of Big Data intelligence and on success stories.
Big Data requires Big Vision
![Page 28: Big Trends in Big Data](https://reader033.fdocuments.us/reader033/viewer/2022042714/54bca4d54a7959323a8b457e/html5/thumbnails/28.jpg)
Feedback appreciated
Nash
Presentation PDF: www.slideshare.net/chintal75
Thank You