Data 360 Conference: Introduction to Big Data, Hadoop and Big Data Analytics
An introduction to Big Data
Uploaded by forwardsprint-d
What is Big Data?
• Volume: 100s of TBs, PB scale; too big for traditional transaction processing
• Velocity: distributed, parallel processing
• Variety: structured & unstructured content
• Veracity: trustworthiness, reliability
Drivers for Big Data Adoption
• Commodity Hardware Support
• Open Source Ecosystem
• Web Economy
• Reduced Storage Costs
Sources of Big Data
• Archives
• Documents
• Media
• Business Apps
• Data Storage
• Public Web
• Social Media
• Machine/Sensor Data
Usage Scenarios
What we do:
• Activities
• Conversations
• Social Media
• Photographs
• Videos
• Transactions
What big data does:
• Text Analysis
• Speech Analysis
• Sentiment Analysis
• Spending Analysis
• Geographical Analysis
Working with Big Data
• Data Source/Ingestion
• Data Storage
• Data Processing/Transformation
• Data Analysis & Output
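The four stages above can be sketched end to end on a toy scale. This is only an illustration under stated assumptions: the event format, the field names, and the function names (`ingest`, `store`, `transform`, `analyze`) are hypothetical and do not come from the slides, and an in-memory dictionary stands in for a distributed store.

```python
from collections import Counter

# Hypothetical raw events: date, user, action, amount in cents.
RAW_EVENTS = [
    "2024-01-01,alice,purchase,1999",
    "2024-01-01,bob,view,0",
    "2024-01-02,alice,purchase,500",
    "malformed line",
]

def ingest(lines):
    """Data source/ingestion: parse raw lines, skipping ones that do not parse."""
    for line in lines:
        parts = line.split(",")
        if len(parts) != 4:
            continue
        date, user, action, amount = parts
        yield {"date": date, "user": user, "action": action, "amount": int(amount)}

def store(records):
    """Data storage: partition records by date (stand-in for a distributed store)."""
    partitions = {}
    for rec in records:
        partitions.setdefault(rec["date"], []).append(rec)
    return partitions

def transform(partitions):
    """Data processing/transformation: keep only purchase events."""
    for recs in partitions.values():
        for rec in recs:
            if rec["action"] == "purchase":
                yield rec

def analyze(records):
    """Data analysis & output: total spend per user, in cents."""
    totals = Counter()
    for rec in records:
        totals[rec["user"]] += rec["amount"]
    return dict(totals)

partitions = store(ingest(RAW_EVENTS))
result = analyze(transform(partitions))
print(result)
```

In a real deployment each stage would typically be a separate system; the point here is only how data flows from one stage to the next.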
Hadoop
Combination of MapReduce engine and HDFS
Shifts responsibility for availability & distribution to the framework itself
Brings processing closer to the data
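To make the MapReduce model more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps using the classic word-count example. It does not use the Hadoop API; the function names and the `splits` input are assumptions chosen for illustration, with each string standing in for one block of a file stored in HDFS.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(splits):
    mapped = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Each string stands in for one data block; in Hadoop the map step
# would be scheduled on a node that already holds that block.
splits = ["big data is big", "hadoop processes big data"]
counts = word_count(splits)
print(counts)
```

In a cluster, the map calls run in parallel next to the data and only the much smaller intermediate pairs travel over the network, which is the sense in which processing is brought closer to the data.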
Hadoop Eco-System
• Apache Hadoop: MapReduce, HDFS
• HBase, Cassandra: Database
• Hive, Pig: Structured Queries
• Sqoop: RDBMS Connectivity
• Mahout: Machine Learning/Data Mining
Hive
• Started as a sub-project of Hadoop
• Now a top-level Apache project
• Provides a SQL-like abstraction layer over MapReduce
• Has its own HDFS table file format (and it’s fully schema-bound)
• Can also work over HBase
• Acts as a bridge to many BI products which expect tabular data
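As a rough illustration of a SQL-like layer over MapReduce, the sketch below shows how a grouped aggregate could be expressed as a map step and a reduce step over schema-bound rows. The table `sales`, its columns, and every function name are hypothetical, and this is not how Hive is implemented internally; it only mirrors the idea.

```python
from collections import defaultdict

# Query being mimicked (hypothetical table and columns):
#   SELECT category, SUM(amount) FROM sales GROUP BY category;

# Assumed schema: every row must supply these columns, in this order.
SCHEMA = [("category", str), ("amount", int)]

# Tab-delimited rows, standing in for a table file in HDFS.
ROWS = ["books\t12", "music\t7", "books\t30"]

def parse_row(line):
    """Apply the schema to one delimited line, casting each field."""
    fields = line.split("\t")
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}

def map_phase(row):
    """Map: emit the GROUP BY key together with the value to aggregate."""
    return row["category"], row["amount"]

def run_query(lines):
    groups = defaultdict(list)
    for line in lines:
        key, value = map_phase(parse_row(line))
        groups[key].append(value)  # shuffle: collect values per key
    # Reduce: apply SUM to each group.
    return {key: sum(values) for key, values in groups.items()}

totals = run_query(ROWS)
print(totals)
```

The result is tabular (one row per category), which is the shape that BI tools expecting rows and columns can consume.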
NoSQL
• Key-Value Stores: Redis, Amazon DynamoDB
• Document Stores: MongoDB
• Graph Databases: Neo4j
• Wide Column/Column Family: HBase
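The four NoSQL families differ mainly in the shape of the data they store. The sketch below models each shape with plain Python structures; it does not use any database client, and all keys, field names, and the helper `followers_of` are made up for illustration.

```python
# Key-value store: an opaque value looked up by a single key.
key_value = {"session:42": "alice"}

# Document store: nested, self-describing records addressed by an id.
documents = {
    "user:1": {"name": "Alice", "tags": ["admin", "ops"], "address": {"city": "Pune"}},
}

# Graph database: nodes plus typed relationships between them.
nodes = {"alice": {"kind": "person"}, "bob": {"kind": "person"}}
edges = [("alice", "FOLLOWS", "bob")]

# Wide column / column family: row key -> column family -> column -> value.
wide_column = {
    "row1": {"profile": {"name": "Alice"}, "metrics": {"logins": 3}},
}

def followers_of(person):
    """Graph-style query: who has a FOLLOWS edge pointing at this person?"""
    return [src for src, rel, dst in edges if rel == "FOLLOWS" and dst == person]

print(key_value["session:42"])                  # lookup by key
print(documents["user:1"]["address"]["city"])   # navigate into a document
print(followers_of("bob"))                      # traverse relationships
print(wide_column["row1"]["metrics"]["logins"]) # read one column of one family
```

Which family fits best usually depends on the dominant access pattern: simple lookups, whole-record reads, relationship traversal, or sparse rows with many columns.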
Additional Information
To learn more about big data & the eco-system, get in touch with us.
[email protected] www.forwardsprint.com
Thank you!