Big data presentation
Uploaded by chinh-vo-wili
Transcript of Big data presentation
How much data?
• 7 billion people
• Google processes 100 PB/day; 3 million servers
• Facebook has 300 PB + 500 TB/day; 35% of world’s photos
• YouTube: 1000 PB video storage; 4 billion views/day
• Twitter processes 124 billion tweets/year
• SMS messages: 6.1T per year
• US cell calls: 2.2T minutes per year
• US credit cards: 1.4B cards; 20B transactions/year
Contents
1. Big Data Overview
2. Big Data Technology Today
3. SQL vs NoSQL
4. Big Data Security
5. Big data trends
6. Demo with MongoDB & Ref docs
1. Big Data Overview (cont.)
“Big data is not a single technology but a combination of old and new technologies that helps companies gain actionable insight.” (Big Data For Dummies, John Wiley & Sons, Inc.)
Structured Data (…)
Computer- or machine-generated: Machine-generated data generally refers to data that is created by a machine without human intervention (sensor data, Web log data, point-of-sale data, financial data…).
Human-generated: This is data that humans, in interaction with computers, supply (input data, click-stream data, gaming-related data…).
Unstructured Data (…)
Unstructured data is everywhere.
Machine-generated unstructured data: satellite images, scientific data, photographs and video, radar or sonar data…
Human-generated unstructured data: text internal to your company, social media data, mobile data…
Managing different data types
Integrating data types into a big data environment requires:
Connectors: these enable you to pull data in from various big data sources.
Metadata: the definitions, mappings, and other characteristics used to describe how to find, access, and use a company’s data (and software) components.
Analysis & Processing
Analysis
• Querying
• Statistics
• Modeling
• Data mining
• Text analytics
Processing
• Data storage
• Data transfer
• Data monitoring
What will we do with Big Data?
2. Big Data Technology Today (cont.)
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
2. Big Data Technology Today (cont.)
Instead of treating memory as a cache, why not treat it as a primary data store? Facebook keeps 80% of its data in memory (Stanford research).
RAM is 100-1000x faster than disk (random seek):
• Disk: 5-10 ms
• RAM: x0.001 msec
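To make the latency gap concrete, here is a back-of-the-envelope sketch. The workload of one million random reads is an assumption chosen for illustration. The disk latencies and the 100-1000x speedup range are the figures quoted on the slide.

```python
def total_seconds(reads, latency_ms):
    """Total time for `reads` sequentially issued random reads."""
    return reads * latency_ms / 1000.0

READS = 1_000_000  # hypothetical workload size, not from the slides

# Disk random seek: 5-10 ms per read (slide figure).
disk_best = total_seconds(READS, 5)
disk_worst = total_seconds(READS, 10)

# RAM: 100-1000x faster than disk (slide figure).
ram_best = disk_best / 1000
ram_worst = disk_worst / 100

print(f"disk: {disk_best:.0f}-{disk_worst:.0f} s")
print(f"RAM:  {ram_best:.0f}-{ram_worst:.0f} s")
```

Under these assumptions the same workload drops from hours on disk to seconds or minutes in memory.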
[Diagram labels: Events, Facebook, Memory Grid, Data Grid (x3)]
2. Big Data Technology Today (cont.)
Open-source software framework from Apache. Hadoop is modeled on Google MapReduce and GFS (Google File System): HDFS corresponds to GFS, and Hadoop Map/Reduce corresponds to Google MapReduce.
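The Map/Reduce programming model can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases, not Hadoop's API. The function names and the word-count task are assumptions chosen for the example.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big insight", "data grid"])
print(counts)
```

In a real cluster the map and reduce calls run on different machines and the shuffle moves data over the network. The sketch only shows the shape of the computation.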
3. SQL vs NoSQL (…)
A relational database is a set of tables containing data fitted into predefined categories.
Each table contains one or more data categories in columns.
Each row contains a unique instance of data for the categories defined by the columns.
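A minimal sketch of the table, column, and row ideas, using Python's built-in sqlite3 module with an in-memory database. The customer table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Columns define the predefined categories every row must fit into.
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT NOT NULL)"
)

# Each row is one unique instance of data for those columns.
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "An", "Hanoi"), (2, "Binh", "Atlanta")],
)

rows = conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY id", ("Atlanta",)
).fetchall()
print(rows)

# The primary key enforces uniqueness: a second row with id 1 is rejected.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Chi', 'Hue')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```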
3. SQL vs NoSQL (…)
Key-value stores. As the name implies, a key-value store is a system that stores values indexed for retrieval by keys.
Some of the market leaders:
Riak, Amazon Dynamo, Voldemort
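The key-value idea can be illustrated with a toy in-memory store. This is a sketch of the access pattern only. The class and method names are assumptions and do not reflect the API of Riak, Dynamo, or Voldemort.

```python
class KeyValueStore:
    """Toy store: values are opaque and are retrieved only by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)

store = KeyValueStore()
store.put("user:42", b'{"name": "An"}')  # the store does not inspect the value
value = store.get("user:42")
store.delete("user:42")
missing = store.get("user:42")
```

The store never looks inside the value, so lookups are by key only. Querying by a field within the value is the application's job.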
3. SQL vs NoSQL (…)
Column-oriented databases. Column-oriented databases contain one extendable column of closely related data.
Some of the market leaders:
HBase, Cassandra
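One way to picture an extendable column of related data is a column family: each row key maps to its own set of columns, and new columns can be added per row without changing a schema. The sketch below is plain Python under that assumption. It is not the HBase or Cassandra API, and all names are hypothetical.

```python
from collections import defaultdict

class ColumnFamily:
    """row key -> {column name -> value}; rows may hold different columns."""

    def __init__(self, name):
        self.name = name
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get_row(self, row_key):
        # Return a copy so callers cannot mutate the stored row.
        return dict(self._rows.get(row_key, {}))

    def get_column(self, column):
        """Read one column across all rows that define it."""
        return {rk: cols[column] for rk, cols in self._rows.items() if column in cols}

profile = ColumnFamily("profile")
profile.put("user1", "name", "An")
profile.put("user1", "email", "an@example.com")
profile.put("user2", "name", "Binh")  # user2 has no email column

names = profile.get_column("name")
emails = profile.get_column("email")
```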
3. SQL vs NoSQL (…)
Document-based stores. These databases store and organize data as collections of documents, rather than as structured tables with uniform-sized fields for each record.
Some of the market leaders:
MongoDB, CouchDB, SimpleDB
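The contrast with uniform tables can be sketched with plain Python dictionaries standing in for documents. The `find` helper and the sample documents are hypothetical. The sketch mimics the idea of querying a collection by field and is not the MongoDB driver API.

```python
def find(collection, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]

# Documents in one collection need not share the same fields.
posts = [
    {"_id": 1, "title": "Big data overview", "tags": ["hadoop", "nosql"]},
    {"_id": 2, "title": "GridFS demo", "author": {"name": "demo-user"}, "views": 10},
]

by_title = find(posts, title="GridFS demo")
by_views = find(posts, views=10)
no_match = find(posts, views=99)
```

The first document has no `views` field at all, and the query simply skips it instead of failing.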
3. SQL vs NoSQL (…)
GridFS stores files in two collections:
• chunks stores the binary chunks (see “The chunks Collection” in the MongoDB manual).
• files stores the file’s metadata (see “The files Collection” in the MongoDB manual).
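The two-collection layout can be sketched without a database: one metadata document per file, plus one document per binary chunk that points back to it. This is a simplified illustration, not the driver's implementation. The field names (`files_id`, `n`, `data`, `length`, `chunkSize`, `uploadDate`) follow the GridFS layout as I understand it and should be checked against the MongoDB manual. The tiny chunk size is chosen only to make the example readable.

```python
from datetime import datetime, timezone

def split_into_gridfs_docs(file_id, filename, data, chunk_size=5):
    """Build one 'files' document and the matching 'chunks' documents."""
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
    }
    chunk_docs = [
        {"files_id": file_id, "n": n, "data": data[start:start + chunk_size]}
        for n, start in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunk_docs

def reassemble(files_doc, chunk_docs):
    """Concatenate the file's chunks in order of their sequence number n."""
    own = [c for c in chunk_docs if c["files_id"] == files_doc["_id"]]
    data = b"".join(c["data"] for c in sorted(own, key=lambda c: c["n"]))
    if len(data) != files_doc["length"]:
        raise ValueError("chunks do not add up to the recorded file length")
    return data

files_doc, chunk_docs = split_into_gridfs_docs(1, "hello.txt", b"hello gridfs")
restored = reassemble(files_doc, list(reversed(chunk_docs)))
```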
4. Big Data Security
• Secure computations in distributed programming frameworks
• Security best practices for non-relational data stores
• Secure data storage and transactions logs
• Cryptographically enforced access control and secure communication
• Granular access control
• Real-time security/compliance monitoring
4. Big Data Security (…)
Technical recommendations for security:
• Use Kerberos for node authentication
• Use file layer encryption
• Data anonymization
• Use key management
• Deployment validation
• Use secure communication
• Tokenization
• Cloud database controls
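Two of the recommendations above, tokenization and data anonymization, can be sketched with the Python standard library. This is an illustration under simplifying assumptions: the vault lives in memory, the key is hard-coded, and the class and function names are hypothetical. A real deployment would keep the mapping and the key in protected storage under proper key management.

```python
import hashlib
import hmac
import secrets

class TokenVault:
    """Tokenization: swap a sensitive value for a random token, reversible via the vault."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value):
        if value not in self._token_by_value:
            token = secrets.token_hex(8)  # random, carries no information about value
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token):
        return self._value_by_token[token]

def pseudonymize(value, key):
    """Anonymization via keyed hash: stable for joins, not reversible without a lookup."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

vault = TokenVault()
card_token = vault.tokenize("4111-1111-1111-1111")  # sample test-style number
same_token = vault.tokenize("4111-1111-1111-1111")

key = b"demo-key-not-for-production"
alias_a = pseudonymize("an@example.com", key)
alias_b = pseudonymize("an@example.com", key)
```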
5. Big data trends
• Big data: of the people, by the people, for the people
• Big data and social computing
• Cloud computing
• In-memory computing
• Mobile applications and HTML5
• Internet and big data
6. Demo with MongoDB & Ref docs
Ref docs:
Judith Hurwitz, Alan Nugent, Dr. Fern Halper, and Marcia Kaufman: Big Data For Dummies. John Wiley & Sons, Inc. 2013.
“Technology Trends for 2013” prepared by Kaushal Amin, Chief Technology Officer, KMS Technology – Atlanta, GA, USA
Website: http://hadoop.apache.org/
Demo with MongoDB