What is the past future tense of data?
ted-dunning
Transcript of What is the past future tense of data?
1©MapR Technologies - Confidential
The Shape of Data to Come
it isn’t what we thought it was
Do you remember the future?
Some things turned out as expected
Guys wearing Fedoras
What about “Big Data”?
Harvard University will have 200 x 10^6 volumes by 2040
Fremont Rider, 1944
To cope … only short papers should be published. … not more than 2500 characters counting “space,” punctuation marks, etc.
Gray and Ruston in IEEE Transactions on Electronic Computers, 1964
Remember the guy in the Fedora?
He’s tweeting about this right now
So what is the big data monorail and what is the cool hat?
Data curation
Rigid schemas
Engineered structure
MONORAIL
Data as-you-find-it
Flexible schemas
Late binding
Cool Hats
MONORAIL
Cool Hats
Why is it different?
How does it work?
The Conventional Answer
More data is being produced more quickly
Data sizes are bigger than even a very large computer can hold
Cost to create and store continues to decrease
BUSTED!
Analytics Scaling Laws
Analytics scaling is all about the 80-20 rule
– Big gains for little initial effort
– Rapidly diminishing returns
The key to net value is how costs scale
– Old school: exponential scaling
– Big data: linear scaling, low constant
Cost/performance has changed radically
– IF you can use many commodity boxes
Which bytes first?
Net value optimum has a sharp peak well before maximum effort
But scaling laws are changing both slope and shape
More than just a little
They are changing a LOT!
Initially, linear cost scaling actually makes things worse
A tipping point is reached and things change radically …
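The scaling-law argument on these slides can be sketched numerically. Everything in the block below is an illustrative assumption rather than data from the talk: the value curve (diminishing returns), the exponential "old school" cost curve, the linear "big data" cost curve with a fixed overhead, and all constants and function names were chosen only so the shapes resemble what the slides describe.

```python
import math

# All three curves are illustrative assumptions, not figures from the talk.

def value(effort):
    """Diminishing returns: roughly 80% of the value at 20% of the effort."""
    return 1.0 - math.exp(-8.0 * effort)

def cost_old_school(effort):
    """Assumed exponential cost growth with no fixed overhead."""
    return 0.02 * (math.exp(6.0 * effort) - 1.0)

def cost_big_data(effort):
    """Assumed linear cost: a fixed overhead plus a small per-unit cost."""
    return 0.05 + 0.05 * effort

def net_value(effort, cost):
    """Net value is the value gained minus the cost paid at this effort."""
    return value(effort) - cost(effort)

def best_effort(cost, steps=1000):
    """Grid search over effort in [0, 1] for the net-value peak."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda e: net_value(e, cost))

old_peak = best_effort(cost_old_school)
new_peak = best_effort(cost_big_data)

for name, cost, peak in [
    ("old school", cost_old_school, old_peak),
    ("big data", cost_big_data, new_peak),
]:
    print(f"{name}: peak at effort {peak:.2f}, net value {net_value(peak, cost):.2f}")
```

With these assumed curves, the linear model is worse at very small effort because of its fixed overhead, but its net-value peak sits at a noticeably higher effort and a higher net value than the exponential model's peak.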
Evolution of Data Storage
[Chart: axes Functionality/Compatibility and Scalability; labeled: Linux, POSIX]
Over decades of progress, Unix-based systems have set the standard for compatibility and functionality
[Chart: axes Functionality/Compatibility and Scalability; labeled: Linux, POSIX, Hadoop]
Hadoop achieves much higher scalability by trading away essentially all of this compatibility
[Chart: axes Functionality/Compatibility and Scalability; labeled: Linux, POSIX, Hadoop]
MapR enhances Apache Hadoop by restoring the compatibility while increasing scalability and performance
Introducing MapR
MapR offers the technology-leading distribution for Hadoop
Industry Leaders Choose MapR in the Cloud
Google chose MapR to provide Hadoop on Google Compute Engine
Amazon EMR is the largest Hadoop provider by revenue and number of clusters
MapR Supports Broad Set of Use Cases
Log analysis
HBase
Customer targeting
Social media analysis
Customer Revenue Analytics
ETL Offload
Advertising exchange analysis and optimization
Clickstream Analysis
Quality profiling/field failure analysis
Customer Sentiment
Network Analytics
Monitors and measures behavior of online shoppers
Fraud Detection
Channel analytics
Customer Behavior Analysis
Brand Monitoring
Customer targeting
Viewer Behavioral analytics
Recommendation Engine
Family tree connections
Intrusion detection & prevention
Forensic analysis
Global threat analytics
Virus analysis
Patient care monitoring
Leading Retailer: Recommendation Engine
Leading Bank: Fraud detection and Prevention
MapR
The guys with the cool hats
MapR’s Innovations
Seamless integration with existing applications
100% POSIX compliant
Industry standard APIs - NFS, ODBC, LDAP, REST
More 3rd party solutions
Proprietary connectors unnecessary
Language neutral
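As a sketch of what "seamless integration" through NFS could look like in practice: if the cluster is mounted as an ordinary file system, an existing program can read it with plain file calls and no Hadoop-specific client. The function name `count_log_lines` and the mount path in the comment are hypothetical, chosen only for illustration; nothing here is a MapR API.

```python
from pathlib import Path

def count_log_lines(root):
    """Count lines across all *.log files under root using ordinary file I/O.

    The function knows nothing about Hadoop; it only assumes that `root`
    is a directory reachable through the normal file system interface.
    """
    total = 0
    for path in sorted(Path(root).rglob("*.log")):
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            total += sum(1 for _ in handle)
    return total

# Hypothetical usage, assuming the cluster is NFS-mounted at this path:
# print(count_log_lines("/mnt/cluster/logs"))
```

The same code runs unchanged against a local directory, which is the point of going through a standard interface instead of a proprietary connector.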
MapR: Lights Out Data Center Ready
Reliable Compute
– Automated stateful failover
– Automated re-replication
– Self-healing from HW and SW failures
– Load balancing
– Rolling upgrades
– No lost jobs or data
– 99999’s of uptime
Dependable Storage
– End-to-end checksums
– Strong consistency
– Business continuity with snapshots and mirrors
– Recover to a point in time with snapshots
– Mirror across sites for disaster recovery
Why MapR Is Faster
• Lockless Storage Service™: eliminates storage contention
• Direct Block Device IO: provides throughput at device speed
• Hadoop Direct Shuffle: exploits MapR-FS architecture to deliver performance
• Client Side Compression: reduces network overhead using automatic compression
• C vs Java: eliminates sporadic Java garbage collection overhead (system written in C)
Security
MapR is pushing the envelope on Hadoop security
Integrates with Linux security (PAM)
– Works with any user directory: Active Directory, LDAP, NIS, …
Strong wire-level authentication and encryption
– Kerberos and non-Kerberos options
Fine-grained access control
– Full POSIX permissions on files and directories
– ACLs on tables, column families, columns, cells
– ACLs on MapReduce jobs and queues
– Administration ACLs on cluster and volumes
Bullet-proof NoSQL with Zero Administration
Reliability, Performance, Easy Administration
Benefit: Features
High Performance: Over 1 Million ops/sec with 10 Node Cluster
Continuous Low Latency: No I/O Storms, No Compactions
24x7 Applications: Instant Recovery, Online Schema Modification, Snapshots, Mirroring
Zero Administration: No Processes to Manage, Automated Splits, Self-tuning
High Scalability: 1 Trillion Tables
Low TCO: Files and Tables on One Platform
MapR M7 vs. CDH – Mixed Load (50-50)
MapR
The guys with the cool solutions
MapR
The future of the future
Thank You