DataStax
©2013 DataStax Confidential. Do not distribute without consent.
Extreme Data Velocity
Continuous Availability
Operational Simplicity
Michael Shaler
Senior Director, Business Development
What is Big Data’s payoff?
DataStax: CRN’s “10 Coolest Big Data Startups”
Cassandra: InfoWorld’s Technology of the Year
1,000+ production deployments and 300 customers
$84M in funding from industry-leading investors
BHAG
We are the first viable alternative to Oracle for modern online applications.
We seek to be the first and best choice in databases.
No, Seriously…
Real-world Use Cases
Internet of Things Database Requirements
• “UTC subject predicate”: Time series data and metadata are the lingua franca of sensor/device data communications
• FAST AND ALWAYS ON: High-velocity ingest from geographically dispersed inputs with variable schemas/data models is the norm, and unless you tell them to do so, sensors never, ever sleep…
• HOT AND COLD: Real-time data and analytics vs. data reservoir/data factory needs vary.
• DHTs: Wide-row column-oriented distributed hash tables are the optimal home for IoT operational datastores
• AND: Other key functionality needed includes indexed search, along with both batch and real-time analytics; data-in-flight and data-at-rest security is an emerging need
• SPOILER ALERT: DataStax Enterprise supports all of the above
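The wide-row time series idea in the list above can be sketched in a few lines of Python. This is only an in-memory illustration, not how Cassandra or DataStax Enterprise store data. Reading “UTC subject predicate” as a timestamped (subject, predicate, value) record is an interpretation, and the one-row-per-subject-per-day bucket, the function names, and the sample meter are all assumptions made for the example.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime, timezone

# Partition key -> readings kept sorted by timestamp (the "wide row").
rows = defaultdict(list)


def partition_key(subject, ts):
    """One wide row per subject per UTC day (the day bucket is an assumption)."""
    return (subject, ts.strftime("%Y-%m-%d"))


def write(ts, subject, predicate, value):
    """Insert a reading into its row, keeping the row ordered by time."""
    insort(rows[partition_key(subject, ts)], (ts, predicate, value))


def read_range(subject, start, end):
    """Readings with start <= ts < end, looked up in the start day's row only."""
    row = rows.get(partition_key(subject, start), [])
    return [reading for reading in row if start <= reading[0] < end]


def at(hour, minute):
    """Hypothetical helper: a UTC timestamp on an arbitrary sample day."""
    return datetime(2013, 6, 1, hour, minute, tzinfo=timezone.utc)


# Readings arrive out of order but are stored in time order.
write(at(0, 30), "meter-42", "kwh", 0.7)
write(at(0, 0), "meter-42", "kwh", 0.5)
write(at(0, 15), "meter-42", "kwh", 0.6)

first_half_hour = read_range("meter-42", at(0, 0), at(0, 30))
print([value for _, _, value in first_half_hour])  # [0.5, 0.6]
```

Bucketing by day keeps any single row bounded in size while a time-range read still touches one partition, which is the usual motivation for this kind of layout.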
Time Series Analytics: 70B readings
Smart Grid Proof of Concept: Analyze 2 years of Smart Meter data for 1M households
Improvements in demand forecasting could yield EBITDA > $100M per GW saved
• $5M CAPEX
• 10 man-months delivery (Deploy, DevOps, Tuning)
• Ongoing OPEX of > $1M
• $450K OPEX
• 2 DevOps running 15 AWS nodes
• Faster performance in 2 weeks
• …All in the cloud
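As a sanity check on the Smart Grid figures above, the 70B readings can be related to 1M households over 2 years. The slide does not state a reporting interval; the 15-minute interval below is an assumption inferred from the arithmetic, and leap days are ignored.

```python
HOUSEHOLDS = 1_000_000            # from the slide
DAYS = 2 * 365                    # two years, ignoring leap days
TOTAL_READINGS = 70_000_000_000   # "70B readings" from the slide

# Readings each meter would need to report per day to reach the total.
readings_per_meter_per_day = TOTAL_READINGS / (HOUSEHOLDS * DAYS)
implied_interval_minutes = 24 * 60 / readings_per_meter_per_day

# Assumption: one reading every 15 minutes, i.e. 96 readings per day.
assumed_total = HOUSEHOLDS * DAYS * 96

print(round(readings_per_meter_per_day, 1))  # 95.9
print(round(implied_interval_minutes, 1))    # 15.0
print(assumed_total)                         # 70080000000
```

So the headline number is consistent with roughly one reading per meter every 15 minutes.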
Major Changes: The Evolving Data Center
LOB App: Oracle
LOB App: MySQL
LOB App: SQL Server
“What’s Happening?” Hyper Velocity, Transactional: NoSQL
Data Warehouse: Teradata/Exadata
“What Happened?” Massive Volume
Bit Bucket: Hadoop
The Application World *HAS* Changed
Common Use Cases
• Big data OLTP and write intensive systems
• Time series data management
• High velocity device data consumption and analysis
• Healthcare systems input and analysis
• Media streaming (music, movies, etc.)
• Online Web retail (shopping carts, user transactions, etc.)
• Online gaming (real-time messaging, etc.)
• Real time data analytics
• Social media input and analysis
• Web click-stream analysis
• Buyer event and behavior analytics
• Fraud detection and analysis
• Risk analysis and management
• Supply chain analytics
• Web product searches
• Internal document search (law firms, etc.)
• Real estate/property searches
• Social media match ups
• Web & application log management / analysis
Continuous Availability Commentary
Data centers: London, Virginia, Santa Clara, Sydney
Ring nodes: A1, A2, A3, B1, B2, B3, C1, C2, C3, D1, D2, D3
Cassandra: Architecture as Foundation
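A rough way to picture the ring of nodes spread over four data centers is a token ring with replicas placed in every data center. The sketch below is a simplified illustration, not Cassandra's actual placement code. The mapping of node letters to cities, the use of SHA-256 as the hash, two replicas per data center, and all function names are assumptions made for the example.

```python
import hashlib
from bisect import bisect_right

# Assumption: which node letter belongs to which city is not stated in the slides.
DATA_CENTERS = {"A": "London", "B": "Virginia", "C": "Santa Clara", "D": "Sydney"}
NODES = {f"{prefix}{i}": dc for prefix, dc in DATA_CENTERS.items() for i in (1, 2, 3)}


def token(value):
    """Map a string to a ring position (SHA-256 is a stand-in hash)."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


# The ring: nodes ordered by their token.
RING = sorted((token(name), name) for name in NODES)
TOKENS = [t for t, _ in RING]


def replicas(key, per_dc=2):
    """Walk the ring clockwise from the key's token, taking up to per_dc nodes per data center."""
    start = bisect_right(TOKENS, token(key)) % len(RING)
    placed = {dc: [] for dc in DATA_CENTERS.values()}
    for step in range(len(RING)):
        node = RING[(start + step) % len(RING)][1]
        if len(placed[NODES[node]]) < per_dc:
            placed[NODES[node]].append(node)
    return placed


def surviving_replicas(key, down_dcs, per_dc=2):
    """Replicas still reachable when entire data centers are offline."""
    return [
        node
        for dc, nodes in replicas(key, per_dc).items()
        if dc not in down_dcs
        for node in nodes
    ]


print(replicas("meter-42"))
print(surviving_replicas("meter-42", down_dcs={"London"}))
```

Because every data center holds its own copies of each key, losing one data center still leaves replicas in the other three, which is the property the deck refers to as continuous availability.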
The New DR: Simian Army “Dystopia as a Service”
Heterogeneous Workloads: Active Everywhere
Diagram labels: Write, Analyze, Read, Search
Our Product Solution
• DataStax Enterprise powers the big data apps that transform business.
• Extreme Data Velocity
• Continuous Availability
• Operational Simplicity
©2012 DataStax
33M streaming customers
2T API calls/year
~1,200 servers
55 AWS clusters
12 developers
4 operators
0 new data centers
Operational Simplicity
“Our primary operational data store is now Cassandra, not Oracle.”
Performance: NoSQL Leadership
Source: Solving Big Data Challenges for Enterprise Application Performance Management
Tilmann Rabl, University of Toronto, et al., VLDB 2012 (August 2012, Istanbul)
Cassandra vs. HBase:
• 10x more read throughput
• 100x faster read latency
• 8x more write throughput
• 8x faster scan latency
• 4x more scan throughput
Performance: NoSQL Leadership
YCSB Load Process
YCSB Read-write mix
YCSB Read-mostly
YCSB Write-mostly
From STB to the Scalable Cloud Message Bus
Enabling a richer active consumer experience across multiple devices, multiple platforms
Even in a pre-production environment, prior to tuning, achieved near-linear scalability
Instagram Scales Engaged Networks
• Transitioned from Redis (in-memory cache) to Cassandra in Amazon Web Services EC2
• Doubled cluster—and then doubled again—to support 150MM users on new infrastructure
• Continue to scale in spite of Justin Bieber storms, video formats, new features, new markets
Our Vision
DataStax is driving Cassandra to be the first viable alternative to the Oracle database for companies who are transforming the way they interact with customers.
Getting ahead of exploding growth
• Sign big, new contracts all the time (ESPN)
• 200M unique users per month
• 40TB of data
Flexible architecture
• “Couldn’t shoehorn RDBMS technology”
Very small operations team
• 3 people
• 20 clusters
• 100s of nodes
Why We Exist
Today’s applications must be always available and lightning fast as they scale to previously unimaginable levels.
Cassandra delivers both with a beautifully simple and elegant architecture.
“We need a real-time, massively scalable architecture, where no one node is a single point of failure, that can easily span multiple data centers and cloud availability zones, and that’s Cassandra.”
What We Do Best
Cassandra was designed to do things that are impossible in other databases when it comes to availability and performance. Forget about losing a machine here or there -- Cassandra delivers a world where you can lose an entire datacenter and still perform as your customers expect.
“We have to be ready for disaster recovery all the time. It’s really great that Cassandra allows for active-active multiple data centers where we can read and write anywhere”
Jay Patel, Technical Architect at eBay (describing why they switched from legacy relational architecture)
The Modern “Application”
Fraud Detection and Prevention
What It Means In Real Life
Cassandra Summit SF 2013
Real Growth In Production
We are the first viable alternative to Oracle for modern online applications.
Thank You
We power the big data apps that transform business.
DataStax OpsCenter 4.0
Security in Cassandra
Feature: Internal Authentication manages login IDs and passwords inside the database
Benefits:
+ Ensures only authorized users can access a database system using internal validation
+ Simple to implement and easy to understand
+ No learning curve from the relational world
Feature: Object Permission Management controls who has access to what and who can do what in the database
Benefits:
+ Provides granular control over who can add/change/delete/read data
+ Uses familiar GRANT/REVOKE from relational systems
+ No learning curve
Feature: Client to Node Encryption protects data in flight to and from a database cluster
Benefits:
+ Ensures data cannot be captured/stolen en route to a server
+ Data is safe both in flight from/to a database and on the database; complete coverage is ensured
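The GRANT/REVOKE style of object permission management listed above can be pictured with a toy model. This is a hypothetical in-memory sketch in Python, not Cassandra's implementation or its CQL syntax; the user names, the resource name, and the function names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical store: (user, resource) -> set of permitted actions.
grants = defaultdict(set)


def grant(user, resource, action):
    """Allow a user to perform an action (add/change/delete/read) on a resource."""
    grants[(user, resource)].add(action)


def revoke(user, resource, action):
    """Withdraw a previously granted action; a no-op if it was never granted."""
    grants[(user, resource)].discard(action)


def is_allowed(user, resource, action):
    """Check a request against the current grants; anything not granted is denied."""
    return action in grants.get((user, resource), set())


grant("analyst", "readings", "read")
grant("ingest", "readings", "add")
revoke("ingest", "readings", "add")

print(is_allowed("analyst", "readings", "read"))    # True
print(is_allowed("analyst", "readings", "delete"))  # False
print(is_allowed("ingest", "readings", "add"))      # False
```

The default-deny check is the design choice worth noting: a user can do only what has been explicitly granted and not since revoked.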
Advanced Security in DataStax Enterprise
Feature: External Authentication uses external security software packages to control security
Benefits:
+ Only authorized users have access to a database system using external validation
+ Uses most trusted external security packages (Kerberos, LDAP), mainstays in government and finance
+ Single sign on to all data domains
Feature: Transparent Data Encryption encrypts data at rest
Benefits:
+ Protects sensitive data at rest from theft and from being read at the file system level
+ No changes needed at application level
+ Can encrypt both Cassandra and Hadoop data
Feature: Data Auditing provides a trail of who did and looked at what, and when
Benefits:
+ Supplies admins with an audit trail of all accesses and changes
+ Granular control to audit only what’s needed
+ Uses log4j interface to ensure performance and efficient audit operations