Taming Big Data with NoSQL
-
Upload
basho-technologies -
Category
Technology
-
view
162 -
download
2
Transcript of Taming Big Data with NoSQL
This presentation includes information that is confidential and proprietary to Basho Technologies and should not be forwarded or distributed without Basho's prior written consent. © 2014. Basho Technologies, Inc. All Rights Reserved.
This presentation includes information that is confidential and proprietary to Basho Technologies and should not be forwarded or distributed without Basho's prior written consent. © 2014. Basho Technologies, Inc. All Rights Reserved.
Matt Brender Developer Advocate
Taming
Big Data with NoSQL
Relational databases are not bad
Data Scientist Big Data
Basho Confidential 3
Data Scientist
Big Data
4
Big Data is
Basho Confidential 5
6
And it’s a distributed systems problem
Basho Confidential 7
Beyond the Scope of one tool
Beyond the Scope of one file system
Beyond the Scope of
one database
8
Ergo, NoSQL
9
NoSQL is
10
11
For Good Reason
Basho Confidential 12
13
Consistency LevelConflict ResolutionPartitioning Strategy
14
Consistency Level
Eventually Consistent
C = ConsistencyA = AvailabilityP = Partition Tolerance
Client Client
DBDBDB
Network Partition
Cap theorem states that a distributed system can at most support 2 out of these 3 properties
16
Consistency Level
17
Conflict Resolution
Last Write Winsvs.
Causal Context
18
Conflict Resolution
19
Partition Strategy
Master
Slave Slave Slave
OR
Node 1 Node 2 Node 3
20
Partition Strategy
21
“data is powerful when stored and analyzed”
Relational databases are not bad
23
Storing
24
Report on this
Basho Confidential 25
Report on this
26
vs
27
A single scalable system
28
A single scalable system
29
Analyzing
What kind of questions do you need to ask?
Basho Confidential 31
32
Error Analysis?As simple as NoSQL + Solr
33
Patterns? Machine Learning?
Multi-client writes to NoSQL & HDFS
OR
35
NoSQL + ETL process => other datastore + Spark
and/or Hadoop M/R
OR
37
NoSQL & Apache Storm to Kafka to HDFS
38
LAMDA
39
So we agree. NoSQL is helpful.
Side Note: NoSQL is a terrible term
42
In Review
43
You can’t analyze what you don’t have.
And you don’t want an analysis system to be unreliable.
THE COST OF DOWNTIME
44
Basho Confidential 45
46
everything works at small scale
47
Nothing matters if..
Basho Confidential 48
• Hadoop A framework that allows for the distributed processing of large data sets across clusters
• SparkA fast, general engine for large-scale data processing
• Storm A distributed real-time computation system
• NoSQLA collection of highly scalable, highly available systems that fall within CAP theorem
• SolrApache project for indexing text for search
• KafkaDistributed scalable pub/sub messaging queue
Summary
50
Thank You!
Matt Brender@mjbrender