NoSQL
Data Research Day 2013
for Telco
Prepared by Nicolas Seyvet
With help from N. Hari Kumar and P. Matray
Ericsson Internal | 2013-06-03 | Page 2
› Software developer, 10+ years at Ericsson
› HLR, PGM, IMS-M, MMS, MTV, BCS
› Joined Research in late 2012
 – BMUM -> BUSS (5+ years)
 – DUCI (<6 months)
› Active member in various /// groups
 – Linux (ELX, UMWP, etc.), Agile, SWAN, EQNA
› Open source contributor
Who am I?
› Why NoSQL?
› CAP
› Research activities
› Market trends
The Plan
NoSQL: Why?
NoSQL: Why? Trends – Usual Suspects
Gartner Data Center TCO Report, June 2012.
Internet: hypertext, RSS, wikis, blogs, tagging, user-generated content, RDF, ontologies
Gossip, SDN
NoSQL: Why? Trends: Architecture
1980s: Mainframe applications
1990s: Database as integration hub
2000s: Decoupled services
› Multicore
› Parallelization/Distributed
› Cloud
› Schemaless
Two Ways to Scale: Go BIG or many?
PARTITION
(replication)
CAP
[Figure: CAP triangle (Consistency, Availability, Partition tolerance)]
› 2000: Prof. Eric Brewer, PODC conference keynote
› 2002: Seth Gilbert and Nancy Lynch, ACM SIGACT News 33 (2)
CAP Theorem: Brewer’s Conjecture
“Of three properties of shared-data systems – data Consistency, system Availability and tolerance to network Partitions – only two can be achieved at any given moment in time.”
CAP Theorem: The business decision
Partition: CONSISTENT OR Available
CAP Summary
CA (Consistent, Available): traditional relational, e.g. MySQL, PostgreSQL
CP (Consistent, Partition Tolerance): HBase, MongoDB, Redis, BigTable-like systems
AP (Available, Partition Tolerance): Voldemort, Riak, Cassandra, CouchDB, Dynamo-like systems
AP: Requests will complete at any node, possibly violating consistency
CP: Requests will complete at nodes that have quorum
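The CP/AP distinction above comes down to whether a node still accepts a request when it cannot reach a quorum. A minimal sketch, assuming a toy `Node` that already knows how many replicas it can reach; the class, its fields and the "strict majority" quorum rule are illustrative assumptions, not the behavior of any specific product listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy replica used only to contrast CP and AP behavior (hypothetical)."""
    cluster_size: int   # total number of replicas
    reachable: int      # replicas this node can contact, itself included
    data: dict = field(default_factory=dict)

    def has_quorum(self) -> bool:
        # Quorum assumed to be a strict majority of the cluster.
        return self.reachable > self.cluster_size // 2

    def write(self, key, value, mode: str) -> bool:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        if mode == "CP" and not self.has_quorum():
            # CP: refuse the request rather than risk divergent data.
            return False
        # AP (or CP with quorum): accept the write locally. Replication to
        # peers is omitted here; under a partition an AP node may diverge.
        self.data[key] = value
        return True
```

For example, a node that sees only 2 of 5 replicas rejects `Node(5, 2).write("k", 1, "CP")` (it returns `False`) but accepts the same write in `"AP"` mode, trading consistency for availability.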
Why NoSQL now?
› Trends
› “Internet size”, cluster friendly
› Rapid development / solution oriented
› Polyglot persistence
› Schemaless
Research Activities
› Telco applicability
› Aggregation
› Event streams
HBase: BigTable/Columnar
› HBase: 1 elected master / many region servers
 – Master: region allocation, failover, log splitting, load balancing; one active (elected), many on standby
 – Region servers: hold regions, handle I/O requests, in-memory data (MemStore), split regions, compact regions
› Hadoop (cluster): data files, Write-Ahead Log (WAL), rack aware, default data replication x3
› ZooKeeper (cluster): coordination, master selection, root region lookup, node registration…
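Region servers "hold regions", and each region covers a contiguous, sorted range of row keys, so finding the server for a row is a binary search over region start keys. A simplified sketch of that lookup and of a region split; in HBase the mapping lives in catalog tables (located via ZooKeeper) and is cached by clients, and `RegionMap`, `locate` and `split` are hypothetical names, not the HBase client API.

```python
import bisect


class RegionMap:
    """Toy region directory (hypothetical, not the HBase API).

    Region i owns row keys in [start_keys[i], start_keys[i + 1]); the first
    region starts at the empty key, so every row key falls in some region.
    """

    def __init__(self, server: str):
        self.start_keys = [b""]   # sorted region start keys
        self.servers = [server]   # region server hosting each region

    def locate(self, row_key: bytes) -> str:
        # Rightmost region whose start key is <= row_key.
        i = bisect.bisect_right(self.start_keys, row_key) - 1
        return self.servers[i]

    def split(self, split_key: bytes, new_server: str) -> None:
        # A split adds a start key: rows >= split_key (up to the next
        # region's start) move to the new daughter region. Assigning the
        # daughter to another server is a simplification for illustration.
        if split_key in self.start_keys:
            raise ValueError("a region already starts at this key")
        i = bisect.bisect_right(self.start_keys, split_key)
        self.start_keys.insert(i, split_key)
        self.servers.insert(i, new_server)
```

After `m = RegionMap("rs1"); m.split(b"m", "rs2")`, row `b"apple"` resolves to `rs1` and rows from `b"m"` upward resolve to `rs2`. Because keys are sorted, a badly chosen row key (e.g. a timestamp prefix) concentrates writes on the last region.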
› Comprehensive report
› Using HBase is DOABLE!
Telco Applicability Study: HBase for HLR data?
OK!
HBase Bulk Processing: Event Processing & Aggregation
› 100 million rows
› Queries evaluated:
 – SELECT col1 FROM table
 – SELECT SUM(col1) FROM table WHERE col2=val2 GROUP BY col3
› CPU
› RAM
› Network
› Schema
› Map/Reduce
› Scan
› Co-processor
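The aggregation query can be evaluated in two phases: each partition (for HBase, a region scanned by a Map task or a co-processor) applies the `WHERE` filter and pre-sums `col1` per `col3`, then a final step merges the partial results. A sketch assuming rows are plain dicts with `col1`/`col2`/`col3` keys and partitions are plain lists; `partial_sum` and `merge` are hypothetical helpers, not the HBase co-processor or MapReduce API.

```python
from collections import Counter
from typing import Iterable, Mapping


def partial_sum(rows: Iterable[Mapping], val2) -> Counter:
    """Per-partition step: WHERE col2 = val2, then SUM(col1) per col3."""
    totals = Counter()
    for row in rows:
        if row["col2"] == val2:
            totals[row["col3"]] += row["col1"]
    return totals


def merge(partials: Iterable[Counter]) -> dict:
    """Final step: add up the per-partition sums (the GROUP BY result)."""
    result = Counter()
    for p in partials:
        result.update(p)   # Counter.update adds counts instead of replacing
    return dict(result)


region_a = [{"col1": 5, "col2": "x", "col3": "g1"},
            {"col1": 7, "col2": "y", "col3": "g1"}]
region_b = [{"col1": 3, "col2": "x", "col3": "g1"},
            {"col1": 4, "col2": "x", "col3": "g2"}]
result = merge(partial_sum(r, "x") for r in (region_a, region_b))
```

Here `result` is `{"g1": 8, "g2": 4}`. Only the small per-group totals leave each partition, which is one reason this pattern can scale out with the number of region servers.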
Bulk Processing: Scaling Out/Horizontally
› 100 million rows
› Linear scaling!
SELECT SUM(col1) FROM table WHERE col2=val2 GROUP BY col3
READ/WRITE: 100,000 iterations
› 150,000,000 rows
› row = key + 1 column (1K)
Entire cluster up and running: 8 nodes (1 master / 7 slaves)
Periodic degradation
Robustness: Killing Them Softly…
Master
Slaves
How Much Data Can It Fit? ITK / Constellation / CEA
› Network produces events
 – RNC, SGSN, S-&R-KPI
 – Traffic DPI
 – GTP-C
› CEA (Perfmon)
 – Correlated events
1000+ K events/s
10+ K events/s
[Figure: ingestion pipeline: EventFeeder, HBaseBulkLoader (staging data on HDFS, Map/Reduce), HBasePutLoader (Put.. Put.. Put…), lookup data, 10,000,000 subscribers]
The Upcoming Fight
Storkluster: 18 machines
Bigdata: 2 machines
› It scales!
› TestDFSIO benchmark
 – Reads > 3000 GB/s
 – Writes > 2000 GB/s
› But… it is not that simple…
What about HDFS? Small files (250 B) vs. larger files (1 KB)
[Figure: bottlenecks observed: CPU and I/O, network, CPU]
› It scales!
› And it gets… more complicated
What about end to end? (writing to HBase included)
200 K events/s
100 K events/s
› Within ~2 hours:
 – Rows/s: down to 7K/s
 – CPU: up x2
 – I/O: 100%
But…
› Remember what we were doing?
 – Hint: creating lots of small files to add to HBase?
› Major compaction storm!
 – Manage compaction and region splitting
HDFS Curse: Compaction Storm
[Figure: M/R, HBaseBulkLoader]
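A toy model of why many small bulk-loaded files can set off a compaction storm. It assumes a hypothetical policy where, whenever a store holds `threshold` files, all of them are merged into one; HBase's real minor/major compaction selection is more elaborate and configurable. The function counts how many compactions run and how many bytes are rewritten.

```python
def simulate_compactions(num_files: int, file_size: int, threshold: int):
    """Toy compaction model (hypothetical policy, not HBase's).

    Each ingested file becomes one store file; once the store holds
    `threshold` files they are all merged into a single file.
    Returns (number_of_compactions, total_bytes_rewritten).
    """
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    files = []
    compactions = 0
    rewritten = 0
    for _ in range(num_files):
        files.append(file_size)        # a bulk load adds one new store file
        if len(files) >= threshold:
            merged = sum(files)
            rewritten += merged        # compaction reads and rewrites everything
            files = [merged]
            compactions += 1
    return compactions, rewritten
```

With 100 files of 1 unit and a threshold of 3, the model runs 49 compactions and rewrites 2,499 units to ingest 100, while 2 files of 50 units trigger none: the rewrite cost grows roughly quadratically with the number of small files, hence the advice to manage compaction and region splitting explicitly.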
› Scalability… Scalability… Scalability
› It works, but it is not so easy…
› Recommendation:
 – Polyglot data storage
Conclusion
NoSQL
› It is not about saying SQL is bad or should not be used
› “An accidental neologism” – Martin Fowler
› A Twitter hashtag
› No prescriptive definition, just observations of common characteristics
 – “Any database that is not a relational database”
 – Running well on clusters (scalable)
 – Schemaless
› Polyglot persistence
 – Using different stores in different circumstances
NoSQL: The name
The term was coined at a meetup with the creators behind some prominent emerging databases… then there was a conference… and a mailing list… the name caught on… then there were more conferences… and here we are!
NoSQL: Why? Trend No 2/4: Connectedness
Internet: hypertext, RSS, wikis, blogs, tagging, user-generated content, RDF, ontologies
Application
M2M
NoSQL: Why? Trend No 3/4: Content Individualization
› Individualization of content
› Decentralization
Schemaless
• Extend at runtime
• De-normalize
• Domain design (not schema migration)
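What "extend at runtime" and "de-normalize" mean in practice can be shown with a plain key-to-record map. A minimal sketch, assuming a hypothetical in-memory document store; the keys and field names are made up for illustration.

```python
from typing import Any

# Hypothetical in-memory document store: key -> free-form record.
store: dict[str, dict[str, Any]] = {}

# De-normalize: the subscriber's services are embedded in the record
# instead of being joined from a separate table.
store["sub:1"] = {"name": "Alice", "services": ["voice", "sms"]}

# Extend at runtime: a new attribute appears on one record only,
# with no schema migration.
store["sub:2"] = {"name": "Bob", "services": ["data"], "roaming": True}


def is_roaming(record: dict[str, Any]) -> bool:
    # Readers must tolerate missing fields: the "schema" lives in the code.
    return bool(record.get("roaming", False))
```

The cost of this flexibility moves to the readers: every consumer has to handle records written before a field existed, which is why the slide pairs schemaless storage with domain design rather than schema migration.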
› 4 emerging categories: Key-Value, Graph, BigTable, Document
› (NewSQL)
› (Object)
NoSQL Landscape
DBN
Consistency
“A system is consistent if an update is applied to all relevant nodes at the same logical time”
NoSQL solutions DO support Transactions
Standard database replication (or caching) IS NOT strongly consistent; as such, any solution making use of either is, by definition, eventually consistent at best
Strong consistency: Atomicity, Consistency, Isolation, Durability (ACID)
Weak consistency: eventual consistency (inconsistency window)
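The claim that replication alone is eventually consistent at best can be made concrete with asynchronous primary/secondary replication: between a write and its propagation, reads from the secondary return stale data. A sketch assuming a hypothetical `AsyncReplicatedStore` where replication only happens when `replicate()` is called; real systems ship updates continuously, but the inconsistency window is the same idea.

```python
class AsyncReplicatedStore:
    """Toy primary/secondary store with asynchronous replication (hypothetical).

    Writes land on the primary and are queued; the secondary only sees them
    after replicate() runs. The time in between is the inconsistency window.
    """

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = []   # updates not yet applied to the secondary

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_secondary(self, key):
        # May return a stale value (or None) while updates are pending.
        return self.secondary.get(key)

    def replicate(self):
        # Apply queued updates in order: the replicas converge again.
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()
```

After `s.write("x", 1)`, `s.read_secondary("x")` still returns `None`; once `s.replicate()` has run it returns `1`.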
› “The network will be allowed to lose arbitrarily many messages sent from one node to another” [..]
› “For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response”
Gilbert and Lynch, SIGACT 2002
Partition Tolerance / Availability
High latency ~= Partition
CP: Requests will complete at nodes that have quorum
AP: Requests will complete at any node, possibly violating consistency
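"High latency ~= Partition" follows from the caller's point of view: a reply that arrives too late cannot be told apart from one that was lost, so a node must pick a timeout and treat slower peers as partitioned. A minimal sketch, assuming round-trip times are already measured; `reachable_peers` and the 200 ms timeout are hypothetical.

```python
from typing import Mapping, Optional


def reachable_peers(rtt_ms: Mapping[str, Optional[float]], timeout_ms: float) -> set:
    """Peers considered reachable: those that replied within the timeout.

    rtt_ms maps peer name -> observed round-trip time in ms,
    or None if no reply arrived at all.
    """
    return {peer for peer, rtt in rtt_ms.items()
            if rtt is not None and rtt <= timeout_ms}


observed = {"a": 12.0, "b": 4500.0, "c": None}
peers = reachable_peers(observed, timeout_ms=200)
```

Here `peers` is `{"a"}`: the slow peer `b` and the silent peer `c` are treated identically, and any quorum decision (CP) or local-only answer (AP) is then made on that reduced set.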