Databases for Big Data
Transcript of Databases for Big Data
![Page 1: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/1.jpg)
Databases for Big Data
2021.9.22, Guiyang
![Page 2: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/2.jpg)
Contents• Databases for Big Data
• OldSQL: SQL + ACID• Problems for big data
• NoSQL systems and its problems• Scalibility, (SQL), No ACID • CAP Theorem
• NewSQL systems• SQL,• Scalabiltiy,• ACID
• Challenges of NewSQL systems• Architecture• Distributed concurrency control
![Page 3: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/3.jpg)
1. Databases for Big Data
Big Data Every WhereLots of data is being
collected and warehousedWeb data
Social networkWeChat 微信
FacebookE-commerce淘宝,Amazon
Mobile communicationGPS …
![Page 4: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/4.jpg)
How much data?•Google processes 20 PB a day •Facebook has 2.5 PB of user data + 15 TB/day •eBay has 6.5 PB of user data + 50 TB/day
640K ought to be enough for anybody.
![Page 5: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/5.jpg)
The Earthscope• the world's largest science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data. It analyzes seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more. (http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI)
1.
![Page 6: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/6.jpg)
Type of Data•Relational Data (Tables/Transaction/Legacy Data)•Text Data (Web)•Semi-structured Data (XML) •Graph Data
–Social Network, Semantic Web (RDF), …
•Streaming Data –You can only scan the data once
![Page 7: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/7.jpg)
Big Data Definition
• No single standard definition…
“Big Data” is data whose scale, diversity, and complexity require new architecture,
techniques, algorithms, and analytics to manage it and extract value and hidden
knowledge from it…
![Page 8: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/8.jpg)
Characteristics of Big Data: 1-Scale (Volume)
• Data Volume• 44x increase from 2009 , 2020• From 0.8 zettabytes to 35zb
• Data volume is increasing exponentially
Exponential increase in collected/generated data
![Page 9: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/9.jpg)
Characteristics of Big Data: 2-Complexity (Varity)
• Various formats, types, and structures
• Text, numerical, images, audio, video, sequences, time series, social media data, multi-dim arrays, etc…
• Static data vs. streaming data
• A single application can be generating/collecting many types of data To extract knowledge all these
types of data need to linked together
![Page 10: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/10.jpg)
Characteristics of Big Data: 3-Speed (Velocity)
• Data is begin generated fast and need to be processed fast
• Online Data Analytics
• Late decisions missing opportunities
• Examples• E-Promotions: Based on your current location, your
purchase history, what you like send promotions right now for store next to you
• Healthcare monitoring: sensors monitoring your activities and body any abnormal measurements require immediate reaction
![Page 11: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/11.jpg)
Big Data: 3V’s
![Page 12: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/12.jpg)
Big Data: 4V’s
![Page 13: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/13.jpg)
Challenges in Handling Big Data
How to handle big dataOld SQL (Classic SQL)NoSQLNewSQL
How to use/analyze big dataOLTPOLAPData Mining
![Page 14: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/14.jpg)
Traditional Transaction Processing
Remember/Image how we used to buy tickets in 80’sBy telephoneThrough a professional terminal operator
At the human speedIn 1985, 1,000 transactions per second was
considered an incredible stretch goal!!!!Today: 1000K/second
![Page 15: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/15.jpg)
Traditional Transaction Processing
Workload was a mix of updates and queriesTo an ACID data base system
Make sure you never lose my dataMake sure my data is correct
At human speedBread and butter of RDBMSs (OldSQL)
Oracle, DB2, etc.
![Page 16: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/16.jpg)
How has TP Changed in 25 Years?
The internetClient is no longer a professional terminal
operatorInstead Aunt Martha is using the web herselfSends TP volume through the roofSerious need for scalability and performance
![Page 17: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/17.jpg)
How has TP Changed in 25 Years?
Mobile devices Your cell phone is a transaction originatorSends TP volume through the roofSerious need for scalability and performance
Need in some traditional markets for much higher performance!
![Page 18: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/18.jpg)
And TP is Now a Much Broader Problem(New TP)
The internet enables a green field of new TP applications
Massively multiplayer games state of the game, leaderboards, selling virtual goods are
all TP problems)Social networking (social graph is a TP problem)Real time ad placementReal time couponingAnd TP volumes are ginormous!!Serious need for speed and scalability!
![Page 19: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/19.jpg)
In all cases…..
Workload is a mix of updates and queriesComing at you like a firehose Still an ACID problem
Don’t lose my dataMake sure it is correct
Tends to break traditional solutionsScalability problems (volume)Response time problems (latency)
![Page 20: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/20.jpg)
Solution Choices
OldSQLLegacy RDBMS vendorsSQL + ACIDNot Scalable
NoSQL
Scalability
(SQL) No ACID
![Page 21: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/21.jpg)
OldSQL
Traditional SQL vendors (the “elephants”)SQL+ACID
Centralized (one-server) architecture
performance replies on IOE
Not scalable
Mediocre performance on New TP
Cannot handle big data demands
Availability ???
![Page 22: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/22.jpg)
2. NoSQL systems
NoSQL: Not Only SQL (not No-SQL) database
A NoSQL database provides a mechanism for storage and retrieval of data that • is highly scalable; and • has higher availability.
![Page 23: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/23.jpg)
Why NoSQL?• A large collection of structured or semi structured data
• Social media sites such as Facebook, Twitter• Lack of scalability of OldSQL• High availability
Others:• Dynamic schemas
• No fixed table schemas• Frequent simpler queries
• No joins• No Nested queries
![Page 24: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/24.jpg)
Scaling Up OldSQL were not designed to be distributed
Distribution and ACID propertyDistribution and join operations
How to use a collection of x86 serversMaster-slaveSharding
![Page 25: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/25.jpg)
Some influential events• Some major papers/systems were the seeds of
the NoSQL movement• BigTable (Google)• Dynamo (Amazon)
• Gossip protocol (discovery and error detection)• Distributed key-value data store• Eventual consistency
• CAP Theorem • MapReduce and Hadoop
![Page 26: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/26.jpg)
The Perfect StormLarge datasets, acceptance of alternatives, and
dynamically-typed data has come together in a perfect storm
Not a backlash/rebellion against RDBMSSQL is a rich query language that cannot be rivaled
by the current list of NoSQL offerings
![Page 27: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/27.jpg)
What we gain?• Scalability
• Using x86 servers• Big data• High Performance
• Availability • Replications
What we lose?• SQL• ACID
Why?
![Page 28: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/28.jpg)
Brewer’s CAP Theorem
A distributed system can support only two of the following three characteristics:• Consistency• Availability• Partition tolerance
![Page 29: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/29.jpg)
Consistency
all nodes see the same data at the same time
client perceives that a set of operations has occurred all at once
More like Atomic in ACID transaction properties
![Page 30: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/30.jpg)
Availability
node failures do not prevent survivors from continuing to operate
Every operation must terminate in an intended response
![Page 31: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/31.jpg)
Partition Tolerance
the system continues to operate despite arbitrary message loss
Operations will complete, even if individual components are unavailable
![Page 32: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/32.jpg)
AvailabilityTraditionally, thought of as the server/process
available five 9’s (99.999 %).However, for large node system, at almost any
point in time there’s a good chance that a node is either down or there is a network disruption among the nodes. Want a system that is resilient in the face of network
disruption
![Page 33: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/33.jpg)
Eventual ConsistencyWhen no updates occur for a long period of time,
eventually all updates will propagate through the system and all the nodes will be consistent
For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service
![Page 34: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/34.jpg)
BASE Transactions
Basically Available, Soft state
consistency is the developer's problem and should not be handled by the database
Eventually Consistent
![Page 35: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/35.jpg)
BASE Transactions
CharacteristicsWeak consistency
stale data OKAvailability firstBest effortApproximate answers OKAggressive (optimistic)Simpler and faster
![Page 36: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/36.jpg)
What kinds of NoSQL databasesNoSQL solutions fall into two major
areas:Key/Value or ‘the big hash table’.
Amazon S3 (Dynamo)VoldemortScalaris
Schema-less which comes in multiple flavors column-based (Cassandra, HBase) document-based (CouchDB) graph-based (Neo4J)
![Page 37: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/37.jpg)
Key/ValuePros:
very fastvery scalablesimple modelable to distribute horizontally
Cons: - many data structures (objects) can't be easily
modeled as key value pairs
![Page 38: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/38.jpg)
Schema-LessPros:
- Schema-less data model is richer than key/value pairs
- eventual consistency- many are distributed- still provide excellent performance and scalability
Cons: - typically no ACID transactions or joins
![Page 39: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/39.jpg)
Common AdvantagesCheap, easy to implement (open source)Data are replicated to multiple nodes
(therefore identical and fault-tolerant) and can be partitionedDown nodes easily replacedNo single point of failure
Easy to distributeDon't require a schemaCan scale up and downRelax the data consistency requirement
(CAP)
![Page 40: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/40.jpg)
List of NoSQL systems (www.nosql-database.org)
CassandreHaddop & HbaseCouchDBMongoDBStupidDBMapReduceTriftCouddataEtc.
![Page 41: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/41.jpg)
What are we giving up?joinsgroup byorder byACID transactionsSQL as a sometimes frustrating but still powerful
query languageeasy integration with other applications that
support SQL
![Page 42: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/42.jpg)
CassandraOriginally developed at FacebookFollows the BigTable data model
column-oriented
Uses the Dynamo Eventual Consistency modelWritten in JavaOpen-sourced and exists within the Apache familyUses Apache Thrift as it’s API
![Page 43: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/43.jpg)
Thrift
Created at Facebook along with Cassandra
Is a cross-language, service-generation frameworkBinary Protocol (like Google Protocol Buffers)Compiles to: C++, Java, PHP, Ruby, Erlang, Perl, ...
![Page 44: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/44.jpg)
SearchingRelational
SELECT `column` FROM `database`,`table` WHERE `id` = key;
SELECT product_name FROM rockets WHERE id = 123;
Cassandra (standard)keyspace.getSlice(key, “column_family”, "column")
keyspace.getSlice(123, new ColumnParent(“rockets”), getSlicePredicate());
![Page 45: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/45.jpg)
Typical NoSQL APIBasic API access:
get(key) -- Extract the value given a keyput(key, value) -- Create or update the value given
its keydelete(key) -- Remove the key and its associated
valueexecute(key, operation, parameters) -- Invoke an
operation to the value (given its key) which is a special data structure (e.g. List, Set, Map .... etc).
![Page 46: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/46.jpg)
Data ModelWithin Cassandra, you will refer to data this way:
Column: smallest data element, a tuple with a name and a value :Rockets, '1' might return: {'name' => ‘Rocket-Powered Roller Skates', ‘toon' => ‘Ready Set Zoom', ‘inventoryQty' => ‘5‘, ‘productUrl’ => ‘rockets\1.gif’}
![Page 47: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/47.jpg)
Data Model Continued
ColumnFamily: There’s a single structure used to group both the Columns and SuperColumns. Called a ColumnFamily (think table), it has two types, Standard & Super. Column families must be defined at startup
Key: the permanent name of the recordKeyspace: the outer-most level of organization. This is
usually the name of the application. For example, ‘Acme' (think database name).
![Page 48: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/48.jpg)
Cassandra and ConsistencyTalked previous about eventual consistencyCassandra has programmable read/writable
consistencyOne: Return from the first node that respondsQuorom: Query from all nodes and respond with the
one that has latest timestamp once a majority of nodes responded
All: Query from all nodes and respond with the one that has latest timestamp once all nodes responded. An unresponsive node will fail the node
![Page 49: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/49.jpg)
Classification of NoSQL databases
Physical layoutConsistency modelArchitecture
![Page 50: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/50.jpg)
Physical LayoutRow storage
Dominant in transitional DBMSColumn storage
Efficient scan-level accessEasy compressionOptimized analytical workloadsSyBase IQ, C-Store, MonetDB-X100
Hybridcombination of row and column storesTable: ne-gained hybridBlock: partition attributes across
![Page 51: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/51.jpg)
Conceptual modeUnstructured data
uninterrupted, isolated and stored in binary objectsimplest format and maximum exibility
Semi-structured datainner structure without xed schemadocument-oriented store, column-family store
Structured databased on the relational modelstructured entities with strict relationships according
to schemaconstraints: entity integrity and referential integrity
![Page 52: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/52.jpg)
Data model Taxonomy
![Page 53: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/53.jpg)
consistency
SerializablilitySnapshot isolation
Transaction sees a consistent committed snapshot of the database
Optimistic Read and WriteStrict consistencyEventual consistence
![Page 54: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/54.jpg)
Consistency mode taxonomy
![Page 55: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/55.jpg)
SMP on shared-memory architecture
Processors with own cacheThreads allocated to processorsCoherent memory pool for
sharing dataScale-up by adding processorsBounded by resource limit
Memory/IO bus bandwidthL2 cache consistencyNumber of coresCustom-built with high price
![Page 56: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/56.jpg)
MPP on Shared-disk architectureParallel SMP clustersCommon storage by
shared I/ODisk arrays in SANMitigate I/O bus
bandwidthCache fusion protocol
Oracle Exadata
Query streaming preprocess IBM Netezza
![Page 57: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/57.jpg)
Dispatching on shared-nothing architectureCenteral coordinator
DevisionDispatchpost-process
Loosely coupled machines with dedicated memory and disk
Independent database instanceOperating on portion of the
data
Each to scale out
![Page 58: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/58.jpg)
System architecture Taxonomy
![Page 59: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/59.jpg)
3. NewSQL Systems• OldSQL
• SQL• ACID
• NoSQL• Scalability• Availability• BASE
• Why Not to have both?• SQL• Scalability• Availability• ACID
Guiyang
![Page 60: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/60.jpg)
NewSQL Systems• A new class of modern database system
• SQL• Scalability
• X86 servers• High performance• Big data
• Availability• Replication
• ACID
Guiyang
![Page 61: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/61.jpg)
Who Needs ACID?
Funds transferOr anybody moving something from X to Y
Anybody with integrity constraintsBack out if failsAnybody for whom “usually ships in 24 hours”
is not an acceptable outcome
Anybody with a multi-record stateE.g. move and shoot
Guiyang
![Page 62: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/62.jpg)
Who needs ACID in replication
Anybody with non-commutative updatesFor example, + and * don’t commute
Anybody with integrity constraintsCan’t sell the last item twice….
( Eventual consistency means “creates garbage”)
Guiyang
![Page 63: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/63.jpg)
Summary
Old TP
Guiyang
OldSQL for New OLTP § Too slow§ Does not scale
NoSQL for New OLTP § Lacks consistency guarantees§ Low-level interface
NewSQL for New OLTP § Fast, scalable and consistent§ Supports SQL
New TP
![Page 64: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/64.jpg)
Why not if we can have both ?
How to achieve our goal ?
![Page 65: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/65.jpg)
4. Challenges of NewSQL• Without using storage systems
• That is, without using IOE
• One thread or multiple thread• Main memory or otherwise• Availability with Replication
Guiyang
![Page 66: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/66.jpg)
![Page 67: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/67.jpg)
A NewSQL Example: H-Store/VoltDB
• Main-memory Linux SQL DBMS
• Anti-caching
• Multi-node and sharded nothing
• Stored procedure interface
• No locking
• Pure ACID
• Single threaded
• Data partition
• Built-in HA and durability
• No log (in the traditional sense)
Guiyang
![Page 68: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/68.jpg)
Where all the time goes… revisited
Guiyang
Before VoltDB
![Page 69: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/69.jpg)
Current VoltDB Status
Runs a subset of SQL (which is getting larger)On VoltDB clusters (in memory on commodity
gear)With LAN and WAN replication70X a popular OldSQL DBMS on TPC-C5-7X Cassandra on VoltDB K-V layerScales to 384 cores (biggest iron we could get
our hands on)Clearly note this is an open source system!
Guiyang
![Page 70: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/70.jpg)
Performance of H-Store
Simplified TPC-C with stored procedures
Partitions 16 64
Number of concurrent clients Not available not available
Only local partition accesses 39,000 140,000
30% remote partition accesses 7,000 18,000
![Page 71: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/71.jpg)
Characteristics of H-store/VoltDB
• No concurrency control algorithms• Stored procedures• Partition lockes for cross partition
accesses• Main memory databases
![Page 72: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/72.jpg)
Resources
Li-Yan Yuan, Dept of Computing Science, University of Alberta,
Edmonton, Canada
http://openstack.org
Launchpad: http://launchpad.net/network-service
Cisco Open Stack project: http://bit.ly/cisco-ucs-openstack
![Page 73: Databases for Big Data](https://reader036.fdocuments.us/reader036/viewer/2022072617/62df23814c451c07f74258e2/html5/thumbnails/73.jpg)
Thanks