![Page 1: CAN THE ELEPHANTS HANDLE THE NOSQL ONSLAUGHT?cis.csuohio.edu/~sschung/cis611/CIS611SridharSQLOnSlaught.pdf · • RDBMSs are no longer the only viable alternative for data-driven](https://reader036.fdocuments.us/reader036/viewer/2022062506/5f0b5ec17e708231d4302efb/html5/thumbnails/1.jpg)
CAN THE ELEPHANTS HANDLE THE NOSQL ONSLAUGHT?
by
SRIDHAR REDDY VORUGANTI
CSU ID: 2607043
ABSTRACT
• Traditional DBMSs under attack.
• NoSQL vs. SQL.
• Result (evaluation).
WHAT WE DISCUSS?
INTRODUCTION…
• The database community is currently at an unprecedented and exciting inflection point.
• RDBMSs are no longer the only viable alternative for data-driven applications.
• At the other end of the big data application spectrum are analytical decision support workloads that are characterized by complex queries on massive amounts of data.
• The results are shown for the sole purpose of providing relative comparisons for this paper, and should not be compared to official benchmark results.
BACKGROUND…
• Parallel Data Warehouse (PDW)
• Hive
• MongoDB
…
• Parallel database system.
• Two types of nodes: compute and control.
• Data is horizontally partitioned.
• DMS (Data Movement Service) shuffles data between nodes.
• Post-processing and re-integration by control node.
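The flow in the bullets above can be pictured with a small sketch. This is not PDW's implementation: it is a toy model that assumes hash partitioning on an integer key, and the function names (`hash_partition`, `shuffle`, `local_aggregate`, `control_merge`) and the sample rows are hypothetical.

```python
from collections import defaultdict

def hash_partition(rows, key, num_nodes):
    """Horizontally partition rows across compute nodes by hashing a key column."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        nodes[row[key] % num_nodes].append(row)
    return nodes

def shuffle(nodes, key, num_nodes):
    """Repartition rows on a different key (a stand-in for the data movement step)."""
    return hash_partition([row for node in nodes for row in node], key, num_nodes)

def local_aggregate(node_rows, group_key, value_key):
    """Each compute node computes partial sums over its own partition."""
    partial = defaultdict(int)
    for row in node_rows:
        partial[row[group_key]] += row[value_key]
    return dict(partial)

def control_merge(partials):
    """The control node re-integrates the partial results into the final answer."""
    final = defaultdict(int)
    for partial in partials:
        for group, total in partial.items():
            final[group] += total
    return dict(final)

# Hypothetical sample data, not taken from the benchmark.
rows = [
    {"order_id": 1, "cust_id": 7, "region": "EU", "amount": 10},
    {"order_id": 2, "cust_id": 8, "region": "US", "amount": 20},
    {"order_id": 3, "cust_id": 7, "region": "EU", "amount": 5},
    {"order_id": 4, "cust_id": 9, "region": "US", "amount": 7},
]
nodes = hash_partition(rows, "order_id", num_nodes=2)
result = control_merge(local_aggregate(n, "region", "amount") for n in nodes)
print(result)
```

Calling `shuffle(nodes, "cust_id", 2)` would place all rows of one customer on the same node, which is the precondition for a node-local join on that key.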
…
• Open-source data warehouse.
• HDFS for data storage.
• HiveQL (SQL-like query language).
• Multiple data storage formats.
…
• Open-source NoSQL database.
• Data model: collections of documents.
• No schema required.
• Supports auto-partitioning (sharding).
• Supports replica sets.
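A minimal sketch of the two ideas above, schemaless documents and partitioning by a shard key, using plain Python dictionaries rather than a MongoDB driver. The split points, the `shard_for` helper, and the sample documents are hypothetical; the sketch assumes range-based partitioning on `_id`.

```python
import bisect

# Documents in the same collection may carry different fields: no fixed schema.
users = [
    {"_id": 17, "name": "Ann"},
    {"_id": 42, "name": "Bob", "tags": ["admin"], "address": {"city": "Cleveland"}},
    {"_id": 95, "email": "c@example.com"},
]

# Hypothetical split points on the shard key: shard 0 holds keys < 30,
# shard 1 holds 30 <= key < 60, shard 2 holds keys >= 60.
SPLIT_POINTS = [30, 60]

def shard_for(shard_key):
    """Return the index of the shard whose key range contains shard_key."""
    return bisect.bisect_right(SPLIT_POINTS, shard_key)

shards = {0: [], 1: [], 2: []}
for doc in users:
    shards[shard_for(doc["_id"])].append(doc)

for shard_id, docs in shards.items():
    print(shard_id, [d["_id"] for d in docs])
```

In a real deployment the split points move as chunks grow and are rebalanced; here they are fixed to keep the routing rule visible.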
EVALUATION…
• Evaluation of RDBMS and a NoSQL system
• We use TPC-H to evaluate Microsoft’s PDW and Hive.
• We compare MongoDB with Microsoft SQL Server using the YCSB benchmark.
HARDWARE CONFIGURATION…
• 16 nodes connected by a 1 Gbit HP ProCurve Ethernet switch.
• Each node: 2.13 GHz CPU, 32 GB of main memory, and ten 10K RPM 300 GB SAS hard drives.
• When evaluating PDW and Hive, we used eight disks to store the data.
• In the YCSB experiments, eight nodes were used as servers.
SOFTWARE CONFIGURATION…
• Hive and Hadoop
• PDW
• MongoDB (Mongo-AS)
…
• Hive 0.7.1 and Hadoop 0.20.203
• RCFile format instead of text files
• JVM size 2GB.
…
• PDW
– Version AU3
– Maximum 24 GB memory.
• MongoDB
– Version 1.8.2
– “Global lock” for writes.
HIVE VS. PDW…
• Workload Description
• Data Layout
• Data Preparation and Load Times
• Experimental Evaluation
DATA LAYOUT…
• Hive: partitions and buckets
• PDW: partitions and replication
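To make the two layouts concrete, here is a rough sketch under stated assumptions. `hive_path` and `pdw_place` are hypothetical helpers; the bucket naming, the choice of `l_shipdate` as a partition column, and the sample rows are illustrative only and do not reflect the layout used in the experiments.

```python
def hive_path(table, part_col, part_val, bucket_key, num_buckets):
    """Illustrative storage path: one directory per partition value,
    one bucket chosen by hashing the bucketing column."""
    bucket = bucket_key % num_buckets
    return f"{table}/{part_col}={part_val}/bucket-{bucket}"

def pdw_place(table_rows, num_nodes, replicated, key=None):
    """Rows stored on each node: a full copy if the table is replicated,
    otherwise a hash-partitioned slice on the given key."""
    if replicated:
        return [list(table_rows) for _ in range(num_nodes)]
    nodes = [[] for _ in range(num_nodes)]
    for row in table_rows:
        nodes[row[key] % num_nodes].append(row)
    return nodes

# Hypothetical rows for a small table and a large table.
nation = [{"n_nationkey": k} for k in range(3)]
orders = [{"o_orderkey": k} for k in range(6)]

print(hive_path("lineitem", "l_shipdate", "1995-01-01", bucket_key=10, num_buckets=8))
print(pdw_place(nation, num_nodes=3, replicated=True))
print(pdw_place(orders, num_nodes=3, replicated=False, key="o_orderkey"))
```

Replicating a small table means every node can join against it locally, while the large table is split so each node scans only its own slice.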
…
Data preparation steps
Hive
• Generate TPC-H dataset
• Hive table for each TPC-H table
• Load data in two phases
– Data loaded to HDFS
– Data converted to RCFile
PDW
• TPC-H is generated on landing node
• Specify schema and tables
• Text files split into multiple chunks
• Chunks loaded to nodes
EXPERIMENTAL EVALUATION…
QUERIES…
• Performance Analysis
– Query 5
– Query 19
• Scalability Analysis
– Query 1
– Query 22
• Query 5 (joins customer, orders, lineitem, supplier, nation, and region)
• Query 19 (joins lineitem and part)
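As an illustration of what a two-table join such as the one in Query 19 involves, the sketch below runs a simple in-memory hash join. It is a heavily simplified stand-in, not the actual Query 19: it keeps only a brand filter and the revenue expression, and the function name and sample rows are hypothetical. The column names follow the TPC-H schema.

```python
def hash_join_revenue(lineitem, part, brand):
    """Join lineitem to part on the part key and sum discounted revenue
    for one brand."""
    # Build phase: index the smaller table (part) by its join key.
    parts_by_key = {p["p_partkey"]: p for p in part}
    revenue = 0.0
    # Probe phase: scan lineitem and look up the matching part row.
    for item in lineitem:
        p = parts_by_key.get(item["l_partkey"])
        if p is not None and p["p_brand"] == brand:
            revenue += item["l_extendedprice"] * (1 - item["l_discount"])
    return revenue

# Hypothetical sample rows.
part = [
    {"p_partkey": 1, "p_brand": "Brand#12"},
    {"p_partkey": 2, "p_brand": "Brand#23"},
]
lineitem = [
    {"l_partkey": 1, "l_extendedprice": 100.0, "l_discount": 0.5},
    {"l_partkey": 2, "l_extendedprice": 80.0, "l_discount": 0.0},
    {"l_partkey": 1, "l_extendedprice": 200.0, "l_discount": 0.25},
]
print(hash_join_revenue(lineitem, part, "Brand#12"))
```

Query 5 chains the same kind of join across six tables, which is why the order of the joins and where the data sits matter much more there.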
• Query 1
– Scans the lineitem table
• Query 22
– Scans the customer table
– 4 sub-queries
MONGODB VS. SQL SERVER…
• Workload Description
– YCSB benchmark
• Read heavy and Read only
• Experimental Evaluation
– YCSB benchmark
• Update heavy, Read latest and Short ranges
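The five workload names above match the YCSB core workloads A through E. The sketch below generates an operation stream for each mix. The proportions are the standard YCSB core-workload defaults; whether the study used exactly these values is an assumption, and `generate_ops` is a hypothetical helper, not part of YCSB.

```python
import random

# Standard YCSB core workload mixes (assumed here, not taken from the slides).
WORKLOADS = {
    "A": {"read": 0.50, "update": 0.50},  # update heavy
    "B": {"read": 0.95, "update": 0.05},  # read heavy
    "C": {"read": 1.00},                  # read only
    "D": {"read": 0.95, "insert": 0.05},  # read latest
    "E": {"scan": 0.95, "insert": 0.05},  # short ranges
}

def generate_ops(mix, n, seed=0):
    """Draw n operation names according to the proportions in mix."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    ops, weights = zip(*mix.items())
    return rng.choices(ops, weights=weights, k=n)

for name, mix in WORKLOADS.items():
    ops = generate_ops(mix, 1000)
    print(name, {op: ops.count(op) for op in mix})
```

A real YCSB client also controls which keys are requested (for example, favoring recently inserted records in the read-latest workload); this sketch covers only the operation mix.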
YCSB BENCHMARK…
CONCLUSIONS AND FUTURE WORK…
• NoSQL systems are popular alternatives to RDBMSs.
• Compared using the TPC-H benchmark and the YCSB benchmark.
• Our results show that the relational systems continue to provide a significant performance advantage over their NoSQL counterparts, but the NoSQL alternatives are competitive in some cases.
• Expand SQL and NoSQL systems and revisit the performance differences in a few years.