SQL and NoSQL in SQL Server

SQL and NoSQL in the Context of SQL Server, Michael Rys, Program Manager, Microsoft Corp., @SQLServerMike


SQL Saturday 109 presentation on NoSQL paradigms in the context of SQL Server

Transcript of SQL and NoSQL in SQL Server

1. SQL and NoSQL in the Context of SQL Server

2. Key Session Takeaways
- Scaling your business is important
- What the NoSQL paradigms are
- You can use NoSQL paradigms with SQL Server and SQL Azure
- We are working on moving the paradigms into SQL Server

3. The Web 2.0 Business Architecture
Diagram: an online business and the applications around it.
- Attract individual consumers: provide interesting service, provide mobility, provide social
- Monetize the individual: upsell service (VIP, speed, extra capabilities)
- Monetize the social: improve the individual experience, re-sell aggregate data to applications (e.g., advertisers)

4. Social Networking: the Business Problem
- 100s of millions of users
- 10s of millions of users concurrently
- Terabytes to petabytes of data, structured and unstructured
- Required (eventual) data consistency across users, e.g., show your updated state on your friends' profile pages

5. Solution
- Shard/partition user data across hundreds to thousands of SQL databases
- Propagate data changes using a reliable, async message service
- No global transactions! They hinder scale and availability!
- Provide a caching layer for performance
- Also used for: cleaning up state (e.g., on account close) and deploying business logic (stored procedures)

6. Example Architecture (MySpace.com)
Diagram: a web tier in front of a data tier of databases sharded by user ID range (1-1000, 1001-2000, 2001-3000, 3001-4000, 4001-5000, 5001-6000). When a user (userId=1024) changes their status, their own database is updated in a local transaction (TX1), and a message service dispatcher sends async messages that apply the change on the other shards in separate transactions (TX2 to TX5).

7. Many Large-Scale Customers Using Similar Patterns
Patterns:
- Sharding and reliable messaging
- Sharding and a fan-out query layer
- Caching layer
Customer examples:
- Social networking: Facebook, MySpace, etc.
- Online electronic stores (cannot give names)
- Travel reservation systems (e.g., Choice International)
- MSN Casual Gaming
- etc.
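Slides 5 and 6 describe the core pattern: each user's data lives on one shard chosen by user ID range, a status change is committed locally on that shard, and a reliable async message service carries it to the friends' shards, with no global transaction. The minimal in-memory Python sketch below illustrates that flow. The 1,000-users-per-shard ranges mirror the MySpace diagram; the names `ShardedStore`, `shard_for`, and `deliver_pending` are hypothetical, and `queue.Queue` only stands in for a durable message service such as Service Broker.

```python
import queue
from collections import defaultdict

USERS_PER_SHARD = 1000  # assumption: ranges 1-1000, 1001-2000, ... as in the diagram


def shard_for(user_id: int) -> int:
    """Return the index of the shard that owns user_id (user IDs start at 1)."""
    return (user_id - 1) // USERS_PER_SHARD


class Shard:
    """Stand-in for one SQL database holding a contiguous range of users."""

    def __init__(self) -> None:
        self.status = {}  # user_id -> that user's own status
        # user_id -> {friend_id: friend's latest status}, as shown on the profile page
        self.friend_feed = defaultdict(dict)


class ShardedStore:
    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard() for _ in range(num_shards)]
        # Hypothetical stand-in for a durable, reliable async message service.
        self.messages = queue.Queue()

    def update_status(self, user_id, status, friends):
        # Local transaction: only the owner's shard is written synchronously.
        self.shards[shard_for(user_id)].status[user_id] = status
        # No global transaction: queue one change message per friend instead.
        for friend_id in friends:
            self.messages.put((friend_id, user_id, status))

    def deliver_pending(self):
        """Dispatcher: apply each queued change on the friend's shard; return the count."""
        delivered = 0
        while not self.messages.empty():
            friend_id, user_id, status = self.messages.get()
            # Idempotent overwrite, so redelivering a message is harmless.
            self.shards[shard_for(friend_id)].friend_feed[friend_id][user_id] = status
            delivered += 1
        return delivered

    def feed(self, user_id):
        """What user_id's profile page currently shows for their friends."""
        return dict(self.shards[shard_for(user_id)].friend_feed[user_id])


store = ShardedStore(num_shards=6)
store.update_status(1024, "at SQL Saturday", friends=[17, 2500, 5999])
print(store.feed(2500))         # {} : the change has not been delivered yet
print(store.deliver_pending())  # 3
print(store.feed(2500))         # {1024: 'at SQL Saturday'}
```

Between `update_status` and `deliver_pending`, the friends' shards still show the old state, which is the (eventual) consistency slide 4 asks for. Making each delivery a plain overwrite keeps redelivery safe under at-least-once messaging.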
8. Lessons Learned from these Scenarios
- Require high availability
- Be able to scale out: functional and data partitioning architecture; provide scale-out processing
- Be able to deal with failures
- Be able to quickly grow and change: elastic scale; flexible, open schema; multi-version schema support
Move better support for these patterns into the data platform!

9. What is NoSQL about?
NoSQL = operational and developer agility at low CapEx and OpEx!
- Low cost: free software and support; scale CapEx cost below the customer growth rate
- Web-friendly developer model and tool chain; easy to use
- Processing paradigms: high availability; data and processing scale-out; performance; tunable/eventual consistency
- Data model paradigms: data first (flexible schema); low impedance mismatch between programming and data model
From devices, over OLTP Web 2.0 applications, to Big Data analytics

10. Data Models
Data model: example stores
- Simple key-value pairs: Memcache, Redis, Dynamo, Voldemort, LevelDB, Azure Caching
- Wide sparse column sets: Hypertable, Bigtable, Cassandra, HBase, Hyperbase, Amazon DynamoDB, Windows Azure Tables, SQL Server/Azure sparse columns
- BLOBs: Amazon S3, Oracle Berkeley NoSQL, Windows Azure Blob Store, SQL Server RBS/FileTable
- JSON documents: MongoDB, Couchbase, Riak, RavenDB
- Graph: Neo4j, GraphDB, HypergraphDB, Stig, Intellidimension
- Objects and XML documents: Versant, Oracle Berkeley NoSQL, MarkLogic, eXist-db, EMC HiveDB, SQL Server/Azure, Oracle, IBM DB2
- Extended relational: Oracle, EMC SQLFire, IBM DB2, MySQL, Postgres, SQL Server/Azure/Parallel DW

11. Operational Agility
You want:
- Availability of service (scalability)
- Global consistency
- Network partition tolerance
You can only get 2 of 3 (CAP theorem).
In the brave new world:
- Online businesses need availability
- It is distributed because it is big, so network partitioning is unavoidable
- Hence global consistency must be relaxed: BASE vs. ACID
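Slide 9 lists tunable/eventual consistency among the NoSQL processing paradigms, and slide 11 argues that global consistency is what gets relaxed. One common way stores such as Dynamo expose that knob is quorum replication over N replicas with read quorum R and write quorum W. The slides do not prescribe this technique, so the Python sketch below is only an illustration under simplifying assumptions: a single global version counter instead of vector clocks or timestamps, writes that reach only the first W replicas, and reads that contact the last R replicas (the worst case for overlap). `QuorumStore` and `Versioned` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    version: int  # simplification: one global counter, not vector clocks
    value: str


class QuorumStore:
    """N replicas; a write returns after W replicas have it, a read asks R replicas."""

    def __init__(self, n: int, r: int, w: int) -> None:
        if not (1 <= r <= n and 1 <= w <= n):
            raise ValueError("need 1 <= R <= N and 1 <= W <= N")
        self.n, self.r, self.w = n, r, w
        self.replicas = [{} for _ in range(n)]
        self.clock = 0

    @property
    def overlapping(self) -> bool:
        # R + W > N means every read quorum shares a replica with every write quorum.
        return self.r + self.w > self.n

    def put(self, key, value):
        self.clock += 1
        # Assumed worst case: the write reaches only the first W replicas;
        # background repair of the remaining replicas is not modeled.
        for replica in self.replicas[: self.w]:
            replica[key] = Versioned(self.clock, value)

    def get(self, key):
        # Assumed worst case: the read contacts the last R replicas.
        answers = [rep[key] for rep in self.replicas[self.n - self.r :] if key in rep]
        if not answers:
            return None  # no contacted replica has seen the key yet
        return max(answers, key=lambda v: v.version).value


strong = QuorumStore(n=3, r=2, w=2)
strong.put("status", "v1")
strong.put("status", "v2")
print(strong.overlapping, strong.get("status"))  # True v2

fast = QuorumStore(n=3, r=1, w=1)
fast.put("status", "v1")
print(fast.overlapping, fast.get("status"))  # False None
```

With N=3, R=2, W=2 every read overlaps the latest write and returns the newest value; with R=1, W=1 a read can miss the write entirely, trading consistency for latency and availability.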
12. BASE vs. ACID Consistency
- ACID: Atomicity, Consistency, Isolation, Durability. Full serializability provides all four. Distributed transactions providing all four limit service availability, throughput, and scalability.
- BASE: Basically Available, Soft state, Eventual consistency. Relaxes the ACID properties to increase availability, throughput, and scalability.
Diagram: a primary and its replicas. Primary-replica consistency impacts recoverability; cross-node consistency impacts the globally consistent view of the world.

13. Operational Agility
- Performance and scale
- Automate the management lifecycle (or fail)
- Simple deployment lifecycle
- No DB or OS admin telling me what to do

14. Developer Agility
- Code first and revise quickly: application model first (before the database)
- Flexible, open data models: you don't know exactly what you are looking for
- Lower pain of adoption and maintenance
- No DB or OS admin telling me what to do

15. NoSQL and Big Data: Two Sides of the Same Coin
- Big Data: originated in large-scale unstructured data processing (sensor data, scientific research, web stream analysis); analytics focused (new OLAP, Map-Reduce, Hadoop); scale-out data and processing paradigm at low cost
- NoSQL: originated in developing agile, scalable web applications; focused on real-time customer transactions (new OLTP); scale-out data and processing paradigm with a flexible data model at low cost
Both use many of the same paradigms.

16. The Web 2.0 Business Architecture (same as slide 3)
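Slide 12's primary/replica diagram separates two costs of relaxing ACID: replica lag affects recoverability, and cross-node lag affects the globally consistent view. The Python sketch below is a hypothetical model, not a description of any SQL Server or SQL Azure replication feature: the primary acknowledges a write once it is committed locally, and a replica applies the primary's change log asynchronously and may lag behind.

```python
class Primary:
    """Commits locally and acknowledges before any replica has the change."""

    def __init__(self) -> None:
        self.data = {}
        self.log = []  # ordered change records; position i holds change number i + 1

    def write(self, key, value) -> int:
        self.data[key] = value
        self.log.append((key, value))
        return len(self.log)  # change number acknowledged to the client


class Replica:
    """Applies the primary's log asynchronously, possibly lagging behind."""

    def __init__(self) -> None:
        self.data = {}
        self.applied = 0  # how many log records have been applied so far

    def catch_up(self, primary: Primary, max_records=None) -> None:
        pending = primary.log[self.applied:]
        if max_records is not None:
            pending = pending[:max_records]  # simulate partial log shipping
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

    def lag(self, primary: Primary) -> int:
        return len(primary.log) - self.applied


primary, replica = Primary(), Replica()
primary.write("status", "a")
primary.write("status", "b")
print(replica.data.get("status"), replica.lag(primary))  # None 2
replica.catch_up(primary, max_records=1)
print(replica.data["status"], replica.lag(primary))  # a 1
replica.catch_up(primary)
print(replica.data["status"], replica.lag(primary))  # b 0
```

While `lag` is non-zero, readers of the replica see stale values (soft state). If the primary were lost at that moment, the unshipped log records would not survive a failover to the replica, which is the recoverability impact the slide names.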
17. Scale-Out Data Platform Architecture
Diagram: the data is split into shards, each with a primary copy and several readable replicas, serving three kinds of workload:
- OLTP workloads: highly available, high scale, high flexibility; mostly touching one to a low number of shards
- Traditional OLAP workloads: known schema; data warehouse, star joins
- Dynamic OLAP workloads: the 3Vs (Volume, Velocity, Variety); exploratory; scale-out (shard) queries, often using eventually consistent scale-out frameworks like Hadoop

18. What does SQL Server provide today?
- Scale programming models:
  - Service Broker provides a functional, service-oriented architecture and scale-out on demand; its async reliable messaging provides for true eventual consistency
  - SQL Azure Federations provides sharding support
  - Distributed queries
  - SQL Server Parallel Data Warehouse
- Programmer agility:
  - XML and XQuery for XML documents
  - FileTable for documents (but what is the equivalent solution in the cloud?)
  - Open schema: sparse columns and column sets (but still schema first)
  - CLR extensibility, but no indexing and bad cost models; difficult to deploy (and DB admins often do not allow it!)
- Failure resilience: SQL Azure has local automatic HA and self-healing
- Rich services: semantic extraction and similarity search in SQL Server 2012
- DB/OS admin interference: SQL Azure is self-maintaining and self-provisioning

19. Introducing SQL Azure Federations
- Provides data partitioning/sharding at the data platform
- Enables building elastic scale-out applications
- Provides non-blocking SPLIT/DROP for shards (MERGE to come later)
- Auto-connects to the right shard based on the sharding key value
- Provides a SPLIT-resilient query mode
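Slide 19 (with the concepts on slide 20) describes range-based federation members that can be split, connections routed by federation key value, and, per slides 7 and 21, fan-out queries across members. The in-memory Python sketch below models those ideas with a sorted list of range lower bounds. `Federation`, `split_at`, `lookup`, and `fan_out` are hypothetical names, not the SQL Azure API (which used T-SQL statements such as `ALTER FEDERATION ... SPLIT AT`), and this split simply copies rows instead of running online.

```python
import bisect


class Federation:
    """Range-partitioned shard map: member i owns keys in [lows[i], lows[i + 1])."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.lows = [float("-inf")]  # one member initially covers [min, max)
        self.members = [{}]          # member: federation key -> rows (one atomic unit)

    def _index(self, key) -> int:
        # Route a key to the member whose range contains it.
        return bisect.bisect_right(self.lows, key) - 1

    def insert(self, key, row) -> None:
        # All rows with the same key go to the same member and the same entry.
        self.members[self._index(key)].setdefault(key, []).append(row)

    def lookup(self, key) -> list:
        # Like a filtered connection: touch only the member and atomic unit for key.
        return list(self.members[self._index(key)].get(key, []))

    def split_at(self, boundary) -> None:
        """Split the member containing boundary into [low, boundary) and [boundary, high)."""
        i = self._index(boundary)
        if self.lows[i] == boundary:
            raise ValueError(f"{boundary} is already a member boundary")
        old = self.members[i]
        left = {k: rows for k, rows in old.items() if k < boundary}
        right = {k: rows for k, rows in old.items() if k >= boundary}
        self.members[i : i + 1] = [left, right]
        self.lows.insert(i + 1, boundary)

    def ranges(self) -> list:
        highs = self.lows[1:] + [float("inf")]
        return list(zip(self.lows, highs))

    def fan_out(self, map_fn, reduce_fn, initial):
        """Run map_fn on every member (in parallel in a real system), then reduce."""
        result = initial
        for partial in (map_fn(member) for member in self.members):
            result = reduce_fn(result, partial)
        return result


orders = Federation("Orders_Fed")  # federation key: CustomerID
for customer_id, amount in [(5, 10), (25, 20), (105, 5), (235, 7), (555, 1), (5, 3)]:
    orders.insert(customer_id, {"amount": amount})
orders.split_at(100)
orders.split_at(488)
print(orders.ranges())   # [(-inf, 100), (100, 488), (488, inf)]
print(orders.lookup(5))  # [{'amount': 10}, {'amount': 3}]
total = orders.fan_out(
    lambda member: sum(r["amount"] for rows in member.values() for r in rows),
    lambda acc, part: acc + part,
    0,
)
print(total)  # 46
```

Splitting at 100 and then at 488 reproduces the three member ranges drawn on slide 20. Because `insert` keeps every row for one key in a single dictionary entry, a split can never separate an atomic unit.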
20. SQL Azure Federation Concepts
- Federation: represents the data being sharded
- Federation root: the database that logically houses federations; contains federation metadata (federation directories, users, federation distributions)
- Federation key: the value that determines the routing of a piece of data (defines a federation distribution)
- Federation member (aka shard): a physical container for a set of federated tables for a specific key range, plus reference tables
- Atomic unit: all rows with the same federation key value; always kept together!
- Federated table: a table that contains only atomic units for the member's key range
- Reference table: a non-sharded table
Diagram: an application connects through a connection gateway to an Azure DB with a federation root and the federation Orders_Fed (federation key: CustomerID), whose three members cover PK [min, 100), [100, 488), and [488, max), each holding atomic units (e.g., PK=5, 25, and 35 in the first member).

21. Demo: Map-Reduce scale-out over SQL Azure Federations

22. SQL Azure: A Not Only SQL Data Platform
SQL Azure adds support for NoSQL paradigms in the data platform:
- No CapEx, low OpEx (which should/will be even lower)
- High availability (each DB has two replicas)
- Sharding support with federations: the data platform provides online SPLIT/DROP; filtered connections provide a split-resilient programming model
- Flexible data models: XML support; sparse columns/column sets
More to come in the future:
- More scale and tunable HA (to support the OLTP/OLAP model)
- Taking federations further (orthogonality, merge, fan-out)
- Integration with the Hadoop ecosystem
- More data first (data-driven column sets, JSON)

23. Call to Action
- Download the presentation from: http://www.slideshare.net/MichaelRys/presentations
- Fill out the SQL Azure Federation survey: http://connect.microsoft.com/BusinessPlatform/Survey/Survey.aspx?SurveyID=13625
24. Related Content
Related whitepapers and presentations:
- CACM: Scalable SQL: http://cacm.acm.org/magazines/2011/6/108663-scalable-sql
- NoSQL and the Windows Azure Platform: http://download.microsoft.com/download/9/E/9/9E9F240D-0EB6-472E-B4DE-6D9FCBB505DD/Windows%20Azure%20No%20SQL%20White%20Paper.pdf
- SQL Federation blog: http://blogs.msdn.com/b/cbiyikoglu/archive/2011/03/03/nosql-genes-in-sql-azure-federations.aspx
- Windows Gaming Experience case study: http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=4000008310
- NoSQL presentations: http://www.slideshare.net/MichaelRys/presentations
Contact me: [email protected], @SQLServerMike, http://sqlblog.com/blogs/michael_rys/default.aspx