Database Choices
Lynn Langit
Jan 2014 – Startup Code Camp in the OC
Data Expertise / Lynn Langit
• Industry awards
– Microsoft – MVP for SQL Server
– Google – GDE for Cloud Platform
– 10Gen – Master for MongoDB
• Practicing Architect
• Technical author / trainer
– Pluralsight – Google Cloud Series
– DevelopMentor – SQL Server 2012 Series
– 2 books on SQL Server BI
– Cloudera trainer (certified)
• Former MSFT FTE – 4 years
Databases Now a Menu of Choices
Data Pipeline
Clean Existing
Acquire New
Process All
Store Some
Query & Mine
Is Big Data = NoSQL and just Hadoop?
HUGE Hype factor since 2011
Apache Hadoop
• a software framework that supports data-intensive distributed applications under a free license
• enables applications to work with thousands of nodes and petabytes of data
• was inspired by Google's MapReduce and Google File System (GFS) papers
Hadoop in the Enterprise
How you ‘get’ Hadoop
Open source
• roll your own
Commercial distribution
• Cloudera
• MapR
• Hortonworks
• More…
Rent it via the cloud
• AWS
• HDInsight
Demo – AWS MapReduce
Working with Hadoop
About Hadoop MapReduce
Image from - https://developers.google.com/appengine/docs/python/images/mapreduce_mapshuffle.png
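The map, shuffle, and reduce stages can be sketched in a few lines of plain Python. This is a single-process illustration of the idea only, not Hadoop's API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input
counts = word_count(["big data big hype", "big hadoop"])
print(counts)
```

In a real cluster the map and reduce calls run on many nodes in parallel and the shuffle moves data across the network; here everything runs in one process so the data flow is easy to follow.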
The On-Premises Hadoop Market Leader Is Cloudera
Example Comparison: RDBMS vs. Hadoop

                      Traditional RDBMS          Hadoop / MapReduce
Data Size             Gigabytes (Terabytes)      Petabytes and greater
Access                Interactive and Batch      Batch – NOT Interactive
Updates               Read / Write many times    Write once, Read many times
Structure             Static Schema              Dynamic Schema
Integrity             High (ACID)                Low
Scaling               Nonlinear                  Linear
Query Response Time   Can be near immediate      Has latency (due to batch processing)
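To make the "Integrity: High (ACID)" row concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for any RDBMS. The accounts table, the names, and the transfer helper are assumptions for illustration; the point is that a failed multi-statement update is rolled back as a unit.

```python
import sqlite3

# In-memory database standing in for a traditional RDBMS
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two rows atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit above was undone too
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer would overdraw the source account, so neither of its two updates is kept. Batch-oriented stores typically trade this guarantee away in exchange for linear scaling.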
“Small” BigData vs. “Big” BigData
[Diagram: Hadoop, NoSQL, and RDBMS, shown On Premises and In the Cloud]
But wait…
is there a relational database
that scales
that is cheap
that runs in the cloud?
DEMO - AWS Redshift
• About $1k per Terabyte per year - relational
Cloud-hosted NoSQL up to 50x CHEAPER
So many NoSQL options
• More than just the Elephant in the room
• 150+ types of NoSQL databases
Flavors of NoSQL
• Key/Value (Volatile)
• Key/Value (Persistent)
• Wide-Column
• Document
• Graph
Key / Value Database
• Just keys and values
– No schema
• Persistent or Volatile
• Examples
– AWS DynamoDB
– Riak
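The key/value idea (just keys and values, no schema, persistent or volatile) can be sketched with a small Python class. This is a toy, not the DynamoDB or Riak API; the KeyValueStore class, the JSON file format, and the sample keys are assumptions for illustration.

```python
import json
import os
import tempfile


class KeyValueStore:
    """Toy key/value store: keys map to opaque values, with no schema."""

    def __init__(self, path=None):
        self.path = path  # None means volatile (memory only)
        self.data = {}
        if path and os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value
        self._flush()

    def get(self, key, default=None):
        return self.data.get(key, default)

    def _flush(self):
        # Persistent mode rewrites the whole file on every put (toy approach)
        if self.path:
            with open(self.path, "w") as f:
                json.dump(self.data, f)


# Persistent: values survive re-opening the store
path = os.path.join(tempfile.mkdtemp(), "kv.json")
persistent = KeyValueStore(path)
persistent.put("user:1", {"name": "Ada"})
reopened = KeyValueStore(path)  # simulates a restart

# Volatile: values live only in this object's memory
volatile = KeyValueStore()
volatile.put("session:9", "abc")

print(reopened.get("user:1"), volatile.get("session:9"))
```

Note that the store never inspects the value: one key holds a dictionary, another a plain string. Lookups are by key only, which is what keeps this model simple to scale.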
DEMO - AWS DynamoDB
• Key/Value store on the AWS cloud
File (BLOB) Storage Buckets in the Cloud
• Amazon – S3 or Glacier
• Google – Cloud Storage
• Microsoft Azure BLOBS
DEMO - Battle of the Buckets
• Google Cloud Storage VS.
• Windows Azure BLOBS VS.
• AWS S3 (Archiving) into AWS Glacier
Column Database
• Wide, sparse column sets
• Schema-light
• Examples:
– HBase w/Hadoop
– Google Cloud Datastore
– SQL Server Columnstore Indexes or SSAS Tabular Models
Types of Column Databases
• Column-families
– Non-relational
– Sparse
– Examples:
  • HBase
  • Cassandra
  • xVelocity (SQL 2012 Tabular)
• Column-stores
– Relational
– Dense
– Example:
  • SQL Server 2012 – Columnstore index
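The difference between dense column-stores and sparse column-families can be sketched with plain Python structures. This only illustrates the data layout and is not how HBase, Cassandra, or SQL Server store data internally; all table contents and names below are made up for the example.

```python
# Row-oriented layout, as in a traditional table
rows = [
    {"id": 1, "region": "west", "sales": 120},
    {"id": 2, "region": "east", "sales": 80},
    {"id": 3, "region": "west", "sales": 200},
]

# Column-store (relational, dense): one array per column,
# and every row has a value in every column
columns = {name: [row[name] for row in rows] for name in rows[0]}
total_sales = sum(columns["sales"])  # an aggregate reads only one column

# Column-family (non-relational, sparse): each row key holds
# only the columns it actually has
column_family = {
    "row1": {"name": "Ada", "email": "ada@example.com"},
    "row2": {"name": "Lin"},
    "row3": {"name": "Sam", "twitter": "@sam"},
}
rows_with_email = [
    key for key, cols in column_family.items() if "email" in cols
]

print(total_sales, rows_with_email)
```

The dense layout favors scans and aggregates over a few columns, while the sparse layout lets each row carry a different set of columns without a fixed schema.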
DEMO – Google Cloud Datastore
DEMO – SQL Server ‘NoSQL’
• SQL Server 2012 Columnstore Index
• SQL Server 2012 Tabular Model (SSAS)
Document Database (MongoDB)
• document-oriented (collection of JSON documents) w/semi-structured data
– Encodings include BSON, JSON, XML…
• binary forms (PDF, Microsoft Office documents: Word, Excel…)
• Examples:
– MongoDB
– Couchbase
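A collection of JSON documents with semi-structured data can be sketched in plain Python. This is not the MongoDB or Couchbase API; the find helper, the field names, and the sample documents are assumptions for illustration.

```python
import json

# A "collection" is a list of documents; documents need not share fields
products = [
    {"_id": 1, "type": "book", "title": "Intro to NoSQL", "pages": 250},
    {"_id": 2, "type": "game", "title": "Cloud Quest", "platforms": ["web"]},
    {"_id": 3, "type": "book", "title": "Graphs at Work"},
]


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


books = find(products, type="book")
titles = [doc["title"] for doc in books]

# Documents serialize to and from JSON without losing structure
round_tripped = json.loads(json.dumps(products[1]))

print(titles, round_tripped["platforms"])
```

The third document has no "pages" field and the query still works, which is the practical meaning of semi-structured: the shape is carried by each document, not by a table definition.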
Demo - MongoDB
Graph Databases
• a lot of many-to-many relationships
• recursive self-joins
• when your primary objective is quickly finding connections, patterns and relationships between the objects within lots of data
• Examples:
– Neo4J
– Google Freebase
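Finding connections between objects is the kind of query that needs recursive self-joins in SQL but is a simple traversal over a graph. The sketch below uses a plain adjacency list and breadth-first search; it is not Neo4J's API or query language, and the follows data and shortest_path helper are assumptions for illustration.

```python
from collections import deque

# Who follows whom, stored as an adjacency list (made-up data)
follows = {
    "ann": ["bob", "cat"],
    "bob": ["dan"],
    "cat": ["dan"],
    "dan": ["eve"],
    "eve": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return the shortest path found, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


path = shortest_path(follows, "ann", "eve")
print(path)
```

Each hop in the traversal corresponds to one more self-join in a relational query, which is why deep relationship queries tend to favor a graph model.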
DEMO – Neo4J
“Small” BigData vs. “Big” BigData
[Diagram: Hadoop, Key/Value or Column, Document or Graph, and RDBMS, shown On Premise or In the Cloud]
Cloud-hosted RDBMS
• AWS RDS – SQL Server, MySQL, Oracle
– Medium cost
– Solid feature set, e.g. backup, snapshot
– Use existing tooling
• Google – MySQL
– Lowest cost
– Most limited RDBMS functionality
• Microsoft – SQL Azure
– Highest cost
DEMO - AWS RDS
• SQL Server, MySQL or Oracle
• Essential to understand pricing models
Image - http://blog.outsourcing-partners.com/wp-content/uploads/2012/10/performance.png
NoSQL Applied
• Social Games
• Product Catalogs
• Social aggregators
• Log Files
• Line-of-Business

• Columnstore – HBase
• Key/Value – DynamoDB
• Document – MongoDB
• Graph – Neo4j
• RDBMS – SQL Server
Cloud Offerings – RDBMS AND NoSQL

                                 AWS                                Google                                Microsoft
RDBMS                            RDS – all major                    MySQL                                 SQL Azure
NoSQL buckets                    S3 or Glacier                      Cloud Storage                         Azure Blobs
NoSQL Key-Value                  DynamoDB                           Cloud Datastore                       Azure Tables
Streaming ML or (Mahout)         Custom EC2                         Prospective Search & Prediction API   StreamInsight
NoSQL Document or Graph          MongoDB on EC2                     Freebase                              MongoDB on Windows Azure
NoSQL – Column Hadoop (HBase)    Elastic MapReduce using S3 & EC2   none                                  HDInsight
Dremel/Warehousing               RedShift                           BigQuery                              none
But wait…how do I query NoSQL data?
Always MapReduce?
Can Excel help?
Connector to Hadoop
Data Explorer
Data Quality Services
Master Data Services
Integration with Azure Data Market
Visualize with PowerView
Data Mining w/Predixion
Demo - Hadoop Connector to Excel
Other types of cloud data services
Hosting public datasets
• Pay to read
• Earn revenue by offering for read
Cleaning / matching (your) data
• ETL – Microsoft Data Explorer, Google Refine
• Data Quality – Windows Azure Data Market, InfoChimps, DataMarket.com
Collecting for “BigData”
• Sensors everywhere
• Structured, Semi-structured, Unstructured vs. Data Standards
• M2M
• Public Datasets
– Freebase
– Azure DataMarket
– Hilary Mason’s list
NoSQL To-Do List
Understand types of NoSQL databases
• Use NoSQL when business needs dictate
• Use the right type of NoSQL for your business problem
Try out NoSQL on the cloud
• Quick and cheap for behavioral data
• Mashup cloud datasets
• Good for specialized use cases, e.g. dev, test, training environments
Learn NoSQL access technologies & services
• New query languages, e.g. MapReduce, R, Infer.NET
• New query tools (vendor-specific) – Google Refine, Amazon Karmasphere, Microsoft Excel connectors, etc…
• Windows Azure Data Market, other public data markets
www.TeachingKidsProgramming.org
• Free Courseware (Java, Small Basic or C# [on Pluralsight])
• Do a Recipe Teach a Kid (Ages 10 ++)
Keep Learning
• Twitter: @LynnLangit
• YouTube: http://www.youtube.com/user/SoCalDevGal
• Hire me
– To help build your BI/Big Data solution
– To teach your team next gen BI
– To learn more about using NoSQL solutions