SQL to NoSQL: Top 6 Questions

Mike Broberg, Marketing Communications, Cloudant, IBM Cloud Data Services
Ryan Millay, Field Engineer, Cloudant, IBM Cloud Data Services

Transcript of SQL to NoSQL: Top 6 Questions

Page 1: SQL to NoSQL: Top 6 Questions

SQL to NoSQL: Top 6 Questions

Mike Broberg, Marketing Communications, Cloudant, IBM Cloud Data Services

Ryan Millay, Field Engineer, Cloudant, IBM Cloud Data Services

Page 2: SQL to NoSQL: Top 6 Questions

Agenda

• Top 6 Questions When Moving to NoSQL

1. Why NoSQL?

a. What Is Cloudant?

2. Rows and Tables Become ... What?

3. Will I Have to Rebuild My App?

4. Each of My Tables Becomes a Different Type of JSON Document?

5. What if I Need Relationships? Can Cloudant Do JOINs?

6. Are There Tools That Make Migrating My Data to Cloudant Easier?

• Live Q&A

Page 3: SQL to NoSQL: Top 6 Questions

Housekeeping Notes


• Today’s webcast is being recorded. We will send you a link to the recording, a link to the library and its code examples, and a copy of the slide deck after the presentation.

• The webcast recording will be available on our website: https://cloudant.com

• If you would like to ask a question during today’s presentation, please type in your question using the GoToWebinar tool bar.

Page 4: SQL to NoSQL: Top 6 Questions

1. Why NoSQL?


Page 5: SQL to NoSQL: Top 6 Questions

But, What Is NoSQL, Really?

• Umbrella term for databases using non-SQL query languages

• Key-Value stores

• Wide column stores

• Document stores

• Graph stores

• Some also say "non-relational," because data is not decomposed into separate tables, rows, and columns

• As we’ll see, it’s still possible to represent relationships in NoSQL

• The question is, are these relationships always necessary?

Page 6: SQL to NoSQL: Top 6 Questions

Today's NoSQL Focus: Document Stores


• That's databases like MongoDB, Apache CouchDB™, Cloudant, and MarkLogic

• Optimized for "semi-structured" or "schema-optional" data

• People say "unstructured," but that's inaccurate

• Each document has its own structure

Page 7: SQL to NoSQL: Top 6 Questions

Schema Flexibility


• Cloudant uses JavaScript Object Notation (JSON) as its data format

• Cloudant is based on Apache CouchDB. In both systems, a "database" is simply a collection of JSON documents

{
  "docs": [
    {
      "_id": "df8cecd9809662d08eb853989a5ca2f2",
      "_rev": "1-8522c9a1d9570566d96b7f7171623270",
      "Movie_runtime": 162,
      "Movie_rating": "PG-13",
      "Person_name": "Zoe Saldana",
      "Actor_actor_id": "0757855",
      "Movie_genre": "AVYS",
      "Movie_name": "Avatar",
      "Actor_movie_id": "0499549",
      "Movie_earnings_rank": "1",
      "Person_pob": "New Jersey, USA",
      "Person_id": "0757855",
      "Movie_id": "0499549",
      "Movie_year": 2009,
      "Person_dob": "1978-06-19"
    }
  ]
}
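To make the "collection of JSON documents" idea concrete, here is a minimal Python sketch. The list named `database`, the `_id` values, and the second document are illustrative assumptions and not part of the slide. The only point is that two documents in the same database need not share fields.

```python
import json

# A "database" modeled as a plain list of JSON-style documents.
# This is an in-memory stand-in, not a real Cloudant connection.
database = [
    {"_id": "movie:0499549", "Movie_name": "Avatar", "Movie_year": 2009},
    # Hypothetical second document with a different set of fields.
    {"_id": "person:0757855", "Person_name": "Zoe Saldana",
     "Person_dob": "1978-06-19"},
]


def field_names(doc):
    """Return the sorted field names a document carries."""
    return sorted(doc)


# Each document carries its own structure; nothing forces a shared schema.
for doc in database:
    print(doc["_id"], field_names(doc))

# The whole collection still serializes to valid JSON.
payload = json.dumps({"docs": database}, indent=2)
```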

Page 8: SQL to NoSQL: Top 6 Questions

Horizontal Scaling


• Many commodity servers vs. few expensive ones

• Cost scales linearly with capacity, not exponentially

Page 9: SQL to NoSQL: Top 6 Questions

Master-Master Replication


Or "masterless replica architecture"

• Replicate data widely to mitigate disasters

• No single point of failure

• Minimize latency by putting data close to users

• Cloudant excels at data movement
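As a rough illustration of masterless replication, the sketch below builds the JSON bodies for two one-way replications, one in each direction. In CouchDB-style systems a body like this is POSTed to a `_replicate` endpoint. The `example.com` URLs and the helper names are assumptions made for the example, and nothing is actually sent over the network.

```python
import json


def replication_request(source_url, target_url, continuous=True):
    """Build the JSON body for one direction of replication."""
    return {"source": source_url, "target": target_url,
            "continuous": continuous}


def bidirectional(url_a, url_b):
    """Master-master sync is two one-way replications, one per direction."""
    return [replication_request(url_a, url_b),
            replication_request(url_b, url_a)]


# Hypothetical database URLs in two regions.
east = "https://east.example.com/appdata"
west = "https://west.example.com/appdata"

for body in bidirectional(east, west):
    # Each body would be POSTed to the /_replicate endpoint of a cluster.
    print(json.dumps(body))
```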

Page 10: SQL to NoSQL: Top 6 Questions

The Cloudant Data Layer


• Distributed NoSQL data persistence layer

• Available as a fully-managed DBaaS, or managed by you on-premises

• Transactional JSON document database with REST API

• Spreads data across data centers & devices for scale & high availability

• Ideal for apps that require:

• Massive, elastic scalability

• High availability

• Geo-location services

• Full-text search

• Offline-first design for occasionally connected users

Page 11: SQL to NoSQL: Top 6 Questions

Not One DB Server; a Cluster of Servers

• A Cloudant cluster

• Horizontal scale

• Redundant load balancers backed by multiple DB servers

• Designed for durability

• Saves multiple copies of data

• Spreads copies across cluster

• All replicas do reads & writes

• Access Cloudant over the Web

• Developers get an API

• Cloudant manages it all behind the scenes

[Diagram: load balancers lb1 and lb2 (failover) in front of database nodes db1, db2, and db3. Labels: HAProxy, NGINX, Cloudant Dashboard]

Page 12: SQL to NoSQL: Top 6 Questions

Bringing OSS and Custom Technology Together

• Operational Tooling: Reshard / Rebalance

• Monitoring: Built-in monitoring and system collection

• CouchDB 2.0: JSON storage, API, Replication

• Lucene: Text indexing & Search

• HAProxy: Load Balancing

• GeoJSON: Geospatial indexing & query

• Cloudant Query: Declarative Lang.

• Diagnostics: Tooling for diagnosing common issues with clusters

Using Apache CouchDB™ 2.0 as one of the core components and wrapping additional features and operational expertise

Page 13: SQL to NoSQL: Top 6 Questions

2. Rows and Tables Become ... What?


Page 14: SQL to NoSQL: Top 6 Questions

... This!

SQL Terms/Concepts --> Document Store Terms/Concepts

database --> database

table --> bunch of documents

row --> document

column --> field

materialized view --> index/database view/secondary index

primary key --> "_id":

table JOIN operations --> entity relations

Page 15: SQL to NoSQL: Top 6 Questions

Rows --> Documents


• Use some field to group documents by schema

• Example: "type":"user" or "type":"edge:follower"

• Don't worry. We'll return to this example later on
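A minimal sketch of the row-to-document step, assuming each row arrives as a Python dict. The function name, the `table:key` style of `_id`, and the sample values are illustrative assumptions. The slide only says that some field, such as "type", groups documents by schema.

```python
def row_to_document(table, primary_key, row):
    """Turn one relational row (a dict) into a JSON-style document.

    The "type" field records which table the row came from, and the
    "_id" is built from the table name and the primary-key value.
    Both naming choices are illustrative assumptions.
    """
    doc = {column: value for column, value in row.items()
           if column != primary_key}
    doc["_id"] = f"{table}:{row[primary_key]}"
    doc["type"] = table
    return doc


# Hypothetical row from a "user" table, with made-up values.
user_row = {"username": "kocolosk", "joined": 2009}

doc = row_to_document("user", "username", user_row)
print(doc)
```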

Page 16: SQL to NoSQL: Top 6 Questions

Tables --> Databases


• Put all tables in one database; use "type": to distinguish

• Model entity relationships with secondary indexes

• More on this later in the webinar

• Can't wait? We're talking about concepts described in the CouchDB documentation on entity relationships

• http://wiki.apache.org/couchdb/EntityRelationship

Page 17: SQL to NoSQL: Top 6 Questions

Indexes and Queries

• An "index" in Cloudant is not strictly a performance optimization

• Instead, more akin to "materialized view" in RDBMS terms

• Index also called a "database view" in Cloudant

• Index, then query

• You need one before you can do the other

• Create index, then query by URL

• Can create a secondary index on any field within a document

• You get primary index (based on reserved "_id": field) by default

• Indexes precomputed, updated in real time

• Indexes are updated using incremental MapReduce

• You don't need to rebuild the entire index every time a document is changed, added, or deleted

• Performant at big-honkin' scale
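The "incremental" part can be sketched with a toy index that stores the rows each document contributed, so a change to one document re-runs the map function for that document only. This is a simplified in-memory model with hypothetical names. The real view engine keeps rows sorted on disk and also supports reduce functions, which this sketch leaves out.

```python
class IncrementalIndex:
    """Toy secondary index that is updated per document, never rebuilt."""

    def __init__(self, map_fn):
        self.map_fn = map_fn  # doc -> iterable of (key, value) pairs
        self.rows = {}        # doc _id -> rows emitted for that doc

    def update(self, doc):
        """Re-run the map function for one added or changed document."""
        self.rows[doc["_id"]] = list(self.map_fn(doc))

    def delete(self, doc_id):
        """Drop only the rows contributed by the deleted document."""
        self.rows.pop(doc_id, None)

    def query(self, key):
        """Return (doc _id, value) pairs for a key, sorted by doc _id."""
        hits = [(doc_id, value)
                for doc_id, emitted in self.rows.items()
                for k, value in emitted if k == key]
        return sorted(hits, key=lambda hit: hit[0])


def by_type(doc):
    """Map function: index every document under its "type" field."""
    if "type" in doc:
        yield doc["type"], None


index = IncrementalIndex(by_type)
index.update({"_id": "user:a", "type": "user"})
index.update({"_id": "user:b", "type": "user"})
index.update({"_id": "user:a", "type": "admin"})  # only user:a is re-mapped
print(index.query("user"))  # prints [('user:b', None)]
```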

Page 18: SQL to NoSQL: Top 6 Questions

Aside: One Cloudant DB, Many Indexes


• Cloudant comes with several different indexing & query systems

• Cloudant Query: declarative query system

• Borrows syntax from MongoDB, but applied to Cloudant's REST API

• Incremental MapReduce view engine: traditional CouchDB approach

• Efficient range queries at large scale. Useful for aggregate functions/light analytics on operational data

• Cloudant Search: full-text indexing via Apache Lucene™

• Cloudant Geospatial: proprietary tech for GeoJSON spec

• Beyond bounding box with custom polygons, predictive path, etc.

• All out-of-the-box in Cloudant. No added integration or separate systems to maintain
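For a feel of the declarative option, the sketch below builds the two JSON bodies used with Cloudant Query: one that defines an index and one that queries with a MongoDB-style selector. These are normally POSTed to a database's `_index` and `_find` endpoints. The helper names and the indexed "type" field are assumptions for illustration, and no request is sent.

```python
import json


def index_definition(fields):
    """Body for creating a Cloudant Query index on the given fields."""
    return {"index": {"fields": list(fields)}, "type": "json"}


def find_request(selector, fields=None, limit=None):
    """Body for a declarative query with a MongoDB-style selector."""
    body = {"selector": selector}
    if fields is not None:
        body["fields"] = list(fields)
    if limit is not None:
        body["limit"] = limit
    return body


# Index first, then query. Both bodies travel as JSON over the REST API.
create_index = index_definition(["type"])
query = find_request({"type": "user"}, fields=["_id"], limit=10)
print(json.dumps(create_index))
print(json.dumps(query))
```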

Page 19: SQL to NoSQL: Top 6 Questions

3. Will I Have to Rebuild My App?


Page 20: SQL to NoSQL: Top 6 Questions

Yes


By ripping out the bad parts:

• Extract, Transform, Load

• Schema migrations

• JOINs that don't scale

Page 21: SQL to NoSQL: Top 6 Questions

Scale Whale

• A little more work up-front, but your application will adapt to scale much better


Page 22: SQL to NoSQL: Top 6 Questions

4. Each of My Tables Becomes a Different Type of JSON Document?


Page 23: SQL to NoSQL: Top 6 Questions

No

• Fancy explanation:

• Best practice is to denormalize data that an RDBMS would keep in 3rd normal form

• Or, less fancy:

• Smoosh relationships for each entry all together into one JSON doc

• Denormalization

• Approach to data modeling that shards well and scales well

• Works well with data that is somewhat static, or infrequently updated
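The "smooshing" can be sketched as merging related rows into one document, with each column prefixed by its table name. That is the naming pattern visible in the sample document earlier in the deck (Movie_name, Person_name, Actor_movie_id). The three-table layout and the helper names below are assumptions inferred from those field names.

```python
def prefixed(table, row):
    """Prefix every column with its table name, e.g. Movie_name."""
    return {f"{table}_{column}": value for column, value in row.items()}


def denormalize(movie, person, actor):
    """Merge one Actor link row and the rows it references into one doc."""
    doc = {}
    doc.update(prefixed("Movie", movie))
    doc.update(prefixed("Person", person))
    doc.update(prefixed("Actor", actor))
    return doc


# Normalized rows, using values from the sample document in the deck.
movie = {"id": "0499549", "name": "Avatar", "year": 2009}
person = {"id": "0757855", "name": "Zoe Saldana"}
actor = {"actor_id": "0757855", "movie_id": "0499549"}

doc = denormalize(movie, person, actor)
print(doc)
```

The trade-off matches the bullets above: the movie's fields are repeated in every actor document, which is cheap to read and shard but costly to update if the movie data changes often.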


A smooshed and griddled cheese sandwich

Page 25: SQL to NoSQL: Top 6 Questions

5. What if I Need Relationships? Can Cloudant Do JOINs?

Page 26: SQL to NoSQL: Top 6 Questions

Yes ...


• Enter Cloudant "JOINs" via materialized views

Page 27: SQL to NoSQL: Top 6 Questions

... But First, What Not To Do

Relationships as single documents

http://www.sarahmei.com/blog/2013/11/11/why-you-should-never-use-mongodb/

Page 28: SQL to NoSQL: Top 6 Questions

Materialized View: Example


Each transaction an immutable entry ... + accumulator

Page 29: SQL to NoSQL: Top 6 Questions

Some "Key" Concepts


• Inject logic into "_id": field to enforce uniqueness

• Example: "_id":"<course>-<student>" ensures at most one document per course per student

• Give your documents a "type": field

• Add relations as separate "edge" documents

• Exploit powerful materialized view engine
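The first concept, putting logic into "_id": to enforce uniqueness, can be sketched with an in-memory stand-in for a database that refuses a second document with the same "_id":. The class, the exception, and the course and student names are hypothetical. A real database would signal the duplicate with a conflict response instead of a Python exception.

```python
class ConflictError(Exception):
    """Raised when a document with the same _id already exists."""


def enrollment_id(course, student):
    """Compose the "<course>-<student>" style _id from the slide."""
    return f"{course}-{student}"


class ToyDatabase:
    """In-memory stand-in that rejects duplicate _id values."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        if doc["_id"] in self.docs:
            raise ConflictError(doc["_id"])
        self.docs[doc["_id"]] = doc


db = ToyDatabase()
db.insert({"_id": enrollment_id("cs101", "alice"), "type": "enrollment"})
try:
    # A second enrollment for the same course and student collides.
    db.insert({"_id": enrollment_id("cs101", "alice"), "type": "enrollment"})
except ConflictError as err:
    print("duplicate rejected:", err)
```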

Page 30: SQL to NoSQL: Top 6 Questions

Let's See One in Action


https://webinar.cloudant.com/relational

Page 31: SQL to NoSQL: Top 6 Questions

Preview: Defining an Index/View


• This design document (built in Cloudant Web dashboard) encapsulates everything that follows

• It builds our secondary index/database view, which we will soon query

• It's the incremental MapReduce view engine we cited earlier

• https://webinar.cloudant.com/relational/_design/join

Page 32: SQL to NoSQL: Top 6 Questions

Sample Related Data: Twitter


User documents flexible & straightforward

Page 33: SQL to NoSQL: Top 6 Questions

How Do We Deal With Followers?


a. Update each user document with a list

b. Create relation documents and "join"
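The two options can be compared side by side in a short sketch. The field names `follows` and `user`, and the user "user:alice", are assumptions for illustration. Only the "type":"edge:follower" value comes from the deck.

```python
def follow_with_list(user_doc, followed_id):
    """Option (a): rewrite the user document with a longer list."""
    updated = dict(user_doc)
    updated["follows"] = list(user_doc.get("follows", [])) + [followed_id]
    return updated


def follow_with_edge(follower_id, followed_id):
    """Option (b): create a new, separate relation ("edge") document.

    No "_id" is set here, on the assumption that the database
    generates one when the document is stored.
    """
    return {"type": "edge:follower",
            "user": follower_id,
            "follows": followed_id}


kocolosk = {"_id": "user:kocolosk", "type": "user"}

as_list = follow_with_list(kocolosk, "user:alice")
as_edge = follow_with_edge("user:kocolosk", "user:alice")
print(as_list)
print(as_edge)
```

With option (a) every new follow rewrites the same user document, which keeps growing. With option (b) each follow is an independent write, and the user document is left untouched.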

Page 34: SQL to NoSQL: Top 6 Questions

E.g., Follower Graph


Page 35: SQL to NoSQL: Top 6 Questions

Relationships as Documents


Page 36: SQL to NoSQL: Top 6 Questions

Goal: Materialize Users & Following List


"join" by selecting rows at lines 103–105

Page 37: SQL to NoSQL: Top 6 Questions

Index Sorting Rules


http://wiki.apache.org/couchdb/View_collation
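The sorting rules matter because view keys of different JSON types are ordered by type first: null, false, true, numbers, strings, arrays, then objects. The sketch below is a simplified model of that ordering. It compares strings by code point instead of the locale-aware rules the real engine uses, and it treats all objects as equal. The sample keys are hypothetical.

```python
from functools import cmp_to_key


def type_rank(value):
    """Rank JSON types in view collation order."""
    if value is None:
        return 0
    if value is False:
        return 1
    if value is True:
        return 2
    if isinstance(value, (int, float)):
        return 3
    if isinstance(value, str):
        return 4
    if isinstance(value, list):
        return 5
    return 6  # objects (dicts) sort last


def collate(a, b):
    """Compare two JSON values; a negative result means a sorts first."""
    rank_a, rank_b = type_rank(a), type_rank(b)
    if rank_a != rank_b:
        return rank_a - rank_b
    if isinstance(a, list):
        for x, y in zip(a, b):
            result = collate(x, y)
            if result:
                return result
        return len(a) - len(b)  # a shorter prefix sorts first
    if a is None or isinstance(a, (bool, dict)):
        return 0  # simplification: same-rank values treated as equal
    return (a > b) - (a < b)  # numbers and strings


keys = [
    ["user:kocolosk", {}],
    ["user:kocolosk", "user:alice"],
    ["user:kocolosk"],
]
ordered = sorted(keys, key=cmp_to_key(collate))
print(ordered)
```

Because an empty object sorts after any string, a key such as ["user:kocolosk", {}] works as an upper bound for every key that starts with "user:kocolosk".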

Page 38: SQL to NoSQL: Top 6 Questions

Materialize Users, With All Followed


Keys only, for now

Page 39: SQL to NoSQL: Top 6 Questions

Materialize Users, With All Followed

Keys + emitted values
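The view's map step can be imitated in Python to show how keys and emitted values line up. This is not the design document from the demo, which is not reproduced in this transcript. The edge field names `user` and `follows`, the document ids, and the helper names are all assumptions.

```python
def follows_map(doc):
    """Emit (key, value) rows in the style of a view map function."""
    if doc.get("type") == "user":
        # The user's own row: a one-element key sorts before longer keys.
        yield [doc["_id"]], None
    elif doc.get("type") == "edge:follower":
        # One row per followed user; the value points at that user's doc.
        yield [doc["user"], doc["follows"]], {"_id": doc["follows"]}


def build_view(docs):
    """Run the map over all docs and sort rows by key, as a view does."""
    rows = [(key, value, doc["_id"])
            for doc in docs
            for key, value in follows_map(doc)]
    return sorted(rows, key=lambda row: row[0])


docs = [
    {"_id": "user:kocolosk", "type": "user"},
    {"_id": "user:alice", "type": "user"},
    {"_id": "e1", "type": "edge:follower",
     "user": "user:kocolosk", "follows": "user:alice"},
]

for key, value, doc_id in build_view(docs):
    print(key, value, doc_id)
```

Sorting places each user's own row directly before the rows for the users they follow, which is what makes a single range query behave like a join.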

Page 40: SQL to NoSQL: Top 6 Questions

Let's Query That View

https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]

System-generated unique doc "_id":

Sort key

Pointer to related followed user's doc "_id":

Page 41: SQL to NoSQL: Top 6 Questions

Let's Query That View, and Follow Pointers

https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]&include_docs=true
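When building this URL in code, the key bounds need to be JSON-encoded and then URL-encoded. A minimal sketch using only the Python standard library follows. The function name is an assumption, and the last lines decode the URL again to show the round trip instead of contacting the server.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

BASE = "https://webinar.cloudant.com/relational/_design/join/_view/follows"


def view_url(user_id, include_docs=False):
    """Build the range query for one user's row plus everyone they follow."""
    params = {
        # View keys are JSON, so each bound is JSON-encoded first.
        "startkey": json.dumps([user_id], separators=(",", ":")),
        "endkey": json.dumps([user_id, {}], separators=(",", ":")),
    }
    if include_docs:
        params["include_docs"] = "true"
    return BASE + "?" + urlencode(params)


url = view_url("user:kocolosk", include_docs=True)
print(url)

# Decode the query string again to confirm the bounds survived encoding.
decoded = parse_qs(urlparse(url).query)
print(decoded["startkey"][0], decoded["endkey"][0])
```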

Page 42: SQL to NoSQL: Top 6 Questions

Pretty Printed


Page 43: SQL to NoSQL: Top 6 Questions

Wait. What Did We Get?


• kocolosk’s USER document

• list of all USERs kocolosk FOLLOWS

• full USER document for all USERs that kocolosk FOLLOWS

• In a fast, single query:

https://webinar.cloudant.com/relational/_design/join/_view/follows?startkey=["user:kocolosk"]&endkey=["user:kocolosk",{}]&include_docs=true
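On the client side, the result can be split back into "the user" and "the users they follow" by looking at the key length of each row. The sample response below is hypothetical. It only mimics the general shape of a view result with include_docs=true, where each row has "id", "key", "value", and "doc" fields.

```python
def split_result(response):
    """Separate the user's own document from the documents they follow."""
    user_doc, followed_docs = None, []
    for row in response["rows"]:
        if len(row["key"]) == 1:
            user_doc = row["doc"]             # key is [user_id]
        else:
            followed_docs.append(row["doc"])  # key is [user_id, followed_id]
    return user_doc, followed_docs


# Hypothetical response, shaped like a view result with include_docs=true.
sample = {"rows": [
    {"id": "user:kocolosk", "key": ["user:kocolosk"], "value": None,
     "doc": {"_id": "user:kocolosk", "type": "user"}},
    {"id": "e1", "key": ["user:kocolosk", "user:alice"],
     "value": {"_id": "user:alice"},
     "doc": {"_id": "user:alice", "type": "user"}},
]}

user, followed = split_result(sample)
print(user["_id"], [doc["_id"] for doc in followed])
```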

Page 44: SQL to NoSQL: Top 6 Questions

6. Are There Tools That Make Migrating My Data to Cloudant Easier?


• Yes

• https://cloudant.com/for-developers/migrating-data/

• But every use case is different and everyone’s data is different

• Lots of DIY tools on GitHub that could work for you

• Cloudant’s Homegrown CSV --> JSON Tools

• Python: https://github.com/claudiusli/csv-import

• Java: https://github.com/cavanaugh-ibm/db-data-loader

• Some support for direct SQL queries to database
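For a sense of what a DIY CSV --> JSON step involves, here is a minimal standard-library sketch. It is not how the tools linked above work internally. The function name, the sample CSV, and the added "type" field are assumptions, and the {"docs": [...]} envelope mirrors the sample document format shown earlier in the deck.

```python
import csv
import io
import json


def csv_to_bulk_docs(csv_text, doc_type):
    """Convert CSV text into a {"docs": [...]} body, one doc per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    docs = []
    for row in reader:
        doc = dict(row)  # every CSV value arrives as a string
        doc["type"] = doc_type
        docs.append(doc)
    return {"docs": docs}


# Made-up CSV export of a "movie" table.
sample_csv = "name,year\nAvatar,2009\n"

body = csv_to_bulk_docs(sample_csv, "movie")
print(json.dumps(body))
```

Note that "year" comes through as the string "2009". Converting column types is one of the places where every data set needs its own handling.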

Page 45: SQL to NoSQL: Top 6 Questions

Big Time


• IBM InfoSphere

• Complex ETL tool that profiles, cleanses, and transforms data from heterogeneous data sources

• http://ibm.com/software/data/infosphere/

• SPViewer CouchDBPumper for Oracle

• Commercial tool for migrating data back and forth between CouchDB and Oracle

• http://spviewer.com/couchdbpump.html

• Eight-Wire Conductor

• Commercial tool for moving data between different sources

• http://www.eight-wire.com/

Page 46: SQL to NoSQL: Top 6 Questions

Legal Slide #1


© "Apache", "CouchDB", "Apache CouchDB", "Apache Lucene", "Lucene", and the CouchDB logo are trademarks or registered trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

Page 47: SQL to NoSQL: Top 6 Questions

Legal Slide #2


© Copyright IBM Corporation 2015.

IBM and the IBM Cloudant logo are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml