03 net saturday anton samarskyy ''document oriented databases for the .net platform''


Page 1

Document-Oriented Databases for the .NET platform

Anton Samarskyy

Page 2

Agenda

• Challenges of Relational Databases
• NoSQL: not only SQL
• Document store concept
• Document-oriented databases
• RavenDB
• RavenDB Demo
• MapReduce (optional)

Page 3

Relational Databases properties

• ACID: Atomic, Consistent, Isolated, Durable
• Relational: based on relational algebra and Codd’s work
• Table / row based
• Rich querying capabilities
• Foreign keys
• Schema

Page 4

What do our apps need?

• Need to scale horizontally
• Partitioning and replication
• OnLine Transaction Processing and OnLine Analytical Processing
• Web 2.0
• Performance, Performance, Performance
• Flexibility
• Big, even huge, datasets

http://www.graph-database.org

Page 5

Not only SQL philosophy

• Being non-relational, distributed, cloud-ready
• Open-source
• Horizontally scalable: easy replication support
• Schema-free
• Simple API
• BASE (not ACID): Basically Available, Soft state, Eventual consistency
• Huge amounts of data

Page 6

NoSQL Pros

+ Cheap, easy to implement
+ Removes impedance mismatch between objects and tables
+ Quickly process large amounts of data
+ Data modeling flexibility
+ Command Query Responsibility Segregation (CQRS), Event Sourcing

Page 7

NoSQL Cons

- New technologies
- Data is generally duplicated, potential for inconsistency
- No standard language or format for queries
- Depends on application layer to enforce data integrity
- Reporting

Page 8

NoSQL types

Common
• Wide Column Store / Column Families
• Key Value / Tuple Store
• Document Store
• Graph Databases
• Object Databases

Other
• Grid & Cloud Database Solutions
• XML Databases
• Multivalue Databases
• File Databases

Page 9

CAP

• Consistency: each client has the same view
• Availability: all clients can read and write
• Partition tolerance: works well across different network partitions

http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

Page 10

You pick only two!

Page 11

Who is using NoSQL?

Page 12

Document-oriented databases are

• A collection of independent documents: XML, JSON, YAML
• Non-relational, i.e. they do not store data in tables with uniform-sized fields for each record
• Not limited in the number of fields or their length
• Usually accessible via a RESTful HTTP/JSON API
• Horizontally scalable
• Can be distributed
• Fault-tolerant
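To make the "independent documents" idea concrete, here is a minimal Python sketch of a document collection kept in a plain dictionary. The helper names (`put`, `get`), the document ids and the fields are all made up for this illustration and do not come from any particular database; the only point is that two documents in the same collection need not share a set of fields.

```python
import json

# Hypothetical in-memory "collection": document id -> JSON text.
collection = {}

def put(doc_id, document):
    """Store a document as JSON text; no schema is checked."""
    collection[doc_id] = json.dumps(document)

def get(doc_id):
    """Load a document back into a Python dictionary."""
    return json.loads(collection[doc_id])

# Two documents with different sets of fields live side by side.
put("users/1", {"name": "anna", "email": "anna@example.com"})
put("users/2", {"name": "boris", "tags": ["admin", "beta"], "age": 31})

fields_1 = sorted(get("users/1").keys())
fields_2 = sorted(get("users/2").keys())
```

A relational table would need nullable columns or a schema change to hold the second record; here the extra fields simply travel with the document.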

Page 13

Why a document store?

• Schema free
• User generated content
• Storing full complex object graphs
• Low overhead: usually operate on a single document
  - One read, one write
• Fast
• Known format means the database can do interesting things with it…
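The "one read, one write" point can be sketched as follows. This is an illustration only: the `Order` and `OrderLine` classes, the `store` dictionary and the ids are hypothetical, and JSON text in a dictionary stands in for a real document store. The whole object graph is saved as one document and comes back with a single lookup, with no joins.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List

@dataclass
class OrderLine:
    product: str
    quantity: int
    unit_price: float

@dataclass
class Order:
    customer: str
    lines: List[OrderLine] = field(default_factory=list)

store = {}  # hypothetical document store: id -> JSON text

def save_order(order_id, order):
    # One write: the whole graph (order plus its lines) becomes one document.
    store[order_id] = json.dumps(asdict(order))

def load_order(order_id):
    # One read: rebuild the full graph from the single document.
    raw = json.loads(store[order_id])
    lines = [OrderLine(**line) for line in raw["lines"]]
    return Order(customer=raw["customer"], lines=lines)

order = Order("anna", [OrderLine("keyboard", 1, 40.0), OrderLine("mouse", 2, 15.0)])
save_order("orders/1", order)
loaded = load_order("orders/1")
total = sum(line.quantity * line.unit_price for line in loaded.lines)
```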

Page 14

Indexing

• Order in a schema-free world
• Materialized views
• Built in the background
• Allow stale reads
• Don’t slow down CRUD operations
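A toy model of the behaviour listed above, written in Python. It is a sketch under assumptions, not how any specific database is implemented: `TinyStore` and its methods are invented for the example, and the "background" indexer is an ordinary method called by hand so that the result is deterministic. Writes only record the document and a to-do entry, and a query answers from whatever the index contains right now while reporting whether that answer may be stale.

```python
class TinyStore:
    """Toy store: writes never wait for the index (an assumption of this sketch)."""

    def __init__(self):
        self.docs = {}     # id -> document
        self.index = {}    # name -> set of ids (the "materialized view")
        self.pending = []  # ids written but not yet indexed

    def put(self, doc_id, doc):
        # The write touches only the document and a to-do list.
        self.docs[doc_id] = doc
        self.pending.append(doc_id)

    def run_indexer(self):
        # Stands in for the background task; called explicitly here.
        while self.pending:
            doc_id = self.pending.pop(0)
            for ids in self.index.values():
                ids.discard(doc_id)  # drop entries left by an older version
            name = self.docs[doc_id]["name"]
            self.index.setdefault(name, set()).add(doc_id)

    def query_by_name(self, name):
        # Answer from the index as it is now and report staleness.
        is_stale = bool(self.pending)
        return sorted(self.index.get(name, set())), is_stale

tiny = TinyStore()
tiny.put("users/1", {"name": "ayende"})
before = tiny.query_by_name("ayende")  # index has not caught up yet
tiny.run_indexer()
after = tiny.query_by_name("ayende")   # index is up to date
```

The caller can decide what to do with the staleness flag: show the possibly incomplete result straight away, or wait and query again.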

Page 15

Index concept

{
  "name": "ayende",
  "twitter": "@ayende",
  "projects": [
    "rhino mocks",
    "nhibernate",
    "raven db"
  ]
}

from doc in docs
from prj in doc.projects
select new {
    Project = prj,
    Name = doc.Name
}

http://ayende.com/blog/4459/that-no-sql-thing-document-databases

GET /indexes/ProjectAndName?query=Project:raven
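What the index definition and the query above amount to can be approximated in a few lines of Python. This is a sketch, not the database's actual indexing code: the function names are invented, and matching a term against whitespace-separated words is an assumption standing in for real full-text analysis. Each document fans out into one index entry per project, and the query then looks for entries whose Project field contains the term.

```python
docs = [
    {"name": "ayende", "twitter": "@ayende",
     "projects": ["rhino mocks", "nhibernate", "raven db"]},
]

def map_project_and_name(documents):
    """One index entry per (document, project) pair, like the two 'from' clauses."""
    for doc in documents:
        for prj in doc["projects"]:
            yield {"Project": prj, "Name": doc["name"]}

index_entries = list(map_project_and_name(docs))

def query_project(entries, term):
    # Assumption: a term matches if it equals one whitespace-separated
    # word of the Project field (a stand-in for full-text analysis).
    return [e for e in entries if term in e["Project"].split()]

hits = query_project(index_entries, "raven")
```

The single sample document produces three index entries, and the query for "raven" returns the one built from the "raven db" project.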

Page 16

Document DB family

• CouchDB: Apache project created by Damien Katz
• RavenDB: Oren Eini and Hibernating Rhinos project
• MongoDB: 10gen project
• SimpleDB: Amazon project, used as a web service in concert with Amazon Elastic Compute Cloud

Page 17

Comparison

• CouchDB: Erlang, REST API, JavaScript map-reduce querying (concurrent), access via .NET helpers
• MongoDB: C++, dynamic queries (non-concurrent MapReduce), custom TCP/IP access, .NET drivers: 10gen, NoRM (LINQ)
• RavenDB: .NET, REST API, LINQ queries that map to Lucene.NET, plus MapReduce
• SimpleDB: Erlang, name/value store, basic queries, not RESTful, access via .NET helpers

Page 18

RavenDB

• Built on existing infrastructure (ESENT) that is known to scale to amazing sizes
• Can be transactional, i.e. ACID: supports System.Transactions and can take part in distributed transactions
• Indexes are defined via LINQ queries; querying implements IQueryable, which maps to Lucene
• Supports map/reduce operations

Page 19

RavenDB

• Comes with a fully functional .NET client API: Unit of Work, change tracking
• REST-based, so you can access it directly from JavaScript
• Supports optimistic concurrency control
• Can be extended with MEF
• Has trigger support
• Supports sharding and replication

http://ravendb.net
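Optimistic concurrency, mentioned in the list above, can be illustrated with a small Python sketch. It shows the general technique only and makes no claim about how RavenDB implements it: `VersionedStore`, `ConcurrencyError` and the integer version counter are invented for the example, with the counter playing the role that a version stamp plays in a real store. A write succeeds only if the document is still at the version the writer originally read.

```python
class ConcurrencyError(Exception):
    """Raised when a document changed since the caller last read it."""

class VersionedStore:
    def __init__(self):
        self._data = {}  # id -> (version, document)

    def load(self, doc_id):
        return self._data[doc_id]  # (version, document)

    def save(self, doc_id, document, expected_version):
        current_version = self._data.get(doc_id, (0, None))[0]
        if current_version != expected_version:
            raise ConcurrencyError(
                f"{doc_id}: expected version {expected_version}, "
                f"found {current_version}")
        self._data[doc_id] = (current_version + 1, document)
        return current_version + 1

vstore = VersionedStore()
vstore.save("users/1", {"name": "anna"}, expected_version=0)

# Two clients read the same version of the document.
v_a, doc_a = vstore.load("users/1")
v_b, doc_b = vstore.load("users/1")

# The first write wins; the second is rejected instead of silently overwriting.
vstore.save("users/1", {**doc_a, "name": "anna a."}, expected_version=v_a)
try:
    vstore.save("users/1", {**doc_b, "name": "anna b."}, expected_version=v_b)
    conflict = False
except ConcurrencyError:
    conflict = True
```

No locks are held between the read and the write, which is what makes the scheme "optimistic": conflicts are detected at save time, and the losing client has to reload and retry.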

Page 20

Raven Extensibility

• MEF (Managed Extensibility Framework)
• Triggers
  - PUT trigger
  - DELETE trigger
  - Read trigger
  - Index update triggers
• Request Responders
• Custom Serialization/Deserialization

Page 21

Demo: RavenDB

• Setup, Server
• RavenDB Client API
• Denormalization, modeling documents
• CRUD
• Attachments
• Indexes
• MapReduce indexes
• Sharding

Page 22

MapReduce

• MapReduce is a programming model and an associated implementation for processing and generating large data sets
• The Map function processes a key/value pair to generate a set of intermediate key/value pairs
• The Reduce function merges all intermediate values associated with the same intermediate key

Page 23

Map

Page 24

Sort

Page 25

Reduce
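The three phases named on these slides (map, sort, reduce) can be sketched in single-process Python. This is a teaching sketch, not a distributed implementation: the word-count task, the function names and the sample inputs are chosen for the example. The map function turns each input pair into intermediate key/value pairs, sorting brings equal keys together, and the reduce function merges all values that share a key.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(key, value):
    """Map: one (doc_id, text) pair -> intermediate (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share a key."""
    return key, sum(values)

def map_reduce(inputs):
    # Map phase: run map_fn over every input pair.
    intermediate = [pair for k, v in inputs for pair in map_fn(k, v)]
    # Sort phase: bring equal keys next to each other.
    intermediate.sort(key=itemgetter(0))
    # Reduce phase: one call per distinct key.
    return [reduce_fn(key, [v for _, v in group])
            for key, group in groupby(intermediate, key=itemgetter(0))]

inputs = [("docs/1", "raven db"), ("docs/2", "couch db"), ("docs/3", "mongo db")]
counts = map_reduce(inputs)
# counts == [("couch", 1), ("db", 3), ("mongo", 1), ("raven", 1)]
```

Because each map call looks at one input pair only and each reduce call looks at one key only, both phases can be spread over many machines; only the sort/group step needs to move data between them.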

Page 26

Sharding

• Sharding refers to horizontal partitioning of data across multiple machines

• The idea is to split the load across many commodity machines, instead of buying huge expensive servers
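One common way to decide which machine holds a document is to hash its id. The Python sketch below assumes that strategy purely for illustration; the slides do not say which sharding strategy is used, and the shard names and helper functions are hypothetical. A stable hash (SHA-256 here, since Python's built-in `hash` varies between runs) maps each document id to one of the shards, so reads and writes for the same id always go to the same place.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical machine names

def shard_for(doc_id, shards=SHARDS):
    """Pick a shard from a stable hash of the document id."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

# Each shard is modelled as its own dictionary of documents.
cluster = {name: {} for name in SHARDS}

def put(doc_id, document):
    cluster[shard_for(doc_id)][doc_id] = document

def get(doc_id):
    return cluster[shard_for(doc_id)][doc_id]

for i in range(300):
    put(f"users/{i}", {"n": i})

# How many documents ended up on each shard.
sizes = {name: len(docs) for name, docs in cluster.items()}
```

Simple modulo hashing like this moves most documents to a different shard when the number of shards changes, which is one reason real systems often shard by key range, by an application-chosen attribute, or with consistent hashing instead.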

Page 27

Thanks!

Questions or comments?