03 net saturday anton samarskyy ''document oriented databases for the .net platform''
Document-Oriented Databases for the .NET platform
Anton Samarskyy
Agenda
• Challenges of relational databases
• NoSQL: not only SQL
• Document store concept
• Document-oriented databases
• Raven DB
• Raven DB demo
• MapReduce (optional)
Relational Databases properties
• ACID: Atomic, Consistent, Isolated, Durable
• Relational: based on relational algebra and Codd’s work
• Table / row based
• Rich querying capabilities
• Foreign keys
• Schema
What do our apps need?
• Need to scale horizontally
• Partitioning and replication
• Online transaction processing (OLTP) and online analytical processing (OLAP)
• Web 2.0
• Performance, performance, performance
• Flexibility
• Big, even huge, datasets
http://www.graph-database.org
Not only SQL philosophy
• Non-relational, distributed, cloud-ready
• Open source
• Horizontally scalable: easy replication support
• Schema-free
• Simple API
• BASE (not ACID): Basically Available, Soft state, Eventual consistency
• Huge amounts of data
NoSQL pros
+ Cheap, easy to implement
+ Removes the impedance mismatch between objects and tables
+ Quickly processes large amounts of data
+ Data modeling flexibility
+ Command Query Responsibility Segregation (CQRS), Event Sourcing
NoSQL cons
- New technologies
- Data is generally duplicated, with potential for inconsistency
- No standard language or format for queries
- Depends on the application layer to enforce data integrity
- Reporting
NoSQL types
Common
• Wide Column Store / Column Families
• Key Value / Tuple Store
• Document Store
• Graph Databases
• Object Databases
Other
• Grid & Cloud Database Solutions
• XML Databases
• Multivalue Databases
• File Databases
CAP
• Consistency: each client has the same view of the data
• Availability: all clients can always read and write
• Partition tolerance: the system works well across network partitions
http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
You pick only two!
Who is using NoSQL?
Document-oriented databases are
• Collections of independent documents: XML, JSON, YAML
• Non-relational, i.e. they do not store data in tables with uniform-sized fields for each record
• Not limited in the number or length of fields
• Usually accessible via a RESTful HTTP/JSON API
• Horizontally scalable
• Can be distributed
• Fault-tolerant
Why a document store?
• Schema-free
• User-generated content
• Storing full, complex object graphs
• Low overhead: usually operates on a single document
  - One read, one write
• Fast
• Known format means the database can do interesting things with it…
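The "one read, one write" point can be illustrated with a minimal Python sketch. The in-memory `store` dictionary, the `put`/`get` helpers, and the sample order are all hypothetical stand-ins for illustration, not the API of any particular database: the whole object graph is serialized as one JSON document, so no joins or multi-table writes are involved.

```python
import json

# Hypothetical in-memory "document store": document id -> JSON text.
store = {}

def put(doc_id, document):
    """One write: serialize the whole object graph as a single JSON document."""
    store[doc_id] = json.dumps(document)

def get(doc_id):
    """One read: load the whole object graph back in a single lookup."""
    return json.loads(store[doc_id])

# A nested object graph (customer plus order lines) kept in one document.
order = {
    "customer": {"name": "Alice", "city": "Dnipro"},
    "lines": [
        {"product": "keyboard", "quantity": 1},
        {"product": "mouse", "quantity": 2},
    ],
}

put("orders/1", order)
loaded = get("orders/1")
```

In a relational schema the same order would typically be spread over several tables and reassembled with joins.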
Indexing
• Order in a schema-free world
• Materialized views
• Built in the background
• Allow stale reads
• Don’t slow down CRUD operations
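A rough Python sketch of why background-built indexes allow stale reads without slowing down writes. Everything here is an assumption made for illustration (the `pending` queue, the `run_indexer` function standing in for a background task, the index keyed on a `name` field); it is not how any specific database implements indexing.

```python
from collections import deque

documents = {}       # primary storage, updated synchronously on every write
index_by_name = {}   # materialized view: name -> set of document ids
pending = deque()    # ids of documents written but not yet indexed

def put(doc_id, doc):
    """Writes touch only primary storage, so indexing does not slow them down."""
    documents[doc_id] = doc
    pending.append(doc_id)

def run_indexer():
    """Stands in for the background task that catches the index up.
    Simplification: entries for changed or deleted documents are not cleaned up."""
    while pending:
        doc_id = pending.popleft()
        index_by_name.setdefault(documents[doc_id]["name"], set()).add(doc_id)

def query(name):
    """Return (matching ids, is_stale); results may lag behind recent writes."""
    return sorted(index_by_name.get(name, set())), bool(pending)

put("users/1", {"name": "ayende"})
stale_result = query("ayende")   # indexer has not run yet: empty and stale
run_indexer()
fresh_result = query("ayende")   # index has caught up
```

The caller can decide whether a stale answer is acceptable or whether to wait until the index has caught up.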
Index concept
{
  "name": "ayende",
  "twitter": "@ayende",
  "projects": [
    "rhino mocks",
    "nhibernate",
    "raven db"
  ]
}
from doc in docs
from prj in doc.projects
select new {
    Project = prj,
    Name = doc.name
}
http://ayende.com/blog/4459/that-no-sql-thing-document-databases
GET /indexes/ProjectAndName?query=Project:raven
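To make the index definition above concrete, here is a Python analogue of the same map step and query. The function names are hypothetical, and the whole-word matching is only an assumed stand-in for how a full-text engine might treat `Project:raven`; it is a sketch, not the actual index implementation.

```python
# The sample document from the slide.
docs = [
    {"name": "ayende", "twitter": "@ayende",
     "projects": ["rhino mocks", "nhibernate", "raven db"]},
]

def map_project_and_name(docs):
    """Analogue of: from doc in docs from prj in doc.projects select new {...}.
    Emits one index entry per (document, project) pair."""
    for doc in docs:
        for prj in doc["projects"]:
            yield {"Project": prj, "Name": doc["name"]}

index_entries = list(map_project_and_name(docs))

def query_by_project_term(entries, term):
    """Assumed stand-in for Project:raven, matching a whole word in Project."""
    return [e for e in entries if term in e["Project"].split()]

hits = query_by_project_term(index_entries, "raven")
```

One document with three projects produces three index entries, and the query touches only the index, never the documents themselves.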
Document DB family
• CouchDB: Apache project created by Damien Katz
• RavenDB: project by Oren Eini and Hibernating Rhinos
• MongoDB: 10gen project
• SimpleDB: Amazon project, used as a web service in concert with Amazon Elastic Compute Cloud
Comparison
• CouchDB: Erlang, REST API, JavaScript map-reduce querying (concurrent), via .NET helpers
• MongoDB: C++, dynamic queries (non-concurrent MapReduce), custom TCP/IP access, .NET drivers: 10gen, NoRM (LINQ)
• RavenDB: .NET, REST API, LINQ mapped to Lucene.NET + MapReduce
• SimpleDB: Erlang, name/value store, basic queries, not RESTful, via .NET helpers
Raven DB
• Built on existing infrastructure (ESENT) that is known to scale to amazing sizes
• Can be transactional, i.e. ACID: supports System.Transactions and can take part in distributed transactions
• Indexes are defined via LINQ queries; the client implements IQueryable, which maps to Lucene
• Supports map/reduce operations
Raven DB
• Comes with a fully functional .NET client API with Unit of Work and change tracking
• REST based, so you can access it directly from JavaScript
• Supports optimistic concurrency
• Can be extended with MEF
• Has trigger support
• Supports sharding and replication
http://ravendb.net
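Optimistic concurrency, mentioned in the list above, can be sketched generically as follows. The `VersionedStore` class, its integer version counter, and `ConcurrencyError` are hypothetical names invented for this illustration and are not the RavenDB client API: a write succeeds only if the document has not changed since the writer last read it.

```python
class ConcurrencyError(Exception):
    """Raised when a document changed since the caller last read it."""

class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (version, document)

    def load(self, doc_id):
        """Return (version, document) so the caller can remember the version."""
        return self._docs[doc_id]

    def put(self, doc_id, document, expected_version=None):
        """Store the document; if expected_version is given, require a match."""
        current_version = self._docs.get(doc_id, (0, None))[0]
        if expected_version is not None and expected_version != current_version:
            raise ConcurrencyError(
                f"{doc_id}: expected v{expected_version}, found v{current_version}")
        self._docs[doc_id] = (current_version + 1, document)
        return current_version + 1

store = VersionedStore()
v1 = store.put("users/1", {"name": "ayende"})

# Two clients read the same version of the document.
version_a, doc_a = store.load("users/1")
version_b, doc_b = store.load("users/1")

# The first write wins; the second is rejected instead of silently overwriting.
v2 = store.put("users/1", {"name": "ayende", "twitter": "@ayende"},
               expected_version=version_a)
try:
    store.put("users/1", {"name": "oren"}, expected_version=version_b)
    conflict_detected = False
except ConcurrencyError:
    conflict_detected = True
```

No locks are held between the read and the write; the conflict is detected only at write time, which is what makes the approach "optimistic".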
Raven Extensibility
• MEF (Managed Extensibility Framework)
• Triggers
  - PUT trigger
  - DELETE trigger
  - Read trigger
  - Index update triggers
• Request responders
• Custom serialization/deserialization
Demo: RavenDB
• Setup, server
• RavenDB Client API
• Denormalization, modeling documents
• CRUD
• Attachments
• Indexes
• MapReduce indexes
• Sharding
MapReduce
• MapReduce is a programming model and an associated implementation for processing and generating large data sets
• The map function processes a key/value pair to generate a set of intermediate key/value pairs
• The reduce function merges all intermediate values associated with the same intermediate key
Map → Sort → Reduce
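The map, sort, and reduce stages can be sketched in a few lines of Python. Word counting is chosen here purely as an illustrative job, and the function names and sample inputs are assumptions; a real MapReduce system would distribute these stages across machines.

```python
from itertools import groupby
from operator import itemgetter

def map_fn(doc_id, text):
    """Map: turn one input key/value pair into intermediate (word, 1) pairs."""
    for word in text.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return key, sum(values)

def map_reduce(inputs):
    # Map stage: apply map_fn to every input pair.
    intermediate = [pair for k, v in inputs.items() for pair in map_fn(k, v)]
    # Sort stage: bring equal keys next to each other.
    intermediate.sort(key=itemgetter(0))
    # Reduce stage: one reduce_fn call per distinct key.
    return dict(
        reduce_fn(key, (value for _, value in group))
        for key, group in groupby(intermediate, key=itemgetter(0))
    )

counts = map_reduce({"d1": "raven db", "d2": "Raven rocks"})
```

The sort stage is what guarantees that each reduce call sees every value for its key in one group.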
Sharding
• Sharding refers to horizontal partitioning of data across multiple machines
• The idea is to split the load across many commodity machines, instead of buying huge expensive servers
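One simple way to split documents across machines is to hash the document id, sketched below. The shard names and the hash-modulo rule are assumptions chosen for illustration and are not a description of RavenDB's sharding strategy; real systems often shard by a business key or use consistent hashing so that adding a machine moves less data.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical machine names

def shard_for(doc_id, shards=SHARDS):
    """Pick a shard deterministically from a stable hash of the document id."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

# Route 300 hypothetical documents to their shards.
placement = {}
for i in range(1, 301):
    doc_id = f"users/{i}"
    placement.setdefault(shard_for(doc_id), []).append(doc_id)
```

Because the routing depends only on the id, any client can find the right machine for a document without consulting a central directory.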
Thanks!
Questions or comments?