From SQL Server to MongoDB



Presentation given to the MongoDB NYC User Group on 9/27/2012. My blog: http://architectryan.com Twitter: @tekmaven


Ryan Hoffman, Senior Software Architect

@tekmaven

http://architectryan.com


© TNTP 2012

TNTP + TeacherTrack

• TNTP is a national nonprofit committed to ending the injustice of educational inequality. Founded by teachers in 1997, TNTP works with schools, districts and states to provide excellent teachers to the students who need them most and advance policies and practices that ensure effective teaching in every classroom.

• TeacherTrack is a web-based applicant tracking and teacher evaluation system. TNTP recruits teachers for districts nationwide, including in New Orleans, Philadelphia, and New York City, with TeacherTrack.


TeacherTrack Technology

• .NET 4.0

o ASP.NET Web Forms

o ASP.NET MVC

o WCF

o WF4

• NHibernate ORM for SQL Server

• MongoDB .NET Driver

• NServiceBus

• Lucene.NET

• Much, much more…


Survey Templates

• TeacherTrack uses a flexible data structure called a Survey to store the majority of its data. A survey works much like the conceptual model of a SurveyMonkey survey.

• A Survey Template is a “master” survey from which blank survey instances are created. A Survey Template consists of some header data (a string key, an ID, and the site it is for) and an array of questions.

• Each question contains the question text, as well as properties that govern how the question is rendered (for example, whether it is a text box or a drop-down).


Surveys

• A blank survey is instantiated from the survey template. It contains header data that associates the survey with a user, and contains an array of responses.

• Each response contains the entire set of data from the question.

o If the original survey template is changed, we will always be able to load the original questions the survey was filled out with.

o It also allows for rendering a survey without needing to load a template.
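The copy-on-instantiate idea above can be sketched in a few lines of C#. This is only an illustration, not TeacherTrack's actual code: the Question, SurveyTemplate and SurveyFactory types are hypothetical, the control type is simplified to a string, and using AccountId as the link to the user is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified types for illustration only.
public class Question
{
    public string Text { get; set; }
    public string ControlType { get; set; } // e.g. "TextBox" or "DropDown"
}

public class SurveyTemplate
{
    public string Key { get; set; }
    public List<Question> Questions { get; set; } = new List<Question>();
}

public class Response
{
    public Guid Id { get; set; }
    public string Value { get; set; }               // empty until the user answers
    public string QuestionText { get; set; }        // copied from the template
    public string QuestionControlType { get; set; } // copied from the template
}

public class Survey
{
    public Guid Id { get; set; }
    public Guid AccountId { get; set; } // assumed link to the user
    public List<Response> Responses { get; set; }
}

public static class SurveyFactory
{
    // Creates a blank survey whose responses carry their own copy of the question data,
    // so later template edits do not affect surveys that already exist.
    public static Survey Instantiate(SurveyTemplate template, Guid accountId)
    {
        return new Survey
        {
            Id = Guid.NewGuid(),
            AccountId = accountId,
            Responses = template.Questions.Select(q => new Response
            {
                Id = Guid.NewGuid(),
                Value = null,
                QuestionText = q.Text,
                QuestionControlType = q.ControlType
            }).ToList()
        };
    }
}
```

Because each Response holds copies rather than references, changing a question on the template afterwards leaves existing surveys untouched, and a survey can be rendered from its responses alone.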


TeacherTrack Survey Demo


Storing Surveys and Survey Object

One table for Surveys and another for Responses.

• 1 row in the Survey table.

• 1 row per response in Response table.

A survey with 20 responses would be stored in 21 rows.

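using System;
using System.Collections.Generic;

// ElementTypes and ControlTypes are enums that are not shown on the slide.
class Survey
{
    public Guid Id { get; set; }
    public Guid AccountId { get; set; }
    public string Title { get; set; }
    public List<Response> Responses { get; set; }
}

class Response
{
    public Guid Id { get; set; }
    public string Value { get; set; }
    public string QuestionText { get; set; }
    public string QuestionTitle { get; set; }
    public ElementTypes QuestionElementType { get; set; }
    public ControlTypes QuestionControlType { get; set; }
    public string Watermark { get; set; }
    // Additional fields omitted for brevity
}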


SQL Server Challenges

• Performance!

o Joining between the two tables was slow! We had >1 million surveys and >16 million responses before converting to MongoDB.

o Actual query time in the application could easily be >200ms for one survey.

o There were existing pages in the application where we could easily need to load over 20 surveys. 10 second page load times are not fun to work with.

• Iterative Development

o When ALTER TABLE statements take 20 minutes to run, deployment scripts that were not designed with this in mind break and time out.


Why TNTP selected MongoDB

• Performance, durability, and scaling.

o Document databases allow for a richer schema.

o Replica sets are elegant, easy to set up, and reliable.

o Auto-sharding is a great future option to scale.

• 10gen rocks.

o Training. Switching from an RDBMS to a document database is a big paradigm shift. 10gen’s Developer and Administrator training did a great job giving key team members the skills to make this possible.

o Great support options. TNTP uses MMS to get insight into its MongoDB servers, and we love that 10gen can proactively reach out to us based on server telemetry.

o Great people. From day one at training, I met many 10gen employees, including people responsible for the Windows version. This type of access and interaction cannot be overstated.


Survey Documents in MongoDB

• Surveys are a great match for MongoDB.

• The number of responses never changes after a survey is instantiated, making it an ideal candidate for being an embedded array in the survey document.

• <10ms query times!

{
    "_id" : BinData(3,"vD+ifVfvS0qlk5vN8OPQOQ=="),
    "AccountId" : BinData(3,"B1giiULLskSEG7rYmdqBUA=="),
    "Title" : "Registering",
    "Responses" : [
        {
            "_id" : BinData(3,"UvqabcPS1UGZipKODPKgGA=="),
            "Value" : "Ryan",
            "QuestionText" : "What is your first name?",
            "QuestionElementType" : 1,
            "QuestionControlType" : 1
        }
    ]
}


Conversion

Query SQL Server → Convert to BSON → Insert into Mongo


Conversion - Multithreading

• The original proof of concept was single-threaded. It took over two days to convert the data. When we refactored to a multithreaded model, conversion took less than 20 hours.

• Each of the three parts of the conversion runs in its own thread.

• A queue between each pair of threads allows the threads to pass data along.

o The query thread adds objects to a conversion queue for the conversion thread.

o Similarly, the conversion thread adds converted objects to the insert queue for the insert thread.

• System.Collections.Concurrent.BlockingCollection<T> made this very easy.
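A minimal sketch of that three-stage layout, assuming nothing about the real conversion code: ConversionPipeline and its three delegates are hypothetical stand-ins for the SQL Server query, the BSON conversion and the MongoDB insert, and the queue capacity is an arbitrary choice.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ConversionPipeline
{
    // Runs query -> convert -> insert as three concurrent stages joined by two bounded queues.
    // Returns the number of items handed to the insert delegate.
    public static int Run<TRow, TDoc>(
        IEnumerable<TRow> query,      // hypothetical: rows streamed from SQL Server
        Func<TRow, TDoc> convert,     // hypothetical: row -> document conversion
        Action<TDoc> insert,          // hypothetical: write to MongoDB
        int queueCapacity = 1000)
    {
        using var toConvert = new BlockingCollection<TRow>(queueCapacity);
        using var toInsert = new BlockingCollection<TDoc>(queueCapacity);
        int inserted = 0;

        // Stage 1: the query thread feeds the conversion queue.
        var queryStage = Task.Factory.StartNew(() =>
        {
            try { foreach (var row in query) toConvert.Add(row); }
            finally { toConvert.CompleteAdding(); } // lets stage 2 finish its loop
        }, TaskCreationOptions.LongRunning);

        // Stage 2: the conversion thread drains one queue and feeds the next.
        var convertStage = Task.Factory.StartNew(() =>
        {
            try
            {
                foreach (var row in toConvert.GetConsumingEnumerable())
                    toInsert.Add(convert(row));
            }
            finally { toInsert.CompleteAdding(); } // lets stage 3 finish its loop
        }, TaskCreationOptions.LongRunning);

        // Stage 3: the insert thread drains the insert queue.
        var insertStage = Task.Factory.StartNew(() =>
        {
            foreach (var doc in toInsert.GetConsumingEnumerable())
            {
                insert(doc);
                inserted++;
            }
        }, TaskCreationOptions.LongRunning);

        Task.WaitAll(queryStage, convertStage, insertStage);
        return inserted;
    }
}
```

The bounded capacity means a fast query stage blocks on Add instead of filling memory while the slower stages catch up. This sketch does not cancel upstream stages when a downstream stage throws, which a production version would need to handle.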


Conversion – Auto Batching

• Returning millions of rows in one query is clearly not going to work well. We need to batch the source queries and iterate until 0 rows are returned in the batch.

• Querying batches out of SQL Server was very inconsistent. With no other load on the server, batches would take anywhere from 45 seconds to over 10 minutes.

• Instead of making each batch a fixed number of rows, we had logic that timed how long the previous batch took. Based on trial and error, a 1-minute batch time became the target. The code would adjust the number of rows based on the previous query’s number of rows and the query’s time.
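One way to express that adjustment is to scale the previous row count by the ratio of target time to actual time. The sketch below is an assumption about how such logic could look, not the original code: AdaptiveBatcher, the fetchBatch delegate and the minimum and maximum row limits are all hypothetical.

```csharp
using System;
using System.Diagnostics;

public static class AdaptiveBatcher
{
    // Scales the previous batch's row count so the next batch should take about targetDuration.
    // minRows and maxRows are hypothetical safety limits.
    public static int NextBatchSize(
        int previousRows,
        TimeSpan previousDuration,
        TimeSpan targetDuration,
        int minRows = 100,
        int maxRows = 100000)
    {
        if (previousRows <= 0 || previousDuration <= TimeSpan.Zero)
            return minRows;

        double ratio = targetDuration.TotalSeconds / previousDuration.TotalSeconds;
        double next = Math.Round(previousRows * ratio);
        return (int)Math.Max(minRows, Math.Min(maxRows, next));
    }

    // Fetches batches until one comes back empty, re-sizing after every batch.
    // fetchBatch(n) is a hypothetical delegate that processes up to n rows
    // and returns how many rows it actually got.
    public static long ConvertAll(Func<int, int> fetchBatch, TimeSpan targetDuration, int initialRows)
    {
        long total = 0;
        int size = initialRows;
        while (true)
        {
            var timer = Stopwatch.StartNew();
            int rows = fetchBatch(size);
            timer.Stop();
            if (rows == 0) break; // an empty batch means everything is converted
            total += rows;
            size = NextBatchSize(rows, timer.Elapsed, targetDuration);
        }
        return total;
    }
}
```

For example, if 10,000 rows took 30 seconds against a 1-minute target, the next batch asks for 20,000 rows; if they took 2 minutes, it asks for 5,000.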


Conversion - Incremental

• Converting the data is still a time-consuming process. When we deploy code that uses MongoDB, all the data needs to be converted. Deployments generally take less than an hour. The “20 hours of downtime” discussion is not a great conversation to have with stakeholders.

• The answer: pre-convert the data! When we deploy, convert only the last 24 hours of data, which may only take minutes.

• Surveys have a ModifiedOn date field. Using this is the key to converting! We did a lot of work and testing to make sure this field was always updated when a change was made.

• Surveys are never deleted. A delete flips a deleted flag on the row. This allowed us to not worry about incrementally tracking deletes.

• A command line switch allowed us to specify the start date of the conversion.
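The incremental pass can be sketched as a start-date switch plus a filter on ModifiedOn. Everything named here is illustrative: the "--start-date" switch name, its yyyy-MM-dd format, and the SurveyRow and IncrementalConversion types are assumptions, not the actual tool's interface.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical, trimmed-down view of a row in the Survey table.
public class SurveyRow
{
    public Guid Id { get; set; }
    public DateTime ModifiedOn { get; set; }
    public bool Deleted { get; set; }
}

public static class IncrementalConversion
{
    // Reads a hypothetical "--start-date yyyy-MM-dd" switch.
    // Returns null when the switch is absent, meaning "convert everything".
    public static DateTime? ParseStartDate(string[] args)
    {
        int index = Array.IndexOf(args, "--start-date");
        if (index < 0) return null;
        if (index + 1 >= args.Length)
            throw new ArgumentException("--start-date requires a value");
        return DateTime.ParseExact(args[index + 1], "yyyy-MM-dd", CultureInfo.InvariantCulture);
    }

    // Selects the rows changed on or after the start date.
    // Soft-deleted rows are kept on purpose: assuming the delete also updates ModifiedOn,
    // the flipped Deleted flag travels with the row and needs no separate tracking.
    public static IEnumerable<SurveyRow> SelectForConversion(IEnumerable<SurveyRow> rows, DateTime? startDate)
    {
        return startDate == null
            ? rows
            : rows.Where(r => r.ModifiedOn >= startDate.Value);
    }
}
```

In a real run the filter would be part of the SQL query rather than applied in memory. Rows picked up this way may already exist in MongoDB from the pre-conversion, so the write step presumably has to replace by Id rather than blindly insert.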


Deployment Lessons

• Practice makes perfect. We took stories over 3 sprints (each sprint is 3 weeks) to prepare for the conversion.

• Always explicitly set your oplog size! The defaults created a 40 GB oplog on the production servers. Since MongoDB uses memory-mapped files, that 40 GB oplog was loaded into RAM. The servers have 48 GB of RAM. We resized to a more sane 3 GB.

• If you have profiling turned on, you can’t fsyncLock the server. We didn’t know this, and it immediately broke the backup scripts the first night. I filed a ticket with 10gen for this, and the documentation now reflects this.


Using MongoDB as a .NET Developer

• Since most users run MongoDB on Linux, I was concerned about reliability and performance running on Windows. I’m happy to say that MongoDB works very well on Windows and we’ve had no issues.

• The MongoDB .NET Driver is excellent. It allows raw BsonDocument access, or can map documents to your objects. It has very good LINQ support, and is constantly improving its API.

• Guids are the primary key for most structures. Working with them is very inconvenient in the shell. In fact, without the “UUID” helper from the C# driver’s git repo, it would be nearly impossible to use the shell to work with Guids.


Wrap Up

• MongoDB was a game changer for TeacherTrack. Think. In. Documents.

• 10gen is a great company to work with. We are depending on MongoDB, and knowing that the people behind MongoDB were available for us was a huge plus.

• Pre-conversion and incremental conversion are the keys to minimizing deployment time when working with a large set of data.

• Most importantly, this was all made possible because of very talented team members at TNTP. You guys rock!


Questions

Slides will be made available on my blog, located at http://architectryan.com/