Infovision 2011, BFSI Risk Management, AN Jayaraman and Zeeshan Rahseed, TCS
Infovision Anand S _ no sql workshop
-
Upload
information-excellence -
Category
Documents
-
view
351 -
download
3
description
Transcript of Infovision Anand S _ no sql workshop
NoSQLData ScientistGramener.com
Relational DatabasesNon
Relational DatabasesNon
Professor Hired on Course
Anand 10 Jan 2012 Maths
Bala 15 Jan 2012 Physics
Chandra 20 Jan 2012 Chemistry
Dileep 25 Jan 2012 ???
INSERT ANOMALY
UPDATE ANOMALY
Professor Hired on Course
Anand 10 Jan 2012 Maths
Bala 15 Jan 2012 Physics
Chandra 20 Jan 2012 Chemistry
Dileep 25 Jan 2012 Biology
DELETE ANOMALY
Name Address PhoneAnand 10 Mount Rd, Chennai 98765 43210
Bala 15, Janpath, New Delhi 90123 45678
Chandra 20, Marine Dr, Mumbai 91234 56780
Chandra 20, Marine Dr, Mumbai 91234 56781
WHY NOW?
DATA VOLUME IS GROWING
SEMI-STRUCTURED DATA DISTRIBUTED ARCHITECTURE
161253
397
623
988
2006 2007 2008 2009 2010
DATA IS INCREASINGLY NETWORKED
How many programmers?
… who’ve programmed NoSQL DBs
A POLL
How many non-IT folks?
data is stored in
TABLES
Key-value stores
Document databases
Graph databases
C
A
P
Brewer’s CAPTheorem
Pick Two
Consistency PartitionTolerance
Availability
Key-value stores
Document databases
Graph databases
Columnar databases
KEY VALUE STORES DOCUMENT DATABASES
COLUMNAR DATABASES GRAPH DATABASES
RedisCassandraMemcacheVoldemortDynamoTokyo Cabinet
CouchDBMongoDBSimpleDBRiakTerrastoreLotus Domino
CassandraBigTableHypertableHbaseVerticaInfiniDB
Neo4jFlockDBGraphDBOrientDBInfiniteGraphAllegroGraph
KEY VALUE STORES DOCUMENT DATABASES
COLUMNAR DATABASES GRAPH DATABASES
RedisCassandraMemcacheVoldemortDynamoTokyo Cabinet
CouchDBMongoDBSimpleDBRiakTerrastoreLotus Domino
CassandraBigTableHypertableHbaseVerticaInfiniDB
Neo4jFlockDBGraphDBOrientDBInfiniteGraphAllegroGraph
The first time round, the mistakes were around scalability. I used a SQL “ORDER BY RAND()” statement to return the next page to review. I knew this was an inefficient operation, but I assumed that it wouldn’t matter since the button would only be clicked occasionally.
Something like 90% of our database load turned out to be caused by that one SQL statement, and it only got worse as we loaded more pages in to the system. This caused multiple site slow downs and crashes.
The second time round I turned to my new favourite in-memory data structure server, redis, and its SRANDMEMBER command (a feature I requested a while ago with this exact kind of project in mind). The system maintains a redis set of all IDs that needed to be reviewed for an assignment to be complete, and a separate set of IDs of all pages had been reviewed. It then uses redis set intersection (the SDIFFSTORE command) to create a set of unreviewedpages for the current assignment and then SRANDMEMBER to pick one of those pages.
CouchDB
[email protected] gramener.com s-anand.net @sanand0 on Twitter
+91 9741 552 552
EXERCISE: DESIGN THE SSLC MARKS DATABASE
Each student has an ID.There are totally 11 languages and 92 non-language subjects.
Students usually write 3 language and 3 non-language exams.
For example,
• (English, Hindi, Sanskrit), (Maths, Physics, Chemistry)• (Kannada, Urdu, Marathi), (Commerce, Accountancy, Economics)
You need to record their marks in all 6 subjects, and the total.
EXERCISE: DESIGN THE SSLC MARKS DATABASE
Common queries:
Who scored the highest in Maths?
Which subject had the highest fail %?
How many failed in 1 subject?
Some scenarios:
Access from multiple locations
Real-time marks updation
Guarantee of correctness