NoSQL: Delivering scale without complexity
Chris Smith, Chief Architect, Companies House
Once Upon A Time...
1985 - 2008 ICL Mainframe
1997 Companies House First Website
Mainframe, Website
2004 Web-enabled filing of information
Data Model: Website required data storage
Fetch data from Mainframe
Store in website DB
Populate HTML form
Submit data back to DB
When complete, send to Mainframe
User iterates
Directors
Secretaries
Shareholders
Addresses
Shares
First steps: Design a data model
NORMALISATION
Data Modelling: Companies House data is not normalised
Observation: A normalised model increased software complexity
What about performance?
Normalised model: Multiple inserts and joins per entity
[Diagram labels: Core System; Website; HTML form; XML; DB tables (Director, Person, Address, Address); Iterate]
Observation: Companies with large datasets experienced poor performance
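The cost of "multiple inserts and joins per entity" can be illustrated with a minimal sketch. It uses an in-memory SQLite database as a stand-in; the table names follow the diagram labels (Director, Person, Address, Address), but the column names, the `document` table, and the function names are assumptions made for illustration, not the actual Companies House schema.

```python
import json
import sqlite3

# Hypothetical schema: one director is spread across person, address and
# director rows, next to a single-table "document" alternative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address  (id INTEGER PRIMARY KEY, line TEXT);
    CREATE TABLE director (
        id INTEGER PRIMARY KEY,
        company_number TEXT,
        person_id INTEGER REFERENCES person(id),
        service_address_id INTEGER REFERENCES address(id),
        home_address_id INTEGER REFERENCES address(id)
    );
    CREATE TABLE document (company_number TEXT PRIMARY KEY, body TEXT);
""")


def save_normalised(company_number, name, service_line, home_line):
    """Store one director in the normalised model: four inserts."""
    cur = conn.cursor()
    cur.execute("INSERT INTO person (name) VALUES (?)", (name,))
    person_id = cur.lastrowid
    cur.execute("INSERT INTO address (line) VALUES (?)", (service_line,))
    service_id = cur.lastrowid
    cur.execute("INSERT INTO address (line) VALUES (?)", (home_line,))
    home_id = cur.lastrowid
    cur.execute(
        "INSERT INTO director (company_number, person_id,"
        " service_address_id, home_address_id) VALUES (?, ?, ?, ?)",
        (company_number, person_id, service_id, home_id),
    )


def load_normalised(company_number):
    """Rebuild the directors of a company: three joins per query."""
    rows = conn.execute(
        """
        SELECT p.name, sa.line, ha.line
        FROM director d
        JOIN person  p  ON p.id  = d.person_id
        JOIN address sa ON sa.id = d.service_address_id
        JOIN address ha ON ha.id = d.home_address_id
        WHERE d.company_number = ?
        ORDER BY d.id
        """,
        (company_number,),
    ).fetchall()
    return [
        {"name": n, "service_address": s, "home_address": h}
        for n, s, h in rows
    ]


def save_document(company_number, directors):
    """Store the whole director list as one document: one insert."""
    conn.execute(
        "INSERT OR REPLACE INTO document (company_number, body) VALUES (?, ?)",
        (company_number, json.dumps(directors)),
    )


def load_document(company_number):
    """Read the director list back: one lookup, no joins."""
    row = conn.execute(
        "SELECT body FROM document WHERE company_number = ?",
        (company_number,),
    ).fetchone()
    return json.loads(row[0]) if row else []
```

Both load functions return the same list of directors. In the normalised variant the number of inserts grows with every director, which is consistent with the observation that companies with large datasets were the ones that performed poorly.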
Our first lesson: Understand your data
Complexity: Multiple websites and APIs
Companies House Direct, WebFiling, WebCheck, XML Gateway
Core Registry, Additional Resources
3rd Party Systems
Where did this lead us?
It’s all about data: Information is everything
It’s about customers: What do customers need from the data?
How do they use it?
Second lesson: Understand your use cases
What did we do?
Expose our data: First design a use case API
JSON, REST
Data Modelling: Model to satisfy the API
Challenge convention: Duplicate data where required
Single website uses API: Same use cases across channels
Choose a “database”: Relational didn’t work. Need an object store.
Other criteria: High performance and dynamic queries on rapidly changing data
Why MongoDB? Met all our needs
Agile, highly scalable and resilient. Objects map neatly to language types.
Design for performance: Pre-load MongoDB with API data
Reduce complexity: Data stored exactly as exposed by API
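As a rough illustration of "data stored exactly as exposed by API", the sketch below keeps each document in the shape the API returns, so a read is a lookup followed by serialisation with no mapping layer in between. A plain dictionary stands in for a MongoDB collection; the resource paths, field names, and function names are hypothetical.

```python
import json

# In-memory stand-in for a MongoDB collection, keyed by API resource path.
store = {}


def preload(resource_path, document):
    """Store a document in exactly the shape the API will return."""
    store[resource_path] = document


def handle_get(resource_path):
    """Serve a GET request: look the document up and serialise it as is."""
    document = store.get(resource_path)
    if document is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(document)


# Duplicate data where required: the officers resource repeats the company
# name so that this use case is answered from a single document.
preload("/company/00000001", {
    "company_number": "00000001",
    "company_name": "EXAMPLE LTD",
})
preload("/company/00000001/officers", {
    "company_number": "00000001",
    "company_name": "EXAMPLE LTD",
    "officers": [{"name": "A PERSON", "role": "director"}],
})
```

The trade-off is that the duplicated company name has to be kept consistent when the core data changes, which is the job of the synchronisation process.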
Data mirror: MongoDB mirrors core system data
[Diagram labels: Core system (Oracle triggers); Queue; Delta update system; HTTP; JSON API; MongoDB collections]
Synchronisation: Oracle triggers drive delta-update process
Information change
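The delta-update flow can be sketched as follows, under simplifying assumptions: a function call stands in for an Oracle trigger, `queue.Queue` for the queue, and dictionaries for the core system and the MongoDB mirror. The HTTP hop through the JSON API is left out, and all names and fields are hypothetical.

```python
import queue

# Stand-ins: core system data, the mirror, and the queue between them.
core_system = {"00000001": {"company_name": "EXAMPLE LTD", "status": "active"}}
mirror = {}
changes = queue.Queue()


def on_core_change(company_number):
    """Stand-in for a trigger: record which entity changed, not what changed."""
    changes.put(company_number)


def fetch_from_core(company_number):
    """Build the API-shaped document from the current core data."""
    record = core_system.get(company_number)
    if record is None:
        return None
    return dict(record, company_number=company_number)


def process_deltas():
    """Drain the queue and apply each change to the mirror.

    Returns the number of change events processed.
    """
    processed = 0
    while True:
        try:
            company_number = changes.get_nowait()
        except queue.Empty:
            return processed
        document = fetch_from_core(company_number)
        if document is None:
            mirror.pop(company_number, None)      # removed from core
        else:
            mirror[company_number] = document     # insert or overwrite
        processed += 1
```

Because each event carries only the identifier, the worker always copies the latest core state, so repeated or out-of-order events for the same company still leave the mirror correct.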
Handling deviation: Function to re-sync any data on demand
Simplicity: Single website and API
Companies House Direct, WebFiling, WebCheck, XML Gateway
Core Registry, Additional Resources
3rd Party Systems
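A re-sync function of this kind might look like the sketch below. It assumes the mirror can always be rebuilt from the core system for a given key; the dictionaries and the `resync` name are illustrative stand-ins, not the actual implementation.

```python
import copy


def resync(core, mirror, key):
    """Force the mirror entry for `key` to match the core system.

    Returns True if the mirror had deviated and was corrected,
    False if it was already in step.
    """
    source = core.get(key)
    deviated = mirror.get(key) != source
    if source is None:
        mirror.pop(key, None)                 # no longer in core: drop it
    else:
        mirror[key] = copy.deepcopy(source)   # overwrite with core state
    return deviated
```

The operation is idempotent: calling it a second time for the same key changes nothing and returns False, so it is safe to run on demand whenever a deviation is suspected.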
Website
JSON API
Core Registry, Additional Resources
3rd Party Systems
Maintainability: Schemas restrict flexibility
Customer benefits: Reliable, open access to data suiting need
Open RESTful JSON API
Developer benefits: Reduced complexity and learning curve, flexibility
Service benefits: Highly available and horizontally scalable
Highly maintainable
MongoDB infrastructure: Three replica set, SSD and spindle shards
System infrastructure: Mesos/Marathon nodes on virtualised blade infrastructure, NetApp storage
Software technologies
MongoDB, ElasticSearch, Mesos, Marathon, AWS, Kibana, Fluentd,
Perl, Scala, Java, Node.js, Mojolicious, Compass, Swagger,
StatsD, Git, Ansible, Jenkins, HAProxy, nginx, Linux
Questions?