Clusterpoint Inside-Out
Jurģis Orups
Development stages
- Planning – Idea
- Infant – Minimum Viable Product
- Child – Trial and error
- Teenager – Pivot & Execute
- Grown ups – ... soon (:
Inspiration (2001)
- FTS for Sybase & FoxPro
- First distributed design & implementation – trying to bite Google (:
- Folk song search portal www.dainuskapis.lv
- Long, long time ago
Talk is cheap. Show me the code. (c) Linus Torvalds
Inverted Index
- Problem – real time updates to index
Pierpaolo Basile, Information Access with Lucene, Slideshare.net
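As a rough illustration of the structure this slide refers to, here is a minimal inverted index sketch in Python. The function names (`tokenize`, `build_inverted_index`, `search`) and the sample documents are made up for this example and are not Clusterpoint code.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real analyzer does far more."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents.
docs = {
    1: "js developer Dublin",
    2: "java developer Riga",
    3: "js designer Dublin",
}
index = build_inverted_index(docs)
print(search(index, "js dublin"))
```

The index here is built in one batch. Adding a document later means touching the posting list of every term it contains, which is one way to see why real time updates are the hard part.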
Infant (2006)
- Clusterpoint (2006) – first startup in LV
- Seeded by Imprimatur Capital
- Team of 2.5 developers and 0.5 CEO
- 6 months wicked C/C++ coding
- Biting Google again – search appliance vertical
  - “didn't go well”
Inverted index
- Two types of FTS indices:
  - Memory (mutable)
  - Disk based (immutable)
- Dump memory index when full
- Merge the dumps
- Problem solved – real time updates!
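The two-index scheme above can be sketched roughly as follows. This is only an illustration under assumptions: the class name `TwoTierIndex`, the `memory_limit` knob, and the use of plain Python dicts in place of real on-disk files are all invented for the example.

```python
from collections import defaultdict


class TwoTierIndex:
    """Mutable in-memory index plus immutable dumped segments."""

    def __init__(self, memory_limit=2):
        self.memory_limit = memory_limit  # docs held before a dump (hypothetical knob)
        self.memory = defaultdict(set)    # mutable index: term -> set of doc ids
        self.memory_docs = 0
        self.segments = []                # immutable dumps, oldest first (stand-in for disk)

    def add(self, doc_id, text):
        """Index a document in memory; it is searchable immediately."""
        for term in text.lower().split():
            self.memory[term].add(doc_id)
        self.memory_docs += 1
        if self.memory_docs >= self.memory_limit:
            self.dump()

    def dump(self):
        """Freeze the memory index into a read-only segment and start afresh."""
        if not self.memory:
            return
        segment = {term: tuple(sorted(ids)) for term, ids in self.memory.items()}
        self.segments.append(segment)
        self.memory = defaultdict(set)
        self.memory_docs = 0

    def merge(self):
        """Combine all dumped segments into a single new segment."""
        merged = defaultdict(set)
        for segment in self.segments:
            for term, ids in segment.items():
                merged[term].update(ids)
        if merged:
            self.segments = [{t: tuple(sorted(ids)) for t, ids in merged.items()}]

    def search(self, term):
        """Look the term up in the memory index and in every segment."""
        term = term.lower()
        hits = set(self.memory.get(term, ()))
        for segment in self.segments:
            hits.update(segment.get(term, ()))
        return sorted(hits)


idx = TwoTierIndex(memory_limit=2)
idx.add(1, "js developer Dublin")
first_hit = idx.search("js")          # found before any dump has happened
idx.add(2, "js developer Riga")       # memory is full, so it is dumped
idx.add(3, "java developer Dublin")
idx.add(4, "js tester Dublin")        # second dump
idx.add(5, "js architect")            # stays in memory
segments_before_merge = len(idx.segments)
idx.merge()
print(idx.search("js"), len(idx.segments))
```

Writes only ever touch the small memory index, and dumped segments are never modified, so a query simply consults both tiers.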
Query language
- Simple query:
    js developer Dublin
- Advanced query:
    js developer <sex>="female"</sex> <salary>2000 .. 5000</salary> <place>="Dublin"</place>
- Aggregation (SQL like):
    SELECT sex, count(sex) GROUP BY sex limit 1
Lookup tables (column-stores)
- Associative array/hash map
- Constant access/modify time
- Memory mapped
- Append only
- Perfect when accessing data by column, i.e. aggregation, faceting, filtering
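To make the column-oriented access pattern concrete, here is a small sketch that keeps one hash map per field and runs a range filter and a group-by count over single columns. The helper names and sample records are hypothetical, and memory mapping and append-only storage are deliberately left out.

```python
from collections import Counter

# Hypothetical records, keyed by document id.
records = {
    1: {"sex": "female", "salary": 3000, "place": "Dublin"},
    2: {"sex": "male", "salary": 2500, "place": "Dublin"},
    3: {"sex": "female", "salary": 6000, "place": "Riga"},
    4: {"sex": "female", "salary": 4000, "place": "Dublin"},
}


def build_columns(records):
    """Turn row-oriented records into one doc_id -> value map per field."""
    columns = {}
    for doc_id, record in records.items():
        for field, value in record.items():
            columns.setdefault(field, {})[doc_id] = value
    return columns


def filter_range(column, low, high):
    """Ids of documents whose value in this column lies in [low, high]."""
    return {doc_id for doc_id, value in column.items() if low <= value <= high}


def group_count(column, doc_ids=None):
    """Count documents per distinct value, optionally limited to doc_ids."""
    ids = column.keys() if doc_ids is None else doc_ids
    return Counter(column[doc_id] for doc_id in ids)


columns = build_columns(records)
in_range = filter_range(columns["salary"], 2000, 5000)
print(group_count(columns["sex"], in_range))
```

Both operations read only the columns they need, which is the property the slide highlights for aggregation, faceting and filtering.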
Child (2008)
- Trust in enterprise sales model
- First commercial customers (directories, portals, e-shops, public sector)
- Positioning as database challenging
- NoSQL – heard nothing about it ... mhm, maybe we are NoSQL?!
  “The San Francisco NOSQL Meetup on June 11, 2009 was important to the trend's development.” (Wikipedia)
“Family”
Market Trends
Teenager (these days)
- Less trust in enterprise model
- Shift to free software & Cloud
- Grow customer base
- Innovate
- Develop for developers
Transactions – for what?
- ATM cash withdrawal
- Checkout
- Transfer of goods (monies, credits, lives :)
- Booking
Transactions – example
- Begin
- Retrieve value for A1
- Retrieve value for A2
- Check
- Update value for A1
- Update value for A2
- Commit
Transactions – behind the scenes
- Begin – fix the “view of the world”
- Retrieve A1 (version v1)
- Retrieve A2 (version v2)
- Check
- Update A1: if v1' != v1 then rollback else continue
- Update A2: if v2' != v2 then rollback else continue
- Commit (save final versions in transaction log)
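The version checks listed above can be sketched as a tiny optimistic transaction in Python. Everything here (`Store`, `Transaction`, `Conflict`, `transfer`) is a hypothetical, single-process illustration of the idea, not the Clusterpoint API.

```python
class Conflict(Exception):
    """Raised when a value changed after the transaction retrieved it."""


class Store:
    def __init__(self, initial):
        # key -> (version, value); every key starts at version 1
        self.data = {key: (1, value) for key, value in initial.items()}
        self.log = []  # one entry per committed transaction: {key: final version}

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version seen at retrieve time
        self.writes = {}         # key -> new value, buffered until commit

    def retrieve(self, key):
        version, value = self.store.data[key]
        self.read_versions[key] = version
        return value

    def update(self, key, value):
        # The key must have been retrieved first, as in the steps above.
        current_version, _ = self.store.data[key]
        if current_version != self.read_versions[key]:
            self.rollback()
            raise Conflict(key)
        self.writes[key] = value

    def rollback(self):
        self.writes.clear()

    def commit(self):
        # Re-check every version, then apply the writes and log final versions.
        stale = [k for k in self.writes
                 if self.store.data[k][0] != self.read_versions[k]]
        if stale:
            self.rollback()
            raise Conflict(stale[0])
        final = {}
        for key, value in self.writes.items():
            final[key] = self.read_versions[key] + 1
            self.store.data[key] = (final[key], value)
        self.store.log.append(final)


def transfer(store, src, dst, amount):
    """Begin, retrieve both values, check, update both, commit."""
    tx = store.begin()
    a = tx.retrieve(src)
    b = tx.retrieve(dst)
    if a < amount:  # the "Check" step
        tx.rollback()
        return False
    tx.update(src, a - amount)
    tx.update(dst, b + amount)
    tx.commit()
    return True


store = Store({"A1": 100, "A2": 50})
print(transfer(store, "A1", "A2", 30), store.data)
```

No locks are held between retrieve and update. A transaction that loses the race simply rolls back and can be retried by the caller.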
Transactions – “view of the world”
Shard1
- D1: TID1, TID6
- D2: TID2
- D3: TID3, TID8
- D4: TID4
- D5: TID5
Shard2
- D6: TID1, TID6
- D7: TID2, TID8
- D8: TID3
- D9: TID4
- D10: TID5
Transaction Log
- TID1: D1, D6
- TID2: D2, D7
- TID3: D3, D8
- TID4: D4, D9
- TID5: D5, D10
- TID6: D1, D6
- TID7: D9, D8
- TID8: D3, D7
1.
2. Retrieve
3.
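One possible reading of the diagram is that each document keeps the ids of the transactions that wrote a version of it, and that a transaction fixes its view by choosing a transaction id and reading the newest version at or below it. The sketch below assumes exactly that. The snapshot rule and the function names are guesses made for illustration, and only the document-to-TID data is taken from the slide.

```python
# document -> ids of transactions that wrote a version of it (from the diagram)
shards = {
    "Shard1": {"D1": [1, 6], "D2": [2], "D3": [3, 8], "D4": [4], "D5": [5]},
    "Shard2": {"D6": [1, 6], "D7": [2, 8], "D8": [3], "D9": [4], "D10": [5]},
}


def visible_version(doc_tids, snapshot_tid):
    """Newest TID not later than the snapshot, or None if nothing is visible."""
    visible = [tid for tid in doc_tids if tid <= snapshot_tid]
    return max(visible) if visible else None


def view_of_the_world(shards, snapshot_tid):
    """Version of every document as seen by a transaction fixed at snapshot_tid."""
    return {
        doc: visible_version(tids, snapshot_tid)
        for shard in shards.values()
        for doc, tids in shard.items()
    }


view_at_5 = view_of_the_world(shards, 5)
view_at_8 = view_of_the_world(shards, 8)
print(view_at_5["D1"], view_at_8["D1"])
```

Under this assumption a transaction fixed at TID5 keeps seeing the TID1 version of D1 even after TID6 has written a newer one.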
Transactions – distributed
- Tough because of sharding & replication
- Transaction log – no SPOF and it scales via sharding & replication
- Optimistic locking – high concurrency
- Isolation – phantom reads
Transactions API
Benchmarks (single shard)
- Ingestion (structured) – 25'000 ops
- Ingestion (text) – 1'800 ops
- Query (fts) – 4'700 ops
- Transactions (2r + 2w) – 3'500 ops
Cloud
- 6 months of stacking & racking & wiring
- 800 CPU Cores/250TB Storage/3TB RAM
- Real on-demand resources
- Pay per use model
Lots of hardware
How does it work?
Once a database is stored in Clusterpoint Cloud, it is broken up into many shards and distributed among many servers.
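The slides do not say how documents are assigned to shards. As one common approach, hash partitioning with round-robin placement could look like the sketch below. The function names, the shard count and the server names are all assumptions made for the example.

```python
import hashlib


def shard_for(doc_id, shard_count):
    """Pick a shard from a stable hash of the document id."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def place_shards(shard_count, servers):
    """Spread shards over servers round-robin."""
    return {shard: servers[shard % len(servers)] for shard in range(shard_count)}


SHARD_COUNT = 4
servers = ["server-a", "server-b"]  # hypothetical server names
placement = place_shards(SHARD_COUNT, servers)

shard = shard_for("doc-1", SHARD_COUNT)
print(shard, placement[shard])
```

A stable hash (rather than Python's built-in `hash`, which is randomized per process for strings) keeps the same document on the same shard across restarts.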
Try it
- Sign up for Cloud: http://cloud.clusterpoint.com
- Attendees: 3 months of free access, up to 100GB storage
- Be part of the community
- Have fun!
twitter.com/clusterpoint