My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015
I am and always will be the optimist. The hoper of far-flung hopes and the dreamer of improbable dreams.
The Doctor, Season 6, Episode 6
THESE ARE THE SPEAKER'S NOTES.
https://www.flickr.com/photos/boyce-d/4205175031
Next Generation Cassandra Conference Gary Dusbabek
Gary from SVDS
Committer since 2009
Seen a lot of what has transpired
Call for presentations first went out.
Interesting work I had done around metrics at Rackspace.
Fat Client and other ways of extending Cassandra
Current Generation (Past)
Next Generation (Futshure)
Realized a few things as I started preparing.
But that's not really next generation, is it?
As I was working on metrics for Rackspace, I thought: it would be nice to change this, and that, and, and, and.
This is not the Current Generation Cassandra Conference.
Some of these ideas and challenges are worth talking about.
https://www.flickr.com/photos/saarblitz/16803524015
A Heretic's Vision of the Future of Cassandra Gary Dusbabek
Leveraging Cassandra to Build New Distributed Systems
At some points I will wax heretical. I will hold up this blue star. Indication to suspend your disbelief and/or cognitive biases.
We will pause in the middle of this presentation for a reality check and a few deep breathing exercises.
Current Status
The Futshure
How to Get There
Measuring Success
Address Concerns
Roadmap
For the most part, this will be a fun ride.
Humor me.
https://www.flickr.com/photos/simone_pittaluga/6877522821
Current State
Regular releases
Many users
Among the top NoSQL databases
Fast
Scalable
Usability is getting better
This is remarkable
https://www.flickr.com/photos/downeym/6063328180
What is our trajectory?
What trajectory are we on?
Not "What new features are we going to add?"
But "What is the next generation of this project?"
https://www.flickr.com/photos/blair25/3240324932
It's more of a path.
https://www.flickr.com/photos/blair25/3240324932
Six Months vs Three Years
What is the next generation for this project?
If you had to rate Cassandra as being in the beginning, middle, or end of its lifecycle, where would you put it?
Umm. What is the expected lifecycle?
Dunno.
Look at life cycles and trajectories of other open source projects.
https://www.flickr.com/photos/judy-van-der-velden/6637487865
Other Projects
Look at life cycles and trajectories of other open source projects.
Examine attributes that made them successful.
https://www.flickr.com/photos/judy-van-der-velden/6637487865
Apache Httpd
Version 1 to 2 Migration
Incremental upgrades until version 2.
Then incremental upgrades to 1 and 2.
Point: Devs dug in and made necessary changes for the future in 2.
Mistakes:
But 1.3 continued to be good enough, which ruled over 2's dunno
2 had some FUD problems (cautionary tale)
Upgrade process was not simple (multiple files, etc.)
Composability
Lucene existed as a library for a long time.
Begat Solr.
Merged with it,
Then kind of split from it again.
Elasticsearch has its roots in Lucene.
Point: Focus on how ES composes Lucene.
Postgres
Since 1996
Dozens of service and support companies
(see http://www.postgresql.org/support/professional_support)
Postgres (success story)
Since 1996
Dozens of companies providing support and services
Evidence of a healthy ecosystem.
Project is obviously playing a very long game.
Hadoop
Since 2007
Spawned subprojects: HBase, Pig, Avro, Hive, Spark
Hadoop, since 2007.
Roughly the same age as Cassandra.
Has spawned subprojects, etc.: Pig, Hive, HBase, Spark, Avro
Interesting:
A few years older than Cassandra
Starting to lose its edge => look at traction around Spark
Prolonged by YARN
Cassandra
Since 2009
Ten service and support companies.
Cassandra, since 2009.
Our wiki lists 10 support companies.
No subprojects.
Just a really good database.
Ecosystems
Every open source project is different (use snowflake image here).
Is there anything in common?
Summarize: these projects spawned ecosystems.
What enabled them to do this?
Answer: each project did different things.
Question: What is the Cassandra community doing to develop an ecosystem?
My take: not much.
But I think there are good reasons for this.
https://www.flickr.com/photos/chaoticmind75/5529107926
We are just getting started
The good news is that I think we're just getting started.
Let's look at some data (that suggests we are just getting started).
https://www.flickr.com/photos/matthewpaulson/5794901439
WARNING!
Possibly meaningless data.
I draw my own conclusions.
Wrong in the past.
WARNING: Data ahead.
Used to support my arguments
Could be used to support other arguments
You do not have to agree with me
https://www.flickr.com/photos/arenamontanus/3492063978
#cassandra = 285
source: https://twitter.com/postgresql/status/586210482433818624
https://twitter.com/postgresql/status/586210482433818624 (#cassandra was 285)
Job Postings
Cassandra more popular than Postgres in this metric.
Still kind of small given a different context.
Overall trend = down
Lucene+Solr the winner here
Mongo makes strong showing
Findings?
What conclusions can you draw?
Not scientific. Draw your own.
I think it points to: that we're just getting started.
...or that we are just our own snowflake.
https://www.flickr.com/photos/tambako/3593686294
Futuristic Vision
Taking all those things into account
Risks of having futuristic views
Our predictions are always sterile
Because they tend towards utopia
What we need is to distil the ideas
And figure out what we like about the utopia
Isolate them and target them.
In this picture you don't see the garbage men
Or all the pipes and systems underground
I can't imagine a world without tractors and ditches.
There will always be tractors and ditches.
https://www.flickr.com/photos/x-ray_delta_one/3815958811
Reusable Composable Parts
I'd like to introduce my Futuristic Vision of the Future of Cassandra's Future
An ecosystem where the parts of Cassandra can be re-used to build systems that may outlive the project itself.
It will end up growing our base.
There are good side effects (will get to those later)
You might be thinking...
And that's ok.
Get this:
Reusable Composable Parts
Healthier Ecosystem
Better Software
More Users
More Committers
I think a Cassandra made up of parts will have some positive attributes:
Ok. What parts?
What do we need to do?
What is the next step?
This is how we could proceed.
https://www.flickr.com/photos/iansand/3999841402
What Must Be Done Today
Modularize the code
Modularize the code so we can be more nimble.
Nimble?
Make reusable parts.
Maybe even subprojects.
https://www.flickr.com/photos/pedromourapinheiro/5075612989
What Must Be Done Today
Commit Log
Commit Log
A concurrent journaling system that accepts bytes and supports checkpoints and recovery.
Refactor out the parts that keep track of CF last write.
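To make "accepts bytes and supports checkpoints and recovery" concrete, here is a minimal sketch of what a standalone journal could look like. It is not Cassandra's commit log: the `Journal` class, its method names, and the length-plus-CRC record framing are all assumptions for illustration, and the checkpoint is kept only in memory where a real journal would persist it.

```python
import os
import struct
import threading
import zlib

# Hypothetical record framing: 4-byte payload length, 4-byte CRC32, then payload.
_HEADER = struct.Struct(">II")


class Journal:
    """Illustrative append-only journal: opaque bytes in, replay from a checkpoint out."""

    def __init__(self, path):
        self._path = path
        self._lock = threading.Lock()  # serializes concurrent writers
        self._file = open(path, "ab")
        self._checkpoint = 0  # offset before which records are no longer needed

    def append(self, payload: bytes) -> int:
        """Write one record durably and return the offset just past it."""
        record = _HEADER.pack(len(payload), zlib.crc32(payload)) + payload
        with self._lock:
            self._file.write(record)
            self._file.flush()
            os.fsync(self._file.fileno())
            return self._file.tell()

    def checkpoint(self, offset: int) -> None:
        """Mark everything before `offset` as safely stored elsewhere."""
        with self._lock:
            self._checkpoint = max(self._checkpoint, offset)

    def recover(self):
        """Yield payloads after the checkpoint, stopping at a torn or corrupt record."""
        with self._lock:
            start = self._checkpoint
        with open(self._path, "rb") as f:
            f.seek(start)
            while True:
                header = f.read(_HEADER.size)
                if len(header) < _HEADER.size:
                    return
                length, crc = _HEADER.unpack(header)
                payload = f.read(length)
                if len(payload) < length or zlib.crc32(payload) != crc:
                    return
                yield payload
```

Nothing in this interface knows about column families, which is the point of refactoring out the parts that track CF last write: the caller decides what an offset means.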
https://www.flickr.com/photos/pedromourapinheiro/5075612989
What Must Be Done Today
Internode Messaging
Internode Messaging
Almost its own framework.
Core: a way to send acked and non-acked messages between nodes with guaranteed delivery.
Also includes the notion of verbs and handlers.
We'd need ad hoc verbs.
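A rough sketch of verbs, handlers, and ad hoc verb registration, under heavy assumptions: `Node`, `Message`, `register_verb`, `send_one_way`, and `send_with_ack` are hypothetical names, and a plain dict stands in for the network, so there are no sockets, retries, or delivery guarantees here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Message:
    sender: str
    verb: str
    payload: bytes


# A handler may return reply bytes (acked) or None (one-way).
Handler = Callable[[Message], Optional[bytes]]


class Node:
    """Illustrative node; `network` is a shared dict of name -> Node (assumption)."""

    def __init__(self, name: str, network: Dict[str, "Node"]):
        self.name = name
        self._network = network
        self._handlers: Dict[str, Handler] = {}
        network[name] = self

    def register_verb(self, verb: str, handler: Handler) -> None:
        """Ad hoc verbs: any component can add a verb at runtime."""
        if verb in self._handlers:
            raise ValueError(f"verb already registered: {verb}")
        self._handlers[verb] = handler

    def _deliver(self, message: Message) -> Optional[bytes]:
        handler = self._handlers.get(message.verb)
        if handler is None:
            raise LookupError(f"{self.name} has no handler for {message.verb}")
        return handler(message)

    def send_one_way(self, to: str, verb: str, payload: bytes) -> None:
        """Non-acked send: the handler's return value is discarded."""
        self._network[to]._deliver(Message(self.name, verb, payload))

    def send_with_ack(self, to: str, verb: str, payload: bytes) -> bytes:
        """Acked send: returns the handler's reply, or empty bytes as a bare ack."""
        reply = self._network[to]._deliver(Message(self.name, verb, payload))
        return reply if reply is not None else b""
```

The part worth noticing is that the verb table is open: a system built on top of the messaging layer adds its own verbs without touching the layer itself.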
https://www.flickr.com/photos/pedromourapinheiro/5075612989
What Must Be Done Today
Failure Detector + Gossiper
Failure Detector
Clustering software
https://www.flickr.com/photos/pedromourapinheiro/5075612989
What Must Be Done Today
Pluggable Storage
Not the first time this topic has come up.
Optimized for read cases.
Probably more interested in tuning the lookup/query strategy for reducing seeks.
https://www.flickr.com/photos/pedromourapinheiro/5075612989
What Must Be Done Today
SEDA Architecture
Internal event and worker queues.
Seeing some of this shake out in Guava.
Combine Function object and Threadpools.
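"Combine Function object and Threadpools" could look something like the sketch below: a stage is an event queue, a function, and a few worker threads, optionally feeding a downstream stage. The `Stage` class and its method names are assumptions for illustration, not an existing Cassandra or Guava API.

```python
import queue
import threading


class Stage:
    """Illustrative SEDA-style stage: event queue + function + worker threads."""

    def __init__(self, name, func, workers=2, downstream=None):
        self.name = name
        self._func = func
        self._downstream = downstream
        self._events = queue.Queue()
        self._threads = [
            threading.Thread(target=self._run, name=f"{name}-{i}", daemon=True)
            for i in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def submit(self, event):
        """Enqueue an event; a worker picks it up later."""
        self._events.put(event)

    def _run(self):
        while True:
            event = self._events.get()
            try:
                result = self._func(event)
                if self._downstream is not None:
                    self._downstream.submit(result)
            finally:
                # Mark the event done only after any downstream hand-off.
                self._events.task_done()

    def drain(self):
        """Block until every event submitted so far has been processed."""
        self._events.join()
```

Stages compose by wiring: `Stage("parse", parse_fn, downstream=Stage("apply", apply_fn))`. Draining the upstream stage first and the downstream stage second waits for the whole pipeline.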
https://www.flickr.com/photos/pedromourapinheiro/5075612989
These next two things are hard for me to say.
But I'm going to say them anyway.
What Must Be Done Today
Adopt a modern build tool
Current build file is a discombobulation
https://www.flickr.com/photos/pedromourapinheiro/5075612989
What Must Be Done Today
Kill the singletons
Kill the singletons
https://www.flickr.com/photos/pedromourapinheiro/5075612989
Evidence of Success (or failure)
This is where we get to experience some aspect of the utopia
Subprojects
More Committers
More Users (cross-project adoption)
Generally a bigger footprint
This isn't all...
Stability
Easier tests
Nimbler
Since the systems are independent and less coupled
This would be a BIG help for the project moving forward.
Might even help us hit some of those short term goals more quickly.
Less coupling means that we can write better tests more easily.
Some may argue: at what expense?
There are always tradeoffs, right?
We'll cover those.
Have Concerns?
This is where I address your concerns before you have them.
Maybe.
There are many reasons that make this not practical
Reasons why this may not happen.
https://www.flickr.com/photos/boyce-d/4205175031
Have Concerns?
Best case - no bugs
Worst case - many bugs
(byte buffers, anybody?)
Best case: no bugs introduced.
https://www.flickr.com/photos/boyce-d/4205175031
Have Concerns?
Touches every class
Complicated Merges
Complicates merges.
https://www.flickr.com/photos/boyce-d/4205175031
Have Concerns?
Gives up the short term
https://www.flickr.com/photos/boyce-d/4205175031
Have Concerns?
Is this right for the database?
Is this right for the database?
https://www.flickr.com/photos/boyce-d/4205175031
Have Concerns?
Is this right for the project?
Is this right for the project?
https://www.flickr.com/photos/boyce-d/4205175031
Real Question:
Benefits
Cost
Real question: Do the benefits outweigh the costs?
Not something I'll attempt to argue for or against here.
But I set forth what I perceive as the benefits early on:
Stability, Testing, Bigger Footprint.
Call to greatness.
https://www.flickr.com/photos/archeon/2941655917
We have the opportunity to make something great.
Call to greatness
Not just something great,
But a great thing even better
And the chance of greater things
https://www.flickr.com/photos/lara604/5405044734
THANK YOU!!!!11
Gary [email protected]
@gdusbabek
Yes, we're [email protected]#
Something went wrong with the font on the "Thank You"
Photo & Image Credits
Conan O'Brien: The Internet
Goat: https://www.flickr.com/photos/saarblitz/16803524015
Road: https://www.flickr.com/photos/simone_pittaluga/6877522821
Stones: https://www.flickr.com/photos/downeym/6063328180
Frog: https://www.flickr.com/photos/blair25/3240324932
Clock: https://www.flickr.com/photos/judy-van-der-velden/6637487865
Feather: Apache Foundation
Lucene: Apache Foundation
Solr: Apache Foundation
Elasticsearch: Elasticsearch.com
Postgres: PostgreSQL Global Development Group
Hadoop: Apache Foundation
Cassandra: Apache Foundation
Snowflake: https://www.flickr.com/photos/chaoticmind75/5529107926
Duck: https://www.flickr.com/photos/matthewpaulson/5794901439
Warning: https://www.flickr.com/photos/arenamontanus/3492063978
Monkey: https://www.flickr.com/photos/tambako/3593686294
Future: https://www.flickr.com/photos/x-ray_delta_one/3815958811
Jackie Chan: Internet Meme
Gears: https://www.flickr.com/photos/iansand/3999841402
Crane: https://www.flickr.com/photos/pedromourapinheiro/5075612989
Fistpump: Internet Meme
Tardis: https://www.flickr.com/photos/boyce-d/4205175031
Scale: https://www.flickr.com/photos/archeon/2941655917
Bacon: https://www.flickr.com/photos/lara604/5405044734
All Flickr images are CC BY-NC-ND 2.0