The Future of Big Data is Relational (or why you can't escape SQL)

The Future of Relational (or Why You Can't Escape SQL) [email protected] Twitter: @tobrien Thursday, February 28, 13

Page 2

In this session...

• Ouroboros
• Copernican Revolution
• Ptolemaic Entrenchment
• Janus
• A two minute summary of the last 15 years
• Google Magic
• The Future of SQL

Page 3

Tim O’Brien

I’m a developer who also writes

[email protected] Twitter: @tobrien

Page 6

Revolution

Page 7

Remember all that Big Data stuff?

Page 8

Remember when we all thought it was time to give up schemas?

Man, wasn’t that a lot of work.

Page 9

What if the relational database “catches up”?

What then?

Page 10

How we market Big Data:

Big Data == Paradigm Shift

“singularity” > “disruptor”

Page 13

“Big Data” is to “Traditional Databases” as...

Copernicus is to Ptolemy

Page 14

Out with the “old”

In with the “new”

Page 15

Copernicus’ model, 1543 AD

Claudius Ptolemy, ~150 AD

Page 16

Google’s BigTable paper - 2006

Edgar F. Codd, “A Relational Model of Data for Large Shared Data Banks” - 1970

Hadoop - 2007

Page 18

Google’s BigTable paper (2006) + Hadoop (2007) + Codd =

Google F1, Spanner, Translattice, Impala, Drawn-to-Scale, NuoDB, Akiban, and many more NewSQL products

Page 21

Youth: Looking Forward

Age: Looking Backward

Page 22

Whatever.

Haven’t you heard?

Databases don’t scale.

Let’s create a schema.

Ok?

Page 23

And, both are right...

Page 27

2000: In the beginning...

Proprietary app servers

Big Oracle database

Page 28

2001: More traffic?

Specialized application servers

Throw hardware at the database

Page 29

2002-2005: More traffic?

Specialized application servers

Throw hardware at the database

Page 30

2005: Even more traffic?

Sharding... ugh.

Everything else was scaling horizontally except the database.
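Sharding here means splitting one logical table across several database servers and routing each row by its key. A minimal sketch of that routing, assuming hash-based placement and hypothetical shard names (neither comes from the talk):

```python
import hashlib

# Hypothetical shard names; a real deployment would hold connection details here.
SHARDS = ["db0", "db1", "db2", "db3"]


def shard_for(user_id, shards=SHARDS):
    """Pick a shard by hashing the key, so the same key always maps to the same server."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


def group_by_shard(user_ids, shards=SHARDS):
    """A query over many users fans out: one request per shard that holds any of the keys."""
    groups = {}
    for uid in user_ids:
        groups.setdefault(shard_for(uid, shards), []).append(uid)
    return groups
```

The "ugh" is what this leaves to the application: joins and transactions that span shards, and the fact that changing the number of shards in a modulo scheme like this one remaps most keys.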

Page 31

2006 - New Reality of Big Data

Google’s BigTable paper - 2006

Hadoop - 2007

Q: What would Google do?

A: Not use an RDBMS

Page 32

2006

Big Data for a few

vs.

RDBMS for most

Page 33

2007

Who needs Foreign Keys? Transactions? Just Simplify

• The rise of Database “Luddites”

Page 34

2007

• The rise of Database “Luddites”

Rails hacked away at database “orthodoxy”

Opened the door to alternative approaches

Page 35

• Although, Basecamp is still a single RDBMS…

Page 36

2007 - present == Alternatives

• Documents
  – MongoDB – Started in 2007, OSS in 2009
  – CouchDB – Started in 2005
• Graphs
  – Neo4j
• Key-Value Stores
  – Cassandra
  – Riak
  – Tokyo Cabinet
• Memory
  – Memcached / Redis
• Tabular
  – HBase

Page 37

2012

Q: What database do you use?

A: All of them

Oracle, Mongo, MySQL, Impala, Riak, some memcache, and some Hadoop thrown in for fun

Page 39

Big Data a Necessity at Largest Scale

Most development still RDBMS

“A certain kind of developer at a certain kind of company”

Page 40

• There’s this company that sells advertising
  – ~96% of revenue came from advertising in 2011
  – ~75% of the US Search Advert Market in 2011
  – ~44% share of the overall online ad market
• One of the most important applications at Google ran on MySQL
  – AdWords missed the NoSQL revolution

Page 41

Digging into the evolution of Storage at Google

• Google’s BigTable – 2006
  – Tabular
  – Sparse, distributed, multi-dimensional sorted map
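To make "multi-dimensional sorted map" concrete: the BigTable paper describes a map indexed by row key, column key, and timestamp, kept in sorted order. The sketch below is a toy, single-process stand-in for that data model only; the class and method names are made up for illustration and none of the distribution is modeled.

```python
import bisect


class SortedMap:
    """Toy stand-in for the data model: (row, column, timestamp) -> bytes."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key -> value; only cells that exist take space (sparse)

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # negate so newer versions sort first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_latest(self, row, column):
        # The first key at or after (row, column, -inf) is the newest version, if any.
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_rows(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row, in key order."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1
```

Because rows are kept in key order, a range of adjacent row keys can be read with one scan. What is missing, compared with a relational database, is everything above the map: joins, secondary indexes, and general-purpose transactions.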

Page 42

Digging into the evolution of Storage at Google

• Google’s BigTable – 2006
  – “New users [...] uncertain of how to best use the BigTable interface, particularly if they are accustomed to using relational databases that support general-purpose transactions.”

Page 43

Digging into the evolution of Storage at Google

• Google’s Megastore – 2010
  – Hierarchical “schemas”
  – Positioned as a NoSQL store
  – ACID within partitions

Page 44

Digging into the evolution of Storage at Google

• Google’s Megastore – 2010
  – “Supports two-phase commit for atomic updates [...] these transactions have much higher latency and increase the risk of contention, we generally discourage applications from using the feature”
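For readers who have not met two-phase commit, here is a generic textbook-style sketch of the protocol the quote refers to. It is not Megastore's implementation; `Participant` and `two_phase_commit` are hypothetical names, and durable logging, locking, and failure recovery are left out.

```python
class Participant:
    """Hypothetical partition that can stage a write, then commit or abort it."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.data = {}
        self._staged = None

    def prepare(self, writes):
        # Phase 1: vote. A real system would also take locks and log durably here.
        if not self.can_commit:
            return False
        self._staged = dict(writes)
        return True

    def commit(self):
        if self._staged is not None:
            self.data.update(self._staged)
        self._staged = None

    def abort(self):
        self._staged = None


def two_phase_commit(writes_by_participant):
    """Apply all writes or none. Takes a list of (participant, writes) pairs."""
    prepared = []
    for participant, writes in writes_by_participant:
        if participant.prepare(writes):
            prepared.append(participant)
        else:
            # Any "no" vote aborts everyone that already prepared.
            for p in prepared:
                p.abort()
            return False
    # Phase 2: every participant voted yes, so commit everywhere.
    for p in prepared:
        p.commit()
    return True
```

Each phase is a round trip to every partition, and staged writes stay pending between the two phases, which is one way to read the latency and contention concerns in the quote.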

Page 45

Digging into the evolution of Storage at Google

• Google’s Spanner & F1 – 2012
• Paper published in 2012
  – Hierarchical, Semi-relational Schemas
  – ACID across continents possible - 14ms transaction overhead in a data-center with clock uncertainty of 1ms.
  – SQL
  – Focus on Performance
    • Gated by Clock Uncertainty
    • Consensus: Paxos

Page 46

What Differentiates Google Spanner?

• Transactions are only possible because of Paxos
• Forget NTP, Google has “Reified Clock Uncertainty”
  – Epsilon, clock uncertainty, is the gating factor for gaining consensus on transaction timestamps.
• It’s all about Time
  – “as the underlying system enforces tighter bounds on clock uncertainty, the overhead of the stronger semantics decreases. As a community, we should no longer depend on loosely synchronized clocks and weak time APIs in designing distributed algorithms.”
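Why epsilon is the gating factor can be shown with a small simulation of the commit-wait idea from the Spanner paper: the clock returns an interval rather than a point, a transaction takes the top of that interval as its timestamp, and it then waits until the bottom of the interval has passed that timestamp. `FakeTrueTime` and `commit` are made-up names for this sketch; no real clock or Spanner API is involved.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    earliest: float
    latest: float


class FakeTrueTime:
    """Simulated clock whose reading is only known to within +/- epsilon."""

    def __init__(self, epsilon_ms):
        self.epsilon_ms = epsilon_ms
        self.true_ms = 0.0  # the "real" time, which no caller can read exactly

    def now(self):
        return Interval(self.true_ms - self.epsilon_ms, self.true_ms + self.epsilon_ms)

    def sleep(self, ms):
        self.true_ms += ms


def commit(tt, step_ms=0.5):
    """Pick a timestamp, then wait until it is guaranteed to be in the past."""
    commit_ts = tt.now().latest            # no earlier than any clock could currently read
    waited = 0.0
    while tt.now().earliest <= commit_ts:  # commit wait
        tt.sleep(step_ms)
        waited += step_ms
    return commit_ts, waited
```

In this simulation the wait comes out at roughly twice epsilon, so halving the clock uncertainty roughly halves the overhead every transaction pays. That is the sense in which tighter bounds on clock uncertainty make the stronger semantics cheaper.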

Page 47

Let me reiterate: Google has Mastered Time

Page 48

What Differentiates Google Spanner?

• Hierarchical, Schematized Tables
• Similar to Akiban’s approach.
• Leads to some interesting possibilities.
• Nested Subqueries and Tree Results
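One way to picture hierarchical tables and tree results: child rows are stored under their parent's key prefix, so a parent and all of its children sit next to each other and come back from a single range scan. The sketch below illustrates that idea only. It is not Spanner's or Akiban's storage engine, and the customer/order tables, `InterleavedStore`, and `as_tree` are invented for the example.

```python
import bisect


class InterleavedStore:
    """Toy key-ordered store where child rows share their parent's key prefix."""

    def __init__(self):
        self._keys = []
        self._rows = {}

    def put(self, key, row):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = row

    def subtree(self, prefix):
        """Return every row whose key starts with prefix, via one contiguous range scan."""
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            out.append((self._keys[i], self._rows[self._keys[i]]))
            i += 1
        return out


def as_tree(rows):
    """Fold a parent-then-children scan into a nested result."""
    (_, parent), children = rows[0], rows[1:]
    return {**parent, "orders": [row for _, row in children]}


store = InterleavedStore()
store.put(("customer", 1), {"name": "Ann"})
store.put(("customer", 1, "order", 10), {"total": 7})
store.put(("customer", 1, "order", 11), {"total": 5})
store.put(("customer", 2), {"name": "Bob"})
tree = as_tree(store.subtree(("customer", 1)))
```

Here `tree` is a customer with its orders nested inside, produced without a separate join step, which is the kind of result a nested subquery over hierarchical tables could return.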

Page 49

What Differentiates Google Spanner?

To reiterate:

* hierarchical, schematized tables
* distributed “compute fabric” for data
* Google has mastered Time
* Google built a warp reactor

Page 50

As goes Google, so goes the world...

- Translattice
- Drawn-to-Scale
- Akiban
- Impala

Several NewSQL companies quickly jumped on this train:
- NuoDB
- VoltDB

Yes, we’ve had Hive for a while, but these new initiatives represent a more robust effort.

Page 51

Translattice

Translattice identifies itself as a database that resembles F1.

It is a hosted database service which provides distributed transactions.

Translattice uses Paxos.

They’ve extended PostgreSQL and emphasize customer control over data. A distributed, cloud-based database.

Page 52

Akiban

Akiban’s approach to storage almost *exactly* matches the strategy Google uses in Spanner.

Akiban lacks the distributed transaction capability of Spanner and F1, but they are working on developing the capability.

Akiban has implemented a query parser, optimizer, and execution engine atop a hierarchical approach to storage.

Page 53

Drawn-to-Scale

Reportedly the most similar to F1 in the market. Fault-tolerant in distributed environments.

Created a Query Parser + Optimizer + Execution Engine atop a distributed “compute fabric”

No Paxos or Transactions... yet. To be released shortly. Stay tuned.

Drawn to Scale aims to be an “installable” database. Not going the hosted route.

Data stored in HDFS/HBase.

Page 54

So there. Big Data is turning into a Big Relational Database.
