Patella railsconf 2012

Post on 15-Jan-2015

description

This talk will feature: memcache, resque, a bit of metaprogramming, a look at caching in the wild and code that fixes some usual problems, and a fairly epic SQL query with some nice Postgres features you should know about.

Transcript of Patella railsconf 2012

PATELLA: MEMOIZATION INTO MEMCACHED DONE IN RESQUE

JEFF DWYER

PATIENTSLIKEME

@JDWYAH

TODAY

Engineers will never be successful if we are the brake on innovation.

Technique for innovating safely

Learn a bit about meta-programming

TODAY

Set up the problem

Sketch the solution

Nitty Gritty Details

1) NIH PRESENTATION IN 4 WEEKS!

Integrate clinicaltrials.gov into our site

Search by trial type

Search by trial phase

Search by trial conditions mapped from MeSH to MedDRA

Search by trial facility locations…

• Location search…

WE HAVE A CHOICE

WHAT IS RIGHT

PostGIS spatial database extensions for PostgreSQL

MongoDB built in support for two dimensional geospatial indexes

AND WHAT IS EASY

sqrt(pow(69.1 * (clinical_trial_locations.lat - 40.948073), 2) + pow(53.0 * (clinical_trial_locations.lng - -90.36871), 2)) AS distance
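The "easy" expression is a flat-plane approximation: 69.1 is roughly the number of miles in one degree of latitude, and 53.0 is roughly the number of miles in one degree of longitude at a latitude near 40 degrees. A minimal Ruby sketch of the same arithmetic follows. The method and constant names are illustrative and not taken from the talk; only the two factors and the origin point come from the slide.

```ruby
# Flat-plane distance approximation in miles, mirroring the SQL expression.
# The names below are hypothetical; 69.1 and 53.0 are the slide's factors.
MILES_PER_DEGREE_LAT = 69.1
MILES_PER_DEGREE_LNG = 53.0 # only reasonable near 40 degrees latitude

def approx_distance_miles(lat, lng, origin_lat, origin_lng)
  dy = MILES_PER_DEGREE_LAT * (lat - origin_lat)
  dx = MILES_PER_DEGREE_LNG * (lng - origin_lng)
  Math.sqrt(dx**2 + dy**2)
end

# A point one degree of latitude north of the slide's origin.
puts approx_distance_miles(41.948073, -90.36871, 40.948073, -90.36871).round(1)
```

The approximation degrades as the search area moves away from the latitude the longitude factor was chosen for, which is part of why it is the easy option rather than the right one.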

CHOOSE THE EASY!

Who knows if location is even important?

Who knows if this project is even important?

MongoDB requires dev setup, automated staging setup, production setup, monitoring.

BUT, OH GOD THE HUMANITY

<query plan pic>

2) PATIENT LIKE ME SEARCH

PATIENT SEARCH RANKING

Very basic search

Plus very complex ordering

Not as many great solutions in this space

N^2 similarity matrix @ 100k patients about 4 TB

And did I mention it’s N^2?

Postgres is an amazingly viable solution.

ELEGANT CODE…

LOVELY SQL…

BUT IT’S JUST THIS SIDE OF ‘REAL-TIME’

One second queries just don’t fly.

And oh yeah, 16 people hitting it at the same time would clobber the servers.

3) A FORWARD LOOKING TIME MACHINE

Maybe those were aberrations?

Crazy right?

AND HERE’S MY CEO PROMISING IT AT TED

STEPPING BACK

Conflict

• Relational data is most easily queried relationally.

• Relational queries don’t necessarily scale and stay in the millisecond range

• Denormalized queries & special solutions scale

• But take longer to implement

• (note) This isn’t just SQL, I’m talking about anything slow

We want to experiment/fail fast

• But we don’t want…

DON’T WANT TO LET THIS:

TURN INTO: :-C

TODAY

Set up the problem

Sketch the solution

Nitty Gritty Details

WHAT WE WANT

Trivially easy way for developers to declare that some methods are not to be run without adult supervision.

Consistent framework so that ops doesn’t need to be afraid of new, sometimes expensive experiments.

SOLUTION SPACE

Doing it right all the time

• Too slow and expensive

• Slows innovation

SOLUTION SPACE

Memoization

• Brilliant

• Functional Programming Nirvana

• No cache-key shenanigans

• But also no expiry…

• There’s just one thing…

• It only works in a single request
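For reference, plain memoization in Ruby can be sketched as below. The class and method names are hypothetical. The cached value lives in an instance variable, so it has no cache key and no expiry, and it disappears with the object, which in a Rails app typically means at the end of the request.

```ruby
# Plain in-process memoization: nothing is shared across requests or processes.
class TrialSearch
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Hypothetical expensive method; the body only counts how often it runs.
  def nearby_trials
    @nearby_trials ||= begin
      @calls += 1
      %w[trial-a trial-b]
    end
  end
end

search = TrialSearch.new
search.nearby_trials
search.nearby_trials
puts search.calls          # work done on this object
puts TrialSearch.new.calls # a new object starts from scratch
```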

SOLUTION SPACE

Memcached

• Great

• Simple to set up.

• Could be simpler. Handmade cache keys feel wrong.

• But it doesn’t solve our :-C problem.

• The first request still slams the server.

• So you do some cache warming thing…

• But this is a PITA again.

WHAT COULD MAKE THIS SIMPLER?

Remove one constraint.

A basic Rails.cache.fetch guarantees you a result

• But no performance guarantee

Flip that deal around.

• Guarantee performance

• Don’t guarantee a result

• It’s ok not to know the answer!
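The flipped contract can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not the gem's code: the module name is hypothetical, and the Hash and Array are in-memory stand-ins for memcached and a Resque queue. A cache miss returns nil immediately and schedules the work instead of running it inline.

```ruby
# Sketch of "guarantee performance, don't guarantee a result".
module FlippedCache
  @store = {} # stand-in for memcached
  @queue = [] # stand-in for a Resque queue

  class << self
    attr_reader :queue

    # Never runs the block inline: returns the cached value, or nil after
    # scheduling the work.
    def fetch(key, &work)
      return @store[key] if @store.key?(key)
      @queue << [key, work]
      nil
    end

    # Stand-in for a background worker draining the queue.
    def work_off
      until @queue.empty?
        key, work = @queue.shift
        @store[key] = work.call
      end
    end
  end
end

first = FlippedCache.fetch("similar_patients/42") { [7, 19, 23] }
FlippedCache.work_off
second = FlippedCache.fetch("similar_patients/42") { [7, 19, 23] }
p first
p second
```

The caller has to be able to render something sensible for nil, such as a "still calculating" placeholder.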

BUT IT NEEDS A NAME!

TECHNOLOGIES!

Memoization into Memcached with everything calculated in Resque.

TODAY

Set up the problem

Sketch the solution

Nitty Gritty Details

PATELLA DEVELOPER INTERFACE

SEND LATER

Super easy way to just do something later while in the same context.

Most workers are real boring.

A single worker suffices for many background jobs.

Makes testing/development easier by bypassing Resque in configuration.

AR extension. Coordinates logging / monitoring.

SENDLATER

User.send_later :expensive, arg1, arg2
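A call like this can be supported with one generic worker. The following is a minimal sketch, not the talk's implementation: the module name and `PENDING_JOBS` are hypothetical, and the array is an in-memory stand-in for a Resque queue. The job records the class name, method name, and arguments, and the worker looks the class up and calls the method.

```ruby
# Sketch of a send_later-style helper with a single generic worker.
module SendLaterSketch
  PENDING_JOBS = [] # stand-in for a Resque queue

  # Added to a class with `extend`, so `User.send_later :expensive, 1, 2` works.
  def send_later(method_name, *args)
    PENDING_JOBS << [name, method_name.to_s, args]
    nil
  end

  # The generic worker body: resolve the class by name and call the method.
  def self.perform(class_name, method_name, *args)
    Object.const_get(class_name).public_send(method_name, *args)
  end

  # Drains the stand-in queue and returns each job's result.
  def self.work_off
    results = []
    until PENDING_JOBS.empty?
      class_name, method_name, args = PENDING_JOBS.shift
      results << perform(class_name, method_name, *args)
    end
    results
  end
end

class User
  extend SendLaterSketch

  def self.expensive(a, b)
    a + b
  end
end

User.send_later :expensive, 1, 2
p SendLaterSketch.work_off
```

Only plain values (strings, numbers, arrays) are queued here because a real queue serializes its arguments.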

SENDLATER RESQUE WORKER

MEMOIZATION

SendLater gets things calculated in Resque, but that’s step 1.

We still need:

Memoization.

Stored in memcached.

THIS IS NOT A GOOD SLIDE

PATELLA RESULT

THE METHOD

WITH PATELLA

THE REPLACEMENT

THE ONE THAT DOES THE WORK

MAYBE IT’S BETTER NOW?

DOG PILE
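The code from this slide did not survive in the transcript. As a hedged illustration of the dog pile problem and one common guard, the sketch below writes a "loading" marker before enqueueing, so that many requests missing the same key schedule only one job. All names are hypothetical, the Hash and Array stand in for memcached and Resque, and `add` imitates memcached's add command, which only writes when the key is absent.

```ruby
# Sketch of a dog-pile guard using a "loading" marker.
module DogPileSketch
  @store = {} # stand-in for memcached
  @queue = [] # stand-in for a Resque queue

  class << self
    attr_reader :queue

    # Imitates memcached `add`: succeeds only when the key is absent.
    # (A plain Hash is not atomic across processes; memcached's add is.)
    def add(key, value)
      return false if @store.key?(key)
      @store[key] = value
      true
    end

    def fetch(key)
      entry = @store[key]
      return entry[:value] if entry && !entry[:loading]
      # Only the caller that wins the add enqueues the job.
      @queue << key if add(key, { loading: true })
      nil
    end

    # Called by the worker when the calculation is done.
    def finish(key, value)
      @store[key] = { value: value }
    end
  end
end

5.times { DogPileSketch.fetch("trials_near/02139") }
p DogPileSketch.queue.size
```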

LONG ARGUMENTS
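The code from these slides did not survive in the transcript. One common way to cope with long arguments, assuming the cache key is derived from the class, method name, and arguments, is to hash the arguments so the key stays within memcached's 250-byte key limit and contains no whitespace. The module name and the `patella/` prefix below are illustrative only.

```ruby
require "digest/sha1"
require "json"

# Sketch: build a fixed-length cache key from arbitrary arguments.
module CacheKeySketch
  # The arguments are serialized to JSON and digested, so the key length
  # does not depend on how long the arguments are.
  def self.key_for(class_name, method_name, args)
    digest = Digest::SHA1.hexdigest(JSON.generate(args))
    "patella/#{class_name}/#{method_name}/#{digest}"
  end
end

long_args = ["x" * 10_000, { "phase" => 2 }]
key = CacheKeySketch.key_for("ClinicalTrial", "search", long_args)
puts key.bytesize
```

The trade-off is that the key is no longer human-readable, so logging the original arguments alongside the digest helps when debugging.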

SOFT EXPIRATION

Memcached is great, but it doesn’t tell you when something expires.

Our strategy was to add a ‘soft_expiry’

This gets stored along with the result.

Then recalculate if soft_expiry < now()
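The soft expiry idea can be sketched as follows. The names are hypothetical and the Hash and Array stand in for memcached and Resque. The sketch also assumes the stale value keeps being served while the refresh is queued, which the slide does not state.

```ruby
# Sketch of soft expiration: the expiry time is stored next to the result.
module SoftExpirySketch
  @store = {} # stand-in for memcached
  @queue = [] # stand-in for a Resque queue

  class << self
    attr_reader :queue

    def write(key, value, ttl_seconds, now: Time.now)
      @store[key] = { "value" => value, "soft_expiry" => now.to_i + ttl_seconds }
    end

    # Returns whatever is stored and schedules a recalculation once
    # soft_expiry < now. A fuller version would also avoid queueing the
    # same refresh on every stale read.
    def read(key, now: Time.now)
      entry = @store[key]
      return nil unless entry
      @queue << key if entry["soft_expiry"] < now.to_i
      entry["value"]
    end
  end
end

t0 = Time.at(1_000)
SoftExpirySketch.write("similar/42", [7, 19], 60, now: t0)
p SoftExpirySketch.read("similar/42", now: t0 + 30) # still fresh
p SoftExpirySketch.read("similar/42", now: t0 + 90) # stale, refresh queued
p SoftExpirySketch.queue
```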

ABJ: ALWAYS BE JSON

Beware of putting anything other than JSON in memcached

You really don’t want to know
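The slides do not say what went wrong. As a sketch of what storing JSON looks like, assuming the cached entry is a hash holding the value and its soft expiry, the entry is written as a plain string and parsed on the way out. The module name is hypothetical.

```ruby
require "json"

# Sketch: cache entries are stored as JSON strings, not marshalled Ruby objects.
module JsonEntrySketch
  def self.dump(value, soft_expiry)
    JSON.generate({ "value" => value, "soft_expiry" => soft_expiry })
  end

  def self.load(raw)
    JSON.parse(raw)
  end
end

raw = JsonEntrySketch.dump([7, 19, 23], 1_700_000_000)
p raw.class
p JsonEntrySketch.load(raw)

# Symbols do not survive the round trip; keys come back as strings.
p JSON.parse(JSON.generate({ value: 1 }))
```

The round trip only preserves strings, numbers, booleans, nil, arrays, and hashes, so cached methods need to return values built from those.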

META IS MAGIC

REAL LIFE

PRETTY BORING

Except that it works.

Round 1: Major Pain Points

Round 2: Magic Scaling Sprinkles

Super alpha gem here:

https://github.com/kbrock/patella

Alternative: https://github.com/csquared/rack-worker

Very REST-ish, request based.

JOE @JOERODRIGUEZ

AMY @AMYNEWELL

KEENAN @KBROCK

WINFIELD @WPETERSON