Know thy cost (or where performance problems lurk)


RavenDB Conference 2016

Know thy cost (or where performance bugs lurk)

Federico [email protected]/redknightlois
twitter.com/federicolois

What do I do?

• 10+ years building performance sensitive software.
• 5+ years in highly parallel computation.
  • CPU and GPU.
  • Machine Learning.
  • Computer Graphics.
• 3+ years at RavenDB.
  • Low level optimization.
  • Performance consultant.

Why do I care?

Performance IS-A Feature *

* Shamelessly stolen from well known (famous) developers like Jeff Atwood and Matt Warren

The recipe

1. Measure
2. Measure!
3. Measure!!!

And then,

when you know what’s going on…

The recipe

MEASURE AGAIN!!!
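The recipe can be sketched as a tiny timing harness. This is a minimal sketch, not something from the talk: `measure`, `summarize` and `candidate` are hypothetical names, and `candidate` is a placeholder for whatever code path is under suspicion. The point is only that each change gets measured before and after, over several runs, instead of being judged by eye.

```python
import statistics
import time


def measure(fn, *, warmup=3, runs=20):
    """Call fn() repeatedly and return the duration of each timed run in seconds."""
    # Warm-up runs are discarded so one-time costs (caches, lazy init) do not skew the samples.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce the samples to a few numbers that are easy to compare between runs."""
    ordered = sorted(samples)
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "max": ordered[-1],
    }


def candidate():
    # Hypothetical workload standing in for the code being investigated.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    print(summarize(measure(candidate)))
```

Run it once before a change and once after, and compare the medians rather than a single sample.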

Performance in RavenDB

• Many versions in the wild (2.5, 3.0, 3.5, soon 4.0)
• Many different subsystems:
  • Indexing,
  • Searching,
  • Storage,
  • Serialization,
  • Caching,
  • Client Services,
  • Wire Communication,
  • Etc.

And on top of all of it…

…your application code

…your architecture

…your hardware

The usual suspects…

When the s%# hit the fan…

• Nasty Indexes
  • and I have seen my fair share.
• Overloaded sessions
  • been there, done that
• Round trips to the server
• Serializing a 1+ MB document
  • again, and again, and again.
• Bulk operations in serial fashion
  • Guilty as charged
• Poor Hardware IO Performance
  • We love you Azure!!!

And then some more…

• Deleting Indexes is an O(n) operation
  • unless you are from the future and already using 4.0.
  • I dare you to try that on a 100M+ map-reduce index.
• Compression is costly
  • Yeah, just ask IIS about that
• Using BSON is actually slower than JSON
  • Use the compression bundle with care.
  • No wonder it won't even exist in 4.0
• New indexes force you to scan the whole DB
  • Doing that while operating at cruise speed can be a gamble.

Performance is a feature…

…because you have to design for performance.

Designing for Performance (the default advice)

• Don’t do more round trips than needed.
• Use bulk operations when possible.
• Prefer a few broad indexes over many narrow ones.
• Beware of caching on low-memory systems.
• Instead of multiple Load<T>(id) use Load<T>(idArray).
  • or better, Lazy operations.
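Why the round trip advice matters can be shown with a toy model. This is a sketch under loud assumptions: `FakeStore`, `load` and `load_many` are hypothetical stand-ins written for this illustration, not the RavenDB client API, and the "server" is just a dictionary that counts how many requests it receives.

```python
class FakeStore:
    """In-memory stand-in for a remote document store (hypothetical, not the RavenDB client)."""

    def __init__(self, documents):
        self._documents = dict(documents)
        self.round_trips = 0

    def load(self, doc_id):
        # One request per document.
        self.round_trips += 1
        return self._documents.get(doc_id)

    def load_many(self, doc_ids):
        # One request for the whole batch, however many ids it carries.
        self.round_trips += 1
        return [self._documents.get(doc_id) for doc_id in doc_ids]


ids = [f"orders/{n}" for n in range(1, 101)]
documents = {doc_id: {"id": doc_id} for doc_id in ids}

one_by_one = FakeStore(documents)
loaded_individually = [one_by_one.load(doc_id) for doc_id in ids]

batched = FakeStore(documents)
loaded_in_batch = batched.load_many(ids)

print(one_by_one.round_trips, batched.round_trips)  # prints "100 1"
```

Both approaches return the same documents. Only the number of requests changes, and on a real network every one of those requests pays latency.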

Designing for Performance (the default advice)

• If possible, shard your data after 100s of GB.
• Exploit well designed IDs and start-with APIs.
• Indexing DateTime is slower than indexing ticks.
• LoadDocument can be pretty bad for your health.
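For the DateTime versus ticks point, a minimal sketch of what "ticks" means may help. It assumes the .NET convention of 100-nanosecond ticks counted from midnight on 0001-01-01 and naive (timezone-free) timestamps. `to_ticks` is a hypothetical helper name for this illustration. It shows only the conversion to a plain integer and says nothing about how the indexing side is wired up.

```python
from datetime import datetime, timedelta

# Assumption: .NET-style ticks, i.e. 100 ns intervals since 0001-01-01T00:00:00.
TICKS_EPOCH = datetime(1, 1, 1)
TICKS_PER_MICROSECOND = 10


def to_ticks(value):
    """Convert a naive datetime into an integer tick count."""
    # Integer division of timedeltas yields whole microseconds as an int.
    microseconds = (value - TICKS_EPOCH) // timedelta(microseconds=1)
    return microseconds * TICKS_PER_MICROSECOND


print(to_ticks(datetime(1970, 1, 1)))  # 621355968000000000
```

Once the timestamp is an integer, ordering and range checks are plain numeric comparisons.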

Designing for Performance (the not so default advice)

• Your deployment environment matters.
  • Know your IO patterns
  • Exploit the asymmetric profile of your hardware
• What your OS does matters
  • Help it help you.
• If you know data will not change
  • Isolate and summarize it.
• Different operations can exploit different configurations.
• Monitor your index sizes per database.

Questions?

Get in

TOUCH

Let’s do something great together.

[email protected]

@Corvalius

/Corvalius

Av. Federico Lacroze 2352

7th Floor, Buenos Aires, Argentina

T. (+54 11) 4772-0650

corvalius.com