Low latency for high throughput

Peter Lawrey, CEO of Higher Frequency Trading

JAX Finance 2015

Low Latency: The best way to high throughput

Peter Lawrey

Java Developer/Consultant for hedge fund and trading firms for 6 years.

Most answers for Java and JVM on stackoverflow.com

Founder of the Performance Java User’s Group.

Architect of Chronicle Software

Agenda

• Little’s law and concurrency

• Co-ordinated omission

• Why should you use fewer servers?

Little’s law

Little’s law states:

The long-term average number of customers in a stable system L is equal to the long-term average effective arrival rate, λ, multiplied by the (Palm-)average time a customer spends in the system, W; or expressed algebraically: L = λW

Little’s law as work.

The number of active workers must be at least the average arrival rate of tasks multiplied by the average time to complete those tasks.

workers >= tasks/second * seconds to perform task.

Or

throughput <= workers / latency.
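
The two inequalities above can be checked with a few lines of arithmetic. This is a minimal sketch; the class name, method names and the example load (10,000 tasks/second at 2 ms each) are assumptions chosen for illustration, not figures from the talk.

```java
final class LittlesLaw {
    /** workers >= tasks/second * seconds per task, rounded up to whole workers. */
    static long workersNeeded(long tasksPerSecond, long latencyMicros) {
        // Integer arithmetic avoids floating point rounding before the ceiling.
        return (tasksPerSecond * latencyMicros + 999_999) / 1_000_000;
    }

    /** throughput <= workers / latency, in tasks per second. */
    static double maxThroughput(int workers, long latencyMicros) {
        return workers * 1_000_000.0 / latencyMicros;
    }

    public static void main(String[] args) {
        // Hypothetical load: 10,000 tasks/second, 2 ms (2,000 us) per task.
        System.out.println("workers needed: " + workersNeeded(10_000, 2_000));
        // One worker that cannot be parallelised: only lower latency helps.
        System.out.println("1 worker at 2 ms:   " + maxThroughput(1, 2_000) + " tasks/s");
        System.out.println("1 worker at 200 us: " + maxThroughput(1, 200) + " tasks/s");
    }
}
```

With a single worker, cutting the latency by a factor of ten raises the throughput ceiling by the same factor.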

Consequences of Little’s law

• If you have a problem with a high degree of independent tasks, you can throw more workers at the problem to handle the load. E.g. web services

• If you have a problem with a low degree of independent tasks, adding more workers will mean more will be idle. E.g. many trading systems. The solution is to reduce latency to increase throughput.

Consequences of Little’s law

• Average latency is a function, sometimes the inverse, of the throughput.

• Throughput focuses on the average experience. The worst cases are often the ones which will hurt you, but averages are very good at hiding your worst cases, e.g. long GC pauses.

• Testing with Co-ordinated omission also hides worst case latencies.

Co-ordinated omission

• A term coined by Gil Tene.

• Co-ordinated omission occurs when the system being tested is allowed to apply back pressure on the system doing the testing. When the system being tested is slow, it can effectively pause the test, which distorts the results, esp. when averages or latency percentiles are considered.

Co-ordinated omission: Example

• A shop is open 10 hours a day between 8 AM and 6 PM.

• A customer comes every 5 minutes, waits to be served and leaves.

• When the shop keeper is there, he takes 1 minute to serve.

• But if he takes a 2-hour lunch break, how does this affect the average latency or the 98th percentile?

How not to measure latency.

• You have one person go to the shop and time how long she has to wait. Once per day she has to wait 2 hours and 1 minute, but the rest of the day it only takes 1 minute.

• The average of 97 tests is 2.2 minutes. Had the shopkeeper been there all day, there would have been 120 tests, but one took 2 hours and the tests that should have run during that time were never made. Not great, but it doesn’t sound much worse than 1 minute.

• The 98th percentile is 1 minute.

Avoiding co-ordinated omission

• You have as many people as you need. Most of the time only one is waiting; however, over the lunch break there are 31 people delayed 121, 117, 113, 109 … 5 mins.

• The average of 120 tests is 16.5 minutes wait time. This is much higher than the 2.2 minutes calculated previously.

• The 98th percentile is 111 minutes, instead of 1 minute in the previous test.
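
The shop example can be replayed in code to compare the two ways of measuring. This is a sketch under stated assumptions: time is counted in minutes from opening, the lunch break is assumed to start at noon, the single tester starts her next visit 5 minutes after the previous one started or as soon as she is free, and the percentile uses linear interpolation between ranks. None of these details are spelled out in the slides, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ShopLatency {
    static final int OPEN_MINUTES = 600;             // 8 AM to 6 PM
    static final int INTERVAL = 5;                   // one visit every 5 minutes
    static final int SERVICE = 1;                    // 1 minute to serve
    static final int BREAK_START = 240;              // assumption: lunch starts at noon
    static final int BREAK_END = BREAK_START + 120;  // 2 hour break

    /** Earliest time service can begin for someone ready at time t. */
    static int serviceStart(int t) {
        return (t >= BREAK_START && t < BREAK_END) ? BREAK_END : t;
    }

    /** One tester: a slow visit delays all of her later visits (co-ordinated omission). */
    static List<Double> coordinated() {
        List<Double> latencies = new ArrayList<>();
        int start = 0;
        while (start < OPEN_MINUTES) {
            int finish = serviceStart(start) + SERVICE;
            latencies.add((double) (finish - start));
            start = Math.max(start + INTERVAL, finish);
        }
        return latencies;
    }

    /** A new customer arrives on schedule no matter what, and queues if needed. */
    static List<Double> uncoordinated() {
        List<Double> latencies = new ArrayList<>();
        int keeperFree = 0;
        for (int arrival = 0; arrival < OPEN_MINUTES; arrival += INTERVAL) {
            int finish = serviceStart(Math.max(arrival, keeperFree)) + SERVICE;
            latencies.add((double) (finish - arrival));
            keeperFree = finish;
        }
        return latencies;
    }

    static double average(List<Double> xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.size();
    }

    /** Percentile with linear interpolation between the two closest ranks. */
    static double percentile(List<Double> xs, double p) {
        List<Double> sorted = new ArrayList<>(xs);
        Collections.sort(sorted);
        double pos = p / 100.0 * (sorted.size() - 1);
        int lo = (int) Math.floor(pos);
        int hi = Math.min(lo + 1, sorted.size() - 1);
        return sorted.get(lo) + (pos - lo) * (sorted.get(hi) - sorted.get(lo));
    }

    static void report(String name, List<Double> xs) {
        System.out.printf("%s: %d tests, average %.1f min, 98th percentile %.0f min%n",
                name, xs.size(), average(xs), percentile(xs, 98));
    }

    public static void main(String[] args) {
        report("one tester", coordinated());
        report("on schedule", uncoordinated());
    }
}
```

Under these assumptions the single tester records 97 tests averaging about 2.2 minutes with a 98th percentile of 1 minute, while the on-schedule customers record 120 tests averaging 16.5 minutes with a 98th percentile of about 111 minutes. The only difference between the two methods is whether a slow response is allowed to delay the next measurement.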

Why use fewer servers?

• You can buy commodity mid-range servers with 38 cores and 512 GB of memory for a reasonable price (< £20K each).

• An increasing number of libraries support off-heap storage, allowing you to support much larger datasets in memory.

Why use fewer servers?

• Deploying to one server lowers the cost of development. The cost of development is often higher than the cost of the hardware.

• Deploying to one server also reduces the network latency, increasing the throughput.

Even latencies you can’t see add up

Data passing             | Latency                        | Human scale          | Throughput one at a time
Method call              | Inlined: 0; real call: 50 ns   | Eye blink            | 20,000,000/sec
Shared memory            | 200 ns                         | Mouse click          | 5,000,000/sec
SYSV shared memory       | 2 µs                           | Drop a phone         | 500,000/sec
Low latency network      | 8 µs                           | Flying a paper plane | 125,000/sec
Typical LAN network      | 30 µs                          | Half a minute        | 30,000/sec
Typical data grid system | 800 µs                         | Running three miles  | 1,250/sec
60 Hz power flickers     | 8 ms                           | A football game      | 120/sec
4G request latency in UK | 55 ms                          | A summer’s day       | 18/sec
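
To see how these latencies add up, consider a request that has to cross several hops before it completes. The sketch below sums the per-hop latency and converts the total to a one-at-a-time throughput. The hop count of 4 is a hypothetical figure, and the per-hop latencies are the shared memory and typical LAN values from the table.

```java
final class HopLatency {
    /** Total latency when a request crosses the given number of hops in sequence. */
    static long totalNanos(long perHopNanos, int hops) {
        return perHopNanos * hops;
    }

    /** Requests per second if each request must finish before the next starts. */
    static double throughputPerSecond(long totalNanos) {
        return 1e9 / totalNanos;
    }

    public static void main(String[] args) {
        int hops = 4; // hypothetical number of process-to-process hand-offs
        long sharedMemory = totalNanos(200, hops);    // 200 ns per hop
        long typicalLan = totalNanos(30_000, hops);   // 30 us per hop
        System.out.printf("shared memory: %d ns total, %.0f/sec%n",
                sharedMemory, throughputPerSecond(sharedMemory));
        System.out.printf("typical LAN:   %d ns total, %.0f/sec%n",
                typicalLan, throughputPerSecond(typicalLan));
    }
}
```

Keeping the same hand-offs on one machine rather than across a network changes the ceiling by roughly two orders of magnitude in this example.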

Doesn’t the GC stop the world?

• The GC only pauses the JVM when it has some work to do. Produce less garbage and it will pause less often.

• Produce less than 1 GB/hour of garbage and you can get less than one pause per day. (With a 24 GB Eden)
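
The arithmetic behind the second bullet is simply the Eden size divided by the garbage rate. The sketch below works it through and also reads the JVM's own collection counters through the standard `java.lang.management` API, so the claim can be checked on a running system. The class and method names are hypothetical, and the calculation assumes a generational collector that only starts a young collection once Eden is full.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class EdenBudget {
    /** Hours until Eden fills at a steady garbage rate (assumed trigger for a young GC). */
    static double hoursToFillEden(double edenGb, double garbageGbPerHour) {
        return edenGb / garbageGbPerHour;
    }

    /** Total collections reported so far by all collectors in this JVM. */
    static long collectionsSoFar() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector reports no count
            if (count > 0) total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        // 24 GB Eden at 1 GB/hour of garbage: a full day before Eden fills.
        System.out.println("hours to fill Eden: " + hoursToFillEden(24, 1));
        System.out.println("collections so far: " + collectionsSoFar());
    }
}
```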

Do I need to avoid all objects?

• In Java 8 you can have very short lived objects placed on the stack. This requires your code to be inlined and escape analysis to kick in. When this happens, no garbage is created and the code is faster.

• You can have very long-lived objects, provided you don’t have too many.

• The rest of your data you can place in native memory (off heap).

• You can create 1 GB/hour of garbage and still not GC
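
As an illustration of placing data in native memory, the sketch below stores `long` values in a direct `ByteBuffer` from the JDK, so the data itself does not live on the Java heap. This is a stand-in chosen for illustration, not how Chronicle stores data, and the class name `OffHeapLongs` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class OffHeapLongs {
    private final ByteBuffer buffer;

    OffHeapLongs(int capacity) {
        // allocateDirect asks for memory outside the normal garbage-collected heap;
        // only the small ByteBuffer wrapper object is on the heap.
        buffer = ByteBuffer.allocateDirect(capacity * Long.BYTES)
                .order(ByteOrder.nativeOrder());
    }

    void set(int index, long value) {
        buffer.putLong(index * Long.BYTES, value); // absolute write, no object created
    }

    long get(int index) {
        return buffer.getLong(index * Long.BYTES); // absolute read, no object created
    }

    int capacity() {
        return buffer.capacity() / Long.BYTES;
    }

    public static void main(String[] args) {
        OffHeapLongs prices = new OffHeapLongs(1_000_000);
        prices.set(42, 123_456L);
        System.out.println(prices.get(42) + " of " + prices.capacity() + " slots");
    }
}
```

Reads and writes work on primitives, so a structure like this can hold a large dataset without adding to the garbage the collector has to trace.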

Low Latency with lots of Lambdas

Chronicle Wire is an API for generic serialization and deserialization. You determine what you want to read/write, but the exact wire format can be injected. This works for Yaml, Binary Yaml, and raw data. It will support XML, FIX, JSON and BSON.

This uses lambdas extensively, but the associated objects can be eliminated (for example, when the code is inlined and escape analysis kicks in).

Low Latency with lots of Lambdas

wire.writeDocument(false, out ->
        out.write(() -> "put")
                .marshallable(m ->
                        m.write(() -> "key").int64(n)
                                .write(() -> "value").text(words[n])));

As Yaml

--- !!data

put: { key: 1, value: hello }

As Binary Yaml

⒗٠٠٠Ãput\u0082⒎٠٠٠⒈åhello
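
The idea that the caller decides what to write while the wire format is injected can be sketched without Chronicle Wire itself. Everything below (`MiniWire`, `TextWire`, `RawWire`, `writePut`) is a hypothetical, much simplified stand-in and not the Chronicle Wire API; it only shows the same writing code producing two different encodings.

```java
import java.util.function.Supplier;

final class MiniWireDemo {
    /** Hypothetical wire: callers name fields with lambdas and supply values. */
    interface MiniWire {
        MiniWire write(Supplier<String> key, long value);
        MiniWire write(Supplier<String> key, String value);
    }

    /** Text format: keeps the field names, similar in spirit to a YAML flow mapping. */
    static final class TextWire implements MiniWire {
        private final StringBuilder out = new StringBuilder();

        private TextWire field(Supplier<String> key, Object value) {
            if (out.length() > 0) out.append(", ");
            out.append(key.get()).append(": ").append(value);
            return this;
        }

        @Override public MiniWire write(Supplier<String> key, long value) { return field(key, value); }
        @Override public MiniWire write(Supplier<String> key, String value) { return field(key, value); }
        @Override public String toString() { return "{ " + out + " }"; }
    }

    /** Raw format: never calls the key lambda, only the values are written. */
    static final class RawWire implements MiniWire {
        private final StringBuilder out = new StringBuilder();

        private RawWire value(Object value) {
            if (out.length() > 0) out.append(' ');
            out.append(value);
            return this;
        }

        @Override public MiniWire write(Supplier<String> key, long value) { return value(value); }
        @Override public MiniWire write(Supplier<String> key, String value) { return value(value); }
        @Override public String toString() { return out.toString(); }
    }

    /** The writing code is the same whichever format is injected. */
    static void writePut(MiniWire wire, long key, String value) {
        wire.write(() -> "key", key).write(() -> "value", value);
    }

    public static void main(String[] args) {
        MiniWire text = new TextWire();
        MiniWire raw = new RawWire();
        writePut(text, 1, "hello");
        writePut(raw, 1, "hello");
        System.out.println(text);
        System.out.println(raw);
    }
}
```

Passing the field name as a lambda means a format that does not need names, such as the raw one here, never has to evaluate it.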

Next Steps

• Chronicle is open source so you can start right away!

• Working with clients to produce Chronicle Enterprise

• Support contract for Chronicle and consultancy

Q & A

Peter Lawrey

@PeterLawrey

http://chronicle.software

http://vanillajava.blogspot.com
