Transcript of GOTO 2016

Monolithic Batch Goes Microservice Streaming
A story about one transformation

Charles Tye & Anton Polyakov

Who Are We?


Anton Polyakov

Head of Application Development

2 years in Nordea

Charles Tye

Head of Core Services & Risk IT

17 years in Nordea

Develop solutions for:

Market Risk
Credit Risk
Liquidity Risk
Stress Testing
Messaging

Together with around 70 other people from all over the world

What We Do

Market Risk


The high-level view

Quantify potential losses and exposures

Do many small risks add up to a big risk?

Can risks combine in unusual and unexpected ways?

Market Risk


Line of Defence

Protect Nordea and our customers

Daily internal reporting and external reporting to regulators

Independent function

Analysis and insight into the sources of risk

Control of risk

Management of capital

Examples of Risk Analysis


Value at Risk

Look at last 2 years of market history

Average of the worst 1% of outcomes

Simulate if the same thing happened again today.

Highly non-linear, but with a requirement to drill in and find the drivers
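The "average of the worst 1% of outcomes" measure above can be sketched in a few lines of Python. This is a toy illustration with made-up scenario numbers, not the production model:

```python
def worst_tail_average(pnl_scenarios, tail=0.01):
    """Average of the worst `tail` fraction of P&L outcomes (e.g. worst 1%)."""
    n_tail = max(1, int(len(pnl_scenarios) * tail))
    worst = sorted(pnl_scenarios)[:n_tail]  # most negative outcomes first
    return sum(worst) / n_tail

# Roughly two years of daily scenarios (toy numbers, 500 of them)
scenarios = [float(x) for x in range(-250, 250)]
print(worst_tail_average(scenarios))  # average of the 5 worst: -248.0
```

Note that this tail average is itself non-linear: it cannot be incremented or summed over a hierarchy, which is exactly why the traditional aggregate-and-increment approach breaks down.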

Examples of Risk Analysis


Stress Scenarios

“Black Swan” worst case scenarios

Unexpected outcomes from future events

Example: Brexit

Simulate if it happened

An Interesting Technology Problem


Consistent: risk analysis means everything has to be included = know when you are complete

Non-linear: risk does not sum over hierarchies, drill-down is non-trivial, and traditional OLAP aggregate & increment doesn’t work

Volume: 10,000,000,000,000

Speed: reactive near real-time calculations; streaming data with fast corrections and “what-if”; interactive sub-second queries on huge data sets

Challenge No 1.

Find the seams

Break it up

Reusable components

Replace a piece at a time


Spaghetti

Challenge No 2.


Develop a new service

Integrate into the legacy system

Reconcile the output

Find and fix legacy bugs

Fight complification

Challenge No 3.

Batch is synchronous state transfer. The only way to achieve consistency?


Consistency is seriously hard to combine with streaming

Event sourced and streaming approach

More robust, scalable and faster, especially for recovery

Comes with a cost
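The event-sourced approach can be illustrated with a minimal sketch (hypothetical event shapes, not the talk's actual schema). State is never stored directly; it is rebuilt by replaying the event log, which is why recovery becomes a replay rather than a restore, and why a correction is just one more event:

```python
def apply_event(state, event):
    """Fold a single event into the state. A correction is just another event."""
    kind, trade_id, value = event
    if kind in ("add", "correct"):
        state[trade_id] = value
    return state

def replay(events):
    """Rebuild the full state from scratch by replaying the event log."""
    state = {}
    for e in events:
        state = apply_event(state, e)
    return state

log = [("add", "t1", 100.0), ("add", "t2", 50.0), ("correct", "t1", 90.0)]
print(replay(log))  # {'t1': 90.0, 't2': 50.0}
```

The cost mentioned above shows up here too: every consumer must tolerate replays, so all downstream writes have to be idempotent.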

Challenge No 4.

Legacy SQL was slow


Replace with in-memory aggregation

Partitions and horizontally scales out across commodity hardware

Aggregate billions of scenarios in memory and pre-compute total vectors over hierarchies (linear)

Non-linear measures computed lazily

Reactive and continuous queries

Tougher challenges at terabyte scale due to NUMA limitations: some cubes are already > 200 GB, with larger ones planned
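The linear pre-computation of total vectors over hierarchies can be sketched like this (toy hierarchy and numbers, not the real risk model). Because scenario P&L vectors sum element-wise, every node's total vector can be computed bottom-up in one pass, while non-linear measures are taken from the stored totals lazily at query time:

```python
def aggregate(node, leaf_vectors, totals):
    """Recursively compute and cache the total scenario vector for `node`."""
    name, children = node
    if not children:
        total = leaf_vectors[name]
    else:
        child_totals = [aggregate(c, leaf_vectors, totals) for c in children]
        # Element-wise sum: scenario vectors are linear over the hierarchy
        total = [sum(vals) for vals in zip(*child_totals)]
    totals[name] = total
    return total

# Toy hierarchy: bank -> (desk_a, desk_b); three scenarios per leaf
tree = ("bank", [("desk_a", []), ("desk_b", [])])
leaves = {"desk_a": [1.0, -2.0, 3.0], "desk_b": [0.5, 0.5, -1.0]}
totals = {}
aggregate(tree, leaves, totals)
print(totals["bank"])  # element-wise sum: [1.5, -1.5, 2.0]
```

A non-linear measure such as a tail average would then be computed from `totals["bank"]` on demand, never incremented.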

Solution: Microservices! Well, almost…

Single responsibility – replace pieces of legacy from the inside out

Self-contained with business functional boundaries
• Independent and rapid development – team owns the whole stack
• Organisationally scalable – horizontally scale your teams

Flexible and maintainable – evolve the architecture

Smart endpoints and dumb pipes

Innovation and short lifecycles


The problem

• Business:
• Multi-model Market Risk calculator for the Nordea portfolio
• VaR on different organization levels with 5-6 different models in parallel

• IT:
• 7000 CPU hours of grid calculation
• More than 4000 SQL jobs
• Graph with more than 10000 edges
• Nightly batch flow


What did it look like?

• Well, you know. 10 years of development

• In SQL

• No refactoring (who needs it?)


Precisely, how did it look?


Logical architecture

Monolith staged app


Now a little complication

Slo-o-o-ow. Fat, so it breaks. Can it be parallel?


So what to do?

We probably all know the answer (since we are at this session):

- Find logically isolated blocks
- Keep an eye on non-functional aspects
- Think about how they communicate
- Think about what happens if something dies


Not quite “classical” microservices… or is it?

produce → enrich → aggregate

- Request/response is not feasible
- Synchronous interaction takes too long
- Some results are expensive to reproduce


So we need…

A middleware which

- “Glues” services together
- Caches important results
- Serves as a coordinator and work distributor


Scale out

Fast pub/sub

Queues and sets: pull and dedup

Distributed locks



Locks? Who needs locks?


Pub/sub messaging as notifier

[Diagram: Producer → Enricher → Aggregator → consumer, each with its own store, connected via Redis pub/sub]
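The store-then-notify pattern described above can be simulated in-process (plain Python structures standing in for Redis; with redis-py it would be a `SET` followed by a `PUBLISH` of the key). The payload lives in the store, and pub/sub carries only a lightweight notification:

```python
class Bus:
    """Minimal in-process stand-in for a Redis pub/sub channel."""
    def __init__(self):
        self.subscribers = []
    def subscribe(self, fn):
        self.subscribers.append(fn)
    def publish(self, message):
        for fn in self.subscribers:
            fn(message)

store = {}    # stands in for the Redis key/value store
bus = Bus()   # stands in for a Redis pub/sub channel
received = []

# Enricher: on notification, pull the full payload from the store
bus.subscribe(lambda key: received.append((key, store[key])))

# Producer: store the (possibly large) result first, then publish only the key
store["trade:42"] = {"pv": 101.5}
bus.publish("trade:42")
print(received)  # [('trade:42', {'pv': 101.5})]
```

Publishing only the key keeps the bus cheap; subscribers that missed a notification can still recover the state from the store.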


But…


There are two main problems in distributed messaging:

2) Guarantee that each message is only delivered once
1) Guarantee message order
2) Guarantee that each message is only delivered once

Queues with atomic operations: BRPOPLPUSH

[Diagram: Producer and Enricher connected via Redis pub/sub; the Enricher moves each message atomically from an incoming queue to a processing queue with BRPOPLPUSH before storing the result]
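The reliable-queue pattern behind BRPOPLPUSH can be simulated in-process with two deques (in redis-py the real call is `r.brpoplpush("incoming", "processing")`). The key property: a message is never only in flight; it moves atomically from the incoming list to a processing list, where it survives a worker crash until it is acknowledged:

```python
from collections import deque

incoming = deque()
processing = deque()

def lpush(q, item):
    q.appendleft(item)          # LPUSH: producer adds at the head

def brpoplpush(src, dst):
    """Atomically move the oldest item from src to dst and return it."""
    item = src.pop()            # RPOP: take from the tail (oldest)
    dst.appendleft(item)        # LPUSH onto the processing list
    return item

def ack(dst, item):
    """On successful processing, remove the item (LREM in Redis)."""
    dst.remove(item)

for m in ["msg1", "msg2"]:
    lpush(incoming, m)

msg = brpoplpush(incoming, processing)
# ...if the worker crashed here, msg would still sit in `processing`
# and a recovery pass could re-process it...
ack(processing, msg)
print(msg, list(processing))  # msg1 []
```

Recovery re-reads the processing list, which is why the dedup on the store side (next slide) matters: a replayed message may be processed twice.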


Sets and Hmaps – all good for dedup

In an eventually consistent world, dedup is your best friend

The Enricher stores with HSET: multiple inserts due to recovery still leave a consistent state thanks to dedup
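The HSET-based dedup reduces to idempotent writes keyed by a deterministic id. A minimal sketch with a dict standing in for a Redis hash (in redis-py: `r.hset("results", field, value)`); the field name here is a made-up example:

```python
results = {}  # stands in for a Redis hash

def store(result_id, value):
    """HSET semantics: set-or-overwrite, so writing the same id twice
    leaves a single, consistent entry instead of a duplicate row."""
    results[result_id] = value

# Normal write, then a duplicate caused by crash/recovery replay
store("trade:42:day:1", 101.5)
store("trade:42:day:1", 101.5)   # replayed message: no duplicate

print(len(results), results["trade:42:day:1"])  # 1 101.5
```

This is what makes at-least-once delivery tolerable: the messaging layer may replay, but the store converges to the same state.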


So how to scale out? Logically and concurrently.

Enricher <type A>, Enricher <type B>, … Enricher <type X>: each filters its own events from Redis pub/sub

Aggregator <day 1>, Aggregator <day 2>, Aggregator <day 3>: steal work, guarded by RedLock + TTL
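The TTL-guarded work stealing can be sketched with a single-node lock in the spirit of RedLock, simulated in-process (a single-instance Redis variant would be `r.set(key, owner, nx=True, px=ttl_ms)`; full RedLock acquires a majority of independent Redis nodes). The key names and owners below are made up:

```python
import time

locks = {}  # key -> (owner, expiry_timestamp)

def acquire(key, owner, ttl_s, now=None):
    """Take the lock if it is free or its TTL has expired (SET NX + TTL)."""
    now = time.time() if now is None else now
    holder = locks.get(key)
    if holder is None or holder[1] <= now:   # free, or previous owner timed out
        locks[key] = (owner, now + ttl_s)
        return True
    return False

# Aggregator <day 1> grabs the work item; a second worker is refused...
print(acquire("work:day1", "agg-A", ttl_s=30, now=100.0))  # True
print(acquire("work:day1", "agg-B", ttl_s=30, now=110.0))  # False
# ...until the TTL expires (agg-A presumably died), then work can be stolen
print(acquire("work:day1", "agg-B", ttl_s=30, now=140.0))  # True
```

The TTL is what makes stealing safe-ish: a crashed aggregator cannot hold work forever, and the dedup on the store side absorbs the case where it was merely slow.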


Demo

[Demo architecture: Producer → Enricher → Aggregator → consumer, each with its own store; Redis pub/sub as notifier, incoming and processing queues, RedLock + TTL]


The Result and What We Learned

Success!

• Aggregate and produce risk: 5 hours → 30 mins
• Corrections: 40 mins → 1 second
• Earlier deliveries – more time to manage the risks
• Faster recovery from problems
• Happy risk managers

Important (and painful) to integrate new services into the existing system

Consistency is hard to combine with streaming (subject of another talk maybe)

When distributing, remember the first law of distributed object architecture (do you remember it?)


The Result and What We Learned

First Law of Distributed Object Design:

"don't distribute your objects"


And of course…


https://dk.linkedin.com/in/charles-tye-a8aa88b

https://github.com/parallelstream/