Adtech x Scala x Performance tuning

83
× Best Practice for Better Performance Scala Days 2015 San Francisco Un-conference 2015-03-19 @mogproject Ad Tech Performance Tuning Scala ×

Transcript of Adtech x Scala x Performance tuning

Page 1: Adtech x Scala x Performance tuning

×

~ Best Practice for Better Performance ~

Scala Days 2015 San Francisco Un-conference 2015-03-19 @mogproject

Ad Tech Performance Tuning

Scala×

Page 2: Adtech x Scala x Performance tuning

Agenda

About Demand Side Science

Introduction to Performance Tuning

Best Practice in Development

Japanese language version here: http://www.slideshare.net/mogproject/scala-41799241

Page 3: Adtech x Scala x Performance tuning

Yosuke Mizutani (@mogproject)

Joined Demand Side Science in April 2013(thanks to Scala Conference in Japan 2013) Full-stack engineer (want to be…) Background: 9-year infrastructure engineer

About Me

Page 4: Adtech x Scala x Performance tuning

http://about.me/mogproject

Page 5: Adtech x Scala x Performance tuning

http://mogproject.blogspot.jp

in Japanese Language

Page 6: Adtech x Scala x Performance tuning

http://demand-side-science.jp/blog

in Japanese Language

Page 7: Adtech x Scala x Performance tuning

http://functional-news.com

in Japanese Language

Page 8: Adtech x Scala x Performance tuning

Nov 2012Established Demand Side Science Inc.

Brief History of DSS

Demand× Side

Science×

Page 9: Adtech x Scala x Performance tuning

2013Developed private DSP package fractale

Brief History of DSS

Demand× Side

Platform×

Page 10: Adtech x Scala x Performance tuning

Advertiser’s side of realtime ads bidding (RTB)

What is DSP

Supply Side Platform

Page 11: Adtech x Scala x Performance tuning

Dec 2013Moved into the group of Opt, the e-marketing agency Oct 2014Released dynamic creative tool unis

Brief History of DSS

× ×

Page 12: Adtech x Scala x Performance tuning

unis is a third-party ad server which creates dynamic and/or personalized ads under the rules. http://www.opt.ne.jp/news/pr/detail/id=2492

unis

items on sale most popular items

fixed items

re-targeting

Page 13: Adtech x Scala x Performance tuning

With venture mind + advantage of Opt group …

Future of DSS

Demand× Side

Science×

Page 14: Adtech x Scala x Performance tuning

We will create various products based on Science!

Future of DSS

???× ???

Science×

Page 15: Adtech x Scala x Performance tuning

For everyone’s …

Future of DSS

Marketer × Publisher Consumer×

Page 16: Adtech x Scala x Performance tuning

happiness!

Future of DSS

Win × Win Win×

Page 17: Adtech x Scala x Performance tuning

We

DSS and Scala

Adopt Scala ×for all products

from the day of establishment×

Page 18: Adtech x Scala x Performance tuning

System Architecture ExampleRDBMS NOSQL

log storage cache

Log Aggregation Machine Learning

Cache Making etc.

Page 19: Adtech x Scala x Performance tuning

Today, I will not talk about JavaScript tuning.

System Architecture Example

Page 20: Adtech x Scala x Performance tuning

Agenda

About Demand Side Science

Introduction to Performance Tuning

Best Practice in Development

Page 21: Adtech x Scala x Performance tuning

Resolve an issue

Reduce infrastructure cost(e.g. Amazon Web Services)

Motivations

Page 22: Adtech x Scala x Performance tuning

Application goes wrong with high load

Bad latency under the specific condition

Slow batch execution than expectations

Slow development tools

Resolve an Issue

Page 23: Adtech x Scala x Performance tuning

Very important especially in ad tech industry

Cost tends to go bigger and bigger High traffic Need to response in few milli seconds Big database, big log data

Business requiresBenefit from mass delivery > Infra Investment

Reduce Infrastructure Cost

Page 24: Adtech x Scala x Performance tuning

You need to care about cost (≒ engineer’s time) and risk (possibility to cause new trouble) for performance tuning itself.

Don’t lose you goal

Scaling up/out of Infra can be the best solution, naively

Don’t want to be perfect

Page 25: Adtech x Scala x Performance tuning

We iterate

Basic of Performance Tuning

Measure metrics

× Find bottleneck

Try with hypothesis×Don't take erratic steps.

Page 26: Adtech x Scala x Performance tuning

http://en.wikipedia.org/wiki/Pareto_principle

“80% of a program’s processing time come from 20% of the code”

— Pareto Principle

Page 27: Adtech x Scala x Performance tuning

※CAUTION: This is my own impression

Bottle Neck in My Experience

others 1%

Network 4%

JVM parameter 5%

Library 5%

OS 10%

Scala 10%

Async・Thread 15%

Database (RDBMS/NOSQL)

50%

Page 28: Adtech x Scala x Performance tuning

What is I/O

Memory × Disk

Network×

Page 29: Adtech x Scala x Performance tuning

Approximate timing for various operations

http://norvig.com/21-days.html#answers

execute typical instruction 1/1,000,000,000 sec = 1 nanosec

fetch from L1 cache memory 0.5 nanosec

branch misprediction 5 nanosec

fetch from L2 cache memory 7 nanosec

Mutex lock/unlock 25 nanosec

fetch from main memory 100 nanosec

send 2K bytes over 1Gbps network 20,000 nanosec

read 1MB sequentially from memory 250,000 nanosec

fetch from new disk location (seek) 8,000,000 nanosec

read 1MB sequentially from disk 20,000,000 nanosec

send packet US to Europe and back 150 milliseconds = 150,000,000 nanosec

Page 30: Adtech x Scala x Performance tuning

If Typical Instruction Takes 1 second…

https://www.coursera.org/course/reactive week3-2

execute typical instruction 1 second

fetch from L1 cache memory 0.5 seconds

branch misprediction 5 seconds

fetch from L2 cache memory 7 seconds

Mutex lock/unlock ½ minute

fetch from main memory 1½ minute

send 2K bytes over 1Gbps network 5½ hours

read 1MB sequentially from memory 3 days

fetch from new disk location (seek) 13 weeks

read 1MB sequentially from disk 6½ months

send packet US to Europe and back 5 years

Page 31: Adtech x Scala x Performance tuning

A batch reads 1,000,000 files of 10KB from disk for each time.

Data size: 10KB × 1,000,000 ≒ 10GB

Horrible and True Story

Page 32: Adtech x Scala x Performance tuning

Assuming 1,000,000 seeks are needed,

Estimated time:8ms × 106 + 20ms × 10,000 ≒ 8,200 sec ≒ 2.5 h

If there is one file of 10GB and only one seek is needed,

Estimated time: 8ms × 1 + 20ms × 10,000 ≒ 200 sec ≒ 3.5 min

Horrible and True Story

Page 33: Adtech x Scala x Performance tuning

Have Respect for the Disk Head

http://en.wikipedia.org/wiki/Hard_disk_drive

Page 34: Adtech x Scala x Performance tuning

JVM Trade-offs

JVM Performance Triangle

Memory Footprint ↓

Throughput ↑ Latency ↓

longest pause ≒ time for Full GC

Page 35: Adtech x Scala x Performance tuning

In the other words…

JVM Performance Triangle

Compactness

Throughput Responsiveness

Page 36: Adtech x Scala x Performance tuning

C × T × R = a

JVM Performance Triangle

Tuning: vary C, T, R for fixed a

Optimization: increase a

Reference: Everything I ever learned about JVM performance tuning @twitter by Attila Szegedi

http://www.beyondlinux.com/files/pub/qconhangzhou2011/Everything%20I%20ever%20learned%20about%20JVM%20performance%20tuning%20@twitter%28Attila%20Szegedi%29.pdf

Page 37: Adtech x Scala x Performance tuning

Agenda

About Demand Side Science

Introduction to Performance Tuning

Best Practice in Development

Page 38: Adtech x Scala x Performance tuning

1. Requirement Definition / Feasibility 2. Basic Design 3. Detailed Design 4. Building Infrastructure / Coding 5. System Testing 6. System Operation / Maintenance

Development Process

Only topics related to performance will be covered.

Page 39: Adtech x Scala x Performance tuning

Make the agreement with stakeholders about performance requirement

Requirement Definition / Feasibility

How many user IDs internet users in Japan: 100 million unique browsers: 200 ~ x00 million will increase? data expiration cycle? type of devices / browsers? opt-out rate?

Page 40: Adtech x Scala x Performance tuning

Requirement Definition / Feasibility

Number of deliver requests for ads Number of impressions per month

In case 1 billion / month=> mean: 400 QPS (Query Per Second)=> if peak rate = 250%, then 1,000 QPS

For RTB, bid rate? win rate? Goal response time? Content size? Plans for increasing? How about Cookie Sync?

Page 41: Adtech x Scala x Performance tuning

Requirement Definition / Feasibility

Number of receiving trackers Timing of firing tracker Click rate? Conversion(*) rate?* A conversion occurs when the user performs the specific action that the advertiser has defined as the campaign goal.e.g. buying a product in an online store

Page 42: Adtech x Scala x Performance tuning

Requirement Definition / Feasibility

Requirement for aggregation Indicates to be aggregated

Is unique counting needed? Any exception rules?

Who and when secondary processing by ad agency?

Update interval Storage period

Page 43: Adtech x Scala x Performance tuning

Requirement Definition / Feasibility

Hard limit by business side Sales plan

Christmas selling? Annual sales target?

Total budget

Page 44: Adtech x Scala x Performance tuning

The most important thing is to provide numbers, although it is extremely difficult to approximate precisely in the turbulent world of ad tech.

Requirement Definition / Feasibility

Architecture design needs assumed value Performance testing needs numeric goal

Page 45: Adtech x Scala x Performance tuning

Architecture design Choose framework

Web framework Choose database

RDBMS NOSQL

Basic Design

Page 46: Adtech x Scala x Performance tuning

Threading model design Reduce blocking Future based

Callback & function composition Actor based

Message passing Thread pool design

We can’t know the appropriate thread pool size unless we complete performance testing in production.

Basic Design

Page 47: Adtech x Scala x Performance tuning

Database design Access pattern / Number of lookup Data size per one record

Create model of distribution when the size is not constant

Number of records Rate of growth / retention period

Memory usage At first, measure the performance of the database itself

Detailed Design

Page 48: Adtech x Scala x Performance tuning

Log design Consider compression ratio for disk usage

Cache design Some software needs the double of capacity for processing backup (e.g. Redis)

Detailed Design

Page 49: Adtech x Scala x Performance tuning

Simplicity and clarity come first

“It is far, far easier to make a correct program fast than it is to make a fast program correct”— C++ Coding Standards: 101 Rules, Guidelines, and Best Practices (C++ in-depth series)

Building Infrastructure / Coding

Page 50: Adtech x Scala x Performance tuning

— Donald Knuth

“Premature optimization is the root of all evil.”

Page 51: Adtech x Scala x Performance tuning

— Jon Bentley

“On the other hand, we cannot ignore efficiency”

Page 52: Adtech x Scala x Performance tuning

Avoid the algorithm which is worse than linear as possible

Measure, don’t guesshttp://en.wikipedia.org/wiki/Unix_philosophy

Building Infrastructure / Coding

Page 53: Adtech x Scala x Performance tuning

SBT Plugin for running OpenJDK JMH (Java Microbenchmark Harness: Benchmark tool for Java)https://github.com/ktoso/sbt-jmh

Micro Benchmark: sbt-jmh

Page 54: Adtech x Scala x Performance tuning

addSbtPlugin("pl.project13.scala" % "sbt-jmh" % "0.1.6")

Micro Benchmark: sbt-jmh

plugins.sbt

jmhSettingsbuild.sbt

import org.openjdk.jmh.annotations.Benchmark class YourBench { @Benchmark def yourFunc(): Unit = ??? // write code to measure}

YourBench.scala

Just put an annotation

Page 55: Adtech x Scala x Performance tuning

> run -i 3 -wi 3 -f 1 -t 1

Micro Benchmark: sbt-jmh

Run benchmark in the sbt console

Number of measurement iterations to do

Number of warmup iterations to do

How many times to forks a single benchmark

Number of worker threads to run with

Page 56: Adtech x Scala x Performance tuning

[info] Benchmark Mode Samples Score Score error Units[info] c.g.m.u.ContainsBench.listContains thrpt 3 41.033 25.573 ops/s[info] c.g.m.u.ContainsBench.setContains thrpt 3 6.810 1.569 ops/s

Micro Benchmark: sbt-jmh

Result (excerpted)

By default, throughput score will be displayed. (larger is better)

http://mogproject.blogspot.jp/2014/10/micro-benchmark-in-scala-using-sbt-jmh.html

Page 57: Adtech x Scala x Performance tuning

Scala Optimization Example

Use Scala collection correctly

Prefer recursion to function callby Prof. Martin Odersky in Scala Matsuri 2014

Try optimization libraries

Page 58: Adtech x Scala x Performance tuning

def f(xs: List[Int], acc: List[Int] = Nil): List[Int] = { if (xs.length < 4) { (xs.sum :: acc).reverse } else { val (y, ys) = xs.splitAt(4) f(ys, y.sum :: acc) }}

Horrible and True Story pt.2

Group by 4 elements of List[Int], thencalculate each sum respectively

scala> f((1 to 10).toList)res1: List[Int] = List(10, 26, 19)

Example

Page 59: Adtech x Scala x Performance tuning

Horrible and True Story pt.2

List#length takes time proportional to the length of the sequence

When the length of the parameter xs is n, time complexity of List#length is O(n) Implemented in LinearSeqOptimized#lengthhttps://github.com/scala/scala/blob/v2.11.4/src/library/scala/collection/LinearSeqOptimized.scala#L35-43

Page 60: Adtech x Scala x Performance tuning

Horrible and True Story pt.2

In function f,xs.length will be evaluated n / 4 + 1 times,so number of execution of f is also proportional to n

Therefore, time complexity of function f is O(n2) It becomes too slow with big n

Page 61: Adtech x Scala x Performance tuning

Horrible and True Story pt.2

For your information, the following one-liner does same work using built-in method

scala> (1 to 10).grouped(4).map(_.sum).toListres2: List[Int] = List(10, 26, 19)

Page 62: Adtech x Scala x Performance tuning

ScalaBlitz

Page 63: Adtech x Scala x Performance tuning

Library for optimising Scala collection(by using macro)

http://scala-blitz.github.io/

Presentation in Scala Days 2014https://parleys.com/play/53a7d2c6e4b0543940d9e549/chapter0/about

ScalaBlitz

Page 64: Adtech x Scala x Performance tuning

System feature testing Interface testing Performance testing Reliability testing Security testing Operation testing

System Testing

Page 65: Adtech x Scala x Performance tuning

Simple load testing

Scenario load testing mixed load with typical user operations

Aging test (continuously running test)

Performance Testing

Page 66: Adtech x Scala x Performance tuning

Apache attached Simple benchmark tool http://httpd.apache.org/docs/2.2/programs/ab.html

Adequate for naive requirements Latest version recommended(Amazon Linux pre-installed version’s bug made me sick)

Example

ab - Apache Bench

ab -C <CookieName=Value> -n <NumberOfRequests> -c <Concurrency> “<URL>“

Page 67: Adtech x Scala x Performance tuning

Result example (excerpted)

ab - Apache Bench

Benchmarking example.com (be patient)Completed 1200 requestsCompleted 2400 requests(略)Completed 10800 requestsCompleted 12000 requestsFinished 12000 requests(略)Concurrency Level: 200Time taken for tests: 7.365 secondsComplete requests: 12000Failed requests: 0Write errors: 0Total transferred: 166583579 bytesHTML transferred: 160331058 bytesRequests per second: 1629.31 [#/sec] (mean)Time per request: 122.751 [ms] (mean)Time per request: 0.614 [ms] (mean, across all concurrent requests)Transfer rate: 22087.90 [Kbytes/sec] received(略)Percentage of the requests served within a certain time (ms) 50% 116 66% 138 75% 146 80% 150 90% 161 95% 170 98% 185 99% 208 100% 308 (longest request)

Requests per second = QPS

Page 68: Adtech x Scala x Performance tuning

Load testing tool written in Scala http://gatling.io

Gatling

Page 69: Adtech x Scala x Performance tuning

An era of Apache JMeter has finished Say good bye to scenario making with GUI

With Gatling, You load write scenario with Scala DSL

Gatling

Page 70: Adtech x Scala x Performance tuning

Care for the resource of stressor side Resource of server (or PC) Network router (CPU) can be bottleneck

Don’t tune two or more parameters at one time

Leave change log and log files

Days for Testing and Tuning

Page 71: Adtech x Scala x Performance tuning

System Operation / Maintenance

Logging ×Anomaly Detection

Trends Visualization×

Page 72: Adtech x Scala x Performance tuning

Day-to-day logging and monitoring Application log GC log Profiler

Anomaly detection from several metrics Server resource (CPU, memory, disk, etc.) abnormal response code Latency

Trends visualization from several metrics

System Operation / Maintenance

Page 73: Adtech x Scala x Performance tuning

GC logAdd JVM options as follows

JVM Settings

-verbose:gc-Xloggc:<PathToTheLog>-XX:+PrintGCDetails-XX:+PrintGCDateStamps-XX:+UseGCLogFileRotation-XX:NumberOfGCLogFiles=10-XX:GCLogFileSize=10M

Page 74: Adtech x Scala x Performance tuning

— Real customer

“If someone doesn’t enable GC logging in production,

I shoot them!

http://www.oracle.com/technetwork/server-storage/ts-4887-159080.pdf p55

Page 75: Adtech x Scala x Performance tuning

JMX (Java Management eXtensions)Add JVM options as follows

JVM Settings

-Dcom.sun.management.jmxremote-Dcom.sun.management.jmxremote.port=<PORT NUMBER>-Dcom.sun.management.jmxremote.ssl=false-Dcom.sun.management.jmxremote.authenticate=false

Page 76: Adtech x Scala x Performance tuning

stdout / stderr Should redirect to file Should NOT throw away to /dev/null Result of thread dump(kill - 3 <PROCESS_ID>) will be written here

JVM Settings

Page 77: Adtech x Scala x Performance tuning

SLF4J + Profiler http://www.slf4j.org/extensions.html

Coding example

Profiler

import org.slf4j.profiler.Profiler

val profiler: Profiler = new Profiler(this.getClass.getSimpleName)

profiler.start(“A”)doA()

profiler.start(“B”)doB()

profiler.stop()logger.warn(profiler.toString)

Page 78: Adtech x Scala x Performance tuning

SLF4J + Profiler Output example

Example: Log the result of the profiler when timeout occurs

Profiler

+ Profiler [BASIC]|-- elapsed time [A] 220.487 milliseconds. |-- elapsed time [B] 2499.866 milliseconds. |-- elapsed time [OTHER] 3300.745 milliseconds. |-- Total [BASIC] 6022.568 milliseconds.

Page 79: Adtech x Scala x Performance tuning

For catching trends, not for anomaly detection Operation is also necessary not to look over the sign of change Not only for infrastructure /application, but business indicates Who uses the console?

System user System administrator Application developer Business manager

Trends Visualization

Page 80: Adtech x Scala x Performance tuning

Grafana (+Graphite)

Page 81: Adtech x Scala x Performance tuning

Graphite - http://graphite.readthedocs.org Manage and visualize numeric time-series data

Grafana - http://grafana.org/ Visualize Graphite data more stylish(or Kibana-like)

Grafana (+Graphite)

Page 82: Adtech x Scala x Performance tuning

×

~ Best Practice for Better Performance ~

Scala Days 2015 San Francisco Un-conference 2015-03-19 @mogproject

Thank very much! you

×

Page 83: Adtech x Scala x Performance tuning

"Yosuke Mizutani - Kanagawa, Japan | about.me" - http://about.me/mogproject "mog project" - http://mogproject.blogspot.jp/ "DSS Tech Blog - Demand Side Science ㈱ の技術ブログ" - http://demand-side-

science.jp/blog/ "FunctionalNews - 関数型言語ニュースサイト" - http://functional-news.com/

"『ザ・アドテクノロジー』~データマーケティングの基礎からアトリビューションの概念まで~ / 翔泳社 新刊のご紹介" - http://markezine.jp/book/adtechnology/

"オプト、ダイナミック・クリエイティブツール「unis」の提供開始 ~ パーソナライズ化された広告を自動生成し、広告効果の最大化を目指す ~ | インターネット広告代理店 オプト" - http://www.opt.ne.jp/news/pr/detail/id=2492

"The Scala Programming Language" - http://www.scala-lang.org/ "Finagle" - https://twitter.github.io/finagle/ "Play Framework - Build Modern & Scalable Web Apps with Java and Scala" - https://www.playframework.com/ "nginx" - http://nginx.org/ja/ "Fluentd | Open Source Data Collector" - http://www.fluentd.org/ "Javaパフォーマンスチューニング(1):Javaパフォーマンスチューニングのルール (1/2) - @IT" - http://www.atmarkit.co.jp/ait/articles/0501/29/news011.html

"パレートの法則 - Wikipedia" - http://ja.wikipedia.org/wiki/パレートの法則

"Teach Yourself Programming in Ten Years" - http://norvig.com/21-days.html#answers "企業が作る国際ネットワーク最前線 - [4]いまさら聞けない国際ネットワークの基礎知識:ITpro" - http://itpro.nikkeibp.co.jp/article/COLUMN/20100119/

343461/ "Coursera" - https://www.coursera.org/course/reactive "アースマラソン - Wikipedia" - http://ja.wikipedia.org/wiki/アースマラソン

"Hard disk drive - Wikipedia, the free encyclopedia" - http://en.wikipedia.org/wiki/Hard_disk_drive "Everything I ever learned about JVM performance tuning @twitter(Attila Szegedi).pdf" - http://www.beyondlinux.com/files/pub/qconhangzhou2011/Everything%20I%20ever%20learned%20about%20JVM%20performance%20tuning%20@twitter%28Attila%20Szegedi%29.pdf

"Amazon.co.jp: C++ Coding Standards―101のルール、ガイドライン、ベストプラクティス (C++ in-depth series): ハーブ サッター, アンドレイ アレキサンドレスク, 浜田 光之, Herb Sutter, Andrei Alexandrescu, 浜田 真理: 本" - http://

www.amazon.co.jp/gp/product/4894716860 "UNIX哲学 - Wikipedia" - http://ja.wikipedia.org/wiki/UNIX哲学

"ktoso/sbt-jmh" - https://github.com/ktoso/sbt-jmh "ScalaBlitz | ScalaBlitz" - http://scala-blitz.github.io/ "Parleys.com - Lightning-Fast Standard Collections With ScalaBlitz by Dmitry Petrashko" - https://parleys.com/play/53a7d2c6e4b0543940d9e549/chapter0/about "mog project: Micro Benchmark in Scala - Using sbt-jmh" - http://mogproject.blogspot.jp/2014/10/micro-benchmark-in-scala-using-sbt-jmh.html "Gatling Project, Stress Tool" - http://gatling.io/ "WEB+DB PRESS Vol.83|技術評論社" - http://gihyo.jp/magazine/wdpress/

archive/2014/vol83 "「Javaの鉱脈」でGatlingの記事を書きました — さにあらず" - http://

blog.satotaichi.info/gatling-is-awesome-loadtester "Garbage Collection Tuning in the Java HotSpot™ Virtual Machine" - http://www.oracle.com/technetwork/server-storage/ts-4887-159080.pdf "SLF4J extensions" - http://www.slf4j.org/extensions.html "Graphite Documentation — Graphite 0.10.0 documentation" - http://graphite.readthedocs.org/en/latest/ "Grafana - Graphite and InfluxDB Dashboard and graph composer" - http://grafana.org/ "Grafana - Grafana Play Home" - http://play.grafana.org/#/dashboard/db/grafana-play-home "不動産関係に使える 無料画像一覧" - http://free-realestate.org/information/

list.html "AI・EPSの無料イラストレーター素材なら無料イラスト素材.com" - http://www.無料イラスト素材.com/

"大体いい感じになるKeynoteテンプレート「Azusa」作った - MEMOGRAPHIX" -

http://memo.sanographix.net/post/82160791768

References