Design choices of golang for high scalability

Design Choices of Golang for High Scalability SeongJae Park <[email protected]>

Transcript of Design choices of golang for high scalability

Page 1: Design choices of golang for high scalability

Design Choices of Golang for High Scalability

SeongJae Park <[email protected]>

Page 2: Design choices of golang for high scalability

This work by SeongJae Park is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/.

Page 3: Design choices of golang for high scalability

These slides were presented at GDG Seoul Meetup 201709 (https://www.meetup.com/GDG-Seoul/events/242054608/)

Page 4: Design choices of golang for high scalability

Nice To Meet You

SeongJae Park

[email protected]

Part-time Linux kernel programmer at KOSSLAB

Page 5: Design choices of golang for high scalability

What Makes Golang So Special on Multicore?

● People say Go is a good choice for high performance and scalability
● Why is scalability so important?
● Why are existing solutions not sufficient?
● What makes Go so special for these problems?
● TL;DR: goroutines, dynamic stack management, and the integrated poller

DISCLAIMER: This talk is based on Dave Cheney’s OSCON15 presentation (http://cdn.oreillystatic.com/en/assets/1/event/129/High%20performance%20servers%20without%20the%20event%20loop%20Presentation.pdf)

Page 6: Design choices of golang for high scalability

Why Scalability?

A long time ago, in a galaxy far, far away...

Page 10: Design choices of golang for high scalability

Moore’s Law

● Law: the number of transistors per square inch doubles roughly every 18 months (Moore’s own 1975 estimate was every two years)
● CPU vendors used the extra transistors to raise CPU clock speed; all programmers needed for better performance was the patience to wait for the free lunch
● However, CPU clock speed stopped increasing over a decade ago

Figure: trends over 35 years in number of transistors, single-thread performance, clock speed, power (watts), and number of cores

https://www.karlrupp.net/wp-content/uploads/2015/06/35years.png

Page 16: Design choices of golang for high scalability

Why No Clock Speed?

● Electrons move between transistors on every clock tick (clock speed is analogous to how fast the switch in the circuit diagram below is turned on and off)
● Moving anything requires energy; here, electrical energy
● Part of that electrical energy is not converted into kinetic energy but leaks away as heat, so the temperature rises
● High temperature damages the CPU
● In short, raising the clock speed amplifies power consumption, heat dissipation, and CPU damage

http://fourthgradespace.weebly.com/uploads/1/3/3/9/13397069/2935717_orig.jpg

https://i.ytimg.com/vi/9S9vP2inD_U/maxresdefault.jpg

Page 17: Design choices of golang for high scalability

Moore’s Law Still Holds, but Vendors Changed Course

● At the same clock speed, two 0.5-square-inch processors consume about as much power as a single 1-square-inch processor (the total distance electrons move per clock is similar)
● Vendors therefore now prefer to ship multi-core processors

http://happierhuman.wpengine.netdna-cdn.com/wp-content/uploads/2012/11/One-cookie-vs-two-cookies.jpg

Page 18: Design choices of golang for high scalability

Parallelism is Not Free

● A multi-core system cannot help programs with zero concurrency
● Just increasing concurrency does not guarantee proportional speedup; clumsy concurrency control can even make things worse on multi-core
● Go made important design choices for highly scalable concurrency control. The remainder of this talk describes some of them

https://img.devrant.io/devrant/rant/r_373632_a3SmV.jpg
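One common way to put numbers on "no proportional speedup" is Amdahl's law, which is not part of the original slides and is used here only as an illustration. The sketch below assumes a program in which a fraction p of the work can run in parallel; the function name `amdahlSpeedup` and the sample values of p are hypothetical.

```go
package main

import "fmt"

// amdahlSpeedup returns the ideal speedup predicted by Amdahl's law for a
// program whose parallelizable fraction is p (0 <= p <= 1) running on n cores.
// This is an illustrative model, not something taken from the talk.
func amdahlSpeedup(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	for _, n := range []int{1, 2, 4, 8, 16} {
		// p = 0 models a zero-concurrency program; p = 0.9 models a program
		// that still has a 10% serial part.
		fmt.Printf("cores=%2d  p=0.0: %.2fx  p=0.9: %.2fx\n",
			n, amdahlSpeedup(0, n), amdahlSpeedup(0.9, n))
	}
}
```

Under this model a program with no parallel part stays at 1x regardless of core count, and even a 90% parallel program gains well under 8x on 8 cores. The model ignores locking and scheduling costs, which can make real results worse.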

Page 19: Design choices of golang for high scalability

Context Management

Process? Thread? Goroutine!

Page 20: Design choices of golang for high scalability

Resource Sharing and Context

● Concurrent tasks share processors and memory (the number of tasks is usually larger than the number of processors)
● To pause and resume a task, its context must be saved and restored
○ Context here means: pointer to the next instruction, stack frames, data in registers, ...

https://headguruteacher.files.wordpress.com/2017/05/x20142711071202qitokro-s8uda-pagespeed-ic-afnisfpvf0.jpg?w=640

Page 25: Design choices of golang for high scalability

Process: Analogous to a room for lease

● An abstraction of one execution of a given program
● Process context switching requires many expensive operations
○ Choosing the next process to run and managing waiting / pending processes
○ Saving all current CPU registers and restoring all registers from the next process’s last saved state
○ Flushing the virtual memory mapping cache (TLB)
○ All of the above must run in the operating system kernel, which means switching between user mode and kernel mode

https://www.youtube.com/watch?v=4OclkGRLuxw

Page 26: Design choices of golang for high scalability

Thread: a.k.a. Light-Weight Process

● Threads are similar to processes, but the threads of one process share an address space
● Because the address space is shared, a thread context is smaller than a process context; threads are faster than processes to create and to switch
● Context switch overhead still exists

https://www.topdraw.com/assets/uploads/2015/04/standing-desk.jpg

Page 27: Design choices of golang for high scalability

Goroutine

● Not a thread, not a coroutine: a goroutine
● Go’s main primitive for concurrent task execution
● Designed to keep context overhead minimal

http://edinburghopendata.info/wp-content/uploads/2015/05/141107-hackathon_18_d893499f2c13fe1fa05bd46252246b1e.jpg
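As a minimal sketch of the primitive described above, the following program starts one goroutine per input value with the `go` statement and waits for all of them with `sync.WaitGroup`. The function name `sumSquares` and the workload are hypothetical and chosen only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares starts one goroutine per input value and adds up the results.
func sumSquares(values []int) int {
	var wg sync.WaitGroup
	results := make([]int, len(values))
	for i, v := range values {
		wg.Add(1)
		go func(i, v int) { // the `go` statement starts a new goroutine
			defer wg.Done()
			results[i] = v * v // each goroutine writes only its own slot
		}(i, v)
	}
	wg.Wait() // block until every goroutine has called Done

	total := 0
	for _, r := range results {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints 30
}
```

Each goroutine writes to a distinct slice element, so no lock is needed here. Shared mutable state would need a mutex or a channel.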

Page 33: Design choices of golang for high scalability

Goroutine: Co-operative scheduling

● Cooperative scheduling minimizes the number of context switches
● Goroutines switch context only at well-defined points
○ Channel send / receive operations
○ `go` statements
○ Blocking system calls (file or network I/O)
○ Garbage collection
● If goroutines do not cooperate, starvation is possible (https://gist.github.com/sjp38/dcdb6295e10f1cfe919b)
○ Since Go 1.14 the runtime can also preempt goroutines asynchronously, which mitigates this

https://renegadeinc.com/wp-content/uploads/2016/05/RInc-Cooperation-1969.jpg
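The scheduling points listed above can be seen in a small sketch. Here two goroutines hand a counter back and forth over unbuffered channels, so every send and receive is a point where the runtime may park one goroutine and run the other. The name `pingPong` is hypothetical.

```go
package main

import "fmt"

// pingPong bounces a counter between the calling goroutine and a helper
// goroutine over unbuffered channels. Each send blocks until the other side
// receives, so every channel operation is a potential scheduling point.
func pingPong(rounds int) int {
	ping := make(chan int)
	pong := make(chan int)

	go func() { // the `go` statement is itself a scheduling point
		for v := range ping {
			pong <- v + 1
		}
		close(pong)
	}()

	v := 0
	for i := 0; i < rounds; i++ {
		ping <- v  // blocks until the helper receives
		v = <-pong // blocks until the helper replies
	}
	close(ping) // lets the helper's range loop end
	return v
}

func main() {
	fmt.Println(pingPong(1000)) // prints 1000
}
```

No explicit yield call appears anywhere: the cooperation is implicit in the channel operations.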

Page 35: Design choices of golang for high scalability

Goroutine: Minimized Context

● For processes and threads, the kernel must save and restore all registers, because it does not know which registers are actually in use
● Goroutines switch only at known points, so the Go compiler emits code that checks which registers are actually in use and saves only those at each context switch

https://i.pinimg.com/originals/c3/38/5f/c3385f909b2d2c36877f7ad02f841471.jpg

http://www.cohoots.info/wp-content/uploads/2017/07/coworking-space-Co-Hoots.jpg

Page 36: Design choices of golang for high scalability

Goroutine: User-space scheduling

● M goroutines are multiplexed onto N kernel threads by the user-space Go runtime scheduler
● Switching between goroutines needs no transition between user mode and kernel mode

https://image.slidesharecdn.com/realtime-linux-140810101151-phpapp02/95/making-linux-do-hard-realtime-74-638.jpg?cb=1429570932
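A rough way to observe the M-on-N multiplexing is to park many goroutines and compare their count with `runtime.GOMAXPROCS(0)`, which reports how many threads may execute Go code at the same time. The helper `spawnParked` and the figure of 10,000 goroutines are hypothetical, for illustration only.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spawnParked starts m goroutines that block on a channel. It returns the
// number of goroutines alive while they are parked (including the caller's)
// and a release function that lets them all finish.
func spawnParked(m int) (alive int, release func()) {
	gate := make(chan struct{})
	var started, done sync.WaitGroup
	started.Add(m)
	done.Add(m)
	for i := 0; i < m; i++ {
		go func() {
			defer done.Done()
			started.Done()
			<-gate // parked here by the Go scheduler, in user space
		}()
	}
	started.Wait()
	alive = runtime.NumGoroutine()
	release = func() {
		close(gate) // wakes every parked goroutine
		done.Wait()
	}
	return alive, release
}

func main() {
	alive, release := spawnParked(10000)
	fmt.Printf("goroutines alive: %d, threads allowed to run Go code: %d\n",
		alive, runtime.GOMAXPROCS(0))
	release()
}
```

On a typical machine the first number is in the thousands while the second equals the number of logical CPUs. The runtime may still create extra threads for goroutines blocked in system calls.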

Page 37: Design choices of golang for high scalability

Goroutine: Minimized Context Switch Overhead

● Minimize the number of context switches
● Minimize the size of the context
● No transition between user mode and kernel mode at all
● As a result, tens of thousands of goroutines in a single process are the norm

https://github.com/ashleymcnamara/gophers/blob/master/GOPHER_SHARE.png

Page 38: Design choices of golang for high scalability

Stack Management

Finding optimal size of stack

Page 39: Design choices of golang for high scalability

Stack

● The stack stores a task’s call frames
○ Each call frame stores the return address, parameters, and local variables
● It must not overlap with the stack of any other concurrent task

Figure: stack layout from high to low addresses. Each stack frame holds parameters, the return address, and local variables, and is delimited by the stack frame pointer and the stack pointer. The stack grows downward.

Page 42: Design choices of golang for high scalability

Stack Management of Threads

● Threads allocate a fixed-size stack when created
● The default is 2 MiB on Linux/x86-32; with the NPTL pthreads implementation, the stack size can be specified at thread creation time
● Too large a stack size limits the number of concurrent threads

http://docs.roguewave.com/legacy-hpp/thrug/images/stackallocation.gif
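The limit can be made concrete with a little arithmetic. The sketch below assumes about 3 GiB of user address space, a typical figure for 32-bit Linux that the slides do not state, and compares 2 MiB thread stacks with the 2 KiB initial goroutine stack mentioned later in these slides. The function name `maxStacks` is hypothetical.

```go
package main

import "fmt"

const (
	KiB = 1 << 10
	MiB = 1 << 20
	GiB = 1 << 30
)

// maxStacks returns how many fixed-size stacks fit into an address-space
// budget, ignoring everything else that also needs address space.
func maxStacks(budget, stackSize uint64) uint64 {
	return budget / stackSize
}

func main() {
	// Assumption: roughly 3 GiB of user address space on 32-bit Linux.
	const budget = 3 * GiB
	fmt.Println("2 MiB thread stacks:   ", maxStacks(budget, 2*MiB)) // 1536
	fmt.Println("2 KiB goroutine stacks:", maxStacks(budget, 2*KiB)) // 1572864
}
```

This is an upper bound only, since code, heap, and shared libraries compete for the same address space. It still shows why a fixed 2 MiB stack puts 10,000 threads out of reach on such a system.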

Page 47: Design choices of golang for high scalability

Stack Management of Goroutines

● The compiler knows how much stack each function requires
● A goroutine starts with a very small stack
● On every function call, Go checks whether the current stack can accommodate the function’s stack requirement; if it cannot, the stack is grown
● The stack can be shrunk, too
● As a result, each goroutine keeps only as much stack as it needs, which maximizes the number of concurrent goroutines

    func f() {
        g()
    }

    go func() {
        f()
    }()

Compiler: f() requires 1 KiB of stack, g() requires 1.5 KiB of stack

Goroutine starts with a 2 KiB stack

f() will use 1 KiB. The current stack (2 KiB free) is enough

g() will use 1.5 KiB. The current stack (1 KiB free) is not enough. Allocate a bigger stack!
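The walk-through above can be exercised with a small sketch: a goroutine that recurses far deeper than a 2 KiB stack could hold, and still completes because the runtime grows its stack on demand. The names `depthSum` and `runDeep`, the 256-byte padding, and the depth of 100,000 are hypothetical values chosen for illustration.

```go
package main

import "fmt"

// depthSum recurses n levels deep and returns n+1. Each frame declares a
// 256-byte local buffer, so the total stack needed is far larger than the
// small stack a goroutine starts with.
func depthSum(n int) int {
	var pad [256]byte
	pad[n%len(pad)] = 1
	if n == 0 {
		return int(pad[0])
	}
	return depthSum(n-1) + int(pad[n%len(pad)])
}

// runDeep runs depthSum in a fresh goroutine, which starts with a small
// stack that the runtime must grow as the recursion deepens.
func runDeep(n int) int {
	result := make(chan int)
	go func() {
		result <- depthSum(n)
	}()
	return <-result
}

func main() {
	fmt.Println(runDeep(100000)) // prints 100001
}
```

The program never has to declare a stack size up front. The same recursion on a thread with a small fixed-size stack would overflow.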

Page 48: Design choices of golang for high scalability

C10K Problem without Event Loop

Event? Threads? Goroutines and Integrated Poller!

Page 49: Design choices of golang for high scalability

C10K Problem

● How can a server hold 10,000 concurrent sessions?
● 10,000 threads for 10,000 sessions would incur high overhead
● An event loop usually results in complex callback spaghetti code

https://www.youtube.com/watch?v=SgjAv1TnS5k

Page 50: Design choices of golang for high scalability

Integrated Poller: Goroutines Allocation

● Allocate 10,000 goroutines for 10,000 concurrent sessions; don’t worry, goroutine creation is fast enough, and tens of thousands of goroutines in a single process is the norm
● Goroutines waiting for events are simply scheduled out; the Go scheduler does not need to increase the number of threads under the hood, because most goroutines are scheduled out while waiting for slow events to complete

https://github.com/ashleymcnamara/gophers/blob/master/GOPHER_MIC_DROP.png

https://github.com/ashleymcnamara/gophers/blob/master/DRAWING_GOPHER.png

Page 51: Design choices of golang for high scalability

Integrated Poller: Polling and Scheduling

● The Go runtime uses select / kqueue / epoll / IOCP to learn which socket is ready, so the goroutine that owns the socket does not have to poll
● Because the runtime knows which goroutine is waiting for each socket, it puts that goroutine back on the same CPU as soon as the socket is ready
● In short, waiting for events and waking up the appropriate goroutine is delegated to the Go runtime
● As a result, gophers enjoy both a simple programming model and reasonable context management overhead

https://talks.golang.org/2012/waza.slide#22

Page 52: Design choices of golang for high scalability

Conclusion

● Go does so well on multi-core systems owing to its clever design choices
● Goroutines are very cheap and fast to create and switch
● Dynamic stack sizing lets goroutines reach higher concurrency
● The integrated poller gives gophers the benefits of both threads and event loops without their drawbacks

https://github.com/ashleymcnamara/gophers/blob/master/GOPHER_LEARN.png

Page 53: Design choices of golang for high scalability

Thank You