Essentials of Multithreaded System Programming in C++

Posted on 11-Nov-2014







Shuo Chen, 2011/02


Shuo Chen (



Challenges in multithreaded system programming
Thread safety of C and C++ libraries
RAII and fork()
fork() and signal handling in multithreaded programs




Audience: C++ programmers

Familiar with the Pthreads and Sockets APIs; knows about thread safety, deadlock, race conditions, etc.

In short: has read APUE 2e and UNP 3e (vol. 1) by W. Richard Stevens et al.

All discussions are based on Linux 2.6.x with x >= 28, which provides newer syscalls such as signalfd, eventfd, and timerfd, on x86 and x64 platforms.




Multi-threaded system programming

Multithreading is inevitable in this multi-core era. The difficulty is not learning the synchronization primitives (mutexes, condition variables); about 10 functions are sufficient to do it right.

The difficulty is understanding the interactions between threads and existing system calls and library functions, and understanding how threads affect system design, so that we use them wisely and effectively and avoid common pitfalls and fallacies.



11 essential Pthreads functions

11 out of the 110+ Pthreads functions:
2 to create and join threads
4 to init/destroy and lock/unlock mutexes
5 to init/destroy, wait on, signal, and broadcast condition variables
Think twice if you need more.

Some others are okay, e.g. once and key, maybe rwlock. Some are bad, e.g. cancel and kill, and semaphores.

Check muduo/base for their encapsulation in C++.
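To show how far those 11 calls go, here is a minimal sketch of a hypothetical unbounded BlockingQueue (not muduo's class) that uses exactly pthread_create/join, the four mutex calls, and the five condition-variable calls, and nothing else. Return codes are ignored for brevity.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical queue built only from the "essential" Pthreads calls.
class BlockingQueue {
 public:
  BlockingQueue() : closed_(false) {
    pthread_mutex_init(&mutex_, NULL);
    pthread_cond_init(&notEmpty_, NULL);
  }
  ~BlockingQueue() {
    pthread_cond_destroy(&notEmpty_);
    pthread_mutex_destroy(&mutex_);
  }
  void put(int x) {
    pthread_mutex_lock(&mutex_);
    queue_.push_back(x);
    pthread_cond_signal(&notEmpty_);  // wake one waiting consumer
    pthread_mutex_unlock(&mutex_);
  }
  void close() {
    pthread_mutex_lock(&mutex_);
    closed_ = true;
    pthread_cond_broadcast(&notEmpty_);  // wake every consumer so they can quit
    pthread_mutex_unlock(&mutex_);
  }
  // Returns false only when the queue is closed and fully drained.
  bool take(int* out) {
    pthread_mutex_lock(&mutex_);
    while (queue_.empty() && !closed_)  // loop guards against spurious wakeups
      pthread_cond_wait(&notEmpty_, &mutex_);
    bool ok = !queue_.empty();
    if (ok) {
      *out = queue_.front();
      queue_.pop_front();
    }
    pthread_mutex_unlock(&mutex_);
    return ok;
  }

 private:
  BlockingQueue(const BlockingQueue&);  // non-copyable (C++03 style)
  BlockingQueue& operator=(const BlockingQueue&);
  pthread_mutex_t mutex_;
  pthread_cond_t notEmpty_;
  std::deque<int> queue_;
  bool closed_;
};

struct ConsumerArg { BlockingQueue* queue; long sum; };

// Consumer thread: sums everything until the queue is closed and empty.
void* consume(void* p) {
  ConsumerArg* arg = static_cast<ConsumerArg*>(p);
  int x;
  while (arg->queue->take(&x)) arg->sum += x;
  return NULL;
}

// Start one consumer, feed it 1..n, close the queue, and join the thread.
long sumThroughQueue(int n) {
  BlockingQueue queue;
  ConsumerArg arg = { &queue, 0 };
  pthread_t consumer;
  pthread_create(&consumer, NULL, consume, &arg);
  for (int i = 1; i <= n; ++i) queue.put(i);
  queue.close();
  pthread_join(consumer, NULL);  // also makes arg.sum visible to this thread
  return arg.sum;
}
```

close() uses broadcast rather than signal because every consumer, not just one, must observe the state change.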



An asynchronous world

Never assume the order of events without proper synchronization. Know the happens-before relation, memory visibility, etc.

The effect of an interaction between two [thread]s must be independent of the speed at which it is carried out. --- Brinch Hansen 1973



Standards and practices

Although the latest official standards of the C and C++ languages (C99 and C++03) do not say a word about processes or threads, we write multi-process and/or multi-threaded C/C++ programs in real life, as a real-world need.

We can't wait for this to be standardized, as standards usually lag behind practice by years. By the way, if there were no real-life multi-threaded programs, how would people know what to standardize, and how?

We adhere to some de facto standards. It is a lot simpler if we focus on one hardware platform and one OS.


Thread identifier on Linux

Use pid_t as the thread id on Linux, instead of pthread_t. pthread_t thid = pthread_self(): thid is opaque (uintptr_t). pid_t tid = gettid(), i.e. syscall(SYS_gettid): tid is the kernel task id, usually a small integer, and /proc/tid/, /proc/pid/task/tid/, ps, and top all work fine with it.

How to implement gettid() efficiently? Thread-local storage?

gettid(2) is a syscall, but its result never changes for a given thread. getpid(2) caches its result (in older glibc versions); should gettid() do the same? What happens after fork(): will the child keep the old cached value? How about using pthread_atfork() to clear it? Check muduo/base/ for details.
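One way to answer those questions, sketched under assumptions: the tid is cached in a __thread variable, and a pthread_atfork() child handler clears the cache so the forked child refetches its new tid. currentTid() is a hypothetical name, not muduo's API; the raw syscall is used because glibc only gained a gettid() wrapper in 2.30.

```cpp
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

namespace {
__thread pid_t t_cachedTid = 0;  // per-thread cache; 0 means "not fetched yet"
pthread_once_t g_atforkOnce = PTHREAD_ONCE_INIT;

// Runs in the child right after fork(): the surviving thread has a new tid,
// so drop the value copied from the parent.
void resetTidInChild() { t_cachedTid = 0; }

void registerAtfork() { pthread_atfork(NULL, NULL, &resetTidInChild); }
}  // namespace

// Hypothetical helper: one syscall per thread, then served from the cache.
pid_t currentTid() {
  pthread_once(&g_atforkOnce, &registerAtfork);  // register before caching anything
  if (t_cachedTid == 0)
    t_cachedTid = static_cast<pid_t>(::syscall(SYS_gettid));
  return t_cachedTid;
}

// Used to query the tid of another thread.
void* storeTid(void* out) {
  *static_cast<pid_t*>(out) = currentTid();
  return NULL;
}
```

In the main thread the tid equals the pid, and in a fork()ed child the forking thread becomes the child's main thread, so its tid must equal the child's pid. Without the atfork handler, the child would report the parent's cached tid.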



Creation of threads

A library should not create its own 'background' thread without the caller's prior informed consent; doing so makes the program non-forkable.

Never create a thread before main(), and avoid creating threads in the constructor of a static or global object. Doing so breaks the construction of static objects, e.g. protobuf's registration.

The number of threads a program creates should be independent of the system load, e.g. the number of connections or requests; otherwise it does not scale.

Reuse threads by assigning multiple roles to each: do IO and timers with muduo's EventLoop class, and do simple tasks directly within IO callbacks in IO threads.


Three ways of termination

Natural death: return from the thread function. Good.
Suicide: call pthread_exit().
Murder: get killed by pthread_cancel().

Rule: let it die; never suicide or murder a thread. Why? Both are inherently deadlock-prone: the thread gets no chance to unlock what it holds. Design your program so that a thread can be woken up and exit safely.

For reference: Java's Thread.stop(), suspend(), and destroy() are deprecated, and Boost.Thread doesn't provide thread::cancel().
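A thread that "can be woken up and exit safely" might look like the following minimal sketch. The Worker class is hypothetical: stopping means setting a flag under the mutex and signalling the condition variable, and the thread then returns from its function by itself, releasing its lock on the way out.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

class Worker {
 public:
  Worker() : running_(true), exited_(false), joined_(false) {
    pthread_mutex_init(&mutex_, NULL);
    pthread_cond_init(&cond_, NULL);
    pthread_create(&thread_, NULL, &Worker::threadFunc, this);
  }
  ~Worker() {
    stop();
    pthread_cond_destroy(&cond_);
    pthread_mutex_destroy(&mutex_);
  }
  // Ask the thread to quit, wake it up, and wait until it returns on its own.
  // Assumes stop() is only called from one controlling thread.
  void stop() {
    if (joined_) return;
    pthread_mutex_lock(&mutex_);
    running_ = false;
    pthread_cond_signal(&cond_);
    pthread_mutex_unlock(&mutex_);
    pthread_join(thread_, NULL);
    joined_ = true;
  }
  bool exitedNormally() const { return exited_; }  // meaningful after stop()

 private:
  Worker(const Worker&);  // non-copyable
  Worker& operator=(const Worker&);

  static void* threadFunc(void* arg) {
    Worker* self = static_cast<Worker*>(arg);
    pthread_mutex_lock(&self->mutex_);
    while (self->running_) {
      // Real work would be picked up here; the point is that the thread
      // always returns to a place where stop() can wake it.
      pthread_cond_wait(&self->cond_, &self->mutex_);
    }
    pthread_mutex_unlock(&self->mutex_);  // nothing stays locked: natural death
    self->exited_ = true;                 // made visible to stop() by pthread_join
    return NULL;
  }

  pthread_mutex_t mutex_;
  pthread_cond_t cond_;
  pthread_t thread_;
  bool running_;
  bool exited_;
  bool joined_;
};
```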




pthread_cancel() and C++

In C, there is the concept of a 'cancellation point'. In C++, pthread_cancel() throws a special exception in the cancelled thread, which unwinds the stack and destroys the objects on it. This exception must reach the outermost function; otherwise the process dumps core:

FATAL: exception not rethrown
Aborted (core dumped)

So always rethrow in a catch(...) clause; see Ulrich Drepper, "Cancellation and C++ Exceptions".

Better: never cancel or kill a thread
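If code must survive cancellation anyway, the following sketch assumes g++/glibc, where the cancellation unwind can be caught as abi::__forced_unwind from <cxxabi.h>. Catching that type first and rethrowing it lets a later catch(...) swallow ordinary exceptions without triggering the "exception not rethrown" abort. Tracer and cancellableThread are hypothetical names.

```cpp
#include <pthread.h>
#include <unistd.h>
#include <cxxabi.h>
#include <cassert>
#include <cstddef>

namespace {
int g_destroyed = 0;  // written by the cancelled thread, read after pthread_join

struct Tracer {
  ~Tracer() { ++g_destroyed; }  // runs while cancellation unwinds the stack
};
}  // namespace

void* cancellableThread(void*) {
  Tracer tracer;  // a stack object that must be destroyed on cancellation
  try {
    for (;;) pause();  // pause() is a cancellation point
  } catch (abi::__forced_unwind&) {
    throw;  // cancellation: must propagate to the outermost frame
  } catch (...) {
    // Ordinary exceptions end up here; swallowing them is allowed.
  }
  return NULL;
}
```

After pthread_join(), the thread's result is PTHREAD_CANCELED and the destructor of tracer has run exactly once.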



exit() is not thread safe in C++

exit() destructs static or global objects (_exit() doesn't). A destructor may try to acquire a lock, the function that called exit() may already hold the same lock, and the program ends up in a deadlock.

Check the following code for an example of such a deadlock. How, then, do we quit a multi-threaded program safely?

An irregular but simple solution: make the process killable, e.g. p. 29.

It's not the fault of exit(), but of static or global objects. Try to avoid static or global objects in C++, except for PODs.
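A minimal sketch of the deadlock described above, assuming glibc, whose default mutex is non-recursive and blocks forever when relocked by its owner. Registry and addOrExit() are hypothetical: addOrExit() calls exit() while holding the mutex, exit() runs the global object's destructor on the same thread, and the destructor blocks on that mutex. The safe path is shown for comparison.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <cstdlib>

class Registry {
 public:
  Registry() : count_(0) { pthread_mutex_init(&mutex_, NULL); }
  ~Registry() {
    pthread_mutex_lock(&mutex_);  // blocks forever if this thread already holds it
    count_ = 0;
    pthread_mutex_unlock(&mutex_);
    pthread_mutex_destroy(&mutex_);
  }
  int add() {
    pthread_mutex_lock(&mutex_);
    int n = ++count_;
    pthread_mutex_unlock(&mutex_);
    return n;
  }
  // BUG on purpose: exit() is called with mutex_ held. exit() then runs
  // ~Registry() for the global below on this same thread, and the process hangs.
  int addOrExit(int limit) {
    pthread_mutex_lock(&mutex_);
    int n = ++count_;
    if (n > limit) exit(1);  // deadlock here
    pthread_mutex_unlock(&mutex_);
    return n;
  }

 private:
  Registry(const Registry&);  // non-copyable
  Registry& operator=(const Registry&);
  pthread_mutex_t mutex_;
  int count_;
};

Registry g_registry;  // global: constructed before main(), destroyed by exit()
```

Below the limit, both functions return normally and the destructor runs at normal process exit with the mutex free.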



Thread local __thread in g++

Thread safe by nature, unless the object escapes to another thread. A more efficient implementation than pthread_key_t; see "ELF Handling For Thread-Local Storage".

In C++, a __thread variable must be initialized with a constant expression:
No: __thread string t_obj("Chen Shuo");
No: __thread string* t_obj = new string;
Only: __thread string* t_obj = NULL;

More rules: use pthread_key_t if you want automatic destruction.
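The two options side by side, as a sketch assuming g++ on Linux: __thread for PODs with constant initializers, and pthread_key_t when each thread's object should be freed automatically at thread exit. threadName() and the counter are hypothetical illustration names.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <string>

// __thread (g++): POD types with constant-expression initializers only.
__thread int t_calls = 0;                // OK
__thread std::string* t_scratch = NULL;  // OK: a pointer with a constant initializer
// __thread std::string t_bad("Chen Shuo");  // rejected: needs dynamic initialization

// pthread_key_t: heavier, but the key's destructor runs when each thread exits.
pthread_key_t g_nameKey;
pthread_once_t g_nameKeyOnce = PTHREAD_ONCE_INIT;
int g_namesDestroyed = 0;

void destroyName(void* p) {
  delete static_cast<std::string*>(p);
  __sync_fetch_and_add(&g_namesDestroyed, 1);  // g++ atomic builtin
}

void createNameKey() { pthread_key_create(&g_nameKey, &destroyName); }

// Hypothetical helper: lazily creates this thread's own name object.
std::string& threadName() {
  pthread_once(&g_nameKeyOnce, &createNameKey);
  std::string* name = static_cast<std::string*>(pthread_getspecific(g_nameKey));
  if (name == NULL) {
    name = new std::string("unnamed");
    pthread_setspecific(g_nameKey, name);
  }
  return *name;
}

void* worker(void* out) {
  ++t_calls;                 // this thread's own copy, which starts at 0
  threadName() = "worker";   // freed by destroyName() when this thread exits
  *static_cast<int*>(out) = t_calls;
  return NULL;
}
```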



Use non-recursive mutex only

A basic assumption of holding a mutex: once I lock it, I can modify the guarded object safely. This is not true for a recursive mutex; see, e.g.:

Recursive mutexes by David Butenhof

Recursive locks - a blessing or a curse?
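Without recursive mutexes, a locked operation must not call another public function that takes the same lock. A common workaround, sketched here on a hypothetical Inventory class, is to split each operation into a locking wrapper and a "...WithLockHold" body that assumes the caller already holds the mutex.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <vector>

class Inventory {
 public:
  Inventory() { pthread_mutex_init(&mutex_, NULL); }
  ~Inventory() { pthread_mutex_destroy(&mutex_); }

  void add(int item) {
    pthread_mutex_lock(&mutex_);
    addWithLockHold(item);
    pthread_mutex_unlock(&mutex_);
  }

  // Adds a batch atomically. Calling add() here would relock mutex_ and
  // self-deadlock, so it reuses the body that expects the lock to be held.
  void addAll(const std::vector<int>& items) {
    pthread_mutex_lock(&mutex_);
    for (std::size_t i = 0; i < items.size(); ++i) addWithLockHold(items[i]);
    pthread_mutex_unlock(&mutex_);
  }

  std::size_t size() {
    pthread_mutex_lock(&mutex_);
    std::size_t n = items_.size();
    pthread_mutex_unlock(&mutex_);
    return n;
  }

 private:
  Inventory(const Inventory&);  // non-copyable
  Inventory& operator=(const Inventory&);
  void addWithLockHold(int item) { items_.push_back(item); }  // caller holds mutex_
  pthread_mutex_t mutex_;
  std::vector<int> items_;
};
```

With a non-recursive mutex, an accidental nested lock hangs immediately and is easy to spot in a debugger. A recursive mutex would instead let the nested call silently modify the object the outer call is still working on.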





Impacts of introducing threads

Threading is a late patch to the OS kernel. The Unix kernel and API took shape in the early 1970s, while the first implementations of threads emerged in the early 1990s, breaking many assumptions made during those 20 years.

Library functions with side effects must be revisited; e.g. malloc/free and fread/fseek can be made thread safe with locks.

Functions that return or use statically allocated space are not thread safe, but may have thread-safe variants: asctime_r, ctime_r, gmtime_r, rand_r, strerror_r, strtok_r.

errno is not an 'extern int' but a per-thread value:
extern int *__errno_location(void);
#define errno (*__errno_location())
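Both points can be checked with a short sketch, assuming glibc. A failing call in one thread sets that thread's errno, which lives at a different address from the main thread's. strtok_r() keeps its position in a caller-supplied pointer instead of a hidden static. ErrnoReport, failInThread(), and countTokens() are hypothetical names.

```cpp
#include <pthread.h>
#include <string.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <vector>

struct ErrnoReport { int value; int* address; };

void* failInThread(void* p) {
  ErrnoReport* r = static_cast<ErrnoReport*>(p);
  close(-1);             // fails with EBADF, setting *this thread's* errno
  r->value = errno;
  r->address = &errno;   // i.e. __errno_location() for this thread
  return NULL;
}

// strtok() would keep its position in a hidden static shared by all threads;
// strtok_r() keeps it in saveptr, which is local to this call.
int countTokens(const char* text, const char* delims) {
  std::vector<char> buf(text, text + strlen(text) + 1);  // strtok_r writes into its input
  char* saveptr = NULL;
  int n = 0;
  for (char* tok = strtok_r(&buf[0], delims, &saveptr); tok != NULL;
       tok = strtok_r(NULL, delims, &saveptr))
    ++n;
  return n;
}
```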



Thread safety of C library

Individual system calls must be thread safe, but beware of interference when multiple threads use the same file descriptor. Most glibc library functions are thread safe nowadays.

Counterintuitively, the POSIX standard lists the functions that are not required to be thread safe; it's a blacklist. From 2.9.1 Thread-Safety:

"All functions defined by this volume of POSIX.1-2008 shall be thread-safe, except that the following functions need not be thread-safe." Notably, getenv/putenv/setenv/system() are not thread safe.




FILE* functions are thread safe

They are thread safe (read 'man flockfile'), but they are not composable: e.g. an fseek() followed by an fread() is not atomic, because a different thread may change the file position in between. Wrap such a sequence with flockfile(FILE*) and funlockfile(FILE*).

The same applies to lseek(2) followed by read(2), but how do we lock that? Use pread(2) instead, which doesn't change the file offset.
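Both fixes in one sketch (readAtLocked() and readAt() are hypothetical helper names). flockfile() holds the FILE's own lock across fseek() and fread(); it is recursive, so the two calls can still take it internally. On a raw fd, pread() takes the offset as an argument, so there is no shared-offset race to lock against.

```cpp
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// fseek() + fread() as one unit with respect to other threads using fp.
size_t readAtLocked(FILE* fp, long offset, char* buf, size_t len) {
  flockfile(fp);
  size_t n = 0;
  if (fseek(fp, offset, SEEK_SET) == 0)
    n = fread(buf, 1, len, fp);
  funlockfile(fp);
  return n;
}

// pread() neither reads nor moves the descriptor's file offset.
ssize_t readAt(int fd, off_t offset, char* buf, size_t len) {
  return pread(fd, buf, len, offset);
}
```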

In general, a function that calls two thread-safe functions is not guaranteed to be thread safe. Just like exception safety, thread safety is not composable.
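A small illustration of non-composability, on a hypothetical SafeSet whose member functions are each thread safe. "if (!contains(k)) insert(k);" can let two threads both see "absent" and both report success. Doing the check and the insert under one lock hold closes that window.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

class SafeSet {
 public:
  SafeSet() { pthread_mutex_init(&mutex_, NULL); }
  ~SafeSet() { pthread_mutex_destroy(&mutex_); }
  bool contains(const std::string& key) {
    pthread_mutex_lock(&mutex_);
    bool found = keys_.count(key) != 0;
    pthread_mutex_unlock(&mutex_);
    return found;
  }
  void insert(const std::string& key) {
    pthread_mutex_lock(&mutex_);
    keys_.insert(key);
    pthread_mutex_unlock(&mutex_);
  }
  // Check and insert under a single lock hold: exactly one caller wins.
  bool insertIfAbsent(const std::string& key) {
    pthread_mutex_lock(&mutex_);
    bool inserted = keys_.insert(key).second;
    pthread_mutex_unlock(&mutex_);
    return inserted;
  }

 private:
  SafeSet(const SafeSet&);  // non-copyable
  SafeSet& operator=(const SafeSet&);
  pthread_mutex_t mutex_;
  std::set<std::string> keys_;
};

// Racy composition of two thread-safe calls: more than one caller may return true.
bool insertIfAbsentNaive(SafeSet& s, const std::string& key) {
  if (s.contains(key)) return false;
  s.insert(key);
  return true;
}

struct RaceArg { SafeSet* set; int* winners; };

void* tryRegister(void* p) {
  RaceArg* a = static_cast<RaceArg*>(p);
  if (a->set->insertIfAbsent("k")) __sync_fetch_and_add(a->winners, 1);
  return NULL;
}
```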




Thread safety is not composable

A solution that works in a single-threaded program may not apply to a multi-threaded one. Any solution that calls two or more thread-safe functions is not necessarily correct in a multi-threaded program.

What's the time in London now, for a program running in New York?

    const char* tz = getenv("TZ");              // save TZ (it may be unset)
    bool hadTz = (tz != NULL);
    string oldTz = hadTz ? tz : "";
    setenv("TZ", "Europe/London", 1); tzset();  // set TZ to London
    time_t now = time(NULL);
    struct tm localTimeInLN = *localtime(&now);
    if (hadTz) setenv("TZ", oldTz.c_str(), 1); else unsetenv("TZ");
    tzset();                                    // restore old TZ

This code affects localtime() in other threads. Thread-safe functions are not composable unless you carefully design the interfaces and interactions.



Thread safety of C++ std library

Although not required by the standard, the de facto practice is: unshared objects are independent. Two threads can freely use different objects without any special action on the caller's part; we call this "the same level as built-in types".

This applies to STL containers such as map, vector, and string. Pure functions, e.g. most STL algorithms, are safe.

The global cin/cout objects are shared by threads and are not thread safe. Moreover, they can't be made safe: cout << a << b; is cout.operator<<(a).operator<<(b), two function calls, and another thread can interleave its output between them. Use printf(3) instead; it is thread safe and atomic per call.

Allocators must be thread safe, as they are shared.
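The same idea applies to any stdio output: format the whole line privately first, then hand it to stdio in a single call. POSIX requires stdio functions to lock the FILE for the duration of each call, so a single fwrite() is not interleaved with other threads' stdio output on that FILE. logLine() and logMany() are hypothetical helpers; they illustrate atomicity, not a fast logger.

```cpp
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Build the complete line in a per-call buffer, then write it with one call.
void logLine(FILE* out, const std::string& who, int value) {
  std::ostringstream line;  // private to this call: nothing shared yet
  line << who << ": value=" << value << '\n';
  const std::string s = line.str();
  fwrite(s.data(), 1, s.size(), out);  // one locked stdio call per line
}

struct LogArg { FILE* out; const char* who; };

void* logMany(void* p) {
  LogArg* a = static_cast<LogArg*>(p);
  for (int i = 0; i < 100; ++i) logLine(a->out, a->who, i);
  return NULL;
}
```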



Thread-Safe vs. Thread-Efficient

printf(3) and malloc(3) are thread safe, but not necessarily efficient enough, especially on multi-core machines.

printf(3) locks FILE* stdout and thus synchronizes threads. That is not good for multi-threaded logging; we need a better library.

Your default malloc(3) may not be optimized for multiple threads and cores; it may lock a global heap for each allocation. Try tcmalloc, Google's thread-caching malloc. See Intel's "Is your memory management multi-core ready?"



Operate one fd in one thread

System calls on file descriptors are thread safe, but:

What if a thread closes an fd while another thread is blocked reading it?
What happens if a thread adds an fd to an epoll watch list while another thread is in epoll_wait() on it?
What happens if two threads poll the same fd and both find it readable simultaneously?
What if two threads read the same TCP socket and each gets partial data? How do you tell which part came first?

Rule: all operations on one file descriptor should happen in one thread. It makes your life a lot easier.


File descriptors in threads

File descriptors are small integers, unlike Windows HANDLEs. When creating a new fd, the kernel picks the lowest unused number.

So there is a higher possibility of cross-talk if you are careless. E.g. an fd is shared by two threads: the first thread has just close()d it, and the second is about to read() it. But a third thread happens to create a new fd with the same number (the lowest available integer is reused) in between. What does the second thread read from? Any other impact?

Solution: manage the resource with the RAII idiom, and use the usual techniques to manage object lifecycles.
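A minimal RAII owner for one descriptor might look like the sketch below (FileDescriptor is a hypothetical class). The fd is closed exactly once, in the destructor, and the object is non-copyable, so no second owner can close the number behind its back. Sharing it between threads would then go through the usual object-lifetime tools, e.g. a shared smart pointer to the owner.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

class FileDescriptor {
 public:
  explicit FileDescriptor(int fd) : fd_(fd) {}
  ~FileDescriptor() {
    if (fd_ >= 0) ::close(fd_);  // the only place this fd is ever closed
  }
  int fd() const { return fd_; }

 private:
  FileDescriptor(const FileDescriptor&);  // non-copyable: one owner per fd
  FileDescriptor& operator=(const FileDescriptor&);
  const int fd_;
};
```

Because close() happens only when the last user is gone, the number cannot be recycled by another thread's open() or socket() while someone still reads from it through this object.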




C++ and fork()

An object could be constructed once but destructed twice:

    int main()
    {
      Foo foo;  // calls Foo::Foo()
      fork();   // forks into two processes
    }           // calls Foo::~Foo() in the parent *and* the child process

This might be a problem if Foo owns some resource that is not inherited by the child process. Again, avoid static or global objects in C++: in the child process, such an object may not be properly initialized.

A global muduo::Timestamp startTime(now()) is wrong.


RAII and fork()

fork() doesn't copy all state. Open file descriptors are inherited by the child process, but they refer to the same open file descriptions as the parent's, so the file offsets are shared between the two processes.

The child does not inherit:
its parent's memory locks (mlock(2), mlockall(2))
record locks from its parent (fcntl(2))
timers from its parent (setitimer(2), alarm(2), timer_create(2))
and others.

So the RAII idiom may not work well in a fork()ed process: e.g. an RAII class that wraps timer_create/timer_delete in its ctor/dtor may fail in the child process after fork().

Use pthread_atfork() as a last resort.



C++ and threads

Use scoped lock guards only; check muduo/base/Mutex.h.

Don't allow exceptions to propagate across module boundaries. Don't let an exception propagate out of the thread main function: catch all exceptions in the outermost function, but rethrow the one from pthread_cancel(), as we said before.

Don't allow exceptions to propagate out of your callbacks, especially callbacks invoked by a C library, e.g. the init_routine registered with pthread_once().

Better: don't use exceptions in C++.
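A minimal scoped-lock sketch in the spirit of muduo/base/Mutex.h. These MutexLock/MutexLockGuard classes are written from scratch for illustration, not copied from muduo. The guard locks in its constructor and unlocks in its destructor, so the mutex is released on every path out of the scope, including an exception.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <stdexcept>

class MutexLock {
 public:
  MutexLock() { pthread_mutex_init(&mutex_, NULL); }
  ~MutexLock() { pthread_mutex_destroy(&mutex_); }
  void lock() { pthread_mutex_lock(&mutex_); }
  void unlock() { pthread_mutex_unlock(&mutex_); }

 private:
  MutexLock(const MutexLock&);  // non-copyable
  MutexLock& operator=(const MutexLock&);
  pthread_mutex_t mutex_;
};

class MutexLockGuard {
 public:
  explicit MutexLockGuard(MutexLock& mutex) : mutex_(mutex) { mutex_.lock(); }
  ~MutexLockGuard() { mutex_.unlock(); }  // also runs during stack unwinding

 private:
  MutexLockGuard(const MutexLockGuard&);  // non-copyable
  MutexLockGuard& operator=(const MutexLockGuard&);
  MutexLock& mutex_;
};

// Hypothetical user: no explicit unlock() anywhere, so no path can leak the lock.
class Account {
 public:
  explicit Account(int balance) : balance_(balance) {}
  void withdraw(int amount) {
    MutexLockGuard lock(mutex_);  // a named object; a temporary would unlock at once
    if (amount > balance_) throw std::runtime_error("insufficient funds");
    balance_ -= amount;
  }
  int balance() {
    MutexLockGuard lock(mutex_);
    return balance_;
  }

 private:
  MutexLock mutex_;
  int balance_;
};
```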




Threads and fork()

The fork() model doesn't fit well with threads. It is a fundamental flaw of POSIX OSes: the other threads disappear in the child, so the state in the child process is not consistent.

After forking a multi-threaded program, you may only call async-signal-safe functions in the child, as if in a signal handler. malloc() is not safe: another thread may hold its lock at the moment of fork(), and there is no chance to unlock it in the new process. The same goes for printf(), the pthread_* functions, and others.

The only safe way to use fork() in a multi-threaded program is to call exec() immediately in the child process. And make sure to set the close-on-exec flag on every file descriptor in the parent process, for security reasons.
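A sketch of that rule, using a hypothetical runProgram() helper: everything the child needs (the argv array) is prepared before fork(), the child calls only execv() and _exit(), which are async-signal-safe, and descriptors are opened with O_CLOEXEC so the exec'ed program doesn't inherit them. The test assumes /bin/true and /bin/false exist.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Runs `path` with no arguments; returns its exit status, or -1 on failure.
int runProgram(const char* path) {
  char* const argv[] = { const_cast<char*>(path), NULL };  // built before fork()
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    execv(path, argv);  // replaces the inconsistent child image right away
    _exit(127);         // exec failed: _exit(), not exit(), so no static dtors run
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// Open descriptors close-on-exec from the start, so no window exists in which
// another thread's fork()+exec() could leak them.
int openCloexec(const char* path) { return open(path, O_RDONLY | O_CLOEXEC); }
```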



Signals and threads

The whole POSIX signal mechanism is a mess. Only async-signal-safe functions, also called 'reentrant functions', can be called in a signal handler.

Most functions are not async-signal-safe; only those listed in the POSIX standard are, so it's a whitelist. Run 'man 7 signal' to get the list on Linux. Almost none of the pthread_* functions are async-signal-safe: you can't notify a condition variable or lock a mutex in a signal handler. Surprisingly, gettimeofday(2) is not async-signal-safe.




Deal with signals in MT programs

Rule 1: do not use signals. Don't use them as IPC, e.g. SIGUSR1, SIGUSR2, SIGINT, SIGHUP. Don't use library functions built upon signals, e.g. alarm, sleep, usleep, timer_create, etc.

Rule 2: when you absolutely need one, convert the asynchronous signal into a synchronous file-descriptor readable event.

Use signalfd on newer Linux kernels. Normally, the set of signals to be received via the file descriptor should be blocked using pthread_sigmask(3), to prevent the signals being handled according to their default dispositions.

Or open a pipe(2), write(2) one byte in the signal handler, and read(2) or poll(2) it in the main thread.
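A minimal signalfd sketch, assuming Linux 2.6.27 or later for the SFD_NONBLOCK/SFD_CLOEXEC flags. The signal is blocked with pthread_sigmask() so its default disposition never fires, and it is then read from a descriptor that can sit in the same poll/epoll loop as sockets. In a real program the mask would be set in main() before any thread is created, so every thread inherits it. makeSignalFd() and readSignal() are hypothetical helpers.

```cpp
#include <pthread.h>
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>
#include <cassert>

// Block `signo` in the calling thread and return a descriptor that becomes
// readable while it is pending. Returns -1 on error.
int makeSignalFd(int signo) {
  sigset_t mask;
  sigemptyset(&mask);
  sigaddset(&mask, signo);
  if (pthread_sigmask(SIG_BLOCK, &mask, NULL) != 0) return -1;
  return signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
}

// Consume one pending signal; returns its number, or -1 if none is pending
// (the descriptor is non-blocking, so an empty read fails with EAGAIN).
int readSignal(int sfd) {
  struct signalfd_siginfo info;
  ssize_t n = read(sfd, &info, sizeof info);
  return n == static_cast<ssize_t>(sizeof info) ? static_cast<int>(info.ssi_signo) : -1;
}
```

No signal handler is installed at all; the "handling" becomes ordinary code that runs when the descriptor is readable, so it may lock mutexes and call any function.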



Other resources Seven posts in




To be continued

Essentials of non-blocking network programming in C++
Birth of a reactor: design and implementation of Muduo




Avoid static or global objects

Except for PODs
