High Performance Logging System for Embedded UNIX and GNU/Linux Applications
IEEE RTCSA 2013 (8/21/13)
Cisco Systems
Jaein Jeong
2 / 25
Introduction - Embedded UNIX in many places
[Figure: Traditional UNIX Logging System. App processes in user space send log messages via syslog through a kernel buffer to syslogd, which writes them to the file system.]
3 / 25
Problem Statement - Apps slow down with a large amount of logging
• Long latency to logging daemon
• Inefficiency of unbuffered writes to flash FS
• Long latency even with output buffering
[Figure: Traditional logging path on a flash file system. App processes in user space send log messages via syslog through a kernel buffer to syslogd, which writes to the flash file system. The final variant adds a flash logger fed through a named pipe.]
4 / 25
Our Approach
• Faster Message Transfer
• Compatibility with Existing Logging Apps
• Destination-Aware Message Formatting
5 / 25
Organization
• Related Work for UNIX Logging Systems
• Background
– Cisco UCS and Virtual Interface Card (VIC)
– Evolution of VIC Logging System
• Design Requirements and Implementation
• Evaluation and Optimization
• Conclusion
6 / 25
Related Work - Logging Methods for UNIX Apps
Syslog
• Introduced in the early 1980s
• Still the most notable one
Syslog-ng
• An extension based on nsyslogd
• Reliable transport, encryption, and a richer set of information and filtering
Rsyslog
• An extension used in the latest distros
• Multi-threading
These systems are not designed for embedded/flash logging
– Slow msg passing (msg copying over kernel)
– Unbuffered message writes
7 / 25
Background - Cisco UCS and Virtual Interface Card
[Figure: Cisco UCS datacenter server system and a Cisco UCS server.]
Cisco UCS Virtual Interface Card (VIC)
• 128 programmable virtual interfaces (Ethernet NICs, Fibre Channel HBAs)
• 10GBASE-KR unified network fabric, 1 to each fabric extender
• Block diagram labels: VIC ASIC, Mgmt CPU, FCPU 0, FCPU 1
Mgmt CPU
• MIPS proc core (500MHz, MIPS 24Kc)
• Embedded Linux (Linux kernel 2.6.23-rc5)
8 / 25
Background - Evolution of VIC Logging System
Logd – a simple logging daemon
• Logging from multiple processes
• Different severity levels
• Formatting and flash writing
Unbuffered syslogd
• Forwards serious msgs to switches
• Functional, but with worse write performance
Buffered syslogd
• Improves flash write performance of unbuffered syslogd
• Still suffers long latency
[Figure: Three generations of the VIC logging system. Logd: app processes log to logd, which writes to JFFS2 flash. Unbuffered syslogd: app processes log via syslog through a kernel buffer to syslogd, which writes to JFFS2 flash; the diagram also shows system processes and switches. Buffered syslogd: the same path, with a named pipe and a flash logger between syslogd and JFFS2 flash.]
10 / 25
Design Requirements - Faster Message Transfer
• Avoid kernel-to-user space msg copying
[Figure: Syslogd Logging vs. Mqlogd Logging. Syslogd: app processes log via syslog through a kernel buffer to syslogd, then through a named pipe to the flash logger and JFFS2 flash. Mqlogd: app processes enqueue log messages into a memory-mapped file; mqlogd dequeues them and passes them through a named pipe to the flash logger and JFFS2 flash. Both diagrams also show system processes and switches.]
11 / 25
Design Requirements - Faster Message Transfer
• Reduce message copying from 4 to 2
Syslogd Logging (4 copies)
1. App local copy
2. Write to kernel buffer
3. Syslogd local copy
4. Write to named pipe
Mqlogd Logging (2 copies)
1’. Write directly to shared memory
2’. Write from shared memory to named pipe
[Figure: Syslogd Logging vs. Mqlogd Logging, with the numbered copy steps marked on each data path.]
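The "write directly to shared memory" step can be illustrated with a memory-mapped file. This is a minimal sketch in Python rather than the production code: the region size, the length-prefix layout, the file name, and the sample message are all assumptions made for illustration. Two mappings of the same file stand in for a client process and the logging server.

```python
import mmap
import os
import struct
import tempfile

REGION_SIZE = 4096          # hypothetical size of the shared region
LEN_FMT = "<I"              # 4-byte length prefix (assumed layout)
LEN_SIZE = struct.calcsize(LEN_FMT)


def open_region(path):
    """Map the backing file; every mapping of it sees the same bytes."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, REGION_SIZE)
        return mmap.mmap(fd, REGION_SIZE)   # shared, read/write by default
    finally:
        os.close(fd)                        # the mapping keeps its own handle


def client_write(region, message):
    """Client side: place the message straight into the shared region."""
    data = message.encode("utf-8")
    if len(data) > REGION_SIZE - LEN_SIZE:
        raise ValueError("message too large for region")
    struct.pack_into(LEN_FMT, region, 0, len(data))
    region[LEN_SIZE:LEN_SIZE + len(data)] = data


def server_read(region):
    """Server side: read the message from its own mapping of the file."""
    (length,) = struct.unpack_from(LEN_FMT, region, 0)
    return bytes(region[LEN_SIZE:LEN_SIZE + length]).decode("utf-8")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "logqueue.bin")   # hypothetical file name
        writer = open_region(path)
        reader = open_region(path)                 # stands in for the server
        client_write(writer, "link up on vnic0")   # made-up sample message
        received = server_read(reader)
        writer.close()
        reader.close()
        return received
```

In this sketch the message reaches the reader without passing through a socket or a kernel buffer, which is the point of step 1’ above.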
12 / 25
Design Requirements - Compatibility with Existing Logging Apps
• Through Logging API
– Replace syslog() with shared memory lib calls
• Direct Syslog Calls
– Server receives msgs through UDP Unix socket
[Figure: Client/server paths for syslogd and mqlogd. Syslogd: logging clients (klogd, fls, mcp, …) reach the logging server through the syslog() library call and a UDP Unix socket, either directly or via the logging API (log_info(), log_error(), …). Mqlogd: clients that call syslog() directly (klogd, xinetd, …) still use the UDP Unix socket, while clients using the logging API (app1, app2, …) go through the shared memory logging library.]
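The two client paths can be sketched as follows. This is an illustrative model, not the mqlogd implementation: the `LogSink` class and the argument lists of `log_info()` and `log_error()` are hypothetical. The `<PRI>` prefix and its facility × 8 + severity encoding follow the standard syslog convention; treating a datagram without a prefix as user.notice is an assumption of this sketch.

```python
import re

SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]
_PRI_RE = re.compile(r"^<(\d{1,3})>")


def parse_syslog_datagram(datagram):
    """Split a '<PRI>text' datagram into (facility, severity name, text)."""
    text = datagram.decode("utf-8", errors="replace")
    match = _PRI_RE.match(text)
    if match is None:
        # No priority prefix: treated as user.notice in this sketch
        return 1, "notice", text
    pri = int(match.group(1))
    return pri >> 3, SEVERITIES[pri & 7], text[match.end():]


class LogSink:
    """Hypothetical stand-in for the server; collects (severity, text)."""

    def __init__(self):
        self.records = []

    def from_api(self, severity, text):
        # Path 1: logging API call (the shared memory path in the slides)
        self.records.append((severity, text))

    def from_socket(self, datagram):
        # Path 2: unmodified syslog() caller, arriving as a datagram
        _facility, severity, text = parse_syslog_datagram(datagram)
        self.records.append((severity, text))


def log_info(sink, text):
    sink.from_api("info", text)


def log_error(sink, text):
    sink.from_api("err", text)
```

Both paths end in the same record shape, so existing syslog() callers keep working while converted apps take the faster path.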
13 / 25
Design Requirements - Destination-Aware Message Formatting
• Syslogd
– Working but limited
– Redundant
– Coarse time granularity (in seconds)
• Mqlogd
– Destination-aware formatting with space saving
– Uses system supported timing (in microseconds)
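One way to picture destination-aware formatting with microsecond timestamps is the sketch below. The slides do not give the actual record layouts, so both formats, the destination names "flash" and "switch", and all field names are hypothetical.

```python
from datetime import datetime, timezone


def format_record(dest, ts_us, host, tag, severity, text):
    """Format one record for a destination; both layouts are hypothetical."""
    when = datetime.fromtimestamp(ts_us // 1_000_000, tz=timezone.utc)
    # Microsecond part appended by hand to keep the timestamp compact
    stamp = when.strftime("%y%m%d-%H:%M:%S") + ".%06d" % (ts_us % 1_000_000)
    if dest == "flash":
        # Local flash: the host is implied, so omit it to save space
        return f"{stamp} {tag}.{severity} {text}"
    if dest == "switch":
        # Remote receiver: include the sender so messages can be told apart
        return f"{stamp} {host} {tag}.{severity} {text}"
    raise ValueError(f"unknown destination: {dest}")
```

The same record is shorter on flash than on the wire, and two messages logged within the same second remain distinguishable by their timestamps.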
14 / 25
Implementation - Shared Memory and Circular Queue
• Notification Mechanism
– Write-and-select
– Signal
• Locking Mechanism
– Semaphore lock
– Pthread lock
[Figure: Logging clients enqueue into a circular queue in shared memory; the logging server dequeues after a logging event notification. Queue memory layout: circular queue header (including the notification disable flag), followed by header entries and non-header entries.]
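A circular queue with a small header can be sketched as follows. Several simplifications are assumptions of this sketch and not the layout in the slides: entries are fixed-size slots instead of header and non-header entries, an anonymous mapping replaces the memory-mapped file, `threading.Lock` stands in for the semaphore or pthread lock, and a callback stands in for write-and-select or a signal.

```python
import mmap
import struct
import threading

HDR_FMT = "<IIII"        # head, tail, count, notify_disabled (assumed)
HDR_SIZE = struct.calcsize(HDR_FMT)
SLOT_SIZE = 64           # hypothetical fixed slot size, length included
LEN_FMT = "<H"
LEN_SIZE = struct.calcsize(LEN_FMT)


class CircularLogQueue:
    def __init__(self, capacity, notify=None):
        self.capacity = capacity
        self.mem = mmap.mmap(-1, HDR_SIZE + capacity * SLOT_SIZE)
        self.lock = threading.Lock()
        self.notify = notify
        self._store(0, 0, 0, 0)

    def _load(self):
        return struct.unpack_from(HDR_FMT, self.mem, 0)

    def _store(self, head, tail, count, disabled):
        struct.pack_into(HDR_FMT, self.mem, 0, head, tail, count, disabled)

    def set_notify_disabled(self, disabled):
        with self.lock:
            head, tail, count, _ = self._load()
            self._store(head, tail, count, 1 if disabled else 0)

    def enqueue(self, payload):
        """Client side. Returns False (a drop) when the queue is full."""
        if len(payload) > SLOT_SIZE - LEN_SIZE:
            raise ValueError("payload larger than one slot")
        with self.lock:
            head, tail, count, disabled = self._load()
            if count == self.capacity:
                return False
            off = HDR_SIZE + tail * SLOT_SIZE
            struct.pack_into(LEN_FMT, self.mem, off, len(payload))
            start = off + LEN_SIZE
            self.mem[start:start + len(payload)] = payload
            self._store(head, (tail + 1) % self.capacity, count + 1, disabled)
        if self.notify is not None and not disabled:
            self.notify()                  # wake the server
        return True

    def dequeue(self):
        """Server side. Returns the oldest payload, or None when empty."""
        with self.lock:
            head, tail, count, disabled = self._load()
            if count == 0:
                return None
            off = HDR_SIZE + head * SLOT_SIZE
            (length,) = struct.unpack_from(LEN_FMT, self.mem, off)
            start = off + LEN_SIZE
            payload = bytes(self.mem[start:start + length])
            self._store((head + 1) % self.capacity, tail, count - 1, disabled)
            return payload
```

A full queue rejects the request instead of blocking the client, which is one way to obtain the request drop behavior measured in the evaluation.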
16 / 25
Evaluation
• Metrics
– Request Latency
– Request Drop Rate
• Parameters
– Number of clients
– Number of iterations (Depth of queue size)
– Locking mechanism
– Notification mechanism
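The two metrics can be computed with a small harness like the one below. It is a sketch under assumptions: request latency is taken to be the time a logging call spends in the client, and a request counts as dropped when the call reports failure. The names `measure` and `summarize` are hypothetical.

```python
import time


def summarize(latencies_us, dropped):
    """Return the drop rate plus avg/min/max latency of accepted requests."""
    total = len(latencies_us) + dropped
    if total == 0:
        raise ValueError("no requests recorded")
    stats = {"drop_rate": dropped / total}
    if latencies_us:
        stats["avg_us"] = sum(latencies_us) / len(latencies_us)
        stats["min_us"] = min(latencies_us)
        stats["max_us"] = max(latencies_us)
    return stats


def measure(request, iterations):
    """Time `request(i)` for each iteration; a falsy result is a drop."""
    latencies, dropped = [], 0
    for i in range(iterations):
        start = time.perf_counter_ns()
        accepted = request(i)
        elapsed_us = (time.perf_counter_ns() - start) / 1000
        if accepted:
            latencies.append(elapsed_us)
        else:
            dropped += 1
    return summarize(latencies, dropped)
```

Passing a queue's enqueue call as `request` and varying `iterations` gives one data point per setting of the parameters listed above.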
17 / 25
Performance Results - Performance compared to syslogd
• Avg Latency: >10x speed-up
• Min Latency: >20x speed-up
• Max Latency: >2x speed-up
[Figure: Average, Minimum, and Maximum Request Latency - 1 Client. X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Latency (us). Series: syslogd, mqlogd (select, semaphore), mqlogd (signal, semaphore), mqlogd (select, pthread), mqlogd (signal, pthread).]
18 / 25
Performance Results - Effect of Queue Size
• No drops within queue size (e.g. 10000)
• Queue size should be larger than max expected burst size
[Figure: Request Drop Rate - 1 Client. X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Percent. Series: mqlogd (select, semaphore), mqlogd (signal, semaphore), mqlogd (select, pthread), mqlogd (signal, pthread).]
19 / 25
Performance Results - Effect of Multiple Clients
• Avg request latency increases proportionally with the number of clients
• With 2 clients, requests start to drop at a smaller number of iterations
[Figure: Avg Request Latency - 1 and 2 Clients. X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Latency (us). Series: syslogd (1 client), syslogd (2 clients), mqlogd (select, 1 client), mqlogd (select, 2 clients).]
[Figure: Request Drop Rate - 1 and 2 Clients. X axis: Number of Iterations. Y axis: Percent. Series: mqlogd (select, 1 client), mqlogd (select, 2 clients).]
20 / 25
Performance Results - Effect of Notification Mechanisms
• Makes little difference
[Figure: Average, Minimum, and Maximum Request Latency - 1 Client. X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Latency (us). Series: mqlogd (select, semaphore), mqlogd (signal, semaphore).]
21 / 25
Performance Results - Effect of Lock Mechanisms
• Pthread mutex is 40% faster than semaphore.
• Semaphore is used in our production code due to a limitation of the pthread mutex lock (Linux kernel 2.6.23-rc5).
[Figure: Average, Minimum, and Maximum Request Latency - 1 Client. X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Latency (us). Series: mqlogd (select, semaphore), mqlogd (select, pthread).]
22 / 25
Performance Results - Effect of Client Interface Type
• Logging using UNIX socket interface
– Backward compatible, but no faster
– Latency is about the same level as syslogd
– For compatibility, not for general use
[Figure: Average Request Latency - 1 Client. X axis: Number of Iterations (100, 1000, 5000, 10000, 50000). Y axis: Latency (us). Series: syslogd, mqlogd (select, semaphore), mqlogd (Unix socket).]
23 / 25
Optimization - Effects of deferred notification
• Sends one notification for a batch of msgs
• Measured time for host-to-adapter commands (capability & macaddr) with and without logging
• 2x speed-up in latency
[Figure: Latency for 'capability' command and Latency for 'macaddr' command, comparing write-and-select, deferred, and syslogd. Bars show msg xfer time (us) and logging time (us).]
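Deferred notification, one wake-up per batch of messages, can be sketched as below. The slides do not describe the exact protocol, so the use of a disable flag that the client sets after notifying and the server clears after draining is an assumption. The class is single-threaded for clarity; a real queue would update the flag under its lock.

```python
from collections import deque


class DeferredNotifyQueue:
    """Queue that notifies once per batch instead of once per message."""

    def __init__(self, notify):
        self.items = deque()
        self.notify = notify
        self.notify_disabled = False   # models a notification disable flag
        self.notifications = 0

    def enqueue(self, msg):
        self.items.append(msg)
        if not self.notify_disabled:
            # First message of a batch: wake the server, then stay quiet
            self.notify_disabled = True
            self.notifications += 1
            self.notify()

    def drain(self):
        """Server side: take everything queued, then re-arm notification."""
        batch = list(self.items)
        self.items.clear()
        self.notify_disabled = False
        return batch
```

With this scheme a burst of messages costs the client one notification instead of one per message, which is the saving the slide attributes to deferral.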
24 / 25
Future Works
• Reduce kernel msg copying even further
• Improve performance with faster lock
• Avoid loss of serious messages
[Figure: Two mqlogd data paths. First: app processes enqueue into a memory-mapped file, and mqlogd dequeues and passes messages through a named pipe to the flash logger and file system. Second: the same path without the named pipe, with a second memory-mapped file shown between mqlogd and the flash logger.]
25 / 25
Conclusion
• Logging system for embedded UNIX apps
• Up to 100x speed-up in latency, 10x throughput
• Backward compatibility
• Commercially used in Cisco UCS Virtual Interface Cards