Applications of Non-Blocking Data Structures to Real-Time Systems
Håkan Sundell, [email protected]
Chalmers University of Technology
Seminar for the degree of Licentiate of Philosophy
Håkan Sundell
Computing Science
Chalmers University of Technology
Background
• ARTES project: “Applications of wait/lock-free protocols to real-time systems”
• Started in March 1999.
• One active Ph.D. student.
• Project leader: Philippas Tsigas
Schedule
• Introduction
  – Real-Time Systems
  – Synchronization
• Shared Data Objects: Snapshots
  – Evaluation
• The Effect of Using Timing Information
  – Snapshot
  – Shared Register
• Software engineering part
• Conclusions & Future Work
Real-Time Systems
• Uni- or multiprocessor system
• Interconnection Network
  – e.g. the Controller Area Network (CAN)
[Figure: several CPUs connected by an interconnection network]
Real-Time Systems
• Shared Memory
  – Uniform Memory Access (UMA)
  – Non-Uniform Memory Access (NUMA)
[Figure: UMA, where CPUs with caches share a single memory, and NUMA, where groups of CPUs on cache buses each have their own memory]
Real-Time Systems
• Cooperating Tasks
  – Timing Constraints
• Inter-task Communication: Shared Data Objects
  – Needs Synchronization
[Figure: tasks T1, T2 and T3 communicating through shared data objects]
Synchronization
• Synchronization using Locks
  – Uses semaphores, spinning, or disabling interrupts
  – Negative
    • Blocking
    • Priority inversion
    • Risk of deadlock
  – Positive
    • Execution time guarantees are easy to derive, but pessimistic
Take lock
  ... do operation ...
Release lock
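The lock-based pattern above (take lock, do operation, release lock) can be sketched in C11 with a test-and-set spinlock, which is one of the lock styles the slide lists. This is a minimal illustration, not code from the thesis. The names `spinlock`, `spin_lock`, `spin_unlock` and `counter_add` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical test-and-set spinlock built on C11 atomic_flag. */
typedef struct {
    atomic_flag held;
} spinlock;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

static void spin_lock(spinlock *l) {
    assert(l != NULL);
    /* Take lock: spin until test-and-set reports the flag was clear. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* busy-wait: if the holder is preempted, this task waits too */
}

static void spin_unlock(spinlock *l) {
    /* Release lock. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* A shared data object protected by the lock. */
static spinlock counter_lock = SPINLOCK_INIT;
static long shared_counter = 0;

long counter_add(long delta) {
    spin_lock(&counter_lock);
    shared_counter += delta;          /* ... do operation ... */
    long result = shared_counter;
    spin_unlock(&counter_lock);
    return result;
}
```

A waiting task makes no progress until the holder releases the lock. This is the blocking and priority-inversion risk listed on the slide.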
Non-blocking Synchronization
• Lock-Free Synchronization
  – Retries until the operation completes without interference from other operations
    • Interference is usually detected through a shared variable that indicates a busy state, or similar
Change flag to unique value, or remember current state
  ... do the operation while preserving the active structure ...
Check for same value or state and then validate changes, otherwise retry
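A compare-and-swap (CAS) loop in C11 illustrates the lock-free retry pattern above. The loop remembers the current state, computes the new value privately, and publishes it only if the shared value has not changed in the meantime. This is a minimal sketch. The saturating-add operation and the name `saturating_add` are hypothetical. They were chosen because this operation cannot be done with a single fetch-and-add.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock-free operation: add delta, but never exceed limit. */
long saturating_add(_Atomic long *shared, long delta, long limit) {
    assert(delta >= 0);
    long seen = atomic_load(shared);                 /* remember current state */
    for (;;) {
        /* compute the new value privately, without touching shared memory */
        long wanted = (seen > limit - delta) ? limit : seen + delta;
        /* validate and publish: succeeds only if *shared still equals seen */
        if (atomic_compare_exchange_weak(shared, &seen, wanted))
            return wanted;
        /* on failure, seen now holds the current value, so retry */
    }
}
```

When a CAS fails, some other task has just succeeded, so the system as a whole makes progress (lock-free). A particular task, however, can retry an unbounded number of times (not wait-free).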
Non-blocking Synchronization
• Lock-Free Synchronization
  – Negative
    • No execution time guarantees: an operation can retry forever, which can cause starvation
  – Positive
    • Avoids blocking and priority inversion
    • Avoids deadlock
    • Fast execution on average
Non-blocking Synchronization
• Non-blocking Synchronization
  – Uses atomic synchronization primitives
  – Uses shared memory
• Wait-Free Synchronization
  – Always finishes in a finite number of its own steps
  – Negative
    • Complex algorithms
    • Memory consuming
[Figure: building blocks: Test&Set, Compare&Swap, Copying, Helping, Announcing, Split operation]
Non-blocking Synchronization
• Wait-Free Synchronization
  – Positive
    • Execution time guarantees
    • Fast execution
    • Avoids blocking and priority inversion
    • Avoids deadlock
    • Avoids starvation
    • Same implementation on both single- and multiprocessor systems
Shared Data Objects
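• Correctness criteria for concurrent operations: linearizability
  – All concurrent executions can be transformed into an equivalent serial sequence of atomic operations that preserves the partial order (the real-time order of non-overlapping operations)
[Figure: overlapping Read and Write operations on a time line, with points ti, tj and tk mapping them to an equivalent serial order]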
Snapshot
• Snapshot
  – A consistent, momentary state of a set of several logically related shared variables
  – One reader (scanner)
    • Reads the whole set of variables in one atomic step
  – Many writers (updaters)
    • Each writes to only one variable at a time
Snapshot: Correctness
• Atomicity / Linearizability criteria
[Figure: time lines of Write operations with an overlapping Read; in two cases the value ci returned by the scanner is correct (YES), and in a third it is not (NO)]
Snapshot: Correctness
• Atomicity / Linearizability criteria
[Figure: two more time-line cases that are not correct (NO): one where the scanner returns ci, and one where it returns ci and cj]
What are we evaluating?
• Wait-free snapshot algorithm by Ermedahl et al.
  – 3 register copies for each component
  – Uses the Test&Set atomic primitive for synchronization
[Figure legend: register copies used by the writer and by the reader]
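To illustrate what "3 register copies for each component" can buy, here is a classic single-writer, single-reader triple buffer in C11. It is not Ermedahl et al.'s algorithm: it uses an atomic exchange instead of Test&Set. The names `triple_buffer`, `tb_init`, `tb_write`, `tb_read` and `DIRTY` are hypothetical. The writer and the reader each own one copy, and the third copy is passed between them through a shared index, so neither side ever waits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DIRTY 4u  /* set in `middle` when it holds data the reader has not taken */

/* Hypothetical triple buffer for one component: three copies of the value. */
typedef struct {
    int buf[3];
    _Atomic unsigned middle;  /* index of the shared copy, plus the DIRTY bit */
    unsigned back;            /* copy owned by the writer */
    unsigned front;           /* copy owned by the reader */
} triple_buffer;

void tb_init(triple_buffer *tb, int initial) {
    assert(tb != NULL);
    for (int i = 0; i < 3; i++)
        tb->buf[i] = initial;
    tb->front = 0;
    atomic_init(&tb->middle, 1u);
    tb->back = 2;
}

/* Writer: fill its private copy, then swap it into the middle slot. */
void tb_write(triple_buffer *tb, int value) {
    tb->buf[tb->back] = value;
    unsigned old = atomic_exchange(&tb->middle, tb->back | DIRTY);
    tb->back = old & ~DIRTY;
}

/* Reader: if a newer copy is waiting, swap it in; then read privately. */
int tb_read(triple_buffer *tb) {
    if (atomic_load(&tb->middle) & DIRTY) {
        unsigned old = atomic_exchange(&tb->middle, tb->front);
        tb->front = old & ~DIRTY;
    }
    return tb->buf[tb->front];
}
```

Each operation performs at most one atomic exchange, so both sides finish in a bounded number of their own steps. A snapshot object would keep one such buffer per component. Making the scanner's reads of several components mutually consistent requires additional machinery that this sketch does not show.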
Analysis
• Real-Time System: Measured schedulability
• Created “realistic” scenarios on a theoretical 68020 uni-processor system
  – Real RTOS parameters
  – Manual WCET analysis on cycle level
  – 1 scanner (5 components), 24 updaters (10 real-time tasks, 15 interrupts)
  – Fixed priority response time analysis
  – Schedulable without any synchronization
  – Adding lock/wait-free or semaphore synchronization
Experiments
• Simulation
  – RT simulator written in Erlang by Ermedahl and Sjödin
    • Fixed priority preemptive scheduler
    • Semaphores
    • Messages
  – Subset of scenarios used in analysis
Experiments: Schedulability (%)
Experiments
• Multi-node: Simulation of a 1 MHz CAN bus
  – 10 nodes connected using messages
  – Local snapshots on each node
  – 1 super-snapshot task on 1 node
  – Subset of scenarios used for single-node analysis
Experiments: Rsnap for multi-node
Timing Information
• Previously used by Chen and Burns in 1999
  – Assuming a system with periodic fixed-priority scheduling
  – Notation from standard real-time response time analysis
  – Uses information about
    • Periods, T
    • Worst-case computation times, C
    • Worst-case response times, R
$$ R_i = C_i + \sum_{j \in hp(i)} \left\lceil \frac{R_i}{T_j} \right\rceil C_j $$
  where hp(i) is the set of tasks with higher priority than task i
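The response-time equation R_i = C_i + Σ_{j∈hp(i)} ⌈R_i / T_j⌉ C_j is recursive in R_i. It is usually solved by fixed-point iteration, starting from R_i = C_i. The C sketch below assumes that tasks are indexed in priority order (index 0 is highest) and that each deadline equals the task's period. The `task` type and the `response_time` function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task description: worst-case computation time C and period T. */
typedef struct {
    long C;
    long T;
} task;

/* Computes R_i = C_i + sum over j in hp(i) of ceil(R_i / T_j) * C_j by
   fixed-point iteration; tasks[0..i-1] are the higher-priority tasks.
   Returns false if R_i would exceed T_i (deadline assumed equal to period). */
bool response_time(const task *tasks, int i, long *out) {
    assert(tasks != NULL && i >= 0 && out != NULL);
    long prev = -1;
    long r = tasks[i].C;                       /* start from R_i = C_i */
    while (r != prev) {
        if (r > tasks[i].T)
            return false;                      /* misses its deadline */
        prev = r;
        r = tasks[i].C;
        for (int j = 0; j < i; j++)            /* interference from hp(i) */
            r += ((prev + tasks[j].T - 1) / tasks[j].T) * tasks[j].C;
    }
    *out = r;
    return true;
}
```

Take three tasks with (C, T) = (1, 4), (2, 6) and (3, 13). For the lowest-priority task the iteration produces 3, 6, 7, 9, 10, 10, so its worst-case response time is 10.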
Snapshot
• Back to Basics: Unbounded Memory Protocol
  – The reader increases the global index and scans backwards
[Figure: one row of slots per component c1 ... cc along the snapshot index; each row holds a value v, previous values (?), the writer position w, and nil in the slots not yet written]
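The unbounded memory protocol can be sketched in C11 as follows. The sketch assumes one writer per component and a single scanner, and it assumes the scanner scans backwards from the index it has just advanced past. The fixed capacity `MAXSNAP` only stands in for the protocol's unbounded memory. All names (`slot`, `snap_index`, `snap_init`, `snap_update`, `snap_scan`, `NIL`) are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define NCOMP   3        /* number of components (hypothetical) */
#define MAXSNAP 64       /* stands in for the protocol's unbounded memory */
#define NIL     INT_MIN  /* marks a slot that no writer has used */

/* slot[k][j]: value of component k written while the global index was j. */
static _Atomic int slot[NCOMP][MAXSNAP];
static _Atomic int snap_index;

void snap_init(const int initial[NCOMP]) {
    for (int k = 0; k < NCOMP; k++) {
        assert(initial[k] != NIL);
        atomic_store(&slot[k][0], initial[k]);
        for (int j = 1; j < MAXSNAP; j++)
            atomic_store(&slot[k][j], NIL);
    }
    atomic_store(&snap_index, 0);
}

/* Updater of component k: write at the current global index (w). */
void snap_update(int k, int value) {
    assert(value != NIL);
    int w = atomic_load(&snap_index);
    atomic_store(&slot[k][w], value);
}

/* Scanner: advance the global index, then for each component scan
   backwards from the old index to the most recent non-nil value. */
void snap_scan(int out[NCOMP]) {
    int old = atomic_fetch_add(&snap_index, 1);
    assert(old + 1 < MAXSNAP);   /* out of stand-in "unbounded" memory */
    for (int k = 0; k < NCOMP; k++) {
        int v;
        for (int j = old; (v = atomic_load(&slot[k][j])) == NIL; j--)
            ;                    /* slot[k][0] is never nil */
        out[k] = v;
    }
}
```

The scanner advances the index before it scans. Any write that starts after that point therefore lands in a newer slot, which this scan never reads. Memory grows by one slot per component for every scan, and this is the growth that cyclical buffers are meant to bound.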
Snapshot
• Bounded Memory: Cyclical Buffers
  – The needed buffer length depends on how fast the updaters are compared to the scanner
  – Each component can have a different buffer length
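Timing Information
• Bounding
  – Needed buffer length for component k:
$$ l_k = \max_{i \in wr(k)} \left\lceil 2 \cdot \frac{T_S}{T_{W_i}} \right\rceil + 2 $$
    where T_S is the period of the snapshot task, T_{W_i} is the period of writer task i, and wr(k) is the set of writer tasks of component k
  – Can be refined even further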
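The buffer-length bound can be evaluated with integer arithmetic. This sketch assumes the bound has the form l_k = max over the writers i of component k of ⌈2·T_S / T_{W_i}⌉ + 2, where T_S is the snapshot task's period and T_{W_i} are the writers' periods. The function name `buffer_length` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: buffer length for component k, evaluating
   l_k = max over writers i of ceil(2 * ts / tw[i]) + 2.
   ts: period of the snapshot (scanner) task.
   tw: periods of the tasks that write component k. */
long buffer_length(long ts, const long *tw, int nwriters) {
    assert(ts > 0 && nwriters > 0);
    long worst = 0;
    for (int i = 0; i < nwriters; i++) {
        assert(tw[i] > 0);
        long need = (2 * ts + tw[i] - 1) / tw[i];   /* ceil(2 * ts / tw[i]) */
        if (need > worst)
            worst = need;
    }
    return worst + 2;
}
```

The fastest writer of a component determines that component's buffer length. This is why components written by different tasks can have different buffer lengths.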
Experiments
• Using a Sun Enterprise 10000 multiprocessor computer
• 1 scanner task and 10 updater tasks, one on each CPU
• Comparing two wait-free snapshot algorithms
  – Using timing information
  – Using Test-and-Set synchronization
Experiments
• Scenarios with different ratios between scanner/updater:
  – Measuring response time for scan versus update operations
Ratio (scanner/updater)   500/50   200/50   100/50   50/50   50/100   50/200   50/500
Buffer length             3        3        3        4       6        10       22
Experiments
• Scan operation: Average Response Time
Experiments
• Update operation: Average Response Time
Shared Register
• Target domain: Shared memory (even without cache coherency)
• Wait-Free Atomic Shared Buffer by Vitanyi et al.
  – A matrix of 1-reader 1-writer registers
  – Each register contains a value/tag pair encoded as one value
[Figure: matrix of registers R11, R12, …, R21, R22, … with writers along the rows and readers along the columns; each register holds a tag and a value]
Rij: written by processor i, read by processor j
Shared Register
• Algorithm:
  – A reader scans its column for the highest tag and returns the corresponding value
  – A writer scans its column and writes the next tag, together with the new value, to its row
• Unbounded maximum size for the tag field in the value/tag pair
  – Assume 8 writer tasks with a 10 ms period
    • Maximum tag after one hour is 8 × 100 writes/s × 3600 s = 2,880,000, which needs 22 bits!
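The two algorithm steps above can be sketched in C11 with a matrix of 64-bit atomic words. Each word packs a 32-bit tag and a 32-bit value, so a single atomic access moves the whole pair. The processor count `NPROC` and the functions `reg_read` and `reg_write` are hypothetical. The sketch follows only the steps on this slide. For example, it does not break ties when concurrent writers choose the same tag, so it is not a verified atomic-register construction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NPROC 3  /* hypothetical number of processors */

/* R[i][j] is written only by processor i and read only by processor j.
   Each entry packs (tag, value) into one 64-bit word. */
static _Atomic uint64_t R[NPROC][NPROC];

static uint64_t pack(uint32_t tag, uint32_t value) {
    return ((uint64_t)tag << 32) | value;
}
static uint32_t tag_of(uint64_t w)   { return (uint32_t)(w >> 32); }
static uint32_t value_of(uint64_t w) { return (uint32_t)w; }

/* Scan column p and return the entry with the highest tag. */
static uint64_t scan_column(int p) {
    uint64_t best = atomic_load(&R[0][p]);
    for (int i = 1; i < NPROC; i++) {
        uint64_t w = atomic_load(&R[i][p]);
        if (tag_of(w) > tag_of(best))
            best = w;
    }
    return best;
}

/* Reader p: the value paired with the highest tag in its column. */
uint32_t reg_read(int p) {
    assert(p >= 0 && p < NPROC);
    return value_of(scan_column(p));
}

/* Writer p: the next tag after the highest in its column, written to its row. */
void reg_write(int p, uint32_t value) {
    assert(p >= 0 && p < NPROC);
    uint32_t next = tag_of(scan_column(p)) + 1;   /* tags only ever grow */
    for (int j = 0; j < NPROC; j++)
        atomic_store(&R[p][j], pack(next, value));
}
```

Because tags only ever grow, a fixed-width tag field eventually overflows. That is the unbounded-tag problem the slide quantifies.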
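Timing Information
• Analyzing the maximum possible difference between tags observable by a task at two consecutive invocations of the algorithm
  – In any possible execution:
$$ \mathit{MaxTagDiff} = \sum_{i=1}^{n} \left\lceil \frac{T_{max}}{T_{Wr_i}} \right\rceil + \sum_{i=1}^{n} \left\lceil \frac{R_{max}}{T_{Wr_i}} \right\rceil $$
    • T_max is the longest period
    • R_max is the longest response time
    • T_{Wr_i} is the period of writer task i
• Recycling tags:
  – Newer tags can restart from zero when a certain tag value is reached
  – To be able to decide whether one tag is newer than another, we need:
$$ \mathit{TagFieldSize} = 2 \cdot \mathit{MaxTagDiff} $$
[Figure: tag range 0 to N; the older tags v1 and v2 lie near N, and the newer tags v3 and v4 have wrapped around to restart near 0]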
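When tags are recycled in a field of size 2·MaxTagDiff, a task can still order two tags by their forward distance modulo the field size. The C sketch below uses the hypothetical names `tag_next` and `tag_is_newer`. It assumes that two tags a task compares differ by less than MaxTagDiff, because a difference of exactly MaxTagDiff looks the same in both directions in a field of this size.

```c
#include <assert.h>
#include <stdbool.h>

/* Tags live in [0, 2 * max_tag_diff) and restart from zero. */
unsigned tag_next(unsigned t, unsigned max_tag_diff) {
    return (t + 1) % (2 * max_tag_diff);
}

/* a is newer than b if stepping forward from b reaches a in fewer than
   max_tag_diff steps, counting modulo the tag field size. */
bool tag_is_newer(unsigned a, unsigned b, unsigned max_tag_diff) {
    unsigned size = 2 * max_tag_diff;
    assert(max_tag_diff > 0 && a < size && b < size);
    unsigned forward = (a + size - b) % size;
    return forward != 0 && forward < max_tag_diff;
}
```

With MaxTagDiff = 38 (a field of 76 tags), tag 2 counts as newer than tag 75 because it is only 3 steps ahead after wrapping around.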
Examples
• Example task scenario on 8 processors:
• The unbounded algorithm would have reached tag 68400 in one hour, needing more than 16 bits
Task   Period   Task   Period
Wr1    1000     Rd1    500
Wr2    900      Rd2    450
Wr3    800      Rd3    400
Wr4    700      Rd4    350
Wr5    600      Rd5    300
Wr6    500      Rd6    250
Wr7    400      Rd7    200
Wr8    300      Rd8    150
$$ T_{max} = R_{max} = 1000 $$
$$ \mathit{MaxTagDiff} = 38 $$
$$ \mathit{TagFieldSize} = 2 \cdot \mathit{MaxTagDiff} = 76 $$
$$ \mathit{TagFieldBits} = \lceil \log_2 76 \rceil = 7 $$
Background
• Multithreaded programming needs communication.
• Communicating using shared data structures like stacks, queues, lists and so on.
• This needs synchronization!
• Locks (mutual exclusion) have several drawbacks, especially for real-time systems.
• Non-blocking solutions are often complex to implement and have non-standard interfaces.
NOBLE: A Non-Blocking Inter-Process Communication Library
• Designed with the following properties:
  – Functionality: Stacks, Queues, Lists, Snapshot, Register… with clear specifications
  – Programmer friendly: #include <noble.h>, NBL<function>
  – Easy to adapt existing solutions: provides locks as well as non-blocking synchronization
NOBLE: A Non-Blocking Inter-Process Communication Library
• Designed with the following properties (cont.):
  – Efficient: object-oriented design, “virtual functions and inheritance with base classes”, in C
  – Portable: modular design, with platform-dependent code separated
  – Adaptable for different programming languages: C, C++, standard dynamic link library
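One way to imitate “virtual functions and inheritance with base classes” in C is to place a table of function pointers at the start of each implementation. The sketch below is not NOBLE's code. Every name in it (`hyp_stack`, `hyp_stack_ops`, `hyp_stack_push`, `hyp_stack_pop`, `hyp_stack_free`, `hyp_stack_create_lb`, `lb_stack`) is hypothetical, and only a lock-based implementation is shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* "Base class": every stack starts with a pointer to its operation table. */
typedef struct hyp_stack hyp_stack;
typedef struct {
    int   (*push)(hyp_stack *s, void *item);
    void *(*pop)(hyp_stack *s);
    void  (*destroy)(hyp_stack *s);
} hyp_stack_ops;
struct hyp_stack {
    const hyp_stack_ops *ops;
};

/* Generic entry points dispatch through the table ("virtual functions"). */
int   hyp_stack_push(hyp_stack *s, void *item) { return s->ops->push(s, item); }
void *hyp_stack_pop(hyp_stack *s)              { return s->ops->pop(s); }
void  hyp_stack_free(hyp_stack *s)             { s->ops->destroy(s); }

/* "Derived class": a bounded array stack guarded by a test-and-set lock. */
typedef struct {
    hyp_stack base;      /* first member, so hyp_stack * and lb_stack * convert */
    atomic_flag lock;
    int top, capacity;
    void **items;
} lb_stack;

static int lb_push(hyp_stack *s, void *item) {
    lb_stack *st = (lb_stack *)s;
    while (atomic_flag_test_and_set(&st->lock))
        ;                                   /* take lock */
    int ok = st->top < st->capacity;
    if (ok)
        st->items[st->top++] = item;
    atomic_flag_clear(&st->lock);           /* release lock */
    return ok;
}

static void *lb_pop(hyp_stack *s) {
    lb_stack *st = (lb_stack *)s;
    while (atomic_flag_test_and_set(&st->lock))
        ;
    void *item = st->top > 0 ? st->items[--st->top] : NULL;
    atomic_flag_clear(&st->lock);
    return item;
}

static void lb_destroy(hyp_stack *s) {
    lb_stack *st = (lb_stack *)s;
    free(st->items);
    free(st);
}

static const hyp_stack_ops lb_ops = { lb_push, lb_pop, lb_destroy };

/* Constructor: the only call that names the concrete implementation. */
hyp_stack *hyp_stack_create_lb(int capacity) {
    assert(capacity > 0);
    lb_stack *st = malloc(sizeof *st);
    if (st == NULL)
        return NULL;
    st->items = malloc((size_t)capacity * sizeof *st->items);
    if (st->items == NULL) {
        free(st);
        return NULL;
    }
    st->base.ops = &lb_ops;
    atomic_flag_clear(&st->lock);
    st->top = 0;
    st->capacity = capacity;
    return &st->base;
}
```

A lock-free implementation would supply its own operation table and constructor. Callers that use only the generic entry points would then need to change just the create call.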
Examples
• #include <noble.h>
• First create a global variable handling the shared data object, for example a stack:
    NBLStack *stack;
    stack = NBLStackCreateLF(10000);
• When some thread wants to do some operation:
    NBLStackPush(stack, item);
  or
    item = NBLStackPop(stack);
Examples
• When the data structure is not in use anymore:
    NBLStackFree(stack);
• To change the synchronization mechanism, only one line of code has to be changed!
    stack = NBLStackCreateLF(10000);
  replaced with
    stack = NBLStackCreateLB();
Experiment
• Set of 50000 random operations performed multithreaded on each data structure, with either low or high contention.
• Comparing the different synchronization mechanisms and implementations available.
• Varying the number of threads from 1 to 30
• Performed on multiprocessors:
  – Sun Enterprise 10000 with 64 CPUs, Solaris
  – Compaq PC with 2 CPUs, Win32
Experiments: Linked List (high contention)
Status
• Multiprocessor support
  – Sun Solaris (Sparc)
  – Win32 (Intel x86)
  – SGI (Mips): evaluation stage
  – Linux (Intel x86): evaluation stage
• Extensive manual
• Web site up and running: http://www.cs.chalmers.se/~noble
Conclusions
• Contributions:
  – Evaluations of snapshot
    • Non-blocking performs better than lock-based in all evaluated cases. Lock-free performs best on uni-processor systems.
  – The effect of using Timing Information
    • Snapshot and Shared Register
    • Algorithms can be simplified, and performance increases significantly.
    • Efficient recycling of time-stamps is possible
Conclusions
• Contributions (cont.):
  – A library of non-blocking protocols
    • Easy to use, efficient and portable
    • Non-blocking protocols always perform better than lock-based ones, especially on multiprocessor systems.
• Concluding judgment:
  – Non-blocking protocols are highly applicable to real-time systems. Lock-free protocols seem very promising and will be applicable to real-time systems once suitable analysis is applied.
Future work
• NOBLE
  – Adapt to a commercial RTOS (Enea OSE)
  – Extend to embedded systems
    • Simpler uni- and multiprocessor systems, including 8-bit processors, with different or no support for atomic synchronization primitives
• Timing Information
  – Create lock-free translations that fulfill real-time system properties
  – General time-stamp recycling scheme
  – More non-blocking protocols