Analyzing Aborts in Software Transactional Memory
ANALYZING ABORTS IN SOFTWARE TRANSACTIONAL MEMORY
Presented by: Ofer Kiselov & Omer Kiselov
Supervised by: Dmitri Perelman
Final Presentation
Overview
Repeating the midterm presentation on the following subjects:
* Software Transactional Memory abstraction
* STM implementation example - TL2 overview
* Aborts in STM
* Unnecessary aborts in STM
* Project goal
* Implementation
* Overview
* Online part – implementation
* Online logging
* Evaluation
* Hardware
* Deuce
* Benchmarks
* Results
* Conclusion and analysis
* Nice to have
* Future work
Importance Of Parallel Programming
* Frequency barrier – the performance of a single-core processor can no longer improve.
* Switch to multi-cores.
* Parallel programs allow utilizing multi-core processors.
* Need for synchronization when accessing shared data.
Transactional Memory – why?
Current synchronization – locks
* Coarse-grained – limit parallelism
* Fine-grained – high programming complexity
* Error-prone (deadlocks / livelocks)
Transactional memory solution
* Intuitive for a programmer
* Provides a “transaction” abstraction for a critical section (operations executed atomically)
* Implemented in both software and hardware.
Why Do Aborts Happen?
[Figure: transactions T1 to T4 accessing OBJECT1 (O1) and OBJECT2 (O2), with the outcomes Aborted and Committed]
* T1, T2 and T3 read from O1 and write to O2.
* T4 reads from O2 and writes to O1.
* To maintain consistency, if T4 commits, then T1, T2 and T3 must abort!
Unnecessary Aborts
Aborts are bad
* work is lost, resources are wasted, throughput decreases
Some aborts are necessary
* continuing the run would violate correctness
And some aborts are not
* Analyzing whether the algorithm should abort is too expensive.
* “Unnecessary” abort: it could be avoided (keep more versions, better check of transactional dependencies).
[Figure: transactions T1, T2 and T3 accessing objects o1 and o2, with labels C and A]
Project Goals
Build a software analysis tool that:
* measures abort statistics for a given run
* evaluates how many of the aborts were unnecessary
* evaluates the damage to performance
“Will it pay off to add designs to stop the unnecessary aborts?”
Project Formation
An offline part for analyzing the run:
* reads the log of the run.
* gathers statistics.
* analyzes unnecessary aborts.
An online part for logging the run:
* is inserted into a specific algorithm run in a benchmark.
* flushes the run info to an XML log file.
Offline Part
[Figure: offline pipeline with the components XML Log, Parser, Analyzer, RUN DESCRIPTOR, Abort Analyzer, and Matlab histograms and final analysis]
Output of analysis is a precedence graph showing the transactions and their actions.
Parser
Every log line represents a transactional action, represented by the LogLine abstract class.
Parser responsibility:
* iterate over the XML
* create appropriate LogLine instances
LogLine factories for different operation types:
* transactional start
* read operation
* write operation
* transactional commit
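The parser and factory structure described above could look roughly like the following Java sketch. Only the LogLine abstract class and the four operation types come from the slides. The XML schema (element names `start`, `read`, `write`, `commit`, attributes `tx` and `obj`) and the names `StartLine`, `ReadLine`, `WriteLine`, `CommitLine` and `LogParser` are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Assumed log schema: <log><start tx="1"/><read tx="1" obj="o1"/>...</log>
abstract class LogLine {
    final int txId;
    LogLine(int txId) { this.txId = txId; }
}

class StartLine extends LogLine {
    StartLine(int txId) { super(txId); }
}

class CommitLine extends LogLine {
    CommitLine(int txId) { super(txId); }
}

class ReadLine extends LogLine {
    final String objectId;
    ReadLine(int txId, String objectId) { super(txId); this.objectId = objectId; }
}

class WriteLine extends LogLine {
    final String objectId;
    WriteLine(int txId, String objectId) { super(txId); this.objectId = objectId; }
}

class LogParser {
    // One factory per operation type, keyed by the (assumed) XML element name.
    private static final Map<String, Function<Element, LogLine>> FACTORIES = new HashMap<>();
    static {
        FACTORIES.put("start", e -> new StartLine(tx(e)));
        FACTORIES.put("read", e -> new ReadLine(tx(e), e.getAttribute("obj")));
        FACTORIES.put("write", e -> new WriteLine(tx(e), e.getAttribute("obj")));
        FACTORIES.put("commit", e -> new CommitLine(tx(e)));
    }

    private static int tx(Element e) {
        return Integer.parseInt(e.getAttribute("tx"));
    }

    // Iterates over the children of the root element and builds one LogLine per element.
    static List<LogLine> parse(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse log", ex);
        }
        List<LogLine> lines = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments
            }
            Element element = (Element) node;
            Function<Element, LogLine> factory = FACTORIES.get(element.getTagName());
            if (factory == null) {
                throw new IllegalArgumentException("Unknown log entry: " + element.getTagName());
            }
            lines.add(factory.apply(element));
        }
        return lines;
    }
}
```

Keeping the factories in a map means a new operation type (for example an abort entry) needs only one more subclass and one more map entry.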
Analyzer
Gives basic statistics regarding the transactions run:
* Counts aborts per reason.
* Counts reads and writes.
* Counts transactions.
Inserts the path into the Run Descriptor ADT struct.
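As a rough illustration of the counters listed above, the sketch below tallies aborts per reason and derives a commit ratio. The three abort reasons are the ones named in the result charts later in the deck. The class name `RunStatistics`, its methods, and the definition of commit ratio as commits divided by commits plus aborts are assumptions, not the project's actual code.

```java
import java.util.EnumMap;
import java.util.Map;

// Abort reasons as they appear in the chart legends of this presentation.
enum AbortReason { VERSION_TOO_HIGH, OBJECT_LOCKED, READSET_INVALID }

class RunStatistics {
    private final Map<AbortReason, Integer> abortsByReason = new EnumMap<>(AbortReason.class);
    private int commits;
    private int reads;
    private int writes;

    void onRead() { reads++; }
    void onWrite() { writes++; }
    void onCommit() { commits++; }
    void onAbort(AbortReason reason) { abortsByReason.merge(reason, 1, Integer::sum); }

    int reads() { return reads; }
    int writes() { return writes; }
    int commits() { return commits; }
    int aborts(AbortReason reason) { return abortsByReason.getOrDefault(reason, 0); }

    int totalAborts() {
        return abortsByReason.values().stream().mapToInt(Integer::intValue).sum();
    }

    // Assumed definition: committed attempts divided by all attempts.
    double commitRatio() {
        int attempts = commits + totalAborts();
        return attempts == 0 ? 0.0 : (double) commits / attempts;
    }
}
```

The analyzer would call one of the `on...` methods for each parsed log line; the class is not thread-safe, which is acceptable for a single-threaded offline pass.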
Transactional Dependencies
Run Descriptor is a precedence graph!
[Figure: RUN DESCRIPTOR graph with transactions T1 and T4 linked as Reader and Writer to OBJECT1, OBJECT2, OBJECT1 Version2 and OBJECT2 Version2, with WaR edges]
In order to create the graph, we needed to establish a way to turn the basic run into a graph.
ABORTS ANALYZER
* Searches for unnecessary aborts in the RUN DESCRIPTOR.
* Speculatively adds the edges of the aborted transaction to the RUN DESCRIPTOR.
* Using DFS, finds cycles in the precedence graph. Cycles represent necessary aborts.
* Removes the edges at the end of analysis.
* Built as a visitor pattern: flexible for more complex analysis.
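The speculative check described above can be sketched in Java as follows: add the aborted transaction's edges, run a DFS cycle search, then remove the added edges. This is a minimal sketch, not the project's code: `PrecedenceGraph` and its methods are hypothetical names, nodes are plain strings, and the visitor structure mentioned on the slide is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PrecedenceGraph {
    // Adjacency sets: an edge from -> to means "from" must precede "to".
    private final Map<String, Set<String>> edges = new HashMap<>();

    void addEdge(String from, String to) {
        edges.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        edges.computeIfAbsent(to, k -> new HashSet<>());
    }

    void removeEdge(String from, String to) {
        Set<String> out = edges.get(from);
        if (out != null) {
            out.remove(to);
        }
    }

    boolean hasEdge(String from, String to) {
        return edges.getOrDefault(from, Collections.emptySet()).contains(to);
    }

    // Depth-first search; reaching a node that is still on the DFS stack means a cycle.
    boolean hasCycle() {
        Set<String> finished = new HashSet<>();
        Set<String> onStack = new HashSet<>();
        for (String node : edges.keySet()) {
            if (dfs(node, finished, onStack)) {
                return true;
            }
        }
        return false;
    }

    private boolean dfs(String node, Set<String> finished, Set<String> onStack) {
        if (onStack.contains(node)) return true;
        if (finished.contains(node)) return false;
        onStack.add(node);
        for (String next : edges.getOrDefault(node, Collections.emptySet())) {
            if (dfs(next, finished, onStack)) return true;
        }
        onStack.remove(node);
        finished.add(node);
        return false;
    }

    // Speculatively adds the aborted transaction's edges (pairs {from, to}),
    // checks for a cycle, and removes the added edges again.
    // A cycle means the abort was necessary; no cycle means it was unnecessary.
    boolean abortWasNecessary(List<String[]> abortedEdges) {
        List<String[]> added = new ArrayList<>();
        for (String[] e : abortedEdges) {
            if (!hasEdge(e[0], e[1])) {
                addEdge(e[0], e[1]);
                added.add(e);
            }
        }
        boolean cycle = hasCycle();
        for (String[] e : added) {
            removeEdge(e[0], e[1]);
        }
        return cycle;
    }
}
```

For example, with a committed edge T1 → T2, an aborted T3 whose edges are T2 → T3 and T3 → T1 closes a cycle and is classified as a necessary abort, while an aborted T3 with only T2 → T3 is classified as unnecessary.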
Online part
Our goals:
* Run benchmarks to prepare the statistics for the offline part.
* Be sure that the measurements don’t distort the scheduling picture.
Platform Supporting STM
Deuce STM is an open source Java STM environment.
With Deuce STM, if the method:
    public void doThing() {…}
is not thread-safe…
    @Atomic public void doThing() {…}
is!!
Introducing: Deuce STM!!!
Created By: Guy Korland, Nir Shavit, Pascal Felber, Igor Berman
Source Code
    final public class Context implements org.deuce.transaction.Context {
        private static String objectId(Object reference, long field) {
            return Long.toString(System.identityHashCode(reference) + field);
        }
        final static AtomicInteger clock = new AtomicInteger(0);
[Figure: TL2 work method with logging inside the Deuce framework]
How To Utilize Deuce for Logging
* Modified code to call logging utils.
* More exception types to distinguish between different abort types.
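One way to picture the two changes above is a retry loop that logs each attempt and tells abort types apart by exception class. The sketch below is not Deuce's code: `AbortException`, its three subclasses, and `LoggingRunner` are hypothetical names, and the in-memory list stands in for the real logging utilities. The three abort types are the ones named in the result charts of this deck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical exception hierarchy: one subclass per abort type.
class AbortException extends RuntimeException {
    AbortException(String reason) { super(reason); }
}

class VersionTooHighException extends AbortException {
    VersionTooHighException() { super("Version Too High"); }
}

class ObjectLockedException extends AbortException {
    ObjectLockedException() { super("Object Locked"); }
}

class ReadsetInvalidException extends AbortException {
    ReadsetInvalidException() { super("Readset Invalid"); }
}

class LoggingRunner {
    // Stand-in for the logging utilities; used from a single thread in this sketch.
    final List<String> log = new ArrayList<>();

    // Runs the transaction body, retrying on abort and recording why each attempt failed.
    <T> T runAtomically(Supplier<T> body, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            log.add("start");
            try {
                T result = body.get();
                log.add("commit");
                return result;
            } catch (AbortException e) {
                log.add("abort: " + e.getMessage());
            }
        }
        throw new IllegalStateException("No commit after " + maxAttempts + " attempts");
    }
}
```

Because the abort reason travels with the exception type, the logging call sits in one place (the catch block) instead of at every point where the algorithm decides to abort.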
[Figure: Logger, Deuce framework and TL2 algorithm around the transaction's code: Start, Read, Write, Commit]
A Perfectly Scalable Code
Online Part Implementation – Version 1
Main problem: adding to the priority queue damages parallelism and lowers performance.
Online Part Implementation – Version 2
The Back End: Collector
The threads don’t do any extra actions to log the run.
[Figure: collector flow with the conditions “The log lines have ended” and “The program has ended”]
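The contrast between the two versions can be illustrated with a logger in which each worker thread appends to its own private buffer and a back-end collector merges the buffers afterwards. This is only one possible reading of the Version 2 design. The class `BufferedLogger`, the use of `System.nanoTime()` for ordering, and merging only after all workers have finished are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

class BufferedLogger {
    static final class Entry {
        final long timestamp;
        final String text;
        Entry(long timestamp, String text) {
            this.timestamp = timestamp;
            this.text = text;
        }
    }

    // Every thread's buffer is registered here once, on that thread's first log call.
    private final ConcurrentLinkedQueue<List<Entry>> allBuffers = new ConcurrentLinkedQueue<>();

    // Each worker writes only to its own list, so logging needs no shared lock or queue.
    private final ThreadLocal<List<Entry>> localBuffer = ThreadLocal.withInitial(() -> {
        List<Entry> buffer = new ArrayList<>();
        allBuffers.add(buffer);
        return buffer;
    });

    void log(String text) {
        localBuffer.get().add(new Entry(System.nanoTime(), text));
    }

    // Collector step: call only after all worker threads have been joined.
    // Merges the per-thread buffers and orders the lines by timestamp.
    List<String> collect() {
        List<Entry> merged = new ArrayList<>();
        for (List<Entry> buffer : allBuffers) {
            merged.addAll(buffer);
        }
        merged.sort(Comparator.comparingLong(e -> e.timestamp));
        List<String> lines = new ArrayList<>();
        for (Entry e : merged) {
            lines.add(e.text);
        }
        return lines;
    }
}
```

Compared with inserting every line into one shared priority queue, the only shared operation here is the one-time registration of a buffer, so the workers do not contend with each other while the benchmark runs.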
What Do We Check?
* Commit rate
* Unnecessary aborts (classified by types)
* Wasted work
Testbenches
* SSCA2 – short transactions, low contention, high memory utilization.
* Vacation – high contention, medium-length transactions, mostly reads.
* AVL tree – customizable contention, medium-length transactions. Random choice between add, remove or search for a random integer in the tree. Ability to change the integer range for custom contention. Created by us.
Hardware
Benchmarks run on Trinity:
* 8 quad-cores
* 132 GB RAM
* Machine was idle for our use.
Simulation Results – AVL tree
All graphs are a function of the thread amount.
[Figure, four panels, each plotted against Number Of Threads on a logarithmic axis from 10^0 to 10^2: Commit Ratio (Percentage Of Successful Commits); Amount of Aborts & Unnecessary Aborts; Percentage of Unnecessary Aborts; Percentage of Wasted Reads]
Simulation Results – SSCA2
All graphs are a function of the thread amount.
[Figure, four panels, each plotted against Number Of Threads on a logarithmic axis from 10^0 to 10^2: Commit Ratio (Percentage Of Successful Commits); Amount of Aborts & Unnecessary Aborts; Percentage of Unnecessary Aborts; Percentage of Wasted Reads]
Simulation Results – Vacation
All graphs are a function of the thread amount.
[Figure, four panels, each plotted against Number Of Threads on a logarithmic axis from 10^0 to 10^2: Commit Ratio (Percentage Of Successful Commits); Amount of Aborts & Unnecessary Aborts; Percentage of Unnecessary Aborts; Percentage of Wasted Reads]
Simulation Results – AVL tree
All graphs are a function of the thread amount.
[Figure: one pie chart per thread count. Legend: Version Too High, Object Locked, Readset Invalid. Slice labels as extracted:]
* 2 threads: 46%, 11%, 43%
* 4 threads: 43%, 14%, 43%
* 8 threads: 39%, 25%, 36%
* 16 threads: 51%, 28%, 22%
* 32 threads: 57%, 27%, 16%
* 64 threads: 16%, 79%, 5%
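Simulation Results – SSCA2
All graphs are a function of the thread amount.
[Figure: one pie chart per thread count. Legend: Version Too High, Object Locked, Readset Invalid. Slice labels as extracted:]
* 2 threads: 23%, 12%, 65%
* 4 threads: 26%, 14%, 60%
* 8 threads: 22%, 19%, 60%
* 16 threads: 36%, 18%, 45%
* 32 threads: 35%, 24%, 41%
* 64 threads: 28%, 36%, 36%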
Simulation Results – Vacation
All graphs are a function of the thread amount.
[Figure: one pie chart per thread count. Legend: Version Too High, Object Locked, Readset Invalid. Slice labels as extracted:]
* 2 threads: 55%, 12%, 34%
* 4 threads: 61%, 10%, 29%
* 8 threads: 62%, 6%, 32%
* 16 threads: 68%, 5%, 27%
* 32 threads: 62%, 15%, 23%
* 64 threads: 38%, 49%, 13%
Simulation Results – AVL tree
All graphs are a function of the thread amount.
[Figure, two panels plotted against log2 of Number Of Threads (1 to 6), with the series Version Too High, Object Locked and Readset Invalid: Amount of Aborts by types; Percentage of Aborts by types]
Simulation Results – SSCA2
All graphs are a function of the thread amount.
[Figure, two panels plotted against log2 of Number Of Threads (1 to 6), with the series Version Too High, Object Locked and Readset Invalid: Amount of Aborts by types; Percentage of Aborts by types]
Simulation Results – Vacation
All graphs are a function of the thread amount.
[Figure, two panels plotted against log2 of Number Of Threads (1 to 6), with the series Version Too High, Object Locked and Readset Invalid: Amount of Aborts by types; Percentage of Aborts by types]
Logger impact on performance
Logger access obviously demands more from the Deuce framework:
* More memory accesses
* More exception types
* On every read & write
How much distortion does the logger cause?
[Figure: AVL test with logging – commit ratio. Percentage Of Successful Commits against Number Of Threads on a logarithmic axis from 10^0 to 10^2, with the series With logger and Without logger]
Conclusions
Parallelism increases → the abort rate, the unnecessary abort rate and the wasted work rate increase as well.
Parallelism increases → more aborts are caused by locked objects.
To improve STM performance over highly parallel workloads, algorithms may be improved to prevent unnecessary aborts.
Nice To Have
* Drawing the precedence graph automatically in Microsoft Visio.
* Possibility to analyze according to abort types.
* GUI.
* Expansion of the simulation to more algorithms and test benches, which would make it possible to compare performance between algorithms.
Future Work
* Drop in abort rates after 128 threads due to a drop in concurrency – further analysis is required.
* Unfit versions cause a lot of aborts. The new SMV algorithm may solve this problem.
BIBLIOGRAPHY
* I. Keidar and D. Perelman. On avoiding spare aborts in transactional memory. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures, pages 59–68, 2009.
* I. Keidar and D. Perelman. SMV: Selective Multi-Versioning STM.
* D. Dice, O. Shalev, and N. Shavit. Transactional locking II. In Proceedings of the 20th International Symposium on Distributed Computing, pages 194–208, 2006.
* M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer III. Software transactional memory for dynamic-sized data structures. In Proceedings of the twenty-second annual symposium on Principles of distributed computing, pages 92–101, 2003.
QUESTIONS?