PARALLEL CONSTRUCTION OF SIMULTANEOUS DETERMINISTIC FINITE AUTOMATA ON SHARED-MEMORY MULTICORES
Minyoung Jung¹, Jinwoo Park¹, Johann Blieberger² and Bernd Burgstaller¹
¹Yonsei University, Korea
²Vienna University of Technology, Austria
46th International Conference on Parallel Processing
Bristol, United Kingdom, August 14-17, 2017

Motivation
String pattern matching with finite automata (FAs) is a
well-established method across many areas.
Text editors
Compiler front-ends
Internet search engines
Security and DNA sequence analysis
The sequential FA algorithm has linear complexity in
the size of the input.
Significant research effort has been spent on parallelizing
FA matching to improve on sequential performance.
FA matching is hard to parallelize because of the dependency between
state transitions.
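The dependency can be seen in a minimal Python sketch of sequential matching. The transition table is the 3-state example over the symbols a, b and c that appears later in the slides; the function name `run_dfa` and the row/column orientation of the table are assumptions made for illustration.

```python
# Assumed example DFA: rows are states 0-2, columns are the symbols a, b, c.
DELTA = [
    [1, 0, 0],  # state 0
    [1, 0, 2],  # state 1
    [2, 2, 2],  # state 2
]
SYMBOLS = {"a": 0, "b": 1, "c": 2}


def run_dfa(text, start=0):
    """Return the state reached after reading text from the given start state."""
    state = start
    for ch in text:
        # Each step needs the state produced by the previous step.
        # This loop-carried dependency is what hinders parallelization.
        state = DELTA[state][SYMBOLS[ch]]
    return state
```

The loop performs one table lookup per input symbol, which matches the linear complexity stated above.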

Motivation (cont.)
Limitation of parallel FA matching
[Figure: DFA]
What is the start state?

Motivation (cont.)
SFA construction
Simultaneous Finite Automata (SFAs)
Accumulated state transition information
Simulates the parallel execution of |Q| DFAs on a single DFA
[Figure: DFA and SFA]

Motivation (cont.)
Parallel FA matching
Parallel SFA matching
[Figure: DFA with 3 states, SFA with 6 states]
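One way to read the comparison above: a chunk of the input that does not start at the beginning has an unknown start state, so it can be processed for all |Q| possible start states at once, and the per-chunk results are composed afterwards. The following Python sketch illustrates that idea on the 3-state example table. It computes the state mappings directly instead of using a pre-built SFA, runs the chunks in a plain loop rather than on separate threads, and all names (`chunk_mapping`, `match_chunked`) are assumptions.

```python
# Assumed example DFA: rows are states 0-2, columns are the symbols a, b, c.
DELTA = [[1, 0, 0], [1, 0, 2], [2, 2, 2]]
SYMBOLS = {"a": 0, "b": 1, "c": 2}


def chunk_mapping(chunk):
    """Map every possible start state to the state reached after reading chunk."""
    mapping = list(range(len(DELTA)))  # identity: nothing read yet
    for ch in chunk:
        col = SYMBOLS[ch]
        mapping = [DELTA[q][col] for q in mapping]
    return tuple(mapping)


def match_chunked(text, n_chunks, start=0):
    """Process chunks independently, then chain the mappings from the start state."""
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    mappings = [chunk_mapping(c) for c in chunks]  # independent of each other
    state = start
    for m in mappings:
        state = m[state]  # resolve the start state of the next chunk
    return state
```

Each mapping has the shape of an SFA-state: one reached DFA-state per possible start state.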

Our contributions
1. Introduce fingerprint-based hashing of SFA-states to speed up state comparisons.
2. Provide x86 SIMD-based transposition kernels for SFA-state construction to leverage data-parallelism and cache-locality.
3. Perform in-memory compression of SFA-states to mitigate the space constraints of large problems.
4. Parallelize SFA construction for shared-memory multicores with lock-free synchronization on all data-structures, including thread-local queues supporting work-stealing.

Sequential SFA construction
[Figure: example DFA and the SFA constructed from it, step by step]
Start with the initial state.
Insert into the processed set
Iterate with every symbol
Find new states
Update the SFA transition function
Check existence & add new state to the set (set membership test)
Generate a next state with a symbol
Choose the unprocessed state
Until no more states to process
Set the initial and the final state
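The steps listed above amount to a worklist algorithm. The Python sketch below is one possible reading of them, not the authors' implementation: an SFA-state is represented as a tuple that maps each DFA-state to the DFA-state it reaches, the example table and all names are assumptions, and final states are omitted.

```python
from collections import deque

# Assumed example DFA: rows are states 0-2, columns are the symbols a, b, c.
DELTA = [[1, 0, 0], [1, 0, 2], [2, 2, 2]]
N_SYMBOLS = 3


def build_sfa(delta, n_symbols):
    """Return (states, transitions) of the SFA reachable from the identity mapping."""
    initial = tuple(range(len(delta)))       # start with the initial state
    states = {initial}                       # set used for the membership test
    work = deque([initial])                  # unprocessed states
    transitions = {}
    while work:                              # until no more states to process
        src = work.popleft()                 # choose an unprocessed state
        for sym in range(n_symbols):         # iterate with every symbol
            nxt = tuple(delta[q][sym] for q in src)  # generate a next state
            transitions[(src, sym)] = nxt    # update the SFA transition function
            if nxt not in states:            # check existence (set membership test)
                states.add(nxt)              # add the new state to the set
                work.append(nxt)
    return states, transitions
```

For this table the sketch yields six SFA-states, which is consistent with the 3-state versus 6-state comparison in the motivation, assuming that slide used the same DFA.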

Optimizing SFA construction
Parameterized transposition
Fingerprint-based hashing

Fingerprint-based hashing
Fingerprints: short bit-strings for larger objects (SFA-states)
CityHash, FarmHash, Rabin’s method, etc. create fingerprints
Speed up comparisons of SFA-states
Fingerprint comparisons replace exhaustive SFA-state comparisons
Fingerprint-collisions
It follows from the properties of the hash function that if fingerprints are different, SFA-states are different.
No exhaustive comparison is necessary.
With small probability, different SFA-states generate the same fingerprint (fingerprint-collision).
If fingerprints are the same, SFA-states may be the same.
Exhaustive comparisons are required.

Fingerprint-based hashing (cont.)
Hashing of SFA-states: speeds up lookups and reduces the number of SFA-state comparisons
Hash key: fingerprint % size of the hash-table
Value: fingerprint, SFA-state
[Figure: hash-table of size 3 with buckets 0, 1 and 2]
Hash-collisions: different SFA-states may map to the same hash-key due to the modulo-operation.
Resolved by closed addressing with chaining
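The lookup path described on these slides can be sketched in Python as follows. This is a simplified illustration under several assumptions: `hashlib.blake2b` with an 8-byte digest stands in for CityHash, FarmHash or Rabin fingerprints, the class and method names are hypothetical, DFA-state ids are assumed to fit in one byte, and the table is single-threaded.

```python
import hashlib


def fingerprint(state):
    """64-bit fingerprint of an SFA-state (stand-in for CityHash/FarmHash)."""
    # bytes(state) assumes every DFA-state id is in the range 0..255.
    digest = hashlib.blake2b(bytes(state), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class StateTable:
    """Hash-table with chaining; each bucket holds (fingerprint, SFA-state) pairs."""

    def __init__(self, size):
        self.buckets = [[] for _ in range(size)]
        self.full_comparisons = 0  # counts exhaustive SFA-state comparisons

    def insert_if_absent(self, state):
        """Return True if state was new and has been inserted, else False."""
        fp = fingerprint(state)
        bucket = self.buckets[fp % len(self.buckets)]  # hash key
        for other_fp, other in bucket:
            if other_fp != fp:
                continue  # different fingerprints: the states differ
            self.full_comparisons += 1  # same fingerprint: compare exhaustively
            if other == state:
                return False
        bucket.append((fp, state))
        return True
```

Storing the fingerprint next to the SFA-state lets a chain be scanned by comparing short integers, so the exhaustive comparison runs only when two fingerprints agree.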

Parameterized transposition
Speed up creating next SFA-states of each SFA-state
DFA transition table:
    a b c
0   1 0 0
1   1 0 2
2   2 2 2
Non-optimized: compute next states one by one
Optimized: transpose the table according to the DFA-states of the source SFA-state
Transposed table:
a   1 2 1
b   0 2 0
c   0 2 0
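As a sketch of the idea, the rows of the DFA transition table are selected by the DFA-states stored in the source SFA-state and the resulting matrix is transposed, so that each row of the result is the next SFA-state for one symbol. The Python below is illustrative only: `next_states_naive` and `next_states_transposed` are hypothetical names, and the source SFA-state (0, 2, 0) is inferred from the transposed table on the slide.

```python
# Example DFA from the slide, read as rows = DFA-states 0-2, columns = a, b, c.
DELTA = [[1, 0, 0], [1, 0, 2], [2, 2, 2]]


def next_states_naive(delta, src):
    """Compute the next SFA-states one entry at a time, symbol by symbol."""
    n_symbols = len(delta[0])
    return [tuple(delta[q][sym] for q in src) for sym in range(n_symbols)]


def next_states_transposed(delta, src):
    """Gather the rows selected by src, then swap rows and columns."""
    gathered = [delta[q] for q in src]             # |Q| x |symbols|, whole-row reads
    return [tuple(col) for col in zip(*gathered)]  # |symbols| x |Q|
```

Both functions return the same list; the second reads whole table rows, which is the access pattern the transposition is meant to exploit.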

Parameterized transposition (cont.)
x86 SIMD-intrinsics-based transposition kernels
Example transposed transition table
# DFA-states: 17, # symbols: 20
[Figure: the DFA transition table (17x20) is tiled into 8x8, 4x8 and 1x1 blocks and transposed into 20 next SFA-states (20x17)]
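The tiling can be mimicked without SIMD. The following Python sketch transposes a matrix block by block, using 8x8 tiles where they fit and smaller tiles at the edges. It only illustrates the blocking scheme; the SIMD kernels themselves are not reproduced here, and the function name and the `tile` parameter are assumptions.

```python
def transpose_tiled(matrix, tile=8):
    """Transpose a rows x cols matrix tile by tile (edge tiles may be smaller)."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[None] * rows for _ in range(cols)]
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # One tile: a SIMD version would use a fixed-size kernel here.
            for r in range(r0, min(r0 + tile, rows)):
                for c in range(c0, min(c0 + tile, cols)):
                    out[c][r] = matrix[r][c]
    return out
```

Working on one small tile at a time keeps both the rows being read and the rows being written within a limited memory region.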

Work (SFA-state) distribution
New SFA-states are pushed to the global queue.
[Figure: global queue (front, back) shared by Thread 1 and Thread 2; highly contended]
Observations:
1) The amount of work changes dynamically. Few states are available at the beginning, but soon all cores are saturated.
2) Switching the work distribution scheme dynamically adapts to the changing load condition and reduces the cache-coherence overhead.
Scheme 1: static distribution via a global queue
Advantage: avoids coherence-overhead at the front of the queue from work-stealing attempts of idle threads
The back of the queue is not contended because initially little work is available.

Work (SFA-state) distribution (cont.)
Scheme 2: dynamic distribution via thread-local queues
Work-stealing: steal work from another thread's queue once the local queue is empty
Work will be popped exactly once by a thread because of lock-free synchronization using the compare-and-swap (CAS) operation
Advantage: avoids coherence-overhead from the highly contended back of the global queue
Dequeuing SFA-states from other thread-local queues (work-stealing) makes the front of the queue highly contended (cache coherence overhead) when little work is available
[Figure: thread-local queues; Thread 1 is the owner, Thread 0 and Thread 2 are thieves; the CAS of one thief succeeds, the CAS of the other fails]
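The exactly-once property can be illustrated with a small Python sketch. Python has no hardware compare-and-swap, so `AtomicInt.compare_and_swap` below emulates one with a lock. The queue layout (a fixed list plus a head index) and all names are simplifying assumptions and not the lock-free queue of the presented work.

```python
import threading


class AtomicInt:
    """Integer with an emulated compare-and-swap (a lock stands in for the CPU instruction)."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Set the value to new only if it still equals expected; report success."""
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


def take(items, head):
    """Claim the item at the head of the queue, or return None if it is empty."""
    while True:
        h = head.load()
        if h >= len(items):
            return None
        if head.compare_and_swap(h, h + 1):  # CAS succeeds: this thread owns items[h]
            return items[h]
        # CAS failed: another thread claimed items[h] first, so retry


def run_demo(n_items=1000, n_threads=4):
    """Let an owner and several thieves drain one queue; return what each thread took."""
    items, head = list(range(n_items)), AtomicInt()
    taken = [[] for _ in range(n_threads)]

    def worker(tid):
        while (item := take(items, head)) is not None:
            taken[tid].append(item)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return taken
```

Whichever thread wins the CAS advances the head index, so a losing thread re-reads the index and can never return the same item.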

In-memory compression
SFA-state compression mitigates the state explosion problem
Dictionary-based compression shows high compression ratios due to structural properties of FAs
FA-states tend to repeat in SFA-states
Compression requires additional costly computation
Initiate once a critical memory threshold is reached
[Figure: compression of SFA-states, 27 KB per SFA-state]

In-memory compression (cont.)
Mitigate intractable problem sizes
Conduct SFA construction in three phases
First phase: construct an SFA with un-compressed SFA-states
Second phase: compress all generated SFA-states once a critical memory threshold is reached (dictionary-based lossless compression)
Third phase: resume SFA construction with compressed SFA-states
[Figure: third phase with decompress, compress and set membership test steps]
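A rough Python sketch of the three phases follows. It uses `zlib` (DEFLATE, which includes an LZ77 dictionary coder) as a stand-in for the dictionary-based compressor, counts the bytes of stored states as a simplified memory measure, and uses hypothetical names and a byte threshold chosen for the example. It assumes DFA-state ids fit in one byte.

```python
import zlib


class StateStore:
    """Stores SFA-states uncompressed until a byte threshold is reached, then compressed."""

    def __init__(self, threshold_bytes):
        self.threshold = threshold_bytes
        self.compressed = False
        self.blobs = []   # raw or compressed byte strings
        self.used = 0     # bytes currently held

    def add(self, state):
        raw = bytes(state)  # assumes DFA-state ids in the range 0..255
        blob = zlib.compress(raw) if self.compressed else raw  # phase 3 or phase 1
        self.blobs.append(blob)
        self.used += len(blob)
        if not self.compressed and self.used >= self.threshold:
            self._compress_all()                               # phase 2

    def _compress_all(self):
        self.blobs = [zlib.compress(b) for b in self.blobs]
        self.used = sum(len(b) for b in self.blobs)
        self.compressed = True

    def get(self, index):
        """Return the SFA-state at index, decompressing it if necessary."""
        blob = self.blobs[index]
        return tuple(zlib.decompress(blob) if self.compressed else blob)
```

Because DFA-states tend to repeat inside an SFA-state, such byte strings contain long runs that a dictionary coder shrinks considerably, at the price of a decompression on every later access.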

Experimental evaluation
Benchmarks: 1250 patterns from the PROSITE protein database
Their minimal DFAs are generated by Grail+.
Patterns that take several days to convert to minimal DFAs are excluded.
The proposed algorithm is implemented in C11 using POSIX threads.
Performance results are obtained with PAPI, which allows accessing hardware performance counters.
Evaluation platforms:
4-CPU (64 cores) AMD Opteron system
2-CPU (44 cores, 2 hyperthreads per core) Intel Xeon Broadwell E5-2699 v4 system
Linux CentOS version 7

Experimental evaluation (cont.)
Speedups of the optimized sequential algorithm over the previous algorithm
Hashing: max 4.1x on AMD, 3.1x on Intel
Combination of hashing and transposition: max 6.8x on AMD, 5.2x on Intel
[Figure: speedup plots on the AMD system and on the Intel system]

Experimental evaluation (cont.)
Speedups of parallelization
Based on our fastest sequential algorithm using hashing and parameterized transposition
On the AMD system: max. 108.9x
On the Intel system: max. 46.1x

Experimental evaluation (cont.)
Performance and size comparison with and w/o compression
Six benchmarks on the Intel system (four benchmarks are intractable w/o compression; two tractable benchmarks are added for comparison)
Set our memory manager’s threshold to 200 GB to force compression of the two tractable benchmarks
[Figure: results, with four benchmarks marked as intractable w/o compression]

Conclusion
Introduced fingerprints and hashing to reduce state comparisons and set membership tests.
Parameterized transposition of the transition table ensures cache locality of memory accesses.
A dynamic switch from the global work queue to thread-local queues with work-stealing avoids contention of cache-lines at the front and back of the queue.
A dynamic switch to in-memory compression of SFA-states is made once they no longer fit into main memory.
Overall speedups including fingerprint-based hashing, parameterized transposition and parallelization without compression are up to 312x on AMD and 193x on Intel.
Compression ratios are up to 30 on the Intel system.

Acknowledgments
This research was supported by:
the Austrian Science Fund (FWF) project I 1035N23
the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT & Future Planning under grant NRF2015M3C4A7065522
Thank you!
Q&A