Data-Parallel Finite-State Machines
Todd Mytkowicz, Madanlal Musuvathi, and Wolfram Schulte
Microsoft Research
New method to break data dependencies
• Preserves program semantics
• Does not use speculation
• Generalizes to other domains, but this talk focuses on FSMs
FSMs contain an important class of algorithms
• Unstructured text (e.g., regex matching or lexing)
• Natural language processing (e.g., speech recognition)
• Dictionary-based decoding (e.g., Huffman decoding)
• Text encoding / decoding (e.g., UTF-8)
We want parallel versions of all these algorithms, particularly in the context of large amounts of data
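[Figure: state diagram of a four-state FSM (𝑆0 to 𝑆3) with transitions on the input symbols /, *, and x, alongside its transition table T with columns /, *, and x.]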
state = 𝑆0;
foreach (in in input)
    state = T[in][state];
Each iteration depends on the state produced by the previous one. This data dependence limits ILP, SIMD, and multicore parallelism.
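The loop above can be sketched in runnable form as follows. The transition table is a hypothetical one for a C-style comment recognizer (the slides do not give the table's entries), and the names `T` and `run_fsm` are illustrative only.

```python
# Hypothetical transition table for a C-style comment recognizer.
# States: 0 = outside a comment, 1 = just saw '/',
#         2 = inside a comment, 3 = inside a comment and just saw '*'.
# T[symbol][state] is the next state, mirroring the T[in][state] indexing.
T = {
    "/": [1, 1, 2, 0],
    "*": [0, 2, 3, 3],
    "x": [0, 0, 2, 2],
}


def run_fsm(text, start=0):
    """Run the FSM sequentially over text and return the final state."""
    state = start
    for symbol in text:
        # Loop-carried dependence: this lookup needs the previous state.
        state = T[symbol][state]
    return state


final_state = run_fsm("/*xxx**/xx")
print(final_state)  # 0: the comment was opened and closed
```

Every lookup waits on the one before it, which is the dependence the rest of the talk sets out to break.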
Demo UTF-8 Encoding
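Breaking data dependences with enumeration
[Figure: the input / * x x x * * / x x split between processors 𝑃0 and 𝑃1, with the enumerated states 𝑆0, 𝑆1, 𝑆2, … and the transition table T.]
Enumeration breaks data dependences, but how do we make it scale?
- Overhead is proportional to the number of states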
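A minimal sketch of enumeration, assuming a hypothetical transition table for a comment-recognizing FSM (not taken from the slides): each chunk of input is run from every possible start state, producing a start-state to end-state map, and the maps are then stitched together with one lookup per chunk. The function names are illustrative, and the chunks are processed one after another here rather than on separate processors.

```python
# Hypothetical table: T[symbol][state] is the next state (4 states, 0..3).
T = {
    "/": [1, 1, 2, 0],
    "*": [0, 2, 3, 3],
    "x": [0, 0, 2, 2],
}
NUM_STATES = 4


def run_from(state, chunk):
    """Sequential FSM run over chunk, starting in the given state."""
    for symbol in chunk:
        state = T[symbol][state]
    return state


def enumerate_chunk(chunk):
    """Return a list m where m[s] is the end state if the chunk starts in s."""
    return [run_from(s, chunk) for s in range(NUM_STATES)]


def run_enumerative(text, num_chunks=2, start=0):
    size = max(1, -(-len(text) // num_chunks))  # ceiling division, at least 1
    chunks = [text[i:i + size] for i in range(0, len(text), size)]
    # Each enumerate_chunk call is independent of the others, so these
    # could run on different processors; this sketch runs them in order.
    maps = [enumerate_chunk(chunk) for chunk in chunks]
    state = start
    for m in maps:  # cheap sequential stitch: one lookup per chunk
        state = m[state]
    return state


print(enumerate_chunk("/*xxx"))        # [2, 2, 2, 0]
print(run_enumerative("/*xxx**/xx"))   # 0, same as the sequential run
```

The cost is visible in `enumerate_chunk`: every chunk is processed once per state, which is the overhead the slide refers to.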
Intuition: Exploit convergence in enumeration
[Figure: the input / * x x x * * / x x split between processors 𝑃0 and 𝑃1, with the enumerated states 𝑆0, 𝑆1, 𝑆2, ….]
After 2 characters of input, the FSM converges to 2 unique states
- Overhead is proportional to the number of unique states
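Convergence can be observed by counting how many distinct states remain as the enumeration advances. The sketch below assumes a hypothetical transition table for a comment-recognizing FSM; the table entries and the function name are not from the slides.

```python
# Hypothetical table: T[symbol][state] is the next state (4 states, 0..3).
T = {
    "/": [1, 1, 2, 0],
    "*": [0, 2, 3, 3],
    "x": [0, 0, 2, 2],
}


def unique_state_counts(chunk, num_states=4):
    """Count the distinct live states after each symbol of the chunk."""
    live = set(range(num_states))  # start from every possible state
    counts = []
    for symbol in chunk:
        # Only distinct states need to be advanced; duplicates collapse.
        live = {T[symbol][s] for s in live}
        counts.append(len(live))
    return counts


print(unique_state_counts("**/xx"))  # [3, 2, 2, 1, 1]
```

Under this assumed table, the second half of the example input is down to two live states after two characters, so the per-symbol work shrinks from four lookups to two and eventually to one.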
Convergence for worst case inputs
Almost all (90%) FSMs converge to <= 16 states after 10 steps on adversarial inputs
However, many FSMs take thousands of steps to converge to <= 4 states
Convergence for real inputs
All FSMs converge to fewer than 16 states after 20 steps on real inputs
Why convergence happens
[Figure: the input / * x x x * * / x x split between processors 𝑃0 and 𝑃1, with the enumerated states 𝑆0, 𝑆1, 𝑆2, ….]
• FSMs have structure
  • Many states transition to an error state on a character
  • FSMs often transition to “homing” states after reading a sequence of characters
    • e.g., after reading */ the FSM is very likely, though not guaranteed, to reach the “end-of-comment” state.
Contributions
• Enumeration, a method to break data dependencies
• Enumeration for FSM is gather
  • Gather is a common hardware primitive
  • Our approach should scale with faster support for gather
• Paper introduces two optimizations, both in terms of gather, which exploit convergence
  • Reduces overhead of enumerative approach
  • See paper for details
How do we implement enumerative FSMs with gather?
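Implementing Enumeration with Gather
[Figure: the input / * x x x * * / x x split between processors 𝑃0 and 𝑃1, with the enumerated states 𝑆0, 𝑆1, 𝑆2, … and the transition table T shown once per processor.]
Current states are addresses used to gather from T[input]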
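In scalar form, a gather is an indexed read of a table at a vector of addresses. The sketch below uses a plain Python list comprehension as a stand-in for a hardware gather and assumes a hypothetical transition table; `gather` and `enumerate_with_gather` are illustrative names, not the paper's API.

```python
# Hypothetical table: T[symbol][state] is the next state (4 states, 0..3).
T = {
    "/": [1, 1, 2, 0],
    "*": [0, 2, 3, 3],
    "x": [0, 0, 2, 2],
}


def gather(table, indices):
    """Read table at every address in indices (stand-in for SIMD gather)."""
    return [table[i] for i in indices]


def enumerate_with_gather(text, num_states=4):
    """Advance all start states at once, one gather per input symbol."""
    # states[s] holds the current state of the run that started in state s.
    states = list(range(num_states))
    for symbol in text:
        # The current states are the addresses; T[symbol] is the table.
        states = gather(T[symbol], states)
    return states


print(enumerate_with_gather("/*"))     # [2, 2, 3, 0]
print(enumerate_with_gather("**/xx"))  # [0, 0, 0, 0]
```

On hardware with a gather instruction, the list comprehension would become a single vector operation per input symbol.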
Enumeration makes FSMs embarrassingly parallel
• Some hardware has gather as a primitive
  • Our approach will scale with that hardware
• Some hardware lacks gather
  • Paper shows how to use:
    • _mm_shuffle_epi8 to implement gather in x86 SIMD
    • ILP because gather is associative
    • Multicore with OpenMP
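The associativity point can be illustrated with a small sketch: composing per-symbol maps with gather gives the same result whether the compositions are grouped left to right or in independent pairs, and the independent pairs are what expose instruction-level parallelism. This is an illustration under an assumed transition table, not the paper's implementation; all names are hypothetical.

```python
from functools import reduce

# Hypothetical table: T[symbol][state] is the next state (4 states, 0..3).
T = {
    "/": [1, 1, 2, 0],
    "*": [0, 2, 3, 3],
    "x": [0, 0, 2, 2],
}
IDENTITY = [0, 1, 2, 3]


def gather(table, indices):
    return [table[i] for i in indices]


def compose(first, second):
    """Map for 'apply first, then second', built with a single gather."""
    return gather(second, first)


def left_to_right(text):
    """Fold the per-symbol maps in order: each step depends on the last."""
    return reduce(compose, (T[c] for c in text), IDENTITY)


def pairwise(text):
    """Combine neighbouring maps in rounds; pairs within a round are independent."""
    maps = [T[c] for c in text] or [IDENTITY]
    while len(maps) > 1:
        paired = [compose(maps[i], maps[i + 1])
                  for i in range(0, len(maps) - 1, 2)]
        if len(maps) % 2:
            paired.append(maps[-1])  # odd map carries over to the next round
        maps = paired
    return maps[0]


print(left_to_right("/*xxx"))  # [2, 2, 2, 0]
print(pairwise("/*xxx"))       # [2, 2, 2, 0]
```

Because the grouping does not change the result, a processor is free to overlap the independent compositions within each round.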
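Single-Core performance
[Figure: single-core performance results, with one region annotated “Good performance” and another annotated “Not so good performance”. Annotation: more hardware to help scaling (hardware gather or multicore parallelism).]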
Bing Tokenization
Case Studies
SNORT Regular Expressions
Huffman Decoding
Related Work
• Prior parallel approaches
  • Ladner and Fischer (1980) – Cubic in number of states
  • Hillis and Steele (1986) – Linear in number of states
• Bit Parallelism
  • Parabix – FSM to sequence of bit operations
• Speculation
  • Prakash and Vaswani (2010) – “Safe speculation” as programming construct
Conclusion
• Enumeration: A new method to break data dependencies
  • Not speculation and preserves semantics
  • Exploits redundancy, or convergence, in computation to scale
  • Generalizes to other domains (Dynamic Programming in PPOPP 2014)
• Enumeration for FSM is gather
  • Scales with new hardware implementations of gather
  • Paper demonstrates how to use SIMD, ILP, and Multicore on machines which lack intrinsic support for gather