DFA Minimization Algorithms in Map-Reduce
Iraj Hedayati Somarin
Master Thesis Defense – January 2016
Computer Science and Software Engineering
Faculty of Engineering and Computer Science
Concordia University
Supervisor: Gösta K. Grahne
Examiner: Brigitte Jaumard
Examiner: Hovhannes A. Harutyunyan
Chair: Rajagopalan Jayakumar
2
Outline
• Introduction
• DFA Minimization in Map-Reduce
• Cost Analysis
• Experimental Results
• Conclusion
3
INTRODUCTION
An introduction to the problem and related work done so far
4
DFA, Big-Data and our Motivation
• Finite Automata
• Deterministic Finite Automata: $A = \langle Q, \Sigma, \delta, s, F \rangle$ (a small sketch follows this slide)
• DFA Minimization is the process of:
  • Removing unreachable states
  • Merging non-distinguishable states
• What is Big-Data? (e.g. peta, equal to $2^{50}$ or $10^{15}$)
• Insufficient study of DFA minimization for data-intensive applications and parallel environments
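A minimal sketch of this five-tuple as a data structure (my own illustration; the example automaton and its names are hypothetical, not from the thesis):

```python
# A minimal DFA A = <Q, Sigma, delta, s, F> in Python.
# Illustrative sketch only; the example automaton is hypothetical.

Q = {"q0", "q1"}                  # states
SIGMA = {"a", "b"}                # alphabet
DELTA = {                         # transition function delta: Q x Sigma -> Q
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q0",
}
S = "q0"                          # start state
F = {"q1"}                        # accepting states

def accepts(word):
    """Run the DFA on a word and report acceptance."""
    state = S
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in F

print(accepts("ab"))  # False: the run ends in q0
print(accepts("ba"))  # True: the run ends in q1
```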
5
DFA Minimization Methods (Watson, 1993)
[Taxonomy: minimization via Equivalence of States, using an Equivalence Relation computed Bottom-Up or Top-Down (Layer-wise, Unordered, or Point-Wise, over State Pairs), and Brzozowski's method]
Denote $\pi$ as a partition on $Q$; then two states $p$ and $q$ are equivalent iff for every word $w \in \Sigma^*$: $\hat{\delta}(p, w) \in F \iff \hat{\delta}(q, w) \in F$.
6
Moore's Algorithm (Moore, 1956)
• Input is a DFA $A = \langle Q, \Sigma, \delta, s, F \rangle$, where $|Q| = n$ and $|\Sigma| = k$
• Initialize a partition over $Q$; the initial partition is $\{F, Q \setminus F\}$
• Iteratively refine the partition using the equivalence relation $\equiv_i$ in iteration $i$: $p \equiv_i q$ iff $p \equiv_{i-1} q$ and $\delta(p, a) \equiv_{i-1} \delta(q, a)$ for every $a \in \Sigma$ (see the sketch below)
• Complexity: $O(kn^2)$
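A sequential sketch of this refinement loop (my reconstruction of the classical method, not the thesis's Map-Reduce version), reusing the hypothetical Q, SIGMA, DELTA, F from the earlier snippet:

```python
# Moore's partition refinement (sequential sketch of the classical method).

def moore_minimize(Q, SIGMA, DELTA, F):
    # Initial partition: accepting vs. non-accepting states.
    block = {q: (q in F) for q in Q}
    while True:
        # A state's refined block is its current block together with
        # the blocks of its successors, one per alphabet symbol.
        new_block = {
            q: (block[q], tuple(block[DELTA[(q, a)]] for a in sorted(SIGMA)))
            for q in Q
        }
        if len(set(new_block.values())) == len(set(block.values())):
            return block  # fixed point: the partition is stable
        block = new_block

blocks = moore_minimize(Q, SIGMA, DELTA, F)
print(len(set(blocks.values())))  # number of states of the minimal DFA
```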
7
Hopcroft's Algorithm (Hopcroft, 1971)
• The idea is to avoid some of the unnecessary operations
• Input is a DFA $A = \langle Q, \Sigma, \delta, s, F \rangle$, where $|Q| = n$ and $|\Sigma| = k$
• Initialize the partition over $Q$ as $\{F, Q \setminus F\}$
• Keep a list of splitters $\langle B, a \rangle$
• Iteratively divide blocks using a splitter $\langle B, a \rangle$, where $B$ is a block and $a \in \Sigma$: each block $P$ splits into the states whose $a$-successor lies in $B$ and the rest (see the sketch below)
• Update the list of splitters
• Complexity: $O(kn \log n)$
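The splitter-driven loop, sketched sequentially (again a reconstruction of Hopcroft's classical algorithm, not the Map-Reduce variant proposed later in this thesis):

```python
# Hopcroft's algorithm (sequential sketch of the classical method).
# QUE holds splitters <B, a>; each splitter divides every block P into
# the states whose a-successor lies in B and the remaining states.

def hopcroft_minimize(Q, SIGMA, DELTA, F):
    partition = {frozenset(F), frozenset(Q - F)} - {frozenset()}
    que = [(frozenset(F), a) for a in SIGMA]  # initial splitters
    while que:
        B, a = que.pop()
        pre = {q for q in Q if DELTA[(q, a)] in B}  # a-predecessors of B
        for P in list(partition):
            P1, P2 = P & pre, P - pre
            if P1 and P2:  # the splitter actually divides P
                partition.remove(P)
                partition |= {P1, P2}
                for b in SIGMA:
                    if (P, b) in que:          # replace the pending splitter
                        que.remove((P, b))
                        que += [(P1, b), (P2, b)]
                    else:                      # enqueue only the smaller half
                        que.append((min(P1, P2, key=len), b))
    return partition

print(hopcroft_minimize(Q, SIGMA, DELTA, F))  # set of state blocks
```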
8
Hopcroft's Algorithm (Example)
[Figure: a partition with blocks $P$, $P_1$, $P_2$ and splitter queue $QUE = \{\langle P, a \rangle, \langle P_1, a \rangle, \langle P_2, a \rangle\}$; block $B$ is then split into $B_1$ and $B_2$, and the queue is updated to $QUE = QUE \cup \{\langle B_1, a \rangle\}$]
9
Map-Reduce Model
[Figure: original data (Data 1–4) is read from the DFS, passed through the mapping phase (Mapper 1, Mapper 2), shuffled as mapped data to the reduce phase (Reducer 1–3), and written back to the DFS (Data 1–3)]
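To make the data flow concrete, here is a toy single-process simulation of one Map-Reduce round (a generic illustration of the model, not the thesis's framework or any real DFS):

```python
# Toy single-process simulation of one Map-Reduce round.
from collections import defaultdict

def run_round(records, mapper, reducer):
    # Map phase: every input record yields (key, value) pairs.
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)      # shuffle: group by key
    # Reduce phase: each key and its value list go to one reducer call.
    output = []
    for key, values in shuffled.items():
        output.extend(reducer(key, values))
    return output

# Example: count outgoing transitions per source state (hypothetical data).
transitions = [("q0", "a", "q1"), ("q0", "b", "q0"), ("q1", "a", "q1")]
print(run_round(
    transitions,
    mapper=lambda t: [(t[0], 1)],
    reducer=lambda key, vals: [(key, sum(vals))],
))  # [('q0', 2), ('q1', 1)]
```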
10
Related Works in Parallel DFA Minimization
1) Employing the EREW-PRAM model (Moore's method) (Ravikumar and Xiong 1996)
2) Employing the CRCW-PRAM model (Moore's method) (Tewari et al. 2002)
3) Employing the Map-Reduce model (Moore's method) [Moore-MR] (Harrafi 2015)
• The challenge is how to store block numbers:
  1) Parallel in-block sorting, renaming blocks serially
  2) Parallel Perfect Hashing Function and partial sum
  3) No action taken
11
Cost Model
• Communication Complexity (Yao 1979 & Kushilevitz 1997)
• The Lower Bound Recipe for Replication Rate (Afrati et al. 2013)
• Computational Complexity of Map-Reduce (Turan 2015)
12
Cost Model – Communication Complexity
• Yao's two-party model: Alice holds $x \in \{0,1\}^n$, Bob holds $y \in \{0,1\}^n$, and together they must compute $f : \{0,1\}^n \times \{0,1\}^n \to \{0,1\}$
• How much communication is required?
• Upper bound (worst case): $n + 1$ bits, since Alice can send all of $x$ and Bob can reply with $f(x, y)$
[Figure: the communication matrix over $A \subseteq \{0,1\}^n$ and $B \subseteq \{0,1\}^n$ partitioned into $f$-monochromatic rectangles Rec 1–Rec 6]
• Lower bound: $\log_2 t$, where $t$ is the number of rectangles
• The fooling set is a well-known method for bounding the number of $f$-monochromatic rectangles (worked example below)
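A textbook worked example of the technique (my addition, not from the slides): the equality function needs at least $n$ bits.

```latex
% Fooling-set lower bound for the equality function EQ(x, y) = [x = y].
% S = { (x, x) : x in {0,1}^n } is a fooling set: every pair in S gives
% EQ = 1, but crossing two distinct pairs gives EQ(x, y) = 0 for x != y,
% so no two members of S fit in one 1-monochromatic rectangle.
\[
  |S| = 2^n
  \;\Longrightarrow\;
  t \ge 2^n
  \;\Longrightarrow\;
  D(\mathrm{EQ}) \ge \log_2 t \ge n .
\]
```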
13
Cost Model – Lower Bound Recipe (Afrati et al. 2013)
[Figure: the input set $I$ is distributed among Reducer 1 through Reducer n; reducer $i$ receives $\rho_i$ inputs (each bounded by the reducer capacity) and covers $g(\rho_i)$ outputs of the output set $O$]
The replication rate is
$$\mathcal{R} = \frac{\sum_{i=1}^{n} \rho_i}{|I|}$$
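Stated as formulas, the recipe combines a per-reducer output bound with a coverage requirement (standard form of the Afrati et al. 2013 argument):

```latex
% The lower-bound recipe: g(\rho) bounds how many outputs one reducer
% with \rho inputs can cover; every output must be covered by some
% reducer; the replication rate follows from its definition.
\[
  \sum_{i=1}^{n} g(\rho_i) \;\ge\; |O|,
  \qquad
  \mathcal{R} \;=\; \frac{\sum_{i=1}^{n} \rho_i}{|I|} .
\]
% Bounding each \rho_i by the reducer capacity and solving the coverage
% inequality for \sum_i \rho_i yields a lower bound on R.
```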
14
Cost Model – Computational Complexity (Turan 2015)
• Denote each task as a Turing machine parameterized by:
  • whether it is a mapping task or a reducer task
  • the round number
  • the input size
  • the reducer size
• Each such task is carried out by a space-bounded and time-bounded Turing machine
15
DFA MINIMIZATION IN MAP-REDUCE
Proposed algorithms for minimizing a DFA in the Map-Reduce model
16
Enhancement to Moore-MR
• Moore-MR (Harrafi 2015):
  • Input: the transition records of the DFA
  • Pre-Processing: generate the initial working records from the input
  • Mapping Schema: map every transition record to a reducer based on its source state's block, and carrier transitions additionally based on their target state's block
  • Reducer Task: compute the new block number using Moore's method
• Note that, in order to accomplish the reducer task, each reducer requires the block number of every state it has a transition to; carrier transitions are responsible for delivering these data
• The challenge is that new block numbers are concatenations of other block numbers, so the size of each block number grows with every round
17
Enhancement to Moore-MR: PPHF-MR
• A Parallel Perfect Hashing Function (PPHF) built from the reducer number and a per-reducer counter is a one-to-one function
• Mapping: map every record to its reducer
• Reducer Task: assign new block numbers from a range determined by the reducer number (a sketch of the idea follows)
• Moore-MR-PPHF is obtained by applying PPHF-MR after each iteration of Moore-MR
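A minimal sketch of the renumbering idea (my reconstruction; the exact function and parameters in the thesis are not recoverable from this transcript):

```python
# PPHF-style renumbering sketch (illustrative reconstruction, not the
# thesis's exact function). Assumption: each reducer sees at most
# CAPACITY distinct block labels, so h(i, j) = i * CAPACITY + j is
# one-to-one and reducers can renumber without coordination.

CAPACITY = 1_000_000  # hypothetical per-reducer bound on distinct blocks

def pphf_reduce(reducer_id, long_labels):
    """Map each long, concatenated block label seen by this reducer
    to a short, globally unique integer."""
    return {
        label: reducer_id * CAPACITY + j
        for j, label in enumerate(sorted(set(long_labels)))
    }

# Two reducers renumber disjoint ranges independently.
print(pphf_reduce(0, ["0|1|0", "1|1|0"]))  # {'0|1|0': 0, '1|1|0': 1}
print(pphf_reduce(1, ["0|0|1"]))           # {'0|0|1': 1000000}
```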
18
Hopcroft-MR
[Flowchart: Pre-Processing (Mapper, Reducer); then, while QUE is not empty, iterate PartitionDetect (Mapper, Reducer) → BlockUpdate (Mapper, Reducer) → PPHF-MR (Mapper, Reducer); finally, construct the minimal DFA. Records are routed by the hashes $h(q)$, $h(p)$, and $h(\pi_p)$]
Record formats:
• Transition: $\Delta$, blocks[a, Bi]
• Block tuple: blocks[a, Bi]
• Update tuple: new, blocks[a, Bi]
19
Hopcroft-MR vs. Hopcroft-MR-PAR
• In Hopcroft-MR we pick one splitter at a time, while in Hopcroft-MR-PAR we pick all the splitters currently in QUE
• In Hopcroft-MR-PAR, membership with respect to the chosen splitters is tracked in a bit vector A (a toy illustration follows)
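A toy illustration of the bit-vector idea (my own sketch; the thesis's exact encoding is not recoverable from this transcript), reusing DELTA from the earlier snippet:

```python
# Packing a state's membership pattern over all current splitters into
# one integer, so a single round can refine against every splitter.

def split_signature(state, delta, letter, splitters):
    """Bit i is set iff delta(state, letter) falls in splitter block i."""
    sig = 0
    for i, block in enumerate(splitters):
        if delta[(state, letter)] in block:
            sig |= 1 << i
    return sig

# Two states can stay in the same block only if their signatures agree
# for every letter.
splitters = [{"q1"}, {"q0"}]
print(split_signature("q0", DELTA, "a", splitters))  # 1: reaches q1
print(split_signature("q0", DELTA, "b", splitters))  # 2: reaches q0
```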
20
COST ANALYSIS
Analyzing cost measures for the proposed algorithms, and finding lower and upper bounds on each
21
Communication Cost Bounds
• Upper bound for the DFA minimization problem in parallel environments
• Lower bound for the DFA minimization problem in parallel environments
22
Lower Bound on Replication Rate
• $g(\rho) = \rho$: for every input record (transition), a reducer produces exactly one record of output
• The output is exactly equal to the input size, containing the updated transitions; hence $|O| = |I|$, and so $\mathcal{R} \geq |O| / |I| = 1$
23
Moore-MR-PPHF
• The cost bounds are expressed in terms of the number of Map-Reduce rounds
24
Hopcroft-MR
25
Hopcroft-MR-PAR
26
Comparison of Complexity Measures

Algorithm               | Replication Rate | Communication Cost | Sensitive to Skewness
Lower Bound             | 1                | -                  |
Moore-MR (Harrafi 2015) |                  |                    | No
Moore-MR-PPHF           |                  |                    | No
Hopcroft-MR             |                  |                    | Yes
Hopcroft-MR-PAR         |                  |                    | Yes
27
EXPERIMENTAL RESULTS
Plots of the results gathered from running the proposed algorithms on different data sets
28
Data Generator – Circular
[Figure: input DFA vs. minimized DFA]
29
Data Generator – Duplicated Random
[Figure: input DFA vs. minimized DFA]
30
Data Generator – Linear
31
Moore-MR vs. Moore-MR-PPHF
32
Circular DFA
33
Replicated Random DFA
34
Number of Rounds
35
CONCLUSION
Concluding the work done in this thesis and suggesting future work and further questions
36
Conclusion
• In this work we studied DFA minimization algorithms in Map-Reduce and PRAM
• Proposed an enhancement to a DFA minimization algorithm in Map-Reduce by introducing PPHF in Map-Reduce
• Proposed a new Map-Reduce algorithm based on Hopcroft's method
• Found a lower bound on the Replication Rate in Map-Reduce and on the Communication Cost in parallel environments for the DFA minimization problem
• Studied different measures of Map-Reduce algorithms
• Found that two critical measures are missing: sensitivity to skewness and horizontal growth of data
37
Future Works
• Reducer Capacity vs. Number of Rounds trade-off
• Investigating other methods of minimization
• Extending the complexity model and class
• Is it possible to compare Map-Reduce algorithms with algorithms in different models (PRAM, serial, etc.)?
38
Thank you
Questions & Answers