Introduction RQ1: Understandability Study RQ2: Community Study RQ3: Desirability Analysis Conclusion

Exploring Regular Expression Comprehension

Carl Chapman*, Peipei Wang, Kathryn T. Stolee

Sandia National Laboratories Albuquerque*, North Carolina State University

[email protected], [email protected], [email protected]

Nov 1st, 2017


Why should we use regular expressions?

A succinct way to express pattern matching.

Less code, more flexibility.


Why should we NOT use regular expressions?

Hard to write the correct regular expression.

Complicated to understand.

Difficult to test and debug.


Example of Bad Regex

Regex: ^[\s\u200c]+|[\s\u200c]+$
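To make the pattern concrete, here is a minimal Python sketch of what it appears to do: strip leading and trailing runs of whitespace and of the zero-width non-joiner (U+200C). The function name `trim` and the sample strings are illustrative assumptions, not part of the talk.

```python
import re

# The slide's pattern: a leading run OR a trailing run of whitespace / U+200C.
# Python's re module understands \uXXXX escapes in str patterns.
TRIM = re.compile(r"^[\s\u200c]+|[\s\u200c]+$")

def trim(text):
    """Remove leading and trailing whitespace and zero-width non-joiners."""
    return TRIM.sub("", text)

if __name__ == "__main__":
    # U+200C is not whitespace, so \s alone would leave it in place.
    print(repr(trim(" \u200c hello world\u200c\n")))
```

Reading the pattern requires knowing anchors, alternation precedence, character classes, and a Unicode escape, which is a lot of notation for what is effectively a trim operation.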


State of the Art

Tools for visual debugging (e.g., Regex101, Regexr)

Tools for graphical regular expressions (e.g., Rex, Brics)

Tools for automatic generation of regexes and strings (e.g., Rex, ReLIE)


Running Example

Which regular expression should we use?

A = [1-9][0-9]{0,2}

B = [1-9][0-9]?[0-9]?

C = [1-9]|[1-9][0-9]|[1-9][0-9][0-9]

Difference: How to express Double-Bounded repetition of digits?

A: repetition bounds using {}

B: digits can appear or not appear using ?

C: explicit repetitions using OR
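One way to convince yourself that A, B, and C describe the same strings is a bounded brute-force check. The sketch below is in Python and assumes full-string matching; the helper name `language`, the test alphabet, and the length bound are choices made for this illustration.

```python
import re
from itertools import product

A = r"[1-9][0-9]{0,2}"
B = r"[1-9][0-9]?[0-9]?"
C = r"[1-9]|[1-9][0-9]|[1-9][0-9][0-9]"

def language(pattern, alphabet="0123456789a", max_len=4):
    """Collect every string up to max_len over alphabet that the pattern matches in full."""
    compiled = re.compile(pattern)
    matched = set()
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                matched.add(candidate)
    return matched

lang_a, lang_b, lang_c = (language(p) for p in (A, B, C))

if __name__ == "__main__":
    print(lang_a == lang_b == lang_c, len(lang_a))
```

The agreement holds for full-string matching. With an unanchored or prefix match, Python tries alternatives left to right, so `re.match(C, "123")` stops after `"1"` while `re.match(A, "123")` consumes all three digits.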


Regex Representation

Regex representation: syntactic expression

matching a digit (Custom Character Class): [0123456789], (0|1|2|3|4|5|6|7|8|9), [0-9], [\x30-\x39], \d, ...

matching at least one digit (Lower-Bounded): [0-9]+, [0-9][0-9]*, [0-9]{1,}, [0-9][0-9]{0,}, \d+, ...

matching at most three digits and at least one digit (Double-Bounded): [1-9][0-9]{0,2}, [1-9][0-9]?[0-9]?, [1-9]|[1-9][0-9]|[1-9][0-9][0-9], [1-9]\d{0,2}, ...
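The Custom Character Class and Lower-Bounded lists can be checked the same way. The following Python sketch assumes an ASCII input alphabet and writes the hexadecimal range as `[\x30-\x39]`, the form Python accepts; the helper `accepted` and the sample strings are illustrative.

```python
import re

ASCII_CHARS = [chr(i) for i in range(128)]

def accepted(pattern, candidates, flags=0):
    """Return the candidates that the pattern matches in full."""
    compiled = re.compile(pattern, flags)
    return {c for c in candidates if compiled.fullmatch(c)}

# Custom Character Class: five ways to say "one digit".
digit_forms = [r"[0123456789]", r"(0|1|2|3|4|5|6|7|8|9)", r"[0-9]", r"[\x30-\x39]", r"\d"]
digits_agree = all(accepted(p, ASCII_CHARS) == set("0123456789") for p in digit_forms)

# Lower-Bounded: five ways to say "at least one digit".
lower_forms = [r"[0-9]+", r"[0-9][0-9]*", r"[0-9]{1,}", r"[0-9][0-9]{0,}", r"\d+"]
samples = ["", "7", "42", "007", "4a", "a"]
lower_agree = len({frozenset(accepted(p, samples)) for p in lower_forms}) == 1

# Caveat: in Python str patterns, \d also matches non-ASCII decimal digits
# unless re.ASCII is set, so \d and [0-9] only coincide on ASCII input.
arabic_indic_three = "\u0663"
unicode_digit_matches = re.fullmatch(r"\d", arabic_indic_three) is not None

if __name__ == "__main__":
    print(digits_agree, lower_agree, unicode_digit_matches)
```

The caveat in the last lines is why equivalence claims of this kind are usually stated relative to a fixed character set.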


Research Goals

Explore regex comprehension

1. Which regex representations are most understandable? (understandability study)

2. Which regex representations are used most frequently? (community study)

3. Which regex representations should we use? (desirability analysis)


Regex Comparison Prerequisite

Equivalence class: a group of behaviorally equivalent regexes

Match the same set of character strings

Different regex representations

Equivalent DFAs (Deterministic Finite Automata)
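DFA equivalence can be decided by walking the product of two automata and looking for a reachable state pair that disagrees on acceptance. The Python sketch below uses hand-written DFAs for full-string matching of a one-to-three digit number with a nonzero first digit. The state names, the three symbol classes, and the helper names are all assumptions made for this illustration, not the tooling used in the study.

```python
from collections import deque

# Input characters are abstracted into three classes to keep the tables small:
# "n" = a digit 1-9, "z" = the digit 0, "o" = any other character.
SYMBOLS = ("n", "z", "o")

def make_dfa(start, accepting, transitions):
    """Bundle a DFA; any missing transition leads to an implicit dead state (None)."""
    return {"start": start, "accepting": set(accepting), "delta": transitions}

def step(dfa, state, symbol):
    if state is None:  # the dead state absorbs everything
        return None
    return dfa["delta"].get((state, symbol))

def equivalent(dfa_a, dfa_b):
    """Breadth-first walk over reachable state pairs of the product automaton."""
    start = (dfa_a["start"], dfa_b["start"])
    seen = {start}
    queue = deque([start])
    while queue:
        a, b = queue.popleft()
        if (a in dfa_a["accepting"]) != (b in dfa_b["accepting"]):
            return False
        for symbol in SYMBOLS:
            pair = (step(dfa_a, a, symbol), step(dfa_b, b, symbol))
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True

# Minimal DFA: states count how many digits have been read.
compact = make_dfa(0, {1, 2, 3}, {
    (0, "n"): 1,
    (1, "n"): 2, (1, "z"): 2,
    (2, "n"): 3, (2, "z"): 3,
})

# A larger DFA for the same language that needlessly remembers the second digit.
verbose = make_dfa("s0", {"s1", "s2n", "s2z", "s3"}, {
    ("s0", "n"): "s1",
    ("s1", "n"): "s2n", ("s1", "z"): "s2z",
    ("s2n", "n"): "s3", ("s2n", "z"): "s3",
    ("s2z", "n"): "s3", ("s2z", "z"): "s3",
})

# A different language: one to three digits with a leading zero allowed.
leading_zero_ok = make_dfa(0, {1, 2, 3}, {
    (0, "n"): 1, (0, "z"): 1,
    (1, "n"): 2, (1, "z"): 2,
    (2, "n"): 3, (2, "z"): 3,
})

if __name__ == "__main__":
    print(equivalent(compact, verbose), equivalent(compact, leading_zero_ok))
```

The two automata for the same language differ in size yet pass the check, which is the sense in which differently written regexes can fall into one equivalence class.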


Double-Bounded Group of Equivalence Classes

DBB GROUP

D1: [1-9][0-9]{0,2}

D2: [1-9][0-9]?[0-9]?

D3: [1-9]|[1-9][0-9]|[1-9][0-9][0-9]

(DFA diagram with edges labeled 1-9, 0-9, 0-9)


Five Equivalence Classes & 18 Regex Representations

LWB GROUP (using the abstract 'A{2,}', where A is any pattern)

L1: A{2,}

L2: AAA*

L3: AA+

CCC GROUP (using the concrete example of '[0-9a]' and assuming an ASCII charset)

C1: [0-9a]

C2: [0123456789a]

C3: [^\x00-/:-`b-\177]

C4: [\da]

C5: (0|1|2|3|4|5|6|7|8|9|a), ([0-9]|a), (\d|a)

LIT GROUP (using the concrete example '\a\$>' and assuming an ASCII charset)

T1: \a\$>

T2: \x07\x24\x3E

T3: \a[$]>

T4: \007\044\076

DBB GROUP (using the abstract 'pB{1,3}s', where B is any pattern and p and s are any (possibly empty) prefix and suffix)

D1: pB{1,3}s

D2: pBB?B?s

D3: pBs|pBBs|pBBBs

SNG GROUP (using the abstract 'S{3}', where S is any pattern)

S1: S{3}

S2: SSS

S3: S{3,3}

(One DFA diagram per group, not reproduced in this transcript)
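The CCC and LIT groups are concrete enough to check mechanically. The Python sketch below assumes an ASCII character set and full-string matching. Note that the octal escapes for `$` (decimal 36) and `>` (decimal 62) are `\044` and `\076`, which is the form used for T4 here. The dictionary names are illustrative.

```python
import re

ASCII_CHARS = [chr(i) for i in range(128)]

# CCC group: every form should accept exactly the digits and the letter "a".
ccc = {
    "C1": r"[0-9a]",
    "C2": r"[0123456789a]",
    "C3": r"[^\x00-/:-`b-\177]",  # negated class: everything except 0-9 and a
    "C4": r"[\da]",
    "C5": r"(0|1|2|3|4|5|6|7|8|9|a)",
}
expected = set("0123456789a")
ccc_ok = {
    name: {c for c in ASCII_CHARS if re.fullmatch(pattern, c)} == expected
    for name, pattern in ccc.items()
}

# LIT group: four spellings of the three characters BEL, "$", ">".
lit = {
    "T1": r"\a\$>",
    "T2": r"\x07\x24\x3E",
    "T3": r"\a[$]>",
    "T4": r"\007\044\076",
}
target = "\a$>"
lit_ok = {name: re.fullmatch(pattern, target) is not None for name, pattern in lit.items()}

if __name__ == "__main__":
    print(ccc_ok)
    print(lit_ok)
    print(oct(ord("$")), oct(ord(">")))
```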


Understandability Study

RQ1: Which representations are most understandable?

180 Amazon's Mechanical Turk (MTurk) participants

60 regular expressions

26 equivalence groups (18 of two members, 8 of three members)

41 pairs of equivalent regexes


Study Example


Comprehension Metrics

1. Matching

2. Composition


     String    'RR*' Oracle   P1     P2     P3     P4
1    "ARROW"   ✓              ✓      ✓      ✓      ✓
2    "qRs"     ✓              ✓      ✗      ✗      ?
3    "R0R"     ✓              ✓      ✓      ?      -
4    "qrs"     ✗              ✓      ✗      ✓      -
5    "98"      ✗              ✗      ✗      ✗      -
Score          1.00           0.80   0.80   0.50   1.00

✓ = match, ✗ = not a match, ? = unsure, - = left blank
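The scores in the table are consistent with one reading of the matching metric: the fraction of correct answers among the strings a participant actually attempted, with "unsure" and blank responses left out. That rule is inferred from the numbers shown (P3 scores 0.50 and P4 scores 1.00), not quoted from the talk. The Python sketch below applies it to the responses in the table; all names are illustrative.

```python
import re

MATCH, NO_MATCH, UNSURE, BLANK = "match", "no match", "unsure", "blank"

strings = ["ARROW", "qRs", "R0R", "qrs", "98"]

# Participant responses transcribed from the table, one entry per string.
responses = {
    "P1": [MATCH, MATCH, MATCH, MATCH, NO_MATCH],
    "P2": [MATCH, NO_MATCH, MATCH, NO_MATCH, NO_MATCH],
    "P3": [MATCH, NO_MATCH, UNSURE, MATCH, NO_MATCH],
    "P4": [MATCH, UNSURE, BLANK, BLANK, BLANK],
}

def oracle(pattern, candidates):
    """Ground truth: does the pattern match anywhere in each string?"""
    compiled = re.compile(pattern)
    return [MATCH if compiled.search(s) else NO_MATCH for s in candidates]

def matching_score(answers, truth):
    """Fraction correct among attempted answers (assumed rule, see lead-in)."""
    attempted = [(a, t) for a, t in zip(answers, truth) if a in (MATCH, NO_MATCH)]
    if not attempted:
        return None
    return sum(a == t for a, t in attempted) / len(attempted)

truth = oracle(r"RR*", strings)
scores = {name: matching_score(answers, truth) for name, answers in responses.items()}

if __name__ == "__main__":
    print(truth)
    print(scores)
```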


      Regex        Composition   Score
P1    (q4fab|ab)   xyzq4fab      1
P2    (q4fab|ab)   acb           0
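The composition metric can be sketched the same way: a participant writes a string, and the score is 1 if the regex matches it and 0 otherwise. The Python sketch below assumes unanchored matching, which is the reading consistent with `xyzq4fab` scoring 1 despite its `xyz` prefix; the function name is illustrative.

```python
import re

def composition_score(pattern, composed):
    """1 if the regex matches somewhere in the participant's string, else 0."""
    return 1 if re.search(pattern, composed) else 0

if __name__ == "__main__":
    for participant, composed in (("P1", "xyzq4fab"), ("P2", "acb")):
        print(participant, composed, composition_score(r"(q4fab|ab)", composed))
```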


Which representations are most understandable?

Double-Bounded Groups

((q4f)?ab)

((q4f){0,1}ab)

(q4fab|ab)

(deedo(do)?)

(dee(do){1,2})

(deedo|deedodo)


Regex             Match (%)   Comp (%)
((q4f){0,1}ab)    82.93       50.00
((q4f)?ab)        79.25       40.00
(q4fab|ab)        84.50       60.00

Regex             Match (%)   Comp (%)
(dee(do){1,2})    84.83       66.67
(deedo(do)?)      77.17       60.00
(deedo|deedodo)   90.00       63.33


Topological Ordering

D1: ((q4f){0,1}ab), (dee(do){1,2})

D2: ((q4f)?ab), (deedo(do)?)

D3: (q4fab|ab), (deedo|deedodo)

Understandability Ordering: D3 > D1 > D2
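One plausible way to arrive at such an ordering from the Match and Comp values reported for the two Double-Bounded groups is to compare representations pairwise, draw an edge from the one that wins more comparisons to the other, and topologically sort the resulting graph. The majority-vote rule and all names in the Python sketch below are assumptions for illustration; the talk does not spell out how its edges were built. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter
from itertools import combinations

# (Match, Comp) per representation, one dictionary per regex group.
scores = {
    "q4f group": {"D1": (82.93, 50.00), "D2": (79.25, 40.00), "D3": (84.50, 60.00)},
    "deedo group": {"D1": (84.83, 66.67), "D2": (77.17, 60.00), "D3": (90.00, 63.33)},
}

def preference_edges(all_scores):
    """Return (winner, loser) pairs decided by majority over groups and metrics."""
    representations = sorted(next(iter(all_scores.values())))
    edges = []
    for u, v in combinations(representations, 2):
        wins_u = wins_v = 0
        for group in all_scores.values():
            for value_u, value_v in zip(group[u], group[v]):
                if value_u > value_v:
                    wins_u += 1
                elif value_v > value_u:
                    wins_v += 1
        if wins_u > wins_v:
            edges.append((u, v))
        elif wins_v > wins_u:
            edges.append((v, u))
    return edges

def understandability_order(all_scores):
    """Topologically sort so that every winner precedes the representations it beats."""
    sorter = TopologicalSorter()
    for winner, loser in preference_edges(all_scores):
        sorter.add(loser, winner)  # the winner is a predecessor of the loser
    return list(sorter.static_order())

if __name__ == "__main__":
    print(preference_edges(scores))
    print(understandability_order(scores))
```

With these numbers D3 beats D1 in three of four comparisons (D1 has the higher Comp value in the deedo group), and both beat D2 in all four.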


Community Study

RQ2: Which representations have the strongest community support based on frequency?

13,597 distinct regex patterns from 1,544 GitHub Python projects

Mapping regexes to representations: PCRE feature, string pattern, token stream
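As a rough illustration of the "string pattern" style of mapping, the Python sketch below tags each distinct regex with D1 if it contains a `{m,n}` repetition and with D2 if it contains a bare `?` quantifier. The detector expressions, the tiny corpus, and the function names are hypothetical and deliberately simplistic; the alternation-based form D3 is not attempted because recognising it needs more than a string search.

```python
import re
from collections import Counter

# Heuristic detectors (assumptions for this sketch, not the study's rules).
DETECTORS = {
    # D1: an explicit double-bounded repetition such as {0,1} or {1,3}.
    "D1": re.compile(r"\{\d+,\d+\}"),
    # D2: a ? that is not escaped, not part of (?...), and not a lazy modifier.
    "D2": re.compile(r"(?<![\\(*+?}])\?"),
}

def classify(pattern):
    """Return the set of representation labels whose detector fires."""
    return {label for label, detector in DETECTORS.items() if detector.search(pattern)}

def count_patterns(patterns):
    """Count distinct patterns per representation label."""
    counts = Counter()
    for pattern in set(patterns):
        counts.update(classify(pattern))
    return counts

# Hypothetical corpus; the second entry is repeated to show de-duplication.
corpus = [
    r"((q4f){0,1}ab)",
    r"((q4f)?ab)",
    r"((q4f)?ab)",
    r"(q4fab|ab)",
    r"(?:abc)+",
    r"a+?b",
    r"what\?",
]

if __name__ == "__main__":
    print(count_patterns(corpus))
```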


Frequent Representations

Rep   Example          nPatterns   % patterns   nProjects   % projects
D1    ((q4f){0,1}ab)   346         2.5%         234         15.2%
D2    ((q4f)?ab)       1,871       13.8%        646         41.8%
D3    (q4fab|ab)       10          0.1%         27          1.7%

Community Ordering: D2 > D1 > D3
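The percentages follow from the corpus totals given for the community study (13,597 distinct patterns, 1,544 projects). The short Python sketch below recomputes them and ranks the representations by pattern count. Using pattern count as the ranking key is an assumption; ranking by project count gives the same order here.

```python
TOTAL_PATTERNS = 13_597
TOTAL_PROJECTS = 1_544

# (nPatterns, nProjects) per representation, taken from the table.
counts = {"D1": (346, 234), "D2": (1_871, 646), "D3": (10, 27)}

def percentages(n_patterns, n_projects):
    """Share of all patterns and of all projects, rounded to one decimal place."""
    return (
        round(100 * n_patterns / TOTAL_PATTERNS, 1),
        round(100 * n_projects / TOTAL_PROJECTS, 1),
    )

table = {rep: percentages(*values) for rep, values in counts.items()}
community_order = sorted(counts, key=lambda rep: counts[rep][0], reverse=True)

if __name__ == "__main__":
    print(table)
    print(" > ".join(community_order))
```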


Desirability Analysis

RQ3: Which regex representations should we use?

A = [1-9][0-9]{0,2} (D1)

B = [1-9][0-9]?[0-9]? (D2)

C = [1-9]|[1-9][0-9]|[1-9][0-9][0-9] (D3)

Topological Ordering

Understandability: D3 > D1 > D2

Community: D2 > D1 > D3


Desirability Analysis

RQ3Which regex representations should we use?

A = [1-9][0-9]{0,2}B = [1-9][0-9]?[0-9]?C = [1-9]|[1-9][0-9]|[1-9][0-9][0-9]

B

A

C

D2B

D1A

D3C

Topological OrderingUnderstandability: D3 > D1 > D2Community: D2 > D1 > D3

20 / 26

Page 63: Exploring Regular Expression Comprehension · Introduction RQ1: Understandability Study RQ2: Community Study RQ3: Desirability Analysis Conclusion Exploring Regular Expression Comprehension

Introduction RQ1: Understandability Study RQ2: Community Study RQ3: Desirability Analysis Conclusion

Desirability Analysis

RQ3Which regex representations should we use?

A = [1-9][0-9]{0,2}B = [1-9][0-9]?[0-9]?C = [1-9]|[1-9][0-9]|[1-9][0-9][0-9]

B

A

C

D2B

D1A

D3C

Topological OrderingUnderstandability: D3 > D1 > D2Community: D2 > D1 > D3

20 / 26

Page 64: Exploring Regular Expression Comprehension · Introduction RQ1: Understandability Study RQ2: Community Study RQ3: Desirability Analysis Conclusion Exploring Regular Expression Comprehension

Introduction RQ1: Understandability Study RQ2: Community Study RQ3: Desirability Analysis Conclusion

Ordering Results

Equivalence Class        Understandability   Community
Custom Character Class   C1 C5 C3 C4 C2      C1 C3 C2 C4 C5
Double-Bounded           D3 D1 D2            D2 D1 D3
Lower-Bounded            L3 L2               L3 L2 L1
Single-Bounded           S2 S1               S2 S1 S3
Literal                  T1 T3 T2 T4         T1 T3 T2 T4

What We Learn

1 Commonly used regexes are NOT always easier to understand!
2 Replace * with + when possible.
3 Use literal characters! If that is not possible, use hex encoding.
4 Use the range feature for character sets when possible.
  letters a to g: [a-g], [abcdefg], [a|b|c|d|e|f|g]
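A small sketch of items 2 to 4 in Python's `re` module follows. Two things here are assumptions rather than statements from the slides: reading item 2 as rewriting `aa*` to `a+`, and using `&` (0x26 in ASCII) as the character for item 3. The snippet also shows that `[a|b|c|d|e|f|g]`, exactly as written, additionally matches the `|` character, because `|` is an ordinary character inside a character class.

```python
import re
import string

RANGE = r"[a-g]"
LISTED = r"[abcdefg]"
PIPED = r"[a|b|c|d|e|f|g]"  # exactly as written on the slide


def single_chars_matched(pattern):
    """Return every printable ASCII character the pattern matches by itself."""
    return {c for c in string.printable if re.fullmatch(pattern, c)}


range_set = single_chars_matched(RANGE)
listed_set = single_chars_matched(LISTED)
piped_set = single_chars_matched(PIPED)

# Item 2, under the assumed reading: 'aa*' and 'a+' accept the same strings.
STAR, PLUS = r"aa*", r"a+"
star_plus_agree = all(
    bool(re.fullmatch(STAR, s)) == bool(re.fullmatch(PLUS, s))
    for s in ["", "a", "aa", "aaa", "b", "ab"]
)

# Item 3: a literal and its hex encoding match the same character.
literal_hit = re.fullmatch(r"&", "&") is not None
hex_hit = re.fullmatch(r"\x26", "&") is not None

print(sorted(piped_set - range_set))
```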

Limitations

Five types of equivalence classes
Python code
Regex length is short
  ab|abab
  thisbadchoice|thisbadchoicethisbadchoice
DFA size is small: 2 to 8 ...

Post Analysis

ANOVA analysis: which factors impact comprehension?

Regex representation
DFA size (matching: *α = 0.05, composition: **α = 0.01)
Regex length
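For readers unfamiliar with ANOVA, the sketch below computes a one-way F statistic by hand. It is a simplification: the slides do not state the ANOVA design that was used, the function name `one_way_anova_f` is illustrative, and the scores are made-up toy numbers, not study data.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation explained by group membership.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation inside each group.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Toy scores for three levels of a hypothetical factor (NOT study data).
toy_scores = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
f_stat = one_way_anova_f(toy_scores)
print(f_stat)
```

The F statistic would then be compared against the F distribution with (k - 1, N - k) degrees of freedom at the chosen α, such as the 0.05 and 0.01 levels noted above.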

Opportunities for Future Work!

DFA size
  How does DFA size impact comprehension?

More types of equivalence classes
  Consider multiline option, case insensitive, backreference?

Automatic identification
  Could we automatically build equivalence classes?

Questions?

Peipei [email protected]

North Carolina State University
