Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Knowing Your Options What a micro optimization exercise taught me about Ports, NIFs, and RE2 Wednesday, June 8, 2011


Knowing Your Options: What a micro optimization exercise taught me about Ports, NIFs, and RE2. From the first Erlang LA

Transcript of Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Page 1: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Knowing Your Options

What a micro optimization exercise taught me about Ports, NIFs, and RE2

Wednesday, June 8, 2011

Page 2: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Introductions

• Me (https://github.com/djnym)

• OpenX (http://openx.org/)


Page 3: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

The Problem

• General

• Given a list of patterns and a string determine if the string matches one of the patterns

• Specifically

• IAB Spiders and Bots check of User Agent


Page 4: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Current Solution

• Implemented in Java

• 324 alternates in a large pattern

• each segment in pattern is basically a substring match

• there are a couple of ‘^’ and other regex pieces, not too many, but enough to want to leave this as a regex

• case insensitive match


Page 5: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Example

indy\\+library|infolink|inktomi search|inktomi\\+search|internet ninja|internet\\+ninja|internetseer|inverse ip insight|inverse\\+ip\\+insight|isilo|jakarta|jobo|justview|keynote|kilroy|larbin|libwww-perl|linkbot|linkchecker|linklint|linkscan|linkwalker|lisa|^lwp|lydia|magus bot|magus\\+bot|mediapartners-google|mfc_tear_sample|microsoft scheduled cache content download service|microsoft url control|microsoft\\+scheduled\\+cache\\+content\\+download\\+service|microsoft\\+url\\+control|minuteman|miva|mj12bot|mobipocket webcompanion|mobipocket\\+webcompanion|monitor|monster|mozilla/5\\.0 \\(compatible; msie 5\\.0\\)|

Page 6: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 1 : re module

• Precompile the large pattern of alternates using re:compile/2

• Use re:run/3 to match
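The code slides for this attempt did not survive in the transcript. As a rough illustration of the two steps above, here is a minimal sketch. The module name, function names, and the four-alternate pattern are hypothetical stand-ins (the real pattern has 324 alternates); only `re:compile/2` and `re:run/3` come from the slide.

```erlang
-module(re_alternates_sketch).
-export([compile/0, is_bot/2]).

%% Hypothetical excerpt standing in for the full 324-alternate pattern.
-define(PATTERN, "jakarta|libwww-perl|^lwp|mj12bot").

%% Precompile once with re:compile/2. The 'caseless' option gives the
%% case-insensitive match the existing solution relies on.
compile() ->
    {ok, MP} = re:compile(?PATTERN, [caseless]),
    MP.

%% re:run/3 with {capture, none} returns only match | nomatch,
%% which is all a yes/no spider check needs.
is_bot(UserAgent, MP) ->
    re:run(UserAgent, MP, [{capture, none}]) =:= match.
```

The compiled pattern would be built once and passed to every `is_bot/2` call, so the compile cost is not paid per user agent.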


Page 7: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 1 : Code 1


Page 8: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 1 : Code 2


Page 9: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 1 : Code 3


Page 10: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 1 : Results

• Poor!

1> re_test:test_all("ua.10000").
Processed 10000 resulting in 100 matches and 9900 nomatches
RE Alternates : 69341006 : 6934.100600 micros avg
ok

• about 7 ms per call (70 seconds for 10000)

• about 2x current overhead of component
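The per-call figure is the total elapsed microseconds divided by the number of calls. A minimal sketch of a harness that would produce a result line in that shape follows. The names `timing_sketch`, `time_all/2`, and `report/3` are hypothetical; the talk's own `re_test` module is not shown in the transcript.

```erlang
-module(timing_sketch).
-export([time_all/2, report/3]).

%% Hypothetical helper: apply Fun to every input and return
%% {TotalMicros, Count}. timer:tc/1 reports elapsed microseconds.
time_all(Fun, Inputs) ->
    {Total, ok} = timer:tc(fun() -> lists:foreach(Fun, Inputs) end),
    {Total, length(Inputs)}.

%% Format "Label : Total : Avg micros avg". Count must be greater than 0.
report(Label, Total, Count) ->
    lists:flatten(
      io_lib:format("~s : ~B : ~f micros avg",
                    [Label, Total, Total / Count])).
```

With the numbers from the slide, `report("RE Alternates", 69341006, 10000)` gives the average of 6934.1006 microseconds, which is the "about 7 ms per call" figure.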


Page 11: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 2 : perl port

• Curious about Perl performance, I implemented a simple program to run the alternates pattern in Perl. It ran really fast, so I decided to turn it into a port
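The six code slides for the port are missing from the transcript, so the following is only a sketch of one way such a port could be wired up, not the speaker's implementation. It assumes a `perl` executable on the PATH and a line-based protocol (one user agent in, "1" or "0" out). The module name, function names, and the inline Perl script are all hypothetical.

```erlang
-module(perl_port_sketch).
-export([start/1, is_bot/2, stop/1]).

%% Hypothetical Perl side: take the pattern as the first argument,
%% read one user agent per line, answer "1" or "0".
%% $| = 1 disables output buffering so each reply is sent at once.
script() ->
    "$| = 1; my $p = shift; my $re = qr/$p/i; "
    "while (my $ua = <STDIN>) { chomp $ua; "
    "print(($ua =~ $re) ? \"1\\n\" : \"0\\n\"); }".

%% Start perl as an external port. {line, N} delivers each reply
%% line as {Port, {data, {eol, Line}}}.
start(Pattern) ->
    Perl = os:find_executable("perl"),   % false if perl is not installed
    open_port({spawn_executable, Perl},
              [{args, ["-e", script(), Pattern]},
               {line, 4096}, binary, use_stdio]).

%% Must be called from the process that opened the port, since the
%% reply is delivered to the port owner. UserAgent must not contain
%% a newline.
is_bot(Port, UserAgent) ->
    true = port_command(Port, [UserAgent, $\n]),
    receive
        {Port, {data, {eol, <<"1">>}}} -> true;
        {Port, {data, {eol, <<"0">>}}} -> false
    after 5000 ->
        erlang:error(timeout)
    end.

stop(Port) ->
    port_close(Port).
```

Every call in this design pays for a round trip through the operating system's pipes, which is a cost the in-VM approaches do not have.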


Page 12: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 2 : Code 1


Page 13: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 2 : Code 2


Page 14: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 2 : Code 3


Page 15: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 2 : Code 4


Page 16: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 2 : Code 5


Page 17: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 2 : Code 6


Page 18: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 2 : Results

• Better

1> re_test:test_all("ua.10000").
Processed 10000 resulting in 100 matches and 9900 nomatches
Perl Server : 8151691 : 815.169100 micros avg
ok

• about 815 microseconds per call (8.15 seconds for 10000)


Page 19: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 3 : re module again

• Wanted to sanity check my use of re module and see if separate patterns and regexes would improve performance
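The code slides for this attempt are also missing. One plausible reading of "separate patterns" is to compile each alternate on its own and stop at the first hit; the sketch below assumes that reading. The module and function names are hypothetical.

```erlang
-module(re_list_sketch).
-export([compile/1, is_bot/2]).

%% Compile every pattern separately, keeping the case-insensitive
%% behaviour via the 'caseless' option.
compile(Patterns) ->
    [begin {ok, MP} = re:compile(P, [caseless]), MP end || P <- Patterns].

%% lists:any/2 stops at the first compiled pattern that matches,
%% so a hit early in the list skips the remaining patterns.
is_bot(UserAgent, MPs) ->
    lists:any(fun(MP) ->
                      re:run(UserAgent, MP, [{capture, none}]) =:= match
              end, MPs).
```

Because 9900 of the 10000 test user agents match nothing, most calls in a test like the one on the slides would still walk the whole list.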


Page 20: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 3 : Code 1


Page 21: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 3 : Code 2


Page 22: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 3 : Results

• Better Still?

1> re_test:test_all("ua.10000").
Processed 10000 resulting in 100 matches and 9900 nomatches
RE List : 7776324 : 777.632400 micros avg
ok

• about 777 microseconds per call (7.77 seconds for 10000)


Page 23: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 4 : re2 NIF

• From the re2 website (http://code.google.com/p/re2/)

"Backtracking engines are typically full of features and convenient syntactic sugar but can be forced into taking exponential amounts of time on even small inputs. RE2 uses automata theory to guarantee that regular expression searches run in time linear in the size of the input."

• NIF available (https://github.com/tuncer/re2.git)


Page 24: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 4 : Code 1


Page 25: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

Try 4 : Results

• Awesome!

1> re_test:test_all("ua.10000").
Processed 10000 resulting in 100 matches and 9900 nomatches
RE2 Alternates : 265289 : 26.528900 micros avg
ok

• about 26 microseconds per call (265 milliseconds for 10000)


Page 26: Anthony Molinaro, OpenX, Erlang LA Meetup Slides

But...

• larger lists (1800+ elements) required raising the maximum memory used from 8MB to 32MB

• less regex syntax: no backreferences, no zero-width lookaheads
