Post on 22-Jan-2018
Lies, Damned Lies, and Substrings
HASEEB QURESHI
SOFTWARE ENGINEER @
Let me tell you a story about a time Ruby lied to me.
A coworker and I were arguing about an algorithm.
Him
Me
It started with a classic problem:
How to generate all of the substrings of a string?
Hello
H, e, l, l, o
He, el, ll, lo
Hel, ell, llo
Hell, ello
Hello
i = 0, j = 3: Hell
i = 1, j = 4: ello
Each substring is defined by a unique start and end index.
    def substrings(str)
      (0...str.length).each_with_object([]) do |i, subs|
        (i...str.length).each do |j|
          subs << str[i..j]
        end
      end
    end
There are quadratically many pairs of indices, therefore the inner loop runs O(n²) many times.
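A quick way to check the counting argument is to compare the number of substrings with the closed form n(n+1)/2. This is only a minimal sketch; `pair_count` is a helper name introduced here for illustration and is not part of the original code.

```ruby
# Same substrings method as in the talk.
def substrings(str)
  (0...str.length).each_with_object([]) do |i, subs|
    (i...str.length).each do |j|
      subs << str[i..j]
    end
  end
end

# Hypothetical helper: number of (i, j) pairs with 0 <= i <= j < n.
def pair_count(n)
  n * (n + 1) / 2
end

[1, 5, 10, 50].each do |n|
  puts "n=#{n}: #{substrings('a' * n).length} substrings, n(n+1)/2 = #{pair_count(n)}"
end
```

For "Hello" (n = 5) both counts come out to 15, matching the pyramid of substrings shown earlier.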
Me: This algorithm is O(n²).
But what about what’s inside the loop?
    subs << str[i..j]
How long does it actually take to build a substring?
(We’re going to assume fixed-width [ASCII/UTF-32] strings for simplicity.)
(Also, Ruby treats strings less than 24 characters differently, but we can ignore that for large n.)
Memory
str
    H    e    l    l    o
    8fe0 8fe1 8fe2 8fe3 8fe4 8fe5
str2 = str[1..3]
    e    l    l
    52a0 52a1 52a2 52a3 52a4 52a5
Obviously, copying each substring takes linear time.
That is, linear in the length of the average substring.
… Which is how long? O(1)? O(log n)? O(n)?
H, e, l, l, o
He, el, ll, lo
Hel, ell, llo
Hell, ello
Hello
    require_relative 'substrings'

    def average_substring_ratio(original_string_length)
      str = 'a' * original_string_length
      substring_lengths = substrings(str).map(&:length)
      average_substring_length = substring_lengths.reduce(:+)
                                                  .fdiv(substring_lengths.count)

      average_substring_length / original_string_length
    end

    (1..150).step(5).each do |count|
      puts "#{count}: #{average_substring_ratio(count)}"
    end
    1: 1.0
    6: 0.4444444444444444
    11: 0.3939393939393939
    16: 0.375
    21: 0.3650793650793651
    26: 0.358974358974359
    31: 0.3548387096774194
    36: 0.35185185185185186
    41: 0.34959349593495936
    46: 0.34782608695652173
    51: 0.34640522875817
    56: 0.34523809523809523
    61: 0.3442622950819672
    66: 0.3434343434343434
    71: 0.3427230046948357
    76: 0.34210526315789475
    81: 0.34156378600823045
    86: 0.34108527131782945
    91: 0.34065934065934067
    96: 0.34027777777777773
    101: 0.33993399339933994
    106: 0.33962264150943394
    111: 0.3393393393393393
    116: 0.339080459770115
    121: 0.33884297520661155
    126: 0.3386243386243386
    131: 0.3384223918575064
    136: 0.3382352941176471
    141: 0.3380614657210402
    146: 0.33789954337899547
(You can also prove this mathematically.)
As n → ∞, the average substring length approaches ⅓n.
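One way to sketch that proof, assuming a string of length n has n − k + 1 substrings of each length k (the symbol k is introduced here, not taken from the slides):

```latex
\begin{align*}
\text{number of substrings} &= \sum_{k=1}^{n} (n - k + 1) = \frac{n(n+1)}{2} \\
\text{total length} &= \sum_{k=1}^{n} k\,(n - k + 1) = \frac{n(n+1)(n+2)}{6} \\
\text{average length} &= \frac{n(n+1)(n+2)/6}{n(n+1)/2} = \frac{n+2}{3} \\
\lim_{n \to \infty} \frac{1}{n} \cdot \frac{n+2}{3} &= \frac{1}{3}
\end{align*}
```

As a sanity check, n = 6 gives (6 + 2) / (3 · 6) ≈ 0.444, which agrees with the measured ratio for 6.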
So the average substring grows linearly with the original string.
Thus, this copy is O(n):
    subs << str[i..j]
So this whole thing takes O(n³) time.
Colleague: Not so fast. (Or slow.)
Enter COW (copy-on-write)
Copy-on-write is a kind of structural sharing.
Memory
str
    H    e    l    l    o
    8fe0 8fe1 8fe2 8fe3 8fe4 8fe5
str2 = str[1..3]
    str_ptr: 8fe1
    length: 3
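The idea of structural sharing can be sketched in plain Ruby. This is a toy illustration only, not Ruby's real internal string representation; the class `SharedSlice` and its methods are hypothetical names introduced here.

```ruby
# Hypothetical illustration of structural sharing; not Ruby's real RString.
class SharedSlice
  attr_reader :offset, :length

  def initialize(buffer, offset, length)
    @buffer = buffer   # shared, never copied here
    @offset = offset
    @length = length
  end

  # O(1): a new slice only records a new offset and length.
  def slice(from, to)
    SharedSlice.new(@buffer, @offset + from, to - from + 1)
  end

  # O(length): characters are copied only when a real String is needed.
  def to_s
    @buffer[@offset, @length]
  end

  def shares_buffer_with?(other)
    @buffer.equal?(other.buffer)
  end

  protected

  attr_reader :buffer
end

str  = SharedSlice.new('Hello'.freeze, 0, 5)
str2 = str.slice(1, 3)
puts str2.to_s                       # prints "ell"
puts str2.shares_buffer_with?(str)   # prints "true"
```

Taking the slice costs the same no matter how long the string is, because only the offset and length change.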
Here’s the proof.
    require_relative 'display_string' # credit to Pat Shaughnessy

    debug = Debug.new

    str = ('a'..'z').to_a.join
    str2 = str.dup

    debug.display_string(str)
    # DEBUG: RString = 0x7f98fb05b090
    # DEBUG: ptr = 0x7f98fc0aa970 -> "abcdefghijklmnopqrstuvwxyz"
    # DEBUG: len = 26

    debug.display_string(str2)
    # DEBUG: RString = 0x7f98fb05afa0
    # DEBUG: ptr = 0x7f98fc0aa970 -> "abcdefghijklmnopqrstuvwxyz"
    # DEBUG: len = 26
Pointer to same string in memory!
    require_relative 'display_string' # credit to Pat Shaughnessy

    debug = Debug.new

    str = ('a'..'z').to_a.join
    str2 = str[1..-1]

    debug.display_string(str)
    # DEBUG: RString = 0x7f98fb05b090
    # DEBUG: ptr = 0x7f98fc0aa970 -> "abcdefghijklmnopqrstuvwxyz"
    # DEBUG: len = 26

    debug.display_string(str2)
    # DEBUG: RString = 0x7f98fb05afa0
    # DEBUG: ptr = 0x7f98fc0aa971 -> "bcdefghijklmnopqrstuvwxyz"
    # DEBUG: len = 25
Still the same string, but now offset by 1.
What happens if either string gets mutated?
    require_relative 'display_string' # credit to Pat Shaughnessy

    debug = Debug.new

    str = ('a'..'z').to_a.join
    str2 = str[1..-1]
    str[1] = '&'

    debug.display_string(str)
    # DEBUG: RString = 0x7fa2a304fbf8
    # DEBUG: ptr = 0x7fa2a2f1f170 -> "a&cdefghijklmnopqrstuvwxyz"
    # DEBUG: len = 26

    debug.display_string(str2)
    # DEBUG: RString = 0x7fa2a304fae0
    # DEBUG: ptr = 0x7fa2a2f50b11 -> "bcdefghijklmnopqrstuvwxyz"
    # DEBUG: len = 25
The write forced a copy to a new string in memory.
Before the write:
Memory
str
    H    e    l    l    o
    8fe0 8fe1 8fe2 8fe3 8fe4 8fe5
str2 = str[1..3]
    str_ptr: 8fe1
    length: 3
callbacks: [str2]

str[1] = '&'

During the write:
Memory
str
    H    e    l    l    o
    8fe0 8fe1 8fe2 8fe3 8fe4 8fe5
str2 = str[1..3]
    e    l    l
    52a0 52a1 52a2 52a3 52a4 52a5
callbacks: [str2]

After the write:
Memory
str
    H    &    l    l    o
    8fe0 8fe1 8fe2 8fe3 8fe4 8fe5
str2 = str[1..3]
    e    l    l
    52a0 52a1 52a2 52a3 52a4 52a5
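The copy-on-write step can be sketched as a toy Ruby class. Everything here is hypothetical (the class `CowString` and its methods are introduced only for illustration), and it makes one simplifying assumption: the object being written to is the one that takes the copy. The observable result is the same, in that the two strings no longer share memory after the write.

```ruby
# Hypothetical copy-on-write wrapper; a toy model, not Ruby's internals.
class CowString
  def initialize(buffer, offset = 0, length = buffer.length, shared = false)
    @buffer = buffer
    @offset = offset
    @length = length
    @shared = shared
  end

  # O(1) substring: both objects now point at the same buffer.
  def substring(from, to)
    @shared = true
    CowString.new(@buffer, @offset + from, to - from + 1, true)
  end

  # A write to a shared buffer first copies this object's own characters.
  def []=(index, char)
    if @shared
      @buffer = @buffer[@offset, @length] # the O(length) copy happens here
      @offset = 0
      @shared = false
    end
    @buffer[@offset + index] = char
  end

  def to_s
    @buffer[@offset, @length]
  end

  def same_buffer?(other)
    @buffer.equal?(other.buffer)
  end

  protected

  attr_reader :buffer
end

str  = CowString.new(+'Hello')
str2 = str.substring(1, 3)
puts str.same_buffer?(str2)  # true: nothing copied yet
str[1] = '&'
puts str.to_s                # H&llo
puts str2.to_s               # ell
puts str.same_buffer?(str2)  # false: the write forced a copy
```

The substring stays cheap until somebody writes; only then is the linear cost of a copy paid.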
So…
    subs << str[i..j]
This is a shallow copy, which is actually O(1).
And this whole thing takes O(n²) time.
Case closed.
    require_relative 'substrings'
    require 'benchmark'

    str = 'abcdefgh' * 128
    str2 = str * 2

    benchmarks = Benchmark.bmbm do |bm|
      bm.report(str.length) do
        substrings(str)
      end

      bm.report(str2.length) do
        substrings(str2)
      end
    end

    # Interpolate the ratio; String + Float would raise a TypeError.
    puts "Growth: #{benchmarks[1].real / benchmarks[0].real}"
    Rehearsal ----------------------------------------
    1024   0.290000   0.070000   0.360000 (  0.357953)
    2048   2.360000   0.500000   2.860000 (  2.876344)
    ------------------------------- total: 3.220000sec

               user     system      total        real
    1024   0.270000   0.070000   0.340000 (  0.338351)
    2048   2.200000   0.400000   2.600000 (  2.601713)
    Growth: 7.689380300623611
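When the input doubles, the time grows by a factor of about 8 (0.338351 s to 2.601713 s).
A quadratic algorithm would grow by a factor of 4; a factor of 8 = 2³ points to cubic growth.
This algorithm is not quadratic.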
wat
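To turn a doubling experiment into an estimated exponent, take the base-2 logarithm of the growth factor. This is a rough sketch using the two timings from the benchmark output above; the variable names are introduced here, and a single pair of measurements is only suggestive.

```ruby
# Timings copied from the benchmark output above (seconds).
t_1024 = 0.338351
t_2048 = 2.601713

# If time ~ c * n**k, doubling n multiplies time by 2**k, so k = log2(ratio).
growth   = t_2048 / t_1024
exponent = Math.log2(growth)

puts "Growth: #{growth.round(2)}"        # about 7.69
puts "Exponent: #{exponent.round(2)}"    # about 2.94, i.e. close to 3
```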
    require 'benchmark'
    NUM_TIMES = 100_000

    str = 'abcde' * 2 ** 10
    str2 = str * 2

    Benchmark.bmbm do |bm|
      bm.report(str.length) do
        NUM_TIMES.times { str[1..-1] }
      end

      bm.report(str2.length) do
        NUM_TIMES.times { str2[1..-1] }
      end
    end
    Rehearsal -----------------------------------------
    ...
    ---------------------------------------------------
                user     system      total        real
    5120    0.020000   0.000000   0.020000 (  0.021144)
    10240   0.020000   0.000000   0.020000 (  0.020291)
That sure looks like copy-on-write optimization…
    require 'benchmark'
    NUM_TIMES = 100_000

    str = 'abcde' * 2 ** 10
    str2 = str * 2

    Benchmark.bmbm do |bm|
      bm.report(str.length) do
        NUM_TIMES.times { str[1..-2] }
      end

      bm.report(str2.length) do
        NUM_TIMES.times { str2[1..-2] }
      end
    end
    Rehearsal -----------------------------------------
    ...
    ---------------------------------------------------
                user     system      total        real
    5120    0.110000   0.060000   0.170000 (  0.171367)
    10240   0.200000   0.140000   0.340000 (  0.347153)
Only substrings that include the last character are copy-on-write.
So it turns out that the vast majority of substrings don’t include the last character.
Of the n(n+1)/2 substrings, only n end at the last character (for Hello: o, lo, llo, ello, Hello).
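How small that share gets can be checked by counting index pairs directly. This is a minimal sketch; `last_char_fraction` is a hypothetical helper introduced here, and it counts (i, j) pairs rather than building any strings.

```ruby
# Count the (i, j) index pairs whose substring reaches the last character.
def last_char_fraction(n)
  total = 0
  ending_at_last = 0
  (0...n).each do |i|
    (i...n).each do |j|
      total += 1
      ending_at_last += 1 if j == n - 1
    end
  end
  Rational(ending_at_last, total)
end

[5, 100, 1000].each do |n|
  puts "n=#{n}: #{last_char_fraction(n).to_f}"
end
```

The fraction works out to n / (n(n+1)/2) = 2/(n+1), so it shrinks toward zero as the string grows.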
So this, on average, is linear:
    subs << str[i..j]
And, of course, this whole thing is O(n³).
It was all a lie.
WHY HAVE YOU BETRAYED ME
RUBY
Naturally…
Hmm.
¯\_(ツ)_/¯
ᕕ( ᐛ )ᕗ
Let’s…
… recompile Ruby…?
    maml004775hquresh:ruby haseeb_qureshi$ make install
    CC = clang
    LD = ld
    LDSHARED = clang -dynamic -bundle
    CFLAGS = -O3 -fno-fast-math -ggdb3 -Wall -Wextra -Wno-unused-parameter -Wno-parentheses -Wno-long-long -Wno-missing-field-initializers -Wno-tautological-compare -Wno-parentheses-equality -Wno-constant-logical-operand -Wno-self-assign -Wunused-variable -Werror=implicit-int -Werror=pointer-arith -Werror=write-strings -Werror=declaration-after-statement -Werror=shorten-64-to-32 -Werror=implicit-function-declaration -Werror=division-by-zero -Werror=deprecated-declarations -Werror=extra-tokens -pipe
    XCFLAGS = -D_FORTIFY_SOURCE=2 -fstack-protector -fno-strict-overflow -fvisibility=hidden -DRUBY_EXPORT -fPIE
    CPPFLAGS = -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -D_DARWIN_UNLIMITED_SELECT -D_REENTRANT -I. -I.ext/include/x86_64-darwin15 -I./include -I. -I./enc/unicode/9.0.0
    DLDFLAGS = -Wl,-undefined,dynamic_lookup -Wl,-multiply_defined,suppress -fstack-protector -Wl,-u,_objc_msgSend -Wl,-pie -framework CoreFoundation
    SOLIBS =
    Apple LLVM version 7.3.0 (clang-703.0.31)
    Target: x86_64-apple-darwin15.4.0
    Thread model: posix
I now have a custom version of ruby in my /usr/local/bin
    ml004775hquresh:bin haseeb_qureshi$ ls -l
    ...
    -rwxr-xr-x 1 haseeb_qureshi admin 3.1M Oct 23 00:37 ruby
    ...
    ml004775hquresh:bin haseeb_qureshi$ ./ruby -v
    ruby 2.4.0dev (2016-10-23 trunk 56478) [x86_64-darwin15]
    require 'benchmark'
    NUM_TIMES = 100_000

    str = 'abcde' * 2 ** 10
    str2 = str * 2

    Benchmark.bmbm do |bm|
      bm.report(str.length) do
        NUM_TIMES.times { str[1..-2] }
      end

      bm.report(str2.length) do
        NUM_TIMES.times { str2[1..-2] }
      end
    end
Let’s run this benchmark again…
    ml004775hquresh:bin haseeb_qureshi$ ./ruby ~/Projects/substrings/benchmark3.rb
    Rehearsal -----------------------------------------
    ...
    ---------------------------------------------------
                user     system      total        real
    5120    0.020000   0.000000   0.020000 (  0.020432)
    10240   0.020000   0.000000   0.020000 (  0.020300)
Boom.
Ruby is now doing copy-on-write optimization on all strings!
    def substrings(str)
      (0...str.length).each_with_object([]) do |i, subs|
        (i...str.length).each do |j|
          subs << str[i..j]
        end
      end
    end
And this bad boy, finally, takes O(n²) time.
(Applause break)
But you have to wonder…
why was that the default behavior?
str
    H    e    l    l    o    \0
    8fe0 8fe1 8fe2 8fe3 8fe4 8fe5
In C, strings should end with a null-terminator or null byte.
This is how C knows it’s reached the end of a string.
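If you passed a substring that did not end with a NUL into a library written in C, it might keep reading bytes until it found one.
For example, str2 = str[1..3] shares str’s buffer, so a C function reading it would run past "ell" into "o" before reaching the null terminator.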
Essentially, copying every substring that doesn’t reach the end of the buffer ensures that any C extension treats all Ruby strings correctly.
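The over-read can be imitated in plain Ruby. This is a toy model under stated assumptions: `c_style_read` is a hypothetical helper that mimics how a C function scans for "\0", and `memory` stands in for str’s buffer; it is not a real Ruby or C API.

```ruby
# Toy model of a C-style read: start at an offset and scan until "\0".
# `c_style_read` is a hypothetical helper, not a real Ruby or C API.
def c_style_read(memory, start)
  finish = memory.index("\0", start)
  memory[start...finish]
end

memory = "Hello\0"          # str's buffer, including its null terminator

# str2 = str[1..3] as a shared view: pointer at offset 1, length 3.
offset = 1
length = 3

puts memory[offset, length]          # what Ruby means: "ell"
puts c_style_read(memory, offset)    # what a C reader would see: "ello"

# A copied substring gets its own buffer and its own terminator.
copy = memory[offset, length] + "\0"
puts c_style_read(copy, 0)           # "ell"
```

A substring that runs to the end of the buffer already sits right before the terminator, which is why sharing is safe in that case.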
So that’s it.
We’re finally done.
We have an O(n²) algorithm for substrings.
Except one thing…
Remember where we started?
We need to generate all the substrings.
Did we actually… generate them?
    def substrings(str)
      (0...str.length).each_with_object([]) do |i, subs|
        (i...str.length).each do |j|
          subs << str[i..j]
        end
      end
    end

    puts substrings("Hello")
It takes linear time to print a substring, so printing all the substrings will still take O(n³) time.
So in what sense is this O(n²)?
If you think about it, the whole idea of copy-on-write is laziness.
What we’ve created are lazy strings.
Instead of making these:
H, e, l, l, o
He, el, ll, lo
Hel, ell, llo
Hell, ello
Hello
We made these:
str[0..0], str[1..1], str[2..2], str[3..3], str[4..4]
str[0..1], str[1..2], str[2..3], str[3..4]
str[0..2], str[1..3], str[2..4]
str[0..3], str[1..4]
str[0..-1]
All we’ve really done is build each pair of indices.
The Ruby array that substrings(str) returns does not actually contain the
substrings.
It’s just a clever, lazy way to express them.
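The same laziness can be made explicit by returning index ranges instead of strings. This is only a sketch of the idea; `substring_ranges` is a hypothetical helper introduced here and is not part of the original code.

```ruby
# Hypothetical helper: build only the index ranges, O(1) work per pair.
def substring_ranges(str)
  (0...str.length).flat_map do |i|
    (i...str.length).map { |j| (i..j) }
  end
end

str    = 'Hello'
ranges = substring_ranges(str)
puts ranges.length            # 15 ranges, no characters copied yet

# Reading the characters is where the linear work per substring happens.
ranges.each { |range| puts str[range] }
```

Building the ranges is quadratic; the cubic cost only appears once every range is turned into characters.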
It’s lies all the way down.
Thanks for listening.
You can follow me at @hosseeb
Special thanks to Ned Ruggeri, David Runger, and Pat Shaughnessy.
You can find the code on Github: Haseeb-Qureshi