Introduction to Programming (Einführung in die Programmierung)
Prof. Dr. Bertrand Meyer
Chair of Software Engineering
Complement to lecture 11: Levenshtein distance algorithm
Levenshtein distance
Also called “Edit distance”
Purpose: to compute the smallest set of basic operations
    Insertion
    Deletion
    Replacement (substitution)
that will turn one string into another
Intro. to Programming, lecture 11 (complement): Levenshtein
Levenshtein distance
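"Michael Jackson" to "Mendelssohn"

Source      M   I   C   H   A   E   L   _   J   A   C   K   S   O   -   N
Target      M   E   -   N   D   E   L   S   -   -   -   -   S   O   H   N
Operation   .   S   D   S   S   .   .   S   D   D   D   D   .   .   I   .
Distance    0   1   2   3   4   4   4   5   6   7   8   9   9   9  10  10

S = substitute, D = delete, I = insert, . = keep the character unchanged; _ stands for the space in the source and - marks a position with no character.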
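As a quick check of the running total, the operation row of this example (S D S S S D D D D I) can simply be counted by kind. The grouping below is only a tally of that row; kept characters cost nothing.

```latex
\[
\underbrace{4}_{\text{substitutions (S)}}
+ \underbrace{5}_{\text{deletions (D)}}
+ \underbrace{1}_{\text{insertion (I)}}
= 10
\]
```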
Levenshtein distance algorithm
levenshtein (source, target: STRING): INTEGER
        -- Minimum number of operations to turn source into target
    local
        distance: ARRAY_2 [INTEGER]    -- Indexed from zero
        i, j, del, ins, subst: INTEGER
    do
        create distance.make (source.count, target.count)
        from i := 0 until i > source.count loop
            distance [i, 0] := i ; i := i + 1
        end
        from j := 0 until j > target.count loop
            distance [0, j] := j ; j := j + 1
        end
        -- (Continued)
Levenshtein, continued
        from i := 1 until i > source.count loop
            from j := 1
            invariant
                -- For all p : 0 .. i, q : 0 .. j - 1, we can turn source [1 .. p]
                -- into target [1 .. q] in distance [p, q] operations
            until j > target.count loop
                if source [i] = target [j] then
                    distance [i, j] := distance [i - 1, j - 1]
                else
                    del := distance [i - 1, j]
                    ins := distance [i, j - 1]
                    subst := distance [i - 1, j - 1]
                        -- minimum: smallest of its three arguments (helper not shown on the slide)
                    distance [i, j] := minimum (del, ins, subst) + 1
                end
                j := j + 1
            end
            i := i + 1
        end
        Result := distance [source.count, target.count]
    end

s [m .. n]: substring of s with items at positions k such that m ≤ k ≤ n (empty if m > n)
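Read as a recurrence, the two initialization loops and the nested loop compute the values below. The symbols are shorthand introduced here, not taken from the slides: d stands for distance, s for source and t for target, with 0 ≤ i ≤ source.count and 0 ≤ j ≤ target.count.

```latex
\[
d_{i,0} = i, \qquad d_{0,j} = j, \qquad
d_{i,j} =
\begin{cases}
d_{i-1,j-1} & \text{if } s_i = t_j, \\
1 + \min\bigl(d_{i-1,j},\; d_{i,j-1},\; d_{i-1,j-1}\bigr) & \text{otherwise.}
\end{cases}
\]
```

The three arguments of the minimum correspond, in order, to a deletion, an insertion and a substitution, matching del, ins and subst in the routine.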
[Figure: step-by-step construction of the distance table for "BEETH" (rows, indices 0 to 5) and "BEATLES" (columns, indices 0 to 7). Each cell is annotated with the operation that yields its value: I = Insert, K = Keep, D = Delete, S = Substitute. Row 0 is filled by insertions and column 0 by deletions.]
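Completed distance table for turning "BEETH" (rows) into "BEATLES" (columns)

              B   E   A   T   L   E   S
          0   1   2   3   4   5   6   7
     0    0   1   2   3   4   5   6   7
B    1    1   0   1   2   3   4   5   6
E    2    2   1   0   1   2   3   4   5
E    3    3   2   1   1   2   3   3   4
T    4    4   3   2   2   1   2   3   4
H    5    5   4   3   3   2   2   3   4

Operations: Insert, Keep, Delete, Substitute

Resulting sequence of operations (operation, position in the target):
Keep B, 1
Keep E, 2
Subst E → A, 3
Keep T, 4
Ins L, 5
Ins E, 6
Subst H → S, 7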
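The sequence of operations can be recovered by walking back through the filled table from its last cell. The class below is a minimal sketch of that idea, not code from the lecture. The names EDIT_SCRIPT_DEMO, edit_script and distance_table are hypothetical. It assumes EiffelBase's ARRAY2, whose indices start at 1, instead of the zero-indexed ARRAY_2 of the slides, so the slides' cell (i, j) is stored at position (i + 1, j + 1). The order in which the cases are tried (keep, substitute, insert, delete) is also an assumption; other minimal sequences may exist.

```eiffel
class
    EDIT_SCRIPT_DEMO

create
    make

feature {NONE} -- Initialization

    make
            -- Print one edit script for the example of the slides.
        do
            print (edit_script ("BEETH", "BEATLES"))
        end

feature -- Basic operations

    edit_script (source, target: STRING): STRING
            -- One minimal sequence of operations turning `source' into
            -- `target', one operation per line, found by walking back
            -- from the last cell of the distance table.
        local
            d: ARRAY2 [INTEGER]
            i, j: INTEGER
        do
            d := distance_table (source, target)
            create Result.make_empty
            from
                i := source.count
                j := target.count
            until
                i = 0 and j = 0
            loop
                if i > 0 and then j > 0 and then source.item (i) = target.item (j) then
                        -- Same character: keep it, move diagonally.
                    Result.prepend ("Keep " + source.item (i).out + "%N")
                    i := i - 1
                    j := j - 1
                elseif i > 0 and then j > 0 and then d.item (i + 1, j + 1) = d.item (i, j) + 1 then
                        -- Value came from the diagonal neighbor: substitution.
                    Result.prepend ("Subst " + source.item (i).out + " by " + target.item (j).out + "%N")
                    i := i - 1
                    j := j - 1
                elseif j > 0 and then d.item (i + 1, j + 1) = d.item (i + 1, j) + 1 then
                        -- Value came from the left neighbor: insertion.
                    Result.prepend ("Ins " + target.item (j).out + "%N")
                    j := j - 1
                else
                        -- Remaining case, value came from above: deletion.
                    Result.prepend ("Del " + source.item (i).out + "%N")
                    i := i - 1
                end
            end
        end

    distance_table (source, target: STRING): ARRAY2 [INTEGER]
            -- Table computed as on the slides, shifted by one: the slides'
            -- cell (i, j) is at (i + 1, j + 1) since ARRAY2 starts at 1.
        local
            i, j: INTEGER
        do
            create Result.make_filled (0, source.count + 1, target.count + 1)
            from i := 0 until i > source.count loop
                Result.put (i, i + 1, 1)
                i := i + 1
            end
            from j := 0 until j > target.count loop
                Result.put (j, 1, j + 1)
                j := j + 1
            end
            from i := 1 until i > source.count loop
                from j := 1 until j > target.count loop
                    if source.item (i) = target.item (j) then
                        Result.put (Result.item (i, j), i + 1, j + 1)
                    else
                        Result.put (Result.item (i, j + 1).min (Result.item (i + 1, j)).min (Result.item (i, j)) + 1, i + 1, j + 1)
                    end
                    j := j + 1
                end
                i := i + 1
            end
        end

end
```

With EDIT_SCRIPT_DEMO as root class and make as creation procedure, tracing the walk by hand on "BEETH" and "BEATLES" gives the same seven steps as the slide: keep B, keep E, substitute E by A, keep T, insert L, insert E, substitute H by S.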