Einführung in die Programmierung / Introduction to Programming

Prof. Dr. Bertrand Meyer

Chair of Software Engineering

Complement to lecture 11: Levenshtein distance algorithm



Levenshtein distance

Also called “Edit distance”

Purpose: to compute the smallest number of basic operations (insertion, deletion, replacement) that will turn one string into another
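To make the three basic operations concrete, here is a small sketch using standard EiffelBase STRING features: `put` for replacement, `insert_character` for insertion and `remove` for deletion. The class name EDIT_OPERATIONS_DEMO and the classic KITTEN/SITTING pair are our own illustration, not part of the lecture. Turning "KITTEN" into "SITTING" takes three operations (two replacements, one insertion), which is also the Levenshtein distance for that pair; going back uses a deletion.

```eiffel
note
	description: "Illustration (hypothetical class): the three basic edit operations on a STRING"

class
	EDIT_OPERATIONS_DEMO

create
	make

feature -- Initialization

	make
			-- Turn "KITTEN" into "SITTING", then back, one basic operation at a time.
		local
			s: STRING
		do
			create s.make_from_string ("KITTEN")
			s.put ('S', 1)               -- Replacement: K -> S, now "SITTEN"
			s.put ('I', 5)               -- Replacement: E -> I, now "SITTIN"
			s.insert_character ('G', 7)  -- Insertion at position 7, now "SITTING"
			print (s + "%N")
			s.remove (7)                 -- Deletion of position 7, now "SITTIN"
			s.put ('E', 5)               -- Replacement: I -> E, now "SITTEN"
			s.put ('K', 1)               -- Replacement: S -> K, now "KITTEN"
			print (s + "%N")
		end

end
```

Running `make` prints SITTING, then KITTEN.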

Intro. to Programming, lecture 11 (complement): Levenshtein


Levenshtein distance

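[Figure: “Michael Jackson” to “Mendelssohn”. The letters M I C H A E L J A C K S O N are aligned with those of the target, and each edit is marked S (substitute), D (delete) or I (insert). The operation row reads S D S S S D D D D I, and the running distance goes from 0 to 10.]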


Levenshtein distance algorithm

    levenshtein (source, target: STRING): INTEGER
            -- Minimum number of operations to turn source into target
        local
            distance: ARRAY_2 [INTEGER]
            i, j, deletion, insertion, substitution: INTEGER
        do
                -- distance is indexed from zero:
                -- rows 0 .. source.count, columns 0 .. target.count
            create distance.make (source.count, target.count)
            from i := 0 until i > source.count loop
                distance [i, 0] := i
                i := i + 1
            end
            from j := 0 until j > target.count loop
                distance [0, j] := j
                j := j + 1
            end
            -- (Continued)
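The same computation can be stated as a recurrence. As a sketch in our own notation, write d(i, j) for distance [i, j]. The initialization above sets the first column and row: turning the first i characters of source into the empty string takes i deletions, and building the first j characters of target from nothing takes j insertions. The nested loops on the next slide then fill in every other entry:

```latex
\begin{aligned}
d(i, 0) &= i, \qquad d(0, j) = j,\\
d(i, j) &=
\begin{cases}
d(i-1, j-1) & \text{if } \mathit{source}[i] = \mathit{target}[j],\\
1 + \min\bigl(d(i-1, j),\; d(i, j-1),\; d(i-1, j-1)\bigr) & \text{otherwise,}
\end{cases}
\end{aligned}
\qquad 1 \le i \le \mathit{source.count},\ 1 \le j \le \mathit{target.count}
```

The three terms inside min correspond, in order, to the variables deletion, insertion and substitution in the code, and the result is d(source.count, target.count).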



Levenshtein, continued

            from i := 1 until i > source.count loop
                from
                    j := 1
                invariant
                    -- For all p : 0 .. i, q : 0 .. j - 1, we can turn source [1 .. p]
                    -- into target [1 .. q] in distance [p, q] operations
                until
                    j > target.count
                loop
                    if source [i] = target [j] then
                        distance [i, j] := distance [i - 1, j - 1]
                    else
                        deletion := distance [i - 1, j]
                        insertion := distance [i, j - 1]
                        substitution := distance [i - 1, j - 1]
                        distance [i, j] := deletion.min (insertion).min (substitution) + 1
                    end
                    j := j + 1
                end
                i := i + 1
            end
            Result := distance [source.count, target.count]
        end

s [m .. n]: substring of s made of the items at positions k such that m ≤ k ≤ n (empty if m > n)
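The routine on the last two slides uses ARRAY_2 indexed from zero. EiffelBase's own two-dimensional ARRAY2 starts at index 1, so this runnable sketch assumes a different representation: an ARRAY of ARRAY rows, since ARRAY's `make_filled` accepts any lower bound, so both dimensions can start at 0 as on the slides. The class name LEVENSHTEIN_DEMO is hypothetical; the algorithm is the one on the slides, with `minimum` expressed through INTEGER's `min`.

```eiffel
note
	description: "Sketch (hypothetical class): Levenshtein distance with EiffelBase ARRAY"

class
	LEVENSHTEIN_DEMO

create
	make

feature -- Initialization

	make
			-- Print two distances computed by `levenshtein'.
		do
			print (levenshtein ("BEETH", "BEATLES").out + "%N")
			print (levenshtein ("MICHAEL JACKSON", "MENDELSSOHN").out + "%N")
		end

feature -- Measurement

	levenshtein (source, target: STRING): INTEGER
			-- Minimum number of operations to turn `source' into `target'.
		local
			distance: ARRAY [ARRAY [INTEGER]]
			row: ARRAY [INTEGER]
			i, j, deletion, insertion, substitution: INTEGER
		do
				-- ARRAY accepts any lower bound, so rows run 0 .. source.count
				-- and each row runs 0 .. target.count.
			create distance.make_filled (create {ARRAY [INTEGER]}.make_empty, 0, source.count)
			from i := 0 until i > source.count loop
				create row.make_filled (0, 0, target.count)
				row.put (i, 0)                      -- distance [i, 0] := i
				distance.put (row, i)
				i := i + 1
			end
			from j := 0 until j > target.count loop
				distance.item (0).put (j, j)        -- distance [0, j] := j
				j := j + 1
			end
			from i := 1 until i > source.count loop
				from j := 1 until j > target.count loop
					if source.item (i) = target.item (j) then
						distance.item (i).put (distance.item (i - 1).item (j - 1), j)
					else
						deletion := distance.item (i - 1).item (j)
						insertion := distance.item (i).item (j - 1)
						substitution := distance.item (i - 1).item (j - 1)
						distance.item (i).put (deletion.min (insertion).min (substitution) + 1, j)
					end
					j := j + 1
				end
				i := i + 1
			end
			Result := distance.item (source.count).item (target.count)
		end

end
```

The first line printed is 4, the bottom-right entry of the BEETH/BEATLES table on the following slides. The second call uses the Michael Jackson example, with the space kept as an ordinary character.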


[Figure: the distance table for turning BEETH (rows B E E T H) into BEATLES (columns B E A T L E S), indexed from zero and partially filled in row by row. Each entry is marked with the operation that produced it (I = Insert, D = Delete, S = Substitute, K = Keep); the entry to be computed next is marked “?”.]


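Completed distance table, BEETH (rows) into BEATLES (columns), indexed from zero:

         B  E  A  T  L  E  S
      0  1  2  3  4  5  6  7
   B  1  0  1  2  3  4  5  6
   E  2  1  0  1  2  3  4  5
   E  3  2  1  1  2  3  3  4
   T  4  3  2  2  1  2  3  4
   H  5  4  3  3  2  2  3  4

Operations: I = Insert, K = Keep, D = Delete, S = Substitute

Path to the final entry (distance 4):
Keep B (1), Keep E (2), Substitute E by A (3), Keep T (4), Insert L (5), Insert E (6), Substitute H by S (7)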
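The path on this slide can be recovered mechanically by walking back from the bottom-right entry. At each step, move to a neighbour whose value accounts for the current one: diagonally with "Keep" when the characters match, otherwise diagonally (substitute), up (delete) or left (insert). The sketch below is our own (the class name EDIT_SCRIPT_DEMO and the tie-breaking order are assumptions). When several moves are possible it prefers substitution, then deletion, then insertion, which for BEETH/BEATLES happens to reproduce the slide's path. Other orders can yield different edit scripts of the same length.

```eiffel
note
	description: "Sketch (hypothetical class): recover one minimal edit script from the table"

class
	EDIT_SCRIPT_DEMO

create
	make

feature -- Initialization

	make
			-- Print one minimal edit script from "BEETH" to "BEATLES".
		local
			script: ARRAYED_LIST [STRING]
		do
			script := edit_script ("BEETH", "BEATLES")
			from script.start until script.after loop
				print (script.item + "%N")
				script.forth
			end
		end

feature -- Access

	edit_script (source, target: STRING): ARRAYED_LIST [STRING]
			-- Operations turning `source' into `target', found by walking back
			-- from entry [source.count, target.count]. On ties, prefers
			-- substitution, then deletion, then insertion (an arbitrary choice).
		local
			d: ARRAY [ARRAY [INTEGER]]
			row: ARRAY [INTEGER]
			i, j: INTEGER
		do
				-- Fill the table as in the slides' algorithm, indexed from 0.
			create d.make_filled (create {ARRAY [INTEGER]}.make_empty, 0, source.count)
			from i := 0 until i > source.count loop
				create row.make_filled (0, 0, target.count)
				row.put (i, 0)
				d.put (row, i)
				i := i + 1
			end
			from j := 0 until j > target.count loop
				d.item (0).put (j, j)
				j := j + 1
			end
			from i := 1 until i > source.count loop
				from j := 1 until j > target.count loop
					if source.item (i) = target.item (j) then
						d.item (i).put (d.item (i - 1).item (j - 1), j)
					else
						d.item (i).put (d.item (i - 1).item (j).min (d.item (i).item (j - 1)).min (d.item (i - 1).item (j - 1)) + 1, j)
					end
					j := j + 1
				end
				i := i + 1
			end
				-- Walk back from the last entry, prepending each operation.
			create Result.make (source.count + target.count)
			from
				i := source.count
				j := target.count
			until
				i = 0 and j = 0
			loop
				if i > 0 and j > 0 and then source.item (i) = target.item (j) then
					Result.put_front ("Keep " + source.item (i).out)
					i := i - 1
					j := j - 1
				elseif i > 0 and j > 0 and then d.item (i).item (j) = d.item (i - 1).item (j - 1) + 1 then
					Result.put_front ("Subst " + source.item (i).out + "->" + target.item (j).out)
					i := i - 1
					j := j - 1
				elseif i > 0 and then d.item (i).item (j) = d.item (i - 1).item (j) + 1 then
					Result.put_front ("Delete " + source.item (i).out)
					i := i - 1
				else
					Result.put_front ("Insert " + target.item (j).out)
					j := j - 1
				end
			end
		end

end
```

Running `make` prints Keep B, Keep E, Subst E->A, Keep T, Insert L, Insert E, Subst H->S, one per line: seven steps, four of which cost one operation each.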