A* Search
A* (pronounced "A star") is a best-first graph search algorithm that finds the least-cost path from a given initial node to one goal node (out of one or more possible goals).
Definitions
A* uses a distance-plus-estimate heuristic function denoted by f(x) to determine the order in which the search visits nodes in the tree induced by the search. The distance-plus-estimate heuristic is a sum of two functions:
• the path-cost function, denoted g(x), from the start node to the current node, and
• an admissible "heuristic estimate" of the distance to the goal, denoted h(x).
• An admissible h(x) must not overestimate the distance to the goal. For an application like routing, h(x) might represent the straight-line distance to the goal, since that is physically the smallest possible distance between any two points (or nodes for that matter).
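The ordering by f(x) = g(x) + h(x) can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the names `a_star`, `is_goal`, and `successors` are assumptions made for the example, and h is assumed to be admissible.

```python
import heapq
from itertools import count


def a_star(start, is_goal, successors, h):
    """Return the least path cost from start to any goal node, or None.

    successors(node) yields (next_node, edge_cost) pairs.
    h(node) is assumed admissible: it never overestimates the remaining cost.
    """
    tie = count()                # tie-breaker so the heap never compares nodes
    best_g = {start: 0}          # cheapest known g(x) for each node
    frontier = [(h(start), next(tie), 0, start)]   # entries are (f, tie, g, node)
    while frontier:
        f, _, g, node = heapq.heappop(frontier)
        if g > best_g.get(node, float("inf")):
            continue             # stale entry: a cheaper path was found later
        if is_goal(node):
            return g             # first goal popped has minimal cost
        for nxt, cost in successors(node):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + h(nxt), next(tie), new_g, nxt))
    return None
```

Because a node is pushed again whenever a cheaper path to it is found, the sketch only relies on h being admissible, not on any stronger property.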
An A* algorithm for Edit Distance
Edit Distance DE (X,Y) measures how close string X is to string Y.
DE(X,Y) is the cost of the minimum-cost transformation t : X → Y, where t is a sequence of operations (insertion, equal substitution, unequal substitution, and deletion). The cost of t is the sum of the operation costs, where each operation costs 1 except for equal substitution, which costs 0.
A B B A C
B A A C A
The cost of this transformation is 3 which happens to be minimal.
Dynamic programming Solution (an O(mn) solution)
Decomposition : Last Operation is Delete, Substitute, or Insert
Atomic Problems : X prefix or Y prefix empty
Table : Rows 0 .. M for X prefix characters, Columns 0 .. N for Y prefix characters
Table Entry : $D_E(X_i, Y_j)$
Composition : $\delta_{ij}$ = cost(Substitution) = 1 if $x_i \ne y_j$ and 0 otherwise.
$D_E(X_i, Y_j) = \min\{\, D_E(X_{i-1}, Y_j) + 1,\; D_E(X_{i-1}, Y_{j-1}) + \delta_{ij},\; D_E(X_i, Y_{j-1}) + 1 \,\}$
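The table-filling scheme above might look as follows in Python. This is a sketch under the unit-cost model stated earlier; the function name `edit_distance` is chosen for illustration.

```python
def edit_distance(x: str, y: str) -> int:
    """Fill the (M+1) x (N+1) table of prefix distances and return DE(X, Y)."""
    m, n = len(x), len(y)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    # Atomic problems: one of the prefixes is empty.
    for i in range(m + 1):
        table[i][0] = i          # delete all i characters of the X prefix
    for j in range(n + 1):
        table[0][j] = j          # insert all j characters of the Y prefix
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if x[i - 1] == y[j - 1] else 1   # substitution cost
            table[i][j] = min(
                table[i - 1][j] + 1,        # last operation: deletion
                table[i - 1][j - 1] + sub,  # last operation: substitution
                table[i][j - 1] + 1,        # last operation: insertion
            )
    return table[m][n]
```

For the strings ABBAC and BAACA this returns 3. Every table cell is computed exactly once, which gives the O(mn) running time.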
Edit Distance as a Shortest Path Problem
Define a transformation graph $G_{XY} = (V,E)$ as follows:
• The set V of nodes (vertices) = {0 .. M} × {0 .. N}, where node $n_{p,q}$ represents the state of transforming a p-length prefix of X into a q-length prefix of Y.
• The set E of edges represents the operations of
  • deletion, connecting node $n_{p,q}$ to $n_{p+1,q}$ with length 1
  • substitution, connecting node $n_{p,q}$ to $n_{p+1,q+1}$ with length 0 or 1 depending on whether $x_{p+1} = y_{q+1}$ or not
  • insertion, connecting node $n_{p,q}$ to $n_{p,q+1}$ with length 1
The start and goal nodes are $n_{0,0}$ and $n_{M,N}$.
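One way to check this formulation is to generate the edges on demand and run an ordinary shortest-path search over them. The sketch below uses Dijkstra's algorithm (no heuristic yet); the names `successors` and `shortest_path_length` are illustrative assumptions. Note that Python strings are 0-indexed, so `x[p]` plays the role of the (p+1)-th character of X.

```python
import heapq


def successors(p, q, x, y):
    """Yield ((p', q'), length) for every edge leaving node n_{p,q}."""
    m, n = len(x), len(y)
    if p < m:
        yield (p + 1, q), 1                                   # deletion
    if p < m and q < n:
        yield (p + 1, q + 1), (0 if x[p] == y[q] else 1)      # substitution
    if q < n:
        yield (p, q + 1), 1                                   # insertion


def shortest_path_length(x, y):
    """Dijkstra from n_{0,0} to n_{M,N}; the result equals the edit distance."""
    start, goal = (0, 0), (len(x), len(y))
    dist = {start: 0}
    heap = [(0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > dist[node]:
            continue                                          # stale entry
        for nxt, length in successors(node[0], node[1], x, y):
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return None
```

The graph is never built explicitly; only the nodes the search actually reaches are touched.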
Introduction
Edit Distance – Based on Single Character Edit Operations
Insertion : ε → a Inserts an “a” into the target without affecting the source; cost = 1
Equal Substitution : a → a Substitutes an “a” into the target for an “a” in the source; cost = 0
Unequal Substitution : a → b Substitutes a “b” into the target for an “a” in the source; cost = 1
Deletion : a → ε Deletes an “a” from the source without affecting the target; cost = 1
(Here ε stands for the empty string.)
Example of a Transformation Graph
The vertices of T correspond to prefix pairs of X and Y. The edges of T are directed and correspond to the single character edit operations which would transform one prefix pair into another.
• X = abbab
• Y = bbaba
DE(X,Y) = cost of the shortest path from start vertex to goal vertex = 2
A frequency based Lower Bound function h
• Let Xi be the suffix of X beginning with the ith character and Yj be similarly defined.
• If X = abbab and Y = bbaba
• X2 = bbab and Y2 = baba
• Excess(X2,Y2,a) = 0
• Def(X2,Y2,a) =1
• Excess(X2,Y2,b) = 1
• Def(X2,Y2,b) =0
• Excess(X2,Y2) is sum of excesses over alphabet and Def(X2,Y2) is sum of deficiencies.
• h( X2,Y2 ) = max{Excess(X2,Y2),Def(X2,Y2)} is a lower bound to the length of the shortest path from vertex to goal.
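The frequency-based bound can be computed directly from character counts. The sketch below uses `collections.Counter`, whose subtraction keeps only positive differences; the function names are chosen for illustration.

```python
from collections import Counter


def excess_and_deficiency(xs: str, ys: str):
    """Return (Excess, Def) summed over the alphabet for suffixes xs and ys."""
    fx, fy = Counter(xs), Counter(ys)
    excess = sum((fx - fy).values())       # characters xs has too many of
    deficiency = sum((fy - fx).values())   # characters xs has too few of
    return excess, deficiency


def h(xs: str, ys: str) -> int:
    """Lower bound on the edit distance between the two suffixes."""
    return max(excess_and_deficiency(xs, ys))
```

For the suffixes X2 = bbab and Y2 = baba, `h` evaluates to 1. The bound holds because each excess character needs a deletion or substitution, each missing character needs an insertion or substitution, and one substitution can repair at most one of each.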
Classification and Strings
Applications of Edit Distance
• DNA analysis
• Classification of heart beats.
• Handwriting recognition.
• Spelling correction.
• Error correction of variable length codes.
• Speech recognition.
Discrete Directional Alphabet
Mapping EKG’s to Strings
Classification as Path Problem
• LB(Start,Goal-1) = 0
• LB(Start,Goal-2) = 3
Lower Bounds to Edit Distance
Lower Bound Based on Frequency
Let $f_a(X)$ and $f_a(Y)$ be the frequencies of a in X and Y.
Define $Ex(a,X,Y) = f_a(X) - f_a(Y)$ if $f_a(X) > f_a(Y)$, else 0.
Define $Def(a,X,Y) = f_a(Y) - f_a(X)$ if $f_a(Y) > f_a(X)$, else 0.
For any a, both $Ex(a,X,Y) \le D(X,Y)$ and $Def(a,X,Y) \le D(X,Y)$.
$Ex(a,X,Y) + Ex(b,X,Y) \le D(X,Y)$.
$\max\{\, \sum_a Ex(a,X,Y),\; \sum_a Def(a,X,Y) \,\} \le D(X,Y)$
LB(i,j,X,Y), computed for the ith suffix of X and the jth suffix of Y, is a lower bound to the remaining distance after having computed the edit distance for the ith and jth prefixes of X and Y.
• Since X has a deficiency of 1 b with Y1 as a target, 1 is a lower bound to D(X,Y1).
• Since X has a deficiency of 2 a’s with Y2 as a target and an excess of 1 b, 2 is a lower bound to D(X,Y2).
• Since X has a deficiency of 3 b’s with Y3 as a target and an excess of 2 a’s, 3 is a lower bound to D(X,Y3).
• Consequently the initial vertices of the 3 transformation graphs are organized into a priority queue as shown to the left.
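Ordering the initial vertices by their lower bounds can be sketched with a binary heap. The actual strings X, Y1, Y2, Y3 appear only in the slide's figure, so the strings in the usage note below are hypothetical stand-ins chosen to reproduce the stated excesses and deficiencies; `lower_bound` and `initial_queue` are illustrative names.

```python
import heapq
from collections import Counter


def lower_bound(x: str, y: str) -> int:
    """Frequency-based lower bound: max of total excess and total deficiency."""
    fx, fy = Counter(x), Counter(y)
    return max(sum((fx - fy).values()), sum((fy - fx).values()))


def initial_queue(x: str, targets: dict):
    """Build a priority queue of (bound, target name), smallest bound first."""
    heap = [(lower_bound(x, y), name) for name, y in targets.items()]
    heapq.heapify(heap)
    return heap
```

With the hypothetical X = aabb and targets Y1 = aabbb, Y2 = aaaab, Y3 = bbbbb, the bounds are 1, 2, and 3, so Y1's start vertex is popped first.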
A* Search for Closest Target
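f = h + g
Keep track of the last operation, since an insertion cannot be followed by a deletion and vice versa (an adjacent insertion and deletion can always be replaced by a single substitution of lower cost).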
• Finds distance of 1 to Y1 in 3 steps.
• Y1 must be a closest goal since bnd + dist is minimized.
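Putting the pieces together, a search over several transformation graphs at once might be sketched as follows. This is an illustrative reading of the slides, not their code: one shared priority queue holds states from every target's graph, ordered by f = g + h, with h taken to be the frequency bound on the remaining suffixes. The state layout `(target, p, q, last_op)` and the name `closest_target` are assumptions.

```python
import heapq
from collections import Counter
from itertools import count


def h(xs, ys):
    """Frequency-based lower bound on the distance between two suffixes."""
    fx, fy = Counter(xs), Counter(ys)
    return max(sum((fx - fy).values()), sum((fy - fx).values()))


def closest_target(x, targets):
    """Return (index, distance) of a target closest to x, or None if no targets."""
    tie = count()                      # tie-breaker for heap entries
    frontier, best_g = [], {}
    for t, y in enumerate(targets):
        state = (t, 0, 0, None)        # (target, p, q, last operation)
        best_g[state] = 0
        heapq.heappush(frontier, (h(x, y), next(tie), 0, state))
    while frontier:
        f, _, g, state = heapq.heappop(frontier)
        if g > best_g[state]:
            continue                   # stale entry
        t, p, q, last = state
        y = targets[t]
        if p == len(x) and q == len(y):
            return t, g                # first goal popped minimises bnd + dist
        moves = []
        if p < len(x) and last != "ins":       # no deletion right after insertion
            moves.append((p + 1, q, "del", 1))
        if p < len(x) and q < len(y):
            moves.append((p + 1, q + 1, "sub", 0 if x[p] == y[q] else 1))
        if q < len(y) and last != "del":       # no insertion right after deletion
            moves.append((p, q + 1, "ins", 1))
        for np_, nq, op, cost in moves:
            nxt, ng = (t, np_, nq, op), g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                f_next = ng + h(x[np_:], y[nq:])
                heapq.heappush(frontier, (f_next, next(tie), ng, nxt))
    return None
```

Recomputing the suffix counts at every step keeps the sketch short; a practical version would probably update the counts incrementally instead.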