Searching. Motivation Find parts for a system Find an address for name Find a criminal...

Searching



Motivation

• Find parts for a system
• Find an address for name
• Find a criminal
  – fingerprint/DNA match
• Locate all employees in a dept.
  – based on a collection of criteria
  – across multiple tables
• Find shortest path (network, roads, etc.)


Linear Search

• Items to be searched are in a list x0, x1, … xn-1
  – Need == and < operators defined for the type
• Start with item 0
  – until (end of list or target found)
  – compare another item
• Best case, found on 1st comparison
• Worst case, found on nth comparison


Linear Search – vector/array

#include <cstddef>
#include <vector>

// Returns 1 if item is in v, 0 otherwise.
int LinearSearch(const std::vector<int> &v, const int &item)
{
    for (std::size_t i = 0; i < v.size(); i++)
        if (item == v[i]) return 1;   // found

    return 0;   // not found
}

// # of compares, summed over all n possible positions of the target:
//   1 + 2 + 3 + … + n = ½ * n * (n+1)
// average = ½ * n * (n+1) / n = (n+1)/2 ≈ n/2
// average search time is O(n/2) = O(n)


Linear Search – single-linked list

// Assumes a node type with members data and next,
// and NodePointer defined as a pointer to that node type.
int LinearSearch(NodePointer first, const int &item)
{
    for (NodePointer loc = first; loc != NULL; loc = loc->next)
    {
        if (item == loc->data) return 1;   // found
    }
    return 0;   // not found
}

Worst case computing time is still O(n)


Binary search

• Significantly faster than linear
• Repeatedly "halving" the problem
• We can divide a set of n items in half at most log2 n times
• For performance COMPARISONS, we ignore the base for the log (2 in this case)
• Complexity of binary search is O(log n)
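The halving idea can be sketched as follows. This is a minimal illustration, assuming the data sits in a std::vector<int> sorted in ascending order; the name BinarySearch and the convention of returning an index (or -1) are choices made for this sketch, not taken from the slides.

```cpp
#include <vector>

// Sketch: iterative binary search over a vector sorted in ascending order.
// Returns the index of item, or -1 if it is not present.
int BinarySearch(const std::vector<int> &v, int item)
{
    int low = 0;
    int high = static_cast<int>(v.size()) - 1;

    while (low <= high)
    {
        int mid = low + (high - low) / 2;   // avoids overflow of low + high
        if (v[mid] == item)
            return mid;                     // found
        else if (v[mid] < item)
            low = mid + 1;                  // discard the left half
        else
            high = mid - 1;                 // discard the right half
    }
    return -1;                              // not found
}
```

Each pass through the loop discards half of the remaining range, which is where the log2 n bound on the number of compares comes from.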


Some Observations

• Binary usually outperforms linear search
• Both require sequential storage
• For binary search, data must be ordered (sorted)
• Searching is done on a "key"
  – a piece of data unique to each item
  – often smaller than the actual data
  – e.g., your B-number vs. whole name
• A non-linear linked structure is better
  – there are several kinds of "tree" structures


Big Oh - Formal Definition (again)

• f(n) = O(g(n)) iff ∃ constants c > 0 and n0 such that f(n) <= c·g(n) for all n >= n0
• Thus, g(n) is an upper bound on f(n) (up to the constant factor c, for large enough n)
• Note:
  – f(n) = O(g(n)) reads "f(n) has a complexity of g(n)"
  – this is NOT the same as O(g(n)) = f(n)
• The '=' is not the usual "=" operator (it is not symmetric)
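As a worked instance of the definition, take the average number of compares for linear search, (n+1)/2, and check that it is O(n). The constants c = 1 and n0 = 1 used here are one valid choice among many.

```latex
f(n) = \frac{n+1}{2}, \quad g(n) = n: \qquad
\frac{n+1}{2} \le 1 \cdot n
\iff n + 1 \le 2n
\iff n \ge 1,
\qquad \text{so } f(n) = O(n) \text{ with } c = 1,\ n_0 = 1 .
```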


Trees

• A data structure which consists of
  – a finite set of elements called nodes or vertices
  – a finite set of directed arcs which connect the nodes
• If the tree is nonempty
  – one of the nodes (the root) has no incoming arc
  – every other node can be reached from the root by following a unique sequence of consecutive arcs


Trees

• Each node has n >= 0 children
• Topmost node is the "root"
• Binary tree: each node has 0, 1, or 2 children
• Engineering uses:
  – Huffman encoding (data compression)
  – expression evaluation
  – sorting & searching
  – electric power distribution grid
  – go/no-go decision-making


Binary Search Tree

• Consider an ordered list of integers

5 22 45 52 63 75 90

1. Examine middle element (52)

2. Examine left, right sublist (maintain pointers)

3. (Recursively) examine left, right sublists


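Binary Search Tree

• Redraw as a treelike shape – this is a binary tree

            52
          /    \
        22      75
       /  \    /  \
      5   45  63   90

• root: 52
• children of 75: 63, 90
• parent of 63, 90: 75
• leaves: 5, 45, 63, 90
• subtree: any node together with all of its descendants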


Binary Search Tree (BST)

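• A binary tree in which, at every node
  – Left-child <= Parent value <= Right child
  – the same ordering holds for everything in the left and right sub-trees
• Several tree operations available
  – construction
  – test for empty
  – search for item
  – insert
  – delete
  – traverse (visit each node exactly once)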
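Two of these operations, insert and search, might look like the sketch below. The node type BSTNode and the function names are hypothetical, chosen for illustration; sending duplicates to the right is one way to honor the Parent <= Right child rule, not the only one.

```cpp
// Hypothetical node type for illustration (not from the slides).
struct BSTNode
{
    int      data;
    BSTNode *left;
    BSTNode *right;
    explicit BSTNode(int d) : data(d), left(nullptr), right(nullptr) {}
};

// Insert item below root and return the (possibly new) root.
// Smaller keys go left; equal or larger keys go right.
BSTNode *Insert(BSTNode *root, int item)
{
    if (root == nullptr)
        return new BSTNode(item);              // empty spot found
    if (item < root->data)
        root->left = Insert(root->left, item);
    else
        root->right = Insert(root->right, item);
    return root;
}

// Return 1 if item is in the tree, 0 otherwise.
int Search(const BSTNode *root, int item)
{
    while (root != nullptr)
    {
        if (item == root->data) return 1;      // found
        root = (item < root->data) ? root->left : root->right;
    }
    return 0;                                  // fell off the tree: not found
}
```

Inserting 52, 22, 75, 5, 45, 63, 90 in that order produces a tree with 52 at the root, 22 and 75 below it, and 5, 45, 63, 90 as leaves. The sketch never frees its nodes; a real program would also need a delete or destroy operation.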


Binary Tree terms

• Full tree (proper tree, 2-tree, strictly binary)
  – all nodes have exactly 0 or 2 children
• Complete tree
  – all levels (except maybe the last) are filled
  – all leaves on the last level are "pushed" left
• Balanced tree
  – heights of the L/R sub-trees of EVERY node differ by no more than 1 level
• Perfect tree
  – all leaves at same depth, and every non-leaf node has 2 children


Implementations

• An array can be used
  – insertion, deletion, re-arranging VERY messy
  – searching, sorting inefficient
  – not useful for "sparse" trees (missing data)
  – very hard to traverse recursively
• Linked tree
  – nodes like those in Stacks, Queues & Lists
    • pointer to left-child
    • pointer to right-child
    • data
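For comparison, here is a sketch of both storage schemes. The index arithmetic (children of slot i at 2i+1 and 2i+2) is the usual level-order array layout, which the slides do not spell out, so treat it as an assumption; the names LeftIndex, RightIndex, ParentIndex and TreeNode are hypothetical.

```cpp
#include <cstddef>

// Array storage: node i keeps its children at fixed positions,
// so a missing child still costs an unused array slot.
std::size_t LeftIndex(std::size_t i)   { return 2 * i + 1; }
std::size_t RightIndex(std::size_t i)  { return 2 * i + 2; }
std::size_t ParentIndex(std::size_t i) { return (i - 1) / 2; }  // requires i > 0

// Linked storage: each node carries its own child pointers.
struct TreeNode
{
    int       data;
    TreeNode *left;    // pointer to left child (nullptr if none)
    TreeNode *right;   // pointer to right child (nullptr if none)
};
```

With the array {52, 22, 75, 5, 45, 63, 90}, slot 0 is the root and slots 1 and 2 hold its children 22 and 75. The fixed positions are what make sparse trees wasteful in an array, while the linked form only allocates nodes that exist.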


Recursive Descent

• A binary tree is either empty, or
• Has a data-node (root) with 2 subtree ptrs
  – left-tree
  – right-tree
  – the subtrees are disjoint
• Each sub-tree follows the same definition
• Leads to simple recursive search programs
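The recursive definition maps directly onto code: one case for the empty tree, one for a root with two subtrees. The sketch below assumes a hypothetical Node type and, for SearchRecursive, that the tree obeys binary-search-tree ordering; neither function name comes from the slides.

```cpp
#include <algorithm>

// Hypothetical node type, for illustration only.
struct Node
{
    int   data;
    Node *left;
    Node *right;
};

// Number of levels in the tree; an empty tree has height 0.
int Height(const Node *ptr)
{
    if (ptr == nullptr)                       // empty tree
        return 0;
    return 1 + std::max(Height(ptr->left),    // root plus the taller subtree
                        Height(ptr->right));
}

// Recursive search of a binary search tree; returns 1 if found, else 0.
int SearchRecursive(const Node *ptr, int item)
{
    if (ptr == nullptr)                       // empty tree: not found
        return 0;
    if (item == ptr->data)                    // found at the root
        return 1;
    if (item < ptr->data)                     // can only be in the left subtree
        return SearchRecursive(ptr->left, item);
    return SearchRecursive(ptr->right, item); // otherwise the right subtree
}
```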


Recursive Tree Traversal

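// Assumes a node type with members data, left and right;
// Process() stands for whatever is done with each node's data.
void Traverse(node *ptr)
{
    if (ptr == NULL)            // the binary tree is empty
        return;
    else                        // recursion here
    {
        Process(ptr->data);     // process root data
        Traverse(ptr->left);
        Traverse(ptr->right);
    }
}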


3 possible traversals

• Pre-order
  – data, left-sub-tree, right-sub-tree
  – first-touch
• In-order
  – left-sub-tree, data, right-sub-tree
  – 2nd touch
• Post-order
  – left-sub-tree, right-sub-tree, data
  – last-touch
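The three orders differ only in where the node's own data is handled relative to the two recursive calls. A minimal sketch, assuming a hypothetical Node type and collecting the visited values in a std::vector<int> instead of printing them:

```cpp
#include <vector>

// Hypothetical node type, for illustration only.
struct Node
{
    int   data;
    Node *left;
    Node *right;
};

// Each function appends the visited values to out.
void PreOrder(const Node *p, std::vector<int> &out)
{
    if (p == nullptr) return;
    out.push_back(p->data);        // data first
    PreOrder(p->left, out);
    PreOrder(p->right, out);
}

void InOrder(const Node *p, std::vector<int> &out)
{
    if (p == nullptr) return;
    InOrder(p->left, out);
    out.push_back(p->data);        // data between the two subtrees
    InOrder(p->right, out);
}

void PostOrder(const Node *p, std::vector<int> &out)
{
    if (p == nullptr) return;
    PostOrder(p->left, out);
    PostOrder(p->right, out);
    out.push_back(p->data);        // data last
}
```

On a binary search tree with root 52, children 22 and 75, and leaves 5, 45, 63, 90, the in-order traversal yields 5 22 45 52 63 75 90, i.e. the keys in sorted order. Pre-order gives 52 22 5 45 75 63 90 and post-order gives 5 45 22 63 90 75 52.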


Traversal Order

• Given expression A – B * C + D
• Operator precedence, highest first: ^, then * /, then + -
• This is normal infix order
• Each operand is the child of a parent node
• The parent node holds the corresponding operator


The expression tree

          +
        /   \
       -     D
     /   \
    A     *
         / \
        B   C


Remaining traversals

• Prefix (pre-order): + - A * B C D
• Postfix (post-order; Reverse Polish Notation – RPN): A B C * - D +
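Both notations fall out of traversing the expression tree for A - B * C + D. The sketch below assumes a hypothetical ExprNode type with a one-character symbol per node, and builds strings without spaces to keep the code short.

```cpp
#include <string>

// Hypothetical node type: a leaf holds an operand, an inner node an operator.
struct ExprNode
{
    char      symbol;
    ExprNode *left;
    ExprNode *right;
};

// Pre-order: operator before its operands (prefix notation).
std::string Prefix(const ExprNode *p)
{
    if (p == nullptr) return "";
    return std::string(1, p->symbol) + Prefix(p->left) + Prefix(p->right);
}

// Post-order: operator after its operands (postfix / RPN).
std::string Postfix(const ExprNode *p)
{
    if (p == nullptr) return "";
    return Postfix(p->left) + Postfix(p->right) + std::string(1, p->symbol);
}

// In-order: recovers the infix reading (without parentheses).
std::string Infix(const ExprNode *p)
{
    if (p == nullptr) return "";
    return Infix(p->left) + std::string(1, p->symbol) + Infix(p->right);
}
```

With + at the root, - and D as its children, A and * below the -, and B and C below the *, Prefix returns "+-A*BCD" and Postfix returns "ABC*-D+". Plain in-order output drops parentheses, so it only reproduces the original infix text when, as here, precedence alone determines the grouping.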


Stack Applications

• base-ten to base-two conversion
  – remainders need to be printed in the reverse of the order in which they are calculated
• run-time stack of function activation records
  – push when a function is called
  – pop when a function exits
• arithmetic expression evaluation
  – easier to evaluate when stored in postfix (RPN)
• infix to postfix conversion algorithm uses a stack
• evaluating postfix is easy using a stack for operands
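The first application can be sketched in a few lines. This assumes a non-negative input and returns the digits as a string; the name ToBinary is hypothetical.

```cpp
#include <stack>
#include <string>

// Sketch: convert a non-negative base-ten integer to a base-two string.
// Remainders come out least-significant first, so a stack reverses them.
std::string ToBinary(unsigned int n)
{
    if (n == 0) return "0";

    std::stack<int> remainders;
    while (n > 0)
    {
        remainders.push(n % 2);   // next bit, low-order first
        n /= 2;
    }

    std::string bits;
    while (!remainders.empty())
    {
        bits += static_cast<char>('0' + remainders.top());   // high-order first
        remainders.pop();
    }
    return bits;
}
```

For 26 the remainders are computed as 0, 1, 0, 1, 1; popping them off the stack gives 11010.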


Infix to Postfix

infix expression: (3 + 4) * 5 - 2
postfix expression: 3 4 + 5 * 2 -

A. scan input from L to R
B. if operand, output it
C. else // must be an operator or "(" or ")"
   1. if "(" then push & loop
   2. if operator
      a. pop & output until prec(top) < prec(input) (or the stack is empty, or top is "(")
      b. push the input
   3. if ")"
      a. pop & output until "("
      b. remove & discard the "("
      c. discard the incoming ")"
   4. if end of input, pop & output until empty

step      in     out    stack (top is on right)
C1        (             (
B         3      3      (
C2b       +             ( +
B         4      4      ( +
C3a,b,c   )      +
C2b       *             *
B         5      5      *
C2a,b     -      *      -
B         2      2      -
C4        null   -
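The conversion steps can be sketched in code as follows. This is an illustration under stated assumptions, not the slides' own implementation: operands are single letters or digits, the only operators are ^ * / + -, every operator is treated as left-associative, and the names Prec and InfixToPostfix are hypothetical. The comments refer to the step labels of the algorithm.

```cpp
#include <cctype>
#include <stack>
#include <string>

// Precedence used by this sketch: ^ highest, then * /, then + -.
int Prec(char op)
{
    if (op == '^') return 3;
    if (op == '*' || op == '/') return 2;
    if (op == '+' || op == '-') return 1;
    return 0;   // "(" never gets popped by an operator
}

// Returns the postfix form with tokens separated by single spaces.
std::string InfixToPostfix(const std::string &infix)
{
    std::stack<char> ops;
    std::string out;

    for (char ch : infix)                                 // A: scan left to right
    {
        if (std::isspace(static_cast<unsigned char>(ch))) continue;

        if (std::isalnum(static_cast<unsigned char>(ch)))
        {
            out += ch; out += ' ';                        // B: operand goes to output
        }
        else if (ch == '(')
        {
            ops.push(ch);                                 // C1
        }
        else if (ch == ')')
        {
            while (!ops.empty() && ops.top() != '(')      // C3a
            {
                out += ops.top(); out += ' ';
                ops.pop();
            }
            if (!ops.empty()) ops.pop();                  // C3b: discard the "("
        }
        else
        {
            while (!ops.empty() && Prec(ops.top()) >= Prec(ch))   // C2a
            {
                out += ops.top(); out += ' ';
                ops.pop();
            }
            ops.push(ch);                                 // C2b
        }
    }
    while (!ops.empty())                                  // C4: end of input
    {
        out += ops.top(); out += ' ';
        ops.pop();
    }
    if (!out.empty()) out.pop_back();                     // drop trailing space
    return out;
}
```

Called on "(3 + 4) * 5 - 2" the sketch returns "3 4 + 5 * 2 -", and on "A - B * C + D" it returns "A B C * - D +". Treating ^ as left-associative is a simplification; conventional exponentiation is right-associative and would need a strict > comparison for that operator.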


Evaluating Postfix

• use a stack to evaluate a postfix expression
• read values into stack until operator reached
• pop 2 values and apply operator
  – be sure to maintain order of operands
  – a-b is not the same as b-a
• push result value onto stack
• repeat until no input; the single value left on the stack is the result

postfix expression: 3 4 + 5 * 2 -
  – push 3 and 4
  – see the +, pop 3, 4, add, push 7
  – push 5
  – see the *, pop 7, 5, multiply, push 35
  – push the 2, then see the -, so pop 35, 2 and subtract, push 33
  – no input left: the result is 33
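A minimal evaluator along these lines is sketched below. It assumes space-separated tokens, non-negative integer operands, and only the operators + - * /; the name EvaluatePostfix and the choice to throw std::runtime_error on malformed input are assumptions made for this sketch.

```cpp
#include <cctype>
#include <sstream>
#include <stack>
#include <stdexcept>
#include <string>

// Sketch: evaluate a space-separated postfix expression over integers.
int EvaluatePostfix(const std::string &postfix)
{
    std::stack<int> operands;
    std::istringstream in(postfix);
    std::string token;

    while (in >> token)
    {
        if (std::isdigit(static_cast<unsigned char>(token[0])))
        {
            operands.push(std::stoi(token));          // value: push it
            continue;
        }
        if (token.size() != 1 || operands.size() < 2)
            throw std::runtime_error("malformed postfix expression");

        int right = operands.top(); operands.pop();   // popped first = right operand
        int left  = operands.top(); operands.pop();   // popped second = left operand

        switch (token[0])
        {
            case '+': operands.push(left + right); break;
            case '-': operands.push(left - right); break;   // a-b, not b-a
            case '*': operands.push(left * right); break;
            case '/': operands.push(left / right); break;   // integer division
            default:  throw std::runtime_error("unknown operator");
        }
    }
    if (operands.size() != 1)
        throw std::runtime_error("malformed postfix expression");
    return operands.top();                            // the one value left is the result
}
```

EvaluatePostfix("3 4 + 5 * 2 -") returns 33, matching the hand trace. The sketch does not guard against division by zero.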