Searching. Motivation Find parts for a system Find an address for name Find a criminal...

Post on 12-Jan-2016


Searching


Motivation

• Find parts for a system
• Find an address for a name
• Find a criminal
  – fingerprint/DNA match
• Locate all employees in a dept.
  – based on a collection of criteria
  – across multiple tables
• Find the shortest path (network, roads, etc.)


Linear Search

• Items to be searched are in a list x[0], x[1], …, x[n-1]
  – Need == defined for the item type (an ordered search also needs <)
• Start with item 0
  – until (end of list or target found)
  – compare the next item
• Best case: found on the 1st comparison
• Worst case: found on the nth comparison, or not found after n comparisons


Linear Search – vector/array

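#include <vector>
using std::vector;

// Returns 1 if item is in v, 0 if it is not
int LinearSearch(const vector<int> &v, const int &item)
{
    for (vector<int>::size_type i = 0; i < v.size(); i++)
        if (item == v[i])
            return 1;   // found
    return 0;           // not found
}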

// # of compares, summed over all n possible positions of the target
// (successful search, each position equally likely): 1+2+3+…+n = ½ * n(n+1)
// average = ½*n*(n+1)/n = (n+1)/2 ≈ n/2
// average search time is O(n/2) = O(n)


Linear Search – single-linked list

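struct Node { int data; Node *next; };   // assumed node layout
typedef Node *NodePointer;

// Returns 1 if item is in the list starting at first, 0 if it is not
int LinearSearch(NodePointer first, const int &item)
{
    for (NodePointer loc = first; loc != nullptr; loc = loc->next)
        if (item == loc->data)
            return 1;   // found
    return 0;           // not found
}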

Worst case computing time is still O(n)


Binary search

• Significantly faster than linear search
• Repeatedly "halves" the problem
• A set of n items can be divided in half at most log₂ n times
• For performance COMPARISONS, we ignore the base of the log (2 in this case), since logs to different bases differ only by a constant factor
• Complexity of binary search is O(log n)


Some Observations

• Binary search usually outperforms linear search
• Both work on a sequential list, but binary search also needs direct access to the middle element (an array/vector, not a linked list)
• For binary search, the data must be ordered (sorted)
• Searching is done on a "key"
  – a piece of data unique to each item
  – often smaller than the actual data
  – e.g., your B-number vs. your whole name
• A non-linear linked structure can be better
  – there are several kinds of "tree" structures


Big Oh - Formal Definition (again)

• $f(n) = O(g(n))$ iff $\exists\, c > 0,\ n_0$ such that $f(n) \le c\, g(n)$ for all $n \ge n_0$
• Thus, $g(n)$ is an upper bound on $f(n)$ (up to the constant factor $c$, for all $n \ge n_0$)
• Note:
  – $f(n) = O(g(n))$ means "f(n) has a complexity of g(n)"; this is NOT the same as $O(g(n)) = f(n)$
  – The '=' is not the usual "=" operator (it is not symmetric)
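As a worked illustration of the definition (the functions are chosen here for illustration, not taken from the slides), the constants $c$ and $n_0$ can be exhibited directly, and the same definition shows why the '=' cannot be read backwards:

$$
\begin{aligned}
f(n) &= 3n + 5, \quad g(n) = n \\
3n + 5 &\le 4n \iff n \ge 5, \text{ so } c = 4,\ n_0 = 5 \text{ work and } 3n + 5 = O(n) \\
n &= O(n^2) \quad (c = 1,\ n_0 = 1), \text{ but } n^2 \ne O(n) \text{ since } n^2 \le c\,n \text{ fails for every } n > c
\end{aligned}
$$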


Trees

• A data structure which consists of
  – a finite set of elements called nodes or vertices
  – a finite set of directed arcs which connect the nodes
• If the tree is nonempty
  – one of the nodes (the root) has no incoming arc
  – all other nodes can be reached from the root by following unique sequences of consecutive arcs


Trees

• Each node has n >= 0 children
• Topmost node is the "root"
• Binary tree: each node has 0, 1, or 2 children
• Engineering uses:
  – Huffman encoding (data compression)
  – expression evaluation
  – sorting & searching
  – electric power distribution grid
  – go/no-go decision-making


• Consider an ordered list of integers

1. Examine middle element

2. Examine left, right sublist (maintain pointers)

3. (Recursively) examine left, right sublists

Binary Search Tree

5 22 45 52 63 75 90


• Redraw as a treelike shape – this is a binary tree

Binary Search Tree

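[Figure: the list redrawn as a binary tree. Root: 52. Children of 52: 22 (left) and 75 (right). Children of 22: 5 and 45. Children of 75: 63 and 90 (75 is the parent of 63, 90). Leaves: 5, 45, 63, 90. A subtree is any node together with all of its descendants, e.g., 22, 5, 45.]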


Binary Search Tree (BST)

• A binary tree in which, at every node,
  – every value in the left subtree <= parent value <= every value in the right subtree
• Several tree operations available
  – construction
  – test for empty
  – search for item
  – insert
  – delete
  – traverse (visit each node exactly once)


Binary Tree terms

• Full tree (proper tree, 2-tree, strictly binary)
  – all nodes have exactly 0 or 2 children
• Complete tree
  – all levels (except maybe the last) are filled
  – all leaves on the last level are "pushed" left
• Balanced tree
  – the L/R sub-tree heights of EVERY node differ by no more than 1 level
• Perfect tree
  – all interior nodes have 2 children and all leaves are at the same depth


Implementations

• An array can be used
  – insertion, deletion, re-arranging VERY messy
  – searching, sorting inefficient
  – not useful for "sparse" trees (missing data)
  – very hard to traverse recursively
• Linked tree
  – nodes like those in Stacks, Queues & Lists
  – each node holds:
    • pointer to left-child
    • pointer to right-child
    • data


Recursive Descent

• A binary tree is either empty, or
• Has a data-node (root) with 2 subtree ptrs
  – left-tree
  – right-tree
  – the subtrees are disjoint
• Each sub-tree follows the same definition
• Leads to simple recursive search programs


Recursive Tree Traversal

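// node is assumed to have data, left and right members;
// Process is a placeholder for whatever is done with each node's data
void Traverse(node* ptr)
{
    if (ptr == nullptr)        // the binary (sub)tree is empty
        return;
    // recursion here
    Process(ptr->data);        // process root data
    Traverse(ptr->left);
    Traverse(ptr->right);
}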


3 possible traversals

• Pre-order
  – data, left-sub-tree, right-sub-tree
  – first touch
• In-order
  – left-sub-tree, data, right-sub-tree
  – 2nd touch
• Post-order
  – left-sub-tree, right-sub-tree, data
  – last touch


Traversal Order

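• Given the expression A - B * C + D
• Operator precedence, highest first: ^, then * and /, then + and - (+ and - group left to right, so this is (A - (B * C)) + D)
• This is normal infix order
• Each operand is
  – the child of a parent node
• Each parent node
  – holds the corresponding operator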


The expression tree

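[Figure: expression tree for A - B * C + D. Root: +, with left child - and right child D. Children of -: A (left) and * (right). Children of *: B (left) and C (right).]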


Remaining traversals

• Prefix: + - A * B C D
• Postfix (Reverse Polish Notation, RPN): A B C * - D +

Stack Applications

• base-ten to base-two conversion
  – remainders need to be printed in the reverse of the order in which they are calculated
• run-time stack of function activation records
  – push when a function is called
  – pop when a function exits
• arithmetic expression evaluation
  – easier to evaluate when stored in postfix (RPN)
• infix to postfix conversion algorithm uses a stack
• evaluating postfix is easy using a stack for operands

Infix to Postfix

infix expression: (3 + 4) * 5 - 2

postfix expression: 3 4 + 5 * 2 -

A. scan input from L to R
B. if operand, output it
C. else // must be an operator or "(" or ")"
  1. if "(" then push & loop
  2. if operator
     a. pop & output while the stack top is an operator with prec(top) >= prec(input) (stop at a "(", an empty stack, or prec(top) < prec(input))
     b. push the input
  3. if ")"
     a. pop & output until "("
     b. remove & discard the "("
     c. discard the incoming ")"
  4. if end of input, pop & output until empty

step      | in | out | stack (top is on right)
C1        | (  |     | (
B         | 3  | 3   | (
C2b       | +  |     | ( +
B         | 4  | 4   | ( +
C3a,b,c   | )  | +   | (empty)
C2b       | *  |     | *
B         | 5  | 5   | *
C2a,b     | -  | *   | -
B         | 2  | 2   | -
C4 (end)  |    | -   | (empty)


Evaluating Postfix

• use a stack to evaluate a postfix expression
• read values onto the stack until an operator is reached
• pop 2 values and apply the operator
  – be sure to maintain the order of operands: the first value popped is the RIGHT operand
  – a-b is not the same as b-a
• push the result value onto the stack
• repeat until there is no more input; the single value left on the stack is the result

postfix expression: 3 4 + 5 * 2 -
  – push 3 and 4
  – see the +: pop 4 and 3, add (3 + 4), push 7
  – push 5
  – see the *: pop 5 and 7, multiply (7 * 5), push 35
  – push the 2, then see the -: pop 2 and 35, subtract (35 - 2), push 33
  – end of input: the result is 33