Searching algorithms

Searching Algorithm Trupti Agrawal 1


Searching Algorithms

• Necessary components to search a list of data
  – Array containing the list
  – Length of the list
  – Item for which you are searching
• After the search is completed
  – If the item is found, report “success” and return its location in the array
  – If the item is not found, report “not found” or “failure”

Searching Algorithms (Cont’d)

• Suppose that you want to determine whether 27 is in the list
• First compare 27 with list[0]; that is, compare 27 with 35
• Because list[0] ≠ 27, you then compare 27 with list[1]
• Because list[1] ≠ 27, you compare 27 with the next element in the list
• Because list[2] = 27, the search stops
• This search is successful!

Figure 1: Array list with seven (07) elements

Searching Algorithms (Cont’d)

• Let’s now search for 10
• The search starts at the first element in the list; that is, at list[0]
• Proceeding as before, we see that this time the search item, which is 10, is compared with every item in the list
• Eventually, no more data is left in the list to compare with the search item; this is an unsuccessful search

Sequential Search Algorithm

public static int linSearch(int[] list, int listLength, int key) {
    int loc;   // declared before the loop so it is still in scope after it
    boolean found = false;

    for (loc = 0; loc < listLength; loc++) {
        if (list[loc] == key) {
            found = true;
            break;
        }
    }
    if (found)
        return loc;
    else
        return -1;
}

The previous method can be further reduced to:

Sequential Search Algorithm (Cont’d)

public static int linSearch(int[] list, int listLength, int key) {
    // return as soon as the key is found; no flag variable is needed
    for (int loc = 0; loc < listLength; loc++) {
        if (list[loc] == key)
            return loc;
    }
    return -1;   // the key is not in the list
}

Sequential Search Algorithm (Cont’d)

• Using a while (or a for) loop, the definition of the method linSearch can also be written without the break statement as:

public static int linSearch(int[] list, int listLength, int key) {
    int loc = 0;
    boolean found = false;

    while (loc < listLength && !found) {
        if (list[loc] == key)
            found = true;
        else
            loc++;
    }
    if (found)
        return loc;
    else
        return -1;
}
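As a usage sketch, the sequential search can be run on a seven-element array. Only list[0] = 35 and list[2] = 27 come from the walkthrough earlier; the remaining values and the class name LinSearchDemo are assumptions made for illustration, since the contents of Figure 1 are not available.

```java
class LinSearchDemo {
    // Sequential search: returns the index of key in list[0..listLength-1], or -1.
    static int linSearch(int[] list, int listLength, int key) {
        for (int loc = 0; loc < listLength; loc++) {
            if (list[loc] == key) {
                return loc;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical data: only list[0] = 35 and list[2] = 27 are taken from the text.
        int[] list = {35, 12, 27, 18, 45, 16, 38};
        System.out.println(linSearch(list, list.length, 27)); // 2 (successful search)
        System.out.println(linSearch(list, list.length, 10)); // -1 (unsuccessful search)
    }
}
```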


Performance of the Sequential Search

• Suppose that the first element in the array list contains the key; then we have performed one comparison to find the key.

• Suppose that the second element in the array list contains the key; then we have performed two comparisons to find the key.

• Carrying on the same analysis until the key is contained in the last element of the array list, we have performed N comparisons (N is the size of the array list) to find the key.

• Finally, if the key is NOT in the array list, then we would have performed N comparisons without finding the key, and we would return -1.
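The comparison counts in this analysis can be checked with a small sketch. The class ComparisonCount, the method countComparisons, and the sample data are hypothetical and introduced only for illustration; the method mirrors the sequential search but returns how many element comparisons were made instead of the location.

```java
class ComparisonCount {
    // Returns how many list[loc] == key comparisons a sequential search performs.
    static int countComparisons(int[] list, int listLength, int key) {
        int comparisons = 0;
        for (int loc = 0; loc < listLength; loc++) {
            comparisons++;
            if (list[loc] == key) {
                break;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] list = {35, 12, 27, 18, 45, 16, 38}; // hypothetical data, N = 7
        System.out.println(countComparisons(list, list.length, 35)); // key in first element: 1
        System.out.println(countComparisons(list, list.length, 38)); // key in last element: 7
        System.out.println(countComparisons(list, list.length, 10)); // key not present: 7
    }
}
```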

Performance of the Sequential Search (Cont’d)

• Therefore, the best case is: 1
• And, the worst case is: N
• The average case is:

\[
\text{Average number of comparisons} = \frac{1 + 2 + 3 + \cdots + N + N}{N + 1}
\]

In the numerator, the leading 1 is the best case, the first N is the worst case with the key found at the end of the array list, and the final N is the worst case in which the key is NOT found. The denominator N + 1 is the number of possible cases.
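The average above can be put in closed form. The following is a worked simplification rather than something stated on the slide; it assumes each of the N + 1 cases is equally likely and uses the standard identity 1 + 2 + … + N = N(N + 1)/2.

```latex
\begin{align*}
\text{Average number of comparisons}
  &= \frac{(1 + 2 + \cdots + N) + N}{N + 1} \\
  &= \frac{\frac{N(N + 1)}{2} + N}{N + 1} \\
  &= \frac{N}{2} + \frac{N}{N + 1}
\end{align*}
```

For N = 7 this gives 35/8 = 4.375 comparisons. For large N the second term approaches 1, so on average about half of the list is examined.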

Binary Search Algorithm

Can only be performed on a sorted list!

Uses the divide-and-conquer technique to search the list

Binary Search Algorithm (Cont’d)

• The search item is compared with the middle element of the list

• If search item < middle element, the search is restricted to the first half of the list

• If search item > middle element, the search is restricted to the second half of the list

• If search item = middle element, the search is complete

Binary Search Algorithm (Cont’d)

• Determine whether 75 is in the list

Figure 2: Array list with twelve (12) elements

Figure 3: Search list, list[0] … list[11]

Binary Search Algorithm (Cont’d)

Figure 4: Search list, list[6] … list[11]


Binary Search Algorithm (Cont’d)

public static int binarySearch(int[] list, int listLength, int key) {
    int first = 0, last = listLength - 1;
    int mid = 0;   // initialized so the compiler accepts "return mid" below
    boolean found = false;

    while (first <= last && !found) {
        mid = (first + last) / 2;
        if (list[mid] == key)
            found = true;
        else if (list[mid] > key)
            last = mid - 1;    // key can only be in the first half
        else
            first = mid + 1;   // key can only be in the second half
    }
    if (found)
        return mid;
    else
        return -1;
} //end binarySearch
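To see which elements a binary search actually inspects, the sketch below records every middle index it probes. The class BinarySearchTrace, the probes parameter, and the twelve sorted values are assumptions made for illustration, since the figure contents are not available. The midpoint is computed as first + (last - first) / 2, a common variant that gives the same index as (first + last) / 2 while avoiding int overflow on very large arrays.

```java
import java.util.ArrayList;
import java.util.List;

class BinarySearchTrace {
    // Binary search that also records each middle index it compares against.
    static int binarySearch(int[] list, int listLength, int key, List<Integer> probes) {
        int first = 0;
        int last = listLength - 1;
        while (first <= last) {
            int mid = first + (last - first) / 2; // overflow-safe midpoint
            probes.add(mid);
            if (list[mid] == key) {
                return mid;
            } else if (list[mid] > key) {
                last = mid - 1;   // continue in the first half
            } else {
                first = mid + 1;  // continue in the second half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical sorted list of twelve elements.
        int[] list = {4, 8, 19, 25, 34, 39, 45, 48, 66, 75, 89, 95};
        List<Integer> probes = new ArrayList<>();
        int loc = binarySearch(list, list.length, 75, probes);
        System.out.println(loc + " " + probes); // 9 [5, 8, 10, 9]
    }
}
```

With this data the first probe at list[5] narrows the search to list[6] … list[11], and the key is found after four comparisons instead of the ten a sequential search would need.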


Binary Search Algorithm (Cont’d)

Figure 5: Sorted list for binary search (search traces for key = 89 and key = 34)

Binary Search Algorithm (Cont’d)

Figure 6: Sorted list for binary search (search trace for key = 22)

Indexed Search

• Indexes: Data structures that organize records to optimize certain kinds of retrieval operations.
  o Speed up searches for a subset of records, based on values in certain (“search key”) fields
  o Updates are much faster than in sorted files.

Alternatives for Data Entry k* in Index

Data entry: Records stored in the index file

Given search key value k, provide for efficient retrieval of all data entries k* with value k.

In a data entry k*, we can store one of the following:
  alternative 1: Full data record with key value k, or
  alternative 2: <k, rid of data record with search key value k>, or
  alternative 3: <k, list of rids of data records with search key k>

The choice among the above 3 alternatives for data entries is orthogonal to the indexing technique used to locate data entries. Example indexing techniques: B+ trees, hash-based structures, etc.
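The difference between the alternatives can be sketched with in-memory collections. Everything named here (DataEntryAlternatives, Rid, KeyRid, buildAlt2, buildAlt3) is hypothetical, and a rid is modeled simply as a page number plus a slot number; a real index would keep these entries on disk pages. Alternative 1 is not built because its data entries are the data records themselves.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DataEntryAlternatives {
    // Hypothetical record id: where a data record lives in the data file.
    static final class Rid {
        final int page, slot;
        Rid(int page, int slot) { this.page = page; this.slot = slot; }
    }

    // Alternative 2 data entry: one <k, rid> pair per data record.
    static final class KeyRid {
        final int key;
        final Rid rid;
        KeyRid(int key, Rid rid) { this.key = key; this.rid = rid; }
    }

    // keys[i] is the search key of the record in slot i (all on page 0, an assumption).
    static List<KeyRid> buildAlt2(int[] keys) {
        List<KeyRid> entries = new ArrayList<>();
        for (int slot = 0; slot < keys.length; slot++) {
            entries.add(new KeyRid(keys[slot], new Rid(0, slot)));
        }
        return entries;
    }

    // Alternative 3 data entry: one <k, list of rids> per distinct key.
    static Map<Integer, List<Rid>> buildAlt3(int[] keys) {
        Map<Integer, List<Rid>> entries = new TreeMap<>();
        for (int slot = 0; slot < keys.length; slot++) {
            entries.computeIfAbsent(keys[slot], k -> new ArrayList<>()).add(new Rid(0, slot));
        }
        return entries;
    }

    public static void main(String[] args) {
        int[] keys = {30, 25, 30}; // two records share the key 30
        System.out.println(buildAlt2(keys).size());         // 3 entries, key 30 stored twice
        System.out.println(buildAlt3(keys).size());         // 2 entries, key 30 stored once
        System.out.println(buildAlt3(keys).get(30).size()); // 2 rids in the entry for key 30
    }
}
```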


Alternatives for Data Entries

Alternative 1: Full data record with key value k

The index structure is a file organization for data records (instead of a Heap file or sorted file).

At most one index on a given collection of data records can use Alternative 1. Otherwise, data records are duplicated, leading to redundant storage and potential inconsistency.

If data records are very large, this implies that the size of the auxiliary information in the index is also large.

Alternatives for Data Entries

Alternatives 2 (<k, rid>) and 3 (<k, list-of-rids>):

Data entries are typically much smaller than data records.

Comparison:
  Both are better than Alternative 1 with large data records, especially if search keys are small.
  Alternative 3 is more compact than Alternative 2, but leads to variable-sized data entries even if search keys are of fixed length.

Index Classification

Clustered vs. unclustered index: If the order of data records is the same as, or “close to”, the order of data entries, then the index is called a clustered index.

Index Clustered vs Unclustered

Observation 1: Alternative 1 implies clustered. True?

Observation 2: In practice, clustered also implies Alternative 1 (since sorted files are rare).

Observation 3: A file can be clustered on at most one search key.

Observation 4: Cost of retrieving data records through index varies greatly based on whether index is clustered or not!
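Observation 4 can be illustrated with a toy cost model that counts how many distinct data pages a range query touches. The class ClusteringCost, the method pagesTouched, and both page layouts are assumptions chosen for illustration; real costs also depend on buffering and on the index pages themselves.

```java
import java.util.HashSet;
import java.util.Set;

class ClusteringCost {
    // Counts distinct data pages read when fetching every record with lo <= key <= hi.
    // pageOfKey[k] is the page that holds the record with key k (hypothetical layout).
    static int pagesTouched(int[] pageOfKey, int lo, int hi) {
        Set<Integer> pages = new HashSet<>();
        for (int k = lo; k <= hi; k++) {
            pages.add(pageOfKey[k]);
        }
        return pages.size();
    }

    public static void main(String[] args) {
        int n = 100;      // records with keys 0..99
        int perPage = 10; // ten records per page in both layouts
        int[] clustered = new int[n];
        int[] unclustered = new int[n];
        for (int k = 0; k < n; k++) {
            clustered[k] = k / perPage;   // records stored in key order
            unclustered[k] = k % perPage; // neighbouring keys land on different pages
        }
        System.out.println(pagesTouched(clustered, 20, 29));   // 1 page
        System.out.println(pagesTouched(unclustered, 20, 29)); // 10 pages
    }
}
```

In this model the same ten records cost one page read when the file is clustered on the search key and ten page reads when it is not.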


Clustered vs. Unclustered Index

[Figure: CLUSTERED and UNCLUSTERED panels. In each panel, index entries direct the search for data entries (index file), and the data entries refer to data records (data file).]

Suppose Alternative (2) is used for data entries.

Clustered vs. Unclustered Index

Use Alternative (2) for data entries. Data records are stored in a Heap file.

To build a clustered index, first sort the Heap file.

Overflow pages may be needed for inserts. Thus, the order of data records is close to (not identical to) the sort order.

Summary of Index Search

• Many alternative file organizations exist, each appropriate in some situation.
• If selection queries are frequent, sorting the file or building an index is important.
  – Hash-based indexes are only good for equality search.
  – Sorted files and tree-based indexes are best for range search; they are also good for equality search.
  – Files are rarely kept sorted in practice; a B+ tree index is better.
• An index is a collection of data entries plus a way to quickly find entries with given key values.
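The equality versus range distinction can be tried with Java's standard collections as stand-ins: HashMap for a hash-based index and TreeMap for a tree-based one. This is only an analogy, since TreeMap is an in-memory red-black tree rather than a B+ tree, and the class EqualityVsRange, the method rangeKeys, and the sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class EqualityVsRange {
    // Range search lo <= key <= hi on an ordered index; keys come back in sorted order.
    static List<Integer> rangeKeys(TreeMap<Integer, String> index, int lo, int hi) {
        return new ArrayList<>(index.subMap(lo, true, hi, true).keySet());
    }

    public static void main(String[] args) {
        Map<Integer, String> hashIndex = new HashMap<>();
        TreeMap<Integer, String> treeIndex = new TreeMap<>();
        for (int k : new int[] {34, 8, 75, 19, 66}) {
            hashIndex.put(k, "rec" + k);
            treeIndex.put(k, "rec" + k);
        }

        // Equality search works on both structures.
        System.out.println(hashIndex.get(75)); // rec75
        System.out.println(treeIndex.get(75)); // rec75

        // Range search is answered directly by the ordered structure only;
        // the hash map would have to examine every key.
        System.out.println(rangeKeys(treeIndex, 10, 70)); // [19, 34, 66]
    }
}
```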


Summary of Index Search

• Data entries can be actual data records, <key, rid> pairs, or <key, rid-list> pairs.
• Can have several indexes on a given file of data records, each with a different search key.
• Indexes can be classified as clustered vs. unclustered. Differences have important consequences for utility/performance of query processing.
