Post on 30-Dec-2015
Construction of Index: (Page 197)
• Objective: Given a document, find the number of occurrences of each word in the document.
• Example: Computer Science students know computers and computer languages.
• Keywords: computer, computers, science, students, know, and, languages.
Linear time algorithm:
• Let T be the text and |T| its length. The number of occurrences of every word in T can be found in O(|T|) time.
Constructing an automaton:
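[Figure: automaton with one letter-labelled transition per character; the paths from the start state spell the keywords computer, computers, science, students, know, and, languages.]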
Remarks:
• There is a final state for each word.
• Each final state has a counter storing the number of times that final state has been reached.
• While reading the text, the algorithm creates new states for each new word.
• For words that have been met before, we just go through the corresponding existing states.
• When a final state is reached, add 1 to its counter.
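The remarks above can be sketched in Python as follows. This is only an illustrative sketch, not the textbook's implementation: the names State, count_words and collect are made up for this example, and it assumes that words are maximal runs of letters and that upper and lower case are treated as the same word (as in the keyword list above).

```python
class State:
    """One automaton state (hypothetical name, for illustration only)."""
    def __init__(self):
        self.next = {}      # letter -> following State
        self.final = False  # True if some word ends in this state
        self.count = 0      # number of times this final state was reached


def count_words(text):
    """Read the text once, building the automaton and its counters."""
    root = State()
    state = root
    # The trailing space is a sentinel so that the last word is closed too.
    for ch in text.lower() + " ":
        if ch.isalpha():
            # New word (or new suffix): create the missing state.
            if ch not in state.next:
                state.next[ch] = State()
            # Known prefix: just go through the existing state.
            state = state.next[ch]
        elif state is not root:
            # A non-letter ends the current word: add 1 to the counter.
            state.final = True
            state.count += 1
            state = root
    return root


def collect(state, prefix=""):
    """Walk the automaton and return {word: count} for every final state."""
    counts = {}
    if state.final:
        counts[prefix] = state.count
    for ch, child in state.next.items():
        counts.update(collect(child, prefix + ch))
    return counts


sentence = "Computer Science students know computers and computer languages."
counts = collect(count_words(sentence))
print(counts["computer"], counts["computers"])  # prints: 2 1
```

Each character of T triggers one dictionary lookup and at most one state creation, so, assuming constant expected time per lookup, the whole scan takes O(|T|) time.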
Assignment one (due in week 6 on Friday, 7:30 pm)
• Write a program to convert a text into a vector such that each element of the vector is the number of occurrences of the corresponding keyword.
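As a rough illustration of the expected input and output only (not a model solution), the sketch below turns the example sentence into a vector. It deliberately swaps the automaton for Python's built-in hash-based Counter, and the function name text_to_vector, the fixed keyword order, and the rule that words are runs of the letters a to z after lower-casing are all assumptions made for this example.

```python
import re
from collections import Counter


def text_to_vector(text, keywords):
    """Return one count per keyword, in the order the keywords are given."""
    # Assumption: a word is a maximal run of letters a-z after lower-casing.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)          # hash table used here, not the automaton
    return [counts[k] for k in keywords]


keywords = ["computer", "computers", "science", "students",
            "know", "and", "languages"]
text = "Computer Science students know computers and computer languages."
print(text_to_vector(text, keywords))  # prints: [2, 1, 1, 1, 1, 1, 1]
```

A keyword that never occurs in the text simply gets the entry 0, because a Counter returns 0 for missing keys.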