Natural language database

68
PROJECT REPORT ON Natural Language Database Interface By Ryan Sequeira Abhijeet Badale Roshan Tirkey Swapnil Waghmode DEPARTMENT OF INFORMATION TECHNOLOGY IMPERIAL COLLEGE OF ENGINEERING and RESEARCH WAGHOLI, PUNE

description

its a report of NLDBI (natural language database interface) containing design, implementation and testing.

Transcript of Natural language database

Page 1: Natural language database

PROJECT REPORT

ON

Natural Language Database Interface

By

Ryan Sequeira

Abhijeet Badale

Roshan Tirkey

Swapnil Waghmode

DEPARTMENT OF INFORMATION TECHNOLOGY

IMPERIAL COLLEGE OF ENGINEERING and

RESEARCH

WAGHOLI, PUNE

Page 2: Natural language database

PROJECT REPORT

ON

Natural Language Database Interface

By

Ryan Sequeira

Abhijeet Badale

Roshan Tirkey

Swapnil Waghmode

DEPARTMENT OF INFORMATION TECHNOLOGY

IMPERIAL COLLEGE OF ENGINEERING and

RESEARCH

WAGHOLI, PUNE

Page 3: Natural language database

Contents

1 Introduction 1

1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Literature Survey 4

2.1 Advantages of NLDBI system . . . . . . . . . . . . . . . . . . . . . . . . 4

2.2 Disadvantages of NLDBI System . . . . . . . . . . . . . . . . . . . . . . 5

2.3 Some Already Developed NLDBI Systems . . . . . . . . . . . . . . . . . 6

2.4 Commercially Available Systems . . . . . . . . . . . . . . . . . . . . . . . 8

2.5 Various Approaches Used for Development of NLDBI Systems . . . . . . 9

2.5.1 Symbolic Approach (Rule Based Approach) . . . . . . . . . . . . 9

2.5.2 Empirical Approach (Corpus Based Approach) . . . . . . . . . . . 9

2.5.3 Connectionist Approach (Using Neural Network) . . . . . . . . . 10

3 Proposed Work 11

3.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.2 Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.3 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3.4 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

3.5 Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Page 4: Natural language database

4 Project Design 14

4.1 SRS - Software Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.2 Hardware Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.3 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4.4 UML Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.4.1 Use Case Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.4.2 Sequence Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.4.3 Activity Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.5 High Level Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4.5.1 Module Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4.5.2 Database Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.6 Screen-shots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

5 Test Design & Report 23

5.1 Test Design for NLDBI . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

5.2 Test Report . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

5.2.1 Input/Output testing . . . . . . . . . . . . . . . . . . . . . . . . . 25

5.2.2 Suggestion list testing . . . . . . . . . . . . . . . . . . . . . . . . 35

6 Implementation and Coding 39

6.1 List of questions answered by the system . . . . . . . . . . . . . . . . . . 39

6.1.1 Who questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

6.1.2 What questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

6.1.3 Which questions . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

6.1.4 When question . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

6.2 Parse Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

6.3 Context Free Grammar . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

6.3.1 LEX specification: . . . . . . . . . . . . . . . . . . . . . . . . . . 55

6.3.2 YACC Specification: . . . . . . . . . . . . . . . . . . . . . . . . . 57

7 Schedule of work 60

8 Conclusion and Future Scope 62

8.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

8.2 Future Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Bibliography 62

Page 5: Natural language database

List of Figures

1.1 Existing NLDBI system with old user interface development techniques . 2

4.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

4.2 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4.3 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

4.4 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.5 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

4.6 NLDBI screen-shot interfaced with movie database and designed to answer

questions related to movies . . . . . . . . . . . . . . . . . . . . . . . . . . 22

5.1 System behaviour for wrong question inputs . . . . . . . . . . . . . . . . 25

5.2 System behaviour for irrelevant inputs . . . . . . . . . . . . . . . . . . . 27

5.3 System behaviour for correct questions . . . . . . . . . . . . . . . . . . . 29

5.4 System behaviour for no input or null input . . . . . . . . . . . . . . . . 31

5.5 System behaviour if data not present in database . . . . . . . . . . . . . 33

5.6 System’s suggestion list behaviour for no keyword inputs . . . . . . . . . 35

5.7 System’s suggestion list behaviour for relevant keyword inputs . . . . . . 36

5.8 System’s suggestion list behaviour for irrelevant inputs . . . . . . . . . . 37

6.1 Who questions class one . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

6.2 Who questions class two . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

6.3 Who questions class three . . . . . . . . . . . . . . . . . . . . . . . . . . 50

6.4 What questions class one . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

6.5 What questions class two . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

6.6 What questions class three . . . . . . . . . . . . . . . . . . . . . . . . . . 53

6.7 What questions class four . . . . . . . . . . . . . . . . . . . . . . . . . . 54

7.1 Gantt chart for preliminary research . . . . . . . . . . . . . . . . . . . . 60

7.2 Gantt chart for initial system design . . . . . . . . . . . . . . . . . . . . 60

7.3 Gantt chart for final development phase . . . . . . . . . . . . . . . . . . 61

Page 6: Natural language database

Chapter 1

Introduction

Natural language Interface to Database (NLDBI) can acts as an alternative interface for

finding structured information from database particularly on a small hand-held devices

because writing questions in natural language is much easier for a casual user than

the complicated and time consuming navigation required in the traditional database

interfaces. NLDBI shifts a users burden of learning use of interface to describe his or

her need for information to the system. NLDBI thus demands less input-output and

processing facilities which make it more useful for mobile devices .

Natural language interface to database is not a new area, a lot of research is

going on since a long time. Most of the NLDBI systems developed so far used shallow

analysis. Shallow analysis uses domain knowledge rather than linguistic knowledge to

interpret the meaning of input question, which results in a low success rate. This is a

main difficulty in implementing NLDBI for any practical use. To increase the success rate

deep analysis of input question can be used. Deep analysis uses linguistic knowledge for

detail understanding. Deep analysis has an advantage of portability that is not possible

in shallow analysis due to too much dependency on database.

To explore deep analysis for NLDBI a movie inquiry is selected as a domain. For

movie inquiry domain a corpus of question is collected. This corpus is then analysed to

find the pattern of questions used in the domain. Based on these patterns a context free

grammar is designed to represent syntax of input questions. Representing syntax is an

important step as in deep analysis method the subsequent steps are dependent on it.

1

Page 7: Natural language database

Introduction

1.1 Motivation

Downsides of existing query languages (e.g., SeRQL, SPARQL):

1. Complex syntax,

2. Not easy to learn,

3. Writing queries is error-prone task,

4. Requires understanding of Semantic Web technologies.

Downsides of existing systems ): This system requires user to select the columns

that he would like to use to make his query first. Next the user has to write an appropriate

query with no guidance from the system. Then the user is supposed to press Do query

to generate the output.

This system appears to be complex and requires the user to perform a lot of steps

with no guidance what so ever. Even though there are so many technologies available for

designing rich and user friendly User Interfaces the existing system still seem to be difficult

to use thus leaving a scope to apply the knowledge of Human Computer Interaction to

develop a user friendly system that performs better than the existing systems.

Figure 1.1: Existing NLDBI system with old user interface development techniques

2

Page 8: Natural language database

Introduction

1.2 Background

Traditionally , most natural language interfaces have a natural- language front end

paradigm, whereby natural language queries are translated into formal database queries

(i.e.: in some formal query language ). These queries are then processed by a database

management system The requested data could be returned as is, or could be passed back

to the natural-language component, which generates a natural-language version of the

data. This effectively divides the problem of natural-language database access into tow

sub problems: the natural-language component, and the database access component.

The methods for translating natural language to formal query languages have

varied over the years. The Lunar Sciences Natural Language Information System (or

LUNAR) is one of the oldest natural-language interfaces to database: it performed a

simple syntactic analysis of English query, followed by semantic analysis of the resulting

parse tree.

3

Page 9: Natural language database

Chapter 2

Literature Survey

2.1 Advantages of NLDBI system

Advantages of Natural Language Interface to Database are as follows:

� No Artificial Language

One advantage of NLIDBs is supposed to be that the user is not required to learn

an artificial communication language. Formal query languages like SQL are difficult

to learn and master, at least by non-computer-specialists.

� No Need for Training

Graphical interfaces and form-based interfaces are easier to use by occasional users;

still, invoking forms, linking frames, selecting restrictions from menus, etc. consti-

tute artificial communication languages that have to be learned and mastered by

the end-user. In contrast, an ideal NLDBI would allow queries to be formulated in

the user’s native language.

� Better for Some Questions

It has been argued that there are some kind of questions (e.g. questions involving

negation, or quantification) that can be easily expressed in natural language, but

that seem difficult (or at least tedious) to express using graphical or form-based

interfaces. For example, ”Which department has no programmers?” (Negation),

or ”Which company supplies every department?” (Universal quantification), can

be easily expressed in natural language, but they would be difficult to express in

most graphical or form-based interfaces. Questions like the above can, of course,

be expressed in database query languages like SQL, but complex database query

language expressions may have to be written.

4

Page 10: Natural language database

Literature Survey

� Easy to Use for Multiple Database Tables

Queries that involve multiple database tables like ”list the address of the farmers

who got bonus greater than 10000 rupees for the crop of wheat”, are difficult to form

in graphical user interface as compared to natural language interface.

2.2 Disadvantages of NLDBI System

Disadvantages of natural language interface to database system are as follows:

� Linguistic Coverage Not Obvious

A frequent complaint against NLDBIs is that the system’s linguistic capabilities

are not obvious to the user. Current NLDBIs can only cope with limited subsets

of natural language. Users find it difficult to understand (and remember) what

kinds of questions the NLDBI can or cannot cope with. Formal query languages,

form-based interfaces, and graphical interfaces typically do not suffer from these

problems. In the case of formal query languages, the syntax of the query language

is usually well-documented, and any syntactically correct query is guaranteed to be

given an answer. In the case of form-based and graphical interfaces, the user can

usually understand what sorts of questions can be input, by browsing the options

offered on the screen; and any query that can be input is guaranteed to be given

an answer.

For example, database doesn’t contain information about profit of farmers and user

inputs the query ”which farmer gets maximum profit”. This query is out of the

linguistic coverage of the system.

� Linguistic vs. Conceptual Failures

When the NLDBI cannot understand a question, it is often not clear to the user

whether the rejected question is outside the system’s linguistic coverage, or whether

it is outside the system’s conceptual coverage. Thus, users often try to rephrase

questions referring to concepts the system does not know (e.g. rephrasing questions

about salaries towards a system that knows nothing about salaries), because they

think that the problem is caused by the system’s limited linguistic coverage. In

other cases, users do not try to rephrase questions the system could conceptually

5

Page 11: Natural language database

Literature Survey

handle, because they do not realize that the particular phrasing of the question is

outside the linguistic coverage, and that an alternative phrasing of the same ques-

tion could be answered. Some NLIDBs attempt to solve this problem by providing

diagnostic messages, showing the reason a question cannot be handled.

For example, user asks a query ”list the names of farmers who are 35 years old” and

the database has information about the age of the farmer but ”age” word is not

there in query. So this query is not out of linguistic coverage but conceptually it is

not right, because system does not understand what 35 is representing i.e. whether

it is for age or for address.

� Users assume intelligence

NLDBI users are often misled by the system’s ability to process natural language,

and they assume that the system is intelligent, that it has common sense, or that it

can deduce facts, while in fact most NLDBIs have no reasoning abilities. This prob-

lem does not arise in formal query languages, form-based interfaces, and graphical

interfaces, where the capabilities of the system are more obvious to the user. For

example, when user asks a query ”list the names of farmers who are 35 years old”,

he/she is not specifying the word ”age”, assuming that system will understand it

automatically. But system is not so intelligent.

� Inappropriate Medium

It has been argued that natural language is not an appropriate medium for com-

municating with a computer system. Natural language is claimed to be too verbose

or too ambiguous for human-computer interaction. NLDBI users have to type long

questions, while in form-based interfaces only fields have to be filled in, and in

graphical interfaces most of the work can be done by mouse-clicking. In natural

language interface user has to type full sentence with all the connectors (articles,

prepositions, etc) but in graphical or form based interfaces it is not required.

2.3 Some Already Developed NLDBI Systems

� LUNAR System

This system comes in early seventies (1973). The system LUNAR is a system that

answers questions about samples of rocks brought back from the moon. The 11

meaning of system’s name is that is in relation to the moon. The system was infor-

mally introduced in 1971. To accomplish its function the LUNAR system uses two

6

Page 12: Natural language database

Literature Survey

databases; one for the chemical analysis and the other for literature references. The

LUNAR system uses an Augmented Transition Network (ATN) parser and Woods

Procedural Semantics. According to words , the LUNAR system performance was

quite impressive; it managed to handle 78

� LADDER

This system was developed in 1978. It was designed as a natural language interface

to a database of information about US Navy ships. According to Hendrix , the

LADDER system uses semantic grammar to parse questions to query a distributed

database. Although semantic grammars helped to implement systems with im-

pressive characteristics, the resulting systems proved difficult to port to different

application domains. Indeed, a different grammar had to be developed whenever

LADDER was configured for a new application . The system uses semantic gram-

mars technique that interleaves syntactic and semantic processing. The question

answering is done via parsing the input and mapping the parse tree to a database

query. The system LADDER was implemented in LISP. At the time of creation

of the LADDER system, it was able to process a database that is equivalent to a

relational database with 14 tables and 100 attributes.

� CHAT-80

This is one of the best-known NLDBIs of the early eighties. CHAT-80 was imple-

mented entirely in Prolog. It transformed English questions into Prolog expressions,

which were evaluated against the Prolog database. The code of CHAT-80 was cir-

culated widely and formed the basis of several other experimental NLDBIs. The

database of CHAT-80 consists of facts (i.e. oceans, major seas, major rivers and

major cities) about 150 of the countries world and a small set of English language

vocabulary that are enough for querying the database.

7

Page 13: Natural language database

Literature Survey

2.4 Commercially Available Systems

Some of the commercially available NLDBs are as follows:

� IBMS LANGUAGE ACCESS: This system stopped being commercially avail-

able in October 1992.

� SQL Tutor: It was developed in 1998. SQL can be very difficult for beginner users

to understand. The SQL-Tutor program tutors students by assisting the students

through a number of database questions from four different databases. A student

model is kept for each student based on query constraints (each constraint repre-

sents a part of the query that is necessary to answer the question). Each time a

particular query constraint is used, SQL-Tutor records whether it was used success-

fully or unsuccessfully. In this way a model of a student’s strengths and weaknesses

is generated and SQL-Tutor can select questions which re-enforce problem areas or

introduce new query concepts.

� INTELLECT: It was from Trinzic (formed by the merger of AICorp and Aion).

This system is based on experience from ROBOT.

� BBNS PARLANCE: It was built on experience from the development of the Rus

and Irus systems.

� ENGLISH WIZARD: It was developed by Linguistic Technology Corporation.

The company was founded by the author of AICorps original Intellect.

8

Page 14: Natural language database

Literature Survey

2.5 Various Approaches Used for Development of

NLDBI Systems

Natural language is the topic of interest from computational viewpoint due to the implicit

ambiguity that language possesses. Several researchers applied different techniques to

deal with language.

2.5.1 Symbolic Approach (Rule Based Approach)

Natural Language Processing appears to be a strongly symbolic activity. Words are

symbols that stand for objects and concepts in real worlds, and they are put together

into sentences that obey well specified grammar rules. Hence, for several decades Natural

Language Processing research has been dominated by the symbolic approach. Knowledge

about language is explicitly encoded in rules or other forms of representation. Language

is analysed at various levels to obtain information. On this obtained information certain

rules are applied to achieve linguistic functionality. As Human Language capabilities

include rule-base reasoning, it is supported well by symbolic processing. In symbolic

processing rules are formed for every level of linguistic analysis. It tries to capture the

meaning of the language based on these rules.

2.5.2 Empirical Approach (Corpus Based Approach)

Empirical approaches are based on statistical analysis as well as other data driven anal-

ysis, of raw data which is in the form of text corpora. A corpus is collections of machine

readable text. The approach has been around since NLP began in the early 1950s. Only

in the last 10 years or so empirical NLP has emerged as a major alternative to ratio-

nalist rule-based Natural Language Processing. Corpora are primarily used as a source

of information about language and a number of techniques have emerged to enable the

analysis of corpus data. Syntactic analysis can be achieved on the basis of statistical

probabilities estimated from a training corpus. Lexical ambiguities can be resolved by

considering the likelihood of one or another interpretation on the basis of context. Recent

research in computational linguistics indicates that empirical or corpus based methods

are currently the most promising approach for developing robust, efficient Natural Lan-

guage Processing (NLP) systems. These methods automate the acquisition of much of

the complex knowledge required for NLP by training on suitably annotated natural lan-

guage corpora, e.g. tree-banks of parsed sentences. Most of the empirical NLP methods

employ statistical techniques such as n-Gram models, Hidden Markov Models (HMMs),

9

Page 15: Natural language database

Literature Survey

and Probabilistic Context Free Grammars 16(PCFGs). Given the successes of empir-

ical NLP methods, researchers have recently begun to apply learning methods to the

construction of information extraction systems. Several different symbolic and statisti-

cal methods have been employed, but most of them are used to generate one part of a

larger information extraction system. Experimented n-Gram based language modeling

and claimed to develop language independent approach to IR and Natural Language

Processing.

2.5.3 Connectionist Approach (Using Neural Network)

Since human language capabilities are based on neural network in the brain, Artificial

Neural Networks (also called as connectionist network) provides an essential starting

point for modelling language processing. In the recent years, the field of connection-

ist processing has seen a remarkable development. The sub-symbolic neural network

approach holds a lot of promise for modelling the cognitive foundations of Natural Lan-

guage Interface using Shallow Parsing language processing. Instead of symbols, the ap-

proach is based on distributed representations that correspond to statistical regularities

in language. There has also been significant research applying neural-network methods

to language processing. However, there has been relatively little recent language research

using sub-symbolic learning, although some recent systems have successfully employed

decision trees transformation rules, and other symbolic methods. SHRUTI system is a

neurally inspired system for event modelling and temporal processing at a connectionist

level.

10

Page 16: Natural language database

Chapter 3

Proposed Work

3.1 Problem Definition

Natural Language Interface to database system provides an interface to the user which

helps him/her to query the database in his/her natural language. As the query languages

like SQL are very difficult to use for the common people and they find it very hard to

learn. So, to make database applications easy to use for these people who don’t know

query languages, Natural Language Interface for various databases has been developed.

3.2 Features

1. The GUI is very similar to a chatting application, making the GUI familiar to the

user and hence easy to use.

2. The system provides Key-Word based suggestion to help with finding the appro-

priate question. This system filters the list of suggested questions based on the

keywords entered.

3. Provides question analysis information i.e. the Goal of your question and the

Constraints applied.

4. Provides the Sql query generated during translating, which may help the user to

learn or validate the queries.

5. The system also provides auto spell check to correct the misspelled words.

11

Page 17: Natural language database

Proposed Work

3.3 Goals

The goal of NLP as stated above is ”to accomplish human-like language processing”.

The choice of the word ’processing’ is very deliberate, and should not be replaced with

’understanding’. For although the field of NLP was originally referred to as Natural

Language Understanding (NLU) in the early days of AI, it is well agreed today that

while the goal of NLP is true NLU, that goal has not yet been accomplished. A full NLU

System would be able to:

1. Paraphrase an input text

2. Translate the text into another language

3. Answer questions about the contents of the text

4. Draw inferences from the text

While NLP has made serious inroads into accomplishing goals 1 to 3, the fact

that NLP systems cannot, of themselves, draw inferences from text, NLU still remains

the goal of NLP. There are more practical goals for NLP, many related to the particular

application for which it is being utilized. For example, an NLP-based IR system has

the goal of providing more precise, complete information in response to a user’s real

information need. The goal of the NLP system here is to represent the true meaning

and intent of the user’s query, which can be expressed as naturally in everyday language

as if they were speaking to a reference librarian. Also, the contents of the documents

that are being searched will be represented at all their levels of meaning so that a true

match between need and response can be found, no matter how either are expressed in

their surface form. The goal of natural language processing (NLP) is to design and build

computer systems that are able to analyse natural languages like German or English, and

that generate their outputs in a natural language, too. Typical applications of NLP are

information retrieval, language understanding, and text classification. The goal of NLP

is to accomplish human-like language processing. The choice of the word ’processing’ is

very deliberate, and should not be replaced with ’understanding’. Although the field of

NLP was originally referred to as Natural Language Understanding (NLU) in the early

days of AI, it is well agreed today that while the goal of NLP is true NLU, that goal has

not yet been accomplished. A full NLU System would be able to paraphrase an input

text, translate the text into another language, answer questions about the contents of

the text, and draw inferences from the text.

12

Page 18: Natural language database

Proposed Work

3.4 Objectives

The objectives of the system are:

1. Provide a set of questions that provides answers to most of the users queries.

2. Divide the questions into two parts

(a) Goal: It is the data that the user wishes to find.

(b) Condition: These are the constraints used to narrow down the results to find

the appropriate data.

3. Compiling the question to translate the goals and conditions into a syntactically

correct and database relevant query.

4. Retrieving the answers and presenting it in a well formatted way.

3.5 Constraint

The constraints of the system are:

1. It uses Corpus based approach, hence the system is not capable of answering any

question outside the corpus.

2. The natural-language interface is designed for specific database. Hence any change

in the structure of the database will need lot of time to translate that change into

the interface.

3. The system answers one question at a time i.e. the question must contain only one

goal.

13

Page 19: Natural language database

Chapter 4

Project Design

4.1 SRS - Software Tools

The SRS tools used are as follows :

� NetBeans IDE

� JavaCC plug-in compatible with NetBeans (for compiler implementation in java)

� MySql Server 5.5

� Stanford Parser

4.2 Hardware Tools

The Hardware tools required are as follows :

� Microsoft Windows XP Professional SP3/Vista SP1/Windows 7 Professional:

� Processor: 2.6 GHz Intel Pentium IV or equivalent

� Memory: 2 GB

� Disk space: 1 GB of free disk space

14

Page 20: Natural language database

Project Design

4.3 System Architecture

Figure 4.1: System Architecture

15

Page 21: Natural language database

Project Design

4.4 UML Diagrams

4.4.1 Use Case Diagram

Figure 4.2: System Architecture

16

Page 22: Natural language database

Project Design

4.4.2 Sequence Diagram

Figure 4.3: System Architecture

17

Page 23: Natural language database

Project Design

4.4.3 Activity Diagram

Figure 4.4: System Architecture

18

Page 24: Natural language database

Project Design

4.5 High Level Design

4.5.1 Module Tree

Figure 4.5: System Architecture

19

Page 25: Natural language database

Project Design

4.5.2 Database Design

Movie Table

This table contains a list of movie names and other details related to movies like the

genre of the movie, its rating, the budget , language and its release date. As its the main

table it contains references to actor table , director table and producer table by using

actor id , director id, producer id as foreign keys. This is done to make the database

normalized to 3NF.

Movie

Parameter name Parameter Type

movie name varchar(50)

director id int(11)

producer id int(11)

actor id int(11)

genre varchar(50)

rating int(11)

budget varchar(50)

language varchar(50)

release date varchar(50)

Table 4.1: Movie Table

Actor Table

This table contains a list of all the actors that have acted in the movies stored in the

Movie Table.The primary key in this table is the actor id.

Actor

Parameter name Parameter Type

actor id int(11)

actor name varchar(50)

gender varchar(10)

Table 4.2: Actor Table

20

Page 26: Natural language database

Project Design

Director Table

This table contains a list of all the directors that have directed the movies stored in the

Movie Table. The primary key in this table is the director id.

Director

Parameter name Parameter Type

director id int(11)

director name varchar(50)

Table 4.3: Director Table

Producer Table

This table contains a list of all the producers that have produced in the movies stored in

the Movie Table.The primary key in this table is the producer id.

Producer

Parameter name Parameter Type

producer id int(11)

producer name varchar(50)

Table 4.4: Producer Table

21

Page 27: Natural language database

Project Design

4.6 Screen-shots

Figure 4.6: NLDBI screen-shot interfaced with movie database and designed to answer

questions related to movies

22

Page 28: Natural language database

Chapter 5

Test Design & Report

5.1 Test Design for NLDBI

The tests have been divided into 2 parts:

1. Input/Output testing

(a) Test for wrong questions

(b) Test for irrelevant input

(c) Test for correct questions

(d) Test for no input

(e) Test for data not present

2. Suggestion list testing

(a) Test for no keywords

(b) Test for relevant keywords

(c) Test for irrelevant keywords

23

Page 29: Natural language database

Test Design & Report

Test Description

Input/Output testing

This test is designed to test how the system responds to the data that is given as input.

The main output, sql query, goal and conditions generated by the system are considered

for testing.

Suggestion list testing

This test is designed to test whether the suggestion utility works as intended as this

utility plays an important role is making the system easy to use and is a key feature of

the system.

24

Page 30: Natural language database

Test Design & Report

5.2 Test Report

The test report contains screen shots and the description of each test with the observed

output. The report is a proof that the system has passes all the intended tested under

the specified constraints.

5.2.1 Input/Output testing

Test for wrong questions

Figure 5.1: System behaviour for wrong question inputs

25

Page 31: Natural language database

Test Design & Report

Description Expected output Actual output Passed

Designed to test system

for wrong questions, or

questions out of the

domain

The system prompts the

user that the query asked

cannot be answered

System prompts ”I didn’t

understand your ques-

tion”Yes

The system should not

generate any goal

The system does not gen-

erate any goal

The system should not

generate any condition

The system does not gen-

erate any condition

The system should not

generate any query

The system does not gen-

erate any query

Table 5.1: Test report for wrong question

26

Page 32: Natural language database

Test Design & Report

1.2. Test for irrelevant input

Figure 5.2: System behaviour for irrelevant inputs

27

Page 33: Natural language database

Test Design & Report

Description Expected output Actual output Passed

Designed to test system

for irrelevant input

The system prompts the

user that the query asked

cannot be answered

System prompts ”I didnt

understand your ques-

tion”Yes

The system should not

generate any goal

The system does not gen-

erate any goal

The system should not

generate any condition

The system does not gen-

erate any condition

The system should not

generate any query

The system does not gen-

erate any query

Table 5.2: Test report for irrelevant input

28

Page 34: Natural language database

Test Design & Report

3. Test for correct questions

Figure 5.3: System behaviour for correct questions

29

Page 35: Natural language database

Test Design & Report

Description Expected output Actual output Passed

Designed to test system

for correct questions

that are within its

answering domain

The system answers the

question with correct

data

System answers the ques-

tion with right dataYes

The system should gener-

ate correct goal

The system generates

right goal

The system should gener-

ate correct conditions

The system generates

right conditions

The system should gener-

ate correct query

The system generate

right query

Table 5.3: Test report for correct question

30

Page 36: Natural language database

Test Design & Report

4. Test for no input

Figure 5.4: System behaviour for no input or null input

31

Page 37: Natural language database

Test Design & Report

Description Expected output Actual output Passed

Designed to test system

for no or Null input

The system should not

react to null input

System doesn’t respond

YesThe system should not

generate any goal

The system does not gen-

erate any goal

The system should not

generate any condition

The system does not gen-

erate any condition

The system should not

generate any query

The system does not gen-

erate any query

Table 5.4: Test report for no input

32

Page 38: Natural language database

Test Design & Report

5. Test for data not present

Figure 5.5: System behaviour if data not present in database

33

Page 39: Natural language database

Test Design & Report

Description Expected output Actual output Passed

Designed to test system

when it does not have

data to answer the

question

The system prompts the

user that the data is

not present to answer the

question

System prompts ”Data

Not Found”Yes

The system should gener-

ate correct goal

The system generates

right goal

The system should not

generate any condition

The system generates

right conditions

The system should gener-

ate correct query

The system generate

right query

Table 5.5: Test report for data not present

34

Page 40: Natural language database

Test Design & Report

5.2.2 Suggestion list testing

Test for no keywords

Figure 5.6: System’s suggestion list behaviour for no keyword inputs

35

Page 41: Natural language database

Test Design & Report

Description Expected output Actual output Passed

Designed to test sugges-

tion utility for no key-

words

The system should not

display the suggestion

list

System reacts as ex-

pected by not displaying

the suggestion list

Yes

Table 5.6: Test report for no keywords

1.2. Test for relevant keywords

Figure 5.7: System’s suggestion list behaviour for relevant keyword inputs

36

Page 42: Natural language database

Test Design & Report

Description Expected output Actual output Passed

Designed to test sugges-

tion utility for correct

keywords

The system should dis-

play the suggestion list

with correct suggestions

System displays the sug-

gestions with helpful sug-

gestions

Yes

Table 5.7: Test report for relevant keywords

3. Test for irrelevant keywords

Figure 5.8: System’s suggestion list behaviour for irrelevant inputs

37

Page 43: Natural language database

Test Design & Report

Description Expected output Actual output Passed

Designed to test sugges-

tion utility for wrong

keywords

The system should not

display the suggestion

list

System reacts as ex-

pected by not displaying

the suggestion list

Yes

Table 5.8: Test report for irrelevant keywords

38

Page 44: Natural language database

Chapter 6

Implementation and Coding

6.1 List of questions answered by the system

These are the list of questions answered by the NLDBI system

6.1.1 Who questions

1. Who is the director of ”[movie name]” ?

2. Who is the producer of ”[movie name]” ?

3. Who is the actor of ”[movie name]” ?

4. Who is the director of the movie ”[movie name]” ?

5. Who is the producer of the movie ”[movie name]” ?

6. Who is the actor of the movie ”[movie name]” ?

7. Who directed ”[movie name]” ?

8. Who produced ”[movie name]” ?

6.1.2 What questions

1. What is the genre of ”[movie name]” ?

2. What is the rating of [movie name]” ?

3. What is the cast of ”[movie name]” ?

4. What is the release date of ”[movie name]” ?

39

Page 45: Natural language database

Implementation and Coding

5. What is the language of ”[movie name]” ?

6. What is the duration of ”[movie name]” ?

7. What is the budget of ”[movie name]” ?

8. What are the movies directed by ”[director name]” ?

9. What are the movies produced by ”[producer name]” ?

10. What are the movies acted by ”[actor name]” ?

11. What are the movies directed by ”[director name]” and produced by ”[producer

name]” ?

12. What are the movies produced by ”[producer name]” and directed by ”[director

name]” ?

13. What are the movies directed by ”[director name]” and acted by ”[actor name]” ?

14. What are the movies acted by ”[actor name]” and directed by ”[director name]” ?

15. What are the movies acted by ”[actor name]” and produced by ”[producer name]”

?

16. What are the movies produced by ”[producer name]” and acted by ”[actor name]”

?

17. What are the movies directed by ”[director name]” and produced by ”[producer

name]” and acted by ”[actor name]” ?

18. What are the movies directed by ”[director name]” and acted by ”[actor name]”

and produced by ”[producer name]” ?

19. What are the movies produced by ”[producer name]” and directed by ”[director

name]” and acted by ”[actor name]” ?

20. What are the movies produced by ”[producer name]” and acted by ”[actor name]”

and directed by ”[director name]” ?

21. What are the movies acted by ”[actor name]” and produced by ”[producer name]”

and directed by ”[director name]” ?

40

Page 46: Natural language database

Implementation and Coding

22. What are the movies acted by ”[actor name]” and directed by ”[director name]”

and produced by ”[producer name]” ?

23. What are the movies directed by ”[director name]” in ”[year]” ?

24. What are the movies produced by ”[producer name]” in ”[year]” ?

25. What are the movies acted by Brad in ”[year]” ?

26. What are the movies directed by ”[director name]” and produced by ”[producer

name]” in ”[year]” ?

27. What are the movies produced by ”[producer name]” and directed by ”[director

28. What are the movies directed by ”[director name]” and acted by ”[actor name]” in

”[year]” ?

29. What are the movies acted by ”[actor name]” and directed by ”[director name]” in

”[year]” ?

30. What are the movies acted by ”[actor name]” and produced by ”[producer name]”

in ”[year]” ?

31. What are the movies produced by ”[producer name]” and acted by ”[actor name]”

in ”[year]”?

32. What are the movies directed by ”[director name]” and produced by ”[producer

name]” and acted by ”[actor name]” in ”[year]” ?

33. What are the movies directed by ”[director name]” and acted by ”[actor name]”

and produced by ”[producer name]” in ”[year]” ?

34. What are the movies produced by ”[producer name]” and directed by ”[director

name]” and acted by ”[actor name]” in ”[year]” ?

35. What are the movies produced by ”[producer name]” and acted by ”[actor name]”

and directed by ”[director name]” in ”[year]” ?

36. What are the movies acted by ”[actor name]” and produced by ”[producer name]”

and directed by ”[director name]” in ”[year]” ?

37. What are the movies acted by ”[actor name]” and directed by ”[director name]”

and produced by ”[producer name]” in ”[year]” ?

41

Page 47: Natural language database

Implementation and Coding

38. What are the ”[genre]” movies released in ”[year]” ?

39. What are the ”[genre]” movies directed by ”[director name]” ?

40. What are the ”[genre]” movies produced by ”[producer name]” ?

41. What are the ”[genre]” movies acted by ”[actor name]” ?

42. What are the ”[genre]” movies directed by ”[director name]” and produced by

”[producer name]” ?

43. What are the ”[genre]” movies produced by ”[producer name]” and directed by

”[director name]” ?

44. What are the ”[genre]” movies directed by ”[director name]” and acted by ”[actor

name]” ?

45. What are the ”[genre]” movies acted by ”[actor name]” and directed by ”[director

name]” ?

46. What are the ”[genre]” movies acted by ”[actor name]” and produced by ”[producer

name]” ?

47. What are the ”[genre]” movies produced by ”[producer name]” and acted by ”[actor

name]” ?

48. What are the ”[genre]” movies directed by ”[director name]” and produced by

”[producer name]” and acted by ”[actor name]” ?

49. What are the ”[genre]” movies directed by ”[director name]” and acted by ”[actor

name]” and produced by ”[producer name]” ?

50. What are the ”[genre]” movies produced by ”[producer name]” and directed by

”[director name]” and acted by ”[actor name]” ?

51. What are the ”[genre]” movies produced by ”[producer name]” and acted by ”[actor

name]” and directed by ”[director name]” ?

52. What are the ”[genre]” movies acted by ”[actor name]” and produced by ”[producer

name]” and directed by ”[director name]” ?

53. What are the ”[genre]” movies acted by ”[actor name]” and directed by ”[director

name]” and produced by ”[producer name]” ?

42

Page 48: Natural language database

Implementation and Coding

54. What are the ”[genre]” movies directed by ”[director name]” in ”[year]” ?

55. What are the ”[genre]” movies produced by ”[producer name]” in ”[year]” ?

56. What are the ”[genre]” movies acted by Brad in ”[year]” ?

57. What are the ”[genre]” movies directed by ”[director name]” and produced by

”[producer name]” in ”[year]” ?

58. What are the ”[genre]” movies produced by ”[producer name]” and directed by

”[director name]” in ”[year]” ?

59. What are the ”[genre]” movies directed by ”[director name]” and acted by ”[actor

name]” in ”[year]” ?

60. What are the ”[genre]” movies acted by ”[actor name]” and directed by ”[director

name]” in ”[year]” ?

61. What are the ”[genre]” movies acted by ”[actor name]” and produced by ”[producer

name]” in ”[year]” ?

62. What are the ”[genre]” movies produced by ”[producer name]” and acted by ”[actor

name]” in ”[year]”?

63. What are the ”[genre]” movies directed by ”[director name]” and produced by

”[producer name]” and acted by ”[actor name]” in ”[year]” ?

64. What are the ”[genre]” movies directed by ”[director name]” and acted by ”[actor

name]” and produced by ”[producer name]” in ”[year]” ?

65. What are the ”[genre]” movies produced by ”[producer name]” and directed by

”[director name]” and acted by ”[actor name]” in ”[year]” ?

66. What are the ”[genre]” movies produced by ”[producer name]” and acted by ”[actor

name]” and directed by ”[director name]” in ”[year]” ?

67. What are the ”[genre]” movies acted by ”[actor name]” and produced by ”[producer

name]” and directed by ”[director name]” in ”[year]” ?

68. What are the ”[genre]” movies acted by ”[actor name]” and directed by ”[director

name]” and produced by ”[producer name]” in ”[year]” ?

43

Page 49: Natural language database

Implementation and Coding

6.1.3 Which questions

1. Which ”[genre]” movies released in ”[year]” ?

2. Which movies are directed by ”[director name]” ?

3. Which movies are produced by ”[producer name]” ?

4. Which movies are acted by ”[actor name]” ?

5. Which movies are directed by ”[director name]” and produced by ”[producer name]”

?

6. Which movies are produced by ”[producer name]” and directed by ”[director name]”

?

7. Which movies are directed by ”[director name]” and acted by ”[actor name]” ?

8. Which movies are acted by ”[actor name]” and directed by ”[director name]” ?

9. Which movies are acted by ”[actor name]” and produced by ”[producer name]” ?

10. Which movies are produced by ”[producer name]” and acted by ”[actor name]” ?

11. Which movies are directed by ”[director name]” and produced by ”[producer name]”

and acted by ”[actor name]” ?

12. Which movies are directed by ”[director name]” and acted by ”[actor name]” and

produced by ”[producer name]” ?

13. Which movies are produced by ”[producer name]” and directed by ”[director name]”

and acted by ”[actor name]” ?

14. Which movies are produced by ”[producer name]” and acted by ”[actor name]” and

directed by ”[director name]” ?

15. Which movies are acted by ”[actor name]” and produced by ”[producer name]” and

directed by ”[director name]” ?

16. Which movies are acted by ”[actor name]” and directed by ”[director name]” and

produced by ”[producer name]” ?

17. Which movies are directed by ”[director name]” in ”[year]” ?

18. Which movies are produced by ”[producer name]” in ”[year]” ?

44

Page 50: Natural language database

Implementation and Coding

19. Which movies are acted by Brad in ”[year]” ?

20. Which movies are directed by ”[director name]” and produced by ”[producer name]”

in ”[year]” ?

21. Which movies are produced by ”[producer name]” and directed by ”[director name]”

in ”[year]” ?

22. Which movies are directed by ”[director name]” and acted by ”[actor name]” in

”[year]” ?

23. Which movies are acted by ”[actor name]” and directed by ”[director name]” in

”[year]” ?

24. Which movies are acted by ”[actor name]” and produced by ”[producer name]” in

”[year]” ?

25. Which movies are produced by ”[producer name]” and acted by ”[actor name]” in

”[year]”?

26. Which movies are directed by ”[director name]” and produced by ”[producer name]”

and acted by ”[actor name]” in ”[year]” ?

27. Which movies are directed by ”[director name]” and acted by ”[actor name]” and

produced by ”[producer name]” in ”[year]” ?

28. Which movies are produced by ”[producer name]” and directed by ”[director name]”

and acted by ”[actor name]” in ”[year]” ?

29. Which movies are produced by ”[producer name]” and acted by ”[actor name]” and

directed by ”[director name]” in ”[year]” ?

30. Which movies are acted by ”[actor name]” and produced by ”[producer name]” and

directed by ”[director name]” in ”[year]” ?

31. Which movies are acted by ”[actor name]” and directed by ”[director name]” and

produced by ”[producer name]” in ”[year]” ?

32. Which ”[genre]” movies are released in ”[year]” ?

33. Which ”[genre]” movies are directed by ”[director name]” ?

34. Which ”[genre]” movies are produced by ”[producer name]” ?

45

Page 51: Natural language database

Implementation and Coding

35. Which ”[genre]” movies are acted by ”[actor name]” ?

36. Which ”[genre]” movies are directed by ”[director name]” and produced by ”[pro-

ducer name]” ?

37. Which ”[genre]” movies are produced by ”[producer name]” and directed by ”[di-

rector name]” ?

38. Which ”[genre]” movies are directed by ”[director name]” and acted by ”[actor

name]” ?

39. Which ”[genre]” movies are acted by ”[actor name]” and directed by ”[director

name]” ?

40. Which ”[genre]” movies are acted by ”[actor name]” and produced by ”[producer

name]” ?

41. Which ”[genre]” movies are produced by ”[producer name]” and acted by ”[actor

name]” ?

42. Which ”[genre]” movies are directed by ”[director name]” and produced by ”[pro-

ducer name]” and acted by ”[actor name]” ?

43. Which ”[genre]” movies are directed by ”[director name]” and acted by ”[actor

name]” and produced by ”[producer name]” ?

44. Which ”[genre]” movies are produced by ”[producer name]” and directed by ”[di-

rector name]” and acted by ”[actor name]” ?

45. Which ”[genre]” movies are produced by ”[producer name]” and acted by ”[actor

name]” and directed by ”[director name]” ?

46. Which ”[genre]” movies are acted by ”[actor name]” and produced by ”[producer

name]” and directed by ”[director name]” ?

47. Which ”[genre]” movies are acted by ”[actor name]” and directed by ”[director

name]” and produced by ”[producer name]” ?

48. Which ”[genre]” movies are directed by ”[director name]” in ”[year]” ?

49. Which ”[genre]” movies are produced by ”[producer name]” in ”[year]” ?

50. Which ”[genre]” movies are acted by Brad in ”[year]” ?

46

Page 52: Natural language database

Implementation and Coding

51. Which ”[genre]” movies are directed by ”[director name]” and produced by ”[pro-

ducer name]” in ”[year]” ?

52. Which ”[genre]” movies are produced by ”[producer name]” and directed by ”[di-

rector name]” in ”[year]” ?

53. Which ”[genre]” movies are directed by ”[director name]” and acted by ”[actor

name]” in ”[year]” ?

54. Which ”[genre]” movies are acted by ”[actor name]” and directed by ”[director

name]” in ”[year]” ?

55. Which ”[genre]” movies are acted by ”[actor name]” and produced by ”[producer

name]” in ”[year]” ?

56. Which ”[genre]” movies are produced by ”[producer name]” and acted by ”[actor

name]” in ”[year]”?

57. Which ”[genre]” movies are by ”[director name]” and produced by ”[producer

name]” and acted by ”[actor name]” in ”[year]” ?

58. Which ”[genre]” movies are directed by ”[director name]” and acted by ”[actor

name]” and produced by ”[producer name]” in ”[year]” ?

59. Which ”[genre]” movies are produced by ”[producer name]” and directed by ”[di-

rector name]” and acted by ”[actor name]” in ”[year]” ?

60. Which ”[genre]” movies are produced by ”[producer name]” and acted by ”[actor

name]” and directed by ”[director name]” in ”[year]” ?

61. Which ”[genre]” movies are acted by ”[actor name]” and produced by ”[producer

name]” and directed by ”[director name]” in ”[year]” ?

62. Which ”[genre]” movies are acted by ”[actor name]” and directed by ”[director

name]” and produced by ”[producer name]” in ”[year]” ?

6.1.4 When question

1. When did ”[movie name]” release ?

47

Page 53: Natural language database

Implementation and Coding

6.2 Parse Trees

These parse trees are generated by capturing the commonalities in the list of questions.

Figure 6.1: Who questions class one

48

Page 54: Natural language database

Implementation and Coding

Figure 6.2: Who questions class two

49

Page 55: Natural language database

Implementation and Coding

Figure 6.3: Who questions class three

50

Page 56: Natural language database

Implementation and Coding

Figure 6.4: What questions class one

51

Page 57: Natural language database

Implementation and Coding

Figure 6.5: What questions class two

52

Page 58: Natural language database

Implementation and Coding

Figure 6.6: What questions class three

53

Page 59: Natural language database

Implementation and Coding

Figure 6.7: What questions class four

54

Page 60: Natural language database

Implementation and Coding

6.3 Context Free Grammar

The context free grammar is the heart of the compiler and does the work of translating

the natural language question into sql query using lexical analysis, syntax analysis and

semantic analysis.

6.3.1 LEX specification:

The lex specification defines all the tokens used in the compiler. Each of the word entered

by the user first goes through lexical analysis. This is the step where the compiler

distinguishes the words based on the regular expressions specified below and then passes

these to the other phases of the compiler.

%{#include”y . tab . h”

#include<stdio . h>

#include<string . h>

%}

%%

[\ t ] ;

[ ? ] {yylval . s=strdup ( yytext ) ; r e turn QS ;}

directed |acted |produced {yylval . s=strdup ( yytext ) ; r e turn VBD ;}

released {yylval . s=strdup ( yytext ) ; r e turn VBN ;}

who |what {yylval . s=strdup ( yytext ) ; r e turn WP ;}

is {yylval . s=strdup ( yytext ) ; r e turn VBZ ;}

the {yylval . s=strdup ( yytext ) ; r e turn DT ;}

horror |comedy |action |drama |suspence |thriller |sci−fi {yylval . s=strdup ( yytext ) ; r e turn JJ ;}

and {yylval . s=strdup ( yytext ) ; r e turn CC ;}

55

Page 61: Natural language database

Implementation and Coding

which {yylval . s=strdup ( yytext ) ; r e turn WDT ;}

when {yylval . s=strdup ( yytext ) ; r e turn WRB ;}

did {yylval . s=strdup ( yytext ) ; r e turn VBDX ;}

director |producer |actor |actress |movie |release−date |genre |rating |cast |language |duration |budget |movies |year {yylval . s=strdup ( yytext ) ; r e turn NN ;}

release {yylval . s=strdup ( yytext ) ; r e turn VB ;}

are {yylval . s=strdup ( yytext ) ; r e turn VBP ;}

of |by |in {yylval . s=strdup ( yytext ) ; r e turn IN ;}

[0−9]+ { i f ( atoi ( yytext )>1950 && atoi ( yytext )<2012)

{yylval . s=strdup ( yytext ) ; r e turn CD ;}}

[ a−zA−Z ]+ {yylval . s=strdup ( yytext ) ; r e turn NNP ;}

%%

in t yywrap ( )

{r e turn 1 ;

}

56

Page 62: Natural language database

Implementation and Coding

6.3.2 YACC Specification:

%{#include<stdio . h>

#include<string . h>

i n t mflag=0,i=0;

char goal [ 5 ] [ 1 0 0 ] ; // gc − goa l c ond i t i on tab l e

char condition [ 5 ] [ 1 0 0 ] ;

char test [ 5 ] [ 1 0 0 ] ;

i n t gp=0,cp=0;

%}

%union

{i n t y ;

char *s ;

}

%token <s> QS

%token <s> VBD

%token <s> WP

%token <s> NN

%token <s> NNP

%token <s> VBZ

%token <s> DT

%token <s> IN

%token <s> VBP

%token <s> CD

%token <s> CC

%token <s> VBN

%token <s> JJ

%token <s> WDT

%token <s> VBDX

%token <s> VB

%token <s> WRB

%%

START : SBAR QS {printf ( ”=============goa l==============\n” ) ;f o r ( i=0;i<gp ; i++)

printf ( ”%s \n” , goal [ i ] ) ;

printf ( ”=============conn==============\n” ) ;f o r ( i=0;i<cp ; i++)

printf ( ”%s = \ t %s \n” , condition [ i ] , test [ i ] ) ;

} ;SBAR : WHNP SBARQ | WHADVP SQ ;

S : NPW VPJJ | NPW VBP VP ;

NPW : JJtemp NNtemp | NNtemp ;

SBARQ : S | SQ ;

WHADVP : WRB {addgoal ( ” r e l e a s e−date ” ) ; addcon ( ”moviename” ) ;} ;

WHNP : WP | WDT ;

SQ : VBZ NPZ PP | VBDtemp NPZ | VBP NPX | VBDX NNPtemp VB ;

57

Page 63: Natural language database

Implementation and Coding

NPX : NPZ VP | NPJJ VPJJ | NPJJ VP ;

VPX : VBDtemp PP PPX ;

VP : VP CC VPX | VPX ;

PPX : | PP ;

NPZ : DT NNtemp | DT NNtemp NNPtemp | NNPtemp | CD {addcon ( ” r e l e a s e−date ” ) , addtest ( $1←↩) ;} ;

NPJJ : DT JJtemp NNtemp ;

VPJJ : VBN PP ;

JJtemp : JJ {addcon ( ” genre ” ) ; addtest ( $1 ) ;} ;

PP : IN NPZ ;

VBDtemp : VBD {i f ( mflag==0)

{i f ( strcmp ( $1 , ” d i r e c t ed ” )==0)

addgoal ( ” d i r e c t o r ” ) ;

i f ( strcmp ( $1 , ”produced” )==0)

addgoal ( ” producer ” ) ;

i f ( strcmp ( $1 , ” acted ” )==0)

addgoal ( ” ac to r ” ) ;

addcon ( ”moviename” ) ;

}e l s e

{i f ( strcmp ( $1 , ” d i r e c t ed ” )==0)

addcon ( ” d i r e c t o r ” ) ;

i f ( strcmp ( $1 , ”produced” )==0)

addcon ( ” producer ” ) ;

i f ( strcmp ( $1 , ” acted ” )==0)

addcon ( ” ac to r ” ) ;

}

} ;NNtemp : NN {i f ( strcmp ( $1 , ”movie” )==0) ;

i f ( strcmp ( $1 , ” year ” )==0) { addgoal ( ” r e l e s e−date ” ) ; addcon ( ”moviename” ) ;}i f ( strcmp ( $1 , ”movies ” )==0) { mflag=1;addgoal ( ”moviename” ) ;}e l s e { addgoal ( $1 ) ; addcon ( ”moviename” ) ;}

} ;

NNPtemp : NNP { addtest ( $1 ) ; } ;

%%

main ( )

{yyparse ( ) ;

r e turn 0 ;

}

yyerror ( char * s1 )

{printf ( ”%s” , s1 ) ;

}

58

Page 64: Natural language database

Implementation and Coding

void addcon ( char x [ 1 0 0 ] )

{strcpy ( condition [ cp++],x ) ;

}void addgoal ( char x [ 1 0 0 ] )

{strcpy ( goal [ gp++],x ) ;

}void addtest ( char x [ 1 0 0 ] )

{strcpy ( test [ cp−1] ,x ) ;

}

59

Page 65: Natural language database

Chapter 7

Schedule of work

Figure 7.1: Gantt chart for preliminary research

Figure 7.2: Gantt chart for initial system design

60

Page 66: Natural language database

Schedule of work

Figure 7.3: Gantt chart for final development phase

61

Page 67: Natural language database

Chapter 8

Conclusion and Future Scope

8.1 Conclusion

The Natural Language Interface to Database system, for movie enquiry using Movie

database, accepts query in natural-language that is translated into SQL query, by making

use of a compiler tailored for this purpose . Then this SQL query is executed on database

to provide output to the user. Major concepts in this system are:

� Tokenizing the input query and check itf all the tokens are relevant . After analysing

query it divides it into goals and conditions and uses this information to form

appropriate SQL query.

� Execute SQL query on the database to retrieve required information. And provide

output to the user in a well formatted manner.

8.2 Future Scope

In future the system can be extended to perform the following tasks:

� The system can make use of the internet using web search to find relevant data.

� The system could make use of natural language processing techniques to under-

stand. any natural language question.

62

Page 68: Natural language database

Bibliography

[1] Avinash J. Agrawal and Dr. O. G. Kakde , ” Corpus Based Context Free Grammar

Design for Natural Language Interface to Database ”. , ISSN (Online): 1694-0814.

[2] Aho, Ullman, ”Principle of Compiler Design” , A book Published by Narosa Publi-

cations, 2002.

[3] Androutsopoulos I., Ritchie G.D., and Thanisch P., ”Natural Language Interfaces

to Databases ? An Introduction”, Journal of Natural Language Engineering Part 1,

1995, 29?81.

[4] ”The Standford Parser(statistical parser)”, http://nlp.stanford.edu/software/lex-

parser.

[5] Ruwanpura,S.(2000), SQ-HAL:Natural Language to SQL Translator ,

http://www.csse.monash.edu.au/hons/projects/2000/Supun.Ruwanpura,Monash

University.

[6] ”Artificial Intelligence” , by Elaine Rich and Kevin Knight,(2006), McGraw Hill com-

panies Inc., Chapter 15, page 377-426.

[7] Sleator,D.,Temperley,D. (1991), ”Parsing english with a link Grammar” , Carnegie

Mellon University Computer Science Technical Report,October 1991.

[8] A. El-Mouadib, Zakaria Suliman Zubi, Ahmed A. Almagrous, I. El-Feghi, ”Interactive

Natural Language Interface” , WSEAS transactions on computers, issue 4, volume 8,

(2009), pp 5-9.

[9] Tanveer Siddiqui, U.S. Tiwary, ”Natural Language Processing and Information Re-

trieval ” , Oxford university press, (2008).

[10] Suh, K.S., Perkins, W.C., ”The effects of a system echo in a restricted natural

language database interface for novice users”, in IEEE System sciences, 4, pp. 594-

599, 1994.

63