Artificial Intelligence Chat - Penn State Harrisburg Math/Computer
The Pennsylvania State University
The Graduate School Capital College
Artificial Intelligence Chat
A Master’s Paper in
Computer Science
By
Resham N. Mahadeo
©2004 Resham N. Mahadeo
Submitted in Partial Fulfillment
of the Requirements for the Degree of
Master of Science
October 2004
Table of Contents
I ABSTRACT
II ACKNOWLEDGEMENTS
1 INTRODUCTION
2 LOCATING THE BEST CHAT PROGRAM SHELL
3 INSTALLING AND CONFIGURING THE MYSQL DATABASE
4 DESIGN DETAILS
4.1 SELECTING THE RESPONSE ALGORITHM
4.2 ACQUIRING CONVERSATION
4.2.1 JMSN Main Functionality
4.2.2 Main Interface
4.3 CHAT INTERFACE
4.4 MAIN INTERFACE
4.4.1 ChatDialog
4.4.2 BuddyTree
4.4.3 BuddyList
4.4.4 AbstractProcessor
4.4.5 MSNMenuBar
4.4.6 SelectCommon
4.4.7 SelectResponse
4.4.8 AddFriendsDialog
4.4.9 ChatArea
4.4.10 MsnListener
4.4.11 MsnFriend
4.4.12 Main
4.4.13 MSNMessenger
4.4.14 SwitchboardSession
5 EFFECTIVENESS OF THE RESULTS
6 A COMPARISON OF OTHER AI CHAT CLIENTS
7 ISSUES AND PROBLEMS
8 FUTURE ENHANCEMENTS
9 CONCLUSION
10 REFERENCES
Table of Figures
FIGURE 1: TABLE OF DATABASE TABLES
FIGURE 2: SELECT RESPONSE ALGORITHM
FIGURE 3: ACQUIRING CONVERSATION
FIGURE 4: FLOW OF TRIANGULATION (SIMPLIFIED VERSION)
FIGURE 5: MAIN INTERFACE OPTIONS
FIGURE 6: CHAT DIALOG
FIGURE 7: MAIN INTERFACE
FIGURE 8: TABLE OF UNMATCHED PAIRS
I Abstract
Our contribution is the modification of an ordinary chat client to create an
automated chat client. Several functionalities were added to achieve this goal.
One of these is the ability to add arbitrary users to our buddy lists. Means were
developed by which data can be collected from two users chatting, without the
users’ awareness. The ability to send a single response selectively to a user has
been added, as has the ability to respond to a selected user automatically,
without any input. Another important addition is a reliable algorithm that picks
the appropriate response. Finally, a mechanism has been added that creates a
response from a customized list of responses in conjunction with the regular
words received.
II Acknowledgements
I would like to thank Dr. Pavel Naumov for extending his hours to accommodate
me during this project. It is his advice, guidance, encouragement, and motivation
that took me to the finish line. I would also like to thank Dr. Thang Bui for all his
input, help, and guidance during my course of study. Finally, I would like to thank
Dr. Linda Null, Dr. Qin Ding, and Dr. Sukmoon Chang for their input and
guidance.
1 Introduction
Programs such as Eliza, Chatbot, and Alicebot have been created for a
purpose similar to that of this project. These programs strive to pass the test
proposed by Alan Turing in 1950. Under this test, if a person cannot tell the
difference between a computer and a human, both of which are in separate
rooms away from the person, then the computer is said to pass the Turing test.
Instant messaging (IM) is the real-time electronic exchange of short text
messages between people. A key feature of IM software is the buddy list, which
shows whether a friend or colleague is online and available to chat [11]. Another
popular feature is the ability to create groups of buddies. Users can set one of
several statuses to show whether they are away, busy, available, and so on. The
actions available on a buddy list usually include add, remove, block, unblock,
sort, and search, although these vary between clients, and similar actions can
be taken on groups. It is also common to send and receive files with IM, and
some of the newer features allow users to send and receive voice and video.
Taking all of this into consideration, it is easy to see why IM software is
becoming more popular over time, particularly with the younger generation.
However, instant messaging still has room for advancement, and soon its scope
and use will parallel those of e-mail.
Instant messaging has become more popular than it was several years ago.
It is also very portable, since it can be used on PDAs, some cell phones, and
other thin clients. Some of the more popular IM clients on the Internet are MSN
Messenger, Yahoo Instant Messenger, AOL Instant Messenger, and ICQ. These
clients provide a free and convenient way for computer and thin-client users to
communicate.
The goal of this project is to modify an existing chat client to create a chat
client that can respond independently. The responses are taken from a database
of stored responses. These stored responses have been collected from
conversations of other users.
A great deal of IM client software has been developed by programmers
around the world, either to work with existing servers or to work with their own
servers. Some clients have been developed for use on intranets, whereas others
have been created for use on the Internet.
The general idea is to send a message from one client to another via a
server, using an agreed-upon protocol. Messages are most commonly carried
over sockets using TCP/IP or UDP. After the server receives a message, it
forwards it to the addressed client.
The server is generally a listener that waits on a given port or an initiated
socket. After a communication channel is established, initial messages are sent
back and forth. The first message is automatic, and its purpose is to authenticate
the user to the server. Subsequent automatic messages carry the buddy list,
buddy statuses, and other information that the server keeps for the client. These
exchanges take place while the user is logging on to the chat client.
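The routing described above can be illustrated with a small in-memory sketch. It is not the MSN protocol: the hypothetical `RelayServer` class only stands in for a server that authenticates a user with the first message, returns the stored buddy list, and then forwards each message to the addressed client's inbox. Sockets, message framing, and real authentication are assumed away.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, in-memory stand-in for an IM server. A real server would
// listen on a socket and speak an agreed-upon protocol such as MSN's.
class RelayServer {
    private final Map<String, String> passwords = new HashMap<>();
    private final Map<String, List<String>> buddyLists = new HashMap<>();
    private final Map<String, Deque<String>> inboxes = new HashMap<>(); // logged-on users only

    void register(String user, String password, List<String> buddies) {
        passwords.put(user, password);
        buddyLists.put(user, new ArrayList<>(buddies));
    }

    // First automatic message: authenticate, then send back the stored buddy list.
    List<String> login(String user, String password) {
        if (!password.equals(passwords.get(user))) {
            throw new IllegalArgumentException("authentication failed for " + user);
        }
        inboxes.put(user, new ArrayDeque<>());
        return new ArrayList<>(buddyLists.get(user));
    }

    // The server receives a message and forwards it to the addressed client.
    boolean send(String from, String to, String text) {
        Deque<String> inbox = inboxes.get(to);
        if (inbox == null) {
            return false; // addressee is not logged on
        }
        inbox.addLast(from + ": " + text);
        return true;
    }

    // Next pending message for a logged-on client, or null if there is none.
    String receive(String user) {
        Deque<String> inbox = inboxes.get(user);
        return inbox == null ? null : inbox.pollFirst();
    }
}
```

For example, once two registered users (hypothetically alice and bob) have both logged in, `send("alice", "bob", "hi")` returns `true` and Bob's next `receive` returns `"alice: hi"`; a message addressed to a user who is not logged on is refused.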
2 Locating the Best Chat Program Shell
There are many websites where open-source software can be found. Two of
the more popular ones that were found are http://sourceforge.net/ and
http://freshmeat.net/. Several clients were considered and time was dedicated to
each of them: the source was downloaded and compilation was attempted. It was
very challenging to find clients that contained complete source, could be
compiled, did not have many errors, and whose errors could be resolved.
One such client, Ebjava, was found at http://sourceforge.net/projects/ebjava/.
Ebjava compiled successfully. However, after working with Ebjava for some time,
we found errors that could not be resolved. Several months were dedicated to
making Ebjava work, but all attempts were unsuccessful.
Finally, a complete working client, JMSN Messenger, was found. This
client was written by Jang-Ho Hwang from Korea [5] and was found on
SourceForge’s website. The author was contacted through e-mail, but no
response was received. Since his copyright statement allowed modification or
redistribution, modifications were initiated.
3 Installing and configuring the MYSQL database
The two possible media for storing conversation are files and a database.
Storing the conversation in files is not appropriate because of the time it would
take to search the files and the file access time. It was determined that storing
the responses, user names, people online, and other necessary information in a
database was the ideal solution. MYSQL was considered an ideal database for
small-scale projects. The MYSQL database server 3.23 was downloaded from
http://www.mysql.com/, and the installation was done on an HP Pavilion XH156
laptop. Documentation on installing and configuring MYSQL was obtained from
http://dev.mysql.com/doc/mysql/en/index.html. The table space and database
were created using the mysqlgui and mysql tools. Tables were then created for
Common_words, Conversation1, Conversation2, Question, Code, and
Users_online. The structure of the tables is shown below.
Table Columns
Common_words words
Conversation1 Statement1
Conversation2 Statement1
Question Question, Answer
Code first
Users_online Online
FIGURE 1: TABLE OF DATABASE TABLES
The Common_words table stores frequently used words that generally do
not alter the primary meaning of a statement. The Conversation1 table stores
responses from the first person chatting, and the Conversation2 table stores
responses from the second person chatting. The Question table stores complete
pairs of responses from both parties. The Code table stores internal codes for
the application, and the Users_online table stores the users that have previously
been added to the buddy list.
4 Design Details
4.1 Selecting the Response Algorithm
Figure 2 demonstrates the actual steps taken in the algorithm.
In the first step, the received response is checked against the database: it is
compared with the question column of the Question table for an exact match. If a
unique result is found, its answer column is chosen as the response. If the result
set contains more than one row, the question column of each result is separated
into common and regular words. The common words are determined by
cross-referencing the Common_words table, which was constructed manually
and fine-tuned to improve the performance of the system. Only the regular words
of each statement are considered; the common words are ignored. The received
response is likewise separated into common and regular words, and its common
words are removed. The regular words of the received statement are then
compared with the regular words of each match from the database, and the entry
with the most matching words has its corresponding answer returned. If there are
no matches in the question column, the next step is to make the same
comparison on the answer column, repeating the steps described above.
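A minimal sketch of this first step follows. It assumes the Question table has already been loaded into memory as {Question, Answer} rows and the Common_words table into a set; the original application issued these lookups through JDBC, and the class and method names here are hypothetical. Lower-casing, splitting on non-alphanumeric characters, and case-insensitive equality are also assumptions. The text does not say which column is sent back when the match is found in the answer column, so the sketch assumes the paired question is sent.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical in-memory version of the first lookup step. Each row is
// {Question, Answer} from the Question table; commonWords mirrors the
// Common_words table and is expected in lower case.
class FirstLookupStep {
    static Set<String> regularWords(String text, Set<String> commonWords) {
        Set<String> words = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!w.isEmpty() && !commonWords.contains(w)) {
                words.add(w);
            }
        }
        return words;
    }

    // Among several hits, keep the row whose column `col` shares the most
    // regular words with the received statement (ties keep the earlier row).
    static String[] best(String received, List<String[]> hits, int col, Set<String> commonWords) {
        Set<String> mine = regularWords(received, commonWords);
        String[] best = hits.get(0);
        int bestScore = -1;
        for (String[] row : hits) {
            Set<String> shared = regularWords(row[col], commonWords);
            shared.retainAll(mine);
            if (shared.size() > bestScore) {
                bestScore = shared.size();
                best = row;
            }
        }
        return best;
    }

    // Exact (case-insensitive) match on the Question column, then on the
    // Answer column. Returns the text to send, or null if nothing matched.
    static String respond(String received, List<String[]> table, Set<String> commonWords) {
        String text = received.trim();
        for (int col = 0; col <= 1; col++) {
            List<String[]> hits = new ArrayList<>();
            for (String[] row : table) {
                if (row[col].equalsIgnoreCase(text)) {
                    hits.add(row);
                }
            }
            if (!hits.isEmpty()) {
                String[] row = hits.size() == 1 ? hits.get(0) : best(text, hits, col, commonWords);
                // Assumption: a Question-column hit sends its Answer; an
                // Answer-column hit sends the paired Question.
                return col == 0 ? row[1] : row[0];
            }
        }
        return null;
    }
}
```

A `null` result means this step found nothing and the search moves on to the wildcard step.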
The second step is to replace the spaces between the words of the received statement with wildcards and to check the database for similar statements. The received statement is first compared with the question column of the Question table. If the result set contains exactly one row, its answer column is sent as the response. If it contains more than one row, each statement is broken down into common and regular words. As previously discussed, the regular words of the input statement are compared with those of each entry in the result set, and the answer column of the entry whose question column has the highest number of matches is sent to the other person. If there are no matches in the question column, the next step is to make the same comparison on the answer column, repeating the steps described above.
FIGURE 2: SELECT RESPONSE ALGORITHM (flowchart: Input Text, Database Lookup (question / answer), Add Wildcards [%], Remove Common Words, Remove a Regular word, Find Significant Regular word, Create Automated Response; each lookup returns a single response directly or finds the best of multiple responses)
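The wildcard step can be sketched in the same way. The `%` wildcard is an assumption drawn from the "[%]" label in Figure 2 and MySQL's `LIKE` operator; the `like` method below emulates a case-insensitive `LIKE` in memory and treats only `%` as special. As in the text, only the spaces between words are replaced, so the pattern is anchored at both ends. The class name and the answer-column rule (send the paired question) are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical in-memory version of the wildcard step. Rows are
// {Question, Answer}; commonWords mirrors Common_words (lower case).
class WildcardLookupStep {
    // "how are you" -> "how%are%you": the spaces between words become % wildcards.
    static String toLikePattern(String statement) {
        return statement.trim().replaceAll("\\s+", "%");
    }

    // Case-insensitive emulation of SQL LIKE in which only % is special
    // (the single-character _ wildcard is not handled in this sketch).
    static boolean like(String text, String pattern) {
        String[] parts = pattern.split("%", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
                .matcher(text)
                .matches();
    }

    static Set<String> regularWords(String text, Set<String> commonWords) {
        Set<String> words = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!w.isEmpty() && !commonWords.contains(w)) {
                words.add(w);
            }
        }
        return words;
    }

    // Question column first, then Answer column; among several hits the row
    // sharing the most regular words with the input wins (ties keep the earlier row).
    static String respond(String received, List<String[]> table, Set<String> commonWords) {
        String pattern = toLikePattern(received);
        Set<String> mine = regularWords(received, commonWords);
        for (int col = 0; col <= 1; col++) {
            String[] best = null;
            int bestScore = -1;
            for (String[] row : table) {
                if (!like(row[col], pattern)) {
                    continue;
                }
                Set<String> shared = regularWords(row[col], commonWords);
                shared.retainAll(mine);
                if (shared.size() > bestScore) {
                    bestScore = shared.size();
                    best = row;
                }
            }
            if (best != null) {
                // Assumption: an Answer-column hit sends the paired Question.
                return col == 0 ? best[1] : best[0];
            }
        }
        return null;
    }
}
```

For instance, "how are you" becomes the pattern how%are%you, which matches "How are all of you" but not "Where are you". A row such as "How rare the bayou" also matches, because `%` can match inside words; provided "how" and "are" are common words, the regular-word count then favours the row that really contains the word "you".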
The next step is to remove the common words from the received statement
and then remove one of the regular words from the right-hand side of the
remaining statement. Wildcards are then added, and the resulting statement is
compared with the question column of the Question table. If no matches are
found, a comparison is made with the answer column. In either case, if several
matches are found, the best one is determined by matching the regular words of
each result against those of the received statement. If neither comparison
produces a result, another regular word is removed and the steps are repeated,
one regular word at a time. If no data is found after searching through the
database, a constructed answer is returned. Several possible partial answers
have been stored for this purpose. The regular words of the input statement (the
words not in the Common_words table) are placed in an array, and the most
significant regular word, which is essentially the longest one, is picked from the
array. This word is added to a randomly selected partial answer, and the
constructed response is returned. An example of a constructed response is
"Tell me more of this + 'significant regular word'."
Common words are an important consideration because they can be
interchanged without changing the basic meaning of a phrase. Consider the two
phrases “Can I go to the park” and “Can we go to the park”. If “I” and “we” are
eliminated, both statements have the same basic meaning. Since “to” and “the”
add little meaning to the statement, they are considered common words as well.
A response to either phrase would therefore likely apply to both, so the two are
treated as equivalent. In view of this, the best course of action is to remove the
common words from the statements before making more refined comparisons.
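The effect of removing common words on the two example phrases can be checked with a few lines of Java. The `CommonWordFilter` class is hypothetical; the set passed in plays the role of the Common_words table, and splitting on non-alphanumeric characters is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: removes common words and keeps the regular words in order.
class CommonWordFilter {
    // commonWords plays the role of the Common_words table and is expected in lower case.
    static List<String> regularWords(String statement, Set<String> commonWords) {
        List<String> regular = new ArrayList<>();
        // Assumption: words are separated by anything other than letters, digits or apostrophes.
        for (String word : statement.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!word.isEmpty() && !commonWords.contains(word)) {
                regular.add(word);
            }
        }
        return regular;
    }
}
```

With a Common_words set of {i, we, to, the}, both “Can I go to the park” and “Can we go to the park” reduce to the same regular words, [can, go, park], so a stored response to either phrase can be used for the other.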
The contents of the Common_words table directly affect the response that
is produced. If too many words are added to the common words list, the list of
regular words becomes smaller. This may produce larger result sets from the
database, leaving more unrefined responses to process. In the extreme case, a
poor selection of common words can cause every word in a statement to be
treated as common, and then no matches can be found because there are no
words left to compare. For this reason, the common words list was revised
several times to produce the best results.
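The third step and the constructed-answer fallback described earlier in this section can be sketched as two helpers. The class and method names are hypothetical, and the database queries themselves are omitted: each pattern returned by `truncatedPatterns` would be tried against the question column and then the answer column, in order. The text says only that wildcards are added, so placing a `%` between the remaining words and at each end is an assumption. The partial answers are supplied by the caller, and a `Random` is passed in so that the choice can be made repeatable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Random;
import java.util.Set;

// Hypothetical sketch of the third step and the constructed-answer fallback.
class FallbackStep {
    // Regular words in their original order; commonWords is expected in lower case.
    static List<String> regularWords(String statement, Set<String> commonWords) {
        List<String> words = new ArrayList<>();
        for (String w : statement.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!w.isEmpty() && !commonWords.contains(w)) {
                words.add(w);
            }
        }
        return words;
    }

    // Drop regular words one at a time from the right and wrap the rest in
    // % wildcards: [w1, w2, w3] -> ["%w1%w2%", "%w1%"]. Each pattern would be
    // tried on the question column, then the answer column, in this order.
    static List<String> truncatedPatterns(String statement, Set<String> commonWords) {
        List<String> words = regularWords(statement, commonWords);
        List<String> patterns = new ArrayList<>();
        for (int n = words.size() - 1; n >= 1; n--) {
            patterns.add("%" + String.join("%", words.subList(0, n)) + "%");
        }
        return patterns;
    }

    // Nothing matched: append the most significant (longest) regular word
    // to a randomly chosen partial answer.
    static String constructedResponse(String statement, Set<String> commonWords,
                                      List<String> partialAnswers, Random random) {
        String significant = "";
        for (String w : regularWords(statement, commonWords)) {
            if (w.length() > significant.length()) {
                significant = w; // ties keep the earlier word
            }
        }
        String partial = partialAnswers.get(random.nextInt(partialAnswers.size()));
        return (partial + significant).trim();
    }
}
```

For example, with "my" and "is" as common words, "My favourite sport is basketball" yields the patterns %favourite%sport% and %favourite%; if neither finds anything, the partial answer "Tell me more of this " produces "Tell me more of this basketball".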
4.2 Acquiring Conversation
FIGURE 3: ACQUIRING CONVERSATION (diagram: User 1 and User 2 exchange initial statements and responses with User 3; waiting threads select and delete rows from Conversation1 and Conversation2, and pairs of responses are inserted into the Question table)
Figure 3 shows how data is captured from a conversation triangle, including
the actual flow of data between the database and the client; Figure 4 shows a
simplified version. The boxes labelled ‘Client 2 & User3’ and ‘Client 1 & User3’
represent dialog windows in the AI JMSN application. ‘User 1’ and ‘User 2’ are
two external users that have been added to the buddy list.
FIGURE 4: FLOW OF TRIANGULATION (SIMPLIFIED VERSION) (User1 and User2 each chat with User3; their messages pass through Conversation1 and Conversation2)
Conversation1 is used to store responses from User1, as shown in Figure
3. User3, which is the AI JMSN application, relays these responses to User2.
Similarly, User3 relays the responses stored in Conversation2 to User1. The
Question table is used to store pairs of responses from User1 and User2. The
database was populated in the following manner. First, two people who are
online are found, and a chat is initiated with person one. The response obtained
from person one is then sent to person two. It appears to both people that they
are chatting with person three; however, person three is just serving as a relay
for messages between person one and person two. All the complete pairs of
responses were then stored in the Question table (see Figure 1). This task
involved adding several modules to be referenced by the ChatDialog module.
The BuddyTree module is used to display the users that are online. When
conversations are set up between two people, only person three posts complete
pairs of responses to the Question table, so as to reduce redundancy. Both dialog
modules post their single responses to their respective tables, from which the
other side reads them. The module for User 1 creates a thread that checks
Conversation2 for responses from person two, and the module for User 2 creates
a thread that checks Conversation1 for responses from person one (Figure 3).
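A single-threaded sketch of the relay may make the data flow easier to follow. In place of the Conversation1, Conversation2, and Question tables it uses in-memory queues and a list, and one `poll()` call does the work of the two waiting threads that select and delete rows. The pairing rule, in which each reply from User2 is paired with the latest unanswered statement from User1, is an assumption, as are all names in the code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical single-threaded model of the conversation triangle.
// The deques stand in for the Conversation1/Conversation2 tables and the
// list for the Question table; poll() does the work of the waiting threads.
class TriangulationRelay {
    final Deque<String> conversation1 = new ArrayDeque<>(); // statements from User1
    final Deque<String> conversation2 = new ArrayDeque<>(); // statements from User2
    final List<String[]> question = new ArrayList<>();      // {User1 statement, User2 reply}
    final List<String> sentToUser1 = new ArrayList<>();
    final List<String> sentToUser2 = new ArrayList<>();
    private String unanswered; // latest User1 statement not yet paired

    void receivedFromUser1(String text) {
        conversation1.addLast(text);
    }

    void receivedFromUser2(String text) {
        conversation2.addLast(text);
    }

    void poll() {
        String text;
        // Select-and-delete each waiting User1 row and relay it to User2.
        while ((text = conversation1.pollFirst()) != null) {
            sentToUser2.add(text);
            unanswered = text;
        }
        // Relay User2's rows to User1 and pair each reply with the pending statement.
        while ((text = conversation2.pollFirst()) != null) {
            sentToUser1.add(text);
            if (unanswered != null) {
                question.add(new String[] {unanswered, text});
                unanswered = null;
            }
        }
    }
}
```

Because only the relay records pairs, each pair is stored once. Under this pairing rule, if User1 sends a second statement before User2 replies, the reply is paired with the second statement even when it answers the first, which is one way the mismatched pairs discussed in Section 7 can arise.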
Another module was created to generate possible user names and add them
to the buddy list. A base name is entered into the input window, and text is then
appended to it to create different variations of the base name. Base names were
taken from a census of the most popular names: first names, last names, and
combinations of first and last names were used. A name that has been used
before is not used again; this is verified by querying the Users_online table in the
database. The resulting names are added to the buddy list and also stored in the
database, so the same name cannot be used more than once. After a name is
added to the buddy list, it is sent to the MSN server, which lets that user know
that he or she has been added to a buddy list. If the user accepts, his or her
status (online, away, etc.) becomes visible.
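The name-generation loop can be sketched as follows. The text does not specify what is appended to the base name, so the numeric suffix is a hypothetical scheme; the `used` set stands in for the Users_online table, and `Set.add` returning `false` plays the role of the query that rejects a name used before.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the user-name generator.
class NameGenerator {
    // `used` stands in for the Users_online table; new names are recorded in it
    // so that the same name is never generated twice.
    static List<String> variations(String base, int count, Set<String> used) {
        List<String> fresh = new ArrayList<>();
        for (int suffix = 1; fresh.size() < count; suffix++) {
            String candidate = base + suffix; // hypothetical variation scheme
            if (used.add(candidate)) {        // false means the name was used before
                fresh.add(candidate);
            }
        }
        return fresh;
    }
}
```

For example, if smith1 and smith3 are already recorded, asking for three names from the base smith yields smith2, smith4, and smith5.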
Several modules were created to select, delete, insert, and update data in
the tables. The connection to the database is made through Java’s JDBC.
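A minimal JDBC sketch of one such module is shown below, assuming the Question table layout from Figure 1. The `QuestionDao` class and its method names are hypothetical. It uses `PreparedStatement` parameters rather than string concatenation, which keeps apostrophes in chat text (for example, "don't") from breaking the SQL. A connection would be obtained with `DriverManager.getConnection` and a MySQL JDBC URL, with a MySQL JDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical data-access helper; table and column names follow Figure 1.
class QuestionDao {
    static final String FIND_BY_QUESTION =
            "SELECT Question, Answer FROM Question WHERE Question LIKE ?";
    static final String INSERT_PAIR =
            "INSERT INTO Question (Question, Answer) VALUES (?, ?)";

    private final Connection conn;

    QuestionDao(Connection conn) {
        this.conn = conn;
    }

    // Returns {Question, Answer} rows whose question matches a LIKE pattern such as "how%are%you".
    List<String[]> findByQuestion(String likePattern) throws SQLException {
        List<String[]> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(FIND_BY_QUESTION)) {
            ps.setString(1, likePattern);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(new String[] {rs.getString("Question"), rs.getString("Answer")});
                }
            }
        }
        return rows;
    }

    // Stores one complete pair of responses, as the relay does for each exchange.
    void insertPair(String question, String answer) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_PAIR)) {
            ps.setString(1, question);
            ps.setString(2, answer);
            ps.executeUpdate();
        }
    }
}
```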
4.2.1 JMSN Main Functionality
4.2.2 Main Interface
FIGURE 5: MAIN INTERFACE OPTIONS (callouts: General options, User Logged on, Groups)
The Main Interface contains the usual options that are common to other
chat clients. The general options were modified to include the option of adding
several users (Add several buddies).
4.3 Chat Interface
FIGURE 6: CHAT DIALOG
The Response button returns one response based on the received message.
The Get Conv button sets up a conversation between two other users. The
auto-res button automatically responds to every received message.
4.4 Main Interface
FIGURE 7: MAIN INTERFACE (object diagram linking Main, MainFrame, MSNMessenger, MsnAdapter, MsnListener, BuddyTree, BuddyList, BuddyGroup, MsnFriend, LocalCopy, UserStatus, MSNMenuBar, EventViewer, AddConfirmDialog, LoginSplash, NotificationProcessor, DispatchProcessor, ActionGroup, JScrollPane, and Hashtable)
Figure 7 shows the links between the objects in this project. The Main
object starts the application. After the Main object is initiated, an instance of
MainFrame is created. The MainFrame creates instances of the listeners and
other objects shown in the diagram, and these objects make up the main
interface. The MSNMessenger object that is created initiates listeners such as
the MsnListener, which communicates with the MSN server to establish a steady
communication channel. The diagram shows some objects more than once.
These repeated boxes are not new instances: wherever a new variable is
declared, it is assigned (cast to) the existing object, so the repeats are simply
references to the initial object.
Below are some of the main modules used in this project:
4.4.1 ChatDialog:
This module is responsible for parsing conversation received from the
MsnListener. It formats the text sent out to the person involved in the chat and
provides the interface for producing an automated response to the message
received. It is also used to relay conversations between two MSN users and to
send files. Lastly, it is used in the process that determines the most appropriate
response to a statement.
4.4.2 BuddyTree:
This module keeps track of the buddy list and communicates with the
AbstractProcessor module. It receives and transmits updates to the buddy list,
and it holds the actual MSN user objects.
4.4.3 BuddyList:
This module stores all the buddies, groups, and information received
from MSN. It also sorts the users alphabetically within each group.
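The per-group alphabetical ordering can be modelled with sorted collections. This is a hypothetical sketch rather than JMSN's BuddyList class: each group name maps to a sorted set of buddy names, ordered case-insensitively, with an exact comparison as a tie-breaker so that names differing only in case are both kept.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical model of per-group alphabetical sorting (not JMSN's actual class).
class SortedBuddyList {
    // Case-insensitive alphabetical order; exact comparison breaks ties ("Bob" vs "bob").
    private static final Comparator<String> ALPHABETICAL =
            String.CASE_INSENSITIVE_ORDER.thenComparing(Comparator.<String>naturalOrder());

    private final Map<String, TreeSet<String>> groups = new HashMap<>();

    void add(String group, String buddy) {
        groups.computeIfAbsent(group, g -> new TreeSet<>(ALPHABETICAL)).add(buddy);
    }

    // Buddies of one group, already in alphabetical order.
    List<String> buddiesIn(String group) {
        TreeSet<String> buddies = groups.get(group);
        if (buddies == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(buddies);
    }
}
```

Keeping a sorted set per group means the order is maintained on every insertion, so the tree can be redrawn without a separate sorting pass.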
4.4.4 AbstractProcessor:
This module listens for information from the MSN server, such as user
statuses (online, away, offline, etc.) and user information.
4.4.5 MSNMenuBar:
This module represents the menu options on the main application window.
When an option is chosen, it transfers control to the part of the application that
carries out that option.
4.4.6 SelectCommon:
This class makes calls to the database to find common words.
4.4.7 SelectResponse:
This class makes calls to the database to find the response that is
similar to the buddy’s response.
4.4.8 AddFriendsDialog:
This module accepts input from an input dialog window and uses it as a
base name. Text is then appended to this base to create new names, and the
resulting names are added to the buddy list. The BuddyTree object then sends
the list to the MSN server, which checks the status of the newly added IDs.
4.4.9 ChatArea:
This class is responsible for displaying the chat area and keeping track of
the responses between the parties chatting.
4.4.10 MsnListener:
This class is responsible for relaying all of the actions that are
communicated with the MSN server.
4.4.11 MsnFriend:
This object contains all the information pertaining to the individual users
and buddies. It is the object used on the buddy tree to represent each individual
buddy.
4.4.12 Main:
This is the main class that starts the application. It initiates the listener
classes and the classes that log the user in to the MSN server.
4.4.13 MSNMessenger:
This class handles all the events that are part of the MSN set of events.
Common events are unread mail, add buddy failed, who added me, who
removed me, file sent, file received, instant message received, etc.
4.4.14 SwitchboardSession:
This module keeps track of all of the current activities. These activities
are relayed to the MSN server. Some of the activities are current conversations,
invitations, receipt of files, sending of files, processing of messages, who is
typing, who joined the conversation, etc.
5 Effectiveness of the results
The effectiveness of the resulting application depends on the
conversation data collected from the initial users. With simple questions and
responses, the application is clearly effective. However, once a conversation
with the AI chat client became more in-depth, we found that some of the
responses were unrelated. We chatted with some of the other AI chat clients
online and found that this was a common problem: if an AI client does not
understand a statement or question, it returns a vague or unrelated response.
6 A Comparison of other AI Chat clients
Other AI chat clients include A.L.I.C.E., created by the A.L.I.C.E. Artificial
Intelligence Foundation; the ELIZA program; Ella, the winner of the 2002 Loebner
Prize Contest; and The Electronic Brain AI Bot. Later in this section, this project
is compared with some of these AI chat clients.
A.L.I.C.E. is an artificial intelligence natural language chat robot based
on an experiment specified by Alan M. Turing in 1950. The A.L.I.C.E. software
uses AIML, an XML language designed for creating stimulus-response chat
robots. Some view A.L.I.C.E. and AIML as a simple extension of the old ELIZA
psychiatrist program. The comparison is fair regarding the stimulus-response
architecture. However, the A.L.I.C.E. bot currently has more than 40,000
categories of knowledge, whereas the original ELIZA had only about 200.
Another innovation was provided by the web, which made the collection of
natural language sample data possible on an unprecedented scale [6].
A.L.I.C.E. was first implemented in 1995 using SETL, a language based on
set theory and mathematical logic [6]. After chatting with A.L.I.C.E., we found that
it shares common problems with this project. Common statements and questions
are not a problem for either A.L.I.C.E. or this project. However, when a question
or statement oversteps the boundary of being common, the response is vague
and sometimes unrelated. On the technical side, A.L.I.C.E. has a web user
interface, whereas this project is a complete chat client with AI functionality. The
A.L.I.C.E. application was written in Java, as was this project, and its questions
and statements are stored in flat files, although A.L.I.C.E. can also be used with
a database; the database its website recommends is MYSQL. A.L.I.C.E. also
uses XML for logging data to files. Structurally, A.L.I.C.E. is a chat client that can
be accessed through a browser or a local client, and it uses client-server
technology similar to that of this project. A.L.I.C.E. is not connected in any way
to a chat service such as ICQ, MSN, or Yahoo, whereas this project connects to
MSN. Both A.L.I.C.E. and this project go through a stage of acquiring data
(questions or statements) to be used in actual conversations.
Joseph Weizenbaum described the original ELIZA in Communications of
the ACM in January 1966. ELIZA was one of the first programs that attempted to
communicate in natural language [3]. Snippets of conversation between ELIZA
and a person read like a conversation between a psychiatrist and a patient. A
good portion of the responses are manipulations of the actual question or
statement, combined with other words. However, this could have been due to a
lack of saved responses in the parts of the conversation that were examined.
The application finds keywords and patterns in the statement and creates a
response from the matches that are found. The keywords are given weights, and
a response is selected based on these weights. ELIZA is very similar to this
project in the way that the response is selected. The versions of ELIZA that we
researched were scripts that ran in a web browser, and similar problems with the
responses were found when an actual chat session was started. However, what
makes ELIZA a more rigorous chat client is the way an answer is produced when
no related material is stored: as mentioned above, the input statement or
question is manipulated to produce the output response.
“Ella is the winner of the 2002 Loebner Prize Contest for "Most Human
Computer". She is a charming on-line chatterbot with an interface using multiple
images and text display boxes. Ella can play full-featured Blackjack, tell
I Ching fortunes, and performs various useful functions, all with natural language
interaction. A lexical database with more than 120,000 entries is used to assist
her knowledge and usefulness.”[9] After chatting with a version of Ella, we found
that the application appears to be intended for learning purposes. Versions are
available for different purposes, such as math, games, and books. These
applications are loaded with learning material and then distributed, providing an
interactive way for people to learn about different topics. The source is not
available for download, so an analysis of the architecture could not be done. The
application combines voice, images, and video in its responses, and it uses voice
recognition software to handle voice conversations. Ella uses databases to store
the entries for the different books.
7 Issues and problems
Finding the appropriate chat client to work with was very difficult. The
sources that were considered have been found to be incomplete or riddled with
problems. It took some time to understand the code and become familiar with the
clients. Some of the clients were found to be functionally inflexible, thus were not
considered. The client that was chosen was found to be inflexible towards this
project in some ways. The only way possible to find out if someone is logged on
is by adding the person to the buddy list. The reason for this is that objects are
passed between the client and the MSN server for the entire buddy list so it can
be refreshed. This makes it impossible to send only one user to find out his or
her status. Due to this reason, random users are added to the buddy list, which
23
only has a capacity of approximately 100 contacts. Each contacted user must in
turn add the requesting account to his or her own buddy list, which may or may
not happen. After 100 users had been added to the buddy list and a few days had
passed for them to reciprocate, it was disappointing to discover that only a few
of them were ever online. This restricted the number of people available for
conversation, and for this reason a large amount of conversation data could not
be gathered. This may have been a restriction imposed by MSN. Another possibility
was to send a message to a random user without first finding out whether that
person was logged on. Exploring this showed that it works only if the user object
(an MsnFriend object) comes from the MSN server. As a result, the buddy list was
the only way to communicate with random users.
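The presence constraint described above can be modeled in a few lines. The following Java sketch is a hypothetical model and not the JMSN API: the class and method names are invented for illustration, the 100-contact capacity is taken from the approximate figure above, and it assumes that a user's status is visible only when that user is on the list and has added the account in return.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical model of the buddy-list constraint (not the JMSN API).
 * Presence can only be observed for users on the list who also added us back.
 */
public class BuddyListModel {
    // Assumed capacity, based on the approximate limit reported in the text.
    static final int CAPACITY = 100;

    private final Set<String> contacts = new LinkedHashSet<>();    // users we added
    private final Set<String> addedUsBack = new LinkedHashSet<>(); // users who reciprocated
    private final Set<String> online = new LinkedHashSet<>();      // users currently signed in

    /** Adds a contact; returns false if the list is full or the user is already on it. */
    public boolean addContact(String user) {
        if (contacts.size() >= CAPACITY) {
            return false;
        }
        return contacts.add(user);
    }

    /** Records that a listed contact added us in return; ignored for unlisted users. */
    public void markAddedUsBack(String user) {
        if (contacts.contains(user)) {
            addedUsBack.add(user);
        }
    }

    /** Updates the (simulated) sign-in state of a user. */
    public void setOnline(String user, boolean isOnline) {
        if (isOnline) {
            online.add(user);
        } else {
            online.remove(user);
        }
    }

    /** Number of contacts currently on the list. */
    public int size() {
        return contacts.size();
    }

    /** Only reciprocal contacts who are currently online can be chatted with. */
    public Set<String> availablePartners() {
        Set<String> result = new LinkedHashSet<>(addedUsBack);
        result.retainAll(online);
        return result;
    }

    public static void main(String[] args) {
        BuddyListModel list = new BuddyListModel();
        for (int i = 0; i < 120; i++) {
            list.addContact("user" + i); // additions beyond CAPACITY are rejected
        }
        list.markAddedUsBack("user3");
        list.markAddedUsBack("user7");   // reciprocated, but never comes online
        list.markAddedUsBack("user110"); // ignored: never fit on the list
        list.setOnline("user3", true);
        list.setOnline("user50", true);  // online, but never added us back
        System.out.println(list.availablePartners()); // prints [user3]
    }
}
```

Of 120 attempted additions only 100 fit on the list, and of those only a contact who has both added the account back and is online ends up available for conversation, which mirrors how few partners the buddy-list approach produced.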
The data acquired from the conversations was very raw and could have
been cleaned up to make it more useful. Some users still type full words in their
online conversations, so good data can be acquired from them. However, if one
user sends more than one question or statement in a row, or responds slowly to a
statement, the saved statement/response pairs become mismatched. To illustrate,
consider Figure 8 below:
User1 User2
1 Hi How are you doing?
2 What did you do today?
3 Not too much
4 I am great
FIGURE 8: TABLE OF UNMATCHED PAIRS
As Figure 8 shows, a response cannot be accurately saved with the question
it answers unless the messages arrive in order. A user may also respond to
something unrelated to the previous statement, which can produce strange or
unrelated pairs in the database.
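The mismatch can be reproduced with a small sketch. The following Java code is hypothetical and is not the pairing code used in the project: it assumes a simple rule in which each message is saved as the response to the most recent message from the other participant, and it uses an illustrative transcript, similar in spirit to Figure 8, in which User2 asks two questions before User1 answers both.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of naive statement/response pairing (not the project's code).
 * Each message is stored as the "response" to the most recent message
 * sent by the other participant.
 */
public class NaivePairing {

    public record Message(String speaker, String text) {}

    public record Pair(String statement, String response) {}

    public static List<Pair> pair(List<Message> transcript) {
        List<Pair> pairs = new ArrayList<>();
        for (int i = 0; i < transcript.size(); i++) {
            Message current = transcript.get(i);
            // Walk backwards to the most recent message from the other participant.
            for (int j = i - 1; j >= 0; j--) {
                Message earlier = transcript.get(j);
                if (!earlier.speaker().equals(current.speaker())) {
                    pairs.add(new Pair(earlier.text(), current.text()));
                    break;
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Illustrative transcript: User2 asks two questions before User1 answers.
        List<Message> transcript = List.of(
                new Message("User1", "Hi"),
                new Message("User2", "How are you doing?"),
                new Message("User2", "What did you do today?"),
                new Message("User1", "Not too much"),
                new Message("User1", "I am great"));
        for (Pair p : pair(transcript)) {
            System.out.println(p.statement() + " -> " + p.response());
        }
    }
}
```

With this rule the last saved pair is "What did you do today?" -> "I am great", even though "I am great" answers "How are you doing?", which is exactly the kind of unmatched pair the paragraph describes.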
Difficulty was also met when trying to initiate conversation between two
strangers. Most of the unsuspecting users spent more time trying to find out whom
they were chatting with than actually chatting. Once a user found out that the
other person was just as clueless about whom they were chatting with, the
conversation usually ended. For this reason, in-depth conversation was very
difficult to attain.
Collecting data from random users online also has its setbacks. Online users
may type abbreviations instead of full words, which can complicate the parsing
algorithm. Since only some people online use abbreviations, a good match would
not be found unless the database already contained entries with those
abbreviations. Some of the common abbreviations were therefore added to the
common_words table so that they are excluded from the search.
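The exclusion step can be sketched as follows. In the project the excluded words are stored in the MySQL common_words table; this hypothetical Java sketch substitutes an in-memory set, and both the class name KeywordFilter and the sample entries (including abbreviations such as "u", "r", and "lol") are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: drop common words and chat abbreviations before
 * searching stored conversations. The real list lives in the common_words
 * table; an in-memory Set stands in for it here.
 */
public class KeywordFilter {

    // Assumed sample entries; the actual table contents are not listed in the paper.
    private static final Set<String> COMMON_WORDS = Set.of(
            "a", "an", "the", "is", "are", "you", "i", "to", "do", "what", "how",
            "u", "r", "lol", "brb", "ttyl");

    /** Lower-cases the input, splits on non-word characters, and removes common words. */
    public static List<String> keywords(String input) {
        List<String> result = new ArrayList<>();
        for (String token : input.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!token.isEmpty() && !COMMON_WORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(keywords("lol how r u doing today?")); // prints [doing, today]
    }
}
```

Only the remaining keywords would then be used to search the stored conversations, so an abbreviation-heavy message still matches on its meaningful words rather than failing on unknown abbreviations.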
Gaining familiarity with MySQL from both an administrator's and a developer's
perspective was challenging but very rewarding. Through this project we gained
knowledge of setting up a database, assigning privileges, creating indexes, and
similar tasks.
8 Future enhancements
Refining the acquired conversation data would make the responses more
effective. In particular, finding a way to remove responses that are not properly
paired would improve their quality. Creating a method to find only users who are
online would allow a much larger collection of conversations to be accumulated.
Finally, refining the database of common conversations would further increase the
effectiveness of the responses.
9 Conclusion
The effectiveness of this application depends on the data
(conversations) that are captured and stored as a reference. Storing structured,
intelligent conversations created for this purpose would increase the complexity
and intelligence level of the responses.
Several issues may have influenced the data captured. One is that a
response is not always directed at the message immediately before it. Some
responses may be arbitrary or may answer an earlier statement; when these
responses were stored, they were incorrectly paired with the previous or current
message.
This paper described the method used to capture data. This method may
not have been the most effective way to collect it, because most people chatting
online with strangers spend more time figuring out who the other person is than
actually having a meaningful
conversation. When they realize that they are talking to a stranger, they end
the conversation. The method was successful in capturing basic conversational
responses but not in-depth ones.
In this project, an attempt was made to capture natural conversation
responses. However, this data could be edited to make it more meaningful. Some
captured responses may be crude or unpleasant; these could be removed, leaving
only acceptable responses. Such customization steps would be important if a more
focused purpose for the application were determined.
10 References
1. A. M. Turing (1950). Computing Machinery and Intelligence. Mind 59: 433-460.
2. Deitel and Deitel (1999). Java™ How to Program, Third Edition. Prentice Hall, Upper Saddle River, NJ 07458.
3. http://chayden.net/eliza/Eliza.shtml
4. http://freshmeat.net/
5. http://sourceforge.net/
6. http://www.alicebot.org/
7. http://www.botspot.com/search/s-chat.htm
8. http://www.codeproject.com/useritems/AI_Chatbot.asp
9. http://www.ellaz.com/AI/
10. http://www.realtor.org/WebIntell.nsf/0/9d9b6f0d9a364b7886256aa900510d4d?OpenDocument
11. Joseph Weizenbaum (1966). ELIZA: a computer program for the study of natural language communication between man and machine. Commun. ACM 9(1): 36-45.
12. Vikram Vaswani and Pamela Smith (2002). MySQL: The Complete Reference. McGraw-Hill Companies, 2100 Powell Street, 10th Floor, Emeryville, CA 94608.