MIS 531A Data Structure & Algorithm Report of Group Web Mining Project
WEB MINING PROJECT OF 531A
SUMMARY REPORT
Team members: Yi-Jen Ho
Yi-Ling Lin
Howard R. Zhao
Ying Ding
Instructor: Prof. Hsinchun Chen
12/16/2005
INDEX
1. Introduction
2. Research Questions
2.1 Friendly User Interface
2.2 Web-service APIs
2.3 Database and Data Mining Algorithm
3. Literature Review
4. Research Design
4.1 Functionality
4.2 Architecture
5. Findings
5.1 Database Schema / Data Mining Algorithm
5.1.1 Database Design
5.1.2 Schema of Data Tables
5.1.3 Data Mining Algorithm
5.2 APIs
5.3 User Interface
5.4 Sample Scenarios
6. System Novelty
6.1 Data Mining Algorithm
6.2 APIs
6.3 User Interface
6.4 Compare with others
7. Conclusion / Future Directions
Reference
1. Introduction
Standing on the shoulders of giants, we can see further and more widely. Encouraged by
the promise of open access to the huge data warehouses of Amazon, eBay, and Google, our
team arrived at an e-business idea: provide a platform where users can gather the
specific information they need about the products they want to sell or buy.
To realize this idea, we built a website to meet the likely requirements of our target
users and named it “Wishsky”. We expect Wishsky to provide complete product information,
such as retail prices on Amazon, auction prices and seller details on eBay, and current
news about particular items. In the future, Wishsky could become an all-purpose community
that not only serves general Internet users but also brings manufacturers and other
businesses important information.
Based on this idea, we consider five perspectives in constructing the business model of
Wishsky: target customers, services provided, potential business partners, information
resources, and financials. Our target customers are users who want to compare prices
before buying, who prefer buying products online, and who prefer applications that
emphasize community interaction. The services we provide include product information
gathered from eBay and Amazon, Google searches related to customers’ wish items, and
information on consumption trends.
As for potential business partners, we expect to collaborate with e-marketplaces such as
Yahoo! Auction and with online retailers such as Walmart.com. We will draw our
information resources from eBay, Amazon, Google, and our business partners. The financial
opportunities of the platform rest on several sources of business value. In the short
term, we may use advertising and our statistical information to cover the operating
overhead of Wishsky.
In addition, we are going to collect information from the wish-lists on Amazon and from
our own users. After statistical analysis, we will provide sellers with precise
information about their potential market, such as the catalog of the most popular
products, the acceptable price range, and the geographical distribution of buyers. We
will also serve buyers by mining information that helps them get products at the lowest
price and with the best service.
Based on this business model, Wishsky is implemented by accessing the APIs (Application
Programming Interfaces) of Amazon, eBay, and Google, by presenting a user-friendly
environment, by constructing a complete supporting database, and by adopting an efficient
data mining algorithm. Combining these components closely and effectively is a critical
success factor for Wishsky, so we had to address several research questions, which are
detailed below.
2. Research Questions
Once the overall business model of Wishsky was in place, several research challenges had
to be addressed. They fall into three topics: a friendly user interface, web-service
APIs, and the database and data mining algorithm.
2.1 Friendly User Interface
Because Wishsky is designed for sophisticated Internet users as well as older and younger
users, it is very important that Wishsky offer a friendly environment for users of all
ages. This raises three major user interface issues.
First, Wishsky should be very easy to operate, and clear instructions should guide users
with varying levels of information literacy through its functionality. Second, an
interesting and attractive interface is critical for Wishsky to attract more traffic.
Last, a well-designed interface should combine and present all the functionality in
Wishsky.
Therefore, Wishsky uses many easily understandable pictures and figures to teach
first-time users how to operate its functions, so that users do not spend much time on
trial and error. A lively interface should also help the reputation of Wishsky.
2.2 Web-service APIs
According to our initial plan, Wishsky uses three APIs, from Amazon, Google, and eBay.
The APIs of each web-service provider need to be tested because there could be conflicts
between the APIs and the system architecture.
Moreover, combining APIs from different web-service providers requires particular
attention. For instance, how to integrate the different data formats of heterogeneous
APIs is a critical technical issue.
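One way to picture the integration problem is an adapter per provider that maps raw
result fields into a single common product shape. The sketch below is only an
illustration under stated assumptions: the class names (ProductInfo, SourceAdapter,
ResultNormalizer) and the raw field names ("Title", "ListPrice", "ItemTitle",
"CurrentBid", and so on) are hypothetical and are not the real Amazon or eBay response
schemas.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical common shape for one product result, whatever its source. */
final class ProductInfo {
    public final String source;
    public final String title;
    public final BigDecimal price;
    public final String url;

    ProductInfo(String source, String title, BigDecimal price, String url) {
        this.source = source;
        this.title = title;
        this.price = price;
        this.url = url;
    }
}

/** Hypothetical adapter: one implementation per web-service provider. */
interface SourceAdapter {
    ProductInfo toProductInfo(Map<String, String> rawFields);
}

final class ResultNormalizer {
    // The raw field names below are illustrative assumptions, not real API schemas.
    static final SourceAdapter AMAZON = raw -> new ProductInfo(
            "Amazon", raw.get("Title"),
            new BigDecimal(raw.get("ListPrice")), raw.get("DetailPageURL"));

    static final SourceAdapter EBAY = raw -> new ProductInfo(
            "eBay", raw.get("ItemTitle"),
            new BigDecimal(raw.get("CurrentBid")), raw.get("ViewItemURL"));

    /** Converts every raw result from one provider into the common shape. */
    static List<ProductInfo> normalize(SourceAdapter adapter,
                                       List<Map<String, String>> rawResults) {
        List<ProductInfo> out = new ArrayList<>();
        for (Map<String, String> raw : rawResults) {
            out.add(adapter.toProductInfo(raw));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<>();
        raw.put("ItemTitle", "Sony T7 Camera");
        raw.put("CurrentBid", "199.50");
        raw.put("ViewItemURL", "http://example.com/item");
        List<Map<String, String>> rawResults = new ArrayList<>();
        rawResults.add(raw);
        for (ProductInfo p : normalize(EBAY, rawResults)) {
            System.out.println(p.source + ": " + p.title + " at " + p.price);
        }
    }
}
```

With a common shape like this, the page that shows a wish-list item would only need to
handle one kind of record, regardless of which provider returned it.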
To achieve the planned functions, Wishsky uses several Amazon API operations, such as
customers’ wish-list retrieval and item search. The auction search of the eBay API is
used to provide users with details of available auctions. As for Google’s APIs, search is
used to find current news related to products.
2.3 Database and Data Mining Algorithm
To provide an effective mechanism for recommending related products to users, it is
important to construct a correct database schema and to apply a proper and efficient data
mining algorithm. Fortunately, there are many well-designed DBMSs and data mining
applications and program packages that can be adopted with minor revisions. The
significant questions are therefore how to choose a proper DBMS, how to design an
effective database schema, how to decide among the four major types of data mining
operations (predictive modeling, database segmentation, link analysis, and deviation
detection), and how to choose a proper algorithm package to build into Wishsky.
In addition, the time complexity of the algorithm is another major issue for the data
mining implementation. Running the recommendation algorithm is time-consuming when the
volume of data is very high, so the choice between real-time processing and back-end
processing significantly affects the overall performance of Wishsky.
3. Literature Review
How to expand the market is always one of the central concerns in business. Collecting
customers’ wishes has proved to be one of the most efficient ways to understand what the
market needs, and it has been widely adopted in business. Based on customers’ wish-lists,
businesses send them product information via various channels, such as email and
personalized dynamic web content.
This strategy gives businesses a useful channel for advertising and, at the same time,
keeps customers informed about the products they are interested in. Many businesses, such
as www.half.com, have integrated this method into their marketing strategies. However,
many businesses are not satisfied with this simple use of the information in customers’
wish-lists and go further to extract more value from it. Various recommendation
algorithms, such as association rules and clustering algorithms, are designed to identify
the items customers may be interested in based on their shopping history or the items
already in their wish-lists, so that the business can develop a wider market. The logic
behind the recommendation rules varies and may come from market analysis or from studies
in social science or psychology, yet all of them are created to expand the sellers’
market. A typical representative, and one of the pioneers in deploying this strategy, is
Amazon.
However innovative the techniques these businesses apply to make full use of customers’
wish-lists, they share one thing in common: they are trying to sell what they want to
sell. Our project instead sets out to provide a quite different kind of service to
customers who create wish-lists on our website, where they can truly enjoy being served.
4. Research Design
4.1 Functionality
The main function of the Wishsky application is to allow users to add and manage Amazon
items in a feature-rich, wish-list-based setting. There are two ways users can add items
to their wish-list. The first is to look up a wish-list on Amazon using an email address.
The second is to search for items by item type (e.g., Book, DVD) and keywords. Either
way, a list of items is displayed with a thumbnail image (if available), title, and list
price. There is also a link to the item on the Amazon website, so users can see more
details about the item. To add an item to the Wishsky wish-list, the user selects the
checkbox next to it and clicks the “Add Items to this Wishlist” button. A user can also
delete items from his or her wish-list by clicking the “Delete” button next to the item.
The next major function of the application is the integration with eBay and Google. Once
a user is done managing the wish-list, the next webpage shows the items on the user’s
wish-list. Above it is a list of news items from Google related to a wish-list item.
Clicking on a wish-list item displays a popup with auction items found on eBay. The
bottom part of the page displays recommended items. Recommendations are generated in
three ways: by association rules, from popular items in the wish-lists of the user’s
friends, and from the most popular items among all users.
4.2 Architecture
The above diagram displays the basic architecture of the Wishsky application. Wishsky is
a Java application and uses a variety of Java libraries to interact with its components.
Users access the application through a web browser (e.g., Firefox or Internet Explorer),
and the browser uses HTTP requests to communicate with our application, which is hosted
in a Tomcat web container. Using JDBC, Wishsky connects to a SQL Server database. The
database is used for saving customer information and wish-list items, and also for data
mining. Finally, Wishsky connects to eBay, Google, and Amazon via Web Services. Using
SOAP calls, the application can dynamically request and receive information from each
site.
5. Findings
After constructing the basic architecture of Wishsky and linking all the important
components together, we are pleased to detail what we found while implementing Wishsky.
5.1 Database Schema / Data Mining Algorithm
To operate effectively and efficiently, Wishsky needs a supporting database to record
users’ data and to store product-related information retrieved from Amazon, Google, or
eBay. Moreover, for efficiency, the back-end recommendation of Wishsky, which is
implemented with association rules, also needs the database to record the rules generated
by the Apriori algorithm. Hence, there are two groups of data tables, detailed below.
5.1.1 Database Design
The first group of data tables stores user-related data, such as users’ basic profiles,
their wish-list information, the items in their wish-lists, and their friend lists. The
second group records the association rules generated every day by the back-end data
mining algorithm. As the following figure shows, the relationships between the data
tables represent how Wishsky records user-related information and data mining results.
[Figure: relationships among the Customer, Friend, Wishlist, Item, and Association tables]
5.1.2 Schema of Data Tables
Table Name: Customer
Key  Column                Type      Length  Null
PK   C_id                  int       4
     C_account             char      20
     C_pwd                 char      20
     C_amazon_email        nvarchar  50
     C_amazon_listid       char      10      Null
     C_ebay_id             nvarchar  50      Null
     C_ebay_pwd            nvarchar  20      Null
     C_first_name          nvarchar  20
     C_last_name           nvarchar  20
     C_gender              char      1
     C_marry               char      1
     C_birth_year          int       4
     C_birth_month         int       4
     C_birth_day           int       4
     C_profession          nvarchar  15
     C_income              char      6
     C_state               char      2
     C_city                nvarchar  15

Table Name: Wishlist
Key  Column                Type      Length  Null
PK   W_id                  int       4
FK   C_id                  int       4
     W_name                char      20

Table Name: Item
Key  Column                Type      Length  Null
PK   I_id                  bigint    8
FK   W_id                  int       4
     I_amazon_id           nvarchar  50
     I_name                nvarchar  50      Null
     I_quantity            int       4       Null
     I_priority            int       4
     I_lowestprice         decimal   9
     I_set_lowprice        decimal   9       Null
     I_set_highprice       decimal   9       Null

Table Name: Friend
Key  Column                Type      Length  Null
PK   F_no                  bigint    8
     C_Account_one         char      20
FK   C_Account_two_id      int       4
     C_Account_two         char      20
FK   C_Account_two_email   nvarchar  50
     C_Account_two_W_id    int       4
     F_Nickname            nvarchar  100     Null

Table Name: Association
Key  Column                Type      Length  Null
PK   A_no                  bigint    8               0
FK   I_one                 char      20              0
FK   I_two                 char      20              0
     A_confidential        decimal   9       Null
5.1.3 Data Mining Algorithm
Wishsky is user-friendly in that it provides recommended product information, including
the product picture, title, and retail price on Amazon, so that users notice things they
need but have forgotten. To recommend suitable products, Wishsky has three recommendation
mechanisms: “Associated Items”, “Friend’s Wish-Items”, and “Hot Product”.
First, “Associated Items” is implemented by adapting the Apriori algorithm in the WEKA
software package. As mentioned earlier, we run the data mining process as a back-end job
for efficiency, as shown in the following picture. Apriori is commonly used to find
associations between items across transaction records, so we adapted it for the first
type of recommendation.
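To make the idea of an association rule concrete, the sketch below treats each wish-list
as one transaction and computes the confidence of "item one implies item two" as the
number of wish-lists holding both items divided by the number holding item one. It is a
simplified pairwise pass written for illustration, not the WEKA implementation the
project adapted; the class name PairRuleMiner and the two thresholds are assumptions. The
three fields of each rule are chosen to mirror the I_one, I_two, and A_confidential
columns of the Association table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified pairwise association rules; an illustration, not WEKA's Apriori class. */
final class PairRuleMiner {
    /** One rule: wish-lists that hold itemOne tend to hold itemTwo as well. */
    static final class Rule {
        public final String itemOne;
        public final String itemTwo;
        public final double confidence;

        Rule(String itemOne, String itemTwo, double confidence) {
            this.itemOne = itemOne;
            this.itemTwo = itemTwo;
            this.confidence = confidence;
        }
    }

    /**
     * @param wishLists     one set of item ids per wish-list (the "transactions")
     * @param minSupport    minimum number of wish-lists that must hold both items
     * @param minConfidence minimum value of count(both) / count(itemOne)
     */
    static List<Rule> mine(List<Set<String>> wishLists, int minSupport, double minConfidence) {
        Map<String, Integer> itemCount = new HashMap<>();
        Map<String, Map<String, Integer>> pairCount = new HashMap<>();
        for (Set<String> list : wishLists) {
            for (String a : list) {
                itemCount.merge(a, 1, Integer::sum);
                for (String b : list) {
                    if (!a.equals(b)) {
                        pairCount.computeIfAbsent(a, k -> new HashMap<>())
                                 .merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        List<Rule> rules = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> entry : pairCount.entrySet()) {
            String a = entry.getKey();
            for (Map.Entry<String, Integer> pair : entry.getValue().entrySet()) {
                int together = pair.getValue();
                if (together < minSupport) {
                    continue; // the pair is too rare to be trusted
                }
                double confidence = (double) together / itemCount.get(a);
                if (confidence >= minConfidence) {
                    rules.add(new Rule(a, pair.getKey(), confidence));
                }
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        List<Set<String>> wishLists = new ArrayList<>();
        wishLists.add(new HashSet<>(Arrays.asList("camera", "memory card")));
        wishLists.add(new HashSet<>(Arrays.asList("camera", "memory card")));
        wishLists.add(new HashSet<>(Arrays.asList("camera", "tripod")));
        for (Rule r : mine(wishLists, 2, 0.6)) {
            System.out.println(r.itemOne + " -> " + r.itemTwo + " (" + r.confidence + ")");
        }
    }
}
```

Because the rules are computed in a batch and stored, showing recommendations for a
wish-list item only requires looking up the stored rows whose first item matches it.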
Second, we implement “Friend’s Wish-Items” by finding the most popular products across
your friends’ wish-lists, on the assumption that birds of a feather flock together. As a
result, Wishsky recommends to you what your friends like. The last mechanism is hot
product recommendation: the most popular items are recommended to users.
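The friend-based mechanism amounts to counting how often each item appears in a set of
wish-lists and ranking by that count. The following is a minimal sketch under stated
assumptions: the class name FriendRecommender is hypothetical, and skipping items the
user already has and breaking ties alphabetically are illustrative choices that the
report does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Ranks items by how many of the given wish-lists contain them. */
final class FriendRecommender {
    /**
     * @param ownItems        items already on the user's wish-list (skipped; an assumption)
     * @param friendWishLists one set of item ids per friend
     * @param limit           maximum number of recommendations to return
     */
    static List<String> topFriendItems(Set<String> ownItems,
                                       List<Set<String>> friendWishLists,
                                       int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> list : friendWishLists) {
            for (String item : list) {
                if (!ownItems.contains(item)) {
                    counts.merge(item, 1, Integer::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Most frequent first; ties broken alphabetically so the order is stable.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) {
        Set<String> own = new HashSet<>(Arrays.asList("dvd player"));
        List<Set<String>> friends = new ArrayList<>();
        friends.add(new HashSet<>(Arrays.asList("dvd player", "camera")));
        friends.add(new HashSet<>(Arrays.asList("camera", "book")));
        friends.add(new HashSet<>(Arrays.asList("book", "camera", "headphones")));
        System.out.println(topFriendItems(own, friends, 2));
    }
}
```

Passing the wish-lists of all users instead of only the friends' lists gives the same
ranking for the “Hot Product” mechanism.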
5.2 APIs
When these three major Internet companies (eBay, Google, and Amazon) announced that they
would provide free access to their wealth of data, the Internet community applauded. The
ways they each allow access to their data are similar in some respects but different in
others. Fortunately, they all provide a SOAP interface that can be used from Java, and
some provide the Java libraries themselves. This made it easier to integrate each web
service with the Wishsky application. Another similarity is that we were given unique ID
strings to attach to each SOAP request. Through this method, each company can monitor and
limit the number of requests made to its system by a specific application. During our
testing and development, the request limits were never an issue, but if our application
were used by hundreds or even thousands of customers, they would become a major issue.
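If request limits did become a concern, the application could track its own usage per ID
string before calling a provider. The sketch below is a hypothetical client-side counter,
not part of any provider's library: the class name DailyQuota is invented, and the limit
is a parameter because the report does not state the actual quotas.

```java
import java.time.LocalDate;

/** Hypothetical per-key counter that refuses calls once a daily limit is reached. */
final class DailyQuota {
    private final int limitPerDay; // assumed value; the real provider limits are not given
    private LocalDate day;
    private int used;

    DailyQuota(int limitPerDay, LocalDate today) {
        this.limitPerDay = limitPerDay;
        this.day = today;
        this.used = 0;
    }

    /** Returns true and counts the request if the key still has requests left today. */
    synchronized boolean tryAcquire(LocalDate today) {
        if (!today.equals(day)) { // a new day started, so the counter resets
            day = today;
            used = 0;
        }
        if (used >= limitPerDay) {
            return false;
        }
        used++;
        return true;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2005, 12, 16);
        DailyQuota quota = new DailyQuota(2, today);
        for (int i = 1; i <= 3; i++) {
            System.out.println("request " + i + " allowed: " + quota.tryAcquire(today));
        }
    }
}
```

A request that is refused could fall back to the product data already stored in the
database instead of failing.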
Because each web service provides a different level of functionality, the effort required
to set up each interface differed noticeably. eBay took by far the most time, because it
allows not only searching for auctioned items but also bidding and posting items for
sale. As a result, eBay requires not one but three credentials just to do the simplest
search: a real eBay account, a registered runame (a unique name for our Wishsky
application), and a long unique string (token) associated with the eBay account. The
scope of our application did not involve bidding or selling items on eBay, only
searching, so a single eBay account and token string sufficed for the whole application.
Otherwise, each Wishsky user would have been required to register an eBay account and go
through an authorization process. In comparison, Amazon and especially Google were
relatively easy to set up, though not without difficulties.
Google provided a jar file with the Java classes needed to access Google searches
quickly. However, we found that the supplied jar file was not compatible with Tomcat. We
had to discard the jar and generate our own SOAP classes using the Apache Axis WSDL2Java
tool. Amazon is more similar to eBay, because we use it to search for items, either
through an Amazon wish-list or by keyword. However, it does not allow posting items for
sale, so there is no extensive authorization process.
5.3 User Interface
The design rationale for the platform’s interface is to give our customers hope and to
make their dreams come true. Because we believe everybody is happy to see a beautiful
rainbow in the sky, we designed the interface with a sky and a rainbow. We also hope our
customers are happy when they visit our website, so we designed two happy little
characters, a boy and a girl, to welcome them and give them a family feeling. In short,
we designed the interface around these expectations:
Give users the impression of being in a dream
Make the interface appealing to multiple age groups
Be family friendly
5.4 Sample Scenarios
When a new user visits Wishsky, he can easily find which products are popular. Clicking
the picture of a product links directly to the related Amazon page, where he can get more
details about the product. Wishsky also provides important news and review information
for popular products. After the user clicks the title of a product, the right frame shows
a list of related links generated from a Google search. For instance, if he would like to
read the first news item, a pop-up window links directly to that webpage.
On this page, Wishsky also provides the function of searching for a wish-list on Amazon
by entering the correct email address.
After entering basic data and clicking submit, the newly registered user sees a
user-friendly page with a clear guide that teaches first-time users how to create their
own wish-lists on Wishsky. First, the user can directly import desired items from his
Amazon wish-list using his Amazon email address. After importing it successfully, he can
choose items and add them to his Wishsky wish-list.
Alternatively, users can type keywords from a product title and select a product category
to search for products on Amazon. For instance, a user who would like to find products
related to “Sony T7 Camera” can easily find what he wants and add it to his wish-list.
When users finish the initial wish-list setup, they click done to go to the next page,
which is a personalized page. Wishsky tries to build an environment that is just for you,
so you see your name, a cute icon based on your gender, and your wish items on this page.
Again, Wishsky provides critical product information based on your wish items. Moreover,
when users click on an item title, a list of available auctions for that item pops up.
Users who want to see more details can click the link to eBay.
In Wishsky, users can maintain their basic data, including password, personal profile,
and friend list. To add friends, users only need to enter their friends’ IDs. After that,
they can easily see what their friends want as Christmas gifts.
In addition, Wishsky provides three different mechanisms to recommend items users might
be interested in. The first is based on the results of the association rule algorithm:
Wishsky shows recommended items derived from association rules over our customers’
wish-lists. The second is based on your friends’ wish-lists, so Wishsky recommends to you
what your friends like. The last is based on the most popular items on our website, so
that users can keep up with current trends.
6. System Novelty
Wishsky is a website that integrates product information from Amazon, Google, and eBay
and, at the same time, provides a friendly interface through which users of all ages can
operate its functions. In this chapter, we discuss what is novel about Wishsky.
6.1 Data Mining Algorithm
Wishsky implements a recommendation function based on the results generated by the
Apriori algorithm. This kind of recommendation is among the most common and is applied by
many electronic retailers, such as Amazon.
In addition, Wishsky offers another recommendation mechanism based on the most popular
items that appear repeatedly in the wish-lists of a user’s friends. This combination of
wish-lists and friend lists is the distinctive innovation of Wishsky; as far as we know,
Wishsky is the only community based on wish-lists. As a result, Wishsky has the potential
to become a new kind of community that provides many business opportunities.
6.2 APIs
As far as we know, our application is the only one that interfaces with eBay, Amazon, and
Google together. This allows users to receive a great deal of relevant information
quickly. Users could open multiple browser windows and search each site separately, but
it would be tedious to search for many items that way, and there would be no easy way to
view all the information on one page.
6.3 User Interface
Following the design rationale of a family-friendly interface, we not only designed a
friendly interface for different customers but also provide clear directions for the
different functions. Our interface uses soft colors and simple, childlike logos. Each
function comes with careful directions that help customers use it smoothly.
6.4 Compare with others
Our project differs from others not only technologically; its business idea is also
interesting and fresh. This shows in two respects.
The first is that we create a new kind of business community. A business community is a
place where customers with similar business interests gather. In the community we create
in our project, customers know each other’s wishes, which is very convenient for them.
For instance, you can avoid the embarrassment of sending an inappropriate gift to a
friend if you know what he or she really wishes for. And you will probably be delighted
to receive from a friend the very gift you have long been hoping for, since he or she has
a chance to know your wishes. As far as we know, this kind of business community has not
been seen before.
7. Conclusion / Future Directions
We are glad to see what we achieved after a long semester of effort: a business website
with an innovative business idea, comprehensive and considerate service, a friendly and
warm interface and, most importantly, promising market value. This is the product of a
cooperative and hard-working team. At the beginning of the project, we proposed several
different plans, each of which sounded reasonable; we respected each other’s ideas and
discussed them without bias in order to choose the best one.
During the design period, we each did what we could, and no one questioned whether the
distribution of work was fair. We helped each other out and never held anything back. We
learned from each other and encouraged each other whenever we felt discouraged. Because
we lacked practical experience in developing complete systems, the project was often
painful, as we kept coming up with more requirements than planned while still having to
meet the schedule. But the best thing was that we never gave up, and we came to
understand that things can always be improved but never made completely perfect. In the
end, we were very happy to see the good responses from the professor and our classmates
during the demo. We sincerely hope that all users of “Wishsky” realize their wishes as we
did.
To continue developing Wishsky, we plan to create more value-added functions, design more
interesting interfaces, and integrate more web-service APIs. The major future plans fall
into the following three directions.
Data Mining Algorithm
At present, Wishsky recommends products based on product associations across all users’
wish-lists, product associations in users’ friends’ wish-lists, and hot products in
Wishsky. Although the current recommendation solution meets users’ needs, it could still
be improved.
In our future plan, we would like to adopt ART (Adaptive Resonance Theory network) to
group our users based on their basic attributes, which is why we ask new registrants to
fill out many personal attributes. Because ART is an unsupervised machine learning
algorithm, it is useful for categorizing users.
Users would first be divided into interest groups, and the Apriori association rule
algorithm would then be used to find product association rules among the wish-lists of
users in the same interest group. In this way, Wishsky could make more effective and
accurate product recommendations.
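The two-stage plan can be sketched as follows: once every user has a group label, item
pairs are counted separately inside each group, and those per-group counts would feed the
rule mining. This is only an illustration under stated assumptions. The clustering step
itself (ART) is not implemented here, so the group labels are taken as given, and the
class name GroupedPairCounter and the "a|b" pair key are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Counts co-occurring item pairs separately for each user group. */
final class GroupedPairCounter {
    /**
     * @param groupOfUser    user id to group label (assumed to come from a clustering step)
     * @param wishListOfUser user id to the item ids on that user's wish-list
     * @return group label to ("itemA|itemB" to number of wish-lists holding both)
     */
    static Map<String, Map<String, Integer>> countPairsByGroup(
            Map<String, String> groupOfUser,
            Map<String, Set<String>> wishListOfUser) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : wishListOfUser.entrySet()) {
            String group = groupOfUser.get(entry.getKey());
            if (group == null) {
                continue; // user has not been assigned to a group yet
            }
            Map<String, Integer> counts = result.computeIfAbsent(group, g -> new TreeMap<>());
            // Sort the items so each unordered pair always produces the same key.
            List<String> items = new ArrayList<>(new TreeSet<>(entry.getValue()));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "|" + items.get(j), 1, Integer::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> groups = new HashMap<>();
        groups.put("u1", "students");
        groups.put("u2", "students");
        groups.put("u3", "parents");
        Map<String, Set<String>> wishLists = new HashMap<>();
        wishLists.put("u1", new HashSet<>(Arrays.asList("laptop", "backpack")));
        wishLists.put("u2", new HashSet<>(Arrays.asList("laptop", "backpack", "desk lamp")));
        wishLists.put("u3", new HashSet<>(Arrays.asList("stroller", "camera")));
        System.out.println(countPairsByGroup(groups, wishLists));
    }
}
```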
APIs
Web Services are still a new technology that web and desktop applications are just
starting to use. Most likely, this technology will evolve and change rapidly. To take
advantage of new features and bug fixes, Wishsky will need to update to newer versions on
a regular basis. Also, because of time limitations, the Wishsky application can currently
only search for items on eBay and Amazon. Future development may allow users to actually
buy items or post items for auction on eBay. A key idea of Wishsky is to allow users to
see buyer demand before auctioning items and to find out which items are popular and
likely to receive more bids. It seems only natural to also let users quickly auction an
item without ever leaving our interface; otherwise, they may be tempted to use a
competing application’s software.
User Interface
In the future, we could provide more functions for customers to personalize their own
interface. Customers could upload a personal photo to serve as the welcome logo and
adjust the interface colors to their preferences. In addition, they could design and
rearrange each block of the interface according to their interests, so that the interface
shows the functions each customer finds interesting.
Meanwhile, the platform will dynamically show different information and functions to
different customers based on their individual interests and settings. In brief, we hope
to provide a personalized space where our customers can enjoy all the information they
want.
We hope that our product, Wishsky, makes not only our clients’ dreams come true but also
ours. For the success of our project, we are grateful for the instruction of Dr. Chen and
for the help of the teaching assistant Xin Li and the other teams.
Reference
Website
1. http://www.half.com
2. http://www.amazon.com
3. http://www.google.com
4. http://www.ebay.com
5. http://www.cs.waikato.ac.nz/ml/weka/
6. http://www.amazon.com/gp/aws/landing.html
7. http://www.google.com/apis/
8. http://developer.ebay.com
Paper
1. Yung-Hsiang Chiu, “The Study of Applying Neural Network and Data Mining
Techniques to Course Recommendation Base on E-learning Environment”, 6 June
2003
2. Guan-Hua Sun, “A Data Mining Methodology for Library New Book
Recommendation”, July 2000