7/27/2019 Feature Grading (1)
http://slidepdf.com/reader/full/feature-grading-1 1/6
The Design and Implementation of Feature-Grading
Recommendation System for E-Commerce
Luo Yi, International School
Beijing University of Posts and Telecommunications
Beijing, China, 100876 [email protected]
Fan Miao, School of Software Engineering
Beijing University of Posts and Telecommunications
Beijing, China, 100876 [email protected]
Zhou Xiaoxia, School of Insurance and Economics
University of International Business and Economics
Beijing, China, [email protected]
Abstract – In this paper we present a novel approach named Feature-Grading, a comprehensive algorithm for recommending commodities in e-commerce. It integrates feature mining, sentimental analysis, and the records of customers' historical behaviors. The overall process of Feature-Grading can be separated into 5 key steps: (1) extracting the overall feature set of a category of commodities; (2) extracting the modifier set and the negative word set; (3) acquiring the specific feature set and the feature assessment set; (4) acquiring the specific feature weight set; (5) acquiring the item weight set. After these 5 steps, we are able to grade and rank all items with the resulting grading equation, and the top-ranked items that match a customer's needs can then be recommended. Moreover, we use real information on mobile phones and their reviews from the well-known e-commerce website Amazon.cn as experimental data, and discuss some important results which indicate that Feature-Grading works well. Finally, we briefly introduce the prototype recommendation system we developed on the basis of Feature-Grading.
Keywords – Feature-Grading; Feature mining; Sentimental
Analysis; Historical behaviors; Recommendation
I. INTRODUCTION
In e-commerce, there are two major approaches by which customers meet items face-to-face. One is called “Customer-active”, which is achieved by customers themselves through search engines. The other is accomplished by merchants with a recommendation system that recommends commodities; we call it “Items-active”.
For “Customer-active”, what a customer enters into a search engine reveals what he/she wants. Existing search engines for commodities use techniques similar to those for ordinary web pages, which are based on keyword matching, meaning that items saved in the database must be tagged with enough keywords. Most such keywords, however, are appended manually by merchants. This mechanism is inefficient, and it is easy to neglect some vital features. If a system could automatically mine the key features (i.e. the keywords) of a category of items, the tagging process could be completed with less manual work, improving overall efficiency. This is our first task, since feature mining not only benefits existing “Customer-active” search approaches but also serves as the foundation of our proposed recommendation algorithm.
As for “Items-active”, there is more to say, because it is what a recommendation system actually performs. Since the birth of e-commerce, many recommendation algorithms have arisen. A recent and popular method is Collaborative Filtering [1, 2, 3]. It has two typical types: user-based and item-based. The main idea of the user-based type is that many users have similar purchasing behaviors, so they can be put into the same group; once a member has bought a certain item, this item is recommended to the other members of the group. The item-based type instead connects similar commodities rather than users: if an item is purchased, a similar one may be recommended. The integration of the two achieves relatively good performance, which has led to the wide use of Collaborative Filtering in contemporary large e-commerce websites [4, 5].
However, such algorithms fail to consider the diverse assessments and reviews of each item. Therefore, many low-rated items are sometimes recommended merely because they are similar to what the user has purchased. A better system should know how to rank recommended commodities and provide items that are both related and highly appreciated. This involves evaluation, which, in common sense, can only be done by customers. Therefore, our task is to analyze customer reviews and extract their sentimental orientation to accomplish the final grading and ranking process.
Besides, we believe the current general model of recommendation will gradually become more personalized. That is why we further propose an improved algorithm which can make personal recommendations to a specific customer based on his/her historical behaviors.
Thus, a more complete, reliable, and personalized recommendation algorithm is proposed in this paper on the basis of practical business demands and the drawbacks of existing systems. We call it the Feature-Grading algorithm. We have also developed a corresponding prototype system (see Fig.1). Our Chinese experimental data on multi-brand mobile phones and their reviews come from Amazon.cn.
Proceedings of the IEEE International Conference on Information and Automation, Shenzhen, China, June 2011
978-1-61284-4577-0270-9/11/$26.00 ©2011 IEEE
Fig.1 The prototype system of Feature-Grading algorithm
II. THE DESIGN OF ALGORITHM
The overall process of the Feature-Grading algorithm is designed as shown in Fig.2. At the beginning, only commodities and their reviews are stored in the database.
Fig.2 The overall process of Feature-Grading algorithm
By following the arrows we can attain the final grading equation for recommendation. Now we move on to the details.
A. Extracting overall feature set of a category of commodities
This critical step is the base of the whole system.
First, we use ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) [6] to split each review into independent words with part-of-speech tags. With further optimization, we obtain satisfactory splitting and tagging results. An example is shown below.
Fig.3 An example of a split and tagged review
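To make the tagged output concrete, here is a minimal Python sketch (not the authors' code) of the simple Word Frequency Count idea discussed in this subsection. It assumes the segmenter emits whitespace-separated `word/tag` tokens and that nouns carry the plain tag `n`; the function names, the English-glossed sample reviews and the `top_k` cutoff are hypothetical.

```python
from collections import Counter


def parse_tagged(review):
    """Split 'word/tag word/tag ...' into (word, tag) pairs; the tag follows the last '/'."""
    pairs = []
    for token in review.split():
        word, sep, tag = token.rpartition("/")
        if sep and word:          # ignore tokens that carry no tag
            pairs.append((word, tag))
    return pairs


def frequent_nouns(reviews, top_k=10):
    """Word Frequency Count: the top_k most frequent nouns (tag 'n') over all reviews."""
    counts = Counter(word
                     for review in reviews
                     for word, tag in parse_tagged(review)
                     if tag == "n")
    return [word for word, _ in counts.most_common(top_k)]


reviews = ["screen/n very/d clear/a",
           "battery/n not/d durable/a",
           "screen/n big/a battery/n price/n"]
print(frequent_nouns(reviews, top_k=2))   # ['screen', 'battery']
```

Ties are returned in first-seen order, which is how `Counter.most_common` orders equal counts; a real pipeline would also handle noun sub-tags and combine this list with the Apriori results.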
After getting the desired reviews, we now need to extract typical features from these separate words. According to Chinese expression habits, features are always nouns, which are tagged with “/n”.
We first came up with the idea of using Association Mining, typically the Apriori Algorithm [7], to identify features. This is achieved by mining associated words that appear frequently in the reviews. The reason to use it is that once a customer comments on an item, associated words are involved. By processing all the reviews with the Apriori Algorithm, we can eventually acquire an appropriate feature set.
Another plain method is Word Frequency Count: normally, features are nouns and occur more frequently than other words in a sentence, so we can simply filter the top-ranked words to form the final feature set. It is quick but less reliable.
In practice, we integrated these two methods and applied proper manual optimization. We denote the eventual satisfactory feature set of a category of commodities as $F = \{f_1, f_2, \dots, f_m\}$, where m is the total number of features.
B. Extracting modifier set and negative word set
For the subsequent sentimental analysis, we first have to recognize modifiers in reviews and judge their orientation. In the split reviews, modifiers are always the words tagged with “/a” and “/v”. Here we only consider polarized modifiers, namely those that can only be positive or negative. We use a simple and effective way, involving the modifier synonym groups in WordNet [8], to identify a modifier and its orientation.
In WordNet, synonyms are distributed into one group, so we can simply treat words in the same group as having the same sentimental orientation (see Fig.4).
Fig.4 Polarized modifier synonym group in WordNet
We initially pick some qualified modifiers by hand as seeds and mark them with labels. Then, when dealing with a certain modifier, we first check whether the modifier is itself a seed. If yes, its orientation can be judged directly; if not, we traverse the synonyms of the modifier in WordNet. Once a synonym turns out to be a seed, the orientation of the original modifier can be judged through that synonym.
Additionally, we add each newly judged modifier into the database so that
the stored seed group can be enhanced and the subsequent judgment accuracy can be improved. We denote the final satisfactory modifier set as $M$.
Then we need to extract negative words, since they also contribute to the final sentimental identification. Negative words are always tagged with “/d”, so we can pick them out manually. The negative word set is denoted as $N$.
C. Acquiring specific feature set and feature assessment set of an individual item
After obtaining the feature set of a whole category of commodities, we are able to acquire the specific feature set of each individual item through its reviews. This specific feature set is denoted as $F_j = \{f_{j1}, f_{j2}, \dots, f_{jk_j}\}$, where $k_j$ is the number of known features the item owns. In fact, the reviews in the database cannot cover all the features of an item, so uncommented features are unknown to us, and we have no right to decide them subjectively. Therefore, items in the database can only be treated as if they do not have the unknown features. Obviously, $F_j$ is a subset of $F$, and different items have different $F_j$. We then carry out sentimental analysis on the reviews to get the assessment set.
Particularly, for the j-th item, its specific feature set is $F_j$, where the second index i of $f_{ji}$ refers to the feature number. Now it is time to summarize the number of positive assessments, denoted $p_{ji}$, and the number of negative ones, denoted $n_{ji}$, for the feature $f_{ji}$, in order to attain the desired assessment set $A_j = \{a_{j1}, a_{j2}, \dots, a_{jk_j}\}$, which is in one-to-one mapping with the specific feature set $F_j$, where $a_{ji} = (p_{ji}, n_{ji})$.
We can put $F_j$ and $A_j$ into a table; Fig.5 is an example.
Fig.5 The specific feature set and feature assessment set of an example item
The question is how to get $p_{ji}$ and $n_{ji}$. Here we provide two approaches: one is called Window Mechanism (WM) and the other is called Syntax Matching (SM).
a. Window Mechanism (WM)
WM is a pragmatic, applied method based on Chinese lexical analysis and part-of-speech tagging, proposed by Fan Miao [9]. Its principle scheme is shown in Fig.6.
WM makes use of the nearest-association principle, meaning that the nearest modifier is supposed to have the greatest influence on the headword, namely the feature, according to most idiomatic expressions in Chinese. Starting from the headword, it searches for the nearest modifier and identifies its orientation in both the forward and backward directions within a set range, the so-called Big Window. But this is not adequate, since there may exist negative
words or double negation that can change the orientation. We therefore introduce WM again, regarding the found modifier as the new headword and defining a new range named the Small Window. Similarly, starting from the new headword, it searches for negative words in both directions, but only within the intersection of the Big Window and the Small Window. If the number of discovered negative words is even, the original orientation remains the same, while the original orientation is reversed if the number is odd. Thus the identification accuracy is improved.
Fig.6 The principle scheme of Window Mechanism
b. Syntax Matching (SM)
SM takes a completely different approach from WM. Its core idea is the training of syntax paths. A syntax path here refers to the shortest path from the feature to the modifier in the parse tree. To illustrate it, Fig.7 shows a syntax path from the headword “Deposit” to the modifier “New”.
Fig.7 An example of a syntax path of a common Chinese sentence
The syntax path is UP@NP#UP@NP#DOWN@CP#DOWN@CP#DOWN@IP#DOWN@VP, where UP/DOWN means going up/down, @TAG represents the node with that tag, and # stands for the partition between different actions.
We have used abundant natural language materials as training data to acquire 3319 distinct types of syntax paths as models, which are stored in the database. Therefore, we can compare a newly obtained syntax path with those in the database. Once they match successfully, the relationship between the headword and the modifier in the processed review can be recognized, and so can the sentimental orientation.
Besides, SM has an extra advantage since it is based on machine learning, meaning that a new path can be added to the database so that the models are upgraded and the accuracy increases.
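As an illustration of how such a path could be computed and looked up, here is a minimal Python sketch assuming a parse tree with parent links. The `Node` class, the English-glossed toy tree (a relative-clause structure chosen so that its path has the same UP/DOWN sequence as the path given in the text) and the `models` set, a stand-in for the 3319 stored models, are all hypothetical.

```python
class Node:
    """Parse-tree node; for leaves the tag is the word itself."""

    def __init__(self, tag, *children):
        self.tag = tag
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


def ancestors(node):
    """Return [node, parent, grandparent, ..., root]."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain


def syntax_path(src, dst):
    """Shortest tree path from src to dst, encoded as UP@TAG#...#DOWN@TAG.

    Both endpoints are excluded; the lowest common ancestor (LCA) is the last UP step.
    """
    up = ancestors(src)
    down = ancestors(dst)
    down_ids = {id(n) for n in down}
    lca_pos = next(i for i, n in enumerate(up) if id(n) in down_ids)
    lca = up[lca_pos]
    steps = [f"UP@{n.tag}" for n in up[1:lca_pos + 1]]
    below_lca = down[1:down.index(lca)]   # dst's ancestors strictly below the LCA
    steps += [f"DOWN@{n.tag}" for n in reversed(below_lca)]
    return "#".join(steps)


# Hypothetical tree for "new (de) deposit": NP -> CP -> CP -> IP -> VP -> "new".
new, deposit = Node("new"), Node("deposit")
tree = Node("NP",
            Node("CP", Node("CP", Node("IP", Node("VP", new))), Node("DEC", Node("de"))),
            Node("NP", deposit))

models = {"UP@NP#UP@NP#DOWN@CP#DOWN@CP#DOWN@IP#DOWN@VP"}  # stand-in model store
path = syntax_path(deposit, new)
print(path, path in models)   # UP@NP#UP@NP#DOWN@CP#DOWN@CP#DOWN@IP#DOWN@VP True
```

In this sketch, calling `models.add(path)` on a newly confirmed path plays the role of the learning step described above.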
As for the issue of negative words and double negation,
SM uses an approach similar to that of WM, so we do not repeat it here.
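Because the same parity rule serves both WM and SM, here is a minimal Python sketch of the Window Mechanism's Big Window / Small Window logic. It assumes a review already split into a token list, a dict standing in for the polarized modifier set, a set standing in for the negative word set, and illustrative window radii `big` and `small`; the names and radii are ours, not values from the paper.

```python
def window_orientation(tokens, feature_idx, modifiers, negations, big=4, small=2):
    """Orientation (+1/-1) of the feature at tokens[feature_idx], or None.

    modifiers: dict word -> +1/-1 (polarized modifiers); negations: set of
    negative words. big/small are hypothetical Big/Small Window radii.
    """
    lo = max(0, feature_idx - big)
    hi = min(len(tokens) - 1, feature_idx + big)
    # Big Window: the nearest modifier in either direction (ties go backward).
    found = [i for i in range(lo, hi + 1) if tokens[i] in modifiers]
    if not found:
        return None
    mod_idx = min(found, key=lambda i: abs(i - feature_idx))
    orientation = modifiers[tokens[mod_idx]]
    # Small Window around the modifier, clipped to the Big Window (their intersection).
    s_lo = max(lo, mod_idx - small)
    s_hi = min(hi, mod_idx + small)
    negs = sum(1 for i in range(s_lo, s_hi + 1) if tokens[i] in negations)
    # An even count (including double negation) keeps the orientation; odd reverses it.
    return orientation if negs % 2 == 0 else -orientation


mods, negs = {"clear": 1, "durable": 1}, {"not"}
print(window_orientation(["screen", "not", "clear"], 0, mods, negs))             # -1
print(window_orientation(["battery", "not", "not", "durable"], 0, mods, negs))   # 1
```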
Finally, we integrated both WM and SM in our system, and the actual analysis accuracy has been greatly improved. The following figure shows the analysis result for a certain review with these two approaches in our prototype system.
Fig.8 A sentimental analysis result with both WM and SM
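Once WM and SM have assigned an orientation to each feature mention, building the assessment set of Section C (positive and negative counts per feature of each item) is a per-item tally. A minimal sketch, assuming judgments arrive as hypothetical `(item, feature, orientation)` triples with orientation +1, -1, or `None` when neither approach could decide:

```python
from collections import defaultdict


def assessment_sets(judgments):
    """Tally (item, feature, orientation) triples into {item: {feature: (p, n)}}."""
    table = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for item, feature, orientation in judgments:
        if orientation is None:            # undecided mention: not counted
            continue
        counts = table[item][feature]
        counts[0 if orientation > 0 else 1] += 1
    # Freeze into plain dicts of (positive, negative) tuples.
    return {item: {f: tuple(pn) for f, pn in feats.items()}
            for item, feats in table.items()}


judgments = [("A", "screen", 1), ("A", "screen", -1), ("A", "price", 1),
             ("B", "screen", 1), ("A", "screen", 1), ("B", "battery", None)]
print(assessment_sets(judgments))
# {'A': {'screen': (2, 1), 'price': (1, 0)}, 'B': {'screen': (1, 0)}}
```

Features whose mentions were all undecided never enter an item's tally, matching the paper's rule that unknown features are treated as absent.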
D. Acquiring specific feature weight set
Now we have obtained the specific feature assessment set of the j-th item. It is time to consider how to recommend good items, which requires us to grade each item and rank them.
A direct idea for grading is to grade each feature of an item individually and take the sum as its final mark. For the i-th feature of the j-th item, more positive reviews (relative to negative ones) mean the feature is more appreciated, so we simply take $p_{ji} - n_{ji}$ as its grade. In a word, the total mark of the j-th item should be
$$S_j = \sum_{i=1}^{k_j} \left(p_{ji} - n_{ji}\right), \qquad (1)$$
where $k_j$ is the number of specific features of the j-th item.
However, for a particular customer, different features cannot be equally important. Taking a mobile phone as an example, a certain customer may pay more attention to its price than to whether it has Internet access. So we have to assign different weights to different features.
Specifically, for the j-th item we can get a feature weight set $W_j = \{w_{j1}, w_{j2}, \dots, w_{jk_j}\}$.
Fig.9 The customer_browser table
To work out the feature weight set, we first introduce two tables. The first is the customer_browser table, which records a customer's historical purchase details. Whenever the customer buys a certain item, its features and their corresponding numbers of mentions (both positive and negative) are recorded in the table; Fig.9 is a real example. The second is the customer_preference table, which reveals which features a customer has specifically cared about (see Fig.10).
Fig.10 The customer_preference table
Obviously, the combination of these two tables reflects, to some degree, a customer's interests and demands, so they directly influence the feature weight set.
At the beginning, we only consider the situation where neither table is empty. Let the frequency (number of appearances) of feature $f_i$ in the customer_browser table be $x_i$ (i = 1, 2, …, m); then the total frequency of all features is $X = \sum_{i=1}^{m} x_i$, where m is the number of all features. Similarly, the frequency of feature $f_i$ in the customer_preference table is denoted $y_i$, and the total frequency is $Y = \sum_{i=1}^{m} y_i$. Now we focus on the i-th feature of the j-th item, $f_{ji}$. Its frequency in the customer_browser table is $x_{ji}$, so there is a ratio
$$r_{ji} = \frac{x_{ji}}{X}. \qquad (2)$$
In the customer_preference table its frequency is $y_{ji}$, and there is also a ratio
$$s_{ji} = \frac{y_{ji}}{Y}. \qquad (3)$$
Then we set the weight of feature $f_{ji}$ as
$$w_{ji} = \frac{r_{ji} + s_{ji}}{2}. \qquad (4)$$
For features that never appear in either table, equation (4) gives a weight of 0. This is reasonable because we want the items whose features the customer mainly cares about to stand out in the recommendation.
However, if one of the two tables is empty, its influence on the final $w_{ji}$ should be set to nil, which means $w_{ji}$ should be either $r_{ji}$ or $s_{ji}$, whichever is available. Furthermore, if both tables are empty, the customer has never bought anything or marked any feature as a concern. All features then have the same importance for the customer, and we let the feature weight set satisfy $w_{j1} = w_{j2} = \dots = w_{jk_j}$ for the j-th item.
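The weighting rules above can be sketched in Python as follows. This is an illustration under two stated assumptions: equation (4) is taken as the plain average of the ratios from equations (2) and (3), and the "equal importance" case uses 1/k for an item with k known features. The dict-based table layout and the function name are hypothetical.

```python
def feature_weights(item_features, browser, preference):
    """Feature weight set for one item from the two customer tables.

    browser / preference: dict feature -> frequency (hypothetical layout of the
    customer_browser and customer_preference tables).
    """
    X = sum(browser.values())        # total frequency in customer_browser
    Y = sum(preference.values())     # total frequency in customer_preference
    weights = {}
    for f in item_features:
        r = browser.get(f, 0) / X if X else None      # ratio (2); None if table empty
        s = preference.get(f, 0) / Y if Y else None   # ratio (3); None if table empty
        if r is not None and s is not None:
            weights[f] = (r + s) / 2                  # (4), assumed to be the average
        elif r is not None or s is not None:
            weights[f] = r if r is not None else s    # one table empty: use the other ratio
        else:
            weights[f] = 1 / len(item_features)       # both empty: equal weights (assumed 1/k)
    return weights


print(feature_weights(["price", "screen"], {"price": 3, "screen": 1}, {}))
# {'price': 0.75, 'screen': 0.25}
```

A feature absent from both non-empty tables gets weight 0, as stated for equation (4).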
So far we can get the specific feature weight set, and the total mark of the j-th item is revised as
$$S_j = \sum_{i=1}^{k_j} w_{ji}\,\left(p_{ji} - n_{ji}\right). \qquad (5)$$
E. Acquiring overall feature weight set
Although equation (5) accounts for the fact that a certain customer may pay more attention to some particular features, it is still not complete. There is a problem: in the initial database, different features are mentioned different numbers of times, and directly using equation (5) may weaken the influence of rarely mentioned features. Here is an example:
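Suppose items A and B both have two features: quality and price. For A, its quality has 100 positive and 0 negative reviews, and its price has 10 positive and 0 negative reviews. For B, its quality has 210 positive and 0 negative reviews, and its price has 0 positive and 100 negative reviews. According to equation (5), and assuming $w_{\text{quality}} = w_{\text{price}} = 1/2$ for both A and B, we have
$S_A = \tfrac{1}{2}(100 - 0) + \tfrac{1}{2}(10 - 0) = 55$,
$S_B = \tfrac{1}{2}(210 - 0) + \tfrac{1}{2}(0 - 100) = 55$,
so $S_A = S_B$.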
Although A is praised on both features while B is criticized on its price, their marks are the same. Apparently, the influence of price is weakened. Since “all features are born equal”, we need to take measures to balance their influences.
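For the i-th feature $f_i$ of the overall feature set $F$, assume its number of mentions (both positive and negative) in the reviews in the database is $c_i$. Then we let $v_i = 1/c_i$, so that the overall feature weight set $V = \{v_1, v_2, \dots, v_m\}$ is obtained. Obviously, it is in one-to-one mapping with the overall feature set $F$.
Every feature is influenced by its $v_i$, so equation (5) is changed to
$$S_j = \sum_{i=1}^{k_j} w_{ji}\, v_{ji}\,\left(p_{ji} - n_{ji}\right), \qquad (6)$$
where $v_{ji}$ denotes the element of $V$ corresponding to $f_{ji}$.
With equation (6), the problem in the previous example can be solved. If the database contains only A and B, then $c_{\text{quality}} = 100 + 210 = 310$ and $c_{\text{price}} = 10 + 100 = 110$, so
$S_A = \tfrac{1}{2}\cdot\tfrac{100}{310} + \tfrac{1}{2}\cdot\tfrac{10}{110} \approx 0.207$,
$S_B = \tfrac{1}{2}\cdot\tfrac{210}{310} + \tfrac{1}{2}\cdot\tfrac{-100}{110} \approx -0.116$.
Thus the grade of B is lower than that of A, which agrees with common sense.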
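The A/B example can be checked numerically with a short Python sketch. It assumes the overall feature weight is $v_i = 1/c_i$ and that the database contains only items A and B; the function and variable names are hypothetical.

```python
def grade(assessments, weights, overall=None):
    """Equation (5) when overall is None, otherwise equation (6).

    assessments: feature -> (p, n); weights: feature -> w; overall: feature -> v.
    """
    total = 0.0
    for f, (p, n) in assessments.items():
        v = 1.0 if overall is None else overall[f]
        total += weights[f] * v * (p - n)
    return total


A = {"quality": (100, 0), "price": (10, 0)}
B = {"quality": (210, 0), "price": (0, 100)}
w = {"quality": 0.5, "price": 0.5}

# Mention counts c_i over the (two-item) database, then v_i = 1 / c_i (assumed form).
mentions = {f: sum(sum(item[f]) for item in (A, B)) for f in w}
v = {f: 1 / c for f, c in mentions.items()}

print(grade(A, w), grade(B, w))                             # 55.0 55.0
print(round(grade(A, w, v), 3), round(grade(B, w, v), 3))   # 0.207 -0.116
```

Dividing by the mention count puts both features on a comparable per-mention scale, so B's 100 negative price reviews now outweigh its extra positive quality reviews.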
F. Acquiring item weight set
Additionally, we have to notice that different items have different numbers of known features, as mentioned above. This difference can be too large to grade and rank items fairly. To overcome this problem and balance the inequity, different items should be assigned weights, and these weights form a new set called the item weight set, $U = \{u_1, u_2, \dots, u_t\}$, where t is the number of items.
Assuming the number of features is $k_j$ for the j-th item, we have $u_j = 1/k_j$. Hence, equation (6) can be further improved as
$$S_j = u_j \sum_{i=1}^{k_j} w_{ji}\, v_{ji}\,\left(p_{ji} - n_{ji}\right). \qquad (7)$$
We have now acquired the final form of the grading equation, and we can use it for real grading, ranking and recommendation.
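A minimal Python sketch of grading and ranking with the final equation, assuming the item weight is $u_j = 1/k_j$ and that one customer's feature weights are given as a per-feature dict. Item C and all names are hypothetical, and the overall weights reuse the $1/c_i$ values from the A/B example.

```python
def item_grade(assessments, weights, overall):
    """Equation (7) for one item, assuming u_j = 1 / k_j.

    assessments: feature -> (p, n) for the item's k_j known features;
    weights: feature -> w (feature weight set); overall: feature -> v.
    """
    k = len(assessments)
    if k == 0:
        return 0.0                                    # no known features: neutral grade
    total = sum(weights.get(f, 0.0) * overall[f] * (p - n)
                for f, (p, n) in assessments.items())
    return total / k                                  # item weight u_j = 1 / k_j


def recommend(items, weights, overall, top=3):
    """Rank item ids by equation (7), best first, and keep the top ones."""
    return sorted(items, key=lambda j: item_grade(items[j], weights, overall),
                  reverse=True)[:top]


items = {
    "A": {"quality": (100, 0), "price": (10, 0)},
    "B": {"quality": (210, 0), "price": (0, 100)},
    "C": {"quality": (5, 1)},
}
weights = {"quality": 0.5, "price": 0.5}
overall = {"quality": 1 / 310, "price": 1 / 110}
print(recommend(items, weights, overall, top=2))   # ['A', 'C']
```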
III. EXPERIMENTAL RESULTS OF SOME MAJOR PROCESSES
After designing the comprehensive recommendation algorithm, we collected information on multi-brand mobile phones from Amazon.cn (http://www.amazon.cn). Their titles and reviews are stored in the database. On the basis of these real-world data, we tested some major processes of the proposed algorithm.
A. Extraction of sets F, M and N
Table I shows the experimental results of set extraction. We then added these sets into our database for further use.
TABLE I
RESULTS OF SET EXTRACTION
Item | Number
Reviews | 15099
Overall features | 91
Opinion modifiers | 850
Negative words | 35
B. Judgment of sentimental orientation for reviews
Table II shows the results of orientation judgment for reviews. Through the integration of WM and SM, the final accuracy is around 86%.
TABLE II
RESULTS OF ORIENTATION JUDGMENT FOR REVIEWS
Item | Number
Reviews | 15099
Feature-hit reviews | 10267
Correct judgments | 8829
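Reading the 86% figure as correct judgments divided by feature-hit reviews (our interpretation of Table II), the arithmetic, together with the share of reviews in which a feature was hit, is:

$$\text{accuracy} = \frac{8829}{10267} \approx 0.860, \qquad \text{feature-hit rate} = \frac{10267}{15099} \approx 0.680$$

Under this reading, roughly two thirds of the collected reviews contained a recognizable feature, and about 86% of those were judged correctly.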
C. Analysis of time complexity
Our approach has a time complexity of $O(tm)$ in the worst case, where t is the number of items and m is the number of overall features. By contrast, the worst-case complexity of existing item-based collaborative filtering can be $O(t^2 p)$, where p is the number of customers who bought the item. The comparison shows that our method has much lower time complexity than existing ones.
IV. THE IMPLEMENTATION OF PROTOTYPE SYSTEM
Based on our proposed Feature-Grading algorithm, we have developed the prototype system (see Fig.1). It connects with the underlying database and keeps recording a particular customer's historical behaviors. A customer can use it to obtain real-time recommendations and synthesized recommendations.
For real-time recommendation, a customer is asked to input his/her preferred features. The system then grades and ranks all items based on the customer's current wants using the Feature-Grading method.
Fig.10 shows the results of real-time recommendation for the customer Luo Yi.
Fig.10 The real-time recommendation to customer Luo Yi
For synthesized recommendation, since a customer's historical behaviors are saved, the system directly uses those records to grade items. Fig.11 is an example of synthesized recommendation to customer Fan Miao.
Fig.11 The synthesized recommendation to customer Fan Miao
Besides, a sentimental analysis module for reviews is added to the function list so that customers can write their own reviews of an item (see Fig.8). The new reviews will, of course, influence future recommendations.
V. CONCLUSIONS AND FUTURE WORK
This paper has proposed a recommendation algorithm called Feature-Grading and introduced its corresponding prototype system. We mainly focused on the design of the five processes of this algorithm and discussed some key experimental results. These results indicate that the Feature-Grading method works well. It overcomes some drawbacks of existing recommendation systems and extends their ability as well.
Our future efforts will be spent on improving the sentimental analysis of reviews. We plan to extend the handled range from simple sentences to compound sentences, including transitional, comparative, and imperative sentences.
REFERENCES
[1] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, “GroupLens: An Open Architecture for Collaborative Filtering of Netnews,” in Proceedings of CSCW’94, Chapel Hill, NC.
[2] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Analysis of Recommendation Algorithms for E-Commerce,” in Proceedings of ACM E-Commerce, 2000.
[3] W. Hill, L. Stead, M. Rosenstein, and G. Furnas, “Recommending and Evaluating Choices in a Virtual Community of Use,” in Proceedings of CHI’95.
[4] U. Shardanand and P. Maes, “Social Information Filtering: Algorithms for Automating ‘Word of Mouth’,” in Proceedings of CHI’95, Denver, CO.
[5] J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, and J. Riedl, “GroupLens: Applying Collaborative Filtering to Usenet News,” Communications of the ACM, 40(3), pp. 77-87.
[6] H.-P. Zhang, H.-K. Yu, D.-Y. Xiong, and Q. Liu, “HHMM-based Chinese Lexical Analyzer ICTCLAS,” in Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, 2003, pp. 184-187.
[7] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” in Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), Santiago, Chile, September 1994, pp. 487-499.
[8] G. A. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller, “Introduction to WordNet: An On-line Lexical Database,” International Journal of Lexicography, 1990, pp. 235-244.
[9] M. Fan, G. Wu, and J. Li, “Feature-Item Recommender System for E-Commerce,” 2011 International Conference on Computer Control and Automation.