
The Design and Implementation of Feature-Grading Recommendation System for E-Commerce

Luo Yi
International School
Beijing University of Posts and Telecommunications
Beijing, China, 100876

Fan Miao
School of Software Engineering
Beijing University of Posts and Telecommunications
Beijing, China, 100876

Zhou Xiaoxia
School of Insurance and Economics
University of International Business and Economics
Beijing, China

Abstract – In this paper we present a novel approach named Feature-Grading, a comprehensive algorithm for recommending commodities in e-commerce. It is based on the integration of feature mining, sentimental analysis, and the records of customers' historical behaviors. The overall process of Feature-Grading can be separated into 5 key steps: 1. Extracting the overall feature set of a category of commodities; 2. Extracting the modifier set and negative word set; 3. Acquiring the specific feature set and feature assessment set; 4. Acquiring the specific feature weight set; 5. Acquiring the item weight set. After these 5 steps, we are able to grade and rank all the items with an acquired grading equation, and the needed as well as top-ranking items can then be recommended. Moreover, we use real information on mobiles and their reviews from the well-known e-commerce website Amazon.cn as our experimental data and discuss some important results, which reveal that Feature-Grading works well. Finally, we briefly introduce the prototype recommendation system we developed on the basis of Feature-Grading.

Keywords – Feature-Grading; Feature mining; Sentimental Analysis; Historical behaviors; Recommendation

I. INTRODUCTION

In e-commerce, there are two major ways for customers to come face to face with items. One is called “Customer-active”, which is achieved by customers themselves through search engines. The other is accomplished by merchants with a recommendation system that recommends commodities. We call it “Item-active”.

For “Customer-active”, what a customer enters in a search engine reveals what he/she wants. Existing search engines for commodities use techniques similar to those for normal web pages, which are based on keyword matching, meaning that items saved in the database should be tagged with enough keywords. Most of these keywords, however, are manually appended by merchants. This mechanism is very inefficient, and it is easy to neglect some vital features as well. If a system can automatically mine the key features (i.e. the keywords) of a category of items, then the marking process can be completed with less manual operation, which improves overall efficiency. This is our first mission, since the mining of features not only benefits the existing “Customer-active” searching approaches, but also acts as the foundation of our proposed recommendation algorithm.

As for “Item-active”, we have more to say because it better performs the function of a recommendation system. Since the birth of e-commerce, many recommendation algorithms have arisen. A recent and popular method is called Collaborative Filtering [1, 2, 3]. It has two typical types: user-based and item-based. The main idea of the user-based type is that many users may have similar purchasing behaviors, so they are put into the same group. Once a member has bought a certain item, this item is recommended to the other members of the same group. The item-based approach, in contrast, connects similar commodities rather than users. If an item is purchased, a similar one may be recommended. The integration of these two approaches achieves relatively good performance, resulting in the wide use of Collaborative Filtering in contemporary large e-commerce websites [4, 5].

However, such an algorithm fails to consider the diverse assessments and reviews attached to each item. Therefore many low-rated items are sometimes recommended merely because they are similar to what the user has purchased. Hence, a better system should understand how to rank recommended commodities and provide items that are both related and highly appreciated. Apparently, this involves evaluation, which by common sense can only be done by customers. Therefore, our task is to analyze the customer reviews and extract their sentimental orientation to accomplish the final grading and ranking process.

Besides, we also believe the current general model of recommendation will gradually become more personalized. That is why we further propose an improved algorithm which can make personal recommendations to a specific customer based on his/her historical behaviors.

Thus, a more complete, reliable, and personalized recommendation algorithm is proposed in this paper on the basis of practical business demands and the drawbacks of existing systems. We call it the Feature-Grading algorithm. Meanwhile, we also developed its corresponding prototype system (see Fig.1). Our Chinese experimental data on multi-brand mobiles and their reviews come from Amazon.cn.

Proceeding of the IEEE International Conference on Information and Automation, Shenzhen, China, June 2011

978-1-61284-4577-0270-9/11/$26.00 ©2011 IEEE



Fig.1 The prototype system of Feature-Grading algorithm

II. THE DESIGN OF ALGORITHM

The overall process of the Feature-Grading algorithm is designed as shown in Fig.2. At the beginning, only commodities and their reviews are stored in the database.

Fig.2 The overall process of Feature-Grading algorithm

By following the arrows we can attain the final grading equation for recommendation.

Now we move on to the details.

A. Extracting overall feature set of a category of commodities

This critical step is the base of the whole system.

First, we use ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) [6] to split each review into independent words with part-of-speech tags. With further optimization, we can obtain satisfactory splitting and tagging results. An example is given below.

Fig.3 An example of a split and tagged review

After getting the desired reviews, we now need to extract typical features from these separate words. According to Chinese expression habits, features are always nouns, which are tagged with “/n”.

We first came up with the idea of using Association Mining, typically the Apriori Algorithm [7], to identify features. This is achieved by mining associated words that appear frequently in the reviews. The reason to use it is that once a customer makes comments on an item, associated words are involved. By handling the whole set of reviews with the Apriori Algorithm, we can eventually acquire an appropriate feature set.

Another plain method is Word Frequency Count: normally, features are nouns and occur more frequently than other words in a sentence. We can simply filter these top-ranked words to form the final feature set. It is quick but less reliable.

In practice we integrated these two methods and applied proper manual optimization. We denote the eventual feature set of a category of commodities as $F = \{f_1, f_2, \dots, f_m\}$, where $m$ is the number of total features.

B. Extracting modifier set and negative words set

For the following sentimental analysis, we first have to recognize modifiers in reviews and judge their orientation. In the split reviews, modifiers are always the words tagged with “/a” and “/v”. Here we only consider polarized modifiers, namely those that can only be positive or negative. We have utilized a simple and effective way, which involves the modifier synonym groups in WordNet [8], to identify a modifier and its orientation.

In WordNet, synonyms are distributed into one group, so we can simply treat words in the same group as having the same sentimental orientation. (See Fig.4)

Fig.4 Polarized modifier synonym group in WordNet

We initially pick some qualified modifiers as seeds by hand and mark them with labels. Then, when dealing with a certain modifier, we first check whether the modifier is itself a seed. If yes, its orientation is judged directly; if no, we need to traverse the synonyms of the modifier in WordNet. Once a synonym turns out to be a seed, the orientation of the original modifier can be judged through that synonym. Additionally, we will add a new modifier into the database so that



the stored seed group can be enhanced and the accuracy of following judgments can be improved. We define the final modifier set as $M$.

Then we need to extract negative words, since they also contribute to the final sentimental identification. Negative words are always tagged with “/d”, so we can pick them up manually. The negative word set is indicated as $N$.

C. Acquiring specific feature set and feature assessment set of an individual item

After obtaining the feature set of a whole category of commodities, we are able to acquire the specific feature set of each individual item through its reviews. This specific feature set is indicated as $F_i = \{f_{i1}, f_{i2}, \dots, f_{ik}\}$, where $k$ is the number of known features an item owns. In fact, the reviews in the database cannot cover all the features of an item, so uncommented features are unknown to us and we do not have the right to decide them subjectively. Therefore, items in the database can only be treated as if they had no unknown features. Obviously, $F_i$ is a subset of the overall feature set $F$, and different items have different $F_i$. Then we continue with sentimental analysis of the reviews to get the assessment set.

Particularly, for the $i$-th item, its specific feature set is $F_i = \{f_{i1}, f_{i2}, \dots, f_{ik_i}\}$, where $k_i$ refers to the feature number. Now it is time to summarize the number of positive assessments, indicated as $p_{ij}$, and the number of negative ones, indicated as $n_{ij}$, for the feature $f_{ij}$, in order to attain the desired assessment set $A_i$ that is in one-to-one mapping with the specific feature set $F_i$. Here $A_i = \{(p_{i1}, n_{i1}), (p_{i2}, n_{i2}), \dots, (p_{ik_i}, n_{ik_i})\}$.

We can put $F_i$ and $A_i$ into a table; Fig.5 is an example.

Fig.5 The specific feature set and feature assessment set of an example item

The question is how to get $p_{ij}$ and $n_{ij}$. Here we provide two approaches: one is called Window Mechanism (WM) and the other is called Syntax Matching (SM).

a. Window Mechanism (WM)

WM is a pragmatic and applied method on the basis of Chinese lexical analysis and part-of-speech tagging, which was proposed by Fan Miao [9]. Its principle scheme is shown in Fig.6.

WM makes use of the nearest-associated principle, meaning that the nearest modifier is supposed to have the greatest influence on the headword, namely the feature, according to most idiomatic expressions of Chinese. It starts from the headword to search for the nearest modifier and identify its orientation in both the forward and backward directions within a settled range, the so-called Big Window. But this is not adequate since there may exist negative

words or double negation that can change the orientation. We therefore introduce the WM again, where we regard the found modifier as the new headword and define a new range, named Small Window. Similarly, the search starts from the new headword and looks for negative words in both directions, however, only within the intersection of the Big Window and the Small Window. If the number of discovered negative words is even, the original orientation stays the same; the original orientation reverses if the number is odd. Thus the identification accuracy is improved.

Fig.6 The principle scheme of Window Mechanism

b. Syntax Matching (SM)

SM is a completely different way from WM. Its core idea is the training of syntax paths. The syntax path here refers to the shortest path from feature to modifier in the parsing trees. To illustrate it, Fig.7 shows a syntax path from the headword “Deposit” to the modifier “New”.

Fig.7 An example of a syntax path of common Chinese sentence

The syntax path ought to be: UP@NP#UP@NP#DOWN@CP#DOWN@CP#DOWN@IP#DOWN@VP, where UP/DOWN means going up and down, @TAG represents a node, and # stands for the partition between different actions.

We have utilized abundant natural language materials as training data to acquire 3319 distinct types of syntax path as models, which are stored in the database. Therefore, we can compare a newly obtained syntax path with those in the database. Once they match each other successfully, the relationship between the headword and the modifier in the handled review can be recognized, and so can the sentimental orientation. Besides, SM has an extra advantage since it is based on machine learning, meaning that a new path can then be added to the database so that the models are upgraded and accuracy is increased.



As for the issue of negative words and double negation, SM uses an approach similar to that of WM, so we do not repeat it here.
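The Window Mechanism, including the negation handling that SM shares, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the English tokens, the tiny `MODIFIERS` and `NEGATIVES` lexicons, the default window sizes, and the function name `window_orientation` are all hypothetical.

```python
from typing import List, Optional

# Hypothetical toy lexicons standing in for the modifier set and negative word set.
MODIFIERS = {"good": 1, "clear": 1, "bad": -1, "slow": -1}
NEGATIVES = {"not", "never"}


def window_orientation(tokens: List[str], head: int,
                       big: int = 3, small: int = 2) -> Optional[int]:
    """Return +1 or -1 for the feature at index `head`,
    or None when no modifier lies inside the Big Window."""
    # Big Window: a fixed range around the headword (the feature).
    lo, hi = max(0, head - big), min(len(tokens) - 1, head + big)
    candidates = [i for i in range(lo, hi + 1)
                  if i != head and tokens[i] in MODIFIERS]
    if not candidates:
        return None
    # Nearest-associated principle: take the closest modifier
    # (on a tie, the one before the headword wins in this sketch).
    mod = min(candidates, key=lambda i: abs(i - head))
    # Small Window: centred on the modifier, clipped to the Big Window,
    # so negatives are counted only inside the intersection of both.
    s_lo, s_hi = max(lo, mod - small), min(hi, mod + small)
    negations = sum(1 for i in range(s_lo, s_hi + 1) if tokens[i] in NEGATIVES)
    polarity = MODIFIERS[tokens[mod]]
    # An even number of negatives keeps the orientation, an odd number flips it.
    return polarity if negations % 2 == 0 else -polarity
```

For instance, with the headword "screen" in "the screen is not good", the nearest modifier is "good" and the single negative word flips its orientation to negative.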

Finally, we integrated both WM and SM in our system, and the actual analysis accuracy has thereby been greatly improved. The following figure is the result of analyzing a certain review with these two approaches in our prototype system.

Fig.8 A sentimental analysis result with both WM and SM
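The syntax path notation used by SM (actions separated by `#`, each written as `UP@TAG` or `DOWN@TAG`) lends itself to a simple parse-and-lookup sketch. The code below is an assumption-laden illustration: `parse_path` and `PathModels` are hypothetical names, the model store is an in-memory set rather than the database the paper mentions, and matching is plain equality of whole paths.

```python
from typing import Iterable, List, Tuple

Step = Tuple[str, str]  # (direction, node tag), e.g. ("UP", "NP")


def parse_path(path: str) -> List[Step]:
    """Split 'UP@NP#DOWN@VP' into [('UP', 'NP'), ('DOWN', 'VP')]."""
    steps = []
    for action in path.split("#"):
        direction, tag = action.split("@")
        if direction not in ("UP", "DOWN"):
            raise ValueError(f"unknown direction: {direction}")
        steps.append((direction, tag))
    return steps


class PathModels:
    """In-memory stand-in for the stored syntax path models."""

    def __init__(self, paths: Iterable[str]) -> None:
        self._known = {tuple(parse_path(p)) for p in paths}

    def matches(self, path: str) -> bool:
        # A feature-modifier pair is accepted when its path equals a known model.
        return tuple(parse_path(path)) in self._known

    def learn(self, path: str) -> None:
        # New paths can be added so that later reviews match them too.
        self._known.add(tuple(parse_path(path)))
```

A store seeded with the path quoted above would accept that path and reject an unseen one such as `UP@NP#DOWN@VP` until `learn` is called on it.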

D. Acquiring specific feature weight set

Now we have obtained the specific feature assessment set $A_i$ of the $i$-th item. It is time to consider how to recommend good items, which requires us to grade each item and rank them.

A direct idea for grading is to grade each feature individually for an item and take the sum as its final mark. For the $j$-th feature of the $i$-th item, more positive reviews mean the feature is more appreciated, so we simply consider $p_{ij} - n_{ij}$ as its grade, where $p_{ij}$ and $n_{ij}$ are its numbers of positive and negative assessments. In a word, the total mark of the $i$-th item should be

$$G_i = \sum_{j=1}^{k_i} (p_{ij} - n_{ij}), \tag{1}$$

where $k_i$ is the number of specific features of the $i$-th item.

However, to a particular customer, the importance of different features cannot be the same. Take a mobile for example: a certain customer may pay more attention to its price than to whether it has access to the Internet. So we have to distribute different weights to different features. Specifically, for the $i$-th item we can get a feature weight set $W_i = \{w_{i1}, w_{i2}, \dots, w_{ik_i}\}$.

Fig.9 The customer_browser table

To work out the feature weight set $W_i$, we first introduce two tables. The first one is called the customer_browser table, which records a customer's historical purchase details. Whenever the customer buys a certain item, its features and their corresponding numbers of mentions (both positive and negative) are recorded in the table. Fig.9 is a real example. The second one is the customer_preference table, which reveals what a customer has intently cared about. See Fig.10.

Fig.10 The customer_preference table

Obviously, the integration of these two tables to some degree reflects a customer's interests as well as demands, so they directly influence the feature weight set.

At the beginning we only consider the situation where these two tables are not empty. Let the frequency (number of appearances) of feature $f_i$ in the customer_browser table be $b_i$ ($i = 1, 2, \dots, m$); then the total frequency for all features is $B = \sum_{i=1}^{m} b_i$, where $m$ is the number of all features. Similarly, the frequency of feature $f_i$ in the customer_preference table is indicated as $q_i$, and the total frequency is $Q = \sum_{i=1}^{m} q_i$. Now we focus on the $i$-th item, whose $j$-th feature is $f_{ij}$. Its frequency in the customer_browser table is $b_{ij}$, so there is a ratio

$$r^{b}_{ij} = \frac{b_{ij}}{B}. \tag{2}$$

In the customer_preference table the frequency is $q_{ij}$, and there is also a ratio

$$r^{q}_{ij} = \frac{q_{ij}}{Q}. \tag{3}$$

Then we set the weight of feature $f_{ij}$ as a combination $w_{ij}$ of the ratios $r^{b}_{ij}$ and $r^{q}_{ij}$. (4)

For those features that never appear in the two tables, the weight is 0 according to equation (4). This is reasonable because we want the items whose features the customer mainly cares about to stand out in recommendation.

However, if one of these two tables is empty, its influence on the final weight $w_{ij}$ should be set to nil, which means $w_{ij}$ should be either the customer_browser ratio $r^{b}_{ij}$ of equation (2) or the customer_preference ratio $r^{q}_{ij}$ of equation (3). Furthermore, complete emptiness reveals that the customer has never bought anything or cared about any feature. It shows that all the features have the same importance for the customer, and we let the feature weight set satisfy $w_{i1} = w_{i2} = \dots = w_{ik_i}$ for the $i$-th item.
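The three cases above (both tables filled, one table empty, both empty) can be sketched in a few lines. Several details are assumptions made only for illustration: the two ratios are combined by a plain average, since the exact form of equation (4) is not reproduced here; the equal weights in the empty case are set to $1/k$; and the names `ratios` and `feature_weights` as well as the dictionary layout of the tables are hypothetical.

```python
from typing import Dict, List


def ratios(table: Dict[str, int]) -> Dict[str, float]:
    """Frequency of each feature divided by the table's total frequency."""
    total = sum(table.values())
    return {f: c / total for f, c in table.items()} if total else {}


def feature_weights(item_features: List[str],
                    browser: Dict[str, int],
                    preference: Dict[str, int]) -> Dict[str, float]:
    """Weight of every feature of one item for one customer."""
    if not item_features:
        return {}
    rb, rq = ratios(browser), ratios(preference)
    if not rb and not rq:
        # No history at all: every feature is equally important
        # (1/k is an assumed normalisation).
        equal = 1.0 / len(item_features)
        return {f: equal for f in item_features}
    weights = {}
    for f in item_features:
        b, q = rb.get(f, 0.0), rq.get(f, 0.0)
        if not rb:
            weights[f] = q            # empty browser table has no influence
        elif not rq:
            weights[f] = b            # empty preference table has no influence
        else:
            weights[f] = (b + q) / 2  # ASSUMPTION: average of the two ratios
    return weights
```

A feature that appears in neither table gets weight 0 under any such combination, which matches the behaviour described for equation (4).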

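Till now, we can get the specific feature weight set $W_i$, and the total mark of the $i$-th item is revised as

$$G_i = \sum_{j=1}^{k_i} w_{ij}\,(p_{ij} - n_{ij}), \tag{5}$$

where $w_{ij}$ is the weight of the $j$-th feature of the $i$-th item, and $p_{ij}$ and $n_{ij}$ are its numbers of positive and negative assessments.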



E. Acquiring overall feature weight set

Although equation (5) considers that a certain customer may pay more attention to some particular features, it is still not complete. There is a problem: in the initial database, the numbers of mentions of different features are not the same. Using equation (5) directly may weaken the influence of features that are seldom mentioned. Here is an example:

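Suppose items A and B have two features: quality and price. For A, its quality has 100 positive and 0 negative reviews; its price has 10 positive and 0 negative reviews. For B, its quality has 210 positive and 0 negative reviews; its price has 0 positive and 100 negative reviews. According to equation (5), and assuming the same weight $w$ on both features for both A and B, the feature grades are

$$p_{A,\text{quality}} - n_{A,\text{quality}} = 100, \qquad p_{A,\text{price}} - n_{A,\text{price}} = 10,$$

$$p_{B,\text{quality}} - n_{B,\text{quality}} = 210, \qquad p_{B,\text{price}} - n_{B,\text{price}} = -100,$$

$$G_A = w\,(100 + 10) = 110\,w, \qquad G_B = w\,(210 - 100) = 110\,w.$$

So $G_A = G_B$.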

Although A is praised on both features while B is criticized on its price, their marks are the same. Apparently the influence of price is weakened. Since “features are born equal”, we need to take measures to balance their influences.

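For the $j$-th feature $f_j$ of the overall feature set $F$, assume its number of mentions (both positive and negative) in the reviews in the database is $c_j$. From $c_j$ we set a weight $v_j$, so that the overall feature weight set $V = \{v_1, v_2, \dots, v_m\}$ is obtained. Obviously it is in one-to-one mapping with the overall feature set $F$.

Every feature is influenced by its overall weight, so equation (5) is changed to

$$G_i = \sum_{j=1}^{k_i} v_{ij}\, w_{ij}\,(p_{ij} - n_{ij}), \tag{6}$$

where $v_{ij}$ is the element of $V$ that corresponds to the $j$-th feature of the $i$-th item, $w_{ij}$ is its specific weight, and $p_{ij}$ and $n_{ij}$ are its numbers of positive and negative assessments.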

With equation (6), the problem of the previous example can be solved: the grade of B becomes lower than that of A, which satisfies common sense.
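As a worked check of the A/B example, the grades can be recomputed under one possible choice of overall weight. The choice $v_j = 1/c_j$ (the inverse of the feature's total number of mentions) and the common specific weight $w$ are assumptions made here for illustration only; other normalisations of the mention count would serve the same purpose.

```latex
\begin{align*}
c_{\text{quality}} &= 100 + 210 = 310, &
c_{\text{price}} &= 10 + 100 = 110, \\
G_A &= w\left(\tfrac{100}{310} + \tfrac{10}{110}\right) \approx 0.413\,w, &
G_B &= w\left(\tfrac{210}{310} - \tfrac{100}{110}\right) \approx -0.232\,w.
\end{align*}
```

With equal weights alone both items score $110\,w$; with the assumed inverse-frequency factor, B falls below A for any $w > 0$.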

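F. Acquiring item weight set

Additionally, we have to notice that different items have different numbers of known features, as mentioned above. This difference can also be too big to grade and rank items fairly. To overcome this problem and balance the inequity, different items should be assigned weights, and such weights form a new set called the item weight set, $U = \{u_1, u_2, \dots, u_t\}$, where $t$ is the number of items.

Assuming the number of features is $k_i$ for the $i$-th item, its item weight $u_i$ is determined by $k_i$. Hence, equation (6) can be further improved as

$$G_i = u_i \sum_{j=1}^{k_i} v_{ij}\, w_{ij}\,(p_{ij} - n_{ij}), \tag{7}$$

where $v_{ij}$ and $w_{ij}$ are the overall and specific weights of the $j$-th feature of the $i$-th item, and $p_{ij}$ and $n_{ij}$ are its numbers of positive and negative assessments.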

We have now acquired the final form of the grading equation and are able to use it for real grading, ranking and recommendation.
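A grading-and-ranking pass in the spirit of the final equation can be sketched as below. It is not the authors' implementation: the overall feature weight is assumed to be the inverse of a feature's total mention count, the item weight is assumed to be the inverse of the item's number of known features, and `Item`, `mention_counts`, `grade` and `rank` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Item:
    name: str
    # feature -> (number of positive assessments, number of negative ones)
    assessments: Dict[str, Tuple[int, int]]


def mention_counts(items: List[Item]) -> Dict[str, int]:
    """Total mentions (positive + negative) of each feature over all items."""
    counts: Dict[str, int] = {}
    for item in items:
        for feature, (pos, neg) in item.assessments.items():
            counts[feature] = counts.get(feature, 0) + pos + neg
    return counts


def grade(item: Item, weights: Dict[str, float], counts: Dict[str, int]) -> float:
    """Weighted sum of (positive - negative) over the item's known features."""
    if not item.assessments:
        return 0.0
    total = 0.0
    for feature, (pos, neg) in item.assessments.items():
        mentions = counts.get(feature, 0)
        if mentions == 0:
            continue  # never mentioned: contributes nothing
        # ASSUMPTION: overall feature weight = 1 / mentions
        total += weights.get(feature, 0.0) * (pos - neg) / mentions
    # ASSUMPTION: item weight = 1 / number of known features
    return total / len(item.assessments)


def rank(items: List[Item], weights: Dict[str, float]) -> List[Item]:
    """Items sorted from the highest grade to the lowest."""
    counts = mention_counts(items)
    return sorted(items, key=lambda it: grade(it, weights, counts), reverse=True)
```

Applied to the items A and B of the earlier example with equal customer weights, this sketch ranks A above B, in line with the intended effect of the overall feature weights.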

III. EXPERIMENTAL RESULTS OF SOME MAJOR PROCESSES

After the design of the comprehensive recommendation algorithm, we took advantage of the information on multi-brand mobiles from Amazon.cn (http://www.amazon.cn). Their titles and reviews are stored in the database. On the basis of these real-life data we tested some major processes of our proposed algorithm.

A. Extraction of the overall feature, modifier and negative word sets

Table I shows the experimental results of set extraction. We then added these sets into our database for further use.

TABLE I
RESULTS OF SET EXTRACTION

Item               Number
Review             15099
Overall features   91
Opinion modifier   850
Negative           35
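Two of the extraction steps behind these sets, Word Frequency Count for features and seed-based orientation lookup for modifiers, can be sketched as follows. The sketch assumes reviews already split into space-separated `word/tag` tokens, uses English toy data, and replaces WordNet with plain Python sets of synonyms; `top_noun_features` and `orientation` are hypothetical names, and the Apriori step and the manual optimization are not covered.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set


def top_noun_features(tagged_reviews: Iterable[str], top_k: int) -> List[str]:
    """Return the `top_k` most frequent words tagged '/n' (nouns)."""
    counts: Counter = Counter()
    for review in tagged_reviews:
        for token in review.split():
            word, sep, tag = token.rpartition("/")
            if sep and tag == "n":
                counts[word] += 1
    return [word for word, _ in counts.most_common(top_k)]


def orientation(modifier: str,
                seeds: Dict[str, int],
                synonym_groups: List[Set[str]]) -> Optional[int]:
    """Return +1 or -1 for a modifier, or None if no seed is reachable."""
    if modifier in seeds:
        return seeds[modifier]
    for group in synonym_groups:
        if modifier not in group:
            continue
        for synonym in group:
            if synonym in seeds:
                # The newly judged modifier joins the seeds for later lookups.
                seeds[modifier] = seeds[synonym]
                return seeds[modifier]
    return None
```

Because each judged modifier is stored back into `seeds`, later lookups of the same word return immediately, which mirrors the idea of enlarging the seed group over time.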

B. Judgment of sentimental orientation for reviews

Table II shows the result of orientation judgment for reviews. Through the integration of WM and SM, the final accuracy is around 86% (8829 correct judgments out of 10267 feature-hit reviews).

TABLE II
RESULTS OF ORIENTATION JUDGMENT FOR REVIEWS

Item                 Number
Review               15099
Feature hit review   10267
Correct judgement    8829

C. Analysis of time complexity

In the worst situation, the time complexity of our approach is determined by $t$ and $m$, where $t$ refers to the number of items and $m$ refers to the number of overall features. The worst-case complexity of existing item-based collaborative filtering, in contrast, involves $p$, the number of customers who bought the item. The comparison reveals that our method has much lower time complexity than existing ones.

IV. THE IMPLEMENTATION OF PROTOTYPE SYSTEM

Based on our proposed Feature-Grading algorithm, we have developed the prototype system (see Fig.1). It connects with the underlying database and keeps recording a particular customer's historical behaviors. A customer can use it to obtain real-time recommendation and synthesized recommendation.

For real-time recommendation, a customer is asked to input his/her preferred features. The system then grades and ranks all the items based on the customer's current wants through the method of Feature-Grading.

Fig.10 shows the results of real-time recommendation for the customer Luo Yi.



Fig.10 The real-time recommendation to customer Luo Yi

For synthesized recommendation, since a customer's historical behaviors are saved, the system directly uses those records to grade items. Fig.11 is an example of synthesized recommendation to customer Fan Miao.

Fig.11 The synthesized recommendation to customer Fan Miao

Besides, the module of sentimental analysis of reviews is added to the function list so that a customer can write his/her own review of an item (see Fig.8). The new reviews will of course influence future recommendations.

V. CONCLUSIONS AND FUTURE WORK

This paper has proposed a recommendation algorithm called Feature-Grading and introduced its corresponding prototype system. We mainly focused on the design of the 5 processes of this algorithm and also discussed some key experimental results. These results, including an orientation judgment accuracy of around 86%, suggest that the Feature-Grading method works well. It overcomes some drawbacks of existing recommendation systems and extends their ability as well.

Our future efforts will be spent on improving the sentimental analysis of reviews. We plan to expand the handled range from simple sentences to compound sentences, including transitional sentences, comparative sentences, imperative sentences and so on.

REFERENCES

[1] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, “GroupLens: An Open Architecture for Collaborative Filtering of Netnews,” In Proceedings of CSCW’94, Chapel Hill, NC.

[2] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Analysis of Recommendation Algorithms for E-Commerce,” In Proceedings of ACM E-Commerce, 2000.

[3] W. Hill, L. Stead, M. Rosenstein, and G. Furnas, “Recommending and Evaluating Choices in a Virtual Community of Use,” In Proceedings of CHI’95.

[4] U. Shardanand and P. Maes, “Social Information Filtering: Algorithms for Automating ‘Word of Mouth’,” In Proceedings of CHI’95, Denver, CO.

[5] J. Konstan, B. Miller, D. Maltz, J. Herlocker, L. Gordon, and J. Riedl, “GroupLens: Applying Collaborative Filtering to Usenet News,” Communications of the ACM, 40(3), pp. 77-87.

[6] Hua-Ping Zhang, Hong-Kui Yu, De-Yi Xiong, Qun Liu, “HHMM-based Chinese lexical analyzer ICTCLAS,” Proceedings of the second SIGHAN workshop on Chinese language processing, 2003, pp. 184-187.

[7] Rakesh Agrawal, Ramakrishnan Srikant, “Fast Algorithms for Mining Association Rules,” Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pp. 487-499, Santiago, Chile, September 1994.

[8] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller, “Introduction to WordNet: An On-line Lexical Database,” International Journal of Lexicography, 1990, pp. 235-244.

[9] Miao Fan, Guoshi Wu, Jing Li, “Feature-Item Recommender System for E-Commerce,” 2011 International Conference on Computer Control and Automation.
