Opinion Fraud Detection in Online Reviews by Network Effects
Transcript of Opinion Fraud Detection in Online Reviews by Network Effects
Opinion Fraud Detection in Online Reviews using Network Effects
Leman Akoglu, Rishi Chandy, Christos Faloutsos. ICWSM'13
Previous work vs. this work
◦ Manual labeling is hard → fully unsupervised, requiring no labeled data
◦ Side information is used (review text, timestamps, behavioral analysis, ...) → no side information needed
◦ Tied to the characteristics of a specific dataset → applicable to all types of review networks
Problem
Given:
◦ user–product review network
◦ review sign (+: thumbs up / −: thumbs down)
Classify objects into type-specific classes:
◦ users: `honest' / `fraudster'
◦ products: `good' / `bad'
◦ reviews: `genuine' / `fake'
Automatically label the users, products, and reviews in the network.
No side data! (e.g., timestamp, review text)
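The input above can be made concrete with a small sketch (not the authors' code): build the signed bipartite review network from raw (user, product, rating) triples. The 4–5 → + and 1–2 → − thresholding follows the rating-to-sign convention used in the experiments later in the deck; dropping 3-star reviews as sign-neutral is our assumption, and all names are illustrative.

```python
from collections import defaultdict

def build_signed_network(reviews):
    """reviews: iterable of (user, product, rating) with rating in 1..5.
    Returns adjacency maps with edge signs +1/-1; 3-star reviews are
    dropped as sign-neutral (an assumption, not from the paper)."""
    user_edges = defaultdict(list)     # user -> [(product, sign), ...]
    product_edges = defaultdict(list)  # product -> [(user, sign), ...]
    for user, product, rating in reviews:
        if rating >= 4:
            sign = +1              # thumbs up
        elif rating <= 2:
            sign = -1              # thumbs down
        else:
            continue               # neutral rating carries no sign
        user_edges[user].append((product, sign))
        product_edges[product].append((user, sign))
    return user_edges, product_edges

# Example:
users, products = build_signed_network(
    [("u1", "appA", 5), ("u2", "appA", 1), ("u1", "appB", 3)])
```

Note that the only inputs are the graph and the edge signs, matching the "no side data" claim.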
[Figure: user–product review network with node classes — users: honest / fraud; products: good quality / bad quality]
Formulation
◦ Model node labels as random variables
◦ Prior beliefs (observed potentials) and compatibility potentials over signed edges
◦ The objective function is a pairwise Markov Random Field (Kindermann & Snell, 1980)
◦ Finding the best assignment to the unobserved variables in the objective function is an inference problem!
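A hedged reconstruction of that objective from the ingredients listed above (the paper's exact notation may differ): with a label random variable per node, a pairwise MRF over the signed network factorizes as

```latex
P(\mathbf{y}) \;=\; \frac{1}{Z}\,
  \prod_{Y_i \in \mathcal{V}} \phi_i(y_i)\;
  \prod_{(Y_i,\, Y_j,\, s) \,\in\, \mathcal{E}} \psi^{s}_{ij}(y_i, y_j)
```

where the \(\phi_i\) are the prior/observed potentials, the \(\psi^{s}_{ij}\) are the compatibility potentials indexed by the edge sign \(s \in \{+,-\}\), and \(Z\) is the partition function. Inference means finding the label assignment that maximizes this probability.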
Formulation
The new algorithm extends LBP (Loopy Belief Propagation) to signed networks.
Signed Inference Algorithm (sIA)
I) Repeat for each node: alternately communicate messages
II) At convergence: compute the belief of user i having label y_i
FRAUDEAGLE
Step 1. Scoring (Signed Inference Algorithm)
◦ Repeat message propagation:
◦ – update messages from users to products
◦ – update messages from products to users
◦ until all messages stop changing
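The scoring loop above can be sketched as signed loopy belief propagation. This is a compact illustration, not the paper's implementation: the value of the mixing parameter, the uniform message initialization, and the exact shape of the compatibility tables are our assumptions.

```python
from itertools import product as iproduct

EPS = 0.1  # compatibility mixing parameter (illustrative assumption)

# PSI[sign][yi][yj]: label compatibility (0 = honest/good, 1 = fraud/bad).
PSI = {
    +1: [[1 - EPS, EPS], [EPS, 1 - EPS]],  # + edge favors equal labels
    -1: [[EPS, 1 - EPS], [1 - EPS, EPS]],  # - edge favors opposite labels
}

def signed_lbp(phi, edges, iters=20):
    """phi: {node: [P(honest/good), P(fraud/bad)]} prior potentials.
    edges: list of (i, j, sign) with sign in {+1, -1}.
    Returns {node: belief that the node is fraud/bad}."""
    nbrs = {v: [] for v in phi}
    for i, j, s in edges:
        nbrs[i].append((j, s))
        nbrs[j].append((i, s))
    # Messages m[(src, dst)] over the two labels, initialized uniformly.
    m = {(i, j): [0.5, 0.5] for i in phi for j, _ in nbrs[i]}
    for _ in range(iters):
        new = {}
        for i in phi:
            for j, sign in nbrs[i]:
                msg = [0.0, 0.0]
                for yj, yi in iproduct((0, 1), (0, 1)):
                    # prior * compatibility * incoming messages except from j
                    prod = phi[i][yi] * PSI[sign][yi][yj]
                    for k, _s in nbrs[i]:
                        if k != j:
                            prod *= m[(k, i)][yi]
                    msg[yj] += prod
                z = sum(msg)
                new[(i, j)] = [v / z for v in msg]
        m = new
    beliefs = {}
    for v in phi:
        b = [phi[v][0], phi[v][1]]
        for k, _s in nbrs[v]:
            b[0] *= m[(k, v)][0]
            b[1] *= m[(k, v)][1]
        beliefs[v] = b[1] / (b[0] + b[1])
    return beliefs
```

On a bipartite review graph, each pass alternately refreshes user→product and product→user messages, exactly as the bullets above describe; iterating until the messages stop changing is the convergence test.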
Step 2. Grouping
◦ Rank users by the scores from Step 1
◦ Cluster the induced subgraphs on the top k users and their products
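The first half of the grouping step can be sketched as follows: take the top-k users by fraud score and extract the bipartite subgraph they induce. The deck does not name the clustering method applied afterwards, so this sketch stops at the subgraph extraction; function and variable names are ours.

```python
def top_k_induced_subgraph(scores, edges, k):
    """scores: {user: fraud score}; edges: list of (user, product, sign).
    Returns the top-k users, their products, and the induced signed edges."""
    top_users = set(sorted(scores, key=scores.get, reverse=True)[:k])
    sub_edges = [(u, p, s) for u, p, s in edges if u in top_users]
    products = {p for _, p, _ in sub_edges}
    return top_users, products, sub_edges
```

Clustering this much smaller subgraph, rather than the whole network, is what lets the grouping step surface tight blocks of suspicious users reviewing the same products.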
Dataset
SWM: all app reviews in the entertainment category (games, news, sports, etc.) from an anonymous online app store database.
◦ As of June 2012:
◦ * 1,132,373 reviews
◦ * 966,842 users
◦ * 15,094 software products (apps)
Ratings: 1 (worst) to 5 (best)
A large number of reviews come from users who have very few reviews.
Ratings are skewed towards the positive end.
Competitive algorithms
Compared to two iterative classifiers (modified to handle signed edges):
I) Weighted-vote Relational Classifier (wv-RC) (Macskassy & Provost, 2003)
II) HITS (honesty and goodness in mutual recursion) (Kleinberg, 1999)
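The deck does not spell out how wv-RC was modified for signed edges; one plausible reading (our assumption, not the paper's specification) is that each node's score becomes the sign-weighted average of its neighbors' scores, with seed nodes clamped:

```python
def signed_wvrc(nodes, edges, seeds, iters=10):
    """seeds: {node: fixed score in [-1, 1]}; edges: (i, j, sign) list.
    Unlabeled nodes start at 0 and iteratively take the sign-weighted
    average of their neighbors' scores (negative edges flip the vote)."""
    nbrs = {v: [] for v in nodes}
    for i, j, s in edges:
        nbrs[i].append((j, s))
        nbrs[j].append((i, s))
    score = {v: seeds.get(v, 0.0) for v in nodes}
    for _ in range(iters):
        nxt = {}
        for v in nodes:
            if v in seeds:
                nxt[v] = seeds[v]  # keep labeled nodes clamped
            elif nbrs[v]:
                nxt[v] = sum(s * score[k] for k, s in nbrs[v]) / len(nbrs[v])
            else:
                nxt[v] = 0.0
        score = nxt
    return score
```

Unlike sIA, this baseline propagates point scores rather than full label distributions, which is one way to see why a belief-propagation formulation can be more expressive on the same signed graph.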
Top scorers
[Scatter plots of top-scoring users and products: + marks positive (4–5 star) ratings, o marks negative (1–2 star) ratings]
A detected block: 31 users, all giving 5-star ratings to the same 5 products.
Top scorers matter!
[Plots: product rating distributions before and after removing fake reviews (fraud score > 0.5, honesty score < 0), shown for a high-rating product and a low-rating product]
Computational complexity
Scalable to large data: computational complexity is linear in the network size.
Running time grows linearly with increasing graph size.