Turning the Tide: Curbing Deceptive Yelp Behaviors June 2018 @ UCAS Dongwon Lee Penn State / IST SIAM SDM 2014

Page 1

Turning the Tide: Curbing Deceptive Yelp Behaviors

June 2018 @ UCAS

Dongwon Lee

Penn State / IST

SIAM SDM 2014

Page 2

Review Centric Social Networks

[Figure: diagram with the labels "Reviews", "User", and "Social networks"]

Page 3

Malicious Behaviors: Fraudulent Reviews

- Social networks: ideal targets for malicious behaviors
- Up to 25% of Yelp reviews are fraudulent [1]
- YELP:
  - Extra half-star rating causes a restaurant to sell out 19% more often [2]
  - One-star increase leads to a 5–9% increase in revenue [3]

[1] Yelp admits a quarter of submitted reviews could be fake. BBC, www.bbc.co.uk/news/technology-24299742
[2] Michael Anderson and Jeremy Magruder. Learning from the crowd: Regression discontinuity estimates of the effects of an online review database. Economic Journal, 122(563):957–989, 2012.
[3] Michael Luca. Reviews, Reputation, and Revenue: The Case of Yelp.com. Available at hbswk.hbs.edu/item/6833.html.

Page 4

Malicious Behaviors: Review Campaigns

- Review campaign: post multiple fraudulent reviews
- Example: Search Engine Optimization (SEO) companies [1]
  - Use IP spoofing techniques
  - Set up fake online profiles
  - Target Yelp, Google Local, CitySearch
  - Investigated by law enforcement
- Deceptive venue: uses review campaigns to alter its rating

[1] A.G. Schneiderman Announces Agreement With 19 Companies To Stop Writing Fake Online Reviews And Pay More Than $350,000 In Fines. Available at: http://www.ag.ny.gov/press-release/ag-schneiderman-announces-agreement-19-companies-stop-writing-fake-online-reviews-and

Page 5

Feasibility Study

- Created fake venues
- Posted review jobs on Amazon Mechanical Turk
- Received more than 90 (fake) reviews

… for a fistful of dollars

Page 6

Problem Statement

- Detect malicious behaviors in review-centric social networks
  - Fraudulent reviews
  - Deceptive venues
  - Impactful review campaigns

Page 7

Adversary Model

- Needs to adjust the rating of the target venue
- Has a finite budget
- Controls a finite set of (IP address, Yelp Sybil account) pairs
- Has access to a market of review writers
- Social network provider does not collude with attackers

Page 8

Marco System Overview

[Figure: Marco system diagram. Labels: data set (7,435 venues, 195,417 users, 270,121 reviews), with users, venues, friend relations, spatial vicinity and reviews/time; review features (1. friend & review count, 2. venue "expertise", 3. venue activities, 4. …); Review Classifier (trained on labeled fraudulent & genuine reviews); FRI Module; RSD Module (venue timeline); ARD Module (review ratings); Venue Classifier (features; trained on deceptive & legitimate venues)]

Page 9

Successful Review Campaign

- Increases (decreases) the rating of the target venue by at least half a star
- Claim: The minimum number of reviews the adversary needs to post in order to fraudulently increase the rating of a venue by half a star is n/7
  - n: the number of genuine reviews a venue has at the completion of the campaign
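
One way to see where n/7 can come from, offered as a sketch rather than the authors' proof: assume ratings on Yelp's 1 to 5 star scale, let R be the average rating of the venue's n genuine reviews, and let the adversary post q five-star reviews. The symbols R and q are introduced here for illustration and do not appear on the slide.

```latex
\frac{nR + 5q}{n + q} \;\ge\; R + \frac{1}{2}
\;\Longleftrightarrow\;
q\left(\frac{9}{2} - R\right) \;\ge\; \frac{n}{2}
\;\Longleftrightarrow\;
q \;\ge\; \frac{n}{9 - 2R}
\qquad (R < 4.5)
```

Under these assumptions the right-hand side is smallest at the lowest possible average, R = 1, which gives q ≥ n/7. For R ≥ 4.5 a half-star increase cannot be reached on a 5-star scale.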

Page 10

Review Spikes

Theorem: If n > 49, a successful review campaign will exceed, during the attack interval, the maximum number of reviews of a uniform review distribution

Page 11

Review Spike Detection (RSD)

- Identify venues that receive a higher number of positive (negative) reviews than normal
- Use the measures of dispersion of Box-and-Whisker plots to detect outliers
- Two features
  - Number of spikes detected for a venue
  - Normalized amplitude of the highest spike
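
The two RSD features can be illustrated with a minimal sketch. Several details here are assumptions, not taken from the slides: a spike is any day whose review count exceeds the conventional upper whisker Q3 + 1.5 × IQR, the amplitude is normalized by the mean daily count, and the function name `spike_features` and the sample counts are made up.

```python
from statistics import mean, quantiles


def spike_features(daily_counts, k=1.5):
    """Return (number of spikes, normalized amplitude of the highest spike).

    Assumptions for this sketch: a spike is a day whose count exceeds the
    upper whisker Q3 + k * IQR, and the amplitude is divided by the mean
    daily count.
    """
    q1, _, q3 = quantiles(daily_counts, n=4, method="inclusive")
    upper_fence = q3 + k * (q3 - q1)
    spikes = [c for c in daily_counts if c > upper_fence]
    if not spikes:
        return 0, 0.0
    return len(spikes), max(spikes) / mean(daily_counts)


# Made-up daily counts of positive reviews for one venue.
counts = [1, 2, 1, 2, 1, 2, 1, 2, 1, 30]
num_spikes, amplitude = spike_features(counts)
print(num_spikes, round(amplitude, 2))
```

The whisker multiplier `k` is left as a parameter because the slides do not state which threshold Marco uses.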

Page 12

Marco System Overview

[Figure: Marco system diagram, repeated from Page 8]

Page 13

Aggregate Rating Disparity (ARD) Module

- ARD Module: Measure the review divergence
  - N: total number of reviews of venue V

ARD(V) = \frac{1}{N} \sum_{i=1}^{N} \left| \text{review rating}_i - \text{avg rating}(t_i) \right|

Here review rating_i is the rating of the i-th review and avg rating(t_i) is the average rating of V at the time t_i that review was posted.
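
A minimal sketch of the ARD score for one venue follows. It assumes the ratings are given in chronological order and that the venue's average rating "at the time of a review" is the running average including that review. The slide does not settle either point, and the function name is hypothetical.

```python
def aggregate_rating_disparity(ratings):
    """Average absolute gap between each review's rating and the venue's
    running average rating at the time the review was posted.

    Assumption for this sketch: `ratings` is in chronological order and the
    running average includes the review being scored.
    """
    if not ratings:
        return 0.0
    total, disparity_sum = 0.0, 0.0
    for i, rating in enumerate(ratings, start=1):
        total += rating
        disparity_sum += abs(rating - total / i)
    return disparity_sum / len(ratings)


print(aggregate_rating_disparity([4, 4, 4]))  # consistent ratings give 0.0
print(aggregate_rating_disparity([5, 5, 1]))  # a late, disagreeing review raises the score
```

Reviews that agree with the venue's rating history contribute nothing, so a high score indicates that many reviews diverge from the average at the time they were posted.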

Page 14

Marco System Overview

[Figure: Marco system diagram, repeated from Page 8]

Page 15

Fraudulent Review Impact (FRI) Module

- Venues with few genuine reviews
  - Vulnerable to review campaigns
- Long-term campaigns can re-define the "normal" review posting behavior
- FRI Module: detect fraudulent reviews that significantly impact the aggregate rating of venues

Page 16

FRI Module (Cont'd)

Features to classify a review (fraudulent vs. genuine):
- Review writer
  - Number of friends
  - Number of reviews written
  - Expertise of user around venue
  - Number of check-ins at venue
  - Number of photos at venue
  - Age of user's account when review was posted
  - Feedback count of review
- FRI Feature:
  - Percentage of reviews classified as fraudulent
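
The FRI feature can be sketched as follows. The field names and the `toy_rule` stand-in are hypothetical: the slides list the review features but not their encoding, and in Marco the labels come from a trained review classifier rather than a hand-written rule.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ReviewFeatures:
    # Field names are hypothetical; they mirror the features listed on the slide.
    num_friends: int
    num_reviews_written: int
    expertise_around_venue: int
    checkins_at_venue: int
    photos_at_venue: int
    account_age_days: int
    feedback_count: int


def fri_feature(reviews: Sequence[ReviewFeatures],
                is_fraudulent: Callable[[ReviewFeatures], bool]) -> float:
    """Percentage of a venue's reviews that the classifier labels fraudulent."""
    if not reviews:
        return 0.0
    flagged = sum(1 for r in reviews if is_fraudulent(r))
    return 100.0 * flagged / len(reviews)


def toy_rule(r: ReviewFeatures) -> bool:
    # Stand-in for a trained classifier; this rule is made up for the example.
    return r.num_friends == 0 and r.num_reviews_written <= 1


reviews = [
    ReviewFeatures(0, 1, 0, 0, 0, 2, 0),
    ReviewFeatures(35, 120, 4, 3, 2, 900, 7),
    ReviewFeatures(0, 1, 0, 0, 0, 1, 0),
    ReviewFeatures(12, 40, 1, 1, 0, 400, 2),
]
print(fri_feature(reviews, toy_rule))  # 50.0 for this made-up data
```

Passing the classifier in as a callable keeps the feature computation independent of whichever model is used to label the reviews.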

Page 17

Review Data

- Gold standard fraudulent reviews
  - Spelp (spam Yelp) sites
  - Suspicious user accounts
  - Generic review text
- Gold standard genuine reviews
  - Written by active, popular users
  - No short, generic reviews
- 200 fraudulent and 202 genuine reviews

Page 18

Review Classification

Overall accuracy: Random Forest (RF) 94%, Bagging 93.5%, Decision Tree (DT) 93%

Page 19

Marco System Overview

[Figure: Marco system diagram, repeated from Page 8]

Page 20

Venue Classification Features

Features to classify venues:
- Number of review spikes for venue
- Amplitude of the highest spike
- Aggregate rating disparity
- Fraudulent review impact of venue
- Count of reviews classified fraudulent
- Rating of the venue
- Number of reviews (with check-ins & photos)
- Age of venue
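
As an illustration of how these features might be gathered into one row per venue for a classifier, here is a small sketch. All field names are hypothetical, the split of "number of reviews (with check-ins & photos)" into three counts is an assumption, and the values are made up.

```python
from dataclasses import astuple, dataclass


@dataclass
class VenueFeatures:
    # Hypothetical field names, one per feature listed on the slide.
    num_spikes: int
    highest_spike_amplitude: float
    aggregate_rating_disparity: float
    fraudulent_review_impact: float
    num_fraudulent_reviews: int
    rating: float
    num_reviews: int
    num_reviews_with_checkins: int
    num_reviews_with_photos: int
    age_days: int

    def as_row(self):
        """Flatten into a numeric row that a classifier could consume."""
        return [float(x) for x in astuple(self)]


venue = VenueFeatures(2, 6.5, 1.2, 40.0, 8, 4.5, 20, 3, 1, 365)  # made-up values
row = venue.as_row()
print(len(row), row[:3])
```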

Page 21

Venue Data

- Deceptive venue: fraudulent reviews impact its rating
- Ground truth: Yelp's "Consumer Alert" feature

Page 22

Venue Data (cont'd)

- Gold standard legitimate venues
  - Well known for consistent quality
  - At most 10% of reviews are filtered by Yelp
- 90 deceptive and 100 legitimate venues

Page 23

Venue Classification

RF and DT are tied for best accuracy, 95.8%.

Page 24

Comparison with state-of-the-art

Compare Marco with the three deceptive venue detection strategies of Feng et al. [1]: avg∆, distΦ and peak↑

Strategy    Accuracy (%)
Marco/RF    95.8
avg∆        66.3
distΦ       72.1
peak↑       58.9

Page 25

Marco's Overhead

[Figure: two panels, "Per-module overhead" and "Zoom-in of FRI module overhead"]

Page 26

Marco in the Wild: Yelp Data

- YCrawl: developed crawler to fetch raw HTML pages of Yelp venue and user accounts
- Collected:
  - 7,435 venues from San Francisco, New York City and Miami
    - Car shops, Spas, Moving companies
  - 270,121 reviews
  - 195,417 reviewers

Page 27

Experimental Results on Live Data

City                Car Shop    Mover       Spa
Miami, FL           1000 (6)    348 (8)     1000 (21)
San Francisco, CA   612 (59)    475 (45)    1000 (42)
NYC, NY             1000 (8)    1000 (27)   1000 (28)

Table: venues collected from Yelp, with the number Marco detected as deceptive in parentheses

San Francisco: Marco flags almost 10% of car repair and moving companies as suspicious
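
The "almost 10%" figure can be checked directly from the table. This sketch reads each cell as collected venues with flagged venues in parentheses, a reading supported by the collected counts summing to the 7,435 venues reported for the data set. The dictionary layout and function name are illustrative.

```python
# (collected, flagged) per city and category, copied from the table on this slide.
results = {
    "Miami, FL": {"Car Shop": (1000, 6), "Mover": (348, 8), "Spa": (1000, 21)},
    "San Francisco, CA": {"Car Shop": (612, 59), "Mover": (475, 45), "Spa": (1000, 42)},
    "NYC, NY": {"Car Shop": (1000, 8), "Mover": (1000, 27), "Spa": (1000, 28)},
}


def flagged_rate(collected, flagged):
    """Percentage of collected venues flagged as deceptive."""
    return 100.0 * flagged / collected


total_collected = sum(c for city in results.values() for c, _ in city.values())
sf = results["San Francisco, CA"]
sf_car = flagged_rate(*sf["Car Shop"])
sf_mover = flagged_rate(*sf["Mover"])
print(total_collected, round(sf_car, 1), round(sf_mover, 1))  # 7435 9.6 9.5
```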

Page 28

Conclusions

- Lower bound on the number of reviews required to launch a successful review campaign
- Marco: automatic detection of fraudulent reviews, deceptive venues and impactful review campaigns
- Novel dataset of reviews and venues
- Marco is effective and fast