Evil Twins: Modeling Power Users
in Attacks on Recommender Systems
7/9/2014 1
David C. Wilson and Carlos E. Seminario, University of North Carolina Charlotte
Have you ever wondered whether these are
REAL ratings & REAL reviews
entered by REAL people??
Do you always TRUST
this information??
Research Problem
Attacks on Recommender Systems
“Push”
“Nuke”
“Disrupt”
Example of a Push Attack
User Profiles Avengers Titanic Avatar Alien Psycho Twilight
Bob 5 2 3 3 ?
Ted 2 4 4 1
Fred 3 1 3 1 2
Ginger 4 2 3 1 1
Jodie 3 3 2 1 3 1
Jill 3 1 2
Tom 4 3 3 3 2
Corey 5 1 5 1
Source: Mobasher et al, 2007
User Profiles Avengers Titanic Avatar Alien Psycho Twilight
Bob 5 2 3 3 ?
Ted 2 4 4 1
Fred 3 1 3 1 2
Ginger 4 2 3 1 1
Jodie 3 3 2 1 3 1
Jill 3 1 2
Tom 4 3 3 3 2
Corey 5 1 5 1
Alice 5 3 2 5
Axel 5 1 4 2 5
Alvin 5 2 2 2 5
Filler: Ratings to correlate with regular
users
Target Item, Attack Intent
Filler Size: Avg # Ratings
per Profile
Attack Size: # of Attack
Profiles
“Twilight” predicted rating (User-based): before attack 2, after attack 5
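The before/after shift can be reproduced in miniature. The sketch below is an illustration under stated assumptions, not the computation behind the slide: it uses a hypothetical four-item matrix (filler items A, B, C and target item T) instead of the movie table, Pearson similarity over co-rated items, and a similarity-weighted average over positively correlated neighbours. The function names `pearson` and `predict` are illustrative.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation over the items both users have rated."""
    common = set(a) & set(b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    num = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    den_a = sqrt(sum((a[i] - mean_a) ** 2 for i in common))
    den_b = sqrt(sum((b[i] - mean_b) ** 2 for i in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0

def predict(user, item, profiles):
    """Similarity-weighted average of neighbours' ratings for `item`.

    Only neighbours with positive similarity who rated `item` contribute.
    """
    num = den = 0.0
    for other in profiles:
        if item not in other:
            continue
        sim = pearson(user, other)
        if sim > 0:
            num += sim * other[item]
            den += sim
    return num / den if den else None

# Hypothetical toy data (not the slide's matrix).
bob = {"A": 5, "B": 1, "C": 3}                      # has not rated target T
genuine = [
    {"A": 4, "B": 2, "C": 3, "T": 2},               # agrees with Bob, dislikes T
    {"A": 1, "B": 5, "C": 3, "T": 5},               # disagrees with Bob
]
attackers = [
    {"A": 5, "B": 1, "C": 3, "T": 5},               # filler mimics Bob, T pushed to max
    {"A": 5, "B": 1, "C": 3, "T": 5},
]

before = predict(bob, "T", genuine)
after = predict(bob, "T", genuine + attackers)
```

With this toy data the prediction for T moves from about 2 to about 4 once the two attack profiles are injected: their filler ratings correlate with Bob's, so they enter his neighbourhood and pull the weighted average toward the maximum rating.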
Generating Attack User Profiles
Filler Item Selection and
Ratings are key
to a successful
attack!!
Attack Models using Statistical “Average” Users
Random: Filler @ norm dist around avg of all item ratings
Average: Filler @ norm dist around avg of each item’s rating
Bandwagon: Popular @ max rating, filler @ Random model
Segment: Popular @ max rating, filler @ min rating
Obfuscated: Noise injection, User shifting, Target shifting
Average-over-popular: Average model with % Popular filler
Source: O’Mahony et al, 2002; Lam & Riedl 2004; Mobasher et al, 2007; Williams et al 2006; Hurley et al, 2009
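The first two models in the list differ only in where the filler ratings are centred. A minimal sketch for a push attack follows, assuming a 1 to 5 rating scale, integer ratings, and hypothetical item statistics; the function names and the numbers in `item_stats` are illustrative, not taken from the cited papers.

```python
import random

R_MIN, R_MAX = 1, 5  # assumed rating scale

def to_rating(x):
    """Round a sampled value and clamp it to the rating scale."""
    return max(R_MIN, min(R_MAX, round(x)))

def random_attack_profile(filler_items, target, global_mean, global_std, rng):
    """Random model: filler drawn around the mean of ALL item ratings."""
    profile = {i: to_rating(rng.gauss(global_mean, global_std)) for i in filler_items}
    profile[target] = R_MAX  # push intent: target gets the maximum rating
    return profile

def average_attack_profile(filler_items, target, item_stats, rng):
    """Average model: filler drawn around EACH item's own mean rating."""
    profile = {i: to_rating(rng.gauss(*item_stats[i])) for i in filler_items}
    profile[target] = R_MAX
    return profile

# Hypothetical (mean, std) per item, for illustration only.
item_stats = {"Avengers": (3.9, 0.9), "Titanic": (2.1, 1.0), "Avatar": (3.2, 0.8)}
rng = random.Random(42)
filler = list(item_stats)

rand_profile = random_attack_profile(filler, "Twilight", 3.0, 1.1, rng)
avg_profile = average_attack_profile(filler, "Twilight", item_stats, rng)
```

The Average model needs per-item statistics, which is more knowledge than the Random model requires; that trade-off between attacker knowledge and profile realism is what separates the models in the list.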
Research Gap
Attack Models and Attack Detection research have focused on “average” attackers and related user models
Source: Mobasher et al, 2007; Burke, et al, 2011; Chirita, et al, 2005; Mehta and Nejdl, 2009; Hurley et al, 2009; Williams et al 2007; Sandvig et al, 2007; Cheng and Hurley, 2010, Bhaumik et al, 2006
However, attackers continue to find new and more powerful
strategies to attack Recommender Systems
User-User Similarity Matrix as a Social Graph
Social Network Analysis: Central users are Influential
Viral Marketing: Connected users exert influence
Source: Palau et al 2004; Wasserman and Faust, 1994; Domingos and Richardson, 2001; Anand and Griffiths, 2011
Influential “Power” Users vs “Average” Users
Influence: the impact a user has on recommendations
Selecting Power Users from a Dataset
Finding the optimal set of Power Users in a social network is complex, so heuristic approaches to Power User selection are used:
Number of Ratings: number of ratings in the user profile
Aggregated Similarity: sum of similarities between users
In-Degree Centrality: number of neighborhoods a user is in
Source: Kempe et al, 2003; Rashid et al, 2005; Goyal and Lakshmanan, 2012; Herlocker et al, 2004; Lathia et al, 2008; Wilson and Seminario, 2013
User Profiles Avengers Titanic Avatar Alien Psycho Twilight | NumRatings
Bob 5 2 3 3 | 4
Ted 2 4 4 1 | 4
Fred 3 1 3 1 2 | 5
Ginger 4 2 3 1 1 | 5
Jodie 3 3 2 1 3 1 | 6
Jill 3 1 2 | 3
Tom 4 3 3 3 2 | 5
Corey 5 1 5 1 | 4
Number of Ratings Power Users
User-Item Matrix
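Selecting Number of Ratings power users amounts to ranking users by profile size. A small sketch using the NumRatings values from the matrix above; the helper name `top_k_by_num_ratings` and the alphabetical tie-break are assumptions made for this illustration.

```python
# NumRatings column from the user-item matrix on the slide.
num_ratings = {
    "Bob": 4, "Ted": 4, "Fred": 5, "Ginger": 5,
    "Jodie": 6, "Jill": 3, "Tom": 5, "Corey": 4,
}

def top_k_by_num_ratings(counts, k):
    """Return the k users with the most ratings.

    Ties are broken alphabetically, which is an assumption of this sketch.
    """
    return sorted(counts, key=lambda u: (-counts[u], u))[:k]

power_users = top_k_by_num_ratings(num_ratings, 3)
```

Here Jodie ranks first with 6 ratings, and the three users with 5 ratings are tied behind her, so the choice of tie-break decides who fills the remaining slots.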
Similarity Bob Ted Fred Ginger Jodie Jill Tom Corey | AggSim
Bob 0.756 0.718 0.945 | 2.419
Ted 0.500 0.522 1.000 | 2.022
Fred 0.756 0.674 0.426 | 1.856
Ginger 1.000 0.866 1.000 | 2.866
Jodie 0.767 0.866 1.000 | 2.633
Jill 1.000 0.866 0.866 | 2.732
Tom 0.945 0.866 0.645 | 2.456
Corey 1.000 1.000 1.000 | 3.000
Aggregated Similarity Power Users User-User Similarity Matrix
(Pearson Correlation, no weighting)
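Each AggSim value is the sum of the similarities listed in that user's row. The sketch below recomputes the column from the row values on the slide; which neighbour each similarity belongs to is not recoverable from the transcript, so the rows are kept as plain lists.

```python
# Similarity values per user, copied from the slide's matrix rows.
row_sims = {
    "Bob":    [0.756, 0.718, 0.945],
    "Ted":    [0.500, 0.522, 1.000],
    "Fred":   [0.756, 0.674, 0.426],
    "Ginger": [1.000, 0.866, 1.000],
    "Jodie":  [0.767, 0.866, 1.000],
    "Jill":   [1.000, 0.866, 0.866],
    "Tom":    [0.945, 0.866, 0.645],
    "Corey":  [1.000, 1.000, 1.000],
}

# Aggregated similarity: sum of a user's similarities to their neighbours.
agg_sim = {user: sum(sims) for user, sims in row_sims.items()}

# Rank users from highest to lowest aggregated similarity.
ranked = sorted(agg_sim, key=agg_sim.get, reverse=True)
```

Under this heuristic Corey, Ginger and Jill lead the ranking, which differs from the Number of Ratings ranking for the same eight users.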
Similarity Bob Ted Fred Ginger Jodie Jill Tom Corey
Bob 0.378 0.479 0.472
Ted 0.250 0.348 0.333
Fred 0.378 0.449 0.284
Ginger 0.639 0.577 0.500
Jodie 0.639 0.538 0.667
Jill 0.333 0.433 0.433
Tom 0.472 0.577 0.538
Corey 0.500 0.667 0.433
InDegree 2 0 1 7 5 1 4 4
In Degree Power Users User-User Similarity Matrix
(Pearson Correlation, weighted)
Social Graph based on User-User Similarities
Degree Centrality
[Figure: social graph of the eight users, each node labeled with its in-degree: Ginger 7, Jodie 5, Tom 4, Corey 4, Bob 2, Fred 1, Jill 1, Ted 0]
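In-degree counts how many other users' neighborhoods a user appears in. The neighbor lists behind the slide's graph are not recoverable from the transcript, so the sketch below uses a hypothetical four-user example with two neighbors each; `in_degree` is an illustrative helper name.

```python
from collections import Counter

def in_degree(neighborhoods):
    """Count, for every user, how many other users list them as a neighbor.

    `neighborhoods` maps each user to the list of their nearest neighbors.
    Users who appear in no neighborhood get an in-degree of 0.
    """
    counts = Counter({user: 0 for user in neighborhoods})
    for neighbors in neighborhoods.values():
        counts.update(neighbors)
    return dict(counts)

# Hypothetical neighborhoods (k = 2), for illustration only.
neighborhoods = {
    "u1": ["u2", "u3"],
    "u2": ["u3", "u1"],
    "u3": ["u2", "u4"],
    "u4": ["u3", "u2"],
}

degrees = in_degree(neighborhoods)
power_users = sorted(degrees, key=lambda u: (-degrees[u], u))[:2]
```

The in-degrees always sum to the number of users times the neighborhood size, which matches the slide: eight users with three neighbors each give in-degrees that total 24.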
Real Power Users Have Impact
Investigated feasibility of Power User Attack (PUA):
  Select top Power Users from dataset
  Use Power User profiles as “filler”
  Select target items (“new” items)
  Attack parameters: size, intent
  CF Algorithms and Datasets
Source: Wilson and Seminario, 2013; Seminario and Wilson, 2014
Small number of Power Users (< 5% of dataset users) can have significant effects on recommendations
PUA effective against User-based and SVD-based RS
Power User Model
Synthetic Power Users (SPU’s) based on Real Power Users (RPU’s)
1. Select top RPU’s from dataset: InDegree, Number of Ratings, Aggregated Similarity
2. Generate SPU’s for Attack: SPU profiles based on RPU’s (the “evil twins”)
3. Evaluate the Model Before Attack ..
4. Select Attack Parameters
5. Evaluate the Model After Attack
Generate Synthetic Power Users (SPU’s)
Filler Size based on RPU’s profile size
Item Selection based on RPU’s item popularity
Item Ratings based on RPU’s average item ratings
Objective was to emulate (not duplicate) Real Power Users
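One way to read the three rules above is as a sampling procedure. The sketch below is an interpretation, not the authors' generator: it assumes the profile size is drawn from the observed RPU sizes, items are sampled without replacement with probability proportional to how many RPU's rated them, and each rating is the rounded mean of the RPU ratings for that item. The function name `generate_spu` and the toy RPU profiles are hypothetical.

```python
import random
from collections import Counter, defaultdict

def generate_spu(rpu_profiles, rng):
    """Build one synthetic power user profile that emulates the given RPU's."""
    sizes = [len(p) for p in rpu_profiles]
    popularity = Counter(item for p in rpu_profiles for item in p)
    ratings = defaultdict(list)
    for profile in rpu_profiles:
        for item, rating in profile.items():
            ratings[item].append(rating)

    # Filler size: drawn from the RPU profile sizes.
    size = min(rng.choice(sizes), len(popularity))

    # Item selection: popularity-weighted sampling without replacement.
    pool = dict(popularity)
    chosen = []
    for _ in range(size):
        items = list(pool)
        pick = rng.choices(items, weights=[pool[i] for i in items], k=1)[0]
        chosen.append(pick)
        del pool[pick]

    # Item ratings: rounded average of the RPU ratings for each chosen item.
    return {i: round(sum(ratings[i]) / len(ratings[i])) for i in chosen}

# Hypothetical real power user profiles, for illustration only.
rpus = [
    {"Avengers": 5, "Titanic": 2, "Avatar": 4, "Alien": 3},
    {"Avengers": 4, "Avatar": 5, "Psycho": 2},
    {"Avengers": 5, "Titanic": 1, "Avatar": 4, "Alien": 4, "Psycho": 3},
]
spu = generate_spu(rpus, random.Random(7))
```

Because the profile is sampled rather than copied, the result resembles the RPU's statistically without being identical to any one of them, in line with the stated objective to emulate and not duplicate.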
Evaluating the Power User Model
Evaluation Metrics
Before Attack: How well do the SPU’s match the RPU’s?
Precision and Recall, Mean Absolute Error, Statistical differences
Power User Model – Evaluation Before the Attack
Selecting SPU’s from ML100K dataset
Source for ML100K: grouplens.org
We were able to find the majority of NumRatings and InDegree SPU’s in the top-50
Good SPU emulation of RPU’s
SPU vs RPU MAE Differences
InDegree and NumRatings SPU’s have better ablation results than AggSim
InDegree SPU’s indicate a strong level of influence
SPU vs RPU Statistical Characteristics
No differences across all power user selection methods:
  Average number of ratings per power user
  Average user rating (across all power users)
  Average item rating (across all power user items)
The Power User Model generates SPU’s that match key statistical measures of RPU’s
Good SPU emulation of RPU’s
Power User Model
Synthetic Power Users (SPU’s) based on Real Power Users (RPU’s)
1. Select top RPU’s from dataset: InDegree, Number of Ratings, Aggregated Similarity
2. Generate SPU’s for Attack: SPU profiles based on RPU’s (the “evil twins”)
3. Evaluate the Model Before Attack
4. Select Attack Parameters
   Attack Size: 5% (= 50 power users)
   Attack Intent: Push (promote)
   Target Items: “New”, injected at run time
5. Evaluate the Model After Attack ..
Evaluating the Power User Model
Evaluation Metrics
After Attack: How effective is the Power User Attack with SPU’s?
Robustness: Hit Ratio, Rank, Prediction Shift
Source: O’Mahony et al, 2002, Lam and Riedl, 2004; Mobasher et al, 2007; Burke et al, 2011
Hit Ratio: % of users with the target item in their top-N list
Rank: position of the target item in the top-N list
Prediction Shift: change in predicted rating for the target item
High Hit Ratio, low Rank, and high Prediction Shift indicate more impact
Source: Lam and Riedl, 2004; Mobasher et al, 2007; Burke et al, 2011; Seminario and Wilson 2014
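The three robustness metrics can be computed directly from top-N lists and predicted ratings. A minimal sketch follows, assuming the metrics are averaged over users and that Rank is averaged only over users whose list contains the target; the function names and the toy data are hypothetical.

```python
def hit_ratio(top_n_lists, target):
    """Fraction of users whose top-N list contains the target item."""
    hits = sum(target in items for items in top_n_lists.values())
    return hits / len(top_n_lists)

def average_rank(top_n_lists, target):
    """Mean 1-based position of the target, over users whose list contains it."""
    ranks = [items.index(target) + 1
             for items in top_n_lists.values() if target in items]
    return sum(ranks) / len(ranks) if ranks else None

def prediction_shift(before, after):
    """Mean change in the target item's predicted rating across users."""
    return sum(after[u] - before[u] for u in before) / len(before)

# Hypothetical post-attack top-3 lists and predictions for target item "t".
top_n = {
    "u1": ["x", "t", "y"],
    "u2": ["t", "x", "y"],
    "u3": ["x", "y", "z"],
    "u4": ["y", "z", "t"],
}
pred_before = {"u1": 2.0, "u2": 3.0, "u3": 2.5, "u4": 1.5}
pred_after = {"u1": 4.0, "u2": 4.5, "u3": 3.5, "u4": 4.0}

hr = hit_ratio(top_n, "t")
rank = average_rank(top_n, "t")
shift = prediction_shift(pred_before, pred_after)
```

In this toy case the target appears in three of four lists (Hit Ratio 0.75) at an average position of 2, and its predicted rating rises by 1.75 on average.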
Power User Model – Evaluation After the Attack
Attacks with InDegree and Number of Ratings SPU’s have high impacts on User-based and
SVD-based recommenders
Summary & Future Work
Power User Model produces InDegree and NumRatings SPU’s that effectively emulate RPU’s
Power User Attack with SPU’s is effective against User-based and SVD-based CF systems
Small number of SPU’s (< 5%) can have significant effects on recommender predictions
Future ..
Explore other Power User selection methods
Extend Power User Model (item selection)
Evaluate other CF algorithms and domains
Mitigate Power User attacks
Power User Attacks using InDegree and NumRatings methods can impact recommender systems
System operators should be aware of, and able to
defend against, Power User Attacks