Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network...
![Page 1: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/1.jpg)
Xintao Wu Aug 25,2014
Research Overview
1
![Page 2: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/2.jpg)
Outline
• Introduction
• Privacy Preserving Social Network Analysis
  • Input perturbation
  • Output perturbation
• Fraud Detection in Social Networks
  • Spectral analysis of graph topology
  • Detecting Random Link Attacks
  • Detecting weak anomalies
• Sample Projects
• Conclusions and Future work
2
![Page 3: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/3.jpg)
Trustworthy Computing
• Trustworthy = reliability, security, privacy, usability
• Sample research challenges
  • Understand and capture emergent behaviors/interactions among regular users, fraudsters, and victims
  • Design secure, survivable, persistent systems when under attack
  • Enable privacy protection in collecting/analyzing/sharing personal data
3
![Page 4: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/4.jpg)
Privacy Breach Cases
• Nydia Velázquez (1994): medical record on her suicide attempt was disclosed
• AOL Search Log (2006): anonymized release of 650K users’ search histories lasted for less than 24 hours
• Netflix Contest (2009): $1M contest was cancelled due to a privacy lawsuit
• 23andMe (2013): genetic testing was ordered to discontinue by the FDA due to genetic privacy
4
![Page 5: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/5.jpg)
Acxiom
• Privacy
  • In 2003, EPIC alleged that Acxiom provided consumer information to the US Army "to determine how information from public and private records might be analyzed to help defend military bases from attack."
  • In 2013, Acxiom was among nine companies that the FTC investigated to see how they collect and use consumer data.
• Security
  • In 2003, more than 1.6 billion customer records were stolen during the transmission of information to and from Acxiom's clients.
5
![Page 6: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/6.jpg)
6
Privacy Regulation -- Forrester
[Figure legend: Most restricted; Restricted; Some restrictions; Minimal restrictions; Effectively no restrictions; No legislation or no information]
![Page 7: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/7.jpg)
Privacy Protection Laws
• USA
  • HIPAA for health care
  • Gramm-Leach-Bliley Act of 1999 for financial institutions
  • COPPA for children's online privacy
  • State regulations, e.g., California Senate Bill 1386
• Canada
  • PIPEDA 2000 - Personal Information Protection and Electronic Documents Act
• European Union
  • Directive 95/46/EC - Provides guidelines for member state legislation and forbids sharing data with states that do not protect privacy
• Contractual obligations
  • Individuals should have notice about how their data is used and have opt-out choices
7
![Page 8: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/8.jpg)
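Privacy Preserving Data Mining
8

| ssn | name | zip | race | … | age | sex | income | … | disease |
|---|---|---|---|---|---|---|---|---|---|
| | | 28223 | Asian | … | 20 | M | 85k | … | Cancer |
| | | 28223 | Asian | … | 30 | F | 70k | … | Flu |
| | | 28262 | Black | … | 20 | M | 120k | … | Heart |
| | | 28261 | White | … | 26 | M | 23k | … | Cancer |
| | | … | … | … | … | … | … | … | … |
| | | 28223 | Asian | … | 20 | M | 110k | … | Flu |

• 69% unique on zip and birth date
• 87% unique on zip, birth date and gender
• Generalization (k-anonymity, l-diversity, t-closeness)
• Randomization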
![Page 9: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/9.jpg)
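Social Network Data
9
Data owner (original data):

| name | sex | age | disease | salary |
|---|---|---|---|---|
| Ada | F | 18 | cancer | 25k |
| Bob | M | 25 | heart | 110k |
| Cathy | F | 20 | cancer | 70k |
| Dell | M | 65 | flu | 65k |
| Ed | M | 60 | cancer | 300k |
| Fred | M | 24 | flu | 20k |
| George | M | 22 | cancer | 45k |
| Harry | M | 40 | flu | 95k |
| Irene | F | 45 | heart | 70k |

Release to the data miner (released data):

| id | sex | age | disease | salary |
|---|---|---|---|---|
| 5 | F | Y | cancer | 25k |
| 3 | M | Y | heart | 110k |
| 6 | F | Y | cancer | 70k |
| 1 | M | O | flu | 65k |
| 7 | M | O | cancer | 300k |
| 2 | M | Y | flu | 20k |
| 9 | M | Y | cancer | 45k |
| 4 | M | M | flu | 95k |
| 8 | F | M | heart | 70k |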
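The release step on this slide (drop names, coarsen age) can be sketched in a few lines. This is only an illustration: the age cut-offs of 30 and 60 are assumptions chosen to reproduce the Y/M/O labels in the released table, and the function names are hypothetical. The sketch also measures k-anonymity over the quasi-identifier (sex, age band).

```python
from collections import Counter

# Original records copied from the slide: (name, sex, age, disease, salary)
RECORDS = [
    ("Ada", "F", 18, "cancer", "25k"),
    ("Bob", "M", 25, "heart", "110k"),
    ("Cathy", "F", 20, "cancer", "70k"),
    ("Dell", "M", 65, "flu", "65k"),
    ("Ed", "M", 60, "cancer", "300k"),
    ("Fred", "M", 24, "flu", "20k"),
    ("George", "M", 22, "cancer", "45k"),
    ("Harry", "M", 40, "flu", "95k"),
    ("Irene", "F", 45, "heart", "70k"),
]


def generalize_age(age):
    """Map an exact age to a band. The cut-offs (30, 60) are assumptions."""
    if age < 30:
        return "Y"
    if age < 60:
        return "M"
    return "O"


def release(records):
    """Drop the name and generalize the age of every record."""
    return [(sex, generalize_age(age), disease, salary)
            for _, sex, age, disease, salary in records]


def k_anonymity(released):
    """Return the smallest group size over (sex, age band) and all group sizes."""
    groups = Counter((sex, band) for sex, band, _, _ in released)
    return min(groups.values()), groups


released = release(RECORDS)
k, groups = k_anonymity(released)
print("k =", k)
print(dict(groups))
```

Under these assumed bands, some (sex, age band) groups contain a single record, so the released table is not 2-anonymous.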
![Page 10: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/10.jpg)
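Threat of Re-identification
10
Released data under attack:

| id | sex | age | disease | salary |
|---|---|---|---|---|
| 5 | F | Y | cancer | 25k |
| 3 | M | Y | heart | 110k |
| 6 | F | Y | cancer | 70k |
| 1 | M | O | flu | 65k |
| 7 | M | O | cancer | 300k |
| 2 | M | Y | flu | 20k |
| 9 | M | Y | cancer | 45k |
| 4 | M | M | flu | 95k |
| 8 | F | M | heart | 70k |

Attacker: attack on the released data
Privacy breaches
• Identity disclosure
• Link disclosure
• Attribute disclosure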
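A linkage attack on the released table can be sketched as a simple filter. The background knowledge used here (the attacker knows a target's sex and rough age band) is a hypothetical assumption for illustration, as are the helper names.

```python
# Released table copied from the slide: (id, sex, age band, disease, salary)
RELEASED = [
    (5, "F", "Y", "cancer", "25k"),
    (3, "M", "Y", "heart", "110k"),
    (6, "F", "Y", "cancer", "70k"),
    (1, "M", "O", "flu", "65k"),
    (7, "M", "O", "cancer", "300k"),
    (2, "M", "Y", "flu", "20k"),
    (9, "M", "Y", "cancer", "45k"),
    (4, "M", "M", "flu", "95k"),
    (8, "F", "M", "heart", "70k"),
]


def candidates(released, sex, age_band):
    """Rows consistent with the attacker's assumed background knowledge."""
    return [row for row in released if row[1] == sex and row[2] == age_band]


def possible_diseases(rows):
    """Set of diseases the target could have, given the candidate rows."""
    return {row[3] for row in rows}


# Case 1: a middle-aged man. Exactly one row matches, so the record is
# re-identified (identity disclosure) and his disease and salary leak.
middle_aged_men = candidates(RELEASED, "M", "M")
print(middle_aged_men)

# Case 2: a young woman. Two rows match, so identity stays ambiguous,
# but both rows share one disease (attribute disclosure).
young_women = candidates(RELEASED, "F", "Y")
print(len(young_women), possible_diseases(young_women))
```

The second case shows why hiding identity alone is not enough: a group can be larger than one and still reveal a sensitive attribute when all its members share it.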
![Page 11: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/11.jpg)
Privacy Preservation in Social Network Analysis
• Input Perturbation
• K-anonymity
• Generalization
• Randomization
• Output Perturbation
• Background on differential privacy
• Differential privacy preserving social network mining
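As an illustration of randomization-style input perturbation on a graph, the sketch below deletes a few existing edges and adds the same number of new ones before release. This is a plain random add/delete scheme written for illustration, not the spectrum-preserving or feature-preserving methods from the speaker's papers; the function name and parameters are hypothetical.

```python
import numpy as np


def rand_add_del(adj, num_swaps, rng):
    """Delete num_swaps existing edges and add num_swaps previously absent edges."""
    n = adj.shape[0]
    rows, cols = np.triu_indices(n, k=1)          # each node pair once
    present = adj[rows, cols] == 1
    edge_idx = np.flatnonzero(present)
    non_edge_idx = np.flatnonzero(~present)
    drop = rng.choice(edge_idx, size=num_swaps, replace=False)
    add = rng.choice(non_edge_idx, size=num_swaps, replace=False)
    out = adj.copy()
    out[rows[drop], cols[drop]] = 0
    out[cols[drop], rows[drop]] = 0
    out[rows[add], cols[add]] = 1
    out[cols[add], rows[add]] = 1
    return out


# Toy graph: a ring of 8 nodes.
ring = np.zeros((8, 8), dtype=int)
for i in range(8):
    j = (i + 1) % 8
    ring[i, j] = ring[j, i] = 1

rng = np.random.default_rng(0)
perturbed = rand_add_del(ring, num_swaps=2, rng=rng)

# The edge count is preserved, but structural features such as the largest
# adjacency eigenvalue can drift, which is the utility cost of randomization.
print(np.linalg.eigvalsh(ring.astype(float))[-1])
print(np.linalg.eigvalsh(perturbed.astype(float))[-1])
```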
11
![Page 12: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/12.jpg)
Our Work
• Feature preservation randomization
  • Spectrum preserving randomization (SDM08)
  • Markov chain based feature preserving randomization (SDM09)
  • Reconstruction from randomized graph (SDM10)
• Link privacy (from the attacker perspective)
  • Exploiting node similarity feature (PAKDD09 Best Student Paper Runner-up Award)
  • Exploiting graph space via Markov chain (SDM09)
12
![Page 13: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/13.jpg)
PSNet (NSF-0831204)
13
![Page 14: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/14.jpg)
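Output Perturbation
14
Data owner:

| name | sex | age | disease | salary |
|---|---|---|---|---|
| Ada | F | 18 | cancer | 25k |
| Bob | M | 25 | heart | 110k |
| Cathy | F | 20 | cancer | 70k |
| Dell | M | 65 | flu | 65k |
| Ed | M | 60 | cancer | 300k |
| Fred | M | 24 | flu | 20k |
| George | M | 22 | cancer | 45k |
| Harry | M | 40 | flu | 95k |
| Irene | F | 45 | heart | 70k |

Data miner sends: Query f
Data owner returns: Query result + noise
The noisy result cannot be used to derive whether any individual is included in the database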
![Page 15: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/15.jpg)
Differential Guarantee [Dwork, TCC06]
15
Database x:

| name | disease |
|---|---|
| Ada | cancer |
| Bob | heart |
| Cathy | cancer |
| Dell | flu |
| Ed | cancer |
| Fred | flu |

f = count(#cancer); mechanism K releases f(x) + noise = 3 + noise
Database x’ (the same table with one cancer record opted out): f = count(#cancer); mechanism K releases f(x’) + noise = 2 + noise
achieving Opt-Out
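The count query on this slide can be sketched with the standard Laplace mechanism. A count changes by at most 1 when one person is added or removed, so noise with scale 1/ε gives ε-differential privacy. The value of ε and the choice of which person opts out are assumptions for illustration; the slide does not specify either.

```python
import numpy as np

# Database x copied from the slide: (name, disease)
DATABASE = [
    ("Ada", "cancer"),
    ("Bob", "heart"),
    ("Cathy", "cancer"),
    ("Dell", "flu"),
    ("Ed", "cancer"),
    ("Fred", "flu"),
]


def count_cancer(db):
    """The query f: number of records whose disease is cancer."""
    return sum(1 for _, disease in db if disease == "cancer")


def laplace_count(db, epsilon, rng):
    """Release f(db) + Laplace noise calibrated to sensitivity 1."""
    sensitivity = 1.0  # one person changes a count by at most 1
    return count_cancer(db) + rng.laplace(loc=0.0, scale=sensitivity / epsilon)


rng = np.random.default_rng(42)
epsilon = 0.5  # assumed privacy budget, not taken from the slide

x = DATABASE
# Hypothetical opt-out: which individual leaves is an assumption.
x_prime = [row for row in DATABASE if row[0] != "Cathy"]

print(count_cancer(x), count_cancer(x_prime))
print(laplace_count(x, epsilon, rng), laplace_count(x_prime, epsilon, rng))
```

With scale 1/ε = 2, the noise is usually larger than the gap of 1 between the two true counts, which is what makes a single noisy answer weak evidence about whether the individual participated.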
![Page 16: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/16.jpg)
Our Work
• DP-preserving clustering coefficient (ASONAM12)
  • Divide and conquer
  • Smooth sensitivity
• DP-preserving spectral graph analysis (PAKDD13)
  • LNPP: based on the Laplace Noise Perturbation
  • SBMF: based on the Exponential Mechanism and MBF density
• Linear-refinement of DP-preserving query answering (PAKDD13 Best Application Paper)
• DP-preserving graph generation based on degree correlation (TDP13)
16
![Page 17: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/17.jpg)
SMASH (NIH R01GM103309)
17
![Page 18: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/18.jpg)
Outline
• Introduction
• Privacy Preserving Social Network Analysis
  • Input perturbation
  • Output perturbation
• Fraud Detection
  • Spectral analysis of graph topology
  • Detecting Random Link Attacks
  • Detecting weak anomalies
• Sample Projects
• Conclusions and Future work
18
![Page 19: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/19.jpg)
Cyber Fraud
• Cyber crime
  • Costs the US economy $400 billion annually
• OSN Fraud and Attack
  • Sybil attack, spam, viral marketing, fraudulent auction, brand jacking, denial of service, etc.
  • Fake followers on Twitter (used in viral marketing) are worth $360 million annually on the black market.
19
![Page 20: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/20.jpg)
Topology-based Detection
• Fraud Characterization
  • Individual vs. collusive
  • Robot vs. money-motivated regular user
  • Random vs. selective target
  • Static vs. dynamic
• Traditional topology-based detection methods
  • incur high computational cost
  • have difficulty detecting collaborative attacks or subtle anomalies
20
![Page 21: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/21.jpg)
Random Link Attack [Shrivastava, ICDE08]
• An abstraction of collaborative attacks including spam, viral marketing, etc.
• The attacker creates some fake nodes and uses them to attack a large set of randomly selected regular nodes.
• Fake nodes also mimic the real graph structure among themselves to evade detection.
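The attack model can be simulated on an adjacency matrix as a rough sketch. The function name, the parameters, and the way fake nodes link among themselves (independent links with a fixed probability, matching only the density of the real graph) are simplifying assumptions, not details taken from the cited paper.

```python
import numpy as np


def inject_rla(adj, num_fake, victims_per_fake, internal_prob, rng):
    """Append fake nodes that each link to randomly chosen regular nodes."""
    n = adj.shape[0]
    total = n + num_fake
    out = np.zeros((total, total), dtype=adj.dtype)
    out[:n, :n] = adj
    for fake in range(n, total):
        # Each fake node attacks a random set of regular nodes.
        victims = rng.choice(n, size=victims_per_fake, replace=False)
        out[fake, victims] = 1
        out[victims, fake] = 1
    for a in range(n, total):
        for b in range(a + 1, total):
            # Fake nodes also link among themselves to look like a community.
            if rng.random() < internal_prob:
                out[a, b] = out[b, a] = 1
    return out


rng = np.random.default_rng(1)
n_regular = 30
upper = np.triu(rng.random((n_regular, n_regular)) < 0.1, 1).astype(int)
regular = upper + upper.T            # symmetric random graph, no self-loops

attacked = inject_rla(regular, num_fake=4, victims_per_fake=10,
                      internal_prob=0.5, rng=rng)
print(regular.shape, attacked.shape)
print(attacked[n_regular:, :n_regular].sum(axis=1))  # victims per fake node
```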
21
![Page 22: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/22.jpg)
Spectral Graph Analysis based Fraud Detection
Examine the spectral space of graph topology.
Consider a network with n nodes and m edges that is undirected and unweighted, ignoring link/node attribute information.
Adjacency Matrix A (symmetric)
Adjacency Eigenspace
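A minimal sketch of the adjacency eigenspace for a small undirected, unweighted graph follows. The toy edge list is an assumption for illustration. Because A is symmetric, `numpy.linalg.eigh` returns real eigenvalues and orthonormal eigenvectors, and A can be rebuilt from them.

```python
import numpy as np


def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected, unweighted graph."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A


# Toy graph (assumed): a triangle 0-1-2 with a tail 2-3-4.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
A = adjacency_matrix(5, edges)

# eigh returns eigenvalues in ascending order; reorder so the largest is first.
eigenvalues, eigenvectors = np.linalg.eigh(A)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Spectral decomposition: A = X diag(lambda) X^T, with eigenvectors as columns of X.
reconstructed = eigenvectors @ np.diag(eigenvalues) @ eigenvectors.T
print(np.round(eigenvalues, 3))
print(np.allclose(reconstructed, A))
```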
22
![Page 23: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/23.jpg)
Eigenspace
23
Principal Minor
![Page 24: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/24.jpg)
Projecting Node in Spectral Space [SDM09]
24
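Spectral coordinate of node u: $\alpha_u = (x_{1u}, x_{2u}, \ldots, x_{ku})$, i.e., row u restricted to the first k eigenvectors in the eigenvector matrix

$$
\begin{pmatrix} x_1 & x_2 & \cdots & x_k & \cdots & x_n \end{pmatrix}
=
\begin{pmatrix}
x_{11} & x_{21} & \cdots & x_{k1} & \cdots & x_{n1} \\
x_{12} & x_{22} & \cdots & x_{k2} & \cdots & x_{n2} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{1n} & x_{2n} & \cdots & x_{kn} & \cdots & x_{nn}
\end{pmatrix}
$$

k-orthogonal line pattern:

$$
\alpha_u \cdot \alpha_v \approx 0 \quad \text{when nodes } u, v \text{ are from different communities}
$$

$$
\frac{\alpha_u \cdot \alpha_v}{\|\alpha_u\|_2 \, \|\alpha_v\|_2} \approx 1 \quad \text{when nodes } u, v \text{ are from the same community}
$$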
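The line pattern can be checked numerically on a toy graph. The sketch below builds two cliques joined by a single bridge edge (an assumed example, not from the slides), projects each node onto the k = 2 leading eigenvectors, and compares cosine similarities within and across communities.

```python
import numpy as np


def two_cliques_with_bridge(size_a, size_b):
    """Two cliques joined by one edge between node 0 and node size_a."""
    n = size_a + size_b
    A = np.zeros((n, n))
    A[:size_a, :size_a] = 1.0
    A[size_a:, size_a:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[0, size_a] = A[size_a, 0] = 1.0
    return A


def spectral_coordinates(A, k):
    """Row u is the spectral coordinate of node u in the top-k eigenvectors."""
    eigenvalues, eigenvectors = np.linalg.eigh(A)
    top = np.argsort(eigenvalues)[::-1][:k]
    return eigenvectors[:, top]


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


A = two_cliques_with_bridge(5, 4)      # nodes 0-4 and nodes 5-8
coords = spectral_coordinates(A, k=2)

same = cosine(coords[1], coords[2])    # two nodes in the first clique
cross = cosine(coords[1], coords[6])   # one node from each clique
bridge = cosine(coords[0], coords[1])  # bridge endpoint vs. its own clique
print(round(same, 3), round(cross, 3), round(bridge, 3))
```

Nodes of the same clique point in almost the same direction, while nodes of different cliques are close to orthogonal. The bridge endpoints deviate slightly from their community's line, which is why the pattern is only approximately orthogonal.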
![Page 25: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/25.jpg)
Example
25
Spectral coordinate: $\alpha_u = (x_{1u}, x_{2u}, \ldots, x_{ku})$
Polbook Network
![Page 26: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/26.jpg)
Evaluation on Web spam challenge data [ICDE11]
26
• Data: a snapshot of websites in the .UK domain (2007) with 114K nodes and 1.8M links, plus a mix of 8 RLAs with varied sizes and connection patterns.
• SPCTRA: based on spectral space
• GREEDY: based on outer-triangles [Shrivastava, ICDE08]
• Much faster: 36s vs. 26h
![Page 27: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/27.jpg)
Outline
• Introduction
• Privacy Preserving Social Network Analysis
  • Input perturbation
  • Output perturbation
• Fraud Detection
  • Spectral analysis of graph topology
  • Detecting random link attacks
  • Detecting weak anomalies
• Sample Projects
• Conclusions and Future work
27
![Page 28: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/28.jpg)
28
Privacy Preserving Data Mining (NSF CAREER)
![Page 29: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/29.jpg)
Genetic Privacy (NSF SCH pending)
29
BIBM13 Best Paper Award
![Page 30: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/30.jpg)
oSafari (NSF SaTC)
30
![Page 31: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/31.jpg)
Manipulation in E-Commerce (NSF III pending)
31
Structured Topic Analysis
Spectral Bipartite Graph Analysis
D-S based Evidence Fusion
• Bot-committed
• Money-motivated
Reviews, Ratings, Ranks
![Page 32: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/32.jpg)
Privacy Preserving Database Application testing (NSF 0310974)
[Figure: architecture diagram with labels ER, Data, DDL, Catalog, Production db, R / NR / S, Conflict resolution, Disclosure Assessment, Rule Analyzer, R’ / NR’ / S’, Schema & Domain Filter, Schema’ / Domain’, Data Generator, Mock DB, User]
33
![Page 33: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/33.jpg)
Data Generation for Testing DB Applications (NSF 0915059)
How to generate data to cover paths?
34
![Page 34: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/34.jpg)
Outline
• Introduction
• Privacy Preserving Social Network Analysis
  • Input perturbation
  • Output perturbation
• Fraud Detection
  • Spectral analysis of graph topology
  • Detecting Random Link Attacks
  • Detecting weak anomalies
• Sample Projects
• Conclusions and Future work
35
![Page 35: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/35.jpg)
Big Data Computing
• Drowning in data
  • Volume, Velocity, Variety, and Veracity
  • 2.5 exabytes every day
  • Web data, healthcare, e-commerce, social network
• Advancing technology
  • Cheap storage/processing power
  • Growth in huge data centers
  • Data is in the “cloud” - Amazon AWS, Hadoop, Azure
  • Computing is in the “cloud”
36
![Page 36: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/36.jpg)
Social Media Customer Analytics
37
• Network topology (friendship, followship, interaction)
• Structured profile

| name | sex | age | disease | salary |
|---|---|---|---|---|
| Ada | F | 18 | cancer | 25k |
| Bob | M | 25 | heart | 110k |
| … | | | | |

| id | sex | age | address | income |
|---|---|---|---|---|
| 5 | F | Y | NC | 25k |
| 3 | M | Y | SC | 110k |

• Retweet sequence
• Product and review
• Unstructured text (e.g., blog, tweet)
• Transaction database

Entity resolution, Patterns, Temporal/spatial, Scalability, Visualization, Sentiment, Privacy

Velocity, Variety: 10GB tweets per day
Belk and Lowe’s
Chancellor’s special fund
![Page 37: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/37.jpg)
38
![Page 38: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/38.jpg)
39
![Page 39: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/39.jpg)
Samsung AVC Denial Log Analysis
40
• Volume and Velocity: 1 million log files per day, each with thousands of entries
• S3, Hive and EMR
![Page 40: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/40.jpg)
Drivers of Data Computing
41
6 A’s: Anytime, Anywhere Access to Anything by Anyone Authorized
4 V’s: Volume, Velocity, Variety, Veracity
Reliability, Security, Privacy, Usability
![Page 41: Xintao Wu Aug 25,2014 Research Overview 1. Outline Introduction Privacy Preserving Social Network Analysis Input perturbation Output perturbation.](https://reader037.fdocuments.us/reader037/viewer/2022110321/56649cb75503460f9497dbda/html5/thumbnails/41.jpg)
Thank You! Questions?
42
Collaborators: Aidong Lu, Xinghua Shi, Jun Li (Oregon), Dejing Dou (Oregon), Tao Xie (UIUC)
Doctoral graduates: Songtao Guo, Ling Guo, Kai Pan, Leting Wu, Xiaowei Ying
Doctoral Students: Yue Wang, Yuemeng Li, Zhilin Luo (visiting)