©2009 Carnegie Mellon University
Analyzing the Privacy of Smartphone Apps
Apr 22, 2013
Shah Amini, Jialiu Lin, Prateek Sachdeva, Jason Hong, Janne Lindqvist, Norman Sadeh, Joy Zhang
Computer Human Interaction: Mobility Privacy Security
How to Manage Smartphone Privacy?
• Lots of smart devices
  – 1B smartphones worldwide
• Lots of apps
  – ~700k apps and 40B+ downloads for each of Android and iOS
• Highly intimate
• Lots of rich data
• Lots of inferences
Smartphones are Intimate
Mobile phones and millennials (Pew 2012):
• 75% use in bed before going to sleep
• 83% sleep with their mobile phones
• 90% check first thing in the morning
• Half use them while eating
• A third use them in the bathroom (!)
• A fifth check them every ten minutes
Smartphone Data is Rich
• Who we know (contact list, social networking)
• Who we call (call log)
• Who we text (SMS log, Kakao, social networking)
Smartphone Data is Rich
• Where we go (GPS, Foursquare)
• Photos (some geotagged)
• Sensors (accel, sound, light)
Inferences from Data
Example: Modeling Social Relationships
• If you were in a jail in Mexico, which of the 500+ “friends” in your phone contact list would come and get you out?
Inferences from Data
Example: Modeling Social Relationships
• Can we build a richer augmented social graph?
  – models tie strength, group, role
Inferences from Data
Example: Modeling Social Relationships
• Friend or not – 92% accuracy
  – Using just GPS co-location data
• Life facet {family, social, work} – 90%
• Tie strength {low, med, high} – 75%
  – Using just contacts, call logs, SMS logs
Cranshaw et al, Bridging the Gap Between Physical Location and Online Social Networks, Ubicomp 2010.
Min et al, Mining Smartphone Data to Classify Life-Facets of Social Relationships, CSCW 2013.
Inferences from Data
Example: Sleep
• Sensor data
• Sleep data (self-reported ground truth)
Smartphone Data for Depression
Social Relationships
• Isolation
• Lack of close family or friends
Physical Activities
• Mobility
• Consistency
• Places you go to
Sleep Patterns
• Excessive sleep
• Too little sleep
• Change over time
Cognitive Behaviors
• Multitasking
• Lots of phone use
How to Manage Smartphone Privacy?
• Lots of smart devices
  – 1B smartphones worldwide
• Lots of apps
  – ~700k apps and 40B+ downloads for each of Android and iOS
• High intimacy
• Lots of rich data
• Lots of inferences
What are your apps really doing?
• Shares your location, gender, unique phone ID, phone# with advertisers
• Uploads your entire contact list to their server (including phone #s)
Many Smartphone Apps Have “Unusual” Permissions
App                      Permissions Used
Tiny Flashlight + LED    Internet Access, phone#
Backgrounds              Contact List
Dictionary               Location
Bible Quotes             Location
• Advertising, malware, bootstrapping social networks, future permissions
Android
• What do these permissions mean?
• Why does the app need this permission?
• When does it use these permissions?
Two Threads of Work
• Works in progress, feedback appreciated
• CrowdScanning
  – Crowdsourcing approach to understand coarse-grain privacy perceptions of apps
• Gort
  – Tool for analysts to understand fine-grain app behaviors
CrowdScanning Core Ideas
• Idea 1: find the gap between what people expect an app to do and what it actually does
• Idea 2: use crowdsourcing to do this (crowdsource privacy)
Lin et al, Expectation and Purpose: Understanding Users' Mental Models of Mobile App Privacy through Crowdsourcing, Ubicomp 2012.
Nissan Maxima Gear Shift
Privacy as Expectations
• Apply this same idea of mental models for privacy
  – Compare what people expect an app to do vs what an app actually does
  – Emphasize the biggest gaps, misconceptions that many people had
App Behavior (What an app actually does)
User Expectations (What people think the app does)
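The comparison between expectations and behavior can be sketched in a few lines. This is only an illustration: it assumes each crowd response records whether the respondent expected an app to use a given resource, and the function names, record layout, and sample responses are hypothetical rather than taken from the study.

```python
from collections import defaultdict

def surprise_rates(responses):
    """Map (app, resource) to the fraction of respondents who did NOT expect that use."""
    counts = defaultdict(lambda: [0, 0])  # pair -> [surprised, total]
    for app, resource, expected in responses:
        counts[(app, resource)][1] += 1
        if not expected:
            counts[(app, resource)][0] += 1
    return {pair: surprised / total for pair, (surprised, total) in counts.items()}

def biggest_gaps(responses, top_n=3):
    """Return the top_n (pair, rate) entries with the largest expectation gap."""
    rates = surprise_rates(responses)
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:top_n]

# Hypothetical responses: (app, resource, did the respondent expect this use?)
sample = [
    ("flashlight", "location", False),
    ("flashlight", "location", False),
    ("maps", "location", True),
    ("maps", "location", True),
    ("maps", "location", False),
]

for pair, rate in biggest_gaps(sample):
    print(pair, round(rate, 2))
```

Sorting by the share of surprised respondents puts the largest misconceptions first, which is the "emphasize the biggest gaps" step.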
Crowdsourcing Privacy
• Few people read privacy policies
  – We want to install the app
  – Reading policies not part of main task
  – Complexity of these policies (the pain!!!)
  – Clear cost (time) for unclear benefit
• Crowdsourcing can mitigate these problems
10% users were surprised this app wrote contents to their SD card.
25% users were surprised this app sent their approximate location to dictionary.com for searching nearby words.
85% users were surprised this app sent their phone’s unique ID to mobile ads providers.
0% users were surprised this app could control their audio settings.
90% users were surprised this app sent their precise location to mobile ads providers.
95% users were surprised this app sent their approximate location to mobile ads providers.
95% users were surprised this app sent their phone’s unique ID to mobile ads providers.
0% users were surprised this app can control camera flashlight.
Our Study on App Privacy
• Showed crowd workers screenshots and description of app (from Google Play)
  – 56 of top 100 Android Apps
• Showed permissions one at a time
  – Only those related to privacy
• Expectation Condition
  – Why they think the app uses permission
  – How comfortable they were with it
• Purpose Condition
  – We gave an explanation (based on our analysis)
  – How comfortable they were with it
Our Study on App Privacy
• Participants
  – Recruited from Mturk, US people only
  – Asked what version of Android OS they used
  – Between-subjects (one condition only)
• Method
  – Only 56 of top 100 apps requested use of unique phone ID, contact list, or location
    • Led to a total of 134 app-resource pairs
  – 20 participants per pair per condition
    • 2*20*134 = 5360 tasks
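The task count is a straight product of the study design figures. A minimal sketch that recomputes it (the function and parameter names are illustrative, not from the study):

```python
def total_tasks(conditions, participants_per_pair, app_resource_pairs):
    """Number of crowd tasks when every pair is rated in every condition."""
    return conditions * participants_per_pair * app_resource_pairs

# Figures quoted on the slide: 2 conditions, 20 participants per pair, 134 pairs.
tasks = total_tasks(conditions=2, participants_per_pair=20, app_resource_pairs=134)
print(tasks)  # 5360
```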
Results for Location Data (N=20 per app, Expectations Condition)
App                          Comfort Level (-2 to 2)
Maps                         1.52
GasBuddy                     1.47
Weather Channel              1.45
Foursquare                   0.95
TuneIn Radio                 0.60
Evernote                     0.15
Angry Birds                  -0.70
Brightest Flashlight Free    -1.15
Toss It                      -1.2
Most Unexpected Uses (N=20 per app, Expectations Condition)
• Found strong correlation between expectations & comfort level (r=0.91)
Apps using Contact List     Comfort Level (-2 to 2)
Backgrounds HD Wallpaper    -1.35
Pandora                     -0.70
GO Launcher EX              -0.75
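For readers unfamiliar with the r statistic quoted for expectations and comfort level, the sketch below shows one way a Pearson correlation could be computed. It assumes each app-resource pair is summarized by two numbers (the fraction of respondents who expected the use, and the mean comfort rating). The input values are toy numbers chosen for illustration, not study data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Toy inputs (hypothetical): fraction who expected the use, mean comfort rating.
expected_fraction = [0.1, 0.5, 0.9]
mean_comfort = [-1.0, 0.0, 1.0]
print(pearson_r(expected_fraction, mean_comfort))
```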
Showing Purpose Lowers Concerns
• All differences statistically significant
• Big increases for dictionary, Shazam, Air Control Lite, and others (> 1.0)
Resource            Comfort w/ Purpose    Comfort w/o Purpose
Device ID           0.47 (σ=0.30)         -0.10 (σ=0.41)
Contact List        0.66 (σ=0.22)         0.16 (σ=0.54)
Network Location    0.90 (σ=0.53)         0.65 (σ=0.55)
GPS Location        0.72 (σ=0.62)         0.35 (σ=0.73)
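The size of the effect in the table can be read off by subtracting the two comfort means per row. The sketch below does that using the values from the table. It only computes mean differences and does not reproduce the significance test, and the names are illustrative.

```python
# (comfort with purpose shown, comfort without purpose), copied from the table.
COMFORT = {
    "Device ID": (0.47, -0.10),
    "Contact List": (0.66, 0.16),
    "Network Location": (0.90, 0.65),
    "GPS Location": (0.72, 0.35),
}

def purpose_gain(with_purpose, without_purpose):
    """Increase in mean comfort when a purpose is shown, rounded to 2 decimals."""
    return round(with_purpose - without_purpose, 2)

gains = {resource: purpose_gain(*means) for resource, means in COMFORT.items()}
for resource, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{resource}: +{gain:.2f}")
```

On these figures the largest increase is for Device ID and the smallest is for Network Location.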
Scaling Up CrowdScanning
• It took ~2 wks to crowdsource 56 apps
• 700k+ apps for iOS & Android markets
• Idea: Use static & dynamic analysis + clustering for privacy models of apps
  – Ex. “Games uses location” -1.3
  – Ex. “Uses location for map” +0.5
Scaling Up CrowdScanning
Crawled Data Set
• Crawled 171k apps from Google Play
  – App name
  – Category (Arcade, Finance, etc)
  – Number of downloads
  – Average user rating (1-5)
  – Rating distribution
  – Price
  – Content Rating
  – 13M user reviews
Scaling Up CrowdScanning
Static Analysis of Apps
• Starting assumptions:
  – Most apps use third-party libraries
  – When sensitive data is used, it is because of libraries
    • Ex. Location sent to ad server via library
    • Ex. Location sent to Google for maps
• Understanding what libraries an app uses and how they are used can offer us richer semantics and explanations
Scaling Up CrowdScanning
Libraries are Major Point of Leverage
Scaling Up CrowdScanning
Static Analysis of Apps
• Features extracted:
  – Libraries used
  – Network conn (in library or in main code)
  – Permissions (in library or main code)
• 124k apps processed
  – Uses PyDev (Python for Eclipse) and AndroGuard (reverse eng apps)
  – 5 Amazon EC2 instances, 30 secs / app
• Will crowdsource core set of 400 apps and build models to predict privacy
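The "in library or main code" distinction can be illustrated with a small attribution step. This sketch assumes the class names that use each permission have already been extracted by a reverse-engineering tool (that step is not shown). The prefix table, labels, and function names are hypothetical and are not the project's actual feature extractor.

```python
# Hypothetical lookup table: package prefix -> kind of third-party library.
LIBRARY_PREFIXES = {
    "com.google.ads": "ad library",
    "com.flurry": "analytics/ad library",
    "com.google.android.maps": "maps library",
}

def attribute(class_name, app_package):
    """Label a class as the app's main code, a known library, or an unknown library."""
    if class_name.startswith(app_package + "."):
        return "main code"
    for prefix, label in LIBRARY_PREFIXES.items():
        if class_name == prefix or class_name.startswith(prefix + "."):
            return label
    return "unknown library"

def summarize(permission_uses, app_package):
    """Group (class_name, permission) pairs into {permission: set of origins}."""
    summary = {}
    for class_name, permission in permission_uses:
        summary.setdefault(permission, set()).add(attribute(class_name, app_package))
    return summary

# Hypothetical extraction result for an app whose package is com.example.game.
uses = [
    ("com.example.game.MainActivity", "INTERNET"),
    ("com.flurry.android.Agent", "ACCESS_FINE_LOCATION"),
    ("org.other.Tracker", "ACCESS_FINE_LOCATION"),
]
print(summarize(uses, "com.example.game"))
```

A per-permission set of origins is one possible feature representation. It makes a statement such as "location is used only by an ad library" directly checkable.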
Scaling Up CrowdScanning
Tangent: Analyzing App Comments
• Linear regression of most common words to 5-star ratings
  – Out of 1M comments, 8% of dataset
  – Only 0.09% comments related to privacy
Two Threads of Work
• CrowdScanning
  – Crowdsourcing approach to understand coarse-grain privacy perceptions of apps
• Gort
  – Tool for analysts to understand fine-grain app behaviors
Gort App Analysis Tool
• Goal of Gort is to help analysts understand and vet behaviors of apps
  – Journalists
  – Privacy advocates
  – Three letter agencies
Example Comparison
• CrowdScanning: Yelp uses location
• Gort: When (what screens) and why?
Gort v1
• Control Flow Graph
• Current Screen
• Servers contacted
• HTTP details
• HTTP requests
• Market description
• Permissions used
• Personal data sent
Gort v2 Envisioned Workflow
• Start with a pool of apps
• Use heuristics to flag unusual behaviors to direct analyst’s attention
  – Static and dynamic heuristics
• See overview of apps, view individual apps, check odd behaviors and context (screens)
Gort v2 Heuristics for Apps
• Interviewed 13 experts
  – Asked what characteristics and behaviors they would check to vet an app
  – Got ~100 heuristics, still organizing them
Network
• Sends password w/o SSL
• Connects to fixed IP address
Permissions
• Contact List
• Location but not for maps or ads
• Uses mic
Phone / SMS
• SMS to fixed / premium num
• Forwards SMS to server
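Two of the network heuristics listed for Gort v2 are simple enough to sketch as checks over a captured request. This is an illustration, not Gort's implementation: it assumes each request is available as a URL plus a form-encoded body, and the parameter name `password` and the function names are hypothetical.

```python
import ipaddress
from urllib.parse import parse_qs, urlsplit

def is_fixed_ip(host):
    """True if the host is a literal IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False

def flag_request(url, body=""):
    """Return the list of heuristic flags raised by one captured HTTP(S) request."""
    flags = []
    parts = urlsplit(url)
    if is_fixed_ip(parts.hostname or ""):
        flags.append("connects to fixed IP address")
    # Look for a password field in the query string or the form-encoded body.
    params = parse_qs(parts.query)
    params.update(parse_qs(body))
    if parts.scheme == "http" and "password" in params:
        flags.append("sends password w/o SSL")
    return flags

# 192.0.2.10 is from a range reserved for documentation examples.
print(flag_request("http://192.0.2.10/login", "user=a&password=secret"))
print(flag_request("https://example.com/login", "password=secret"))
```

Returning a list of flags per request, rather than a single yes/no, keeps the context an analyst would need to judge each odd behavior.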
Traversing Screens in Apps
• Have to traverse app for some heuristics
  – Ex. when exactly does the app use location?
  – Also want to capture screenshots
Traversing Screens in Apps
• General case is fairly easy
  – Breadth-first-search from home screen
  – Uses TEMA to get widgets on screen
  – Use Android’s MonkeyRunner to simulate input and get screenshots
• But lots of exception cases…
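The general case can be sketched as a plain breadth-first search over an abstract screen graph. The sketch assumes a hypothetical `get_transitions(screen)` callback that returns the (action, next screen) pairs reachable from a screen. In a real tool that callback would be backed by widget discovery and input simulation, which are not shown here, and the sample graph is made up.

```python
from collections import deque

def traverse(home, get_transitions):
    """Visit every reachable screen breadth-first, starting from the home screen.

    get_transitions(screen) must return an iterable of (action, next_screen).
    Returns the screens in the order they were first discovered.
    """
    visited = {home}
    order = [home]
    queue = deque([home])
    while queue:
        screen = queue.popleft()
        for _action, next_screen in get_transitions(screen):
            if next_screen not in visited:
                visited.add(next_screen)
                order.append(next_screen)
                queue.append(next_screen)
    return order

# Hypothetical screen graph standing in for a running app.
GRAPH = {
    "home": [("tap search", "search"), ("tap settings", "settings")],
    "search": [("submit", "results"), ("back", "home")],
    "settings": [("back", "home")],
    "results": [],
}

order = traverse("home", lambda screen: GRAPH[screen])
print(order)
```

The visited set is what keeps "back" transitions from looping forever. The exception cases mentioned above are those where a transition changes state in ways this simple graph model does not capture.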
Some Hard Cases for Traversal
• Dialogs w/ side effects
• Text Inputs
• Logins
Some Hard Cases for Traversal
• Changes to system env
• App Updates
• Randomized dialogs
Scaling Up CrowdScanning
Making the Results Public
• What will we do with all these results?
• Basic idea: deploy a web site
  – Let public see results of our scans
  – Show privacy scores (and explanations)
  – Tell app developers how to fix their apps
• Awareness, Knowledge, Motivation
• Still early stages here, should have first iteration of site out end of May
Public Feedback to Date
• Slate
• Yahoo News
• MSNBC
• Pittsburgh Tribune Review
Thanks!
More info at cmuchimps.org or email [email protected]
Special thanks to:
• Army Research Office
• National Science Foundation
• Alfred P. Sloan Foundation
• Google
• CMU Cylab
Join our community for researchers at: www.reddit.com/r/pervasivecomputing
The Opportunity
• We are creating a worldwide sensor network with these smartphones
• We can now capture and analyze human behavior at unprecedented fidelity and scale
Summary
• Smartphones offer big opportunity to understand human behavior at unprecedented fidelity and scale
• Augmented Social Graph
• Urban Analytics
• CrowdScanning
Reach of Apps Growing
Finances Automobiles Homes