Click Frauds
Dealing with frauds at AppsFlyer
Business
Advertiser
Publisher
Click
Install
Who’s the bad guy?
Advertiser
Publisher
Click
Install
What does the advertiser pay for?
Cost per impression
Cost per click
Cost per install
Fraud techniques of a different league
Fewer fraudulent installs than clicks/views, as CPI is usually much higher
Cost per action
Fraud methods
Programmatic (bots)
Humans
Fraud detection methods
Rule-based
Need expert knowledge of past fraud behaviour
Highly effective at detecting known fraud types
Ineffective at new types
Anomaly detection
Good for new kinds of deviations
Not good for known types of fraud
Supervised learning
Need examples of past fraud
Can be effective at detecting similar occurrences
Ineffective at new types of fraud
Rule-based
Unrecognized user agent string
Mozilla/4.0 (compatible; MSIE 4.5; Windows 98; )
Wrong IMEI
Too many applications installed from the same device
Frequent re-installs on a specific device
Same device installs from many different geographical locations
Implausibly short time between click and install
iOS app install receipt can’t be validated by iTunes
…
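A few of the rules above can be sketched as simple predicates over an install event. This is only an illustration: the field names, the accepted user-agent prefixes and every threshold are hypothetical placeholders, not values used in production.

```python
from dataclasses import dataclass
from typing import List

# Every threshold, prefix and field name here is a hypothetical placeholder.
ACCEPTED_USER_AGENT_PREFIXES = ("Mozilla/5.0", "Dalvik/")
MIN_CLICK_TO_INSTALL_SECONDS = 10.0
MAX_REINSTALLS_PER_DAY = 3
MAX_COUNTRIES_PER_DEVICE = 2


@dataclass
class InstallEvent:
    user_agent: str
    click_to_install_seconds: float
    reinstalls_last_day: int
    countries_seen_for_device: int


def triggered_rules(event: InstallEvent) -> List[str]:
    """Return the name of every rule the install event violates."""
    hits = []
    if not event.user_agent.startswith(ACCEPTED_USER_AGENT_PREFIXES):
        hits.append("unrecognized_user_agent")
    if event.click_to_install_seconds < MIN_CLICK_TO_INSTALL_SECONDS:
        hits.append("click_to_install_too_short")
    if event.reinstalls_last_day > MAX_REINSTALLS_PER_DAY:
        hits.append("frequent_reinstalls")
    if event.countries_seen_for_device > MAX_COUNTRIES_PER_DEVICE:
        hits.append("too_many_locations")
    return hits


# The user agent from the slide, combined with made-up values for the other fields.
suspicious = InstallEvent(
    "Mozilla/4.0 (compatible; MSIE 4.5; Windows 98; )", 2.0, 5, 4
)
print(triggered_rules(suspicious))
```

Returning the list of triggered rules, rather than a single yes/no, keeps the reason for each decision visible, which helps when the rules need tuning.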
Anomaly detection
k-means clustering
Choosing features
Normally distributed values (or half-normally)
Normalizing data
Custom normalizer
StandardScaler (Spark >= 1.4)
Choose number of clusters
Iterate over different numbers of clusters
Evaluate “clustering score”
Build k-means model
Find vectors with P(x) < 𝝴
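The steps above can be sketched end to end in plain Python. This is a minimal sketch under several assumptions, not the Spark code from the talk: the feature values are invented, centroids are seeded deterministically with a farthest-point heuristic, the "clustering score" is taken to be the mean distance to the nearest centroid, and distance to the nearest centroid stands in for the P(x) < ε test (a vector is flagged when that distance exceeds a hypothetical threshold).

```python
import math
from typing import List, Tuple

Vector = List[float]


def fit_scaler(rows: List[Vector]) -> Tuple[Vector, Vector]:
    """Per-feature mean and standard deviation of the training rows."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[d] for r in rows) / n for d in range(dims)]
    stds = [math.sqrt(sum((r[d] - means[d]) ** 2 for r in rows) / n) or 1.0
            for d in range(dims)]
    return means, stds


def scale(row: Vector, means: Vector, stds: Vector) -> Vector:
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def nearest(point: Vector, centroids: List[Vector]) -> Tuple[int, float]:
    """Index of the closest centroid and the Euclidean distance to it."""
    distances = [math.sqrt(sum((a - b) ** 2 for a, b in zip(point, c)))
                 for c in centroids]
    best = min(range(len(centroids)), key=distances.__getitem__)
    return best, distances[best]


def kmeans(rows: List[Vector], k: int, iterations: int = 20) -> List[Vector]:
    # Deterministic farthest-point seeding (a simplification for this sketch).
    centroids = [list(rows[0])]
    while len(centroids) < k:
        centroids.append(list(max(rows, key=lambda r: nearest(r, centroids)[1])))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for row in rows:
            clusters[nearest(row, centroids)[0]].append(row)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def clustering_score(rows: List[Vector], centroids: List[Vector]) -> float:
    """Mean distance to the nearest centroid; lower means tighter clusters."""
    return sum(nearest(r, centroids)[1] for r in rows) / len(rows)


def find_anomalies(rows: List[Vector], centroids: List[Vector],
                   threshold: float) -> List[int]:
    return [i for i, r in enumerate(rows) if nearest(r, centroids)[1] > threshold]


# Invented two-feature history forming two groups of normal traffic.
history = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.0, 1.0],
           [5.0, 5.1], [4.9, 5.0], [5.1, 4.9], [5.0, 5.0]]
means, stds = fit_scaler(history)
scaled = [scale(r, means, stds) for r in history]

# Iterate over candidate cluster counts and compare their scores.
scores = {k: clustering_score(scaled, kmeans(scaled, k)) for k in (1, 2, 3)}
centroids = kmeans(scaled, 2)

# Score new vectors against the centroids learned from the history.
incoming = [scale(r, means, stds) for r in [[1.05, 0.95], [3.0, 9.0]]]
anomalies = find_anomalies(incoming, centroids, threshold=0.5)
print(scores, anomalies)
```

Fitting the scaler and the centroids on historical traffic and only then scoring new vectors avoids one trap of clustering everything together: an extreme outlier can end up as its own cluster and then look perfectly normal.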
k-means clustering - parsing
k-means clustering - feature selection
k-means clustering - finding K
k-means clustering - find anomalies
Supervised learning
Logistic regression
Decision tree
Random forests
…
Training set {x1, x2, …, xN} -> E
Train the model
Validate, then train again…
Test
Apply!
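The train / validate / test loop can be illustrated with logistic regression, the first model listed above. This is a minimal sketch with assumptions stated up front: the feature vectors are invented and already standardized, labels are 1 for fraud and 0 for legitimate, and "validate, then train again" is read here as retraining with a few hypothetical learning rates and keeping the model that does best on the validation set.

```python
import math
from typing import List, Tuple

Vector = List[float]
Model = Tuple[Vector, float]  # (weights, bias)


def sigmoid(z: float) -> float:
    # Split by sign so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(model: Model, x: Vector) -> float:
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(xs: List[Vector], ys: List[int], learning_rate: float,
          epochs: int = 200) -> Model:
    """Batch gradient descent on the logistic loss."""
    weights, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        errors = [predict_proba((weights, bias), x) - y for x, y in zip(xs, ys)]
        for j in range(len(weights)):
            gradient = sum(e * x[j] for e, x in zip(errors, xs)) / len(xs)
            weights[j] -= learning_rate * gradient
        bias -= learning_rate * sum(errors) / len(xs)
    return weights, bias


def accuracy(model: Model, xs: List[Vector], ys: List[int]) -> float:
    hits = sum((predict_proba(model, x) >= 0.5) == bool(y) for x, y in zip(xs, ys))
    return hits / len(xs)


# Invented, already standardized features:
# [click-to-install time, installs per device]; label 1 = fraud.
train_x = [[-1.0, 1.2], [-1.5, 0.8], [-0.8, 1.0],
           [1.0, -1.1], [1.4, -0.7], [0.9, -1.0]]
train_y = [1, 1, 1, 0, 0, 0]
valid_x, valid_y = [[-0.9, 1.1], [1.2, -0.8]], [1, 0]
test_x, test_y = [[-1.2, 0.9], [1.1, -0.9]], [1, 0]

# Train once per candidate learning rate, keep the best on validation data.
models = [train(train_x, train_y, lr) for lr in (0.01, 0.1, 0.5)]
best_model = max(models, key=lambda m: accuracy(m, valid_x, valid_y))

# The test set is touched only once, after model selection is finished.
test_accuracy = accuracy(best_model, test_x, test_y)
print(test_accuracy)
```

Keeping the test set out of the selection loop is what makes the final number an honest estimate of how the model will behave once applied.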
Action items
Drop fraudulent requests
Pros:
Less traffic goes through the system
Cons:
False positives
Must capture all the frauds as they come in
Mark transactions that are fraudulent (in our opinion)
Pros:
Let customer decide what to do
Allows offline fraud detection
Mixed approach
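One possible reading of the mixed approach, sketched under assumptions: each request carries a fraud score between 0 and 1 from any of the detectors above, near-certain fraud is dropped immediately, and doubtful cases are only marked so the customer can decide. The two thresholds and the names below are hypothetical.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"
    MARK = "mark"
    DROP = "drop"


# Hypothetical thresholds; real values would need tuning against false positives.
DROP_THRESHOLD = 0.95
MARK_THRESHOLD = 0.50


def decide(fraud_score: float) -> Action:
    """Drop near-certain fraud, mark doubtful traffic, accept the rest."""
    if fraud_score >= DROP_THRESHOLD:
        return Action.DROP
    if fraud_score >= MARK_THRESHOLD:
        return Action.MARK
    return Action.ACCEPT


for score in (0.99, 0.70, 0.10):
    print(score, decide(score).value)
```

A high drop threshold limits the damage from false positives, while marking still allows offline detection to revisit the doubtful cases later.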
Thank you!