Securing and Personalizing Commerce Using Identity Data Mining


Jonathan LeBlanc, Developer Evangelist (PayPal)

GitHub: http://github.com/jcleblanc | Twitter: @jcleblanc

The Problem

Commerce Relies on Static Data Contributions

Premise

You can determine the personality profile of a person based on their usage habits

Personalization == Security

Technology was the Solution!

Then I Read This…

Us & Them

The Science of Identity

By David Berreby

The Different States of Knowledge

What a person knows

What a person knows they don’t know

What a person doesn’t know they don’t know

Technology was NOT the Solution

Identity and discovery are NOT a technology solution

Our Subject Material

HTML content is poorly structured

There are some pretty bad web practices on the interwebz

You can’t trust that anything semantically valid will be present

How We’ll Capture This Data

Start with base linguistics

Extend with available extras

The Components

The Basic Pieces

Page Data: Scrapey Scrapey

Keywords: Without all the fluff

Weighting: Word diets

Capture Raw Page Data

Semantic data on the web is sucktastic

Assume 5-year-olds built the sites

Language is the key

Extract Keywords

We now have a big jumble of words. Let's extract the keywords.

Why is “and” a top word? Stop words = sad panda
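To show why "and" floats to the top, here is a minimal PHP sketch that counts the words in a sample string twice: once raw, once with a stop-word filter. The `countWords()` helper and the five-word stop list are hypothetical, chosen only for illustration; a real list is much longer.

```php
<?php
// Sketch: word frequencies with and without a stop-word filter.
// countWords() and the tiny $stopWords list are illustrative assumptions.

function countWords(string $text, array $stopWords = []): array
{
    // Lowercase, then split on any run of non-letter characters.
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $counts = [];
    foreach ($words as $word) {
        if (in_array($word, $stopWords, true)) {
            continue; // skip filler words such as "and"
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }
    arsort($counts, SORT_NUMERIC); // highest counts first
    return $counts;
}

$text = 'Shoes and socks and hats. Shoes and boots.';
$stopWords = ['a', 'and', 'the', 'of', 'to'];

$raw = countWords($text);
$filtered = countWords($text, $stopWords);
echo array_key_first($raw), PHP_EOL;      // and
echo array_key_first($filtered), PHP_EOL; // shoes
```

Without the filter "and" wins with three hits; with it, "shoes" (two hits) becomes the top keyword.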

Weight Keywords

All content is not created equal

Meta and headers and semantics oh my!

This is where we leech off the work of others

Simple

Questions to Keep in Mind

Should I use regex to parse web content?

How do users interact with page content?

What key identifiers can be monitored to detect interest?

Fetching the Data: The Request

The Simple Way

$html = file_get_contents('URL');

The Controlled Way

$c = curl_init('URL');

Fetching the Data: cURL

$req = curl_init($url);

$options = array(
    CURLOPT_URL => $url,
    CURLOPT_HEADER => false,         //keep response headers out of the page body
    CURLOPT_RETURNTRANSFER => true,  //return the page as a string instead of printing it
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_AUTOREFERER => true,
    CURLOPT_TIMEOUT => 15,
    CURLOPT_MAXREDIRS => 10
);

curl_setopt_array($req, $options);

//execute the request and keep the raw HTML (empty string on failure)
$page_content = curl_exec($req);
if ($page_content === false) { $page_content = ''; }
curl_close($req);

//list of findable / replaceable string characters
$find = array('/\r/', '/\n/', '/\s\s+/');
$replace = array(' ', ' ', ' ');

//perform page content modification: drop script and style blocks
$mod_content = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $page_content);
$mod_content = preg_replace('#<style(.*?)>(.*?)</style>#is', '', $mod_content);

//strip markup, normalize case and whitespace, then split into words
$mod_content = strip_tags($mod_content);
$mod_content = strtolower($mod_content);
$mod_content = preg_replace($find, $replace, $mod_content);
$mod_content = trim($mod_content);
$mod_content = explode(' ', $mod_content);

natcasesort($mod_content);

//set up list of stop words and the final found stopped list
$common_words = array('a', ..., 'zero');
$searched_words = array();

//extract list of keywords with number of occurrences
foreach ($mod_content as $word) {
    $word = trim($word);
    //discard tokens that contain anything other than letters
    if (preg_match('/[^a-zA-Z]/', $word) == 1) {
        $word = '';
    }
    if (strlen($word) > 2 && !in_array($word, $common_words)) {
        //initialize missing keys to avoid "undefined array key" warnings
        $searched_words[$word] = isset($searched_words[$word]) ? $searched_words[$word] + 1 : 1;
    }
}

arsort($searched_words, SORT_NUMERIC);

Scraping Site Meta Data

//load scraped page data as a valid DOM document
$dom = new DOMDocument();
@$dom->loadHTML($page_content);
$dataReturn = array();

//scrape title (empty string if the page has none)
$title = $dom->getElementsByTagName("title");
$title = ($title->length > 0) ? $title->item(0)->nodeValue : '';

//loop through all found meta tags
$metas = $dom->getElementsByTagName("meta");
for ($i = 0; $i < $metas->length; $i++){
    $meta = $metas->item($i);
    if ($meta->getAttribute("property")){
        //Open Graph tags use the "property" attribute
        if ($meta->getAttribute("property") == "og:description"){
            $dataReturn["description"] = $meta->getAttribute("content");
        }
    } else {
        if ($meta->getAttribute("name") == "description"){
            $dataReturn["description"] = $meta->getAttribute("content");
        } else if ($meta->getAttribute("name") == "keywords"){
            $dataReturn["keywords"] = $meta->getAttribute("content");
        }
    }
}

Extending the E

Weighting Important Data

Tags you should care about: meta (including Open Graph), title, description, h1+, header

Bonus points for adding in content location modifiers
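A minimal PHP sketch of collecting the text of those tags with `DOMDocument`, so each source can be weighted separately later. The function name `extractWeightedSources()` and its bucket names (`keywords`, `meta`, `header1`, `header2`, `title`) are hypothetical; the bucket names are chosen to line up with the weights used in this deck.

```php
<?php
// Sketch: group page text by the tag it came from, for later weighting.
// extractWeightedSources() and the bucket names are assumptions.

function extractWeightedSources(string $html): array
{
    $sources = ['title' => [], 'keywords' => [], 'meta' => [], 'header1' => [], 'header2' => []];
    if (trim($html) === '') {
        return $sources; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $dom->loadHTML($html);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('title') as $node) {
        $sources['title'][] = trim($node->textContent);
    }
    foreach ($dom->getElementsByTagName('meta') as $meta) {
        // Plain meta tags use "name"; Open Graph tags use "property".
        $key = $meta->getAttribute('name') ?: $meta->getAttribute('property');
        if ($key === 'keywords') {
            $sources['keywords'][] = trim($meta->getAttribute('content'));
        } elseif (in_array($key, ['description', 'og:description', 'og:title'], true)) {
            $sources['meta'][] = trim($meta->getAttribute('content'));
        }
    }
    foreach (['h1' => 'header1', 'h2' => 'header2'] as $tag => $bucket) {
        foreach ($dom->getElementsByTagName($tag) as $node) {
            $sources[$bucket][] = trim($node->textContent);
        }
    }
    return $sources;
}

$html = '<html><head><title>Running Shoes</title>'
      . '<meta name="keywords" content="shoes, trail">'
      . '<meta name="description" content="Trail and road shoes">'
      . '<meta property="og:title" content="Shoe Store"></head>'
      . '<body><h1>Trail Shoes</h1><h2>Sizing</h2><p>Body text</p></body></html>';

$sources = extractWeightedSources($html);
print_r($sources);
```

Using the DOM rather than regex keeps this robust against attribute order and stray whitespace in sloppy markup.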

Weighting Important Tags

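//our keyword weights
$weights = array("keywords" => 3.0, "meta" => 2.0, "header1" => 1.5, "header2" => 1.2);

//add modifier here: $weight (hypothetical) is the multiplier for the tag the word was
//found in, e.g. $weights["header1"] for an h1 word, or 1.0 for plain body text
if (strlen($word) > 2 && !in_array($word, $common_words)){
    $searched_words[$word] = (isset($searched_words[$word]) ? $searched_words[$word] : 0) + $weight;
}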
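Putting the weights to work end to end, here is a minimal PHP sketch: every occurrence adds its tag's weight instead of 1. The `weightedKeywordScores()` helper, the `body` bucket with weight 1.0, and the sample stop list are assumptions for illustration; the other weight values are the ones from this slide.

```php
<?php
// Sketch: score keywords by summing a per-tag weight for every occurrence.
// weightedKeywordScores() and the "body" => 1.0 default are assumptions.

function weightedKeywordScores(array $sources, array $weights, array $stopWords): array
{
    $scores = [];
    foreach ($sources as $tag => $texts) {
        $weight = $weights[$tag] ?? 1.0; // unknown tags count as plain body text
        foreach ($texts as $text) {
            $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
            foreach ($words as $word) {
                if (strlen($word) > 2 && !in_array($word, $stopWords, true)) {
                    $scores[$word] = ($scores[$word] ?? 0.0) + $weight;
                }
            }
        }
    }
    arsort($scores, SORT_NUMERIC); // strongest keywords first
    return $scores;
}

$weights = ['keywords' => 3.0, 'meta' => 2.0, 'header1' => 1.5, 'header2' => 1.2, 'body' => 1.0];
$sources = [
    'header1' => ['Trail Shoes'],
    'body'    => ['Our trail shoes and road shoes ship free.'],
];

$scores = weightedKeywordScores($sources, $weights, ['and', 'the', 'our']);
print_r($scores); // shoes 3.5, trail 2.5, then road/ship/free at 1.0
```

A word that appears once in an h1 and once in body text now outranks a word that appears twice in body text alone, which is the point of weighting.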

Expanding to Phrases

2-3 adjacent words that together make up a directly relevant callout

Seems easy, right? Just like single words

Language gets wonky without stop words
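One way to handle phrases without losing meaning is to build them from the unfiltered word stream and only reject phrases that start or end with a stop word, so "cost of living" survives but "for the" does not. This is a minimal PHP sketch of that rule; the rule itself, `extractPhrases()`, and the sentence-splitting step are assumptions, not the deck's stated method.

```php
<?php
// Sketch: count 2- and 3-word phrases without stripping stop words first.
// Assumed rule: stop words may sit inside a phrase ("cost of living"),
// but a phrase may not start or end with one.

function extractPhrases(string $text, array $stopWords, int $minN = 2, int $maxN = 3): array
{
    $counts = [];
    // Split into sentences so phrases never bridge a full stop.
    foreach (preg_split('/[.!?]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) as $sentence) {
        $words = preg_split('/[^a-z]+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);
        $total = count($words);
        for ($n = $minN; $n <= $maxN; $n++) {
            for ($i = 0; $i + $n <= $total; $i++) {
                $gram = array_slice($words, $i, $n);
                // Reject phrases bounded by stop words, such as "for the".
                if (in_array($gram[0], $stopWords, true) || in_array($gram[$n - 1], $stopWords, true)) {
                    continue;
                }
                $phrase = implode(' ', $gram);
                $counts[$phrase] = ($counts[$phrase] ?? 0) + 1;
            }
        }
    }
    arsort($counts, SORT_NUMERIC);
    return $counts;
}

$text = 'Running shoes for the cost of living. Running shoes on sale.';
$phrases = extractPhrases($text, ['for', 'the', 'of', 'on']);
print_r($phrases); // running shoes => 2, cost of living => 1, shoes on sale => 1
```

Filtering stop words before building phrases would have turned "cost of living" into the meaningless "cost living", which is exactly the wonkiness the slide warns about.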

Working with Unknown Users

The majority of users won’t be immediately targetable

Use HTML5 localStorage, with a cookie as backup
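A minimal PHP sketch of the cookie half of this idea: give each unknown visitor a random anonymous ID so interests can accumulate before they are identified. The cookie name `visitor_id`, the one-year lifetime, and `resolveVisitorId()` are hypothetical; the localStorage copy would be written by client-side script and is not shown.

```php
<?php
// Sketch: stable anonymous ID for unknown visitors, kept in a cookie.
// Cookie name, lifetime, and resolveVisitorId() are assumptions.

const VISITOR_COOKIE = 'visitor_id';

/** Returns [id, isNew]; reuses a well-formed ID from the cookie array if present. */
function resolveVisitorId(array $cookies): array
{
    $existing = $cookies[VISITOR_COOKIE] ?? '';
    if (is_string($existing) && preg_match('/^[0-9a-f]{32}$/', $existing) === 1) {
        return [$existing, false];
    }
    return [bin2hex(random_bytes(16)), true]; // 128 random bits as 32 hex chars
}

[$visitorId, $isNew] = resolveVisitorId($_COOKIE);
if ($isNew && !headers_sent()) {
    // httponly stays at its default (false) so page script can mirror the ID into localStorage.
    setcookie(VISITOR_COOKIE, $visitorId, [
        'expires'  => time() + 365 * 24 * 60 * 60,
        'path'     => '/',
        'samesite' => 'Lax',
    ]);
}
```

Validating the incoming value before reusing it keeps a tampered cookie from injecting arbitrary strings into the interest store.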

Adding in Time Interactions

Interaction with a site does not necessarily mean interest in it

Time on a page also needs an interaction component

Gift-buying seasons cause variations in interest
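A minimal PHP sketch of a visit score that only counts time when it is paired with interaction, plus a seasonal adjustment. Every number here (the 10-second floor, the 120-second cap, 0.5 per interaction, halving scores in November and December) is a hypothetical tuning value, and damping gift-season scores is just one possible reading of "interest variations".

```php
<?php
// Sketch: interest from a visit requires both time and interaction.
// All thresholds and multipliers are hypothetical tuning values.

function visitInterest(int $secondsOnPage, int $interactions): float
{
    // Time alone (e.g. an idle tab) is not treated as interest.
    if ($interactions === 0 || $secondsOnPage < 10) {
        return 0.0;
    }
    $timeScore = min($secondsOnPage, 120) / 120; // 0..1, capped at two minutes
    return $timeScore + 0.5 * $interactions;
}

/** Damps scores in gift-buying months, when browsing may be for someone else. */
function seasonalAdjust(float $score, int $month, array $giftMonths = [11, 12]): float
{
    return in_array($month, $giftMonths, true) ? $score * 0.5 : $score;
}

$score = seasonalAdjust(visitInterest(60, 2), 12);
printf("%.2f\n", $score); // 0.75
```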

Grouping Using Commonality

User A Interests

User B Interests
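To group users by commonality, one option is to measure how much two interest sets overlap. This minimal PHP sketch uses Jaccard similarity (shared interests divided by all distinct interests); the deck does not name a measure, so Jaccard and the sample interest lists are assumptions.

```php
<?php
// Sketch: overlap between two users' interest sets (Jaccard similarity).
// Jaccard is an assumed choice; the sample interests are made up.

function jaccard(array $a, array $b): float
{
    $a = array_unique($a);
    $b = array_unique($b);
    $union = array_unique(array_merge($a, $b));
    if (count($union) === 0) {
        return 0.0; // two empty profiles share nothing meaningful
    }
    return count(array_intersect($a, $b)) / count($union);
}

$userA = ['running', 'camping', 'photography', 'coffee'];
$userB = ['camping', 'coffee', 'cycling'];

$shared = array_values(array_intersect($userA, $userB)); // ['camping', 'coffee']
printf("Similarity: %.2f\n", jaccard($userA, $userB));  // Similarity: 0.40
```

Users whose similarity clears a chosen threshold could then be placed in the same group, and interests held by only one member become candidates to suggest to the other.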

www.slideshare.com/jcleblanc

Thank You! Questions?

