Securing and Personalizing Commerce Using Identity Data Mining

Securing & Personalizing Commerce Using Identity Data Mining. Jonathan LeBlanc, Developer Evangelist (PayPal). Github: http://github.com/jcleblanc Twitter: @jcleblanc

description

As society becomes increasingly reliant on mobile technology, money is becoming mobile too. In this new realm of commerce, online identity matters significantly more. When a payment is processed, it is important to understand not only who a person is, but also what their broader interests and preferences are, so that personalized experiences suggesting new content and merchandise can be delivered at an individual level.

Transcript of Securing and Personalizing Commerce Using Identity Data Mining

Page 1: Securing and Personalizing Commerce Using Identity Data Mining

Using Identity Data Mining

Securing & Personalizing Commerce

Jonathan LeBlanc, Developer Evangelist (PayPal)

Github: http://github.com/jcleblanc

Twitter: @jcleblanc

Page 2: Securing and Personalizing Commerce Using Identity Data Mining

The Problem

Commerce Relies on Static Data Contributions

Page 3: Securing and Personalizing Commerce Using Identity Data Mining

Premise

You can determine the personality profile of a person based on their usage habits

Personalization == Security

Page 4: Securing and Personalizing Commerce Using Identity Data Mining

Technology was the Solution!

Page 5: Securing and Personalizing Commerce Using Identity Data Mining

Then I Read This…

Us & Them

The Science of Identity

By David Berreby

Page 6: Securing and Personalizing Commerce Using Identity Data Mining

The Different States of Knowledge

What a person knows

What a person knows they don’t know

What a person doesn’t know they don’t know

Page 7: Securing and Personalizing Commerce Using Identity Data Mining

Technology was NOT the Solution

Identity and discovery are

NOT a technology solution

Page 8: Securing and Personalizing Commerce Using Identity Data Mining

Our Subject Material

Page 9: Securing and Personalizing Commerce Using Identity Data Mining

Our Subject Material

HTML content is poorly structured

There are some pretty bad web practices on the interwebz

You can’t trust that anything semantically valid will be present

Page 10: Securing and Personalizing Commerce Using Identity Data Mining

How We’ll Capture This Data

Start with base linguistics

Extend with available extras

Page 11: Securing and Personalizing Commerce Using Identity Data Mining

The Components

Page 12: Securing and Personalizing Commerce Using Identity Data Mining

The Basic Pieces

Page Data: Scrapey Scrapey

Keywords: Without all the fluff

Weighting: Word diets FTW

Page 13: Securing and Personalizing Commerce Using Identity Data Mining

Capture Raw Page Data

Semantic data on the web is sucktastic

Assume 5 year olds built the sites

Language is the key

Page 14: Securing and Personalizing Commerce Using Identity Data Mining

Extract Keywords

We now have a big jumble of words. Let’s extract

Why is “and” a top word? Stop words = sad panda

Page 15: Securing and Personalizing Commerce Using Identity Data Mining

Weight Keywords

All content is not created equal

Meta and headers and semantics oh my!

This is where we leech off the work of others

Page 16: Securing and Personalizing Commerce Using Identity Data Mining

Simple Extraction Engine

Page 17: Securing and Personalizing Commerce Using Identity Data Mining

Questions to Keep in Mind

Should I use regex to parse web content?

How do users interact with page content?

What key identifiers can be monitored to detect interest?
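One possible answer to the first question is to let an HTML parser do the stripping instead of regular expressions. The sketch below is an alternative to the regex approach, not the method used in the slides; the function name visible_text is hypothetical. It assumes PHP's DOM extension is available and collects only the text nodes that sit outside script and style elements.

```php
<?php
// Hypothetical helper (not from the talk): pull the reader-visible text out
// of a page with the DOM extension rather than with regular expressions.
function visible_text(string $html): string
{
    if (trim($html) === '') {
        return '';   // loadHTML() rejects empty input
    }

    // Collect parser warnings internally; real-world markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select every text node that is not inside a script or style element.
    $xpath = new DOMXPath($dom);
    $parts = array();
    foreach ($xpath->query('//text()[not(ancestor::script or ancestor::style)]') as $node) {
        $parts[] = $node->nodeValue;
    }

    // Join with spaces so adjacent elements do not fuse into one word,
    // then collapse runs of whitespace.
    return trim(preg_replace('/\s+/', ' ', implode(' ', $parts)));
}

echo visible_text('<html><head><style>p{color:red}</style></head>'
    . '<body><h1>Hiking Boots</h1><script>track();</script>'
    . '<p>Waterproof   leather</p></body></html>');
// prints: Hiking Boots Waterproof leather
```

Joining text nodes with a space keeps a heading and the paragraph after it from merging into a single token, which can happen when tags are simply deleted.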

Page 18: Securing and Personalizing Commerce Using Identity Data Mining

Fetching the Data: The Request

The Simple Way

$html = file_get_contents('URL');

The Controlled Way

$c = curl_init('URL');

Page 19: Securing and Personalizing Commerce Using Identity Data Mining

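Fetching the Data: cURL

//$url is the page to fetch; $header is a boolean (true to include response headers in the output)
$req = curl_init($url);

$options = array(
    CURLOPT_URL => $url,
    CURLOPT_HEADER => $header,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_AUTOREFERER => true,
    CURLOPT_TIMEOUT => 15,
    CURLOPT_MAXREDIRS => 10
);

curl_setopt_array($req, $options);

//run the request; the returned HTML is what the following steps call $page_content
$page_content = curl_exec($req);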

Page 20: Securing and Personalizing Commerce Using Identity Data Mining

//list of findable / replaceable string characters
$find = array('/\r/', '/\n/', '/\s\s+/');
$replace = array(' ', ' ', ' ');

//perform page content modification: drop script and style blocks entirely
$mod_content = preg_replace('#<script(.*?)>(.*?)</script>#is', '', $page_content);
$mod_content = preg_replace('#<style(.*?)>(.*?)</style>#is', '', $mod_content);

//strip remaining tags, normalize case and whitespace, then split into words
$mod_content = strip_tags($mod_content);
$mod_content = strtolower($mod_content);
$mod_content = preg_replace($find, $replace, $mod_content);
$mod_content = trim($mod_content);
$mod_content = explode(' ', $mod_content);

natcasesort($mod_content);

Page 21: Securing and Personalizing Commerce Using Identity Data Mining

//set up list of stop words and the final found stopped list
$common_words = array('a', ..., 'zero');
$searched_words = array();

//extract list of keywords with number of occurrences
foreach($mod_content as $word) {
    $word = trim($word);
    //discard any token that contains a non-letter character
    if (preg_match('/[^a-zA-Z]/', $word) == 1){
        $word = '';
    }
    if(strlen($word) > 2 && !in_array($word, $common_words)){
        //start the count at 0 the first time a word is seen
        $searched_words[$word] = ($searched_words[$word] ?? 0) + 1;
    }
}

//most frequent words first
arsort($searched_words, SORT_NUMERIC);

Page 22: Securing and Personalizing Commerce Using Identity Data Mining

Scraping Site Meta Data

//load scraped page data as a valid DOM document
$dom = new DOMDocument();
@$dom->loadHTML($page_content);

//scrape title (empty string if the page has no title tag)
$title = $dom->getElementsByTagName("title");
$title = $title->length > 0 ? $title->item(0)->nodeValue : '';

Page 23: Securing and Personalizing Commerce Using Identity Data Mining

//loop through all found meta tags
$metas = $dom->getElementsByTagName("meta");
for ($i = 0; $i < $metas->length; $i++){
    $meta = $metas->item($i);
    if($meta->getAttribute("property")){
        //Open Graph tags use the property attribute
        if ($meta->getAttribute("property") == "og:description"){
            $dataReturn["description"] = $meta->getAttribute("content");
        }
    } else {
        //standard meta tags use the name attribute
        if($meta->getAttribute("name") == "description"){
            $dataReturn["description"] = $meta->getAttribute("content");
        } else if($meta->getAttribute("name") == "keywords"){
            $dataReturn["keywords"] = $meta->getAttribute("content");
        }
    }
}

Page 24: Securing and Personalizing Commerce Using Identity Data Mining

Extending the Engine

Page 25: Securing and Personalizing Commerce Using Identity Data Mining

Weighting Important Data

Tags you should care about: meta (include OG), title, description, h1+, header

Bonus points for adding in content location modifiers

Page 26: Securing and Personalizing Commerce Using Identity Data Mining

Weighting Important Tags

//our keyword weights
$weights = array("keywords" => "3.0",
                 "meta" => "2.0",
                 "header1" => "1.5",
                 "header2" => "1.2");

//add modifier here
if(strlen($word) > 2 && !in_array($word, $common_words)){
    //start the count at 0 the first time a word is seen
    $searched_words[$word] = ($searched_words[$word] ?? 0) + 1;
}
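As a rough sketch of what "add modifier here" could look like, the example below adds a tag's weight to a word's score instead of counting every occurrence as 1. It is one possible reading, not the talk's implementation: the function name weighted_keywords, the shape of the $sections array, and the "body" weight of 1.0 are all assumptions, and the slide's weights are written as floats. Unlike the extraction loop above, it splits text on any non-letter character rather than discarding tokens that contain one.

```php
<?php
// Weights from the slide, as floats; "body" is an assumed default for
// ordinary page text that has no special tag.
$weights = array('keywords' => 3.0, 'meta' => 2.0, 'header1' => 1.5,
                 'header2' => 1.2, 'body' => 1.0);

// Hypothetical helper: score each word by where on the page it appears.
// $sections maps a weight name to the text found in that part of the page.
function weighted_keywords(array $sections, array $weights, array $common_words): array
{
    $scores = array();
    foreach ($sections as $source => $text) {
        // Unknown sources fall back to a neutral weight of 1.0.
        $modifier = $weights[$source] ?? 1.0;
        $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            if (strlen($word) > 2 && !in_array($word, $common_words, true)) {
                $scores[$word] = ($scores[$word] ?? 0) + $modifier;
            }
        }
    }
    arsort($scores, SORT_NUMERIC);   // highest score first
    return $scores;
}

$sections = array(
    'keywords' => 'boots, hiking',
    'header1'  => 'Hiking Boots',
    'body'     => 'These boots are made for hiking and the trail',
);
print_r(weighted_keywords($sections, $weights, array('these', 'are', 'for', 'and', 'the')));
// boots and hiking each score 3.0 + 1.5 + 1.0 = 5.5; made and trail score 1.0
```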

Page 27: Securing and Personalizing Commerce Using Identity Data Mining

Expanding to Phrases

2-3 adjacent words, making up a direct relevant callout

Seems easy right? Just like single words

Language gets wonky without stop words
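A minimal sketch of phrase extraction follows. It counts every run of 2 to 3 adjacent words and, as an assumption not taken from the talk, lets a phrase contain a stop word but not begin or end with one. That keeps "game of thrones" intact instead of reducing it to "game thrones". The function name extract_phrases is hypothetical.

```php
<?php
// Hypothetical helper: count phrases made of $min to $max adjacent words.
function extract_phrases(string $text, array $common_words, int $min = 2, int $max = 3): array
{
    // Keep stop words in the token stream so word adjacency is preserved.
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    $total = count($words);
    $phrases = array();

    for ($start = 0; $start < $total; $start++) {
        for ($size = $min; $size <= $max && $start + $size <= $total; $size++) {
            $slice = array_slice($words, $start, $size);
            // Assumption: a phrase may contain a stop word in the middle,
            // but may not begin or end with one.
            if (in_array($slice[0], $common_words, true)
                || in_array($slice[$size - 1], $common_words, true)) {
                continue;
            }
            $phrase = implode(' ', $slice);
            $phrases[$phrase] = ($phrases[$phrase] ?? 0) + 1;
        }
    }

    arsort($phrases, SORT_NUMERIC);   // most frequent phrase first
    return $phrases;
}

$phrases = extract_phrases(
    'Game of Thrones season one. Watch Game of Thrones online.',
    array('of')
);
echo array_key_first($phrases), ' => ', reset($phrases);
// prints: game of thrones => 2
```

This sketch ignores sentence boundaries, so it also produces cross-sentence phrases such as "one watch". Splitting on punctuation first would be a reasonable refinement.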

Page 28: Securing and Personalizing Commerce Using Identity Data Mining

Working with Unknown Users

The majority of users won’t be immediately targetable

Use HTML5 LocalStorage & Cookie backup
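The LocalStorage half of this lives in the browser and is not shown here. For the cookie backup, a server-side sketch might keep an anonymous keyword profile as JSON and merge new keywords into it on each visit. The function name update_profile and the cookie name "interests" are assumptions for illustration only.

```php
<?php
// Hypothetical helper: merge newly observed keyword scores into an anonymous
// profile that is stored as a JSON string (for example in a cookie).
function update_profile(?string $cookie_value, array $keywords): string
{
    $profile = json_decode($cookie_value ?? '', true);
    if (!is_array($profile)) {
        $profile = array();   // first visit, or an unreadable cookie
    }
    foreach ($keywords as $word => $score) {
        $profile[$word] = ($profile[$word] ?? 0) + $score;
    }
    return json_encode($profile);
}

// In a web request this could be wired up roughly as follows
// (cookie name and 30-day lifetime are assumptions):
//   $json = update_profile($_COOKIE['interests'] ?? null, $searched_words);
//   setcookie('interests', $json, time() + 60 * 60 * 24 * 30, '/');

echo update_profile('{"boots":2}', array('boots' => 1, 'tents' => 3));
// prints: {"boots":3,"tents":3}
```

Cookies are limited to roughly 4 KB each, so a profile kept this way would likely need to be trimmed to its top keywords.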

Page 29: Securing and Personalizing Commerce Using Identity Data Mining

Adding in Time Interactions

Interaction with a site does not necessarily mean interest in it

Time needs to also include an interaction component

Gift buying seasons see interest variations
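One way to combine time with an interaction component is a modifier between 0 and 1 that multiplies a page's keyword scores. The formula below is purely an illustrative assumption: the function name, the 300-second cap, and the 10-interaction cap are not from the talk, and it does not attempt to model seasonal variation.

```php
<?php
// Hypothetical scoring rule: time on a page only counts as interest when the
// visitor also interacted with it (scrolled, clicked, typed).
function engagement_modifier(int $seconds_on_page, int $interactions, int $cap_seconds = 300): float
{
    if ($interactions === 0) {
        return 0.0;   // an idle tab is not evidence of interest
    }
    // Each part is scaled to the range 0.0 to 1.0 and capped.
    $time_part = min($seconds_on_page, $cap_seconds) / $cap_seconds;
    $interaction_part = min($interactions, 10) / 10;

    // Simple average of the two parts, rounded to two decimals.
    return round(($time_part + $interaction_part) / 2, 2);
}

echo engagement_modifier(600, 0), "\n";   // 0: long visit, no interaction
echo engagement_modifier(150, 5), "\n";   // 0.5: moderate time and activity
echo engagement_modifier(300, 10), "\n";  // 1: both parts at their cap
```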

Page 30: Securing and Personalizing Commerce Using Identity Data Mining

Grouping Using Commonality

User A Interests

User B Interests

Common Interests
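The overlap in the diagram can be sketched with two keyword profiles and a set intersection. The similarity measure used here, Jaccard similarity (shared keywords divided by all distinct keywords), is a swapped-in technique for illustration and is not named in the talk; the function names and sample profiles are hypothetical.

```php
<?php
// Hypothetical helper: keywords present in both keyword => score profiles.
function common_interests(array $a, array $b): array
{
    return array_keys(array_intersect_key($a, $b));
}

// Jaccard similarity: shared keywords divided by all distinct keywords.
function interest_similarity(array $a, array $b): float
{
    $union = count($a + $b);   // array union keeps each distinct key once
    return $union === 0 ? 0.0 : count(array_intersect_key($a, $b)) / $union;
}

$user_a = array('hiking' => 5.5, 'boots' => 4.0, 'tents' => 1.0);
$user_b = array('boots' => 2.0, 'tents' => 3.5, 'kayaks' => 6.0);

echo implode(', ', common_interests($user_a, $user_b)), "\n";  // boots, tents
echo interest_similarity($user_a, $user_b), "\n";              // 2 shared / 4 distinct = 0.5
```

Users whose similarity passes some chosen threshold could then be grouped, so that interests seen for one member of the group can be suggested to the others.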

Page 31: Securing and Personalizing Commerce Using Identity Data Mining

www.slideshare.com/jcleblanc

Thank You! Questions?

Jonathan LeBlanc, Developer Evangelist (PayPal)

Github: http://github.com/jcleblanc

Twitter: @jcleblanc