Likes and Locations - Adventure in Social Data Mining
Gene Chuang – Exec Dir of Social Eng, ATTi
Masahji Stewart – Founder, Synctree
Q CTO Dinner 4/6/11 – Lawry’s Beverly Hills, CA
Dedication
Background
Social Local Mobile Loco
Why Mine Social and Local Data?
• Signals to improve user experience
• Timely and “Placely”
• Engagement
• Provide value – save time, save money
• Opt In, Privacy
YP.com Infrastructure
• Ruby on Rails for Web, Login and API
• Solr/Lucene for Search
• Hadoop for Data pipeline
• Hive for Ad Hoc queries on Hadoop
• Ruby ETL scripts
OAuth 2
• OAuth 2 is an open protocol that lets users share private resources (e.g. photos, videos, contact lists) stored on one site with another site without handing out their username and password; instead they hand out tokens
• Think Valet Key
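The token hand-off described above can be sketched as the two requests of an authorization-code exchange. This is an illustration only, not YP.com's login code: the endpoint URL, client ID, scope, and helper names are hypothetical, and the parameter names are the ones defined for the OAuth 2.0 authorization code grant (RFC 6749).

```python
from urllib.parse import urlencode


def build_authorization_url(authorize_endpoint, client_id, redirect_uri, scope, state):
    """Step 1: URL the user's browser is sent to so they can grant access.

    The user authenticates with the provider, never with the requesting site.
    """
    query = urlencode({
        "response_type": "code",   # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,            # what the "valet key" is allowed to open
        "state": state,            # opaque value echoed back to guard against CSRF
    })
    return f"{authorize_endpoint}?{query}"


def build_token_request(client_id, client_secret, redirect_uri, code):
    """Step 2: form fields the requesting site POSTs to trade the code for a token."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "client_secret": client_secret,
    }


# Hypothetical values, for illustration only.
print(build_authorization_url(
    "https://provider.example/oauth/authorize",
    "my-client-id",
    "https://app.example/callback",
    "likes",
    "random-state-123",
))
```

The access token returned in step 2 is the "valet key": it grants the limited scope the user approved and can be revoked without changing the user's password.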
YP.com Login/Registration
Login Layer
OAuth 2 Dance
Semi-Social Search
Social Mining - Extract
Extract scripts pull data out of a database (such as Oracle), Hive, files, Facebook, or any other source and print JSON data to STDOUT.
For example, to get the count of users signed up by day, along with the running total:
$ RAILS_ENV=production sdm extract total-users-by-day 2011-02-14
{"day":"2011-02-14","count":891,"total":1328636}
{"day":"2011-02-15","count":1088,"total":1329724}
{"day":"2011-02-16","count":1016,"total":1330740}
{"day":"2011-02-17","count":1359,"total":1332099}
{"day":"2011-02-18","count":1143,"total":1333242}
{"day":"2011-02-19","count":660,"total":1333902}
{"day":"2011-02-20","count":597,"total":1334499}
{"day":"2011-02-21","count":874,"total":1335373}
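The output format of an extract step, one JSON object per line with a per-day count and a running total, can be sketched as follows. This is not the `sdm` implementation: the function name is hypothetical, and an in-memory list of daily counts stands in for the database query. The starting total of 1327745 is simply the first reported total minus the first day's count.

```python
import json
from datetime import date, timedelta


def total_users_by_day(start_day, daily_counts, total_before):
    """Yield one compact JSON line per day: signups that day plus the running total."""
    total = total_before
    for offset, count in enumerate(daily_counts):
        total += count
        day = start_day + timedelta(days=offset)
        yield json.dumps(
            {"day": day.isoformat(), "count": count, "total": total},
            separators=(",", ":"),  # no spaces, matching the sample output
        )


# First three days from the sample above.
for line in total_users_by_day(date(2011, 2, 14), [891, 1088, 1016], total_before=1327745):
    print(line)
```

Emitting one self-contained JSON object per line is what lets downstream steps process the stream record by record.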
Social Mining - Transform
Transform scripts take JSON data in via STDIN and print JSON data out to STDOUT
For example, to add YPIDs to existing Facebook likes, then keep only the location and YPID-matching fields:
$ cat data/facebook_likes_2011_01_12.json | sdm transform add-ypid | sdm transform filter-fields name phone location ypid_best_match ypids ypid_match_results id
{"name":"Snuggle Bunnies","location":{"city":"Carlisle","zip":"45005","country":"United States","state":"OH"},"id":"106864249335072","ypid_match_results":[]}
{"name":"Associate Construction","location":{"city":"Franklin","zip":"45005","country":"United States","street":"31 Eagle Court","state":"OH"},"id":"235027821862","ypid_best_match":"6197197","phone":"(937)-746-2932"}
{"name":"PH Bistro","location":{"city":"Franklin","zip":"45005","country":"United States","street":"543 S Main Street","state":"OH"},"id":"261032274490","ypid_best_match":"1120570","phone":"(937)-743-0069"}
{"name":"Bullwinkle's Top Hat Bistro - Miamisburg, OH","location":{"city":"Miamisburg","zip":"45342-2312","country":"United States","street":"19 North Main St","state":"OH"},"id":"260274607015","ypid_best_match":"12255503","phone":"(937)-859-7677"}
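A transform step in the style of `filter-fields` can be sketched as a generator over JSON lines. The function name and the choice to keep each record's original key order are assumptions made for illustration; the real `sdm` script may behave differently.

```python
import json


def filter_fields(lines, fields):
    """Yield each JSON record reduced to the requested top-level fields.

    Fields missing from a record are skipped; the record's own key order is kept.
    """
    wanted = set(fields)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        record = json.loads(line)
        kept = {key: value for key, value in record.items() if key in wanted}
        yield json.dumps(kept, separators=(",", ":"))
```

To use it as a pipeline stage, pass `sys.stdin` as `lines` and print each yielded line, so that it can sit between other stages connected by `|`.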
Social Mining - Load
Load scripts read data in from STDIN and load it into another system, such as a dashboard.
For example, to load total Facebook accounts by day into the web dashboard:
$ sdm extract total-fb-accounts-by-day 2011-01-10 | sdm load dashboard total_fb_accounts day total
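The reading half of a load step can be sketched as below. It assumes, without confirmation from the slides, that the trailing `day total` arguments name the x and y fields of the series; the function name is hypothetical, and the write to the dashboard itself is left out because its API is not described.

```python
import json


def series_points(lines, x_field, y_field):
    """Turn a stream of JSON lines into (x, y) points for one dashboard series."""
    points = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the stream
        record = json.loads(line)
        points.append((record[x_field], record[y_field]))
    return points
```

A real load script would then hand these points, together with the series name (`total_fb_accounts` in the command above), to whatever storage backs the dashboard.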
Location Real-Time Fuzzy Matcher
FP0 (exact match)
• Append LISTING_NAME + ADDRESS + CITY + PHONE
• Tokenize, normalize, strip punctuation, and stem
• Append tokens
FP3 (fuzzy match)
• Append LISTING_NAME + ADDRESS + CITY + PHONE
• Tokenize, normalize, strip punctuation, and stem
• Remove tokens that are less than 2 chars long
• Remove upper-case short tokens (e.g., MD, CPA, DDS)
• Remove non-phone, short, numerical tokens
• Remove stopwords based on the top 170 most frequently occurring listing_name tokens
• Order tokens alphabetically
• Append tokens
Example:
Vijay K. Sammy CPA, LLC
153 Orchard St
Elmwood Park NJ - 07407
(201) 218-0710
FP Method   Value
FP0         vijaiksammicpallc153orchardstelmwoodpark2012180710
FP3         0710201218elmwoodorchardparksammistvijai
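The two fingerprints can be sketched as below. This is a reading of the steps listed above, not the production matcher: the `stem` function is a stand-in that applies only a "y to i" rule (enough for this example; a full stemmer such as Porter's would be used in practice), the "short" cutoff of 3 characters is an assumption, and the stopword list is left empty because the top-170 list is not given.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z0-9]+")  # splitting on anything else strips punctuation
SHORT_TOKEN_MAX = 3                     # assumed meaning of "short"


def stem(token):
    """Stand-in stemmer: only rewrites a trailing 'y' to 'i' (vijay -> vijai)."""
    if len(token) > 2 and token.endswith("y") and any(c in "aeiou" for c in token[:-1]):
        return token[:-1] + "i"
    return token


def fp0(name, address, city, phone):
    """Exact-match fingerprint: every token, lowercased and stemmed, in original order."""
    tokens = TOKEN_RE.findall(" ".join([name, address, city, phone]))
    return "".join(stem(t.lower()) for t in tokens)


def fp3(name, address, city, phone, stopwords=frozenset()):
    """Fuzzy fingerprint: drop noisy tokens, then sort so word order does not matter."""
    kept = []
    for token in TOKEN_RE.findall(" ".join([name, address, city])):
        if len(token) < 2:
            continue  # single characters such as a middle initial
        if token.isupper() and len(token) <= SHORT_TOKEN_MAX:
            continue  # upper-case short tokens such as CPA, LLC, MD
        if token.isdigit() and len(token) <= SHORT_TOKEN_MAX:
            continue  # short numbers outside the phone, such as a street number
        if token.lower() in stopwords:
            continue
        kept.append(token)
    kept.extend(TOKEN_RE.findall(phone))  # phone tokens are always kept
    return "".join(sorted(stem(t.lower()) for t in kept))


listing = ("Vijay K. Sammy CPA, LLC", "153 Orchard St", "Elmwood Park", "(201) 218-0710")
print(fp0(*listing))  # vijaiksammicpallc153orchardstelmwoodpark2012180710
print(fp3(*listing))  # 0710201218elmwoodorchardparksammistvijai
```

Because FP3 sorts its tokens and drops suffixes like "CPA" and "LLC", two listings that differ only in word order or professional designations produce the same fingerprint, while FP0 matches only when every token agrees.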
Social Data
• Valid Facebook Access Tokens: 14K
• Total Unique Likes: 300K
• % Likes with Locations and/or Phones: 19%
• % Likes mapped to YPID: 38%
• Total Check-Ins: 530
Social Mining Mother Lode
• Social Search
• Local Recommendation Engine
• Discovery Wall
• Top 10 List
• Social e-Commerce
• Online Presence Management – Social CRM
Questions?
• http://www.twitter.com/genechuang
• http://www.quora.com/Gene-Chuang
• http://www.linkedin.com/in/genechuang