Bootstrapping Recommendations OSCON 2015

Bootstrapping Recommendations with Neo4j

OSCON

About Me

• Max De Marzi - Neo4j Field Engineer
• My Blog: http://maxdemarzi.com
• Find me on Twitter: @maxdemarzi
• Email me: [email protected]
• GitHub: http://github.com/maxdemarzi

Big Data - What is it good for?

• Absolutely Nothing!

• Benchmarks: Is this performing better than that? Yes, why? Uh.
• Recommendations: You should buy this right now.
• Predictions: You will probably buy this.

Top 10 Recommendations

• Popularity
• The naive approach
• One size fits most

Naive Approach

I’m getting little Timmy some “Cards Against Humanity”

Content Based Recommendations

• Step 1: Collect Item Characteristics
• Step 2: Find similar Items
• Step 3: Recommend Similar Items

• Example: Similar Movie Genres
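The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the catalogue, the `jaccard` helper and the `similar_items` function are all hypothetical names, and genre overlap (Jaccard similarity) is assumed as the measure of "similar".

```python
def jaccard(a, b):
    """Genre overlap between two items: |a ∩ b| / |a ∪ b|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def similar_items(liked_title, genres_by_title, top_n=3):
    """Steps 2 and 3: rank every other item by genre overlap with the liked one."""
    liked = genres_by_title[liked_title]
    scored = [
        (jaccard(liked, genres), title)
        for title, genres in genres_by_title.items()
        if title != liked_title
    ]
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored[:top_n] if score > 0]


# Step 1: item characteristics (made-up catalogue for illustration).
MOVIES = {
    "Toy Story": {"Animation", "Comedy", "Family"},
    "Shrek": {"Animation", "Comedy", "Family"},
    "Warm Bodies": {"Comedy", "Horror", "Romance"},
    "Titanic": {"Drama", "Romance"},
}

print(similar_items("Toy Story", MOVIES))
```

Items with no shared genre are dropped rather than recommended with a zero score.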

There is more to life than Romantic Zombie-coms

Collaborative Filtering Recommendations

• Step 1: Collect User Behavior
• Step 2: Find similar Users
• Step 3: Recommend Behavior taken by similar users

• Example: People with similar musical tastes
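A rough Python sketch of these steps, assuming "similar" simply means "shares liked artists with me". The `LIKES` data and the `recommend` function are hypothetical, made up for illustration.

```python
from collections import Counter


def recommend(target, likes_by_user, top_n=3):
    """Recommend items liked by users who share likes with the target."""
    mine = likes_by_user[target]
    scores = Counter()
    for user, theirs in likes_by_user.items():
        if user == target:
            continue
        overlap = len(mine & theirs)  # Step 2: similarity = number of shared likes
        if overlap == 0:
            continue
        for item in theirs - mine:    # Step 3: things they like that I have not tried
            scores[item] += overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Step 1: collected behavior (made-up data).
LIKES = {
    "ann": {"Miles Davis", "John Coltrane", "Nina Simone"},
    "bob": {"Miles Davis", "John Coltrane", "Bill Evans"},
    "cat": {"Miles Davis", "Metallica"},
    "dan": {"Slayer", "Metallica"},
}

print(recommend("ann", LIKES))
```

Weighting each suggestion by the overlap means closer neighbors count for more than distant ones.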

You are so original!

Using Relationships for Recommendations

Content-based filtering: Recommend items based on what users have liked in the past

Collaborative filtering: Predict what users like based on the similarity of their behaviors, activities and preferences to others

[Figure: graph model in which a Person RATED a Movie (rating: 7) and two Persons share a SIMILARITY relationship (value: .92)]

Hybrid Recommendations

• Combine the two for better results

• Like Peanut Butter and Jelly

Benefits of Real-Time Recommendations

Online Retail
• Suggest related products and services
• Increase revenue and engagement

Media and Broadcasting
• Create an engaging experience
• Produce personalized content and offers

Logistics
• Recommend optimal routes
• Increase network efficiency

Challenges for Real-Time Recommendations

Make effective real-time recommendations
• Timing is everything in point-of-touch applications
• Base recommendations on current data, not last night’s batch load

Process large amounts of data and relationships for context
• Relevance is king: Make the right connections
• Drive traffic: Get users to do more with your application

Accommodate new data and relationships continuously
• Systems get richer with new data and relationships
• Recommendations become more relevant

Relational vs. Graph Models

[Figure: the relational model (Person, Ratings and Movie tables) next to the graph model, where MAX has RATED relationships to Terminator, Toy Story and Titanic]

Cypher Query Language

MATCH (:Person { name: "Dan" }) -[:KNOWS]-> (:Person { name: "Ann" })

[Figure: the pattern annotated: Dan and Ann are nodes, each with a label (Person) and a property (name), connected by a KNOWS relationship]

Express Complex Queries Easily with Cypher

Find all direct reports and how many people they manage, up to 3 levels down

MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate,
       count(report) AS Total

[Figure: the Cypher query shown side by side with the SQL query]

Hello World Recommendation

Movie Data Model

Cypher Query: Movie Recommendation

MATCH (watched:Movie {title: "Toy Story"}) <-[r1:RATED]- () -[r2:RATED]-> (unseen:Movie)
WHERE r1.rating > 7 AND r2.rating > 7
  AND watched.genres = unseen.genres
  AND NOT( (:Person {username: "maxdemarzi"}) -[:RATED|WATCHED]-> (unseen) )
RETURN unseen.title, COUNT(*)
ORDER BY COUNT(*) DESC
LIMIT 25

What are the Top 25 Movies
• that I haven't seen
• with the same genres as Toy Story
• given high ratings
• by people who liked Toy Story
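For readers without a Neo4j instance at hand, the same logic can be approximated in plain Python. This is a sketch under assumptions: the ratings and genres are made up, `hello_world_recommendation` is a hypothetical name, and only RATED is modeled (the WATCHED relationship from the query is left out).

```python
from collections import Counter


def hello_world_recommendation(me, watched, ratings, genres, threshold=7, limit=25):
    """Count high ratings of same-genre movies by people who rated `watched` highly."""
    seen_by_me = {movie for person, movie, _ in ratings if person == me}
    fans = {
        person
        for person, movie, rating in ratings
        if movie == watched and rating > threshold
    }
    counts = Counter(
        movie
        for person, movie, rating in ratings
        if person in fans
        and rating > threshold
        and movie != watched
        and movie not in seen_by_me
        and genres[movie] == genres[watched]
    )
    # Same ordering as the query: most co-ratings first.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


# Made-up (person, movie, rating) triples and genre tuples.
RATINGS = [
    ("ann", "Toy Story", 9), ("ann", "Shrek", 8), ("ann", "Titanic", 9),
    ("bob", "Toy Story", 8), ("bob", "Shrek", 9), ("bob", "Up", 8),
    ("cat", "Toy Story", 5), ("cat", "Up", 10),
    ("max", "Toy Story", 10), ("max", "Up", 9),
]
GENRES = {
    "Toy Story": ("Animation", "Comedy"),
    "Shrek": ("Animation", "Comedy"),
    "Up": ("Animation", "Comedy"),
    "Titanic": ("Drama", "Romance"),
}

print(hello_world_recommendation("max", "Toy Story", RATINGS, GENRES))
```

The nested filtering that takes several passes here is a single pattern match in the graph, which is the point of the slide.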

Let’s try k-nearest neighbors (k-NN)

Cosine Similarity

Cypher Query: Ratings of Two Users

MATCH (p1:Person {name: 'Michael Sherman'}) -[r1:RATED]-> (m:Movie),
      (p2:Person {name: 'Michael Hunger'}) -[r2:RATED]-> (m:Movie)
RETURN m.name AS Movie,
       r1.rating AS `M. Sherman's Rating`,
       r2.rating AS `M. Hunger's Rating`

What are the Movies these 2 users have both rated

Cypher Query: Ratings of Two Users

Calculating Cosine Similarity

Cypher Query: Cosine Similarity

MATCH (p1:Person) -[x:RATED]-> (m:Movie) <-[y:RATED]- (p2:Person)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
     SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
     SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
     p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)

Calculate it for all Person nodes with at least one Movie between them
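The arithmetic in that query is the usual cosine formula, dot product divided by the product of the vector lengths, taken over the movies both people rated. A small Python sketch of the same calculation follows; the `cosine_similarity` name and the sample ratings are illustrative assumptions, and ratings are assumed to be positive so the lengths are never zero.

```python
from math import sqrt


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over the movies both users rated, or None if there are none."""
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return None
    dot = sum(ratings_a[m] * ratings_b[m] for m in common)
    len_a = sqrt(sum(ratings_a[m] ** 2 for m in common))
    len_b = sqrt(sum(ratings_b[m] ** 2 for m in common))
    return dot / (len_a * len_b)


# Made-up ratings: only Toy Story and Titanic are shared.
sherman = {"Toy Story": 3, "Titanic": 4}
hunger = {"Toy Story": 4, "Titanic": 3, "Up": 5}

print(cosine_similarity(sherman, hunger))
```

As in the query, movies rated by only one of the two people do not contribute to either length.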

Movie Data Model

Cypher Query: Your nearest neighbors

MATCH (p1:Person {name: 'Grace Andrews'}) -[s:SIMILARITY]- (p2:Person)
WITH p2, s.similarity AS sim
ORDER BY sim DESC
LIMIT 5
RETURN p2.name AS Neighbor, sim AS Similarity

Who are the
• top 5 Persons and their similarity score
• ordered by similarity in descending order
• for Grace Andrews

Your nearest neighbors

Cypher Query: k-NN Recommendation

MATCH (m:Movie) <-[r:RATED]- (b:Person) -[s:SIMILARITY]- (p:Person {name: 'Zoltan Varju'})
WHERE NOT( (p) -[:RATED]-> (m) )
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS recommendation
ORDER BY recommendation DESC
RETURN movie, recommendation
LIMIT 25

What are the Top 25 Movies
• that Zoltan Varju has not seen
• using the average rating
• by my top 3 neighbors
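The same k-NN step, sketched in Python under assumptions: similarities are taken as already computed, and the `knn_recommend` function, the ratings and the similarity values are all made up for illustration.

```python
def knn_recommend(target, ratings, similarity, k=3, limit=25):
    """Score each unseen movie by the average rating of the k most similar raters."""
    seen = set(ratings[target])
    candidates = {}
    for neighbor, sim in similarity[target].items():
        for movie, rating in ratings[neighbor].items():
            if movie not in seen:
                candidates.setdefault(movie, []).append((sim, rating))

    scored = []
    for movie, pairs in candidates.items():
        pairs.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
        top = [rating for _, rating in pairs[:k]]
        scored.append((movie, sum(top) / len(top)))

    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]


# Made-up ratings and precomputed similarities.
RATINGS = {
    "zoltan": {"Toy Story": 9},
    "ann": {"Toy Story": 9, "Up": 8, "Titanic": 4},
    "bob": {"Toy Story": 8, "Up": 10},
    "cat": {"Up": 6, "Titanic": 10},
    "dan": {"Up": 2},
}
SIMILARITY = {"zoltan": {"ann": 0.99, "bob": 0.95, "cat": 0.80, "dan": 0.50}}

print(knn_recommend("zoltan", RATINGS, SIMILARITY))
```

With k = 3, the least similar rater of "Up" is ignored, so one outlier rating does not drag the average down.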

Recommendations over Searching/Browsing

Recommend Jobs to Job Seekers

What connects them?
• location
• skills
• education
• experience

Cypher Query: Job Recommendation

What are the Top 10 Jobs for me
• that are in the same location I’m in
• for which I have the necessary qualifications
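The query itself is not preserved in this transcript. As a stand-in, the matching it describes might look roughly like the Python below; this is a guess at the logic, not the query from the slide, and the data shapes, field names and `recommend_jobs` function are all hypothetical.

```python
def recommend_jobs(candidate, jobs, limit=10):
    """Rank jobs in the candidate's location by the share of required skills they hold."""
    matches = []
    for job in jobs:
        if job["location"] != candidate["location"]:
            continue
        required = job["requires"]
        held = required & candidate["skills"]
        coverage = len(held) / len(required) if required else 1.0
        missing = sorted(required - candidate["skills"])
        matches.append((job["title"], coverage, missing))
    # Best coverage first; ties broken by title.
    matches.sort(key=lambda match: (-match[1], match[0]))
    return matches[:limit]


# Made-up candidate and job postings.
ME = {"location": "Chicago", "skills": {"Java", "Neo4j", "SQL"}}
JOBS = [
    {"title": "Graph Developer", "location": "Chicago", "requires": {"Java", "Neo4j"}},
    {"title": "Data Engineer", "location": "Chicago",
     "requires": {"SQL", "Python", "Spark", "Java"}},
    {"title": "DBA", "location": "Portland", "requires": {"SQL"}},
]

for title, coverage, missing in recommend_jobs(ME, JOBS):
    print(title, coverage, missing)
```

Returning the missing skills alongside the score makes it easy to see which qualifications stand between a candidate and a full match.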

Job Recommendation Results

Perfect Candidate for 100% matches
• missing qualifications can be added quickly
• might encourage exaggerated resumes

Just one tiny itsy bitsy problem

Job Boards get paid by
• Number of Applicants to a Job
• Wholesale Resume sales
• Selling your data

Recommend Love

Find your soulmate in the graph
• Are they energetic?
• Do they like dogs?
• Have a good sense of humor?
• Neat and tidy, but not crazy about it?

What are the Top 10 Potential Mates for me
• that are in the same location
• are sexually compatible
• have traits I want
• want traits I have

Cypher Query: Love Recommendation

Love Recommendation Results

Linked Data

Connect to the Semantic Web

Bootstrapping your Recommendation Engine

• Data
• Data
• Data

The Concept of Sushi

What else is Delicious?

Getting some Data

graphipedia

https://github.com/mirkonasato/graphipedia

neo4j-dbpedia-importer

https://github.com/kbastani/neo4j-dbpedia-importer

Named Entity Recognition

Automatically find
• names of people
• places and locations
• products
• and organizations

Hacker News for Example

• What are the kids in Silicon Valley talking about?

Let’s find out

• They have an API!
• Get some data: Stories, Users, Authors, Commenters

Data Model

Hacker News Recommendations

• Which stories should I read?
• Which users should I follow?
• What else should I be interested in?
• Who seems to know a lot about X?
• Etc.

GraphAware Recommendation Framework

• Ability to trade off recommendation quality for speed
• Ability to pre-compute recommendations
• Built-in algorithms and functions
• Ability to measure recommendation quality
• Ability to easily run in A/B test environments

Real-Time Recommendations with Neo4j

Social Recommendations
Products and Services
Content
Routing

Walmart BUSINESS CASE

World’s largest company by revenue
World’s largest retailer and private employer
SF-based global e-commerce division manages several websites
Founded in 1969
Bentonville, Arkansas

• Needed online customer recommendations to keep pace with competition
• Data connections provided predictive context, but were not in a usable format
• Solution had to serve many millions of customers and products while maintaining superior scalability and performance

Walmart SOLUTION

• Brings customers, preferences, purchases, products and locations into a graph model
• Uses connections to make product recommendations
• Solution deployed across Walmart divisions and websites

Global Courier BUSINESS CASE

World’s largest courier
480,000 employees
€55 billion in revenue
Needed new B2C and B2B parcel routing system for its logistics practice
Legacy system supported neither the full network nor the shift to online demands

Needed to replace aging B2B and B2C parcel routing system whose requirements include:
• 24x7 availability
• Peak loads of 5M parcels per day, 3K per second
• Support for complex and diverse software stack
• Predictable performance with linear scalability
• Daily changes to logistics networks
• Route from any point to any point
• Single point of truth for entire network

Global Courier SOLUTION

Neo4j provides the ideal domain fit since a logistics network is a graph
• High availability and performance via Neo4j clustering
• Greatly simplified Cypher queries for routing versus relational SQL queries
• Flexible data model that reflects the real logistics world far better than relational
• Easy-to-grasp whiteboard-friendly model

eBay BUSINESS CASE

C2C and B2C retail network
Full e-commerce functionality for individuals and businesses
Integrated with logistics vendors for product deliveries

• Needed an offering to compete with Amazon Prime
• Enable customer-selected delivery inside 90 minutes
• Calculate best route option in real-time
• Scale to enable a variety of services
• Offer more predictable delivery times

eBay Now SOLUTION

• Acquired UK-based Shutl, a leader in same-day delivery
• Used Neo4j to create eBay Now
• 1000 times faster than the prior MySQL-based solution
• Faster time-to-market
• Improved code quality with 10 to 100 times less query code

Classmates BUSINESS CASE

Online yearbook connecting friends from school, work and military in US and Canada
Founded as Memory Lane in Seattle

Develop new social networking capabilities to monetize yearbook-related offerings
• Show all the people I know in a yearbook
• Show yearbooks my friends appear in most often
• Show sections of a yearbook that my friends appear most in
• Show me other schools my friends attended

Classmates SOLUTION

Neo4j provides a robust and scalable graph database solution
• 3-instance cluster with cache sharding and disaster-recovery
• 18ms response time for top 4 queries
• 100M nodes and 600M relationships in initial graph, including people, images, schools, yearbooks and pages
• Projected to grow to 1B nodes and 6B relationships

National Geographic BUSINESS CASE

Non-profit scientific and educational institution founded in 1888
Covers geography, archaeology, natural science, environment and historical conservation
Journals, online media, radio, TV, documentaries, live events and consumer content and goods

• Improve poor performance of PostgreSQL app
• Increase user engagement by linking to 100+ years of multimedia content
• Improve targeting by understanding subscribers’ interests better
• Recommend content and services to users based on their interests

National Geographic SOLUTION

• Enabled complex real-time analytics across eight million users and a century of content
• Delivered robust performance by eliminating triple-nested SQL joins
• Cross-refers users among content, live events, travel, goods and causes
• Neo4j solution much less cumbersome and easier to maintain than previous SQL system

Curaspan BUSINESS CASE

Leader in patient management for discharges and referrals
Manages patient referrals for 4600+ health care facilities
Connects providers, payers via web-based patient management platform
Founded in 1999 in Newton, Massachusetts

• Improve poor performance of Oracle solution
• Support more complexity including granular, role-based access control
• Satisfy complex Graph Search queries by discharge nurses and intake coordinators

Find a skilled nursing facility within n miles of a given location, belonging to health care group XYZ, offering speech therapy and cardiac care, and optionally Italian language services

Curaspan SOLUTION

• Met fast, real-time performance demands
• Supported queries that span multiple hierarchies including provider and employee-permissions graphs
• Improved data model to handle adding more dimensions to the data such as insurance networks, service areas and care organizations
• Greatly simplified queries, reducing multi-page SQL statements to one Neo4j function

FiftyThree BUSINESS CASE

Maker of Paper, one of the top apps in Apple’s App Store, with millions of users
Based in New York City

• Add social capabilities to digital-paper app
• Support social collaboration across millions of users in new Mix app
• Enable seamless interaction between social and content-asset networks
• Ensure new apps are robust, scalable and fast

FiftyThree SOLUTION

• Neo4j data model ideal for social network, content management and access control
• Users create, publish and share designs simply
• Easy to develop and evolve Neo4j-based app
• Integrates well with FiftyThree EC2 architecture

See the Neo4j solution in action: Betting the Company (Literally) on a Graph Database
http://aseemk.com/talks/neo4j-lessons-learned#/

App Store Editor’s Choice
2012 iPad App of the Year
Apple Best Apps of 2014

Questions

• How does Neo4j fit into my existing infrastructure? As a Service.
• Will Neo4j scale? Yes.