Recommender Systems with Ruby (adding machine learning, statistics, etc)

Ruby in the world of recommendations (also machine learning, statistics and visualizations). Marcel Caraciolo, @marcelcaraciolo. Developer, scientist, contributor to the Crab recsys project, has worked with Python for 6 years, interested in mobile, education, machine learning and dataaaaa! Recife, Brazil - http://aimotion.blogspot.com. Saturday, September 14, 2013

description

Talk given at the Frevo on Rails Ruby meeting in Recife, Pernambuco, on 14/09/2013

Transcript of Recommender Systems with Ruby (adding machine learning, statistics, etc)

Page 1: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Ruby in the world of recommendations

(also machine learning, statistics and visualizations)

Marcel Caraciolo
@marcelcaraciolo
Developer, scientist, contributor to the Crab recsys project, has worked with Python for 6 years, interested in mobile, education, machine learning and dataaaaa!
Recife, Brazil - http://aimotion.blogspot.com

Saturday, September 14, 2013

Page 2: Recommender Systems with Ruby (adding machine learning, statistics, etc)

BACK UP YOUR FILES! NEVER RUN: find . -type f -not -name '*pyc' | xargs rm

Page 3: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Scientific Environment

[Diagram: Data Acquisition, Data Analysis, Presentation & Visualization, Experimentation (Re-Design)]

Page 4: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Where is Ruby?

[Diagram: Data Acquisition, Data Analysis, Presentation & Visualization, Experimentation (Re-Design)]

Page 8: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Where is Ruby?

Python was released in 1991; Ruby was released in 1995

Python was widely adopted and promoted by much of Google's research and development team

Page 9: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Where is Ruby?

Python was released in 1991; Ruby was released in 1995

Python became widely popular after it was officially adopted by much of Google's research team

"Python has been an important part of Google since its beginning, and it remains so as our infrastructure grows; we are always looking for more people with skills in this language."

Peter Norvig, Google, Inc.

Page 10: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Where is Ruby?

Python was well known even in some older scientific articles

Page 11: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Where is Ruby?

Ruby's popularity exploded in 2004, with a focus on the web.

Python: Django - 2005; NumPy - 2005; BioPython - 2001; SAGE - 2005; Matplotlib - 2003

Page 12: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Where is Ruby?

"Programming comes second to researchers, not first like us." - a Ruby developer's answer

Python: [(x, x*x) for x in [1,2,3,4] if x != 3]

vs

Ruby: `[1,2,3,4].map { |x| [x, x*x] if x != 3 }.compact`

Result: [(1,1), (2,4), (4,16)]
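For readers comparing the two idioms, here is a small sketch of three ways to get the comprehension's result in Ruby without leaving a `nil` where the condition fails. The method names are illustrative only, and `filter_map` assumes Ruby 2.7 or newer.

```ruby
# Filter first, then build the [x, x*x] pairs.
def squares_with_select(numbers)
  numbers.select { |x| x != 3 }.map { |x| [x, x * x] }
end

# map alone yields nil for the skipped element; compact drops it.
def squares_with_compact(numbers)
  numbers.map { |x| [x, x * x] if x != 3 }.compact
end

# Ruby 2.7+: filter_map keeps only the truthy block results.
def squares_with_filter_map(numbers)
  numbers.filter_map { |x| [x, x * x] if x != 3 }
end

p squares_with_select([1, 2, 3, 4]) # prints [[1, 1], [2, 4], [4, 16]]
```

All three return the same array of pairs; which one reads best is mostly a matter of taste.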

Page 13: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Where is Ruby?

Ruby

Python

Page 14: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Hey, Ruby has options!

Page 16: Recommender Systems with Ruby (adding machine learning, statistics, etc)

:(

Page 17: Recommender Systems with Ruby (adding machine learning, statistics, etc)

:D

Page 18: Recommender Systems with Ruby (adding machine learning, statistics, etc)

gem install nmatrix

git clone https://github.com/SciRuby/nmatrix.git

cd nmatrix/

bundle install

rake compile

rake repackage

gem install pkg/nmatrix-*.gem

Page 19: Recommender Systems with Ruby (adding machine learning, statistics, etc)

>> NMatrix.new([2, 3], [0, 1, 2, 3, 4, 5], :int64).pp
[0, 1, 2]
[3, 4, 5]
=> nil

>> m = N[ [2, 3, 4], [7, 8, 9] ]
=> #<NMatrix:0x007f8e121b6cf8 shape:[2,3] dtype:int32 stype:dense>
>> m.pp
[2, 3, 4]
[7, 8, 9]

Depends on ATLAS/CBLAS and is written mostly in C and C++

https://github.com/SciRuby/nmatrix/wiki/Getting-started

Page 20: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Hey, Ruby has options!

Page 21: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Data Visualization

• R
• Gnuplot
• Google Charts API
• JFreeChart
• Scruffy
• Timetric
• Tioga
• RChart

Page 22: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Data Visualization

require 'rsruby'

# R commands that draw a box plot into a PDF file
cmd = %Q(
  pdf(file = "r_directly.pdf")
  boxplot(c(1,2,3,4),c(5,6,7,8))
  dev.off()
)

# Pipe a block of commands to the gnuplot executable
def gnuplot(commands)
  IO.popen("gnuplot", "w") { |io| io.puts commands }
end

commands = %Q(
  set terminal svg
  set output "curves.svg"
  plot [-10:10] sin(x), atan(x), cos(atan(x))
)

gnuplot(commands)

http://effectif.com/ruby/manor/data-visualisation-with-ruby
https://github.com/glejeune/Ruby-Graphviz/

Page 23: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Other tools

• BioRuby

#!/usr/bin/env ruby
require 'bio'

# create a DNA sequence object from a String
dna = Bio::Sequence::NA.new("atcggtcggctta")

# create an RNA sequence object from a String
rna = Bio::Sequence::NA.new("auugccuacauaggc")

# create a Protein sequence from a String
aa = Bio::Sequence::AA.new("AGFAVENDSA")

# you can check if the sequence contains illegal characters
# that are not accepted IUB characters for that symbol
# (should prepare a Bio::Sequence::AA#illegal_symbols method also)
puts dna.illegal_bases

# translate and concatenate a DNA sequence to Protein sequence
newseq = aa + dna.translate
puts newseq # => "AGFAVENDSAIGRL"

http://bioruby.org/

Page 24: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Other tools

• RubyDoop (uses JRuby)

module WordCount
  class Mapper
    def map(key, value, context)
      value.to_s.split.each do |word|
        word.gsub!(/\W/, '')
        word.downcase!
        unless word.empty?
          context.write(Hadoop::Io::Text.new(word), Hadoop::Io::IntWritable.new(1))
        end
      end
    end
  end
end

module WordCount
  class Reducer
    def reduce(key, values, context)
      sum = 0
      values.each { |value| sum += value.get }
      context.write(key, Hadoop::Io::IntWritable.new(sum))
    end
  end
end

https://github.com/iconara/rubydoop

Page 25: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Coming back to the world of recommenders

The world is an over-crowded place

Page 26: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Coming back to the world of recommenders

The world is an over-crowded place

Page 27: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendation Systems

Systems designed to recommend to me something I may like

Page 28: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendation Systems

Graph Representation

[Graph nodes: Titanic, Taken, ?anda (first letter unreadable); Me, My friend, You, Another guy]

Page 29: Recommender Systems with Ruby (adding machine learning, statistics, etc)

And how does it work?

Page 30: Recommender Systems with Ruby (adding machine learning, statistics, etc)

What do recommenders really do?

1. They predict how much you may like a certain product or service.

2. They suggest a list of N items ordered by your level of interest.

3. They suggest a list of N users for a product or service.

4. They explain to you why those items were recommended.

5. They adjust predictions and recommendations based on your feedback and that of others.
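The first two tasks in the list above can be sketched in a few lines. This is only an illustration: the `top_n` helper and the predicted scores are hypothetical, and a real recommender would compute the predictions rather than hard-code them.

```ruby
# Return the n items with the highest predicted score,
# skipping anything the user has already rated.
def top_n(predictions, already_rated, n)
  predictions
    .reject { |item, _score| already_rated.include?(item) }
    .sort_by { |_item, score| -score }
    .first(n)
    .map { |item, _score| item }
end

# Hypothetical predicted ratings for one user (item => score).
predictions = {
  "Die Hard"   => 4.5,
  "Thor"       => 3.9,
  "Toy Story"  => 4.8,
  "Armageddon" => 2.1
}

p top_n(predictions, ["Toy Story"], 2) # prints ["Die Hard", "Thor"]
```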

Page 31: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Content Based Filtering

Items: Gone with the Wind, Die Hard, Armageddon, Toy Story

Users: Marcel

[Diagram: Marcel likes an item, and items similar to it are recommended to him]
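A minimal sketch of content-based filtering, assuming each item is described by a set of genre tags. The tags, the `ITEM_FEATURES` table and both helper methods are made up for illustration; real systems derive features from item metadata and usually use richer similarity measures.

```ruby
require 'set'

# Hypothetical genre tags per item (not taken from the slides).
ITEM_FEATURES = {
  "Die Hard"           => Set["action", "thriller"],
  "Armageddon"         => Set["action", "sci-fi", "thriller"],
  "Gone with the Wind" => Set["drama", "romance"],
  "Toy Story"          => Set["animation", "comedy"]
}.freeze

# Overlap between two feature sets: shared tags divided by all tags.
def feature_similarity(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Rank items the user has not liked by their best similarity to a liked item.
def content_based_recommendations(liked, features = ITEM_FEATURES)
  candidates = features.keys - liked
  scored = candidates.map do |item|
    best = liked.map { |l| feature_similarity(features[item], features[l]) }.max || 0.0
    [item, best]
  end
  scored.select { |_item, score| score > 0 }
        .sort_by { |_item, score| -score }
        .map(&:first)
end

p content_based_recommendations(["Die Hard"]) # prints ["Armageddon"]
```

Only the items' own descriptions are used here; no other user's behaviour enters the calculation.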

Page 32: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Problems with Content Recommenders

1. Restricted Data Analysis
- Items and users are poorly described. Even worse for audio and images.

2. Specialized Data
- A person who has no experience with sushi never gets a recommendation for the best sushi in town.

3. Portfolio Effect
- Just because I saw one Xuxa movie as a child, the system thinks it must recommend all of her movies ("só para baixinhos!", "just for the little ones!").

Page 33: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Collaborative Filtering

Items: Gone with the Wind, Thor, Armageddon, Toy Story

Users: Marcel, Rafael, Amanda

[Diagram: users similar to Marcel like an item, and that item is recommended to him]
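A rough sketch of user-based collaborative filtering. The user names come from the slide, but who likes what is invented for this example, and the `overlap` weighting is just one simple choice among many.

```ruby
# Hypothetical like data (the likes themselves are assumptions).
LIKES = {
  "Marcel" => ["Gone with the Wind", "Thor"],
  "Rafael" => ["Gone with the Wind", "Thor", "Armageddon"],
  "Amanda" => ["Toy Story"]
}.freeze

# Share of items two users have in common.
def overlap(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Score unseen items by the similarity of the users who liked them.
def collaborative_recommendations(user, likes = LIKES)
  scores = Hash.new(0.0)
  likes.each do |other, items|
    next if other == user
    weight = overlap(likes[user], items)
    next if weight.zero?
    (items - likes[user]).each { |item| scores[item] += weight }
  end
  scores.sort_by { |_item, score| -score }.map(&:first)
end

p collaborative_recommendations("Marcel") # prints ["Armageddon"]
```

No item description is needed: the recommendation comes entirely from what similar users liked.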

Page 34: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Problems with Collaborative Filtering

1. Scalability
- Amazon with 5M users, 50K items, 1.4B ratings

2. Sparse Data
- I only rated one book at Amazon!

3. Cold Start
- New users and items with no records

4. Popularity
- Everyone reads Harry Potter!

5. Hacking
- The person who reads 'Harry Potter' also reads 'Kama Sutra'

Page 35: Recommender Systems with Ruby (adding machine learning, statistics, etc)

How does it show up?

Highlights

More about this artist...

Listen to similar songs

Someone similar to you also liked this...

Since you listened to this, you may like this one...

Those items come together...

The most popular in your group...

New Releases

Page 36: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Quickly add a recommender engine for Likes and Dislikes to your Ruby app

http://davidcel.is/recommendable/

Page 37: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Page 38: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Add to your Gemfile:

gem 'recommendable'

Page 39: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Create a configuration initializer:

require 'redis'

Recommendable.configure do |config|
  # Recommendable's connection to Redis
  config.redis = Redis.new(:host => 'localhost', :port => 6379, :db => 0)

  # A prefix for all keys Recommendable uses
  config.redis_namespace = :recommendable

  # Whether or not to automatically enqueue users to have their recommendations
  # refreshed after they like/dislike an item
  config.auto_enqueue = true

  # The name of the queue that background jobs will be placed in
  config.queue_name = :recommendable

  # The number of nearest neighbors (k-NN) to check when updating
  # recommendations for a user. Set to `nil` if you want to check all
  # other users as opposed to a subset of the nearest ones.
  config.nearest_neighbors = nil
end

Page 40: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

In your ONE model that will be receiving the recommendations:

class User
  recommends :movies, :books, :minerals, :other_things

  # ...
end

Page 41: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

You can chain your queries

>> current_user.liked_movies.limit(10)
>> current_user.bookmarked_books.where(:author => "Cormac McCarthy")
>> current_user.disliked_movies.joins(:cast_members).where('cast_members.name = ?', 'Kim Kardashian')

Page 42: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

You can chain your queries

>> current_user.hidden_minerals.order('density DESC')
>> current_user.recommended_movies.where('year < 2010')
>> book.liked_by.order('age DESC').limit(20)
>> movie.disliked_by.where('age > 18')

Page 43: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

You can also like your recommendable objects

>> user.like(movie)
=> true
>> user.likes?(movie)
=> true
>> user.rated?(movie)
=> true # also true if user.dislikes?(movie)
>> user.liked_movies
=> [#<Movie id: 23, name: "2001: A Space Odyssey">]
>> user.liked_movie_ids
=> ["23"]
>> user.like(book)
=> true
>> user.likes
=> [#<Movie id: 23, name: "2001: A Space Odyssey">, #<Book id: 42, title: "100 Years of Solitude">]
>> user.likes_count
=> 2
>> user.liked_movies_count
=> 1
>> user.likes_in_common_with(friend)
=> [#<Movie id: 23, name: "2001: A Space Odyssey">, #<Book id: 42, title: "100 Years of Solitude">]
>> user.liked_movies_in_common_with(friend)
=> [#<Movie id: 23, name: "2001: A Space Odyssey">]
>> movie.liked_by_count
=> 2
>> movie.liked_by
=> [#<User username: 'davidbowman'>, #<User username: 'frankpoole'>]

Page 44: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Obviously, you can also DISLIKE your recommendable objects

>> user.dislike(movie)
>> user.dislikes?(movie)
>> user.disliked_movies
>> user.disliked_movie_ids
>> user.dislikes
>> user.dislikes_count
>> user.disliked_movies_count
>> user.dislikes_in_common_with(friend)
>> user.disliked_movies_in_common_with(friend)
>> movie.disliked_by_count
>> movie.disliked_by

Page 45: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Recommendations

>> friend.like(Movie.where(:name => "2001: A Space Odyssey").first)
>> friend.like(Book.where(:title => "A Clockwork Orange").first)
>> friend.like(Book.where(:title => "Brave New World").first)
>> friend.like(Book.where(:title => "One Flew Over the Cuckoo's Nest").first)
>> user.like(Book.where(:title => "A Clockwork Orange").first)
>> user.recommended_books # Defaults to 10 recommendations
=> [#<Book title: "Brave New World">, #<Book title: "One Flew Over the Cuckoo's Nest">]
>> user.similar_raters # Defaults to 10 similar users
=> [#<User username: "frankpoole">, #<User username: "davidbowman">, ...]
>> user.recommended_movies(10, 30) # 10 recommendations, offset by 30 (i.e. page 4)
=> [#<Movie name: "A Clockwork Orange">, #<Movie name: "Chinatown">, ...]
>> user.similar_raters(25, 50) # 25 similar users, offset by 50 (i.e. page 3)
=> [#<User username: "frankpoole">, #<User username: "davidbowman">, ...]

Page 46: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Jaccard Similarity

Marcel likes A, B, C and dislikes D
Amanda likes A, B and dislikes C
Guilherme likes C, D and dislikes A
Flavio likes B, C, E and dislikes D

J(Marcel, Amanda) = ([A,B].size + [].size - [C].size - [].size) / [A,B,C,D].size

J(Marcel, Amanda) = (2 + 0 - 1 - 0) / 4 = 1/4 = 0.25

Page 47: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Jaccard Similarity

Marcel likes A, B, C and dislikes D
Amanda likes A, B and dislikes C
Guilherme likes C, D and dislikes A
Flavio likes B, C, E and dislikes D

J(Marcel, Guilherme) = ([C].size + [].size - [A].size - [D].size) / [A,B,C,D].size

J(Marcel, Guilherme) = (1 + 0 - 1 - 1) / 4 = -1/4 = -0.25

Page 48: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Jaccard Similarity

Marcel likes A, B, C and dislikes D
Amanda likes A, B and dislikes C
Guilherme likes C, D and dislikes A
Flavio likes B, C, E and dislikes D

J(Marcel, Flavio) = ([B,C].size + [D].size - [].size - [].size) / [A,B,C,D,E].size

J(Marcel, Flavio) = (2 + 1 - 0 - 0) / 5 = 3/5 = 0.6

Page 49: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Jaccard Similarity

MostSimilar(Marcel) = [(Flavio, 0.6), (Amanda, 0.25), (Guilherme, -0.25)]

Marcel likes A, B, C and dislikes D
Amanda likes A, B and dislikes C
Guilherme likes C, D and dislikes A
Flavio likes B, C, E and dislikes D
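The similarity formula from these slides can be checked with a short script. This sketch follows the slide's definition (agreements minus disagreements, divided by all items either person rated); it is not taken from the gem's source, and the `RATINGS` table and method names are illustrative.

```ruby
# Likes and dislikes exactly as listed on the slide.
RATINGS = {
  "Marcel"    => { likes: %w[A B C], dislikes: %w[D] },
  "Amanda"    => { likes: %w[A B],   dislikes: %w[C] },
  "Guilherme" => { likes: %w[C D],   dislikes: %w[A] },
  "Flavio"    => { likes: %w[B C E], dislikes: %w[D] }
}.freeze

# (shared likes + shared dislikes - conflicting ratings) / items rated by either.
def similarity(a, b, ratings = RATINGS)
  x = ratings[a]
  y = ratings[b]
  agreements    = (x[:likes] & y[:likes]).size + (x[:dislikes] & y[:dislikes]).size
  disagreements = (x[:likes] & y[:dislikes]).size + (x[:dislikes] & y[:likes]).size
  all_rated     = (x[:likes] | x[:dislikes] | y[:likes] | y[:dislikes]).size
  return 0.0 if all_rated.zero?
  (agreements - disagreements).to_f / all_rated
end

# Everyone else, ordered from most to least similar.
def most_similar(user, ratings = RATINGS)
  (ratings.keys - [user])
    .map { |other| [other, similarity(user, other, ratings)] }
    .sort_by { |_name, score| -score }
end

most_similar("Marcel").each { |name, score| puts "#{name}: #{score}" }
```

With the slide's data, Marcel and Flavio agree on three items (likes B and C, dislike D) out of five rated, so Flavio comes out as the most similar user.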

Page 50: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Recommendations

The best of your recommendable models

>> Movie.top
=> #<Movie name: "2001: A Space Odyssey">
>> Movie.top(3)
=> [#<Movie name: "2001: A Space Odyssey">, #<Movie name: "A Clockwork Orange">, #<Movie name: "The Shining">]

Wilson score confidence interval - the Reddit algorithm
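The slide does not show how the ranking is computed, so the following is only the textbook lower bound of the Wilson score interval applied to a like/dislike tally. The method name and the default `z = 1.96` (roughly 95% confidence) are assumptions made for this sketch.

```ruby
# Lower bound of the Wilson score interval for a proportion of likes.
# Items with few votes get a cautious score even if all votes are likes.
def wilson_lower_bound(likes, dislikes, z = 1.96)
  n = likes + dislikes
  return 0.0 if n.zero?

  z = z.to_f
  phat = likes.to_f / n # observed fraction of likes
  numerator = phat + z * z / (2 * n) -
              z * Math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
  numerator / (1 + z * z / n)
end

puts wilson_lower_bound(1, 0)   # a single like: low confidence
puts wilson_lower_bound(100, 0) # a hundred likes: much higher score
```

Sorting by this value instead of the raw like ratio keeps an item with one like and no dislikes from outranking an item with hundreds of likes and a handful of dislikes.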

Page 51: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Callbacks

class User < ActiveRecord::Base
  has_one :feed

  recommends :movies
  after_like :update_feed

  def update_feed(obj)
    feed.update "liked #{obj.name}"
  end
end

Uses apotonick/hooks to implement callbacks for liking, disliking, etc.

Page 52: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Manual recommendations

Recommendable::Helpers::Calculations.update_similarities_for(user.id)
Recommendable::Helpers::Calculations.update_recommendations_for(user.id)

Page 53: Recommender Systems with Ruby (adding machine learning, statistics, etc)

redis makes the magic!

Manual recommendations

Page 55: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommendable

Recommendations over a queueing system

Put the workers to do the job! (Sidekiq, Resque, DelayedJob)

module Recommendable
  module Workers
    class Resque
      include ::Resque::Plugins::UniqueJob if defined?(::Resque::Plugins::UniqueJob)
      @queue = :recommendable

      def self.perform(user_id)
        Recommendable::Helpers::Calculations.update_similarities_for(user_id)
        Recommendable::Helpers::Calculations.update_recommendations_for(user_id)
      end
    end
  end
end

Page 56: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommended Books

Satnam Alag, Collective Intelligence in Action, Manning Publications, 2009

Toby Segaran, Programming Collective Intelligence, O'Reilly, 2007

Page 57: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommended Books

Sau Sheong Chang, Exploring Everyday Things with R and Ruby, O'Reilly, 2012

Page 58: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Recommended Course

https://www.coursera.org/course/recsys

Page 59: Recommender Systems with Ruby (adding machine learning, statistics, etc)

Ruby developers, It does exist

Web
