What is jubatus (short)

Post on 22-Jun-2015


What is Jubatus? How does it work for you?

Hiroki Kumazaki, NTT SIC

Jubatus is…
• A distributed online machine-learning framework
  – OSS developed in Japan
  – GPL2.0
• Distributed
  – Fault tolerance
  – Scale-out
• Online
  – Fixed-time computation
• Machine learning
  – More than “word count”!

Architecture
• The ML model is combined with a feature extractor

[Figure: Jubatus Server (Feature Extractor + Machine Learning Model), accessed via Jubatus RPC]

Architecture
• Multi-language client libraries
  – gem, pip, cpan, and maven packages are ready!
  – Under the hood, they use MessagePack-RPC.
• So you can use OCaml, Haskell, JavaScript, or Go at your own risk.

[Figure: Client talking to the server via Jubatus RPC]
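MessagePack-RPC messages are plain arrays: a request is [0, msgid, method, params] and a response is [1, msgid, error, result]. The sketch below shows only that framing. The method name, the parameter shape, and the fake reply are placeholders chosen for illustration, not Jubatus's actual wire contents, and a real client would also serialize the arrays with MessagePack and send them over TCP.

```python
import itertools

REQUEST, RESPONSE = 0, 1  # message type codes defined by MessagePack-RPC

_msgid = itertools.count()  # each request gets a fresh id


def make_request(method, params):
    """Build a request message: [type, msgid, method, params]."""
    return [REQUEST, next(_msgid), method, list(params)]


def read_response(message, expected_msgid):
    """Unpack a response message: [type, msgid, error, result]."""
    msg_type, msgid, error, result = message
    if msg_type != RESPONSE or msgid != expected_msgid:
        raise ValueError("unexpected message")
    if error is not None:
        raise RuntimeError(error)
    return result


# Placeholder method name and params (assumptions, for illustration only).
request = make_request("classify", [["<datum goes here>"]])

# The reply is faked so the sketch runs without a server.
fake_reply = [RESPONSE, request[1], None, [["python", 0.9]]]
print(read_response(fake_reply, request[1]))
```

Because the framing is this simple, any language with a MessagePack library can act as a client, which is what makes the "at your own risk" languages possible.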

Architecture
• Many ML algorithms
  – Classifier
  – Recommender
  – Anomaly Detection
  – Clustering
  – Regression
  – Graph Mining

Useful!

Classifier
• Task: classification of a datum

Sample 1:

    import sys

    def fib(a):
        if a == 1 or a == 0:
            return 1
        else:
            return fib(a-1) + fib(a-2)

    if __name__ == "__main__":
        print(fib(int(sys.argv[1])))

Sample 2:

    def fib(a)
      if a == 1 or a == 0
        1
      else
        return fib(a-1) + fib(a-2)
      end
    end

    if __FILE__ == $0
      puts fib(ARGV[0].to_i)
    end

Sample task: classify which programming language each sample is written in

Sample 1: It’s Python. Sample 2: It’s Ruby.

Classifier
• Set the configuration in the Jubatus server

[Figure: Classifier with its Feature Extractor]

Feature Extractor configuration:

    "converter": {
      "string_types": {
        "bigram": { "method": "ngram", "char_num": "2" }
      },
      "string_rules": [
        { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf" }
      ]
    }

Classifier
• Configuration JSON
  – It does the “feature vector design”
  – A very important step in machine learning

How to read the "converter" configuration:
• "string_types" and "string_rules": settings for extracting features from strings
• "bigram": defines a function named “bigram”
• "method": "ngram": uses the built-in function “ngram”
• "char_num": "2": passes “2” to “ngram” to create “bigram”
• "key": "*" with "type": "bigram": applies “bigram” to all data
• "sample_weight": "tf" and "global_weight": "idf": feature weights based on tf-idf (see the Wikipedia article on tf-idf)
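To make the tf-idf weighting concrete, here is a minimal sketch that weights character bigrams by term frequency times inverse document frequency. It uses the textbook formula idf = log(N / df) over a fixed batch of documents; that formula and the function names are assumptions for illustration, and the exact formula Jubatus applies (and how it maintains document counts online) may differ.

```python
import math
from collections import Counter


def bigrams(text):
    """All pairs of adjacent characters in text."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def tf_idf(documents):
    """Return one {bigram: weight} dict per document, weight = tf * idf."""
    counts = [Counter(bigrams(doc)) for doc in documents]
    n_docs = len(documents)

    # Document frequency: in how many documents each bigram occurs.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())

    return [
        {gram: tf * math.log(n_docs / doc_freq[gram]) for gram, tf in c.items()}
        for c in counts
    ]


docs = ["def fib(a):", "def fib(a)", "sub fib {"]
weights = tf_idf(docs)

# "fi" occurs in every document, so it carries no weight;
# "):" occurs only in the first one, so it is a strong clue there.
print(weights[0]["fi"], weights[0]["):"])
```

The effect is that bigrams shared by every language (such as those inside the name fib) are discounted, while bigrams peculiar to one sample stand out.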

Classifier
• The Feature Extractor becomes a “bigram extractor”

[Figure: Classifier with a bigram extractor]

Feature Extractor
• What does the bigram extractor do?

Input to the bigram extractor:

    import sys

    def fib(a):
        if a == 1 or a == 0:
            return 1
        else:
            return fib(a-1) + fib(a-2)

    if __name__ == "__main__":
        print(fib(int(sys.argv[1])))

Output (Feature Vector):

    key   value
    im    1
    mp    1
    po    1
    ...   ...
    ):    1
    ...   ...
    de    1
    ef    1
    ...   ...
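The extraction step can be sketched in a few lines of Python: slide a two-character window over the input and count each pair. This is an illustrative stand-in, not Jubatus's implementation; the function name is made up, and unlike the simplified table (where every value is 1) this sketch stores how many times each bigram occurs.

```python
from collections import Counter


def extract_bigrams(text):
    """Count every pair of adjacent characters in text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


features = extract_bigrams("import sys")
for key in ("im", "mp", "po"):
    print(key, features[key])
```

Note that whitespace and punctuation take part in bigrams too, which is why keys such as "):" appear in the feature vector.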

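Classifier
• Train the model with feature vectors

Feature vectors fed to the Classifier, one per training sample:

Sample 1:

    key   value
    im    1
    mp    1
    po    1
    ...   ...
    ):    1
    ...   ...
    de    1
    ef    1
    ...   ...

Sample 2:

    key   value
    pu    1
    ut    1
    ...   ...
    {|    ...
    |m    1
    m|    1
    {|    1
    en    1
    nd    1

Sample 3:

    key   value
    @a    1
    $_    1
    ...   ...
    my    ...
    su    1
    ub    1
    us    1
    se    1
    ...   ...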

Classifier
• Set the configuration in the Jubatus server

[Figure: bigram extractor (Feature Extractor) feeding the Classifier]

Classifier configuration:

    "method": "AROW",
    "parameter": {
      "regularization_weight": 1.0
    }

Classifier algorithms
• Perceptron
• Passive Aggressive
• Confidence Weighted
• Adaptive Regularization of Weights (AROW)
• Normal Herd
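As a rough idea of what an online learner like AROW does per training example, here is a sketch of the textbook binary AROW update with a diagonal covariance. It is not Jubatus's implementation: Jubatus handles multiple labels, and the slides do not say how "regularization_weight" maps onto the parameter called r here, so treat r and the class name as assumptions.

```python
from collections import defaultdict


class DiagonalArow:
    """Binary AROW with a diagonal covariance; labels are +1 or -1."""

    def __init__(self, r=1.0):
        self.r = r  # regularization parameter of the textbook algorithm
        self.mean = defaultdict(float)            # weight per feature
        self.variance = defaultdict(lambda: 1.0)  # uncertainty per feature

    def score(self, x):
        return sum(self.mean[k] * v for k, v in x.items())

    def update(self, x, y):
        margin = y * self.score(x)
        if margin >= 1.0:
            return  # classified with enough margin: no change
        confidence = sum(self.variance[k] * v * v for k, v in x.items())
        beta = 1.0 / (confidence + self.r)
        alpha = (1.0 - margin) * beta
        for k, v in x.items():
            sigma = self.variance[k]
            self.mean[k] += alpha * y * sigma * v
            # Shrink the variance of features that were just observed.
            self.variance[k] = sigma - beta * sigma * sigma * v * v


model = DiagonalArow(r=1.0)
python_sample = {"de": 1.0, "ef": 1.0, "):": 1.0}              # label +1
ruby_sample = {"de": 1.0, "ef": 1.0, "en": 1.0, "nd": 1.0}     # label -1
for _ in range(5):
    model.update(python_sample, +1)
    model.update(ruby_sample, -1)

print(model.score({"):": 1.0}) > 0, model.score({"en": 1.0, "nd": 1.0}) < 0)
```

Each update touches only the features present in one example, which is the "fixed time computation" property of online learning: the cost does not grow with the amount of data seen so far.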

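Classifier
• Use the model for classification tasks
  – Jubatus will find the clues for classification

Example 1, feature vector passed to AROW:

    key   value
    si    1
    il    1
    ...   ...
    {|    1
    ...   ...

Result: “It’s …” (the predicted language was shown as an image on the slide)

Example 2, feature vector passed to AROW:

    key   value
    re    1
    ):    1
    ...   ...
    s[    1
    ...   ...

Result: “It’s …” (the predicted language was shown as an image on the slide)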
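A linear classifier turns a feature vector into one score per label and picks the highest; the features with the largest contributions are the "clues". The sketch below illustrates this with hand-written weights. The weights, labels, and function names are hypothetical and made up for the example; they are not values learned by Jubatus.

```python
def classify(weights, features):
    """Return (label, score) pairs sorted from best to worst."""
    scores = {
        label: sum(w.get(k, 0.0) * v for k, v in features.items())
        for label, w in weights.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def clues(weights, label, features, top=2):
    """Features that contributed most to the score of `label`."""
    w = weights[label]
    contrib = {k: w.get(k, 0.0) * v for k, v in features.items()}
    return sorted(contrib, key=contrib.get, reverse=True)[:top]


# Hypothetical weights, invented for illustration only.
weights = {
    "python": {"):": 0.8, "s[": 0.5, "re": 0.1},
    "ruby": {"{|": 0.9, "nd": 0.6, "re": 0.1},
}
features = {"re": 1.0, "):": 1.0, "s[": 1.0}

ranking = classify(weights, features)
print(ranking[0][0], clues(weights, ranking[0][0], features))
```

A bigram like "re" that scores the same for every label does not help separate them, while a bigram that is heavy for only one label decides the outcome.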

Via RPC
• Invoke feature extraction and classification from the client via RPC

    lang = client.classify([sourcecode])

[Figure: sourcecode is sent to the server, passes through the bigram extractor, then AROW]

sourcecode:

    import sys

    def fib(a):
        if a == 1 or a == 0:
            return 1
        else:
            return fib(a-1) + fib(a-2)

    if __name__ == "__main__":
        print(fib(int(sys.argv[1])))

Feature vector:

    key   value
    im    1
    mp    1
    po    1
    ...   ...
    ):    1
    ...   ...
    de    1
    ef    1
    ...   ...

Result: “It may be Python”
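On the client side, the remaining work is to turn the reply into a single answer. The sketch below assumes that classify returns, for each input, a list of label/score pairs; the slide shows only `lang = client.classify([sourcecode])`, so that result shape, the Estimate type, and the scores are assumptions for illustration. The reply is faked so the code runs without a server.

```python
from collections import namedtuple

# Stand-in for the client's result type: one (label, score) pair per label.
Estimate = namedtuple("Estimate", ["label", "score"])


def best_label(estimates):
    """Pick the highest-scoring label, or None if there are no estimates."""
    if not estimates:
        return None
    return max(estimates, key=lambda e: e.score).label


# With a running server this list would come from
# client.classify([sourcecode]); here it is a fake reply for one input.
results = [[Estimate("python", 1.4), Estimate("ruby", -0.3), Estimate("perl", -0.9)]]

lang = best_label(results[0])
print("It may be", lang)
```

Keeping the full score list around, rather than only the winner, lets the caller decide how confident "may be" really is, for example by comparing the top two scores.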

What can a classifier do?
• You can
  – estimate the topic of tweets
  – trash spam mail automatically
  – monitor server failures from syslog
  – estimate user sentiment from blog posts
  – detect malicious attacks
  – find which feature is the best clue for classification

How to use?
• See the examples in http://github.com/jubatus/jubatus-example
  – gender
  – shogun
  – malware classification
  – language detection