What is jubatus (short)
![Page 1: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/1.jpg)
What is Jubatus? How does it work for you?
NTT SIC, Hiroki Kumazaki
![Page 2: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/2.jpg)
Jubatus is…
• A distributed online machine-learning framework
– OSS developed in Japan
• GPL2.0
• Distributed
– Fault tolerance
– Scale-out
• Online
– Fixed-time computation
• Machine learning
– More than “word count”!
![Page 3: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/3.jpg)
Architecture
• The ML model is combined with a feature extractor
Machine Learning Model
Feature Extractor
Jubatus Server
Jubatus RPC
![Page 4: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/4.jpg)
Architecture
• Multilanguage client library
– gem, pip, cpan, maven ready!
– It is essentially MessagePack-RPC.
• So you can use OCaml, Haskell, JavaScript, or Go at your own risk.
Client
Jubatus RPC
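Because the client libraries sit on top of MessagePack-RPC, a client in any language only has to produce and consume the message arrays defined by that protocol. The sketch below shows those array shapes in plain Python. It is a minimal illustration, not Jubatus client code: the helper names and the parameter values are hypothetical, and serializing the arrays to MessagePack bytes (which needs a MessagePack library) is left out.

```python
from itertools import count

# Message type tags defined by the MessagePack-RPC protocol.
REQUEST, RESPONSE = 0, 1

_msgid = count()  # each request carries a fresh message id


def build_request(method, params):
    """Return the 4-element request array: [type, msgid, method, params]."""
    return [REQUEST, next(_msgid), method, list(params)]


def parse_response(message, expected_msgid):
    """Unpack a response array: [type, msgid, error, result]."""
    msg_type, msgid, error, result = message
    if msg_type != RESPONSE or msgid != expected_msgid:
        raise ValueError("unexpected message")
    if error is not None:
        raise RuntimeError(error)
    return result


# Hypothetical call: the parameter list here is a placeholder, not the
# real argument layout of any Jubatus method.
request = build_request("classify", ["<datum goes here>"])
```

A real client would encode `request` as MessagePack, send it over TCP, and pass the decoded reply to `parse_response`.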
![Page 5: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/5.jpg)
Architecture
• Many ML algorithms
– Classifier
– Recommender
– Anomaly Detection
– Clustering
– Regression
– Graph Mining
Useful!
![Page 6: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/6.jpg)
Classifier
• Task: Classification of Datum
Sample Task: Classify which programming language is used

It’s Python:

    import sys

    def fib(a):
        if a == 1 or a == 0:
            return 1
        else:
            return fib(a-1) + fib(a-2)

    if __name__ == "__main__":
        print(fib(int(sys.argv[1])))

It’s Ruby:

    def fib(a)
      if a == 1 or a == 0
        1
      else
        return fib(a-1) + fib(a-2)
      end
    end
    if __FILE__ == $0
      puts fib(ARGV[0].to_i)
    end
![Page 7: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/7.jpg)
Classifier
• Set configuration in the Jubatus server
Classifier
Feature Extractor

Feature Extractor configuration:

    "converter": {
      "string_types": {
        "bigram": { "method": "ngram", "char_num": "2" }
      },
      "string_rules": [
        { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf" }
      ]
    }
![Page 8: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/8.jpg)
Classifier
• Configuration JSON
– It does “feature vector design”
– A very important step for machine learning

    "converter": {
      "string_types": {
        "bigram": { "method": "ngram", "char_num": "2" }
      },
      "string_rules": [
        { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf" }
      ]
    }

Annotations on the JSON:
• Settings for extracting features from strings
• Define a function named “bigram”
• “ngram” is a built-in function
• Pass “2” to “ngram” to create “bigram”
• Apply “bigram” to all data
• Feature weights are based on tf/idf (see wikipedia/tf-idf)
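To make the "tf" and "idf" settings concrete, here is a minimal sketch of one common textbook formulation: tf is the raw count of a feature in a document, and idf is log(N / df), where df is the number of documents containing the feature. The slides do not state the exact formula Jubatus uses, so treat this as an assumption for illustration; the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Weight features by tf * idf.

    documents: list of feature lists (for example, lists of bigrams).
    Returns one {feature: weight} dict per document.
    """
    n = len(documents)
    # Document frequency: in how many documents does each feature occur?
    df = Counter()
    for doc in documents:
        df.update(set(doc))

    weighted = []
    for doc in documents:
        tf = Counter(doc)  # raw occurrence count within this document
        weighted.append({f: c * math.log(n / df[f]) for f, c in tf.items()})
    return weighted


weights = tf_idf([["de", "ef", "de"], ["de", "en"]])
```

With this formulation a feature that occurs in every document (here "de") gets weight zero, so the features that remain are the ones that help tell documents apart.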
![Page 9: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/9.jpg)
Classifier
• The Feature Extractor becomes a “bigram extractor”
Classifier
bigram extractor
![Page 10: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/10.jpg)
Feature Extractor
• What does the bigram extractor do?
bigram extractor

    import sys

    def fib(a):
        if a == 1 or a == 0:
            return 1
        else:
            return fib(a-1) + fib(a-2)

    if __name__ == "__main__":
        print(fib(int(sys.argv[1])))

key value
im 1
mp 1
po 1
... ...
): 1
... ...
de 1
ef 1
... ...
Feature Vector
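The key/value table above can be reproduced with a few lines of code. The following is a minimal sketch of character bigram extraction, assuming every character (including spaces) takes part in a bigram and that the value is the occurrence count; whether Jubatus treats whitespace this way is not stated in the slides, and `char_ngrams` is a hypothetical helper name.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Count every run of n consecutive characters in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# First line of the Python sample from the slide.
features = char_ngrams("import sys")
for key, value in features.items():
    print(repr(key), value)
```

The slide shows a value of 1 for every key; with counting, a bigram that occurs several times in the input gets a correspondingly larger value.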
![Page 11: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/11.jpg)
Classifier
• Training the model with feature vectors

key value
im 1
mp 1
po 1
... ...
): 1
... ...
de 1
ef 1
... ...

key value
pu 1
ut 1
... ...
{| ...
|m 1
m| 1
{| 1
en 1
nd 1

key value
@a 1
$_ 1
... ...
my ...
su 1
ub 1
us 1
se 1
... ...

Classifier
![Page 12: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/12.jpg)
Classifier
• Set configuration in the Jubatus server

    "method" : "AROW",
    "parameter" : {
      "regularization_weight" : 1.0
    }

Feature Extractor: bigram extractor
Classifier: Algorithms
• Perceptron
• Passive Aggressive
• Confidence Weighted
• Adaptive Regularization of Weights
• Normal Herd
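For readers unfamiliar with AROW (Adaptive Regularization of Weights), the sketch below shows the online update for the binary case with a diagonal covariance, following the published algorithm. It is a simplified illustration and not Jubatus's implementation, which handles multiple classes; the class name `DiagonalAROW` is hypothetical, and the parameter `r` is assumed to play the role of "regularization_weight".

```python
class DiagonalAROW:
    """Binary AROW with a per-feature (diagonal) covariance."""

    def __init__(self, r=1.0):
        self.r = r       # regularization parameter
        self.mean = {}   # weight means, implicitly 0.0
        self.var = {}    # per-feature variance, implicitly 1.0

    def margin(self, x):
        return sum(self.mean.get(k, 0.0) * v for k, v in x.items())

    def update(self, x, y):
        """One online step. x: sparse feature dict, y: +1 or -1."""
        m = self.margin(x)
        if y * m >= 1.0:
            return  # margin already large enough, no update
        # Confidence term x^T Sigma x for a diagonal Sigma.
        confidence = sum(self.var.get(k, 1.0) * v * v for k, v in x.items())
        beta = 1.0 / (confidence + self.r)
        alpha = (1.0 - y * m) * beta  # hinge loss times beta
        for k, v in x.items():
            sigma_x = self.var.get(k, 1.0) * v
            self.mean[k] = self.mean.get(k, 0.0) + alpha * y * sigma_x
            # Shrink the variance of features that have been seen.
            self.var[k] = self.var.get(k, 1.0) - beta * sigma_x * sigma_x

    def predict(self, x):
        return 1 if self.margin(x) >= 0 else -1
```

Each example is processed once and then discarded, which is what makes the method online: the cost of an update depends only on the number of non-zero features in that example.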
![Page 13: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/13.jpg)
Classifier
• Use the model for the classification task
– Jubatus will find clues for classification
AROW
key value
si 1
il 1
... ...
{| 1
... ...
It’s [language logo]
![Page 14: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/14.jpg)
Classifier
• Use the model for the classification task
– Jubatus will find clues for classification
AROW
key value
re 1
): 1
... ...
s[ 1
... ...
It’s [language logo]
![Page 15: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/15.jpg)
Via RPC
• Invoke feature extraction and classification from the client via RPC
AROW
bigram extractor

    lang = client.classify([sourcecode])

    import sys

    def fib(a):
        if a == 1 or a == 0:
            return 1
        else:
            return fib(a-1) + fib(a-2)

    if __name__ == "__main__":
        print(fib(int(sys.argv[1])))

key value
im 1
mp 1
po 1
... ...
): 1
... ...
de 1
ef 1
... ...
It may be [language logo]
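The flow on this slide (raw source code in, feature extraction and classification on the server, a label out) can be imitated in a single process. The sketch below is a stand-in for illustration only: `LocalClassifier` is a hypothetical class, not the Jubatus client API, and it uses a plain multiclass perceptron over bigram counts instead of AROW to keep the code short.

```python
from collections import Counter


def bigrams(text):
    """Character bigram counts, playing the feature extractor's role."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


class LocalClassifier:
    """In-process stand-in: bigram extractor plus multiclass perceptron."""

    def __init__(self):
        self.weights = {}  # label -> {feature: weight}

    def classify(self, source):
        features = bigrams(source)
        scores = {
            label: sum(w.get(k, 0.0) * v for k, v in features.items())
            for label, w in self.weights.items()
        }
        return max(scores, key=scores.get) if scores else None

    def train(self, label, source):
        """Perceptron step. Returns True if the model had to be corrected."""
        self.weights.setdefault(label, {})
        predicted = self.classify(source)
        if predicted == label:
            return False
        for k, v in bigrams(source).items():
            right, wrong = self.weights[label], self.weights[predicted]
            right[k] = right.get(k, 0.0) + v   # pull toward the true label
            wrong[k] = wrong.get(k, 0.0) - v   # push away from the wrong one
        return True


# Tiny made-up training set; real use would feed many files per language.
samples = [("python", "def f(): pass"), ("ruby", "def f() end")]
model = LocalClassifier()
for _ in range(10):
    corrections = [model.train(label, src) for label, src in samples]
    if not any(corrections):
        break  # every sample is already classified correctly

lang = model.classify("def g(): pass")
print(lang)
```

The caller never builds a feature vector by hand. That matches the point of the slide: the client sends the source text, and extraction plus classification happen behind one call.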
![Page 16: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/16.jpg)
What can the classifier do?
• You can
– estimate the topic of tweets
– trash spam mail automatically
– detect server failures from syslog
– estimate user sentiment from blog posts
– detect malicious attacks
– find which feature is the best clue for classification
![Page 17: What is jubatus (short)](https://reader036.fdocuments.us/reader036/viewer/2022062406/5587a0abd8b42a2a368b45bd/html5/thumbnails/17.jpg)
How to use?
• See examples in http://github.com/jubatus/jubatus-example
– gender
– shogun
– malware classification
– language detection