Machine Learning

Machine Learning Márk Horváth Morgan Stanley FID Institutional Securities


Transcript of Machine Learning

Page 1: Machine Learning

Machine Learning

Márk Horváth
Morgan Stanley

FID Institutional Securities

Page 2: Machine Learning

Content

• AI Paradigm
• Data Mining
• Weka
• Application Areas

• Introduce many fields and the whole paradigm
  – No time for details

Page 3: Machine Learning

AI Paradigm

• “The area of computer science which deals with problems that we were not able to cope with before.”
  – Computer science is, by the way, a branch of mathematics.

• “Algorithms that solve problems mainly through interaction with the problem itself. The programmer does not have to understand the solution to the problem, only the details of the learning algorithm.”

Page 4: Machine Learning

AI Paradigm

• Why AI?
  – A new, fast-expanding science, applicable to most other sciences
    • It also deals with explaining evidence
  – Interdisciplinary
    • Math
    • Computer science
    • Applied math
    • Philosophy of science
    • Biology (many naturally inspired algorithms; the thinking machine)

• Why Machine Learning / Data Mining?
  – It can be applied to any data (financial, medical, demographical, …)

Page 5: Machine Learning

AI Paradigm

• 1965, John McCarthy => 42 years
• Hilbert, theorem-proving machine
• Occam (14th century)
• Many distinct fields
• Many algorithms in each field

• => 1 hour is nothing…

• Empirical and theoretical science
• Intuition needed to use and hybridize the methods
• Few proofs
• The area is too big to grasp everything in detail, but the concepts are important
  – => BIG PICTURE, no formulas!

Page 6: Machine Learning

AI Taxonomy

AI
  – Logic / Expert Systems
  – Machine Learning / Data Mining / Function Approximation
    • 0R, 1R (max likelihood)
    • Naive Bayes
    • Decision Tree / Covering
    • Linear Regression / Gradient Methods
    • Kernel Based / Nearest Neighbor
    • Model / PCA, ICA
  – Optimization
  – Control
  – Clustering
  – AGI

Page 7: Machine Learning

Data Mining vs. Statistics

• Statistics
  – ~ hypothesis testing
• DM
  – search through hypotheses
• Empirical side
  – Many methods work that are proven not to converge
  – Some methods do not work although they should (due to computational cost, slow convergence)

Page 8: Machine Learning

Relation, Attribute, Class

@relation 'cpu'
@attribute MYCT real
@attribute MMIN real
@attribute MMAX real
@attribute CACH real
@attribute CHMIN real
@attribute CHMAX real
@attribute class real % performance
@data
125,256,6000,256,16,128,199
29,8000,32000,32,8,32,253
29,8000,16000,32,8,16,132
26,8000,32000,64,8,32,290
23,16000,32000,64,16,32,381
…

(Ω, A, P)
X = MYCT × MMIN × MMAX × CACH × CHMIN × CHMAX   (Attribute, Feature)
Y = class   (Class, Target)

Ω = X × Y

ρ( Y | X ) = ?
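To make the (Ω, A, P) framing concrete, here is a minimal plain-Python sketch (variable names are mine, not from the slides) that loads the five listed 'cpu' instances into attribute vectors X and a target vector y:

```python
# Hypothetical sketch: the five 'cpu' instances from the slide,
# split into attribute vectors X and target (class) values y.
rows = [
    (125, 256, 6000, 256, 16, 128, 199),
    (29, 8000, 32000, 32, 8, 32, 253),
    (29, 8000, 16000, 32, 8, 16, 132),
    (26, 8000, 32000, 64, 8, 32, 290),
    (23, 16000, 32000, 64, 16, 32, 381),
]
X = [r[:-1] for r in rows]  # MYCT, MMIN, MMAX, CACH, CHMIN, CHMAX
y = [r[-1] for r in rows]   # class (performance)
```

A learner then models the conditional relation ρ(Y | X) from such (X, y) pairs.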

Page 9: Machine Learning

General View of Data Mining

• Language
• Build model / search over the Language

Page 10: Machine Learning

Simple Cases

• 0R
• 1R (nominal class)

• Max likelihood

• Linear Regression
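The first two baselines are short enough to sketch. A minimal plain-Python version (toy data and names are mine, not from the talk) of 0R, which always predicts the majority class, and 1R, which builds one rule per value of a single nominal attribute:

```python
from collections import Counter

def zero_r(classes):
    """0R: always predict the most frequent class (max-likelihood baseline)."""
    return Counter(classes).most_common(1)[0][0]

def one_r(attr, classes):
    """1R: for each value of one nominal attribute, predict its majority class."""
    return {v: zero_r([c for a, c in zip(attr, classes) if a == v])
            for v in set(attr)}

# Toy nominal data set (hypothetical).
outlook = ["sunny", "sunny", "overcast", "rain", "rain"]
play    = ["no",    "no",    "yes",      "yes",  "no"]
```

In practice 1R is run once per attribute and the attribute with the lowest error rate is kept.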

Page 11: Machine Learning

Data Mining Taxonomy

• Regression vs. Classification (exchangeable)

• Deterministic vs. Stochastic (~exchangeable via Chebyshev's inequality)

• Batch driven vs. Updateable (~exchangeable, but with cost)

• Symbolic vs. Subsymbolic

Page 12: Machine Learning

Methodology

• Clean the data
• Try many methods
• Optimize the good methods
• Hybridize good methods, build meta-algorithms

Page 13: Machine Learning

Evaluation Measures

• Mean Absolute Error / Root Mean Squared Error

• Correlation Coefficient• Information gain• Custom (e.g. weighted)• Significance analysis (Bernoulli process)
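The first three measures can be written down directly. A minimal plain-Python sketch (function and variable names are mine):

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def corr(y_true, y_pred):
    """Pearson correlation coefficient."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

# Hypothetical predictions against true values.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
```

RMSE penalizes large errors more heavily than MAE, which is why the two can rank models differently.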

Page 14: Machine Learning

Overfitting, Learning Noise

• Philosophical question

– When do we accept or deny a model?– No chance to prove, only to reject

• Train / (Validation) / Test

• Cross-validation, leave one out

• Minimum Description Length principle– Occam– Kolmogorov complexity
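The train/test splitting above can be sketched as a small k-fold cross-validation helper (plain Python, names mine; setting k equal to the number of instances reduces it to leave-one-out):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, k, train_and_score):
    """Average test score over k train/test splits.

    train_and_score(X_train, y_train, X_test, y_test) is any callable that
    fits a model on the train part and returns a score on the test part.
    """
    scores = []
    for fold in k_fold_indices(len(X), k):
        test = set(fold)
        tr = [i for i in range(len(X)) if i not in test]
        scores.append(train_and_score(
            [X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / k
```

Each instance lands in the test set exactly once, so the averaged score is an estimate of out-of-sample performance.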

Page 15: Machine Learning

Nearest Neighbor / Kernel

• Instance based
• Statistical (k neighbors)
• Distance: Euclidean, Manhattan / evolved distance measures
• Missing attribute: maximal distance

• KD-tree (log(n)), ball tree, metric tree
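The instance-based idea fits in a few lines. A minimal k-nearest-neighbor sketch in plain Python (toy data and names are mine); note this does a linear O(n) scan, which is exactly what the KD-tree / ball-tree structures above accelerate:

```python
import math
from collections import Counter

def euclid(a, b):
    """Euclidean distance; swap in a Manhattan or evolved distance as needed."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(X_train, y_train, query, k=3, dist=euclid):
    """Vote among the k nearest stored instances."""
    neighbors = sorted(zip(X_train, y_train), key=lambda p: dist(p[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Two hypothetical well-separated clusters.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
```

There is no training step at all: the "model" is simply the stored instances.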

Page 16: Machine Learning

Decision Trees / Covering

• Divide and Conquer
• Split by the best feature

• User Classifier / REP Tree
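"Split by the best feature" is typically scored by information gain; a minimal sketch for nominal attributes (plain Python, names mine, not Weka's implementation):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class distribution, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(attr, labels):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    after = 0.0
    for v in set(attr):
        subset = [l for a, l in zip(attr, labels) if a == v]
        after += len(subset) / n * entropy(subset)
    return entropy(labels) - after
```

Divide-and-conquer tree growing picks the attribute with the highest gain, splits the data on it, and recurses on each subset.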

Page 17: Machine Learning

Naive Bayes

• Independent Attributes

• P(Y | X) = P(X | Y) · P(Y) / P(X) = (Π_i P(X_i | Y)) · P(Y) / P(X)

• Discrete Class
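A count-based sketch of the classifier, assuming nominal attributes and a discrete class (plain Python; the toy data and the Laplace smoothing constants are my additions, not from the slides):

```python
from collections import Counter, defaultdict

def train_nb(X, y):
    """Collect the counts needed for P(Y) and each P(Xi | Y)."""
    class_counts = Counter(y)
    attr_counts = defaultdict(Counter)  # (attr index, class) -> value counts
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            attr_counts[(i, c)][v] += 1
    return class_counts, attr_counts

def predict_nb(model, xs):
    """Pick the class maximizing P(Y) * prod_i P(Xi | Y); P(X) is a shared
    constant and can be dropped."""
    class_counts, attr_counts = model
    n = sum(class_counts.values())
    best, best_score = None, -1.0
    for c, cc in class_counts.items():
        score = cc / n  # P(Y)
        for i, v in enumerate(xs):
            # Laplace-smoothed P(Xi | Y) so unseen values do not zero it out.
            score *= (attr_counts[(i, c)][v] + 1) / (cc + 2)
        if score > best_score:
            best, best_score = c, score
    return best

# Hypothetical nominal training data.
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"), ("rain", "cool")]
y = ["no", "no", "yes", "yes"]
model = train_nb(X, y)
```

The product over attributes is exactly the "independent attributes" assumption: each P(Xi | Y) is estimated separately.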

Page 18: Machine Learning

Artificial Neural Networks

• Structure (Weka)
  – Theoretical limitations (Minsky, AI winter)

• Recurrent networks for time series

Page 19: Machine Learning

Feedforward Learning Rules

• Learning rules
  – Perceptron / Winnow (very simple rules for special cases)
  – Various gradient-descent methods
    • Slower than the perceptron rule
    • Faster than symbolically differentiating the whole expression
    • Local search
  – Evolution
    • Global search
    • A bit slower, but easy to hybridize with local search
    • Can evolve:
      – Weights
      – Structure
      – Transfer functions
      – Recurrent networks

Page 20: Machine Learning

Perceptron / Winnow

• Perceptron
  – Add the misclassified instance to the weight vector
  – Converges if the space is linearly separable
• Winnow
  – Binary attributes
  – Increase or decrease the non-zero attribute weights (multiplicative updates)
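The perceptron rule is only a few lines. A minimal sketch in plain Python, assuming labels in {-1, +1} (the AND example and all names are mine):

```python
def perceptron_train(X, y, epochs=20):
    """On each mistake, add label * instance to the weights (and label to the
    bias). Converges when the classes are linearly separable."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xs, label in zip(X, y):  # label in {-1, +1}
            if label * (sum(wi * xi for wi, xi in zip(w, xs)) + b) <= 0:
                w = [wi + label * xi for wi, xi in zip(w, xs)]
                b += label
    return w, b

def perceptron_predict(w, b, xs):
    return 1 if sum(wi * xi for wi, xi in zip(w, xs)) + b > 0 else -1

# Logical AND: linearly separable, so the rule converges.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, -1, -1, 1]
w, b = perceptron_train(X, y)
```

XOR, by contrast, is not linearly separable, which is exactly the Minsky-era limitation mentioned on the previous slide.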

Page 21: Machine Learning

Feature extraction

• Discretization
• PCA / ICA
• Various state-space transformations
• Evolving features
• Clustering

Page 22: Machine Learning

Meta / Hybrid Methods

• LEGO ;)
• Vote (in many ways)
• Use a meta-algorithm to predict based on the base methods
• Embed
  – Apply regression in the leaves of decision trees
  – Embed a decision tree, or the training samples, in an ANN
• Unify
  – Choose a general-purpose language
  – Use conventional training methods to build models
  – Hybridize training methods, evolve

• Easy to write articles, countless new ideas
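The simplest meta-method above, voting, fits in one function. A minimal sketch (plain Python; the fixed "base models" are hypothetical stand-ins for trained classifiers):

```python
from collections import Counter

def vote(models, xs):
    """Majority vote over base models; each model is a predict function."""
    return Counter(m(xs) for m in models).most_common(1)[0][0]

# Hypothetical base models that always answer the same way on any input.
models = [lambda xs: "a", lambda xs: "b", lambda xs: "a"]
```

A stacking-style meta-algorithm replaces the fixed vote with a learned model whose inputs are the base models' predictions.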

Page 23: Machine Learning

Practical Uses

• New paradigm
• Countless applications
• In all natural sciences
  – finance, psychology, sociology, biology, medicine, chemistry, …
  – actually, discovering and explaining evidence is science itself

• Business
  – the predictive enterprise

Page 24: Machine Learning

Applications in AI

• Optimal Control (model building)
• Use within other AI methods
  – Speech recognition
  – OCR
  – Speech synthesis
  – Vision, recognition
  – AGI (logic, DM, evolution, clustering, reinforcement learning, …)

Page 25: Machine Learning

TDK (student research conference), Article

• Any topic you’ve found interesting…