Lecture 1: What is Machine Learning?

Machine Learning for Language Technology 2015
http://stp.lingfil.uu.se/~santinim/ml/2015/ml4lt_2015.htm

What is Machine Learning?

Marina [email protected]

Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden

Autumn 2015


Acknowledgements

• Thanks to Hal Daumé III, Andrew Y. Ng, Arthur Samuel, Tom Mitchell, Ethem Alpaydin, Michael Jordan, Michael Collins, Joakim Nivre, Pedro Domingos, Wikipedia, the web.


What is Machine Learning?
http://www.umiacs.umd.edu/~hal/ciml/

• Machine learning is the study of computer systems that learn from data and experience.

• It is applied in an incredibly wide variety of application areas, from medicine to advertising, from military to pedestrian.

• Any area in which you need to make sense of data is a potential customer of machine learning.

[Hal Daumé III]



Example: Rule-based systems

• Parts-Of-Speech Tagger (Brill’s tagger)

• tag1 --> tag2 IF Condition

The Condition tests the preceding and/or following word tokens, or their tags (the notation for such rules differs between implementations). For example, in Brill's notation:

• IN NN WDPREVTAG DT while

Change the tag of a word from IN (preposition) to NN (common noun) if the preceding word's tag is DT (determiner) and the word itself is "while". This covers cases like "all the while" or "in a while", where "while" should be tagged as a noun rather than with its more common IN tag. Many other rules are more general.
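The rule above can be sketched in a few lines of Python. This is only an illustration of how one such transformation might be applied to an already tagged sentence, not Brill's actual implementation; the function name `apply_wdprevtag_rule`, its argument order and the initial tagging are assumptions made for this example.

```python
from typing import List, Tuple

TaggedToken = Tuple[str, str]  # (word, tag)


def apply_wdprevtag_rule(
    sentence: List[TaggedToken],
    from_tag: str,
    to_tag: str,
    prev_tag: str,
    word: str,
) -> List[TaggedToken]:
    """Retag `word` from `from_tag` to `to_tag` when the previous tag is `prev_tag`."""
    result = list(sentence)
    for i in range(1, len(result)):
        current_word, current_tag = result[i]
        previous_tag = result[i - 1][1]
        if (
            current_word.lower() == word
            and current_tag == from_tag
            and previous_tag == prev_tag
        ):
            result[i] = (current_word, to_tag)
    return result


# Hypothetical initial tagging of "all the while": "while" starts out as IN.
initial = [("all", "PDT"), ("the", "DT"), ("while", "IN")]
corrected = apply_wdprevtag_rule(initial, "IN", "NN", "DT", "while")
print(corrected)
```

Here the human writes the rule by hand; the program only applies it. A sentence in which "while" does not follow a determiner is left unchanged.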


Example: Machine Learning-Based Systems (From: The Apache OpenNLP library, a machine learning based toolkit for the processing of natural language text: https://opennlp.apache.org/ )

• We do not have rules, but a training corpus/dataset: the POS Tagger can be trained on annotated training material. The training material is a collection of tokenized sentences where each token has the assigned part-of-speech tag. The training material may look like this:

About_IN 10_CD Euro_NNP ,_, I_PRP reckon_VBP
That_DT sounds_VBZ good_JJ

• With this annotated material we train a (mathematical/statistical) model.

• Then we evaluate the results: how well does this model perform? The accuracy can be measured on a test dataset or via cross validation.
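The train-then-evaluate workflow can be sketched without any library. The example below reads material in the word_TAG format shown above and trains a very simple "most frequent tag per word" baseline. This is not the model OpenNLP uses; the function names and the held-out sentence are assumptions introduced for illustration.

```python
from collections import Counter, defaultdict


def parse_tagged_line(line):
    """Split 'word_TAG word_TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST underscore so that a token like ",_," still works.
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


def train_most_frequent_tag(sentences):
    """Learn, for every word, the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    word_to_tag = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    default_tag = all_tags.most_common(1)[0][0]  # fallback for unseen words
    return word_to_tag, default_tag


def accuracy(model, sentences):
    """Fraction of tokens whose predicted tag equals the annotated tag."""
    word_to_tag, default_tag = model
    correct = total = 0
    for sentence in sentences:
        for word, gold in sentence:
            predicted = word_to_tag.get(word.lower(), default_tag)
            correct += predicted == gold
            total += 1
    return correct / total


train_lines = [
    "About_IN 10_CD Euro_NNP ,_, I_PRP reckon_VBP",
    "That_DT sounds_VBZ good_JJ",
]
held_out_lines = ["That_DT sounds_VBZ great_JJ"]  # hypothetical test sentence

train_sentences = [parse_tagged_line(line) for line in train_lines]
held_out_sentences = [parse_tagged_line(line) for line in held_out_lines]

model = train_most_frequent_tag(train_sentences)
train_accuracy = accuracy(model, train_sentences)
held_out_accuracy = accuracy(model, held_out_sentences)
print("training accuracy:", train_accuracy)
print("held-out accuracy:", held_out_accuracy)
```

The unseen word "great" has to fall back on a default tag, which is why accuracy should be measured on data that was not used for training.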



Generally speaking: Deduction vs Induction

• Deductive reasoning works from the more general to the more specific. Sometimes this is informally called a "top-down" approach. We might begin with thinking up a theory about our topic of interest. We then narrow that down into more specific hypotheses that we can test. This ultimately leads us to be able to test the hypotheses with specific data -- a confirmation (or not) of our original theories.

• Inductive reasoning works the other way, moving from specific observations to broader generalizations and theories. Informally, we sometimes call this a "bottom up" approach. In inductive reasoning, we begin with specific observations (the data) and measures, begin to detect patterns and regularities, formulate some tentative hypotheses that we can explore, and finally end up developing a general model.


Machine Learning is based on ...

• Induction

• Generalization from data


In summary (by Hal Daumé III)
https://piazza.com/umd/fall2015/cmsc422/home

• Machine learning is all about finding patterns in data. The whole idea is to replace the "human writing code" with a "human supplying data" and then let the system figure out what it is that the person wants to do by looking at the examples.

• The most central concept in machine learning is generalization: how to generalize beyond the examples that have been provided at "training time" to new examples that you see at "test time."

• A very large fraction of what we'll talk about has to do with figuring out what generalization means.



Why learning?

• We do not know the exact method: speech recognition, spam filters, robotics, etc.

• The exact method is too expensive: statistical physics, etc.

• Task evolves over time...

• There is no need to use machine learning for computing a payroll: for this task we just need an algorithm!


Why is ML so fashionable?

• Broad applicability:

– Finance, robotics, medicine, NLP, IT, MT, etc.

• Close connection between theory and practice

• Open field, lots of room for new work.


Interdisciplinary Field

• Machine learning is a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in artificial intelligence.

• Machine learning explores the study and construction of algorithms that can LEARN from and make predictions on data.

• Such algorithms operate by building a model from example inputs in order to make data-driven predictions or decisions, rather than following strictly static program instructions.

[Wikipedia]


Machine Learning & Statistics

• Machine learning and statistics are closely related fields.

• According to Michael Jordan, the ideas of machine learning, from methodological principles to theoretical tools, have had a long pre-history in statistics. He also suggested the term "data science" as a placeholder name for the overall field.


Machine Learning and Data Mining

• Machine Learning is concerned with the study, design and development of models and algorithms that give computers the capability to learn from data.

• Data Mining can be defined as the process that, starting from apparently unstructured data, tries to extract knowledge and/or previously unknown interesting patterns. Machine learning algorithms are used during this process.


Many Types of Learning

• Supervised learning

– Supervised classification is the only focus of this course

• Unsupervised learning

• Semi- and weakly supervised

• Reinforcement learning

• etc.


In other words

• Supervised learning: learning with a teacher (with class labels)

• Unsupervised learning: learning without a teacher (without class labels)


Learning problems

• Regression

• Binary classification

• Multiclass classification

• Ranking

• etc.


Informal and Formal Definitions

• There are plenty of definitions...

• Informal: The field of study that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959)

• Formal: A computer program is said to learn from experience E, with respect to some task T, and some performance measure P, if its performance on T as measured by P improves with experience E (Tom Mitchell, 1998).


Example: Spam Filter (by Andrew Y. Ng)

1. Classifying emails as spam or not spam

2. Watching you label emails as spam or not spam.

3. The number (or fraction) of emails correctly classified as spam/not spam.

4. None of the above—this is not a machine learning problem.


Spam Filter (by Andrew Y. Ng)

• Answer:


Classification: Questions

• How would you write a program to distinguish a picture of me from a picture of someone else?

• How would you write a program to determine whether a sentence is grammatical or not?

• How would you write a program to distinguish cancerous cells from normal cells?


Classification: Answers

• How would you write a program to distinguish a picture of me from a picture of someone else? ⇒ Provide example pictures of me and pictures of other people and let a classifier learn to distinguish the two.

• How would you write a program to determine whether a sentence is grammatical or not? ⇒ Provide examples of grammatical and ungrammatical sentences and let a classifier learn to distinguish the two.

• How would you write a program to distinguish cancerous cells from normal cells? ⇒ Provide examples of cancerous and normal cells and let a classifier learn to distinguish the two.


Elements of Machine Learning

• Generalization: how well a model performs on new data.

• Data:

– Training data: specific examples to learn from.

– Test data: new specific examples to assess performance.

• Models (theoretical assumptions):

– Decision trees, naive Bayes, perceptron, etc.

• Algorithms:

– Learning algorithms that infer the model parameters from the data.

– Inference algorithms that infer predictions from a model.
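The split between a learning algorithm and an inference algorithm can be made concrete with a nearest-centroid classifier, chosen here only because it is short. The model's parameters are one mean vector per class. The two-dimensional data points and the function names are assumptions made up for this sketch.

```python
import math


def learn_centroids(examples):
    """Learning algorithm: infer the parameters (one mean vector per class)
    from labelled (x, y) training pairs."""
    sums, counts = {}, {}
    for x, y in examples:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Inference algorithm: return the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Hypothetical training data: specific examples to learn from.
training_data = [
    ((1.0, 1.0), "a"),
    ((2.0, 1.0), "a"),
    ((8.0, 9.0), "b"),
    ((9.0, 8.0), "b"),
]
# Hypothetical test data: new examples used only to assess performance.
test_data = [((1.5, 2.0), "a"), ((7.0, 8.0), "b")]

model = learn_centroids(training_data)
correct = sum(predict(model, x) == y for x, y in test_data)
test_accuracy = correct / len(test_data)
print(model, test_accuracy)
```

Learning touches the training data once and produces the model; inference uses only the model, never the training examples themselves.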


Generalization

• Predicting the future based on the past

• Our system needs to generalize beyond the training data to some future data that it might not have seen yet.


Generalization: Overfitting & Underfitting

• Overfitting occurs when the model fits the training data too well and does not generalize, so it performs badly on the test data. Overfitting is often a result of an excessively complicated model.

• Underfitting occurs when the model does not fit the data well enough. Underfitting is often a result of an excessively simple model.

• Both overfitting and underfitting lead to poor predictions on new data sets.

• A learning model that overfits or underfits does not generalize well.
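A toy experiment can make the contrast visible. The sketch below uses invented one-dimensional data in which the true class is 1 whenever x >= 5, with two training labels deliberately flipped to act as noise. It compares a constant predictor (too simple), a single learned threshold (about right) and a 1-nearest-neighbour memorizer (too flexible). The data and all names are assumptions for this illustration.

```python
def true_label(x):
    """The underlying concept, unknown to the learners."""
    return 1 if x >= 5 else 0


# Training set: x = 0..10, with the labels of x=2 and x=7 flipped (noise).
flipped = {2.0: 1, 7.0: 0}
train = [(float(x), flipped.get(float(x), true_label(x))) for x in range(11)]
# Test set: new points with clean labels.
test = [(x, true_label(x)) for x in (0.4, 1.6, 2.4, 3.4, 4.4, 5.6, 6.6, 7.4, 8.6, 9.4)]


def fit_constant(data):
    """Too simple: always predict the majority class."""
    ones = sum(y for _, y in data)
    label = 1 if 2 * ones >= len(data) else 0
    return lambda x: label


def fit_threshold(data):
    """Pick the threshold t with the fewest training errors; predict 1 if x >= t."""
    best_t, best_errors = None, None
    for t, _ in data:
        errors = sum((1 if x >= t else 0) != y for x, y in data)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return lambda x: 1 if x >= best_t else 0


def fit_nearest_neighbour(data):
    """Too flexible: copy the label of the closest training point, noise included."""
    return lambda x: min(data, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


underfit = fit_constant(train)
balanced = fit_threshold(train)
overfit = fit_nearest_neighbour(train)

for name, model in [("constant", underfit), ("threshold", balanced), ("1-NN", overfit)]:
    print(name, "train:", round(accuracy(model, train), 2), "test:", round(accuracy(model, test), 2))
```

The memorizer is perfect on the training set but reproduces the flipped labels on nearby test points, while the constant predictor is poor on both sets. Only the threshold model generalizes well on this data.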


Example: Letter vs non-letter classification

[Figure: training set and test set]


Data:

The iris dataset

Three components:

1. Class label (aka "label", denoted y)

2. Features (aka "attributes")

3. Feature values (aka "attribute values", denoted x)

⇒ Features can be binary, nominal or continuous.

• A labeled dataset is a collection of (x, y) pairs.

Task

• Predict the class for this "test" example:

Sepal length   Sepal width   Petal length   Petal width   Type
5.2            3.7           1.7            0.3           ???

• This requires us to generalize from the training data.
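One simple way to attempt this task is a 1-nearest-neighbour classifier: predict the class of the training example whose feature values are closest to the test example. The sketch below uses only six labelled rows in the style of the iris data, so treat the rows as illustrative rather than as the full dataset. The function name `predict_1nn` is an assumption.

```python
import math

# (sepal length, sepal width, petal length, petal width), class label
training_set = [
    ((5.1, 3.5, 1.4, 0.2), "setosa"),
    ((4.9, 3.0, 1.4, 0.2), "setosa"),
    ((7.0, 3.2, 4.7, 1.4), "versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "virginica"),
    ((5.8, 2.7, 5.1, 1.9), "virginica"),
]


def predict_1nn(training, x):
    """Return the label of the training example closest to x (Euclidean distance)."""
    _, label = min(training, key=lambda pair: math.dist(pair[0], x))
    return label


test_example = (5.2, 3.7, 1.7, 0.3)
predicted_type = predict_1nn(training_set, test_example)
print(predicted_type)
```

The test example does not occur in the training rows, so any answer is a generalization. With these rows the closest example is a setosa flower.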

Noise

• Unexplained or random variation in the data

• Anomaly


Models

• Models are theoretical assumptions:

– decision trees, naive Bayes, perceptron, etc.

• Given:

– Domain X: descriptions

– Domain Y: predictions

– H: hypothesis space; the set of all possible hypotheses

– h: target hypothesis

– Idea: to extrapolate observed y's over all X.

– Hope: to predict well on future y's given x's.

– Require: there must be regularities to be found.
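These ingredients can be written out for a deliberately tiny case. In the sketch below, X is the integers 0 to 9, Y is {0, 1}, and H is the set of threshold functions "predict 1 if x >= t". All of these choices, and the observed pairs, are assumptions made up for illustration.

```python
X = range(10)   # domain of descriptions
Y = (0, 1)      # domain of predictions


def make_threshold(t):
    """One hypothesis h_t: predict 1 if x >= t, else 0."""
    return lambda x: 1 if x >= t else 0


# Hypothesis space H: one threshold hypothesis for each t in 0..10.
H = {t: make_threshold(t) for t in range(11)}

# Observed (x, y) pairs.
observed = [(1, 0), (3, 0), (7, 1), (9, 1)]

# Hypotheses that agree with every observation.
consistent = [t for t, h in H.items() if all(h(x) == y for x, y in observed)]

# Points of X on which the consistent hypotheses still disagree.
undecided = [x for x in X if len({H[t](x) for t in consistent}) > 1]

# Extrapolate the observed y's over all X with one chosen hypothesis.
chosen = H[consistent[0]]
extrapolation = {x: chosen(x) for x in X}

print("consistent thresholds:", consistent)
print("undecided points:", undecided)
print("extrapolation:", extrapolation)
```

Several hypotheses fit the observations equally well yet disagree on unseen points, so the data alone cannot decide between them. Some further assumption is needed to pick one.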


Inductive bias

• In ML, the inductive bias is the set of theoretical assumptions that must be added to the observed data to transform the algorithm's outputs into logical deductions.


Examples of inductive biases

• Decision trees (ID3): Shorter trees are preferred over larger trees. Trees that place high information gain attributes close to the root are preferred over those that do not.

• Naive Bayes classifier: assumes maximum conditional independence, i.e. that the features are conditionally independent of each other given the class.

• Logistic regression: there exists a simple boundary that splits one class from the other, and the further you get from that boundary the more confident you can be.

• Perceptron: assumes that the classes can be separated by a linear decision boundary, i.e. that the data are linearly separable.
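The perceptron's bias can be checked on two tiny problems. The sketch below is a plain textbook-style perceptron with labels in {-1, +1}, trained on logical AND (linearly separable) and on XOR (not linearly separable). The epoch limit and the function names are assumptions for this example.

```python
def train_perceptron(data, max_epochs=100):
    """Return (weights, bias, converged). `data` holds ((x1, x2), y) with y in {-1, +1}."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in data:
            activation = w[0] * x1 + w[1] * x2 + b
            if y * activation <= 0:      # wrong side of the boundary, or on it
                w[0] += y * x1           # move the boundary towards the example
                w[1] += y * x2
                b += y
                mistakes += 1
        if mistakes == 0:                # a full pass without errors: done
            return w, b, True
    return w, b, False


def classify(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1


AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

w_and, b_and, converged_and = train_perceptron(AND)
w_xor, b_xor, converged_xor = train_perceptron(XOR)
print("AND converged:", converged_and)
print("XOR converged:", converged_xor)
```

On AND the learner finds a separating line after a few passes. On XOR no such line exists, so it keeps making mistakes however long it trains: the assumption built into the model does not match the data.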


Inductive Bias: definition

• “The inductive bias of a learning algorithm is the set of assumptions that the learner uses to predict outputs given inputs that it has not encountered”.

– Tom Mitchell, 1980


Learning & Inference Algorithms

• Traditionally, the goal of learning has been to find a model for which prediction (i.e., inference) accuracy is as high as possible.

• More recently: find models for which prediction (i.e., inference) is as efficient as possible.

• In practical terms: recent interest in more unconventional approaches to learning that combine generalization accuracy with other desiderata such as faster inference.


The End

