Machine Learning as Extreme TDD: An Introduction (sriku.org/talks/ml-as-tdd-20190306-v2.pdf)

Srikumar Karaikudi Subramanian

http://sriku.org

Director Technology @ Pramati Technologies https://labs.imaginea.com

6 Mar 2019

Overview

• Motivation and prelude

• A soft problem: Classifying gender by name

• Testing our gender classifier

• Writing our gender classifier

• Learners and features

• The learning cycle

• The road ahead

• Concluding notes

Motivation

• Machine learning is eating software

• Non-ML folks must be able to participate

• Critical thinking is valuable and available

• Discovering ML application areas is valuable

Prelude

• This talk is not for ML experts

• Hoping to address devs and QA engineers

A personal anecdote

… from a long time ago … in GWBASIC days

Enter your name: Kumar
Hello Mr. Kumar
Enter your name: Shobana
Hello Ms. Shobana
Enter your name: ▮

Guessing one’s gender

(Restricting to Indian names for simplicity)

How would you test the function?

data Gender = Man | Woman
type GenderGuesser = String -> Gender

Some questions to ask

• Are “man” and “woman” the only two categories relevant to our situation?

• How do I get a decent list of names with known gender?

• What anomalies exist in the set that I need to know about?

• What do we know about the quality of the data?

• i.e., how do we trust the tests we base it on?

• How should we deal with unisex names?

Initial take on tests

gender "Ram" == Man
gender "Sita" == Woman
gender "Ashok" == Man
gender "Yamuna" == Woman
gender "Valavan" == Man
gender "Aarthi" == Woman
gender "Valli" == Woman
gender "Amjad" == Man
gender "Azma" == Woman

gender "Chandra" == ??
gender "Kiran" == ??
gender "Rama" == ?? -- Is it "Ramaa" or "Raamaa"?

gender "pumpkin" == ??
gender "doormat" == ??
gender "பாத்திரம்" == ??

We want a table!

Name       Gender
Valavan    Man
Azma       Woman
Kiran      Either
pumpkin    ??
பாத்திரம்    ??
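Once the tests live in a table, the individual assertions collapse into one measurement: run a guesser over every row and report the fraction it gets right. The following is a minimal sketch of that idea, not code from the talk. The names `testTable`, `accuracy` and `endsInVowelGuesser` are hypothetical, and only the rows with an unambiguous label are included.

```haskell
-- Hypothetical names throughout; only Gender and the sample rows come from the slides.
data Gender = Man | Woman deriving (Eq, Show)

type Guesser = String -> Gender

-- The unambiguous rows from the "initial take on tests".
testTable :: [(String, Gender)]
testTable =
  [ ("Ram", Man), ("Sita", Woman), ("Ashok", Man), ("Yamuna", Woman)
  , ("Valavan", Man), ("Aarthi", Woman), ("Valli", Woman)
  , ("Amjad", Man), ("Azma", Woman)
  ]

-- Fraction of rows on which the guesser agrees with the table.
accuracy :: Guesser -> [(String, Gender)] -> Double
accuracy guess rows
  | null rows = 0
  | otherwise = fromIntegral hits / fromIntegral (length rows)
  where
    hits = length [ () | (name, g) <- rows, guess name == g ]

-- Deliberately naive baseline (an assumption for illustration):
-- names ending in 'a' or 'i' are guessed as Woman, everything else as Man.
endsInVowelGuesser :: Guesser
endsInVowelGuesser name
  | null name = Man
  | last name `elem` "ai" = Woman
  | otherwise = Man
```

On these nine rows the naive rule scores 1.0 and `const Man` scores 4/9. A perfect score on so small a table says more about the table than about the rule, which is why the size and quality of the data matter so much.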

Our first function …

Table lookup! 😎

BAD!

• Tests will never break

• We have to ship the table

• Doesn’t work for names not in the table

• We can do better
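To make the objections concrete, here is a minimal sketch of the table-lookup "function". The names `table` and `lookupGuesser` are hypothetical, and the `Maybe` result is an assumption made here so that a missing name is visible rather than silently guessed.

```haskell
data Gender = Man | Woman deriving (Eq, Show)

-- The function's entire "knowledge" is the test table itself,
-- so the table has to ship with the code.
table :: [(String, Gender)]
table = [("Valavan", Man), ("Azma", Woman), ("Ram", Man), ("Sita", Woman)]

-- Returns Nothing for any name that is not in the table.
lookupGuesser :: String -> Maybe Gender
lookupGuesser name = lookup name table
```

Every row of the table passes by construction, so the tests can never fail, while `lookupGuesser "Kumar"` yields `Nothing` because "Kumar" was never recorded.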

Overfitting

• Means we’re learning by rote. Can only answer textbook questions.

• Understanding implies compression

• Necessarily lossy - usually heavily lossy

• What if we say - “Can’t use more than MAX_MEMORY”?

Towards generalizability

Name       Gender
Valavan    Man
Azma       Woman
Kiran      Either
pumpkin    Either
பாத்திரம்    Either

[Diagram: columns "Tester" and "Coder"; the table is split 80% / 20%; the parts are labelled "Training set", "Dev set" and "Secret test set".]
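One way to read the diagram is that the tester holds back 20% of the rows as a secret test set and hands the remaining 80% to the coder, who divides it again into a training set and a dev set. That reading is an assumption, as are the name `splitData` and the dev fraction used below. The sketch also assumes the rows have already been shuffled, so the split itself stays a pure function.

```haskell
-- Split rows into (training, dev, secretTest).
-- testFrac is the share of all rows held back by the tester;
-- devFrac is the share of the remaining rows used as the dev set.
-- Assumes the rows are already in random order.
splitData :: Double -> Double -> [a] -> ([a], [a], [a])
splitData testFrac devFrac rows = (train, dev, test)
  where
    nTest = round (testFrac * fromIntegral (length rows)) :: Int
    (test, rest) = splitAt nTest rows
    nDev = round (devFrac * fromIntegral (length rest)) :: Int
    (dev, train) = splitAt nDev rest
```

For ten rows, `splitData 0.2 0.25` holds back two rows for the secret test set, uses two for the dev set and leaves six for training. The coder never looks at the secret test set, which is how it stays an honest check.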

Pause …

• All of this has been just about the data and the objective

• Most important and usually expensive part of the process

Modeling the “learner”

data Gender = Man | Woman
type Data = [(String, Gender)]
type Guesser = String -> Gender
type Learner = Data -> (Learner, Guesser)

This is a process
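The types above describe a process: each batch of data yields both a guesser and a new learner that can take the next batch. Haskell does not allow a type synonym to refer to itself, so a compilable version needs a wrapper. The sketch below uses a `newtype` for that. The field name `learn` and the example `roteLearner` are hypothetical and are not taken from the talk.

```haskell
data Gender = Man | Woman deriving (Eq, Show)

type Data = [(String, Gender)]
type Guesser = String -> Gender

-- The recursive Learner from the slide, wrapped in a newtype
-- because a type synonym cannot be recursive.
newtype Learner = Learner { learn :: Data -> (Learner, Guesser) }

-- A rote learner: it remembers every row it has seen and falls back to a
-- fixed default for unseen names. This is table lookup in Learner form.
roteLearner :: Gender -> Data -> Learner
roteLearner def seen = Learner step
  where
    step batch =
      let seen' = batch ++ seen
          guess name = maybe def id (lookup name seen')
      in (roteLearner def seen', guess)
```

Feeding `[("Sita", Woman)]` to `roteLearner Man []` gives a guesser that answers `Woman` for "Sita" and the default `Man` for anything else. The returned learner can then be fed further rows without losing what it has already seen.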

Features

• Are about increasing generalizability

• … by reducing the space of data

• The smaller the data space, the easier it is to collect enough data, and the easier it is to test.

• Capture existing domain understanding

Ex: “Kumar” → “ar”
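The example maps a name to its last two letters. A minimal sketch of such a feature extractor follows. The name `suffixFeature`, the lowercasing step and the configurable suffix length are assumptions added for illustration.

```haskell
import Data.Char (toLower)

type Feature = String

-- Last n characters of the name, lowercased.
-- Names shorter than n are returned whole.
suffixFeature :: Int -> String -> Feature
suffixFeature n name = map toLower (drop (length name - n) name)
```

`suffixFeature 2 "Kumar"` evaluates to `"ar"`. The space of all possible names collapses to a few hundred two-letter suffixes, which is far easier to cover with data.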

Introducing features

type Feature = String
type FeatureData = [(Feature, Gender)]
type FeatureExtractor = String -> Feature
type GuesserF = Feature -> Gender
type LearnerF = FeatureData -> (LearnerF, GuesserF)

Using features

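learner :: FeatureExtractor -> LearnerF -> Learner
learner feature learnerF samples =
    let trans (name, g) = (feature name, g)            -- turn a labelled name into a labelled feature
        (lf2, fpred) = learnerF (map trans samples)    -- learn on features; get the next learner and a feature guesser
        pred name = fpred (feature name)               -- guess for a name via its feature
    in (learner feature lf2, pred)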
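The `learner` definition above can be tried out with a minimal sketch like the one below. It rests on several assumptions that are not on the slides: the recursive type synonyms are replaced by newtypes (Haskell rejects recursive `type` synonyms), the feature is a single character, the argument is called `samples` because `data` is a reserved word, and `majorityF` and `lastLetter` are hypothetical helpers invented for illustration.

```haskell
data Gender = Woman | Man deriving (Eq, Show)

type Name    = String
type Feature = Char                      -- assumption: one character per name
type FeatureExtractor = Name -> Feature

-- Newtypes stand in for the slide's recursive type synonyms.
newtype Learner  = Learner  { runLearner  :: [(Name, Gender)]    -> (Learner,  Name    -> Gender) }
newtype LearnerF = LearnerF { runLearnerF :: [(Feature, Gender)] -> (LearnerF, Feature -> Gender) }

-- Same shape as the slide's `learner`, with the wrappers made explicit.
learner :: FeatureExtractor -> LearnerF -> Learner
learner feature (LearnerF learnF) = Learner $ \samples ->
  let trans (name, g) = (feature name, g)
      (lf2, fpred)    = learnF (map trans samples)
      predict name    = fpred (feature name)
  in (learner feature lf2, predict)

-- Hypothetical feature learner: keeps every sample seen so far and guesses
-- the majority gender for a feature (Man when unseen or tied).
majorityF :: [(Feature, Gender)] -> LearnerF
majorityF seen = LearnerF $ \batch ->
  let seen' = batch ++ seen
      guess f =
        let women = length [() | (f', Woman) <- seen', f' == f]
            men   = length [() | (f', Man)   <- seen', f' == f]
        in if women > men then Woman else Man
  in (majorityF seen', guess)

-- Hypothetical feature extractor: the last letter of the name.
lastLetter :: FeatureExtractor
lastLetter name = if null name then ' ' else last name
```

Calling `runLearner (learner lastLetter (majorityF [])) batch` returns a pair: a learner that can be fed further batches, and a guesser for names. The feature extractor is the only part that knows about names, which is what lets the feature-level learner be swapped independently.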

Code walkthrough

The learning cycle

Try something

Measure outcome against expectation

Adjust parameters

(until happy)

type LearnerP = (Params, ErrorFn, UpdateFn)

type ErrorFn = Data -> Params -> Error      -- calculates how we're doing

type UpdateFn = Error -> Params -> Params   -- calculates what to try next
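One way to read these three types is as the ingredients of a "try, measure, adjust, until happy" loop. The sketch below is only an illustration under stated assumptions: `Params` is a single weight `w` in `y = w * x`, `Error` is the mean signed residual, and `meanResidual`, `nudge`, and `train` are hypothetical names that do not appear in the talk. The update rule shown only makes sense when all inputs are positive.

```haskell
type Params = Double               -- assumption: one weight w in y = w * x
type Data   = [(Double, Double)]   -- assumption: (input, expected output) pairs
type Error  = Double               -- assumption: mean signed residual

type ErrorFn  = Data -> Params -> Error
type UpdateFn = Error -> Params -> Params
type LearnerP = (Params, ErrorFn, UpdateFn)

-- Measure outcome against expectation: average of (prediction - target).
meanResidual :: ErrorFn
meanResidual samples w
  | null samples = 0
  | otherwise    = sum [w * x - y | (x, y) <- samples] / fromIntegral (length samples)

-- Adjust parameters: move w against the sign of the error.
-- Only valid when every input x is positive.
nudge :: Double -> UpdateFn
nudge rate err w = w - rate * err

-- Repeat until happy (error within tolerance) or out of steps.
train :: Double -> Int -> Data -> LearnerP -> Params
train tolerance maxSteps samples (params, errorFn, updateFn) = go maxSteps params
  where
    go n p
      | n <= 0 || abs err < tolerance = p
      | otherwise                     = go (n - 1) (updateFn err p)
      where err = errorFn samples p
```

For example, `train 1e-6 1000 [(1, 2), (2, 4), (3, 6)] (0, meanResidual, nudge 0.1)` starts from `w = 0` and moves towards `w = 2`. Keeping `ErrorFn` and `UpdateFn` separate means the "test" and the "fix" can be changed independently.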

The road ahead

Predict

Learn

“Differentiable programming”

A taste of DP …
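As a rough illustration of the idea (not the demo shown in the talk), the sketch below uses forward-mode automatic differentiation with dual numbers: an ordinary function written against `Num` is differentiated without writing its derivative by hand, and that derivative drives the "adjust parameters" step. The names `Dual`, `diff`, `loss`, and `descend` are hypothetical, and the loss is an assumed squared error for the single sample `x = 3, y = 6` under the model `y = w * x`.

```haskell
-- A dual number carries a value together with its derivative.
data Dual = Dual { primal :: Double, tangent :: Double } deriving (Eq, Show)

instance Num Dual where
  Dual a a' + Dual b b' = Dual (a + b) (a' + b')
  Dual a a' - Dual b b' = Dual (a - b) (a' - b')
  Dual a a' * Dual b b' = Dual (a * b) (a * b' + a' * b)   -- product rule
  negate (Dual a a')    = Dual (negate a) (negate a')
  abs (Dual a a')       = Dual (abs a) (a' * signum a)
  signum (Dual a _)     = Dual (signum a) 0
  fromInteger n         = Dual (fromInteger n) 0           -- constants have zero derivative

-- Derivative of f at x: seed the tangent with 1 and read it back.
diff :: (Dual -> Dual) -> Double -> Double
diff f x = tangent (f (Dual x 1))

-- Assumed loss: squared error of y = w * x on the sample (3, 6).
loss :: Dual -> Dual
loss w = (w * 3 - 6) ^ (2 :: Int)

-- Plain gradient descent using the automatically computed derivative.
descend :: Double -> Int -> (Dual -> Dual) -> Double -> Double
descend rate steps f w0 = iterate step w0 !! steps
  where step w = w - rate * diff f w
```

Here `descend 0.05 100 loss 0` moves `w` towards 2, the value at which the loss is zero. The point of the sketch is that `loss` is ordinary code; the derivative falls out of how the arithmetic is interpreted.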

Some takeaways

• It's all about the data. Bad data = bad model.

• Your data, training plan, tests and metrics must come first. Just like TDD, but extreme. Ethics must also be factored in.

• Features (especially “embeddings”) are the API between ML models.

• Function factorization can help you understand learning processes.

• Automatic differentiation is helping ML adoption, so pay attention.

How my cousin did it

gender name =
    if endswith "a" name then Woman
    else if endswith "i" name then Woman
    else Man
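In the spirit of the talk, even a hand-written rule like this can be scored against labelled data. The sketch below makes the rule compilable and adds a hypothetical `accuracy` helper. It assumes `endswith` means `isSuffixOf` from `Data.List` (it is not a Prelude function), and the `Gender` type and any names passed in are illustrative only.

```haskell
import Data.List (isSuffixOf)

data Gender = Woman | Man deriving (Eq, Show)

-- Assumption: the slide's `endswith suffix name` behaves like isSuffixOf.
endswith :: String -> String -> Bool
endswith = isSuffixOf

-- The rule from the slide, written with guards.
gender :: String -> Gender
gender name
  | endswith "a" name = Woman
  | endswith "i" name = Woman
  | otherwise         = Man

-- Hypothetical helper: fraction of labelled names a guesser gets right.
accuracy :: (String -> Gender) -> [(String, Gender)] -> Double
accuracy guess labelled
  | null labelled = 0
  | otherwise     = fromIntegral hits / fromIntegral (length labelled)
  where hits = length [() | (name, g) <- labelled, guess name == g]
```

Because `accuracy` takes any `String -> Gender` function, the same measurement applies to a learned guesser, which makes the fixed rule a convenient baseline to compare against.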

Thanks for listening!

References

Data sets

https://github.com/ellisbrown/name2gender

https://github.com/vsant/indian-name-classifier