L16. Useful Things to Know about Machine Learning

A Few Useful Things To Know About Machine Learning

Charles Parker, Allston Trading

Machine Learning Out of the Lab

• Eventually, you’ll want to apply machine learning in context

• Should you?

• What does success look like?

• What else should you be worried about?

• I am eminently qualified to opine on all of these topics (not really)

Data Science is Science

• Do you think of yourself as a scientist?

• How can we become better scientists?

• Might as well take some advice from the best

Edsger W. Dijkstra (1930 - 2002)

• Turing Award Winner

• Inventor of the canonical shortest path and minimum spanning tree algorithms

• Seminal work on (among others):

• Concurrent and Distributed Computing

• Formal Methods

• Operating Systems

• Was, perhaps more than any one other person, responsible for turning computer science into an academic discipline (“Let the poorer mathematician study pure mathematics”)

Advice To A Young Scientist (EWD 1055A)

• Dijkstra was a prolific writer, and all of his writings are archived and numbered:

https://www.cs.utexas.edu/users/EWD

• EWD 1055A is informally known as “Advice to a Young Scientist”:

https://www.cs.utexas.edu/users/EWD/ewd10xx/EWD1055A.PDF

• Let’s take a few pages out of his book


Advice From Dijkstra

• Killing Ambitious Projects

• Ignoring the Lure of Complexity

• Finding Your Own Humility

• Avoiding Useless Projects

• Creating a Good Story

• Continuing To Learn

Killing Ambitious Projects

Before embarking on an ambitious project, try to kill it.

- Edsger Dijkstra

Time For Machine Learning!

• Imagine yourself a greenhorn data scientist

• Fresh-faced

• Full of optimism

• Asked to apply machine learning to some general problem. What do you do?

• Big bite

• Small bite

Four Use Cases For Machine Learning

• No human experts (protein folding)

• Humans can do f(x), but can’t explain how (character recognition)

• f(x) is changing all the time (market data)

• f(x) must be specialized many times (anything user specific)

• Which of those cases does your problem fall into?

• Where are the subproblems?

• What’s hard and what’s easy? (Watermarks vs. Handwriting)

The Joy of Hacking

• Do you really need to solve this with machine learning?

• It comes with overhead . . .

• . . . and you might do better without it!

• Even if you’re using machine learning, just hack

• You know a lot of fancy tricks

• You rarely have to explain positive results

• Something is better than nothing

Ignoring the Lure of Complexity

Don't get enamored with the complexities you have learned to live with (be they of your own making or imported). The lurking suspicion that something could be simplified is the world's richest source of rewarding challenges.

- Edsger Dijkstra

Simplicity As Value Added

• What can you replace?

• Spaghetti Code

• Processes?

• Removal of Drudgery (Like spreadsheets)

Simplicity of Construction

• Push around the complexity of the model

• Remember learning is a form of compression

• Don’t “learn to live” with your features!

• The raw data is often back there somewhere

• Deep learning is an extreme example

Finding Your Own Humility

Never tackle a problem of which you can be pretty sure that (now or in the near future) it will be tackled by others who are, in relation to that problem, at least as competent and well-equipped as you are.

- Edsger Dijkstra

The Right Straw-man

• An important aspect of academia is finding the “right straw-man”

• What is your straw-man?

• You might have to hack a solution

• It might be an actual person

• Embrace the AB test! Don’t be satisfied with cross-validation!
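
One way to make the straw-man concrete is to score the dumbest possible predictor on the same test set as the model. The sketch below is only an illustration: the labels, the predictions, and the function names are hypothetical, and "always predict the most common training label" is just one possible straw-man.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == most_common_label)
    return hits / len(test_labels)


def accuracy(predictions, test_labels):
    """Fraction of predictions that match the true labels."""
    hits = sum(1 for p, y in zip(predictions, test_labels) if p == y)
    return hits / len(test_labels)


# Hypothetical, heavily imbalanced data (made up for illustration).
train_labels = ["ok"] * 90 + ["fraud"] * 10
test_labels = ["ok"] * 18 + ["fraud"] * 2

# Hypothetical model output: one false alarm, one catch, one miss.
model_predictions = ["ok"] * 17 + ["fraud"] + ["fraud", "ok"]

baseline = majority_baseline_accuracy(train_labels, test_labels)
model = accuracy(model_predictions, test_labels)
print(f"straw-man accuracy: {baseline:.2f}")
print(f"model accuracy:     {model:.2f}")
print(f"lift over straw-man: {model - baseline:+.2f}")
```

In this made-up case the model's accuracy sounds respectable in isolation but is no better than the straw-man, which is the comparison that matters.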

Showing Improvement

• If you’ve got the courage and the means to do an AB test, you’re looking for improvement

• It’s often not that simple

• How much is there?

• How stable is it?

• How long, pessimistically, will it take to show up?
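
As a rough sketch of the "how much is there?" question, the snippet below runs a pooled two-proportion z-test on the results of an AB test. It assumes a binary outcome (such as a conversion), independent samples, and counts large enough for the normal approximation; the counts and the function name are hypothetical and not from the talk.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (lift, z, one-sided p-value) for H1: rate_b > rate_a.

    Uses the pooled-variance normal approximation, which is only
    reasonable when both arms have plenty of successes and failures.
    """
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Upper-tail probability of a standard normal: P(Z >= z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return rate_b - rate_a, z, p_value


# Hypothetical counts: arm A is the straw-man, arm B is the new model.
lift, z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"lift={lift:.3f}  z={z:.2f}  one-sided p={p_value:.4f}")
```

A small p-value answers only the first question on the slide. Stability and time-to-show-up still require rerunning the comparison over different periods.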

Daily Sharpe Ratio

• Measure daily equity; the plot shows P(negative equity at day n)

• To give a sense of scale, a daily Sharpe ratio of 0.5 is considered spectacularly reliable.
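
The plot itself is not reproduced in this transcript, but a curve of that kind can be sketched under a simplifying assumption that is not stated in the talk: daily returns are independent and normally distributed with mean mu and standard deviation sigma. Cumulative equity after n days is then normal with mean n*mu and standard deviation sigma*sqrt(n), so P(negative equity at day n) = Phi(-S * sqrt(n)), where S = mu / sigma is the daily Sharpe ratio and Phi is the standard normal CDF. The function name below is hypothetical.

```python
import math


def prob_negative_equity(daily_sharpe, n_days):
    """P(cumulative equity < 0 after n_days), assuming i.i.d. normal daily returns."""
    z = -daily_sharpe * math.sqrt(n_days)
    # Standard normal CDF written with the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2))


for sharpe in (0.05, 0.1, 0.5):
    row = ", ".join(
        f"day {n}: {prob_negative_equity(sharpe, n):.3f}" for n in (1, 16, 64, 256)
    )
    print(f"daily Sharpe {sharpe}: {row}")
```

Under these assumptions, a daily Sharpe ratio of 0.5 leaves roughly a 31% chance of being down after one day and roughly a 2% chance after 16 days. Much smaller ratios take far longer to separate from zero, which is the pessimistic "how long will it take to show up?" question.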

Avoiding Useless Projects

Avoid involvement in projects so vague that their failure could remain invisible: such involvement tends to corrupt one's scientific integrity.

- Edsger Dijkstra

Low-Hanging Fruit

• Many of us in machine learning make a career of being a non-expert (Tom Dietterich)

• Start by finding the best combination of easy and big win you can find

• Might not be super glamorous

• Might take some time

• But it’s safer (see: modeling the whole market)

• And people see value right away

• Don’t be afraid to push back against the opposite

Metrics Design

• First, carefully design metrics

• Talk to experts (even if it’s scary)

• Iterate (even if it’s difficult)

• Then don’t trust them

• You won’t have gotten it right

• Monitor closely and find ways to improve

The Opposite of Science

• Remember when I said data science was science? I lied.

• The “multiple comparisons problem” is much, much worse

• This just means you have to muster even more scientific objectivity

• What is your edge (you are doing an AB test, right)?

• How careful do you have to be to maintain it?

• This should influence how willing you are to “just push it out”
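
To see why the multiple comparisons problem bites so hard, consider how quickly the chance of at least one spurious "win" grows with the number of things tried. The sketch below assumes the comparisons are independent, which is rarely exactly true in practice. The Bonferroni threshold is one standard correction, included here for illustration and not something the talk prescribes; the function names are hypothetical.

```python
def familywise_error_rate(alpha, n_comparisons):
    """P(at least one false positive) across independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def bonferroni_threshold(alpha, n_comparisons):
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / n_comparisons


for k in (1, 5, 20, 100):
    print(
        f"{k:>3} comparisons: "
        f"P(any false positive)={familywise_error_rate(0.05, k):.3f}, "
        f"Bonferroni per-test level={bonferroni_threshold(0.05, k):.5f}"
    )
```

With 20 independent comparisons at the usual 0.05 level, the chance of at least one false positive is already about 64%. Trying many features, models, and metrics against the same data plays the same game.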

Creating a Good Story

Write as if your work is going to be studied by a thousand people.

- Edsger Dijkstra

Everybody Loves a Good Story

• A good chunk of data science is about telling stories

• Can you explain why something should be deployed?

• Can you summarize your model’s behavior?

• Sometimes stories are more valuable than models

• At LinkedIn, this is a lot of what they are (were) doing

Story #1: Print Shop Operator’s Dilemma

Story #2: Latent Dumpster Allocation

Continuing to Learn

Raise your standards as high as you can live with, avoid wasting your time on routine problems, and always try to work as closely as possible at the boundary of your abilities. Do this because it is the only way of discovering how that boundary should be moved forward.

- Edsger Dijkstra

Keep Learning

• Don’t get attached to your own ideas!

• Especially in this field

• Learn to embrace how wrong you certainly are

• Understand your limitations

• The danger of rolling-your-own

• The complexity it generates

Summing Up

• A lot of this is about finding where the value is for ML in some organization

• Locating the right problem

• Executing

• Showing the proof

• It’s still early days

• We’re all ambassadors

• If one of us wins, we all do
