Data Science demystified

Posted on 28-Jan-2018


1IPL CONFIDENTIAL

Data Science demystified

Murthy Kolluru, Ph.D.


How good is my customer?

• Within the first few weeks of engagement, figure out how much revenue can be expected in the first two years.

• 100,000 customers over 5 years and a lot of data

• POS data, play data, demographics

• Over 50 attributes


[Figure: Attribute 1 vs. Attribute 2]


Probability of being high value = -0.25 * age + 0.34 * income + 0.78 * number of kids

[Figure: inputs Age, Income and Kids combined into a single Output]
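The formula above is a linear model: each attribute is multiplied by a weight and the products are summed. A minimal sketch follows, assuming the inputs are already scaled (the slide does not say how age or income were normalized). A raw weighted sum is not confined to [0, 1], so reading it as a probability usually implies a squashing step; the logistic function used here is an assumption, not something shown on the slide.

```python
import math

# Weights copied from the slide. How age and income were scaled before fitting
# is not stated, so the inputs here are assumed to be pre-normalized numbers.
WEIGHTS = {"age": -0.25, "income": 0.34, "kids": 0.78}


def linear_score(age: float, income: float, kids: float) -> float:
    """Weighted sum of the three attributes, as written on the slide."""
    return (WEIGHTS["age"] * age
            + WEIGHTS["income"] * income
            + WEIGHTS["kids"] * kids)


def to_probability(score: float) -> float:
    """Assumption: map the raw score into (0, 1) with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))


score = linear_score(age=1.0, income=1.0, kids=1.0)
print(round(score, 2), round(to_probability(score), 3))
```

The negative age weight means that, all else equal, an older customer scores lower; each weight's sign and size is directly readable, which is what makes this kind of model easy to explain.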


[Figure: Attribute 1 vs. Attribute 2]


If parents are old, the number of kids is less than 2 and income is less than $10K, then the value is low.
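Unlike a weighted sum, a rule like this carves the attribute space into boxes using thresholds, which is the kind of rule a decision tree learns. A minimal sketch, assuming a hypothetical cut-off `OLD_AGE = 60` because the slide says "old" without a number; when the rule does not fire, the function returns `None` since the slide says nothing about those cases.

```python
from typing import Optional

# Hypothetical cut-off: the slide says "old" without giving an age.
OLD_AGE = 60
LOW_INCOME = 10_000  # the slide's $10K threshold


def rule_value(parent_age: float, kids: int, income: float) -> Optional[str]:
    """Return "low" when the slide's rule fires, otherwise None (rule is silent)."""
    if parent_age >= OLD_AGE and kids < 2 and income < LOW_INCOME:
        return "low"
    return None


print(rule_value(65, 1, 8_000))
print(rule_value(35, 1, 8_000))
```

The first call satisfies all three conditions and returns `"low"`; the second fails the age test, so the rule does not apply and `None` is returned.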


[Figure: Attribute 1 vs. Attribute 2]


[Figure: Attribute 1 vs. Attribute 2]


Simplest form of non-linearity


By carefully combining simple non-linearities, you can get highly non-linear curves.
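The slide image for "simplest form of non-linearity" is missing; a minimal sketch, assuming it is a step (threshold) function like the one in the perceptron. Subtracting two shifted steps gives a rectangular bump, and adding many narrow bumps, each scaled to the target's height at that point, traces out a curve. The `staircase` helper is hypothetical and only illustrates the combining idea; neural networks use smoother units such as sigmoids or ReLUs, but the principle is the same.

```python
import math


def step(x: float) -> float:
    """Threshold unit: 0 for negative input, 1 otherwise."""
    return 1.0 if x >= 0 else 0.0


def bump(x: float, left: float, right: float, height: float = 1.0) -> float:
    """Two shifted steps subtracted: `height` on [left, right), 0 elsewhere."""
    return height * (step(x - left) - step(x - right))


def staircase(f, lo: float, hi: float, n: int):
    """Approximate f on [lo, hi) by summing n bumps, each at f's midpoint value."""
    width = (hi - lo) / n
    pieces = []
    for i in range(n):
        left = lo + i * width
        right = lo + (i + 1) * width  # same expression as the next bump's left edge
        pieces.append((left, right, f((left + right) / 2)))
    return lambda x: sum(bump(x, a, b, h) for a, b, h in pieces)


approx_sin = staircase(math.sin, 0.0, math.pi, 200)
print(approx_sin(1.0), math.sin(1.0))
```

With more bumps the steps get narrower and the staircase hugs the curve more closely.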


Finally, the mind is demystified!

"Rival," The New Yorker, December 6, 1958, p. 44

ABSTRACT: Talk story about the perceptron, a new electronic brain which hasn't been built, but which has been successfully simulated on the I.B.M. 704. Talk with Dr. Frank Rosenblatt, of the Cornell Aeronautical Laboratory, who is one of the two men who developed the prodigy; the other man is Dr. Marshall C. Yovits, of the Office of Naval Research, in Washington. Dr. Rosenblatt defined the perceptron as the first non-biological object which will achieve an organization of its external environment in a meaningful way. It interacts with its environment, forming concepts that have not been made ready for it by a human agent. If a triangle is held up, the perceptron's eye picks up the image & conveys it along a random succession of lines to the response units, where the image is registered. It can tell the difference betw. a cat and a dog, although it wouldn't be able to tell whether the dog was to the left or right of the cat. Right now it is of no practical use, Dr. Rosenblatt conceded, but he said that one day it might be useful to send one into outer space to take in impressions for us.
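At its core, the perceptron described above is a weighted sum followed by a threshold, and it learns by nudging its weights whenever it misclassifies an example. A minimal sketch of the textbook error-correction rule, not a model of Rosenblatt's hardware with its random connections; the function names are illustrative. A single unit like this can only separate classes with a straight line (it cannot learn XOR, for example), which is why combining many simple non-linear units matters.

```python
def predict(weights, bias, x):
    """Weighted sum followed by a hard threshold."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total >= 0 else 0


def train_perceptron(samples, labels, epochs=100, lr=1.0):
    """Rosenblatt-style error-correction rule for labels in {0, 1}."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            error = y - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
        if mistakes == 0:  # every sample classified correctly, stop early
            break
    return weights, bias


# Logical AND is linearly separable, so the rule converges.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 0, 0, 1]
w, b = train_perceptron(X, y)
print([predict(w, b, x) for x in X])
```

After training, the predictions reproduce the AND labels.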


• Black-box models only solve part of the problem

• How do we get explainability?


[Figure: Attributes 1, 2, 4 and 5]


What we did

• Created more features, for example:

• Did they have a favorite game?

• How are the kids' ages distributed?

• When did the first sale happen?

• …
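Each question in the list above turns into one or more numeric columns. A minimal sketch for a single customer, assuming a hypothetical record layout (a list of games played, a list of sale dates, a list of kids' ages); the actual data schema and feature definitions used in the project are not given.

```python
from collections import Counter
from datetime import date


def engineer_features(games_played, sale_dates, kids_ages):
    """Derive extra features for one customer from raw records.

    games_played: list of game names, one entry per play session (hypothetical)
    sale_dates:   list of datetime.date objects, one per sale (hypothetical)
    kids_ages:    list of the customer's children's ages in years (hypothetical)
    """
    counts = Counter(games_played)
    total_plays = sum(counts.values())
    top_game, top_plays = counts.most_common(1)[0] if counts else (None, 0)
    first_sale = min(sale_dates) if sale_dates else None
    return {
        # Did they have a favorite game?
        "top_game": top_game,
        "top_game_share": top_plays / total_plays if total_plays else 0.0,
        "distinct_games": len(counts),
        # How are the kids' ages distributed?
        "youngest_kid": min(kids_ages) if kids_ages else None,
        "kids_age_range": max(kids_ages) - min(kids_ages) if kids_ages else None,
        # When did the first sale happen?
        "first_sale_month": first_sale.month if first_sale else None,
    }


# Toy record, not real data.
features = engineer_features(["A", "A", "B"], [date(2013, 12, 5), date(2014, 2, 1)], [3, 7])
print(features)
```

Each derived value becomes an extra column next to the original attributes, which is what makes it possible to slice customers by them later.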


Patterns

• Favorite – played one game more than 50% of the time

• Uniform – played multiple games
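The two patterns can be assigned mechanically from play history. A minimal sketch, assuming plays are stored as hypothetical `(date, game)` pairs and that the labels are mutually exclusive, with "favorite" checked first. If no game exceeds 50% of plays, play is necessarily spread across several games, so the remaining case is labeled "uniform". The 30-day window matches the finding on uniform customers in their first 30 days; the function name and parameters are illustrative.

```python
from collections import Counter
from datetime import date, timedelta


def play_pattern(plays, start, window_days=30, threshold=0.5):
    """Label a customer's play in the first `window_days` after `start`.

    plays: list of (date, game_name) tuples (hypothetical schema).
    "favorite": one game accounts for more than `threshold` of plays.
    "uniform":  otherwise, i.e. play is spread over several games.
    """
    end = start + timedelta(days=window_days)
    counts = Counter(game for day, game in plays if start <= day < end)
    total = sum(counts.values())
    if total == 0:
        return "no play"
    if max(counts.values()) / total > threshold:
        return "favorite"
    return "uniform"


# Toy history: the March session falls outside the 30-day window and is ignored.
start = date(2014, 1, 1)
plays = [(date(2014, 1, 2), "A"), (date(2014, 1, 5), "B"),
         (date(2014, 1, 9), "C"), (date(2014, 3, 1), "A")]
print(play_pattern(plays, start))
```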


Customers who are uniform in the first 30 days are, on average, sticky and generate more revenue over two years.
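"On average" here means comparing a group mean, which is the same thing a pivot table does in a spreadsheet. A minimal sketch, assuming a hypothetical list of `(pattern, two_year_revenue)` pairs, one per customer; the toy numbers below are made up. A gap between group means shows an association, not that the pattern causes the extra revenue.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows):
    """Average a numeric value per group label.

    rows: iterable of (group_label, value) pairs, e.g. (30-day pattern,
    two-year revenue) for each customer (hypothetical layout).
    """
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    return {label: mean(values) for label, values in groups.items()}


# Toy numbers, not results from the talk.
rows = [("uniform", 120.0), ("uniform", 80.0), ("favorite", 60.0)]
print(mean_by_group(rows))
```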


First sale: Dec and Jan win!


Upsell? Dec and Jan lose big!
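One way to check a "wins on the first sale, loses on upsell" pattern is to split each customer's revenue into the first sale and everything after it, then compare December/January starters with the rest. A minimal sketch, assuming a hypothetical `(first_sale_month, first_sale_revenue, total_revenue)` record per customer and defining upsell as total revenue minus the first sale; the talk does not state how upsell was measured.

```python
from collections import defaultdict
from statistics import mean

HOLIDAY_MONTHS = {12, 1}  # December and January, per the slide


def first_sale_vs_upsell(customers):
    """Compare first-sale and upsell revenue for Dec/Jan starters vs. the rest.

    customers: iterable of (first_sale_month, first_sale_revenue, total_revenue).
    Assumption: upsell = total revenue over the period minus the first sale.
    """
    groups = defaultdict(lambda: {"first_sale": [], "upsell": []})
    for month, first, total in customers:
        key = "Dec/Jan" if month in HOLIDAY_MONTHS else "other"
        groups[key]["first_sale"].append(first)
        groups[key]["upsell"].append(total - first)
    return {
        key: {metric: mean(values) for metric, values in metrics.items()}
        for key, metrics in groups.items()
    }


# Toy numbers, not results from the talk.
data = [(12, 300.0, 350.0), (1, 200.0, 250.0), (5, 100.0, 400.0)]
print(first_sale_vs_upsell(data))
```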


Action Points

• A great model on simple, incomplete data almost always loses to a simple, incomplete model on great data

• Pick unsolved problems in your business where you have some past data

• Create as many additional factors as you can from the data

• View it from multiple angles in Excel

• You will most likely have some aha moments in store!


There will be a shortage of 100,000 data scientists and 1,000,000 data-smart managers by 2020 (McKinsey)


IPL’s Big Data Analytics Track

• Architecting data science solutions & products

• Hands-on model building

• Data visualizations and storytelling

• Complexities in data sourcing, privacy, security


THANK YOU

11/29/2014