Data monetization

S ANAND, CHIEF DATA SCIENTIST, GRAMENER. MONETISING DATA: REMOVING YOUR MENTAL HURDLES

Transcript of Data monetization

Page 1: Data monetization

S ANAND, CHIEF DATA SCIENTIST, GRAMENER

MONETISING DATA

REMOVING YOUR MENTAL HURDLES

Page 2: Data monetization

DATA
ANALYSIS
VISUALS
INSIGHTS
REPORTS
EXPLORATION

IS EVERYWHERE

Page 3: Data monetization

DATA
ANALYSIS
VISUALS
INSIGHTS
REPORTS
EXPLORATION

IS EVERYWHERE

COMMON COMPLAINT #1

WE DON’T HAVE DATA

Page 4: Data monetization

We have internal information. Getting information from outside is our challenge. There’s no way of doing that.

– Senior Editor, Leading Media Company

Page 5: Data monetization

India’s religions

Page 6: Data monetization

United Kingdom’s religions

Page 7: Data monetization
Page 8: Data monetization

UNCOVER YOUR DARK DATA

Source: http://www.patrickcheesman.com/dark-data-problems-and-solutions/

• INACCESSIBLE data (e.g. technology is outdated)
• FORGOTTEN data (e.g. collected, but not actively used)
• UNCOLLECTED data (e.g. information exists, not digitized)
• SINGLE PURPOSE data (e.g. used for a specific purpose)

Page 9: Data monetization

We’ve used network diagrams to detect terrorism, corporate fraud, product affinities and behavioural customer segmentation

Page 10: Data monetization

AUGMENT YOUR DATA SOURCES

DATA IS EVERYWHERE

COMMON COMPLAINT #1

WE DON’T HAVE DATA

COMMON COMPLAINT #2

THE DATA ISN’T STRUCTURED

CRM DATA, SALES DATA, PRICING DATA, CALL RECORDS, WEB LOG DATA, VENDOR INVOICES, SOCIAL MEDIA DATA, CLICKTHROUGH DATA, COMPETITOR RESEARCH, CUSTOMER TRANSACTIONS, …

CENSUS DATA, E-COMMERCE PRICES, COMMODITY PRICES, STOCK MARKET DATA, FINANCIAL REPORTING, SOCIAL MEDIA DATA, MOBILE PENETRATION, AADHAR DATA, COURT CASE BRIEFS, SHAPE FILES, …

Page 11: Data monetization

How does the Mahabharata, one of the largest epics at 1.8 million words, lend itself to text analytics?

Can this ‘unstructured data’ be processed to extract analytical insights?

What does sentiment analysis of this tome convey?

Is there a better way to explore relations between characters?

How can closeness of characters be analysed & visualized?

Visualising the Mahabharata
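One simple way to approach the closeness question is to count how often two characters are named in the same passage. The sketch below is an assumption about method, not necessarily the approach used for this visual. The function name `cooccurrence` and the sample passages are made up for illustration and are not text from the epic.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurrence(passages, characters):
    """Count how many passages mention each pair of characters together."""
    pairs = Counter()
    for passage in passages:
        # Lower-cased set of words in this passage
        words = set(re.findall(r"[a-z]+", passage.lower()))
        # Characters named in this passage, sorted so each pair has one key
        present = sorted(c for c in characters if c.lower() in words)
        pairs.update(combinations(present, 2))
    return pairs


# Illustrative passages only (hypothetical, not quoted from the Mahabharata)
passages = [
    "Arjuna spoke to Krishna before the battle.",
    "Krishna counselled Arjuna on duty.",
    "Bhima fought Duryodhana with a mace.",
    "Arjuna and Bhima returned to the camp.",
]
characters = ["Arjuna", "Krishna", "Bhima", "Duryodhana"]

closeness = cooccurrence(passages, characters)
for pair, count in closeness.most_common():
    print(pair, count)
```

The resulting pair counts can be used as edge weights in a network diagram: the more passages two characters share, the closer they are drawn.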

Page 12: Data monetization

“Can we help CFOs understand what questions are being asked by investors and analysts during earnings releases? How is this different from the competition?”

– Product Head, Global Financial Services Firm

Page 13: Data monetization

WHAT DO FINANCIAL ANALYSTS ASK IBM VS MSFT?

Page 14: Data monetization

DATA IS EVERYWHERE

EXTRACT THE METADATA

AUGMENT YOUR DATA SOURCES

COMMON COMPLAINT #2

THE DATA ISN’T STRUCTURED

COMMON COMPLAINT #3

THE DATA ISN’T RICH / CLEAN

COMMON: WHO, WHAT, WHEN, WHERE
TEXT: TEXT KEYWORDS, SENTIMENT
IMAGE: VISUAL RECOGNITION
AUDIO / CALLS: TRANSCRIPTS, MOOD ANALYSIS

Page 15: Data monetization

“Can we get the results of every single election in history, and create a portal to visualize these results?”

– Rajdeep Sardesai, CNN-IBN

Page 16: Data monetization

The PDF files have a reasonably clear structure

Page 17: Data monetization

… that translates into text that can be parsed

Page 18: Data monetization

Not every spelling error is easily identifiable by the first letter

Page 19: Data monetization

… with several names spelt wrong

These are, in fact, two different constituencies
But these are exactly the same
... and so are these
I’ve no idea if these are 2, or 3, constituencies!

Page 20: Data monetization

… with the ability for the system to correct errors automatically
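As a rough illustration of automatic correction, misspelt names can be matched against a master list using string similarity. This is a minimal sketch using Python's standard `difflib`, not the system described in the talk. The helper `correct_name`, the master list and the cutoff of 0.8 are all assumptions chosen for the example.

```python
from difflib import get_close_matches


def correct_name(name, canonical, cutoff=0.8):
    """Return the closest canonical spelling, or None if nothing is close enough."""
    matches = get_close_matches(name.upper(), canonical, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical master list and misspelt inputs, for illustration only
canonical = ["CHENNAI NORTH", "CHENNAI SOUTH", "COIMBATORE", "TIRUCHIRAPPALLI"]

print(correct_name("Coimbatoor", canonical))      # close to COIMBATORE
print(correct_name("Tiruchirapalli", canonical))  # close to TIRUCHIRAPPALLI
print(correct_name("Madurai", canonical))         # nothing close enough
```

A high cutoff helps keep genuinely different constituencies with similar names apart, but borderline cases (two constituencies or three?) still need a person to review them.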

Page 21: Data monetization
Page 22: Data monetization

DATA IS EVERYWHERE

TRANSFORM THE DATA & ENRICH IT

EXTRACT THE METADATA

AUGMENT YOUR DATA SOURCES

COMMON COMPLAINT #3

THE DATA ISN’T RICH / CLEAN

Page 23: Data monetization

DATA
ANALYSIS
VISUALS
INSIGHTS
REPORTS
EXPLORATION

IS EVERYWHERE

COMMON COMPLAINT #1

WE DON’T HAVE THE TOOLS

Page 24: Data monetization

LET’S LOOK AT 15 YEARS OF US BIRTH DATA

This is a dataset (1975 – 1990) that has been around for several years, and has been studied extensively. Yet, a visualization can reveal patterns that are neither obvious nor well known.

For example,
• Are birthdays uniformly distributed?
• Do doctors or parents exercise the C-section option to move dates?
• Is there any day of the month that has unusually high or low births?
• Are there any months with relatively high or low births?

More births / Fewer births … on average, for each day of the year (from 1975 to 1990)
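The numbers behind a day-of-year view like this are simple averages: for each calendar day, the mean number of births across all years. A minimal sketch follows, assuming the data arrives as (date, births) pairs. The function name and the sample counts are hypothetical and are not figures from the US dataset.

```python
from collections import defaultdict
from datetime import date


def average_births_by_day(records):
    """Average daily births for each (month, day), across all years in the data.

    `records` is an iterable of (date, births) pairs, one per calendar day.
    """
    totals = defaultdict(int)
    years = defaultdict(int)
    for day, births in records:
        key = (day.month, day.day)
        totals[key] += births
        years[key] += 1
    return {key: totals[key] / years[key] for key in totals}


# Made-up counts for illustration only
records = [
    (date(1975, 7, 4), 8000), (date(1976, 7, 4), 8200),
    (date(1975, 7, 5), 9500), (date(1976, 7, 5), 9700),
]

averages = average_births_by_day(records)
print(averages[(7, 4)], averages[(7, 5)])
```

Each (month, day) average then becomes one coloured cell in the calendar grid.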

Page 25: Data monetization

THE PATTERN IN INDIA IS QUITE DIFFERENT

This is a birth date dataset that’s obtained from school admission data for over 10 million children. When we compare this with births in the US, we see none of the same patterns.

For example,
• Is there an aversion to the 13th or is there a local cultural nuance?
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?

More births / Fewer births … on average, for each day of the year (from 2007 to 2013)

Page 26: Data monetization

THIS ADVERSELY IMPACTS CHILDREN’S MARKS

It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer.

Children “born” on the 1st, 5th, 10th, 15th etc. of the month tend to score lower marks on average.

• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?

Higher marks / Lower marks … on average, for children born on a given day of the year (from 2007 to 2013)

Page 27: Data monetization

DEPLOY MODERN TOOLS

ANALYSIS IS EVERYWHERE

COMMON COMPLAINT #1

WE DON’T HAVE THE TOOLS

COMMON COMPLAINT #2

WE DON’T GET INSIGHTS

R, SAS, EXCEL, PYTHON, DATABASES, ML SERVICES

Page 28: Data monetization

RESTAURANT FOUND AN UNUSUAL DIP IN SALES

A restaurant chain had data for every single transaction made over a few years. Plotting this as a time series showed them nothing unusual.

However, the same data on a calendar map reveals a very different story.

Specifically, at the bottom-left point-of-sale terminal, sales dip every Wednesday. At the bottom-right point-of-sale terminal, sales rise every Wednesday (almost as if to compensate for the loss).

It turns out that the manager closes the bottom-left counter every Wednesday afternoon due to a shortage of staff, assuming that it results in no loss of sales. There is, however, a net loss every Wednesday.
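The weekday pattern that a calendar map exposes can also be checked directly by averaging daily sales per terminal and weekday. The sketch below is illustrative only: the tuple layout, terminal names and amounts are assumptions, not the restaurant's data.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def average_sales_by_weekday(transactions):
    """Average daily sales for each (terminal, weekday) pair.

    `transactions` is an iterable of (date, terminal, amount) tuples.
    """
    # First total the sales per terminal per calendar day
    daily = defaultdict(float)
    for day, terminal, amount in transactions:
        daily[(terminal, day)] += amount
    # Then average those daily totals by weekday
    totals, counts = defaultdict(float), defaultdict(int)
    for (terminal, day), amount in daily.items():
        key = (terminal, WEEKDAYS[day.weekday()])
        totals[key] += amount
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up transactions: two Tuesdays and two Wednesdays in January 2024
transactions = [
    (date(2024, 1, 2), "bottom-left", 500.0),
    (date(2024, 1, 2), "bottom-right", 500.0),
    (date(2024, 1, 3), "bottom-left", 200.0),
    (date(2024, 1, 3), "bottom-right", 650.0),
    (date(2024, 1, 9), "bottom-left", 520.0),
    (date(2024, 1, 9), "bottom-right", 480.0),
    (date(2024, 1, 10), "bottom-left", 180.0),
    (date(2024, 1, 10), "bottom-right", 670.0),
]

averages = average_sales_by_weekday(transactions)
tuesday_total = averages[("bottom-left", "Tue")] + averages[("bottom-right", "Tue")]
wednesday_total = averages[("bottom-left", "Wed")] + averages[("bottom-right", "Wed")]
print(tuesday_total, wednesday_total)
```

In this toy data the bottom-right terminal picks up some of the Wednesday shortfall, but the combined Wednesday total is still lower than Tuesday's, which is the kind of net loss the slide describes.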

Page 29: Data monetization

DEPLOY MODERN TOOLS

ANALYSIS IS EVERYWHERE

TEST DATASETS, ANONYMISATION, EVALUATION CRITERIA, IMPROVEMENT METRIC, DATA INFRASTRUCTURE, MODEL INFRASTRUCTURE, VISUALS INFRASTRUCTURE

SET UP AN ML PLATFORM

INFRASTRUCTURE FOR RAPIDITY

COMMON COMPLAINT #2

MODELS ARE COMPLICATED

COMMON COMPLAINT #3

IMPLEMENTATIONS ARE SLOW

Page 30: Data monetization

Nation-wide statistics on behaviour and performance of students

Over 1,000 questions each administered to several lakhs of students across the country

Page 31: Data monetization

Having books improves reading ability

Having more books at home improves the performance of children when it comes to reading. (But children typically have only 1-10 books at home.)

… but the impact in social is less

While having more books improves the reading % score by 8%, it only increases the social % by 4%.

Page 32: Data monetization

Tuitions help very little

… but children of illiterate parents do worse

Page 33: Data monetization

Watching TV occasionally is good

Children who watch TV every day don’t do as well as children who watch TV only once a week.

But children who never watch TV fare the worst.

Watching TV every day helps improve children’s reading ability a little bit more…

… but mathematical abilities fall dramatically at that point

Page 34: Data monetization

Having educated parents helps most

This table shows the % improvement in score due to each factor

THIS TECHNIQUE CAN BE APPLIED TO ANY DATASET

Page 35: Data monetization

AUTOMATING ANALYSIS IN POULTRY FARMING

We group by every input factor

… and calculate the impact on every metric.

By moving from average to the best group, what’s the improvement?

The actual performance by each group is shown

0-3m    3-6m    6m-1yr    1-2 yrs    > 2 yrs
11      12.3    12.7      15.3       16.1

Our product can create visualisations from data automatically, without any supervision.

Above is an example. Irrespective of the dataset, this visual shows which input parameters have a significant impact on the output. Another such example is the cluster scatterplot.

Only significant results shown
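The group-means idea above can be sketched in a few lines: group by each input factor, take the mean of the metric per group, and compare the best group with the overall average. This is a minimal sketch, not the product's implementation. The column names (`tenure`, `feed`, `weight`) and the rows are hypothetical, and the significance filter mentioned on the slide is left out.

```python
from collections import defaultdict
from statistics import mean


def group_means(rows, factor, metric):
    """Mean of `metric` for each value of `factor`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row[metric])
    return {value: mean(values) for value, values in groups.items()}


def improvement_to_best(rows, factors, metric):
    """For each factor: the best group and the % gain over the overall average."""
    overall = mean(row[metric] for row in rows)
    result = {}
    for factor in factors:
        means = group_means(rows, factor, metric)
        best_value, best_mean = max(means.items(), key=lambda item: item[1])
        result[factor] = (best_value, 100 * (best_mean - overall) / overall)
    return result


# Hypothetical farm records, for illustration only
rows = [
    {"tenure": "0-3m", "feed": "A", "weight": 10.0},
    {"tenure": "0-3m", "feed": "B", "weight": 12.0},
    {"tenure": "> 2 yrs", "feed": "A", "weight": 16.0},
    {"tenure": "> 2 yrs", "feed": "B", "weight": 18.0},
]

impact = improvement_to_best(rows, ["tenure", "feed"], "weight")
for factor, (best, gain) in impact.items():
    print(f"{factor}: best group {best}, {gain:.1f}% above average")
```

Because the loop runs over every factor without knowing what the columns mean, the same code applies to any tabular dataset, which is what makes this kind of analysis easy to automate.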

Page 36: Data monetization

68% correlation between AUD & EUR

Plot of 6 month daily AUD - EUR values

Block of correlated currencies

… clustered hierarchically
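A correlation matrix like this can be reordered so that correlated currencies sit next to each other, which is what makes the blocks visible. The sketch below uses Pearson correlation and a plain average-linkage agglomerative clustering written out by hand. It is an assumption about method rather than the talk's implementation, and the series are made-up values, not real exchange rates.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cluster_order(series):
    """Order series names so that highly correlated ones end up adjacent.

    Repeatedly merges the two clusters with the highest average correlation.
    """
    corr = {frozenset(pair): pearson(series[pair[0]], series[pair[1]])
            for pair in combinations(series, 2)}
    clusters = [[name] for name in series]

    def similarity(a, b):
        total = sum(corr[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: similarity(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters[0]


# Made-up daily values for illustration only
series = {
    "AUD": [1.0, 1.1, 1.2, 1.15, 1.3],
    "JPY": [5.0, 4.8, 4.9, 4.6, 4.5],
    "EUR": [2.0, 2.1, 2.25, 2.2, 2.35],
    "CHF": [3.0, 2.9, 2.95, 2.8, 2.7],
}

order = cluster_order(series)
print(order)
```

Drawing the correlation matrix with rows and columns in this order places related currencies side by side, so each correlated group shows up as a block along the diagonal.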

Page 37: Data monetization

Restaurant: Product Sales Correlation

Page 38: Data monetization

Restaurant: Product sales correlation

Page 39: Data monetization

DEPLOY MODERN TOOLS

ANALYSIS IS EVERYWHERE

CLUSTER PLOTS, CORRELATIONS, CROSS TABULATION, GROUP MEANS, KEYWORD EXTRACTION, NETWORK ANALYSIS, SANKEY DRILLDOWNS, SENTIMENT ANALYSIS, …

INFRASTRUCTURE FOR RAPIDITY

COMMON COMPLAINT #3

IMPLEMENTATIONS ARE SLOW

BUILD AND USE TEMPLATES

Page 40: Data monetization

DATA
ANALYSIS
VISUALS
INSIGHTS
REPORTS
EXPLORATION

IS EVERYWHERE

Page 41: Data monetization

S ANAND, CHIEF DATA SCIENTIST, GRAMENER

THE CAPABILITIES ARE IN YOUR REACH TODAY

EXPLORE THE ART OF DATA