Ayasdi with IHME data

Data has shape and shape has meaning™


• Overview of IRIS from Ayasdi
  • A tool for looking at large datasets and trying to find meaning

• Walking through an example of an Ayasdi analysis

Outline


• We are gathering more data all the time

What IRIS is for…


…and while data are often collected to address specific questions, the data may also hold additional insights


[Figure panels: Baseline; +Stim, Ab; CD]

“There isn’t a single story happening in your complex data” – Anthony Bak, Ayasdi

• IRIS combines topological math with a highly flexible and intuitive interface to analyze large datasets

• Creates different shapes that can be explored

• Ayasdi can be used on different kinds of high-complexity datasets:
  • Transcriptome profiling
  • Clinical data
  • Flow cytometry data
  • Financial data
  • Text
  • Etc.

That’s where we think IRIS from Ayasdi will help


• Concept: data has shape, based on how elements in the dataset are mathematically related to each other
  • For example, how are samples alike?

• IRIS takes the data, performs a mathematical transformation, and uses the output to group samples together and draw a picture

• This is done iteratively with different mathematical transformations to give multiple different views of the data’s shapes

• The shapes highlight possibly interesting parts of the dataset
  • In our case, disease or patient subsets

How does IRIS work?
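The slides do not describe IRIS's internal algorithms. As a rough illustration of the bullets above, here is a minimal sketch of the published Mapper construction from topological data analysis, on which Ayasdi's technology is reported to build. Treat that link, and every parameter below, as an assumption.

The steps are:
1. A "lens" function maps each sample to a number; this is the mathematical transformation.
2. Overlapping intervals cover the lens range.
3. The samples falling in each interval are clustered, and each cluster becomes a dot (node).
4. Nodes that share at least one sample are joined by a line.

The lens, the interval count, the overlap, and the simple single-linkage clustering with threshold `eps` are illustrative choices, not IRIS settings.

```python
import math
from itertools import combinations


def single_linkage(points, idx, eps):
    """Split the samples in `idx` into clusters chained together by distances <= eps."""
    clusters, unvisited = [], set(idx)
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            cluster |= near
            frontier.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Build a Mapper-style graph: nodes are sets of sample indices, edges join nodes sharing a sample."""
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0  # fall back to 1.0 if the lens is constant
    nodes = []
    for k in range(n_intervals):
        # Stretch each interval by overlap * width on both sides so neighbouring intervals overlap.
        start = lo + (k - overlap) * width
        end = lo + (k + 1 + overlap) * width
        members = [i for i, v in enumerate(values) if start <= v <= end]
        nodes.extend(single_linkage(points, members, eps))
    nodes = list(dict.fromkeys(nodes))  # the same cluster can turn up in two intervals
    edges = {(a, b) for a, b in combinations(range(len(nodes)), 2) if nodes[a] & nodes[b]}
    return nodes, edges


points = [(float(x), 0.0) for x in range(10)]  # toy data: ten samples on a line
nodes, edges = mapper(points, lens=lambda p: p[0])
```

On these ten evenly spaced toy points, using the first coordinate as the lens, the sketch yields four overlapping nodes joined in a chain. Swapping in a different lens function gives a different shape from the same data. This mirrors the idea of running several transformations to get several views.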


From Ayasdi

The problem of having a liberal arts education…


Platonic ideal of chair

What an IRIS analysis looks like


3 different shapes made from the same data

Explaining the parts


Dots represent groups of samples that are similar to each other

Connecting lines represent at least one shared member between groups

Features like this arm on the shape can be examined in further detail

Coloring (red = high, blue = low) can be based on the initial mathematical transformation or on annotations (e.g., gender, disease), gene expression, etc.
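Coloring a dot means summarizing one variable over the samples in that dot. The sketch below assumes each node is colored by the mean of a numeric column over its members. That mean is then rescaled so the highest node maps to 1.0 (red) and the lowest to 0.0 (blue). The averaging rule and the helper name `node_colors` are hypothetical, not documented IRIS behavior.

```python
from statistics import mean


def node_colors(nodes, values):
    """Map each node (a set of sample indices) to a 0..1 score: 1.0 = red (high), 0.0 = blue (low)."""
    means = [mean(values[i] for i in node) for node in nodes]
    lo, hi = min(means), max(means)
    span = hi - lo
    # If every node has the same mean, paint them all mid-scale.
    return [(m - lo) / span if span else 0.5 for m in means]


# Three toy nodes over four samples with one numeric annotation per sample.
print(node_colors([{0, 1}, {1, 2}, {3}], [10, 20, 30, 5]))
```

A categorical annotation such as disease status can be handled the same way by passing 1/0 indicators. Each node is then colored by the fraction of its members that carry that label.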

• Groups and shapes are analyzed and interpreted
  • We try to understand what underlies the shapes and forms that arise

• Link back to biology, patients, effect

• Learn new insights

• Create hypotheses and test them on the fly

• Iterate

• Next several slides will be an example of an IRIS analysis and insights

How does an IRIS analysis proceed?


• Institute for Health Metrics and Evaluation (IHME)
  • Performed a survey of smoking prevalence worldwide, from 1980 to 2012
  • 187 countries
  • Dataset contains smoking frequency broken down by age, gender, and year

• 518 columns, 187 rows

• Some reasons to look at this data:
  • Practice: the IRIS workflow is pretty much the same for any dataset
  • Using non-gene-expression data
  • Smoking is a risk factor for RA, diabetes, etc.

Example analysis: Smoking prevalence


These were derived from the IHME data


Thinking like an analyst: what do different parts of shapes mean?

There’s a lot to potentially explore

Start with this basic shape:


What are these two groups?

Upper arm

Lower arm

Certain mathematical transformations often create this antibody-like (Y-shaped) form in large datasets

First step: define the groups and compare each one, numerically and categorically, to the rest of the shape


Lower arm categorical table

Column Name | Value | Percent in Group 1 | Percent in Both Group 1 and Group 2 | Count in Group 1 | Count in Both Group 1 and Group 2 | p-value
ISOsubregion | 35 | 0.27 | 0.06 | 6 | 11 | 4.23E-04
Developing | Yes | 1.00 | 0.73 | 22 | 137 | 6.48E-04
ISOsubregion | 14 | 0.27 | 0.09 | 6 | 17 | 0.006991494
Annualized Rate of Change (%) Male and Female 1980 to 2012 | -0.5 | 0.18 | 0.04 | 4 | 8 | 0.007475094
Annualized Rate of Change (%) Male and Female 1980 to 2012 | -0.7 | 0.18 | 0.05 | 4 | 10 | 0.019024382
ISOregion | 2 | 0.45 | 0.27 | 10 | 50 | 0.035708684

Bangladesh

Burkina Faso

Burundi

Cambodia

Djibouti

Federated States of Micronesia

Ghana

Guinea-Bissau

Indonesia

Jamaica

Laos

Malawi

Maldives

Myanmar

Namibia

Paraguay

Philippines

Rwanda

Somalia

Sri Lanka

Thailand

Zimbabwe

ISOsubregion 35 = Southeastern Asia; ISOsubregion 14 = Eastern Africa
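The slides do not say which test produces the p-values in the lower arm categorical table. One plausible choice, assumed here, is a one-sided hypergeometric (Fisher-style enrichment) test.

The percentages imply that "Both Group 1 and Group 2" covers all N = 187 countries. Of those, K carry the value, and the 22-country lower arm group (n) holds k of them. The p-value is then the chance of drawing k or more carriers in a random group of size n. For the Developing = Yes row (22 of 22 versus 137 of 187), this sketch gives about 6.48E-04, which matches the table. The function name `enrichment_p_value` is hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(population N, K carriers, n draws)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total


# Developing = Yes: 22 of the 22 lower-arm countries, 137 of all 187 countries.
p = enrichment_p_value(k=22, n=22, K=137, N=187)
print(f"{p:.2e}")
```

The same call with the counts from another row (for example k=6, n=22, K=11, N=187 for ISOsubregion 35) can be used to check whether this test is the one behind the rest of the table.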

Highlighting lower arm countries on a map


Some geographical clustering

Now looking at numerical annotations


Column Name | KS Statistic | KS p-value | T-test p-value | Group 1 Mean - Group 2 Mean | KS Sign
Smoking Prevalence (%) Age 80+ 1997 | 0.62 | 4.83578E-07 | 3.79979E-05 | 6.960909091 | +
Smoking Prevalence (%) Age 75 2004 | 0.57 | 5.51162E-06 | 1.50199E-05 | 7.676363636 | +
Ranking by one of the built-in statistics, we quickly see that the top data columns largely reflect smoking prevalence among the elderly
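The numerical table ranks columns by a two-sample Kolmogorov-Smirnov (KS) statistic. This is the largest vertical gap between the two groups' empirical cumulative distribution functions, so 0 means identical distributions and 1 means no overlap at all.

Below is a minimal pure-Python sketch of that statistic plus the group-mean difference. Reading the "KS Sign" column as the sign of the mean difference is an assumption. The input numbers are made-up toy values, not IHME data.

```python
from bisect import bisect_right
from statistics import mean


def ks_statistic(a, b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all observed x."""
    a, b = sorted(a), sorted(b)

    def ecdf(sample, x):
        # Fraction of the (sorted) sample that is <= x.
        return bisect_right(sample, x) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in a + b)


def compare_groups(group1, group2):
    """Return (KS statistic, Group 1 mean - Group 2 mean, sign) for one numeric column."""
    diff = mean(group1) - mean(group2)
    return ks_statistic(group1, group2), diff, "+" if diff > 0 else "-"


# Hypothetical toy values for one prevalence column, split by group membership.
group1 = [20.0, 18.5, 25.1, 15.2, 22.3]
group2 = [10.4, 12.0, 9.8, 16.1, 11.5, 8.7]
print(compare_groups(group1, group2))
```

For the p-value columns, SciPy's `scipy.stats.ks_2samp` and `scipy.stats.ttest_ind` compute the two-sample KS test and the t-test. Whether IRIS uses the same variants (for example, equal versus unequal variances in the t-test) is not stated in the slides.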

Pick a few years of the 80+ smoking prevalence to graph as boxplots


Okay, so confirming insights: we’re looking at a subset of countries that have a high rate of smoking in the elderly. Note that the upper arm group has a substantially lower rate

Other countries also have high rates in the elderly, and within the lower arm group, some have relatively low rates

So we’ve found a subpopulation

But that’s not the whole story


Country | Lower arm group | Smoking Prevalence (%) Age 80+ 2000 | Country | Lower arm group | Smoking Prevalence (%) Age 80+ 2000
Pakistan | no | 34 | Laos | yes | 29.4
Tonga | no | 25.2 | Myanmar | yes | 26.4
Kiribati | no | 24.4 | Namibia | yes | 23.3
Nepal | no | 23.8 | Bangladesh | yes | 21.8
Lebanon | no | 22.2 | Cambodia | yes | 20
Timor-Leste | no | 18.8 | Indonesia | yes | 18.1
Denmark | no | 17.1 | Federated States of Micronesia | yes | 17.6
Tunisia | no | 16.4 | Philippines | yes | 15.8
Jordan | no | 16.2 | Paraguay | yes | 14.5
Lesotho | no | 15.9 | Malawi | yes | 14.4
South Korea | no | 15.9 | Djibouti | yes | 14.3
Malaysia | no | 15.8 | Zimbabwe | yes | 13.7
Dominican Republic | no | 15 | Thailand | yes | 13
Vanuatu | no | 14.5 | Maldives | yes | 12.5
Palestine | no | 14.2 | Sri Lanka | yes | 11.2
Vietnam | no | 13.9 | Burkina Faso | yes | 11
Cyprus | no | 13.7 | Burundi | yes | 9.7
Samoa | no | 13.6 | Rwanda | yes | 8.7
Albania | no | 13.4 | Somalia | yes | 8.5
Mongolia | no | 13.1 | Ghana | yes | 7.9
South Africa | no | 13.1 | Jamaica | yes | 7.6
China | no | 13 | Guinea-Bissau | yes | 7.5

• Many directions to go here

• In IRIS:
  • Persistence of the group
  • Co-occurrence with other annotations beyond “developing”

• Outside of IRIS:
  • Once you know a subgroup exists, run statistical analyses
  • Visualization techniques such as heatmaps

What are the characteristics that define that subpopulation?


Persistence (or not) of subgroup integrity across shapes and analyses


From this we can go back to the mathematical transformations used to make each set of shapes and find clues to what is driving this group to stay together in some shapes but not others
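One simple way to put a number on whether a subgroup "stays together" in a given shape is sketched below. It is a hypothetical measure, not an IRIS feature.

The sketch finds the connected piece of the shape that holds the largest share of the group's members. Scoring every shape this way, one per mathematical transformation, makes it easier to see which transformations keep the lower arm countries together. The shape is assumed to be given as nodes (sets of sample indices) and edges (pairs of node indices). The names `components` and `cohesion` are illustrative.

```python
def components(nodes, edges):
    """Return the set of samples in each connected component of a shape graph."""
    adjacency = {i: set() for i in range(len(nodes))}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first walk collecting every sample reachable from this node.
        stack, members = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            members |= nodes[node]
            for nxt in adjacency[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        result.append(members)
    return result


def cohesion(group, nodes, edges):
    """Hypothetical score: largest fraction of `group` found inside a single component."""
    group = set(group)
    return max((len(group & comp) for comp in components(nodes, edges)), default=0) / len(group)


# Two toy shapes built from the same six samples.
shape_a = ([{0, 1, 2}, {2, 3}, {4, 5}], {(0, 1)})    # group split across two pieces
shape_b = ([{0, 1}, {3, 4}, {1, 3}], {(0, 2), (1, 2)})  # everything in one piece
print(cohesion({1, 3, 4}, *shape_a), cohesion({1, 3, 4}, *shape_b))
```

This is a coarse notion of persistence: it only asks whether the group lies in one connected piece. A stricter variant could instead measure how often pairs of group members share the same node.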

Overlay of different kinds of information


Comparison of developing country status suggests two groups we could compare to look for additional insights

Annualized rate of change between 1980 and 1996 is another annotation we could look into more

[Figure panels: Developing = no; Developing = yes; Population; Ann rate of change 1980-96]

Comparing the two developing world enriched groups


• Found differences in older-age smoking prevalence: the lower arm group has a higher rate
  • We already knew that

• Also found differences in 10-year-old smoking prevalence: the lower arm group has a lower rate
  • We didn’t know that…

10-year-old smoking prevalence


[Figure panels: 1980, 1990, 2000, 2010]

Smoking in kids is consistently low in the lower arm group.

This suggests an opportunity for public health intervention in these countries: first confirm the pattern and, if it holds, look at the transition from non-smoking to smoking and when that happens

Looking more closely at Annualized rate of change


[Figure panels: Ann rate of change 1980-96; Ann rate of change 1996-2006; Ann rate of change 2006-2012; Ann rate of change 1980-2012]

This suggests that the lower arm group had a relatively smaller decrease in overall smoking rates in the 1980s and 1990s, but that its rate of decrease began to pick up in the 2000s, relative to other countries

From a public health standpoint, we can now go back and ask what kinds of smoking cessation interventions were put in place in the 2000s
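The slides use an annualized rate of change without defining it. A common definition for prevalence data is the log-linear rate r = 100 × ln(p2 / p1) / (t2 − t1), in percent per year, where p1 and p2 are the prevalences in years t1 and t2. This definition is an assumption here, not taken from IHME documentation.

Under that definition, a negative r means prevalence is falling. Comparing r across the 1980-96, 1996-2006, and 2006-2012 windows shows whether a decline sped up. The function name and the input values below are hypothetical.

```python
import math


def annualized_rate_of_change(p1, p2, t1, t2):
    """Percent change per year, assuming log-linear change in prevalence between years t1 and t2."""
    if p1 <= 0 or p2 <= 0:
        raise ValueError("prevalences must be positive to take a logarithm")
    return 100 * math.log(p2 / p1) / (t2 - t1)


# Hypothetical prevalences (%), not IHME values: 30% in 1996 falling to 24% in 2006.
print(round(annualized_rate_of_change(30.0, 24.0, 1996, 2006), 2))
```

A compound-growth definition, 100 × ((p2 / p1) ** (1 / (t2 − t1)) − 1), gives slightly different numbers. Check which one the dataset uses before comparing values across sources.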
