Exploring Metabolomic data with recursive partitioning Metabolomic Workshop NISS July 14-15, 2005.

Exploring Metabolomic data with recursive partitioning

Metabolomic Workshop, NISS

July 14-15, 2005

University of North Carolina Wilmington

Why study metabolites?

• Metabolomics: the global study of all small molecules produced in the human body

• Biochemical consequences of environment, drugs, and mutations can be observed directly through metabolites

• Understand how drugs work, their interactions, and possible side effects

• ~2500 metabolites

Challenges of metabolomic data

• Non-normal distributions

• Outliers

• Informative missing values

• High correlation among metabolites

• n < p problem (n = number of biological samples, p = number of metabolites)

Why recursive partitioning?

• Is fairly robust to non-normal data

• Missing values are not an issue

• Correlation among variables is not an issue

• Useful for discovering outliers

• Is efficient at handling large p, small n data sets

How recursive partitioning works

• Recursive partitioning efficiently searches through all of the variables and finds the one with the best (most significant) split

• Once data is split or “partitioned” on this variable, the resulting daughter nodes are more homogeneous

• Now each daughter node is explored to find the best split

• This process is continued until no significant split remains
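
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the FIRM procedure used by the software discussed later: it assumes a binary outcome (1 = diseased, 0 = healthy), binary cuts on each variable, a Pearson chi-square statistic as the measure of "most significant", and a fixed cut-off of 3.84 with no adjustment for multiple comparisons. The names `chi_square_2x2`, `best_split`, `grow`, and `Node`, and the toy data, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

CHI2_CRITICAL = 3.84  # assumed cut-off: approx. 5% critical value, chi-square with 1 df


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


@dataclass
class Node:
    n_samples: int
    n_positive: int
    variable: Optional[int] = None      # index of the splitting variable (None for a leaf)
    threshold: Optional[float] = None   # samples with value <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def best_split(X: List[List[float]], y: List[int]) -> Tuple[float, Optional[int], Optional[float]]:
    """Search every variable and every midpoint cut; return (statistic, variable, threshold)."""
    best: Tuple[float, Optional[int], Optional[float]] = (0.0, None, None)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            a = sum(1 for row, label in zip(X, y) if row[j] <= t and label == 1)
            b = sum(1 for row, label in zip(X, y) if row[j] <= t and label == 0)
            c = sum(1 for row, label in zip(X, y) if row[j] > t and label == 1)
            d = sum(1 for row, label in zip(X, y) if row[j] > t and label == 0)
            stat = chi_square_2x2(a, b, c, d)
            if stat > best[0]:
                best = (stat, j, t)
    return best


def grow(X: List[List[float]], y: List[int], min_stat: float = CHI2_CRITICAL) -> Node:
    """Split on the best variable, then explore each daughter node in the same way."""
    node = Node(n_samples=len(y), n_positive=sum(y))
    if len(y) < 2:
        return node
    stat, j, t = best_split(X, y)
    if j is None or stat < min_stat:
        return node  # no significant split remains: this node is a leaf
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    node.variable, node.threshold = j, t
    node.left = grow([X[i] for i in left], [y[i] for i in left], min_stat)
    node.right = grow([X[i] for i in right], [y[i] for i in right], min_stat)
    return node


# Toy data (made up for illustration): variable 1 separates the two groups, variable 0 does not.
X = [[0.1, 5.0], [0.4, 1.0], [0.2, 6.0], [0.9, 1.5],
     [0.5, 5.5], [0.3, 0.5], [0.8, 6.5], [0.7, 1.2]]
y = [1, 0, 1, 0, 1, 0, 1, 0]
tree = grow(X, y)
print(tree.variable, tree.threshold)
```

With only a handful of samples the chi-square approximation is rough, so the sketch is meant to show the control flow (search, split, recurse, stop) rather than a defensible test.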

Example

Multiple Trees

• Not all effects are necessarily found in a single tree

• In any node, there may be more than one significant variable

• Creating multiple trees may reveal a number of possible effects

• Gain an understanding of interactions/correlations among metabolites
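
One way to picture the multiple-trees idea is to list every variable that gives a significant split in a node, rather than keeping only the best one; each listed variable could then seed an alternative tree. The sketch below is an illustration under assumptions, not the software's method: it assumes a binary outcome, binary cuts, a Pearson chi-square statistic, and an unadjusted cut-off of 3.84. The function names and the toy data are hypothetical.

```python
from typing import List, Optional, Tuple

CHI2_CRITICAL = 3.84  # assumed cut-off: approx. 5% critical value, chi-square with 1 df


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def best_threshold(values: List[float], y: List[int]) -> Tuple[float, Optional[float]]:
    """Return (statistic, threshold) for the best binary cut on a single variable."""
    best_stat, best_t = 0.0, None
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        a = sum(1 for v, label in zip(values, y) if v <= t and label == 1)
        b = sum(1 for v, label in zip(values, y) if v <= t and label == 0)
        c = sum(1 for v, label in zip(values, y) if v > t and label == 1)
        d = sum(1 for v, label in zip(values, y) if v > t and label == 0)
        stat = chi_square_2x2(a, b, c, d)
        if stat > best_stat:
            best_stat, best_t = stat, t
    return best_stat, best_t


def significant_variables(X: List[List[float]], y: List[int],
                          min_stat: float = CHI2_CRITICAL) -> List[Tuple[int, float, float]]:
    """List (variable, threshold, statistic) for every variable with a significant split,
    strongest first. Each entry is a candidate root for a separate tree."""
    found = []
    for j in range(len(X[0])):
        stat, t = best_threshold([row[j] for row in X], y)
        if t is not None and stat >= min_stat:
            found.append((j, t, stat))
    return sorted(found, key=lambda item: item[2], reverse=True)


# Toy data (made up): variables 0 and 1 are correlated and both track the outcome;
# variable 2 is noise.
X = [[1.0, 10.0, 0.3], [1.2, 11.0, 0.9], [0.9, 9.5, 0.1], [1.1, 10.5, 0.7],
     [3.0, 20.0, 0.2], [3.2, 21.0, 0.8], [2.9, 19.0, 0.4], [3.1, 10.2, 0.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
candidates = significant_variables(X, y)
for variable, threshold, stat in candidates:
    print(variable, threshold, round(stat, 2))
```

A single tree would split on variable 0 only; listing the runners-up shows that variable 1 carries much of the same signal, which is the kind of correlation among metabolites the slide refers to.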

Software

• Helix Tree (Partitionator)

• www.goldenhelix.com

• Uses Formal Inference-based Recursive Modeling (FIRM) developed by Douglas Hawkins

• Anyone can download a free 7-day trial (webinars are available to assist in using the software)

Illustration of Software

• Data

– 317 metabolites (LC/MS and GC/MS)

– 63 biological samples

– Want to discover which metabolites differentiate between the diseased group and the “healthy” individuals (within the diseased group there is a subset of individuals currently taking drugs)