Nimbus

Post on 21-Oct-2014

Nimbus is a Ruby gem implementing the random forest algorithm for genomic selection contexts. http://www.nimbusgem.org Presentation from the EAAP 2011 conference in Stavanger, Norway.

Transcript of Nimbus

An open-source library to implement random forests in genomic contexts

Oscar González-Recio Juanjo Bazán Selma Forni

The Problem

Massive amounts of information from high-throughput genotyping platforms.

Massive amounts of information consume the attention of their recipients. We need to allocate that attention efficiently.

We need to extract knowledge from large, noisy, redundant, missing and fuzzy data.

Why Random Forest?

Using machine learning techniques we can extract hidden relationships that exist in these huge volumes of data and do not follow a particular parametric design.

Random forests have desirable statistical properties.

Random forests scale well computationally.

Random forests perform extremely well in a variety of complex domains (Breiman, 2001; Gonzalez-Recio & Forni, 2011).

The Algorithm

Ensemble methods:

- Combination of different methods (usually simple models).
- They have very good predictive ability because they use the additivity of the models' performances.

Based on Classification And Regression Trees (CART).

Use Randomization and Bagging.

Performs Feature Subset Selection.

Convenient for classification problems.

Fast computation.

Simple interpretation of results for human minds.

Previous work in genome-wide prediction (Gonzalez-Recio and Forni, 2011)

Let Ψ = (y, X) be a set of data, with

y = vector of phenotypes (response variables)

X = (x1, x2) = matrix of features

    y1   x11  x12
    …    …    …
    yi   xi1  xi2
    …    …    …
    yn   xn1  xn2

Perform bootstrap on data: Ψ* = (y, X)

Build a CART (fi(y, X) = ht(x)) using only an mtry proportion of the SNPs at each node.

Repeat M times to reduce residuals by a factor of M.

Average estimates: c0 = μ; ci = 1/M
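The steps above (bootstrap, a tree built on a random subset of SNPs, M repetitions, equal-weight averaging) can be sketched in a few lines of Ruby. This is only an illustration under stated assumptions, not Nimbus's internal code: each "tree" is a single-split stump, genotypes are assumed to be coded 0/1/2, the separate intercept c0 is omitted, and every method name here (fit_stump, train_forest, predict_forest) is hypothetical.

```ruby
# Sketch of bagged regression stumps with random SNP subsets (not Nimbus internals).

# Mean of a numeric array (0.0 for an empty array).
def mean(values)
  values.empty? ? 0.0 : values.sum.to_f / values.size
end

# Sum of squared deviations from the mean.
def sse(values)
  m = mean(values)
  values.sum { |v| (v - m)**2 }
end

# Fit a one-split tree on the individuals in idx, trying only the SNPs in snps.
# Assumes genotypes coded 0/1/2; individuals with x <= threshold go left.
def fit_stump(y, x, idx, snps)
  node_mean = mean(idx.map { |i| y[i] })
  best = { snp: nil, threshold: nil, left: node_mean, right: node_mean }
  best_loss = Float::INFINITY
  snps.each do |j|
    [0, 1].each do |t|
      left, right = idx.partition { |i| x[i][j] <= t }
      next if left.empty? || right.empty?
      yl = left.map { |i| y[i] }
      yr = right.map { |i| y[i] }
      loss = sse(yl) + sse(yr)
      next unless loss < best_loss
      best_loss = loss
      best = { snp: j, threshold: t, left: mean(yl), right: mean(yr) }
    end
  end
  best
end

# Prediction of a single stump for one row of genotypes.
def predict_stump(stump, row)
  return stump[:left] if stump[:snp].nil?
  row[stump[:snp]] <= stump[:threshold] ? stump[:left] : stump[:right]
end

# Train m stumps, each on a bootstrap sample and an mtry proportion of SNPs.
def train_forest(y, x, m:, mtry:, rng: Random.new(42))
  n = y.size
  n_snps = x.first.size
  k = (mtry * n_snps).ceil.clamp(1, n_snps)
  Array.new(m) do
    idx = Array.new(n) { rng.rand(n) }               # bootstrap sample (with replacement)
    snps = (0...n_snps).to_a.sample(k, random: rng)  # random subset of SNPs
    fit_stump(y, x, idx, snps)
  end
end

# Equal-weight average of the stump predictions (weights ci = 1/M).
def predict_forest(forest, row)
  mean(forest.map { |stump| predict_stump(stump, row) })
end

# Toy data: the phenotype rises with the genotype at the first SNP.
x = [[0, 1, 2], [0, 2, 1], [1, 0, 0], [1, 1, 2], [2, 0, 1], [2, 2, 0]]
y = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]
forest = train_forest(y, x, m: 50, mtry: 0.67)
puts predict_forest(forest, [2, 1, 1])
```

A full random forest grows deeper trees and redraws the SNP subset at every node; with a stump there is only one node, so the two coincide here.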

Classification and regression trees:
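As a rough illustration of what a regression tree over SNP genotypes looks like, the sketch below stores a tree as nested hashes and walks it to produce a prediction. The layout (keys snp, threshold, left, right, value), the 0/1/2 genotype coding and the leaf values are assumptions made for this example, not the structure used by Nimbus.

```ruby
# Hypothetical tree layout: internal nodes test one SNP against a threshold,
# leaves carry the predicted phenotype.
tree = {
  snp: 0, threshold: 0,
  left: { value: 1.1 },
  right: {
    snp: 2, threshold: 1,
    left: { value: 2.0 },
    right: { value: 3.1 }
  }
}

# Walk from the root to a leaf, going left when the genotype is <= threshold.
def predict_tree(node, genotypes)
  return node[:value] if node.key?(:value)
  branch = genotypes[node[:snp]] <= node[:threshold] ? :left : :right
  predict_tree(node[branch], genotypes)
end

puts predict_tree(tree, [0, 2, 2])  # SNP 0 is 0, so the left leaf: 1.1
puts predict_tree(tree, [2, 1, 0])  # right at the root, then left: 2.0
puts predict_tree(tree, [1, 0, 2])  # right at the root, then right: 3.1
```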

Nimbus library

Written in Ruby

www.ruby-lang.org

Open source programming language

Syntax focused on simplicity

Natural to read and easy to write

How to install:

> gem install nimbus

Prerequisites:

Ruby and RubyGems (the default package manager) installed on the system
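To confirm from Ruby that the gem is visible to RubyGems after installation, a small check along these lines may help. It relies only on the standard Gem::Specification.find_all_by_name call; the helper name gem_installed? is made up for this example.

```ruby
require "rubygems"

# True when at least one version of the named gem is installed.
def gem_installed?(name)
  !Gem::Specification.find_all_by_name(name).empty?
end

if gem_installed?("nimbus")
  puts "nimbus is installed"
else
  puts "nimbus not found; run `gem install nimbus`"
end
```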

How to run:

> nimbus

Configuration:

Via config.yml file

config.yml file:
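The slide's example file is not reproduced in this transcript. As a stand-in, here is a minimal sketch of how a YAML configuration of this kind can be read from Ruby with the standard yaml library. All key names and values below (input, training, testing, forest, forest_size, mtry) are placeholders chosen for illustration and should not be taken as Nimbus's documented schema; see www.nimbusgem.org for the real options.

```ruby
require "yaml"

# Hypothetical configuration text; key names are placeholders, not Nimbus's schema.
config_text = <<~CONFIG
  input:
    training: training.data
    testing: testing.data
  forest:
    forest_size: 100
    mtry: 0.3
CONFIG

# safe_load returns plain hashes with string keys.
config = YAML.safe_load(config_text)

puts "Training file: #{config.dig('input', 'training')}"
puts "Trees in forest: #{config.dig('forest', 'forest_size')}"
```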

Input files:

training

testing

Features, use cases:

Training of a prediction forest

- Using a training set of individuals
- Nimbus creates a reusable forest
- Nimbus calculates SNP importances
- Generalization errors are computed for every tree in the forest

Genomic prediction of a testing sample

- Using a custom/reused forest, specified via config.yml
- Using a newly trained forest
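The slides do not say how the per-tree generalization error is obtained. A common choice for random forests, assumed here, is to evaluate each tree on its out-of-bag individuals, that is, those not drawn into that tree's bootstrap sample (Breiman, 2001). The sketch below uses hypothetical helper names and a lambda as a stand-in for a fitted tree.

```ruby
# Individuals never drawn into the bootstrap sample are "out of bag" for that tree.
def out_of_bag(n, bootstrap_idx)
  (0...n).to_a - bootstrap_idx.uniq
end

# Mean squared error of one tree on its out-of-bag individuals.
# `tree` is any callable that maps a row of genotypes to a predicted phenotype.
def oob_error(tree, y, x, bootstrap_idx)
  oob = out_of_bag(y.size, bootstrap_idx)
  return nil if oob.empty?
  oob.sum { |i| (y[i] - tree.call(x[i]))**2 } / oob.size.to_f
end

y = [1.0, 2.5, 3.0, 2.0]
x = [[0], [1], [2], [1]]
tree = ->(row) { 1.0 + row[0] }  # toy stand-in for a fitted tree
bootstrap_idx = [0, 0, 2, 2]     # individuals 1 and 3 are out of bag

puts oob_error(tree, y, x, bootstrap_idx)  # (0.25 + 0.0) / 2 = 0.125
```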

Outputs

Random forest file

In standard YAML format
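Storing the forest as YAML is what makes it reusable: the file can be written once and loaded again for later predictions. The round trip below uses Ruby's standard yaml library; the tree layout (snp, threshold, left, right, value) is a hypothetical one chosen for illustration and is not the actual layout of the file Nimbus writes.

```ruby
require "yaml"

# Hypothetical two-tree forest; string keys keep the YAML plain and safe to reload.
forest = [
  { "snp" => 0, "threshold" => 0,
    "left" => { "value" => 1.1 }, "right" => { "value" => 2.9 } },
  { "snp" => 2, "threshold" => 1,
    "left" => { "value" => 1.8 }, "right" => { "value" => 3.0 } }
]

yaml_text = YAML.dump(forest)           # serialize to YAML text
reloaded = YAML.safe_load(yaml_text)    # parse it back into arrays and hashes

puts yaml_text
puts reloaded == forest
```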

Predictions for the training sample

Predictions for the testing sample

SNP importances

More info:

Nimbus website: www.nimbusgem.org

Source code: www.github.com/xuanxu/nimbus

Report bugs/request features: www.github.com/xuanxu/nimbus/issues

Thank you!

Questions?