Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics...

13
Analysis of metabolomic data: tools, current strategies and future challenges for omics data integration Alice Cambiaghi, Manuela Ferrario and Marco Masseroli Corresponding author: Alice Cambiaghi, Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133, Milan, Italy. Tel.: þ39-02-2399-3381; Fax: þ39-02-2399-3360; E-mail: [email protected] Abstract Metabolomics is a rapidly growing field consisting of the analysis of a large number of metabolites at a system scale. The two major goals of metabolomics are the identification of the metabolites characterizing each organism state and the measurement of their dynamics under different situations (e.g. pathological conditions, environmental factors). Knowledge about metabolites is crucial for the understanding of most cellular phenomena, but this information alone is not sufficient to gain a comprehensive view of all the biological processes involved. Integrated approaches combining metabolomics with transcriptomics and proteomics are thus required to obtain much deeper insights than any of these techniques alone. Although this information is available, multilevel integration of different ‘omics’ data is still a challenge. The handling, processing, analysis and integration of these data require specialized mathematical, statistical and bioinformatics tools, and several technical problems hampering a rapid progress in the field exist. Here, we review four main tools for number of users or provided features (MetaCore TM , MetaboAnalyst, InCroMAP and 3Omics) out of the several available for metabolomic data analysis and integration with other ‘omics’ data, highlighting their strong and weak aspects; a number of related issues affecting data analysis and integration are also identified and discussed. Overall, we provide an objective description of how some of the main currently available software packages work, which may help the experimental practitioner in the choice of a robust pipeline for metabolomic data analysis and integration. Key words: Metabolomics; integrated ‘omics’; data analysis; metabolomic analysis tools; comparative tool evaluation Introduction Metabolomics is an emerging field of the biological sciences; it concerns the high-throughput characterization of metabolites, i.e. small molecular compounds (< 1500 Da), which constitute the end products of the cellular metabolism and form the chem- ical fingerprint of an organism at a precise time point. More pre- cisely, metabolomic studies involve the identification and quantification of metabolites, with the aim of correlating their changes with pathological states, or with the effect of external influencing factors such as drugs or contaminants [1]. Together with the other main ‘omics’ areas (genomics, transcriptomics and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on small mol- ecules and small interactions, it has lately reached a wide- spread application in many different fields, including molecular epidemiology, toxicity assessment, functional and nutritional genomics, biomarker discovery and identification, drug devel- opment and personalized health care [13]. Alice Cambiaghi is a PhD student in biomedical engineering at the Department of Electronics, Information and Bioengineering, Politecnico di Milano, Italy. Her PhD project deals with finding new strategies for multilevel integration of ‘omics’ data, as an approach for the identification of molecular biomarkers in acute heart failure induced by shock. Manuela Ferrario is an assistant professor at Politecnico di Milano, Italy. She is a member of the scientific committee of the EU project ‘Shockomics: multi- scale approach to the identification of molecular biomarkers in acute heart failure induced by shock’ (2013–2017). Her research interests include mathem- atical modeling of physiological systems, nonlinear analysis, variability analysis of cardiovascular signals, data mining and prediction models. Marco Masseroli is an associate professor of bioinformatics and biomedical informatics at the Department of Electronics, Information and Bioengineering, Politecnico di Milano, Italy. His research interests include distributed Internet technologies, biomolecular databases, biomedical terminologies and bio- ontologies to effectively retrieve, manage, analyze and semantically integrate genomic information with clinical and high-throughout genetic data. He is the author of more than 170 scientific articles. Submitted: 8 January 2016; Received (in revised form): 29 February 2016 V C The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected] 498 Briefings in Bioinformatics, 18(3), 2017, 498–510 doi: 10.1093/bib/bbw031 Advance Access Publication Date: 12 April 2016 Software Review Downloaded from https://academic.oup.com/bib/article-abstract/18/3/498/2453286 by Universidad Nacional Autonoma de Mexico user on 12 May 2019

Transcript of Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics...

Page 1: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

Analysis of metabolomic data tools current strategies

and future challenges for omics data integrationAlice Cambiaghi Manuela Ferrario and Marco MasseroliCorresponding author Alice Cambiaghi Dipartimento di Elettronica Informazione e Bioingegneria Politecnico di Milano Piazza Leonardo da Vinci 3220133 Milan Italy Tel thorn39-02-2399-3381 Fax thorn39-02-2399-3360 E-mail alicecambiaghipolimiit

Abstract

Metabolomics is a rapidly growing field consisting of the analysis of a large number of metabolites at a system scale Thetwo major goals of metabolomics are the identification of the metabolites characterizing each organism state and themeasurement of their dynamics under different situations (eg pathological conditions environmental factors) Knowledgeabout metabolites is crucial for the understanding of most cellular phenomena but this information alone is not sufficientto gain a comprehensive view of all the biological processes involved Integrated approaches combining metabolomics withtranscriptomics and proteomics are thus required to obtain much deeper insights than any of these techniques aloneAlthough this information is available multilevel integration of different lsquoomicsrsquo data is still a challenge The handlingprocessing analysis and integration of these data require specialized mathematical statistical and bioinformatics toolsand several technical problems hampering a rapid progress in the field exist Here we review four main tools for number ofusers or provided features (MetaCoreTM MetaboAnalyst InCroMAP and 3Omics) out of the several available for metabolomicdata analysis and integration with other lsquoomicsrsquo data highlighting their strong and weak aspects a number of related issuesaffecting data analysis and integration are also identified and discussed Overall we provide an objective description of howsome of the main currently available software packages work which may help the experimental practitioner in the choiceof a robust pipeline for metabolomic data analysis and integration

Key words Metabolomics integrated lsquoomicsrsquo data analysis metabolomic analysis tools comparative tool evaluation

Introduction

Metabolomics is an emerging field of the biological sciences itconcerns the high-throughput characterization of metabolitesie small molecular compounds (lt 1500 Da) which constitutethe end products of the cellular metabolism and form the chem-ical fingerprint of an organism at a precise time point More pre-cisely metabolomic studies involve the identification andquantification of metabolites with the aim of correlating theirchanges with pathological states or with the effect of external

influencing factors such as drugs or contaminants [1] Togetherwith the other main lsquoomicsrsquo areas (genomics transcriptomicsand proteomics) metabolomics constitutes one of the buildingblocks of systems biology Because of its focus on small mol-ecules and small interactions it has lately reached a wide-spread application in many different fields including molecularepidemiology toxicity assessment functional and nutritionalgenomics biomarker discovery and identification drug devel-opment and personalized health care [1ndash3]

Alice Cambiaghi is a PhD student in biomedical engineering at the Department of Electronics Information and Bioengineering Politecnico di Milano ItalyHer PhD project deals with finding new strategies for multilevel integration of lsquoomicsrsquo data as an approach for the identification of molecular biomarkersin acute heart failure induced by shockManuela Ferrario is an assistant professor at Politecnico di Milano Italy She is a member of the scientific committee of the EU project lsquoShockomics multi-scale approach to the identification of molecular biomarkers in acute heart failure induced by shockrsquo (2013ndash2017) Her research interests include mathem-atical modeling of physiological systems nonlinear analysis variability analysis of cardiovascular signals data mining and prediction modelsMarco Masseroli is an associate professor of bioinformatics and biomedical informatics at the Department of Electronics Information and BioengineeringPolitecnico di Milano Italy His research interests include distributed Internet technologies biomolecular databases biomedical terminologies and bio-ontologies to effectively retrieve manage analyze and semantically integrate genomic information with clinical and high-throughout genetic data He isthe author of more than 170 scientific articlesSubmitted 8 January 2016 Received (in revised form) 29 February 2016

VC The Author 2016 Published by Oxford University Press All rights reserved For Permissions please email journalspermissionsoupcom

498

Briefings in Bioinformatics 18(3) 2017 498ndash510

doi 101093bibbbw031Advance Access Publication Date 12 April 2016Software Review

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

The following two are the general analytical approaches inperforming a metabolomic analysis targeted and untargetedTargeted metabolomics refers to the detection and precise quan-tification (in nM or mgmL) of a small set of known compoundsIt is driven by a specific biochemical question or hypothesis inwhich the set of metabolites related to one or more pathways isalready defined A limitation of the targeted approach is that itrequires the compounds of interest to be known a priori and tobe available in their purified form Currently only few purifiedstandards (ie defined groups of chemically characterized andbiochemically annotated metabolites) have been clearly identi-fied and are available for a calibration process therefore owingto the wide variety of metabolites and their complex dynamicswithin a cell the targeted approach cannot yet be used alone fora comprehensive analysis of the metabolome The untargetedapproach instead also called lsquometabolite fingerprintingrsquo is notdriven by an a priori hypothesis and it is used for completemetabolome comparison (ie as many metabolites as possibleare measured and compared between samples) [4] Metabolitevariations are observed principally as total changes of chroma-tographic patterns without requiring previous knowledge of thecompounds under investigation Therefore untargeted metabo-lomics does not attempt to precisely quantify all measurablemetabolites in a sample but it only gives their relative quantifi-cation (fold change) [5 6] It is important to stress that in spiteof the presence of extensive metabolomics spectra repositories(eg SMPDB [7] KEEG [8] MetaCyc [9] and HumanCyc [10] to citesome) metabolite identification still constitutes a challenge inuntargeted metabolomics This is mainly because of techno-logical limitations such as the dependence on the intrinsic ana-lytical coverage of the platform used and the possible biastoward the detection of the most abundant molecules [11]Moreover the same molecule can be fragmented differently de-pending on the specific instrument or technique used and thishampers the metabolites spectra matching Furthermore in-strument-dependent variability between different kinds ofmass-analyzers and even between the ones of the same kindbut of different brands increases the variability in compoundidentification

Metabolomic research leads to the handling of complex datasets which include hundreds of metabolites their comprehen-sive evaluation requires a specialized data analysis that in-volves cheminformatics bioinformatics and statistics aspectsMoreover to better understand the role of each metabolite inthe studied condition metabolomic data must be interpretedthis requires that every chemical information derived frommetabolomic analyses has to be related to both biochemicalcauses and physiological consequences [1 12] Toward this endthe multilevel integration of metabolomic proteomic and tran-scriptomic information is fundamental for a better understand-ing of the cellular biology Although this information isavailable its fast evaluation and integration is still hamperedby technical and biological issues including (i) the complexityand heterogeneity intrinsic to biological data which require ap-propriate statistical and computational analysis methods (ii)the limited reproducibility of the results of transcriptomicproteomic and metabolomic research and the heterogeneity ofthe available analysis techniques which make data comparisonamong different labs hard (iii) the lack of standard data formatsboth for lsquoomicsrsquo data and for metadata and (iv) the need ofuser-friendly tools for integrative analysis of multiple datatypes [13 14]

Although several literature about metabolomic data produc-tion and analysis techniques exist [3ndash5 15ndash23] to the best of our

knowledge a comprehensive review illustrating the availabletools for the analysis and integration of metabolomic data withother lsquoomicsrsquo data has not been reported yet Here we describeand compare four tools (MetaCoreTM MetaboAnalyst InCroMAPand 3Omics [24ndash27]) which we selected among the several onescurrently available for metabolomic data analysis and integra-tion (Table 1 reports a list of the tools most frequently cited inthe literature) related issues and challenges arising in lsquoomicsrsquodata integration and analysis are also identified and discussed

MetaCoreTM and MetaboAnalyst are the most commonlyused tools by researchers who work with metabolomic data (thenumber of data analysis jobs submitted to MetaboAnalyst wasabout 40 000month in 2014 [41]) Both of them have been avail-able since several years (since 2004 and 2009 respectively)Conversely InCroMAP and 3Omics have been implementedmore recently (in 2011 and 2013 respectively) and they havenot yet overcome the previous ones However given their easeof use the knowledge of these two latter tools may be advanta-geous for researchers interested in evaluating the current possi-bility to integrate lsquoomicsrsquo data across multiple platforms In thisreview we provide an objective and practical assessment ofthese tools which may be helpful as a guide for the choice of arobust pipeline for metabolomic data integration and analysisWe do not give a description of lsquothe best toolsrsquo for metabolomicdata analysis but we elucidate how some of the main currentlyavailable software packages work highlighting their strong andweak aspects

Metabolomic analysis workflow a briefoverview

A detailed description of how a metabolomic analysis is per-formed can be found in [5 6 15] In the following we provide abrief overview to better understand the related issues we sum-marize the main aspects focusing on the integration with otherkinds of lsquoomicsrsquo data

A typical metabolomic study consists of several differentparts (schematized in Figure 1) which can be grouped in fourmain steps [5 12 16 42]

Sample preparation

The first step in the metabolomic workflow is the preparation ofthe biological sample (eg blood plasma and serum urine sal-iva solid tissues and cultured cells) According to the kind ofsample to analyze several approaches are used as described indetails elsewhere [18ndash21] Samples are usually homogenized orpulverized into smaller particles to increase their surface areafor the exposure to the extraction buffer which is chosen in ac-cordance to their chemical characteristics

Data acquisition

Once the sample is ready different techniques can be used toseparate and characterize chemically diverse groups of metab-olites To have different measurement dimensions based bothon chemical and physical properties of metabolites compoundseparation techniques (eg gas chromatography high-perform-ance liquid chromatography ultra-high-performance liquidchromatography and capillary electrophoresis) are combinedwith compound detection techniques such as mass spectrom-etry (MS) or nuclear magnetic resonance (NMR) Each detectionor separation method has different resolution sensitivity andtechnological limitations in identifying metabolites and it is

Analysis of metabolomic data | 499

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Tab

le1

Feat

ure

so

fth

ecu

rren

tly

avai

labl

em

etab

olo

mic

dat

aan

alys

isto

ols

mo

stci

ted

inth

eli

tera

ture

ord

ered

byth

eir

year

of

pu

blic

atio

n

To

oln

ame

To

old

escr

ipti

on

Dat

ap

rep

roce

ssin

gO

mic

sd

ata

inte

grat

edan

alys

is

Path

way

anal

ysis

Tra

nsc

rip

tom

icd

ata

Pro

teo

mic

dat

aM

etab

olo

mic

dat

aY

ear

Ref

eren

ce

IPA

aA

nal

ysis

and

visu

aliz

atio

no

fd

iffe

ren

tki

nd

so

fo

mic

sd

ata

N

A[2

8]

Met

aCo

rea

Fun

ctio

nal

anal

ysis

and

visu

aliz

atio

no

fd

iffe

ren

tki

nd

so

fh

igh

-th

rou

ghp

ut

om

ics

dat

a

o

2004

[24]

PaV

ESy

Dat

am

anag

ing

syst

emfo

red

itin

gan

dvi

sual

izat

ion

of

bio

logi

calp

ath

way

so

oo

2004

[29]

Pro

teo

me

Soft

war

ea

Qu

anti

zati

on

co

mp

ou

nd

iden

tifi

cati

on

and

stat

isti

cala

nd

pat

hw

ayan

alys

is

20

05[3

0]

Vis

AN

TV

isu

aliz

atio

nan

dan

alys

iso

fm

any

typ

eso

fbi

olo

gica

lnet

wo

rks

o

oo

2005

[31]

VA

NT

EDV

isu

aliz

atio

nan

dan

alys

iso

fn

etw

ork

sw

ith

rela

ted

exp

erim

enta

ldat

a

20

06[3

2]

Mas

sTR

IXA

nn

ota

tio

no

fin

pu

tm

ass

pea

ksan

dm

ap-

pin

go

fth

eid

enti

fied

com

po

un

ds

on

toth

esp

ecifi

cm

etab

oli

cp

ath

way

20

08[3

3]

Pro

MeT

raV

isu

aliz

atio

nan

din

tegr

atio

no

fd

ata

sets

of

dif

fere

nt

kin

ds

of

om

ics

dat

ao

nu

ser-

defi

ned

met

abo

lic

pat

hw

aym

aps

o

2009

[34]

Met

abo

An

alys

tC

om

pre

hen

sive

met

abo

lom

icd

ata

ana-

lysi

svi

sual

izat

ion

and

inte

rpre

tati

on

2009

[25

35]

Pain

tom

ics

Inte

grat

edvi

sual

anal

ysis

of

tran

scri

p-

tom

ican

dm

etab

olo

mic

dat

ao

20

10[3

6]

Met

PaPa

thw

ayan

alys

isan

dvi

sual

izat

ion

for

met

abo

lom

icd

ata

o

20

10[3

7]

IMPa

LaJo

int

pat

hw

ayan

alys

iso

ftr

ansc

rip

tom

ico

rp

rote

om

ican

dm

etab

olo

mic

dat

ao

2011

[38]

InC

roM

AP

Sin

gle

dat

ase

tan

din

tegr

ated

cro

ss-p

lat-

form

enri

chm

ent

anal

ysis

an

dp

ath

-w

ay-b

ased

visu

aliz

atio

ns

of

om

ics

dat

a

20

11[2

639

]

3Om

ics

An

alys

isi

nte

grat

ion

and

visu

aliz

atio

no

fd

iffe

ren

tki

nd

so

fh

um

ano

mic

sd

ata

o

2013

[27

40]

aIn

gen

uit

yPa

thw

ayA

nal

ysis

Co

mm

erci

alto

ol

Tic

ksin

dic

ate

full

fun

ctio

nal

ity

emp

tyci

rcle

sin

dic

ate

par

tial

fun

ctio

nal

ity

500 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Figure 1 Flowchart of a typical metabolomic study After sample preparation specific metabolic signals are acquired using heterogeneous analytical platforms (DATA

ACQUISITION) Raw signals are then pre-processed to produce data in a suitable format for univariate and multivariate statistical analyses For untargeted studies me-

tabolites have first to be identified from spectral information (DATA PROCESSING) Significantly expressed metabolites are then linked to the biological context

through enrichment and pathway analysis and mapped into networks Finally metabolomic data are integrated with other lsquoomicsrsquo data and with prior knowledge to

gain a comprehensive view of the molecular processes involved (DATA INTERPRETATION AND INTEGRATION)

Analysis of metabolomic data | 501

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

chosen in accordance to the chemical and physical characteris-tic of each sample and to the kind of analysis to be performed(targeted or untargeted) [12 15 18] MS is the most widely appliedtechnique as it allows reliable metabolite identification par-ticularly when used in tandem with chromatographic separ-ation methods so as to enhance its mass-resolving capabilitiesIt is rapid (the analysis time ranges from 5 to 140 min) andallows performing sensitive and selective qualitative and quan-titative analyses The main drawbacks of the MS technique arethe need to separate or purify the sample before it is directedinto the mass analyzer and the high cost of the instrumentCompared with MS NMR has a lower sensitivity thus resultingin limited ability for metabolite identification and quantifica-tion This implies that potentially important compounds thatare present at smaller concentrations can be hidden by largerpeaks and are thus less likely to be identified The advantagesof NMR are the high analytical reproducibility and that it is anon-destructive method that requires minimal sample prepar-ation [43 44]

Data processing

Once acquired raw signals (chromatograms spectra or NMRdata) are pre-processed by ad hoc software tools to facilitatecompound quantification (eg the commercial softwareSIEVETM by Thermo Scientific [45] or some freely availablesoftware packages such as the cloud-based platform XCMS [46]or the open-source cross-platforms MAVEN [47] and MZmine 2[48]) Generally this preprocessing involves noise reductionretention time correction peak detection and integration andchromatogram alignment Finally for untargeted metabolomicstudies different databases such as the Human MetabolomeDatabase (HMDB) [49] or the Metabolite and Tandem MSDatabase (METLIN) [50] are used to identify the metabolitesfrom spectra During data integrity checking different inputdata are then prepared to produce appropriate data matricesfor further analyses as better detailed in the next sectionBriefly before starting any kind of statistical analysis datanormalization is performed to reduce systematic biases ortechnical variations and to avoid misidentification of signifi-cant changes owing to the different orders of magnitudes ofmetabolomic data Following which significant differences be-tween sample sets can be identified using appropriate statis-tical methods A typical statistical analysis for metabolomicdata consists of two phases initially different univariate andmultivariate methods are used to generate an overview of theconsidered data sets and to identify the metabolites that showsignificant changes under the studied conditions then datamining techniques are used to discriminate groups of func-tionally related metabolites [16] A limit of traditional statis-tical methods is that they highlight relationships amongvariables based only on mathematical criteria (eg maximiza-tion of variance or correlation) and they do not take into ac-count correlations from biological origin [17] For this reasonthe combined use of several statistical and data mining tech-niques is highly recommended [4] Once identified signifi-cantly expressed metabolites are first ranked usingappropriate P-values then a cut-off threshold is applied to se-lect the top-k ones from the ranked list The choice of thisthreshold which is often arbitrary is critical as it may influ-ence the final biological interpretation In fact some moder-ate but significant changes may be missed or criticalcomponents of a particular biological process may be left outthus compromising subsequent analyses [42]

Data interpretation and integration

In this last step the selected metabolites are linked to the biolo-gical context under study through enrichment and pathwayanalyses More precisely enrichment analysis aims to investi-gate the enrichment (ie over- andor under-expression) of pre-defined groups of functionally related metabolites (iemetabolite sets) to identify significant and coordinated expres-sion changes among them This allows taking advantage of thelist of altered metabolites to suggest a biological pathway ordisease condition which can be further investigated [42]Conversely pathway analysis involves the description and visu-alization of the interactions among genes proteins or metabol-ites within cells tissues or organs Its goal is to identify thepathways that significantly impact on a given biological process[37] Enrichment and pathway analyses are performed using adhoc software tools [37] which map significant metabolites toknown biochemical pathways on the basis of the informationcontained in public databases such as the Kyoto Encyclopediaof Genes and Genomes (KEGG) [8] Once the metabolic pathwaysare identified this information has to be integrated with tran-scriptomic and proteomic data to obtain a comprehensive viewof all the biological processes involved [22] To capture all theinteractions that these data describe network-based visualiza-tion tools are commonly used by investigators to better under-stand and show their findings Depending on the kind ofinteraction under study a biological network can be repre-sented through a different type of graph Graphs are mathemat-ical structures of several kinds such as directed or undirectedgraphs directed acyclic graphs trees forests minimum span-ning trees Boolean networks and Steiner trees [51] To makegraph layouts informative and reproducible several visualiza-tion strategies have been proposed and adopted A frequentlyused visualization method is the ball-and-stick diagram wherepathway data are presented as networks with compounds (egmetabolites proteins) as nodes and reactions as edges [23 3752] Nodes can be placed hierarchically (a father node with oneor more child nodes) or radially as in radial networks or hiveplots which more closely represent the complexity of biologicalsystems [53] To reach a more reliable evaluation of the processunder study integration with biological knowledge derivedfrom the literature or from previous experimental data can alsobe performed [12 17] In spite of the availability of this informa-tion effective data integration is still far to be achieved owing tothe heterogeneity of current databases As a consequence usershave to take into account multiple databases to extract andmanually assemble the several different information neededthis makes data integration time-consuming and the final in-terpretation is often prone to errors because of different back-ground knowledge or biases of individual researchers [42]

Software tools for metabolomic data analysisand integration

Powerful software tools are essential to address the vastamount and variety of data generated by metabolomic analysesRequired software capabilities include (i) processing of rawspectral data (ii) statistical analysis to find significantly ex-pressed metabolites (iii) connection to metabolite databases formetabolite identification (iv) integration and analysis of mul-tiple heterogeneous lsquoomicsrsquo data and (v) bioinformatics ana-lysis and visualization of molecular interaction networks [1618] In this section we introduce the four data analysis tools se-lected (ie MetaCoreTM MetaboAnalyst InCroMAP and 3Omics)

502 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

their main functionalities are illustrated and compared on thebasis of some selected features such as data preprocessingtechniques used statistical analyses performed and methodsused for functional interpretation and if available for data inte-gration The main features of each tool are summarized inTable 2 It is important to point out that only MetaboAnalystprovides a comprehensive module for data preprocessing andstatistical analysis [25 41] MetaCoreTM and 3Omics only offerlimited support for these functionalities whereas InCroMAPonly accepts already preprocessed data In spite of these limita-tions we decided to include MetaCoreTM 3Omics and InCroMAPin this review because of their ability to perform comprehensiveanalyses of multi-omics data which are currently limited inother high-level analysis tools

MetaCoreTM

MetaCoreTM [24] is a commercial tool available both as a stand-alone and as a Web-based application It is a software suite forfunctional analysis of different kinds of high-throughput mo-lecular data (eg next-generation sequencing siRNAmicroRNA microarray-based gene expression and serial ana-lysis of gene expression data array-comparative genomic-hybridization DNA arrays proteomic data metabolic profilesand screening data) MetaCoreTM is an integrated system whichconsists of (i) a high-quality manually curated database ofmammalian biology including metabolites and other molecularclasses bioactive molecules and their interactions signalingand metabolic pathways (ii) genomic analysis tools to identifypotentially significant variants (iii) a data mining toolkit fordata visualization analysis and exchange of data (iv) a toolkit(pathway editor) for custom assembly of functional networksand (v) a set of parsers to upload and manipulate different typesof high-throughput molecular data [54ndash56] Unfortunately nopublic information is available about the details of howMetaCoreTM works thus our review of this tool is partially lim-ited and some details remain unclear

MetaboAnalyst

MetaboAnalyst [25 35] is an integrated freely accessible Web-based platform It was first released in 2009 then upgraded in2012 (MetaboAnalyst 20 [57]) and in 2015 (MetaboAnalyst 30also available for download and local installation) It offers a setof online tools for metabolomic data analysis that combine stat-istical analysis of data with their functional and biological inter-pretation and visualization [25] tutorials and protocolpapers are also available online MetaboAnalyst 30 has been re-implemented to improve performance capability and userinteractivity It offers eight functional modules which can begrouped in three categories (i) exploratory statistical analysis(Statistical Analysis and Time-Series Analysis modules) (ii) func-tional analysis (Enrichment Analysis Pathway Analysis andIntegrated Pathway Analysis modules for both genes and metabol-ites) and (iii) advanced methods for translational studies(Biomarker Analysis Sample Size Estimation and Power Analysismodules) In addition it has also an Other Utilities module con-taining a specialized function for lipidomic data analysis and acompound ID-conversion tool [41]

InCroMAP

InCroMAP is a stand-alone Java software originally developed in2011 for enrichment analysis and pathway-based visualizationsof genomic and proteomic data [27 40] The extended version

InCroMAP 15 released in 2013 also supports annotated metab-olomic data thus making this tool suitable for comprehensivesystem biology studies [58] InCroMAP is freely available at itsWeb site [39] which includes a user guide [59] example datafiles to test the tool and a short video tutorial The software per-forms metabolite set enrichment analysis and generates inter-active global maps of the cellular metabolism These mapsallow an integrated pathway-based visualization of data frommultiple lsquoomicsrsquo platforms and provide a useful overview of themetabolic changes present in the studied experimental condi-tion [26 59]

3Omics

3Omics is a platform-independent Web-based tool developedin 2013 for the analysis integration and visualization of tran-scriptomic proteomic and metabolomic human data It is freelyaccessible at [40] and the Web site also includes a help sectionTo demonstrate the software functionalities applied to realdata two case studies are also reported and explained in [27]3Omics supports correlation analysis co-expression profilingphenotype mapping pathway enrichment analysis and geneontology-based enrichment analysis More precisely dependingon the data provided the software offers four types of lsquoomicsrsquoanalyses (i) TranscriptomicsndashProteomicsndashMetabolomics (TndashPndashM) (ii) TranscriptomicsndashProteomics (TndashP) (iii) ProteomicsndashMetabolomics (PndashM) and (iv) TranscriptomicsndashMetabolomics(TndashM) A single-omics mode is also available to reveal intra-omics relationships 3Omics can also supplement missing tran-script protein and metabolite information related to the inputdata by text-mining the biomedical literature through iHOP (in-formation Hyperlinked Over Protein [60 61]) Users can thusperform multi-omics analyses even when only one or two outof three lsquoomicsrsquo data sets are available [27]

Evaluation and comparison of softwarefunctionalities

In this section strong and weak aspects of each of the four se-lected tools are reviewed and comparatively evaluated follow-ing the data analysis workflow previously described (Figure 1)

Input data

Various proprietary data formats have been developed to han-dle and store MS or NMR raw data this heterogeneity makes itdifficult for researchers to manipulate these data For this rea-son several software for the conversion of raw data file types(eg RAW) into a universal format (eg CSV TXT mzXML orMGF) have been implemented such as ProteoWizard [62]MassMatrix [63] and several others including MATLAB [64] andR [65 66] Once raw data have been converted to a suitable for-mat they can be further analyzed Data types and formatsaccepted as input by the four reviewed tools are reported inTable 2 Input data are usually organized as data matrices withsamples as rows and signal features as columns MetaCoreTMMetaboAnalyst and IncroMAP accept data in different kindsof tabular format (eg textual tab-delimited TXT or comma-separated value CSV files) whereas 3Omics only accepts CSVfiles MetaboAnalyst also accepts zipped files of NMR or MSpeak lists or of MS spectra which must be in mzXML mzDATAor NetCDF format As for MetaCoreTM gene lists can also be im-ported from output files of microarray analysis software such asAffymetrix [67] or Agilent [68] Compound names or IDs

Analysis of metabolomic data | 503

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Tab

le2

Mai

nfe

atu

res

of

the

sele

cted

fou

rso

ftw

are

too

lsfo

rm

etab

olo

mic

dat

aan

alys

is

Too

lM

etaC

oreT

MM

etab

oAn

alys

tIn

Cro

MA

P3O

mic

sY

ear

2004

2009

2011

2013

Inst

itu

tio

nG

eneG

oU

niv

ersi

tyo

fA

lber

taM

cGil

lU

niv

ersi

tyM

on

trea

l(C

A)

Cen

ter

for

Bio

info

rma

tics

of

the

Un

iver

sity

of

Tu

bin

gen

(D)

Mo

lecu

lar

Des

ign

ampM

etab

olo

mic

sLa

bora

tory

U

niv

ersi

tyo

fT

aiw

an(T

W)

Imp

lem

enta

tio

nW

eb-b

asedthorn

stan

d-a

lon

eW

eb-b

ased

Stan

d-a

lon

eW

eb-b

ased

Lice

nse

Co

mm

erci

alG

PL(G

NU

Gen

eral

Publ

icLi

cen

se)

LGPL

(GN

ULe

sser

Gen

eral

Publ

icLi

cen

se)

No

ne

Typ

eo

fkn

ow

led

gePr

op

riet

ary

Publ

icPu

blic

Publ

icIn

pu

td

ata

Gen

ep

rote

ino

rm

etab

oli

teli

sts

imp

ort

edas

tab-

de-

lim

ited

text

(TX

T)

com

ma-

sep

arat

edva

lues

(CSV

)or

Exce

lfile

sge

ne

list

sfr

om

mic

roar

ray

anal

ysis

soft

war

e(A

ffym

etri

xA

gile

nt)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)fo

rco

nce

ntr

atio

ns

spec

tral

bin

so

rp

eak

inte

n-

sity

dat

azi

pp

edfi

les

(ZIP

)of

NM

Ro

rM

Sp

eak

list

so

ro

fM

Ssp

ectr

a(i

nN

etC

DF

mzX

ML

or

mzD

AT

Afo

rmat

)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)of

het

ero

gen

eou

sty

pes

of

pro

cess

edlsquoo

mic

srsquod

ata

Co

mm

a-se

par

ated

valu

es(C

SV)

of

pro

cess

edtr

ansc

rip

tom

ic

pro

teo

mic

or

met

abo

lom

icd

ata

Dat

ap

rep

arat

ion

Dat

ain

tegr

ity

chec

kin

g

Dat

an

orm

aliz

atio

n

Co

mp

ou

nd

nam

eid

enti

fica

tio

n

Stat

isti

cala

nal

ysis

Un

ivar

iate

anal

ysis

Mu

ltiv

aria

tean

alys

is

Clu

ster

ing

Cla

ssifi

cati

on

Dat

ain

terp

reta

tio

nan

din

tegr

atio

nFu

nct

ion

alin

terp

reta

tio

n

Met

abo

lite

set

enri

chm

ent

anal

ysis

Met

abo

lic

pat

hw

ayan

alys

is

Met

abo

lite

map

pin

g

Hyp

erli

nks

toex

tern

ald

atab

ases

Dat

aIn

tegr

atio

n

Ou

tpu

td

ata

Net

wo

rks

can

beex

po

rted

intw

ofo

rmat

sN

etsh

ot

and

Net

wo

rki

mag

esas

PNG

file

s

PDF

rep

ort

sco

nta

inin

gp

lots

gr

aph

san

dta

bles

wit

hal

lth

ere

sult

sIm

ages

are

avai

l-ab

leas

TIF

or

PNG

file

s

Tab

ula

rfo

rmat

(eg

CSV

)fo

ren

rich

men

tan

alys

isre

sult

san

dJP

Gfi

les

for

pat

hw

ay-

base

dvi

sual

izat

ion

PNG

SV

Go

rSI

Ffo

rmat

sfo

rim

ages

Tic

ksin

dic

ate

too

lfu

nct

ion

alit

ies

504 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]

Data preparation

Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text

Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods

Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]

Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable

Statistical analysis

Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]

A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness

Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps

For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite

Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average

ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two

Analysis of metabolomic data | 505

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering

ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines

Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]

Data interpretation and integration

This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text

Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge

MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]

MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in

Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided

with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact

and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-

way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound

colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas

other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of

this figure is available at BIB online httpbiboxfordjournalsorg

506 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view

InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]

3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]

Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation

MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results

Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)

InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]

3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts

Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-

logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds

with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version

of this figure is available at BIB online httpbiboxfordjournalsorg

Analysis of metabolomic data | 507

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]

Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]

Output data

Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis

MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]

MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results

of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing

Conclusion

The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions

Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford

Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations

Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification

508 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future

Key Points

bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools

bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge

bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge

Funding

This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)

References1 Mehrotra B Mendes P Bioinformatics approaches to inte-

grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15

2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210

3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93

4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95

5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39

6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9

7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)

8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34

9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)

10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)

11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302

12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65

13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1

14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93

15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293

16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108

17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2

18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43

19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9

20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48

21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69

22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090

23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100

24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)

25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60

26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6

27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764

28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)

29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4

30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)

31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7

32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109

33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4

34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382

35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)

Analysis of metabolomic data | 509

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2
Page 2: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

The following two are the general analytical approaches inperforming a metabolomic analysis targeted and untargetedTargeted metabolomics refers to the detection and precise quan-tification (in nM or mgmL) of a small set of known compoundsIt is driven by a specific biochemical question or hypothesis inwhich the set of metabolites related to one or more pathways isalready defined A limitation of the targeted approach is that itrequires the compounds of interest to be known a priori and tobe available in their purified form Currently only few purifiedstandards (ie defined groups of chemically characterized andbiochemically annotated metabolites) have been clearly identi-fied and are available for a calibration process therefore owingto the wide variety of metabolites and their complex dynamicswithin a cell the targeted approach cannot yet be used alone fora comprehensive analysis of the metabolome The untargetedapproach instead also called lsquometabolite fingerprintingrsquo is notdriven by an a priori hypothesis and it is used for completemetabolome comparison (ie as many metabolites as possibleare measured and compared between samples) [4] Metabolitevariations are observed principally as total changes of chroma-tographic patterns without requiring previous knowledge of thecompounds under investigation Therefore untargeted metabo-lomics does not attempt to precisely quantify all measurablemetabolites in a sample but it only gives their relative quantifi-cation (fold change) [5 6] It is important to stress that in spiteof the presence of extensive metabolomics spectra repositories(eg SMPDB [7] KEEG [8] MetaCyc [9] and HumanCyc [10] to citesome) metabolite identification still constitutes a challenge inuntargeted metabolomics This is mainly because of techno-logical limitations such as the dependence on the intrinsic ana-lytical coverage of the platform used and the possible biastoward the detection of the most abundant molecules [11]Moreover the same molecule can be fragmented differently de-pending on the specific instrument or technique used and thishampers the metabolites spectra matching Furthermore in-strument-dependent variability between different kinds ofmass-analyzers and even between the ones of the same kindbut of different brands increases the variability in compoundidentification

Metabolomic research leads to the handling of complex datasets which include hundreds of metabolites their comprehen-sive evaluation requires a specialized data analysis that in-volves cheminformatics bioinformatics and statistics aspectsMoreover to better understand the role of each metabolite inthe studied condition metabolomic data must be interpretedthis requires that every chemical information derived frommetabolomic analyses has to be related to both biochemicalcauses and physiological consequences [1 12] Toward this endthe multilevel integration of metabolomic proteomic and tran-scriptomic information is fundamental for a better understand-ing of the cellular biology Although this information isavailable its fast evaluation and integration is still hamperedby technical and biological issues including (i) the complexityand heterogeneity intrinsic to biological data which require ap-propriate statistical and computational analysis methods (ii)the limited reproducibility of the results of transcriptomicproteomic and metabolomic research and the heterogeneity ofthe available analysis techniques which make data comparisonamong different labs hard (iii) the lack of standard data formatsboth for lsquoomicsrsquo data and for metadata and (iv) the need ofuser-friendly tools for integrative analysis of multiple datatypes [13 14]

Although several literature about metabolomic data produc-tion and analysis techniques exist [3ndash5 15ndash23] to the best of our

knowledge a comprehensive review illustrating the availabletools for the analysis and integration of metabolomic data withother lsquoomicsrsquo data has not been reported yet Here we describeand compare four tools (MetaCoreTM MetaboAnalyst InCroMAPand 3Omics [24ndash27]) which we selected among the several onescurrently available for metabolomic data analysis and integra-tion (Table 1 reports a list of the tools most frequently cited inthe literature) related issues and challenges arising in lsquoomicsrsquodata integration and analysis are also identified and discussed

MetaCoreTM and MetaboAnalyst are the most commonlyused tools by researchers who work with metabolomic data (thenumber of data analysis jobs submitted to MetaboAnalyst wasabout 40 000month in 2014 [41]) Both of them have been avail-able since several years (since 2004 and 2009 respectively)Conversely InCroMAP and 3Omics have been implementedmore recently (in 2011 and 2013 respectively) and they havenot yet overcome the previous ones However given their easeof use the knowledge of these two latter tools may be advanta-geous for researchers interested in evaluating the current possi-bility to integrate lsquoomicsrsquo data across multiple platforms In thisreview we provide an objective and practical assessment ofthese tools which may be helpful as a guide for the choice of arobust pipeline for metabolomic data integration and analysisWe do not give a description of lsquothe best toolsrsquo for metabolomicdata analysis but we elucidate how some of the main currentlyavailable software packages work highlighting their strong andweak aspects

Metabolomic analysis workflow a briefoverview

A detailed description of how a metabolomic analysis is per-formed can be found in [5 6 15] In the following we provide abrief overview to better understand the related issues we sum-marize the main aspects focusing on the integration with otherkinds of lsquoomicsrsquo data

A typical metabolomic study consists of several differentparts (schematized in Figure 1) which can be grouped in fourmain steps [5 12 16 42]

Sample preparation

The first step in the metabolomic workflow is the preparation ofthe biological sample (eg blood plasma and serum urine sal-iva solid tissues and cultured cells) According to the kind ofsample to analyze several approaches are used as described indetails elsewhere [18ndash21] Samples are usually homogenized orpulverized into smaller particles to increase their surface areafor the exposure to the extraction buffer which is chosen in ac-cordance to their chemical characteristics

Data acquisition

Once the sample is ready different techniques can be used toseparate and characterize chemically diverse groups of metab-olites To have different measurement dimensions based bothon chemical and physical properties of metabolites compoundseparation techniques (eg gas chromatography high-perform-ance liquid chromatography ultra-high-performance liquidchromatography and capillary electrophoresis) are combinedwith compound detection techniques such as mass spectrom-etry (MS) or nuclear magnetic resonance (NMR) Each detectionor separation method has different resolution sensitivity andtechnological limitations in identifying metabolites and it is

Analysis of metabolomic data | 499

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Tab

le1

Feat

ure

so

fth

ecu

rren

tly

avai

labl

em

etab

olo

mic

dat

aan

alys

isto

ols

mo

stci

ted

inth

eli

tera

ture

ord

ered

byth

eir

year

of

pu

blic

atio

n

To

oln

ame

To

old

escr

ipti

on

Dat

ap

rep

roce

ssin

gO

mic

sd

ata

inte

grat

edan

alys

is

Path

way

anal

ysis

Tra

nsc

rip

tom

icd

ata

Pro

teo

mic

dat

aM

etab

olo

mic

dat

aY

ear

Ref

eren

ce

IPA

aA

nal

ysis

and

visu

aliz

atio

no

fd

iffe

ren

tki

nd

so

fo

mic

sd

ata

N

A[2

8]

Met

aCo

rea

Fun

ctio

nal

anal

ysis

and

visu

aliz

atio

no

fd

iffe

ren

tki

nd

so

fh

igh

-th

rou

ghp

ut

om

ics

dat

a

o

2004

[24]

PaV

ESy

Dat

am

anag

ing

syst

emfo

red

itin

gan

dvi

sual

izat

ion

of

bio

logi

calp

ath

way

so

oo

2004

[29]

Pro

teo

me

Soft

war

ea

Qu

anti

zati

on

co

mp

ou

nd

iden

tifi

cati

on

and

stat

isti

cala

nd

pat

hw

ayan

alys

is

20

05[3

0]

Vis

AN

TV

isu

aliz

atio

nan

dan

alys

iso

fm

any

typ

eso

fbi

olo

gica

lnet

wo

rks

o

oo

2005

[31]

VA

NT

EDV

isu

aliz

atio

nan

dan

alys

iso

fn

etw

ork

sw

ith

rela

ted

exp

erim

enta

ldat

a

20

06[3

2]

Mas

sTR

IXA

nn

ota

tio

no

fin

pu

tm

ass

pea

ksan

dm

ap-

pin

go

fth

eid

enti

fied

com

po

un

ds

on

toth

esp

ecifi

cm

etab

oli

cp

ath

way

20

08[3

3]

Pro

MeT

raV

isu

aliz

atio

nan

din

tegr

atio

no

fd

ata

sets

of

dif

fere

nt

kin

ds

of

om

ics

dat

ao

nu

ser-

defi

ned

met

abo

lic

pat

hw

aym

aps

o

2009

[34]

Met

abo

An

alys

tC

om

pre

hen

sive

met

abo

lom

icd

ata

ana-

lysi

svi

sual

izat

ion

and

inte

rpre

tati

on

2009

[25

35]

Pain

tom

ics

Inte

grat

edvi

sual

anal

ysis

of

tran

scri

p-

tom

ican

dm

etab

olo

mic

dat

ao

20

10[3

6]

Met

PaPa

thw

ayan

alys

isan

dvi

sual

izat

ion

for

met

abo

lom

icd

ata

o

20

10[3

7]

IMPa

LaJo

int

pat

hw

ayan

alys

iso

ftr

ansc

rip

tom

ico

rp

rote

om

ican

dm

etab

olo

mic

dat

ao

2011

[38]

InC

roM

AP

Sin

gle

dat

ase

tan

din

tegr

ated

cro

ss-p

lat-

form

enri

chm

ent

anal

ysis

an

dp

ath

-w

ay-b

ased

visu

aliz

atio

ns

of

om

ics

dat

a

20

11[2

639

]

3Om

ics

An

alys

isi

nte

grat

ion

and

visu

aliz

atio

no

fd

iffe

ren

tki

nd

so

fh

um

ano

mic

sd

ata

o

2013

[27

40]

aIn

gen

uit

yPa

thw

ayA

nal

ysis

Co

mm

erci

alto

ol

Tic

ksin

dic

ate

full

fun

ctio

nal

ity

emp

tyci

rcle

sin

dic

ate

par

tial

fun

ctio

nal

ity

500 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Figure 1 Flowchart of a typical metabolomic study After sample preparation specific metabolic signals are acquired using heterogeneous analytical platforms (DATA

ACQUISITION) Raw signals are then pre-processed to produce data in a suitable format for univariate and multivariate statistical analyses For untargeted studies me-

tabolites have first to be identified from spectral information (DATA PROCESSING) Significantly expressed metabolites are then linked to the biological context

through enrichment and pathway analysis and mapped into networks Finally metabolomic data are integrated with other lsquoomicsrsquo data and with prior knowledge to

gain a comprehensive view of the molecular processes involved (DATA INTERPRETATION AND INTEGRATION)

Analysis of metabolomic data | 501

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

chosen in accordance to the chemical and physical characteris-tic of each sample and to the kind of analysis to be performed(targeted or untargeted) [12 15 18] MS is the most widely appliedtechnique as it allows reliable metabolite identification par-ticularly when used in tandem with chromatographic separ-ation methods so as to enhance its mass-resolving capabilitiesIt is rapid (the analysis time ranges from 5 to 140 min) andallows performing sensitive and selective qualitative and quan-titative analyses The main drawbacks of the MS technique arethe need to separate or purify the sample before it is directedinto the mass analyzer and the high cost of the instrumentCompared with MS NMR has a lower sensitivity thus resultingin limited ability for metabolite identification and quantifica-tion This implies that potentially important compounds thatare present at smaller concentrations can be hidden by largerpeaks and are thus less likely to be identified The advantagesof NMR are the high analytical reproducibility and that it is anon-destructive method that requires minimal sample prepar-ation [43 44]

Data processing

Once acquired raw signals (chromatograms spectra or NMRdata) are pre-processed by ad hoc software tools to facilitatecompound quantification (eg the commercial softwareSIEVETM by Thermo Scientific [45] or some freely availablesoftware packages such as the cloud-based platform XCMS [46]or the open-source cross-platforms MAVEN [47] and MZmine 2[48]) Generally this preprocessing involves noise reductionretention time correction peak detection and integration andchromatogram alignment Finally for untargeted metabolomicstudies different databases such as the Human MetabolomeDatabase (HMDB) [49] or the Metabolite and Tandem MSDatabase (METLIN) [50] are used to identify the metabolitesfrom spectra During data integrity checking different inputdata are then prepared to produce appropriate data matricesfor further analyses as better detailed in the next sectionBriefly before starting any kind of statistical analysis datanormalization is performed to reduce systematic biases ortechnical variations and to avoid misidentification of signifi-cant changes owing to the different orders of magnitudes ofmetabolomic data Following which significant differences be-tween sample sets can be identified using appropriate statis-tical methods A typical statistical analysis for metabolomicdata consists of two phases initially different univariate andmultivariate methods are used to generate an overview of theconsidered data sets and to identify the metabolites that showsignificant changes under the studied conditions then datamining techniques are used to discriminate groups of func-tionally related metabolites [16] A limit of traditional statis-tical methods is that they highlight relationships amongvariables based only on mathematical criteria (eg maximiza-tion of variance or correlation) and they do not take into ac-count correlations from biological origin [17] For this reasonthe combined use of several statistical and data mining tech-niques is highly recommended [4] Once identified signifi-cantly expressed metabolites are first ranked usingappropriate P-values then a cut-off threshold is applied to se-lect the top-k ones from the ranked list The choice of thisthreshold which is often arbitrary is critical as it may influ-ence the final biological interpretation In fact some moder-ate but significant changes may be missed or criticalcomponents of a particular biological process may be left outthus compromising subsequent analyses [42]

Data interpretation and integration

In this last step the selected metabolites are linked to the biolo-gical context under study through enrichment and pathwayanalyses More precisely enrichment analysis aims to investi-gate the enrichment (ie over- andor under-expression) of pre-defined groups of functionally related metabolites (iemetabolite sets) to identify significant and coordinated expres-sion changes among them This allows taking advantage of thelist of altered metabolites to suggest a biological pathway ordisease condition which can be further investigated [42]Conversely pathway analysis involves the description and visu-alization of the interactions among genes proteins or metabol-ites within cells tissues or organs Its goal is to identify thepathways that significantly impact on a given biological process[37] Enrichment and pathway analyses are performed using adhoc software tools [37] which map significant metabolites toknown biochemical pathways on the basis of the informationcontained in public databases such as the Kyoto Encyclopediaof Genes and Genomes (KEGG) [8] Once the metabolic pathwaysare identified this information has to be integrated with tran-scriptomic and proteomic data to obtain a comprehensive viewof all the biological processes involved [22] To capture all theinteractions that these data describe network-based visualiza-tion tools are commonly used by investigators to better under-stand and show their findings Depending on the kind ofinteraction under study a biological network can be repre-sented through a different type of graph Graphs are mathemat-ical structures of several kinds such as directed or undirectedgraphs directed acyclic graphs trees forests minimum span-ning trees Boolean networks and Steiner trees [51] To makegraph layouts informative and reproducible several visualiza-tion strategies have been proposed and adopted A frequentlyused visualization method is the ball-and-stick diagram wherepathway data are presented as networks with compounds (egmetabolites proteins) as nodes and reactions as edges [23 3752] Nodes can be placed hierarchically (a father node with oneor more child nodes) or radially as in radial networks or hiveplots which more closely represent the complexity of biologicalsystems [53] To reach a more reliable evaluation of the processunder study integration with biological knowledge derivedfrom the literature or from previous experimental data can alsobe performed [12 17] In spite of the availability of this informa-tion effective data integration is still far to be achieved owing tothe heterogeneity of current databases As a consequence usershave to take into account multiple databases to extract andmanually assemble the several different information neededthis makes data integration time-consuming and the final in-terpretation is often prone to errors because of different back-ground knowledge or biases of individual researchers [42]

Software tools for metabolomic data analysisand integration

Powerful software tools are essential to address the vastamount and variety of data generated by metabolomic analysesRequired software capabilities include (i) processing of rawspectral data (ii) statistical analysis to find significantly ex-pressed metabolites (iii) connection to metabolite databases formetabolite identification (iv) integration and analysis of mul-tiple heterogeneous lsquoomicsrsquo data and (v) bioinformatics ana-lysis and visualization of molecular interaction networks [1618] In this section we introduce the four data analysis tools se-lected (ie MetaCoreTM MetaboAnalyst InCroMAP and 3Omics)

502 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

their main functionalities are illustrated and compared on thebasis of some selected features such as data preprocessingtechniques used statistical analyses performed and methodsused for functional interpretation and if available for data inte-gration The main features of each tool are summarized inTable 2 It is important to point out that only MetaboAnalystprovides a comprehensive module for data preprocessing andstatistical analysis [25 41] MetaCoreTM and 3Omics only offerlimited support for these functionalities whereas InCroMAPonly accepts already preprocessed data In spite of these limita-tions we decided to include MetaCoreTM 3Omics and InCroMAPin this review because of their ability to perform comprehensiveanalyses of multi-omics data which are currently limited inother high-level analysis tools

MetaCoreTM

MetaCoreTM [24] is a commercial tool available both as a stand-alone and as a Web-based application It is a software suite forfunctional analysis of different kinds of high-throughput mo-lecular data (eg next-generation sequencing siRNAmicroRNA microarray-based gene expression and serial ana-lysis of gene expression data array-comparative genomic-hybridization DNA arrays proteomic data metabolic profilesand screening data) MetaCoreTM is an integrated system whichconsists of (i) a high-quality manually curated database ofmammalian biology including metabolites and other molecularclasses bioactive molecules and their interactions signalingand metabolic pathways (ii) genomic analysis tools to identifypotentially significant variants (iii) a data mining toolkit fordata visualization analysis and exchange of data (iv) a toolkit(pathway editor) for custom assembly of functional networksand (v) a set of parsers to upload and manipulate different typesof high-throughput molecular data [54ndash56] Unfortunately nopublic information is available about the details of howMetaCoreTM works thus our review of this tool is partially lim-ited and some details remain unclear

MetaboAnalyst

MetaboAnalyst [25 35] is an integrated freely accessible Web-based platform It was first released in 2009 then upgraded in2012 (MetaboAnalyst 20 [57]) and in 2015 (MetaboAnalyst 30also available for download and local installation) It offers a setof online tools for metabolomic data analysis that combine stat-istical analysis of data with their functional and biological inter-pretation and visualization [25] tutorials and protocolpapers are also available online MetaboAnalyst 30 has been re-implemented to improve performance capability and userinteractivity It offers eight functional modules which can begrouped in three categories (i) exploratory statistical analysis(Statistical Analysis and Time-Series Analysis modules) (ii) func-tional analysis (Enrichment Analysis Pathway Analysis andIntegrated Pathway Analysis modules for both genes and metabol-ites) and (iii) advanced methods for translational studies(Biomarker Analysis Sample Size Estimation and Power Analysismodules) In addition it has also an Other Utilities module con-taining a specialized function for lipidomic data analysis and acompound ID-conversion tool [41]

InCroMAP

InCroMAP is a stand-alone Java software originally developed in2011 for enrichment analysis and pathway-based visualizationsof genomic and proteomic data [27 40] The extended version

InCroMAP 15 released in 2013 also supports annotated metab-olomic data thus making this tool suitable for comprehensivesystem biology studies [58] InCroMAP is freely available at itsWeb site [39] which includes a user guide [59] example datafiles to test the tool and a short video tutorial The software per-forms metabolite set enrichment analysis and generates inter-active global maps of the cellular metabolism These mapsallow an integrated pathway-based visualization of data frommultiple lsquoomicsrsquo platforms and provide a useful overview of themetabolic changes present in the studied experimental condi-tion [26 59]

3Omics

3Omics is a platform-independent Web-based tool developedin 2013 for the analysis integration and visualization of tran-scriptomic proteomic and metabolomic human data It is freelyaccessible at [40] and the Web site also includes a help sectionTo demonstrate the software functionalities applied to realdata two case studies are also reported and explained in [27]3Omics supports correlation analysis co-expression profilingphenotype mapping pathway enrichment analysis and geneontology-based enrichment analysis More precisely dependingon the data provided the software offers four types of lsquoomicsrsquoanalyses (i) TranscriptomicsndashProteomicsndashMetabolomics (TndashPndashM) (ii) TranscriptomicsndashProteomics (TndashP) (iii) ProteomicsndashMetabolomics (PndashM) and (iv) TranscriptomicsndashMetabolomics(TndashM) A single-omics mode is also available to reveal intra-omics relationships 3Omics can also supplement missing tran-script protein and metabolite information related to the inputdata by text-mining the biomedical literature through iHOP (in-formation Hyperlinked Over Protein [60 61]) Users can thusperform multi-omics analyses even when only one or two outof three lsquoomicsrsquo data sets are available [27]

Evaluation and comparison of softwarefunctionalities

In this section strong and weak aspects of each of the four se-lected tools are reviewed and comparatively evaluated follow-ing the data analysis workflow previously described (Figure 1)

Input data

Various proprietary data formats have been developed to han-dle and store MS or NMR raw data this heterogeneity makes itdifficult for researchers to manipulate these data For this rea-son several software for the conversion of raw data file types(eg RAW) into a universal format (eg CSV TXT mzXML orMGF) have been implemented such as ProteoWizard [62]MassMatrix [63] and several others including MATLAB [64] andR [65 66] Once raw data have been converted to a suitable for-mat they can be further analyzed Data types and formatsaccepted as input by the four reviewed tools are reported inTable 2 Input data are usually organized as data matrices withsamples as rows and signal features as columns MetaCoreTMMetaboAnalyst and IncroMAP accept data in different kindsof tabular format (eg textual tab-delimited TXT or comma-separated value CSV files) whereas 3Omics only accepts CSVfiles MetaboAnalyst also accepts zipped files of NMR or MSpeak lists or of MS spectra which must be in mzXML mzDATAor NetCDF format As for MetaCoreTM gene lists can also be im-ported from output files of microarray analysis software such asAffymetrix [67] or Agilent [68] Compound names or IDs

Analysis of metabolomic data | 503

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Tab

le2

Mai

nfe

atu

res

of

the

sele

cted

fou

rso

ftw

are

too

lsfo

rm

etab

olo

mic

dat

aan

alys

is

Too

lM

etaC

oreT

MM

etab

oAn

alys

tIn

Cro

MA

P3O

mic

sY

ear

2004

2009

2011

2013

Inst

itu

tio

nG

eneG

oU

niv

ersi

tyo

fA

lber

taM

cGil

lU

niv

ersi

tyM

on

trea

l(C

A)

Cen

ter

for

Bio

info

rma

tics

of

the

Un

iver

sity

of

Tu

bin

gen

(D)

Mo

lecu

lar

Des

ign

ampM

etab

olo

mic

sLa

bora

tory

U

niv

ersi

tyo

fT

aiw

an(T

W)

Imp

lem

enta

tio

nW

eb-b

asedthorn

stan

d-a

lon

eW

eb-b

ased

Stan

d-a

lon

eW

eb-b

ased

Lice

nse

Co

mm

erci

alG

PL(G

NU

Gen

eral

Publ

icLi

cen

se)

LGPL

(GN

ULe

sser

Gen

eral

Publ

icLi

cen

se)

No

ne

Typ

eo

fkn

ow

led

gePr

op

riet

ary

Publ

icPu

blic

Publ

icIn

pu

td

ata

Gen

ep

rote

ino

rm

etab

oli

teli

sts

imp

ort

edas

tab-

de-

lim

ited

text

(TX

T)

com

ma-

sep

arat

edva

lues

(CSV

)or

Exce

lfile

sge

ne

list

sfr

om

mic

roar

ray

anal

ysis

soft

war

e(A

ffym

etri

xA

gile

nt)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)fo

rco

nce

ntr

atio

ns

spec

tral

bin

so

rp

eak

inte

n-

sity

dat

azi

pp

edfi

les

(ZIP

)of

NM

Ro

rM

Sp

eak

list

so

ro

fM

Ssp

ectr

a(i

nN

etC

DF

mzX

ML

or

mzD

AT

Afo

rmat

)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)of

het

ero

gen

eou

sty

pes

of

pro

cess

edlsquoo

mic

srsquod

ata

Co

mm

a-se

par

ated

valu

es(C

SV)

of

pro

cess

edtr

ansc

rip

tom

ic

pro

teo

mic

or

met

abo

lom

icd

ata

Dat

ap

rep

arat

ion

Dat

ain

tegr

ity

chec

kin

g

Dat

an

orm

aliz

atio

n

Co

mp

ou

nd

nam

eid

enti

fica

tio

n

Stat

isti

cala

nal

ysis

Un

ivar

iate

anal

ysis

Mu

ltiv

aria

tean

alys

is

Clu

ster

ing

Cla

ssifi

cati

on

Dat

ain

terp

reta

tio

nan

din

tegr

atio

nFu

nct

ion

alin

terp

reta

tio

n

Met

abo

lite

set

enri

chm

ent

anal

ysis

Met

abo

lic

pat

hw

ayan

alys

is

Met

abo

lite

map

pin

g

Hyp

erli

nks

toex

tern

ald

atab

ases

Dat

aIn

tegr

atio

n

Ou

tpu

td

ata

Net

wo

rks

can

beex

po

rted

intw

ofo

rmat

sN

etsh

ot

and

Net

wo

rki

mag

esas

PNG

file

s

PDF

rep

ort

sco

nta

inin

gp

lots

gr

aph

san

dta

bles

wit

hal

lth

ere

sult

sIm

ages

are

avai

l-ab

leas

TIF

or

PNG

file

s

Tab

ula

rfo

rmat

(eg

CSV

)fo

ren

rich

men

tan

alys

isre

sult

san

dJP

Gfi

les

for

pat

hw

ay-

base

dvi

sual

izat

ion

PNG

SV

Go

rSI

Ffo

rmat

sfo

rim

ages

Tic

ksin

dic

ate

too

lfu

nct

ion

alit

ies

504 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]

Data preparation

Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text

Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods

Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]

Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable

Statistical analysis

Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]

A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness

Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps

For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite

Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average

ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two

Analysis of metabolomic data | 505

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering

ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines

Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]

Data interpretation and integration

This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text

Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge

MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]

MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in

Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided

with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact

and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-

way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound

colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas

other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of

this figure is available at BIB online httpbiboxfordjournalsorg

506 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view

InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]

3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]

Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation

MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results

Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)

InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]

3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts

Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-

logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds

with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version

of this figure is available at BIB online httpbiboxfordjournalsorg

Analysis of metabolomic data | 507

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]

Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]

Output data

Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis

MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]

MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results

of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing

Conclusion

The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions

Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford

Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations

Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification

508 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future

Key Points

bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools

bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge

bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge

Funding

This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)

References1 Mehrotra B Mendes P Bioinformatics approaches to inte-

grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15

2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210

3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93

4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95

5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39

6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9

7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)

8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34

9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)

10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)

11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302

12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65

13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1

14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93

15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293

16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108

17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2

18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43

19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9

20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48

21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69

22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090

23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100

24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)

25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60

26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6

27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764

28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)

29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4

30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)

31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7

32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109

33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4

34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382

35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)

Analysis of metabolomic data | 509

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2
Page 3: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

Tab

le1

Feat

ure

so

fth

ecu

rren

tly

avai

labl

em

etab

olo

mic

dat

aan

alys

isto

ols

mo

stci

ted

inth

eli

tera

ture

ord

ered

byth

eir

year

of

pu

blic

atio

n

To

oln

ame

To

old

escr

ipti

on

Dat

ap

rep

roce

ssin

gO

mic

sd

ata

inte

grat

edan

alys

is

Path

way

anal

ysis

Tra

nsc

rip

tom

icd

ata

Pro

teo

mic

dat

aM

etab

olo

mic

dat

aY

ear

Ref

eren

ce

IPA

aA

nal

ysis

and

visu

aliz

atio

no

fd

iffe

ren

tki

nd

so

fo

mic

sd

ata

N

A[2

8]

Met

aCo

rea

Fun

ctio

nal

anal

ysis

and

visu

aliz

atio

no

fd

iffe

ren

tki

nd

so

fh

igh

-th

rou

ghp

ut

om

ics

dat

a

o

2004

[24]

PaV

ESy

Dat

am

anag

ing

syst

emfo

red

itin

gan

dvi

sual

izat

ion

of

bio

logi

calp

ath

way

so

oo

2004

[29]

Pro

teo

me

Soft

war

ea

Qu

anti

zati

on

co

mp

ou

nd

iden

tifi

cati

on

and

stat

isti

cala

nd

pat

hw

ayan

alys

is

20

05[3

0]

Vis

AN

TV

isu

aliz

atio

nan

dan

alys

iso

fm

any

typ

eso

fbi

olo

gica

lnet

wo

rks

o

oo

2005

[31]

VA

NT

EDV

isu

aliz

atio

nan

dan

alys

iso

fn

etw

ork

sw

ith

rela

ted

exp

erim

enta

ldat

a

20

06[3

2]

Mas

sTR

IXA

nn

ota

tio

no

fin

pu

tm

ass

pea

ksan

dm

ap-

pin

go

fth

eid

enti

fied

com

po

un

ds

on

toth

esp

ecifi

cm

etab

oli

cp

ath

way

20

08[3

3]

Pro

MeT

raV

isu

aliz

atio

nan

din

tegr

atio

no

fd

ata

sets

of

dif

fere

nt

kin

ds

of

om

ics

dat

ao

nu

ser-

defi

ned

met

abo

lic

pat

hw

aym

aps

o

2009

[34]

Met

abo

An

alys

tC

om

pre

hen

sive

met

abo

lom

icd

ata

ana-

lysi

svi

sual

izat

ion

and

inte

rpre

tati

on

2009

[25

35]

Pain

tom

ics

Inte

grat

edvi

sual

anal

ysis

of

tran

scri

p-

tom

ican

dm

etab

olo

mic

dat

ao

20

10[3

6]

Met

PaPa

thw

ayan

alys

isan

dvi

sual

izat

ion

for

met

abo

lom

icd

ata

o

20

10[3

7]

IMPa

LaJo

int

pat

hw

ayan

alys

iso

ftr

ansc

rip

tom

ico

rp

rote

om

ican

dm

etab

olo

mic

dat

ao

2011

[38]

InC

roM

AP

Sin

gle

dat

ase

tan

din

tegr

ated

cro

ss-p

lat-

form

enri

chm

ent

anal

ysis

an

dp

ath

-w

ay-b

ased

visu

aliz

atio

ns

of

om

ics

dat

a

20

11[2

639

]

3Om

ics

An

alys

isi

nte

grat

ion

and

visu

aliz

atio

no

fd

iffe

ren

tki

nd

so

fh

um

ano

mic

sd

ata

o

2013

[27

40]

aIn

gen

uit

yPa

thw

ayA

nal

ysis

Co

mm

erci

alto

ol

Tic

ksin

dic

ate

full

fun

ctio

nal

ity

emp

tyci

rcle

sin

dic

ate

par

tial

fun

ctio

nal

ity

500 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Figure 1 Flowchart of a typical metabolomic study After sample preparation specific metabolic signals are acquired using heterogeneous analytical platforms (DATA

ACQUISITION) Raw signals are then pre-processed to produce data in a suitable format for univariate and multivariate statistical analyses For untargeted studies me-

tabolites have first to be identified from spectral information (DATA PROCESSING) Significantly expressed metabolites are then linked to the biological context

through enrichment and pathway analysis and mapped into networks Finally metabolomic data are integrated with other lsquoomicsrsquo data and with prior knowledge to

gain a comprehensive view of the molecular processes involved (DATA INTERPRETATION AND INTEGRATION)

Analysis of metabolomic data | 501

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

chosen in accordance to the chemical and physical characteris-tic of each sample and to the kind of analysis to be performed(targeted or untargeted) [12 15 18] MS is the most widely appliedtechnique as it allows reliable metabolite identification par-ticularly when used in tandem with chromatographic separ-ation methods so as to enhance its mass-resolving capabilitiesIt is rapid (the analysis time ranges from 5 to 140 min) andallows performing sensitive and selective qualitative and quan-titative analyses The main drawbacks of the MS technique arethe need to separate or purify the sample before it is directedinto the mass analyzer and the high cost of the instrumentCompared with MS NMR has a lower sensitivity thus resultingin limited ability for metabolite identification and quantifica-tion This implies that potentially important compounds thatare present at smaller concentrations can be hidden by largerpeaks and are thus less likely to be identified The advantagesof NMR are the high analytical reproducibility and that it is anon-destructive method that requires minimal sample prepar-ation [43 44]

Data processing

Once acquired raw signals (chromatograms spectra or NMRdata) are pre-processed by ad hoc software tools to facilitatecompound quantification (eg the commercial softwareSIEVETM by Thermo Scientific [45] or some freely availablesoftware packages such as the cloud-based platform XCMS [46]or the open-source cross-platforms MAVEN [47] and MZmine 2[48]) Generally this preprocessing involves noise reductionretention time correction peak detection and integration andchromatogram alignment Finally for untargeted metabolomicstudies different databases such as the Human MetabolomeDatabase (HMDB) [49] or the Metabolite and Tandem MSDatabase (METLIN) [50] are used to identify the metabolitesfrom spectra During data integrity checking different inputdata are then prepared to produce appropriate data matricesfor further analyses as better detailed in the next sectionBriefly before starting any kind of statistical analysis datanormalization is performed to reduce systematic biases ortechnical variations and to avoid misidentification of signifi-cant changes owing to the different orders of magnitudes ofmetabolomic data Following which significant differences be-tween sample sets can be identified using appropriate statis-tical methods A typical statistical analysis for metabolomicdata consists of two phases initially different univariate andmultivariate methods are used to generate an overview of theconsidered data sets and to identify the metabolites that showsignificant changes under the studied conditions then datamining techniques are used to discriminate groups of func-tionally related metabolites [16] A limit of traditional statis-tical methods is that they highlight relationships amongvariables based only on mathematical criteria (eg maximiza-tion of variance or correlation) and they do not take into ac-count correlations from biological origin [17] For this reasonthe combined use of several statistical and data mining tech-niques is highly recommended [4] Once identified signifi-cantly expressed metabolites are first ranked usingappropriate P-values then a cut-off threshold is applied to se-lect the top-k ones from the ranked list The choice of thisthreshold which is often arbitrary is critical as it may influ-ence the final biological interpretation In fact some moder-ate but significant changes may be missed or criticalcomponents of a particular biological process may be left outthus compromising subsequent analyses [42]

Data interpretation and integration

In this last step the selected metabolites are linked to the biolo-gical context under study through enrichment and pathwayanalyses More precisely enrichment analysis aims to investi-gate the enrichment (ie over- andor under-expression) of pre-defined groups of functionally related metabolites (iemetabolite sets) to identify significant and coordinated expres-sion changes among them This allows taking advantage of thelist of altered metabolites to suggest a biological pathway ordisease condition which can be further investigated [42]Conversely pathway analysis involves the description and visu-alization of the interactions among genes proteins or metabol-ites within cells tissues or organs Its goal is to identify thepathways that significantly impact on a given biological process[37] Enrichment and pathway analyses are performed using adhoc software tools [37] which map significant metabolites toknown biochemical pathways on the basis of the informationcontained in public databases such as the Kyoto Encyclopediaof Genes and Genomes (KEGG) [8] Once the metabolic pathwaysare identified this information has to be integrated with tran-scriptomic and proteomic data to obtain a comprehensive viewof all the biological processes involved [22] To capture all theinteractions that these data describe network-based visualiza-tion tools are commonly used by investigators to better under-stand and show their findings Depending on the kind ofinteraction under study a biological network can be repre-sented through a different type of graph Graphs are mathemat-ical structures of several kinds such as directed or undirectedgraphs directed acyclic graphs trees forests minimum span-ning trees Boolean networks and Steiner trees [51] To makegraph layouts informative and reproducible several visualiza-tion strategies have been proposed and adopted A frequentlyused visualization method is the ball-and-stick diagram wherepathway data are presented as networks with compounds (egmetabolites proteins) as nodes and reactions as edges [23 3752] Nodes can be placed hierarchically (a father node with oneor more child nodes) or radially as in radial networks or hiveplots which more closely represent the complexity of biologicalsystems [53] To reach a more reliable evaluation of the processunder study integration with biological knowledge derivedfrom the literature or from previous experimental data can alsobe performed [12 17] In spite of the availability of this informa-tion effective data integration is still far to be achieved owing tothe heterogeneity of current databases As a consequence usershave to take into account multiple databases to extract andmanually assemble the several different information neededthis makes data integration time-consuming and the final in-terpretation is often prone to errors because of different back-ground knowledge or biases of individual researchers [42]

Software tools for metabolomic data analysisand integration

Powerful software tools are essential to address the vastamount and variety of data generated by metabolomic analysesRequired software capabilities include (i) processing of rawspectral data (ii) statistical analysis to find significantly ex-pressed metabolites (iii) connection to metabolite databases formetabolite identification (iv) integration and analysis of mul-tiple heterogeneous lsquoomicsrsquo data and (v) bioinformatics ana-lysis and visualization of molecular interaction networks [1618] In this section we introduce the four data analysis tools se-lected (ie MetaCoreTM MetaboAnalyst InCroMAP and 3Omics)

502 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

their main functionalities are illustrated and compared on thebasis of some selected features such as data preprocessingtechniques used statistical analyses performed and methodsused for functional interpretation and if available for data inte-gration The main features of each tool are summarized inTable 2 It is important to point out that only MetaboAnalystprovides a comprehensive module for data preprocessing andstatistical analysis [25 41] MetaCoreTM and 3Omics only offerlimited support for these functionalities whereas InCroMAPonly accepts already preprocessed data In spite of these limita-tions we decided to include MetaCoreTM 3Omics and InCroMAPin this review because of their ability to perform comprehensiveanalyses of multi-omics data which are currently limited inother high-level analysis tools

MetaCoreTM

MetaCoreTM [24] is a commercial tool available both as a stand-alone and as a Web-based application It is a software suite forfunctional analysis of different kinds of high-throughput mo-lecular data (eg next-generation sequencing siRNAmicroRNA microarray-based gene expression and serial ana-lysis of gene expression data array-comparative genomic-hybridization DNA arrays proteomic data metabolic profilesand screening data) MetaCoreTM is an integrated system whichconsists of (i) a high-quality manually curated database ofmammalian biology including metabolites and other molecularclasses bioactive molecules and their interactions signalingand metabolic pathways (ii) genomic analysis tools to identifypotentially significant variants (iii) a data mining toolkit fordata visualization analysis and exchange of data (iv) a toolkit(pathway editor) for custom assembly of functional networksand (v) a set of parsers to upload and manipulate different typesof high-throughput molecular data [54ndash56] Unfortunately nopublic information is available about the details of howMetaCoreTM works thus our review of this tool is partially lim-ited and some details remain unclear

MetaboAnalyst

MetaboAnalyst [25 35] is an integrated freely accessible Web-based platform It was first released in 2009 then upgraded in2012 (MetaboAnalyst 20 [57]) and in 2015 (MetaboAnalyst 30also available for download and local installation) It offers a setof online tools for metabolomic data analysis that combine stat-istical analysis of data with their functional and biological inter-pretation and visualization [25] tutorials and protocolpapers are also available online MetaboAnalyst 30 has been re-implemented to improve performance capability and userinteractivity It offers eight functional modules which can begrouped in three categories (i) exploratory statistical analysis(Statistical Analysis and Time-Series Analysis modules) (ii) func-tional analysis (Enrichment Analysis Pathway Analysis andIntegrated Pathway Analysis modules for both genes and metabol-ites) and (iii) advanced methods for translational studies(Biomarker Analysis Sample Size Estimation and Power Analysismodules) In addition it has also an Other Utilities module con-taining a specialized function for lipidomic data analysis and acompound ID-conversion tool [41]

InCroMAP

InCroMAP is a stand-alone Java software originally developed in2011 for enrichment analysis and pathway-based visualizationsof genomic and proteomic data [27 40] The extended version

InCroMAP 15 released in 2013 also supports annotated metab-olomic data thus making this tool suitable for comprehensivesystem biology studies [58] InCroMAP is freely available at itsWeb site [39] which includes a user guide [59] example datafiles to test the tool and a short video tutorial The software per-forms metabolite set enrichment analysis and generates inter-active global maps of the cellular metabolism These mapsallow an integrated pathway-based visualization of data frommultiple lsquoomicsrsquo platforms and provide a useful overview of themetabolic changes present in the studied experimental condi-tion [26 59]

3Omics

3Omics is a platform-independent Web-based tool developedin 2013 for the analysis integration and visualization of tran-scriptomic proteomic and metabolomic human data It is freelyaccessible at [40] and the Web site also includes a help sectionTo demonstrate the software functionalities applied to realdata two case studies are also reported and explained in [27]3Omics supports correlation analysis co-expression profilingphenotype mapping pathway enrichment analysis and geneontology-based enrichment analysis More precisely dependingon the data provided the software offers four types of lsquoomicsrsquoanalyses (i) TranscriptomicsndashProteomicsndashMetabolomics (TndashPndashM) (ii) TranscriptomicsndashProteomics (TndashP) (iii) ProteomicsndashMetabolomics (PndashM) and (iv) TranscriptomicsndashMetabolomics(TndashM) A single-omics mode is also available to reveal intra-omics relationships 3Omics can also supplement missing tran-script protein and metabolite information related to the inputdata by text-mining the biomedical literature through iHOP (in-formation Hyperlinked Over Protein [60 61]) Users can thusperform multi-omics analyses even when only one or two outof three lsquoomicsrsquo data sets are available [27]

Evaluation and comparison of softwarefunctionalities

In this section strong and weak aspects of each of the four se-lected tools are reviewed and comparatively evaluated follow-ing the data analysis workflow previously described (Figure 1)

Input data

Various proprietary data formats have been developed to han-dle and store MS or NMR raw data this heterogeneity makes itdifficult for researchers to manipulate these data For this rea-son several software for the conversion of raw data file types(eg RAW) into a universal format (eg CSV TXT mzXML orMGF) have been implemented such as ProteoWizard [62]MassMatrix [63] and several others including MATLAB [64] andR [65 66] Once raw data have been converted to a suitable for-mat they can be further analyzed Data types and formatsaccepted as input by the four reviewed tools are reported inTable 2 Input data are usually organized as data matrices withsamples as rows and signal features as columns MetaCoreTMMetaboAnalyst and IncroMAP accept data in different kindsof tabular format (eg textual tab-delimited TXT or comma-separated value CSV files) whereas 3Omics only accepts CSVfiles MetaboAnalyst also accepts zipped files of NMR or MSpeak lists or of MS spectra which must be in mzXML mzDATAor NetCDF format As for MetaCoreTM gene lists can also be im-ported from output files of microarray analysis software such asAffymetrix [67] or Agilent [68] Compound names or IDs

Analysis of metabolomic data | 503

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Tab

le2

Mai

nfe

atu

res

of

the

sele

cted

fou

rso

ftw

are

too

lsfo

rm

etab

olo

mic

dat

aan

alys

is

Too

lM

etaC

oreT

MM

etab

oAn

alys

tIn

Cro

MA

P3O

mic

sY

ear

2004

2009

2011

2013

Inst

itu

tio

nG

eneG

oU

niv

ersi

tyo

fA

lber

taM

cGil

lU

niv

ersi

tyM

on

trea

l(C

A)

Cen

ter

for

Bio

info

rma

tics

of

the

Un

iver

sity

of

Tu

bin

gen

(D)

Mo

lecu

lar

Des

ign

ampM

etab

olo

mic

sLa

bora

tory

U

niv

ersi

tyo

fT

aiw

an(T

W)

Imp

lem

enta

tio

nW

eb-b

asedthorn

stan

d-a

lon

eW

eb-b

ased

Stan

d-a

lon

eW

eb-b

ased

Lice

nse

Co

mm

erci

alG

PL(G

NU

Gen

eral

Publ

icLi

cen

se)

LGPL

(GN

ULe

sser

Gen

eral

Publ

icLi

cen

se)

No

ne

Typ

eo

fkn

ow

led

gePr

op

riet

ary

Publ

icPu

blic

Publ

icIn

pu

td

ata

Gen

ep

rote

ino

rm

etab

oli

teli

sts

imp

ort

edas

tab-

de-

lim

ited

text

(TX

T)

com

ma-

sep

arat

edva

lues

(CSV

)or

Exce

lfile

sge

ne

list

sfr

om

mic

roar

ray

anal

ysis

soft

war

e(A

ffym

etri

xA

gile

nt)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)fo

rco

nce

ntr

atio

ns

spec

tral

bin

so

rp

eak

inte

n-

sity

dat

azi

pp

edfi

les

(ZIP

)of

NM

Ro

rM

Sp

eak

list

so

ro

fM

Ssp

ectr

a(i

nN

etC

DF

mzX

ML

or

mzD

AT

Afo

rmat

)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)of

het

ero

gen

eou

sty

pes

of

pro

cess

edlsquoo

mic

srsquod

ata

Co

mm

a-se

par

ated

valu

es(C

SV)

of

pro

cess

edtr

ansc

rip

tom

ic

pro

teo

mic

or

met

abo

lom

icd

ata

Dat

ap

rep

arat

ion

Dat

ain

tegr

ity

chec

kin

g

Dat

an

orm

aliz

atio

n

Co

mp

ou

nd

nam

eid

enti

fica

tio

n

Stat

isti

cala

nal

ysis

Un

ivar

iate

anal

ysis

Mu

ltiv

aria

tean

alys

is

Clu

ster

ing

Cla

ssifi

cati

on

Dat

ain

terp

reta

tio

nan

din

tegr

atio

nFu

nct

ion

alin

terp

reta

tio

n

Met

abo

lite

set

enri

chm

ent

anal

ysis

Met

abo

lic

pat

hw

ayan

alys

is

Met

abo

lite

map

pin

g

Hyp

erli

nks

toex

tern

ald

atab

ases

Dat

aIn

tegr

atio

n

Ou

tpu

td

ata

Net

wo

rks

can

beex

po

rted

intw

ofo

rmat

sN

etsh

ot

and

Net

wo

rki

mag

esas

PNG

file

s

PDF

rep

ort

sco

nta

inin

gp

lots

gr

aph

san

dta

bles

wit

hal

lth

ere

sult

sIm

ages

are

avai

l-ab

leas

TIF

or

PNG

file

s

Tab

ula

rfo

rmat

(eg

CSV

)fo

ren

rich

men

tan

alys

isre

sult

san

dJP

Gfi

les

for

pat

hw

ay-

base

dvi

sual

izat

ion

PNG

SV

Go

rSI

Ffo

rmat

sfo

rim

ages

Tic

ksin

dic

ate

too

lfu

nct

ion

alit

ies

504 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]

Data preparation

Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text

Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods

Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]

Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable

Statistical analysis

Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]

A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness

Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps

For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite

Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average

ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two

Analysis of metabolomic data | 505

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering

ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines

Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]

Data interpretation and integration

This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text

Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge

MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]

MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in

Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided

with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact

and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-

way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound

colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas

other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of

this figure is available at BIB online httpbiboxfordjournalsorg

506 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view

InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]

3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]

Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation

MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results

Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)

InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]

3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts

Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-

logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds

with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version

of this figure is available at BIB online httpbiboxfordjournalsorg

Analysis of metabolomic data | 507

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]

Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]

Output data

Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis

MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]

MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results

of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing

Conclusion

The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions

Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford

Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations

Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification

508 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future

Key Points

bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools

bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge

bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge

Funding

This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)

References1 Mehrotra B Mendes P Bioinformatics approaches to inte-

grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15

2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210

3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93

4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95

5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39

6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9

7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)

8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34

9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)

10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)

11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302

12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65

13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1

14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93

15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293

16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108

17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2

18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43

19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9

20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48

21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69

22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090

23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100

24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)

25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60

26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6

27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764

28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)

29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4

30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)

31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7

32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109

33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4

34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382

35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)

Analysis of metabolomic data | 509

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2
Page 4: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

Figure 1 Flowchart of a typical metabolomic study After sample preparation specific metabolic signals are acquired using heterogeneous analytical platforms (DATA

ACQUISITION) Raw signals are then pre-processed to produce data in a suitable format for univariate and multivariate statistical analyses For untargeted studies me-

tabolites have first to be identified from spectral information (DATA PROCESSING) Significantly expressed metabolites are then linked to the biological context

through enrichment and pathway analysis and mapped into networks Finally metabolomic data are integrated with other lsquoomicsrsquo data and with prior knowledge to

gain a comprehensive view of the molecular processes involved (DATA INTERPRETATION AND INTEGRATION)

Analysis of metabolomic data | 501

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

chosen in accordance to the chemical and physical characteris-tic of each sample and to the kind of analysis to be performed(targeted or untargeted) [12 15 18] MS is the most widely appliedtechnique as it allows reliable metabolite identification par-ticularly when used in tandem with chromatographic separ-ation methods so as to enhance its mass-resolving capabilitiesIt is rapid (the analysis time ranges from 5 to 140 min) andallows performing sensitive and selective qualitative and quan-titative analyses The main drawbacks of the MS technique arethe need to separate or purify the sample before it is directedinto the mass analyzer and the high cost of the instrumentCompared with MS NMR has a lower sensitivity thus resultingin limited ability for metabolite identification and quantifica-tion This implies that potentially important compounds thatare present at smaller concentrations can be hidden by largerpeaks and are thus less likely to be identified The advantagesof NMR are the high analytical reproducibility and that it is anon-destructive method that requires minimal sample prepar-ation [43 44]

Data processing

Once acquired raw signals (chromatograms spectra or NMRdata) are pre-processed by ad hoc software tools to facilitatecompound quantification (eg the commercial softwareSIEVETM by Thermo Scientific [45] or some freely availablesoftware packages such as the cloud-based platform XCMS [46]or the open-source cross-platforms MAVEN [47] and MZmine 2[48]) Generally this preprocessing involves noise reductionretention time correction peak detection and integration andchromatogram alignment Finally for untargeted metabolomicstudies different databases such as the Human MetabolomeDatabase (HMDB) [49] or the Metabolite and Tandem MSDatabase (METLIN) [50] are used to identify the metabolitesfrom spectra During data integrity checking different inputdata are then prepared to produce appropriate data matricesfor further analyses as better detailed in the next sectionBriefly before starting any kind of statistical analysis datanormalization is performed to reduce systematic biases ortechnical variations and to avoid misidentification of signifi-cant changes owing to the different orders of magnitudes ofmetabolomic data Following which significant differences be-tween sample sets can be identified using appropriate statis-tical methods A typical statistical analysis for metabolomicdata consists of two phases initially different univariate andmultivariate methods are used to generate an overview of theconsidered data sets and to identify the metabolites that showsignificant changes under the studied conditions then datamining techniques are used to discriminate groups of func-tionally related metabolites [16] A limit of traditional statis-tical methods is that they highlight relationships amongvariables based only on mathematical criteria (eg maximiza-tion of variance or correlation) and they do not take into ac-count correlations from biological origin [17] For this reasonthe combined use of several statistical and data mining tech-niques is highly recommended [4] Once identified signifi-cantly expressed metabolites are first ranked usingappropriate P-values then a cut-off threshold is applied to se-lect the top-k ones from the ranked list The choice of thisthreshold which is often arbitrary is critical as it may influ-ence the final biological interpretation In fact some moder-ate but significant changes may be missed or criticalcomponents of a particular biological process may be left outthus compromising subsequent analyses [42]

Data interpretation and integration

In this last step the selected metabolites are linked to the biolo-gical context under study through enrichment and pathwayanalyses More precisely enrichment analysis aims to investi-gate the enrichment (ie over- andor under-expression) of pre-defined groups of functionally related metabolites (iemetabolite sets) to identify significant and coordinated expres-sion changes among them This allows taking advantage of thelist of altered metabolites to suggest a biological pathway ordisease condition which can be further investigated [42]Conversely pathway analysis involves the description and visu-alization of the interactions among genes proteins or metabol-ites within cells tissues or organs Its goal is to identify thepathways that significantly impact on a given biological process[37] Enrichment and pathway analyses are performed using adhoc software tools [37] which map significant metabolites toknown biochemical pathways on the basis of the informationcontained in public databases such as the Kyoto Encyclopediaof Genes and Genomes (KEGG) [8] Once the metabolic pathwaysare identified this information has to be integrated with tran-scriptomic and proteomic data to obtain a comprehensive viewof all the biological processes involved [22] To capture all theinteractions that these data describe network-based visualiza-tion tools are commonly used by investigators to better under-stand and show their findings Depending on the kind ofinteraction under study a biological network can be repre-sented through a different type of graph Graphs are mathemat-ical structures of several kinds such as directed or undirectedgraphs directed acyclic graphs trees forests minimum span-ning trees Boolean networks and Steiner trees [51] To makegraph layouts informative and reproducible several visualiza-tion strategies have been proposed and adopted A frequentlyused visualization method is the ball-and-stick diagram wherepathway data are presented as networks with compounds (egmetabolites proteins) as nodes and reactions as edges [23 3752] Nodes can be placed hierarchically (a father node with oneor more child nodes) or radially as in radial networks or hiveplots which more closely represent the complexity of biologicalsystems [53] To reach a more reliable evaluation of the processunder study integration with biological knowledge derivedfrom the literature or from previous experimental data can alsobe performed [12 17] In spite of the availability of this informa-tion effective data integration is still far to be achieved owing tothe heterogeneity of current databases As a consequence usershave to take into account multiple databases to extract andmanually assemble the several different information neededthis makes data integration time-consuming and the final in-terpretation is often prone to errors because of different back-ground knowledge or biases of individual researchers [42]

Software tools for metabolomic data analysisand integration

Powerful software tools are essential to address the vastamount and variety of data generated by metabolomic analysesRequired software capabilities include (i) processing of rawspectral data (ii) statistical analysis to find significantly ex-pressed metabolites (iii) connection to metabolite databases formetabolite identification (iv) integration and analysis of mul-tiple heterogeneous lsquoomicsrsquo data and (v) bioinformatics ana-lysis and visualization of molecular interaction networks [1618] In this section we introduce the four data analysis tools se-lected (ie MetaCoreTM MetaboAnalyst InCroMAP and 3Omics)

502 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

their main functionalities are illustrated and compared on thebasis of some selected features such as data preprocessingtechniques used statistical analyses performed and methodsused for functional interpretation and if available for data inte-gration The main features of each tool are summarized inTable 2 It is important to point out that only MetaboAnalystprovides a comprehensive module for data preprocessing andstatistical analysis [25 41] MetaCoreTM and 3Omics only offerlimited support for these functionalities whereas InCroMAPonly accepts already preprocessed data In spite of these limita-tions we decided to include MetaCoreTM 3Omics and InCroMAPin this review because of their ability to perform comprehensiveanalyses of multi-omics data which are currently limited inother high-level analysis tools

MetaCoreTM

MetaCoreTM [24] is a commercial tool available both as a stand-alone and as a Web-based application It is a software suite forfunctional analysis of different kinds of high-throughput mo-lecular data (eg next-generation sequencing siRNAmicroRNA microarray-based gene expression and serial ana-lysis of gene expression data array-comparative genomic-hybridization DNA arrays proteomic data metabolic profilesand screening data) MetaCoreTM is an integrated system whichconsists of (i) a high-quality manually curated database ofmammalian biology including metabolites and other molecularclasses bioactive molecules and their interactions signalingand metabolic pathways (ii) genomic analysis tools to identifypotentially significant variants (iii) a data mining toolkit fordata visualization analysis and exchange of data (iv) a toolkit(pathway editor) for custom assembly of functional networksand (v) a set of parsers to upload and manipulate different typesof high-throughput molecular data [54ndash56] Unfortunately nopublic information is available about the details of howMetaCoreTM works thus our review of this tool is partially lim-ited and some details remain unclear

MetaboAnalyst

MetaboAnalyst [25 35] is an integrated freely accessible Web-based platform It was first released in 2009 then upgraded in2012 (MetaboAnalyst 20 [57]) and in 2015 (MetaboAnalyst 30also available for download and local installation) It offers a setof online tools for metabolomic data analysis that combine stat-istical analysis of data with their functional and biological inter-pretation and visualization [25] tutorials and protocolpapers are also available online MetaboAnalyst 30 has been re-implemented to improve performance capability and userinteractivity It offers eight functional modules which can begrouped in three categories (i) exploratory statistical analysis(Statistical Analysis and Time-Series Analysis modules) (ii) func-tional analysis (Enrichment Analysis Pathway Analysis andIntegrated Pathway Analysis modules for both genes and metabol-ites) and (iii) advanced methods for translational studies(Biomarker Analysis Sample Size Estimation and Power Analysismodules) In addition it has also an Other Utilities module con-taining a specialized function for lipidomic data analysis and acompound ID-conversion tool [41]

InCroMAP

InCroMAP is a stand-alone Java software originally developed in2011 for enrichment analysis and pathway-based visualizationsof genomic and proteomic data [27 40] The extended version

InCroMAP 15 released in 2013 also supports annotated metab-olomic data thus making this tool suitable for comprehensivesystem biology studies [58] InCroMAP is freely available at itsWeb site [39] which includes a user guide [59] example datafiles to test the tool and a short video tutorial The software per-forms metabolite set enrichment analysis and generates inter-active global maps of the cellular metabolism These mapsallow an integrated pathway-based visualization of data frommultiple lsquoomicsrsquo platforms and provide a useful overview of themetabolic changes present in the studied experimental condi-tion [26 59]

3Omics

3Omics is a platform-independent Web-based tool developedin 2013 for the analysis integration and visualization of tran-scriptomic proteomic and metabolomic human data It is freelyaccessible at [40] and the Web site also includes a help sectionTo demonstrate the software functionalities applied to realdata two case studies are also reported and explained in [27]3Omics supports correlation analysis co-expression profilingphenotype mapping pathway enrichment analysis and geneontology-based enrichment analysis More precisely dependingon the data provided the software offers four types of lsquoomicsrsquoanalyses (i) TranscriptomicsndashProteomicsndashMetabolomics (TndashPndashM) (ii) TranscriptomicsndashProteomics (TndashP) (iii) ProteomicsndashMetabolomics (PndashM) and (iv) TranscriptomicsndashMetabolomics(TndashM) A single-omics mode is also available to reveal intra-omics relationships 3Omics can also supplement missing tran-script protein and metabolite information related to the inputdata by text-mining the biomedical literature through iHOP (in-formation Hyperlinked Over Protein [60 61]) Users can thusperform multi-omics analyses even when only one or two outof three lsquoomicsrsquo data sets are available [27]

Evaluation and comparison of softwarefunctionalities

In this section strong and weak aspects of each of the four se-lected tools are reviewed and comparatively evaluated follow-ing the data analysis workflow previously described (Figure 1)

Input data

Various proprietary data formats have been developed to han-dle and store MS or NMR raw data this heterogeneity makes itdifficult for researchers to manipulate these data For this rea-son several software for the conversion of raw data file types(eg RAW) into a universal format (eg CSV TXT mzXML orMGF) have been implemented such as ProteoWizard [62]MassMatrix [63] and several others including MATLAB [64] andR [65 66] Once raw data have been converted to a suitable for-mat they can be further analyzed Data types and formatsaccepted as input by the four reviewed tools are reported inTable 2 Input data are usually organized as data matrices withsamples as rows and signal features as columns MetaCoreTMMetaboAnalyst and IncroMAP accept data in different kindsof tabular format (eg textual tab-delimited TXT or comma-separated value CSV files) whereas 3Omics only accepts CSVfiles MetaboAnalyst also accepts zipped files of NMR or MSpeak lists or of MS spectra which must be in mzXML mzDATAor NetCDF format As for MetaCoreTM gene lists can also be im-ported from output files of microarray analysis software such asAffymetrix [67] or Agilent [68] Compound names or IDs

Analysis of metabolomic data | 503

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Tab

le2

Mai

nfe

atu

res

of

the

sele

cted

fou

rso

ftw

are

too

lsfo

rm

etab

olo

mic

dat

aan

alys

is

Too

lM

etaC

oreT

MM

etab

oAn

alys

tIn

Cro

MA

P3O

mic

sY

ear

2004

2009

2011

2013

Inst

itu

tio

nG

eneG

oU

niv

ersi

tyo

fA

lber

taM

cGil

lU

niv

ersi

tyM

on

trea

l(C

A)

Cen

ter

for

Bio

info

rma

tics

of

the

Un

iver

sity

of

Tu

bin

gen

(D)

Mo

lecu

lar

Des

ign

ampM

etab

olo

mic

sLa

bora

tory

U

niv

ersi

tyo

fT

aiw

an(T

W)

Imp

lem

enta

tio

nW

eb-b

asedthorn

stan

d-a

lon

eW

eb-b

ased

Stan

d-a

lon

eW

eb-b

ased

Lice

nse

Co

mm

erci

alG

PL(G

NU

Gen

eral

Publ

icLi

cen

se)

LGPL

(GN

ULe

sser

Gen

eral

Publ

icLi

cen

se)

No

ne

Typ

eo

fkn

ow

led

gePr

op

riet

ary

Publ

icPu

blic

Publ

icIn

pu

td

ata

Gen

ep

rote

ino

rm

etab

oli

teli

sts

imp

ort

edas

tab-

de-

lim

ited

text

(TX

T)

com

ma-

sep

arat

edva

lues

(CSV

)or

Exce

lfile

sge

ne

list

sfr

om

mic

roar

ray

anal

ysis

soft

war

e(A

ffym

etri

xA

gile

nt)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)fo

rco

nce

ntr

atio

ns

spec

tral

bin

so

rp

eak

inte

n-

sity

dat

azi

pp

edfi

les

(ZIP

)of

NM

Ro

rM

Sp

eak

list

so

ro

fM

Ssp

ectr

a(i

nN

etC

DF

mzX

ML

or

mzD

AT

Afo

rmat

)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)of

het

ero

gen

eou

sty

pes

of

pro

cess

edlsquoo

mic

srsquod

ata

Co

mm

a-se

par

ated

valu

es(C

SV)

of

pro

cess

edtr

ansc

rip

tom

ic

pro

teo

mic

or

met

abo

lom

icd

ata

Dat

ap

rep

arat

ion

Dat

ain

tegr

ity

chec

kin

g

Dat

an

orm

aliz

atio

n

Co

mp

ou

nd

nam

eid

enti

fica

tio

n

Stat

isti

cala

nal

ysis

Un

ivar

iate

anal

ysis

Mu

ltiv

aria

tean

alys

is

Clu

ster

ing

Cla

ssifi

cati

on

Dat

ain

terp

reta

tio

nan

din

tegr

atio

nFu

nct

ion

alin

terp

reta

tio

n

Met

abo

lite

set

enri

chm

ent

anal

ysis

Met

abo

lic

pat

hw

ayan

alys

is

Met

abo

lite

map

pin

g

Hyp

erli

nks

toex

tern

ald

atab

ases

Dat

aIn

tegr

atio

n

Ou

tpu

td

ata

Net

wo

rks

can

beex

po

rted

intw

ofo

rmat

sN

etsh

ot

and

Net

wo

rki

mag

esas

PNG

file

s

PDF

rep

ort

sco

nta

inin

gp

lots

gr

aph

san

dta

bles

wit

hal

lth

ere

sult

sIm

ages

are

avai

l-ab

leas

TIF

or

PNG

file

s

Tab

ula

rfo

rmat

(eg

CSV

)fo

ren

rich

men

tan

alys

isre

sult

san

dJP

Gfi

les

for

pat

hw

ay-

base

dvi

sual

izat

ion

PNG

SV

Go

rSI

Ffo

rmat

sfo

rim

ages

Tic

ksin

dic

ate

too

lfu

nct

ion

alit

ies

504 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]

Data preparation

Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text

Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods

Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]

Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable

Statistical analysis

Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]

A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness

Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps

For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite

Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average

ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two

Analysis of metabolomic data | 505

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering

ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines

Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]

Data interpretation and integration

This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text

Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge

MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]

MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in

Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided

with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact

and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-

way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound

colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas

other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of

this figure is available at BIB online httpbiboxfordjournalsorg

506 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view

InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]

3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]

Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation

MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results

Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)

InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]

3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts

Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-

logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds

with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version

of this figure is available at BIB online httpbiboxfordjournalsorg

Analysis of metabolomic data | 507

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]

Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]

Output data

Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis

MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]

MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results

of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing

Conclusion

The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions

Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford

Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations

Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification

508 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future

Key Points

bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools

bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge

bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge

Funding

This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)

References1 Mehrotra B Mendes P Bioinformatics approaches to inte-

grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15

2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210

3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93

4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95

5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39

6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9

7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)

8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34

9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)

10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)

11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302

12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65

13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1

14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93

15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293

16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108

17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2

18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43

19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9

20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48

21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69

22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090

23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100

24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)

25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60

26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6

27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764

28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)

29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4

30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)

31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7

32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109

33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4

34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382

35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)

Analysis of metabolomic data | 509

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2
Page 5: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

chosen in accordance to the chemical and physical characteris-tic of each sample and to the kind of analysis to be performed(targeted or untargeted) [12 15 18] MS is the most widely appliedtechnique as it allows reliable metabolite identification par-ticularly when used in tandem with chromatographic separ-ation methods so as to enhance its mass-resolving capabilitiesIt is rapid (the analysis time ranges from 5 to 140 min) andallows performing sensitive and selective qualitative and quan-titative analyses The main drawbacks of the MS technique arethe need to separate or purify the sample before it is directedinto the mass analyzer and the high cost of the instrumentCompared with MS NMR has a lower sensitivity thus resultingin limited ability for metabolite identification and quantifica-tion This implies that potentially important compounds thatare present at smaller concentrations can be hidden by largerpeaks and are thus less likely to be identified The advantagesof NMR are the high analytical reproducibility and that it is anon-destructive method that requires minimal sample prepar-ation [43 44]

Data processing

Once acquired raw signals (chromatograms spectra or NMRdata) are pre-processed by ad hoc software tools to facilitatecompound quantification (eg the commercial softwareSIEVETM by Thermo Scientific [45] or some freely availablesoftware packages such as the cloud-based platform XCMS [46]or the open-source cross-platforms MAVEN [47] and MZmine 2[48]) Generally this preprocessing involves noise reductionretention time correction peak detection and integration andchromatogram alignment Finally for untargeted metabolomicstudies different databases such as the Human MetabolomeDatabase (HMDB) [49] or the Metabolite and Tandem MSDatabase (METLIN) [50] are used to identify the metabolitesfrom spectra During data integrity checking different inputdata are then prepared to produce appropriate data matricesfor further analyses as better detailed in the next sectionBriefly before starting any kind of statistical analysis datanormalization is performed to reduce systematic biases ortechnical variations and to avoid misidentification of signifi-cant changes owing to the different orders of magnitudes ofmetabolomic data Following which significant differences be-tween sample sets can be identified using appropriate statis-tical methods A typical statistical analysis for metabolomicdata consists of two phases initially different univariate andmultivariate methods are used to generate an overview of theconsidered data sets and to identify the metabolites that showsignificant changes under the studied conditions then datamining techniques are used to discriminate groups of func-tionally related metabolites [16] A limit of traditional statis-tical methods is that they highlight relationships amongvariables based only on mathematical criteria (eg maximiza-tion of variance or correlation) and they do not take into ac-count correlations from biological origin [17] For this reasonthe combined use of several statistical and data mining tech-niques is highly recommended [4] Once identified signifi-cantly expressed metabolites are first ranked usingappropriate P-values then a cut-off threshold is applied to se-lect the top-k ones from the ranked list The choice of thisthreshold which is often arbitrary is critical as it may influ-ence the final biological interpretation In fact some moder-ate but significant changes may be missed or criticalcomponents of a particular biological process may be left outthus compromising subsequent analyses [42]

Data interpretation and integration

In this last step the selected metabolites are linked to the biolo-gical context under study through enrichment and pathwayanalyses More precisely enrichment analysis aims to investi-gate the enrichment (ie over- andor under-expression) of pre-defined groups of functionally related metabolites (iemetabolite sets) to identify significant and coordinated expres-sion changes among them This allows taking advantage of thelist of altered metabolites to suggest a biological pathway ordisease condition which can be further investigated [42]Conversely pathway analysis involves the description and visu-alization of the interactions among genes proteins or metabol-ites within cells tissues or organs Its goal is to identify thepathways that significantly impact on a given biological process[37] Enrichment and pathway analyses are performed using adhoc software tools [37] which map significant metabolites toknown biochemical pathways on the basis of the informationcontained in public databases such as the Kyoto Encyclopediaof Genes and Genomes (KEGG) [8] Once the metabolic pathwaysare identified this information has to be integrated with tran-scriptomic and proteomic data to obtain a comprehensive viewof all the biological processes involved [22] To capture all theinteractions that these data describe network-based visualiza-tion tools are commonly used by investigators to better under-stand and show their findings Depending on the kind ofinteraction under study a biological network can be repre-sented through a different type of graph Graphs are mathemat-ical structures of several kinds such as directed or undirectedgraphs directed acyclic graphs trees forests minimum span-ning trees Boolean networks and Steiner trees [51] To makegraph layouts informative and reproducible several visualiza-tion strategies have been proposed and adopted A frequentlyused visualization method is the ball-and-stick diagram wherepathway data are presented as networks with compounds (egmetabolites proteins) as nodes and reactions as edges [23 3752] Nodes can be placed hierarchically (a father node with oneor more child nodes) or radially as in radial networks or hiveplots which more closely represent the complexity of biologicalsystems [53] To reach a more reliable evaluation of the processunder study integration with biological knowledge derivedfrom the literature or from previous experimental data can alsobe performed [12 17] In spite of the availability of this informa-tion effective data integration is still far to be achieved owing tothe heterogeneity of current databases As a consequence usershave to take into account multiple databases to extract andmanually assemble the several different information neededthis makes data integration time-consuming and the final in-terpretation is often prone to errors because of different back-ground knowledge or biases of individual researchers [42]

Software tools for metabolomic data analysisand integration

Powerful software tools are essential to address the vastamount and variety of data generated by metabolomic analysesRequired software capabilities include (i) processing of rawspectral data (ii) statistical analysis to find significantly ex-pressed metabolites (iii) connection to metabolite databases formetabolite identification (iv) integration and analysis of mul-tiple heterogeneous lsquoomicsrsquo data and (v) bioinformatics ana-lysis and visualization of molecular interaction networks [1618] In this section we introduce the four data analysis tools se-lected (ie MetaCoreTM MetaboAnalyst InCroMAP and 3Omics)

502 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

their main functionalities are illustrated and compared on thebasis of some selected features such as data preprocessingtechniques used statistical analyses performed and methodsused for functional interpretation and if available for data inte-gration The main features of each tool are summarized inTable 2 It is important to point out that only MetaboAnalystprovides a comprehensive module for data preprocessing andstatistical analysis [25 41] MetaCoreTM and 3Omics only offerlimited support for these functionalities whereas InCroMAPonly accepts already preprocessed data In spite of these limita-tions we decided to include MetaCoreTM 3Omics and InCroMAPin this review because of their ability to perform comprehensiveanalyses of multi-omics data which are currently limited inother high-level analysis tools

MetaCoreTM

MetaCoreTM [24] is a commercial tool available both as a stand-alone and as a Web-based application It is a software suite forfunctional analysis of different kinds of high-throughput mo-lecular data (eg next-generation sequencing siRNAmicroRNA microarray-based gene expression and serial ana-lysis of gene expression data array-comparative genomic-hybridization DNA arrays proteomic data metabolic profilesand screening data) MetaCoreTM is an integrated system whichconsists of (i) a high-quality manually curated database ofmammalian biology including metabolites and other molecularclasses bioactive molecules and their interactions signalingand metabolic pathways (ii) genomic analysis tools to identifypotentially significant variants (iii) a data mining toolkit fordata visualization analysis and exchange of data (iv) a toolkit(pathway editor) for custom assembly of functional networksand (v) a set of parsers to upload and manipulate different typesof high-throughput molecular data [54ndash56] Unfortunately nopublic information is available about the details of howMetaCoreTM works thus our review of this tool is partially lim-ited and some details remain unclear

MetaboAnalyst

MetaboAnalyst [25 35] is an integrated freely accessible Web-based platform It was first released in 2009 then upgraded in2012 (MetaboAnalyst 20 [57]) and in 2015 (MetaboAnalyst 30also available for download and local installation) It offers a setof online tools for metabolomic data analysis that combine stat-istical analysis of data with their functional and biological inter-pretation and visualization [25] tutorials and protocolpapers are also available online MetaboAnalyst 30 has been re-implemented to improve performance capability and userinteractivity It offers eight functional modules which can begrouped in three categories (i) exploratory statistical analysis(Statistical Analysis and Time-Series Analysis modules) (ii) func-tional analysis (Enrichment Analysis Pathway Analysis andIntegrated Pathway Analysis modules for both genes and metabol-ites) and (iii) advanced methods for translational studies(Biomarker Analysis Sample Size Estimation and Power Analysismodules) In addition it has also an Other Utilities module con-taining a specialized function for lipidomic data analysis and acompound ID-conversion tool [41]

InCroMAP

InCroMAP is a stand-alone Java software originally developed in2011 for enrichment analysis and pathway-based visualizationsof genomic and proteomic data [27 40] The extended version

InCroMAP 15 released in 2013 also supports annotated metab-olomic data thus making this tool suitable for comprehensivesystem biology studies [58] InCroMAP is freely available at itsWeb site [39] which includes a user guide [59] example datafiles to test the tool and a short video tutorial The software per-forms metabolite set enrichment analysis and generates inter-active global maps of the cellular metabolism These mapsallow an integrated pathway-based visualization of data frommultiple lsquoomicsrsquo platforms and provide a useful overview of themetabolic changes present in the studied experimental condi-tion [26 59]

3Omics

3Omics is a platform-independent Web-based tool developedin 2013 for the analysis integration and visualization of tran-scriptomic proteomic and metabolomic human data It is freelyaccessible at [40] and the Web site also includes a help sectionTo demonstrate the software functionalities applied to realdata two case studies are also reported and explained in [27]3Omics supports correlation analysis co-expression profilingphenotype mapping pathway enrichment analysis and geneontology-based enrichment analysis More precisely dependingon the data provided the software offers four types of lsquoomicsrsquoanalyses (i) TranscriptomicsndashProteomicsndashMetabolomics (TndashPndashM) (ii) TranscriptomicsndashProteomics (TndashP) (iii) ProteomicsndashMetabolomics (PndashM) and (iv) TranscriptomicsndashMetabolomics(TndashM) A single-omics mode is also available to reveal intra-omics relationships 3Omics can also supplement missing tran-script protein and metabolite information related to the inputdata by text-mining the biomedical literature through iHOP (in-formation Hyperlinked Over Protein [60 61]) Users can thusperform multi-omics analyses even when only one or two outof three lsquoomicsrsquo data sets are available [27]

Evaluation and comparison of softwarefunctionalities

In this section strong and weak aspects of each of the four se-lected tools are reviewed and comparatively evaluated follow-ing the data analysis workflow previously described (Figure 1)

Input data

Various proprietary data formats have been developed to han-dle and store MS or NMR raw data this heterogeneity makes itdifficult for researchers to manipulate these data For this rea-son several software for the conversion of raw data file types(eg RAW) into a universal format (eg CSV TXT mzXML orMGF) have been implemented such as ProteoWizard [62]MassMatrix [63] and several others including MATLAB [64] andR [65 66] Once raw data have been converted to a suitable for-mat they can be further analyzed Data types and formatsaccepted as input by the four reviewed tools are reported inTable 2 Input data are usually organized as data matrices withsamples as rows and signal features as columns MetaCoreTMMetaboAnalyst and IncroMAP accept data in different kindsof tabular format (eg textual tab-delimited TXT or comma-separated value CSV files) whereas 3Omics only accepts CSVfiles MetaboAnalyst also accepts zipped files of NMR or MSpeak lists or of MS spectra which must be in mzXML mzDATAor NetCDF format As for MetaCoreTM gene lists can also be im-ported from output files of microarray analysis software such asAffymetrix [67] or Agilent [68] Compound names or IDs

Analysis of metabolomic data | 503

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Tab

le2

Mai

nfe

atu

res

of

the

sele

cted

fou

rso

ftw

are

too

lsfo

rm

etab

olo

mic

dat

aan

alys

is

Too

lM

etaC

oreT

MM

etab

oAn

alys

tIn

Cro

MA

P3O

mic

sY

ear

2004

2009

2011

2013

Inst

itu

tio

nG

eneG

oU

niv

ersi

tyo

fA

lber

taM

cGil

lU

niv

ersi

tyM

on

trea

l(C

A)

Cen

ter

for

Bio

info

rma

tics

of

the

Un

iver

sity

of

Tu

bin

gen

(D)

Mo

lecu

lar

Des

ign

ampM

etab

olo

mic

sLa

bora

tory

U

niv

ersi

tyo

fT

aiw

an(T

W)

Imp

lem

enta

tio

nW

eb-b

asedthorn

stan

d-a

lon

eW

eb-b

ased

Stan

d-a

lon

eW

eb-b

ased

Lice

nse

Co

mm

erci

alG

PL(G

NU

Gen

eral

Publ

icLi

cen

se)

LGPL

(GN

ULe

sser

Gen

eral

Publ

icLi

cen

se)

No

ne

Typ

eo

fkn

ow

led

gePr

op

riet

ary

Publ

icPu

blic

Publ

icIn

pu

td

ata

Gen

ep

rote

ino

rm

etab

oli

teli

sts

imp

ort

edas

tab-

de-

lim

ited

text

(TX

T)

com

ma-

sep

arat

edva

lues

(CSV

)or

Exce

lfile

sge

ne

list

sfr

om

mic

roar

ray

anal

ysis

soft

war

e(A

ffym

etri

xA

gile

nt)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)fo

rco

nce

ntr

atio

ns

spec

tral

bin

so

rp

eak

inte

n-

sity

dat

azi

pp

edfi

les

(ZIP

)of

NM

Ro

rM

Sp

eak

list

so

ro

fM

Ssp

ectr

a(i

nN

etC

DF

mzX

ML

or

mzD

AT

Afo

rmat

)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)of

het

ero

gen

eou

sty

pes

of

pro

cess

edlsquoo

mic

srsquod

ata

Co

mm

a-se

par

ated

valu

es(C

SV)

of

pro

cess

edtr

ansc

rip

tom

ic

pro

teo

mic

or

met

abo

lom

icd

ata

Dat

ap

rep

arat

ion

Dat

ain

tegr

ity

chec

kin

g

Dat

an

orm

aliz

atio

n

Co

mp

ou

nd

nam

eid

enti

fica

tio

n

Stat

isti

cala

nal

ysis

Un

ivar

iate

anal

ysis

Mu

ltiv

aria

tean

alys

is

Clu

ster

ing

Cla

ssifi

cati

on

Dat

ain

terp

reta

tio

nan

din

tegr

atio

nFu

nct

ion

alin

terp

reta

tio

n

Met

abo

lite

set

enri

chm

ent

anal

ysis

Met

abo

lic

pat

hw

ayan

alys

is

Met

abo

lite

map

pin

g

Hyp

erli

nks

toex

tern

ald

atab

ases

Dat

aIn

tegr

atio

n

Ou

tpu

td

ata

Net

wo

rks

can

beex

po

rted

intw

ofo

rmat

sN

etsh

ot

and

Net

wo

rki

mag

esas

PNG

file

s

PDF

rep

ort

sco

nta

inin

gp

lots

gr

aph

san

dta

bles

wit

hal

lth

ere

sult

sIm

ages

are

avai

l-ab

leas

TIF

or

PNG

file

s

Tab

ula

rfo

rmat

(eg

CSV

)fo

ren

rich

men

tan

alys

isre

sult

san

dJP

Gfi

les

for

pat

hw

ay-

base

dvi

sual

izat

ion

PNG

SV

Go

rSI

Ffo

rmat

sfo

rim

ages

Tic

ksin

dic

ate

too

lfu

nct

ion

alit

ies

504 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]

Data preparation

Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text

Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods

Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]

Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable

Statistical analysis

Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]

A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness

Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps

For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite

Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average

ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two

Analysis of metabolomic data | 505

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering

ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines

Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]

Data interpretation and integration

This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text

Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge

MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]

MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in

Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided

with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact

and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-

way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound

colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas

other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of

this figure is available at BIB online httpbiboxfordjournalsorg

506 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view

InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]

3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]

Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation

MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results

Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)

InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]

3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts

Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-

logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds

with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version

of this figure is available at BIB online httpbiboxfordjournalsorg

Analysis of metabolomic data | 507

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]

Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]

Output data

Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis

MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]

MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results

of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing

Conclusion

The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions

Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford

Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations

Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification

508 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future

Key Points

bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools

bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge

bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge

Funding

This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)

References1 Mehrotra B Mendes P Bioinformatics approaches to inte-

grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15

2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210

3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93

4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95

5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39

6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9

7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)

8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34

9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)

10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)

11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302

12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65

13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1

14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93

15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293

16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108

17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2

18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43

19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9

20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48

21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69

22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090

23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100

24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)

25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60

26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6

27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764

28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)

29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4

30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)

31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7

32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109

33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4

34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382

35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)

Analysis of metabolomic data | 509

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2
Page 6: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

their main functionalities are illustrated and compared on thebasis of some selected features such as data preprocessingtechniques used statistical analyses performed and methodsused for functional interpretation and if available for data inte-gration The main features of each tool are summarized inTable 2 It is important to point out that only MetaboAnalystprovides a comprehensive module for data preprocessing andstatistical analysis [25 41] MetaCoreTM and 3Omics only offerlimited support for these functionalities whereas InCroMAPonly accepts already preprocessed data In spite of these limita-tions we decided to include MetaCoreTM 3Omics and InCroMAPin this review because of their ability to perform comprehensiveanalyses of multi-omics data which are currently limited inother high-level analysis tools

MetaCoreTM

MetaCoreTM [24] is a commercial tool available both as a stand-alone and as a Web-based application It is a software suite forfunctional analysis of different kinds of high-throughput mo-lecular data (eg next-generation sequencing siRNAmicroRNA microarray-based gene expression and serial ana-lysis of gene expression data array-comparative genomic-hybridization DNA arrays proteomic data metabolic profilesand screening data) MetaCoreTM is an integrated system whichconsists of (i) a high-quality manually curated database ofmammalian biology including metabolites and other molecularclasses bioactive molecules and their interactions signalingand metabolic pathways (ii) genomic analysis tools to identifypotentially significant variants (iii) a data mining toolkit fordata visualization analysis and exchange of data (iv) a toolkit(pathway editor) for custom assembly of functional networksand (v) a set of parsers to upload and manipulate different typesof high-throughput molecular data [54ndash56] Unfortunately nopublic information is available about the details of howMetaCoreTM works thus our review of this tool is partially lim-ited and some details remain unclear

MetaboAnalyst

MetaboAnalyst [25 35] is an integrated freely accessible Web-based platform It was first released in 2009 then upgraded in2012 (MetaboAnalyst 20 [57]) and in 2015 (MetaboAnalyst 30also available for download and local installation) It offers a setof online tools for metabolomic data analysis that combine stat-istical analysis of data with their functional and biological inter-pretation and visualization [25] tutorials and protocolpapers are also available online MetaboAnalyst 30 has been re-implemented to improve performance capability and userinteractivity It offers eight functional modules which can begrouped in three categories (i) exploratory statistical analysis(Statistical Analysis and Time-Series Analysis modules) (ii) func-tional analysis (Enrichment Analysis Pathway Analysis andIntegrated Pathway Analysis modules for both genes and metabol-ites) and (iii) advanced methods for translational studies(Biomarker Analysis Sample Size Estimation and Power Analysismodules) In addition it has also an Other Utilities module con-taining a specialized function for lipidomic data analysis and acompound ID-conversion tool [41]

InCroMAP

InCroMAP is a stand-alone Java software originally developed in2011 for enrichment analysis and pathway-based visualizationsof genomic and proteomic data [27 40] The extended version

InCroMAP 15 released in 2013 also supports annotated metab-olomic data thus making this tool suitable for comprehensivesystem biology studies [58] InCroMAP is freely available at itsWeb site [39] which includes a user guide [59] example datafiles to test the tool and a short video tutorial The software per-forms metabolite set enrichment analysis and generates inter-active global maps of the cellular metabolism These mapsallow an integrated pathway-based visualization of data frommultiple lsquoomicsrsquo platforms and provide a useful overview of themetabolic changes present in the studied experimental condi-tion [26 59]

3Omics

3Omics is a platform-independent Web-based tool developedin 2013 for the analysis integration and visualization of tran-scriptomic proteomic and metabolomic human data It is freelyaccessible at [40] and the Web site also includes a help sectionTo demonstrate the software functionalities applied to realdata two case studies are also reported and explained in [27]3Omics supports correlation analysis co-expression profilingphenotype mapping pathway enrichment analysis and geneontology-based enrichment analysis More precisely dependingon the data provided the software offers four types of lsquoomicsrsquoanalyses (i) TranscriptomicsndashProteomicsndashMetabolomics (TndashPndashM) (ii) TranscriptomicsndashProteomics (TndashP) (iii) ProteomicsndashMetabolomics (PndashM) and (iv) TranscriptomicsndashMetabolomics(TndashM) A single-omics mode is also available to reveal intra-omics relationships 3Omics can also supplement missing tran-script protein and metabolite information related to the inputdata by text-mining the biomedical literature through iHOP (in-formation Hyperlinked Over Protein [60 61]) Users can thusperform multi-omics analyses even when only one or two outof three lsquoomicsrsquo data sets are available [27]

Evaluation and comparison of softwarefunctionalities

In this section strong and weak aspects of each of the four se-lected tools are reviewed and comparatively evaluated follow-ing the data analysis workflow previously described (Figure 1)

Input data

Various proprietary data formats have been developed to han-dle and store MS or NMR raw data this heterogeneity makes itdifficult for researchers to manipulate these data For this rea-son several software for the conversion of raw data file types(eg RAW) into a universal format (eg CSV TXT mzXML orMGF) have been implemented such as ProteoWizard [62]MassMatrix [63] and several others including MATLAB [64] andR [65 66] Once raw data have been converted to a suitable for-mat they can be further analyzed Data types and formatsaccepted as input by the four reviewed tools are reported inTable 2 Input data are usually organized as data matrices withsamples as rows and signal features as columns MetaCoreTMMetaboAnalyst and IncroMAP accept data in different kindsof tabular format (eg textual tab-delimited TXT or comma-separated value CSV files) whereas 3Omics only accepts CSVfiles MetaboAnalyst also accepts zipped files of NMR or MSpeak lists or of MS spectra which must be in mzXML mzDATAor NetCDF format As for MetaCoreTM gene lists can also be im-ported from output files of microarray analysis software such asAffymetrix [67] or Agilent [68] Compound names or IDs

Analysis of metabolomic data | 503

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Tab

le2

Mai

nfe

atu

res

of

the

sele

cted

fou

rso

ftw

are

too

lsfo

rm

etab

olo

mic

dat

aan

alys

is

Too

lM

etaC

oreT

MM

etab

oAn

alys

tIn

Cro

MA

P3O

mic

sY

ear

2004

2009

2011

2013

Inst

itu

tio

nG

eneG

oU

niv

ersi

tyo

fA

lber

taM

cGil

lU

niv

ersi

tyM

on

trea

l(C

A)

Cen

ter

for

Bio

info

rma

tics

of

the

Un

iver

sity

of

Tu

bin

gen

(D)

Mo

lecu

lar

Des

ign

ampM

etab

olo

mic

sLa

bora

tory

U

niv

ersi

tyo

fT

aiw

an(T

W)

Imp

lem

enta

tio

nW

eb-b

asedthorn

stan

d-a

lon

eW

eb-b

ased

Stan

d-a

lon

eW

eb-b

ased

Lice

nse

Co

mm

erci

alG

PL(G

NU

Gen

eral

Publ

icLi

cen

se)

LGPL

(GN

ULe

sser

Gen

eral

Publ

icLi

cen

se)

No

ne

Typ

eo

fkn

ow

led

gePr

op

riet

ary

Publ

icPu

blic

Publ

icIn

pu

td

ata

Gen

ep

rote

ino

rm

etab

oli

teli

sts

imp

ort

edas

tab-

de-

lim

ited

text

(TX

T)

com

ma-

sep

arat

edva

lues

(CSV

)or

Exce

lfile

sge

ne

list

sfr

om

mic

roar

ray

anal

ysis

soft

war

e(A

ffym

etri

xA

gile

nt)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)fo

rco

nce

ntr

atio

ns

spec

tral

bin

so

rp

eak

inte

n-

sity

dat

azi

pp

edfi

les

(ZIP

)of

NM

Ro

rM

Sp

eak

list

so

ro

fM

Ssp

ectr

a(i

nN

etC

DF

mzX

ML

or

mzD

AT

Afo

rmat

)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)of

het

ero

gen

eou

sty

pes

of

pro

cess

edlsquoo

mic

srsquod

ata

Co

mm

a-se

par

ated

valu

es(C

SV)

of

pro

cess

edtr

ansc

rip

tom

ic

pro

teo

mic

or

met

abo

lom

icd

ata

Dat

ap

rep

arat

ion

Dat

ain

tegr

ity

chec

kin

g

Dat

an

orm

aliz

atio

n

Co

mp

ou

nd

nam

eid

enti

fica

tio

n

Stat

isti

cala

nal

ysis

Un

ivar

iate

anal

ysis

Mu

ltiv

aria

tean

alys

is

Clu

ster

ing

Cla

ssifi

cati

on

Dat

ain

terp

reta

tio

nan

din

tegr

atio

nFu

nct

ion

alin

terp

reta

tio

n

Met

abo

lite

set

enri

chm

ent

anal

ysis

Met

abo

lic

pat

hw

ayan

alys

is

Met

abo

lite

map

pin

g

Hyp

erli

nks

toex

tern

ald

atab

ases

Dat

aIn

tegr

atio

n

Ou

tpu

td

ata

Net

wo

rks

can

beex

po

rted

intw

ofo

rmat

sN

etsh

ot

and

Net

wo

rki

mag

esas

PNG

file

s

PDF

rep

ort

sco

nta

inin

gp

lots

gr

aph

san

dta

bles

wit

hal

lth

ere

sult

sIm

ages

are

avai

l-ab

leas

TIF

or

PNG

file

s

Tab

ula

rfo

rmat

(eg

CSV

)fo

ren

rich

men

tan

alys

isre

sult

san

dJP

Gfi

les

for

pat

hw

ay-

base

dvi

sual

izat

ion

PNG

SV

Go

rSI

Ffo

rmat

sfo

rim

ages

Tic

ksin

dic

ate

too

lfu

nct

ion

alit

ies

504 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]

Data preparation

Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text

Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods

Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]

Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable

Statistical analysis

Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]

A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness

Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps

For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite

Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average

ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two

Analysis of metabolomic data | 505

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering

ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines

Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]

Data interpretation and integration

This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text

Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge

MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]

MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in

Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided

with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact

and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-

way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound

colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas

other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of

this figure is available at BIB online httpbiboxfordjournalsorg

506 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view

InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]

3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]

Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation

MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results

Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)

InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]

3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts

Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-

logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds

with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version

of this figure is available at BIB online httpbiboxfordjournalsorg

Analysis of metabolomic data | 507

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]

Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]

Output data

Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis

MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]

MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results

of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing

Conclusion

The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions

Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford

Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations

Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification

508 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future

Key Points

bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools

bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge

bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge

Funding

This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)

References1 Mehrotra B Mendes P Bioinformatics approaches to inte-

grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15

2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210

3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93

4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95

5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39

6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9

7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)

8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34

9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)

10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)

11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302

12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65

13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1

14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93

15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293

16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108

17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2

18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43

19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9

20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48

21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69

22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090

23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100

24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)

25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60

26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6

27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764

28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)

29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4

30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)

31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7

32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109

33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4

34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382

35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)

Analysis of metabolomic data | 509

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2
Page 7: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

Tab

le2

Mai

nfe

atu

res

of

the

sele

cted

fou

rso

ftw

are

too

lsfo

rm

etab

olo

mic

dat

aan

alys

is

Too

lM

etaC

oreT

MM

etab

oAn

alys

tIn

Cro

MA

P3O

mic

sY

ear

2004

2009

2011

2013

Inst

itu

tio

nG

eneG

oU

niv

ersi

tyo

fA

lber

taM

cGil

lU

niv

ersi

tyM

on

trea

l(C

A)

Cen

ter

for

Bio

info

rma

tics

of

the

Un

iver

sity

of

Tu

bin

gen

(D)

Mo

lecu

lar

Des

ign

ampM

etab

olo

mic

sLa

bora

tory

U

niv

ersi

tyo

fT

aiw

an(T

W)

Imp

lem

enta

tio

nW

eb-b

asedthorn

stan

d-a

lon

eW

eb-b

ased

Stan

d-a

lon

eW

eb-b

ased

Lice

nse

Co

mm

erci

alG

PL(G

NU

Gen

eral

Publ

icLi

cen

se)

LGPL

(GN

ULe

sser

Gen

eral

Publ

icLi

cen

se)

No

ne

Typ

eo

fkn

ow

led

gePr

op

riet

ary

Publ

icPu

blic

Publ

icIn

pu

td

ata

Gen

ep

rote

ino

rm

etab

oli

teli

sts

imp

ort

edas

tab-

de-

lim

ited

text

(TX

T)

com

ma-

sep

arat

edva

lues

(CSV

)or

Exce

lfile

sge

ne

list

sfr

om

mic

roar

ray

anal

ysis

soft

war

e(A

ffym

etri

xA

gile

nt)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)fo

rco

nce

ntr

atio

ns

spec

tral

bin

so

rp

eak

inte

n-

sity

dat

azi

pp

edfi

les

(ZIP

)of

NM

Ro

rM

Sp

eak

list

so

ro

fM

Ssp

ectr

a(i

nN

etC

DF

mzX

ML

or

mzD

AT

Afo

rmat

)

Tab

-del

imit

edte

xt(T

XT

)or

com

ma-

sep

arat

edva

lues

(CSV

)of

het

ero

gen

eou

sty

pes

of

pro

cess

edlsquoo

mic

srsquod

ata

Co

mm

a-se

par

ated

valu

es(C

SV)

of

pro

cess

edtr

ansc

rip

tom

ic

pro

teo

mic

or

met

abo

lom

icd

ata

Dat

ap

rep

arat

ion

Dat

ain

tegr

ity

chec

kin

g

Dat

an

orm

aliz

atio

n

Co

mp

ou

nd

nam

eid

enti

fica

tio

n

Stat

isti

cala

nal

ysis

Un

ivar

iate

anal

ysis

Mu

ltiv

aria

tean

alys

is

Clu

ster

ing

Cla

ssifi

cati

on

Dat

ain

terp

reta

tio

nan

din

tegr

atio

nFu

nct

ion

alin

terp

reta

tio

n

Met

abo

lite

set

enri

chm

ent

anal

ysis

Met

abo

lic

pat

hw

ayan

alys

is

Met

abo

lite

map

pin

g

Hyp

erli

nks

toex

tern

ald

atab

ases

Dat

aIn

tegr

atio

n

Ou

tpu

td

ata

Net

wo

rks

can

beex

po

rted

intw

ofo

rmat

sN

etsh

ot

and

Net

wo

rki

mag

esas

PNG

file

s

PDF

rep

ort

sco

nta

inin

gp

lots

gr

aph

san

dta

bles

wit

hal

lth

ere

sult

sIm

ages

are

avai

l-ab

leas

TIF

or

PNG

file

s

Tab

ula

rfo

rmat

(eg

CSV

)fo

ren

rich

men

tan

alys

isre

sult

san

dJP

Gfi

les

for

pat

hw

ay-

base

dvi

sual

izat

ion

PNG

SV

Go

rSI

Ffo

rmat

sfo

rim

ages

Tic

ksin

dic

ate

too

lfu

nct

ion

alit

ies

504 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]

Data preparation

Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text

Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods

Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]

Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable

Statistical analysis

Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]

A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness

Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps

For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite

Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average

ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two

Analysis of metabolomic data | 505

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering

ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines

Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]

Data interpretation and integration

This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text

Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge

MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]

MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in

Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided

with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact

and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-

way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound

colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas

other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of

this figure is available at BIB online httpbiboxfordjournalsorg

506 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view

InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]

3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]

Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation

MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results

Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)

InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]

3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts

Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-

logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds

with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version

of this figure is available at BIB online httpbiboxfordjournalsorg

Analysis of metabolomic data | 507

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]

Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]

Output data

Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis

MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]

MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results

of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing

Conclusion

The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions

Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford

Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations

Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification

508 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future

Key Points

bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools

bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge

bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge

Funding

This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)

References1 Mehrotra B Mendes P Bioinformatics approaches to inte-

grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15

2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210

3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93

4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95

5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39

6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9

7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)

8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34

9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)

10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)

11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302

12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65

13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1

14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93

15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293

16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108

17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2

18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43

19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9

20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48

21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69

22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090

23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100

24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)

25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60

26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6

27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764

28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)

29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4

30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)

31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7

32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109

33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4

34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382

35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)

Analysis of metabolomic data | 509

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2
Page 8: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]

Data preparation

Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text

Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods

Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]

Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable

Statistical analysis

Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]

A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness

Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps

For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite

Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average

ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two

Analysis of metabolomic data | 505

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering

ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines

Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]

Data interpretation and integration

This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text

Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge

MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]

MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in

Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided

with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact

and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-

way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound

colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas

other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of

this figure is available at BIB online httpbiboxfordjournalsorg

506 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view

InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]

3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]

Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation

MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results

Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)

InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]

3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts

Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-

logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds

with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version

of this figure is available at BIB online httpbiboxfordjournalsorg

Analysis of metabolomic data | 507

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]

Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]

Output data

Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis

MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]

MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results

of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing

Conclusion

The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions

Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford

Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations

Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification

508 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future

Key Points

bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools

bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge

bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge

Funding

This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)

References1 Mehrotra B Mendes P Bioinformatics approaches to inte-

grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15

2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210

3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93

4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95

5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39

6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9

7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)

8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34

9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)

10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)

11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302

12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65

13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1

14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93

15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293

16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108

17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2

18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43

19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9

20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48

21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69

22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090

23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100

24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)

25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60

26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6

27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764

28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)

29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4

30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)

31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7

32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109

33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4

34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382

35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)

Analysis of metabolomic data | 509

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2
Page 9: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering

ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines

Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]

Data interpretation and integration

This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text

Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge

MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]

MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in

Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided

with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact

and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-

way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound

colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas

other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of

this figure is available at BIB online httpbiboxfordjournalsorg

506 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view

InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]

3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]

Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation

MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results

Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)

InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]

3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts

Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-

logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds

with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version

of this figure is available at BIB online httpbiboxfordjournalsorg

Analysis of metabolomic data | 507

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]

Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]

Output data

Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis

MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]

MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results

of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing

Conclusion

The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions

Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford

Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations

Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification

508 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future

Key Points

bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools

bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge

bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge

Funding

This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)

References1 Mehrotra B Mendes P Bioinformatics approaches to inte-

grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15

2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210

3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93

4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95

5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39

6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9

7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)

8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34

9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)

10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)

11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302

12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65

13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1

14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93

15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293

16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108

17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2

18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43

19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9

20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48

21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69

22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090

23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100

24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)

25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60

26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6

27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764

28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)

29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4

30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)

31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7

32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109

33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4

34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382

35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)

Analysis of metabolomic data | 509

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2
Page 10: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view

InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]

3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]

Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation

MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results

Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)

InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]

3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts

Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-

logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds

with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version

of this figure is available at BIB online httpbiboxfordjournalsorg

Analysis of metabolomic data | 507

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]

Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]

Output data

Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis

MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]

MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results

of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing

Conclusion

The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions

Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford

Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations

Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification

508 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future

Key Points

bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools

bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge

bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge

Funding

This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)

References1 Mehrotra B Mendes P Bioinformatics approaches to inte-

grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15

2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210

3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93

4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95

5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39

6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9

7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)

8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34

9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)

10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)

11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302

12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65

13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1

14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93

15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293

16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108

17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2

18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43

19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9

20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48

21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69

22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090

23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100

24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)

25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60

26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6

27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764

28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)

29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4

30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)

31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7

32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109

33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4

34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382

35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)

Analysis of metabolomic data | 509

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2
Page 11: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]

Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]

Output data

Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis

MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]

MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results

of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing

Conclusion

The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions

Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford

Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations

Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification

508 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future

Key Points

bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools

bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge

bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge

Funding

This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)

References1 Mehrotra B Mendes P Bioinformatics approaches to inte-

grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15

2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210

3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93

4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95

5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39

6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9

7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)

8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34

9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)

10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)

11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302

12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65

13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1

14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93

15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293

16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108

17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2

18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43

19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9

20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48

21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69

22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090

23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100

24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)

25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60

26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6

27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764

28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)

29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4

30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)

31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7

32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109

33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4

34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382

35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)

Analysis of metabolomic data | 509

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2
Page 12: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future

Key Points

bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools

bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge

bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge

Funding

This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)

References1 Mehrotra B Mendes P Bioinformatics approaches to inte-

grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15

2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210

3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93

4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95

5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39

6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9

7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)

8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34

9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)

10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)

11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302

12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65

13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1

14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93

15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293

16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108

17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2

18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43

19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9

20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48

21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69

22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090

23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100

24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)

25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60

26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6

27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764

28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)

29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4

30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)

31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7

32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109

33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4

34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382

35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)

Analysis of metabolomic data | 509

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2
Page 13: Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics constitutes one of the building blocks of systems biology. Because of its focus on

36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9

37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4

38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18

39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)

403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)

41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7

42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7

43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7

44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244

45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)

46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87

47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411

48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395

49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7

50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51

51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5

52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278

53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44

54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50

55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008

56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014

57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33

58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82

59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012

60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6

61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)

62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)

63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)

64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)

65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010

66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)

67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)

68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)

69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60

70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)

71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7

72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300

73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649

510 | Cambiaghi et al

Dow

nloaded from httpsacadem

icoupcombibarticle-abstract1834982453286 by U

niversidad Nacional Autonom

a de Mexico user on 12 M

ay 2019

  • bbw031-TF1
  • bbw031-TF2