Overview Radboudumc Center for Proteomics, Glycomics and Metabolomics april 2015
Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics...
Transcript of Analysis of metabolomic data: tools, current strategies ...€¦ · and proteomics), metabolomics...
Analysis of metabolomic data tools current strategies
and future challenges for omics data integrationAlice Cambiaghi Manuela Ferrario and Marco MasseroliCorresponding author Alice Cambiaghi Dipartimento di Elettronica Informazione e Bioingegneria Politecnico di Milano Piazza Leonardo da Vinci 3220133 Milan Italy Tel thorn39-02-2399-3381 Fax thorn39-02-2399-3360 E-mail alicecambiaghipolimiit
Abstract
Metabolomics is a rapidly growing field consisting of the analysis of a large number of metabolites at a system scale Thetwo major goals of metabolomics are the identification of the metabolites characterizing each organism state and themeasurement of their dynamics under different situations (eg pathological conditions environmental factors) Knowledgeabout metabolites is crucial for the understanding of most cellular phenomena but this information alone is not sufficientto gain a comprehensive view of all the biological processes involved Integrated approaches combining metabolomics withtranscriptomics and proteomics are thus required to obtain much deeper insights than any of these techniques aloneAlthough this information is available multilevel integration of different lsquoomicsrsquo data is still a challenge The handlingprocessing analysis and integration of these data require specialized mathematical statistical and bioinformatics toolsand several technical problems hampering a rapid progress in the field exist Here we review four main tools for number ofusers or provided features (MetaCoreTM MetaboAnalyst InCroMAP and 3Omics) out of the several available for metabolomicdata analysis and integration with other lsquoomicsrsquo data highlighting their strong and weak aspects a number of related issuesaffecting data analysis and integration are also identified and discussed Overall we provide an objective description of howsome of the main currently available software packages work which may help the experimental practitioner in the choiceof a robust pipeline for metabolomic data analysis and integration
Key words Metabolomics integrated lsquoomicsrsquo data analysis metabolomic analysis tools comparative tool evaluation
Introduction
Metabolomics is an emerging field of the biological sciences itconcerns the high-throughput characterization of metabolitesie small molecular compounds (lt 1500 Da) which constitutethe end products of the cellular metabolism and form the chem-ical fingerprint of an organism at a precise time point More pre-cisely metabolomic studies involve the identification andquantification of metabolites with the aim of correlating theirchanges with pathological states or with the effect of external
influencing factors such as drugs or contaminants [1] Togetherwith the other main lsquoomicsrsquo areas (genomics transcriptomicsand proteomics) metabolomics constitutes one of the buildingblocks of systems biology Because of its focus on small mol-ecules and small interactions it has lately reached a wide-spread application in many different fields including molecularepidemiology toxicity assessment functional and nutritionalgenomics biomarker discovery and identification drug devel-opment and personalized health care [1ndash3]
Alice Cambiaghi is a PhD student in biomedical engineering at the Department of Electronics Information and Bioengineering Politecnico di Milano ItalyHer PhD project deals with finding new strategies for multilevel integration of lsquoomicsrsquo data as an approach for the identification of molecular biomarkersin acute heart failure induced by shockManuela Ferrario is an assistant professor at Politecnico di Milano Italy She is a member of the scientific committee of the EU project lsquoShockomics multi-scale approach to the identification of molecular biomarkers in acute heart failure induced by shockrsquo (2013ndash2017) Her research interests include mathem-atical modeling of physiological systems nonlinear analysis variability analysis of cardiovascular signals data mining and prediction modelsMarco Masseroli is an associate professor of bioinformatics and biomedical informatics at the Department of Electronics Information and BioengineeringPolitecnico di Milano Italy His research interests include distributed Internet technologies biomolecular databases biomedical terminologies and bio-ontologies to effectively retrieve manage analyze and semantically integrate genomic information with clinical and high-throughout genetic data He isthe author of more than 170 scientific articlesSubmitted 8 January 2016 Received (in revised form) 29 February 2016
VC The Author 2016 Published by Oxford University Press All rights reserved For Permissions please email journalspermissionsoupcom
498
Briefings in Bioinformatics 18(3) 2017 498ndash510
doi 101093bibbbw031Advance Access Publication Date 12 April 2016Software Review
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
The following two are the general analytical approaches inperforming a metabolomic analysis targeted and untargetedTargeted metabolomics refers to the detection and precise quan-tification (in nM or mgmL) of a small set of known compoundsIt is driven by a specific biochemical question or hypothesis inwhich the set of metabolites related to one or more pathways isalready defined A limitation of the targeted approach is that itrequires the compounds of interest to be known a priori and tobe available in their purified form Currently only few purifiedstandards (ie defined groups of chemically characterized andbiochemically annotated metabolites) have been clearly identi-fied and are available for a calibration process therefore owingto the wide variety of metabolites and their complex dynamicswithin a cell the targeted approach cannot yet be used alone fora comprehensive analysis of the metabolome The untargetedapproach instead also called lsquometabolite fingerprintingrsquo is notdriven by an a priori hypothesis and it is used for completemetabolome comparison (ie as many metabolites as possibleare measured and compared between samples) [4] Metabolitevariations are observed principally as total changes of chroma-tographic patterns without requiring previous knowledge of thecompounds under investigation Therefore untargeted metabo-lomics does not attempt to precisely quantify all measurablemetabolites in a sample but it only gives their relative quantifi-cation (fold change) [5 6] It is important to stress that in spiteof the presence of extensive metabolomics spectra repositories(eg SMPDB [7] KEEG [8] MetaCyc [9] and HumanCyc [10] to citesome) metabolite identification still constitutes a challenge inuntargeted metabolomics This is mainly because of techno-logical limitations such as the dependence on the intrinsic ana-lytical coverage of the platform used and the possible biastoward the detection of the most abundant molecules [11]Moreover the same molecule can be fragmented differently de-pending on the specific instrument or technique used and thishampers the metabolites spectra matching Furthermore in-strument-dependent variability between different kinds ofmass-analyzers and even between the ones of the same kindbut of different brands increases the variability in compoundidentification
Metabolomic research leads to the handling of complex datasets which include hundreds of metabolites their comprehen-sive evaluation requires a specialized data analysis that in-volves cheminformatics bioinformatics and statistics aspectsMoreover to better understand the role of each metabolite inthe studied condition metabolomic data must be interpretedthis requires that every chemical information derived frommetabolomic analyses has to be related to both biochemicalcauses and physiological consequences [1 12] Toward this endthe multilevel integration of metabolomic proteomic and tran-scriptomic information is fundamental for a better understand-ing of the cellular biology Although this information isavailable its fast evaluation and integration is still hamperedby technical and biological issues including (i) the complexityand heterogeneity intrinsic to biological data which require ap-propriate statistical and computational analysis methods (ii)the limited reproducibility of the results of transcriptomicproteomic and metabolomic research and the heterogeneity ofthe available analysis techniques which make data comparisonamong different labs hard (iii) the lack of standard data formatsboth for lsquoomicsrsquo data and for metadata and (iv) the need ofuser-friendly tools for integrative analysis of multiple datatypes [13 14]
Although several literature about metabolomic data produc-tion and analysis techniques exist [3ndash5 15ndash23] to the best of our
knowledge a comprehensive review illustrating the availabletools for the analysis and integration of metabolomic data withother lsquoomicsrsquo data has not been reported yet Here we describeand compare four tools (MetaCoreTM MetaboAnalyst InCroMAPand 3Omics [24ndash27]) which we selected among the several onescurrently available for metabolomic data analysis and integra-tion (Table 1 reports a list of the tools most frequently cited inthe literature) related issues and challenges arising in lsquoomicsrsquodata integration and analysis are also identified and discussed
MetaCoreTM and MetaboAnalyst are the most commonlyused tools by researchers who work with metabolomic data (thenumber of data analysis jobs submitted to MetaboAnalyst wasabout 40 000month in 2014 [41]) Both of them have been avail-able since several years (since 2004 and 2009 respectively)Conversely InCroMAP and 3Omics have been implementedmore recently (in 2011 and 2013 respectively) and they havenot yet overcome the previous ones However given their easeof use the knowledge of these two latter tools may be advanta-geous for researchers interested in evaluating the current possi-bility to integrate lsquoomicsrsquo data across multiple platforms In thisreview we provide an objective and practical assessment ofthese tools which may be helpful as a guide for the choice of arobust pipeline for metabolomic data integration and analysisWe do not give a description of lsquothe best toolsrsquo for metabolomicdata analysis but we elucidate how some of the main currentlyavailable software packages work highlighting their strong andweak aspects
Metabolomic analysis workflow a briefoverview
A detailed description of how a metabolomic analysis is per-formed can be found in [5 6 15] In the following we provide abrief overview to better understand the related issues we sum-marize the main aspects focusing on the integration with otherkinds of lsquoomicsrsquo data
A typical metabolomic study consists of several differentparts (schematized in Figure 1) which can be grouped in fourmain steps [5 12 16 42]
Sample preparation
The first step in the metabolomic workflow is the preparation ofthe biological sample (eg blood plasma and serum urine sal-iva solid tissues and cultured cells) According to the kind ofsample to analyze several approaches are used as described indetails elsewhere [18ndash21] Samples are usually homogenized orpulverized into smaller particles to increase their surface areafor the exposure to the extraction buffer which is chosen in ac-cordance to their chemical characteristics
Data acquisition
Once the sample is ready different techniques can be used toseparate and characterize chemically diverse groups of metab-olites To have different measurement dimensions based bothon chemical and physical properties of metabolites compoundseparation techniques (eg gas chromatography high-perform-ance liquid chromatography ultra-high-performance liquidchromatography and capillary electrophoresis) are combinedwith compound detection techniques such as mass spectrom-etry (MS) or nuclear magnetic resonance (NMR) Each detectionor separation method has different resolution sensitivity andtechnological limitations in identifying metabolites and it is
Analysis of metabolomic data | 499
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Tab
le1
Feat
ure
so
fth
ecu
rren
tly
avai
labl
em
etab
olo
mic
dat
aan
alys
isto
ols
mo
stci
ted
inth
eli
tera
ture
ord
ered
byth
eir
year
of
pu
blic
atio
n
To
oln
ame
To
old
escr
ipti
on
Dat
ap
rep
roce
ssin
gO
mic
sd
ata
inte
grat
edan
alys
is
Path
way
anal
ysis
Tra
nsc
rip
tom
icd
ata
Pro
teo
mic
dat
aM
etab
olo
mic
dat
aY
ear
Ref
eren
ce
IPA
aA
nal
ysis
and
visu
aliz
atio
no
fd
iffe
ren
tki
nd
so
fo
mic
sd
ata
N
A[2
8]
Met
aCo
rea
Fun
ctio
nal
anal
ysis
and
visu
aliz
atio
no
fd
iffe
ren
tki
nd
so
fh
igh
-th
rou
ghp
ut
om
ics
dat
a
o
2004
[24]
PaV
ESy
Dat
am
anag
ing
syst
emfo
red
itin
gan
dvi
sual
izat
ion
of
bio
logi
calp
ath
way
so
oo
2004
[29]
Pro
teo
me
Soft
war
ea
Qu
anti
zati
on
co
mp
ou
nd
iden
tifi
cati
on
and
stat
isti
cala
nd
pat
hw
ayan
alys
is
20
05[3
0]
Vis
AN
TV
isu
aliz
atio
nan
dan
alys
iso
fm
any
typ
eso
fbi
olo
gica
lnet
wo
rks
o
oo
2005
[31]
VA
NT
EDV
isu
aliz
atio
nan
dan
alys
iso
fn
etw
ork
sw
ith
rela
ted
exp
erim
enta
ldat
a
20
06[3
2]
Mas
sTR
IXA
nn
ota
tio
no
fin
pu
tm
ass
pea
ksan
dm
ap-
pin
go
fth
eid
enti
fied
com
po
un
ds
on
toth
esp
ecifi
cm
etab
oli
cp
ath
way
20
08[3
3]
Pro
MeT
raV
isu
aliz
atio
nan
din
tegr
atio
no
fd
ata
sets
of
dif
fere
nt
kin
ds
of
om
ics
dat
ao
nu
ser-
defi
ned
met
abo
lic
pat
hw
aym
aps
o
2009
[34]
Met
abo
An
alys
tC
om
pre
hen
sive
met
abo
lom
icd
ata
ana-
lysi
svi
sual
izat
ion
and
inte
rpre
tati
on
2009
[25
35]
Pain
tom
ics
Inte
grat
edvi
sual
anal
ysis
of
tran
scri
p-
tom
ican
dm
etab
olo
mic
dat
ao
20
10[3
6]
Met
PaPa
thw
ayan
alys
isan
dvi
sual
izat
ion
for
met
abo
lom
icd
ata
o
20
10[3
7]
IMPa
LaJo
int
pat
hw
ayan
alys
iso
ftr
ansc
rip
tom
ico
rp
rote
om
ican
dm
etab
olo
mic
dat
ao
2011
[38]
InC
roM
AP
Sin
gle
dat
ase
tan
din
tegr
ated
cro
ss-p
lat-
form
enri
chm
ent
anal
ysis
an
dp
ath
-w
ay-b
ased
visu
aliz
atio
ns
of
om
ics
dat
a
20
11[2
639
]
3Om
ics
An
alys
isi
nte
grat
ion
and
visu
aliz
atio
no
fd
iffe
ren
tki
nd
so
fh
um
ano
mic
sd
ata
o
2013
[27
40]
aIn
gen
uit
yPa
thw
ayA
nal
ysis
Co
mm
erci
alto
ol
Tic
ksin
dic
ate
full
fun
ctio
nal
ity
emp
tyci
rcle
sin
dic
ate
par
tial
fun
ctio
nal
ity
500 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Figure 1 Flowchart of a typical metabolomic study After sample preparation specific metabolic signals are acquired using heterogeneous analytical platforms (DATA
ACQUISITION) Raw signals are then pre-processed to produce data in a suitable format for univariate and multivariate statistical analyses For untargeted studies me-
tabolites have first to be identified from spectral information (DATA PROCESSING) Significantly expressed metabolites are then linked to the biological context
through enrichment and pathway analysis and mapped into networks Finally metabolomic data are integrated with other lsquoomicsrsquo data and with prior knowledge to
gain a comprehensive view of the molecular processes involved (DATA INTERPRETATION AND INTEGRATION)
Analysis of metabolomic data | 501
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
chosen in accordance to the chemical and physical characteris-tic of each sample and to the kind of analysis to be performed(targeted or untargeted) [12 15 18] MS is the most widely appliedtechnique as it allows reliable metabolite identification par-ticularly when used in tandem with chromatographic separ-ation methods so as to enhance its mass-resolving capabilitiesIt is rapid (the analysis time ranges from 5 to 140 min) andallows performing sensitive and selective qualitative and quan-titative analyses The main drawbacks of the MS technique arethe need to separate or purify the sample before it is directedinto the mass analyzer and the high cost of the instrumentCompared with MS NMR has a lower sensitivity thus resultingin limited ability for metabolite identification and quantifica-tion This implies that potentially important compounds thatare present at smaller concentrations can be hidden by largerpeaks and are thus less likely to be identified The advantagesof NMR are the high analytical reproducibility and that it is anon-destructive method that requires minimal sample prepar-ation [43 44]
Data processing
Once acquired raw signals (chromatograms spectra or NMRdata) are pre-processed by ad hoc software tools to facilitatecompound quantification (eg the commercial softwareSIEVETM by Thermo Scientific [45] or some freely availablesoftware packages such as the cloud-based platform XCMS [46]or the open-source cross-platforms MAVEN [47] and MZmine 2[48]) Generally this preprocessing involves noise reductionretention time correction peak detection and integration andchromatogram alignment Finally for untargeted metabolomicstudies different databases such as the Human MetabolomeDatabase (HMDB) [49] or the Metabolite and Tandem MSDatabase (METLIN) [50] are used to identify the metabolitesfrom spectra During data integrity checking different inputdata are then prepared to produce appropriate data matricesfor further analyses as better detailed in the next sectionBriefly before starting any kind of statistical analysis datanormalization is performed to reduce systematic biases ortechnical variations and to avoid misidentification of signifi-cant changes owing to the different orders of magnitudes ofmetabolomic data Following which significant differences be-tween sample sets can be identified using appropriate statis-tical methods A typical statistical analysis for metabolomicdata consists of two phases initially different univariate andmultivariate methods are used to generate an overview of theconsidered data sets and to identify the metabolites that showsignificant changes under the studied conditions then datamining techniques are used to discriminate groups of func-tionally related metabolites [16] A limit of traditional statis-tical methods is that they highlight relationships amongvariables based only on mathematical criteria (eg maximiza-tion of variance or correlation) and they do not take into ac-count correlations from biological origin [17] For this reasonthe combined use of several statistical and data mining tech-niques is highly recommended [4] Once identified signifi-cantly expressed metabolites are first ranked usingappropriate P-values then a cut-off threshold is applied to se-lect the top-k ones from the ranked list The choice of thisthreshold which is often arbitrary is critical as it may influ-ence the final biological interpretation In fact some moder-ate but significant changes may be missed or criticalcomponents of a particular biological process may be left outthus compromising subsequent analyses [42]
Data interpretation and integration
In this last step the selected metabolites are linked to the biolo-gical context under study through enrichment and pathwayanalyses More precisely enrichment analysis aims to investi-gate the enrichment (ie over- andor under-expression) of pre-defined groups of functionally related metabolites (iemetabolite sets) to identify significant and coordinated expres-sion changes among them This allows taking advantage of thelist of altered metabolites to suggest a biological pathway ordisease condition which can be further investigated [42]Conversely pathway analysis involves the description and visu-alization of the interactions among genes proteins or metabol-ites within cells tissues or organs Its goal is to identify thepathways that significantly impact on a given biological process[37] Enrichment and pathway analyses are performed using adhoc software tools [37] which map significant metabolites toknown biochemical pathways on the basis of the informationcontained in public databases such as the Kyoto Encyclopediaof Genes and Genomes (KEGG) [8] Once the metabolic pathwaysare identified this information has to be integrated with tran-scriptomic and proteomic data to obtain a comprehensive viewof all the biological processes involved [22] To capture all theinteractions that these data describe network-based visualiza-tion tools are commonly used by investigators to better under-stand and show their findings Depending on the kind ofinteraction under study a biological network can be repre-sented through a different type of graph Graphs are mathemat-ical structures of several kinds such as directed or undirectedgraphs directed acyclic graphs trees forests minimum span-ning trees Boolean networks and Steiner trees [51] To makegraph layouts informative and reproducible several visualiza-tion strategies have been proposed and adopted A frequentlyused visualization method is the ball-and-stick diagram wherepathway data are presented as networks with compounds (egmetabolites proteins) as nodes and reactions as edges [23 3752] Nodes can be placed hierarchically (a father node with oneor more child nodes) or radially as in radial networks or hiveplots which more closely represent the complexity of biologicalsystems [53] To reach a more reliable evaluation of the processunder study integration with biological knowledge derivedfrom the literature or from previous experimental data can alsobe performed [12 17] In spite of the availability of this informa-tion effective data integration is still far to be achieved owing tothe heterogeneity of current databases As a consequence usershave to take into account multiple databases to extract andmanually assemble the several different information neededthis makes data integration time-consuming and the final in-terpretation is often prone to errors because of different back-ground knowledge or biases of individual researchers [42]
Software tools for metabolomic data analysisand integration
Powerful software tools are essential to address the vastamount and variety of data generated by metabolomic analysesRequired software capabilities include (i) processing of rawspectral data (ii) statistical analysis to find significantly ex-pressed metabolites (iii) connection to metabolite databases formetabolite identification (iv) integration and analysis of mul-tiple heterogeneous lsquoomicsrsquo data and (v) bioinformatics ana-lysis and visualization of molecular interaction networks [1618] In this section we introduce the four data analysis tools se-lected (ie MetaCoreTM MetaboAnalyst InCroMAP and 3Omics)
502 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
their main functionalities are illustrated and compared on thebasis of some selected features such as data preprocessingtechniques used statistical analyses performed and methodsused for functional interpretation and if available for data inte-gration The main features of each tool are summarized inTable 2 It is important to point out that only MetaboAnalystprovides a comprehensive module for data preprocessing andstatistical analysis [25 41] MetaCoreTM and 3Omics only offerlimited support for these functionalities whereas InCroMAPonly accepts already preprocessed data In spite of these limita-tions we decided to include MetaCoreTM 3Omics and InCroMAPin this review because of their ability to perform comprehensiveanalyses of multi-omics data which are currently limited inother high-level analysis tools
MetaCoreTM
MetaCoreTM [24] is a commercial tool available both as a stand-alone and as a Web-based application It is a software suite forfunctional analysis of different kinds of high-throughput mo-lecular data (eg next-generation sequencing siRNAmicroRNA microarray-based gene expression and serial ana-lysis of gene expression data array-comparative genomic-hybridization DNA arrays proteomic data metabolic profilesand screening data) MetaCoreTM is an integrated system whichconsists of (i) a high-quality manually curated database ofmammalian biology including metabolites and other molecularclasses bioactive molecules and their interactions signalingand metabolic pathways (ii) genomic analysis tools to identifypotentially significant variants (iii) a data mining toolkit fordata visualization analysis and exchange of data (iv) a toolkit(pathway editor) for custom assembly of functional networksand (v) a set of parsers to upload and manipulate different typesof high-throughput molecular data [54ndash56] Unfortunately nopublic information is available about the details of howMetaCoreTM works thus our review of this tool is partially lim-ited and some details remain unclear
MetaboAnalyst
MetaboAnalyst [25 35] is an integrated freely accessible Web-based platform It was first released in 2009 then upgraded in2012 (MetaboAnalyst 20 [57]) and in 2015 (MetaboAnalyst 30also available for download and local installation) It offers a setof online tools for metabolomic data analysis that combine stat-istical analysis of data with their functional and biological inter-pretation and visualization [25] tutorials and protocolpapers are also available online MetaboAnalyst 30 has been re-implemented to improve performance capability and userinteractivity It offers eight functional modules which can begrouped in three categories (i) exploratory statistical analysis(Statistical Analysis and Time-Series Analysis modules) (ii) func-tional analysis (Enrichment Analysis Pathway Analysis andIntegrated Pathway Analysis modules for both genes and metabol-ites) and (iii) advanced methods for translational studies(Biomarker Analysis Sample Size Estimation and Power Analysismodules) In addition it has also an Other Utilities module con-taining a specialized function for lipidomic data analysis and acompound ID-conversion tool [41]
InCroMAP
InCroMAP is a stand-alone Java software originally developed in2011 for enrichment analysis and pathway-based visualizationsof genomic and proteomic data [27 40] The extended version
InCroMAP 15 released in 2013 also supports annotated metab-olomic data thus making this tool suitable for comprehensivesystem biology studies [58] InCroMAP is freely available at itsWeb site [39] which includes a user guide [59] example datafiles to test the tool and a short video tutorial The software per-forms metabolite set enrichment analysis and generates inter-active global maps of the cellular metabolism These mapsallow an integrated pathway-based visualization of data frommultiple lsquoomicsrsquo platforms and provide a useful overview of themetabolic changes present in the studied experimental condi-tion [26 59]
3Omics
3Omics is a platform-independent Web-based tool developedin 2013 for the analysis integration and visualization of tran-scriptomic proteomic and metabolomic human data It is freelyaccessible at [40] and the Web site also includes a help sectionTo demonstrate the software functionalities applied to realdata two case studies are also reported and explained in [27]3Omics supports correlation analysis co-expression profilingphenotype mapping pathway enrichment analysis and geneontology-based enrichment analysis More precisely dependingon the data provided the software offers four types of lsquoomicsrsquoanalyses (i) TranscriptomicsndashProteomicsndashMetabolomics (TndashPndashM) (ii) TranscriptomicsndashProteomics (TndashP) (iii) ProteomicsndashMetabolomics (PndashM) and (iv) TranscriptomicsndashMetabolomics(TndashM) A single-omics mode is also available to reveal intra-omics relationships 3Omics can also supplement missing tran-script protein and metabolite information related to the inputdata by text-mining the biomedical literature through iHOP (in-formation Hyperlinked Over Protein [60 61]) Users can thusperform multi-omics analyses even when only one or two outof three lsquoomicsrsquo data sets are available [27]
Evaluation and comparison of softwarefunctionalities
In this section strong and weak aspects of each of the four se-lected tools are reviewed and comparatively evaluated follow-ing the data analysis workflow previously described (Figure 1)
Input data
Various proprietary data formats have been developed to han-dle and store MS or NMR raw data this heterogeneity makes itdifficult for researchers to manipulate these data For this rea-son several software for the conversion of raw data file types(eg RAW) into a universal format (eg CSV TXT mzXML orMGF) have been implemented such as ProteoWizard [62]MassMatrix [63] and several others including MATLAB [64] andR [65 66] Once raw data have been converted to a suitable for-mat they can be further analyzed Data types and formatsaccepted as input by the four reviewed tools are reported inTable 2 Input data are usually organized as data matrices withsamples as rows and signal features as columns MetaCoreTMMetaboAnalyst and IncroMAP accept data in different kindsof tabular format (eg textual tab-delimited TXT or comma-separated value CSV files) whereas 3Omics only accepts CSVfiles MetaboAnalyst also accepts zipped files of NMR or MSpeak lists or of MS spectra which must be in mzXML mzDATAor NetCDF format As for MetaCoreTM gene lists can also be im-ported from output files of microarray analysis software such asAffymetrix [67] or Agilent [68] Compound names or IDs
Analysis of metabolomic data | 503
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Tab
le2
Mai
nfe
atu
res
of
the
sele
cted
fou
rso
ftw
are
too
lsfo
rm
etab
olo
mic
dat
aan
alys
is
Too
lM
etaC
oreT
MM
etab
oAn
alys
tIn
Cro
MA
P3O
mic
sY
ear
2004
2009
2011
2013
Inst
itu
tio
nG
eneG
oU
niv
ersi
tyo
fA
lber
taM
cGil
lU
niv
ersi
tyM
on
trea
l(C
A)
Cen
ter
for
Bio
info
rma
tics
of
the
Un
iver
sity
of
Tu
bin
gen
(D)
Mo
lecu
lar
Des
ign
ampM
etab
olo
mic
sLa
bora
tory
U
niv
ersi
tyo
fT
aiw
an(T
W)
Imp
lem
enta
tio
nW
eb-b
asedthorn
stan
d-a
lon
eW
eb-b
ased
Stan
d-a
lon
eW
eb-b
ased
Lice
nse
Co
mm
erci
alG
PL(G
NU
Gen
eral
Publ
icLi
cen
se)
LGPL
(GN
ULe
sser
Gen
eral
Publ
icLi
cen
se)
No
ne
Typ
eo
fkn
ow
led
gePr
op
riet
ary
Publ
icPu
blic
Publ
icIn
pu
td
ata
Gen
ep
rote
ino
rm
etab
oli
teli
sts
imp
ort
edas
tab-
de-
lim
ited
text
(TX
T)
com
ma-
sep
arat
edva
lues
(CSV
)or
Exce
lfile
sge
ne
list
sfr
om
mic
roar
ray
anal
ysis
soft
war
e(A
ffym
etri
xA
gile
nt)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)fo
rco
nce
ntr
atio
ns
spec
tral
bin
so
rp
eak
inte
n-
sity
dat
azi
pp
edfi
les
(ZIP
)of
NM
Ro
rM
Sp
eak
list
so
ro
fM
Ssp
ectr
a(i
nN
etC
DF
mzX
ML
or
mzD
AT
Afo
rmat
)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)of
het
ero
gen
eou
sty
pes
of
pro
cess
edlsquoo
mic
srsquod
ata
Co
mm
a-se
par
ated
valu
es(C
SV)
of
pro
cess
edtr
ansc
rip
tom
ic
pro
teo
mic
or
met
abo
lom
icd
ata
Dat
ap
rep
arat
ion
Dat
ain
tegr
ity
chec
kin
g
Dat
an
orm
aliz
atio
n
Co
mp
ou
nd
nam
eid
enti
fica
tio
n
Stat
isti
cala
nal
ysis
Un
ivar
iate
anal
ysis
Mu
ltiv
aria
tean
alys
is
Clu
ster
ing
Cla
ssifi
cati
on
Dat
ain
terp
reta
tio
nan
din
tegr
atio
nFu
nct
ion
alin
terp
reta
tio
n
Met
abo
lite
set
enri
chm
ent
anal
ysis
Met
abo
lic
pat
hw
ayan
alys
is
Met
abo
lite
map
pin
g
Hyp
erli
nks
toex
tern
ald
atab
ases
Dat
aIn
tegr
atio
n
Ou
tpu
td
ata
Net
wo
rks
can
beex
po
rted
intw
ofo
rmat
sN
etsh
ot
and
Net
wo
rki
mag
esas
PNG
file
s
rep
ort
sco
nta
inin
gp
lots
gr
aph
san
dta
bles
wit
hal
lth
ere
sult
sIm
ages
are
avai
l-ab
leas
TIF
or
PNG
file
s
Tab
ula
rfo
rmat
(eg
CSV
)fo
ren
rich
men
tan
alys
isre
sult
san
dJP
Gfi
les
for
pat
hw
ay-
base
dvi
sual
izat
ion
PNG
SV
Go
rSI
Ffo
rmat
sfo
rim
ages
Tic
ksin
dic
ate
too
lfu
nct
ion
alit
ies
504 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]
Data preparation
Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text
Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods
Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]
Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable
Statistical analysis
Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]
A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness
Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps
For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite
Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average
ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two
Analysis of metabolomic data | 505
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering
ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines
Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]
Data interpretation and integration
This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text
Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge
MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]
MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in
Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided
with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact
and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-
way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound
colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas
other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of
this figure is available at BIB online httpbiboxfordjournalsorg
506 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view
InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]
3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]
Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation
MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results
Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)
InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]
3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts
Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-
logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds
with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version
of this figure is available at BIB online httpbiboxfordjournalsorg
Analysis of metabolomic data | 507
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]
Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]
Output data
Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis
MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]
MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results
of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing
Conclusion
The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions
Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford
Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations
Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification
508 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future
Key Points
bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools
bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge
bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge
Funding
This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)
References1 Mehrotra B Mendes P Bioinformatics approaches to inte-
grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15
2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210
3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93
4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95
5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39
6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9
7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)
8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34
9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)
10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)
11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302
12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65
13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1
14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93
15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293
16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108
17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2
18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43
19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9
20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48
21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69
22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090
23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100
24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)
25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60
26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6
27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764
28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)
29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4
30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)
31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7
32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109
33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4
34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382
35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)
Analysis of metabolomic data | 509
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-
The following two are the general analytical approaches inperforming a metabolomic analysis targeted and untargetedTargeted metabolomics refers to the detection and precise quan-tification (in nM or mgmL) of a small set of known compoundsIt is driven by a specific biochemical question or hypothesis inwhich the set of metabolites related to one or more pathways isalready defined A limitation of the targeted approach is that itrequires the compounds of interest to be known a priori and tobe available in their purified form Currently only few purifiedstandards (ie defined groups of chemically characterized andbiochemically annotated metabolites) have been clearly identi-fied and are available for a calibration process therefore owingto the wide variety of metabolites and their complex dynamicswithin a cell the targeted approach cannot yet be used alone fora comprehensive analysis of the metabolome The untargetedapproach instead also called lsquometabolite fingerprintingrsquo is notdriven by an a priori hypothesis and it is used for completemetabolome comparison (ie as many metabolites as possibleare measured and compared between samples) [4] Metabolitevariations are observed principally as total changes of chroma-tographic patterns without requiring previous knowledge of thecompounds under investigation Therefore untargeted metabo-lomics does not attempt to precisely quantify all measurablemetabolites in a sample but it only gives their relative quantifi-cation (fold change) [5 6] It is important to stress that in spiteof the presence of extensive metabolomics spectra repositories(eg SMPDB [7] KEEG [8] MetaCyc [9] and HumanCyc [10] to citesome) metabolite identification still constitutes a challenge inuntargeted metabolomics This is mainly because of techno-logical limitations such as the dependence on the intrinsic ana-lytical coverage of the platform used and the possible biastoward the detection of the most abundant molecules [11]Moreover the same molecule can be fragmented differently de-pending on the specific instrument or technique used and thishampers the metabolites spectra matching Furthermore in-strument-dependent variability between different kinds ofmass-analyzers and even between the ones of the same kindbut of different brands increases the variability in compoundidentification
Metabolomic research leads to the handling of complex datasets which include hundreds of metabolites their comprehen-sive evaluation requires a specialized data analysis that in-volves cheminformatics bioinformatics and statistics aspectsMoreover to better understand the role of each metabolite inthe studied condition metabolomic data must be interpretedthis requires that every chemical information derived frommetabolomic analyses has to be related to both biochemicalcauses and physiological consequences [1 12] Toward this endthe multilevel integration of metabolomic proteomic and tran-scriptomic information is fundamental for a better understand-ing of the cellular biology Although this information isavailable its fast evaluation and integration is still hamperedby technical and biological issues including (i) the complexityand heterogeneity intrinsic to biological data which require ap-propriate statistical and computational analysis methods (ii)the limited reproducibility of the results of transcriptomicproteomic and metabolomic research and the heterogeneity ofthe available analysis techniques which make data comparisonamong different labs hard (iii) the lack of standard data formatsboth for lsquoomicsrsquo data and for metadata and (iv) the need ofuser-friendly tools for integrative analysis of multiple datatypes [13 14]
Although several literature about metabolomic data produc-tion and analysis techniques exist [3ndash5 15ndash23] to the best of our
knowledge a comprehensive review illustrating the availabletools for the analysis and integration of metabolomic data withother lsquoomicsrsquo data has not been reported yet Here we describeand compare four tools (MetaCoreTM MetaboAnalyst InCroMAPand 3Omics [24ndash27]) which we selected among the several onescurrently available for metabolomic data analysis and integra-tion (Table 1 reports a list of the tools most frequently cited inthe literature) related issues and challenges arising in lsquoomicsrsquodata integration and analysis are also identified and discussed
MetaCoreTM and MetaboAnalyst are the most commonlyused tools by researchers who work with metabolomic data (thenumber of data analysis jobs submitted to MetaboAnalyst wasabout 40 000month in 2014 [41]) Both of them have been avail-able since several years (since 2004 and 2009 respectively)Conversely InCroMAP and 3Omics have been implementedmore recently (in 2011 and 2013 respectively) and they havenot yet overcome the previous ones However given their easeof use the knowledge of these two latter tools may be advanta-geous for researchers interested in evaluating the current possi-bility to integrate lsquoomicsrsquo data across multiple platforms In thisreview we provide an objective and practical assessment ofthese tools which may be helpful as a guide for the choice of arobust pipeline for metabolomic data integration and analysisWe do not give a description of lsquothe best toolsrsquo for metabolomicdata analysis but we elucidate how some of the main currentlyavailable software packages work highlighting their strong andweak aspects
Metabolomic analysis workflow a briefoverview
A detailed description of how a metabolomic analysis is per-formed can be found in [5 6 15] In the following we provide abrief overview to better understand the related issues we sum-marize the main aspects focusing on the integration with otherkinds of lsquoomicsrsquo data
A typical metabolomic study consists of several differentparts (schematized in Figure 1) which can be grouped in fourmain steps [5 12 16 42]
Sample preparation
The first step in the metabolomic workflow is the preparation ofthe biological sample (eg blood plasma and serum urine sal-iva solid tissues and cultured cells) According to the kind ofsample to analyze several approaches are used as described indetails elsewhere [18ndash21] Samples are usually homogenized orpulverized into smaller particles to increase their surface areafor the exposure to the extraction buffer which is chosen in ac-cordance to their chemical characteristics
Data acquisition
Once the sample is ready different techniques can be used toseparate and characterize chemically diverse groups of metab-olites To have different measurement dimensions based bothon chemical and physical properties of metabolites compoundseparation techniques (eg gas chromatography high-perform-ance liquid chromatography ultra-high-performance liquidchromatography and capillary electrophoresis) are combinedwith compound detection techniques such as mass spectrom-etry (MS) or nuclear magnetic resonance (NMR) Each detectionor separation method has different resolution sensitivity andtechnological limitations in identifying metabolites and it is
Analysis of metabolomic data | 499
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Tab
le1
Feat
ure
so
fth
ecu
rren
tly
avai
labl
em
etab
olo
mic
dat
aan
alys
isto
ols
mo
stci
ted
inth
eli
tera
ture
ord
ered
byth
eir
year
of
pu
blic
atio
n
To
oln
ame
To
old
escr
ipti
on
Dat
ap
rep
roce
ssin
gO
mic
sd
ata
inte
grat
edan
alys
is
Path
way
anal
ysis
Tra
nsc
rip
tom
icd
ata
Pro
teo
mic
dat
aM
etab
olo
mic
dat
aY
ear
Ref
eren
ce
IPA
aA
nal
ysis
and
visu
aliz
atio
no
fd
iffe
ren
tki
nd
so
fo
mic
sd
ata
N
A[2
8]
Met
aCo
rea
Fun
ctio
nal
anal
ysis
and
visu
aliz
atio
no
fd
iffe
ren
tki
nd
so
fh
igh
-th
rou
ghp
ut
om
ics
dat
a
o
2004
[24]
PaV
ESy
Dat
am
anag
ing
syst
emfo
red
itin
gan
dvi
sual
izat
ion
of
bio
logi
calp
ath
way
so
oo
2004
[29]
Pro
teo
me
Soft
war
ea
Qu
anti
zati
on
co
mp
ou
nd
iden
tifi
cati
on
and
stat
isti
cala
nd
pat
hw
ayan
alys
is
20
05[3
0]
Vis
AN
TV
isu
aliz
atio
nan
dan
alys
iso
fm
any
typ
eso
fbi
olo
gica
lnet
wo
rks
o
oo
2005
[31]
VA
NT
EDV
isu
aliz
atio
nan
dan
alys
iso
fn
etw
ork
sw
ith
rela
ted
exp
erim
enta
ldat
a
20
06[3
2]
Mas
sTR
IXA
nn
ota
tio
no
fin
pu
tm
ass
pea
ksan
dm
ap-
pin
go
fth
eid
enti
fied
com
po
un
ds
on
toth
esp
ecifi
cm
etab
oli
cp
ath
way
20
08[3
3]
Pro
MeT
raV
isu
aliz
atio
nan
din
tegr
atio
no
fd
ata
sets
of
dif
fere
nt
kin
ds
of
om
ics
dat
ao
nu
ser-
defi
ned
met
abo
lic
pat
hw
aym
aps
o
2009
[34]
Met
abo
An
alys
tC
om
pre
hen
sive
met
abo
lom
icd
ata
ana-
lysi
svi
sual
izat
ion
and
inte
rpre
tati
on
2009
[25
35]
Pain
tom
ics
Inte
grat
edvi
sual
anal
ysis
of
tran
scri
p-
tom
ican
dm
etab
olo
mic
dat
ao
20
10[3
6]
Met
PaPa
thw
ayan
alys
isan
dvi
sual
izat
ion
for
met
abo
lom
icd
ata
o
20
10[3
7]
IMPa
LaJo
int
pat
hw
ayan
alys
iso
ftr
ansc
rip
tom
ico
rp
rote
om
ican
dm
etab
olo
mic
dat
ao
2011
[38]
InC
roM
AP
Sin
gle
dat
ase
tan
din
tegr
ated
cro
ss-p
lat-
form
enri
chm
ent
anal
ysis
an
dp
ath
-w
ay-b
ased
visu
aliz
atio
ns
of
om
ics
dat
a
20
11[2
639
]
3Om
ics
An
alys
isi
nte
grat
ion
and
visu
aliz
atio
no
fd
iffe
ren
tki
nd
so
fh
um
ano
mic
sd
ata
o
2013
[27
40]
aIn
gen
uit
yPa
thw
ayA
nal
ysis
Co
mm
erci
alto
ol
Tic
ksin
dic
ate
full
fun
ctio
nal
ity
emp
tyci
rcle
sin
dic
ate
par
tial
fun
ctio
nal
ity
500 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Figure 1 Flowchart of a typical metabolomic study After sample preparation specific metabolic signals are acquired using heterogeneous analytical platforms (DATA
ACQUISITION) Raw signals are then pre-processed to produce data in a suitable format for univariate and multivariate statistical analyses For untargeted studies me-
tabolites have first to be identified from spectral information (DATA PROCESSING) Significantly expressed metabolites are then linked to the biological context
through enrichment and pathway analysis and mapped into networks Finally metabolomic data are integrated with other lsquoomicsrsquo data and with prior knowledge to
gain a comprehensive view of the molecular processes involved (DATA INTERPRETATION AND INTEGRATION)
Analysis of metabolomic data | 501
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
chosen in accordance to the chemical and physical characteris-tic of each sample and to the kind of analysis to be performed(targeted or untargeted) [12 15 18] MS is the most widely appliedtechnique as it allows reliable metabolite identification par-ticularly when used in tandem with chromatographic separ-ation methods so as to enhance its mass-resolving capabilitiesIt is rapid (the analysis time ranges from 5 to 140 min) andallows performing sensitive and selective qualitative and quan-titative analyses The main drawbacks of the MS technique arethe need to separate or purify the sample before it is directedinto the mass analyzer and the high cost of the instrumentCompared with MS NMR has a lower sensitivity thus resultingin limited ability for metabolite identification and quantifica-tion This implies that potentially important compounds thatare present at smaller concentrations can be hidden by largerpeaks and are thus less likely to be identified The advantagesof NMR are the high analytical reproducibility and that it is anon-destructive method that requires minimal sample prepar-ation [43 44]
Data processing
Once acquired raw signals (chromatograms spectra or NMRdata) are pre-processed by ad hoc software tools to facilitatecompound quantification (eg the commercial softwareSIEVETM by Thermo Scientific [45] or some freely availablesoftware packages such as the cloud-based platform XCMS [46]or the open-source cross-platforms MAVEN [47] and MZmine 2[48]) Generally this preprocessing involves noise reductionretention time correction peak detection and integration andchromatogram alignment Finally for untargeted metabolomicstudies different databases such as the Human MetabolomeDatabase (HMDB) [49] or the Metabolite and Tandem MSDatabase (METLIN) [50] are used to identify the metabolitesfrom spectra During data integrity checking different inputdata are then prepared to produce appropriate data matricesfor further analyses as better detailed in the next sectionBriefly before starting any kind of statistical analysis datanormalization is performed to reduce systematic biases ortechnical variations and to avoid misidentification of signifi-cant changes owing to the different orders of magnitudes ofmetabolomic data Following which significant differences be-tween sample sets can be identified using appropriate statis-tical methods A typical statistical analysis for metabolomicdata consists of two phases initially different univariate andmultivariate methods are used to generate an overview of theconsidered data sets and to identify the metabolites that showsignificant changes under the studied conditions then datamining techniques are used to discriminate groups of func-tionally related metabolites [16] A limit of traditional statis-tical methods is that they highlight relationships amongvariables based only on mathematical criteria (eg maximiza-tion of variance or correlation) and they do not take into ac-count correlations from biological origin [17] For this reasonthe combined use of several statistical and data mining tech-niques is highly recommended [4] Once identified signifi-cantly expressed metabolites are first ranked usingappropriate P-values then a cut-off threshold is applied to se-lect the top-k ones from the ranked list The choice of thisthreshold which is often arbitrary is critical as it may influ-ence the final biological interpretation In fact some moder-ate but significant changes may be missed or criticalcomponents of a particular biological process may be left outthus compromising subsequent analyses [42]
Data interpretation and integration
In this last step the selected metabolites are linked to the biolo-gical context under study through enrichment and pathwayanalyses More precisely enrichment analysis aims to investi-gate the enrichment (ie over- andor under-expression) of pre-defined groups of functionally related metabolites (iemetabolite sets) to identify significant and coordinated expres-sion changes among them This allows taking advantage of thelist of altered metabolites to suggest a biological pathway ordisease condition which can be further investigated [42]Conversely pathway analysis involves the description and visu-alization of the interactions among genes proteins or metabol-ites within cells tissues or organs Its goal is to identify thepathways that significantly impact on a given biological process[37] Enrichment and pathway analyses are performed using adhoc software tools [37] which map significant metabolites toknown biochemical pathways on the basis of the informationcontained in public databases such as the Kyoto Encyclopediaof Genes and Genomes (KEGG) [8] Once the metabolic pathwaysare identified this information has to be integrated with tran-scriptomic and proteomic data to obtain a comprehensive viewof all the biological processes involved [22] To capture all theinteractions that these data describe network-based visualiza-tion tools are commonly used by investigators to better under-stand and show their findings Depending on the kind ofinteraction under study a biological network can be repre-sented through a different type of graph Graphs are mathemat-ical structures of several kinds such as directed or undirectedgraphs directed acyclic graphs trees forests minimum span-ning trees Boolean networks and Steiner trees [51] To makegraph layouts informative and reproducible several visualiza-tion strategies have been proposed and adopted A frequentlyused visualization method is the ball-and-stick diagram wherepathway data are presented as networks with compounds (egmetabolites proteins) as nodes and reactions as edges [23 3752] Nodes can be placed hierarchically (a father node with oneor more child nodes) or radially as in radial networks or hiveplots which more closely represent the complexity of biologicalsystems [53] To reach a more reliable evaluation of the processunder study integration with biological knowledge derivedfrom the literature or from previous experimental data can alsobe performed [12 17] In spite of the availability of this informa-tion effective data integration is still far to be achieved owing tothe heterogeneity of current databases As a consequence usershave to take into account multiple databases to extract andmanually assemble the several different information neededthis makes data integration time-consuming and the final in-terpretation is often prone to errors because of different back-ground knowledge or biases of individual researchers [42]
Software tools for metabolomic data analysisand integration
Powerful software tools are essential to address the vastamount and variety of data generated by metabolomic analysesRequired software capabilities include (i) processing of rawspectral data (ii) statistical analysis to find significantly ex-pressed metabolites (iii) connection to metabolite databases formetabolite identification (iv) integration and analysis of mul-tiple heterogeneous lsquoomicsrsquo data and (v) bioinformatics ana-lysis and visualization of molecular interaction networks [1618] In this section we introduce the four data analysis tools se-lected (ie MetaCoreTM MetaboAnalyst InCroMAP and 3Omics)
502 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
their main functionalities are illustrated and compared on thebasis of some selected features such as data preprocessingtechniques used statistical analyses performed and methodsused for functional interpretation and if available for data inte-gration The main features of each tool are summarized inTable 2 It is important to point out that only MetaboAnalystprovides a comprehensive module for data preprocessing andstatistical analysis [25 41] MetaCoreTM and 3Omics only offerlimited support for these functionalities whereas InCroMAPonly accepts already preprocessed data In spite of these limita-tions we decided to include MetaCoreTM 3Omics and InCroMAPin this review because of their ability to perform comprehensiveanalyses of multi-omics data which are currently limited inother high-level analysis tools
MetaCoreTM
MetaCoreTM [24] is a commercial tool available both as a stand-alone and as a Web-based application It is a software suite forfunctional analysis of different kinds of high-throughput mo-lecular data (eg next-generation sequencing siRNAmicroRNA microarray-based gene expression and serial ana-lysis of gene expression data array-comparative genomic-hybridization DNA arrays proteomic data metabolic profilesand screening data) MetaCoreTM is an integrated system whichconsists of (i) a high-quality manually curated database ofmammalian biology including metabolites and other molecularclasses bioactive molecules and their interactions signalingand metabolic pathways (ii) genomic analysis tools to identifypotentially significant variants (iii) a data mining toolkit fordata visualization analysis and exchange of data (iv) a toolkit(pathway editor) for custom assembly of functional networksand (v) a set of parsers to upload and manipulate different typesof high-throughput molecular data [54ndash56] Unfortunately nopublic information is available about the details of howMetaCoreTM works thus our review of this tool is partially lim-ited and some details remain unclear
MetaboAnalyst
MetaboAnalyst [25 35] is an integrated freely accessible Web-based platform It was first released in 2009 then upgraded in2012 (MetaboAnalyst 20 [57]) and in 2015 (MetaboAnalyst 30also available for download and local installation) It offers a setof online tools for metabolomic data analysis that combine stat-istical analysis of data with their functional and biological inter-pretation and visualization [25] tutorials and protocolpapers are also available online MetaboAnalyst 30 has been re-implemented to improve performance capability and userinteractivity It offers eight functional modules which can begrouped in three categories (i) exploratory statistical analysis(Statistical Analysis and Time-Series Analysis modules) (ii) func-tional analysis (Enrichment Analysis Pathway Analysis andIntegrated Pathway Analysis modules for both genes and metabol-ites) and (iii) advanced methods for translational studies(Biomarker Analysis Sample Size Estimation and Power Analysismodules) In addition it has also an Other Utilities module con-taining a specialized function for lipidomic data analysis and acompound ID-conversion tool [41]
InCroMAP
InCroMAP is a stand-alone Java software originally developed in2011 for enrichment analysis and pathway-based visualizationsof genomic and proteomic data [27 40] The extended version
InCroMAP 15 released in 2013 also supports annotated metab-olomic data thus making this tool suitable for comprehensivesystem biology studies [58] InCroMAP is freely available at itsWeb site [39] which includes a user guide [59] example datafiles to test the tool and a short video tutorial The software per-forms metabolite set enrichment analysis and generates inter-active global maps of the cellular metabolism These mapsallow an integrated pathway-based visualization of data frommultiple lsquoomicsrsquo platforms and provide a useful overview of themetabolic changes present in the studied experimental condi-tion [26 59]
3Omics
3Omics is a platform-independent Web-based tool developedin 2013 for the analysis integration and visualization of tran-scriptomic proteomic and metabolomic human data It is freelyaccessible at [40] and the Web site also includes a help sectionTo demonstrate the software functionalities applied to realdata two case studies are also reported and explained in [27]3Omics supports correlation analysis co-expression profilingphenotype mapping pathway enrichment analysis and geneontology-based enrichment analysis More precisely dependingon the data provided the software offers four types of lsquoomicsrsquoanalyses (i) TranscriptomicsndashProteomicsndashMetabolomics (TndashPndashM) (ii) TranscriptomicsndashProteomics (TndashP) (iii) ProteomicsndashMetabolomics (PndashM) and (iv) TranscriptomicsndashMetabolomics(TndashM) A single-omics mode is also available to reveal intra-omics relationships 3Omics can also supplement missing tran-script protein and metabolite information related to the inputdata by text-mining the biomedical literature through iHOP (in-formation Hyperlinked Over Protein [60 61]) Users can thusperform multi-omics analyses even when only one or two outof three lsquoomicsrsquo data sets are available [27]
Evaluation and comparison of softwarefunctionalities
In this section strong and weak aspects of each of the four se-lected tools are reviewed and comparatively evaluated follow-ing the data analysis workflow previously described (Figure 1)
Input data
Various proprietary data formats have been developed to han-dle and store MS or NMR raw data this heterogeneity makes itdifficult for researchers to manipulate these data For this rea-son several software for the conversion of raw data file types(eg RAW) into a universal format (eg CSV TXT mzXML orMGF) have been implemented such as ProteoWizard [62]MassMatrix [63] and several others including MATLAB [64] andR [65 66] Once raw data have been converted to a suitable for-mat they can be further analyzed Data types and formatsaccepted as input by the four reviewed tools are reported inTable 2 Input data are usually organized as data matrices withsamples as rows and signal features as columns MetaCoreTMMetaboAnalyst and IncroMAP accept data in different kindsof tabular format (eg textual tab-delimited TXT or comma-separated value CSV files) whereas 3Omics only accepts CSVfiles MetaboAnalyst also accepts zipped files of NMR or MSpeak lists or of MS spectra which must be in mzXML mzDATAor NetCDF format As for MetaCoreTM gene lists can also be im-ported from output files of microarray analysis software such asAffymetrix [67] or Agilent [68] Compound names or IDs
Analysis of metabolomic data | 503
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Tab
le2
Mai
nfe
atu
res
of
the
sele
cted
fou
rso
ftw
are
too
lsfo
rm
etab
olo
mic
dat
aan
alys
is
Too
lM
etaC
oreT
MM
etab
oAn
alys
tIn
Cro
MA
P3O
mic
sY
ear
2004
2009
2011
2013
Inst
itu
tio
nG
eneG
oU
niv
ersi
tyo
fA
lber
taM
cGil
lU
niv
ersi
tyM
on
trea
l(C
A)
Cen
ter
for
Bio
info
rma
tics
of
the
Un
iver
sity
of
Tu
bin
gen
(D)
Mo
lecu
lar
Des
ign
ampM
etab
olo
mic
sLa
bora
tory
U
niv
ersi
tyo
fT
aiw
an(T
W)
Imp
lem
enta
tio
nW
eb-b
asedthorn
stan
d-a
lon
eW
eb-b
ased
Stan
d-a
lon
eW
eb-b
ased
Lice
nse
Co
mm
erci
alG
PL(G
NU
Gen
eral
Publ
icLi
cen
se)
LGPL
(GN
ULe
sser
Gen
eral
Publ
icLi
cen
se)
No
ne
Typ
eo
fkn
ow
led
gePr
op
riet
ary
Publ
icPu
blic
Publ
icIn
pu
td
ata
Gen
ep
rote
ino
rm
etab
oli
teli
sts
imp
ort
edas
tab-
de-
lim
ited
text
(TX
T)
com
ma-
sep
arat
edva
lues
(CSV
)or
Exce
lfile
sge
ne
list
sfr
om
mic
roar
ray
anal
ysis
soft
war
e(A
ffym
etri
xA
gile
nt)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)fo
rco
nce
ntr
atio
ns
spec
tral
bin
so
rp
eak
inte
n-
sity
dat
azi
pp
edfi
les
(ZIP
)of
NM
Ro
rM
Sp
eak
list
so
ro
fM
Ssp
ectr
a(i
nN
etC
DF
mzX
ML
or
mzD
AT
Afo
rmat
)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)of
het
ero
gen
eou
sty
pes
of
pro
cess
edlsquoo
mic
srsquod
ata
Co
mm
a-se
par
ated
valu
es(C
SV)
of
pro
cess
edtr
ansc
rip
tom
ic
pro
teo
mic
or
met
abo
lom
icd
ata
Dat
ap
rep
arat
ion
Dat
ain
tegr
ity
chec
kin
g
Dat
an
orm
aliz
atio
n
Co
mp
ou
nd
nam
eid
enti
fica
tio
n
Stat
isti
cala
nal
ysis
Un
ivar
iate
anal
ysis
Mu
ltiv
aria
tean
alys
is
Clu
ster
ing
Cla
ssifi
cati
on
Dat
ain
terp
reta
tio
nan
din
tegr
atio
nFu
nct
ion
alin
terp
reta
tio
n
Met
abo
lite
set
enri
chm
ent
anal
ysis
Met
abo
lic
pat
hw
ayan
alys
is
Met
abo
lite
map
pin
g
Hyp
erli
nks
toex
tern
ald
atab
ases
Dat
aIn
tegr
atio
n
Ou
tpu
td
ata
Net
wo
rks
can
beex
po
rted
intw
ofo
rmat
sN
etsh
ot
and
Net
wo
rki
mag
esas
PNG
file
s
rep
ort
sco
nta
inin
gp
lots
gr
aph
san
dta
bles
wit
hal
lth
ere
sult
sIm
ages
are
avai
l-ab
leas
TIF
or
PNG
file
s
Tab
ula
rfo
rmat
(eg
CSV
)fo
ren
rich
men
tan
alys
isre
sult
san
dJP
Gfi
les
for
pat
hw
ay-
base
dvi
sual
izat
ion
PNG
SV
Go
rSI
Ffo
rmat
sfo
rim
ages
Tic
ksin
dic
ate
too
lfu
nct
ion
alit
ies
504 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]
Data preparation
Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text
Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods
Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]
Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable
Statistical analysis
Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]
A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness
Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps
For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite
Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average
ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two
Analysis of metabolomic data | 505
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering
ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines
Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]
Data interpretation and integration
This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text
Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge
MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]
MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in
Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided
with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact
and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-
way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound
colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas
other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of
this figure is available at BIB online httpbiboxfordjournalsorg
506 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view
InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]
3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]
Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation
MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results
Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)
InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]
3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts
Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-
logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds
with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version
of this figure is available at BIB online httpbiboxfordjournalsorg
Analysis of metabolomic data | 507
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]
Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]
Output data
Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis
MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]
MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results
of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing
Conclusion
The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions
Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford
Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations
Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification
508 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future
Key Points
bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools
bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge
bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge
Funding
This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)
References1 Mehrotra B Mendes P Bioinformatics approaches to inte-
grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15
2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210
3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93
4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95
5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39
6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9
7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)
8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34
9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)
10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)
11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302
12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65
13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1
14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93
15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293
16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108
17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2
18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43
19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9
20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48
21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69
22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090
23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100
24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)
25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60
26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6
27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764
28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)
29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4
30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)
31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7
32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109
33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4
34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382
35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)
Analysis of metabolomic data | 509
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-
Tab
le1
Feat
ure
so
fth
ecu
rren
tly
avai
labl
em
etab
olo
mic
dat
aan
alys
isto
ols
mo
stci
ted
inth
eli
tera
ture
ord
ered
byth
eir
year
of
pu
blic
atio
n
To
oln
ame
To
old
escr
ipti
on
Dat
ap
rep
roce
ssin
gO
mic
sd
ata
inte
grat
edan
alys
is
Path
way
anal
ysis
Tra
nsc
rip
tom
icd
ata
Pro
teo
mic
dat
aM
etab
olo
mic
dat
aY
ear
Ref
eren
ce
IPA
aA
nal
ysis
and
visu
aliz
atio
no
fd
iffe
ren
tki
nd
so
fo
mic
sd
ata
N
A[2
8]
Met
aCo
rea
Fun
ctio
nal
anal
ysis
and
visu
aliz
atio
no
fd
iffe
ren
tki
nd
so
fh
igh
-th
rou
ghp
ut
om
ics
dat
a
o
2004
[24]
PaV
ESy
Dat
am
anag
ing
syst
emfo
red
itin
gan
dvi
sual
izat
ion
of
bio
logi
calp
ath
way
so
oo
2004
[29]
Pro
teo
me
Soft
war
ea
Qu
anti
zati
on
co
mp
ou
nd
iden
tifi
cati
on
and
stat
isti
cala
nd
pat
hw
ayan
alys
is
20
05[3
0]
Vis
AN
TV
isu
aliz
atio
nan
dan
alys
iso
fm
any
typ
eso
fbi
olo
gica
lnet
wo
rks
o
oo
2005
[31]
VA
NT
EDV
isu
aliz
atio
nan
dan
alys
iso
fn
etw
ork
sw
ith
rela
ted
exp
erim
enta
ldat
a
20
06[3
2]
Mas
sTR
IXA
nn
ota
tio
no
fin
pu
tm
ass
pea
ksan
dm
ap-
pin
go
fth
eid
enti
fied
com
po
un
ds
on
toth
esp
ecifi
cm
etab
oli
cp
ath
way
20
08[3
3]
Pro
MeT
raV
isu
aliz
atio
nan
din
tegr
atio
no
fd
ata
sets
of
dif
fere
nt
kin
ds
of
om
ics
dat
ao
nu
ser-
defi
ned
met
abo
lic
pat
hw
aym
aps
o
2009
[34]
Met
abo
An
alys
tC
om
pre
hen
sive
met
abo
lom
icd
ata
ana-
lysi
svi
sual
izat
ion
and
inte
rpre
tati
on
2009
[25
35]
Pain
tom
ics
Inte
grat
edvi
sual
anal
ysis
of
tran
scri
p-
tom
ican
dm
etab
olo
mic
dat
ao
20
10[3
6]
Met
PaPa
thw
ayan
alys
isan
dvi
sual
izat
ion
for
met
abo
lom
icd
ata
o
20
10[3
7]
IMPa
LaJo
int
pat
hw
ayan
alys
iso
ftr
ansc
rip
tom
ico
rp
rote
om
ican
dm
etab
olo
mic
dat
ao
2011
[38]
InC
roM
AP
Sin
gle
dat
ase
tan
din
tegr
ated
cro
ss-p
lat-
form
enri
chm
ent
anal
ysis
an
dp
ath
-w
ay-b
ased
visu
aliz
atio
ns
of
om
ics
dat
a
20
11[2
639
]
3Om
ics
An
alys
isi
nte
grat
ion
and
visu
aliz
atio
no
fd
iffe
ren
tki
nd
so
fh
um
ano
mic
sd
ata
o
2013
[27
40]
aIn
gen
uit
yPa
thw
ayA
nal
ysis
Co
mm
erci
alto
ol
Tic
ksin
dic
ate
full
fun
ctio
nal
ity
emp
tyci
rcle
sin
dic
ate
par
tial
fun
ctio
nal
ity
500 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Figure 1 Flowchart of a typical metabolomic study After sample preparation specific metabolic signals are acquired using heterogeneous analytical platforms (DATA
ACQUISITION) Raw signals are then pre-processed to produce data in a suitable format for univariate and multivariate statistical analyses For untargeted studies me-
tabolites have first to be identified from spectral information (DATA PROCESSING) Significantly expressed metabolites are then linked to the biological context
through enrichment and pathway analysis and mapped into networks Finally metabolomic data are integrated with other lsquoomicsrsquo data and with prior knowledge to
gain a comprehensive view of the molecular processes involved (DATA INTERPRETATION AND INTEGRATION)
Analysis of metabolomic data | 501
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
chosen in accordance to the chemical and physical characteris-tic of each sample and to the kind of analysis to be performed(targeted or untargeted) [12 15 18] MS is the most widely appliedtechnique as it allows reliable metabolite identification par-ticularly when used in tandem with chromatographic separ-ation methods so as to enhance its mass-resolving capabilitiesIt is rapid (the analysis time ranges from 5 to 140 min) andallows performing sensitive and selective qualitative and quan-titative analyses The main drawbacks of the MS technique arethe need to separate or purify the sample before it is directedinto the mass analyzer and the high cost of the instrumentCompared with MS NMR has a lower sensitivity thus resultingin limited ability for metabolite identification and quantifica-tion This implies that potentially important compounds thatare present at smaller concentrations can be hidden by largerpeaks and are thus less likely to be identified The advantagesof NMR are the high analytical reproducibility and that it is anon-destructive method that requires minimal sample prepar-ation [43 44]
Data processing
Once acquired raw signals (chromatograms spectra or NMRdata) are pre-processed by ad hoc software tools to facilitatecompound quantification (eg the commercial softwareSIEVETM by Thermo Scientific [45] or some freely availablesoftware packages such as the cloud-based platform XCMS [46]or the open-source cross-platforms MAVEN [47] and MZmine 2[48]) Generally this preprocessing involves noise reductionretention time correction peak detection and integration andchromatogram alignment Finally for untargeted metabolomicstudies different databases such as the Human MetabolomeDatabase (HMDB) [49] or the Metabolite and Tandem MSDatabase (METLIN) [50] are used to identify the metabolitesfrom spectra During data integrity checking different inputdata are then prepared to produce appropriate data matricesfor further analyses as better detailed in the next sectionBriefly before starting any kind of statistical analysis datanormalization is performed to reduce systematic biases ortechnical variations and to avoid misidentification of signifi-cant changes owing to the different orders of magnitudes ofmetabolomic data Following which significant differences be-tween sample sets can be identified using appropriate statis-tical methods A typical statistical analysis for metabolomicdata consists of two phases initially different univariate andmultivariate methods are used to generate an overview of theconsidered data sets and to identify the metabolites that showsignificant changes under the studied conditions then datamining techniques are used to discriminate groups of func-tionally related metabolites [16] A limit of traditional statis-tical methods is that they highlight relationships amongvariables based only on mathematical criteria (eg maximiza-tion of variance or correlation) and they do not take into ac-count correlations from biological origin [17] For this reasonthe combined use of several statistical and data mining tech-niques is highly recommended [4] Once identified signifi-cantly expressed metabolites are first ranked usingappropriate P-values then a cut-off threshold is applied to se-lect the top-k ones from the ranked list The choice of thisthreshold which is often arbitrary is critical as it may influ-ence the final biological interpretation In fact some moder-ate but significant changes may be missed or criticalcomponents of a particular biological process may be left outthus compromising subsequent analyses [42]
Data interpretation and integration
In this last step the selected metabolites are linked to the biolo-gical context under study through enrichment and pathwayanalyses More precisely enrichment analysis aims to investi-gate the enrichment (ie over- andor under-expression) of pre-defined groups of functionally related metabolites (iemetabolite sets) to identify significant and coordinated expres-sion changes among them This allows taking advantage of thelist of altered metabolites to suggest a biological pathway ordisease condition which can be further investigated [42]Conversely pathway analysis involves the description and visu-alization of the interactions among genes proteins or metabol-ites within cells tissues or organs Its goal is to identify thepathways that significantly impact on a given biological process[37] Enrichment and pathway analyses are performed using adhoc software tools [37] which map significant metabolites toknown biochemical pathways on the basis of the informationcontained in public databases such as the Kyoto Encyclopediaof Genes and Genomes (KEGG) [8] Once the metabolic pathwaysare identified this information has to be integrated with tran-scriptomic and proteomic data to obtain a comprehensive viewof all the biological processes involved [22] To capture all theinteractions that these data describe network-based visualiza-tion tools are commonly used by investigators to better under-stand and show their findings Depending on the kind ofinteraction under study a biological network can be repre-sented through a different type of graph Graphs are mathemat-ical structures of several kinds such as directed or undirectedgraphs directed acyclic graphs trees forests minimum span-ning trees Boolean networks and Steiner trees [51] To makegraph layouts informative and reproducible several visualiza-tion strategies have been proposed and adopted A frequentlyused visualization method is the ball-and-stick diagram wherepathway data are presented as networks with compounds (egmetabolites proteins) as nodes and reactions as edges [23 3752] Nodes can be placed hierarchically (a father node with oneor more child nodes) or radially as in radial networks or hiveplots which more closely represent the complexity of biologicalsystems [53] To reach a more reliable evaluation of the processunder study integration with biological knowledge derivedfrom the literature or from previous experimental data can alsobe performed [12 17] In spite of the availability of this informa-tion effective data integration is still far to be achieved owing tothe heterogeneity of current databases As a consequence usershave to take into account multiple databases to extract andmanually assemble the several different information neededthis makes data integration time-consuming and the final in-terpretation is often prone to errors because of different back-ground knowledge or biases of individual researchers [42]
Software tools for metabolomic data analysisand integration
Powerful software tools are essential to address the vastamount and variety of data generated by metabolomic analysesRequired software capabilities include (i) processing of rawspectral data (ii) statistical analysis to find significantly ex-pressed metabolites (iii) connection to metabolite databases formetabolite identification (iv) integration and analysis of mul-tiple heterogeneous lsquoomicsrsquo data and (v) bioinformatics ana-lysis and visualization of molecular interaction networks [1618] In this section we introduce the four data analysis tools se-lected (ie MetaCoreTM MetaboAnalyst InCroMAP and 3Omics)
502 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
their main functionalities are illustrated and compared on thebasis of some selected features such as data preprocessingtechniques used statistical analyses performed and methodsused for functional interpretation and if available for data inte-gration The main features of each tool are summarized inTable 2 It is important to point out that only MetaboAnalystprovides a comprehensive module for data preprocessing andstatistical analysis [25 41] MetaCoreTM and 3Omics only offerlimited support for these functionalities whereas InCroMAPonly accepts already preprocessed data In spite of these limita-tions we decided to include MetaCoreTM 3Omics and InCroMAPin this review because of their ability to perform comprehensiveanalyses of multi-omics data which are currently limited inother high-level analysis tools
MetaCoreTM
MetaCoreTM [24] is a commercial tool available both as a stand-alone and as a Web-based application It is a software suite forfunctional analysis of different kinds of high-throughput mo-lecular data (eg next-generation sequencing siRNAmicroRNA microarray-based gene expression and serial ana-lysis of gene expression data array-comparative genomic-hybridization DNA arrays proteomic data metabolic profilesand screening data) MetaCoreTM is an integrated system whichconsists of (i) a high-quality manually curated database ofmammalian biology including metabolites and other molecularclasses bioactive molecules and their interactions signalingand metabolic pathways (ii) genomic analysis tools to identifypotentially significant variants (iii) a data mining toolkit fordata visualization analysis and exchange of data (iv) a toolkit(pathway editor) for custom assembly of functional networksand (v) a set of parsers to upload and manipulate different typesof high-throughput molecular data [54ndash56] Unfortunately nopublic information is available about the details of howMetaCoreTM works thus our review of this tool is partially lim-ited and some details remain unclear
MetaboAnalyst
MetaboAnalyst [25 35] is an integrated freely accessible Web-based platform It was first released in 2009 then upgraded in2012 (MetaboAnalyst 20 [57]) and in 2015 (MetaboAnalyst 30also available for download and local installation) It offers a setof online tools for metabolomic data analysis that combine stat-istical analysis of data with their functional and biological inter-pretation and visualization [25] tutorials and protocolpapers are also available online MetaboAnalyst 30 has been re-implemented to improve performance capability and userinteractivity It offers eight functional modules which can begrouped in three categories (i) exploratory statistical analysis(Statistical Analysis and Time-Series Analysis modules) (ii) func-tional analysis (Enrichment Analysis Pathway Analysis andIntegrated Pathway Analysis modules for both genes and metabol-ites) and (iii) advanced methods for translational studies(Biomarker Analysis Sample Size Estimation and Power Analysismodules) In addition it has also an Other Utilities module con-taining a specialized function for lipidomic data analysis and acompound ID-conversion tool [41]
InCroMAP
InCroMAP is a stand-alone Java software originally developed in2011 for enrichment analysis and pathway-based visualizationsof genomic and proteomic data [27 40] The extended version
InCroMAP 15 released in 2013 also supports annotated metab-olomic data thus making this tool suitable for comprehensivesystem biology studies [58] InCroMAP is freely available at itsWeb site [39] which includes a user guide [59] example datafiles to test the tool and a short video tutorial The software per-forms metabolite set enrichment analysis and generates inter-active global maps of the cellular metabolism These mapsallow an integrated pathway-based visualization of data frommultiple lsquoomicsrsquo platforms and provide a useful overview of themetabolic changes present in the studied experimental condi-tion [26 59]
3Omics
3Omics is a platform-independent Web-based tool developedin 2013 for the analysis integration and visualization of tran-scriptomic proteomic and metabolomic human data It is freelyaccessible at [40] and the Web site also includes a help sectionTo demonstrate the software functionalities applied to realdata two case studies are also reported and explained in [27]3Omics supports correlation analysis co-expression profilingphenotype mapping pathway enrichment analysis and geneontology-based enrichment analysis More precisely dependingon the data provided the software offers four types of lsquoomicsrsquoanalyses (i) TranscriptomicsndashProteomicsndashMetabolomics (TndashPndashM) (ii) TranscriptomicsndashProteomics (TndashP) (iii) ProteomicsndashMetabolomics (PndashM) and (iv) TranscriptomicsndashMetabolomics(TndashM) A single-omics mode is also available to reveal intra-omics relationships 3Omics can also supplement missing tran-script protein and metabolite information related to the inputdata by text-mining the biomedical literature through iHOP (in-formation Hyperlinked Over Protein [60 61]) Users can thusperform multi-omics analyses even when only one or two outof three lsquoomicsrsquo data sets are available [27]
Evaluation and comparison of softwarefunctionalities
In this section strong and weak aspects of each of the four se-lected tools are reviewed and comparatively evaluated follow-ing the data analysis workflow previously described (Figure 1)
Input data
Various proprietary data formats have been developed to han-dle and store MS or NMR raw data this heterogeneity makes itdifficult for researchers to manipulate these data For this rea-son several software for the conversion of raw data file types(eg RAW) into a universal format (eg CSV TXT mzXML orMGF) have been implemented such as ProteoWizard [62]MassMatrix [63] and several others including MATLAB [64] andR [65 66] Once raw data have been converted to a suitable for-mat they can be further analyzed Data types and formatsaccepted as input by the four reviewed tools are reported inTable 2 Input data are usually organized as data matrices withsamples as rows and signal features as columns MetaCoreTMMetaboAnalyst and IncroMAP accept data in different kindsof tabular format (eg textual tab-delimited TXT or comma-separated value CSV files) whereas 3Omics only accepts CSVfiles MetaboAnalyst also accepts zipped files of NMR or MSpeak lists or of MS spectra which must be in mzXML mzDATAor NetCDF format As for MetaCoreTM gene lists can also be im-ported from output files of microarray analysis software such asAffymetrix [67] or Agilent [68] Compound names or IDs
Analysis of metabolomic data | 503
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Tab
le2
Mai
nfe
atu
res
of
the
sele
cted
fou
rso
ftw
are
too
lsfo
rm
etab
olo
mic
dat
aan
alys
is
Too
lM
etaC
oreT
MM
etab
oAn
alys
tIn
Cro
MA
P3O
mic
sY
ear
2004
2009
2011
2013
Inst
itu
tio
nG
eneG
oU
niv
ersi
tyo
fA
lber
taM
cGil
lU
niv
ersi
tyM
on
trea
l(C
A)
Cen
ter
for
Bio
info
rma
tics
of
the
Un
iver
sity
of
Tu
bin
gen
(D)
Mo
lecu
lar
Des
ign
ampM
etab
olo
mic
sLa
bora
tory
U
niv
ersi
tyo
fT
aiw
an(T
W)
Imp
lem
enta
tio
nW
eb-b
asedthorn
stan
d-a
lon
eW
eb-b
ased
Stan
d-a
lon
eW
eb-b
ased
Lice
nse
Co
mm
erci
alG
PL(G
NU
Gen
eral
Publ
icLi
cen
se)
LGPL
(GN
ULe
sser
Gen
eral
Publ
icLi
cen
se)
No
ne
Typ
eo
fkn
ow
led
gePr
op
riet
ary
Publ
icPu
blic
Publ
icIn
pu
td
ata
Gen
ep
rote
ino
rm
etab
oli
teli
sts
imp
ort
edas
tab-
de-
lim
ited
text
(TX
T)
com
ma-
sep
arat
edva
lues
(CSV
)or
Exce
lfile
sge
ne
list
sfr
om
mic
roar
ray
anal
ysis
soft
war
e(A
ffym
etri
xA
gile
nt)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)fo
rco
nce
ntr
atio
ns
spec
tral
bin
so
rp
eak
inte
n-
sity
dat
azi
pp
edfi
les
(ZIP
)of
NM
Ro
rM
Sp
eak
list
so
ro
fM
Ssp
ectr
a(i
nN
etC
DF
mzX
ML
or
mzD
AT
Afo
rmat
)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)of
het
ero
gen
eou
sty
pes
of
pro
cess
edlsquoo
mic
srsquod
ata
Co
mm
a-se
par
ated
valu
es(C
SV)
of
pro
cess
edtr
ansc
rip
tom
ic
pro
teo
mic
or
met
abo
lom
icd
ata
Dat
ap
rep
arat
ion
Dat
ain
tegr
ity
chec
kin
g
Dat
an
orm
aliz
atio
n
Co
mp
ou
nd
nam
eid
enti
fica
tio
n
Stat
isti
cala
nal
ysis
Un
ivar
iate
anal
ysis
Mu
ltiv
aria
tean
alys
is
Clu
ster
ing
Cla
ssifi
cati
on
Dat
ain
terp
reta
tio
nan
din
tegr
atio
nFu
nct
ion
alin
terp
reta
tio
n
Met
abo
lite
set
enri
chm
ent
anal
ysis
Met
abo
lic
pat
hw
ayan
alys
is
Met
abo
lite
map
pin
g
Hyp
erli
nks
toex
tern
ald
atab
ases
Dat
aIn
tegr
atio
n
Ou
tpu
td
ata
Net
wo
rks
can
beex
po
rted
intw
ofo
rmat
sN
etsh
ot
and
Net
wo
rki
mag
esas
PNG
file
s
rep
ort
sco
nta
inin
gp
lots
gr
aph
san
dta
bles
wit
hal
lth
ere
sult
sIm
ages
are
avai
l-ab
leas
TIF
or
PNG
file
s
Tab
ula
rfo
rmat
(eg
CSV
)fo
ren
rich
men
tan
alys
isre
sult
san
dJP
Gfi
les
for
pat
hw
ay-
base
dvi
sual
izat
ion
PNG
SV
Go
rSI
Ffo
rmat
sfo
rim
ages
Tic
ksin
dic
ate
too
lfu
nct
ion
alit
ies
504 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]
Data preparation
Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text
Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods
Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]
Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable
Statistical analysis
Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]
A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness
Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps
For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite
Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average
ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two
Analysis of metabolomic data | 505
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering
ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines
Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]
Data interpretation and integration
This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text
Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge
MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]
MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in
Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided
with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact
and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-
way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound
colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas
other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of
this figure is available at BIB online httpbiboxfordjournalsorg
506 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view
InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]
3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]
Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation
MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results
Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)
InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]
3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts
Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-
logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds
with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version
of this figure is available at BIB online httpbiboxfordjournalsorg
Analysis of metabolomic data | 507
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]
Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]
Output data
Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis
MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]
MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results
of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing
Conclusion
The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions
Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford
Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations
Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification
508 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future
Key Points
bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools
bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge
bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge
Funding
This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)
References1 Mehrotra B Mendes P Bioinformatics approaches to inte-
grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15
2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210
3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93
4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95
5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39
6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9
7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)
8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34
9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)
10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)
11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302
12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65
13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1
14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93
15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293
16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108
17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2
18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43
19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9
20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48
21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69
22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090
23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100
24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)
25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60
26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6
27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764
28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)
29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4
30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)
31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7
32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109
33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4
34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382
35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)
Analysis of metabolomic data | 509
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-
Figure 1 Flowchart of a typical metabolomic study After sample preparation specific metabolic signals are acquired using heterogeneous analytical platforms (DATA
ACQUISITION) Raw signals are then pre-processed to produce data in a suitable format for univariate and multivariate statistical analyses For untargeted studies me-
tabolites have first to be identified from spectral information (DATA PROCESSING) Significantly expressed metabolites are then linked to the biological context
through enrichment and pathway analysis and mapped into networks Finally metabolomic data are integrated with other lsquoomicsrsquo data and with prior knowledge to
gain a comprehensive view of the molecular processes involved (DATA INTERPRETATION AND INTEGRATION)
Analysis of metabolomic data | 501
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
chosen in accordance to the chemical and physical characteris-tic of each sample and to the kind of analysis to be performed(targeted or untargeted) [12 15 18] MS is the most widely appliedtechnique as it allows reliable metabolite identification par-ticularly when used in tandem with chromatographic separ-ation methods so as to enhance its mass-resolving capabilitiesIt is rapid (the analysis time ranges from 5 to 140 min) andallows performing sensitive and selective qualitative and quan-titative analyses The main drawbacks of the MS technique arethe need to separate or purify the sample before it is directedinto the mass analyzer and the high cost of the instrumentCompared with MS NMR has a lower sensitivity thus resultingin limited ability for metabolite identification and quantifica-tion This implies that potentially important compounds thatare present at smaller concentrations can be hidden by largerpeaks and are thus less likely to be identified The advantagesof NMR are the high analytical reproducibility and that it is anon-destructive method that requires minimal sample prepar-ation [43 44]
Data processing
Once acquired raw signals (chromatograms spectra or NMRdata) are pre-processed by ad hoc software tools to facilitatecompound quantification (eg the commercial softwareSIEVETM by Thermo Scientific [45] or some freely availablesoftware packages such as the cloud-based platform XCMS [46]or the open-source cross-platforms MAVEN [47] and MZmine 2[48]) Generally this preprocessing involves noise reductionretention time correction peak detection and integration andchromatogram alignment Finally for untargeted metabolomicstudies different databases such as the Human MetabolomeDatabase (HMDB) [49] or the Metabolite and Tandem MSDatabase (METLIN) [50] are used to identify the metabolitesfrom spectra During data integrity checking different inputdata are then prepared to produce appropriate data matricesfor further analyses as better detailed in the next sectionBriefly before starting any kind of statistical analysis datanormalization is performed to reduce systematic biases ortechnical variations and to avoid misidentification of signifi-cant changes owing to the different orders of magnitudes ofmetabolomic data Following which significant differences be-tween sample sets can be identified using appropriate statis-tical methods A typical statistical analysis for metabolomicdata consists of two phases initially different univariate andmultivariate methods are used to generate an overview of theconsidered data sets and to identify the metabolites that showsignificant changes under the studied conditions then datamining techniques are used to discriminate groups of func-tionally related metabolites [16] A limit of traditional statis-tical methods is that they highlight relationships amongvariables based only on mathematical criteria (eg maximiza-tion of variance or correlation) and they do not take into ac-count correlations from biological origin [17] For this reasonthe combined use of several statistical and data mining tech-niques is highly recommended [4] Once identified signifi-cantly expressed metabolites are first ranked usingappropriate P-values then a cut-off threshold is applied to se-lect the top-k ones from the ranked list The choice of thisthreshold which is often arbitrary is critical as it may influ-ence the final biological interpretation In fact some moder-ate but significant changes may be missed or criticalcomponents of a particular biological process may be left outthus compromising subsequent analyses [42]
Data interpretation and integration
In this last step the selected metabolites are linked to the biolo-gical context under study through enrichment and pathwayanalyses More precisely enrichment analysis aims to investi-gate the enrichment (ie over- andor under-expression) of pre-defined groups of functionally related metabolites (iemetabolite sets) to identify significant and coordinated expres-sion changes among them This allows taking advantage of thelist of altered metabolites to suggest a biological pathway ordisease condition which can be further investigated [42]Conversely pathway analysis involves the description and visu-alization of the interactions among genes proteins or metabol-ites within cells tissues or organs Its goal is to identify thepathways that significantly impact on a given biological process[37] Enrichment and pathway analyses are performed using adhoc software tools [37] which map significant metabolites toknown biochemical pathways on the basis of the informationcontained in public databases such as the Kyoto Encyclopediaof Genes and Genomes (KEGG) [8] Once the metabolic pathwaysare identified this information has to be integrated with tran-scriptomic and proteomic data to obtain a comprehensive viewof all the biological processes involved [22] To capture all theinteractions that these data describe network-based visualiza-tion tools are commonly used by investigators to better under-stand and show their findings Depending on the kind ofinteraction under study a biological network can be repre-sented through a different type of graph Graphs are mathemat-ical structures of several kinds such as directed or undirectedgraphs directed acyclic graphs trees forests minimum span-ning trees Boolean networks and Steiner trees [51] To makegraph layouts informative and reproducible several visualiza-tion strategies have been proposed and adopted A frequentlyused visualization method is the ball-and-stick diagram wherepathway data are presented as networks with compounds (egmetabolites proteins) as nodes and reactions as edges [23 3752] Nodes can be placed hierarchically (a father node with oneor more child nodes) or radially as in radial networks or hiveplots which more closely represent the complexity of biologicalsystems [53] To reach a more reliable evaluation of the processunder study integration with biological knowledge derivedfrom the literature or from previous experimental data can alsobe performed [12 17] In spite of the availability of this informa-tion effective data integration is still far to be achieved owing tothe heterogeneity of current databases As a consequence usershave to take into account multiple databases to extract andmanually assemble the several different information neededthis makes data integration time-consuming and the final in-terpretation is often prone to errors because of different back-ground knowledge or biases of individual researchers [42]
Software tools for metabolomic data analysisand integration
Powerful software tools are essential to address the vastamount and variety of data generated by metabolomic analysesRequired software capabilities include (i) processing of rawspectral data (ii) statistical analysis to find significantly ex-pressed metabolites (iii) connection to metabolite databases formetabolite identification (iv) integration and analysis of mul-tiple heterogeneous lsquoomicsrsquo data and (v) bioinformatics ana-lysis and visualization of molecular interaction networks [1618] In this section we introduce the four data analysis tools se-lected (ie MetaCoreTM MetaboAnalyst InCroMAP and 3Omics)
502 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
their main functionalities are illustrated and compared on thebasis of some selected features such as data preprocessingtechniques used statistical analyses performed and methodsused for functional interpretation and if available for data inte-gration The main features of each tool are summarized inTable 2 It is important to point out that only MetaboAnalystprovides a comprehensive module for data preprocessing andstatistical analysis [25 41] MetaCoreTM and 3Omics only offerlimited support for these functionalities whereas InCroMAPonly accepts already preprocessed data In spite of these limita-tions we decided to include MetaCoreTM 3Omics and InCroMAPin this review because of their ability to perform comprehensiveanalyses of multi-omics data which are currently limited inother high-level analysis tools
MetaCoreTM
MetaCoreTM [24] is a commercial tool available both as a stand-alone and as a Web-based application It is a software suite forfunctional analysis of different kinds of high-throughput mo-lecular data (eg next-generation sequencing siRNAmicroRNA microarray-based gene expression and serial ana-lysis of gene expression data array-comparative genomic-hybridization DNA arrays proteomic data metabolic profilesand screening data) MetaCoreTM is an integrated system whichconsists of (i) a high-quality manually curated database ofmammalian biology including metabolites and other molecularclasses bioactive molecules and their interactions signalingand metabolic pathways (ii) genomic analysis tools to identifypotentially significant variants (iii) a data mining toolkit fordata visualization analysis and exchange of data (iv) a toolkit(pathway editor) for custom assembly of functional networksand (v) a set of parsers to upload and manipulate different typesof high-throughput molecular data [54ndash56] Unfortunately nopublic information is available about the details of howMetaCoreTM works thus our review of this tool is partially lim-ited and some details remain unclear
MetaboAnalyst
MetaboAnalyst [25 35] is an integrated freely accessible Web-based platform It was first released in 2009 then upgraded in2012 (MetaboAnalyst 20 [57]) and in 2015 (MetaboAnalyst 30also available for download and local installation) It offers a setof online tools for metabolomic data analysis that combine stat-istical analysis of data with their functional and biological inter-pretation and visualization [25] tutorials and protocolpapers are also available online MetaboAnalyst 30 has been re-implemented to improve performance capability and userinteractivity It offers eight functional modules which can begrouped in three categories (i) exploratory statistical analysis(Statistical Analysis and Time-Series Analysis modules) (ii) func-tional analysis (Enrichment Analysis Pathway Analysis andIntegrated Pathway Analysis modules for both genes and metabol-ites) and (iii) advanced methods for translational studies(Biomarker Analysis Sample Size Estimation and Power Analysismodules) In addition it has also an Other Utilities module con-taining a specialized function for lipidomic data analysis and acompound ID-conversion tool [41]
InCroMAP
InCroMAP is a stand-alone Java software originally developed in2011 for enrichment analysis and pathway-based visualizationsof genomic and proteomic data [27 40] The extended version
InCroMAP 15 released in 2013 also supports annotated metab-olomic data thus making this tool suitable for comprehensivesystem biology studies [58] InCroMAP is freely available at itsWeb site [39] which includes a user guide [59] example datafiles to test the tool and a short video tutorial The software per-forms metabolite set enrichment analysis and generates inter-active global maps of the cellular metabolism These mapsallow an integrated pathway-based visualization of data frommultiple lsquoomicsrsquo platforms and provide a useful overview of themetabolic changes present in the studied experimental condi-tion [26 59]
3Omics
3Omics is a platform-independent Web-based tool developedin 2013 for the analysis integration and visualization of tran-scriptomic proteomic and metabolomic human data It is freelyaccessible at [40] and the Web site also includes a help sectionTo demonstrate the software functionalities applied to realdata two case studies are also reported and explained in [27]3Omics supports correlation analysis co-expression profilingphenotype mapping pathway enrichment analysis and geneontology-based enrichment analysis More precisely dependingon the data provided the software offers four types of lsquoomicsrsquoanalyses (i) TranscriptomicsndashProteomicsndashMetabolomics (TndashPndashM) (ii) TranscriptomicsndashProteomics (TndashP) (iii) ProteomicsndashMetabolomics (PndashM) and (iv) TranscriptomicsndashMetabolomics(TndashM) A single-omics mode is also available to reveal intra-omics relationships 3Omics can also supplement missing tran-script protein and metabolite information related to the inputdata by text-mining the biomedical literature through iHOP (in-formation Hyperlinked Over Protein [60 61]) Users can thusperform multi-omics analyses even when only one or two outof three lsquoomicsrsquo data sets are available [27]
Evaluation and comparison of softwarefunctionalities
In this section strong and weak aspects of each of the four se-lected tools are reviewed and comparatively evaluated follow-ing the data analysis workflow previously described (Figure 1)
Input data
Various proprietary data formats have been developed to han-dle and store MS or NMR raw data this heterogeneity makes itdifficult for researchers to manipulate these data For this rea-son several software for the conversion of raw data file types(eg RAW) into a universal format (eg CSV TXT mzXML orMGF) have been implemented such as ProteoWizard [62]MassMatrix [63] and several others including MATLAB [64] andR [65 66] Once raw data have been converted to a suitable for-mat they can be further analyzed Data types and formatsaccepted as input by the four reviewed tools are reported inTable 2 Input data are usually organized as data matrices withsamples as rows and signal features as columns MetaCoreTMMetaboAnalyst and IncroMAP accept data in different kindsof tabular format (eg textual tab-delimited TXT or comma-separated value CSV files) whereas 3Omics only accepts CSVfiles MetaboAnalyst also accepts zipped files of NMR or MSpeak lists or of MS spectra which must be in mzXML mzDATAor NetCDF format As for MetaCoreTM gene lists can also be im-ported from output files of microarray analysis software such asAffymetrix [67] or Agilent [68] Compound names or IDs
Analysis of metabolomic data | 503
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Tab
le2
Mai
nfe
atu
res
of
the
sele
cted
fou
rso
ftw
are
too
lsfo
rm
etab
olo
mic
dat
aan
alys
is
Too
lM
etaC
oreT
MM
etab
oAn
alys
tIn
Cro
MA
P3O
mic
sY
ear
2004
2009
2011
2013
Inst
itu
tio
nG
eneG
oU
niv
ersi
tyo
fA
lber
taM
cGil
lU
niv
ersi
tyM
on
trea
l(C
A)
Cen
ter
for
Bio
info
rma
tics
of
the
Un
iver
sity
of
Tu
bin
gen
(D)
Mo
lecu
lar
Des
ign
ampM
etab
olo
mic
sLa
bora
tory
U
niv
ersi
tyo
fT
aiw
an(T
W)
Imp
lem
enta
tio
nW
eb-b
asedthorn
stan
d-a
lon
eW
eb-b
ased
Stan
d-a
lon
eW
eb-b
ased
Lice
nse
Co
mm
erci
alG
PL(G
NU
Gen
eral
Publ
icLi
cen
se)
LGPL
(GN
ULe
sser
Gen
eral
Publ
icLi
cen
se)
No
ne
Typ
eo
fkn
ow
led
gePr
op
riet
ary
Publ
icPu
blic
Publ
icIn
pu
td
ata
Gen
ep
rote
ino
rm
etab
oli
teli
sts
imp
ort
edas
tab-
de-
lim
ited
text
(TX
T)
com
ma-
sep
arat
edva
lues
(CSV
)or
Exce
lfile
sge
ne
list
sfr
om
mic
roar
ray
anal
ysis
soft
war
e(A
ffym
etri
xA
gile
nt)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)fo
rco
nce
ntr
atio
ns
spec
tral
bin
so
rp
eak
inte
n-
sity
dat
azi
pp
edfi
les
(ZIP
)of
NM
Ro
rM
Sp
eak
list
so
ro
fM
Ssp
ectr
a(i
nN
etC
DF
mzX
ML
or
mzD
AT
Afo
rmat
)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)of
het
ero
gen
eou
sty
pes
of
pro
cess
edlsquoo
mic
srsquod
ata
Co
mm
a-se
par
ated
valu
es(C
SV)
of
pro
cess
edtr
ansc
rip
tom
ic
pro
teo
mic
or
met
abo
lom
icd
ata
Dat
ap
rep
arat
ion
Dat
ain
tegr
ity
chec
kin
g
Dat
an
orm
aliz
atio
n
Co
mp
ou
nd
nam
eid
enti
fica
tio
n
Stat
isti
cala
nal
ysis
Un
ivar
iate
anal
ysis
Mu
ltiv
aria
tean
alys
is
Clu
ster
ing
Cla
ssifi
cati
on
Dat
ain
terp
reta
tio
nan
din
tegr
atio
nFu
nct
ion
alin
terp
reta
tio
n
Met
abo
lite
set
enri
chm
ent
anal
ysis
Met
abo
lic
pat
hw
ayan
alys
is
Met
abo
lite
map
pin
g
Hyp
erli
nks
toex
tern
ald
atab
ases
Dat
aIn
tegr
atio
n
Ou
tpu
td
ata
Net
wo
rks
can
beex
po
rted
intw
ofo
rmat
sN
etsh
ot
and
Net
wo
rki
mag
esas
PNG
file
s
rep
ort
sco
nta
inin
gp
lots
gr
aph
san
dta
bles
wit
hal
lth
ere
sult
sIm
ages
are
avai
l-ab
leas
TIF
or
PNG
file
s
Tab
ula
rfo
rmat
(eg
CSV
)fo
ren
rich
men
tan
alys
isre
sult
san
dJP
Gfi
les
for
pat
hw
ay-
base
dvi
sual
izat
ion
PNG
SV
Go
rSI
Ffo
rmat
sfo
rim
ages
Tic
ksin
dic
ate
too
lfu
nct
ion
alit
ies
504 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]
Data preparation
Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text
Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods
Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]
Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable
Statistical analysis
Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]
A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness
Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps
For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite
Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average
ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two
Analysis of metabolomic data | 505
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering
ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines
Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]
Data interpretation and integration
This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text
Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge
MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]
MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in
Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided
with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact
and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-
way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound
colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas
other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of
this figure is available at BIB online httpbiboxfordjournalsorg
506 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view
InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]
3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]
Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation
MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results
Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)
InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]
3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts
Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-
logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds
with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version
of this figure is available at BIB online httpbiboxfordjournalsorg
Analysis of metabolomic data | 507
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]
Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]
Output data
Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis
MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]
MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results
of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing
Conclusion
The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions
Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford
Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations
Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification
508 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future
Key Points
bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools
bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge
bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge
Funding
This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)
References1 Mehrotra B Mendes P Bioinformatics approaches to inte-
grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15
2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210
3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93
4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95
5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39
6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9
7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)
8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34
9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)
10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)
11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302
12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65
13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1
14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93
15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293
16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108
17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2
18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43
19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9
20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48
21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69
22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090
23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100
24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)
25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60
26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6
27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764
28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)
29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4
30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)
31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7
32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109
33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4
34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382
35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)
Analysis of metabolomic data | 509
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-
chosen in accordance to the chemical and physical characteris-tic of each sample and to the kind of analysis to be performed(targeted or untargeted) [12 15 18] MS is the most widely appliedtechnique as it allows reliable metabolite identification par-ticularly when used in tandem with chromatographic separ-ation methods so as to enhance its mass-resolving capabilitiesIt is rapid (the analysis time ranges from 5 to 140 min) andallows performing sensitive and selective qualitative and quan-titative analyses The main drawbacks of the MS technique arethe need to separate or purify the sample before it is directedinto the mass analyzer and the high cost of the instrumentCompared with MS NMR has a lower sensitivity thus resultingin limited ability for metabolite identification and quantifica-tion This implies that potentially important compounds thatare present at smaller concentrations can be hidden by largerpeaks and are thus less likely to be identified The advantagesof NMR are the high analytical reproducibility and that it is anon-destructive method that requires minimal sample prepar-ation [43 44]
Data processing
Once acquired raw signals (chromatograms spectra or NMRdata) are pre-processed by ad hoc software tools to facilitatecompound quantification (eg the commercial softwareSIEVETM by Thermo Scientific [45] or some freely availablesoftware packages such as the cloud-based platform XCMS [46]or the open-source cross-platforms MAVEN [47] and MZmine 2[48]) Generally this preprocessing involves noise reductionretention time correction peak detection and integration andchromatogram alignment Finally for untargeted metabolomicstudies different databases such as the Human MetabolomeDatabase (HMDB) [49] or the Metabolite and Tandem MSDatabase (METLIN) [50] are used to identify the metabolitesfrom spectra During data integrity checking different inputdata are then prepared to produce appropriate data matricesfor further analyses as better detailed in the next sectionBriefly before starting any kind of statistical analysis datanormalization is performed to reduce systematic biases ortechnical variations and to avoid misidentification of signifi-cant changes owing to the different orders of magnitudes ofmetabolomic data Following which significant differences be-tween sample sets can be identified using appropriate statis-tical methods A typical statistical analysis for metabolomicdata consists of two phases initially different univariate andmultivariate methods are used to generate an overview of theconsidered data sets and to identify the metabolites that showsignificant changes under the studied conditions then datamining techniques are used to discriminate groups of func-tionally related metabolites [16] A limit of traditional statis-tical methods is that they highlight relationships amongvariables based only on mathematical criteria (eg maximiza-tion of variance or correlation) and they do not take into ac-count correlations from biological origin [17] For this reasonthe combined use of several statistical and data mining tech-niques is highly recommended [4] Once identified signifi-cantly expressed metabolites are first ranked usingappropriate P-values then a cut-off threshold is applied to se-lect the top-k ones from the ranked list The choice of thisthreshold which is often arbitrary is critical as it may influ-ence the final biological interpretation In fact some moder-ate but significant changes may be missed or criticalcomponents of a particular biological process may be left outthus compromising subsequent analyses [42]
Data interpretation and integration
In this last step the selected metabolites are linked to the biolo-gical context under study through enrichment and pathwayanalyses More precisely enrichment analysis aims to investi-gate the enrichment (ie over- andor under-expression) of pre-defined groups of functionally related metabolites (iemetabolite sets) to identify significant and coordinated expres-sion changes among them This allows taking advantage of thelist of altered metabolites to suggest a biological pathway ordisease condition which can be further investigated [42]Conversely pathway analysis involves the description and visu-alization of the interactions among genes proteins or metabol-ites within cells tissues or organs Its goal is to identify thepathways that significantly impact on a given biological process[37] Enrichment and pathway analyses are performed using adhoc software tools [37] which map significant metabolites toknown biochemical pathways on the basis of the informationcontained in public databases such as the Kyoto Encyclopediaof Genes and Genomes (KEGG) [8] Once the metabolic pathwaysare identified this information has to be integrated with tran-scriptomic and proteomic data to obtain a comprehensive viewof all the biological processes involved [22] To capture all theinteractions that these data describe network-based visualiza-tion tools are commonly used by investigators to better under-stand and show their findings Depending on the kind ofinteraction under study a biological network can be repre-sented through a different type of graph Graphs are mathemat-ical structures of several kinds such as directed or undirectedgraphs directed acyclic graphs trees forests minimum span-ning trees Boolean networks and Steiner trees [51] To makegraph layouts informative and reproducible several visualiza-tion strategies have been proposed and adopted A frequentlyused visualization method is the ball-and-stick diagram wherepathway data are presented as networks with compounds (egmetabolites proteins) as nodes and reactions as edges [23 3752] Nodes can be placed hierarchically (a father node with oneor more child nodes) or radially as in radial networks or hiveplots which more closely represent the complexity of biologicalsystems [53] To reach a more reliable evaluation of the processunder study integration with biological knowledge derivedfrom the literature or from previous experimental data can alsobe performed [12 17] In spite of the availability of this informa-tion effective data integration is still far to be achieved owing tothe heterogeneity of current databases As a consequence usershave to take into account multiple databases to extract andmanually assemble the several different information neededthis makes data integration time-consuming and the final in-terpretation is often prone to errors because of different back-ground knowledge or biases of individual researchers [42]
Software tools for metabolomic data analysisand integration
Powerful software tools are essential to address the vastamount and variety of data generated by metabolomic analysesRequired software capabilities include (i) processing of rawspectral data (ii) statistical analysis to find significantly ex-pressed metabolites (iii) connection to metabolite databases formetabolite identification (iv) integration and analysis of mul-tiple heterogeneous lsquoomicsrsquo data and (v) bioinformatics ana-lysis and visualization of molecular interaction networks [1618] In this section we introduce the four data analysis tools se-lected (ie MetaCoreTM MetaboAnalyst InCroMAP and 3Omics)
502 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
their main functionalities are illustrated and compared on thebasis of some selected features such as data preprocessingtechniques used statistical analyses performed and methodsused for functional interpretation and if available for data inte-gration The main features of each tool are summarized inTable 2 It is important to point out that only MetaboAnalystprovides a comprehensive module for data preprocessing andstatistical analysis [25 41] MetaCoreTM and 3Omics only offerlimited support for these functionalities whereas InCroMAPonly accepts already preprocessed data In spite of these limita-tions we decided to include MetaCoreTM 3Omics and InCroMAPin this review because of their ability to perform comprehensiveanalyses of multi-omics data which are currently limited inother high-level analysis tools
MetaCoreTM
MetaCoreTM [24] is a commercial tool available both as a stand-alone and as a Web-based application It is a software suite forfunctional analysis of different kinds of high-throughput mo-lecular data (eg next-generation sequencing siRNAmicroRNA microarray-based gene expression and serial ana-lysis of gene expression data array-comparative genomic-hybridization DNA arrays proteomic data metabolic profilesand screening data) MetaCoreTM is an integrated system whichconsists of (i) a high-quality manually curated database ofmammalian biology including metabolites and other molecularclasses bioactive molecules and their interactions signalingand metabolic pathways (ii) genomic analysis tools to identifypotentially significant variants (iii) a data mining toolkit fordata visualization analysis and exchange of data (iv) a toolkit(pathway editor) for custom assembly of functional networksand (v) a set of parsers to upload and manipulate different typesof high-throughput molecular data [54ndash56] Unfortunately nopublic information is available about the details of howMetaCoreTM works thus our review of this tool is partially lim-ited and some details remain unclear
MetaboAnalyst
MetaboAnalyst [25 35] is an integrated freely accessible Web-based platform It was first released in 2009 then upgraded in2012 (MetaboAnalyst 20 [57]) and in 2015 (MetaboAnalyst 30also available for download and local installation) It offers a setof online tools for metabolomic data analysis that combine stat-istical analysis of data with their functional and biological inter-pretation and visualization [25] tutorials and protocolpapers are also available online MetaboAnalyst 30 has been re-implemented to improve performance capability and userinteractivity It offers eight functional modules which can begrouped in three categories (i) exploratory statistical analysis(Statistical Analysis and Time-Series Analysis modules) (ii) func-tional analysis (Enrichment Analysis Pathway Analysis andIntegrated Pathway Analysis modules for both genes and metabol-ites) and (iii) advanced methods for translational studies(Biomarker Analysis Sample Size Estimation and Power Analysismodules) In addition it has also an Other Utilities module con-taining a specialized function for lipidomic data analysis and acompound ID-conversion tool [41]
InCroMAP
InCroMAP is a stand-alone Java software originally developed in2011 for enrichment analysis and pathway-based visualizationsof genomic and proteomic data [27 40] The extended version
InCroMAP 15 released in 2013 also supports annotated metab-olomic data thus making this tool suitable for comprehensivesystem biology studies [58] InCroMAP is freely available at itsWeb site [39] which includes a user guide [59] example datafiles to test the tool and a short video tutorial The software per-forms metabolite set enrichment analysis and generates inter-active global maps of the cellular metabolism These mapsallow an integrated pathway-based visualization of data frommultiple lsquoomicsrsquo platforms and provide a useful overview of themetabolic changes present in the studied experimental condi-tion [26 59]
3Omics
3Omics is a platform-independent Web-based tool developedin 2013 for the analysis integration and visualization of tran-scriptomic proteomic and metabolomic human data It is freelyaccessible at [40] and the Web site also includes a help sectionTo demonstrate the software functionalities applied to realdata two case studies are also reported and explained in [27]3Omics supports correlation analysis co-expression profilingphenotype mapping pathway enrichment analysis and geneontology-based enrichment analysis More precisely dependingon the data provided the software offers four types of lsquoomicsrsquoanalyses (i) TranscriptomicsndashProteomicsndashMetabolomics (TndashPndashM) (ii) TranscriptomicsndashProteomics (TndashP) (iii) ProteomicsndashMetabolomics (PndashM) and (iv) TranscriptomicsndashMetabolomics(TndashM) A single-omics mode is also available to reveal intra-omics relationships 3Omics can also supplement missing tran-script protein and metabolite information related to the inputdata by text-mining the biomedical literature through iHOP (in-formation Hyperlinked Over Protein [60 61]) Users can thusperform multi-omics analyses even when only one or two outof three lsquoomicsrsquo data sets are available [27]
Evaluation and comparison of softwarefunctionalities
In this section strong and weak aspects of each of the four se-lected tools are reviewed and comparatively evaluated follow-ing the data analysis workflow previously described (Figure 1)
Input data
Various proprietary data formats have been developed to han-dle and store MS or NMR raw data this heterogeneity makes itdifficult for researchers to manipulate these data For this rea-son several software for the conversion of raw data file types(eg RAW) into a universal format (eg CSV TXT mzXML orMGF) have been implemented such as ProteoWizard [62]MassMatrix [63] and several others including MATLAB [64] andR [65 66] Once raw data have been converted to a suitable for-mat they can be further analyzed Data types and formatsaccepted as input by the four reviewed tools are reported inTable 2 Input data are usually organized as data matrices withsamples as rows and signal features as columns MetaCoreTMMetaboAnalyst and IncroMAP accept data in different kindsof tabular format (eg textual tab-delimited TXT or comma-separated value CSV files) whereas 3Omics only accepts CSVfiles MetaboAnalyst also accepts zipped files of NMR or MSpeak lists or of MS spectra which must be in mzXML mzDATAor NetCDF format As for MetaCoreTM gene lists can also be im-ported from output files of microarray analysis software such asAffymetrix [67] or Agilent [68] Compound names or IDs
Analysis of metabolomic data | 503
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Tab
le2
Mai
nfe
atu
res
of
the
sele
cted
fou
rso
ftw
are
too
lsfo
rm
etab
olo
mic
dat
aan
alys
is
Too
lM
etaC
oreT
MM
etab
oAn
alys
tIn
Cro
MA
P3O
mic
sY
ear
2004
2009
2011
2013
Inst
itu
tio
nG
eneG
oU
niv
ersi
tyo
fA
lber
taM
cGil
lU
niv
ersi
tyM
on
trea
l(C
A)
Cen
ter
for
Bio
info
rma
tics
of
the
Un
iver
sity
of
Tu
bin
gen
(D)
Mo
lecu
lar
Des
ign
ampM
etab
olo
mic
sLa
bora
tory
U
niv
ersi
tyo
fT
aiw
an(T
W)
Imp
lem
enta
tio
nW
eb-b
asedthorn
stan
d-a
lon
eW
eb-b
ased
Stan
d-a
lon
eW
eb-b
ased
Lice
nse
Co
mm
erci
alG
PL(G
NU
Gen
eral
Publ
icLi
cen
se)
LGPL
(GN
ULe
sser
Gen
eral
Publ
icLi
cen
se)
No
ne
Typ
eo
fkn
ow
led
gePr
op
riet
ary
Publ
icPu
blic
Publ
icIn
pu
td
ata
Gen
ep
rote
ino
rm
etab
oli
teli
sts
imp
ort
edas
tab-
de-
lim
ited
text
(TX
T)
com
ma-
sep
arat
edva
lues
(CSV
)or
Exce
lfile
sge
ne
list
sfr
om
mic
roar
ray
anal
ysis
soft
war
e(A
ffym
etri
xA
gile
nt)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)fo
rco
nce
ntr
atio
ns
spec
tral
bin
so
rp
eak
inte
n-
sity
dat
azi
pp
edfi
les
(ZIP
)of
NM
Ro
rM
Sp
eak
list
so
ro
fM
Ssp
ectr
a(i
nN
etC
DF
mzX
ML
or
mzD
AT
Afo
rmat
)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)of
het
ero
gen
eou
sty
pes
of
pro
cess
edlsquoo
mic
srsquod
ata
Co
mm
a-se
par
ated
valu
es(C
SV)
of
pro
cess
edtr
ansc
rip
tom
ic
pro
teo
mic
or
met
abo
lom
icd
ata
Dat
ap
rep
arat
ion
Dat
ain
tegr
ity
chec
kin
g
Dat
an
orm
aliz
atio
n
Co
mp
ou
nd
nam
eid
enti
fica
tio
n
Stat
isti
cala
nal
ysis
Un
ivar
iate
anal
ysis
Mu
ltiv
aria
tean
alys
is
Clu
ster
ing
Cla
ssifi
cati
on
Dat
ain
terp
reta
tio
nan
din
tegr
atio
nFu
nct
ion
alin
terp
reta
tio
n
Met
abo
lite
set
enri
chm
ent
anal
ysis
Met
abo
lic
pat
hw
ayan
alys
is
Met
abo
lite
map
pin
g
Hyp
erli
nks
toex
tern
ald
atab
ases
Dat
aIn
tegr
atio
n
Ou
tpu
td
ata
Net
wo
rks
can
beex
po
rted
intw
ofo
rmat
sN
etsh
ot
and
Net
wo
rki
mag
esas
PNG
file
s
rep
ort
sco
nta
inin
gp
lots
gr
aph
san
dta
bles
wit
hal
lth
ere
sult
sIm
ages
are
avai
l-ab
leas
TIF
or
PNG
file
s
Tab
ula
rfo
rmat
(eg
CSV
)fo
ren
rich
men
tan
alys
isre
sult
san
dJP
Gfi
les
for
pat
hw
ay-
base
dvi
sual
izat
ion
PNG
SV
Go
rSI
Ffo
rmat
sfo
rim
ages
Tic
ksin
dic
ate
too
lfu
nct
ion
alit
ies
504 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]
Data preparation
Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text
Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods
Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]
Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable
Statistical analysis
Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]
A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness
Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps
For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite
Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average
ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two
Analysis of metabolomic data | 505
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering
ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines
Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]
Data interpretation and integration
This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text
Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge
MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]
MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in
Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided
with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact
and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-
way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound
colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas
other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of
this figure is available at BIB online httpbiboxfordjournalsorg
506 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view
InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]
3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]
Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation
MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results
Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)
InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]
3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts
Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-
logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds
with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version
of this figure is available at BIB online httpbiboxfordjournalsorg
Analysis of metabolomic data | 507
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]
Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]
Output data
Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis
MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]
MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results
of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing
Conclusion
The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions
Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford
Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations
Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification
508 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future
Key Points
bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools
bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge
bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge
Funding
This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)
References1 Mehrotra B Mendes P Bioinformatics approaches to inte-
grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15
2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210
3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93
4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95
5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39
6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9
7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)
8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34
9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)
10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)
11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302
12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65
13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1
14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93
15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293
16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108
17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2
18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43
19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9
20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48
21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69
22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090
23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100
24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)
25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60
26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6
27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764
28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)
29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4
30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)
31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7
32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109
33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4
34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382
35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)
Analysis of metabolomic data | 509
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-
their main functionalities are illustrated and compared on thebasis of some selected features such as data preprocessingtechniques used statistical analyses performed and methodsused for functional interpretation and if available for data inte-gration The main features of each tool are summarized inTable 2 It is important to point out that only MetaboAnalystprovides a comprehensive module for data preprocessing andstatistical analysis [25 41] MetaCoreTM and 3Omics only offerlimited support for these functionalities whereas InCroMAPonly accepts already preprocessed data In spite of these limita-tions we decided to include MetaCoreTM 3Omics and InCroMAPin this review because of their ability to perform comprehensiveanalyses of multi-omics data which are currently limited inother high-level analysis tools
MetaCoreTM
MetaCoreTM [24] is a commercial tool available both as a stand-alone and as a Web-based application It is a software suite forfunctional analysis of different kinds of high-throughput mo-lecular data (eg next-generation sequencing siRNAmicroRNA microarray-based gene expression and serial ana-lysis of gene expression data array-comparative genomic-hybridization DNA arrays proteomic data metabolic profilesand screening data) MetaCoreTM is an integrated system whichconsists of (i) a high-quality manually curated database ofmammalian biology including metabolites and other molecularclasses bioactive molecules and their interactions signalingand metabolic pathways (ii) genomic analysis tools to identifypotentially significant variants (iii) a data mining toolkit fordata visualization analysis and exchange of data (iv) a toolkit(pathway editor) for custom assembly of functional networksand (v) a set of parsers to upload and manipulate different typesof high-throughput molecular data [54ndash56] Unfortunately nopublic information is available about the details of howMetaCoreTM works thus our review of this tool is partially lim-ited and some details remain unclear
MetaboAnalyst
MetaboAnalyst [25 35] is an integrated freely accessible Web-based platform It was first released in 2009 then upgraded in2012 (MetaboAnalyst 20 [57]) and in 2015 (MetaboAnalyst 30also available for download and local installation) It offers a setof online tools for metabolomic data analysis that combine stat-istical analysis of data with their functional and biological inter-pretation and visualization [25] tutorials and protocolpapers are also available online MetaboAnalyst 30 has been re-implemented to improve performance capability and userinteractivity It offers eight functional modules which can begrouped in three categories (i) exploratory statistical analysis(Statistical Analysis and Time-Series Analysis modules) (ii) func-tional analysis (Enrichment Analysis Pathway Analysis andIntegrated Pathway Analysis modules for both genes and metabol-ites) and (iii) advanced methods for translational studies(Biomarker Analysis Sample Size Estimation and Power Analysismodules) In addition it has also an Other Utilities module con-taining a specialized function for lipidomic data analysis and acompound ID-conversion tool [41]
InCroMAP
InCroMAP is a stand-alone Java software originally developed in2011 for enrichment analysis and pathway-based visualizationsof genomic and proteomic data [27 40] The extended version
InCroMAP 15 released in 2013 also supports annotated metab-olomic data thus making this tool suitable for comprehensivesystem biology studies [58] InCroMAP is freely available at itsWeb site [39] which includes a user guide [59] example datafiles to test the tool and a short video tutorial The software per-forms metabolite set enrichment analysis and generates inter-active global maps of the cellular metabolism These mapsallow an integrated pathway-based visualization of data frommultiple lsquoomicsrsquo platforms and provide a useful overview of themetabolic changes present in the studied experimental condi-tion [26 59]
3Omics
3Omics is a platform-independent Web-based tool developedin 2013 for the analysis integration and visualization of tran-scriptomic proteomic and metabolomic human data It is freelyaccessible at [40] and the Web site also includes a help sectionTo demonstrate the software functionalities applied to realdata two case studies are also reported and explained in [27]3Omics supports correlation analysis co-expression profilingphenotype mapping pathway enrichment analysis and geneontology-based enrichment analysis More precisely dependingon the data provided the software offers four types of lsquoomicsrsquoanalyses (i) TranscriptomicsndashProteomicsndashMetabolomics (TndashPndashM) (ii) TranscriptomicsndashProteomics (TndashP) (iii) ProteomicsndashMetabolomics (PndashM) and (iv) TranscriptomicsndashMetabolomics(TndashM) A single-omics mode is also available to reveal intra-omics relationships 3Omics can also supplement missing tran-script protein and metabolite information related to the inputdata by text-mining the biomedical literature through iHOP (in-formation Hyperlinked Over Protein [60 61]) Users can thusperform multi-omics analyses even when only one or two outof three lsquoomicsrsquo data sets are available [27]
Evaluation and comparison of softwarefunctionalities
In this section strong and weak aspects of each of the four se-lected tools are reviewed and comparatively evaluated follow-ing the data analysis workflow previously described (Figure 1)
Input data
Various proprietary data formats have been developed to han-dle and store MS or NMR raw data this heterogeneity makes itdifficult for researchers to manipulate these data For this rea-son several software for the conversion of raw data file types(eg RAW) into a universal format (eg CSV TXT mzXML orMGF) have been implemented such as ProteoWizard [62]MassMatrix [63] and several others including MATLAB [64] andR [65 66] Once raw data have been converted to a suitable for-mat they can be further analyzed Data types and formatsaccepted as input by the four reviewed tools are reported inTable 2 Input data are usually organized as data matrices withsamples as rows and signal features as columns MetaCoreTMMetaboAnalyst and IncroMAP accept data in different kindsof tabular format (eg textual tab-delimited TXT or comma-separated value CSV files) whereas 3Omics only accepts CSVfiles MetaboAnalyst also accepts zipped files of NMR or MSpeak lists or of MS spectra which must be in mzXML mzDATAor NetCDF format As for MetaCoreTM gene lists can also be im-ported from output files of microarray analysis software such asAffymetrix [67] or Agilent [68] Compound names or IDs
Analysis of metabolomic data | 503
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Tab
le2
Mai
nfe
atu
res
of
the
sele
cted
fou
rso
ftw
are
too
lsfo
rm
etab
olo
mic
dat
aan
alys
is
Too
lM
etaC
oreT
MM
etab
oAn
alys
tIn
Cro
MA
P3O
mic
sY
ear
2004
2009
2011
2013
Inst
itu
tio
nG
eneG
oU
niv
ersi
tyo
fA
lber
taM
cGil
lU
niv
ersi
tyM
on
trea
l(C
A)
Cen
ter
for
Bio
info
rma
tics
of
the
Un
iver
sity
of
Tu
bin
gen
(D)
Mo
lecu
lar
Des
ign
ampM
etab
olo
mic
sLa
bora
tory
U
niv
ersi
tyo
fT
aiw
an(T
W)
Imp
lem
enta
tio
nW
eb-b
asedthorn
stan
d-a
lon
eW
eb-b
ased
Stan
d-a
lon
eW
eb-b
ased
Lice
nse
Co
mm
erci
alG
PL(G
NU
Gen
eral
Publ
icLi
cen
se)
LGPL
(GN
ULe
sser
Gen
eral
Publ
icLi
cen
se)
No
ne
Typ
eo
fkn
ow
led
gePr
op
riet
ary
Publ
icPu
blic
Publ
icIn
pu
td
ata
Gen
ep
rote
ino
rm
etab
oli
teli
sts
imp
ort
edas
tab-
de-
lim
ited
text
(TX
T)
com
ma-
sep
arat
edva
lues
(CSV
)or
Exce
lfile
sge
ne
list
sfr
om
mic
roar
ray
anal
ysis
soft
war
e(A
ffym
etri
xA
gile
nt)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)fo
rco
nce
ntr
atio
ns
spec
tral
bin
so
rp
eak
inte
n-
sity
dat
azi
pp
edfi
les
(ZIP
)of
NM
Ro
rM
Sp
eak
list
so
ro
fM
Ssp
ectr
a(i
nN
etC
DF
mzX
ML
or
mzD
AT
Afo
rmat
)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)of
het
ero
gen
eou
sty
pes
of
pro
cess
edlsquoo
mic
srsquod
ata
Co
mm
a-se
par
ated
valu
es(C
SV)
of
pro
cess
edtr
ansc
rip
tom
ic
pro
teo
mic
or
met
abo
lom
icd
ata
Dat
ap
rep
arat
ion
Dat
ain
tegr
ity
chec
kin
g
Dat
an
orm
aliz
atio
n
Co
mp
ou
nd
nam
eid
enti
fica
tio
n
Stat
isti
cala
nal
ysis
Un
ivar
iate
anal
ysis
Mu
ltiv
aria
tean
alys
is
Clu
ster
ing
Cla
ssifi
cati
on
Dat
ain
terp
reta
tio
nan
din
tegr
atio
nFu
nct
ion
alin
terp
reta
tio
n
Met
abo
lite
set
enri
chm
ent
anal
ysis
Met
abo
lic
pat
hw
ayan
alys
is
Met
abo
lite
map
pin
g
Hyp
erli
nks
toex
tern
ald
atab
ases
Dat
aIn
tegr
atio
n
Ou
tpu
td
ata
Net
wo
rks
can
beex
po
rted
intw
ofo
rmat
sN
etsh
ot
and
Net
wo
rki
mag
esas
PNG
file
s
rep
ort
sco
nta
inin
gp
lots
gr
aph
san
dta
bles
wit
hal
lth
ere
sult
sIm
ages
are
avai
l-ab
leas
TIF
or
PNG
file
s
Tab
ula
rfo
rmat
(eg
CSV
)fo
ren
rich
men
tan
alys
isre
sult
san
dJP
Gfi
les
for
pat
hw
ay-
base
dvi
sual
izat
ion
PNG
SV
Go
rSI
Ffo
rmat
sfo
rim
ages
Tic
ksin
dic
ate
too
lfu
nct
ion
alit
ies
504 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]
Data preparation
Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text
Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods
Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]
Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable
Statistical analysis
Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]
A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness
Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps
For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite
Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average
ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two
Analysis of metabolomic data | 505
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering
ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines
Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]
Data interpretation and integration
This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text
Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge
MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]
MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in
Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided
with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact
and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-
way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound
colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas
other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of
this figure is available at BIB online httpbiboxfordjournalsorg
506 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view
InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]
3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]
Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation
MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results
Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)
InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]
3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts
Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-
logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds
with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version
of this figure is available at BIB online httpbiboxfordjournalsorg
Analysis of metabolomic data | 507
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]
Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]
Output data
Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis
MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]
MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results
of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing
Conclusion
The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions
Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford
Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations
Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification
508 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future
Key Points
bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools
bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge
bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge
Funding
This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)
References1 Mehrotra B Mendes P Bioinformatics approaches to inte-
grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15
2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210
3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93
4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95
5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39
6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9
7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)
8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34
9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)
10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)
11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302
12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65
13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1
14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93
15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293
16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108
17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2
18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43
19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9
20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48
21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69
22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090
23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100
24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)
25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60
26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6
27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764
28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)
29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4
30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)
31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7
32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109
33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4
34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382
35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)
Analysis of metabolomic data | 509
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-
Tab
le2
Mai
nfe
atu
res
of
the
sele
cted
fou
rso
ftw
are
too
lsfo
rm
etab
olo
mic
dat
aan
alys
is
Too
lM
etaC
oreT
MM
etab
oAn
alys
tIn
Cro
MA
P3O
mic
sY
ear
2004
2009
2011
2013
Inst
itu
tio
nG
eneG
oU
niv
ersi
tyo
fA
lber
taM
cGil
lU
niv
ersi
tyM
on
trea
l(C
A)
Cen
ter
for
Bio
info
rma
tics
of
the
Un
iver
sity
of
Tu
bin
gen
(D)
Mo
lecu
lar
Des
ign
ampM
etab
olo
mic
sLa
bora
tory
U
niv
ersi
tyo
fT
aiw
an(T
W)
Imp
lem
enta
tio
nW
eb-b
asedthorn
stan
d-a
lon
eW
eb-b
ased
Stan
d-a
lon
eW
eb-b
ased
Lice
nse
Co
mm
erci
alG
PL(G
NU
Gen
eral
Publ
icLi
cen
se)
LGPL
(GN
ULe
sser
Gen
eral
Publ
icLi
cen
se)
No
ne
Typ
eo
fkn
ow
led
gePr
op
riet
ary
Publ
icPu
blic
Publ
icIn
pu
td
ata
Gen
ep
rote
ino
rm
etab
oli
teli
sts
imp
ort
edas
tab-
de-
lim
ited
text
(TX
T)
com
ma-
sep
arat
edva
lues
(CSV
)or
Exce
lfile
sge
ne
list
sfr
om
mic
roar
ray
anal
ysis
soft
war
e(A
ffym
etri
xA
gile
nt)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)fo
rco
nce
ntr
atio
ns
spec
tral
bin
so
rp
eak
inte
n-
sity
dat
azi
pp
edfi
les
(ZIP
)of
NM
Ro
rM
Sp
eak
list
so
ro
fM
Ssp
ectr
a(i
nN
etC
DF
mzX
ML
or
mzD
AT
Afo
rmat
)
Tab
-del
imit
edte
xt(T
XT
)or
com
ma-
sep
arat
edva
lues
(CSV
)of
het
ero
gen
eou
sty
pes
of
pro
cess
edlsquoo
mic
srsquod
ata
Co
mm
a-se
par
ated
valu
es(C
SV)
of
pro
cess
edtr
ansc
rip
tom
ic
pro
teo
mic
or
met
abo
lom
icd
ata
Dat
ap
rep
arat
ion
Dat
ain
tegr
ity
chec
kin
g
Dat
an
orm
aliz
atio
n
Co
mp
ou
nd
nam
eid
enti
fica
tio
n
Stat
isti
cala
nal
ysis
Un
ivar
iate
anal
ysis
Mu
ltiv
aria
tean
alys
is
Clu
ster
ing
Cla
ssifi
cati
on
Dat
ain
terp
reta
tio
nan
din
tegr
atio
nFu
nct
ion
alin
terp
reta
tio
n
Met
abo
lite
set
enri
chm
ent
anal
ysis
Met
abo
lic
pat
hw
ayan
alys
is
Met
abo
lite
map
pin
g
Hyp
erli
nks
toex
tern
ald
atab
ases
Dat
aIn
tegr
atio
n
Ou
tpu
td
ata
Net
wo
rks
can
beex
po
rted
intw
ofo
rmat
sN
etsh
ot
and
Net
wo
rki
mag
esas
PNG
file
s
rep
ort
sco
nta
inin
gp
lots
gr
aph
san
dta
bles
wit
hal
lth
ere
sult
sIm
ages
are
avai
l-ab
leas
TIF
or
PNG
file
s
Tab
ula
rfo
rmat
(eg
CSV
)fo
ren
rich
men
tan
alys
isre
sult
san
dJP
Gfi
les
for
pat
hw
ay-
base
dvi
sual
izat
ion
PNG
SV
Go
rSI
Ffo
rmat
sfo
rim
ages
Tic
ksin
dic
ate
too
lfu
nct
ion
alit
ies
504 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]
Data preparation
Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text
Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods
Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]
Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable
Statistical analysis
Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]
A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness
Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps
For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite
Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average
ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two
Analysis of metabolomic data | 505
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering
ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines
Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]
Data interpretation and integration
This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text
Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge
MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]
MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in
Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided
with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact
and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-
way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound
colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas
other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of
this figure is available at BIB online httpbiboxfordjournalsorg
506 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view
InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]
3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]
Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation
MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results
Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)
InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]
3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts
Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-
logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds
with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version
of this figure is available at BIB online httpbiboxfordjournalsorg
Analysis of metabolomic data | 507
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]
Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]
Output data
Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis
MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]
MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results
of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing
Conclusion
The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions
Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford
Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations
Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification
508 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future
Key Points
bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools
bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge
bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge
Funding
This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)
References1 Mehrotra B Mendes P Bioinformatics approaches to inte-
grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15
2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210
3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93
4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95
5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39
6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9
7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)
8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34
9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)
10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)
11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302
12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65
13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1
14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93
15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293
16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108
17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2
18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43
19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9
20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48
21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69
22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090
23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100
24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)
25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60
26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6
27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764
28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)
29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4
30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)
31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7
32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109
33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4
34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382
35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)
Analysis of metabolomic data | 509
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-
together with the values of a numeric attribute (eg fold-changes P-values gene expression changes protein levels me-tabolite concentrations) have to be included and the data typehas to be specified by the user [58 69]
Data preparation
Data preparation includes integrity checking normalizationand compound name identification This set of procedures isfully available only in MetaboAnalyst MetaCoreTM and 3Omiscjust offer a tool for compound name standardization butMetaCoreTM also allows the integration with specific softwarefor data preprocessing InCroMAP and 3Omics only accept dataalready preprocessed and normalized with appropriate software(eg OpenMS MayDay R IBM-SPSS Statistics SAS or JMP) [2758] A brief description of each data preparation step is given inthe following text
Data integrity checkingAmong the four tools reviewed data integrity checking is per-formed only by MetaboAnalyst even if other software providethis functionality (eg XCMS [46] for metabolomic data orProgenesis QI [70] for proteomic data) As for MetaboAnalystthis step includes handling of missing values as well as identifi-cation and removal of outliers [69] Missing values are automat-ically replaced with small ones ie half of the minimumpositive value in the original data but the user can also specifyother missing value estimation methods eg by replacingthem with the meanmedian of the original data or by usingK-nearest neighbors probabilistic principal component ana-lysis Bayesian PCA or singular value decomposition methods
Data normalizationMetaboAnalyst provides different normalization methods (egby sum median or reference sample) that can be selected by theuser data transformation (log or cube root) and scaling (autopareto or range scaling) are also available [69]
Compound name identificationAs there is no universally accepted set of compound labels mol-ecule labels in userrsquos input data have to be converted to identi-fiers in public databases (eg HMDB PubChem CompoundChEBI KEGG or METLIN) MetaboAnalyst and 3Omics provide amodule specifically designed for this purpose [27 41] whereasMetaCoreTM has a built-in synonym dictionary for gene and pro-tein names that enables compound label standardization [54]InCroMAP does not provide an ID-conversion tool as it only ac-cepts data with an appropriate identifier (eg Affymetrix orAgilent for genes KEEG HMDB or PubChem for metabolites)[58] We highlight that the conversion of a compound name intoits correct ID code is not a painless procedure and can be anadditional source of error or ambiguity Furthermore differentresults may be obtained depending on the used platformtoolIn fact the same compound could be associated with differentIDs according to its different molecular structures (eg chirality)We recall an example reported by Cavill et al [22] regarding thecase of lactate and KEGG identifiers there are three KEGG iden-tifiers that relate to lactate according to the chirality of the mol-ecule (ie non-superposabity on its mirror image) and whetherthese chiral metabolites are distinguishable depends on theresolution provided by the analytical platform used Becausethey have the same mass a solution could be using specificallyselected elution columns in MS to ensure they elute at differenttimes and are thus experimentally distinguishable
Statistical analysis
Out of the four tools reviewed MetaboAnalyst is the one thatoffers the most complete set of statistical and machine learningmethods for data analysis 3Omics only performs clustering andco-expression analysis whereas InCroMAP does not provide astatistical module Unfortunately only limited information isavailable about the data analysis performed by MetaCoreTM butit seems incorporated in the pathway analysis module Analysisalgorithms used in MetaboAnalyst and 3Omics have originallybeen implemented in the R open-source project [65 66]
A wide variety of statistical methods and data miningapproaches can be applied on metabolomic data depending onthe kind of experiment performed In the following text we il-lustrate both univariate and multivariate statistics and alsobriefly describe time-course analysis for completeness
Univariate analysisIt is usually the first analysis performed as it provides a prelim-inary overview about the data features that are potentially sig-nificant in discriminating the conditions under study 3Omicsperforms co-expression analysis by means of an algorithm thatcomputes dissimilarity coefficients using the Euclidean dis-tance and displays results as heat maps
For two-group data MetaboAnalyst provides fold changeanalysis t-test and volcano plots (ie a type of scatter plots toquickly identify changes in large data sets of replicate data hav-ing the fold change on the x-axis and the negative log of theP-value on the y-axis) both for unpaired and paired analysesFor multi-group data it provides one-way analysis of variance(ANOVA) with associated post hoc analyses and correlationanalysis As a large number of metabolites are usually presentfor each patient or biological sample and an individual statis-tical test has to be performed for each metabolite a high num-ber of false-positive results can be obtained owing to multipletesting To limit it a multiple testing correction technique (egBonferroni BonferronindashHolm WestfallndashYoung or BenjaminindashHochberg correction) must be used to adjust the obtained sig-nificance P-values to keep the probability of observing at leastone significant result owing to chance below a predeterminedlevel [71] The BenjaminindashHochberg correction [72] also knownas false discovery rate (FDR) is the recommended one as itallows controlling the proportion of false positives among allsignificant results Both FDR- and Bonferroni-corrected P-valuesare provided by MetaboAnalyst for each evaluated metabolite
Multivariate analysisIt involves the simultaneous observation and analysis of morethan two statistical variables It is ideal for the analysis oflsquoomicsrsquo data as they usually consist of several features thatchange as a function of time phenotype or experimental condi-tion Multivariate analysis techniques include multivariateANOVA multivariate regression analysis factor analysis prin-cipal component analysis and partial least square discriminantanalysis all of them are supported by MetaboAnalyst Thesetechniques are useful for exploratory data analysis as theyallow summarizing original variables in fewer variables usingtheir weighted average
ClusteringCluster analysis aims to determine intrinsic groups in a set ofunlabeled data thus it is useful to identify groups of metabol-ites that have similar characteristic or behavior or that belongto the same biological pathway MetaboAnalyst supports two
Analysis of metabolomic data | 505
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering
ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines
Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]
Data interpretation and integration
This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text
Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge
MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]
MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in
Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided
with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact
and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-
way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound
colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas
other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of
this figure is available at BIB online httpbiboxfordjournalsorg
506 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view
InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]
3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]
Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation
MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results
Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)
InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]
3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts
Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-
logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds
with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version
of this figure is available at BIB online httpbiboxfordjournalsorg
Analysis of metabolomic data | 507
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]
Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]
Output data
Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis
MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]
MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results
of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing
Conclusion
The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions
Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford
Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations
Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification
508 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future
Key Points
bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools
bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge
bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge
Funding
This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)
References1 Mehrotra B Mendes P Bioinformatics approaches to inte-
grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15
2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210
3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93
4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95
5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39
6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9
7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)
8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34
9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)
10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)
11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302
12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65
13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1
14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93
15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293
16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108
17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2
18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43
19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9
20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48
21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69
22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090
23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100
24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)
25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60
26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6
27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764
28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)
29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4
30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)
31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7
32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109
33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4
34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382
35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)
Analysis of metabolomic data | 509
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-
types of clustering hierarchical clustering (with results visual-ized through heat maps and dendrograms) and partitional clus-tering (using K-means or self-organizing map algorithm)whereas 3Omics only performs hierarchical clustering
ClassificationOn the basis of a training set it allows to identify to which cat-egory a new sample belongs and to assign metabolites to aknown group or pathway in high-dimensional dataClassification methods supported by MetaboAnalyst are ran-dom forest and support vector machines
Time-course analysisProvided only by MetaboAnalyst it allows detecting trends inmetabolite concentrations or metabolite distribution patternsover time eg to study treatment effects during multiple timepoints The MetaboAnalyst module currently supports bothmultivariate empirical Bayes time-series analysis for detectingdistinctive temporal profiles across different experimental con-ditions and ANOVA-simultaneous component analysis (ASCA)for the identification of major patterns associated with each ex-perimental factor and their interactions [41 57]
Data interpretation and integration
This final step of the data analysis workflow allows linking sig-nificantly expressed metabolites to the biological context underinvestigation As shown in Figure 1 this step consists of threeparts (ie functional interpretation metabolite mapping and data in-tegration) described in the following text
Functional interpretationIt includes enrichment analysis and pathway analysis Thesetwo approaches often overlap (pathwayndashenrichment analysis)as both work by comparing significant metabolites identifiedby the statistical analyses to predefined functional groupsderived from previous knowledge
MetaCoreTM contains many options for enrichment andpathway analyses some of which are specifically designed fordrug discovery and disease investigation By default calcula-tions are set to compare the userrsquos data set to the entire soft-ware database in the case of a limited data set a specificrestricted list of compounds can be selected from the entireMetaCoreTM database and used for comparison (data filtration)The statistics performed is based on the hypergeometric meanand returns a P-value that ranks the intersection between theuploaded data and the content prebuilt in MataCoreTM Resultsare displayed as histograms with pathways ordered from themost to the least significant one [55 56]
MetaboAnalyst incorporates the Metabolite Set EnrichmentAnalysis (MSEA) and Metabolomics Pathway Analysis (MetPA)software for metabolite set enrichment analysis and pathwayanalysis respectively [17 42] MSEA offers three different kindsof enrichment analyses ie over-representation analysis (ORA)single sample profiling (SSP) and quantitative enrichment ana-lysis (QEA) a description of these methods can be found in [4269] Briefly ORA evaluates if a particular set of metabolites isrepresented more than expected in a given compound list ex-tracted from a single biological sample SSP allows to investi-gate if certain metabolite concentrations in a given sample arehigher or lower than their normal range QEA is similar to ORAbut used with multiple samples (eg collected at different timepoints or belonging to different patients) [42] All these threemethods generate graphs or tables with embedded hyperlinksto relevant pathway images and disease descriptions MetPAprovides several different algorithms for pathway analysisincluding Fisherrsquos exact test hypergeometric text global testand GlobalAncova [37] Compound importance in the givenmetabolic pathway is estimated through its betweenness cen-trality and out-degree centrality (refer to [37 69] for further de-tails) Results are presented in two parts a table with allanalysis results and the graphical output which containsthree view levels metabolome view pathway view and compoundview The metabolome view and a pathway view are shown in
Figure 2 Some graphical visualization features of MetaboAnalyst metabolome view (A) and pathway view (B) Images were obtained using the example data provided
with the MetaboAnalyst software In the metabolome view each circle represents a different pathway circle size and color shade are based on the pathway impact
and p-value (red being the most significant) respectively By clicking on a circle the corresponding pathway view is generated showing all genes involved in that path-
way and their interactions The codes represent compound IDs as reported in KEGG In (B) the taurine and hypotaurine metabolism pathway is shown As for compound
colors within the pathway view light blue means the metabolite is not in the uploaded data but it has been used as background for enrichment analysis whereas
other colors (varying from yellow to red) mean the metabolite is in the data with a different level of significance (red being the most significant) A colour version of
this figure is available at BIB online httpbiboxfordjournalsorg
506 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view
InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]
3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]
Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation
MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results
Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)
InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]
3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts
Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-
logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds
with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version
of this figure is available at BIB online httpbiboxfordjournalsorg
Analysis of metabolomic data | 507
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]
Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]
Output data
Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis
MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]
MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results
of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing
Conclusion
The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions
Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford
Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations
Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification
508 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future
Key Points
bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools
bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge
bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge
Funding
This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)
References1 Mehrotra B Mendes P Bioinformatics approaches to inte-
grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15
2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210
3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93
4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95
5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39
6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9
7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)
8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34
9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)
10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)
11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302
12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65
13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1
14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93
15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293
16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108
17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2
18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43
19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9
20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48
21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69
22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090
23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100
24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)
25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60
26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6
27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764
28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)
29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4
30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)
31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7
32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109
33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4
34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382
35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)
Analysis of metabolomic data | 509
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-
Figure 2 Pathway and compound views are generated dynamic-ally based on the userrsquos interaction with the visualization sys-tem usually multiple pathway and compound views can beobtained from a metabolome view
InCroMAp performs both single data set and integratedcross-platform enrichment and pathway analyses using datafrom individual or heterogeneous platforms In the former sin-gle-platform case a hypergeometric test is used to detect rele-vant pathways whereas a straightforward extension of thesingle-platform method is used in the latter case Unidentifiedmetabolomic features ie which cannot be mapped to a knownset of genes proteins or metabolites are automatically dis-carded Results are presented in a table or a barplot sorted ac-cording to the associated P-values [58]
3Omics provides enrichment and pathway analyses for TndashPndashM PndashM TndashM and single metabolomic data To perform the ana-lysis the user has to select a known metabolite set and oneobtained from experimental data Significantly enriched path-ways are identified with a hypergeometric test Results are re-ported in tables ranked according to their probabilityMetabolites in the pathways are displayed alongside hyperlinksto the HMDB or HumanCyc database [27]
Metabolite mapping and data visualizationMetabolites are mapped to their biological pathway as networksof interconnected nodes thus transforming the original un-structured data into logically structured and visually under-standable representations [73] In this way users can moreeasily identify relationships and hidden patterns among thedata which facilitates hypothesis generation and resultinterpretation
MetaboAnalyst does not require this step as it already pro-vides interactive Google-Maps-style graphs of pathways andmetabolites for the visualization of enrichment and pathwayanalysis results
Within MetaCoreTM biological networks are built on inputmetabolite lists after metabolite preselection through enrich-ment and pathway analyses According to the aim of the studyand the data set size users can choose among several network-building algorithms which are described in details in [54 55]Networks are generated as a combination of single-step inter-actions (directed edges) connecting metabolites proteins orgenes (nodes) Each node is associated with a specific com-pound through the MetaCoreTM database architecture andusers can retrieve the information about the node-associatedcompound simply by clicking on the node An info page openscontaining many links with information about the node-associated object (eg gene protein metabolite ligand tran-scription factor or enzyme) maps where the object appearsrelated gene ontology processes or diseases all known drugs forthe object and a list of the objectrsquos network interactionsSignificant expression or abundance changes of specific com-pounds are also visually shown as a red or blue solid circleabove the node indicating increased or decreased expressionabundance respectively (Figure 3)
InCroMAP provides metabolite mapping support throughtwo different kinds of data visualizations the integrated visual-ization and the global metabolic overview The integrated visual-ization overlays a selected pathway of interest with thecorresponding lsquoomicsrsquo experimental data the global metabolicview integrates multi-omics data and displays them in a singlegraph where each subordinate metabolic pathway is colored ac-cording to the significance of its enrichment [58]
3Omics generates compound networks through theCorrelation Network module which incorporates the lsquocorrsquo func-tion from R to compute the Pearsonrsquos correlation coefficient(PCC) The PCCs between each pair of interconnected elementsin the network are calculated from two sets of values whichcan be of the same kind (intra-lsquoomicsrsquo analysis) or of differentkinds (inter-lsquoomicsrsquo analysis) In the latter case transcripts
Figure 3 Example of a biological network representation in MetaCoreTM According to their shape and color nodes represent genes proteins metabolites or other bio-
logical elements while directed edges represent the reactions intercurring between them green for activation red for inhibition and gray if unspecified Compounds
with abundance change are identified by a red (abundance increasing ie ZFX) or blue (abundance decreasing ie TAP1 RFX5) solid circle above them A colour version
of this figure is available at BIB online httpbiboxfordjournalsorg
Analysis of metabolomic data | 507
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]
Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]
Output data
Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis
MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]
MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results
of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing
Conclusion
The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions
Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford
Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations
Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification
508 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future
Key Points
bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools
bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge
bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge
Funding
This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)
References1 Mehrotra B Mendes P Bioinformatics approaches to inte-
grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15
2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210
3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93
4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95
5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39
6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9
7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)
8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34
9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)
10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)
11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302
12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65
13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1
14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93
15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293
16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108
17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2
18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43
19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9
20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48
21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69
22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090
23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100
24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)
25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60
26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6
27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764
28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)
29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4
30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)
31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7
32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109
33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4
34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382
35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)
Analysis of metabolomic data | 509
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-
proteins and metabolites are all represented in the same graphand identified with different symbols (squares triangles andcircles respectively) Uploaded data are automatically mappedinto networks using a force-directed layout algorithmLiterature-derived relationships are presented as dotted lineswhereas solid lines indicate strong correlations (PCCgt 09) iden-tified between the uploaded data [27]
Data integrationIt is crucial to obtain meaningful biological insights but it is stilldifficult to achieve Currently high-throughput analysis toolsmainly perform data integration through network buildingthus they provide a lsquovisual integrationrsquo of the different kinds oflsquoomicsrsquo data whereas the interpretation is left to the users Inlight of these considerations for MetaCoreTM InCroMAP and3Omics we can consider data integration as part of metabolitemapping as they map all lsquoomicsrsquo data levels on the same net-work MetaboAnalyst instead provides a specific module forintegrated pathway analysis which enables users to integratedata from gene expression and metabolomic experimentsUsers have to upload a list of genes and metabolites of interestidentified from the same samples or obtained under similar ex-perimental conditions genes and metabolites are then mappedto KEGG pathways for overrepresentation analysis In spite ofits great usefulness this MetaboAnalyst module still has somelimitations On the positive side by combining the evidencebased on changes in both gene expressions and metabolite con-centrations it is more likely to identify which pathways areinvolved or up-down-regulated in a biological process On thenegative side users have to keep in mind that these graphicalrepresentations are prone to errors In fact although the entiretranscriptome is routinely mapped current metabolomic tech-nologies capture only a small portion of the metabolome thisdifference can thus lead to potentially biased results Moreoverthis module does not support proteomic data thus a compre-hensive multilevel data integration cannot be achieved yet [41]
Output data
Once the data analysis integration and interpretation havebeen performed users can export the results in different for-mats to use them for further analysis
MetaCoreTM allows saving networks in two formats Networkand Netshot Netshot saves the network exactly as it is whereasNetwork does not retain the expression data and will reflect anysubsequent changes to the objects or interactions in the net-work as the MetaCoreTM database is updated Thereby saving anetwork in both formats enables users to see exactly how itlooked when it was created (Netshot) and how it may havechanged after subsequent MetaCoreTM updates (Network) High-quality images of networks and maps can be saved in PNG for-mat MetaCoreTM also provides a Data Sharing module to shareexperiments gene lists and saved networks with other users orgroups with different permission levels [55]
MetaboAnalyst generates a report in a PDF format contain-ing plots graphs and tables with all the results of the analysesperformed and a brief explanation plots graphs and tables canalso be downloaded as TIF files In addition the processed nu-meric data high-resolution images (PNG format) R scripts andR command history are also available for downloading (beforebeing deleted raw data files are stored on the server for 72 h)Users can easily rerun and in some cases modify resulting Rscripts on their local machine after installation of the R softwareand the required packages [25 66] As for InCroMAP the results
of the enrichment analysis can be exported in a tabular format(eg CSV) and the pathway-based visualization can be stored asa JPG file In 3Omics network images can be exported in PNGSVG or SIF formats and all processed data can be downloadedfor further analyses To safeguard data confidentiality up-loaded data files are only temporarily stored during their evalu-ation section and then deleted after processing
Conclusion
The fast-growing metabolomics domain generates high quanti-ties of valuable data that require integration and comprehen-sive analysis with other lsquoomicsrsquo data to be fully interpreted Themost common approach consists in simultaneously monitoringthe levels of transcripts proteins and metabolites and in com-bining the obtained data to infer the structure and dynamics ofthe underlining biological networks in data sets of interestSeveral statistical and computational methods are required toanalyze and integrate this diverse and large amount of datacomputational tools supporting also data visualization and me-tabolite mapping greatly help to quickly identify the relevantmetabolites and the involved biological processes in the studiedconditions
Of the many tools available for processing and analyzingmetabolomic data we reviewed and compared four of the mostrelevant for number of users or provided features and pointedout their advantages and drawbacks MetaCoreTM is the one withthe wider integrated database of molecular information contain-ing more than 1500 signaling and metabolic pathway maps andover 13 million molecular interactions which is continuallyupdated to guarantee reliability and comprehensiveness [45]This makes MetaCoreTM a powerful tool for researchers in manydifferent fields from drug discovery to biomarker identificationand clinical applications It also provides several biological net-work-building algorithms to map high-throughput experimentaldata into interactive and information-rich networks The greatlimitation in the use of this software is that it requires purchas-ing a license whose cost not all research groups can afford
Among the freely available tools MetaboAnalyst is the mostcomplete one as it offers comprehensive data processing op-tions a wide array of univariate and multivariate statisticalmethods and extensive data visualization and functional ana-lysis modules Its main limitation is the lack of support forproteomic data integration Given its high ease of use InCroMAPaddresses to investigators of any kind of discipline and it is suit-able for the evaluation of system biology studies However itslack of a data preprocessing module requires the use of othersoftware to prepare the input data 3Omics is useful for re-searchers interested in integrated visualization and one-clickcomparative analysis of multiple lsquoomicsrsquo data in a simple andrapid way Yet as InCroMAP it does not support data prepro-cessing and it provides only a few statistical methods for dataanalysis furthermore it only supports human data evaluations
Overall in spite of the undeniable validity of the tools re-viewed there are still several challenges that need to be ad-dressed mainly in the field of data integration to support athorough comprehensive evaluation of the experimental dataand a deeper understanding of the biological processesAlthough multiple lsquoomicsrsquo data are increasingly available anopen issue is still their effective use to understand the biologicalmechanisms responsible for the variance in the observedmetabolomic profiles Toward this goal further developmentand improvement of computational techniques for efficientstorage integration and use of prior knowledge identification
508 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future
Key Points
bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools
bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge
bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge
Funding
This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)
References1 Mehrotra B Mendes P Bioinformatics approaches to inte-
grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15
2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210
3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93
4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95
5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39
6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9
7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)
8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34
9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)
10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)
11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302
12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65
13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1
14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93
15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293
16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108
17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2
18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43
19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9
20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48
21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69
22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090
23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100
24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)
25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60
26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6
27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764
28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)
29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4
30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)
31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7
32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109
33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4
34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382
35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)
Analysis of metabolomic data | 509
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-
and accurate quantification of metabolites heterogeneouslsquoomicsrsquo data integration and pathway visualization are essentialand shall continue to be the focus of the bioinformatics commu-nity in the next future
Key Points
bull Metabolomics is a growing field of biology that gener-ates large amounts of data handling processing andanalysis of these data are still challenging and requirespecialized mathematical statistical and bioinfor-matics tools
bull Metabolomic data alone are not enough to gain thor-ough understanding of a biological system and its be-havior under pathological conditions integration withother lsquoomicsrsquo data (mainly transcriptomics and prote-omics) and with previous knowledge is needed to gaindeeper knowledge
bull Several tools are available for lsquoomicsrsquo data analysisand integration which constitute an invaluable helpfor researchers in many different fields in spite ofthis there are still open challenges mainly regardingheterogeneous data integration and their comprehen-sive analysis that need to be faced to take effectiveadvantage of new experimental data and availableknowledge
Funding
This research was supported by the ldquoShockOmicsrdquo grant (FP7 EUProject No 602706)
References1 Mehrotra B Mendes P Bioinformatics approaches to inte-
grate metabolomics and other system biology data InBiotechnology in Agriculture and Forestry Plant metabolomics200657105ndash15
2 Joyce AR Palsson BOslash The model organism as a system inte-grating lsquoomicsrsquo data sets Nat Rev Mol Cell Biol 20067198ndash210
3 Wishart DS Current progress in computational metabolo-mics Brief Bioinform 20078279ndash93
4 Vinaixa M Samino S Saez I et al A guideline to univariatestatistical analysis for LCMS-based untargeted metabolo-mics-derived data Metabolites 20122775ndash95
5 Shulaev V Metabolomics technology and bioinformaticsBrief Bioinform 20067128ndash39
6 Patti GJ Yanes O Siuzdak G Metabolomics the apogee of theomics biology Nat Rev Mol Cell Biol 201213263ndash9
7 SMPDB httpsmpdbca (25 February 2016 date lastaccessed)
8 Ogata H Goto S Sato K et al KEGG Kyoto Encyclopedia ofGenes and Genomes Nucleic Acids Res 19992729ndash34
9 MetaCyc httpmetacycorg (25 February 2016 date lastaccessed)
10HumanCyc httphumancycorg (25 February 2016 date lastaccessed)
11Roberts LD Souza LA Gerszten RE et al Targeted metabolo-mics Curr Protoc Mol Biol 2012302
12Castle AL Fiehn O Kaddurah-Daouk R et al MetabolomicsStandards Workshop and the development of internationalstandards for reporting metabolomics experimental resultsBrief Bioinform 20067159ndash65
13Gomez-Cabrero D Abugessaisa I Maier D et al Data integra-tion in the era of omics current and future challenges BMCSyst Biol 20148(Suppl 2)I1
14Goble C Stevens R State of the nation in data integration forbioinformatics J Biomed Inform 200841687ndash93
15Zhang A Sun H Wang P et al Modern analytical techniquesin metabolomics analysis Analyst 2012137293
16Sugimoto M Kawakami M Robert M et al Bioinformaticstools for mass spectroscopy-based metabolomic data pro-cessing and analysis Curr Bioinform 2012796ndash108
17Reshetova P Smilde AK Van Kampen AHC et al Use of priorknowledge for the analysis of high-throughput transcriptom-ics and metabolomics data BMC Syst Biol 20148(Suppl 2)S2
18Krastanov A Metabolomics - The state of art BiotechnolBiotechnol Equip 2010241537ndash43
19Patti GJ Separation strategies for untargeted metabolomics JSep Sci 2011343460ndash9
20Vuckovic D Current trends and challenges in sample prepar-ation for global metabolomics using liquid chromatography-mass spectrometry Anal Bioanal Chem 201261523ndash48
21Villas-Boas SG Hojer-Pedersen J Akesson M et al Global me-tabolite analysis of yeast evaluation of sample preparationmethods Yeast 2005221155ndash69
22Cavill R Jennen D Kleinjans J Briede JJ Transcriptomic andmetabolomic data integration Brief Bioinform 2015 doi101093bibbbv090
23Pfau T Pacheco MP Sauter T Towards improved genome-scale metabolic network reconstructions unification tran-script specificity and beyond Brief Bioinform 2015 doi101093bibbbv100
24Thomson Reuters MetaCoreTM 2004 httplsresearchthomsonreuterscom (25 February 2016 date last accessed)
25Xia J Psychogios N Young N et al MetaboAnalyst A web ser-ver for metabolomic data analysis and interpretation NucleicAcids Res 200937652ndash60
26Wrzodek C Eichner J Zell A Pathway-based visualization ofcross-platform microarray datasets Bioinformatics2012283021ndash6
27Kuo T-C Tian T-F Tseng YJ 3Omics a web-based systemsbiology tool for analysis integration and visualization ofhuman transcriptomic proteomic and metabolomic dataBMC Syst Biol 2013764
28 Ingenuity IPA Ingenuity Pathway Analysis httpwwwingenuitycomproductsipa (25 February 2016 date lastaccessed)
29Ludemann A Weicht D Selbig J et al PaVESy Pathway visu-alization and editing system Bioinformatics 2004202841ndash4
30Proteome Software 2005 httpwwwproteomesoftwarecom (25 February 2016 date last accessed)
31Hu Z Mellor J Wu J et al VisANT Data-integrating visualframework for biological networks and modules Nucleic AcidsRes 200533352ndash7
32 Junker BH Klukas C Schreiber F VANTED a system foradvanced data analysis and visualization in the context ofbiological networks BMC Bioinformatics 20067109
33Suhre K Schmitt-Kopplin P MassTRIX mass translator intopathways Nucleic Acids Res 200836481ndash4
34Neuweger H Persicke M Albaum SP et al Visualizing postgenomics data-sets on customized pathway maps byProMeTra-aeration-dependent gene expression and metabol-ism of Corynebacterium glutamicum as an example BMC SystBiol 2009382
35MetaboAnalyst 30 2009 httpwwwmetaboanalystcaMetaboAnalyst (25 February 2016 date last accessed)
Analysis of metabolomic data | 509
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-
36Garcıa-Alcalde F Garcıa-Lopez F Dopazo J et al Paintomics aweb based tool for the joint visualization of transcriptomicsand metabolomics data Bioinformatics 201127137ndash9
37Xia J Wishart DS MetPA A web-based metabolomics tool forpathway analysis and visualization Bioinformatics 2010262342ndash4
38Kamburov A Cavill R Ebbels TMD et al Integrated pathway-level analysis of transcriptomics and metabolomics datawith IMPaLA Bioinformatics 2011272917ndash18
39 InCroMAP 2011 httpwwwracsuni-tuebingendesoftwareInCroMAP (25 February 2016 date last accessed)
403Omics 2013 http3omicscmdmtw (25 February 2016date last accessed)
41Xia J Sinelnikov IV Han B et al MetaboAnalyst 30 - makingmetabolomics more meaningful Nucleic Acids Res 201543W251ndash7
42Xia J Wishart DS MSEA a web-based tool to identify biologic-ally meaningful patterns in quantitative metabolomic dataNucleic Acids Res 201038W71ndash7
43Zhengzheng P Comparing and combining NMR spectroscopyand mass spectrometry in metabolomics Anal Bioanal Chem2007387525ndash7
44Dunn WD Ellis DI Metabolomics current analytical plat-forms and methodologies Trends Anal Chem 2005244
45Thermo Scientific SIEVETM Software for Differential Analysishttpwwwthermoscientificcomenproductsieve-software-differential-analysishtml (25 February 2016 date lastaccessed)
46Smith CA Want EJ OrsquoMaille G et al XCMS processing massspectrometry data for metabolite profiling using nonlinearpeak alignment matching and identification Anal Chem200678779ndash87
47Clasquin MF Melamud E Rabinowitz JD LC-MS data process-ing with MAVEN a metabolomic analysis and visualizationengine Curr Protoc Bioinformatics 20121411
48Pluskal T Castillo S Villar-Briones A et al MZmine 2 modu-lar framework for processing visualizing and analyzingmass spectrometry-based molecular profile data BMCBioinformatics 201011395
49Wishart DS Jewison T Guo AC et al HMDB 30 - The HumanMetabolome Database in 2013 Nucleic Acids Res 201341801ndash7
50Smith CA OrsquoMaille G Want EJ et al METLIN a metabolitemass spectral database Ther Drug Monit 200527747ndash51
51Marsquoayan A Introduction to network analysis in systems biol-ogy Sci Signal 20114tr5
52Mitrea C Taghavi Z Bokanizad B et al Methods andapproaches in the topology-based analysis of biological path-ways Front Physiol 20134278
53Krzywinski M Birol I Jones SJM et al Hive plots-rational ap-proach to visualizing networks Brief Bioinform 201113627ndash44
54Ekins S Nikolsky Y Bugrim A et al Pathway mapping toolsfor analysis of high content data Methods Mol Biol2007356319ndash50
55GeneGo MetaCore training manual - Version 50 St JosephMI (USA) 2008
56GeneGo MetaCore Integrated pathway analysis for all omicsdata 2014
57Xia J Mandal R Sinelnikov I et al MetaboAnalyst 20 - a com-prehensive server for metabolomic data analysis Nucl AcidsRes 201240127ndash33
58Eichner J Rosenbaum L Wrzodek C et al Integrated enrich-ment analysis and pathway-centered visualization of metab-olomics proteomics transcriptomics and genomics data byusing the InCroMAP software J Chromatogr B Analyt TechnolBiomed Life Sci 201496677ndash82
59Wrzodek C Userrsquos Guide for InCroMAP Integrated Analysis ofMicroarray Data from Different Platforms Tubingen GermanyCenter for Bioinformatics Tuebingen (ZBIT) 2012
60Fernandez JM Hoffmann R Valencia A iHOP web servicesNucleic Acids Res 200735W21ndash6
61 iHOP 2007 httpwwwihop-netorgUniPubiHOP (25February 2016 date last accessed)
62ProteWizard httpproteowizardsourceforgenet (25February 2016 date last accessed)
63MassMatrix httpwwwmassmatrixnet (25 February 2016date last accessed)
64MATLAB httpitmathworkscom (25 February 2016 datelast accessed)
65R Development Core Team R A Language and Environment forStatistical Computing Vienna Australia R Development CoreTeam 2010
66R Development Core Team 2010 httpswwwr-projectorg(25 February 2016 date last accessed)
67Affymetrix httpwwwaffymetrixcom (25 February 2016date last accessed)
68Agilent httpwwwagilentcom (25 February 2016 date lastaccessed)
69Xia J Wishart DS Web-based inference of biological patternsfunctions and pathways from metabolomic data usingMetaboAnalyst Nat Protoc 20116743ndash60
70Progenesis QI httpwwwnonlinearcomprogenesisqi (25February 2016 date last accessed)
71Noble WS How does multiple testing correction work NatBiotechnol 2009271135ndash7
72Benjamini Y Hochberg Y Controlling the false discovery ratea practical and powerful approach to multiple testing J R StatSoc 199557289ndash300
73Topfer N Kleessen S Nikoloski Z Integration of metabolo-mics data into metabolic networks Front Plant Sci 2015649
510 | Cambiaghi et al
Dow
nloaded from httpsacadem
icoupcombibarticle-abstract1834982453286 by U
niversidad Nacional Autonom
a de Mexico user on 12 M
ay 2019
- bbw031-TF1
- bbw031-TF2
-