BIOINFORMATICS APPROACHES FOR METAGENOMICS DATA ANALYSIS
ADI DORON-FAIGENBOIM
PLANT SCIENCES, VEGETABLE AND FIELD CROPS, ARO, THE VOLCANI CENTER, ISRAEL, RISHON LEZION 7528809
Metagenomics
o “Metagenomics is the study of the collective genomes of all microorganisms from an environmental sample”
o Community
o Environmental
o Ecological
DNA sequencing & microbial profiling
Traditional microbiology relies on isolation and culture of bacteria
o Cumbersome and labour-intensive process
o Fails to account for the diversity of microbial life
o Great plate-count anomaly
Staley, J. T., and A. Konopka. 1985. Measurements of in situ activities of nonphotosynthetic microorganisms in aquatic and terrestrial habitats. Annu. Rev. Microbiol. 39:321-346
Why environmental sequencing?
Estimated 1000 trillion tons of bacterial/archaeal life on Earth
o Only a small proportion of organisms have been grown in culture
o Species do not live in isolation
o Clonal cultures fail to represent the natural environment of a given organism
o Many proteins and protein functions remain undiscovered
Why environmental sequencing?
o Human microbiome
o Rhizobiome
o Pollutant sites
o Non-human microbiomes
The revolution in sequencing technologies
High-throughput technologies have driven the accumulation of enormous volumes of genomic and metagenomic data.
Brendan P. Hodkinson and Elizabeth A. Grice. Next-Generation Sequencing: A Review of Technologies and Tools for Wound Microbiome Research. Adv Wound Care (New Rochelle). 2015
[Figure: HiSeq, MiSeq]
Experimental approaches
o Community composition
◦ Microbiome (16S rRNA gene, 18S, ITS, etc.)
o Community composition and functional potential
◦ Metagenomics
o Functional genetic response
◦ Metatranscriptomics
16S vs. shotgun metagenomics
o 16S – targeted sequencing of a single gene
◦ Marker for identification
◦ Well established
◦ Cheap
◦ Amplifies only what you want
o Shotgun sequencing – sequencing all the DNA
◦ No primer bias
◦ Can identify all microbes
◦ Functional information
16S rRNA sequencing
Erlandsen S L et al. J Histochem Cytochem 2005;53:917-927
• 16S rRNA forms part of bacterial ribosomes.
• Contains regions of highly conserved and highly variable sequence.
• The variable regions act as a molecular “fingerprint” that can be used to identify bacterial genera and species.
• Large public databases are available for comparison.
− The Ribosomal Database Project (RDP) currently contains >1.5 million rRNA sequences.
• Conserved regions can be targeted to amplify a broad range of bacteria from environmental samples.
• Not quantitative, because the number of 16S gene copies per genome varies between taxa.
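The copy-number problem can be shown with a small sketch. Read counts are divided by each taxon's 16S copy number, so the result tracks genomes (cells) rather than gene copies. The taxon names, counts and copy numbers below are hypothetical. A real correction would take copy-number estimates from a reference resource, and those estimates carry their own uncertainty.

```python
# Hypothetical 16S read counts per taxon and assumed 16S copies per genome
# (illustrative placeholders, not values from any database).
read_counts = {"taxon_A": 600, "taxon_B": 300, "taxon_C": 100}
copies_per_genome = {"taxon_A": 6, "taxon_B": 1, "taxon_C": 2}

def relative_abundance(counts):
    """Convert counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}

def copy_number_corrected(counts, copies):
    """Divide each taxon's read count by its assumed 16S copy number, giving a
    value proportional to the number of genomes rather than gene copies."""
    return {taxon: n / copies[taxon] for taxon, n in counts.items()}

raw = relative_abundance(read_counts)
corrected = relative_abundance(copy_number_corrected(read_counts, copies_per_genome))
for taxon in read_counts:
    print(f"{taxon}: raw {raw[taxon]:.2f} -> corrected {corrected[taxon]:.2f}")
```

In this toy case taxon_A looks dominant in the raw reads (60%). Once its six gene copies are accounted for, it drops to about 22% of genomes.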
16S rRNA gene sequencing
o Pros
◦ Well established
◦ Relatively cheap to sequence (~50,000 reads/sample)
◦ Only amplifies what you want (no host contamination)
o Cons
◦ Primer choice can bias results towards certain organisms
◦ Usually not enough resolution to identify to the strain level
◦ Different primers are usually needed for archaea and eukaryotes (18S)
◦ Cannot identify viruses
◦ No direct functional profiling
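Primer bias can be illustrated by matching a degenerate primer against template sequences using IUPAC ambiguity codes. The primer GTGYCAGCMGCCGCGGTAA is a 515F-style 16S primer and is used here purely as an example. Both templates are hypothetical, and only the forward strand is scanned. Real amplification depends on where mismatches fall and on PCR conditions, not just on a mismatch count.

```python
# IUPAC nucleotide codes -> bases each code accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def mismatches(primer, site):
    """Count positions where the template base is not allowed by the primer code."""
    return sum(1 for p, b in zip(primer, site) if b not in IUPAC[p])

def find_binding_sites(primer, template, max_mismatches=0):
    """Slide the primer along the template (forward strand only) and return
    start positions whose mismatch count is within the tolerance."""
    hits = []
    for start in range(len(template) - len(primer) + 1):
        site = template[start:start + len(primer)]
        if mismatches(primer, site) <= max_mismatches:
            hits.append(start)
    return hits

primer = "GTGYCAGCMGCCGCGGTAA"
# Hypothetical templates: template_b carries one substitution inside the primer site.
template_a = "TTAC" + "GTGCCAGCAGCCGCGGTAA" + "TACG"
template_b = "TTAC" + "GTGCCAGCAGCTGCGGTAA" + "TACG"
print(find_binding_sites(primer, template_a))
print(find_binding_sites(primer, template_b))
print(find_binding_sites(primer, template_b, max_mismatches=1))
```

With a strict match, the organism behind template_b would be missed entirely. This is one way primer choice can skew the apparent community.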
Binning sequences into OTUs
o Operational Taxonomic Unit (OTU): an arbitrary definition of a taxonomic unit based on sequence divergence
o Composition-based binning
− GC content
− Di/tri/tetra/... nucleotide composition (k-mer-based frequency comparison)
− Codon usage statistics
o Similarity-based binning
− Direct comparison of OTU sequences to a reference database
− The identity cut-off depends on the resolution required: species 97%, genus 90%, family 80%
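Composition-based binning compares k-mer frequency profiles, in this case tetranucleotides. The sketch computes normalised 4-mer profiles and GC content for three hypothetical fragments, then compares the profiles by Euclidean distance. The distance measure is an assumption, since tools differ. Many real binners also merge each k-mer with its reverse complement, which is not done here.

```python
import math
from itertools import product

def kmer_profile(seq, k=4):
    """Normalised frequencies of all 4**k DNA k-mers (tetranucleotides for k=4).
    k-mers containing characters other than A/C/G/T are skipped; reverse
    complements are NOT merged in this sketch."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values()) or 1
    return [counts[km] / total for km in kmers]

def gc_content(seq):
    """Fraction of G + C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical fragments: frag2 is a rotated copy of frag1 (standing in for another
# fragment of the same AT-rich genome); frag3 comes from a GC-rich genome.
frag1 = "ATTATAAATTTAATATTAAATATTTAAT" * 3
frag2 = frag1[7:] + frag1[:7]
frag3 = "GCCGGCGCGGCCGCGGGCCGCGCCGGCG" * 3
p1, p2, p3 = (kmer_profile(f) for f in (frag1, frag2, frag3))
print(gc_content(frag1), gc_content(frag3))
print(euclidean(p1, p2), euclidean(p1, p3))
```

Fragments from the same genome have nearly identical profiles. The AT-rich and GC-rich fragments share no 4-mers at all, so their distance is large.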
MEGAN: BLAST against the NCBI database
Clustering of OTUs based on sequence similarity
[Figure: OTUs from Sample 1 and Sample 2, e.g. an OTU present 50:50 in both samples]
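Clustering OTUs by sequence similarity can be sketched as greedy centroid clustering. Sequences are processed from most to least abundant. Each sequence joins the first centroid it matches at or above the identity threshold; otherwise it founds a new OTU. This is a simplified stand-in, not the exact algorithm of any specific tool. It assumes pre-aligned sequences of equal length, whereas real tools compute identity from pairwise alignments. The reads and the `mutate` helper are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes pre-aligned sequences of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(seqs_with_counts, threshold=0.97):
    """Greedy centroid clustering, most abundant sequences first."""
    centroids = []
    otus = {}  # centroid sequence -> member sequences
    for seq, _count in sorted(seqs_with_counts.items(), key=lambda kv: -kv[1]):
        for c in centroids:
            if identity(seq, c) >= threshold:
                otus[c].append(seq)
                break
        else:  # no centroid was close enough: start a new OTU
            centroids.append(seq)
            otus[seq] = [seq]
    return otus

def mutate(seq, positions):
    """Hypothetical helper: substitute the base at each position to create a variant."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    s = list(seq)
    for p in positions:
        s[p] = swap[s[p]]
    return "".join(s)

base = "ACGT" * 25  # hypothetical 100-bp read
reads = {
    base: 120,
    mutate(base, [10, 50]): 30,             # 98% identical to base
    mutate(base, [5, 20, 40, 60, 80]): 25,  # 95% identical to base
}
otus = greedy_otus(reads, threshold=0.97)
print(len(otus), [len(members) for members in otus.values()])
```

At a 97% cut-off the 98%-identical variant merges with the abundant sequence, while the 95%-identical one forms a second OTU.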
Software for binning
o Composition-based binning
◦ TETRA – Maximal-Order Markov Model
◦ PhyloPythia – Support Vector Machine
◦ Seeded Growing Self-Organising Maps (S-GSOM)
◦ TETRA + codon usage
o Similarity-based binning
◦ Requires that most sequences in a sample are present in a primary or secondary reference database
◦ QIIME
◦ MEGAN (comparison against BLAST NCBI NR)
◦ Mothur (RDP)
◦ CARMA (comparison against Pfam)
◦ ARB (linked with the SILVA database)
Sequences Databases
Measuring diversity of OTUs
Two primary measures for sequence-based studies:
• Alpha diversity
−What is there? How much is there?
−Diversity within a sample
• Beta diversity
−How similar are two samples?
−Diversity between samples
Alpha diversity – human microbiome
C Huttenhower et al. Nature 486, 207-214 (2012) doi:10.1038/nature11234
Alpha diversity
o Species count in the sample
◦ What is a species?
◦ OTUs
◦ Misses the level of evolutionary diversity
o Phylogenetic diversity (PD)
◦ Sum of the branch lengths covered by a sample
◦ Misses the distribution (abundance) of the species
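Phylogenetic diversity (Faith's PD) adds up the branch lengths spanned by the taxa observed in a sample. The sketch stores a small hypothetical rooted tree as a child-to-(parent, branch length) map, which is an assumption made for brevity. It then sums the branches on the paths from the observed tips to the root.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch to the parent).
TREE = {
    "A": ("n1", 1.0), "B": ("n1", 2.0),
    "C": ("n2", 1.5), "D": ("n2", 0.5),
    "n1": ("root", 1.0), "n2": ("root", 3.0),
    "root": (None, 0.0),
}

def branches_to_root(tip, tree):
    """Nodes whose branch (to their parent) lies on the path from tip to root."""
    nodes = set()
    node = tip
    while tree[node][0] is not None:
        nodes.add(node)
        node = tree[node][0]
    return nodes

def faith_pd(observed_tips, tree):
    """Faith's PD: total length of the union of branches spanned by the observed tips."""
    covered = set()
    for tip in observed_tips:
        covered |= branches_to_root(tip, tree)
    return sum(tree[node][1] for node in covered)

print(faith_pd({"A", "B"}, TREE))  # two closely related taxa
print(faith_pd({"A", "C"}, TREE))  # two distantly related taxa
```

Both samples contain two taxa, so a plain species count treats them as equal. PD is higher for the pair from different clades. Like the species count, PD still ignores abundances.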
Alpha diversity
o Simpson’s diversity index (also Shannon, Chao indexes)
◦ Gives less weight to the rarest species
S is the number of species
N is the total number of organisms
n_i is the number of organisms of species i
Whittaker, R.H. (1972). "Evolution and measurement of species diversity". Taxon(International Association for Plant Taxonomy (IAPT)) 21 (2/3): 213–251
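Using the symbols above, one common form of Simpson's index is D = Σ n_i(n_i − 1) / (N(N − 1)), summed over the S species. It is often reported as the diversity 1 − D. The Shannon index is H' = −Σ p_i ln p_i, with p_i = n_i / N. The sketch computes both for two hypothetical samples with the same S = 4. Other variants exist, for example Simpson's index with replacement (Σ p_i²).

```python
import math

def simpson_diversity(counts):
    """1 - D, where D = sum n_i(n_i - 1) / (N(N - 1)) is the probability that two
    organisms drawn without replacement belong to the same species."""
    N = sum(counts)
    D = sum(n * (n - 1) for n in counts) / (N * (N - 1))
    return 1 - D

def shannon(counts):
    """Shannon index H' = -sum p_i ln p_i over species with n_i > 0."""
    N = sum(counts)
    return -sum((n / N) * math.log(n / N) for n in counts if n > 0)

even = [25, 25, 25, 25]   # hypothetical sample: four equally abundant species
uneven = [97, 1, 1, 1]    # hypothetical sample: one dominant species, three singletons
print(round(simpson_diversity(even), 3), round(simpson_diversity(uneven), 3))
print(round(shannon(even), 3), round(shannon(uneven), 3))
```

The singletons contribute n_i(n_i − 1) = 0 to D. This shows why Simpson's index gives little weight to the rarest species, while Shannon still registers them through the p_i ln p_i terms.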
Beta diversity – human microbiome
C Huttenhower et al. Nature 486, 207-214 (2012) doi:10.1038/nature11234
Beta diversity
o Diversity between samples
o UniFrac distance
◦ Phylogeny-based beta diversity
◦ Percentage of observed branch length unique to either sample
Lozupone and Knight, 2005. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71:8228
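The "unique branch length" definition can be written directly as a sketch of unweighted UniFrac. The tree is a small hypothetical one in the same child-to-(parent, length) form used for PD. The distance is the length of branches covered by only one of the two samples, divided by the length covered by either. The weighted (abundance-aware) variant is not shown.

```python
# Hypothetical rooted tree: node -> (parent, length of the branch to the parent).
TREE = {
    "A": ("n1", 1.0), "B": ("n1", 2.0),
    "C": ("n2", 1.5), "D": ("n2", 0.5),
    "n1": ("root", 1.0), "n2": ("root", 3.0),
    "root": (None, 0.0),
}

def covered_branches(tips, tree):
    """Nodes whose branch lies between any observed tip and the root."""
    covered = set()
    for tip in tips:
        node = tip
        while tree[node][0] is not None:
            covered.add(node)
            node = tree[node][0]
    return covered

def unweighted_unifrac(sample_a, sample_b, tree):
    """Unique branch length / total observed branch length (0 = identical, 1 = disjoint)."""
    a = covered_branches(sample_a, tree)
    b = covered_branches(sample_b, tree)
    total = sum(tree[n][1] for n in a | b)
    unique = sum(tree[n][1] for n in a ^ b)  # branches covered by exactly one sample
    return unique / total

print(unweighted_unifrac({"A", "B"}, {"A", "B"}, TREE))
print(round(unweighted_unifrac({"A", "B"}, {"A", "C"}, TREE), 3))
```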
Other useful data representations
Simple bar charts - what species are present?
Other useful data representations
Rarefaction curves - How much of a community have we sampled?
[Figure: rarefaction curves; x-axis: number of sequences, y-axis: number of OTUs]
Adapted from Wooley et al. A Primer on Metagenomics, PLoS Computational Biology, Feb 2010, Vol 6(2)
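A rarefaction curve can be sketched by repeatedly subsampling a sample's reads without replacement at increasing depths and recording how many distinct OTUs appear. The OTU counts below are hypothetical. The depths, number of replicates and fixed random seed are arbitrary choices for illustration.

```python
import random

def rarefaction_curve(reads, depths, replicates=10, seed=0):
    """For each depth, subsample reads without replacement and return
    (depth, mean number of distinct OTUs across replicates)."""
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        observed = [len(set(rng.sample(reads, depth))) for _ in range(replicates)]
        curve.append((depth, sum(observed) / replicates))
    return curve

# Hypothetical sample: every read is labelled with the OTU it was assigned to.
otu_counts = {"OTU1": 500, "OTU2": 300, "OTU3": 150, "OTU4": 40, "OTU5": 9, "OTU6": 1}
reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
curve = rarefaction_curve(reads, [10, 50, 100, 500, 1000])
for depth, mean_otus in curve:
    print(f"{depth} reads -> {mean_otus:.1f} OTUs")
```

The curve rises steeply at low depth and flattens as it approaches the six OTUs present. The single-read OTU6 is found reliably only at full depth. A curve that is still rising suggests the community has not been sampled exhaustively.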
Shotgun whole metagenome
o Unlike 16S, metagenomic sequencing is not targeted to a specific gene; it samples the entire genomic DNA.
o Typically, shorter sequence reads are used to obtain >5 Gb of data per sample.
o HiSeq or NextSeq platforms are typically more cost-effective for metagenomic sequencing.
Shotgun metagenomics
o Pros
◦ No primer bias
◦ Can identify all microbes (e.g. eukaryotes, viruses)
◦ Direct functional profiling
o Cons
◦ More expensive (millions of sequences needed)
◦ Host/site contamination can be significant
◦ May not be able to sequence “rare” microbes
◦ Required computational resources can be restrictive
◦ More complex bioinformatic analyses required
◦ Chimeras, unknown functions
Sequence coverage
[Figure: community complexity, diversity and coverage]
Luis M Rodriguez-R and Konstantinos T Konstantinidis. Estimating coverage in metagenomic data sets and why it matters. ISME J. 2014
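A back-of-envelope sketch shows why sequencing depth matters for rare community members. A genome receives roughly its share of all sequenced bases, so its expected depth is total bases × relative abundance / genome size. Assuming reads land uniformly at random (a Lander-Waterman-style simplification), the fraction of that genome covered at least once is 1 − e^(−depth). The genome size and abundances are hypothetical. This is not the estimator used in the cited paper.

```python
import math

def expected_depth(total_bases, relative_abundance, genome_size):
    """Average depth of one genome: its share of all sequenced bases / its length."""
    return total_bases * relative_abundance / genome_size

def fraction_covered(depth):
    """Fraction of the genome covered at least once, assuming uniform random reads."""
    return 1 - math.exp(-depth)

total_bases = 5e9   # e.g. 5 Gb of sequence for one sample
genome_size = 5e6   # hypothetical 5-Mb bacterial genome
for abundance in (0.10, 0.01, 0.0001):
    depth = expected_depth(total_bases, abundance, genome_size)
    print(f"abundance {abundance:.2%}: depth {depth:.1f}x, "
          f"~{fraction_covered(depth):.1%} of the genome covered")
```

A member at 0.01% abundance gets about 0.1x depth from 5 Gb. That is far too little to assemble, which is one reason "rare" microbes may be missed.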
Metagenomic assembly
Metagenomic assembly
Metagenomic Assembly: Overview, Challenges and Applications. Yale J Biol Med. 2016 Sep; 89(3): 353–362
Metagenomic assembly
o Greedy assembler:
◦ Reads with maximum overlaps are iteratively merged into contigs
o Overlap-Layout-Consensus:
◦ A graph is constructed by finding overlaps between all pairs of reads
o de Bruijn graph:
◦ Reads are chopped into short overlapping segments (k-mers)
◦ K-mers are organized in a de Bruijn graph based on their co-occurrence across reads
◦ The graph is simplified to remove artifacts due to sequencing errors
◦ Branch-less paths are reported as contigs
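The de Bruijn steps above can be sketched in a toy form. Reads are cut into k-mers, and each k-mer becomes an edge between its (k−1)-mer prefix and suffix. Maximal branch-less paths are then spelled out as contigs. The reads are hypothetical. The sketch omits graph simplification, reverse complements and k-mer coverage, all of which real assemblers such as MEGAHIT handle.

```python
from collections import defaultdict

def build_de_bruijn(reads, k):
    """Nodes are (k-1)-mers; every k-mer seen in a read adds an edge prefix -> suffix.
    Edges are kept in a set, so k-mer multiplicity (coverage) is ignored here."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph

def unitigs(graph):
    """Spell out maximal non-branching paths as contigs.
    Isolated cycles made only of 1-in/1-out nodes are not reported in this sketch."""
    indeg = defaultdict(int)
    for dsts in graph.values():
        for d in dsts:
            indeg[d] += 1
    nodes = set(graph) | set(indeg)

    def one_in_one_out(node):
        return indeg[node] == 1 and len(graph.get(node, ())) == 1

    contigs = []
    for start in nodes:
        if one_in_one_out(start):
            continue  # interior node: it is reached from a path start instead
        for first in graph.get(start, ()):
            path, node = start + first[-1], first
            while one_in_one_out(node):
                node = next(iter(graph[node]))
                path += node[-1]
            contigs.append(path)
    return contigs

reads = ["ATGGCGT", "GGCGTGC", "CGTGCA"]  # error-free reads from ATGGCGTGCA
print(sorted(unitigs(build_de_bruijn(reads, k=4))))
reads_with_error = reads + ["GCGTTC"]     # one extra read carrying a sequencing error
print(sorted(unitigs(build_de_bruijn(reads_with_error, k=4))))
```

With error-free reads the graph is one branch-less path, and the whole sequence is recovered as a single contig. A single erroneous read creates a branch that splits the assembly into three contigs. This is why assemblers simplify the graph before reporting contigs.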
de Bruijn graph approach
o Low-abundance genomes may end up fragmented if overall sequencing depth is insufficient to form connections in the graph
◦ Using a short k-mer size
o The assembler must strike a balance between recovering low-abundance genomes and obtaining long, accurate contigs for high-abundance genomes
o Computational time and memory may be insufficient to complete such assemblies
o Multiple k-mer approach
o Spreading the memory load over a cluster of computers
Metagenome assembly tools
John Vollmers, Sandra Wiegand, Anne-Kristin Kaster. Comparing and Evaluating Metagenome Assembly Tools from a Microbiologist’s Perspective - Not Only Size Matters!
What we do with the assembly
o Characterizing the contigs/scaffolds
◦ Mapping statistics
◦ Composition (%GC, codon usage)
◦ Annotation - taxonomy & function assignments
o Binning
o Comparative genomics
o Metabolic pathways
Binning over read mapping
o Partition the metagenome into species
◦ Read coverage (multiple samples)
◦ Composition
Metagenomic Assembly: Overview, Challenges and Applications. Yale J Biol Med. 2016 Sep; 89(3): 353–362
[Table: GC% and read coverage in sample1, sample2 and sample3 for scaffold1 (GC 34%), scaffold2 (GC 33%), scaffold3 (GC 51%) and scaffold4 (GC 50%)]
Binning over read mapping
[Figure: GC% and per-sample read coverage (sample1, sample2, sample3) for scaffold1 to scaffold4, shown as a table and bar chart]
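Binning by coverage and composition relies on one idea: contigs from the same genome should share GC content and rise and fall together in coverage across samples. The sketch uses a simple greedy rule as a stand-in for the statistics in tools such as CONCOCT or MetaBAT. A contig joins the first bin whose seed contig has similar GC and a strongly correlated coverage profile; otherwise it seeds a new bin. The contigs, their values and both thresholds are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def bin_contigs(contigs, min_corr=0.95, max_gc_diff=0.03):
    """Greedy binning on (GC fraction, coverage per sample); thresholds are arbitrary."""
    bins = []  # each bin is a list of contig names; the first one is the seed
    for name, (gc, coverage) in contigs.items():
        for b in bins:
            seed_gc, seed_cov = contigs[b[0]]
            if abs(gc - seed_gc) <= max_gc_diff and pearson(coverage, seed_cov) >= min_corr:
                b.append(name)
                break
        else:
            bins.append([name])
    return bins

# Hypothetical contigs: GC fraction and mean read depth in three samples.
contigs = {
    "contig1": (0.35, [30.0, 5.0, 55.0]),
    "contig2": (0.36, [28.0, 6.0, 52.0]),
    "contig3": (0.52, [12.0, 40.0, 20.0]),
    "contig4": (0.51, [13.0, 42.0, 19.0]),
}
print(bin_contigs(contigs))
```

Coverage across several samples carries most of the signal. Two genomes with similar GC can still be separated if their abundances vary differently between samples.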
Binning contigs
o Completely automated approaches
◦ CONCOCT
◦ GroopM
◦ MetaBAT
o Completeness of metagenome-assembled genomes (MAGs)
◦ Single-copy core genes (tRNA synthetases, ribosomal proteins)
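Completeness can be estimated from single-copy core genes. The sketch counts how many expected markers a bin contains. Markers present more than once suggest contamination from another genome. The marker set and detected hits are hypothetical. Dedicated quality-assessment tools refine this idea considerably, for example with lineage-specific marker sets.

```python
from collections import Counter

def completeness_contamination(marker_hits, marker_set):
    """completeness  = fraction of expected single-copy markers found at least once
    contamination = extra copies beyond the first, relative to the marker-set size"""
    counts = Counter(m for m in marker_hits if m in marker_set)
    found = sum(1 for m in marker_set if counts[m] > 0)
    extra = sum(counts[m] - 1 for m in marker_set if counts[m] > 1)
    return found / len(marker_set), extra / len(marker_set)

# Hypothetical marker set of 10 single-copy genes and the markers detected in one bin.
markers = {f"marker{i}" for i in range(1, 11)}
hits = ["marker1", "marker2", "marker3", "marker4", "marker5",
        "marker6", "marker7", "marker8", "marker8", "marker2"]
comp, cont = completeness_contamination(hits, markers)
print(f"completeness {comp:.0%}, contamination {cont:.0%}")
```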
Gene annotation
o Finding bacterial genes in the contigs/scaffolds
◦ Prodigal
◦ Prokka
o Annotation of the genes
◦ By homology searches (DIAMOND)
◦ Domain finding
o Comparisons
◦ Gene families
◦ Distribution among the samples (CD-HIT)
Functional potential - the annotations indicate the functional potential of the community
They do not confirm biological activity (the genes may not be transcribed and translated)
Common functional databases
o NCBI
o COG
◦ Well known, but an old classification (little updated since its 2003 release)
o Pfam
◦ Focused more on protein domains, based on hidden Markov models
o KEGG
◦ Very popular; each entry is well annotated and often linked into “Modules” or “Pathways”
◦ Full access now requires a license fee
o MetaCyc
◦ Similar to KEGG, but more microbe-focused
o UniRef
◦ Clustering at different levels (e.g. UniRef100, UniRef90, UniRef50)
◦ The most comprehensive, and constantly updated
◦ These gene families are typically less functionally informative
Metagenomic annotation systems
o Web-based
◦ EBI
◦ MG-RAST
o GUI-based
◦ MEGAN
o Local
◦ Kraken
◦ MetAMOS
Post-processing analysis
o Data matrices of samples versus microbial features
◦ Species
◦ Genes
◦ Pathways
o Unsupervised methods
◦ Clustering and correlations
◦ PCA
o Features that differ statistically between sample types
◦ Taxa or functional genes
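A common first step with a samples × features matrix is to convert counts to relative abundances and compute a pairwise dissimilarity matrix. That matrix can then feed clustering or ordination. The sketch uses Bray-Curtis dissimilarity, which is one widely used choice but is not prescribed by the slides. The matrix is hypothetical, and the features could equally be species, genes or pathways.

```python
def relative(row):
    """Scale a row of counts to proportions."""
    total = sum(row)
    return [x / total for x in row]

def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum |u_i - v_i| / sum (u_i + v_i).
    0 = identical profiles, 1 = no shared features."""
    return sum(abs(a - b) for a, b in zip(u, v)) / sum(a + b for a, b in zip(u, v))

# Hypothetical samples x features count matrix.
samples = {
    "S1": [120, 30, 0, 50],
    "S2": [100, 40, 5, 55],
    "S3": [0, 10, 200, 5],
}
profiles = {name: relative(row) for name, row in samples.items()}
names = list(profiles)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(bray_curtis(profiles[a], profiles[b]), 3))
```

S1 and S2 come out close, and S3 is far from both. A clustering or PCoA built on this matrix would group S1 with S2.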
A Review of Bioinformatics Tools for Bio-Prospecting from Metagenomic Sequence Data. Front. Genet., 06 March 2017
Case study: the microbiome of fruit peel
Maria Vetcos, Edoardo Piombo, Shlomit Medina, Shiri Freilich, Samir Droby, Michael Wisniewski
Case study: the microbiome of fruit peel
Read length: 150 bp
Total of 472 million quality reads
Sequencing output: files in FASTQ format
Input (FASTQ): 472 million quality reads, 71 Gbp in total
Assembly: MEGAHIT
Output (FASTA), all contigs / contigs > 2 kb:
Number of contigs: 4,000,000 / 200,000
Average contig length: 820 / 4,600 bp
N50: 980 / 5,000 bp
Total length: 3 Gbp / 1 Gbp
Sample   #raw reads    #clean reads   %clean reads   #PE reads     %mapping vs. filtered set
A1       26,692,151    22,638,404     84.81          45,276,808    75.59
A2       32,550,741    27,819,952     85.47          55,639,904    69.84
A3       24,083,541    20,677,583     85.86          41,355,166    82.77
C1W      29,722,008    25,416,861     85.52          50,833,722    78.32
C2W      24,125,961    20,451,024     84.77          40,902,048    76.01
C3W      24,956,733    21,353,952     85.56          42,707,904    87.48
M1       26,211,005    21,974,866     83.84          43,949,732    66.52
M2       5,640,819     4,765,939      84.49          9,531,878     62.97
M3       6,113,051     5,137,683      84.04          10,275,366    57.24
O1S      23,760,866    19,848,045     83.53          39,696,090    57.85
O2S      28,317,777    23,141,736     81.72          46,283,472    57.22
O3S      28,604,975    22,679,029     79.28          45,358,058    64.43
Total    280,779,628   235,905,074    84.02          471,810,148
                             Full contig set   Contigs > 2 kb
Total number of sequences    3,762,133         206,575
Total number of bp           3,085,995,440     945,480,334
Average sequence length      820.27            4,576.93
N50                          979               4,926
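The statistics in the table above can be computed as follows: number of sequences, total bp, average length and N50, for all contigs and for contigs above a length cut-off. N50 is the contig length L such that contigs of length ≥ L hold at least half of all assembled bases. The contig lengths are hypothetical, and the cut-off is applied as "≥ min_length".

```python
def assembly_stats(lengths, min_length=0):
    """Contig count, total length, mean length and N50 for contigs >= min_length."""
    kept = sorted((n for n in lengths if n >= min_length), reverse=True)
    total = sum(kept)
    running, n50 = 0, 0
    for length in kept:  # walk from the longest contig down
        running += length
        if running * 2 >= total:  # half of all assembled bases reached
            n50 = length
            break
    mean = total / len(kept) if kept else 0.0
    return {"contigs": len(kept), "total_bp": total, "mean_bp": mean, "N50": n50}

# Hypothetical contig lengths (bp).
lengths = [6000, 3000, 2500, 2200, 1500, 1200, 900, 700, 600, 400]
print(assembly_stats(lengths))
print(assembly_stats(lengths, min_length=2000))
```

Dropping short contigs raises both the mean length and the N50. The same pattern appears in the full versus > 2 kb columns of the table.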
Input (FASTA): 200,000 contigs > 2 kb
Gene calling: Prodigal
Output (FASTA): 1,000,000 genes
From sequence to gene: summary
o Raw genomic data: 4 treatments × 3 repeats = 12 libraries, ~45 million reads per library, total of ~472 million quality reads
o Genome/gene assembly (pooled data): ~200,000 contigs with N50 of ~5,000 bp, with 60% of reads mapped
o Gene calling: ~1,000,000 genes
o Annotations: functional and taxonomic annotations
JGI annotation platform
Annotation in MEGAN based on a DIAMOND similarity search
1,000,000 genes → DIAMOND similarity search against NCBI NR → detection of homologs for 75% of genes → condensation into DAA binary format
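The "detection of homologs for 75% of genes" figure is the fraction of genes with at least one hit that passes the filters. The sketch assumes the hits were also exported in DIAMOND's tabular format (`--outfmt 6`, whose default 12 columns are listed in `FIELDS`). It keeps the highest-bitscore hit per gene below an e-value cut-off, then reports that fraction. The hits, IDs and e-value threshold are hypothetical. MEGAN itself reads the DAA file directly.

```python
import csv
import io

# Default columns of DIAMOND/BLAST tabular output (--outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(handle, max_evalue=1e-5):
    """Keep the highest-bitscore hit per query gene, ignoring hits above max_evalue."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        rec = dict(zip(FIELDS, row))
        if float(rec["evalue"]) > max_evalue:
            continue
        q = rec["qseqid"]
        if q not in best or float(rec["bitscore"]) > float(best[q]["bitscore"]):
            best[q] = rec
    return best

# Hypothetical hits (subject IDs are placeholders, not real accessions).
example = io.StringIO(
    "gene1\tsubjA\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-120\t420\n"
    "gene1\tsubjB\t60.0\t280\t112\t2\t5\t285\t3\t282\t1e-60\t210\n"
    "gene2\tsubjC\t40.0\t90\t54\t1\t10\t100\t20\t110\t0.5\t30\n"
    "gene3\tsubjD\t72.5\t150\t41\t0\t1\t150\t1\t150\t1e-50\t190\n"
)
all_genes = ["gene1", "gene2", "gene3", "gene4"]
hits = best_hits(example)
print({q: h["sseqid"] for q, h in hits.items()})
print(f"{len(hits) / len(all_genes):.0%} of genes have a homolog")
```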
MEGAN annotation platform
Input: DAA file → Taxonomy, KEGG and SEED classifications
Output files:
o Taxonomy: TaxonPath, Taxon ID, etc.
o KEGG: KEGGPath, KEGGName, etc.
o SEED: SEEDPath, SEEDName, etc.
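MEGAN assigns a gene to the lowest common ancestor (LCA) of the taxa of its significant hits. The sketch shows only the core LCA idea on root-to-leaf taxonomy paths: the longest prefix shared by all paths. MEGAN's actual LCA assignment has further parameters, such as score and top-percent filters, that are not modelled here. The taxonomy paths are illustrative.

```python
def lca(paths):
    """Lowest common ancestor of root-to-leaf taxonomy paths (longest shared prefix)."""
    shared = []
    for ranks in zip(*paths):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared

# Illustrative taxonomy paths of the top hits for one gene.
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales",
     "Pseudomonadaceae", "Pseudomonas"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Pseudomonadales",
     "Moraxellaceae", "Acinetobacter"],
]
print(" > ".join(lca(hits)))
```

Hits to two different families push the assignment up to the order level. This conservative behaviour is why many genes end up assigned to higher ranks.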
Taxonomic annotations
Krona chart: dynamic representation
MEGAN file - Taxonomy ID
[Interactive chart: assigned_Krona_All.html]
Annotations of most genes on the same contig are consistent
Functional annotations
[Figure panels: SEED, KEGG]
Annotation statistics
Annotation    Genes     Assigned genes   Fraction assigned   % assigned
Taxa          759,353   570,702          0.75                75
Interpro2go   759,353   367,789          0.48                48
Eggnog        759,353   255,892          0.34                34
KEGG*         759,353   187,842          0.25                25
* from SEED 2015 mapping file
Count data
The count data are presented as a table that reports, for each sample, the number of sequence fragments assigned to each gene.
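Such a count table can be built from per-read assignments. The sketch assumes each read has already been assigned to at most one gene, for example by read mapping. It tallies reads per gene and sample and prints a genes × samples table. Sample names, read IDs and genes are hypothetical.

```python
from collections import Counter, defaultdict

def count_table(assignments):
    """Build gene -> Counter({sample: count}) from (sample, read_id, gene) records."""
    table = defaultdict(Counter)
    for sample, _read_id, gene in assignments:
        table[gene][sample] += 1
    return table

# Hypothetical read-to-gene assignments.
assignments = [
    ("sample1", "r1", "gene1"), ("sample1", "r2", "gene1"), ("sample1", "r3", "gene2"),
    ("sample2", "r1", "gene2"), ("sample2", "r2", "gene2"), ("sample2", "r3", "gene3"),
]
table = count_table(assignments)
samples = sorted({s for s, _, _ in assignments})
print("gene\t" + "\t".join(samples))
for gene in sorted(table):
    # Counter returns 0 for samples in which the gene received no reads.
    print(gene + "\t" + "\t".join(str(table[gene][s]) for s in samples))
```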
PCA & correlations
[Figure: sample groups Israel organic, Israel conventional and US conventional]
Pathway   compounds_contig_conventional   compounds_contig_organic   compounds_gene_conventional   compounds_gene_organic
Cutin, suberine and wax biosynthesis 0 5 0 6
Biosynthesis of alkaloids derived from shikimate pathway 0 5 0 4
Drug metabolism - cytochrome P450 0 10 0 9
Glycerophospholipid metabolism 5 0 5 0
Tyrosine metabolism 2 6 2 6
Bisphenol degradation 0 4 0 4
Penicillin and cephalosporin biosynthesis 2 4 2 4
Chlorocyclohexane and chlorobenzene degradation 0 6 0 5
Steroid hormone biosynthesis 10 1 10 1
Inflammatory mediator regulation of TRP channels 3 1 3 0
Isoquinoline alkaloid biosynthesis 0 6 0 6
Arachidonic acid metabolism 17 0 17 0
Aminobenzoate degradation 0 7 0 7
Retinol metabolism 0 6 0 6
Flavonoid biosynthesis 8 0 8 0
Flavone and flavonol biosynthesis 7 1 6 1
Fluorobenzoate degradation 11 0 11 0
Anthocyanin biosynthesis 12 0 12 0
Betalain biosynthesis 8 0 8 0
Steroid biosynthesis 12 0 12 0
Polycyclic aromatic hydrocarbon degradation 0 21 0 21
Porphyrin and chlorophyll metabolism 14 0 14 0
Amino sugar and nucleotide sugar metabolism 0 9 0 9
Biosynthesis of plant secondary metabolites 4 2 4 1
Biosynthesis of type II polyketide products 5 0 5 0
Ubiquinone and other terpenoid-quinone biosynthesis 1 10 1 10
Linoleic acid metabolism 5 0 5 0
Biosynthesis of 12-, 14- and 16-membered macrolides 21 4 21 4
Glycine, serine and threonine metabolism 4 1 4 1
Differential abundance of enzymes in the KEGG metabolic pathway
Thank you