Community Profiling via QIIME
Dorota Porazinska and Zech Xu
University of Colorado Boulder, CO
File Download
• View slides at:
  – http://goo.gl/4duXII
• Raw files:
  – https://app.box.com/s/kwzjd1go2g8cmic59xcd
  – Extract it: tar zxf crawford_mice.tar.gz
• View IPython Notebook
  – http://nbviewer.ipython.org/gist/RNAer/d8e7cbd7b68a273d2269
  – Also inside the downloaded files (requires IPython to open it)
• Processed file:
  – https://app.box.com/s/3a6gvuyn8crjamx7uqte
  – Run: mv output.tar.gz crawford_mice
  – Then run: tar zxf output.tar.gz
Sequencing cost getting cheaper
http://goo.gl/rWW1Ay
Tsunami of sequence data
???
1st vs. NGS technologies
http://www.patrickwardphd.com/wp-content/uploads/2012/05/sprinkler-kids-l.jpg
http://1000awesomethings.com/2011/06/21/218-drinking-from-the-hose/
A classic microbial ecology study
Bacterial Community Variation in Human Body Habitats Across Space and Time, Costello et al., Science 2009
Modified from Hamady et al. Genome Research. 2009
Datasets with billions of sequences:
• Human Microbiome Project: largest characterization of the microbiome of healthy individuals
  – NIH sponsored, $185 million project
  – Samples from 300 adults and 18 body sites
  – Raw data: ~232 GB
Earth Microbiome Project
Coursera Course
https://www.coursera.org/course/microbiome
… accumulating data
• Healthy individual traveling from the US to Bangladesh
• Relatives of Crohn's disease patients
• Patients
• USA, Venezuela, Malawi
• Global gut
• HMP
… so what can we tell from all this work?
http://qiime.org
http://forum.qiime.org
http://blog.qiime.org
Graphical User Interface
Command line
Perform identical operations
Paths (absolute)
/Users/yoshiki/evident-data/hmp-v13_arare/alpha_div
$HOME/evident-data/hmp-v13_arare/alpha_div
~/evident-data/hmp-v13/alpha_div
A slash at the beginning of a path marks it as an absolute path, i.e., one that starts from the root of the file system. $HOME and ~ both expand to your home directory.
Paths (relative)
evident-data/hmp-v13_arare/alpha_div
Relative paths, on the other hand, are not preceded by a slash; they are interpreted from the current working directory.
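The difference between the two kinds of path can be checked with a few lines of Python. This is only a minimal sketch: the `resolve` helper and the working directory `/Users/yoshiki` are assumptions made for illustration, and `posixpath` is used so that the result does not depend on the machine running it.

```python
import posixpath


def resolve(path, cwd):
    """Return the absolute form of `path`.

    Absolute paths are returned (normalized) as they are; relative paths
    are interpreted from `cwd`, the current working directory.
    """
    if posixpath.isabs(path):
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(cwd, path))


# Hypothetical working directory, chosen to match the slide's examples.
cwd = "/Users/yoshiki"

absolute = "/Users/yoshiki/evident-data/hmp-v13_arare/alpha_div"
relative = "evident-data/hmp-v13_arare/alpha_div"

print(posixpath.isabs(absolute))   # leading slash, so absolute
print(posixpath.isabs(relative))   # no leading slash, so relative
print(resolve(relative, cwd))      # relative path joined onto cwd
```

From that working directory both spellings point at the same location; from any other directory only the absolute one would.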
QIIME
QIIME Structure
● Integrates other software
● Set of scripts to perform certain functions
● Allows an easy workflow
● Keys, wallet, phone: print_qiime_config.py
QIIME software dependencies
[data-lanemask] [data-core] [python] [setuptools] [MySQL-python] [SQLAlchemy] [pycogent] [pynast] [numpy] [matplotlib] [mpi4py] [lxml] [sphinx] [raxml] [fasttree] [cdbtools] [chimeraslayer] [cdhit] [rdpclassifier] [blast] [muscle] [infernal] [cytoscape] [clearcut] [mothur] [uclust] [r] [ampliconnoise] [vienna] [pprospector]
Script types
Single task
• One step
• Most of them
Workflows
• Multiple scripts in one
• Uses a log file
• Indicated in the script description
QIIME commands
Get help with the index site
http://qiime.org/genindex.html
Get help with the -h option
pick_otus.py -h
Command names are self-explanatory
• Filtering: filter_fasta.py, filter_otus_by_sample.py, filter_distance_matrix.py
• Sorting: sort_otu_table.py
Getting help
http://qiime.org/genindex.html
These options are required; without them the script will not function correctly.
These arguments are optional; you can use them or not. Some default values are explained here.
QIIME
• The code is tested (properly)
• The documentation is updated constantly based on users' suggestions
• The help in the QIIME forum has a collaborative spirit (developers and users sharing their research experiences)
print_qiime_config.py
QIIME
[Figure: overview of the QIIME workflow, from www.QIIME.org]
• Sequencing output (454, Illumina, Sanger): fastq, fasta, qual, or sff/trace files
• Metadata: mapping file
• Pre-processing: e.g., remove primer(s), demultiplex, quality filter
• Denoise 454 data: PyroNoise, Denoiser
• Pick OTUs and representative sequences
  – Reference based: BLAST, UCLUST, USEARCH
  – De novo: e.g., UCLUST, CD-HIT, MOTHUR, USEARCH
• Assign taxonomy: BLAST, RDP Classifier
• Align sequences: e.g., PyNAST, INFERNAL, MUSCLE, MAFFT
• Build 'OTU table', i.e., sample by observation matrix
• Build phylogenetic tree: e.g., FastTree, RAxML, ClearCut
• Database submission (in development)
• OTU (or other sample by observation) table
• Phylogenetic tree: evolutionary relationship between OTUs
• α-diversity and rarefaction: e.g., Phylogenetic Diversity, Chao1, Observed Species
• β-diversity and rarefaction: e.g., weighted and unweighted UniFrac, Bray-Curtis, Jaccard
• Interactive visualizations: e.g., PCoA plots, distance histograms, taxonomy charts, rarefaction plots, network visualization, jackknifed hierarchical clustering
Legend
• Required step or input; optional step or input
• Currently supported for marker-gene data only (i.e., 'upstream' step)
• Currently supported for general sample by observation data (i.e., 'downstream' step)
Upstream analyses Downstream analyses
QC and split libraries
Building an OTU table
Alpha and Beta diversity
Visualizations
QC and split libraries
Data
Sequences are in FASTA format
Data
• Quality scores are in the .qual file, similar to FASTA
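To make the two file layouts concrete, here is a minimal sketch of how FASTA and .qual records could be read. It is not QIIME's own parser; the function names and the tiny example record (`read_1`) are hypothetical. Both formats use a `>` header line per record; FASTA follows it with bases, .qual with space-separated integer quality scores.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parse_qual(lines):
    """Yield (header, list of integer scores) pairs from .qual-formatted lines."""
    header, scores = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, scores
            header, scores = line[1:], []
        else:
            scores.extend(int(value) for value in line.split())
    if header is not None:
        yield header, scores


# Hypothetical record, written as it would appear in each file.
fasta_lines = [">read_1", "ACGT", "TTGA"]
qual_lines = [">read_1", "40 38 37 35", "30 22 18 12"]

for (name, seq), (_, quals) in zip(parse_fasta(fasta_lines), parse_qual(qual_lines)):
    # One quality score is expected per base.
    print(name, seq, quals, len(seq) == len(quals))
```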
Metadata (mapping file)
validate_mapping_file.py
Split libraries
• Demultiplex
• Quality trim
• Quality filter
split_libraries.py
http://qiime.org/scripts/split_libraries.html
Output files:
• seqs.fna – demultiplexed sequences
• histograms.txt – histogram of read lengths
• split_library_log.txt – detailed information about the demultiplexing and quality of reads
Error-correcting codes allow multiplex sequencing
Micah Hamady, et al., Nature Methods, 2008. Error-correcting barcodes for pyrosequencing hundreds of samples in multiplex.
>GCACCTGAGGACAGGCATGAGGAA…
>GCACCTGAGGACAGGGGAGGAGGA…
>TCACATGAACCTAGGCAGGACGAA…
>CTACCGGAGGACAGGCATGAGGAT…
>TCACATGAACCTAGGCAGGAGGAA…
>GCACCTGAGGACACGCAGGACGAC…
>CTACCGGAGGACAGGCAGGAGGAA…
>CTACCGGAGGACACACAGGAGGAA…
>GAACCTTCACATAGGCAGGAGGAT…
>TCACATGAACCTAGGGGCAAGGAA…
>GCACCTGAGGACAGGCAGGAGGAA…
>PC.634_1 FLP3FBN01ELBSX
CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTTACCCTCTCAGGCCGGCTACGCATCATCGCCTTGGTGGGCCGTTACCTCACCAACTAGCTAATGCGCCGCAGGTCCATCCATGTTCACGCCTTGATGGGCGCTTTAATATACTGAGCATGCGCTCTGTATACCTATCCGGTTTTAGCTACCGTTTCCAGCAGTTATCCCGGACACATGGGCTAGG
>PC.354_3 FLP3FBN01EEWKD
TTGGACCGTGTCTCAGTTCCAATGTGGGGGCCTTCCTCTCAGAACCCCTATCCATCGAAGGCTTGGTGGGCCGTTACCCCGCCAACAACCTAATGGAACGCATCCCCATCGATGACCGAAGTTCTTTAATAGTTCTACCATGCGGAAGAACTATGCCATCGGGTATTAATCTTTCTTTCGAAAGGCTATCCCCGAGTCATCGGCAGGTTGGATACGTGTTACTCACCCGTGCGCCGGT
split_libraries.py
• seqs.fna – demultiplexed sequences
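The idea behind demultiplexing can be sketched as follows: each read starts with a barcode, the barcode identifies the sample, and a small number of sequencing errors in the barcode can be tolerated. This is an illustration only, not the split_libraries.py implementation. The barcodes, sample names, and reads are hypothetical, and nearest-match lookup stands in for the true error-correcting decoding described by Hamady et al.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcode_to_sample, max_mismatches=1):
    """Assign reads to samples by their leading barcode.

    reads: iterable of (read_id, sequence) pairs.
    Returns (assigned, unassigned). Each assigned entry is
    (label, read_id, sequence_without_barcode), where the label is
    "<sample>_<running counter>", similar in style to the seqs.fna headers.
    """
    barcode_len = len(next(iter(barcode_to_sample)))
    assigned, unassigned = [], []
    counter = 0
    for read_id, seq in reads:
        prefix = seq[:barcode_len]
        # Keep every barcode within the allowed number of mismatches.
        hits = [bc for bc in barcode_to_sample
                if len(prefix) == barcode_len and hamming(prefix, bc) <= max_mismatches]
        if len(hits) != 1:
            # No match, or an ambiguous one: leave the read unassigned.
            unassigned.append(read_id)
            continue
        counter += 1
        label = "%s_%d" % (barcode_to_sample[hits[0]], counter)
        assigned.append((label, read_id, seq[barcode_len:]))
    return assigned, unassigned


# Hypothetical 4-nt barcodes that differ at every position.
barcodes = {"AACC": "sampleA", "GGTT": "sampleB"}
reads = [
    ("r1", "AACCTTTT"),   # exact barcode for sampleA
    ("r2", "GGTAACGT"),   # one error in the sampleB barcode
    ("r3", "CCCCAAAA"),   # too far from both barcodes
]
assigned, unassigned = demultiplex(reads, barcodes)
print(assigned)
print(unassigned)
```

Because the two example barcodes differ at all four positions, a single error can never make a read equally close to both, which is the property that error-correcting barcode sets are designed to guarantee.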
Building an OTU table
OTU Picking - “de novo”
• Pros
  – Vast majority of reads are clustered
  – No reference database bias
• Cons
  – Speed; not easily parallelizable
  – Erroneous reads get clustered
[Figure: experimental sequences are passed to a clustering algorithm, which groups them into clustered sequences labeled OTU1, OTU2, and OTU3]
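A rough sketch of de novo picking, assuming a simple greedy centroid scheme: each read joins the first existing cluster whose centroid it matches at or above a similarity threshold, and otherwise seeds a new OTU. This is not the UCLUST algorithm. The position-by-position `identity` function assumes equal-length, already aligned reads, and the 0.97 threshold is an assumed default for the example. The three sequences are the ones shown on the slide.

```python
def identity(a, b):
    """Fraction of matching positions (assumes equal-length, aligned reads)."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_otus_de_novo(seqs, threshold=0.97):
    """Greedy centroid clustering of reads, in input order.

    Returns (centroids, clusters): clusters[i] lists the indices of the
    reads assigned to the OTU whose centroid is centroids[i].
    """
    centroids, clusters = [], []
    for idx, seq in enumerate(seqs):
        for otu, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[otu].append(idx)
                break
        else:
            # No existing centroid is similar enough: start a new OTU.
            centroids.append(seq)
            clusters.append([idx])
    return centroids, clusters


s1 = "CTGGGCCGTGTCTCAGTCCCAA"
s2 = "TTGGAAGATGTCTCAGTTCCAG"
s3 = "TTGGGCCGTATGTCAGTCCCTA"
reads = [s1, s2, s3, s1, s2, s3]

centroids, clusters = pick_otus_de_novo(reads)
for number, members in enumerate(clusters, start=1):
    print("OTU%d" % number, members)
```

The sketch also shows where the cons come from: every read is compared against all centroids found so far and the result depends on input order, so the work is hard to split across processors, and an erroneous read that matches nothing still becomes its own OTU.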
OTU Picking - “closed-reference”
• Pros
  – Reference database is a quality filter
  – Speed; easily parallelizable
• Cons
  – No new OTUs can be observed
  – Reference database bias
[Figure: experimental sequences are compared against the reference sequences; sequences that hit a reference are assigned to that reference's OTU (OTU1 in the example), while sequences that failed to hit are set aside]
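Closed-reference picking can be sketched in the same spirit: every read is compared only against the reference sequences, so reads are independent of one another. As before this is an illustration under stated assumptions, not the QIIME implementation: identity is computed position by position on equal-length sequences, the reference ID `ref1` and the read IDs are hypothetical, and 0.97 is an assumed threshold.

```python
def identity(a, b):
    """Fraction of matching positions (assumes equal-length, aligned reads)."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pick_otus_closed_reference(reads, reference, threshold=0.97):
    """Assign each read to its best-matching reference sequence.

    reads: iterable of (read_id, sequence); reference: dict ref_id -> sequence.
    Returns (hits, failures): hits maps ref_id -> list of read_ids, and
    failures lists the reads that matched no reference well enough.
    """
    hits, failures = {}, []
    for read_id, seq in reads:
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in reference.items():
            score = identity(seq, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        if best_id is not None and best_score >= threshold:
            hits.setdefault(best_id, []).append(read_id)
        else:
            failures.append(read_id)
    return hits, failures


s1 = "CTGGGCCGTGTCTCAGTCCCAA"
s2 = "TTGGAAGATGTCTCAGTTCCAG"
s3 = "TTGGGCCGTATGTCAGTCCCTA"
reads = [("r1", s1), ("r2", s2), ("r3", s3), ("r4", s1), ("r5", s2), ("r6", s3)]
reference = {"ref1": s1}   # hypothetical one-sequence reference collection

hits, failures = pick_otus_closed_reference(reads, reference)
print(hits)
print(failures)
```

Since each read is handled on its own, the read list can be split into chunks and processed in parallel, which is the speed advantage listed above. The `failures` list is what a closed-reference run leaves out of the OTU table.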
Reference database
Percentage of reads that do not hit the reference collection, by environment type.
Other databases
• http://www.arb-silva.de
• http://qiime.org/home_static/dataFiles.html
• http://ssu-rrna.org
OTU Picking - “open-reference”
• Pros
  – Best of both worlds
• Cons
  – Downsides of de novo
[Figure: experimental sequences are first compared against the reference sequences; sequences that hit a reference are assigned to reference OTUs, and sequences that failed to hit are passed to a clustering algorithm, giving the OTUs labeled OTU1 to OTU6 in the example]
pick_open_reference_otus.py
• http://qiime.org/scripts/pick_open_reference_otus.html
• Workflow script; performs all steps through building an OTU table (see the log file)
  – pick_otus.py: determine the OTU clusters
  – pick_rep_set.py: pick the representative sequence for each OTU cluster
  – align_seqs.py: align the sequences to a template or other reference alignment
  – assign_taxonomy.py: assign a taxonomy to the representative sequences
  – filter_alignment.py: remove positions that are not phylogenetically informative
  – make_phylogeny.py: construct a phylogeny from an alignment
  – make_otu_table.py: construct the actual OTU table object
QIIME parameters
• http://qiime.org/documentation/qiime_parameters_files.html
• Modify the default behavior of a workflow script
• Blank lines and those starting with '#' are ignored
• Format
  – script:parameter value
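The format rules above are simple enough to express in a few lines. The following is a sketch of a reader for such a file, not QIIME's own parser; the function name and the example entries are chosen for illustration only.

```python
def parse_parameters(lines):
    """Parse 'script:parameter value' lines into {script: {parameter: value}}.

    Blank lines and lines starting with '#' are ignored, as described above.
    A parameter given without a value is stored as None.
    """
    params = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)          # key, then the rest of the line
        key = fields[0]
        value = fields[1].strip() if len(fields) > 1 else None
        script, sep, parameter = key.partition(":")
        if not sep or not parameter:
            raise ValueError("expected 'script:parameter value', got %r" % line)
        params.setdefault(script, {})[parameter] = value
    return params


# Illustrative file content; the entries are examples, not recommendations.
example = [
    "# OTU picking settings",
    "pick_otus:similarity 0.97",
    "",
    "beta_diversity:metrics bray_curtis",
]
print(parse_parameters(example))
```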
OTU Table in BIOM format
• Optimized and efficient data abstraction
• Can be used with many types of data, but to make it Excel 'readable' use: biom convert
biom convert
• http://biom-format.org
• Converts the BIOM format OTU table to an Excel-readable (tab-delimited text) format
• biom convert -i otu_table_mc2_w_tax.biom -o otu_table.txt -b
OTU table sample identifiers
Taxonomic Assignment
• Kingdom
• Phylum
• Class
• Order
• Family
• Genus
• Species
Sequence 16S gene and compare to 16S database with taxonomic assignments
Taxonomic Assignment using e.g. UCLUST
[Figure: experimental sequences are compared against reference sequences that carry taxonomic assignments]
Biom summary
• Basic statistics on the OTU table
  – Number of samples, OTUs, and sequences in OTUs
  – Sequences per sample
  – Useful to determine values to use in downstream analyses
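The kind of summary described here can be sketched on a tab-delimited table. The sketch assumes the text layout has a header line beginning with `#OTU ID` followed by the sample identifiers, and one row per OTU; the function names are hypothetical. The counts are the small example table shown later in the beta diversity slide.

```python
def parse_classic_table(lines):
    """Read a tab-delimited OTU table into (sample_ids, {otu_id: [counts]})."""
    sample_ids, counts = [], {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        if line.startswith("#OTU ID"):
            sample_ids = fields[1:]
        elif line.startswith("#"):
            continue                      # other comment lines
        else:
            # Only read as many columns as there are samples, so a trailing
            # taxonomy column (if present) is ignored.
            counts[fields[0]] = [float(x) for x in fields[1:1 + len(sample_ids)]]
    return sample_ids, counts


def summarize(sample_ids, counts):
    """Basic statistics: table size and number of sequences per sample."""
    per_sample = {
        sample: sum(row[i] for row in counts.values())
        for i, sample in enumerate(sample_ids)
    }
    return {
        "num_samples": len(sample_ids),
        "num_otus": len(counts),
        "total_sequences": sum(per_sample.values()),
        "per_sample": per_sample,
        "min_per_sample": min(per_sample.values(), default=0),
    }


table_lines = [
    "\t".join(["#OTU ID", "orange1", "orange2", "blue1"]),
    "\t".join(["OTU1", "4", "4", "0"]),
    "\t".join(["OTU2", "4", "4", "0"]),
    "\t".join(["OTU3", "0", "1", "7"]),
    "\t".join(["OTU4", "0", "0", "7"]),
]
sample_ids, counts = parse_classic_table(table_lines)
print(summarize(sample_ids, counts))
```

The smallest per-sample count is the number to look at when choosing an even sampling depth for the diversity analyses: samples with fewer sequences than the chosen depth cannot be subsampled to it.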
Alpha and Beta diversity
How do we describe and compare diversity?
• α Diversity:
  – “How many species (taxa) are in a sample?”
    • e.g., 6 colors in A and 6 in B
    • Are polluted environments less diverse than pristine ones?
• β Diversity:
  – “How many species are shared between samples?”
    • e.g., 2 shared colors between A and B
    • Do the microbiota differ among different disease states?
[Figure: samples A and B, with species shown as colors]
Qualitative vs. Quantitative measures
• Qualitative: considers presence/absence
  – α: How many species are in a sample?
    • e.g., 6 species (colors) in both A and B.
  – β: How many species are shared between samples?
    • e.g., A and B are identical because the same colors are present in both.
• Quantitative: considers abundance
  – α: Accounts for distribution:
    • e.g., in B, 6 species are evenly distributed, so the community is more diverse than in A, where 1 species dominates over the other 5.
  – β: Samples are considered more similar if their distributions of species are similar.
    • e.g., B and A no longer look identical because of differences in abundance.
[Figure: samples A and B, with species shown as colors]
What is a phylogenetic diversity measure?
• α Diversity:
  – Taxon: “How many species are in a sample?”
  – Phylogenetic: “How much phylogenetic divergence is in a sample?”
    • e.g., B is more diverse than A: more divergent colors
• β Diversity:
  – Taxon: “How many species are shared between samples?”
  – Phylogenetic: “How much phylogenetic distance is shared between samples?”
    • only related colors from B are in A
[Figure: samples A and B, with species shown as colors]
UniFrac distance matrix
core_diversity_analyses.py
• Workflow script
  – filter_samples_from_otu_table.py: filter samples with low sequence count from the table
  – single_rarefaction.py: sample the table at a specified sequencing depth
  – beta_diversity.py: use the sampled table for beta diversity calculation
  – principal_coordinates.py: perform PCoA analysis
  – make_emperor.py: make plots for principal coordinates
  – multiple_rarefactions.py: make multiple subsamplings/rarefactions on an OTU table at various sequencing depths
  – alpha_diversity.py and collate_alpha.py: calculate alpha diversities at those depths and collate them
  – make_rarefaction_plots.py: plot the rarefaction curves
  – summarize_taxa.py and plot_taxa_summary.py: summarize taxa and plot them
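The rarefaction step in this workflow amounts to drawing the same number of sequences from every sample, without replacement. A minimal sketch of that idea follows; it is not the single_rarefaction.py code, and the function name, the fixed seed, and the example counts (taken from the small table in the beta diversity slide) are assumptions for illustration.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample one sample's OTU counts to exactly `depth` sequences.

    counts: dict mapping OTU id -> integer count for a single sample.
    Returns a new dict of subsampled counts, or None when the sample has
    fewer than `depth` sequences (such samples are dropped).
    """
    if sum(counts.values()) < depth:
        return None
    # One entry per observed sequence, labeled by its OTU.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    picked = random.Random(seed).sample(pool, depth)   # without replacement
    subsampled = {}
    for otu in picked:
        subsampled[otu] = subsampled.get(otu, 0) + 1
    return subsampled


orange2 = {"OTU1": 4, "OTU2": 4, "OTU3": 1}   # 9 sequences in total
blue1 = {"OTU3": 7, "OTU4": 7}                # 14 sequences in total

print(rarefy(orange2, depth=8))    # 8 of the 9 sequences are kept
print(rarefy(blue1, depth=8))      # 8 of the 14 sequences are kept
print(rarefy(orange2, depth=10))   # too shallow for this depth
```

Repeating the draw at several depths and with several random seeds, then computing alpha diversity on each result, gives the points of a rarefaction curve.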
Alpha diversity
Basic alpha diversity measure: count the number of OTUs. Other measures can be:
• phylogenetic (PD)
• estimators (chao1)
• other statistics (evenness)
• …
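Three of the non-phylogenetic measures can be written down directly. The sketch below uses the bias-corrected form of Chao1, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs seen exactly once and twice, and the Shannon index with Pielou's evenness as the evenness statistic. The two count vectors are hypothetical, chosen to echo the A/B example: the same six species, evenly distributed in B and dominated by one species in A.

```python
import math


def observed_otus(counts):
    """Number of OTUs with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate."""
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed_otus(counts) + singletons * (singletons - 1) / (2.0 * (doubletons + 1))


def shannon(counts):
    """Shannon index in bits: -sum(p * log2(p)) over observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def evenness(counts):
    """Pielou's evenness: Shannon index divided by its maximum, log2(S)."""
    richness = observed_otus(counts)
    if richness < 2:
        return 0.0
    return shannon(counts) / math.log2(richness)


sample_a = [25, 1, 1, 1, 1, 1]   # hypothetical: one dominant species
sample_b = [5, 5, 5, 5, 5, 5]    # hypothetical: evenly distributed

for name, sample in (("A", sample_a), ("B", sample_b)):
    print(name, observed_otus(sample), chao1(sample),
          round(shannon(sample), 3), round(evenness(sample), 3))
```

Both samples have the same observed count, but the quantitative measures separate them: B has the higher Shannon index and evenness, while the many singletons in A push its Chao1 estimate well above the observed count.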
Beta diversity
        orange1  orange2  blue1
OTU1    4        4        0
OTU2    4        4        0
OTU3    0        1        7
OTU4    0        0        7
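As a worked example on this table, two of the non-phylogenetic metrics named in the workflow overview can be computed by hand: Bray-Curtis (quantitative) and Jaccard (qualitative, presence/absence). UniFrac is not shown because it also needs the phylogenetic tree. The function names are chosen for this sketch.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    denominator = sum(x + y for x, y in zip(a, b))
    if denominator == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


def jaccard(a, b):
    """Jaccard distance on presence/absence: 1 - |shared| / |union|."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)


# Columns of the table above, in the order OTU1..OTU4.
samples = {
    "orange1": [4, 4, 0, 0],
    "orange2": [4, 4, 1, 0],
    "blue1": [0, 0, 7, 7],
}

names = list(samples)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        print(first, second,
              round(bray_curtis(samples[first], samples[second]), 3),
              round(jaccard(samples[first], samples[second]), 3))
```

The two orange samples come out close to each other (Bray-Curtis 1/17), while orange1 and blue1 share no OTUs and sit at the maximum distance of 1 under both metrics. The full set of pairwise values forms the distance matrix that PCoA takes as input.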
Summarize Taxa
• Calculates the proportion of taxa per sample, at different taxonomic levels
• summarize_taxa_through_plots.py
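What "proportion of taxa per sample at a taxonomic level" means can be sketched as follows: OTU counts are collapsed by the first few ranks of their taxonomy, then divided by each sample's total. This is an illustration, not the summarize_taxa.py code. The counts are the small example table from the beta diversity slide, and the taxonomic assignments are hypothetical.

```python
def summarize_taxa(counts, taxonomy, level, n_samples):
    """Collapse OTU counts to a taxonomic level and convert to proportions.

    counts: dict otu_id -> list of per-sample counts.
    taxonomy: dict otu_id -> list of ranks (kingdom first).
    level: number of ranks to keep (1 = kingdom, 2 = phylum, ...).
    Returns dict taxon -> list of per-sample relative abundances.
    """
    totals = [0.0] * n_samples
    collapsed = {}
    for otu, row in counts.items():
        taxon = ";".join(taxonomy[otu][:level])
        summed = collapsed.setdefault(taxon, [0.0] * n_samples)
        for i, n in enumerate(row):
            summed[i] += n
            totals[i] += n
    return {
        taxon: [value / totals[i] if totals[i] else 0.0
                for i, value in enumerate(values)]
        for taxon, values in collapsed.items()
    }


# Sample order: orange1, orange2, blue1.
counts = {
    "OTU1": [4, 4, 0],
    "OTU2": [4, 4, 0],
    "OTU3": [0, 1, 7],
    "OTU4": [0, 0, 7],
}
# Hypothetical assignments, for illustration only.
taxonomy = {
    "OTU1": ["Bacteria", "Firmicutes"],
    "OTU2": ["Bacteria", "Firmicutes"],
    "OTU3": ["Bacteria", "Bacteroidetes"],
    "OTU4": ["Bacteria", "Bacteroidetes"],
}

phylum_level = summarize_taxa(counts, taxonomy, level=2, n_samples=3)
for taxon, proportions in phylum_level.items():
    print(taxon, [round(p, 3) for p in proportions])
```

Within each sample the proportions sum to 1, which is what makes the resulting bar or area charts comparable across samples with different sequencing depths.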
Taxa Summarized by Category
Procrustes Analysis
http://qiime.org/tutorials/procrustes_analysis.html
transform_coordinate_matrices.py
compare_3d_plots.py
Muegge, B. D. et al. Science 332, 970–974 (2011).
Statistically Different?
• group_significance.py
• Parametric
  – G-test
  – ANOVA
  – T-test
• Non-parametric
  – Kruskal-Wallis
  – Mann-Whitney U
  – Bootstrap Mann-Whitney U
  – Bootstrap T-test
• compare_categories.py
• make_distance_boxplots.py
• …
Acknowledgements
Rob Knight, Antonio Gonzalez, Meg Pirrung, Adam Robbins-Pianka, Luke Ursell, Tony Walters, Doug Wendel, Daniel McDonald, Yoshiki Vázquez Baeza, Will Van Treuren, Laura Wegener Parfrey, Kris Mayer
Merete Eggesbo, Jessica Metcalf, Ulla Westermann, Zhenjiang Zech Xu, Jose Navas, Chris Lauber, Matt Gebert, Greg C Humphrey, Hongwei Zhou
Rick Stevens (Argonne), Jack Gilbert (Argonne), Folker Meyer (Argonne), Janet Jansson (LBNL), Jed Fuhrman (USC), Jonathan Eisen (UC Davis), many, many sample donors.
Other collaborators: Noah Fierer (CU, EEB), Jeff Gordon (Wash U), Ruth Ley (Cornell), Peter Turnbaugh (Harvard), Maria Gloria Dominguez (UPR), Catherine Lozupone (CU) ...