Esa 2014 qiime

Post on 20-Jun-2015


Community Profiling via QIIME

Dorota Porazinska and Zech Xu

University of Colorado Boulder, CO

 

File Download

• View slides at:
  – http://goo.gl/4duXII
• Raw files:
  – https://app.box.com/s/kwzjd1go2g8cmic59xcd
  – Extract it: tar zxf crawford_mice.tar.gz
• View IPython Notebook
  – http://nbviewer.ipython.org/gist/RNAer/d8e7cbd7b68a273d2269
  – Also inside the downloaded files (requires IPython to open it)
• Processed file:
  – https://app.box.com/s/3a6gvuyn8crjamx7uqte
  – Run:
    mv output.tar.gz crawford_mice
    tar zxf output.tar.gz

Sequencing cost getting cheaper

http://goo.gl/rWW1Ay

Tsunami  of  sequence  data  

???

1st  vs.  NGS  technologies    

http://www.patrickwardphd.com/wp-content/uploads/2012/05/sprinkler-kids-l.jpg
http://1000awesomethings.com/2011/06/21/218-drinking-from-the-hose/

A  classic  microbial  ecology  study  

Bacterial Community Variation in Human Body Habitats Across Space and Time, Costello et al., Science 2009

Modified  from  Hamady  et  al.  Genome  Research.  2009  

Datasets with billions of sequences:

• Human Microbiome Project: Largest characterization of the microbiome of healthy individuals
  – NIH sponsored, $185 million project
  – Samples from 300 adults and 18 body sites
  – Raw data: ~232 GB
• Earth Microbiome Project

Coursera  Course  

https://www.coursera.org/course/microbiome

… accumulating data

• Healthy individual traveling from the US to Bangladesh
• Relatives of Crohn's disease patients
• Patients
• Global gut: USA, Venezuela, Malawi
• HMP

… so what can we tell from all this work?

http://qiime.org
http://forum.qiime.org
http://blog.qiime.org

Graphical  User  Interface  

Command  line  

Perform identical operations

Paths (absolute)

/Users/yoshiki/evident-data/hmp-v13_arare/alpha_div
$HOME/evident-data/hmp-v13_arare/alpha_div
~/evident-data/hmp-v13/alpha_div

A slash at the beginning of a path marks it as an absolute path, i.e. one that starts from the base of your hard drive. $HOME and ~ expand to your home directory, so the last two forms are absolute as well.

Paths (relative)

evident-data/hmp-v13_arare/alpha_div

Relative paths, on the other hand, are not preceded by a slash; they are resolved from the current working directory.
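The distinction can be checked from Python's standard library. This is only a small sketch using the example paths from the slide (the user name and directories are the slide's own examples and need not exist on your machine); it assumes a Unix-like system such as Linux or macOS.

```python
import os.path

# Example paths copied from the slide; they do not have to exist.
absolute = "/Users/yoshiki/evident-data/hmp-v13_arare/alpha_div"
with_tilde = "~/evident-data/hmp-v13/alpha_div"
relative = "evident-data/hmp-v13_arare/alpha_div"

print(os.path.isabs(absolute))    # leading slash: absolute
print(os.path.isabs(relative))    # no leading slash: relative

# "~" is shorthand that the shell (or expanduser) replaces with the home directory.
expanded = os.path.expanduser(with_tilde)
print(expanded)

# A relative path is interpreted starting from the current working directory.
resolved = os.path.abspath(relative)
print(resolved)
```

Running the same script from two different directories prints two different values for `resolved`, which is why relative paths in commands depend on where you run them.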

QIIME  

QIIME  Structure  

● Integrates other software
● Set of scripts to perform certain functions
● Allows an easy workflow
● Keys, wallet, phone: print_qiime_config.py

 

QIIME software dependencies

[data-lanemask] [data-core] [python] [setuptools] [MySQL-python] [SQLAlchemy] [pycogent] [pynast] [numpy] [matplotlib] [mpi4py] [lxml] [sphinx] [raxml] [fasttree] [cdbtools] [chimeraslayer] [cdhit] [rdpclassifier] [blast] [muscle] [infernal] [cytoscape] [clearcut] [mothur] [uclust] [r] [ampliconnoise] [vienna] [pprospector]

Script  types  

Single Task
• One step
• Most of them

Workflows
• Multiple scripts in one
• Uses a log file
• Indicated in the script description

QIIME  commands  

Get help with the index site: http://qiime.org/genindex.html
Get help with the -h option: pick_otus.py -h

Command names are self-explanatory
• Filtering: filter_fasta.py, filter_otus_by_sample.py, filter_distance_matrix.py
• Sorting: sort_otu_table.py

Getting help

http://qiime.org/genindex.html

These options are required; without them the script will not function correctly.

These arguments are optional: you can use them or leave them out. Some default values are explained here.
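To make the difference between required and optional arguments concrete, here is a minimal sketch of a command-line script written with Python's argparse module. The script and its option names are hypothetical and only loosely modeled on an OTU-picking command; this is not how QIIME itself defines its options.

```python
import argparse

def build_parser():
    # Hypothetical options for a toy script (an assumption for illustration).
    parser = argparse.ArgumentParser(description="Toy OTU-picking script")
    # Required: the parser stops with an error message if -i is missing.
    parser.add_argument("-i", "--input_fp", required=True,
                        help="input FASTA file [REQUIRED]")
    # Optional: when -s is left out, the default value is used.
    parser.add_argument("-s", "--similarity", type=float, default=0.97,
                        help="sequence similarity threshold [default: 0.97]")
    return parser

parser = build_parser()
args = parser.parse_args(["-i", "seqs.fna"])
print(args.input_fp, args.similarity)
```

Calling the parser with `-h` prints the help text, listing which options are required and what the defaults of the optional ones are.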

QIIME  

• The code is tested (properly)
• The documentation is updated constantly based on users' suggestions
• The help in the QIIME forum has a collaborative spirit (developers & users sharing their research experiences)

print_qiime_config.py  

QIIME  

• Sequencing output (454, Illumina, Sanger): fastq, fasta, qual, or sff/trace files
• Metadata: mapping file
• Pre-processing: e.g., remove primer(s), demultiplex, quality filter
• Denoise 454 data: PyroNoise, Denoiser
• Pick OTUs and representative sequences
  – Reference based: BLAST, UCLUST, USEARCH
  – De novo: e.g., UCLUST, CD-HIT, MOTHUR, USEARCH
• Assign taxonomy: BLAST, RDP Classifier
• Align sequences: e.g., PyNAST, INFERNAL, MUSCLE, MAFFT
• Build 'OTU table': i.e., sample by observation matrix
• Build phylogenetic tree: e.g., FastTree, RAxML, ClearCut
• Database submission (in development)
• OTU (or other sample by observation) table
• Phylogenetic tree: evolutionary relationship between OTUs
• α-diversity and rarefaction: e.g., Phylogenetic Diversity, Chao1, Observed Species
• β-diversity and rarefaction: e.g., weighted and unweighted UniFrac, Bray-Curtis, Jaccard
• Interactive visualizations: e.g., PCoA plots, distance histograms, taxonomy charts, rarefaction plots, network visualization, jackknifed hierarchical clustering

Legend
• Required step or input
• Optional step or input
• Currently supported for marker-gene data only (i.e., 'upstream' step)
• Currently supported for general sample by observation data (i.e., 'downstream' step)

www.QIIME.org

Upstream  analyses     Downstream  analyses    

QC  and  split  libraries  

Building  an  OTU  table  

Alpha  and  Beta  diversity  

Visualizations

QC  and  split  libraries  

Data  

Sequences  are  in  FASTA  format  

Data  

•  Quality  scores  are  in  the  .qual  file,  similar  to  FASTA  
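Both file types share the same layout: a header line starting with ">" followed by the data for that read. The sketch below parses both with one helper; the two toy records are made up for illustration and are not taken from the tutorial data set.

```python
def parse_records(lines):
    """Yield (header, body_lines) from FASTA-like text; '>' starts a record."""
    header, body = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, body
            header, body = line[1:], []
        else:
            body.append(line)
    if header is not None:
        yield header, body

# Made-up toy records (an assumption, not real data).
fasta_text = """>read1
ACGT
ACGA
>read2
TTGG
"""
qual_text = """>read1
40 40 38 37
36 35 30 20
>read2
40 39 38 12
"""

# FASTA bodies are bases; .qual bodies are space-separated integer scores.
seqs = {h: "".join(b) for h, b in parse_records(fasta_text.splitlines())}
quals = {h: [int(q) for row in b for q in row.split()]
         for h, b in parse_records(qual_text.splitlines())}

for name in seqs:
    # Each base has exactly one quality score.
    print(name, seqs[name], quals[name])
```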

Metadata  (mapping  file)  

validate_mapping_file.py  

Split  libraries  

• Demultiplex
• Quality trim
• Quality filter

split_libraries.py http://qiime.org/scripts/split_libraries.html

Output files:
• seqs.fna – demultiplexed sequences
• histograms.txt – histogram of read lengths
• split_library_log.txt – detailed information about the demultiplexing and quality of reads

Error-correcting codes allow multiplex sequencing

Micah Hamady, et al., Nature Methods, 2008. Error-correcting barcodes for pyrosequencing hundreds of samples in multiplex.

>GCACCTGAGGACAGGCATGAGGAA…
>GCACCTGAGGACAGGGGAGGAGGA…
>TCACATGAACCTAGGCAGGACGAA…
>CTACCGGAGGACAGGCATGAGGAT…
>TCACATGAACCTAGGCAGGAGGAA…
>GCACCTGAGGACACGCAGGACGAC…
>CTACCGGAGGACAGGCAGGAGGAA…
>CTACCGGAGGACACACAGGAGGAA…
>GAACCTTCACATAGGCAGGAGGAT…
>TCACATGAACCTAGGGGCAAGGAA…
>GCACCTGAGGACAGGCAGGAGGAA…

>PC.634_1 FLP3FBN01ELBSX
CTGGGCCGTGTCTCAGTCCCAATGTGGCCGTTTACCCTCTCAGGCCGGCTACGCATCATCGCCTTGGTGGGCCGTTACCTCACCAACTAGCTAATGCGCCGCAGGTCCATCCATGTTCACGCCTTGATGGGCGCTTTAATATACTGAGCATGCGCTCTGTATACCTATCCGGTTTTAGCTACCGTTTCCAGCAGTTATCCCGGACACATGGGCTAGG
>PC.354_3 FLP3FBN01EEWKD
TTGGACCGTGTCTCAGTTCCAATGTGGGGGCCTTCCTCTCAGAACCCCTATCCATCGAAGGCTTGGTGGGCCGTTACCCCGCCAACAACCTAATGGAACGCATCCCCATCGATGACCGAAGTTCTTTAATAGTTCTACCATGCGGAAGAACTATGCCATCGGGTATTAATCTTTCTTTCGAAAGGCTATCCCCGAGTCATCGGCAGGTTGGATACGTGTTACTCACCCGTGCGCCGGT

split_libraries.py  

• seqs.fna – demultiplexed sequences
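The idea behind demultiplexing can be sketched in a few lines: look at the barcode at the start of each read, find the sample it belongs to, and relabel the read as SampleID_N. The barcodes, read names, and barcode length below are made up for illustration, and the sketch only accepts exact barcode matches; it does not implement the error correction or quality filtering that split_libraries.py performs.

```python
def demultiplex(reads, barcode_to_sample, barcode_len):
    """Assign reads to samples by exact match on the leading barcode.

    reads: list of (read_id, sequence) pairs.
    Returns (labeled, unassigned); labeled holds (new_label, trimmed_sequence).
    """
    labeled, unassigned = [], []
    counter = 0
    for read_id, seq in reads:
        barcode, rest = seq[:barcode_len], seq[barcode_len:]
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            unassigned.append(read_id)   # barcode not in the mapping
            continue
        counter += 1                     # running number across all samples
        labeled.append((f"{sample}_{counter} {read_id}", rest))
    return labeled, unassigned

# Hypothetical 4-nt barcodes and reads (assumptions for illustration).
barcodes = {"ACGT": "PC.634", "TGCA": "PC.354"}
reads = [("r1", "ACGTCTGGGCC"), ("r2", "TGCATTGGACC"), ("r3", "GGGGAAAA")]

labeled, unassigned = demultiplex(reads, barcodes, barcode_len=4)
for label, seq in labeled:
    print(">" + label)
    print(seq)
```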

Building  an  OTU  table  

OTU Picking - “de-novo”

• Pros
  – Vast majority of reads are clustered
  – No reference database bias
• Cons
  – Speed; not easily parallelizable
  – Erroneous reads get clustered

[Figure: experimental sequences are fed to a clustering algorithm, and the clustered sequences form the OTUs (OTU1, OTU2, OTU3).]
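A toy version of de novo clustering helps to see why no reference is needed and why the reads must be processed one after another. The sketch below is a simple greedy scheme with a naive position-by-position identity; it is an illustration only and not the algorithm used by UCLUST or the other tools. The reads are the three example sequences from the figure, each seen twice.

```python
def identity(a, b):
    """Fraction of matching positions; assumes ungapped reads of equal length."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

def pick_otus_de_novo(seqs, threshold=0.97):
    """Greedy clustering: a read joins the first centroid it matches at the
    threshold; otherwise it becomes the centroid of a new OTU."""
    centroids, clusters = [], []
    for seq in seqs:
        for i, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[i].append(seq)
                break
        else:
            centroids.append(seq)
            clusters.append([seq])
    return clusters

reads = [
    "CTGGGCCGTGTCTCAGTCCCAA", "TTGGAAGATGTCTCAGTTCCAG", "TTGGGCCGTATGTCAGTCCCTA",
    "CTGGGCCGTGTCTCAGTCCCAA", "TTGGAAGATGTCTCAGTTCCAG", "TTGGGCCGTATGTCAGTCCCTA",
]
clusters = pick_otus_de_novo(reads)
for number, members in enumerate(clusters, start=1):
    print(f"OTU{number}: {len(members)} reads")
```

Because every read is compared with centroids created by earlier reads, the loop cannot simply be split across processors, which is the speed limitation listed under the cons.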

OTU Picking - “closed-reference”

• Pros
  – Reference database is a quality filter
  – Speed; easily parallelizable
• Cons
  – No new OTUs can be observed
  – Reference database bias

[Figure: experimental sequences are compared with the reference sequences. Sequences that hit a reference are assigned to that reference's OTU (OTU1 in the example); sequences that failed to hit are not assigned to any OTU.]
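Closed-reference picking can be sketched the same way: each read is compared only with the reference sequences, so reads can be handled independently of each other. The reference identifier `ref1` and the naive identity function are assumptions made for this illustration; real tools use alignment-based searches.

```python
def identity(a, b):
    """Fraction of matching positions; assumes ungapped reads of equal length."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))

def pick_otus_closed_reference(seqs, reference, threshold=0.97):
    """Assign each read to its best-matching reference; collect the failures."""
    otus = {}       # reference id -> reads assigned to it
    failures = []   # reads without a reference hit at the threshold
    for seq in seqs:
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in reference.items():
            score = identity(seq, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        if best_id is not None and best_score >= threshold:
            otus.setdefault(best_id, []).append(seq)
        else:
            failures.append(seq)
    return otus, failures

# Hypothetical one-sequence reference (identifier made up for illustration).
reference = {"ref1": "CTGGGCCGTGTCTCAGTCCCAA"}
reads = [
    "CTGGGCCGTGTCTCAGTCCCAA", "TTGGAAGATGTCTCAGTTCCAG", "TTGGGCCGTATGTCAGTCCCTA",
    "CTGGGCCGTGTCTCAGTCCCAA", "TTGGAAGATGTCTCAGTTCCAG", "TTGGGCCGTATGTCAGTCCCTA",
]
otus, failures = pick_otus_closed_reference(reads, reference)
print({ref_id: len(members) for ref_id, members in otus.items()})
print(len(failures), "reads failed to hit the reference")
```

The failures are simply left out of the OTU table, which is why no new OTUs can be observed with this strategy.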

Reference  database  

Percentage of reads that do not hit the reference collection, by environment type.

Other databases

• http://www.arb-silva.de
• http://qiime.org/home_static/dataFiles.html
• http://ssu-rrna.org

OTU Picking - “open-reference”

• Pros
  – Best of both worlds
• Cons
  – Downsides of de-novo

[Figure: experimental sequences are first compared with the reference sequences. Sequences that hit a reference are assigned to reference OTUs, and sequences that failed to hit are passed to a clustering algorithm; together they give OTU1 through OTU6.]
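Open-reference picking combines the two previous ideas, which a short sketch makes explicit: try the reference first, and cluster de novo whatever fails. The identity function, the reference identifier `ref1`, and the `New.N` naming of new OTUs are assumptions for this illustration; the actual pick_open_reference_otus.py workflow involves more steps than shown here.

```python
def identity(a, b):
    """Fraction of matching positions; assumes ungapped reads of equal length."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))

def pick_otus_open_reference(seqs, reference, threshold=0.97):
    otus = {}            # OTU id -> reads
    new_centroids = {}   # id of a new (de novo) OTU -> its centroid sequence
    for seq in seqs:
        # Step 1: closed-reference search.
        hit = next((ref_id for ref_id, ref_seq in reference.items()
                    if identity(seq, ref_seq) >= threshold), None)
        # Step 2: reads that failed are clustered de novo.
        if hit is None:
            hit = next((otu_id for otu_id, centroid in new_centroids.items()
                        if identity(seq, centroid) >= threshold), None)
        if hit is None:
            hit = f"New.{len(new_centroids)}"   # hypothetical naming scheme
            new_centroids[hit] = seq
        otus.setdefault(hit, []).append(seq)
    return otus

reference = {"ref1": "CTGGGCCGTGTCTCAGTCCCAA"}
reads = [
    "CTGGGCCGTGTCTCAGTCCCAA", "TTGGAAGATGTCTCAGTTCCAG", "TTGGGCCGTATGTCAGTCCCTA",
    "CTGGGCCGTGTCTCAGTCCCAA", "TTGGAAGATGTCTCAGTTCCAG", "TTGGGCCGTATGTCAGTCCCTA",
]
otus = pick_otus_open_reference(reads, reference)
for otu_id, members in otus.items():
    print(otu_id, len(members))
```

All reads end up in an OTU, but the de novo step brings back its costs for the reads that miss the reference.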

pick_open_reference_otus.py

• http://qiime.org/scripts/pick_open_reference_otus.html
• Workflow script, performs all steps through building an OTU table (see the log file)
  – pick_otus.py: determine the OTU clusters
  – pick_rep_set.py: pick the representative sequence for each OTU cluster
  – align_seqs.py: align the sequences to a template or other reference alignment
  – assign_taxonomy.py: allot a taxonomy to the representative sequences
  – filter_alignment.py: remove non-phylogenetically informative positions
  – make_phylogeny.py: construct a phylogeny from an alignment
  – make_otu_table.py: constructs the actual OTU table object

QIIME parameters

• http://qiime.org/documentation/qiime_parameters_files.html
• Modify the default behavior of a workflow script.
• Blank lines and those starting with ‘#’ are ignored
• Format
  – script:parameter value
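The format is simple enough to read with a few lines of Python. The parser below is a sketch of the rules listed above, not QIIME's own parser, and the two example entries are illustrative values rather than recommended settings.

```python
def parse_parameters(lines):
    """Return {script: {parameter: value}} from 'script:parameter value' lines."""
    params = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                      # blank lines and comments are ignored
        parts = line.split(None, 1)       # split on the first run of whitespace
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        script, _, parameter = key.partition(":")
        params.setdefault(script, {})[parameter] = value
    return params

# Illustrative parameters file content (values are assumptions).
example = """
# OTU picking settings
pick_otus:similarity 0.97

beta_diversity:metrics bray_curtis,unweighted_unifrac
"""
params = parse_parameters(example.splitlines())
for script, options in params.items():
    print(script, options)
```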

OTU  Table  in  BIOM  format  

• Optimized and efficient data abstraction
• Can be used with many types of data, but to make it Excel-readable use: biom convert

biom convert

• http://biom-format.org
• Converts the BIOM format OTU table to an Excel-readable format
• biom convert -i otu_table_mc2_w_tax.biom -o otu_table.txt -b

OTU table sample identifiers

Taxonomic  Assignment  

• Kingdom
• Phylum
• Class
• Order
• Family
• Genus
• Species

Sequence the 16S gene and compare it to a 16S database with taxonomic assignments

Taxonomic Assignment using e.g. UCLUST

[Figure: experimental sequences matched against reference sequences.]

Biom  summary  

• Basic statistics on the OTU table
  – Number of samples, OTUs, sequences in OTUs
  – Sequences per sample
  – Useful to determine values to use in downstream analyses
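The statistics in such a summary are plain sums over the table. The sketch below computes them for a small made-up table stored as a Python dictionary (sample and OTU names are hypothetical); it does not read a BIOM file.

```python
# Toy OTU table: one row of counts per OTU, one column per sample (made up).
samples = ["S1", "S2", "S3"]
table = {
    "OTU1": [4, 4, 0],
    "OTU2": [4, 4, 0],
    "OTU3": [0, 1, 7],
    "OTU4": [0, 0, 7],
}

num_samples = len(samples)
num_otus = len(table)
# Sequences per sample: sum each column.
per_sample = {s: sum(row[i] for row in table.values())
              for i, s in enumerate(samples)}
total = sum(per_sample.values())

print("Num samples:", num_samples)
print("Num OTUs:", num_otus)
print("Total sequences:", total)
for sample, count in sorted(per_sample.items(), key=lambda kv: kv[1]):
    print(sample, count)
```

The smallest sequences-per-sample value is a natural candidate for the sampling depth in downstream analyses, since any depth above it drops that sample.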

Alpha  and  Beta  diversity  

How do we describe and compare diversity?

• α Diversity:
  – “How many species (taxa) are in a sample?”
    • e.g. 6 colors in A and 6 in B
    • Are polluted environments less diverse than pristine?
• β Diversity:
  – “How many species are shared between samples?”
    • e.g. 2 shared colors between A and B
    • Do the microbiota differ among different disease states?

[Figure: samples A and B]

Qualitative vs. Quantitative measures

• Qualitative: Considers presence/absence
  – α: How many species are in a sample?
    • e.g.: 6 species (colors) in both A and B.
  – β: How many species are shared between samples?
    • e.g.: A and B are identical because the same colors are present in both.
• Quantitative: Considers abundance
  – α: Accounts for distribution:
    • e.g. in B, 6 species are evenly distributed and thus the community is more diverse than in A, where 1 species dominates over the other 5.
  – β: Samples will be considered more similar if the distribution of species is similar.
    • e.g. B and A no longer look identical because of differences in abundance.

[Figure: samples A and B]
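The alpha diversity part of this comparison can be reproduced with two small functions. The counts for A and B are made up to match the description (six species each, one dominant species in A, even abundances in B), and the Shannon index is used here as one common example of a quantitative measure.

```python
import math

def observed_species(counts):
    """Qualitative: number of species present (count > 0)."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Quantitative: Shannon index (natural log), sensitive to evenness."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Made-up abundances for the two samples in the figure.
sample_a = [15, 1, 1, 1, 1, 1]   # one species dominates the other five
sample_b = [4, 4, 4, 4, 4, 4]    # six evenly distributed species

print(observed_species(sample_a), observed_species(sample_b))
print(round(shannon(sample_a), 3), round(shannon(sample_b), 3))
```

Both samples have the same qualitative diversity, but the quantitative index is higher for the evenly distributed sample B.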

What is a phylogenetic diversity measure?

• α Diversity:
  – Taxon: “How many species are in a sample?”
  – Phylogenetic: “How much phylogenetic divergence is in a sample?”
    • e.g. B more diverse than A - more divergent colors
• β Diversity:
  – Taxon: “How many species are shared between samples?”
  – Phylogenetic: “How much phylogenetic distance is shared between samples?”
    • only related colors from B are in A

[Figure: samples A and B]

UniFrac  distance  matrix  

core_diversity_analyses.py

• Workflow script
  – filter_samples_from_otu_table.py: filter samples with low sequence count from the table
  – single_rarefaction.py: sample the table at the specified sequencing depth
  – beta_diversity.py: use the sampled table for beta diversity calculation
  – principal_coordinates.py: perform PCoA analysis
  – make_emperor.py: make plots for principal coordinates
  – multiple_rarefactions.py: make multiple subsamplings/rarefactions on an OTU table at various sequencing depths
  – alpha_diversity.py and collate_alpha.py: calculate alpha diversities at those depths and collate them
  – make_rarefaction_plots.py: plot the rarefaction curves
  – summarize_taxa.py and plot_taxa_summary.py: summarize taxa and plot them
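Rarefaction, the step several of these scripts rely on, means drawing the same number of sequences at random from every sample. The function below is a minimal sketch of that idea for a single sample, using random subsampling without replacement; the counts and the fixed seed are assumptions for illustration, and this is not the code of single_rarefaction.py.

```python
import random

def rarefy(counts, depth, seed=0):
    """Subsample `depth` sequences without replacement from one sample.

    counts: {otu_id: count}. Returns None if the sample has fewer than
    `depth` sequences, i.e. the sample would be dropped.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        return None
    rng = random.Random(seed)        # fixed seed so the sketch is repeatable
    rarefied = {}
    for otu in rng.sample(pool, depth):
        rarefied[otu] = rarefied.get(otu, 0) + 1
    return rarefied

sample = {"OTU1": 4, "OTU2": 4, "OTU3": 1}   # made-up counts
rarefied = rarefy(sample, depth=5)
print(rarefied)
print(rarefy(sample, depth=100))             # too shallow: None
```

Repeating this at several depths and computing alpha diversity each time gives the points of a rarefaction curve.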

Alpha  diversity  

Basic alpha diversity measure: count the number of OTUs. Other measures can be:
• phylogenetic (PD)
• estimators (chao1)
• other statistics (evenness)
• …
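Two of these measures are easy to compute by hand. The sketch below counts observed OTUs and computes the bias-corrected form of the Chao1 estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs seen exactly once and twice. The counts are made up for illustration; phylogenetic diversity is left out because it needs a tree.

```python
def observed_otus(counts):
    """Number of OTUs with at least one sequence."""
    return sum(1 for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 richness estimate."""
    singletons = sum(1 for c in counts if c == 1)   # F1
    doubletons = sum(1 for c in counts if c == 2)   # F2
    return observed_otus(counts) + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Made-up counts for one sample: 7 OTUs present, 3 singletons, 2 doubletons.
counts = [10, 5, 2, 2, 1, 1, 1, 0]
print(observed_otus(counts))
print(chao1(counts))
```

The estimate is higher than the observed count because the singletons suggest that some rare OTUs were not sampled at all.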

Beta  diversity  

        orange1   orange2   blue1
OTU1    4         4         0
OTU2    4         4         0
OTU3    0         1         7
OTU4    0         0         7
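Distances between the three samples in this table can be worked out directly. The sketch below uses Bray-Curtis as an abundance-based (quantitative) measure and the Jaccard distance on presence/absence as a qualitative one; UniFrac is not shown because it also needs the phylogenetic tree.

```python
# Columns of the table above: counts of OTU1..OTU4 in each sample.
orange1 = [4, 4, 0, 0]
orange2 = [4, 4, 1, 0]
blue1 = [0, 0, 7, 7]

def bray_curtis(u, v):
    """Quantitative: sum of absolute differences over the total count."""
    return sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))

def jaccard(u, v):
    """Qualitative: 1 - shared OTUs / OTUs present in either sample."""
    present_u = {i for i, a in enumerate(u) if a > 0}
    present_v = {i for i, b in enumerate(v) if b > 0}
    return 1 - len(present_u & present_v) / len(present_u | present_v)

pairs = [("orange1", orange1, "orange2", orange2),
         ("orange1", orange1, "blue1", blue1),
         ("orange2", orange2, "blue1", blue1)]
for name_u, u, name_v, v in pairs:
    print(name_u, name_v, round(bray_curtis(u, v), 3), round(jaccard(u, v), 3))
```

The two orange samples come out close to each other and far from the blue sample under both measures, which is the pattern a PCoA plot of the distance matrix would show.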

Summarize  Taxa  

• Calculates proportion of taxa per sample, at different taxonomic levels

•  summarize_taxa_through_plots.py  
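The calculation behind a taxa summary is a grouped sum: OTU counts are converted to proportions per sample and added up for all OTUs that share the same taxonomy down to the chosen level. The table and the taxonomy assignments below are made up for illustration, and the function is a sketch of the idea rather than the summarize_taxa.py code.

```python
def summarize_taxa(table, taxonomy, level):
    """Relative abundance per sample of taxa truncated to `level` ranks.

    table: {otu: [count per sample]}; taxonomy: {otu: [rank names]}.
    Assumes every sample has at least one sequence.
    """
    n_samples = len(next(iter(table.values())))
    totals = [sum(row[i] for row in table.values()) for i in range(n_samples)]
    summary = {}
    for otu, row in table.items():
        taxon = ";".join(taxonomy[otu][:level])
        proportions = summary.setdefault(taxon, [0.0] * n_samples)
        for i, count in enumerate(row):
            proportions[i] += count / totals[i]
    return summary

# Made-up table and taxonomy (kingdom, phylum) for illustration.
table = {"OTU1": [4, 4, 0], "OTU2": [4, 4, 0], "OTU3": [0, 1, 7], "OTU4": [0, 0, 7]}
taxonomy = {
    "OTU1": ["Bacteria", "Firmicutes"],
    "OTU2": ["Bacteria", "Firmicutes"],
    "OTU3": ["Bacteria", "Bacteroidetes"],
    "OTU4": ["Bacteria", "Proteobacteria"],
}

phylum_level = summarize_taxa(table, taxonomy, level=2)
for taxon, proportions in phylum_level.items():
    print(taxon, [round(p, 3) for p in proportions])
```

Each column of the result sums to 1, so the values can be plotted directly as stacked bar or area charts.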

Taxa  Summarized  by  Category    

Procrustes  Analysis  

http://qiime.org/tutorials/procrustes_analysis.html
transform_coordinate_matrices.py
compare_3d_plots.py

Muegge,  B.  D.  et  al.  Science  332,  970–974  (2011).  

Statistically Different?

• group_significance.py
• Parametric
  – G-test
  – ANOVA
  – T-test
• Nonparametric
  – Kruskal-Wallis
  – Mann-Whitney-U
  – Bootstrap Mann-Whitney-U
  – Bootstrap T-test
• compare_categories.py
• make_distance_boxplots.py
• …

 

Acknowledgements  

Rob Knight, Antonio Gonzalez, Meg Pirrung, Adam Robbins-Pianka, Luke Ursell, Tony Walters, Doug Wendel, Daniel McDonald, Yoshiki Vázquez Baeza, Will Van Treuren, Laura Wegener Parfrey, Kris Mayer

Merete Eggesbo, Jessica Metcalf, Ulla Westermann, Zhenjiang Zech Xu, Jose Navas, Chris Lauber, Matt Gebert, Greg C Humphrey, Hongwei Zhou

Rick Stevens (Argonne), Jack Gilbert (Argonne), Folker Meyer (Argonne), Janet Jansson (LBNL), Jed Fuhrman (USC), Jonathan Eisen (UC Davis), many, many sample donors.

Other collaborators: Noah Fierer (CU, EEB), Jeff Gordon (Wash U), Ruth Ley (Cornell), Peter Turnbaugh (Harvard), Maria Gloria Dominguez (UPR), Catherine Lozupone (CU) ...