Finding new nirK genes in metagenomic data. What is nirK? -one kind of nitrite reductase.


What is nirK? One kind of nitrite reductase (the copper-containing type)

Nitrogen Cycling

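[Figure: nitrogen oxidation states along the pathway: +5, +3, +2, +1, 0 (in denitrification these correspond to nitrate, nitrite, nitric oxide, nitrous oxide, and N2)]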

Metagenomic Datasets

• 2 samples from agricultural soil, 2 sequencing runs per sample (Roche 454 pyrosequencing)

• 2 samples from forest soil, 2 sequencing runs per sample (Roche 454 pyrosequencing)

• Data are from the Tom Schmidt Lab

Methods

• Start with sequence similarity search software: HMMER

• HMMER: an implementation of profile hidden Markov models (profile HMMs) for biological sequence analysis

• Profile HMMs are built from a multiple sequence alignment of known members of a given protein family, produced by an alignment tool
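
To make the idea of a profile concrete, here is a minimal sketch of the position-specific part only: per-column residue frequencies from a toy alignment, converted to log-odds scores. The toy sequences, the uniform background, and the add-one pseudocount are illustrative assumptions. hmmbuild itself uses more careful sequence weighting and priors and also models insertions and deletions, so this is not its algorithm.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_log_odds(alignment, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score.

    Gap characters ('-') are ignored. The background distribution is
    assumed uniform (1/20 per residue), which is a simplification.
    """
    background = 1.0 / len(AMINO_ACIDS)
    n_cols = len(alignment[0])
    scores = []
    for col in range(n_cols):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return scores

# Toy alignment (hypothetical sequences, not real nirK data).
toy = ["HCH-A", "HCHGA", "HCHGS"]
profile = column_log_odds(toy)
```

A residue that is conserved in a column (H in the first column of the toy alignment) gets a positive score, while residues never seen there get a negative one. This position-specific scoring is what a single pairwise comparison cannot capture.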

Advantage over BLAST

• HMMs have a formal probabilistic basis: probability theory guides how all the scoring parameters are set

• HMMs have a consistent theory behind gap and insertion scores

• But HMMER is much slower than BLAST

HMMER components

• HMMER has components:

• to build a profile HMM: hmmbuild

• to search a profile against a sequence database: hmmsearch

• and to align sequences according to an existing profile: hmmalign
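
The three components chain together roughly as sketched below. This only assembles the command lines and does not run them. All file names are hypothetical placeholders. The optional hmmcalibrate step exists in HMMER 2 and was dropped in HMMER 3, so whether it applies depends on the installed version, which is an assumption here.

```python
import shlex

def hmmer_pipeline(msa, hmm, seqdb, candidates, aligned, calibrate=False):
    """Return argv lists for the build -> (calibrate) -> search -> align steps.

    msa        : multiple sequence alignment of known family members
    hmm        : profile HMM file written by hmmbuild
    seqdb      : sequence database to search (e.g. translated soil reads)
    candidates : sequences to align back to the profile
    aligned    : output alignment written by hmmalign
    """
    steps = [["hmmbuild", hmm, msa]]
    if calibrate:
        # HMMER 2 only; HMMER 3 has no separate calibration program.
        steps.append(["hmmcalibrate", hmm])
    # hmmsearch writes its report to standard output.
    steps.append(["hmmsearch", hmm, seqdb])
    steps.append(["hmmalign", "-o", aligned, hmm, candidates])
    return steps

# Hypothetical file names, for illustration only.
for argv in hmmer_pipeline("seeds.aln", "nirK.hmm", "soil_reads.faa",
                           "candidates.faa", "candidates.sto", calibrate=True):
    print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` once the paths point at real files.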

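[Workflow diagram]

• 6 good known nirKs (downloaded from the FunGene pipeline) → clustalw → multiple alignment format

• Multiple alignment → hmmbuild → hmmcalibrate → profile HMM

• Profile HMM → hmmsearch against soil data → potential nirKs

• BLAST against soil data → BLAST nirK result

• Compare the two results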

BLAST and HMMER results

• input files: /u/gjr/nirk2/ma1w2_run1_dereplicated_blastp.txt <==========> /u/gjr/nirk2/ma1w2_run1_dereplicated_localhmm.txt
• blastOnly: 23
• shared: 6
• hmmOnly: 2

• input files: /u/gjr/nirk2/ma1w2_run2_dereplicated_blastp.txt <==========> /u/gjr/nirk2/ma1w2_run2_dereplicated_localhmm.txt
• blastOnly: 28
• shared: 8
• hmmOnly: 4

• input files: /u/gjr/nirk2/ma1w4_run1_dereplicated_blastp.txt <==========> /u/gjr/nirk2/ma1w4_run1_dereplicated_localhmm.txt
• blastOnly: 24
• shared: 8
• hmmOnly: 5

• input files: /u/gjr/nirk2/ma1w4_run2_dereplicated_blastp.txt <==========> /u/gjr/nirk2/ma1w4_run2_dereplicated_localhmm.txt
• blastOnly: 34
• shared: 16
• hmmOnly: 5
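
The blastOnly / shared / hmmOnly counts are a plain set comparison of the read IDs reported by each tool. The sketch below shows one way to compute them. The format of the two input files is not given in the slides, so `read_ids` assumes one hit per line with the read ID as the first whitespace-separated token and `#` marking comment lines; both function names are hypothetical.

```python
def read_ids(path):
    """Collect read IDs from a hit file (assumed: ID is the first token per line)."""
    with open(path) as handle:
        return {
            line.split()[0]
            for line in handle
            if line.strip() and not line.startswith("#")
        }

def compare_hits(blast_ids, hmm_ids):
    """Count reads found only by BLAST, by both tools, and only by HMMER."""
    blast, hmm = set(blast_ids), set(hmm_ids)
    return {
        "blastOnly": len(blast - hmm),
        "shared": len(blast & hmm),
        "hmmOnly": len(hmm - blast),
    }

# Toy IDs for illustration; real input would come from read_ids(...).
summary = compare_hits(["read1", "read2", "read3"], ["read3", "read4"])
```

Comparing on dereplicated read IDs, as the file names suggest, avoids counting the same read twice when a tool reports several hits for it.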

Profile matters!

• Run hmmsearch with the 6-seed profile HMM against all 3055 FunGene nirK sequences (some may not be real nirKs…)

• Look at the E-value distribution
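
One simple way to look at such a distribution is to bin the E-values by order of magnitude. The sketch below assumes the E-values have already been parsed from the hmmsearch report into a list of floats. The example values are invented, and the `floor` parameter, a hypothetical guard for E-values reported as 0, is an assumption.

```python
import math
from collections import Counter

def evalue_histogram(evalues, floor=1e-300):
    """Count hits per decade of E-value, keyed by floor(log10(E)).

    E-values of 0 (underflow in the report) are clamped to `floor`
    so that the logarithm is defined.
    """
    bins = Counter()
    for e in evalues:
        bins[math.floor(math.log10(max(e, floor)))] += 1
    return dict(sorted(bins.items()))

# Invented E-values, for illustration only.
example = [2e-120, 3e-118, 2e-45, 5e-45, 0.002, 0.0]
hist = evalue_histogram(example)
```

A gap between a cluster of very small E-values and the rest is the kind of feature that suggests where to draw the line between likely family members and doubtful ones.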

6-seed profile E-value distribution

Make the 124 sequences on the left into a profile

124-sequence profile E-value distribution

Cumulative curve
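
The slide does not say how the curve was computed. Assuming it plots the number of sequences at or below each E-value threshold, a minimal sketch looks like this (the function name and example values are hypothetical):

```python
import bisect

def cumulative_counts(evalues, thresholds):
    """For each threshold, count sequences with E-value <= threshold."""
    ordered = sorted(evalues)
    return [(t, bisect.bisect_right(ordered, t)) for t in thresholds]

# Invented E-values, for illustration only.
curve = cumulative_counts([1e-100, 1e-50, 1e-5, 0.5], [1e-60, 1e-10, 1.0])
```

Plotting these pairs on a logarithmic x-axis shows how many sequences each candidate cutoff would admit.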

124-sequence profile: HMMER and BLAST results

• input files: /u/gjr/nirk3/ma1w2_run1_dereplicated.blastp.txt <==========> /u/gjr/nirk3/ma1w2_run1_dereplicated.localhmm.txt
• blastOnly: 112
• shared: 7
• hmmOnly: 0

• input files: /u/gjr/nirk3/ma1w2_run2_dereplicated.blastp.txt <==========> /u/gjr/nirk3/ma1w2_run2_dereplicated.localhmm.txt
• blastOnly: 129
• shared: 8
• hmmOnly: 0

• input files: /u/gjr/nirk3/ma1w4_run1_dereplicated.blastp.txt <==========> /u/gjr/nirk3/ma1w4_run1_dereplicated.localhmm.txt
• blastOnly: 109
• shared: 10
• hmmOnly: 0

• input files: /u/gjr/nirk3/ma1w4_run2_dereplicated.blastp.txt <==========> /u/gjr/nirk3/ma1w4_run2_dereplicated.localhmm.txt
• blastOnly: 120
• shared: 18
• hmmOnly: 0

Then the tree method

[Figure: two schematic trees. One has leaves nirK1, Seq1 (good), nirK2; the other has leaves nirK1, nirK2, Seq2 (bad).]

Just to illustrate the idea

[Workflow diagram: NCBI nirK (cultured) + soil BLAST result + soil HMMER result → hmmalign with the 6-sequence profile → quicktree → tree]

Questions to answer

• Best definition of nirK according to the current information

• Criteria for choosing seeds for the profile HMM

• BLAST false positive problem

Thanks