Finding new nirK genes in metagenomic data. What is nirK? One kind of nitrite reductase.

Posted on 18-Jan-2016

Finding new nirK genes in metagenomic data

What is nirK? One kind of nitrite reductase (nirK encodes the copper-containing nitrite reductase, which reduces nitrite to nitric oxide)

Nitrogen Cycling

[Figure: nitrogen cycle with nitrogen oxidation states +5, +3, +2, +1 and 0 marked]
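The labels +5, +3, +2, +1 and 0 match the nitrogen oxidation states of the denitrification intermediates NO3-, NO2-, NO, N2O and N2. Assuming the figure follows that pathway (the species list is an addition, not read off the slide), this short Python sketch derives each state from the formula with oxygen counted as -2, and marks the nitrite-to-nitric-oxide step that nitrite reductases such as NirK catalyse.

```python
# Denitrification intermediates as (formula, N atoms, O atoms, net charge).
# Assumption: the slide's +5 ... 0 labels refer to this pathway.
DENITRIFICATION = [
    ("NO3-", 1, 3, -1),
    ("NO2-", 1, 2, -1),
    ("NO", 1, 1, 0),
    ("N2O", 2, 1, 0),
    ("N2", 2, 0, 0),
]

# Step catalysed by nitrite reductase (e.g. NirK): +3 -> +2
NITRITE_REDUCTASE_STEP = ("NO2-", "NO")


def nitrogen_oxidation_state(n_atoms, o_atoms, charge):
    """Average N oxidation state x, solving n*x + o*(-2) = charge."""
    return (charge + 2 * o_atoms) / n_atoms


def pathway_states():
    """List of (formula, oxidation state of N) along the pathway."""
    return [(name, nitrogen_oxidation_state(n, o, q))
            for name, n, o, q in DENITRIFICATION]


if __name__ == "__main__":
    for name, state in pathway_states():
        print(f"{name:5s} {state:+.0f}")
```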

Metagenomic Datasets

• 2 samples from agricultural soil, 2 sequencing runs per sample (Roche 454 pyrosequencing)

• 2 samples from forest soil, 2 sequencing runs per sample (Roche 454 pyrosequencing)

• Data are from the Tom Schmidt lab

Methods

• Start with sequence similarity search software: HMMER

• HMMER: an implementation of profile hidden Markov models (profile HMMs) for biological sequence analysis

• Profile HMMs are built from a multiple sequence alignment of known members of a given protein family, made with an alignment tool
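To make "built from a multiple sequence alignment" concrete, here is a minimal Python sketch of the first step: deciding which alignment columns become match states and estimating each match state's residue probabilities. It is a simplification under stated assumptions: the 50% gap rule and the flat pseudocount are illustrative choices, whereas hmmbuild also weights sequences, uses Dirichlet mixture priors and estimates insert/delete transitions.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP_CHARS = "-."


def match_columns(msa, max_gap_fraction=0.5):
    """Indices of alignment columns treated as match states.

    Assumption: a column is a match state when fewer than max_gap_fraction
    of the sequences have a gap there (a simple rule of thumb).
    """
    width = len(msa[0])
    if any(len(s) != width for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    cols = []
    for j in range(width):
        gaps = sum(s[j] in GAP_CHARS for s in msa)
        if gaps / len(msa) < max_gap_fraction:
            cols.append(j)
    return cols


def match_emissions(msa, pseudocount=1.0, max_gap_fraction=0.5):
    """One dict of residue probabilities per match state (add-pseudocount smoothing)."""
    profile = []
    for j in match_columns(msa, max_gap_fraction):
        counts = Counter(s[j].upper() for s in msa if s[j].upper() in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile
```

For a toy alignment such as `["ACD-", "ACE-", "GC-K"]`, the last column is mostly gaps and becomes an insert column, so the profile has three match states.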

Advantages over BLAST

• HMMs have a formal probabilistic basis: probability theory guides how all the scoring parameters are set

• HMMs have a consistent theory behind gap and insertion scores

• But they are much slower than BLAST
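The "formal probabilistic basis" means every score is a log-ratio of a probability under the profile to a probability under a background model. The Python sketch below shows that idea for an ungapped sequence scored against match states only; it is a simplification, since a full profile HMM score also sums over insert and delete paths, and the uniform background is an assumption.

```python
import math

UNIFORM_BACKGROUND = 1 / 20  # assumption: flat amino-acid background frequencies


def log_odds_bits(seq, match_emissions, background=None):
    """Ungapped log-odds score in bits: sum over k of log2(e_k(x_k) / q(x_k)).

    match_emissions: one dict of residue probabilities per match state; every
    residue in seq must have a non-zero probability (pseudocounts ensure this).
    background: optional dict of residue frequencies; uniform if omitted.
    """
    if len(seq) != len(match_emissions):
        raise ValueError("sequence length must equal the number of match states")
    score = 0.0
    for residue, emissions in zip(seq.upper(), match_emissions):
        q = background[residue] if background is not None else UNIFORM_BACKGROUND
        score += math.log2(emissions[residue] / q)
    return score
```

A positive score means the sequence is more probable under the profile than under the background; for example, a residue the profile emits with probability 0.5 contributes log2(0.5 / 0.05), about 3.3 bits.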

HMMER components

HMMER has three components used here:

• hmmbuild: build a profile HMM

• hmmsearch: search a profile against a sequence database

• hmmalign: align sequences to an existing profile

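Workflow

• 6 good known nirKs (downloaded from the FunGene pipeline) → multiple alignment with ClustalW → format conversion → hmmbuild → hmmcalibrate → profile HMM

• Profile HMM → hmmsearch against soil data → potential nirKs

• BLAST nirK against soil data → BLAST result

• Compare the HMMER and BLAST results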
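The HMMER half of this workflow can be scripted. The sketch below only assembles and runs the commands with their positional arguments, assuming HMMER 2.x binaries on the PATH (hmmcalibrate exists only in HMMER 2; HMMER 3 computes E-value statistics in hmmbuild), a seed alignment already converted to a format hmmbuild accepts, and a protein sequence file of the soil data. All file names are hypothetical, and no options are passed.

```python
import subprocess


def hmmer2_pipeline(seed_alignment, hmm_file, soil_db, hits_file):
    """Return (command, stdout_path) pairs for the profile-HMM half of the workflow."""
    return [
        # Seed alignment -> profile HMM written to hmm_file.
        (["hmmbuild", hmm_file, seed_alignment], None),
        # Calibrate E-value statistics and store them in hmm_file (HMMER 2 only).
        (["hmmcalibrate", hmm_file], None),
        # Search the soil sequences; the report is captured in hits_file.
        (["hmmsearch", hmm_file, soil_db], hits_file),
    ]


def run(steps):
    """Run each command in order, stopping on the first non-zero exit status."""
    for cmd, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(cmd, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(cmd, check=True, stdout=out)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    run(hmmer2_pipeline("nirk_seed.sto", "nirk_seed.hmm", "soil_reads.faa", "soil_hits.txt"))
```

Keeping command construction separate from execution makes it easy to print or check the exact commands before running a long search.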

BLAST and HMMER results

• input files: /u/gjr/nirk2/ma1w2_run1_dereplicated_blastp.txt vs /u/gjr/nirk2/ma1w2_run1_dereplicated_localhmm.txt; blastOnly: 23, shared: 6, hmmOnly: 2

• input files: /u/gjr/nirk2/ma1w2_run2_dereplicated_blastp.txt vs /u/gjr/nirk2/ma1w2_run2_dereplicated_localhmm.txt; blastOnly: 28, shared: 8, hmmOnly: 4

• input files: /u/gjr/nirk2/ma1w4_run1_dereplicated_blastp.txt vs /u/gjr/nirk2/ma1w4_run1_dereplicated_localhmm.txt; blastOnly: 24, shared: 8, hmmOnly: 5

• input files: /u/gjr/nirk2/ma1w4_run2_dereplicated_blastp.txt vs /u/gjr/nirk2/ma1w4_run2_dereplicated_localhmm.txt; blastOnly: 34, shared: 16, hmmOnly: 5
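The blastOnly / shared / hmmOnly counts are set differences and an intersection over hit identifiers. A minimal Python sketch, assuming each report lists one hit per line with the identifier in a fixed whitespace-separated column and `#` comment lines (as in BLAST tabular and HMMER 3 `--tblout` output); the actual format of the files above is not stated. With nirK as the BLAST query, the soil read ID would be the subject, i.e. the second column of BLAST tabular output (`column=1`).

```python
def read_hit_ids(path, column=0):
    """Set of hit identifiers taken from one column of a text report.

    Assumption: one hit per line, '#' lines are comments; repeated hits
    (e.g. several BLAST HSPs for the same read) collapse into one ID.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#"):
                ids.add(line.split()[column])
    return ids


def compare_hits(blast_ids, hmm_ids):
    """Counts in the same form as the slide: blastOnly, shared, hmmOnly."""
    blast_ids, hmm_ids = set(blast_ids), set(hmm_ids)
    return {
        "blastOnly": len(blast_ids - hmm_ids),
        "shared": len(blast_ids & hmm_ids),
        "hmmOnly": len(hmm_ids - blast_ids),
    }
```

Usage would look like `compare_hits(read_hit_ids(blast_report, column=1), read_hit_ids(hmm_report))`, where the two report paths are whatever files the searches produced.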

Profile matters!

• Run hmmsearch with the 6-seed profile HMM against all 3055 FunGene nirKs (some may not be real nirKs)

• Examine the E-value distribution of the hits

[Figure: E-value distribution for the 6-seed profile]

• Make the 124 sequences on the left of this distribution into a new profile

[Figures: E-value distribution for the 124-seq profile; cumulative curve]
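Picking the sequences on one side of the E-value distribution amounts to choosing a cut-off, and the cumulative curve is the fraction of hits at or below each E-value. A minimal Python sketch, assuming the hits are already parsed into (name, E-value) pairs; the slides do not give the cut-off used to select the 124 sequences, so any value passed here is hypothetical.

```python
import math


def cumulative_curve(evalues):
    """Sorted log10(E-value) paired with the cumulative fraction of hits at or below it."""
    # Clamp E-values of 0 so log10 stays finite.
    logs = sorted(math.log10(max(e, 1e-300)) for e in evalues)
    n = len(logs)
    return [(x, (i + 1) / n) for i, x in enumerate(logs)]


def select_below(hits, cutoff):
    """Names of hits whose E-value is at or below a (hypothetical) cut-off."""
    return [name for name, evalue in hits if evalue <= cutoff]
```

Plotting the pairs from `cumulative_curve` with log10(E-value) on the x axis gives a step curve; a sharp rise followed by a plateau is one way to spot a natural break for the cut-off.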

124-seq profile: HMMER and BLAST results

• input files: /u/gjr/nirk3/ma1w2_run1_dereplicated.blastp.txt vs /u/gjr/nirk3/ma1w2_run1_dereplicated.localhmm.txt; blastOnly: 112, shared: 7, hmmOnly: 0

• input files: /u/gjr/nirk3/ma1w2_run2_dereplicated.blastp.txt vs /u/gjr/nirk3/ma1w2_run2_dereplicated.localhmm.txt; blastOnly: 129, shared: 8, hmmOnly: 0

• input files: /u/gjr/nirk3/ma1w4_run1_dereplicated.blastp.txt vs /u/gjr/nirk3/ma1w4_run1_dereplicated.localhmm.txt; blastOnly: 109, shared: 10, hmmOnly: 0

• input files: /u/gjr/nirk3/ma1w4_run2_dereplicated.blastp.txt vs /u/gjr/nirk3/ma1w4_run2_dereplicated.localhmm.txt; blastOnly: 120, shared: 18, hmmOnly: 0

Then the tree method

[Figure, just to show an idea: two example trees, each with reference clades nirK1 and nirK2, one containing Seq1 (good) and the other Seq2 (bad)]

• NCBI nirKs (cultured), soil BLAST result and soil HMMER result → hmmalign with the 6-seq profile → quicktree → tree
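quicktree builds neighbour-joining trees from an alignment such as the one hmmalign produces. The pure-Python sketch below shows the same technique on a small scale: pairwise p-distances that skip gapped columns, then neighbour-joining with the Q-criterion. It is not quicktree's code; its distance correction options may differ, and the sequence names are whatever the caller supplies.

```python
from itertools import combinations

GAPS = "-."


def p_distance(a, b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in GAPS and y not in GAPS]
    if not pairs:
        return 1.0  # no overlap: treat as maximally distant (an arbitrary choice)
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """alignment: dict name -> aligned sequence; keys are frozensets of two names."""
    return {frozenset((a, b)): p_distance(alignment[a], alignment[b])
            for a, b in combinations(alignment, 2)}


def neighbor_joining(names, dist):
    """Neighbour-joining; returns a Newick string (last two clusters joined at midpoint)."""
    nodes = list(names)
    d = dict(dist)

    def D(a, b):
        return d[frozenset((a, b))]

    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(D(a, b) for b in nodes if b != a) for a in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        dij = D(i, j)
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))  # branch length i -> new node
        lj = dij - li
        new = f"({i}:{li:.4f},{j}:{lj:.4f})"
        for k in nodes:
            if k not in (i, j):
                d[frozenset((new, k))] = 0.5 * (D(i, k) + D(j, k) - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes
    half = D(a, b) / 2
    return f"({a}:{half:.4f},{b}:{half:.4f});"
```

Under the tree method, a soil sequence that ends up inside a clade of cultured NCBI nirKs would count as a "good" hit, as with Seq1 in the example figure.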

Questions to answer

• Best definition of nirK according to the current information

• Criteria for choosing seeds for the profile HMM

• The BLAST false-positive problem

Thanks