Finding new nirK genes in metagenomic data
Uploaded by lilian-smith
What is nirK? One kind of nitrite reductase
Nitrogen Cycling
[Figure: nitrogen oxidation states along the pathway: +5, +3, +2, +1, 0]
Metagenomic Datasets
• 2 samples from agricultural soil, 2 sequencing runs per sample (Roche 454 pyrosequencing)
• 2 samples from forest soil, 2 sequencing runs per sample (Roche 454 pyrosequencing)
• Data are from Tom Schmidt Lab
Methods
• Start with sequence similarity search software: HMMER
• HMMER: an implementation of profile hidden Markov models (profile HMMs) for biological sequence analysis
• Profile HMMs are built from a multiple sequence alignment of known members of a given protein family, produced by an alignment tool
Advantages over BLAST
• HMMs have a formal probabilistic basis: probability theory guides how all the scoring parameters should be set
• HMMs have a consistent theory behind gap and insertion scores
• But HMMER is much slower than BLAST
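To make "probabilistic basis" concrete, here is a minimal sketch of how one alignment column can be turned into log-odds scores. It is a simplification, not what HMMER itself does: it assumes a uniform background over the 20 amino acids and a flat pseudocount, and the example column `HHHHHQ` is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_log_odds(column, pseudocount=1.0):
    """Return log2-odds scores for every amino acid in one alignment column.

    Assumptions (not from the slides): uniform background frequencies and
    a flat pseudocount added to every residue count.
    """
    # Count residues; gap characters and anything non-standard are ignored.
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    background = 1.0 / len(AMINO_ACIDS)
    return {
        aa: math.log2(((counts[aa] + pseudocount) / total) / background)
        for aa in AMINO_ACIDS
    }


# Hypothetical column taken from six aligned seed sequences.
scores = column_log_odds("HHHHHQ")
```

Residues seen often in the column score above zero, and residues never seen score below zero, so the scores follow directly from the estimated probabilities rather than from hand-tuned values.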
HMMER components
• HMMER has components:
• hmmbuild: builds a profile HMM
• hmmsearch: searches a profile against a sequence database
• hmmalign: aligns sequences according to an existing profile
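The three components can be chained from a script. The sketch below only assembles the command lines; all file names are hypothetical, and the argument order (profile file before alignment or sequence file) should be checked against the installed HMMER version.

```python
import subprocess


def hmmer_commands(alignment, profile, database, hit_seqs, aligned_out):
    """Build the command lines for the three HMMER components.

    All file names are placeholders supplied by the caller.
    """
    return [
        # Build a profile HMM from a multiple sequence alignment.
        ["hmmbuild", profile, alignment],
        # Search the profile against a sequence database (report on stdout).
        ["hmmsearch", profile, database],
        # Align candidate sequences to the existing profile.
        ["hmmalign", "-o", aligned_out, profile, hit_seqs],
    ]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical file names, for illustration only.
commands = hmmer_commands(
    "seeds.aln", "nirk.hmm", "soil_reads.fasta", "hits.fasta", "hits_aligned.sto"
)
```

Calling `run_all(commands)` would execute the steps, assuming the HMMER binaries are on the PATH and `hits.fasta` has been extracted from the search report.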
[Workflow diagram: 6 good known nirKs (FunGene pipeline; download; clustalw multiple alignment format) → hmmbuild → hmmcalibrate → profile HMM → hmmsearch against soil data → potential nirKs; BLAST against soil data → BLAST nirK result; compare the two result sets]
BLAST and HMMER results
• input files: /u/gjr/nirk2/ma1w2_run1_dereplicated_blastp.txt <==========> /u/gjr/nirk2/ma1w2_run1_dereplicated_localhmm.txt • blastOnly: 23 • shared: 6 • hmmOnly: 2
• input files: /u/gjr/nirk2/ma1w2_run2_dereplicated_blastp.txt <==========> /u/gjr/nirk2/ma1w2_run2_dereplicated_localhmm.txt • blastOnly: 28 • shared: 8 • hmmOnly: 4
• input files: /u/gjr/nirk2/ma1w4_run1_dereplicated_blastp.txt <==========> /u/gjr/nirk2/ma1w4_run1_dereplicated_localhmm.txt • blastOnly: 24 • shared: 8 • hmmOnly: 5
• input files: /u/gjr/nirk2/ma1w4_run2_dereplicated_blastp.txt <==========> /u/gjr/nirk2/ma1w4_run2_dereplicated_localhmm.txt • blastOnly: 34 • shared: 16 • hmmOnly: 5
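Counts like blastOnly, shared and hmmOnly can be produced with plain set operations on the read IDs reported by each tool. This is a sketch, not the script used for the slides: the file layout (one hit per line, read ID in the first whitespace-separated field) and the toy IDs are assumptions.

```python
def read_ids(path):
    """Collect read IDs from a hit list.

    Assumed layout: one hit per line, ID in the first whitespace-separated
    field; blank lines and lines starting with '#' are skipped.
    """
    ids = set()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#"):
                ids.add(line.split()[0])
    return ids


def compare_hits(blast_ids, hmm_ids):
    """Count hits found only by BLAST, by both tools, and only by HMMER."""
    blast_ids, hmm_ids = set(blast_ids), set(hmm_ids)
    return {
        "blastOnly": len(blast_ids - hmm_ids),
        "shared": len(blast_ids & hmm_ids),
        "hmmOnly": len(hmm_ids - blast_ids),
    }


# Toy IDs, for illustration only.
summary = compare_hits(["r1", "r2", "r3", "r4"], ["r3", "r4", "r5"])
```

Using sets also removes duplicate IDs, so a read reported several times by one tool is counted once.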
Profile matters!
• Run hmmsearch with the 6-seed profile HMM against all 3055 FunGene nirKs (some may not be real nirKs…)
• See the E-value distribution
[Figure: E-value distribution for the 6-seed profile]
• Make the 124 sequences on the left of the distribution into a profile
[Figure: E-value distribution for the 124-sequence profile]
[Figure: cumulative curve]
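A cumulative curve of this kind can be computed by counting, for each threshold, how many hits have an E-value at or below it. The sketch below assumes the E-values have already been parsed out of the hmmsearch report; the numbers are made up for illustration.

```python
def cumulative_curve(evalues, exponents):
    """For each exponent k, count the hits with E-value <= 10**k."""
    return [
        (k, sum(1 for e in evalues if e <= 10.0 ** k))
        for k in exponents
    ]


# Hypothetical E-values, not taken from the FunGene search.
example_evalues = [1e-120, 1e-80, 1e-30, 1e-5, 0.5]
curve = cumulative_curve(example_evalues, [-100, -50, -10, 0])
```

A sharp step in the curve suggests a natural cutoff between sequences that fit the profile well and the rest, which is one way to choose which sequences go into a larger seed set.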
124-sequence profile: HMMER and BLAST results
• input files: /u/gjr/nirk3/ma1w2_run1_dereplicated.blastp.txt <==========> /u/gjr/nirk3/ma1w2_run1_dereplicated.localhmm.txt • blastOnly: 112 • shared: 7 • hmmOnly: 0
• input files: /u/gjr/nirk3/ma1w2_run2_dereplicated.blastp.txt <==========> /u/gjr/nirk3/ma1w2_run2_dereplicated.localhmm.txt • blastOnly: 129 • shared: 8 • hmmOnly: 0
• input files: /u/gjr/nirk3/ma1w4_run1_dereplicated.blastp.txt <==========> /u/gjr/nirk3/ma1w4_run1_dereplicated.localhmm.txt • blastOnly: 109 • shared: 10 • hmmOnly: 0
• input files: /u/gjr/nirk3/ma1w4_run2_dereplicated.blastp.txt <==========> /u/gjr/nirk3/ma1w4_run2_dereplicated.localhmm.txt • blastOnly: 120 • shared: 18 • hmmOnly: 0
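To compare the two sets of runs at a glance, the reported counts can be totalled and the share of BLAST hits that HMMER also found can be computed. The counts below are copied from the slides; grouping them under the directory names `nirk2` and `nirk3` and the choice of summary statistic are assumptions made for this sketch.

```python
# (blastOnly, shared, hmmOnly) per run, copied from the reported results.
reported = {
    "nirk2": [(23, 6, 2), (28, 8, 4), (24, 8, 5), (34, 16, 5)],
    "nirk3": [(112, 7, 0), (129, 8, 0), (109, 10, 0), (120, 18, 0)],
}


def totals(runs):
    """Sum blastOnly, shared and hmmOnly over all runs."""
    return tuple(sum(column) for column in zip(*runs))


def blast_confirmed_fraction(runs):
    """Fraction of all BLAST hits that HMMER also reported."""
    blast_only, shared, _ = totals(runs)
    return shared / (blast_only + shared)


summary = {
    name: (totals(runs), blast_confirmed_fraction(runs))
    for name, runs in reported.items()
}
```

In the `nirk3` runs every HMMER hit is also a BLAST hit (hmmOnly is 0), while BLAST reports many more hits that HMMER does not, which is the pattern behind the BLAST false positive question.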
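Then tree method
[Figure: two example trees, just to show the idea: one with leaves nirK1, Seq1 (good), nirK2; one with leaves nirK1, nirK2, Seq2 (bad)]
[Workflow diagram: NCBI nirK (cultured), soil BLAST result, soil HMMER result → hmmalign with the 6-sequence profile → quicktree → tree]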
Questions to answer
• Best definition of nirK according to the current information
• Criteria for choosing seeds for the profile HMM
• BLAST false positive problem
Thanks