BLAST Tutorial 3 What is BLAST? Basic Local Alignment Search Tool Is a set of similarity search...

Post on 20-Dec-2015


BLAST

Tutorial 3

What is BLAST?

• Basic Local Alignment Search Tool

• A set of similarity search programs designed to explore sequence databases.

What are similarity searches good for?

• One sequence by itself is not informative; it must be compared against existing sequence databases to develop hypotheses about its relatives and function.

BLAST programs: query and database types

Name      Query type            Database
blastn    Genomic               Genomic
blastp    Protein               Protein
blastx    Translated genomic    Protein
tblastn   Protein               Translated genomic
tblastx   Translated genomic    Translated genomic
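The program table can be read as a lookup from (query type, database type) to a program name. The following is a minimal sketch of that lookup; the dictionary `PROGRAMS` and the function `choose_program` are hypothetical names introduced here for illustration and are not part of BLAST itself.

```python
# Hypothetical lookup built from the table above: the keys are
# (query type, database type) and the values are BLAST program names.
PROGRAMS = {
    ("genomic", "genomic"): "blastn",
    ("protein", "protein"): "blastp",
    ("translated genomic", "protein"): "blastx",
    ("protein", "translated genomic"): "tblastn",
    ("translated genomic", "translated genomic"): "tblastx",
}


def choose_program(query_type, database_type):
    """Return the program name for a query/database pair from the table."""
    key = (query_type.strip().lower(), database_type.strip().lower())
    if key not in PROGRAMS:
        raise ValueError(f"No BLAST program listed for {key}")
    return PROGRAMS[key]


if __name__ == "__main__":
    # A protein query searched against a translated genomic database.
    print(choose_program("Protein", "Translated genomic"))
```

For example, a protein query against a translated genomic database selects tblastn, as in the fourth row of the table.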

BLAST Databases

http://www.ncbi.nlm.nih.gov/BLAST/

Place Query

Choose Database


BLASTN Databases

Gene collection         GenBank, EMBL, DDBJ, PDB and NCBI reference sequences (RefSeq)
Genomic + Transcript    Complete human and mouse genome + transcriptome
EST                     Expressed sequence tags
mito                    Mitochondrial sequences
vector                  Vector subset of GenBank
month                   GenBank, EMBL, DDBJ, PDB sequences from the last 30 days
Envi                    Environmental samples

http://www.ncbi.nlm.nih.gov/BLAST/blastcgihelp.shtml#nucleotide_databases

Place Query

Choose Database

Optimize similarity level of the search

Threshold for results significance

Limit output size

Primary word match (16-64 nt)

Reward and penalty for matching and mismatching bases

Cost to create and extend a gap

Remove low information content

Limit search to specific organism
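The options listed above belong to the web form, but most of them have counterparts in the standalone BLAST+ `blastn` command. The sketch below only assembles an argument list and does not run anything; the function name `build_blastn_command`, the file name `query.fasta`, the database name `nt`, and all default values are assumptions chosen for illustration. The organism limit is left out because how it is expressed depends on the BLAST+ version and on whether the search is local or remote.

```python
def build_blastn_command(query_path, database, evalue=1e-6, max_hits=100,
                         word_size=11, reward=2, penalty=-3,
                         gap_open=5, gap_extend=2, filter_low_complexity=True):
    """Assemble (but do not run) a blastn argument list.

    Each argument mirrors one option from the web form; the values used
    here are illustrative assumptions, not recommendations.
    """
    return [
        "blastn",
        "-task", "blastn",                   # similarity level of the search
        "-query", query_path,                # place query
        "-db", database,                     # choose database
        "-evalue", str(evalue),              # threshold for significance
        "-max_target_seqs", str(max_hits),   # limit output size
        "-word_size", str(word_size),        # primary word match
        "-reward", str(reward),              # score for a matching base
        "-penalty", str(penalty),            # score for a mismatching base
        "-gapopen", str(gap_open),           # cost to create a gap
        "-gapextend", str(gap_extend),       # cost to extend a gap
        "-dust", "yes" if filter_low_complexity else "no",  # low-information filter
    ]


if __name__ == "__main__":
    print(" ".join(build_blastn_command("query.fasta", "nt")))
```

Building the command as a list keeps each option next to the form field it corresponds to, and the list can be passed to `subprocess.run` on a machine where BLAST+ is installed.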


Search for homologs of the chick “olfactory receptor 6” gene

Query sequence Matched Areas of database sequences

Global Alignments

Local Alignments

Sequence Identifier

Sequence description

Score(bits)

Coverage

Identity

E value

Score and E value

Identities and gaps

Strand

Multiple hits on the same subject

Design of the BLAST survey

Consider your research question:

• Are you looking for a particular gene in a particular species? BLAST against the genome of that species.

• Are you looking for additional members of a gene family across all species? BLAST against the gene collection database.

• Are you looking for exact motif matches? Increase the gap penalty or use megablast.

Score and E-value

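Score (S): (identities + mismatches) − gaps

E-value: $E = K \cdot m \cdot n \cdot e^{-\lambda S}$

Depends on the search space: m = query length (bp), n = database length (bp)

Depends on the scoring system: K and λ

Depends on the score: S

Bit score (S’): $S' = (\lambda S - \ln K) / \ln 2$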

Score and E-value

•The score is a measure of the similarity of the query to the sequence shown.

•The E-value is a measure of the reliability of the score.

•The definition of the E-value is: the number of alignments with a score equal to or greater than the given score S that are expected to occur by chance in a search of this size.

Score and E-value

The Size of the E-value

•The typical threshold for a good E-value from a BLAST search is E = 10^-6 (written e-6 in BLAST output) or lower.

•The reason for such low values is that a per-entry chance probability of 0.001 in a million-entry database would still leave about 1000 entries matching by chance, whereas 10^-6 would leave only about one.

Given the following parameters:
Query length: 150
λ = 1.37
K = 0.711
Average sequence length in database: 270
Number of sequences in database: 4,554,026

Exercise

Calculate the S, S’ and E for the following BLAST hit:

ACGTCGATCGAGCT
||||| ||||||||
AGGTCGTC-GAGGT

$S = 13 - 1 = 12$
$S' = (1.37 \cdot 12 - \ln 0.711) / \ln 2$
$S' = (16.44 + 0.341) / 0.693$
$S' \approx 24.2$

S = (Id + MM) − GP, where Id = identities, MM = mismatches, GP = gaps
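The raw score in this exercise can be reproduced by counting alignment columns. The sketch below assumes, as the slide's arithmetic (13 − 1) implies, that every aligned column counts as 1 and every gap column costs 1; a real BLAST search uses separate reward, mismatch, and gap values instead. The function name `raw_score` is hypothetical.

```python
def raw_score(query, subject):
    """Count identities, mismatches and gaps in two aligned strings.

    Returns (identities, mismatches, gaps, score), where
    score = (identities + mismatches) - gaps, following the slide's
    simplified formula rather than BLAST's real scoring scheme.
    """
    if len(query) != len(subject):
        raise ValueError("Aligned sequences must have the same length")
    identities = mismatches = gaps = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            gaps += 1          # a gap in either sequence
        elif q == s:
            identities += 1    # identical bases
        else:
            mismatches += 1    # aligned but different bases
    return identities, mismatches, gaps, (identities + mismatches) - gaps


if __name__ == "__main__":
    print(raw_score("ACGTCGATCGAGCT", "AGGTCGTC-GAGGT"))
```

For the exercise alignment this counts 9 identities, 4 mismatches, and 1 gap, so S = 13 − 1 = 12.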


E-value for this hit:

$E = 0.711 \cdot 150 \cdot 270 \cdot 4{,}554{,}026 \cdot e^{-1.37 \cdot 12}$
$E = 131135455683 \cdot 7.2477 \times 10^{-8}$
$E \approx 9504.3$
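The bit score and E-value arithmetic for this hit can be checked numerically. The sketch below uses the formulas as they appear in the exercise, with the database length taken as the average sequence length times the number of sequences; the function names `bit_score` and `e_value` are hypothetical.

```python
import math


def bit_score(raw, lam, k):
    """Bit score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def e_value(raw, lam, k, query_length, database_length):
    """E-value E = K * m * n * exp(-lambda * S)."""
    return k * query_length * database_length * math.exp(-lam * raw)


if __name__ == "__main__":
    lam, k = 1.37, 0.711
    query_length = 150
    database_length = 270 * 4_554_026  # average length x number of sequences
    raw = 12                           # raw score S from the exercise

    print(f"S' = {bit_score(raw, lam, k):.2f}")
    print(f"E  = {e_value(raw, lam, k, query_length, database_length):.1f}")
```

An E-value in the thousands means that a hit this short and imperfect is expected many times by chance in a database of this size, so it is not significant.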


Exercise

What is the minimum score needed to achieve a significant E-value (10^-6, written e-6)?

$131135455683 \cdot e^{-1.37 S} = 10^{-6}$

$\ln(131135455683 \cdot e^{-1.37 S}) = \ln(10^{-6})$

$\ln(131135455683) + \ln(e^{-1.37 S}) = -13.81$

$25.6 - 1.37 S = -13.81$

$S = (-13.81 - 25.6) / (-1.37)$

$S \approx 28.77$
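The same derivation can be written in closed form by solving E = K·m·n·e^(−λS) for S, which gives S = ln(K·m·n / E) / λ. The sketch below is a small check of that rearrangement using the exercise parameters; the function name `minimum_score` is hypothetical.

```python
import math


def minimum_score(target_e, lam, k, query_length, database_length):
    """Solve E = K * m * n * exp(-lambda * S) for S at a target E-value."""
    return math.log(k * query_length * database_length / target_e) / lam


if __name__ == "__main__":
    s_min = minimum_score(
        target_e=1e-6,
        lam=1.37,
        k=0.711,
        query_length=150,
        database_length=270 * 4_554_026,
    )
    print(f"Minimum raw score: {s_min:.2f}")
    # Raw scores are whole numbers under the slide's counting scheme,
    # so the smallest score that meets the threshold is the next integer up.
    print(f"Smallest integer score: {math.ceil(s_min)}")
```

Because the score enters the E-value through an exponent, raising the raw score from 12 to about 29 is enough to move the E-value from the thousands down to 10^-6.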

1. Search for sequences homologous to the human CFTR gene.

2. Additional family members of the CFTR gene found in other organisms.

3. Search for additional genes that are members of the ABC transporters family.