NCBI resources II: web-based tools and ftp resources
bcb.unl.edu/yyin/teach/PBB/ncbi-blast.pdf

Page 1:

NCBI resources II: web-based tools and ftp resources

Yanbin Yin

Most materials are downloaded from ftp://ftp.ncbi.nih.gov/pub/education/

Page 2:

Outline

• Tools
  – BLAST
  – Specialized BLAST
  – GEO
• ftp download
• Hands-on exercise

Page 3:

References

http://homepages.ulb.ac.be/~dgonze/TEACHING/stat_scores.pdf

http://www.bioinformatics.wsu.edu/bioinfo_course/notes/lecture6.pdf

NCBI discovery workshops: ftp://ftp.ncbi.nih.gov/pub/education/discovery_workshops/NLM/2012/Sept2012/

Page 4:

Evolution of pairwise alignment tools

• 1970: Needleman-Wunsch algorithm
• 1981: Smith-Waterman algorithm
• 1985: FASTA
• 1990: BLAST

Moving down the list, the tools get faster but less accurate.

Page 5:

Basic Local Alignment Search Tool

• Widely used similarity search tool
• Heuristic approach based on Smith-Waterman algorithm
• Finds best local alignments
• Provides statistical significance
• All combinations (DNA/Protein) of query and database
  – DNA vs DNA
  – DNA translation vs Protein
  – Protein vs Protein
  – Protein vs DNA translation
  – DNA translation vs DNA translation
• www, standalone, and network client
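To make "best local alignment" concrete, here is a minimal sketch of the Smith-Waterman dynamic programming score that BLAST approximates heuristically. It is not BLAST itself. The scoring values (match +2, mismatch -3) and the linear gap penalty of -5 per position are assumptions chosen for illustration; BLAST uses affine gap costs.

```python
def smith_waterman_score(a, b, match=2, mismatch=-3, gap=-5):
    """Return the best local alignment score between strings a and b.

    Uses a linear gap penalty (an assumption made to keep the sketch short).
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTT", "GGACGGG"))  # shared "ACG" scores 3 matches
```

The full matrix costs time proportional to the product of the two sequence lengths, which is why database searches rely on heuristics instead.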

Page 6:

Source: http://www.bioinformatics.wsu.edu/bioinfo_course/notes/lecture6.pdf

Page 7:

Source: http://homepages.ulb.ac.be/~dgonze/TEACHING/stat_scores.pdf

Page 8:

Source: http://www.bioinformatics.wsu.edu/bioinfo_course/notes/lecture6.pdf

Page 9:

Source: http://www.bioinformatics.wsu.edu/bioinfo_course/notes/lecture6.pdf

Page 10:

Local Alignment Statistics

High scores of local alignments between two random sequences follow the Extreme Value Distribution.

[Plot: number of alignments (y-axis) against score (x-axis)]

(applies to ungapped alignments)

$E = K m n e^{-\lambda S}$ or $E = m n 2^{-S'}$

K = scale for search space
$\lambda$ = scale for scoring system
S' = bit score = $(\lambda S - \ln K)/\ln 2$

Expect Value
E = number of database hits you expect to find by chance

Labels on the formula: size of database, your score, expected number of random hits

Source: http://www.youtube.com/ncbinlm
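The two forms of the E-value formula can be checked against each other numerically. The sketch below takes m and n to be the query length and the total database length (the usual reading, not spelled out on the slide). The values of lambda, K, m, n and S are placeholders chosen for illustration, not parameters taken from the slides.

```python
import math


def bit_score(raw_score, lam, k):
    """S' = (lambda * S - ln K) / ln 2"""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, m, n, lam, k):
    """E = K * m * n * exp(-lambda * S)"""
    return k * m * n * math.exp(-lam * raw_score)


def expect_from_bits(bits, m, n):
    """E = m * n * 2^(-S')"""
    return m * n * 2.0 ** (-bits)


# Placeholder values (assumptions for illustration only).
LAM, K = 0.318, 0.134
m, n = 250, 1_000_000
S = 100

e_raw = expect_value(S, m, n, LAM, K)
e_bits = expect_from_bits(bit_score(S, LAM, K), m, n)
print(e_raw, e_bits)  # the two forms agree up to rounding
```

Because E is proportional to m times n, the same score is less significant in a larger database: doubling n doubles E.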

Page 11:

Local Alignment Scoring: Protein

K-K: +5
K-E: +1
Q-F: -3
Gap: -(11 + 4(1)) = -15

Number of Chance Alignments = 4 × 10^-50

Scores from BLOSUM62, a position independent matrix
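The gap arithmetic on this slide follows an affine scheme: a fixed cost for opening a gap plus a cost for each gapped position. A minimal sketch, assuming that reading of the numbers (opening cost 11, extension cost 1, gap length 4). The function name and the small score table are hypothetical helpers; the three pair scores are the ones quoted on the slide.

```python
def gap_score(length, gap_open, gap_extend):
    """Affine gap score: -(gap_open + gap_extend * length)."""
    return -(gap_open + gap_extend * length)


# Substitution scores quoted on the slide (BLOSUM62 entries).
PAIR_SCORES = {("K", "K"): 5, ("K", "E"): 1, ("Q", "F"): -3}

protein_gap = gap_score(4, gap_open=11, gap_extend=1)
print(protein_gap)  # -(11 + 4*1) = -15
print(PAIR_SCORES[("K", "K")] + PAIR_SCORES[("K", "E")] + protein_gap)
```

A single aligned lysine pair (+5) recovers only a third of what a four-residue gap costs, which is why gaps appear in reported alignments only when flanked by strong matches.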

Page 12:

Local Alignment Scoring: Nucleotide

Match = +2, Mismatch = -3
Gap: -(5 + 4(2)) = -13

Number of Chance Alignments = 2 × 10^-73
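Putting the nucleotide parameters together, the raw score of a finished alignment is the sum of its column scores. The sketch below scores two already-aligned strings with the slide's values (match +2, mismatch -3, gap opening 5, gap extension 2). The function and the example sequences are hypothetical.

```python
def score_alignment(top, bottom, match=2, mismatch=-3, gap_open=5, gap_extend=2):
    """Score two equal-length aligned strings; '-' marks a gapped position."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    score = 0
    gap_row = None  # which row currently holds an open gap, if any
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot be a gap in both rows")
        row = "top" if x == "-" else "bottom" if y == "-" else None
        if row is None:
            score += match if x == y else mismatch
        else:
            if row != gap_row:  # a new gap starts here
                score -= gap_open
            score -= gap_extend
        gap_row = row
    return score


# 8 matches (+16) and one gap of length 4 (-13)
print(score_alignment("ACGT----ACGT", "ACGTTTTTACGT"))
```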

Page 13:

BLAST and BLAST-like programs

• Traditional BLAST (formerly blastall): nucleotide, protein, translations
  – blastn: nucleotide query vs. nucleotide database
  – blastp: protein query vs. protein database
  – blastx: nucleotide query vs. protein database
  – tblastn: protein query vs. translated nucleotide database
  – tblastx: translated query vs. translated database
• Megablast: nucleotide only
  – Contiguous megablast
    • Nearly identical sequences
  – Discontiguous megablast
    • Cross-species comparison
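The five traditional programs differ only in the type of query, the type of database, and whether the comparison happens at the nucleotide or the protein level. A small lookup sketch of that table follows; the function name and argument names are hypothetical, and the mapping itself comes from the list above.

```python
# (query type, database type, level at which sequences are compared) -> program
PROGRAMS = {
    ("nucleotide", "nucleotide", "nucleotide"): "blastn",
    ("protein", "protein", "protein"): "blastp",
    ("nucleotide", "protein", "protein"): "blastx",    # query is translated
    ("protein", "nucleotide", "protein"): "tblastn",   # database is translated
    ("nucleotide", "nucleotide", "protein"): "tblastx",  # both are translated
}


def choose_program(query, database, compare_as):
    """Return the BLAST program name for a query/database combination."""
    try:
        return PROGRAMS[(query, database, compare_as)]
    except KeyError:
        raise ValueError(
            f"no traditional BLAST program for {query} query vs {database} "
            f"database compared as {compare_as}"
        ) from None


print(choose_program("protein", "nucleotide", "protein"))  # tblastn
```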

Page 14:

Position-specific BLAST Programs (protein only)

• Position Specific Iterative BLAST (PSI-BLAST)
  Automatically generates a position specific score matrix (PSSM)
• Pattern-Hit Initiated BLAST (PHI-BLAST)
  Focuses search around a pattern (motif)
• Domain Enhanced Lookup Time Accelerated (DELTA) BLAST
  Uses domain PSSM in first round of search
• Reverse PSI-BLAST (RPS-BLAST)
  Searches a database of PSI-BLAST PSSMs
  Conserved Domain Database Search
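To illustrate what a position specific score matrix is, here is a deliberately simplified sketch: log-odds scores per alignment column, with a flat pseudocount and a uniform background frequency. PSI-BLAST's actual construction (sequence weighting, data-dependent pseudocounts, real background frequencies) is more involved and is not reproduced here. The toy alignment is made up.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_seqs, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned_seqs)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def score_with_pssm(pssm, segment):
    """Sum the column scores for a segment as long as the matrix."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must match the matrix length")
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Hypothetical ungapped alignment of three family members.
family = ["HKLGQAKPW", "HRVGESKPW", "HQLGDTKPW"]
pssm = build_pssm(family)
print(score_with_pssm(pssm, "HKLGQAKPW"), score_with_pssm(pssm, "AAAAAAAAA"))
```

Unlike BLOSUM62, the same residue scores differently at different positions: H is rewarded in the first column only because every family member has it there.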

Page 15:

Source: http://www.ch.embnet.org/CourseAthens/slides/intro_hmm_profile.pdf

Page 16:

Non-redundant protein

• nr (non-redundant protein sequences)
  – GenBank CDS translations
  – NP_, XP_ refseq_protein
  – Outside Protein
    • PIR, Swiss-Prot, PRF
    • PDB (sequences from structures)
• pat: protein patents
• env_nr: metagenomes (environmental samples)

Services: blastp, blastx

Page 17:

Nucleotide Databases: Traditional

Services: blastn, tblastn, tblastx

Page 18:

Nucleotide Databases: Traditional

• nr (nt)
  – Traditional GenBank
  – NM_ and XM_ RefSeqs
• refseq_rna
• NCBI Genomes
  – NC_ RefSeqs
  – GenBank Chromosomes
• dbest
  – EST Division
• non-human, non-mouse ests
• htgs
  – HTG division
• gss
  – GSS division
• wgs
  – whole genome shotgun contigs
• tsa
  – transcriptome shotgun assembly
• 16S microbial
  – Selected 16S sequences (targeted loci)

Databases are mostly non-overlapping

Page 19:

Specialized BLAST Pages

Page 20:

Hands-on exercise 1

blastn and megablast

Page 22:

Search against human database

Page 23:

A lot of things you may explore

Page 24:

Change here to 1000

Upload a text file with the human tp53 mRNA fasta sequence (download from the course webpage)

Question: how many ESTs match tp53 genes?
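One way to answer the counting question offline, assuming the results were saved as a tab-separated hit table with the standard 12 columns (the layout of standalone BLAST+ `-outfmt 6`): count distinct subject IDs rather than rows, because one EST can contribute several aligned segments. The sample rows and IDs below are made up.

```python
# Standard 12-column tabular layout (BLAST+ -outfmt 6).
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def distinct_subjects(tabular_text):
    """Return the set of subject IDs found in a tabular hit table."""
    subjects = set()
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, line.split("\t")))
        subjects.add(row["sseqid"])
    return subjects


# Hypothetical hit table: EST_A appears twice (two aligned segments).
sample = "\n".join([
    "# hypothetical hit table",
    "tp53_query\tEST_A\t99.5\t400\t2\t0\t1\t400\t1\t400\t0.0\t730",
    "tp53_query\tEST_A\t98.0\t150\t3\t0\t500\t649\t420\t569\t1e-70\t260",
    "tp53_query\tEST_B\t97.2\t320\t9\t0\t10\t329\t5\t324\t1e-150\t540",
])
print(len(distinct_subjects(sample)))  # 2 distinct ESTs from 3 rows
```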

Page 25:

It took ~1 minute to finish

A lot of things you may explore!!!

Page 26:

Search against other refseq genomes

Page 27:

Hands-on exercise 2

Protein blast (blastp and tblastn)

Page 28:

If you do not select organisms…

Page 29:

You can still specify organisms…

Page 30:

Upload a text file with two Arabidopsis protein fasta sequences (download from the course webpage)

Type in populus to choose Populus trichocarpa

You may submit many sequences, but expect it to take time

Question: what are the homologs in poplar tree?
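Before uploading a multi-sequence file it can help to confirm how many records it really contains. A minimal FASTA reader sketch follows; the record names and residues in the example are placeholders, not the course file.

```python
def read_fasta(text):
    """Parse FASTA text into {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""  # first word of the header line
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Placeholder input standing in for the two-protein course file.
example = """>protein_1 hypothetical description
MKTAYIAKQR
QISFVKSHFS
>protein_2
MSDNELLQ
"""
records = read_fasta(example)
print(len(records), [len(seq) for seq in records.values()])
```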

Page 31:

It took ~1 minute (smaller database)

Click here to choose which query protein to view

Page 32:

How to determine what is a good e-value cutoff to select homologs?

http://www.youtube.com/watch?v=nO0wJgZRZJs&list=PL8FD4CC12DABD6B39&index=6
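There is no single correct cutoff. One practical way to explore the question is to see how the number of retained hits changes as the threshold is relaxed. The sketch below does that for a made-up list of (hit name, E-value) pairs; the names and values are hypothetical.

```python
def hits_below(hits, cutoff):
    """Return the names of hits whose E-value is at or below the cutoff."""
    return [name for name, evalue in hits if evalue <= cutoff]


# Hypothetical hits, ordered from most to least significant.
hits = [
    ("hit_1", 1e-120),
    ("hit_2", 3e-45),
    ("hit_3", 2e-8),
    ("hit_4", 0.004),
    ("hit_5", 1.7),
]

for cutoff in (1e-50, 1e-10, 1e-5, 1e-2, 10):
    print(cutoff, len(hits_below(hits, cutoff)))
```

A large jump in the count between two neighbouring cutoffs marks the hits worth inspecting by hand before accepting or rejecting them as homologs.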

Page 33:

Type in charoph to choose charophytes

Question: what are the EST homologs in charophytic algae?

Page 36:

Hands-on exercise 3

PHI-BLAST: query protein + short motif/pattern

& PSI-BLAST (iterated BLAST): multi-round BLASTP

Page 37:

Example: plant glycosyltransferase family 8 (GT8) has a signature motif

We want to search with the Arabidopsis GAUT1 protein (gi #: 86611465) and the HXXGXXKPW motif

ProSite style pattern: H-x(2)-G-x(2)-K-P-W
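To check that a query protein actually contains the motif before submitting it, the ProSite style pattern can be translated into a regular expression. The converter below is a sketch covering only a subset of the syntax (single residues, x, repeat counts, [..] and {..} groups). The test sequence is made up and is not the GAUT1 sequence.

```python
import re

# One pattern element: a residue, x, [ABC] or {ABC}, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a subset of ProSite pattern syntax into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


regex = prosite_to_regex("H-x(2)-G-x(2)-K-P-W")
print(regex)  # H.{2}G.{2}KPW

# Made-up sequence containing the motif starting at index 2.
match = re.search(regex, "MMHAAGCCKPWLL")
print(match.start(), match.group())
```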

Page 41:

Hands-on exercise 4

RPS-BLAST: given protein sequences, find conserved functional domains

Page 45:

Next class: NCBI GEO and ftp resource (with a little bit of intro to Linux skills) and practice