Tools and Datasets


Page 1: Tools and Datasets

Tools and Datasets

Exploring the tools of the trade

Page 2: Tools and Datasets

Sequence Databases

● Understanding EMBL Entries

● Understanding SWISS-PROT Entries

Page 3: Tools and Datasets

Understanding EMBL Entries

Page 4: Tools and Datasets

Understanding SWISS-PROT Entries

Page 5: Tools and Datasets

General Concepts and Methods

● Predictions and Validation

Page 6: Tools and Datasets

Maxim 17.1

Recognise the difference between the validation of a model and the testing of it for self-consistency
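The following is a minimal sketch of the distinction. It assumes the data are held one record per line in a text file. Records set aside before a model is built can be used to validate it, whereas scoring the model on the records it was built from only tests its self-consistency. The helper `split_holdout`, the hold-out interval k = 5 and the file names are hypothetical. In practice a random split is usually preferred to this deterministic every-k-th-line rule.

```shell
#!/usr/bin/env bash
# Hold-out split (hypothetical helper): every k-th record goes to the
# validation file; the rest go to the training file used to build the model.
split_holdout() {  # usage: split_holdout K INPUT TRAIN_OUT HELD_OUT
  awk -v k="$1" -v train="$3" -v held="$4" '
    NR % k == 0 { print > held; next }   # held out: never used to build the model
    { print > train }                    # used to build (and self-check) the model
  ' "$2"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
seq 1 20 > "$tmp/records.txt"            # 20 stand-in records
split_holdout 5 "$tmp/records.txt" "$tmp/train.txt" "$tmp/test.txt"
wc -l "$tmp/train.txt" "$tmp/test.txt"
```

With 20 input records and k = 5, the training file receives 16 lines and the validation file receives 4 (records 5, 10, 15 and 20).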

Page 7: Tools and Datasets

True/False Positives and Negatives
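Once the four outcome counts are known, the usual summary rates follow directly: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) and precision = TP/(TP+FP). The sketch below computes them with awk. The function name `rates` and the counts passed to it are made-up illustration values, not results from any real predictor.

```shell
#!/usr/bin/env bash
# Summary rates from the four outcome counts of a yes/no predictor.
# rates is a hypothetical helper; the counts in the usage line are made up.
rates() {  # usage: rates TP FP TN FN
  # sensitivity = TP/(TP+FN): share of real positives that were found
  # specificity = TN/(TN+FP): share of real negatives correctly rejected
  # precision   = TP/(TP+FP): share of positive calls that were right
  awk -v tp="$1" -v fp="$2" -v tn="$3" -v fn="$4" 'BEGIN {
    printf "sensitivity=%.2f specificity=%.2f precision=%.2f\n",
           tp / (tp + fn), tn / (tn + fp), tp / (tp + fp)
  }'
}

rates 40 10 140 10
```

For these counts it prints `sensitivity=0.80 specificity=0.93 precision=0.80`.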

Page 8: Tools and Datasets

Maxim 17.2

Generally, False Negative predictions are considered more acceptable than False Positives

Page 9: Tools and Datasets

Assessment/Validation Procedure and Possible Outcomes

[Figure: figOUTCOME.eps]

Page 10: Tools and Datasets

Balancing the errors
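When a predictor outputs a score, the balance between the two kinds of error is set by the cut-off. The sketch below counts false positives and false negatives at a few cut-offs. The function `errors_at`, the scores and their labels (1 = true member, 0 = non-member) are invented for illustration.

```shell
#!/usr/bin/env bash
# Count false positives and false negatives at a given score cut-off.
# Input lines: "SCORE LABEL", where LABEL 1 = true member, 0 = non-member.
errors_at() {  # usage: errors_at CUTOFF < scored_list
  awk -v c="$1" '
    BEGIN { cut = c + 0 }                  # force a numeric comparison
    $1 >= cut && $2 == 0 { fp++ }          # called positive, really negative
    $1 <  cut && $2 == 1 { fn++ }          # called negative, really positive
    END { printf "cutoff=%s FP=%d FN=%d\n", c, fp, fn }'
}

# Invented scores and labels, for illustration only.
scores='0.95 1
0.90 1
0.80 0
0.70 1
0.60 0
0.40 1
0.30 0
0.10 0'

for c in 0.2 0.5 0.85; do
  printf '%s\n' "$scores" | errors_at "$c"
done
```

Raising the cut-off from 0.2 to 0.85 moves the errors from FP=3, FN=0 to FP=0, FN=2. In practice, choosing between the two kinds of error means choosing a threshold.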

Page 11: Tools and Datasets

Maxim 17.3

With False Negatives we could come back next year and find the ones we missed; these are preferred to False Positives, which we may spend time studying this year only to find out that the time was wasted. It all depends on the circumstances

Page 12: Tools and Datasets

Maxim 17.4

Sometimes all those false positives are maybe, just maybe, trying to tell you something. So, if you aspire to a Nobel Prize ...

Page 13: Tools and Datasets

Using multiple algorithms to improve performance

Page 14: Tools and Datasets

Maxim 17.5

Use a fast, if inaccurate, algorithm to protect your slow, accurate second-stage algorithm
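The following is a toy illustration of the maxim. It assumes sequences are held as `NAME SEQUENCE` lines. The first stage is a cheap test that a sequence contains a cysteine at all. Only the survivors reach the stricter C-x-x-C pattern check, which here stands in for a genuinely expensive second-stage algorithm such as a full alignment. The motif, the sequences and the function names `fast_stage` and `slow_stage` are hypothetical.

```shell
#!/usr/bin/env bash
# Two-stage screen over "NAME SEQUENCE" lines (names, motif and data hypothetical).
fast_stage() {   # cheap test: does the sequence contain any cysteine?
  awk 'index($2, "C") > 0'
}
slow_stage() {   # stricter test, standing in for an expensive algorithm: C-x-x-C
  awk '$2 ~ /C..C/ { print $1 }'
}

seqs='seqA MKTLCAACGLE
seqB MKTLCAAAGLE
seqC GMTCSACVEE
seqD PPPPPPPPPP'

printf '%s\n' "$seqs" | fast_stage | slow_stage
```

The fast stage lets seqB through, a false positive that the second stage then rejects. Because every C-x-x-C match contains a C, however, the fast stage can never discard a sequence that the second stage would have accepted. That property is what makes a prefilter safe to put in front of the slow algorithm.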

Page 15: Tools and Datasets

An overview of tRNA: 2D, 3D and Gene Structure

[Figure: figTRNA.eps]

Page 16: Tools and Datasets

http://www.ncbi.nlm.nih.gov/Education/

Introducing Bioinformatics Tools

Page 17: Tools and Datasets

http://www-igbmc.u-strasbg.fr/BioInfo/

ftp://ftp.ebi.ac.uk/pub/software

ClustalW

Page 18: Tools and Datasets

ClustalX operating under Windows XP

[Figure: figCLUSTALX.eps]

Page 19: Tools and Datasets

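$ gzip -d clustalw1.83.UNIX.tar.gz     # decompress the source archive
$ tar -xvf clustalw1.83.UNIX.tar       # unpack it
$ cd clustalw1.83
$ make                                 # compile the clustalw program
$ ./clustalw                           # run it interactively (menu-driven)
$ ./clustalw -h                        # list the command-line options
$ ./clustalw -INFILE=../MerAHMAs_MerP.swp -OUTFILE=../Mer.aln   # align the sequences in the input file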

Algorithms and Methods

Page 20: Tools and Datasets

Substitution/scoring matrices
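A substitution matrix assigns a score to every pair of aligned residues, and the score of an ungapped alignment is the sum over its columns. The sketch below uses a deliberately tiny toy matrix: identity +4, a handful of chemically similar pairs +1, and everything else -1. These values are illustrative only and are not taken from BLOSUM or PAM, and `score_pair` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Score an ungapped alignment of two equal-length sequences column by column.
# TOY matrix, not BLOSUM/PAM: identity +4, listed similar pair +1, else -1.
score_pair() {  # usage: score_pair SEQ1 SEQ2
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split("IV VI IL LI LV VL KR RK DE ED ST TS", pairs, " ")
    for (i = 1; i <= n; i++) similar[pairs[i]] = 1
    total = 0
    for (i = 1; i <= length(a); i++) {
      x = substr(a, i, 1); y = substr(b, i, 1)
      if (x == y)                total += 4   # identical residues
      else if ((x y) in similar) total += 1   # conservative substitution
      else                       total -= 1   # anything else
    }
    print total
  }'
}

score_pair MKVLD MRILE
```

MKVLD against MRILE scores 4 + 1 + 1 + 4 + 1 = 11.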

Page 21: Tools and Datasets

BLAST
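BLAST begins by finding short "word" hits between query and database sequences before extending them into alignments. The sketch below shows only a simplified version of that first step: exact matches of length-w query words (w = 3, as in protein searches), with no neighbourhood words, scoring threshold or extension. `find_seeds` is a hypothetical helper, not part of NCBI-BLAST.

```shell
#!/usr/bin/env bash
# Simplified BLAST-style seeding: report every exact match of a length-w
# query word in the subject. Neighbourhood words and hit extension, which
# real BLAST also performs, are deliberately left out.
find_seeds() {  # usage: find_seeds QUERY SUBJECT [WORD_SIZE]
  awk -v q="$1" -v s="$2" -v w="${3:-3}" 'BEGIN {
    for (i = 1; i <= length(q) - w + 1; i++) {
      word = substr(q, i, w)
      for (j = 1; j <= length(s) - w + 1; j++)
        if (substr(s, j, w) == word)
          printf "%s q%d s%d\n", word, i, j   # word, query pos, subject pos
    }
  }'
}

find_seeds MKVLDE AMKVLQ
```

Both seeds (MKV at q1/s2 and KVL at q2/s3) lie on the same diagonal, with subject position minus query position equal to 1. BLAST would go on to extend this kind of region.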

Page 22: Tools and Datasets

Maxim 17.6

Exactly which BLAST is best depends on the circumstances
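One factor that always matters is the molecule type on each side of the search. The classic BLAST programs divide up as follows:

- blastn: nucleotide query against a nucleotide database.
- blastp: protein query against a protein database.
- blastx: nucleotide query, translated in all six reading frames, against a protein database.
- tblastn: protein query against a nucleotide database translated in six frames.
- tblastx: both query and database translated.

The helper `blast_program` and its `translate-both` option are hypothetical. They are a mnemonic, not part of any BLAST distribution, and other circumstances still affect the choice.

```shell
#!/usr/bin/env bash
# Pick a classic BLAST program from the query and database molecule types.
# Hypothetical helper; "translate-both" is an option invented for this sketch.
blast_program() {  # usage: blast_program nucl|prot nucl|prot [translate-both]
  case "$1/$2" in
    nucl/nucl) if [ "$3" = "translate-both" ]; then echo tblastx; else echo blastn; fi ;;
    prot/prot) echo blastp ;;
    nucl/prot) echo blastx ;;    # query translated in all six frames
    prot/nucl) echo tblastn ;;   # database translated in all six frames
    *) echo "unknown combination: $1/$2" >&2; return 1 ;;
  esac
}

blast_program prot nucl
```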

Page 23: Tools and Datasets

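$ cd                                      # start from the home directory
$ mkdir blast
$ cp blast-2.2.6-ia32-linux.tar.gz blast  # copy the downloaded archive into ~/blast
$ cd blast
$ gzip -d blast-2.2.6-ia32-linux.tar.gz   # decompress
$ tar -xvf blast-2.2.6-ia32-linux.tar     # unpack the archive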

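Contents of ~/.ncbirc, which tells the BLAST programs where their data directory (containing the scoring matrices) is:

[NCBI]
Data="/home/michael/blast/data"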

Installing NCBI-BLAST

Page 24: Tools and Datasets

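$ mkdir databases
$ cd databases
$ mv ../All_Mer_Proteins.fsa .
$ ../formatdb -i All_Mer_Proteins.fsa -p T -o T -n Merproteins   # -p T: protein; -o T: parse SeqIds; -n: database name
$ cd ..
$ blastall -p blastp -d databases/Merproteins -i test_seq.fsa     # search, run from ~/blast
$ cd databases
$ sed 's/sw|/sp|/' All_Mer_Proteins.fsa > Mer_db.prot              # rename sw| to sp|, the SWISS-PROT prefix NCBI tools parse
$ ../formatdb -i Mer_db.prot -p T -o T -n Merproteins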

Preparation of database files for faster searching
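Two checks can save a failed formatdb run. The first is counting the records, since each sequence has one `>` header. The second is rewriting the `sw|` identifier prefix to `sp|`, as the sed command above does. This sketch anchors the substitution to the start of header lines (`^>sw|`) so that nothing else in the file can be touched. `count_records`, `fix_prefix` and the two records are made up for illustration.

```shell
#!/usr/bin/env bash
# Pre-flight checks on a FASTA file before running formatdb on it.
count_records() { grep -c '^>' "$1"; }          # one header line per sequence
fix_prefix()   { sed 's/^>sw|/>sp|/' "$1"; }    # rewrite header prefixes only

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/sample.fsa" <<'EOF'
>sw|TEST01|MERA_DEMO made-up record one
MTHLKITGMTCDSCAAHVKEA
>sw|TEST02|MERP_DEMO made-up record two
MKKLFASLALAAVVAPVWA
EOF

count_records "$tmp/sample.fsa"
fix_prefix "$tmp/sample.fsa" > "$tmp/sample.prot"
grep '^>' "$tmp/sample.prot"
```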

Page 25: Tools and Datasets

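$ fastacmd -d databases/Merproteins -I              # print information about the database
$ fastacmd -d databases/Merproteins -s MERA_SHIFL   # retrieve the entry named MERA_SHIFL
$ blastclust -d databases/Merproteins | head        # cluster the sequences; show the first lines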

The different types of BLAST search
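`fastacmd -s` retrieves entries by identifier from a database formatted with `-o T`. For a quick look at a plain FASTA file, a record can instead be pulled out by entry name with awk. The sketch below is a rough stand-in for `fastacmd -s`, not a reimplementation. It assumes NCBI-style headers of the form `>sp|ACCESSION|ENTRY_NAME description`, and `get_record` and the sample records are hypothetical.

```shell
#!/usr/bin/env bash
# Print one record from a plain FASTA file, selected by SWISS-PROT entry name.
# Assumes headers like: >sp|ACCESSION|ENTRY_NAME description
get_record() {  # usage: get_record ENTRY_NAME FILE
  awk -v id="$1" '
    /^>/ { split(substr($1, 2), f, "|"); keep = (f[3] == id) }  # new header: select or not
    keep                                                          # print lines while selected
  ' "$2"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cat > "$tmp/mer.fsa" <<'EOF'
>sp|TEST01|MERA_DEMO made-up record one
MTHLKITGMTC
DSCAAHVKEA
>sp|TEST02|MERP_DEMO made-up record two
MKKLFASLALA
EOF

get_record MERA_DEMO "$tmp/mer.fsa"
```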

Page 26: Tools and Datasets

Where To From Here