Basic sequence analysis BioEdit Matgat Sequin
7/13/2018
1
Ekawat Pasomsub
Unit of Virology and Molecular Microbiology,
Department of Pathology, Ramathibodi Hospital ,
E-mail address: [email protected]
BioEdit
Written by a graduate student who knows how frustrating and time-consuming sequence manipulation can be
A mouse-driven, easy-to-use sequence alignment editor
A single program that can handle most simple sequence and alignment editing and manipulation functions
Basic sequence analyses
System requirements
PC-compatible i486 or better (Windows 95 or later)
At least 32 MB of RAM
About 30-40 MB of free disk space (the BioEdit installation itself takes about 15 MB)
How to download and install “BioEdit”
Data and sequence management
How to use BioEdit for DNA analysis
How to use accessory application tools in BioEdit
Text editors (save files with the .txt extension)
Notepad (on MS Windows)
WordPad (on MS Windows)
Notepad++ (free installation)
Fonts: Courier / Courier New
Encoding: Unicode (UTF-8)
Encoding options shown for Notepad++, Firefox, and IE: Thai, User-defined, Unicode
Edit Menu
Edit sequences MODE
Edit
Select and slide
Grab and drag
Cut/copy/paste of the sequences
Copy sequence to clipboard
Search sequence
Select to end and Select to Beginning
Sequence-title management
File menu: Graphic View
• For sequence(s)
• Cut Ctrl+F7
• Copy Ctrl+F8
• Paste Ctrl+F9
Copy sequence to clipboard
1. Select the sequence
2. Edit >> Copy Sequences to clipboard (FASTA format)
3. Open Notepad++
4. Paste the sequences
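The FASTA text pasted in step 4 can also be read by a short script. The following is a minimal Python sketch, not part of BioEdit; the function name `parse_fasta` and the clipboard contents are hypothetical and only illustrate the format (a `>` title line followed by one or more sequence lines).

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (title, sequence) tuples."""
    records = []
    title, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new title line closes the previous record, if any
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:].strip(), []
        elif title is not None:
            chunks.append(line)  # sequence may span several lines
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


# Hypothetical clipboard contents, as pasted into a text editor
clipboard_text = """>seq01
CGTATGTCAT
GCATGCACGC
>seq02
CGATTGCCATCAATGCACGC
"""

records = parse_fasta(clipboard_text)
for title, seq in records:
    print(title, len(seq))
```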
• Search sequence
1. Select a sequence
2. Edit >> Search (Ctrl+F)
3. Paste the query sequence
4. Click on “Find Next”
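The same kind of exact-match search can be sketched outside BioEdit in a few lines of Python. The helper name `find_all` is hypothetical, and the sketch assumes a plain exact, case-insensitive match; it does not reproduce BioEdit's own search options.

```python
def find_all(sequence, query):
    """Return 1-based start positions of every exact match (overlaps allowed)."""
    sequence, query = sequence.upper(), query.upper()
    positions = []
    start = sequence.find(query)
    while start != -1:
        positions.append(start + 1)  # report 1-based positions
        start = sequence.find(query, start + 1)
    return positions


hits = find_all("CGTATGTCATGCATGCACGC", "ATG")
print(hits if hits else "Not found")
```

An empty list corresponds to the "Not found" case.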
Search result: Found / Not found
• Select to end and Select to Beginning
1. Open the alignment
2. Select the position
3. Edit >> Select to beginning
4. Delete the selected sequence
• Sequence-title management
1. Edit >> Copy sequence titles
2. Open Notepad++
3. Paste the sequence titles
4. Edit the sequence titles
5. Copy the sequence title from Notepad++
6. Paste the sequence title using Edit >> Paste Over Titles
BIOEDIT FOR DNA ANALYSIS (SEQUENCE MENU)
DNA complement/Reverse complement
DNA Translation
Restriction enzyme analysis
Open reading frame finder
Pairwise alignment / Identity matrix calculation
Consensus sequence
ABI format sequence analysis
CAP (Contig Assembly Program)
DNA complement/Reverse complement
1. Sequence
2. Nucleic Acid
3. Complement (or Reverse Complement)
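What the complement and reverse-complement operations compute can be shown with a minimal Python sketch. The function names are hypothetical, and the sketch assumes unambiguous DNA bases (A, C, G, T) only; any other character is passed through unchanged.

```python
# Translation table mapping each base to its complement (upper and lower case)
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Return the complementary strand, read in the same direction."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return complement(seq)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```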
DNA Translation
http://incep.imagine-ex.co/dna-transcription-and-translation/
Translation is the final step on the way from DNA to protein. It is the synthesis of proteins directed by an mRNA template.
Translation of a single sequence
Translation of a set of sequences
1. Select the set of sequences
2. Sequence menu
3. Translate or Reverse-Translate
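As an illustration of what translating a set of sequences involves, here is a minimal Python sketch. It assumes the standard genetic code, DNA input (T rather than U), and reading frame 1 by default; the names `translate` and `translate_all` are hypothetical and are not BioEdit functions.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, frame=0):
    """Translate DNA codon by codon; '*' is a stop, 'X' an unrecognized codon."""
    seq = seq.upper()
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


def translate_all(sequences):
    """Translate every sequence in a {title: DNA} dictionary."""
    return {title: translate(seq) for title, seq in sequences.items()}


print(translate("ATGGCCTAA"))  # MA*
```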
Restriction enzyme analysis
A restriction enzyme, or restriction endonuclease, is an enzyme that cleaves DNA into fragments at or near specific recognition sites within the molecule, known as restriction sites.
1. Sequence
2. Nucleic Acid
3. Restriction map
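The core of a restriction map is locating each enzyme's recognition site in the sequence. The Python sketch below assumes three common enzymes (EcoRI, BamHI, HindIII) and a made-up test sequence; it reports where each recognition site starts, not the exact cut position, and it is not how BioEdit implements its map.

```python
# Recognition sequences of three widely used enzymes
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
}


def restriction_sites(seq, enzymes=ENZYMES):
    """Return {enzyme: [1-based start positions of its recognition site]}."""
    seq = seq.upper()
    result = {}
    for name, site in enzymes.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(site, start + 1)
        result[name] = positions
    return result


# Hypothetical sequence containing two EcoRI sites and one BamHI site
example = "AAGAATTCGGGGATCCTTGAATTCA"
print(restriction_sites(example))
```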
Open reading frame finder
An open reading frame (ORF) is the part of a reading frame that has the ability to be translated.
1. Use file “HIV_ORF.fasta”
2. Select the sequence
3. Sequence >> Nucleic Acid >> Find next ORF
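To make the idea of an ORF concrete, the following Python sketch scans the three forward reading frames for an ATG start codon followed by an in-frame stop codon. The function name, the `min_codons` cutoff, and the short test sequence are assumptions for illustration; the reverse strand is not scanned, and BioEdit's own ORF search settings may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for forward-strand ORFs, 1-based, stop included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                j = i
                # Walk codon by codon until a stop codon or the end of the sequence
                while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                    j += 3
                found_stop = j + 3 <= len(seq)
                if found_stop and (j - i) // 3 >= min_codons:
                    orfs.append((i + 1, j + 3, frame + 1))
                    i = j  # continue scanning after this ORF
            i += 3
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))
```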
SEQUENCE ALIGNMENT: OUTCOME?
Identity and similarity simply mean that two sequences are identical or similar by some criterion (for example, physicochemical properties). They do not refer to any historical process, just to a comparison of the sequences by some method.
Homology means that two (or more) sequences have a common ancestor. This is a statement about evolutionary history.
However, in bioinformatics these terms are often confused and used interchangeably, probably because significant similarity is such a strong argument for homology.
Seq1 1 LAPSTKDFGKISLSREFKVA 20
       L+PS K+FG I++ R F +A
Seq2 1 LSPSEKEFGAIAMRRTFIIA 20
Identity = 10/20 = 50%
Similarity = 15/20 = 75%
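The two percentages can be checked with a small Python sketch. The set of "similar" residue pairs below is an assumption chosen to match the "+" marks in this example (A/S, D/E, L/M, I/V); real tools use a scoring matrix or a fuller grouping of amino acid properties, so the similarity value depends on that choice.

```python
# Assumed similar residue pairs, matching the '+' marks in the example only
SIMILAR_PAIRS = {frozenset(pair) for pair in ("AS", "DE", "LM", "IV")}


def identity_and_similarity(a, b):
    """Return (identity, similarity) as fractions for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(x == y for x, y in zip(a, b))
    similar = sum(
        x != y and frozenset((x, y)) in SIMILAR_PAIRS for x, y in zip(a, b)
    )
    n = len(a)
    # Similarity counts identical positions plus conservative substitutions
    return identical / n, (identical + similar) / n


seq1 = "LAPSTKDFGKISLSREFKVA"
seq2 = "LSPSEKEFGAIAMRRTFIIA"
print(identity_and_similarity(seq1, seq2))  # (0.5, 0.75)
```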
AA properties
Global alignment
An alignment that assumes that the two proteins are basically similar over their entire length.
The alignment attempts to match them to each other from end to end, even though parts of the alignment are not very convincing.
LGPSTKDFGKISESREFDNF
|      ||||    |   |
LNQLERSFGKINMRLEDAFF
Local alignment
An alignment that searches for segments of the two sequences that match well.
There is no attempt to force entire sequences into an alignment, just those parts that appear to have good similarity, according to some criterion. Using the same sequences as above, one could get:
----------FGKI-----------
          ||||
----------FGKI-----------
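A local alignment of this kind can be sketched with the Smith-Waterman dynamic-programming method. The scoring used here (match +1, mismatch -1, gap -1) is an assumption for illustration; BioEdit and other tools use substitution matrices and different gap penalties, so their results may differ.

```python
def local_align(a, b, match=1, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores never drop below zero in a local alignment
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


result = local_align("LGPSTKDFGKISESREFDNF", "LNQLERSFGKINMRLEDAFF")
print(result)
```

With this simple scoring, only the best-matching segment is returned and the poorly matching flanks are left out of the alignment.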
Align two sequences (optimal GLOBAL alignment) = Global alignment
Align two sequences (allow ends to slide) = Local alignment
Pairwise alignment / Identity matrix calculation
01 CGTATGTCATGCATGCACGC
02 CGATTGCCATCAATGCACGC
03 CCAATGTCATGCATGCACCT
04 CGTTTGTCATGAATGAAGGC
05 CGTATGCCGTCCATGAAGTT
06 CGTAAATCATGAATGCAGGC
07 CATAAGTCATCCATGCAGAT
08 CGTAAGTCATGAATGCACCT
09 CGAAAGTCATCCATGAACGC
10 CGTAAGCAATGAATGCACGT
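For the ten aligned sequences above, a pairwise identity matrix can be sketched in Python as follows. The sketch assumes the sequences are already aligned and of equal length, and defines identity as the fraction of identical positions; the function names are hypothetical and other tools may treat gaps or ambiguous bases differently.

```python
SEQUENCES = [
    "CGTATGTCATGCATGCACGC",
    "CGATTGCCATCAATGCACGC",
    "CCAATGTCATGCATGCACCT",
    "CGTTTGTCATGAATGAAGGC",
    "CGTATGCCGTCCATGAAGTT",
    "CGTAAATCATGAATGCAGGC",
    "CATAAGTCATCCATGCAGAT",
    "CGTAAGTCATGAATGCACCT",
    "CGAAAGTCATCCATGAACGC",
    "CGTAAGCAATGAATGCACGT",
]


def identity(a, b):
    """Fraction of identical positions between two aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def identity_matrix(seqs):
    """Square matrix of pairwise identities (row i, column j)."""
    return [[identity(a, b) for b in seqs] for a in seqs]


matrix = identity_matrix(SEQUENCES)
for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```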
Consensus sequence
01 CGTANGTCATNNATGCANNN
02 CGTAWGTCATSMATGCASNY
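A consensus of the ten aligned sequences from the pairwise alignment example can be sketched in Python. The rule used here is an assumption: a base is reported when it occurs in more than 60% of the sequences at that column; otherwise the column becomes N, or, with `use_iupac=True`, the IUPAC ambiguity code for the bases present. BioEdit's own consensus settings are configurable and may not match this rule.

```python
from collections import Counter

SEQUENCES = [
    "CGTATGTCATGCATGCACGC", "CGATTGCCATCAATGCACGC", "CCAATGTCATGCATGCACCT",
    "CGTTTGTCATGAATGAAGGC", "CGTATGCCGTCCATGAAGTT", "CGTAAATCATGAATGCAGGC",
    "CATAAGTCATCCATGCAGAT", "CGTAAGTCATGAATGCACCT", "CGAAAGTCATCCATGAACGC",
    "CGTAAGCAATGAATGCACGT",
]

# IUPAC ambiguity codes, keyed by the sorted set of bases they stand for
IUPAC = {
    "AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K",
    "ACG": "V", "ACT": "H", "AGT": "D", "CGT": "B", "ACGT": "N",
}


def consensus(seqs, threshold=0.6, use_iupac=False):
    """Column-wise consensus of aligned sequences (assumed majority rule)."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        if count / len(column) > threshold:
            out.append(base)  # clear majority base
        elif use_iupac:
            out.append(IUPAC.get("".join(sorted(set(column))), "N"))
        else:
            out.append("N")
    return "".join(out)


print(consensus(SEQUENCES))                  # CGTANGTCATNNATGCANNN
print(consensus(SEQUENCES, use_iupac=True))  # CGTAWGTCATSMATGCASNY
```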
File > Graphic view
Set the parameters as needed
File > Export as Rich text format
Open MS Word
Open the exported Rich text format file
HOW TO USE ACCESSORY APPLICATION TOOLS IN BIOEDIT
ClustalW Multiple Sequence Alignment
CAP Contig assembly
BLAST (Local BLAST)
Phylogenetic tree
Local BLAST
Accessory Application > BLAST menu
Create local nucleotide database
Local BLAST
MatGAT
A simple, easy-to-use similarity/identity matrix generator.
It calculates the similarity and identity between every pair of DNA or protein sequences in a given data set.
James J Campanella et al. MatGAT: An application that generates similarity/identity matrices using protein or DNA sequences, BMC Bioinformatics
BankIT
GENBANK BankIt
http://www.ncbi.nlm.nih.gov/BankIt/
GENBANK Sequin
http://www.ncbi.nlm.nih.gov/Sequin/
EMBL Webin
http://www.ebi.ac.uk/submission/webin.html
DDBJ Sakura
http://sakura.ddbj.nig.ac.jp/
Log in
GenBank >> Simple DNA or RNA sequences
Sequence(s) and Definition Line(s)
Click “Continue”
Review modifiers
Warning about Strain/isolate
Add source modifier
Review modifiers
Add gene name
Review features
Ekawat Pasomsub
Virology Unit, Department of Pathology,
Ramathibodi Hospital, Mahidol University