Basic sequence analysis BioEdit Matgat Sequin

7/13/2018

1

Ekawat Pasomsub

Unit of Virology and Molecular Microbiology,

Department of Pathology, Ramathibodi Hospital,

E-mail address: ekawat.pas@mahidol.ac.th

Written by a graduate student who knows how frustrating and time-consuming …

A mouse-driven, easy-to-use sequence alignment editor

A single program that can handle most simple sequence and alignment editing and manipulation functions

Basic sequence analyses

System requirements: PC-compatible, i486 or later (Windows 95 or later)

At least 32 MB of RAM

About 30 to 40 MB of free disk space (the BioEdit installation itself takes about 15 MB)

How to download and install “BioEdit”

Data and sequence management

How to use BioEdit for DNA analysis

How to use accessory application tools in BioEdit

Text editors (save files with the .txt extension)

Notepad (MS Windows)

WordPad (MS Windows)

Notepad++ (free to install)

Fonts: Courier / Courier New

Encoding: Unicode (UTF-8)

Notepad++: THAI / User-defined / Unicode

Browsers: Firefox, IE

Edit Menu

Sequence editing modes

Edit

Select and slide

Grab and drag

Cut/copy/paste of the sequences

Copy sequence to clipboard

Search sequence

Select to end and Select to Beginning

Sequence-title management

File menu: Graphic View

Sequence editing modes

Edit

Select and slide

Grab and drag

• For sequence(s)

• Cut Ctrl+F7

• Copy Ctrl+F8

• Paste Ctrl+F9

Copy sequence to clipboard

1. Select the sequence

2. Edit >> Copy Sequences to clipboard (Fasta Format)

3. Open Notepad++

4. Paste the sequences
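
The clipboard text is ordinary FASTA: a title line starting with “>” followed by the sequence. A minimal Python sketch that builds the same kind of text from (title, sequence) pairs; the function name and the 60-residue line width are assumptions for illustration, not BioEdit settings.

```python
def to_fasta(records, width=60):
    """Format (title, sequence) pairs as FASTA text.

    `width` (assumed 60 here) is how many residues go on each sequence line.
    """
    lines = []
    for title, sequence in records:
        lines.append(">" + title)
        # Wrap the sequence into fixed-width lines.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


records = [("01", "CGTATGTCATGCATGCACGC"), ("02", "CGATTGCCATCAATGCACGC")]
print(to_fasta(records), end="")
```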

• Search sequence

1. Select the sequence to search

2. Edit >> Search (Ctrl+F)

3. Paste the query sequence

4. Click on “Find Next”
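
Behind the Find Next button is a plain substring search. The Python sketch below (hypothetical helper `find_all`) lists every exact, case-insensitive hit at once, as if Find Next were pressed until no more matches turn up; it assumes gaps and ambiguity codes are matched as ordinary letters, which may differ from BioEdit's search options.

```python
def find_all(sequence, query):
    """Return the 1-based start of every exact, case-insensitive match,
    including overlapping ones (a "Find Next" loop run to the end)."""
    if not query:
        return []
    sequence, query = sequence.upper(), query.upper()
    positions = []
    hit = sequence.find(query)
    while hit != -1:
        positions.append(hit + 1)  # convert 0-based index to 1-based position
        hit = sequence.find(query, hit + 1)  # +1 allows overlapping matches
    return positions


print(find_all("CGTATGTCATGCATGCACGC", "atgc"))  # [9, 13]
```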

Found / Not found

1. Open the alignment

2. Select the position

3. Edit >> Select to beginning

4. Delete the selected region
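
Across a whole alignment, “Select to beginning” plus Delete amounts to cutting every row at the same column, and “Select to end” is the mirror image. A sketch in Python, assuming 1-based column numbers and that the clicked column itself is kept (the slide does not say which); `keep_from` and `keep_until` are hypothetical names.

```python
def keep_from(alignment, column):
    """'Select to beginning' + Delete: drop every column before `column`
    (1-based) in all rows. Assumption: the clicked column is kept."""
    if column < 1:
        raise ValueError("column numbers start at 1")
    return {name: seq[column - 1:] for name, seq in alignment.items()}


def keep_until(alignment, column):
    """'Select to end' + Delete: drop every column after `column` (1-based)."""
    if column < 1:
        raise ValueError("column numbers start at 1")
    return {name: seq[:column] for name, seq in alignment.items()}


alignment = {"01": "CGTATGTCATGCATGCACGC", "02": "CGATTGCCATCAATGCACGC"}
print(keep_from(alignment, 5))
print(keep_until(alignment, 10))
```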

• Select to end and Select to Beginning

• Sequence-title management

1. Edit >> Copy sequence titles

2. Open Notepad++

3. Paste the sequence titles

4. Edit the sequence titles

5. Copy the edited sequence titles from Notepad++

6. Paste them back using Edit >> Paste Over Titles
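
The title round trip (copy titles, edit them in Notepad++, paste over) only works if the edited list has exactly one title per sequence, in the same order. A Python sketch of that rule applied to FASTA text; `paste_over_titles` is a hypothetical name that mirrors the menu item, not a BioEdit API.

```python
def paste_over_titles(fasta_text, new_titles):
    """Replace FASTA titles in order, leaving the sequence lines untouched.

    The number of new titles must equal the number of sequences;
    otherwise a ValueError is raised and nothing is changed.
    """
    lines = fasta_text.splitlines()
    header_rows = [i for i, line in enumerate(lines) if line.startswith(">")]
    if len(header_rows) != len(new_titles):
        raise ValueError(
            f"{len(header_rows)} sequences but {len(new_titles)} titles"
        )
    for row, title in zip(header_rows, new_titles):
        lines[row] = ">" + title.strip()
    return "\n".join(lines) + "\n"


fasta = ">seq_a\nCGTATGTCAT\n>seq_b\nCGATTGCCAT\n"
print(paste_over_titles(fasta, ["HIV_01", "HIV_02"]), end="")
```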

BIOEDIT FOR DNA ANALYSIS (SEQUENCE MENU)

DNA complement/Reverse complement

DNA Translation

Restriction enzyme analysis

Open reading frame finder

Pairwise alignment / Identity matrix calculation

Consensus sequence

ABI format sequence analysis

CAP (Contig Assembly Program)

1. Sequence

2. Nucleic Acid

3. Complement

DNA complement/Reverse complement
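
The complement pairs each base with its partner (A with T, G with C); the reverse complement also reverses the order so the result reads 5'->3' on the opposite strand. A short Python sketch, assuming upper-case output and IUPAC ambiguity codes being complemented as well (R becomes Y, and so on):

```python
# IUPAC nucleotide complements; ambiguity codes map to their complementary
# code (e.g. R = A/G pairs with Y = C/T). Gaps stay gaps.
_COMPLEMENT = str.maketrans("ACGTUMRWSYKVHDBN-", "TGCAAKYWSRMBDHVN-")


def complement(seq):
    """Base-by-base complement in the same direction (upper case; U pairs with A)."""
    return seq.upper().translate(_COMPLEMENT)


def reverse_complement(seq):
    """Complement read 5'->3' on the opposite strand."""
    return complement(seq)[::-1]


print(complement("ATGCATGC"))          # TACGTACG
print(reverse_complement("ATGCATGC"))  # GCATGCAT
```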

DNA Translation

http://incep.imagine-ex.co/dna-transcription-and-translation/

Translation is the final step on the way from DNA to protein: the synthesis of proteins directed by an mRNA template.

Translation of a single sequence

Translation of a set of sequences

1. Select a set of sequences

2. Sequence menu

3. Translate or Reverse-Translate
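
Translation of one reading frame can be sketched in Python with a 64-entry codon table. This assumes the standard genetic code (NCBI table 1) and the forward strand only; codons containing gaps or ambiguity codes are shown as X here, which may not match how BioEdit handles them.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def translate(seq, frame=1):
    """Translate one reading frame (1, 2 or 3) of the forward strand.

    '*' marks a stop codon; codons with ambiguity codes or gaps become 'X'.
    A trailing partial codon is ignored.
    """
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    dna = seq.upper().replace("U", "T")
    protein = []
    for i in range(frame - 1, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCTAA"))           # MA*
print(translate("ATGGCCTAA", frame=2))  # WP
```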

A restriction enzyme, or restriction endonuclease, is an enzyme that cleaves DNA into fragments at or near specific recognition sites within the molecule known as restriction sites.

Restriction enzyme analysis

1. Sequence

2. Nucleic Acid

3. Restriction map
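
A restriction map is essentially a search for each enzyme's recognition site. The Python sketch below uses a three-enzyme list chosen for illustration (EcoRI, BamHI, HindIII, each cutting after the first base of its site) and, because those sites are palindromic, scans only the top strand; it also derives fragment sizes for a complete digest of linear DNA. The helper names are hypothetical.

```python
# Illustrative enzyme list: name -> (recognition site, cut offset within the
# site on the top strand). All three sites are palindromic, so scanning the
# top strand also covers the bottom strand.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def restriction_map(seq, enzymes=ENZYMES):
    """Return {enzyme: [1-based start of each site]} for enzymes that cut."""
    seq = seq.upper()
    hits = {}
    for name, (site, _offset) in enzymes.items():
        starts = []
        i = seq.find(site)
        while i != -1:
            starts.append(i + 1)
            i = seq.find(site, i + 1)
        if starts:
            hits[name] = starts
    return hits


def fragment_lengths(seq, enzymes=ENZYMES):
    """Fragment sizes of a complete digest of linear DNA with all listed enzymes."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        i = seq.find(site)
        while i != -1:
            cuts.add(i + offset)  # cut falls just before seq[i + offset]
            i = seq.find(site, i + 1)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


dna = "AAGAATTCGGATCCAA"
print(restriction_map(dna))   # {'EcoRI': [3], 'BamHI': [9]}
print(fragment_lengths(dna))  # [3, 6, 7]
```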

An open reading frame (ORF) is the part of a reading frame that has the ability to be translated, typically running from a start codon (usually ATG) to the next in-frame stop codon.

Open reading frame finder

1. Open the file “HIV_ORF.fasta”

2. Select the sequence

3. Sequence >> Nucleic Acid >> Find next ORF
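
One simple ORF definition is ATG followed by codons up to the first in-frame stop (TAA, TAG or TGA). A Python sketch under that definition, scanning the three forward frames only; `find_orfs` and `min_codons` are hypothetical names, and BioEdit's interactive “Find next ORF” may apply different rules (for example alternative start codons or the reverse strand).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """List ORFs on the forward strand as (frame, start, end), 1-based and
    inclusive of the stop codon. An ORF runs from ATG to the first in-frame
    stop; `min_codons` counts the codons before the stop."""
    dna = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame + 1, start + 1, i + 3))
                start = None  # look for the next ATG after this stop
    return orfs


print(find_orfs("CCATGAAATGACC"))  # [(3, 3, 11)]
```

ORFs that never reach a stop codon before the end of the sequence are not reported in this sketch.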

SEQUENCE ALIGNMENT: OUTCOME?

Identity and similarity simply mean that two sequences are identical or similar by some criterion (e.g. physicochemical properties). They do not refer to any historical process, just to a comparison of the sequences by some method.

Homology means that two (or more) sequences have a common ancestor. This is a statement about evolutionary history.

However, in bioinformatics these two terms are often confused and used interchangeably, probably because significant similarity is such a strong argument for homology.

Seq1 1 LAPSTKDFGKISLSREFKVA 20

L+PS K+FG I++ R F +A

Seq2 1 LSPSEKEFGAIAMRRTFIIA 20

Identity = 10/20 = 50%

Similarity = 15/20 = 75%
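
The two percentages above can be recomputed from the aligned pair. Identity just counts identical positions; similarity also needs a rule for which different residues count as similar. The Python sketch below uses a hypothetical set of residue groups as a simplified stand-in for the BLOSUM62-based “+” marks; with these groups the example gives 10/20 identical and 15/20 similar.

```python
# Simplified "similar residue" groups: a hypothetical stand-in for the
# positive BLOSUM62 scores that BLAST uses to place "+" in its match line.
SIMILAR_GROUPS = ("AS", "ST", "DE", "NQ", "KR", "ILMV", "FWY")


def is_similar(a, b):
    return a == b or any(a in group and b in group for group in SIMILAR_GROUPS)


def identity_similarity(seq1, seq2):
    """Fractions of identical and of similar positions in two aligned sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and of equal, non-zero length")
    pairs = list(zip(seq1.upper(), seq2.upper()))
    identical = sum(a == b and a != "-" for a, b in pairs)
    similar = sum(a != "-" and b != "-" and is_similar(a, b) for a, b in pairs)
    return identical / len(pairs), similar / len(pairs)


identity, similarity = identity_similarity(
    "LAPSTKDFGKISLSREFKVA", "LSPSEKEFGAIAMRRTFIIA"
)
print(f"Identity = {identity:.0%}, Similarity = {similarity:.0%}")
# Identity = 50%, Similarity = 75%
```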

AA properties

Global alignment: an alignment that assumes the two proteins are basically similar over their entire length.

The alignment attempts to match them to each other from end to end, even though parts of the alignment are not very convincing.

LGPSTKDFGKISESREFDNF

| |||| |

LNQLERSFGKINMRLEDAFF

Local alignment: an alignment that searches for segments of the two sequences that match well.

There is no attempt to force entire sequences into an alignment, just those parts that appear to have good similarity, according to some criterion. Using the same sequences as above, one could get:

----------FGKI-----------

||||

----------FGKI-----------

Align two sequences (optimal GLOBAL alignment) = global alignment

Align two sequences (allow ends to slide) = local alignment
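
The difference between the two options shows up in the dynamic-programming recurrence. A score-only Python sketch of the textbook algorithms, Needleman-Wunsch (global) and Smith-Waterman (local), with arbitrary scores of +1 for a match and -1 for a mismatch or gap position; these are not BioEdit's scoring defaults, and BioEdit's actual implementation of either menu option may differ.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Best alignment score of `a` vs `b` with a linear gap penalty.

    local=False: Needleman-Wunsch (global, end to end).
    local=True:  Smith-Waterman (local, best-matching segment only).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalised, so the first row/column accumulate gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local: a segment may start anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


seq1 = "LGPSTKDFGKISESREFDNF"
seq2 = "LNQLERSFGKINMRLEDAFF"
print("global score:", alignment_score(seq1, seq2))
print("local score: ", alignment_score(seq1, seq2, local=True))
```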

Pairwise alignment / Identity matrix calculation

01 CGTATGTCATGCATGCACGC

02 CGATTGCCATCAATGCACGC

03 CCAATGTCATGCATGCACCT

04 CGTTTGTCATGAATGAAGGC

05 CGTATGCCGTCCATGAAGTT

06 CGTAAATCATGAATGCAGGC

07 CATAAGTCATCCATGCAGAT

08 CGTAAGTCATGAATGCACCT

09 CGAAAGTCATCCATGAACGC

10 CGTAAGCAATGAATGCACGT
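
For pre-aligned sequences like these, an identity matrix is the fraction of matching columns for every pair. A Python sketch on sequences 01 to 04; how gaps should count is an assumption here (gap-gap columns skipped, gap-vs-base counted as a mismatch), since the slide does not say how BioEdit treats them.

```python
def percent_identity(a, b):
    """Fraction of identical columns in two pre-aligned sequences.

    Assumption: columns that are gaps in both sequences are skipped;
    a gap against a residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    return sum(x == y for x, y in columns) / len(columns)


def identity_matrix(seqs):
    """Return (names, rows) where rows[i][j] is the identity of names[i] vs names[j]."""
    names = list(seqs)
    rows = [[percent_identity(seqs[r], seqs[c]) for c in names] for r in names]
    return names, rows


# Sequences 01-04 from the list above (a subset, to keep the printout small).
SEQS = {
    "01": "CGTATGTCATGCATGCACGC",
    "02": "CGATTGCCATCAATGCACGC",
    "03": "CCAATGTCATGCATGCACCT",
    "04": "CGTTTGTCATGAATGAAGGC",
}

names, rows = identity_matrix(SEQS)
print("      " + "  ".join(f"{n:>5}" for n in names))
for name, row in zip(names, rows):
    print(f"{name:>4}  " + "  ".join(f"{v:5.2f}" for v in row))
```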

01

02

CGTANGTCATNNATGCANNN

CGTAWGTCATSMATGCASNY
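
The slide does not say which rule produced these consensus lines. A 70% majority threshold, with N wherever no base reaches it, reproduces line 01 from the ten sequences; writing IUPAC ambiguity codes instead of N in those columns gives the style of line 02. A Python sketch under those assumptions (the threshold, tie handling and function name are illustrative, not BioEdit's documented settings):

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(aligned, threshold=0.5, ambiguity=False):
    """Column-by-column consensus of equal-length aligned sequences.

    A column gets its most common base if that base reaches `threshold`
    (a fraction of all sequences; ties go to the base seen first). Otherwise
    it gets 'N', or, with ambiguity=True, the IUPAC code for the bases present.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same aligned length")
    out = []
    for column in zip(*aligned):
        counts = Counter(base.upper() for base in column)
        base, count = counts.most_common(1)[0]
        if count / len(column) >= threshold:
            out.append(base)
        elif ambiguity:
            out.append(IUPAC.get(frozenset(counts), "N"))
        else:
            out.append("N")
    return "".join(out)


SEQS = [
    "CGTATGTCATGCATGCACGC", "CGATTGCCATCAATGCACGC", "CCAATGTCATGCATGCACCT",
    "CGTTTGTCATGAATGAAGGC", "CGTATGCCGTCCATGAAGTT", "CGTAAATCATGAATGCAGGC",
    "CATAAGTCATCCATGCAGAT", "CGTAAGTCATGAATGCACCT", "CGAAAGTCATCCATGAACGC",
    "CGTAAGCAATGAATGCACGT",
]
print(consensus(SEQS, threshold=0.7))                  # CGTANGTCATNNATGCANNN
print(consensus(SEQS, threshold=0.7, ambiguity=True))  # CGTAWGTCATSMATGCASNY
```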

File > Graphic view

Set parameters as needed

File > Export as Rich Text Format

Open MS Word

Open the exported Rich Text Format file

HOW TO USE ACCESSORY APPLICATION TOOLS IN BIOEDIT

ClustalW Multiple Sequence Alignment

CAP Contig assembly

BLAST (Local BLAST)

Phylogenetic tree

Local BLAST

Accessory Application > BLAST menu > Create local nucleotide database

Local BLAST

MatGAT: a simple, easy-to-use similarity/identity matrix generator.

It calculates the similarity and identity between every pair of DNA or protein sequences in a given data set.

James J Campanella et al. MatGAT: an application that generates similarity/identity matrices using protein or DNA sequences. BMC Bioinformatics.
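
Because MatGAT starts from unaligned sequences, each pair has to be aligned before identity can be counted. A rough Python sketch of that pipeline, not MatGAT's actual code: a plain Needleman-Wunsch global alignment with toy scores (+1 match, -1 mismatch, -2 per gap) followed by identity as identical columns divided by alignment length. The scores, the identity definition and the function names are assumptions; similarity would additionally need a substitution matrix such as BLOSUM62 and is left out.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch with traceback; returns the two gapped strings."""
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + sub(i, j),
                          S[i - 1][j] + gap,
                          S[i][j - 1] + gap)
    # Walk back from the bottom-right corner along one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def pairwise_identity_table(seqs):
    """Align every pair globally, then report identical columns / alignment length."""
    names = list(seqs)
    table = {}
    for x in names:
        for y in names:
            aligned_x, aligned_y = global_align(seqs[x], seqs[y])
            same = sum(p == q and p != "-" for p, q in zip(aligned_x, aligned_y))
            table[x, y] = same / len(aligned_x) if aligned_x else 1.0
    return table


# Hypothetical unaligned input sequences.
seqs = {"s1": "ACGTACGT", "s2": "ACGACGT", "s3": "TTGTACGA"}
table = pairwise_identity_table(seqs)
for x in seqs:
    print(x, " ".join(f"{table[x, y]:6.1%}" for y in seqs))
```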

BankIt

GENBANK BankIt

http://www.ncbi.nlm.nih.gov/BankIt/

GENBANK Sequin

http://www.ncbi.nlm.nih.gov/Sequin/

EMBL Webin

http://www.ebi.ac.uk/submission/webin.html

DDBJ Sakura

http://sakura.ddbj.nig.ac.jp/

Log in

GenBank >> Simple DNA or RNA sequences

Sequence(s) and Definition Line(s)

Click “Continue”

Review modifiers

Warning about strain/isolate

Add source modifier
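
For batches of sequences, source modifiers can also be written into the FASTA definition lines as bracketed [modifier=value] pairs, which BankIt can read on upload. A Python sketch that builds such lines and warns when neither strain nor isolate is given, echoing the BankIt warning above; the function name, sequence ID and isolate value are hypothetical, and accepted modifier names should be checked against NCBI's current BankIt documentation.

```python
import warnings


def bankit_defline(seq_id, modifiers):
    """Build a FASTA definition line with bracketed source modifiers,
    e.g. '>seq1 [organism=...] [isolate=...]'."""
    if not seq_id or any(ch.isspace() for ch in seq_id):
        raise ValueError("sequence ID must be non-empty and contain no spaces")
    if not {"strain", "isolate"} & set(modifiers):
        # BankIt warns about a missing strain/isolate; flag it before upload.
        warnings.warn(f"{seq_id}: no strain or isolate modifier")
    parts = [">" + seq_id]
    for name, value in modifiers.items():
        if "[" in value or "]" in value:
            raise ValueError(f"brackets are not allowed in the {name} value")
        parts.append(f"[{name}={value}]")
    return " ".join(parts)


print(bankit_defline(
    "HIV_TH01",  # hypothetical sequence ID
    {"organism": "Human immunodeficiency virus 1", "isolate": "TH01"},
))
```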

Review modifiers

Add gene name

Review features

Ekawat Pasomsub

Virology Unit, Department of Pathology,

Ramathibodi Hospital, Mahidol University

ekawat.pas@mahidol.ac.th