Download - Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

Transcript
Page 1: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

1

Part I – Sequence analysis (DNA) : Bioinformatics Software

Chen XinNational University of Singapore

Bioinformatics software

Its role in research:

n Hypothesis-driven research cycle in biology (From Kitano H. Systems biology: a brief overview. Science 2002, 295:1662-4)

Page 2: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

2

Bioinformatics software

n Cyclical refinement of predictive computer models used to define further biological experiments, including the optimization step.

n From Brusic et al. 2001, Efficient discovery of immune response targets by cyclical refinement of QSAR models of peptide binding. J. Mol. Graph. Model. 19:405-11, 467

Bioinformatics software

n By combining computational methods with experimental biology, major discoveries can be made faster and more efficiently.

n Today, every large molecular or systems biology project has a bioinformatics component.

n Use of biological software allows biologists to extend their set of skills for more efficient and more effective analysis of their data, and for planning of experiments.

Page 3: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

3

Genetic information

n Genetic information carriern DNA or RNA

n Genetic information carriedn Sequence

n Hence:

Life = Life = f f (Sequence)(Sequence)

New drug discovery

A drug = Target identification -> Lead discovery ->Lead optimization -> animal trial -> clinical trail

Target:Key to disease developmentSpecific to disease developmentSequence, Sample protein, 3D structure …

Page 4: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

4

DNA sequence analysisTypes of analysis:n GC contentn Pattern analysisn Translation (Open Reading Frame detection)n Gene findingn Mutationn Primer designn Restriction map n ……

When you have a sequence

n Is it likely to be a gene?n What is the possible expression level?n What is the possible protein product?n Can we get the protein product?n Can we figure out the key residue in the

protein product?n ……

Page 5: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

5

GC content

n Stabilityn GC: 3 hydrogen bondsn AT: 2 hydrogen bonds

n Codon preferencen GC rich fragment

à Gene

GC Content

n CpG islandn Resistance to methylationn Associated with genes which are frequently

switched onn Estimate: ½ mammalian gene have CpG

islandn Most mammalian housekeeping genes

have CpG island at 5’ end

Page 6: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

6

GC content

n GC Content:n Emboss -> CompSeqn Emboss -> GEECEEn Bioedit

n CpG Island:n http://l25.itba.mi.cnr.it/genebin/wwwcpg.pl (Italy)n Emboss -> CpGReport

Pattern analysis

n Patterns in the sequencen Associated with certain biological

functionn Transcription factor bindingn Transcription startingn Transcription endingn Splicingn ……

Page 7: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

7

Gene finding

n A kind of pattern searchn Gene structure

n Promoter, Exon, Intronn Promoter: TATA box (TATAAT)n Exon: Open Reading Frame (ORF)n Intron: Only eukaryotes, have splicing signaln Other motifs

Gene

Picture from the LSM2104 Practical, V.B. LIT

Page 8: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

8

Gene finding

n Most of the programs focused on Open reading framen Emboss -> GetORFn Emboss -> ShowORF

n Other important elements:n Matrix binding site: Emboss -> MarScann Promoter region: PromoterInspectorn Splicing sites: GeneSplicer

Gene finding

n Prokaryotesn No intron n Long open reading framen High densityn Easy to detect

n Eukaryotesn Have intron n Combination of short open reading framesn Low densityn Hard to detect

Page 9: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

9

Problem 1:

n Is it a gene?n Not sure, but have some confidence

n What is the expression level if it is a gene?n Determined by the promoter and other

upper stream elements

Translation

n Six reading framesn Open reading frame (ORF)

n Start codonn Stop codonn Certain length

n Tools: ShowORF

Page 10: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

10

Conceptual translation

5’ AATGGCAATCCGCGTAGACTAGGCA 3’

3’ TTACCGTTAGGCGCATCTGTATCGT 5’

AATAATGGCGGCAATAATCCGCCGCGTCGTAGAAGACTACTAGGCGGCAAAAATGATGGCAGCAATCATCCGCCGCGTAGTAGACGACTAGTAGGCAGCAAAAATGGTGGCAACAATCCTCCGCGGCGTAGTAGACTACTAGGAGGCACA

TTTACTACCGTCGTTAGTAGGCGGCGCATCATCTGCTGTATTATCGTCGT

TTTTACCACCGTTGTTAGGAGGCGCCGCATCATCTGTTGTATCATCGTGTTTATTACCGCCGTTATTAGGCGGCGCAGCATCTTCTGTAGTATCGTCGTT

+1

-2-3

-1

+2+3

Six reading frames

AATAATGGCGGCAATAATCCGCCGCGTCGTAGAAGACTACTAGGCGGCAAN G N P R R L G N G N P R R L G

AAAATGGTGGCAACAATCCTCCGCGGCGTAGTAGACTACTAGGAGGCACAW Q S A * T RW Q S A * T R

TTTACTACCGTCGTTAGTAGGCGGCGCATCATCTGCTGTATTATCGTCGT

TTTTACCACCGTTGTTAGGAGGCGCCGCATCATCTGTTGTATCATCGTGTTTATTACCGCCGTTATTAGGCGGCGCAGCATCTTCTGTAGTATCGTCGTT

+1

-2-3

-1

+3

AAATGATGGCAGCAATCATCCGCCGCGTAGTAGACGACTAGTAGGCAGCAM A I R V D * AM A I R V D * A

+2

Page 11: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

11

Problem 2

n What is the possible product of this gene?n It is likely to be ….n This conceptual translation is in open

reading frame ……n Can we get the gene product?

n If expression level high: Directly separaten If expression level low: Clone it

Recom

binant DN

A

Page 12: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

12

Primer design

n Design primers only from accurate sequence data

n Restrict your search to regions that best reflect your goals

n Locate candidate primersn Verification of your choice

Primer design

n (primer 1) CTAGTACGATn ATGCCGTAGATC……TCCGATCATGCTAn TACGGCATCTAG……AGGCTAGTACGATn ATGCCGTAG (primer 2)

Page 13: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

13

Primer design

n Mispriming areas n Primer length: 18-30 (Usually)n Annealing Temperature (55 - 75 C)n GC content: 35% - 65% (usually)n Avoid regions of secondary structuren 100% complimentarity is not necessary n Avoid self-complimentarity

Primer Design

Online tools:n http://www.hgmp.mrc.ac.uk/GenomeWeb/nuc-

primer.htmln http://www-genome.wi.mit.edu/cgi-

bin/primer/primer3_www.cgin http://www.cybergene.se/primer.html

Software toolsn Omigan Vecter NTI

Page 14: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

14

Restriction map

n Restriction enzymen Recognize a patternn Recognition site V.S. Cutting site

n Select restriction enzyme to get a fragment of sequence

n Rebuild the sequence to create or invalidate a restriction site

n Tools: Omiga, remap, bioedit

Page 15: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

15

Mutation

n Can be generated by PCRn Primers that not perfectly match

n Frame shift mutationn Insertionn Deletion

n Substitutionn Normaln Silent

Page 16: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

16

Mutation

n Test the importancen Mutate suspected important place

n Create a patternn Often silent mutation

n Invalidate a patternn Often silent mutation

n Keep a reading frame

Problem 3

n Can we get the protein product?n Clone it and use a bacteria to express it

n Can we figure out the key residue in the protein product?n Guess the important residuen Mutate the residue to see whether the

activity loses

Page 17: Part I – Sequence analysis (DNA)bio-mirror.im.ac.cn/mirrors/s-star/downloads/tutorial/t2a.pdf · Part I – Sequence analysis (DNA) : Bioinformatics Software Chen Xin National University

17

Summary

n Life is determined by nucleotide sequencesn Sequence analysis reveals patterns have

biological significancen Sequence analysis helps the design of wet-lab

experiments

n Next part will be on protein sequence analysis