Computational Problems in Molecular Biology
Computational Problems in Molecular Biology
Dong Xu
Computer Science Department
109 Engineering Building West
E-mail: [email protected]
573-882-7064
http://digbio.missouri.edu
Lecture Outline
From DNA to gene
Protein sequence and structure
Gene expression
Protein interaction and pathway
Provide a roadmap for the entire course
Biology from system level (computational perspective)
About Life
Life is wonderful: amazing mechanisms
Life is not perfect: errors and diseases
Life is a result of evolution
Cells
Basic unit of life
Prokaryotes/eukaryotes
Different types of cell: skin, brain, red/white blood
Different biological functions
Cells are produced by cells
Cell division (mitosis) yields 2 daughter cells
DNA
Double Helix (Watson & Crick)
Nitrogenous base pairs
Adenine, Thymine [A,T]
Cytosine, Guanine [C,G]
Weak bonds (can be broken)
Form long chains
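The pairing rule above (A with T, C with G) is enough to derive one strand of the double helix from the other. The following is a minimal sketch, not part of the original slides; the function names are illustrative.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand)

def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]

print(complement("AGCCAC"))          # TCGGTG
print(reverse_complement("AGCCAC"))  # GTGGCT
```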
Genome
Each cell contains a full genome (DNA)
The size varies:
Small for viruses and prokaryotes (10 kbp-20 Mbp)
Medium for lower eukaryotes
  Yeast, unicellular eukaryote: 13 Mbp
  Worm (Caenorhabditis elegans): 100 Mbp
  Fly, invertebrate (Drosophila melanogaster): 170 Mbp
Larger for higher eukaryotes
  Mouse and man: 3000 Mbp
Very variable for plants (many are polyploid)
  Mouse-ear cress (Arabidopsis thaliana): 120 Mbp
  Lilies: 60,000 Mbp
Differences in DNA
[Figure: pairwise DNA differences of ~2%, ~4%, and ~0.2%]
Genes
Chunks of DNA sequence that are transcribed or translated into functional biomolecules (protein, RNA)
About 2% of the human DNA sequence codes for genes
32,000 human genes; 100,000 genes in tulips
Gene Structure
General structure of a eukaryotic gene
Unlike eukaryotic genes, a prokaryotic gene typically consists of only one contiguous coding region
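Because a prokaryotic gene is one contiguous coding region, a first approximation to finding it is to scan for open reading frames: a start codon followed in frame by a stop codon. The sketch below is a naive illustration rather than a real gene finder; it scans only the three forward frames, and `find_orfs` and `min_codons` are hypothetical names.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) slices of ATG...stop stretches in the 3 forward frames.

    `end` is exclusive and includes the stop codon. `min_codons` counts the
    codons from ATG up to, but not including, the stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return orfs

dna = "CCATGAAATTTTAGGG"  # toy sequence, not real data
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])  # 2 14 ATGAAATTTTAG
```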
Informational Classes in Genomic DNA
Transcribed sequences (exons and introns)
Messenger sequences (mRNA, exons only)
Coding sequences (CDS, part of the exons only)
Heads and tails: untranslated parts (UTR)
Regulatory sequences
... and all the rest
Identify them: gene-finding
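The nesting of these classes (transcript, then messenger sequence, then coding sequence flanked by UTRs) can be made concrete with a toy example. The sequence and coordinates below are invented for illustration, and the DNA alphabet is kept throughout for simplicity (a real mRNA would use U instead of T).

```python
# Toy transcribed region: exon 1, one intron (lowercase), exon 2.
EXON1, INTRON, EXON2 = "GGATGGCC", "gtaagtttcag", "AAATAATT"
transcript = EXON1 + INTRON + EXON2

# Exon coordinates as half-open (start, end) slices into the transcript.
exons = [(0, len(EXON1)), (len(EXON1) + len(INTRON), len(transcript))]

def splice_exons(sequence: str, exon_coords: list[tuple[int, int]]) -> str:
    """Join the exons in order, dropping the introns (messenger sequence)."""
    return "".join(sequence[start:end] for start, end in exon_coords)

messenger = splice_exons(transcript, exons)

# Assumed CDS bounds inside the messenger sequence: the rest is UTR.
cds_start, cds_end = 2, 14
five_prime_utr = messenger[:cds_start]
cds = messenger[cds_start:cds_end]
three_prime_utr = messenger[cds_end:]

print(messenger)                             # GGATGGCCAAATAATT
print(five_prime_utr, cds, three_prime_utr)  # GG ATGGCCAAATAA TT
```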
Genetic Code
A=Ala=Alanine
C=Cys=Cysteine
D=Asp=Aspartic acid
E=Glu=Glutamic acid
F=Phe=Phenylalanine
G=Gly=Glycine
H=His=Histidine
I=Ile=Isoleucine
K=Lys=Lysine
L=Leu=Leucine
M=Met=Methionine
N=Asn=Asparagine
P=Pro=Proline
Q=Gln=Glutamine
R=Arg=Arginine
S=Ser=Serine
T=Thr=Threonine
V=Val=Valine
W=Trp=Tryptophan
Y=Tyr=Tyrosine
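The one-letter and three-letter codes listed above form a simple lookup table. A small sketch of converting between them follows; the helper name `to_three_letter` is illustrative.

```python
# One-letter to three-letter amino acid codes, as listed on the slide.
THREE_LETTER = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}

def to_three_letter(protein: str) -> str:
    """Spell out a one-letter protein sequence with three-letter codes."""
    return "-".join(THREE_LETTER[residue] for residue in protein)

print(to_three_letter("SHLDKL"))  # Ser-His-Leu-Asp-Lys-Leu
```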
Protein Synthesis
AGCCACTTAGACAAACTA (DNA)
Transcribed to:
AGCCACUUAGACAAACUA (mRNA)
Translated to:
SHLDKL (Protein)
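The example above can be reproduced in a few lines. This sketch assumes, as the slide does, that the DNA shown is the coding strand, so transcription amounts to replacing T with U. The codon table is the standard genetic code written in the usual compact UCAG order; the function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in UCAG order ("*" marks a stop).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_dna: str) -> str:
    """Coding-strand DNA to mRNA: same sequence with U in place of T."""
    return coding_dna.replace("T", "U")

def translate(mrna: str) -> str:
    """Read codons from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("AGCCACTTAGACAAACTA")
print(mrna)             # AGCCACUUAGACAAACUA
print(translate(mrna))  # SHLDKL
```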
About Protein
10s – 1000s of amino acids (average 300)
Lysozyme sequence (129 amino acids):
KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS
DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
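As a quick check of the stated length, the sequence above can be loaded as a string and summarized. This is only a sketch; the variable names are illustrative.

```python
from collections import Counter

# Lysozyme sequence as printed on the slide, in blocks of ten residues.
LYSOZYME = (
    "KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS "
    "TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS "
    "DGNGMNAWVA WRNRCKGTDV QAWIRGCRL"
).replace(" ", "")

composition = Counter(LYSOZYME)

print(len(LYSOZYME))               # 129
print(composition["C"])            # 8 cysteines
print(composition.most_common(3))  # the three most frequent residues
```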
Protein backbone and side chains
Evolution of Genes: Mutation
Genes alter (slightly) during reproduction
Caused by errors, radiation, or toxicity
3 possibilities: deletion, insertion, substitution (alteration)
Deletion: ACGTTGACTC → ACGTGACTC
Insertion: ACGTTGACTC → AGCGTTGACTC
Substitution: ACGTTGACTC → ACGATGACTC
Mutations are mostly deleterious
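The three mutation types can be written as simple string operations. The sketch below reproduces the slide's examples; the function names and the 0-based positions are illustrative choices.

```python
def delete(seq: str, pos: int) -> str:
    """Remove the base at 0-based position `pos`."""
    return seq[:pos] + seq[pos + 1:]

def insert(seq: str, pos: int, base: str) -> str:
    """Insert `base` so that it ends up at 0-based position `pos`."""
    return seq[:pos] + base + seq[pos:]

def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the base at 0-based position `pos` with `base`."""
    return seq[:pos] + base + seq[pos + 1:]

original = "ACGTTGACTC"
print(delete(original, 3))           # ACGTGACTC
print(insert(original, 1, "G"))      # AGCGTTGACTC
print(substitute(original, 3, "A"))  # ACGATGACTC
```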
Ancestor
Gene duplication
X Y
Recombination
75% X, 25% Y
Paralogs (related functions)
Mixed homology
Orthologs (similar function)
Evolution and Homology
Twilight zone: undetectable homology (<20% sequence identity)
Sequence Comparison
o Pairwise sequence comparison
o Multiple alignment
SAANLEYLKNVLLQFIFLKPG--SERERLLPVINTMLQLSPEEKGKLAAV O15045
NEKNMEYLKNVFVQFLKPESVP-AERDQLVIVLQRVLHLSPKEVEILKAA P34562
KNEKIAYIKNVLLGFLEHKE----QRNQLLPVISMLLQLDSTDEKRLVMS Q06704
REINFEYLKHVVLKFMSCRES---EAFHLIKAVSVLLNFSQEEENMLKET Q92805
MLIDKEYTRNILFQFLEQRD----RRPEIVNLLSILLDLSEEQKQKLLSV O42657
EPTEFEYLRKVMFEYMMGR-----ETKTMAKVITTVLKFPDDQAQKILER O70365
DPAEAEYLRNVLYRYMTNRESLGKESVTLARVIGTVARFDESQMKNVISS Q21071
STSEIDYLRNIFTQFLHSMGSPNAASKAILKAMGSVLKVPMAEMKIIDKK Q18013
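Sequence identity, the measure behind the twilight-zone threshold, can be read directly off two rows of an alignment. The sketch below uses one common convention, counting identical residues over the columns where neither row has a gap; other conventions (for example, dividing by the full alignment length) give different values. The function name is illustrative.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns with no gap in either row."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    columns = [(a, b) for a, b in zip(row_a, row_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

# First two rows of the multiple alignment above (O15045 and P34562).
o15045 = "SAANLEYLKNVLLQFIFLKPG--SERERLLPVINTMLQLSPEEKGKLAAV"
p34562 = "NEKNMEYLKNVFVQFLKPESVP-AERDQLVIVLQRVLHLSPKEVEILKAA"
print(f"{percent_identity(o15045, p34562):.1f}% identity")
```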
Phylogenetic Trees
Understand evolution
Protein Structure
Lysozyme structure in three representations: ball & stick, strand, surface
Structure Features of Folded Proteins
Compact
Secondary structures: loop, α-helix, β-sheet
Protein cores mostly consist of α-helices and β-sheets
Protein Structure Comparison
Structure is better conserved than sequence
A structure can tolerate a wide range of mutations.
Physical forces favor certain structures.
The number of folds is limited. Currently known: ~700. Estimated total: 1,000 to 10,000.
Example fold: TIM barrel
Protein Folding Problem
A protein folds into a unique 3D structure under physiological conditions
Lysozyme sequence:
KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS
DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
Structure-Function Relationship
Some level of function can be inferred without a structure, but a structure is key to understanding the detailed mechanism.
A predicted structure is a powerful tool for function inference.
Example: Trp repressor as a function switch
Structure-Based Drug Design
HIV protease inhibitor
Structure-based rational drug design is still a major method for drug discovery.
Gene Expression
All cells carry the same DNA, but only a few percent of the genes (house-keeping genes) are expressed in common across all cells.
A few examples:
(1) Specialized cells: over-represented hemoglobin in blood cells.
(2) Different stages of the life cycle: hemoglobins before and after birth, caterpillar and butterfly.
(3) Different environments: microbes in nutrient-poor or nutrient-rich environments.
(4) Special treatment: response to a wound.
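Eukaryote Gene Expression Control
[Figure: control points on the path from DNA to protein]
Nucleus: DNA → (transcriptional control) → primary RNA transcript → (RNA processing control) → mRNA
Nuclear membrane: mRNA in the nucleus → (RNA transport control) → mRNA in the cytosol
Cytosol: mRNA → (mRNA degradation control) → inactive mRNA
Cytosol: mRNA → (translation control) → protein → (protein activity control) → inactive protein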
Methods: mass spectrometry, microarray
Gene Regulation
[Figure: DNA sequence showing the promoter, the operator, and the start of transcription]
Microarray Experiments
Microarray data
Regulation/function/pathway/cellular state/phenotype
Disease: diagnosis/gene identification/sub-typing
Microarray chip
Genetic vs. Physical Interaction
Regulatory network
Genetic interaction
Complex system
Physical interaction
Gene/protein interaction
Expressed gene
Transcription factor
Biological Pathway
Studying Pathways through a Systems Biology Approach
sequence: RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC
structure
function
protein interaction
gene regulation
pathway (cross-talk)
Discussion
Possible impacts of biotechnology on our lives
Assignments
Required reading:
* Chapter 13 in Pavel Pevzner: Computational Molecular Biology - An Algorithmic Approach. MIT Press, 2000.
* Larry Hunter: Molecular Biology for Computer Scientists
Optional reading:
http://www.ncbi.nih.gov/About/primer/bioinformatics.html
http://www.bentham.org/cpps1-1/Dong%20Xu/xu_cpps.htm