Roadmap The topics: basic concepts of molecular biology more on Perl overview of the field ...

  • date post

    21-Dec-2015


Page 1: Roadmap

The topics:
  • basic concepts of molecular biology
  • more on Perl
  • overview of the field
  • biological databases and database searching
  • sequence alignments
  • phylogenetics
  • structure prediction
  • microarray data analysis

Page 2: Protein Synthesis

[Figure: protein synthesis. Source: the national health museum]

Page 3: Proteins

Page 4: Proteins

Proteins perform a vast array of biological functions, including:
  • Transport: hemoglobin (carries O2 from the lungs to the tissues)
  • Mechanical support: collagen
  • Storage: ferritin (stores iron)
  • Regulation: repressor proteins (gene expression)
  • Antibodies: immunoglobulin
  • Catalysis: SOD (superoxide dismutase)
  • …

Misfolding: mad cow disease, Alzheimer's disease, …

Page 5: Amino acid composition

Basic amino acid structure: the side chain, R, varies for each of the 20 amino acids.

[Figure: general amino acid structure, with a central carbon bonded to H, the amino group, the carboxyl group, and the side chain R]

Page 6: The Peptide Bond

  • Dehydration synthesis
  • Polypeptide with repeating backbone: N–C–C–N–C–C

Page 7: Side chain properties

What gives amino acids their different properties?
  • Carbon does not readily form hydrogen bonds with water: hydrophobic
  • O and N are generally more likely than C to hydrogen-bond to water: hydrophilic

The amino acids form three general groups:
  • Hydrophobic
  • Polar
  • Charged (positive/basic and negative/acidic)
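
The three-group idea can be sketched in a few lines of Python. The group membership below is an assumption made for illustration: textbooks disagree on borderline residues such as G, C, H, and Y, and the slides do not list the members of each group. The names `GROUPS`, `classify`, and `group_composition` are hypothetical.

```python
from collections import Counter

# Assumed grouping for illustration only; borderline residues
# (e.g. G, C, H, Y) are assigned differently by different textbooks.
GROUPS = {
    "hydrophobic": set("AVLIMFWPG"),
    "polar": set("STNQYC"),
    "charged": set("DEKRH"),
}


def classify(residue: str) -> str:
    """Return the assumed group name for a one-letter amino acid code."""
    for name, members in GROUPS.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def group_composition(sequence: str) -> Counter:
    """Count how many residues of a sequence fall into each group."""
    return Counter(classify(residue) for residue in sequence)


if __name__ == "__main__":
    # Any one-letter protein sequence works here.
    print(group_composition("AGVGTVPMTAYGNDIQYYGQVT"))
```

A composition count like this is only a rough first look at a sequence; it says nothing about where along the chain the hydrophobic residues sit.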

Page 8: The Hydrophobic Amino Acids

Proline severely limits allowable conformations!

Page 9: The Charged Amino Acids

Krane & Raymer

Page 10: The Polar Amino Acids

Krane & Raymer

Page 11: More Polar Amino Acids

Page 12: Peptidyl polymers

A few amino acids in a chain are called a polypeptide. A protein is usually composed of 50 to 400+ amino acids.

Page 13: Primary & Secondary Structure

Primary structure = the linear sequence of amino acids comprising a protein:

AGVGTVPMTAYGNDIQYYGQVT…

Secondary structure:
  • Regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every known protein structure: the α-helix and the β-sheet
  • The location and direction of these periodic, repeating structures is known as the secondary structure of the protein

Page 14: Levels of Protein Structure

  • Secondary structure elements combine to form tertiary structure
  • Quaternary structure occurs in multi-enzyme complexes
      • Many proteins are active only as homodimers, homotetramers, etc.

Page 15: Dihedral angles

Page 16: α Helix

  • Most abundant secondary structure
  • 3.6 amino acids per turn
  • Hydrogen bond formed between every fourth residue
  • Avg length: 10 amino acids, or 3 turns
  • Varies from 5 to 40 amino acids

Page 17: α Helix

  • Normally found on the surface of protein cores
  • Interacts with the aqueous environment
      • Inner-facing side has hydrophobic amino acids
      • Outer-facing side has hydrophilic amino acids
  • Every third amino acid tends to be hydrophobic
  • Pattern can be detected computationally
  • Rich in alanine (A), glutamic acid (E), leucine (L), and methionine (M)
  • Poor in proline (P), glycine (G), tyrosine (Y), and serine (S)
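
One way to detect such a pattern computationally is the hydrophobic moment. This is a technique brought in here for illustration, not one named on the slides. Each residue is placed on a helical wheel at 360 / 3.6 = 100 degrees per residue and weighted by its hydropathy. The sketch below assumes the Kyte-Doolittle hydropathy scale, and the function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(sequence: str, degrees_per_residue: float = 100.0) -> float:
    """Mean hydrophobic moment of a segment projected onto a helical wheel.

    Residue i sits at angle i * degrees_per_residue and contributes a
    vector whose length is its hydropathy. A large result means the
    hydrophobic residues cluster on one face of the helix.
    """
    x = y = 0.0
    for i, residue in enumerate(sequence.upper()):
        angle = math.radians(i * degrees_per_residue)
        x += KD[residue] * math.cos(angle)
        y += KD[residue] * math.sin(angle)
    return math.hypot(x, y) / len(sequence)


def idealized_amphipathic(length: int = 18) -> str:
    """Build a toy helix with leucine on one face and lysine on the other."""
    return "".join(
        "L" if math.cos(math.radians(100.0 * i)) > 0 else "K"
        for i in range(length)
    )


if __name__ == "__main__":
    print(hydrophobic_moment(idealized_amphipathic()))  # one-sided helix: large
    print(hydrophobic_moment("L" * 18))                 # uniform helix: near zero
```

A uniform 18-residue segment spans exactly five turns, so its contributions cancel; the toy amphipathic segment does not cancel, which is the signal a scan along a real sequence would look for.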

Page 18: β Sheet

Page 19: β Sheet

  • Hydrogen bonds between 5-10 consecutive amino acids in one portion of the chain with another 5-10 farther down the chain
  • Interacting regions may be adjacent with a short loop, or far apart with other structures in between
  • Directions:
      • Same: parallel sheet
      • Opposite: anti-parallel sheet
      • Mixed: mixed sheet
  • Alpha carbons (and R side groups) alternate above and below the sheet
  • Prediction difficult, due to wide range of φ and ψ angles

Page 20: Ramachandran Plot (alpha)

Page 21: Ramachandran Plot (beta)

Page 22: Ramachandran Plot

Page 23: Helices and Sheets

Page 24: Loop

  • Regions between α helices and β sheets
  • Various lengths and three-dimensional configurations
  • Located on the surface of the structure
  • Hairpin loops: complete turn in the polypeptide chain (anti-parallel β sheets)
  • More variable sequence structure
  • Tend to have charged and polar amino acids

Page 25: Coil

Region of secondary structure that is not a helix, sheet, or loop

Page 26: Determining Protein Structure

There are on the order of 100,000 distinct proteins in the human proteome.

Two methods for revealing positions of atoms in 3-D:
  • X-ray crystallography
      • X-ray diffraction pattern + mathematical construction
      • Good protein crystal needed, good resolution of diffraction needed
  • Nuclear magnetic resonance (NMR)
      • Small proteins only (< 250 residues)
      • Inter-proton distances + geometric constraints

Page 27: Bovine Ribonuclease

Christian Anfinsen, 1957.

Page 28: Disulfide Bonds

  • Two cysteines in close proximity will form a covalent bond
  • Known as a disulfide bond, disulfide bridge, or dicysteine bond
  • Significantly stabilizes tertiary structure

Page 29

Page 30

Principles that govern the folding of protein chains, Christian Anfinsen, Science 1973

Page 31: Ribonuclease

Page 32: Disulfide Bonds

# of cysteines    # of S-S bonds    # of combinations
      4                 2                     3
      6                 3                    15
      8                 4                   105
     10                 5                   945
     12                 6                 10395
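
The combination counts follow from a simple argument: with 2n cysteines, the first can pair with any of the other 2n - 1, the next unpaired one with any of the remaining 2n - 3, and so on, giving (2n - 1) × (2n - 3) × … × 3 × 1. A minimal sketch that reproduces the values (the function name `disulfide_pairings` is hypothetical, and it assumes every cysteine is bonded):

```python
def disulfide_pairings(n_cysteines: int) -> int:
    """Number of ways to pair all cysteines into disulfide bonds.

    For 2n cysteines this is the double factorial
    (2n - 1)!! = (2n - 1) * (2n - 3) * ... * 3 * 1.
    """
    if n_cysteines < 0 or n_cysteines % 2:
        raise ValueError("expected a non-negative even number of cysteines")
    count = 1
    for choices in range(n_cysteines - 1, 0, -2):
        count *= choices
    return count


if __name__ == "__main__":
    for cysteines in (4, 6, 8, 10, 12):
        # cysteines, S-S bonds, possible pairings
        print(cysteines, cysteines // 2, disulfide_pairings(cysteines))
```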

Page 33: Levinthal's paradox

How do proteins find the right conformation out of the astronomically large number of potential three-dimensional forms that they could randomly fold into?

  • Consider a 100-residue protein. If each residue can take only 3 positions, there are $3^{100} \approx 5 \times 10^{47}$ possible conformations.
  • If it takes $10^{-13}$ s to convert from one structure to another, exhaustive search would take about $1.6 \times 10^{27}$ years!
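
The arithmetic behind these two figures can be checked directly. The sketch below assumes a Julian year of 365.25 days for the conversion from seconds; the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, an assumed convention


def conformation_count(residues: int = 100, states_per_residue: int = 3) -> int:
    """Number of conformations if every residue picks a state independently."""
    return states_per_residue ** residues


def exhaustive_search_years(residues: int = 100,
                            states_per_residue: int = 3,
                            seconds_per_step: float = 1e-13) -> float:
    """Years needed to visit every conformation once, one step at a time."""
    total_seconds = conformation_count(residues, states_per_residue) * seconds_per_step
    return total_seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"{conformation_count():.2e} conformations")
    print(f"{exhaustive_search_years():.2e} years")
```

Because the count grows exponentially with chain length, even generous changes to the time per step barely dent the result, which is the point of the paradox.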

Page 34

Current Opinion in Structural Biology, 2004, 14, 70-75

Page 35: What determines fold?

Anfinsen's experiments in 1957 demonstrated that proteins can fold spontaneously into their native conformations under physiological conditions. This implies that primary structure does indeed determine folding, or 3-D structure.

Exceptions exist:
  • Chaperone proteins assist folding
  • Abnormally folded prion proteins can catalyze misfolding of normal prion proteins, which then aggregate

Page 36: Other factors

Physical properties of a protein that influence stability, and therefore determine its fold:
  • Rigidity of backbone
  • Amino acid interaction with water
      • Hydropathy index for side chains
  • Interactions among amino acids
      • Electrostatic interactions
      • Hydrogen, disulphide bonds
  • Volume constraints

Page 37: Understand protein folding

  • Structure: Given a sequence, what tertiary structure does it adopt?
      • Global optimization, Monte Carlo, molecular dynamics, coarse-grained dynamics, etc.
  • Thermodynamics: Under mutation, does the free energy of the native state change relative to the native sequence?
      • MC, MD, free energy methods, etc.
  • Kinetics: How fast does the protein fold? Does a different sequence fold faster, and why?
      • Lattice Monte Carlo, molecular dynamics, coarse-grained dynamics

Page 38: CASP changed the landscape

  • Critical Assessment of Structure Prediction competition. Held in even-numbered years since 1994.
      • Solved but unpublished structures are posted as targets in May; predictions are due in September
      • Various categories
          • Relation to existing structures: ab initio, homology, fold, etc.
          • Partial vs. fully automated approaches
      • Produces lots of information about what aspects of the problems are hard, and ends arguments about test sets
  • Results show steady improvement, and the value of integrative approaches