Structural Bioinformatics Seminar Dina Schneidman Email:...
Date posted: 21-Dec-2015
Structural Bioinformatics Seminar
•Dina Schneidman
•Email: [email protected]
Outline
• Seminar requirements
• Biological introduction
• How to prepare a seminar lecture?
Seminar Requirements
• No prior knowledge in Biology is assumed or required!
• Attend ALL lectures
• Prepare one of the lectures
Seminar Goals
• Learn how to study a new subject from articles
• Learn how to present work in Computer Science
Schedule
• Introduction to molecular structure.
• Introduction to pattern matching.
• Introduction to protein structure alignment (comparison).
• Protein docking.
Small Ligands
• Small organic molecules, composed of tens of atoms.
• Highly flexible: can have many torsional degrees of freedom.
DNA – The code of life
• DNA is a polymer.
• The monomer units of DNA are nucleotides: A, T, C, G.
• DNA is normally a double-stranded macromolecule.
RNA
• RNA is a polymer too.
• The monomer units of RNA are nucleotides: A, U (instead of T), C, G.
• DNA serves as the template for the synthesis of RNA.
Protein
• Protein is a polymer too.
• The monomer units of Protein are 20 amino acids.
• Each amino acid is encoded by 3 RNA nucleotides.
Hemoglobin sequence:VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA FSDGLAHLDNLKGTFATLSELHXDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANA LAHKYH
[Figure: Gene (DNA) ---> Transcription ---> mRNA ---> Translation ---> Protein ---> Symptoms (Phenotype)]
Cells express different subsets of the genes in different tissues and under different conditions.
The Central Dogma
The central dogma
DNA ---> mRNA ---> Protein
• DNA: alphabet {A,C,G,T}, a sequence of nucleic acids. Base pairing: Guanine-Cytosine, Thymine-Adenine.
• mRNA: alphabet {A,C,G,U} (T->U), a sequence of nucleic acids.
• Protein: alphabet {A,D,..Y}, a sequence of amino acids.
• DNA and mRNA use 4-letter alphabets; proteins use a 20-letter alphabet.
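The DNA ---> mRNA ---> Protein flow can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: it assumes the standard genetic code, treats the input as the coding strand (so transcription is just T->U), and the names `transcribe`, `translate` and the short DNA fragment are made up for the example.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """DNA (coding strand) -> mRNA: same letters, with T replaced by U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """mRNA -> protein: read 3 nucleotides at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative fragment (an assumption, not taken from the slides).
dna = "ATGGTGCACCTGACTCCTGAGTAA"
mrna = transcribe(dna)
print(mrna)             # AUGGUGCACCUGACUCCUGAGUAA
print(translate(mrna))  # MVHLTPE
```

Each codon is three letters from a 4-letter alphabet, which gives 4^3 = 64 codons for 20 amino acids plus stop signals; that is why several codons map to the same amino acid.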
Bioinformatics - Computational Genomics
• DNA mapping.
• Protein or DNA sequence comparisons.
• Exploration of huge textual databases.
• In essence one-dimensional methods and intuition.
Structural Bioinformatics - Structural Genomics
• Elucidation of the 3D structures of biomolecules.
• Analysis and comparison of biomolecular structures.
• Prediction of biomolecular recognition.
• Handles three-dimensional (3-D) structures.
• Geometric Computing (a methodology shared by Computational Geometry, Computer Vision, Computer Graphics, Pattern Recognition etc.).
The Holy Grail - Protein Folding
• From Sequence to Structure.
• Relatively primitive computational folding models have been proved to be NP-hard even in the 2-D case.
Determination of protein structures
• X-ray Crystallography
• NMR (Nuclear Magnetic Resonance)
• EM (Electron microscopy)
The Protein Data Bank (PDB)
• International repository of 3D molecular data.
• Contains x-y-z coordinates of all atoms of the molecule and additional data.
• http://pdb.tau.ac.il
• http://www.rcsb.org/pdb/
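To make "x-y-z coordinates of all atoms" concrete, here is a small sketch that reads coordinates from the fixed-column ATOM/HETATM records of the legacy PDB text format. The `Atom` type, the function names and the sample record (with made-up coordinates) are assumptions for illustration; real files contain many other record types and edge cases that this sketch ignores.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str      # atom name, e.g. "CA"
    residue: str   # residue name, e.g. "VAL"
    chain: str     # chain identifier
    res_seq: int   # residue sequence number
    x: float
    y: float
    z: float


def parse_atom_line(line):
    """Parse one ATOM/HETATM record using the fixed column layout."""
    return Atom(
        name=line[12:16].strip(),     # columns 13-16
        residue=line[17:20].strip(),  # columns 18-20
        chain=line[21],               # column 22
        res_seq=int(line[22:26]),     # columns 23-26
        x=float(line[30:38]),         # columns 31-38
        y=float(line[38:46]),         # columns 39-46
        z=float(line[46:54]),         # columns 47-54
    )


def read_atoms(lines):
    """Keep only coordinate records and skip everything else."""
    return [parse_atom_line(l) for l in lines
            if l.startswith(("ATOM  ", "HETATM"))]


# Sample record built field by field so the column widths stay visible.
# The values are invented for the example.
sample = (
    "ATOM  " + "1".rjust(5) + " " + " CA " + " " + "VAL" + " " + "A"
    + "1".rjust(4) + " " * 4
    + f"{6.204:8.3f}{16.869:8.3f}{4.854:8.3f}"
)

atoms = read_atoms(["REMARK example header line", sample])
print(atoms[0])
```

A whole structure then becomes a list of 3-D points, which is the representation the alignment and docking slides work with.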
Why bother with structures when we have sequences?
• In evolutionarily related proteins, structure is much better preserved than sequence.
• Structural motifs may predict similar biological function.
• Getting insight into protein folding.
• Recovering the limited (?) number of protein folds.
Applications
• Classification of protein databases by structure.
• Search of partial and disconnected structural patterns in large databases.
• Extracting structure information is difficult; we want to extract “new” folds.
Applications (continued)
• Speed up of drug discovery.
• Detection of structural pharmacophores in an ensemble of drugs (similar substructures in drugs acting on a given receptor – pharmacophore).
• Comparison and detection of drug receptor active sites (structurally similar receptor cavities could bind similar drugs).
Protein Alignment
• The superimposition pattern is not known a priori – pattern discovery.
• The matching recovered can be inexact.
• We are not necessarily looking for the largest superimposition, since other matchings may have biological meaning.
Geometric Task:
Given two configurations of points in three-dimensional space, find those rotations and translations of one of the point sets which produce “large” superimpositions of corresponding 3-D points.
Geometric Task (continued)
Aspects:
• Object representation (points, vectors, segments)
• Object resemblance (distance function)
• Transformation (translations, rotations, scaling)
-> Optimization technique
Transformations
• Translation: $x' = x + t$
• Translation and Rotation, Rigid Motion (Euclidean Trans.): $x' = R(x) = Ux + t$
• Translation, Rotation + Scaling: $x' = T(x) = s(Ux + t)$
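The three transformation families can be sketched directly in plain Python. This is only an illustration: `U` is a 3x3 rotation matrix, `t` a translation vector and `s` a scale factor as on the slide, while the function names and the choice of a rotation about the z axis are assumptions made for the example.

```python
import math


def rotation_z(angle):
    """3x3 rotation matrix U for a rotation by `angle` radians about z."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def mat_vec(U, x):
    """Matrix-vector product Ux for a 3x3 matrix and a 3-vector."""
    return [sum(U[r][k] * x[k] for k in range(3)) for r in range(3)]


def translate(x, t):
    """Translation: x' = x + t."""
    return [x[k] + t[k] for k in range(3)]


def rigid_motion(x, U, t):
    """Rigid motion: x' = Ux + t."""
    return translate(mat_vec(U, x), t)


def similarity(x, U, t, s):
    """Rigid motion followed by scaling: x' = s(Ux + t)."""
    return [s * v for v in rigid_motion(x, U, t)]


U = rotation_z(math.pi / 2)   # quarter turn about the z axis
t = [1.0, 0.0, 0.0]
x = [1.0, 0.0, 0.0]

print(translate(x, t))           # shifts x along the x axis
print(rigid_motion(x, U, t))     # rotates x onto the y axis, then shifts
print(similarity(x, U, t, 2.0))  # same, then doubles every coordinate
```

A rigid motion preserves all pairwise distances, which is why it is the natural transformation class for comparing two molecules; scaling does not.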
Inexact Alignment.
Simple case – two closely related proteins with the same number of amino acids.
Question: how to measure alignment error?
Superposition - best least squares (RMSD – Root Mean Square Deviation)
Given two sets of 3-D points: $P = \{p_i\}$, $Q = \{q_i\}$, $i = 1, \dots, n$;
$\mathrm{rmsd}(P, Q) = \sqrt{\sum_{i=1}^{n} |p_i - q_i|^2 / n}$
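As a minimal sketch, the RMSD of two equally sized point sets with a known one-to-one correspondence (point i of P matched to point i of Q) can be computed as follows. The function name and the toy coordinates are assumptions for illustration; no transformation is optimized here.

```python
import math


def rmsd(P, Q):
    """Root mean square deviation between matched 3-D point lists P and Q."""
    if len(P) != len(Q) or not P:
        raise ValueError("P and Q must be non-empty and of equal length")
    # Sum of squared distances |p_i - q_i|^2 over all matched pairs.
    total = sum(
        sum((pk - qk) ** 2 for pk, qk in zip(p, q))
        for p, q in zip(P, Q)
    )
    return math.sqrt(total / len(P))


# Toy example: Q is P shifted by one unit along z,
# so every matched pair is exactly 1 apart.
P = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
Q = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(rmsd(P, Q))  # 1.0
print(rmsd(P, P))  # 0.0
```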
Find a 3-D rigid transformation $T^*$ such that:
$\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\sum_{i=1}^{n} |T(p_i) - q_i|^2 / n}$
A closed form solution exists for this task. It can be computed in O(n) time.
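The slide does not say which closed form it has in mind. One widely used construction is the SVD-based (Kabsch) solution, sketched below as an assumption. $U$ and $t$ denote the rotation matrix and translation vector of a rigid motion $T(p) = Up + t$; the helper symbols $H$, $W$, $\Sigma$, $V$ and $D$ are introduced here only for the sketch.

```latex
\begin{align*}
\bar{p} &= \frac{1}{n}\sum_{i=1}^{n} p_i, \qquad
\bar{q} = \frac{1}{n}\sum_{i=1}^{n} q_i
  && \text{(centroids, } O(n)\text{)} \\
H &= \sum_{i=1}^{n} (p_i - \bar{p})(q_i - \bar{q})^{\mathsf{T}}
  && \text{(} 3 \times 3 \text{ covariance matrix, } O(n)\text{)} \\
H &= W \Sigma V^{\mathsf{T}}
  && \text{(SVD of a } 3 \times 3 \text{ matrix, } O(1)\text{)} \\
D &= \operatorname{diag}\bigl(1,\, 1,\, \det(V W^{\mathsf{T}})\bigr)
  && \text{(excludes reflections)} \\
U^{*} &= V D W^{\mathsf{T}}, \qquad
t^{*} = \bar{q} - U^{*}\bar{p}
  && \text{(optimal rotation and translation)}
\end{align*}
```

Only the first two steps touch all n points; everything after that works on a fixed 3x3 matrix, which is consistent with the O(n) bound stated above.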
Problem statement with RMSD metric.
Given two configurations of points in three-dimensional space and a threshold ε, find the largest alignment, i.e. a set of matched elements and a transformation, with RMSD less than ε.
(belongs to NP)
Docking Problem:
• Given two molecules find their correct association:
[Figure: Receptor + Ligand = Complex, with a transformation labeled T]
Lecture Preparation
• The lecture should cover a given slot of time (~90 minutes).
• Use PowerPoint slides for presentation.
• Each slide usually spans 1-2 minutes.
• The slides should not be overloaded.
• Use mouse or pointer.
• Use colors, pictures, tables and animation, but don’t overdo it.
What to say and how
• Communicate the key ideas during your lecture.
• Don’t get lost in technical details.
• Structure your talk.
• Use a top-down approach.
Lecture Structure
• Introduction – general description of the paper.
• Body – abstract of the current method.
• Technical details.
• Conclusions and discussion.
Introduction
• Most important part of your talk!
• Title + short explanation about the presented topic.
• Lecture outline.
• Problem definition, input and output. Don’t forget to define the problem!
• Problem motivation.
• Introduce terminology of the field.
• Short review of existing approaches (don’t forget to add references!).
Body
• Abstract of the major results presented in the paper.
• Significance of the results.
• Sketch of the method.
Technicalities
• Extended presentation of the method.
• Present key algorithmic ideas clearly and carefully.
• Complexity of the method.
• Experimental results.
Conclusions and Discussion
• Summarize major contributions of the work.
• You can highlight points based on technical details you couldn’t discuss in the introduction.
• Present related open problems.
• Don’t forget to thank the audience!
• Questions.
Getting to the Audience
• Use repetitions: “Tell them what you're going to tell them. Tell them. Then tell them what you told them.”
• Remind, don’t assume
• Maintain eye contact
• Control your voice and motion