Structural Bioinformatics Seminar Dina Schneidman Email:...

Date posted: 21-Dec-2015

Structural Bioinformatics Seminar

•Dina Schneidman

•Email: [email protected]

Outline

• Seminar requirements
• Biological Introduction
• How to prepare seminar lecture?

No prior knowledge in Biology is assumed or required!

Seminar Requirements

• Attend ALL lectures
• Prepare one of the lectures

Seminar Goals

• Learn how to study a new subject from articles
• Learn how to present work in Computer Science

Biological Introduction

Schedule

• Introduction to molecular structure.
• Introduction to pattern matching.
• Introduction to protein structure alignment (comparison).
• Protein docking.

Small Ligands

• Small organic molecules, composed of tens of atoms.
• Highly flexible: can have many torsional degrees of freedom.

DNA – The code of life

• DNA is a polymer.
• The monomer units of DNA are nucleotides: A, T, C, G.
• DNA is normally a double-stranded macromolecule.

RNA

• RNA is a polymer too.
• The monomer units of RNA are nucleotides: A, U (instead of T), C, G.
• DNA serves as the template for the synthesis of RNA.

Protein

• Protein is a polymer too.
• The monomer units of Protein are 20 amino acids.
• Each amino acid is encoded by 3 RNA nucleotides.

Hemoglobin sequence:
VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLGA
FSDGLAHLDNLKGTFATLSELHXDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANA
LAHKYH

Gene (DNA) --- Transcription ---> mRNA --- Translation ---> Protein

Cells express different subsets of the genes in different tissues and under different conditions.

DNA ---> RNA ---> Protein ---> Symptoms (Phenotype)

The Central Dogma

DNA ---> mRNA ---> Protein

• DNA: nucleic acid sequence over the 4 letter alphabet {A,C,G,T}. Base pairs: Guanine-Cytosine, Thymine-Adenine.
• mRNA: nucleic acid sequence over the 4 letter alphabet {A,C,G,U} (T->U).
• Protein: sequence of amino acids over the 20 letter alphabet {A,D,..Y}.
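The two arrows of the central dogma can be sketched in a few lines. This is a minimal sketch, assuming the standard genetic code and a coding-strand DNA string, so that transcription reduces to T->U; the function names and the example DNA string are illustrative and not part of the slides.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order ('*' = stop).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def transcribe(dna):
    """Coding-strand DNA -> mRNA (simplification: only T -> U)."""
    return dna.upper().replace("T", "U")

def translate(mrna):
    """Read the mRNA three nucleotides at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

dna = "ATGGTGCATCTGACTCCTGAGGAGAAGTAA"  # illustrative sequence (assumption)
mrna = transcribe(dna)
print(mrna)             # AUGGUGCAUCUGACUCCUGAGGAGAAGUAA
print(translate(mrna))  # MVHLTPEEK
```

After the initial M, the translated residues VHLTPEEK coincide with the start of the hemoglobin sequence listed earlier.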

Bioinformatics - Computational Genomics

• DNA mapping.
• Protein or DNA sequence comparisons.
• Exploration of huge textual databases.
• In essence one-dimensional methods and intuition.

Structural Bioinformatics - Structural Genomics

• Elucidation of the 3D structures of biomolecules.
• Analysis and comparison of biomolecular structures.
• Prediction of biomolecular recognition.
• Handles three-dimensional (3-D) structures.
• Geometric Computing (a methodology shared by Computational Geometry, Computer Vision, Computer Graphics, Pattern Recognition etc.).

Protein Structural Comparison

ApoAmicyanin - 1aaj Pseudoazurin - 1pmy

Algorithmic Solution

About 1 sec. Fischer, Nussinov, Wolfson ~ 1990.

Introduction to Protein Structure

Amino acids and the peptide bond

Cβ – first side chain carbon (except for glycine).

Cα atoms

Backbone or Secondary structure display

Wire-frame or ribbons display

Spacefill model

Geometric Representation

3-D curve {v_i}, i = 1…n

Secondary structure

Hydrogen bonds.

β strands and β sheets

The Holy Grail - Protein Folding

• From Sequence to Structure.
• Relatively primitive computational folding models have proved to be NP-hard even in the 2-D case.

Determination of protein structures

• X-ray Crystallography
• NMR (Nuclear Magnetic Resonance)
• EM (Electron microscopy)

An NMR result is an ensemble of models

Cystatin (1a67)

The Protein Data Bank (PDB)

• International repository of 3D molecular data.
• Contains x-y-z coordinates of all atoms of the molecule and additional data.
• http://pdb.tau.ac.il
• http://www.rcsb.org/pdb/
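Reading the x-y-z coordinates out of a PDB entry can be sketched as follows. This is a minimal sketch, assuming the fixed-column ATOM record layout of the PDB file format; it keeps only the Cα atoms, which gives the 3-D curve {v_i} used as a geometric representation. The demo records and both function names are made up for illustration.

```python
def format_atom(serial, name, res, chain, resseq, x, y, z):
    """Build a fixed-column ATOM record (helper for the demo input only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

def ca_trace(pdb_lines):
    """Return the C-alpha coordinates, in file order, as a list of (x, y, z)."""
    trace = []
    for line in pdb_lines:
        # Columns (1-based): 13-16 atom name, 31-38 x, 39-46 y, 47-54 z.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            trace.append((float(line[30:38]),
                          float(line[38:46]),
                          float(line[46:54])))
    return trace

# Hypothetical records, not taken from a real PDB entry.
demo = [
    format_atom(1, " N", "VAL", "A", 1, 10.000, 12.500, 1.250),
    format_atom(2, " CA", "VAL", "A", 1, 11.104, 13.207, 2.100),
    format_atom(3, " CA", "HIS", "A", 2, 14.250, 12.000, 3.500),
]
print(ca_trace(demo))  # [(11.104, 13.207, 2.1), (14.25, 12.0, 3.5)]
```

For a real entry the same function could be fed the lines of a downloaded file, for example `ca_trace(open(path))`, where `path` is whatever local file name was chosen.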

Why bother with structures when we have sequences?

• In evolutionarily related proteins, structure is much better preserved than sequence.
• Structural motifs may predict similar biological function.
• Getting insight into protein folding.
• Recovering the limited (?) number of protein folds.

Applications

• Classification of protein databases by structure.
• Search of partial and disconnected structural patterns in large databases.
• Extracting structure information is difficult; we want to extract “new” folds.

Applications (continued)

• Speed up of drug discovery.
• Detection of structural pharmacophores in an ensemble of drugs (similar substructures in drugs acting on a given receptor – pharmacophore).
• Comparison and detection of drug receptor active sites (structurally similar receptor cavities could bind similar drugs).

Object Recognition

Model Database

Scene

Recognition

Lamdan, Schwartz, Wolfson, “Geometric Hashing”, 1988.

Protein Alignment = Geometric Pattern Discovery

Protein Alignment

• The superimposition pattern is not known a priori – pattern discovery.

• The matching recovered can be inexact.

• We are not necessarily looking for the largest superimposition, since other matchings may have biological meaning.

Geometric Task:

Given two configurations of points in three-dimensional space, find those rotations and translations of one of the point sets which produce “large” superimpositions of corresponding 3-D points.

Geometric Task (continued)

Aspects:

• Object representation (points, vectors, segments)
• Object resemblance (distance function)
• Transformation (translations, rotations, scaling)

-> Optimization technique

Transformations

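• Translation: $x' = x + t$
• Translation and Rotation, Rigid Motion (Euclidean Trans.): $x' = R(x) = Ux + t$
• Translation, Rotation + Scaling: $x' = T(x) = s(Ux + t)$

Here $U$ is a rotation matrix, $t$ a translation vector and $s$ a scale factor.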

Inexact Alignment.

Simple case – two closely related proteins with the same number of amino acids.


Question: how to measure alignment error?

Superposition - best least squares (RMSD – Root Mean Square Deviation)

Given two sets of 3-D points: $P=\{p_i\}$, $Q=\{q_i\}$, $i=1,\dots,n$;

$$\mathrm{rmsd}(P,Q) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} |p_i - q_i|^2}$$

Find a 3-D rigid transformation $T^*$ such that:

$$\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\frac{1}{n}\sum_{i=1}^{n} |T(p_i) - q_i|^2}$$

A closed form solution exists for this task. It can be computed in O(n) time.
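The slides do not say which closed form is meant. One well-known choice is the SVD-based superposition usually attributed to Kabsch, sketched below. This is a minimal sketch, assuming NumPy is available and that point p_i is already matched to q_i; `superpose` is an illustrative name. The centroids and the 3x3 covariance matrix take O(n) time, and the SVD works on a constant-size matrix.

```python
import numpy as np

def superpose(P, Q):
    """Least-squares rigid superposition of P onto Q (rows are matched points).

    Returns (U, t, rmsd) such that P @ U.T + t is as close as possible to Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)      # centroids, O(n)
    H = (P - cp).T @ (Q - cq)                    # 3x3 covariance matrix, O(n)
    W, _, Vt = np.linalg.svd(H)                  # constant-size problem
    # Flip the last axis if needed so that U is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ W.T) >= 0 else -1.0
    U = Vt.T @ np.diag([1.0, 1.0, d]) @ W.T
    t = cq - U @ cp
    diff = P @ U.T + t - Q
    rmsd = float(np.sqrt((diff ** 2).sum(axis=1).mean()))
    return U, t, rmsd

# Demo: move a random point set by a known rigid motion and recover it.
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 3))
a = 0.7
U_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
Q = P @ U_true.T + np.array([1.0, -2.0, 0.5])
U, t, rmsd = superpose(P, Q)
print(np.allclose(U, U_true), rmsd)
```

In the demo the recovered rotation should agree with `U_true` and the RMSD should be zero up to rounding, since Q is an exact rigid copy of P.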

Problem statement with RMSD metric.

Given two configurations of points in three-dimensional space and a threshold ε, find the largest alignment (a set of matched elements and a transformation) with RMSD less than ε.

(The problem belongs to NP.)
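For one fixed candidate transformation, the size of the resulting alignment can be scored as sketched below. This is a minimal sketch, assuming a greedy one-to-one matching and a per-pair distance cutoff ε, which is stricter than the RMSD condition (every matched pair closer than ε implies RMSD less than ε). The search over transformations, which is the hard part of the problem, is not shown, and all names are illustrative.

```python
import math

def apply_rigid(U, t, p):
    """Return U p + t for a 3x3 matrix U (nested lists) and 3-vectors t, p."""
    return tuple(sum(U[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))

def matched_pairs(P, Q, U, t, eps):
    """Greedily pair each transformed point of P with an unused point of Q.

    Only pairs closer than eps are accepted; returns a list of (i, j) indices.
    """
    moved = [apply_rigid(U, t, p) for p in P]
    candidates = sorted(
        (math.dist(m, q), i, j)
        for i, m in enumerate(moved)
        for j, q in enumerate(Q)
    )
    used_p, used_q, pairs = set(), set(), []
    for d, i, j in candidates:      # closest candidate pairs first
        if d >= eps:
            break
        if i not in used_p and j not in used_q:
            used_p.add(i)
            used_q.add(j)
            pairs.append((i, j))
    return pairs

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
P = [(0, 0, 0), (1, 0, 0), (5, 5, 5)]
Q = [(1, 1, 1), (2, 1, 1), (9, 9, 9)]
print(matched_pairs(P, Q, identity, (1, 1, 1), eps=0.5))  # [(0, 0), (1, 1)]
```

The number of returned pairs is the size of the alignment for that transformation; a full method would compare this score across many candidate transformations.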

Docking Problem:

• Given two molecules find their correct association:

[Figure: Receptor + Ligand = Complex, with transformation T]

Docking Problem:

+ = ?

How to present a paper in Computer Science

Lecture Preparation

• The lecture should cover a given slot of time (~90 minutes).
• Use PowerPoint slides for presentation.
• Each slide usually spans 1-2 minutes.
• The slides should not be overloaded.
• Use mouse or pointer.
• Use colors, pictures, tables and animation, but don’t exaggerate.

What to say and how

• Communicate the key ideas during your lecture.
• Don’t get lost in technical details.
• Structure your talk.
• Use a top-down approach.

Lecture Structure

• Introduction – general description of the paper.
• Body – abstract of the current method.
• Technical details.
• Conclusions and discussion.

Introduction

• Most important part of your talk!
• Title + short explanation about the presented topic.
• Lecture outline.
• Problem definition, input and output. Don’t forget to define the problem!
• Problem motivation.
• Introduce terminology of the field.
• Short review of existing approaches (don’t forget to add references!).

Body

• Abstract of the major results presented in the paper.
• Significance of the results.
• Sketch of the method.

Technicalities

• Extended presentation of the method.
• Present key algorithmic ideas clearly and carefully.
• Complexity of the method.
• Experimental results.

Conclusions and Discussion

• Summarize major contributions of the work.
• You can highlight points based on technical details you couldn’t discuss in introduction.
• Present related open problems.
• Don’t forget to thank the audience!!!
• Questions.

Getting to the Audience

• Use repetitions: “Tell them what you're going to tell them. Tell them. Then tell them what you told them.”
• Remind, don’t assume
• Maintain eye contact
• Control your voice and motion

Thanks!!! and Good Luck in your lectures!