EECS 800 Research Seminar Mining Biological Data

The University of Kansas
EECS 800 Research Seminar: Mining Biological Data
Instructor: Luke Huan
Fall 2006



Mining Biological Data, KU EECS 800, Luke Huan, Fall ’06, slide 2

9/25/2006: Protein Structures

Introduction

Protein: a sequence built from 20 amino acids that adopts a stable 3D structure, which can be measured experimentally.

Example sequence: Lys Lys Gly Gly Leu Val Ala His

[Figure: one protein structure in four renderings: cartoon, space filling, ribbon, and surface. Atom color legend: oxygen, nitrogen, carbon, sulfur.]
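The example sequence on this slide is written with three-letter residue names. As a small illustration of treating a protein as a string over a 20-letter alphabet, the sketch below converts it to the standard one-letter codes. The function name `to_one_letter` is made up for this example and is not part of the lecture.

```python
# Standard three-letter to one-letter codes for the 20 amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues: str) -> str:
    """Convert whitespace-separated three-letter codes to a one-letter string."""
    # capitalize() normalizes inputs such as "LYS" or "lys" to "Lys".
    return "".join(THREE_TO_ONE[name.capitalize()] for name in residues.split())


print(to_one_letter("Lys Lys Gly Gly Leu Val Ala His"))  # KKGGLVAH
```

An unknown residue name raises `KeyError`, which is probably the safer default for a sketch like this than silently skipping it.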


Exponential Growth of Protein Structures

[Figure: Growth of Known Structures in Protein Data Bank. Horizontal axis: year (surviving labels: 1988, 2005). Vertical axis: # of structures (surviving label: 35,000). Two series: the total number of known protein structures, and newly characterized proteins in that year.]


Protein Structure Space

http://www.nigms.nih.gov/psi/


Structure Space is Described Hierarchically

From SCOP: Structural Classification of Proteins (http://scop.berkeley.edu/)

Class
  Fold
    Superfamily
      – Family
        » Protein domains
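The hierarchy above can be sketched in code. SCOP labels each domain with a dotted classification string such as `a.1.1.2` (class letter, then fold, superfamily, and family numbers). That string format comes from SCOP itself rather than from this slide, and the names `ScopId`, `parse_sccs`, and `deepest_shared_level` are hypothetical helpers for illustration only.

```python
from typing import NamedTuple


class ScopId(NamedTuple):
    """One position in the hierarchy: class > fold > superfamily > family."""
    scop_class: str
    fold: int
    superfamily: int
    family: int


def parse_sccs(sccs: str) -> ScopId:
    """Parse a dotted SCOP classification string such as 'a.1.1.2'."""
    scop_class, fold, superfamily, family = sccs.split(".")
    return ScopId(scop_class, int(fold), int(superfamily), int(family))


def deepest_shared_level(a: ScopId, b: ScopId) -> str:
    """Name the deepest level at which two domains are classified together."""
    shared = "none"
    # Walk top-down and stop at the first level where the two differ.
    for level, x, y in zip(("class", "fold", "superfamily", "family"), a, b):
        if x != y:
            break
        shared = level
    return shared


print(deepest_shared_level(parse_sccs("a.1.1.2"), parse_sccs("a.1.2.1")))  # fold
```

Comparing top-down matters here: fold number 1 in class `a` is unrelated to fold number 1 in class `b`, so a match at a lower level only counts when every level above it also matches.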


SCOP Statistics

Class                                 Number of folds   Number of superfamilies   Number of families
All alpha proteins                                218                       376                  608
All beta proteins                                 144                       290                  560
Alpha and beta proteins (a/b)                     136                       222                  629
Alpha and beta proteins (a+b)                     279                       409                  717
Multi-domain proteins                              46                        46                   61
Membrane and cell surface proteins                 47                        88                   99
Small proteins                                     75                       108                  171
Total                                             945                      1539                 2845

25973 PDB Entries (July 2005). 70859 Domains.
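As a quick consistency check on the table, the sketch below re-adds each column and compares it with the reported totals, then derives the average number of domains per PDB entry from the two figures quoted under the table. All numbers are copied from the slide; the variable and function names are illustrative.

```python
# (folds, superfamilies, families) per SCOP class, copied from the table.
SCOP_COUNTS = {
    "All alpha proteins": (218, 376, 608),
    "All beta proteins": (144, 290, 560),
    "Alpha and beta proteins (a/b)": (136, 222, 629),
    "Alpha and beta proteins (a+b)": (279, 409, 717),
    "Multi-domain proteins": (46, 46, 61),
    "Membrane and cell surface proteins": (47, 88, 99),
    "Small proteins": (75, 108, 171),
}
REPORTED_TOTALS = (945, 1539, 2845)
PDB_ENTRIES = 25973
DOMAINS = 70859


def column_totals(counts):
    """Sum each of the three columns across all classes."""
    return tuple(sum(column) for column in zip(*counts.values()))


print(column_totals(SCOP_COUNTS))  # (945, 1539, 2845)
print(column_totals(SCOP_COUNTS) == REPORTED_TOTALS)  # True
print(f"domains per PDB entry: {DOMAINS / PDB_ENTRIES:.2f}")
```

The ratio comes out at a bit under three domains per entry, a reminder that the domain, not the PDB entry, is the unit that SCOP classifies.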


Amino Acids: Building Blocks of Proteins


20 Naturally-occurring Amino Acids


Protein Secondary Structure

α Helix


Protein Secondary Structure

β strands


Top Level of Structure Space: Structure Classes

There are four major classes:

  α proteins
  β proteins
  α + β (anti-parallel β strands)
  α / β (parallel β strands)


Protein Folds

A protein fold is the way secondary structure elements are arranged in a 3D structure.


Popular Folds

The eight most frequent SCOP folds


Superfamily and Family

Proteins within the same superfamily and family tend to have similar sequences and similar functions.
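One simple way to put a number on "similar sequence" is percent identity over two sequences that have already been aligned. The sketch below is only an illustration: the slide does not say which similarity measure is meant, the function name `percent_identity` is hypothetical, and the second sequence is invented for the example.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over aligned, non-gap columns.

    Both inputs must already be aligned to the same length, with '-'
    marking gaps. Columns where either sequence has a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# KKGGLVAH is the one-letter form of the Lys Lys Gly Gly Leu Val Ala His
# example; KRGGLIAH is a made-up variant with two substitutions.
print(percent_identity("KKGGLVAH", "KRGGLIAH"))  # 75.0
```

Producing the alignment itself is a separate problem that this sketch leaves out.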


The Nature of Protein Structure Data

The ball-and-stick model is an element-based structure representation.

A structure is decomposed into a set of amino acids.

Protein geometry, topology, and attributes are defined with respect to the amino acid set:

Geometry is the coordinates of the amino acids

Topology is the physico-chemical interactions of the residues

Attributes are the physico-chemical properties of the residues

….
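The three views of a structure listed on this slide (geometry, topology, attributes) can be sketched as a small data structure. This is a minimal illustration under stated assumptions: one representative point per residue stands in for geometry, a distance cutoff stands in for "interaction" (the slide does not define how interactions are detected), and a single hydrophobicity flag stands in for the attributes. The coordinates, the 8.0 cutoff, and all names are invented for the example.

```python
from dataclasses import dataclass
from itertools import combinations
from math import dist
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Residue:
    index: int                         # position in the sequence
    name: str                          # three-letter amino acid code
    coord: Tuple[float, float, float]  # geometry: one representative point
    hydrophobic: bool                  # attribute: one example property


def contact_edges(residues: List[Residue], cutoff: float = 8.0) -> Set[Tuple[int, int]]:
    """Topology as a graph: an edge joins two residues within `cutoff`.

    Using a plain distance cutoff as a proxy for interaction is an
    assumption of this sketch, not something stated in the lecture.
    """
    return {
        (a.index, b.index)
        for a, b in combinations(residues, 2)
        if dist(a.coord, b.coord) <= cutoff
    }


# Hypothetical coordinates chosen only to make the example easy to follow.
chain = [
    Residue(1, "Lys", (0.0, 0.0, 0.0), False),
    Residue(2, "Leu", (3.8, 0.0, 0.0), True),
    Residue(3, "Val", (7.6, 0.0, 0.0), True),
    Residue(4, "Ala", (20.0, 0.0, 0.0), True),
]
print(sorted(contact_edges(chain)))  # [(1, 2), (1, 3), (2, 3)]
```

Keeping the three views separate means that the same residue set can be mined as a point set, as a labeled graph, or as a table of per-residue properties.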


Grand Challenges: Proteomics

[Figure: pathway diagram running from mitogen signaling to cell proliferation and apoptosis. Node labels: IL-3, IL-3R, IGF1, IGF1R, IRS1, RAS, PI3-K, AKT/PKB, BAD, Bcl-XL, FAS-L, FAS, FADD/MORT, FLICE, ICE, CPP32, Cyclin D1, pRb, E2F, Cyclin E, P53, P21, P16, P27, Cdk4, P107, C-Myc, Bin-1, Max, Mad, Cdc25A, Cdk2, and one node marked "?".]

Part of the biological system in a cell at the molecular level

Source: http://www.ircs.upenn.edu/modeling2001/


References

Christine Orengo, David Jones, Janet Thornton (eds.), Bioinformatics: Genes, Proteins, and Computers, BIOS Scientific Publishers, 2003. ISBN 1-85996-054-5.