Fa 05 CSE182 CSE182-L6: Protein structure basics, Protein sequencing

Announcements

• Midterm 1: Nov 1, in class.
• Assignment 2: Online, due October 20.

Distinguishing between families

Distinguishing between families

Assignment 2

Profiles

• Start with an alignment of strings of length m over an alphabet A.

• Build an $|A| \times m$ matrix $F = (f_{ki})$.

• Each entry $f_{ki}$ represents the frequency of symbol k in position i.
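A minimal Python sketch of the profile construction, assuming the aligned strings are equal-length Python strings and that a gap character, if used, is simply listed in the alphabet. The function name `build_profile` and the toy DNA alignment are illustrative, not taken from the lecture.

```python
from collections import Counter


def build_profile(alignment, alphabet):
    """Return F as a dict of rows: F[k][i] = frequency of symbol k in column i."""
    m = len(alignment[0])
    if any(len(s) != m for s in alignment):
        raise ValueError("every aligned string must have the same length m")
    n = len(alignment)
    profile = {k: [0.0] * m for k in alphabet}
    for i in range(m):
        counts = Counter(s[i] for s in alignment)  # tallies for column i
        for k in alphabet:
            profile[k][i] = counts[k] / n  # symbols outside the alphabet are ignored
    return profile


alignment = ["ACGT", "ACGA", "TCGT", "ACCT"]
F = build_profile(alignment, "ACGT")
print(F["A"])  # [0.75, 0.0, 0.0, 0.25]
```

Each column of F sums to 1 when every aligned symbol belongs to the alphabet.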

Scoring Profiles

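$$S(i, j) = \sum_{k} f_{ki}\, M[r_k, s_j]$$

where $r_k$ is the k-th symbol of the alphabet, $f_{ki}$ is its frequency in profile column i, $s_j$ is the j-th residue of the sequence being scored, and M is the scoring matrix.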
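The formula can be sketched in Python as below, assuming a toy +1/-1 scoring matrix in place of a real substitution matrix and ungapped placement of the profile along the sequence. The helper names `column_score` and `best_ungapped_hit` are hypothetical.

```python
def column_score(profile, i, residue, M):
    """S(i, j) for s_j = residue: sum over k of f_{ki} * M[r_k, s_j]."""
    return sum(freqs[i] * M[r_k][residue] for r_k, freqs in profile.items())


def best_ungapped_hit(profile, seq, M):
    """Slide the profile along seq without gaps; return (best score, start)."""
    m = len(next(iter(profile.values())))
    best = (float("-inf"), -1)
    for start in range(len(seq) - m + 1):
        score = sum(column_score(profile, i, seq[start + i], M) for i in range(m))
        best = max(best, (score, start))
    return best


ALPHABET = "ACGT"
# Toy scoring matrix (+1 match, -1 mismatch); a protein search would use e.g. BLOSUM62.
M = {a: {b: 1 if a == b else -1 for b in ALPHABET} for a in ALPHABET}
# Profile of the alignment ACGT / ACGA / TCGT / ACCT (rows r_k, columns i).
profile = {
    "A": [0.75, 0.0, 0.0, 0.25],
    "C": [0.0, 1.0, 0.25, 0.0],
    "G": [0.0, 0.0, 0.75, 0.0],
    "T": [0.25, 0.0, 0.0, 0.75],
}
print(column_score(profile, 0, "A", M))           # 0.5
print(best_ungapped_hit(profile, "GGACGTGG", M))  # (2.5, 2)
```

A residue that agrees with a conserved column earns nearly the full match score, while agreeing with a rare symbol in that column earns little.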

Psi-BLAST idea

• Multiple alignments are important for capturing remote homology.

• Profile-based scores are a natural way to handle this.

• Q: What if the query is a single sequence?
• A: Iterate:

– Find homologs using BLAST on the query
– Discard very similar homologs
– Align, make a profile, and search with the profile.
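A toy Python sketch of the iteration, assuming ungapped segments of the query's length, a +1/-1 scoring matrix, and a 90% identity cutoff for "very similar". The names `iterative_profile_search` and `max_identity` are hypothetical. The real PSI-BLAST uses gapped BLAST searches, pseudocounts, sequence weighting and E-value inclusion thresholds, none of which is modeled here.

```python
def make_profile(segments, alphabet):
    """Column frequencies f_{ki} of equal-length, ungapped segments."""
    n, m = len(segments), len(segments[0])
    return {k: [sum(s[i] == k for s in segments) / n for i in range(m)] for k in alphabet}


def profile_score(profile, window, M):
    """Ungapped profile score: sum over columns i and symbols k of f_{ki} * M[k][window[i]]."""
    return sum(f[i] * M[k][window[i]] for k, f in profile.items() for i in range(len(window)))


def identity(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)


def iterative_profile_search(query, database, M, alphabet, T, max_identity=0.9, max_iter=5):
    """Toy loop: search with the profile, drop near-duplicates, rebuild the profile, repeat."""
    segments = [query]  # first round: the "profile" is just the query itself
    m = len(query)
    for _ in range(max_iter):
        profile = make_profile(segments, alphabet)
        new = []
        for seq in database:
            for start in range(len(seq) - m + 1):
                window = seq[start:start + m]
                if profile_score(profile, window, M) < T:
                    continue  # not a homolog under the current profile
                if any(identity(window, s) >= max_identity for s in segments + new):
                    continue  # very similar to a segment we already have: discard
                new.append(window)
        if not new:
            break  # converged: no new homologs found
        segments.extend(new)
    return segments


ALPHABET = "ACGT"
M = {a: {b: 1 if a == b else -1 for b in ALPHABET} for a in ALPHABET}
db = ["TTACGTACTT", "GGACGTTCGG"]
print(iterative_profile_search("ACGTAC", db, M, ALPHABET, T=3))  # ['ACGTAC', 'ACGTTC']
```

The exact copy of the query in the first database string is found but discarded as too similar, which is the role of the "discard very similar homologs" step.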

Psi-BLAST speed

• Two time-consuming steps:
1. Multiple alignment of homologs
2. Searching with profiles

• Does the keyword search idea work?

• Multiple alignment:
– Use ungapped multiple alignments only

• Pigeonhole principle again:
– If a profile of length m must score >= T,
– then at least one of its m/l disjoint sub-profiles of length l must score >= lT/m.
– Generate all l-mers that score at least lT/m.
– Search using an automaton.
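Why the filter is lossless: split a window of length m into m/l blocks of l columns. If the block scores sum to at least T, the largest block score is at least T/(m/l) = lT/m, so every true hit contains at least one seed. The Python sketch below assumes l divides m and uses a plain dict of seed l-mers where the slide uses an automaton (an Aho-Corasick automaton would replace the per-position lookup); all function names are hypothetical.

```python
def column_scores(profile, M, alphabet):
    """cs[i][a] = S(i, a) = sum_k f_{ki} * M[k][a]: score of symbol a against column i."""
    m = len(next(iter(profile.values())))
    return [{a: sum(profile[k][i] * M[k][a] for k in profile) for a in alphabet}
            for i in range(m)]


def seeds_for_block(cs, lo, l, threshold):
    """All l-mers scoring >= threshold against columns lo..lo+l-1 (DFS with pruning)."""
    best_rest = [0.0] * (l + 1)  # best_rest[t] = max achievable score of block columns t..l-1
    for t in range(l - 1, -1, -1):
        best_rest[t] = best_rest[t + 1] + max(cs[lo + t].values())
    seeds = []

    def extend(t, prefix, score):
        if score + best_rest[t] < threshold:
            return  # even the best completion cannot reach the threshold
        if t == l:
            seeds.append(prefix)
            return
        for a, s in cs[lo + t].items():
            extend(t + 1, prefix + a, score + s)

    extend(0, "", 0.0)
    return seeds


def filtered_profile_search(profile, M, alphabet, database, T, l):
    """Report (sequence index, start) of every ungapped window scoring >= T."""
    cs = column_scores(profile, M, alphabet)
    m = len(cs)
    if m % l:
        raise ValueError("this sketch assumes l divides m")
    seed_index = {}  # l-mer -> block offsets (a dict stands in for the automaton)
    for off in range(0, m, l):
        for w in seeds_for_block(cs, off, l, l * T / m):
            seed_index.setdefault(w, []).append(off)
    hits = set()
    for idx, seq in enumerate(database):
        for p in range(len(seq) - l + 1):
            for off in seed_index.get(seq[p:p + l], []):
                start = p - off
                if 0 <= start <= len(seq) - m:  # verify the full window
                    if sum(cs[i][seq[start + i]] for i in range(m)) >= T:
                        hits.add((idx, start))
    return sorted(hits)


ALPHABET = "ACGT"
M = {a: {b: 1 if a == b else -1 for b in ALPHABET} for a in ALPHABET}
profile = {k: [1.0 if c == k else 0.0 for c in "ACGTAC"] for k in ALPHABET}
db = ["TTACGTACTT", "GGACGTTCGG"]
print(filtered_profile_search(profile, M, ALPHABET, db, T=4, l=3))  # [(0, 2), (1, 2)]
```

Only windows that share a seed are scored in full, which is where the speedup comes from.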

Protein Domains

• An important realization (in the last decade) is that proteins have a modular architecture of domains/folds.
• Example: the zinc finger domain is a DNA-binding domain.
• What is a domain?
– A part of a sequence that can fold independently and is also present in other sequences

Domain review

• What is a domain?
• How are domains expressed?
– Motifs (regular expressions and others)
– Multiple alignments
– Profiles
– Profile HMMs
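Regular-expression motifs are commonly written in PROSITE pattern syntax: elements separated by "-", "x" for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, and "<" or ">" for terminal anchors. The sketch below converts that subset to a Python `re` pattern; `prosite_to_regex` is a hypothetical helper, and the zinc-finger pattern is quoted from memory of PROSITE entry PS00028, so check it against the database before relying on it.

```python
import re

# One PROSITE-style element: x, a residue, [allowed], or {forbidden}, optionally (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a Python regular expression."""
    parts = []
    for elem in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if elem.startswith("<"):  # anchored at the N-terminus
            prefix, elem = "^", elem[1:]
        if elem.endswith(">"):    # anchored at the C-terminus
            suffix, elem = "$", elem[:-1]
        match = ELEMENT.fullmatch(elem)
        if match is None:
            raise ValueError(f"unsupported element: {elem!r}")
        core, lo, hi = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


zinc_finger = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
print(zinc_finger)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
test_seq = "MK" + "CAAC" + "AAA" + "L" + "A" * 8 + "HAAAH" + "K"  # synthetic sequence
print(bool(re.search(zinc_finger, test_seq)))  # True
```

Motifs like this are easy to search but rigid; the profile-based representations in the list above tolerate more variation.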

Domain databases

Can you speed up HMM search?

A structural view of proteins

CS view of a protein

• >sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).

• MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL
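From the computer-science point of view, the record above is a header line plus a string over the 20-letter amino-acid alphabet. A minimal FASTA reader in Python makes that concrete; `parse_fasta` is an illustrative helper, and the header split assumes the UniProt-style "sp|accession|entry name" layout shown.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


record = """>sp|P00974|BPT1_BOVIN Pancreatic trypsin inhibitor precursor (Basic protease inhibitor) (BPI) (BPTI) (Aprotinin) - Bos taurus (Bovine).
MKMSRLCLSVALLVLLGTLAASTPGCDTSNQAKAQRPDFCLEPPYTGPCKARIIRYFYNAKAGLCQTFVYGGCRAKRNNFKSAEDCMRTCGGAIGPWENL"""

header, seq = parse_fasta(record)[0]
db, accession, rest = header.split("|", 2)  # "sp", accession, "entry name description"
entry_name = rest.split(" ", 1)[0]
assert set(seq) <= set("ACDEFGHIKLMNPQRSTVWY")  # only the 20 standard amino acids
print(accession, entry_name, len(seq), seq.count("C"))  # P00974 BPT1_BOVIN 100 8
```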

Protein structure basics

Side chains determine amino-acid type

• The residues have different properties.
• Aspartic acid (D) and glutamic acid (E) are acidic residues.

Bond angles form structural constraints

Various constraints determine 3d structure

• Constraints
– Structural constraints due to physicochemical properties
– Constraints due to bond angles
– H-bond formation

• Surprisingly, a few conformations are seen over and over again.

Alpha-helix

• 3.6 residues per turn
• H-bonds between the backbone C=O of residue i and the N-H of residue i+4 stabilize the structure.
• First proposed by Linus Pauling
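A short worked calculation shows how the 3.6 residues-per-turn figure fits the hydrogen-bond pattern: each residue advances the helix by a fixed angle, so the residue four positions further along sits only 40° past one full turn, almost directly above its bonding partner.

$$
\frac{360^\circ}{3.6 \text{ residues per turn}} = 100^\circ \text{ per residue},
\qquad
4 \times 100^\circ = 400^\circ = 360^\circ + 40^\circ
$$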

Beta-sheet

• Each strand by itself has 2 residues per turn and is not stable.
• Adjacent strands hydrogen-bond to form stable beta-sheets, either parallel or anti-parallel.
• Beta-sheets are stabilized by long-range interactions (between residues far apart in the sequence), while alpha-helices are stabilized by local interactions.

Domains

• The basic structures (helix, strand, loop) combine to form complex 3D structures.

• Certain combinations are popular: there are many sequences, but only a few folds.

3D structure

• Predicting tertiary structure is an important problem in bioinformatics.

• Premise: clues to structure can be found in the sequence.
• While de novo tertiary structure prediction is hard, there are many intermediate and tractable goals.
• The PDB (Protein Data Bank) is a compendium of structures.

Searching structure databases

• Threading (aligning a sequence onto a known structure) and other 3D alignment methods can be used to align against structures.

• Database filtering is possible through geometric hashing.
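A toy Python sketch of rigid 3D geometric hashing, assuming structures are given as point sets (for example C-alpha coordinates), that three non-collinear points define a local frame, and that coordinates in that frame are quantized with a bin size `q`. All names and parameters here are illustrative; real structure-search tools use more careful invariants, noise handling and verification.

```python
import math
from collections import defaultdict
from itertools import permutations


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)


def local_keys(points, basis, q):
    """Quantized coordinates of the non-basis points in the frame of 3 basis points."""
    p0, p1, p2 = (points[i] for i in basis)
    e1 = sub(p1, p0)
    normal = cross(e1, sub(p2, p0))
    if dot(e1, e1) < 1e-9 or dot(normal, normal) < 1e-9:
        return []  # degenerate (coincident or collinear) basis
    e1, e3 = unit(e1), unit(normal)
    e2 = cross(e3, e1)  # completes a right-handed orthonormal frame
    keys = []
    for j, p in enumerate(points):
        if j not in basis:
            d = sub(p, p0)
            keys.append(tuple(round(dot(d, e) / q) for e in (e1, e2, e3)))
    return keys


def build_table(models, q=1.0):
    """Preprocessing: hash every model point under every ordered basis triple."""
    table = defaultdict(list)
    for model_id, pts in enumerate(models):
        for basis in permutations(range(len(pts)), 3):
            for key in local_keys(pts, basis, q):
                table[key].append((model_id, basis))
    return table


def best_match(table, points, basis=(0, 1, 2), q=1.0):
    """Vote for (model, basis) pairs consistent with one query basis; return the top one."""
    votes = defaultdict(int)
    for key in local_keys(points, basis, q):
        for entry in table.get(key, ()):
            votes[entry] += 1
    return max(votes.items(), key=lambda kv: kv[1]) if votes else None


model_a = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (1, 1, 5), (2, -3, 1), (-2, 2, 2)]
model_b = [(10, 10, 10), (10, 0, 0), (0, 10, 0), (5, 5, -7), (-6, 1, 3), (8, -8, 8)]
table = build_table([model_a, model_b])
# Query: model_a rotated 90 degrees about z and translated; coordinates change, shape does not.
query = [(-y + 5, x - 2, z + 7) for x, y, z in model_a]
(model_id, basis), votes = best_match(table, query)
print(model_id, votes)  # 0 3
```

The frame coordinates do not change under rotation and translation, so a rigidly moved copy of a stored structure hits the same hash bins; candidates with many votes are then passed to a full 3D alignment.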

Trivia Quiz

• What research won the Nobel Prize in Chemistry in 2004?

• In 2002?

How are Proteins Sequenced? Mass Spec 101:

Nobel Citation, 2002

Nobel Citation, 2002

Mass Spectrometry

Sample Preparation

Enzymatic digestion (trypsin) + fractionation
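Trypsin cuts on the C-terminal side of lysine (K) and arginine (R); the common textbook simplification adds that it does not cut when the next residue is proline (P). The in-silico digest below applies that rule; `trypsin_digest` and its `missed_cleavages` parameter are illustrative, and real digests deviate from the rule in both directions.

```python
def trypsin_digest(protein, missed_cleavages=0):
    """In-silico trypsin digest: cut after K or R unless the next residue is P."""
    sites = [i + 1 for i in range(len(protein) - 1)
             if protein[i] in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = []
    for i in range(len(bounds) - 1):
        # allow up to `missed_cleavages` skipped sites inside one peptide
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(protein[bounds[i]:bounds[j]])
    return peptides


print(trypsin_digest("MKRPAGKLR"))                      # ['MK', 'RPAGK', 'LR']
print(trypsin_digest("MKRPAGKLR", missed_cleavages=1))  # ['MK', 'MKRPAGK', 'RPAGK', 'RPAGKLR', 'LR']
```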

Single Stage MS

LC-MS: 1 MS spectrum / second
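A mass spectrometer reports mass-to-charge ratios (m/z), not masses. The sketch below computes the m/z of a peptide ion, assuming monoisotopic masses, unmodified residues, and protonation as the only source of charge; the helper names are illustrative.

```python
# Monoisotopic residue masses in daltons (unmodified residues).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # added once per peptide for the free termini
PROTON = 1.007276   # mass of the charge carrier


def peptide_mass(peptide):
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in peptide) + WATER


def mz(peptide, charge):
    """m/z of the [M + zH]^z+ ion."""
    return (peptide_mass(peptide) + charge * PROTON) / charge


print(round(mz("GG", 1), 4))  # 133.0608
print(round(mz("MK", 2), 4))  # 139.5803
```

The same peptide appears at different m/z values for different charge states, so a spectrum peak has to be interpreted together with its charge.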

Tandem MS

Figure: ionized parent peptide, secondary fragmentation
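When the ionized parent peptide fragments at a peptide bond, the charged N-terminal piece is conventionally called a b ion and the C-terminal piece a y ion. The sketch below lists the singly charged b and y ladders, assuming the same monoisotopic masses and protonation model as a simple m/z calculation; `fragment_ladders` is an illustrative name.

```python
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276


def fragment_ladders(peptide):
    """Singly charged b ions (prefixes) and y ions (suffixes), one per peptide bond."""
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]              # b_1 .. b_{n-1}
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]  # y_1 .. y_{n-1}
    return b, y


b, y = fragment_ladders("GAK")
print([round(v, 4) for v in b])  # [58.0287, 129.0658]
print([round(v, 4) for v in y])  # [147.1128, 218.1499]
```

Differences between consecutive peaks in either ladder equal single residue masses, which is what lets a tandem spectrum be read back as a sequence.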