© Wiley Publishing. 2007. All Rights Reserved. Protein and Specialized Sequence Databases.


Protein and Specialized Sequence Databases

Learning Objectives

Finding out the basics of protein maturation
Deciphering a Swiss-Prot entry
Getting to know specialized protein databases such as KEGG (the metabolic-pathways database) or PDB (the structure database)

Outline

1. Getting from a gene to a mature protein
2. Reading a UniProt/Swiss-Prot entry
3. Exploring metabolic databases such as KEGG
4. Finding out about post-translational modifications

From a Gene to a Functional Protein

Genes (DNA) are transcribed into mRNAs
mRNAs are translated into proteins
Proteins often need to be matured before becoming active
Matured proteins must be transported to their destination
• Cell nucleus
• Mitochondria or other organelle
• Periplasm (bacteria)
• Secreted outside the cell

The protein is functional when it reaches the place where it has to work (just like you and me)!
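The first two steps (transcription, then translation) can be sketched in a few lines of Python. This is a minimal illustration only: the codon table is deliberately truncated to the codons used in the toy gene, and the gene itself is a made-up sequence, not a real one.

```python
# Tiny codon table covering only the codons used in the toy gene below;
# the full genetic code has 64 codons.
CODON_TABLE = {
    "AUG": "M",  # start codon, methionine
    "GCU": "A",  # alanine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*",  # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCTAAATGGTAA"  # hypothetical toy gene
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna, protein)
```

Maturation and transport happen after this point and are not modeled here.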

Protein Maturation

Maturation can involve
• Removal of some fragments
• Specific protein cleavage
• Chemical modifications
• Phosphorylation
• Addition of lipids or sugars (glycosylation)

Knowing Your Protein

To understand how your protein works, you need to know about
• Its maturation
• Its transportation
• Its mechanism of functioning

All this information must be determined experimentally
If it has been done, it’s in Swiss-Prot

The Swiss-Prot Database

Entries describe all proteins that have known functions
Small, non-redundant database: 100,000 entries
• TrEMBL contains the 4 million putative proteins found in GenBank
• Swiss-Prot contains the subset of TrEMBL with a known function

All entries are annotated manually
Most accurate database for protein function
Access Swiss-Prot at www.expasy.ch

Browsing a Swiss-Prot Entry

Find this entry at www.expasy.org/uniprot/P00533

The Main Sections of a Swiss-Prot Entry

General information
• Accession number

References
• Bibliography

Comment section
• Functional information

Cross-references
• Links to entries in other databases

Feature table
• Mapping of every known function

Sequence
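In the plain-text (flat-file) view of an entry, each line starts with a two-letter code that tells you which of these sections it belongs to. The sketch below groups lines by that code. The excerpt is an abbreviated, hand-written sample in the flat-file style, not a verbatim copy of the live P00533 record, and the code-to-section mapping covers only the most common line codes.

```python
from collections import defaultdict

# Two-letter line codes mapped to the sections listed on this slide.
# Only the most common codes are included; anything else falls into "Other".
SECTION_BY_CODE = {
    "ID": "General information", "AC": "General information",
    "DT": "General information", "DE": "General information",
    "GN": "General information", "OS": "General information",
    "OC": "General information",
    "RN": "References", "RP": "References", "RX": "References",
    "RA": "References", "RT": "References", "RL": "References",
    "CC": "Comments",
    "DR": "Cross-references",
    "FT": "Feature table",
    "SQ": "Sequence",
}


def group_by_section(entry_text: str) -> dict:
    """Group the lines of one flat-file entry by the section they belong to."""
    sections = defaultdict(list)
    in_sequence = False
    for line in entry_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("//"):  # end-of-entry marker
            break
        code, content = line[:2], line[5:]
        if code == "SQ":
            in_sequence = True  # residue lines that follow carry no code
        section = "Sequence" if in_sequence else SECTION_BY_CODE.get(code, "Other")
        sections[section].append(content)
    return dict(sections)


# Abbreviated, hand-written sample; check the live entry for real content.
SAMPLE_ENTRY = """
ID   EGFR_HUMAN              Reviewed;        1210 AA.
AC   P00533;
DE   Epidermal growth factor receptor.
OS   Homo sapiens (Human).
RN   [1]
CC   -!- SUBCELLULAR LOCATION: Cell membrane.
DR   MIM; 131550; gene.
FT   TRANSMEM    646    668
SQ   SEQUENCE   1210 AA;
     MRPSGTAGAA LLALLAALCP
//
"""

sections = group_by_section(SAMPLE_ENTRY)
for name, lines in sections.items():
    print(f"{name}: {len(lines)} line(s)")
```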

The General Information in a Swiss-Prot Entry

The Entry Name
• Identifies the entry
• Can change if the entry gets merged

The Primary Accession Number
• Typically has the form PXXXXX (for example, P00533)
• Is permanent and never changes

Last Modified lets you know when the entry was last modified
The Protein Name and Synonyms provide some common names of your protein
The From and Taxonomy fields indicate where the protein comes from
The References section lists all the references used to compile this entry
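Because accession numbers follow a fixed pattern, you can check whether a string looks like one before using it in a query. The regular expression below is the pattern given in the UniProt help pages as far as we know; treat it as an assumption and check the current documentation before relying on it. The function name is ours.

```python
import re

# Accession pattern as published in the UniProt help pages (assumed current).
# First alternative: classic 6-character accessions such as P00533.
# Second alternative: newer 6- or 10-character accessions.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_accession(text: str) -> bool:
    """Return True if the whole string matches the accession pattern."""
    return ACCESSION_RE.fullmatch(text) is not None


for candidate in ["P00533", "EGFR_HUMAN", "P0053"]:
    print(candidate, is_accession(candidate))
```

An entry name such as EGFR_HUMAN fails the check, which is a quick way to tell the two kinds of identifier apart.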

The Comments Section

The Comments section lists all the known functions of the protein
This section is a valuable document compiled manually by specialists
Comments deal with the most standard topics (see table)

Comment Section of the Entry P00533

The Cross-reference Section

Contains hyperlinks to entries in other databases
Automatically updated

Some Important Cross-References

EMBL: GenBank original DNA sequence
PDB: Experimental structure of your protein
DIP: Proteins interacting with your protein
GlycoSuiteDB: Glycosylations
MIM: List of genetic diseases involving your protein
Ontologies: Function of your protein
Profiles: Known protein domains in your protein
ENSEMBL: Genomic location of your protein
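In the flat-file view, each cross-reference is a DR line whose fields are separated by semicolons: the database name first, then the identifier in that database. A minimal sketch of collecting them follows. The identifiers in the sample lines are placeholders invented for illustration, not real database entries.

```python
from collections import defaultdict


def parse_cross_references(entry_lines):
    """Map each database name to the list of identifiers it links to."""
    links = defaultdict(list)
    for line in entry_lines:
        if not line.startswith("DR   "):
            continue
        # Drop the line code and the trailing period, then split on ";".
        fields = [f.strip() for f in line[5:].rstrip().rstrip(".").split(";")]
        database, identifier = fields[0], fields[1]
        links[database].append(identifier)
    return dict(links)


# Placeholder identifiers (hypothetical), written in the DR-line style.
sample_lines = [
    "AC   P00533;",
    "DR   EMBL; X00000; -.",
    "DR   PDB; 1ABC; X-ray.",
    "DR   PDB; 2XYZ; NMR.",
    "DR   MIM; 000000; gene.",
]

links = parse_cross_references(sample_lines)
print(links)
```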

The Features Section

Maps every known function of your protein precisely onto its sequence
TRANSMEM: Transmembrane domain
ACT_SITE: Active sites
BINDING: Binding sites
DISULFID: Disulfide bridge between cysteines
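Each feature is a key plus a pair of sequence positions, so you can use it to cut the matching residues out of the sequence. The sketch below assumes the older flat-file layout, in which key, start, and end are separated by whitespace (current releases write ranges as start..end instead). The toy sequence and feature lines are invented for illustration; note that UniProt flat files spell the disulfide-bond key DISULFID.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    key: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    description: str


def parse_feature_line(line: str) -> Feature:
    """Parse one old-style FT line: key, start, end, optional description."""
    fields = line[5:].split(None, 3)
    description = fields[3] if len(fields) > 3 else ""
    return Feature(fields[0], int(fields[1]), int(fields[2]), description)


def feature_residues(sequence: str, feature: Feature) -> str:
    """Return the residues covered by a feature (positions are 1-based)."""
    return sequence[feature.start - 1:feature.end]


toy_sequence = "MKTLLVAGCSPHYDCE"  # hypothetical 16-residue protein
toy_lines = [
    "FT   TRANSMEM      3      7       Hypothetical.",
    "FT   ACT_SITE     12     12       Hypothetical.",
    "FT   DISULFID      9     15       Hypothetical.",
]

features = [parse_feature_line(line) for line in toy_lines]
for feature in features:
    if feature.key == "DISULFID":
        # For a disulfide bond the two positions are the bonded cysteines.
        pair = toy_sequence[feature.start - 1] + toy_sequence[feature.end - 1]
        print(feature.key, feature.start, feature.end, pair)
    else:
        print(feature.key, feature.start, feature.end,
              feature_residues(toy_sequence, feature))
```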

Finding Out More About Your Protein’s Maturation

Proteins are often modified to make them active
Modification can involve attaching a lipid or a sugar
Use these resources to determine the details of the modification

www.ebi.ac.uk/RESID
• This site details every known post-translational modification

www.glycosuite.com
• A complete database of all known sugars found in proteins

www.lipidbank.jp
• A database of lipids

The Function of Your Protein

The Features and the Comments sections give you valuable functional information

To find out about the function of your protein, you will need to determine
• Where your protein works
• The metabolic pathway in which the protein is involved
• The protein’s 3D structure
• Which protein family it belongs to

You may find this data by following links in the Cross-references section

Where Does Your Protein Work?

Proteins are usually part of a metabolic pathway

A metabolic pathway is like a chain of production linking several different proteins

Metabolic pathways modify metabolites by passing them from one enzyme to the next

On a KEGG pathway map, each enzyme appears with its EC number
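An EC number is four dot-separated fields that go from the broad enzyme class down to one specific activity, so the first field alone tells you what kind of reaction the enzyme catalyzes. A small sketch of reading one follows. The class names are the standard top-level EC classes as we understand them (class 7 was added long after these slides were written), and the function names are ours.

```python
# Top-level EC classes (first field of the EC number).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec_number(ec: str) -> tuple:
    """Split an EC number into four fields; '-' (unassigned) becomes None."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    return tuple(None if part == "-" else int(part) for part in parts)


def enzyme_class(ec: str) -> str:
    """Return the name of the top-level class of an EC number."""
    return EC_CLASSES[parse_ec_number(ec)[0]]


print(parse_ec_number("EC 2.7.10.1"), enzyme_class("EC 2.7.10.1"))
print(parse_ec_number("3.4.-.-"), enzyme_class("3.4.-.-"))
```

Partial numbers such as 3.4.-.- are kept as-is, with the unassigned fields returned as None.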

Some Important Resources for Metabolic Pathways

www.genome.ad.jp/kegg
• KEGG is the most extensive database of metabolic pathways
• You can use it to compare species

www.chem.qmul.ac.uk/iubmb
• The IUBMB assigns the EC numbers used to describe an enzyme activity

www.ecocyc.org
• An exhaustive list of all known metabolic pathways in E. coli and other bacteria

What Is the Structure of Your Protein?

A protein must have the right structure to perform its function
The structure of a protein is the key to understanding it
Predicting protein structures is very difficult
Precise structure determination requires experiments
• X-ray crystallography
• Nuclear magnetic resonance

Prediction from sequence alone is possible but unreliable

Some Databases of Protein Structures

www.rcsb.org/pdb
• The database of protein structures
• A protein’s “PDB” is often synonymous with its structure

www.ncbi.nlm.nih.gov/Structure
• The other home of protein structures

swissmodel.expasy.org
• Prediction of structures from sequences
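A structure file from the PDB is mostly a list of ATOM lines, one per atom, with the fields stored in fixed columns. The sketch below reads the coordinates from two such lines and measures the distance between the atoms. The column positions follow the PDB file format as we understand it, the two sample lines use illustrative coordinates, and the class and function names are ours.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM line using the fixed columns of the PDB format."""
    return Atom(
        serial=int(line[6:11]),           # columns 7-11
        name=line[12:16].strip(),         # columns 13-16
        residue=line[17:20].strip(),      # columns 18-20
        chain=line[21],                   # column 22
        residue_number=int(line[22:26]),  # columns 23-26
        x=float(line[30:38]),             # columns 31-38
        y=float(line[38:46]),             # columns 39-46
        z=float(line[46:54]),             # columns 47-54
    )


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in angstroms."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


# Illustrative coordinates for the first two backbone atoms of a residue.
sample_lines = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38",
]

nitrogen, alpha_carbon = (parse_atom_line(line) for line in sample_lines)
print(nitrogen.name, alpha_carbon.name,
      round(distance(nitrogen, alpha_carbon), 2))
```

Reading by column rather than splitting on spaces matters because neighboring fields can run together when the values are wide.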

Some Important Protein Families

Proteins can be classified into families
This classification is based on both function and sequence
Very specialized databases are available for the most important families

www.kinasenet.org
• Kinases control everything in us; their deregulation is the cause of many cancers

imgt.cines.fr
• Immunoglobulins are key elements of our natural defenses

rebase.neb.com
• This site is a key resource on restriction enzymes

Wrapping It Up

Predicting protein function is a central goal in biology

Protein databases help organize knowledge

They provide the material for
• Developing new biological experiments
• Developing new prediction algorithms
• Extrapolating experimental data to unknown sequences