Telling self from non-self: Learning the language of the Immune System

Rose Hoberman and Roni Rosenfeld

BioLM Workshop May 2003

Understanding the Immune System

The Goals:
- characterize the differences between the languages of self vs. non-self
- explain (and predict) which self proteins (or regions of proteins) are auto-reactive, which proteins are highly allergenic, ...
- create better predictors of immunogenicity

Possible applications:
- vaccine development
- treating auto-immune diseases
- co-opting the immune system for cancer therapy

Focus on T cells

Essential component of the adaptive immune system:
- kill virus-infected cells
- stimulate B cells to produce antibodies
- coordinate the entire adaptive response

Amenable to sequence-based analysis: T cells recognize short amino acid chains

Specificity of T cells

Through a process of DNA rearrangement (V(D)J recombination), each T cell:
- has a unique surface molecule called a T cell receptor (TCR)
- recognizes a unique pattern

A T cell epitope is:
- the region of an antigen capable of eliciting a T cell response
- a short peptide (amino acid chain) derived from a protein antigen

Predicting Epitopes

- Even an immunogenic protein might have only one or a few T cell epitopes
- We have millions of T cells, each of which recognizes only a few patterns
- How can we predict epitopes?

Many proteins are not immunogens

Two Possible Constraints

Machinery for generating and displaying peptides

Many peptides will never even be presented to T cells

Process of maintaining self-tolerance

T cells should not attack cells displaying only peptides derived from self proteins

TCR-MHC-Peptide Binding

Modelling the Peptide Pipeline

Binding and cleavage databases:
- over 10,000 synthetic and pathogen-derived peptides
- ~400 MHC I and II alleles

Prediction methods:
- position-specific probability matrices
- neural networks
- peptide threading

Large amount of data and body of research
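As a rough illustration of the first prediction method listed above, here is a minimal Python sketch of a position-specific scoring matrix: per-position amino acid frequencies are estimated from aligned binding peptides, converted to log-odds against a background distribution, and summed to score a new peptide. The toy peptides, the uniform background, the pseudocount of 1 and the helper names `build_pssm` and `score_peptide` are all assumptions for illustration; real matrices are trained on curated binding data such as the databases mentioned above.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Log-odds position-specific scoring matrix from aligned, equal-length
    peptides, measured against a uniform background over 20 amino acids."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        # Smoothed frequency of each residue at this position, as log-odds.
        pssm.append({aa: math.log((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_peptide(pssm, peptide):
    """Sum of per-position log-odds; higher means more like the training set."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the matrix length")
    return sum(column[aa] for column, aa in zip(pssm, peptide))


# Hypothetical toy "binders" that share L at position 2 and V at position 9.
binders = ["ALAKGTAAV", "SLYDRTAEV", "GLFNKQSSV", "KLTEAWMPV"]
pssm = build_pssm(binders)
print(round(score_peptide(pssm, "ALAKGTAAV"), 2))
print(round(score_peptide(pssm, "PPPPPPPPP"), 2))
```

Summing log-odds treats positions as independent, which is the usual simplifying assumption behind position-specific matrices.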

Two Possible Constraints

Machinery for turning proteins into peptides

Many peptides will never even be presented to T cells

Self-tolerance

T cells should not attack cells displaying only peptides derived from self proteins

Self Tolerance

T cells originate in the bone marrow, then migrate to the thymus, where they mature

Selection of T cells through binding to self MHC-peptide complexes in the thymus

Strong binders are killed (clonal deletion)

Remaining T cells are (usually) no longer self-reactive
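Clonal deletion can be caricatured in code, in the spirit of the negative-selection algorithm used in artificial immune systems research. This Python sketch assumes a toy binding rule (a receptor "binds" a peptide if the two strings agree at `r` or more consecutive positions), random candidate receptors, and a hypothetical handful of self peptides. None of this models real TCR-MHC-peptide chemistry, and the function names are illustrative only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binds(receptor, peptide, r):
    """Toy binding rule: True if the two strings agree at r or more
    consecutive aligned positions (an 'r-contiguous' match)."""
    run = 0
    for a, b in zip(receptor, peptide):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def thymic_selection(self_peptides, n_candidates, length, r, seed=0):
    """Generate random candidate receptors and delete every one that binds
    any self peptide, a caricature of clonal deletion in the thymus."""
    rng = random.Random(seed)
    survivors = []
    for _ in range(n_candidates):
        receptor = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        if not any(binds(receptor, p, r) for p in self_peptides):
            survivors.append(receptor)
    return survivors


# Hypothetical self peptides; a real thymus presents a vast, varied set.
self_peptides = ["ALAKGTAAV", "SLYDRTAEV", "GLFNKQSSV"]
repertoire = thymic_selection(self_peptides, n_candidates=1000, length=9, r=2)
print(len(repertoire), "of 1000 candidates survive deletion")

# A survivor never binds a self peptide, but may still bind a foreign one.
foreign = "WWMMHHCCY"
print(sum(binds(t, foreign, 2) for t in repertoire), "survivors bind", foreign)
```

The point of the toy is the asymmetry the slide describes: deletion is defined only in terms of self, so whatever survives is (by construction) non-self-reactive yet still free to recognize foreign patterns.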

Finding Immunogenic Regions of Proteins

Method 1: learn to predict which peptides will be generated, transported, and bound with MHC molecules

Method 2: learn to discriminate self from non-self and use these models to classify each possible peptide

Related Work

Compositional bias and mimicry toward the nonself proteome in immunodominant T cell epitopes of self and nonself antigens

Ristori G, Salvetti M, Pesole G, Attimonelli M, Buttinelli C, Martin R, Riccio P. FASEB J. 14, 431-438 (2000).

Self-Reactive Protein

Multiple sclerosis (MS) is characterized by the destruction of the myelin sheaths that surround nerve fibers

T cells erroneously attack myelin basic protein (MBP), a major component of the myelin sheath

MBP is well studied; it is known which of its regions are immunogenic

Unigram Models

Ristori et al. created two sets (self/non-self):
1. Human genomes
2. Microbial genomes (bacteria/viruses)

We created three sets:
1. Human
2. Pathogenic bacteria
3. Non-pathogenic bacteria
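Each set above yields a unigram model: a single amino acid frequency distribution. A minimal Python sketch, assuming add-one smoothing over the 20 standard amino acids (the slides do not say how frequencies were smoothed) and short invented fragments standing in for whole proteomes; `train_unigram` is a hypothetical name.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def train_unigram(sequences, pseudocount=1.0):
    """Amino acid frequency distribution with add-`pseudocount` smoothing.
    Non-standard letters (X, B, Z, U, ...) are skipped."""
    counts = Counter(aa for seq in sequences for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


# Invented placeholder fragments; in practice each set would be a whole
# proteome read from FASTA files.
corpora = {
    "human": ["MASLPEKRGTWQ", "MEDSLKPGAFRT"],
    "pathogenic": ["MKKLLIAGAVSV", "MNKIILTGLAAV"],
    "non_pathogenic": ["MSTEKRLAGVDQ", "MTQPEVRLAKSG"],
}
models = {name: train_unigram(seqs) for name, seqs in corpora.items()}
for name, model in models.items():
    print(name, round(model["L"], 3))
```

Smoothing keeps every probability positive, which matters once these distributions appear in the denominator of a ratio.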

A Simple Self/Non-Self Predictor

For each window of size ~7-15:
- calculate the probability that the subsequence was generated by each unigram distribution (running average)
- the ratio of the two probabilities gives a prediction of the degree of expected immune response

Similar to Betty’s segmentation by ratio of short-range/long-range models
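The window-ratio predictor can be sketched in a few lines of Python. This version assumes that the ratio of the two probabilities is taken in log space and divided by the window length (an average per-residue log ratio, which ranks windows of a fixed size exactly as the raw ratio does), and it uses a running sum so the scan is linear in protein length. The toy models, the default window of 9, and the function names are assumptions, not the authors' code.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def unigram(sequences, pseudocount=1.0):
    """Amino acid frequencies smoothed with `pseudocount` (standard letters only)."""
    counts = Counter(aa for s in sequences for aa in s if aa in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}


def window_log_ratios(protein, self_model, nonself_model, window=9):
    """Average per-residue log(P_nonself / P_self) for every window.

    Entry i scores protein[i:i + window]; positive values mean the window
    looks more 'non-self' under the two unigram models. The protein must use
    only the 20 standard one-letter codes.
    """
    if any(aa not in AMINO_ACIDS for aa in protein):
        raise ValueError("non-standard residue in protein")
    per_residue = [math.log(nonself_model[aa] / self_model[aa]) for aa in protein]
    if len(per_residue) < window:
        return []
    running = sum(per_residue[:window])
    scores = [running / window]
    for i in range(window, len(per_residue)):
        # Slide the window one residue: add the new one, drop the oldest.
        running += per_residue[i] - per_residue[i - window]
        scores.append(running / window)
    return scores


# Toy models: "self" is rich in A, "non-self" is rich in W.
self_model = unigram(["AAAAAAAAAA"])
nonself_model = unigram(["WWWWWWWWWW"])
protein = "AAAAAAAAA" + "WWWWWWWWW"
scores = window_log_ratios(protein, self_model, nonself_model, window=9)
print([round(s, 2) for s in scores])
```

Working in log space avoids underflow from multiplying many small probabilities, and the sign of each score directly says which model the window favours.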

Prediction of IP Values (Ristori et al.)

Pathogenic vs. Non-Pathogenic

A Simple Extension

Do amino acid physical and chemical properties have any predictive power?

Bulkiness and hydrophobicity measures give better predictions on MBP than the self/non-self model

But Ristori et al. claim that their predictions are better than any previous work

Question: which existing prediction model works best?
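The slides do not say which hydrophobicity or bulkiness scales were used. As one plausible example, this Python sketch computes a sliding-window average of the Kyte-Doolittle hydropathy index, which could then be scored on MBP in the same way as the self/non-self ratio. The choice of scale, the window size and the function name are assumptions.

```python
# Kyte-Doolittle hydropathy index (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean Kyte-Doolittle value of each length-`window` window; entry i
    covers protein[i:i + window]. Raises KeyError on non-standard letters."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Hypothetical sequence: a hydrophobic stretch followed by a charged one.
profile = hydropathy_profile("IIIIIIIIIRRRRRRRRR", window=9)
print([round(v, 2) for v in profile])
```

Because the profile is indexed the same way as a window-ratio scan, the two per-window scores can be compared position by position against known immunogenic regions.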

Where to Go From Here?

Understand relative performance and strengths/weaknesses:
- self/non-self modelling
- more traditional epitope prediction methods
- how to combine these methods
- what is the right evaluation function?
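The slides leave the evaluation function open. One common candidate, when windows of a well-studied protein such as MBP are labelled as epitope or non-epitope, is the area under the ROC curve: it depends only on how scores rank, so it can compare methods whose outputs live on different scales (log ratios, hydropathy averages, matrix scores). A minimal Python sketch using the Mann-Whitney pairwise formulation; the scores, labels and function name are illustrative assumptions.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic: the chance
    that a randomly chosen positive scores higher than a randomly chosen
    negative, with ties counting one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-window scores and labels (1 = inside a known epitope).
scores = [2.1, 0.3, 1.7, -0.5, 0.3]
labels = [1, 0, 1, 0, 1]
print(roc_auc(scores, labels))
```

The pairwise loop is quadratic, which is fine for a single protein; a rank-based computation would scale better to whole proteomes.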

Future Work

Features:
- higher-level n-grams
- expression level of genes
- exploit the differences between pathogen/non-pathogen as well as self/non-self

Data:
- auto-immune proteins
- epitopes of known pathogens, ...

Modelling:
- more powerful than a simple ratio of probabilities
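One way to read "higher-level n-grams" is a bigram model over amino acids, which conditions each residue on the one before it. This Python sketch assumes add-k smoothing (k = 1) over the 20 standard letters and toy training strings; a bigram self/non-self score for a window `w` would then be `nonself_model.log_prob(w) - self_model.log_prob(w)`, the direct analogue of the unigram ratio. The class and method names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class BigramModel:
    """Amino acid bigram model with add-k smoothing over the 20 standard
    residues: P(b | a) = (c(a, b) + k) / (c(a) + 20k)."""

    def __init__(self, sequences, k=1.0):
        self.k = k
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.context = Counter()  # c(a): number of bigrams that start with a
        for seq in sequences:
            self.unigrams.update(aa for aa in seq if aa in AMINO_ACIDS)
            for a, b in zip(seq, seq[1:]):
                # Skip pairs touching non-standard letters (X, B, Z, ...).
                if a in AMINO_ACIDS and b in AMINO_ACIDS:
                    self.bigrams[(a, b)] += 1
                    self.context[a] += 1
        self.total = sum(self.unigrams.values())

    def log_prob_first(self, aa):
        """Smoothed unigram log-probability for the first residue."""
        v = len(AMINO_ACIDS)
        return math.log((self.unigrams[aa] + self.k) / (self.total + self.k * v))

    def log_prob_next(self, prev, aa):
        """Smoothed log P(aa | prev)."""
        v = len(AMINO_ACIDS)
        return math.log((self.bigrams[(prev, aa)] + self.k)
                        / (self.context[prev] + self.k * v))

    def log_prob(self, peptide):
        """Log-probability of a whole peptide under the bigram chain."""
        total = self.log_prob_first(peptide[0])
        for prev, aa in zip(peptide, peptide[1:]):
            total += self.log_prob_next(prev, aa)
        return total


# Toy training data (hypothetical); real models would use whole proteomes.
self_model = BigramModel(["ACACACAC"])
nonself_model = BigramModel(["WYWYWYWY"])
window = "ACAC"
print(nonself_model.log_prob(window) - self_model.log_prob(window))
```

With 400 possible residue pairs instead of 20 residues, a bigram model needs far more data per set before its estimates stop being dominated by smoothing, which is one reason whole-proteome training sets matter here.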