Yana Safonova - Главнаяbioinformaticsinstitute.ru/sites/default/files/bi... · Safonova and...

Post on 24-Aug-2020


Bioinformatics in immunology

Yana Safonova

Postdoctoral researcher, UCSD

● Immunology 101

● Antibodies and drug design

● Bioinformatics and data science behind the drug design process

● Therapeutic opportunities of vertebrate species

● Human antibodies: did we reveal all their secrets?

● Future development of immunoinformatics

Immunology 101

Adaptive immune system

● The variety of threats to the human body is huge and unpredictable

● The genome is too small to encode defences against all these threats

● The immune system can adapt to various threats using agents (e.g., antibodies) that are not encoded in the genome

● Antibodies are proteins that bind to a specific threat (called an antigen) and cause its neutralization

● The immune system generates millions of different antibodies (a repertoire) to neutralize various antigens

Specificity rule: one antibody – one antigen (not necessarily true)

Antibody

Antigen

Antibodies

Antibody-antigen binding

Fundamentals of Immunology, 1943

Before recombination, the genome of an antibody-producing cell (B cell) looks exactly like genomes of all other cells:

Immunoglobulin locus (Chr 14), length ~1.25 Mb

V: 165-305 nt, avg. 291 nt

D: 11-37 nt, avg. 24 nt

J: 48-63 nt, avg. 54 nt

Generation of antibodies

Selection of J segment...

Left cleavage of J segment...

Selection of D segment...

Right cleavage of D segment...

Newly created unique genomic region

Concatenation of D and J segments...

Left cleavage of DJ fragment...

Selection of V segment...

Right cleavage of V segment...

360 nt of VDJ + 1000 nt of constant region instead of original 1.25 Mb
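The recombination steps listed above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: the function name `recombine_vdj`, the `max_trim` parameter, and the short example segments are hypothetical, and the sketch ignores the random nucleotide insertions that real junctions also receive.

```python
import random


def recombine_vdj(v_genes, d_genes, j_genes, max_trim=3, rng=None):
    """Toy V(D)J recombination: select segments, cleave junction ends, concatenate."""
    rng = rng or random.Random()

    def trim(segment_length):
        # Number of nucleotides to cleave, never removing a whole segment.
        return rng.randint(0, min(max_trim, segment_length - 1))

    # Selection of J segment, then left cleavage of J segment.
    j = rng.choice(j_genes)
    j_cut = j[trim(len(j)):]

    # Selection of D segment, then right cleavage of D segment.
    d = rng.choice(d_genes)
    d_cut = d[:len(d) - trim(len(d))]

    # Concatenation of D and J segments, then left cleavage of the DJ fragment
    # (limited to the D part so the J segment is not touched twice).
    dj = (d_cut + j_cut)[trim(len(d_cut)):]

    # Selection of V segment, then right cleavage of V segment.
    v = rng.choice(v_genes)
    v_cut = v[:len(v) - trim(len(v))]

    # VDJ concatenation: the newly created variable region.
    return v_cut + dj


if __name__ == "__main__":
    # Made-up short segments; real V, D, and J genes are far longer.
    print(recombine_vdj(["AAAAAAAAAA"], ["CCCCCC"], ["GGGGGGGG"], rng=random.Random(1)))
```

Because the cleavage positions are random, the same three segments can yield many different junctions, which is one source of repertoire diversity.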

VDJ concatenation (variable region of antibody)

Constant region

The variable region of an antibody contains the antigen-binding sites

Antibodies are subject to fast evolution

The immune system mutates and amplifies a binding antibody

The mutation rate in antibody genes is 3-4 orders of magnitude higher than in the rest of the genome

One antibody = one antigen

Antibody repertoire is a set of clonal lineages

Antibody repertoire is a set of unknown clonal lineages

Antibodies and drug design

Antibodies are versatile drugs

Sela-Culang et al., Frontiers Immunol, 2013

Antibody based drugs

Santos et al., Braz J Pharm Sci, 2018

Market of monoclonal antibodies

https://www.bptc.com/past-present-future-antibody-products-market-place/

Bioinformatics and data science behind the drug design process

Repertoire sequencing (Rep-Seq) data

[Figure: an antibody gene (~400 nt; V, D, J segments) covered by an Illumina or PacBio read]

Problem 1: constructing antibody repertoire

[Figure: an antibody gene (~400 nt; V, D, J segments) covered by an Illumina or PacBio read]

[Figure panels: Antibody repertoire, Rep-Seq library, Antibody repertoire; multiplicity labels X 2, X 3, X 1, X 1]

A big clustering problem...

Antibody repertoire construction problem: identify reads corresponding to identical antibodies

...or a big error correction problem?

ACGTGATCGAG

ACGTGCTCGAG

? Sequencing error or natural variation?
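The tension between clustering and error correction can be illustrated with a small greedy sketch. It assumes a simple heuristic that is not taken from the talk: reads are sorted by abundance, and a rarer read within `max_dist` mismatches of an abundant one is treated as a sequencing error and absorbed into it. The names `hamming`, `cluster_reads`, and `max_dist` are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_reads(reads, max_dist=1):
    """Greedy clustering: abundant reads become centers, near-identical rare reads are absorbed."""
    counts = Counter(reads)
    centers = {}  # center sequence -> total number of reads assigned to it
    # Process distinct reads from most to least abundant (ties broken alphabetically).
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for center in centers:
            if len(center) == len(seq) and hamming(center, seq) <= max_dist:
                centers[center] += n  # treated as an erroneous copy of the center
                break
        else:
            centers[seq] = n  # no close center found: start a new cluster
    return centers


if __name__ == "__main__":
    reads = ["ACGTGATCGAG"] * 5 + ["ACGTGCTCGAG"] + ["TTTTGATCGAG"] * 2
    print(cluster_reads(reads))
```

Note the design risk that the slide points at: with this rule the single read ACGTGCTCGAG is always merged into ACGTGATCGAG, even if the A/C difference is a genuine somatic mutation rather than a sequencing error.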

Problem 2: reconstructing evolutionary development of antibody repertoire

Problem 2.1: clonal lineage decomposition

1. Decomposing repertoire into clonal lineages

Problem 2.2: evolutionary tree construction

1. Decomposing repertoire into clonal lineages

2. Constructing evolutionary tree for each lineage
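Step 1 can be sketched with a widely used heuristic, stated here as an assumption rather than as the method of the talk: antibodies are grouped by V gene, J gene, and CDR3 length, and within each group sequences whose CDR3s differ in at most a fraction `max_frac` of positions are linked into one lineage (single linkage). The function name, the record layout, and the threshold are hypothetical.

```python
from collections import defaultdict


def clonal_lineages(records, max_frac=0.1):
    """Group (v_gene, j_gene, cdr3) records into lineages; returns lists of record indices."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only records sharing V gene, J gene, and CDR3 length are compared.
    groups = defaultdict(list)
    for idx, (v_gene, j_gene, cdr3) in enumerate(records):
        groups[(v_gene, j_gene, len(cdr3))].append(idx)

    for members in groups.values():
        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                cdr3_a, cdr3_b = records[a][2], records[b][2]
                mismatches = sum(x != y for x, y in zip(cdr3_a, cdr3_b))
                if mismatches <= max_frac * len(cdr3_a):
                    parent[find(a)] = find(b)  # single linkage: merge the two lineages

    lineages = defaultdict(list)
    for idx in range(len(records)):
        lineages[find(idx)].append(idx)
    return sorted(lineages.values())


if __name__ == "__main__":
    records = [
        ("IGHV1-69", "IGHJ4", "TGTGCGAGAGATTACTGG"),
        ("IGHV1-69", "IGHJ4", "TGTGCGAGAGATTATTGG"),
        ("IGHV3-23", "IGHJ4", "TGTGCGAGAGATTACTGG"),
        ("IGHV1-69", "IGHJ4", "TGTACCTCCGGGAAATGG"),
    ]
    print(clonal_lineages(records))
```

The gene names and CDR3 strings in the usage example are made up for illustration. Step 2 would then build a tree separately for the members of each returned lineage.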

Standard phylogenetic algorithms are not applicable to antibody repertoires

[Figure panels: Reverse of homoplasy; Homoplasy]

Rep-Seq based approach to drug development

1. Vaccinating a model organism

2. Capturing and sequencing antibody repertoire

3. Finding specific antibodies

4. Finding antibody drug

V D J

CDR1 CDR2 CDR3

CDRs represent antigen-binding sites

CDRs likely represent antigen-binding sites

Sela-Culang et al., Frontiers Immunol, 2013

CDR grafting

Santos et al., Braz J Pharm Sci, 2018

Sela-Culang et al., Frontiers Immunol, 2013

Long CDR3s in response to HIV and malaria

Pieper et al., Nature, 2017

Additional domain is found in public malaria-specific antibodies

Broadly neutralizing antibodies against HIV are characterized by extremely long CDR3s

Sok et al., Nature, 2017

Therapeutic opportunities of vertebrate species

Cattle antibodies with ultra-long CDR3s

Wang et al., Cell, 2013

Cattle IGH locus

Stanfield et al., Adv Immunol, 2018

IGHD8-2 (the same sequence is shown in reading frames ORF = 1, ORF = 2, and ORF = 3)

GTAGTTGTCCTGATGGTTATAGTTATGGTTATGGTTGTGGTTATGGTTATGGTTGTAGTGGTTATGATTGTTATGGTTATGGTGGTTATGGTGGTTATGGTGGTTATGGTTATAGTAGTTATAGTTATAGTTATACTTACGAATATAC

Cys codons: TGT, TGC

Codons one substitution away from TGC: AGC, CGC, GGC, TAC, TCC, TTC, TGA, TGG

Codons one substitution away from TGT: AGT, CGT, GGT, TAT, TCT, TTT, TGA, TGG
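The codon lists above can be reproduced programmatically, and the same idea gives a rough per-frame profile of IGHD8-2. This is a minimal sketch: the helper names are hypothetical, and mapping the slide's ORF 1 to 3 onto frame offsets 0 to 2 is an assumption.

```python
CYS_CODONS = {"TGT", "TGC"}

# IGHD8-2 sequence as shown on the slide.
IGHD8_2 = (
    "GTAGTTGTCCTGATGGTTATAGTTATGGTTATGGTTGTGGTTATGGTTATGGTTGTAGTGGTTATGATTGTTATGG"
    "TTATGGTGGTTATGGTGGTTATGGTGGTTATGGTTATAGTAGTTATAGTTATAGTTATACTTACGAATATAC"
)


def near_cys_codons():
    """Codons exactly one nucleotide substitution away from a Cys codon (Cys itself excluded)."""
    neighbours = set()
    for cys in CYS_CODONS:
        for pos in range(3):
            for base in "ACGT":
                codon = cys[:pos] + base + cys[pos + 1:]
                if codon not in CYS_CODONS:
                    neighbours.add(codon)
    return neighbours


def cys_profile(seq, frame):
    """Count Cys codons and near-Cys codons in one reading frame (frame offset 0, 1, or 2)."""
    near = near_cys_codons()
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    return {
        "codons": len(codons),
        "cys": sum(c in CYS_CODONS for c in codons),
        "near_cys": sum(c in near for c in codons),
    }


if __name__ == "__main__":
    for frame in range(3):
        print("frame offset", frame, cys_profile(IGHD8_2, frame))
```

A high share of near-Cys codons means that a single somatic mutation can create a new cysteine, which is the intuition behind Cys-induced diversity.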

Cys-induced diversity of cattle antibodies

Wang et al., Cell, 2013

Cattle antibodies with known 3D structure

Stanfield et al., Sci Immunol, 2016

Recognition of HIV

Verkoczy, Adv Immunol, 2017

Alter & Ackerman, Cell, 2014

Conventional antibodies consist of two chains

https://www.10xgenomics.com/solutions/vdj/

https://www.the-scientist.com/modus-operandi/gene-expression-in-a-drop-35068

Camelid antibodies

https://www.abcore.com/blog/llama-antibodies-small-powerful

[Figure labels: VH locus, VHH locus, D, J]

Antibody repertoires from diverse species

Rios et al., Curr Opin Struct Biol, 2015

What species would be next?

Amemiya et al., Nature, 2013

Human antibodies: did we reveal all their secrets?

Tandem CDR3s

A tandem CDR3 is the result of somatic VDDJ recombination

Some tandem CDR3s can be explained by mismatches and additions in a single D gene

Safonova and Pevzner, under review, 2018

Tandem CDR3s reveal duplications

14 datasets

● Known duplication of D5

● Potential novel duplication of D10 and D20

○ Might be indirect evidence of high usage of D10 and D20

10 – 17%, 6 – 18%

Safonova and Pevzner, under review, 2018

Mechanisms of tandem CDR3 formation are poorly understood

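12/23 rule (RSS = heptamer + spacer + nonamer):

V: 7 + 23 + 9 on the right

D: 9 + 12 + 7 on the left, 7 + 12 + 9 on the right

J: 9 + 23 + 7 on the left

V-D joining: possible; D-J joining: possible; V-J joining: forbidden

Tandem CDR3s result from violations of the 12/23 rule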
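The 12/23 rule can be written as a small check: two segments may be joined only if the facing recombination signal sequences have spacers of different lengths, one of 12 and one of 23. The `Segment` class and `can_join` function are hypothetical names for this sketch; the spacer values follow the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    name: str
    left_spacer: Optional[int]   # spacer length of the RSS on the left side, if any
    right_spacer: Optional[int]  # spacer length of the RSS on the right side, if any


# Heavy-chain segments with the spacer lengths shown on the slide.
V = Segment("V", None, 23)
D = Segment("D", 12, 12)
J = Segment("J", 23, None)


def can_join(upstream, downstream):
    """12/23 rule: the two facing RSS spacers must be exactly one 12 and one 23."""
    return {upstream.right_spacer, downstream.left_spacer} == {12, 23}


if __name__ == "__main__":
    for a, b in [(V, D), (D, J), (V, J), (D, D)]:
        verdict = "possible" if can_join(a, b) else "forbidden"
        print(a.name + "-" + b.name + ":", verdict)
```

The D-D case is the one relevant to tandem CDR3s: both facing spacers are 12, so a VDDJ product requires a violation of the rule.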

Another mechanism of VDJ recombination

[Figure labels: D9, D10, Insertion 1, Insertion 2]

Insertion 2 corresponds to sequence between D9 and D10 in the reference IGH locus and thus contains the right RSS of D9 and the left RSS of D10

Safonova and Pevzner, under review, 2018

Immunoglobulin loci have complicated repetitive structure

[Figure legend: Alu; MIR; LINE1; LINE2; retrotransposons; retroviral and other LTRs; DNA transposons; medium frequency repetitive sequences; simple repeats]

Matsuda et al., J Exp Med, 1998

Chromosome recombination is a source of genetic diversity

[Figure: two homologous chromosomes, each carrying V1 V2 V3 V4]

Chromosome recombination is responsible for changing copy numbers (CNVs) of existing genes...

[Figure: two copies of V1 V2 V3 V4 with Alu repeats between the genes; recombination at misaligned Alu repeats yields V1 V3 V4 and V1 V2 V2 V3 V4]
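The copy-number change can be mimicked with list slicing. This is a toy sketch under the assumption that two copies of the locus misalign and exchange their tails at different break points; `unequal_crossover` and its arguments are hypothetical names.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange tails of two gene lists at (possibly different) break points.

    break_a and break_b are the numbers of genes kept from the start of each
    chromosome; unequal values model recombination between misaligned repeats.
    """
    product_1 = chrom_a[:break_a] + chrom_b[break_b:]
    product_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return product_1, product_2


if __name__ == "__main__":
    locus = ["V1", "V2", "V3", "V4"]
    deleted, duplicated = unequal_crossover(locus, locus, break_a=1, break_b=2)
    print(deleted)     # copy that lost V2
    print(duplicated)  # copy that gained a second V2
```

With equal break points both products keep the original gene order, so only misaligned exchanges change the gene count.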

… and creating novel genes

[Figure: recombination between two copies of V1 V2 V3 V4 with break points inside genes produces hybrid genes labeled V3V2 and V2V3]

Chromosome recombination may result in changing IGH structure

[Figure: recombination at Alu repeats between V and D genes produces the layout V1 V2 Alu Alu D1 V2 D1 D2 Alu, relabeled as V1 V2 Alu Alu D1 V3 D2 D3 Alu]

… and possible VDJ recombinations

VDJ recombination is no longer able to recombine V3 and D1

Why is it important: a flu example

● IGHV1-69 is responsible for the formation of bnAbs to the influenza A hemagglutinin

● 14 alleles of IGHV1-69 can be differentiated by the presence of either a phenylalanine (F) or leucine (L) at amino acid position 54

● Replacement of Phe54 by Leu54 has been shown to dramatically reduce binding affinities

Avnir et al., Scientific Reports, 2016

Kidd et al., 2012 reported 18 unique IGHV haplotypes in 9 individuals (unknown ethnicities, SF Bay Area)

Population-based paradigm of vaccination

Watson et al., Trends in Immunology, 2017

Future development of immunoinformatics

Future directions

● Accumulation of Rep-Seq data

● Personalized computation of individual immunoglobulin genes and prediction of disease associations

● Creation of databases of antibodies with known specificity

● Development of computational approaches for antibody folding and binding

● Development of immunoproteogenomics technologies

Acknowledgments

Data Science postdoctoral fellowships, Center for Information Theory and Applications, University of California San Diego

UCSD: Vinnu Bhardwaj, Andrey Bzikadze

Nuno Bandeira, Massimo Franceschetti, Siavash Mirarab, Ramesh Rao

SPSU: Center for Algorithmic Biotechnology

U of Louisville: Corey Watson

USDA: Sung Bong Shin, Tim Smith

Iowa State U: James Reecy

Genentech: Jennie Lill, Wendy Sandoval

Digital Proteomics: Stefano Bonissone, Natalie Castellana

Pavel Pevzner, UCSD

Data science postdoctoral fellowships at UCSD: http://qi.ucsd.edu/dsfellows/