Bioinformatics in immunology
Yana Safonova, Postdoctoral researcher, UCSD
● Immunology 101
● Antibodies and drug design
● Bioinformatics and data science behind the drug design process
● Therapeutic opportunities of vertebrate species
● Human antibodies: did we reveal all their secrets?
● Future development of immunoinformatics
Immunology 101
Adaptive immune system
● The variety of threats to the human body is huge and unpredictable
● The genome is too small to encode defences against all of these threats
● The immune system can adapt to various threats using agents (e.g., antibodies) that are not encoded in the genome
● Antibodies are proteins that bind to a specific threat (called an antigen) and cause its neutralization
● The immune system generates millions of different antibodies (a repertoire) to neutralize various antigens
Specificity rule: one antibody – one antigen (not necessarily true)
[Figure: antibodies, an antigen, and antibody-antigen binding]
Fundamentals of Immunology, 1943
Before recombination, the genome of an antibody-producing cell (B cell) looks exactly like the genomes of all other cells:
Immunoglobulin locus (Chr 14), length ~1.25 Mb
V segments: 165-305 nt (avg. 291 nt)
D segments: 11-37 nt (avg. 24 nt)
J segments: 48-63 nt (avg. 54 nt)
Generation of antibodies
Selection of a J segment
Left cleavage of the J segment
Selection of a D segment
Right cleavage of the D segment
Concatenation of the D and J segments (a newly created, unique genomic region)
Left cleavage of the DJ fragment
Selection of a V segment
Right cleavage of the V segment
Result: 360 nt of VDJ + 1000 nt of constant region instead of the original 1.25 Mb
The VDJ concatenation forms the variable region of the antibody, which contains the antigen-binding sites; the rest of the antibody chain is encoded by the constant region.
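The recombination steps listed above can be sketched as a toy simulation: select a segment, cleave it, concatenate, and repeat. This is a minimal sketch under stated assumptions: the germline sequences are hypothetical toy strings (not real IGHV/IGHD/IGHJ alleles), each cleavage removes a uniform random 0 to `max_trim` nt, and the non-templated nucleotide additions that also occur at real junctions are not modeled. The names `recombine`, `cleave`, and `Recombinant` are illustrative.

```python
import random
from dataclasses import dataclass

# Toy germline segments: hypothetical sequences for illustration only,
# NOT real IGHV/IGHD/IGHJ alleles.
V_GENES = {
    "V1": "CAGGTGCAGCTGGTGCAGTCTGGGGCTGAGGTGAAGAAG",
    "V2": "GAGGTGCAGCTGGTGGAGTCTGGGGGAGGCTTGGTACAG",
}
D_GENES = {"D1": "GGTATAGCAGCAGCTGGTAC", "D2": "GTATTACTATGGTTCGGGGAG"}
J_GENES = {
    "J1": "ACTACTTTGACTACTGGGGCCAGGGAACC",
    "J2": "GCTACTGGTACTTCGATCTCTGGGGCCGT",
}


@dataclass
class Recombinant:
    v: str
    d: str
    j: str
    sequence: str


def cleave(segment: str, left: int, right: int) -> str:
    """Remove `left` nt from the left end and `right` nt from the right end."""
    return segment[left:len(segment) - right]


def recombine(v_genes, d_genes, j_genes, rng: random.Random, max_trim: int = 5) -> Recombinant:
    """Toy VDJ recombination following the step order listed above (no N-additions)."""
    if any(len(d) <= 2 * max_trim for d in d_genes.values()):
        raise ValueError("D segments must be longer than 2 * max_trim")
    j_name = rng.choice(sorted(j_genes))
    j = cleave(j_genes[j_name], rng.randint(0, max_trim), 0)   # left cleavage of J
    d_name = rng.choice(sorted(d_genes))
    d = cleave(d_genes[d_name], 0, rng.randint(0, max_trim))   # right cleavage of D
    dj = cleave(d + j, rng.randint(0, max_trim), 0)            # D-J join, then left cleavage of DJ
    v_name = rng.choice(sorted(v_genes))
    v = cleave(v_genes[v_name], 0, rng.randint(0, max_trim))   # right cleavage of V
    return Recombinant(v_name, d_name, j_name, v + dj)         # V-DJ join


rng = random.Random(42)
for _ in range(3):
    r = recombine(V_GENES, D_GENES, J_GENES, rng)
    print(r.v, r.d, r.j, r.sequence)
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is convenient when comparing downstream tools on simulated repertoires.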
Antibodies are subject to fast evolution
The immune system mutates and amplifies antibodies that bind an antigen
The mutation rate in antibody genes is 3-4 orders of magnitude higher than in the rest of the genome
One antibody = one antigen
An antibody repertoire is a set of clonal lineages, and these lineages are not known in advance
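How mutation and amplification give rise to a clonal lineage can be illustrated with a toy simulation. This is a minimal sketch assuming uniform, independent point substitutions at a fixed per-nucleotide `rate` and a fixed number of children per antibody; real somatic hypermutation has hotspots and is shaped by selection, so `simulate_lineage` and its parameters are hypothetical illustrations only.

```python
import random

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Substitute each nucleotide independently with probability `rate` (toy model)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate_lineage(root: str, generations: int, children_per_node: int,
                     rate: float, seed: int = 0):
    """Return (node_id, parent_id, sequence) tuples for a toy clonal lineage tree."""
    rng = random.Random(seed)
    nodes = [(0, None, root)]
    frontier = [0]
    for _ in range(generations):
        next_frontier = []
        for parent_id in frontier:
            parent_seq = nodes[parent_id][2]
            for _ in range(children_per_node):
                child_id = len(nodes)
                nodes.append((child_id, parent_id, mutate(parent_seq, rate, rng)))
                next_frontier.append(child_id)
        frontier = next_frontier
    return nodes


for node_id, parent_id, seq in simulate_lineage("ACGTACGTACGTACGT", 2, 2, rate=0.05):
    print(node_id, parent_id, seq)
```

In a real repertoire only the sequences are observed; the parent links recorded here are exactly what the clonal lineage and tree reconstruction problems try to recover.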
Antibodies and drug design
Antibodies are versatile drugs
Sela-Culang et al., Frontiers Immunol, 2013
Antibody based drugs
Santos et al., Braz J Pharm Sci, 2018
Market of monoclonal antibodies
https://www.bptc.com/past-present-future-antibody-products-market-place/
Bioinformatics and data science behind the drug design process
Repertoire sequencing (Rep-Seq) data
[Figure: an antibody gene (V, D, and J segments, ~400 nt) covered by an Illumina or PacBio read]
Problem 1: constructing the antibody repertoire
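[Figure: an antibody repertoire (antibodies with multiplicities ×2, ×3, ×1, ×1) is sequenced into a Rep-Seq library, from which the antibody repertoire must be reconstructed]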
A big clustering problem...
Antibody repertoire construction problem: identify reads corresponding to identical antibodies
...or a big error correction problem?
ACGTGATCGAG
ACGTGCTCGAG
Sequencing error or natural variation?
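One simple way to attack Problem 1 is greedy abundance-based clustering: process distinct reads from most to least abundant and fold each read into the first cluster center within `max_dist` substitutions. This is a minimal sketch assuming equal-length reads and Hamming distance; it is a swapped-in toy heuristic, not the method presented in the talk. It also shows the dilemma above: with `max_dist=1`, the read ACGTGCTCGAG is absorbed as a presumed sequencing error even if it is a genuine variant.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(reads, max_dist=1):
    """Toy error correction: fold each distinct read into the first (most abundant)
    cluster center within `max_dist` substitutions; otherwise start a new cluster.
    Returns {center sequence: total number of reads in the cluster}."""
    counts = Counter(reads)
    # Most abundant sequences first; ties broken alphabetically for determinism.
    order = sorted(counts, key=lambda s: (-counts[s], s))
    clusters = {}
    for seq in order:
        for center in clusters:
            if len(center) == len(seq) and hamming(center, seq) <= max_dist:
                clusters[center] += counts[seq]
                break
        else:
            clusters[seq] = counts[seq]
    return clusters


reads = ["ACGTGATCGAG"] * 5 + ["ACGTGCTCGAG"] + ["TTTTTTTTTTT"] * 2
print(greedy_cluster(reads, max_dist=1))  # {'ACGTGATCGAG': 6, 'TTTTTTTTTTT': 2}
```

Setting `max_dist=0` keeps every distinct read as its own antibody, which errs in the opposite direction by treating every sequencing error as natural variation.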
Problem 2: reconstructing the evolutionary development of the antibody repertoire
Problem 2.1: clonal lineage decomposition, i.e., decomposing the repertoire into clonal lineages
Problem 2.2: evolutionary tree construction, i.e., constructing an evolutionary tree for each lineage
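Clonal lineage decomposition can be sketched with one widely used heuristic: group sequences that share the same V gene, J gene, and CDR3 length, then link CDR3s whose Hamming distance is at most a fixed fraction of their length (single linkage via union-find). The 15% threshold, the `Clone` record, and the example CDR3s are assumptions for illustration; this is not the approach presented in the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    name: str
    v_gene: str
    j_gene: str
    cdr3: str  # amino-acid CDR3


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def decompose_into_lineages(clones, max_fraction=0.15):
    """Group by (V, J, CDR3 length), then single-linkage on CDR3 Hamming distance."""
    groups = defaultdict(list)
    for clone in clones:
        groups[(clone.v_gene, clone.j_gene, len(clone.cdr3))].append(clone)

    lineages = []
    for members in groups.values():
        parent = list(range(len(members)))  # union-find forest for this group

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        for i in range(len(members)):
            for k in range(i + 1, len(members)):
                a, b = members[i].cdr3, members[k].cdr3
                if hamming(a, b) <= max_fraction * len(a):
                    parent[find(i)] = find(k)

        components = defaultdict(list)
        for i, clone in enumerate(members):
            components[find(i)].append(clone.name)
        lineages.extend(sorted(names) for names in components.values())
    return sorted(lineages)


# Hypothetical example sequences.
clones = [
    Clone("a", "IGHV1-69", "IGHJ4", "ARDGYSSGWYFDY"),
    Clone("b", "IGHV1-69", "IGHJ4", "ARDGYSSGWYFDV"),
    Clone("c", "IGHV1-69", "IGHJ4", "AKTPRRWLQLYFDY"),
    Clone("d", "IGHV3-23", "IGHJ4", "ARDGYSSGWYFDY"),
]
print(decompose_into_lineages(clones))  # [['a', 'b'], ['c'], ['d']]
```

Single linkage is a deliberate simplification: it chains sequences transitively, which matches how a lineage accumulates mutations, but it can also merge unrelated clones through intermediates.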
Standard phylogenetic algorithms are not applicable to antibody repertoires
[Figure: homoplasy and reverse of homoplasy]
Rep-Seq based approach to drug development
1. Vaccinating a model organism
2. Capturing and sequencing the antibody repertoire
3. Finding specific antibodies
4. Finding an antibody drug
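[Figure: V, D, and J segments of an antibody gene with CDR1, CDR2, and CDR3 marked]
CDR1 and CDR2 are encoded by the V segment, while CDR3 spans the V-D-J junction.
CDRs (complementarity-determining regions) likely represent antigen-binding sites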
Sela-Culang et al., Frontiers Immunol, 2013
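CDR3 is commonly delimited by a conserved Cys near the end of the V segment and a conserved Trp in the J segment (the heavy-chain J motif W-G-x-G). A minimal sketch, assuming an already translated heavy-chain sequence and assuming the conserved Cys sits in a `YYC` or `YFC` context; anchoring on a motif rather than on the nearest Cys matters because long CDR3s can carry cysteines of their own. The function name and the toy sequence are hypothetical; real annotation tools align to germline genes instead.

```python
import re
from typing import Optional

# Simplifying assumptions (hypothetical, not a universal rule):
#  * the conserved Cys closing the V segment appears in a "YYC" or "YFC" context;
#  * the conserved Trp opening the J segment starts the heavy-chain motif W-G-x-G.
# CDR3 is taken as the residues strictly between that Cys and that Trp.
CDR3_PATTERN = re.compile(r"Y[YF]C(?P<cdr3>[A-Z]+?)WG[A-Z]G")


def find_cdr3(heavy_chain_aa: str) -> Optional[str]:
    """Return the CDR3 amino-acid sequence, or None if either anchor is missing."""
    match = CDR3_PATTERN.search(heavy_chain_aa)
    return match.group("cdr3") if match else None


# Toy heavy-chain variable region (hypothetical sequence for illustration).
toy = ("EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMSWVRQAPGKGLEWVSAISGSGGSTYYADSVKGRFTIS"
       "RDNSKNTLYLQMNSLRAEDTAVYYCARDGYSSGWYFDYWGQGTLVTVSS")
cdr3 = find_cdr3(toy)
print(cdr3, len(cdr3))  # ARDGYSSGWYFDY 13
```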
CDR grafting
Santos et al., Braz J Pharm Sci, 2018
Sela-Culang et al., Frontiers Immunol, 2013
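CDR grafting transplants the CDRs of a donor antibody (for example, one found in a vaccinated model organism) into the framework of an acceptor antibody. A minimal string-level sketch, assuming CDR boundaries are already known as half-open `(start, end)` intervals listed in the same order for both sequences; `graft_cdrs` and the toy sequences are hypothetical. In practice, grafted antibodies often need further framework adjustments to restore binding.

```python
def graft_cdrs(acceptor, acceptor_cdrs, donor, donor_cdrs):
    """Replace each CDR of `acceptor` by the corresponding CDR of `donor`.

    CDRs are given as half-open (start, end) intervals, in order, non-overlapping.
    """
    if len(acceptor_cdrs) != len(donor_cdrs):
        raise ValueError("acceptor and donor must have the same number of CDRs")
    pieces, prev_end = [], 0
    for (a_start, a_end), (d_start, d_end) in zip(acceptor_cdrs, donor_cdrs):
        pieces.append(acceptor[prev_end:a_start])  # acceptor framework region
        pieces.append(donor[d_start:d_end])        # donor CDR
        prev_end = a_end
    pieces.append(acceptor[prev_end:])             # last acceptor framework region
    return "".join(pieces)


# Toy example: uppercase = framework, lowercase = CDRs (hypothetical sequences).
acceptor = "HHHxxHHHyyyHHH"
donor = "MMddMMMMeeMM"
print(graft_cdrs(acceptor, [(3, 5), (8, 11)], donor, [(2, 4), (8, 10)]))  # HHHddHHHeeHHH
```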
Long CDR3s in response to HIV and malaria
Pieper et al., Nature, 2017
An additional domain is found in public malaria-specific antibodies
Broadly neutralizing antibodies against HIV are characterized by extremely long CDR3s
Sok et al., Nature, 2017
Therapeutic opportunities of vertebrate species
Cattle antibodies with ultra-long CDR3s
Wang et al., Cell, 2013
Cattle IGH locus
Stanfield et al., Adv Immunol, 2018
IGHD8-2
IGHD8-2 sequence, read in reading frames (ORF) 1, 2, and 3:
GTAGTTGTCCTGATGGTTATAGTTATGGTTATGGTTGTGGTTATGGTTATGGTTGTAGTGGTTATGATTGTTATGGTTATGGTGGTTATGGTGGTTATGGTGGTTATGGTTATAGTAGTTATAGTTATAGTTATACTTACGAATATAC
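The three reading frames of IGHD8-2 can be compared by translating the sequence above with the standard genetic code. A minimal sketch, assuming frames are numbered 1 to 3 by the offset (0, 1, 2 nt) of the first codon and that only complete codons are translated; the codon table is built from the standard TCAG codon ordering. Under this convention, frame 2 starts with a stop codon, while frame 3 has no stops and is rich in Gly, Tyr, Ser, and Cys.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

IGHD8_2 = "GTAGTTGTCCTGATGGTTATAGTTATGGTTATGGTTGTGGTTATGGTTATGGTTGTAGTGGTTATGATTGTTATGGTTATGGTGGTTATGGTGGTTATGGTGGTTATGGTTATAGTAGTTATAGTTATAGTTATACTTACGAATATAC"


def translate(seq: str, frame: int) -> str:
    """Translate complete codons starting at offset frame - 1 (frame is 1, 2, or 3)."""
    start = frame - 1
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(start, len(seq) - 2, 3))


for frame in (1, 2, 3):
    protein = translate(IGHD8_2, frame)
    print(f"frame {frame}: {protein}  (Cys: {protein.count('C')}, stops: {protein.count('*')})")
```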
Cys codons: TGT, TGC
Codons one nucleotide substitution away from a Cys codon: AGC, CGC, GGC, TAC, TCC, TTC, TGA, TGG, AGT, CGT, GGT, TAT, TCT, TTT
Cys-induced diversity of cattle antibodies
Wang et al., Cell, 2013
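A short check that enumerates every codon exactly one nucleotide substitution away from a Cys codon (TGT or TGC), together with the amino acid it encodes. This makes the Cys-induced diversity idea concrete: the Gly (GGT), Tyr (TAT), and Ser (AGT) codons that dominate the Gly/Tyr/Ser-rich reading frame of IGHD8-2 can each become TGT (Cys) through a single substitution. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
CYS_CODONS = ("TGT", "TGC")


def single_substitution_neighbors(codon: str):
    """Yield every codon that differs from `codon` at exactly one position."""
    for pos in range(3):
        for base in "ACGT":
            if base != codon[pos]:
                yield codon[:pos] + base + codon[pos + 1:]


def cys_neighbors():
    """Codons one substitution away from TGT or TGC, excluding the Cys codons themselves."""
    neighbors = {n for cys in CYS_CODONS for n in single_substitution_neighbors(cys)}
    return sorted(neighbors - set(CYS_CODONS))


for codon in cys_neighbors():
    print(codon, CODON_TABLE[codon])
```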
Cattle antibodies with known 3D structure
Stanfield et al., Sci Immunol, 2016
Recognition of HIV
Verkoczy, Adv Immunol, 2017
Alter & Ackerman, Cell, 2014
Conventional antibodies consist of two types of chains, heavy and light
https://www.10xgenomics.com/solutions/vdj/ https://www.the-scientist.com/modus-operandi/gene-expression-in-a-drop-35068
Camelid antibodies
https://www.abcore.com/blog/llama-antibodies-small-powerful
[Figure: camelid IGH locus with a VH locus, a VHH locus, and D and J genes]
Antibody repertoires from diverse species
Rios et al., Curr Opin Struct Biol, 2015
Which species will be next?
Amemiya et al., Nature, 2013
Human antibodies: did we reveal all their secrets?
Tandem CDR3s
A tandem CDR3 is the result of somatic VDDJ recombination
Some tandem CDR3s can be explained by mismatches and additions in a single D gene
Safonova and Pevzner, under review, 2018
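A simplified sketch of how a candidate tandem (VDDJ) CDR3 might be flagged: mark the CDR3 positions covered by exact k-mers of each D gene, merge them into runs, and report two non-overlapping D hits. It assumes exact k-mer seeds (k = 8) and uses hypothetical toy D genes. As noted above, some apparent tandem CDR3s are explained by mismatches and additions in a single D gene, which this exact-match sketch does not model; it is not the method of Safonova and Pevzner.

```python
# Hypothetical toy D genes (NOT real IGHD sequences).
TOY_D_GENES = {"D1": "AGGATATTGTAGTGGTGG", "D2": "CCCTGACTACGGTGACTAC"}


def d_segments(cdr3: str, d_genes: dict, k: int = 8):
    """Return sorted (start, end, d_name) runs of CDR3 positions covered by exact D-gene k-mers."""
    hits = []
    for name, d in d_genes.items():
        kmers = {d[i:i + k] for i in range(len(d) - k + 1)}
        covered = [False] * len(cdr3)
        for i in range(len(cdr3) - k + 1):
            if cdr3[i:i + k] in kmers:
                for j in range(i, i + k):
                    covered[j] = True
        start = None
        for i, is_covered in enumerate(covered + [False]):  # sentinel closes the last run
            if is_covered and start is None:
                start = i
            elif not is_covered and start is not None:
                hits.append((start, i, name))
                start = None
    return sorted(hits)


def tandem_d_pair(cdr3: str, d_genes: dict, k: int = 8):
    """Return the names of the first two non-overlapping D hits (a candidate VDDJ), or None."""
    hits = d_segments(cdr3, d_genes, k)
    for a in range(len(hits)):
        for b in range(a + 1, len(hits)):
            if hits[a][1] <= hits[b][0]:
                return hits[a][2], hits[b][2]
    return None


cdr3 = "GCGAGA" + "GGATATTGTAGTGG" + "TCC" + "CTGACTACGGTGAC" + "TTTGACTAC"
print(d_segments(cdr3, TOY_D_GENES))     # [(5, 21, 'D1'), (21, 38, 'D2')]
print(tandem_d_pair(cdr3, TOY_D_GENES))  # ('D1', 'D2')
```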
Tandem CDR3s reveal duplications
14 datasets
● Known duplication of D5
● Potential novel duplication of D10 and D20
○ Might be indirect evidence of high usage of D10 and D20
(values shown in the figure: 10–17% and 6–18%)
Safonova and Pevzner, under review, 2018
Mechanisms of tandem CDR3 formation are poorly understood
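12/23 rule: each gene segment is flanked by a recombination signal sequence (RSS) consisting of a heptamer (7 nt), a spacer (12 or 23 nt), and a nonamer (9 nt):
V: 7 + 23 + 9 (right RSS)
D: 9 + 12 + 7 (left RSS) and 7 + 12 + 9 (right RSS)
J: 9 + 23 + 7 (left RSS)
Only RSSs with different spacer lengths (12 and 23) can be joined, so V-D and D-J joins are possible, while a direct V-J join (23 and 23) is forbidden.
A D-D join pairs two 12-nt spacers, so tandem CDR3s are results of violations of the 12/23 rule.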
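The 12/23 rule stated above can be encoded as a lookup of spacer lengths: two segments can be joined only if the right RSS of the upstream segment and the left RSS of the downstream segment have different spacers (12 and 23 nt). A minimal sketch for the IGH layout described here; the dictionary keys and the `can_join` name are hypothetical.

```python
# Spacer lengths (nt) of the RSSs flanking IGH gene segments:
# V has a right RSS with a 23-nt spacer, D has 12-nt spacers on both sides,
# J has a left RSS with a 23-nt spacer.
RSS_SPACER = {
    ("V", "right"): 23,
    ("D", "left"): 12,
    ("D", "right"): 12,
    ("J", "left"): 23,
}


def can_join(upstream: str, downstream: str) -> bool:
    """True if the upstream segment's right RSS and the downstream segment's left RSS obey 12/23."""
    left = RSS_SPACER.get((upstream, "right"))
    right = RSS_SPACER.get((downstream, "left"))
    if left is None or right is None:
        return False
    return {left, right} == {12, 23}


for pair in [("V", "D"), ("D", "J"), ("V", "J"), ("D", "D")]:
    print(pair, can_join(*pair))
```

The `("D", "D")` case returns `False`, which is why a tandem CDR3 built from two D genes implies that the rule was violated.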
Another mechanism of VDJ recombination
[Figure: a tandem CDR3 containing D9 and D10 together with insertion 1 and insertion 2]
Insertion 2 corresponds to the sequence between D9 and D10 in the reference IGH locus and thus contains the right RSS of D9 and the left RSS of D10
Safonova and Pevzner, under review, 2018
Immunoglobulin loci have a complicated repetitive structure
[Figure legend, repeat classes: Alu, MIR, LINE1, LINE2, retrotransposons, retroviral and other LTRs, DNA transposons, medium-frequency repetitive sequences, simple repeats]
Matsuda et al., J Exp Med, 1998
Chromosome recombination is a source of genetic diversity
[Figure: recombination between two copies of a V1 V2 V3 V4 region]
Chromosome recombination can change the copy numbers (CNVs) of existing genes...
[Figure: Alu-mediated recombination between two V1 V2 V3 V4 copies yields V1 V3 V4 (V2 deleted) and V1 V2 V2 V3 V4 (V2 duplicated)]
… and creating novel genes
[Figure: recombination within V genes of V1 V2 V3 V4 produces hybrid V2V3 and V3V2 genes]
Chromosome recombination may change the structure of the IGH locus
[Figure: Alu-mediated rearrangement of a locus containing V1, V2, D1, and D2, resulting in V1 V2 Alu Alu D1 V2 D1 D2 Alu]
… and possible VDJ recombinations
[Figure: rearranged loci V1 V2 Alu Alu D1 V2 D1 D2 Alu and V1 V2 Alu Alu D1 V3 D2 D3 Alu]
VDJ recombination is no longer able to recombine V3 and D1
Why is it important: an influenza example
● IGHV1-69 is responsible for the formation of bnAbs (broadly neutralizing antibodies) to the influenza A hemagglutinin
● 14 alleles of IGHV1-69 can be differentiated by the presence of either a phenylalanine (F) or a leucine (L) at amino acid position 54
● Replacement of Phe54 by Leu54 has been shown to dramatically reduce binding affinities
Avnir et al, Scientific Reports, 2016
Kidd et al., 2012 reported 18 unique IGHV haplotypes in 9 individuals (unknown ethnicities, SF Bay Area), i.e., all 18 haplotypes carried by these 9 diploid individuals were distinct
Population-based paradigm of vaccination
Watson et al, Trends in Immunology, 2017
Future development of immunoinformatics
Future directions
● Accumulation of Rep-Seq data
● Personalized inference of individual immunoglobulin genes and prediction of disease associations
● Creation of databases of antibodies with known specificity
● Development of computational approaches for antibody folding and binding
● Development of immunoproteogenomics technologies
Acknowledgments
Data Science postdoctoral fellowships, Center for Information Theory and Applications, University of California San Diego
UCSD: Vinnu Bhardwaj, Andrey Bzikadze, Nuno Bandeira, Massimo Franceschetti, Siavash Mirarab, Ramesh Rao
SPSU: Center for Algorithmic Biotechnology
U of Louisville: Corey Watson
USDA: Sung Bong Shin, Tim Smith
Iowa State U: James Reecy
Genentech: Jennie Lill, Wendy Sandoval
Digital Proteomics: Stefano Bonissone, Natalie Castellana
Pavel Pevzner, UCSD
Data science postdoctoral fellowships at UCSD: http://qi.ucsd.edu/dsfellows/