IMGT tools for Interpretation of IG and TR sequences and NGS repertoires: IMGT/HighV-QUEST and IMGT/StatClonotype
Véronique Giudicelli
IMGT®, the international ImMunoGeneTics information system®
Université de Montpellier, CNRS IGH, Montpellier, France
6th IgCLL Educational Workshop Uppsala, Sweden 22-23 September 2016
http://www.imgt.org
• 7 databases
• 17 online tools
• 15,000 web pages
• Sequences
• Genes
• Structures
• Immunoglobulins (IG) (or antibodies)
• T cell receptors (TR)
• MH
• IgSF and MhSF
IMGT®: the international ImMunoGeneTics information system® (http://www.imgt.org)
Created in 1989
IMGT Directors: Pr. Marie-Paule Lefranc (founder), Pr. Sofia Kossida
IMGT-ONTOLOGY seven axioms:
To share, reuse and represent knowledge in Immunogenetics and Life Sciences
• IDENTIFICATION
• CLASSIFICATION
• DESCRIPTION
• NUMEROTATION
• LOCALIZATION
• ORIENTATION
• OBTENTION
Giudicelli and Lefranc, Bioinformatics (1999), Giudicelli and Lefranc, Front. Genet. (2012)
Concepts of IDENTIFICATION : IMGT standardized keywords
IG-Heavy IG-Light-Lambda/IG-Light-Kappa
Concepts of CLASSIFICATION
1. Immunoglobulin (IG) and T cell receptor (TR) genes
2. Nomenclature for the V, D, J and C genes
3. Group, subgroup, gene, allele
4. IMGT gene names approved by HUGO Gene Nomenclature Committee (HGNC) in 1999, by NCBI in 2000
5. IMGT alleles validated by IUIS/IMGT‐NC
IMGT/GENE‐DB: international reference database for IG and TR genes (direct links from NCBI Gene) and alleles.
Lefranc et al. Dev. Comp. Immunol. (2003)
Concepts of NUMEROTATION
1. IMGT unique numbering
- conserved AA (and codons) always at the same positions: 23 1st-CYS, 41 CONSERVED-TRP, 89 hydrophobic, 104 2nd-CYS, 118 J-PHE or J-TRP
- six anchors: delimitation of the FR-IMGT and CDR-IMGT
- CDR-IMGT lengths, e.g. [8.10.12], are crucial information
2. IMGT Collier de Perles (first one in 1997)
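A minimal sketch of how the unique numbering can be used in practice: because gaps (dots) keep every residue at its IMGT position, a gapped V-REGION can be cut into FR-IMGT and CDR-IMGT by position alone. The boundaries used here (FR1 1-26, CDR1 27-38, FR2 39-55, CDR2 56-65, FR3 66-104) are the standard IMGT V-REGION delimitations, stated from general knowledge rather than from these slides. The function names are hypothetical, and the example is sequence seq2-1 from the gapped amino acid file shown in this presentation.

```python
# IMGT unique numbering boundaries for a V-REGION (1-based, inclusive).
# Assumption: standard IMGT delimitations, not taken from the slides.
REGIONS = {
    "FR1-IMGT": (1, 26),
    "CDR1-IMGT": (27, 38),
    "FR2-IMGT": (39, 55),
    "CDR2-IMGT": (56, 65),
    "FR3-IMGT": (66, 104),
}

# Conserved amino acids named on the slides: 1st-CYS, CONSERVED-TRP, 2nd-CYS.
CONSERVED = {23: "C", 41: "W", 104: "C"}


def split_regions(gapped_aa):
    """Cut a gapped amino acid V-REGION into its FR-IMGT and CDR-IMGT."""
    return {name: gapped_aa[start - 1:end] for name, (start, end) in REGIONS.items()}


def cdr_length(region):
    """CDR-IMGT length is the number of residues, gaps (dots) excluded."""
    return len(region.replace(".", ""))


def check_conserved(gapped_aa):
    """Report, per conserved position, whether the expected residue is present."""
    return {pos: gapped_aa[pos - 1] == aa for pos, aa in CONSERVED.items()}


# seq2-1 from the gapped AA file, written region by region.
gapped = (
    "QIQLVQSGG.GLVKPGGSLRLSCAAS"                 # FR1-IMGT
    "GFSF....SSYT"                               # CDR1-IMGT
    "VNWVRQAPGKGLEWVSS"                          # FR2-IMGT
    "ISSR..STSI"                                 # CDR2-IMGT
    "HYADSVK.GRFTISRDNAKNSLYLQMNSLRAEDTAVYFC"    # FR3-IMGT
)

regions = split_regions(gapped)
cdr_lengths = [cdr_length(regions["CDR1-IMGT"]), cdr_length(regions["CDR2-IMGT"])]
conserved_ok = check_conserved(gapped)
print(cdr_lengths, conserved_ok)
```

The CDR3-IMGT length is not computed here because the example stops at position 104; it would come from the junction analysis.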
Concepts of DESCRIPTION: IMGT labels
Example: Prototype and IMGT Labels for a rearranged V‐D‐J‐GENE in gDNA
IMGT labels are in capital letters
Two IMGT tools for interpretation of IG and TR sequences and NGS repertoires:
• IMGT/HighV-QUEST
• IMGT/StatClonotype
IMGT/HighV-QUEST
• High throughput version of IMGT/V-QUEST
• Analyses the IG and TR rearranged sequences from NGS
• On the Web since 2010
• Freely available for academics (user registration required)
=> 2 modules
IMGT/HighV-QUEST
1. Sequence analysis
• Analyses up to 500,000 sequences per batch
• Deals with sequences from Roche 454, Illumina, Ion Torrent, PacBio
For each sequence of the batch:
• Identifies the germline V, D and J genes and alleles,
• Characterizes the nucleotide mutations and amino acid changes
• Analyses the V‐(D)‐J junction (IMGT/JunctionAnalysis)
• Full annotation of the V‐DOMAIN
Alamyar et al. Immunome Res. (2012), Giudicelli et al. Autoimmun Infec Dis. (2015)
1. IMGT/HighV-QUEST sequence analysis
IMGT/V-QUEST results in 11 CSV files
(1 line per analysed sequence, ~500 columns per line)
1_Summary CSV file equivalent to Results summary of online IMGT/V-QUEST
CSV files 2 to 5 for the description of the V-DOMAIN with IMGT labels
Sequence ID  FR1-IMGT  CDR1-IMGT  FR2-IMGT
seq1   cagatccagctggtgcagtctggggga...ggcctggtcaagccgggggggtccctgagactctcctgtgcagcctct  ggattcagtttc............agcagctatacc  gtgaactgggtccgccaggct
seq2   caagtgcagctgttggagtctggggga...ggtgtggtacggcctggggggtccctgagactctcctgtgcagcctct  ggattcatcttt............gatgattttggc  atgacgtgggtccgccaagtt
seq3   caggtgcagctacaggagtggggccca...ggtctggcgaggccttcggggaccctgtctctcacctgcagtgtctct  ggtggctccattagt......ggaaccagtcactac  tggggctggatccgccagccc
seq4   caggtgcagctggtgcagtctggggga...ggcgtggtccagcctgggaggtccctgagactctcctgtgaagtctct  ggaatcaccttc............aagggctatcct  atgcactgggtccgccaggct
seq5   gaggtgaagctgatggaatctggggga...ggcgtggtccagcctgggaggtccctgaggctctcctgtgcagcctct  ggattcagattc............agcacttatgct  atccactgggtccgccaggct
seq6   caggtgcagctggtggagtctggggga...ggcgtggtccagcctgggaggtccctgagactctcctgtatagcctct  ggattcaccttc............agtagctatcct  atgacctgggtccgccaggct
seq7   caggtgcagctggtgcagtctggggca...gaggtgaaaaagcccggggagtctctgaaaatctcctgtaagggttct  gggtacagcttt............gccaaccactgg  atcgcctgggtgcgccagatg
seq8   .......................ccca...gggctggtgaggccttcacagaccctgtccctcacatgcactgtctct  ggtggcaccatcaaa......agtggtggttattgc  tggacctggatccgccagctc
seq9   .......................cccg...gggctggtgaggccttcacagaccctgtccctcacatgcactgtctct  ggtggcaccatcaaa......agtggtggttattgc  tggacctggatccgccagcac
seq10  .......................ccca...gggctggtgaggccttcacagaccctgtcgctcacatgcactgtctct  ggtggcaccatcaaa......agtggtggttattgc  tggacctggatccgccagctc
seq11  .......................ggga...ggcttggtacagccgggggggtccctgagactctcgtgtacaggctct  ggattcattttt............agcagctttgcc  atgagttgggtccgccaggct
seq12  .......................agca...gaggtgaaaaagcccggggagtctctgcagatctcctgcaagacttct  ggatacattttt............caaagttattgg  atcacctgggtgcgccagacg
seq13  .......................ggga...ggcttggtacagcctggggggtccctgagactctcctgtgcagcctct  ggattcaccttt............agttactatggc  atgacctgggtccgccaggct
seq14  gaggtggaattggtggagtctggggga...ggcttggcacagccgggggggtccctgagactctcctgtgaagcctct  ggattcagattg............atcaactatgcc  gttaactgggtccgccaggct
seq15  gaggtgcgcttggaggagtctggggga...gacttcgtacagcctggagggtccctgcgactctcctgtgcagtctct  gggttcgccttc............agtcgatatgaa  ataagttgggtccgccaggcc
seq16  gtcgtggagattgtggagtctggggga...ggcgtggtgcaacctgggacgtccctgagactctcctgttcagcgtcg  ggattcaccttc............agaaattctgcc  atgtattgggtccgccaggct
seq17  caggggcagctggtgcagtcgggggga...ggcgtggtccagcctgggaggtccctgagactctcctgtgaagcgtct  gggttttccttc............aagttctttaac  atgcactgggtccgccaggct
seq18  cgacagcagttggtggagtctggggga...aatgtggtccagcccgggacgtccctgagactctcctgcgtggcctca  ggtctcgacttc............agaaaatatggc  ttgcattggctccgccagact
seq19  caggtccgattacaggagtcgggccca...gggctcgtgaagccctcacaaaccctgtccctcacctgcagtgtctcc  ggtgaccccctctat......gatagtcatcactac  tgggcctggatccgccagcag
CSV file 2_IMGT-gapped-nt-sequences
CSV file 4_IMGT-gapped-AA-sequences
Sequence ID  FR1-IMGT  CDR1-IMGT  FR2-IMGT  CDR2-IMGT  FR3-IMGT
seq2-1   QIQLVQSGG.GLVKPGGSLRLSCAAS  GFSF....SSYT  VNWVRQAPGKGLEWVSS  ISSR..STSI  HYADSVK.GRFTISRDNAKNSLYLQMNSLRAEDTAVYFC
seq2-2   QVQLLESGG.GVVRPGGSLRLSCAAS  GFIF....DDFG  MTWVRQVPGKGLEWVSG  INWN..GGRT  GYADSVK.GRFTISRDDAKNSLYLQMNSLRAEDTALYYC
seq2-3   QVQLQEWGP.GLARPSGTLSLTCSVS  GGSIS..GTSHY  WGWIRQPPGKGLEWIGS  IYFS...GAT  HYNPSLK.SRVTINVDTSNNQFSLNLRSMTAADTAVYYC
seq2-4   QVQLVQSGG.GVVQPGRSLRLSCEVS  GITF....KGYP  MHWVRQAPGKGLEWVAV  ISND..GRNE  DYADSVK.GRFTISRDNSNNTLYLQMNSLTAEDTALYFC
seq2-5   EVKLMESGG.GVVQPGRSLRLSCAAS  GFRF....STYA  IHWVRQAPGKGLEWVAR  ISHD..GSQT  HYADSVQ.GRFGVSRDNSNYTAYVQLNSLRPDDTAVYFC
seq2-6   QVQLVESGG.GVVQPGRSLRLSCIAS  GFTF....SSYP  MTWVRQAPGKGLEWVAS  ISYD..GSYK  YKVDSMK.GRLTISRDNSKNTLYLEMNSLTAEDTAVYYC
seq2-7   QVQLVQSGA.EVKKPGESLKISCKGS  GYSF....ANHW  IAWVRQMPGKGLEWMGI  FNPD..NSDT  TYSPSFQ.GQVTFSADKSISIAYLHWSSLKASDTAIYYC
seq2-8   ........G.GLVQPGGSLRLSCTGS  GFIF....SSFA  MSWVRQAPGKGLEWVSG  ISAS..GGST  LSADLMK.GRFTISRDNSKNTVYLQMDSLRAEDTAVYYC
seq2-9   ........A.EVKKPGESLQISCKTS  GYIF....QSYW  ITWVRQTPGKGLEWMGI  IFPG..DSET  RYSPSFE.GQVSISVDESIDTAYLQWRSLEASDTAIYFC
seq2-10  ........G.GLVQPGGSLRLSCAAS  GFTF....SYYG  MTWVRQAPGKGLEWVSH  IGAS..GVTT  YNADSVK.GRFTISRENSKNTLYLEMNSLRVEDTAIYYC
seq2-11  EVELVESGG.GLAQPGGSLRLSCEAS  GFRL....INYA  VNWVRQAPGKGLEWISA  ISGS..GGNT  HYADSVR.GRFTISRDLSKNMVFVQMGSLRAEDTAVYFC
seq2-12  EVRLEESGG.DFVQPGGSLRLSCAVS  GFAF....SRYE  ISWVRQAPGKGPEWISY  MTSD..DYTI  YYADSVK.GRFSMSRDAATRSVFLQMDSLRVDDTAVYYC
seq2-13  VVEIVESGG.GVVQPGTSLRLSCSAS  GFTF....RNSA  MYWVRQAPGKGLEWVGL  IWND..GSHK  YYGDSVR.GRFTISRDNSRNMFYLQMNSLKVEDTATYYC
seq2-14  QGQLVQSGG.GVVQPGRSLRLSCEAS  GFSF....KFFN  MHWVRQAPGKGLEWVAV  ISFD..GTKK  YYADSVK.GRFTVSRDNSRNTLDLLMDGLRPEDTAVYSC
seq2-15  RQQLVESGG.NVVQPGTSLRLSCVAS  GLDF....RKYG  LHWLRQTPGRGLEWVAV  IWHD..GSNS  FYADSVR.GRFNISRDNSKNALFLTMNNLQAEDTAIYYC
seq2-16  QVRLQESGP.GLVKPSQTLSLTCSVS  GDPLY..DSHHY  WAWIRQQPGKGLEWIGH  INSY...AYK  FYNQSLE.SRLSMSMDTSRNQFSLKMTSVTDVDTAVYFC
Conserved positions highlighted in the gapped AA sequences: C23, W41, C104
6_Junction: CSV file for the detailed analysis of the junction provided by IMGT/JunctionAnalysis
CSV files 7 to 10 for the description of the nt mutation, amino acid changes and localisation of the hotspots
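To illustrate what a nucleotide mutation description rests on, here is a hedged sketch that compares two IMGT-gapped nucleotide sequences position by position. It is not the IMGT/HighV-QUEST algorithm: the tool compares each sequence with its closest germline V gene and allele, whereas this toy example uses the FR1-IMGT of seq8 from the gapped nucleotide file as a stand-in reference for seq10. The function name and the output tuple layout are hypothetical.

```python
def list_mutations(reference, query):
    """Return (nt position, reference nt, query nt, codon number) per substitution.

    Both sequences must be IMGT-gapped so that equal indexes mean equal
    IMGT positions. Positions that are gaps in either sequence are skipped.
    """
    if len(reference) != len(query):
        raise ValueError("gapped sequences must have the same length")
    mutations = []
    for index, (ref_nt, qry_nt) in enumerate(zip(reference, query)):
        if "." in (ref_nt, qry_nt):
            continue
        if ref_nt != qry_nt:
            position = index + 1      # 1-based nucleotide position
            codon = index // 3 + 1    # 1-based codon (IMGT AA position)
            mutations.append((position, ref_nt, qry_nt, codon))
    return mutations


# FR1-IMGT of seq8 (stand-in reference) and seq10 from the gapped nt file.
# FR1-IMGT spans 26 codons, hence 78 nucleotide positions.
leading_gaps = "." * 23
seq8_fr1 = leading_gaps + "ccca" + "..." + "gggctggtgaggccttcacagaccctgtccctcacatgcactgtctct"
seq10_fr1 = leading_gaps + "ccca" + "..." + "gggctggtgaggccttcacagaccctgtcgctcacatgcactgtctct"

mutations = list_mutations(seq8_fr1, seq10_fr1)
for position, ref_nt, qry_nt, codon in mutations:
    print(f"{ref_nt}{position}>{qry_nt} (codon {codon})")
```

With these two sequences the only difference is c>g at nucleotide 60, in codon 20 (tcc versus tcg, both serine), which is a silent mutation.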
Example: CSV File 7: V-REGION-mutation-and-AA-change-table
Example: CSV File 8: V-REGION-nt-mutation-statistics
Example: CSV File 10: V-REGION-mutation-hotspots
2. IMGT/HighV-QUEST Statistics for sequence analysis interpretation
• Up to 1,000,000 sequence results
• Performed on filtered-in sequences (reliable set)
Li S, et al. Nat. Commun. (2013)
Characterization of IMGT clonotypes for the evaluation of clonal diversity and clonal expression
An IMGT clonotype (AA) is defined by:
• a unique V-(D)-J rearrangement (V and J genes and alleles) (nt)
• a unique CDR3 (AA)
• conserved anchors C104 and W/F118
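The definition above can be sketched as a grouping key. This is a simplified illustration, not the IMGT/HighV-QUEST implementation: it assumes each analysed sequence has already been reduced to a V allele, a J allele and an amino acid JUNCTION (the CDR3 flanked by the anchors C104 and W/F118). The record layout, the function names and the example data are all hypothetical.

```python
from collections import Counter
from typing import NamedTuple


class Clonotype(NamedTuple):
    """Key of a clonotype (AA): V allele, J allele and CDR3 amino acids."""
    v_allele: str
    j_allele: str
    cdr3_aa: str


def has_anchors(junction_aa):
    """True if the junction starts with C (104) and ends with W or F (118)."""
    return len(junction_aa) >= 2 and junction_aa[0] == "C" and junction_aa[-1] in "WF"


def clonotype_counts(records):
    """Count sequences per clonotype, skipping junctions without both anchors."""
    counts = Counter()
    for v_allele, j_allele, junction_aa in records:
        if not has_anchors(junction_aa):
            continue
        cdr3_aa = junction_aa[1:-1]  # strip the two anchors
        counts[Clonotype(v_allele, j_allele, cdr3_aa)] += 1
    return counts


# Hypothetical records: (V allele, J allele, JUNCTION in amino acids).
records = [
    ("IGHV4-34*01", "IGHJ4*02", "CARGGYFDYW"),
    ("IGHV4-34*01", "IGHJ4*02", "CARGGYFDYW"),
    ("IGHV4-34*01", "IGHJ4*02", "CARDSYFDYW"),
    ("IGHV3-23*01", "IGHJ6*02", "CAKDYYGMDVW"),
    ("IGHV3-23*01", "IGHJ4*02", "CAKGG*FDY"),   # no W/F118 anchor: skipped
]

counts = clonotype_counts(records)
for clonotype, n_sequences in counts.items():
    print(clonotype, n_sequences)
```

The number of keys gives the clonal diversity of the set, and the count attached to each key gives its clonal expression.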
An example of table for TRB IMGT clonotypes (AA)
Evaluation of the clonotype diversity and expression per gene
• IMGT clonotype (AA) diversity: number of IMGT clonotypes per gene
• IMGT clonotype (AA) expression: number of sequences per gene
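A short sketch of the two per-gene measures, assuming the clonotypes are already available as a mapping from (V allele, J allele, CDR3 AA) to the number of sequences assigned to each clonotype. The data and function names are hypothetical, and the gene name is obtained here simply by dropping the allele suffix after "*". Normalized values, as used for bar graphs that compare sets of different sizes, are proportions of the set total.

```python
from collections import Counter


def per_gene(clonotype_sequences):
    """Return (diversity, expression) Counters keyed by V gene name."""
    diversity = Counter()   # number of clonotypes per gene
    expression = Counter()  # number of sequences per gene
    for (v_allele, _j_allele, _cdr3_aa), n_sequences in clonotype_sequences.items():
        gene = v_allele.split("*")[0]
        diversity[gene] += 1
        expression[gene] += n_sequences
    return diversity, expression


def normalize(counter):
    """Convert counts to proportions of the set total."""
    total = sum(counter.values())
    return {gene: count / total for gene, count in counter.items()}


# Hypothetical clonotypes with their number of assigned sequences.
clonotype_sequences = {
    ("IGHV4-34*01", "IGHJ4*02", "ARGGYFDY"): 2,
    ("IGHV4-34*02", "IGHJ4*02", "ARDSYFDY"): 1,
    ("IGHV3-23*01", "IGHJ6*02", "AKDYYGMDV"): 5,
}

diversity, expression = per_gene(clonotype_sequences)
print(dict(diversity), dict(expression))
print(normalize(diversity))
```

In this toy set IGHV4-34 is the most diverse gene (two clonotypes) while IGHV3-23 is the most expressed (five sequences), which shows why the two measures are reported separately.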
Identification of IMGT clonotypes (AA) with the same CDR3-IMGT (AA) in different sets
Normalized bar graph for IMGT clonotype (AA) diversity for 2 sets to be compared
• Set 1 (IgD+ memory cells)
• Set 2 (IgD- memory cells)
Are the differences in IMGT clonotype (AA) diversity between sets 1 and 2 significant?
IMGT/StatClonotype
Pairwise evaluation and visualization of NGS IG IMGT clonotype (AA) diversity or expression from IMGT/HighV-QUEST
Aouinti et al. PLoS One (2015), Aouinti et al. Front. Immunol. (2016)
Downloadable from the tool section of the IMGT Home page
IMGT/StatClonotype: significance of differences in proportions
• Statistical test: z-score (Fisher's exact test for low or null occurrences)
• Adjustment of the p-values with 7 multiple testing procedures
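The testing scheme above can be sketched with the standard library only. This is an illustration under stated assumptions, not the IMGT/StatClonotype code: the threshold below which Fisher's exact test replaces the z-test (`min_count=5`) is hypothetical, and only two widely used adjustment procedures (Bonferroni and Benjamini-Hochberg) are shown, since the slide does not name the seven procedures the tool applies.

```python
import math


def z_test_two_proportions(x1, n1, x2, n2):
    """Two-sided z-test for x1/n1 versus x2/n2 using the pooled proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


def fisher_exact_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher's exact test on the 2x2 table built from the counts."""
    successes, total = x1 + x2, n1 + n2

    def prob(k):  # hypergeometric probability of k successes in set 1
        return math.comb(n1, k) * math.comb(n2, successes - k) / math.comb(total, successes)

    observed = prob(x1)
    low, high = max(0, successes - n2), min(n1, successes)
    tail = sum(prob(k) for k in range(low, high + 1) if prob(k) <= observed * (1 + 1e-9))
    return min(1.0, tail)


def compare_proportions(x1, n1, x2, n2, min_count=5):
    """p-value from Fisher's test if any cell is small, else from the z-test."""
    if min(x1, x2, n1 - x1, n2 - x2) < min_count:  # hypothetical threshold
        return fisher_exact_two_sided(x1, n1, x2, n2)
    return z_test_two_proportions(x1, n1, x2, n2)[1]


def bonferroni(p_values):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    return [min(1.0, p * len(p_values)) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts of clonotypes per gene in two sets of 400 and 500 clonotypes.
genes = {"IGHV4-34": (60, 40), "IGHV3-23": (90, 110), "IGHV1-69": (2, 9)}
raw = [compare_proportions(x1, 400, x2, 500) for x1, x2 in genes.values()]
for gene, p, p_adj in zip(genes, raw, benjamini_hochberg(raw)):
    print(gene, round(p, 4), round(p_adj, 4))
```

One test is run per gene, which is why an adjustment for multiple testing is needed before any difference is called significant.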
IMGT/StatClonotype: Multiple testing procedures plots for genes
Scatter plot
IMGT/StatClonotype: Synthesis Graph
Normalized bar graph of gene proportions and the differences in proportions with significance and confidence intervals (CI) for set 1 and set 2
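For the confidence intervals mentioned above, a minimal sketch of a normal-approximation (Wald) interval for a difference in proportions is given below. The choice of the Wald interval, the 95% default level, the function name and the counts are assumptions made for illustration; the slide does not say how IMGT/StatClonotype computes its intervals.

```python
import math
from statistics import NormalDist


def difference_ci(x1, n1, x2, n2, level=0.95):
    """Return (difference, lower bound, upper bound) for p1 - p2 (Wald interval)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled standard error
    z = NormalDist().inv_cdf(0.5 + level / 2)                # about 1.96 for 95%
    difference = p1 - p2
    return difference, difference - z * se, difference + z * se


# Hypothetical gene: 30 of 100 clonotypes in set 1, 20 of 100 in set 2.
difference, lower, upper = difference_ci(30, 100, 20, 100)
print(round(difference, 3), round(lower, 3), round(upper, 3))
```

An interval that contains 0, as in this example, indicates that the observed difference is not significant at the chosen level.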
IMGT/StatClonotype: CDR-IMGT length
IMGT/StatClonotype: CDR-IMGT AA Properties
Displays:
• 20 amino acids
• Physicochemical
• Hydropathy
• Volume
• Chemical
• Charge
• Hydrogen donor or acceptor atoms
• Polarity
Bar graph for CDR3 positions (CDR3 length=13) in set 1
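As an illustration of a per-position property display, the sketch below counts charge classes at each position of CDR3 sequences of the same length. The class assignment (R, H, K positive; D, E negative; all others uncharged) is stated from general knowledge of the IMGT amino acid classes and should be treated as an assumption, as should the function name and the three example CDR3 of length 13.

```python
from collections import Counter

# Assumed charge classes: R, H, K positive; D, E negative; others uncharged.
CHARGE_CLASS = {aa: "positive" for aa in "RHK"}
CHARGE_CLASS.update({aa: "negative" for aa in "DE"})


def charge_profile(cdr3_list):
    """Return one Counter of charge classes per CDR3 position."""
    length = len(cdr3_list[0])
    if any(len(cdr3) != length for cdr3 in cdr3_list):
        raise ValueError("all CDR3 must have the same length")
    return [
        Counter(CHARGE_CLASS.get(cdr3[position], "uncharged") for cdr3 in cdr3_list)
        for position in range(length)
    ]


# Hypothetical CDR3 (AA) of length 13.
cdr3_set = ["ARDGYSSGWYFDY", "ARERYSSSWYFDV", "AKDKYSGGWHFDY"]

profile = charge_profile(cdr3_set)
for position, classes in enumerate(profile, start=1):
    print(position, dict(classes))
```

Each Counter corresponds to one bar of the graph; the same pattern applies to the other property classes by swapping the lookup table.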
Variability plot
Variability indexes:
• Shannon entropy
• Wu-Kabat variability
• Simpson index
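The three indexes can be sketched for a single CDR3 position, given the amino acids observed at that position across a set. These are the textbook formulas: Shannon entropy in bits, Wu-Kabat variability as N × k / n (N sequences, k distinct amino acids, n occurrences of the most common one), and the Simpson index as the sum of squared frequencies. Whether IMGT/StatClonotype uses exactly these variants (for example Simpson as the sum or as one minus the sum) is not stated on the slide, so treat that as an assumption; the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy in bits: 0 for a fully conserved position."""
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in Counter(column).values())


def wu_kabat(column):
    """Wu-Kabat variability: N * k / n, equal to 1 for a conserved position."""
    counts = Counter(column)
    return len(column) * len(counts) / max(counts.values())


def simpson_index(column):
    """Simpson index: probability that two random draws show the same amino acid."""
    total = len(column)
    return sum((n / total) ** 2 for n in Counter(column).values())


# Amino acids seen at one position in four hypothetical CDR3.
conserved = "AAAA"
variable = "ACDE"

for column in (conserved, variable):
    print(column, shannon_entropy(column), wu_kabat(column), simpson_index(column))
```

Applying the functions to every position of CDR3 of a given length yields the values for a variability plot.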
IMGT/StatClonotype: V-D-J gene association Interactive Heatmap with or without clustering
Heatmap axes: IGHV genes (rows), IGHJ genes (columns)
Example cell: row IGHV4-34, column IGHJ4, value 1413
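The matrix behind such a heatmap is a table of counts per V gene and J gene pair. The sketch below builds it from a list of (V gene, J gene) pairs; the input data and the function name are hypothetical, and clustering and plotting are left out.

```python
from collections import Counter


def vj_matrix(pairs):
    """Return (V genes, J genes, matrix) where matrix[i][j] counts V i with J j."""
    counts = Counter(pairs)
    v_genes = sorted({v for v, _ in counts})
    j_genes = sorted({j for _, j in counts})
    matrix = [[counts[(v, j)] for j in j_genes] for v in v_genes]
    return v_genes, j_genes, matrix


# Hypothetical V-J pairs, one per clonotype or per sequence.
pairs = [
    ("IGHV4-34", "IGHJ4"),
    ("IGHV4-34", "IGHJ4"),
    ("IGHV4-34", "IGHJ6"),
    ("IGHV3-23", "IGHJ4"),
]

v_genes, j_genes, matrix = vj_matrix(pairs)
print("        ", *j_genes)
for v_gene, row in zip(v_genes, matrix):
    print(v_gene, *row)
```

Each cell corresponds to one tile of the heatmap, like the IGHV4-34 / IGHJ4 cell shown in the example above.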
IMGT/HighV-QUEST and IMGT/StatClonotype
• For extensive studies of the IG and TR repertoires:
- V, D, J gene and allele usage
- CDR3 analysis (length distribution, amino acid composition)
- Clonal diversity and clonal expression
IMGT clonotype diversity and repertoire immunoprofiling allow comparison of B or T cell populations and immune situations
• to follow the evolution of the repertoires after vaccination/immunization
• to compare repertoires in normal and pathological situations
• to study combinatorial libraries (antibody discovery)
IMGT/HighV-QUEST was granted access to the HPC resources of Centre Informatique National de l'Enseignement Supérieur (CINES) under the allocation 2016-036029 made by GENCI (Grand Equipement National de Calcul Intensif).
Acknowledgements
Thanks to the IMGT® team:
JABADO-MICHALOUD Joumana
DUROUX Patrice
FOLCH Géraldine
LAVOIE Arthur
LEFRANC Gérard
AOUINTI Safa
CAMBON Melissa
PAYSAN-LAFOSSE Typhaine
LEFRANC Marie-Paule
BENTO Pascal
PERALTA Marine
KOSSIDA Sofia
CHENTLI Iméne
KALYAN Karthik
and the companies that support the IMGT efforts of standardization.