MiAIRR: Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment Presenter: Syed Ahmad Chan Bukhari, PhD Department of Pathology, Yale School of Medicine


Page 1: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Presenter: Syed Ahmad Chan Bukhari, PhD

Department of Pathology, Yale School of Medicine

Page 2: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Inability to reproduce scientific experiments is a big challenge.

Lithgow, G. J., Driscoll, M., & Phillips, P. (2017). A long journey to reproducible results. Nature News, 548(7668), 387.

● A drug-like molecule could extend a roundworm's lifespan by as much as 67%.

● Other labs failed to replicate the studies.

● Two cancer labs spent more than a year trying to understand inconsistencies in results from the same tumour biopsy.

● Because of a lack of standards, the two labs were using different cell isolation protocols.

Page 3: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Inability to reproduce scientific experiments is a big challenge.

Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531-533.

● Amgen could reproduce the findings in only 6 of 53 “landmark” papers in cancer biology

● Bayer could validate only 25% of 67 preclinical studies

Page 4: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Inability to reproduce scientific experiments can have multiple causes.

● Undocumented scientific procedures

● Dataset size and variability

● Problems with statistical techniques

● A procedure that is documented but difficult to follow

Page 5: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Standardization is a proven way to make sense of scientific procedures and outcomes.

Which array platform was used, and how?

Page 6: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Experiments in immunology face similar reproducibility challenges

● High-throughput sequencing (HTS) of B-cell (antibody, immunoglobulin) and T-cell receptor repertoires has increased dramatically since the technique was introduced in 2009.

○ Repertoire studies previously relied on low-resolution approaches, such as flow cytometry, spectratyping and Sanger sequencing.

● B cell receptors (BCRs) and T cell receptors (TCRs) serve as the primary means for specific detection of foreign antigens.

Page 7: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Adaptive Immune Receptor Repertoire (AIRR) Sequencing

● The collection of BCRs or TCRs in an individual, tissue or cell subset, or during an immune response, is referred to as the repertoire.

● AIRR-seq studies are associated with complex metadata, such as donor phenotypes, cell types and nucleic acid material used.

○ These metadata are crucial for ensuring reproducibility and facilitating secondary and meta-analyses.

● AIRR sequencing has enormous promise for understanding the dynamics of the immune repertoire in vaccinology, infectious disease, autoimmunity, and cancer biology.

Page 8: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Adaptive Immune Receptor Repertoire (AIRR) Sequencing (Popularity)

Page 9: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Adaptive Immune-Receptor Repertoire (AIRR) Community

Next-generation sequencing of B & T cell receptor repertoires (AIRR-seq)

Developing standard protocols for reporting and sharing AIRR-seq data to optimize their use in biomedical research and patient care

AIRR Community Formed

Page 10: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

AIRR Community Data Elements

Each of the 6 high-level principles has been expanded into a set of data elements

Page 11: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

“Accurate specification of the pathophysiological condition is important for cross-comparison of multiple studies”

● This set describes the experimental study design, including the title of the study, laboratory contact information, etc.

● For individual subjects, the species, sex, age, and ancestry are included, along with information about disease state(s), etc.

● This set describes the metadata about the diagnosis process
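As a minimal sketch of how this first element set could be checked for completeness, the snippet below lists a few study, subject and diagnosis fields and reports which ones a record leaves empty. The field names and example values are illustrative assumptions, not the official MiAIRR identifiers; the authoritative list is the AIRR_Minimal_Standard_Data_Elements.tsv file in the airr-standards repository.

```python
# Hypothetical field names for illustration; not the official MiAIRR identifiers.
REQUIRED_FIELDS = {
    "study": ["study_title", "lab_contact"],
    "subject": ["species", "sex", "age", "ancestry"],
    "diagnosis": ["disease_state"],
}


def missing_fields(record):
    """Return 'group.field' names that are absent or empty in the record."""
    missing = []
    for group, fields in REQUIRED_FIELDS.items():
        values = record.get(group, {})
        for field in fields:
            value = values.get(field)
            if value is None or not str(value).strip():
                missing.append(f"{group}.{field}")
    return missing


# Made-up example record with two gaps (ancestry and disease state).
example = {
    "study": {"study_title": "Example repertoire study",
              "lab_contact": "lab@example.org"},
    "subject": {"species": "Homo sapiens", "sex": "female", "age": "34"},
    "diagnosis": {},
}

print(missing_fields(example))
```

A check of this kind only reports gaps; deciding whether a submission may proceed with missing elements is left to the submitter or the repository.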

Page 12: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

“Information about the origin and expected composition of the biological sample(s) is central for the interpretation of downstream sequencing results.”

Page 13: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

“Proper interpretation of experimental results for future comparative analysis requires information”

● How are the cells prepared for processing?

● How is the sequencing performed?

● The quality of the data produced is critically important too.

“MiAIRR focuses on what information needs to be shared rather than suggesting the analysis techniques and tools”

Page 14: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Providing raw data enables the most up-to-date data processing to be performed, as the analysis tools for AIRR-seq data are undergoing rapid evolution.

● Providing the raw NGS data for each sequencing run (e.g., FASTQ files) permits the reanalysis, secondary analysis and combination of multiple data sets from different studies using meta-analysis techniques.

Page 15: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

● A variety of tools are in use for sequencing and processing. MiAIRR does not provide tool-specific details.

● MiAIRR defines broad categories that cover the essential data processing steps.

● These categories cover the software tools with version numbers, quality thresholds, primer match and length cutoffs, etc.
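The data processing details listed above can be captured in a small structured record. The following is a sketch under stated assumptions: the class name, the tool name `example-filter`, the threshold values and the Phred+33 quality encoding are all hypothetical choices made for illustration, and MiAIRR itself does not prescribe any particular tool or cutoff.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ProcessingStep:
    """One reported processing step (hypothetical structure)."""
    tool: str                # hypothetical tool name
    version: str             # exact version used
    min_mean_quality: float  # quality threshold
    min_length: int          # length cutoff


def mean_phred(quality_string, offset=33):
    """Mean Phred score of a FASTQ quality string (Phred+33 assumed)."""
    return sum(ord(ch) - offset for ch in quality_string) / len(quality_string)


def passes(step, sequence, quality_string):
    """Apply the length cutoff, then the mean-quality threshold."""
    if len(sequence) < step.min_length:
        return False
    return mean_phred(quality_string) >= step.min_mean_quality


step = ProcessingStep(tool="example-filter", version="1.0.0",
                      min_mean_quality=20.0, min_length=8)

# Toy reads: (sequence, quality string)
reads = [
    ("ACGTACGTAC", "IIIIIIIIII"),  # Phred 40 at every base
    ("ACGTACGTAC", "##########"),  # Phred 2 at every base
    ("ACGT", "IIII"),              # shorter than the length cutoff
]

kept = [seq for seq, qual in reads if passes(step, seq, qual)]
print(asdict(step))
print(len(kept), "of", len(reads), "reads kept")
```

Recording the thresholds next to the tool version means that someone starting from the same raw reads can apply the same filter again.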

Page 16: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

● This final MiAIRR set will thus comprise the list of processed sequences, along with sequence-level annotations.

● This should include the V(D)J gene segment and constant region (isotype) annotation if used in the associated publication, along with the CDR3 sequence.
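A processed-sequence set of this kind could be shared as a simple tab-separated table, one row per sequence. The sketch below assumes hypothetical column names and toy CDR3 values chosen only for illustration; it is not the official MiAIRR or AIRR Community file format.

```python
import csv
import io

# Column names are illustrative assumptions, not official identifiers.
COLUMNS = ["sequence_id", "v_gene", "d_gene", "j_gene", "isotype", "cdr3"]

# Toy records; the CDR3 nucleotide strings are made up for the example.
records = [
    {"sequence_id": "seq1", "v_gene": "IGHV1-69", "d_gene": "IGHD3-10",
     "j_gene": "IGHJ4", "isotype": "IGHG1", "cdr3": "TGTGCGAGAGATTACTGG"},
    {"sequence_id": "seq2", "v_gene": "IGHV3-23", "d_gene": "",
     "j_gene": "IGHJ6", "isotype": "", "cdr3": "TGTGCGAAAGGGTGG"},
]


def to_tsv(rows):
    """Serialize annotation records as tab-separated text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_tsv(text):
    """Read the table back into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


tsv_text = to_tsv(records)
print(tsv_text)
```

Empty cells stand for annotations that were not used in the associated publication, such as the missing isotype of the second toy record.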

Page 17: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

MiAIRR Elements Distribution to the NCBI

Page 18: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

What do MiAIRR elements look like?

https://github.com/airr-community/airr-standards/blob/master/AIRR_Minimal_Standard_Data_Elements.tsv

Page 19: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

https://github.com/airr-community/airr-standards

BioSample

Sequence Read Archive

Page 20: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

CAIRR: A pipeline to submit AIRR data to the NCBI through the CEDAR-workbench

Page 21: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

NCBI is an important resource for archiving biomedical data

● NCBI hosts a collection of biomedical databases:

○ BioProject, BioSample, SRA, GenBank, GEO etc.

● Provides infrastructure to submit experimental data and associated metadata

● Minimal use of standard terminologies to define the necessary metadata

○ Ontologies recommended for some data elements (not implemented)

● NCBI metadata are often described using inconsistent terminologies

○ Limits our ability to access, find, interoperate and reuse the data sets

Goal: Leverage CEDAR to improve NCBI metadata submissions

The NCBI BioSample guideline suggests using Disease Ontology terms

Page 22: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

What are the issues with the current NCBI submission process?

● Rapid growth

● Lack of metadata standardization

● Error-prone data entry

● Lack of community-specific metadata (e.g., AIRR)

● Laborious metadata entry

Figure panels: NCBI Growth; GenBank Growth

Metadata Diversity in NCBI repositories

Page 23: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

How are metadata currently submitted to NCBI?

BioProject

BioSample

Sequence Read Archive

Combination of web-based forms and Excel templates

● No mechanism to enforce standardized vocabularies or ontology links
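To illustrate what enforcing a controlled vocabulary at entry time might look like, here is a minimal sketch that maps free-text disease names onto term identifiers and flags values that do not match. The vocabulary, its `EX:` identifiers and its synonyms are placeholders invented for this example; they are not real ontology entries, and the code does not describe how NCBI or CEDAR implement validation.

```python
# Toy vocabulary: identifiers and synonyms are placeholders, not real ontology entries.
VOCABULARY = {
    "EX:0001": {"label": "influenza", "synonyms": {"flu", "influenza infection"}},
    "EX:0002": {"label": "multiple sclerosis", "synonyms": {"ms"}},
}


def build_index(vocabulary):
    """Map every lower-cased label and synonym to its term identifier."""
    index = {}
    for term_id, entry in vocabulary.items():
        for name in {entry["label"], *entry["synonyms"]}:
            index[name.lower()] = term_id
    return index


def normalize(value, index):
    """Return the term identifier for a free-text value, or None if unknown."""
    key = " ".join(value.lower().split())  # ignore case and extra whitespace
    return index.get(key)


index = build_index(VOCABULARY)

for raw in ["Flu", "  Multiple   Sclerosis ", "MS", "unknown disease"]:
    term_id = normalize(raw, index)
    status = term_id if term_id is not None else "NOT IN VOCABULARY"
    print(repr(raw), "->", status)
```

Storing the identifier rather than the typed text is what lets differently worded entries for the same disease be found by a single query.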

Page 24: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

CAIRR Workflow

Page 25: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

CAIRR Templates

Created CEDAR templates to submit metadata to: NCBI BioProject, BioSample and SRA

Page 26: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

AIRR Data Submission

Page 27: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

CAIRR Metadata Generation

Data Submitter

NCBI CAIRR

Controlled Vocabularies

Predictive Entry

Interactive Metadata Entry

Metadata Findability

Metadata Accessibility

Metadata Interoperability

Metadata Reusability

represents limited features availability

Metadata submissions to NCBI BioProject, BioSample and SRA are ontologically controlled and relationally linked, which enables concept-based federated queries across repositories that are otherwise silos.

Why CAIRR?

Page 28: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Resources

● Download AIRR NCBI templates: https://github.com/airr-community/airr-standards

● Manual on how to submit AIRR data to NCBI: https://www.overleaf.com/read/tytddwptgkhb

Page 29: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

Breden et al. “Reproducibility and Reuse of Adaptive Immune Receptor Repertoire Data” (2017)

Rubelt, F., Busse, C., Bukhari, SAC et al. “Adaptive Immune Receptor Repertoire (AIRR) Community Recommendations for Sharing Immune Repertoire Sequencing Data” (2017)

Page 30: MiAIRR:Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment

● Kei-Hoi Cheung, Yale University, Dept. of Medical Informatics

● AIRR Community

● Kleinstein Lab, Yale University, Dept. of Pathology