MiAIRR: Minimum information about an Adaptive Immune Receptor Repertoire Sequencing Experiment
Presenter: Syed Ahmad Chan Bukhari, PhD
Department of Pathology, Yale School of Medicine
Inability to reproduce scientific experiments is a big challenge.
Lithgow, G. J., Driscoll, M., & Phillips, P. (2017). A long journey to reproducible results. Nature News, 548(7668), 387.
● A drug-like molecule could extend a roundworm's lifespan by as much as 67%.
● Other labs failed to replicate the studies.
● Two cancer labs spent more than a year trying to understand inconsistent results from the same tumour biopsy.
● Because of a lack of standards, the two labs were using different cell isolation protocols.
Inability to reproduce scientific experiments is a big challenge.
Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483(7391), 531-533.
● Amgen could reproduce the findings in only 6 of 53 “landmark” papers in cancer biology
● Bayer could validate only 25% of 67 preclinical studies
Inability to reproduce scientific experiments can have multiple causes:
● Undocumented scientific procedures
● Dataset size and variability
● Problems with statistical techniques
● Documented but difficult-to-follow procedures
Standardization is a proven way to make sense of scientific procedures and outcomes.
Which array platform was used, and how?
Experiments in immunology face similar reproducibility challenges
● High-throughput sequencing (HTS) of B-cell (antibody, immunoglobulin) and T-cell receptor repertoires has increased dramatically since the technique was introduced in 2009.
○ Previously, the field relied on low-resolution approaches, such as flow cytometry, spectratyping and Sanger sequencing
● B cell receptors (BCRs) and T cell receptors (TCRs) serve as the primary means for specific detection of foreign antigens.
Adaptive Immune Receptor Repertoire (AIRR) Sequencing
● The collection of BCRs or TCRs in an individual, tissue or cell subset, or during an immune response, is referred to as the repertoire.
● AIRR-seq studies are associated with complex metadata, such as donor phenotypes, cell types and the nucleic acid material used.
○ These metadata are crucial for ensuring reproducibility and facilitating secondary analyses and meta-analyses
● AIRR sequencing has enormous promise for understanding the dynamics of the immune repertoire in vaccinology, infectious disease, autoimmunity, and cancer biology.
Adaptive Immune Receptor Repertoire (AIRR) Sequencing (Popularity)
Adaptive Immune Receptor Repertoire (AIRR) Community
Next-generation sequencing of B & T cell receptor repertoires (AIRR-seq)
Developing standard protocols for reporting and sharing AIRR-seq data to optimize their use in biomedical research and patient care
AIRR Community Formed
AIRR Community Data Elements
Each of the six high-level principles has been expanded into a set of data elements
“Accurate specification of the pathophysiological
condition is important for cross-comparison of
multiple studies”
● This set describes the experimental study design, including the study title, laboratory contact information, etc.
● For individual subjects, the species, sex, age, and ancestry are included, along with information about disease state(s), etc.
● This set describes the metadata about the diagnosis process
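To make the study and subject sets more concrete, here is a minimal Python sketch of two records plus a completeness check. The class and field names (`Study`, `Subject`, `disease_state`, `missing_fields`, etc.) are hypothetical and only loosely mirror the elements listed above; they are not the official MiAIRR field designations.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Hypothetical field names, loosely mirroring the study set described above.
@dataclass
class Study:
    study_title: str
    lab_contact: str


# Hypothetical field names, loosely mirroring the subject set described above.
@dataclass
class Subject:
    species: str
    sex: Optional[str] = None
    age: Optional[str] = None
    ancestry: Optional[str] = None
    disease_state: Optional[str] = None


def missing_fields(record) -> list:
    """Return the names of dataclass fields that are unset (None) or empty strings."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, "")]


subject = Subject(species="Homo sapiens", sex="female", age="34")
print(missing_fields(subject))  # ['ancestry', 'disease_state']
```

A submission tool built along these lines could warn the submitter, or refuse to proceed, while the list of missing fields is non-empty.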
“Information about the origin and expected
composition of the biological sample(s) is central
for the interpretation of downstream sequencing
results.”
“Proper interpretation of experimental results for future comparative analysis requires information about:”
● How cells are prepared for processing
● How the sequencing is performed
● The quality of the data produced
All of these are critically important.
“MiAIRR focuses on what information needs to be shared rather than suggesting analysis techniques and tools”
Providing raw data enables the most
up-to-date data processing to be performed,
as the analysis tools for AIRR-seq data are
undergoing rapid evolution
● Providing the raw NGS data for each sequencing run (e.g., FASTQ files) permits the
reanalysis, secondary analysis and combination of multiple data sets from different
studies using meta-analysis techniques.
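Because MiAIRR asks for the raw reads themselves, it helps to recall what a FASTQ record contains. The sketch below reads the common four-line layout (header, sequence, `+` separator, quality string) and computes a mean Phred score, assuming Phred+33 quality encoding; it does not handle multi-line FASTQ, and the function names are illustrative rather than taken from any AIRR tool.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # end of input
            return
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # The read ID is the header text after '@', up to the first space.
        yield FastqRecord(header[1:].partition(" ")[0], sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string, assuming Phred+33 by default."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


example = "@read1 sample=A\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!!!\n"
for rec in read_fastq(io.StringIO(example)):
    print(rec.read_id, rec.sequence, mean_phred(rec.quality))
```

For a real file, the same generator can be fed an open text handle, e.g. `open("run.fastq")` (a placeholder file name), so reads are processed one record at a time instead of being loaded into memory.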
● A variety of tools are in use for sequencing and processing, so MiAIRR does not provide tool-specific details.
● MiAIRR defines broad categories that cover the essential data processing
steps.
● Reported details include the software tools used (with version numbers), quality thresholds, primer match and length cutoffs, etc.
● This final MiAIRR set will thus comprise the list of processed
sequences, along with sequence-level annotations.
● This should include the V(D)J gene segment and constant region (isotype)
annotation if used in the associated publication, along with the CDR3
sequence.
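As an illustration of a processed-sequence entry, the sketch below stores V and J calls, an optional isotype and the CDR3 nucleotide sequence, and derives the CDR3 amino-acid sequence with the standard genetic code. The field names `v_call`, `j_call`, `c_call`, `cdr3` and `cdr3_aa` are modeled on the AIRR Rearrangement schema, but treat them as an assumption rather than the MiAIRR specification; the example record is invented.

```python
from dataclasses import dataclass

# Standard genetic code; codons enumerated in T, C, A, G order at each position.
_BASES = "TCAG"
_AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(nucleotides: str) -> str:
    """Translate in frame 1; unknown codons become 'X', stop codons '*'."""
    nt = nucleotides.upper()
    if len(nt) % 3:
        raise ValueError("CDR3 length is not a multiple of 3 (possible frameshift)")
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, len(nt), 3))


@dataclass
class ProcessedSequence:
    sequence_id: str
    v_call: str
    j_call: str
    cdr3: str          # CDR3 nucleotide sequence
    c_call: str = ""   # constant region (isotype), if used in the publication

    @property
    def cdr3_aa(self) -> str:
        return translate(self.cdr3)


seq = ProcessedSequence("seq1", "IGHV3-23*01", "IGHJ4*02", "GCGAGAGATTACTAC", "IGHG1")
print(seq.cdr3_aa)  # ARDYY
```

Deriving the amino-acid CDR3 from the stored nucleotide sequence, rather than storing both independently, keeps the two from drifting apart; a real pipeline would still record which tool and version produced the calls.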
Distribution of MiAIRR Elements Across NCBI Repositories
What do MiAIRR elements look like?
https://github.com/airr-community/airr-standards/blob/master/AIRR_Minimal_Standard_Data_Elements.tsv
https://github.com/airr-community/airr-standards
BioSample
Sequence Read Archive
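The data-element table linked above is a `.tsv` (tab-separated) file, so it can be loaded with standard tooling. The sketch below groups elements by the NCBI repository they are routed to. The column names (`miairr_set`, `field`, `ncbi_target`) and the rows in `SAMPLE_TSV` are hypothetical placeholders, so check the actual header of `AIRR_Minimal_Standard_Data_Elements.tsv` before adapting it.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt: column names and rows are placeholders, not the real file.
SAMPLE_TSV = (
    "miairr_set\tfield\tncbi_target\n"
    "study\tstudy_title\tBioProject\n"
    "subject\tspecies\tBioSample\n"
    "subject\tsex\tBioSample\n"
    "sequencing\tsequencing_platform\tSRA\n"
)


def group_by_target(tsv_text: str) -> dict:
    """Map each NCBI target repository to the list of fields routed to it."""
    targets = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        targets[row["ncbi_target"]].append(row["field"])
    return dict(targets)


print(group_by_target(SAMPLE_TSV))
# {'BioProject': ['study_title'], 'BioSample': ['species', 'sex'], 'SRA': ['sequencing_platform']}
```

To read a downloaded copy instead, open it with `open(path, newline="")` and pass the handle to `csv.DictReader(..., delimiter="\t")` in the same way.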
CAIRR: A pipeline to submit AIRR data to the NCBI through the CEDAR-workbench
NCBI is an important resource for archiving biomedical data
● NCBI hosts a collection of biomedical databases:
○ BioProject, BioSample, SRA, GenBank, GEO etc.
● Provides infrastructure for submitting experimental data and associated metadata
● Minimal use of standard terminologies to define the necessary metadata
○ Ontologies are recommended for some data elements (not implemented)
● NCBI metadata are often described using inconsistent terminologies
○ This limits our ability to access, find, interoperate with and reuse the data sets
Goal: Leverage CEDAR to improve NCBI metadata submissions
The NCBI BioSample guideline suggests using Disease Ontology terms
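A submission form could enforce the Disease Ontology recommendation at entry time. This sketch only checks the syntax of a term identifier, accepting either the `DOID:<digits>` CURIE or the OBO PURL form and normalizing both to the CURIE. It does not confirm that the term actually exists, which would require a lookup against a Disease Ontology release; `normalize_doid` is a hypothetical helper name.

```python
import re
from typing import Optional

_CURIE = re.compile(r"DOID:(\d+)")
_PURL = re.compile(r"https?://purl\.obolibrary\.org/obo/DOID_(\d+)")


def normalize_doid(value: str) -> Optional[str]:
    """Return 'DOID:<digits>' for a syntactically valid DO identifier, else None."""
    value = value.strip()
    for pattern in (_CURIE, _PURL):
        match = pattern.fullmatch(value)
        if match:
            return f"DOID:{match.group(1)}"
    return None


for entry in ["DOID:162", "http://purl.obolibrary.org/obo/DOID_162", "lung cancer"]:
    print(repr(entry), "->", normalize_doid(entry))
```

Free-text values such as "lung cancer" come back as `None`, which is the point at which a form would ask the submitter to pick an ontology term instead.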
What are the issues with the current NCBI submission process?
● Rapid growth
● Lack of metadata standardization
● Error-prone data entry
● Lack of community-specific metadata (e.g., AIRR)
● Laborious metadata entry
NCBI Growth
GenBank Growth
Metadata Diversity in NCBI repositories
How are metadata currently submitted to NCBI?
BioProject
BioSample
Sequence Read Archive
A combination of web-based forms and Excel templates
● No mechanism to enforce standardized vocabularies or ontology links
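By contrast, a controlled-vocabulary check is straightforward to enforce in software, and controlled vocabularies and predictive entry are among the features CAIRR lists. The sketch below accepts a value only if it matches a permitted term (after trimming and lower-casing) and otherwise offers close matches using Python's `difflib`, as a rough stand-in for predictive entry. The permitted list `ALLOWED_SEX` is a hypothetical example, not an NCBI or AIRR value set.

```python
import difflib

# Hypothetical permitted values (all lower case); a real template defines its own.
ALLOWED_SEX = ["female", "male", "not collected", "not applicable"]


def validate_term(value: str, allowed: list) -> tuple:
    """Return (is_valid, suggestions); suggestions are close matches if invalid."""
    normalized = value.strip().lower()
    if normalized in allowed:
        return True, []
    return False, difflib.get_close_matches(normalized, allowed, n=3, cutoff=0.6)


print(validate_term(" Female ", ALLOWED_SEX))  # (True, [])
print(validate_term("femle", ALLOWED_SEX))
```

Rejecting the typo while suggesting the nearest permitted terms keeps entry fast without letting free-text variants into the repository.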
CAIRR Workflow
CAIRR Templates
We created CEDAR templates for submitting metadata to NCBI BioProject, BioSample and SRA
AIRR Data Submission
CAIRR Metadata Generation
Data Submitter
Feature comparison: NCBI vs. CAIRR
● Controlled vocabularies
● Predictive entry
● Interactive metadata entry
● Metadata findability
● Metadata accessibility
● Metadata interoperability
● Metadata reusability
Some of these features are only partially available.
With CAIRR, metadata submissions to NCBI BioProject, BioSample and SRA are ontologically controlled and relationally linked, which enables concept-based federated queries across repositories that would otherwise be silos.
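The benefit of linked, ontology-annotated records can be shown with a toy in-memory model. In the sketch below, each BioSample points to its BioProject and each SRA run points to its BioSample, and a query for a disease concept also matches samples annotated with more specific child terms. All accessions, term IDs and the `IS_A` hierarchy are placeholder values invented for illustration, and the classes are not an NCBI or CEDAR API.

```python
from dataclasses import dataclass

# Placeholder is-a links (child -> parent); real hierarchies come from the ontology.
IS_A = {"DOID:0002": "DOID:0001", "DOID:0003": "DOID:0001"}


@dataclass
class BioSample:
    accession: str
    bioproject: str   # accession of the parent BioProject
    disease: str      # ontology term ID


@dataclass
class SraRun:
    accession: str
    biosample: str    # accession of the BioSample that was sequenced


def concept_and_descendants(term: str) -> set:
    """Collect the term plus every term that is (transitively) a child of it."""
    found = {term}
    changed = True
    while changed:
        changed = False
        for child, parent in IS_A.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found


def runs_for_concept(term, samples, runs):
    """Return (BioProject, SRA run) pairs whose sample matches the concept."""
    terms = concept_and_descendants(term)
    project_of = {s.accession: s.bioproject for s in samples if s.disease in terms}
    return [(project_of[r.biosample], r.accession) for r in runs if r.biosample in project_of]


samples = [
    BioSample("SAMN_A", "PRJNA_1", "DOID:0002"),
    BioSample("SAMN_B", "PRJNA_1", "DOID:0004"),
    BioSample("SAMN_C", "PRJNA_2", "DOID:0003"),
]
runs = [SraRun("SRR_1", "SAMN_A"), SraRun("SRR_2", "SAMN_B"), SraRun("SRR_3", "SAMN_C")]
print(runs_for_concept("DOID:0001", samples, runs))
# [('PRJNA_1', 'SRR_1'), ('PRJNA_2', 'SRR_3')]
```

Free-text disease labels could not support this query, because neither the hierarchy nor the cross-repository links would be machine-readable.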
Why CAIRR?
Resources
● Download the AIRR NCBI templates: https://github.com/airr-community/airr-standards
● Manual on how to submit AIRR data to NCBI: https://www.overleaf.com/read/tytddwptgkhb
● Breden et al. “Reproducibility and Reuse of Adaptive Immune Receptor Repertoire Data” (2017)
● Rubelt, F., Busse, C., Bukhari, S.A.C., et al. “Adaptive Immune Receptor Repertoire (AIRR) Community Recommendations for Sharing Immune Repertoire Sequencing Data” (2017)
● Kei-Hoi Cheung, Yale University, Dept. of Medical Informatics
● AIRR Community
● Kleinstein Lab, Yale University, Dept. of Pathology