Confidential: For Review Only

The Genomics England 100,000 Genomes Project

Journal: BMJ

Manuscript ID BMJ.2016.036140

Article Type: Analysis

BMJ Journal: BMJ

Date Submitted by the Author: 18-Oct-2016

Complete List of Authors:
Turnbull, Clare; Queen Mary University of London, William Harvey Research Institute; Institute of Cancer Research
Scott, Richard; Genomics England; Great Ormond Street Hospital NHS Trust
Jones, Louise; Genomics England; Barts Cancer Institute, Queen Mary University of London
Thomas, Ellen; Genomics England; Guys and St Thomas NHS Foundation Trust
Murugaesu, Nirupa; Genomics England; St George's University Hospitals NHS Foundation Trust
Lawson, Kay; Genomics England
Henderson, Shirley; Genomics England; Oxford Universities NHS Foundation Trust
Hamblin, Angela; Genomics England; Oxford Universities NHS Foundation Trust
Ryten, Mina; Genomics England; University College London
O’Neill, Amanda; Genomics England
Baple, Emma; Genomics England; University of Exeter
Smith, Katherine; Genomics England
Rueda-Martin, Antonio; Genomics England
Smedley, Damian; Genomics England; Queen Mary University of London, William Harvey Research Institute
Patch, Christine; Genomics England; Guys and St Thomas NHS Foundation Trust
Alrifai, Doraid; Genomics England; St George's University Hospitals NHS Foundation Trust
Athanasopoulou, Maria; Genomics England
Bari, Wasim; Genomics England
Boardman-Pretty, Freya; Genomics England
Boustred, Chris; Genomics England
Campbell, Chris; Genomics England
Coll-Moragon, Jacobo; Genomics England
Cranage, Alison; Genomics England
Dinh, Lisa; Genomics England
Foulger, Rebecca; Genomics England
Furio-Tari, Pedro; Genomics England
Gordon, Duncan; Genomics England

https://mc.manuscriptcentral.com/bmj

BMJ


Halai, Dina; Genomics England
Haraldsdottir, Eik; Genomics England
Jang, Mikyung; Genomics England
Leigh, Sarah; Genomics England
Logie, Cameron; Genomics England
Lopez, Javier; Genomics England
McDonagh, Ellen; Genomics England
McGrath, Kenan; Genomics England
Medina, Ignacio; Genomics England
Mistry, Vanisha; Genomics England
Montaner, David; Genomics England
Mueller, Michael; Genomics England
Nevin-Ridley, Katrina; Genomics England
Niblock, Olivia; Genomics England
Ocampo, Ernesto; Genomics England
Parker, Matthew; Genomics England
Prapa, Matina; Genomics England
Rendall, Alice; Genomics England; St George's University Hospitals NHS Foundation Trust
Riley, Laura; Genomics England
Rimmer, Andy; Genomics England
Serra, Enric; Genomics England
Shallcross, Laura; Genomics England; University College London, Department of Infection and Population Health
Simpson, Pauline; Genomics England
Sosinsky, Alona; Genomics England
Stals, Karen; Genomics England
Sultana, Razvan; Genomics England
Thompson, Simon; Genomics England
Tregidgo, Carolyn; Genomics England
Mahon-Pearson, Jeanna; Genomics England
Witkowska, Katarzyna; Genomics England; Queen Mary University of London, William Harvey Research Institute
Bale, Mark; Genomics England
Fowler, Tom; Genomics England
Hubbard, Tim; Genomics England; Kings College London, Medical and Molecular Genetics
Rendon, Augusto; Genomics England; University of Cambridge
Caulfield, Mark; Genomics England; Queen Mary University of London, William Harvey Research Institute

Keywords: Whole Genome Sequencing, Next Generation Sequencing, Rare Disease, Cancer Genomics, Secondary Findings


The Genomics England 100,000 Genomes Project

Clare Turnbull1-4, Richard Scott1,5, Louise Jones1,6, Ellen Thomas1,2, Nirupa Murugaesu1,7, Mina Ryten1,2,8, Emma Baple1,9, Amanda O’Neill1,10, Kay Lawson1, Shirley Henderson1,11, Angela Hamblin1,11, Katherine Smith1, Antonio Rueda Martin1, Damian Smedley1,3, Christine Patch1,2,12, Doraid Alrifai1,7, Maria Athanasopoulou1, Wasim Bari1, Freya Boardman Pretty1, Chris Boustred1, Chris Campbell1, Jacobo Coll Moragon1, Alison Cranage1, Lisa Dinh1, Rebecca Foulger1, Pedro Furio Tari1, Duncan Gordon1, Dina Halai1, Eik Haraldsdottir1, Mikyung Jang1, Sarah Leigh1, Cameron Logie1, Javier Lopez1, Jo Mason1, Ellen M. McDonagh1, Kenan McGrath1, Ignacio Medina1, Adam Milward1,13, Vanisha Mistry1, David Montaner1, Michael Mueller1, Katrina Nevin-Ridley1, Olivia Niblock1, Ernesto Ocampo1, Matthew Parker1, Matina Prapa1, Alice Rendall1,7, Laura Riley1, Andy Rimmer1, Enric Serra1, Laura Shallcross1, Pauline Simpson1, Alona Sosinsky1, Karen Stals1, Razvan Sultana1, Simon Thompson1, Carolyn Tregidgo1, Alice Tuff-Lacey1, Jeanna Mahon-Pearson1, Katarzyna Witkowska1, Mark Bale1, Jim Davies1,13, Tom Fowler1, Tim Hubbard1,14, Augusto Rendon1,10, Mark Caulfield1,3

1 Genomics England, Charterhouse Square, London, EC1M 6BQ
2 Guys and St Thomas NHS Foundation Trust, London, SE1 9RT
3 William Harvey Research Institute, Queen Mary University of London, EC1M 6BQ
4 Institute of Cancer Research, London, SM2 5NG
5 Great Ormond Street Hospital NHS Trust, London, WC1N 3JH
6 Barts Cancer Institute, Queen Mary University of London, EC1M 6BQ
7 St George's University Hospitals NHS Foundation Trust, London, SW17 0QT
8 University College London, Gower Street, London, WC1E 6BT
9 University of Exeter, Exeter, EX4 4SB
10 University of Cambridge, Cambridge, CB2 1TN
11 Oxford Universities NHS Foundation Trust, Oxford, OX3 9DU
12 Florence Nightingale Faculty of Nursing & Midwifery, King’s College, London, SE1 8WA
13 University of Oxford, Oxford, OX1 2JD
14 Medical and Molecular Genetics, Kings College, London, WC2R 2LS

On behalf of the 100,000 Genomes Project

The 100,000 Genomes Project is a government-led initiative to sequence 100,000 whole genomes

from patients recruited from the National Health Service (NHS) in England. The project was

established to develop the infrastructure and expertise necessary to transform delivery of genomic

medicine into the NHS, to improve the lives of patients, to enable high quality research and to boost

the UK genomics industry.

Background

Genomics has advanced through stepwise evolution of technology

The genome of each human comprises approximately 3 billion base pairs, 20,000 protein-coding

genes and 4-5 million points of variation1. There has been stepwise evolution in technology for

accessing and reading the genetic code with several landmark genomic discoveries made in the UK

(Box a)2. These advances have been mirrored by commensurate improvement in the genomic tests

available for use in the clinical care of patients with hereditary disease, with evolution from

biochemical assays for the defective gene product, through family mapping of genetic markers to

sequencing that reveals the specific disease-causing mutations3.


The Human Genome Project was established in 1990 to map comprehensively the full sequence of the 3 billion bases comprising the human genome4. Regions of the genome were divided between 20 research sequencing centres across the United States, the United Kingdom, Japan, France, Germany and China, with the first complete sequence of a human genome completed after 13 years at an estimated cost of $3 billion5. Over the last decade, the advent of next-generation sequencing (NGS) has replaced the sequencing of sections of the genome one at a time with ‘massively parallel’ sequencing of millions of fragments of the genome simultaneously, allowing the long-heralded $1000 genome to be delivered in less than a day (Fig a)6. This technological renaissance has transformed opportunities for genomic sequencing in research and in the clinic.

Box a: Landmarks in UK Genomic Research
1903: pioneering studies of early inborn errors of metabolism by Archibald Garrod
1951: X-ray diffraction studies reveal 3D structure of DNA, Rosalind Franklin, King’s College London
1953: elucidation of the structure of the double helix by James Watson and Francis Crick, Cambridge
1954: description of the structure and synthesis of nucleotides and nucleosides by Alexander Todd, Cambridge University
1977: description of ‘chain-termination’ sequencing by synthesis by Frederick Sanger and Alan Coulson, Cambridge University
1990: scientists from Wellcome Trust Sanger Centre lead UK Human Genome Project effort
1995: development of solid phase colony next-generation sequencing technology at Cambridge University


Figure a: The decrease in cost of sequencing against time

Establishing the 100,000 Genomes Project

Capitalising upon the UK’s strong record in genomic research and the established network of regional genetics laboratories and clinical genetics services, it was recognised that our unified single-payer national health service offered a unique test-bed in which to first introduce whole genome sequencing across a healthcare system. However, as highlighted by Professor Sir John Bell to the House of Lords Science and Technology Committee7 in 2009 and later by the Human Genome Strategy Group8, substantial transformation of the NHS and education of its workforce would be required for successful implementation of these new technologies. In 2012, as part of the Olympic Legacy, the then Prime Minister, David Cameron, announced that funding would be committed to sequence 100,000 genomes from patients in the English NHS, with key objectives around patient benefit, research and industry development (Box b). In 2013, Secretary of State for Health Jeremy Hunt announced that the project would be delivered through establishing Genomics England, a company owned in its entirety by the Department of Health. Through a steering group initiated by Chief Medical Officer Professor Dame Sally Davies and chaired by Professor David Lomas, rare disease, cancer and infection were agreed as the initial priority areas (Box c)9.

Box b: Key objectives of the 100,000 Genomes Project

1. To bring benefit to NHS patients

2. To create an ethical and transparent programme based on consent

3. To enable new scientific discovery and medical insights

4. To kick start the development of a UK genomics industry

Box c: The 100,000 Genomes Project: Advancing genomic healthcare across three clinical areas
• Sequencing the whole genome enables us to survey all variants across a multitude of mutational mechanisms in order to find most reliably the causative mutation in patients with rare Mendelian disorders: early investigation with a whole genome can thus obviate the protracted and expensive diagnostic odyssey which historically characterised investigation of these disorders.
• Cancer is a common disease with a grave burden of morbidity and mortality and ever-growing treatment costs. Whole genome sequencing of the tumour can predict therapeutic efficacy and prognosis, thus enabling administration of more effective treatments and avoidance of drugs that may be ineffective, costly and associated with significant side effects.
• Sequencing of pathogen genomes enables more effective disease monitoring, infection control and management of anti-microbial resistance.


Whole genomes as the sequencing platform for research and clinical care

Constrained by sequencing costs and capacity for data processing, early applications of NGS typically concentrated on sequencing of protein-coding regions (the exome or selected panels of genes). However, projects such as the Encyclopedia of DNA Elements (ENCODE) have debunked the formerly held notion that the other 98% of the genome, comprising the non-coding elements, is “junk DNA”10 11. Our understanding of how non-genic variants influence disease is in its infancy: to evolve these insights, substantial genome data across diseases are required12 13. Furthermore, many well-recognised disease-causing mechanisms, such as large copy number changes (deletions and duplications), balanced structural rearrangements and uniparental disomy, could be missed if we restrict our analyses to just the coding regions of the genome using conventional sequencing technologies. With the rapidly falling cost of sequencing (Fig a) alongside commensurate advances in computational infrastructure, clinical sequencing at scale of the entirety of the genome has become feasible and accordingly was the chosen strategy for this ambitious programme.

A transformative genomics project embedded in the NHS

Genomic research studies have enabled progressive delineation of the genomic architecture of rare disease, common complex disease and cancers, with evolving correlation of molecular (genomic) changes with clinical diagnosis, prognosis and/or response to therapy. Application of these findings to clinical care is termed ‘precision’, ‘personalised’ or ‘stratified’ medicine14. However, local implementation of NGS has required substantial technical, computational and bioinformatics capacity that has not been consistently delivered across NHS diagnostic laboratories. Compounded by the complexities of commissioning laboratory testing, this has resulted in disparate practice and standards of care around genetics15-17. The 100,000 Genomes Project has been recognised as a unique opportunity for the UK genomics community and NHS England (NHSE) to work together in these areas to deliver improvement, modernisation and consistency across clinical and laboratory services18 19.

Establishing the 100,000 Genomes Project through partnerships and infrastructure

Genomic Medicine Centres as hubs of expertise in the NHS

In 2014, NHS trusts were invited to tender to become Genomic Medicine Centres (GMCs): regional hubs of excellence in genomic medicine through which existing expertise in molecular genetics, molecular pathology, clinical genetics services and molecular oncology would be grown. Following two rounds of evaluation, 13 Genomic Medicine Centres have now been established, each comprising a lead NHS trust and up to 12 local delivery partner hospitals. In total, the GMC network comprises 85 hospital trusts and provides full geographic coverage of England (Fig b). In addition, Northern Ireland and Wales are developing capabilities as Genomic Medicine Centres to join the programme, and Scotland is developing a parallel sequencing project.


Harnessing the expertise of the research community through GeCIP

To bring to the programme the wealth of expertise within the UK and international clinical academic and NHS genomics community, the Genomics England Clinical Interpretation Partnership (GeCIP) was established. This partnership reflects a quid pro quo: GeCIP researchers provide expert support in the clinical interpretation of the genomes and in return gain priority access to the genome data via research data embassies. Following a call for expressions of interest, >1700 senior academics from the UK from diverse scientific backgrounds representing >300 institutions, >600 NHS clinicians and >200 international collaborators responded20 21. This group has self-organised into 41 domains spanning 14 themes in rare disease, 14 specific tumour types and 12 cross-cutting themes such as ethics, health economics and advanced analytical approaches (Figure c).


Fig b: Division of England into 13 NHS Genomic Medicine Centres, each with a lead organisation

Roles of Genomic Medicine Centres include:
• identification of eligible patients, offering equity of access
• consenting of patients and collection of clinical data
• collection of biological samples (blood/tumour tissue/saliva)
• sample processing, DNA extraction and quality control checks
• sample dispatch to the central sample biorepository
• interpretation and technical validation of returned clinically important variants
• return of the findings to patients and implementation of appropriate clinical actions

Rare Disease: Dermatology; Endocrine and Metabolism; Neurological; Hearing and Sight; Immunology; Paediatric sepsis; Musculoskeletal; Gastroenterology and Hepatology; Respiratory; Paediatrics; Renal; Inherited Cancer Predisposition; Cardiovascular; Haematology

Cancer: Breast Cancer; Renal and Bladder Cancers; Sarcoma; Brain Tumours; Germ Cell Tumours; Prostate Cancer; Colorectal Cancer; Haematological Malignancy; Childhood Solid Cancers; Lung Cancer; Melanoma; Ovarian and Endometrial Cancers; Upper Gastro-intestinal Cancer; Cancer of Unknown Primary; Pan Cancer

Cross-cutting: Electronic Records; Ethics and Social Science; Functional Effects; Health Economics; Stratified Medicine and Therapeutic Innovation; Population Genomics; Machine Learning, Quantitative Methods and Functional Genomics; Education and Training; Electronic Records; Enabling Rare Disease Translational Genomics via Advanced Analytics and International Interoperability; Education and Training; Functional Cross-Cutting

Fig c: The Genomics England Clinical Interpretation Partnership: Research domains for the 100,000 Genomes Project

Partnering with and stimulating the Genomics Industry in the UK

Following a ‘bake-off’ between multiple sequencing providers launched in 2013, Illumina Inc was selected to partner with the programme to provide sequencing services, working alongside our Sequencing Advisory Group. Similarly, in 2014, 29 suppliers of genomic analysis, annotation and interpretation services were evaluated, a subset of which have become ‘clinical interpretation partners’. These suppliers are assisting Genomics England with interpretation of the genomes within the programme, under ongoing evaluation. In parallel, working with Innovate UK, Genomics England awarded £10 million forward investment via the Small Business Research Initiative (SBRI) to companies with the most promising proposals to further develop genome annotation tools and services20 22. Genomics England has also brought together twelve companies into a pre-competitive consortium, the Genomics Expert Network for Enterprises (GENE), to foster closer working between industry, academia and the NHS so that insights from the genomic analyses will be translated most rapidly for patient benefit23.

Establishing sustainable sequencing infrastructure and data architecture

To achieve efficiency of cost and throughput, delivery at scale of whole genome sequencing was

required. Accordingly, supported by the Wellcome Trust, a national sequencing centre has been

built at Hinxton, Cambridgeshire, with capacity to deliver >1,000 whole genome sequences (WGS)

per week. Sequencing at the centre commenced in March 2016.

The 100,000 Genomes Project embodies many of the challenges inherent to interfacing big data with clinical care. There are significant technical challenges to storing, transferring, compressing, tracking, analysing and representing the massive volumes of data generated by genome sequencing (Box d). Data relating to the patients and the samples are provided by GMCs: harmonising data models to work across multiple electronic patient record (EPR) systems and laboratory information management systems (LIMS) has been highly challenging. Robustly tested, versioned pipelines for data processing and analysis have been established to ensure that generation of genomic analyses is reliable and reproducible. The 100,000 Genomes Project data are stored in a highly secure government data centre with rigorous systems applied for data permissions and access. Linked identifiable clinical and molecular data are made available to clinicians from NHS GMCs, whilst data embassies containing de-identified instances of the data are provided for researchers from academia and industry. The research data embassies are subject to strict airlock mechanisms which restrict the exit of data to registered users, who are permitted summary-level exports only (Figure d).
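The airlock rule described above can be pictured as a simple gate on export requests. The sketch below is illustrative only: the `ExportRequest` type, its fields and the `airlock_permits` function are hypothetical names, and the two checks are a reading of this paragraph rather than a description of the actual Genomics England access-control system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportRequest:
    """Hypothetical request to move data out of a research data embassy."""
    user_id: str
    is_summary_level: bool  # True for aggregate results, False for row-level data


def airlock_permits(request, registered_users):
    """Allow an export only for a registered user and only at summary level."""
    return request.user_id in registered_users and request.is_summary_level


# Illustrative user identifiers, not real accounts.
registered = {"researcher_01", "researcher_02"}

summary_request = ExportRequest(user_id="researcher_01", is_summary_level=True)
row_level_request = ExportRequest(user_id="researcher_01", is_summary_level=False)
unknown_user_request = ExportRequest(user_id="visitor_99", is_summary_level=True)
```

Under these assumptions only `summary_request` would pass the gate; a real airlock would presumably also involve review and audit steps that this sketch leaves out.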

Box d: Genomics and Big Data for the 100,000 Genomes Project
3 billion: base pairs in the human genome
4-5 million: variants (points of difference to the reference genome) in a single human genome
300 million: sequencing reads per sample*
40: average reads per base (coverage) for a germline genome (minimum). The tumour genomes require sequencing at greater depth and are sequenced at 75-fold coverage.
63 GB: size of data generated for each genome sequenced*†
9 PB: predicted size of total data generated from the 100,000 Genomes Project*†¥
500 million computer processing hours: total compute required to process 100,000 genomes for the project
* germline genome from blood at median coverage ~30x, read length 150bp, paired-end reads
† Compressed as BAM files
¥ whole genome equivalents based on 75% tumour 25% germline. Tumour coverage median ~75x.
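The figures in Box d can be related to one another with simple arithmetic. The sketch below is a back-of-envelope check, not part of the project's pipelines: it assumes that the 300 million figure counts read pairs (each pair contributing two 150bp reads), and it scales the 63 GB germline figure to 100,000 genomes without modelling the deeper tumour sequencing. The function names are chosen for illustration.

```python
GENOME_SIZE_BP = 3_000_000_000   # 3 billion base pairs (Box d)
READS_PER_SAMPLE = 300_000_000   # 300 million reads per sample (Box d)
READ_LENGTH_BP = 150             # read length from the Box d footnote
GB_PER_GENOME = 63               # per-genome data size (Box d)
N_GENOMES = 100_000


def mean_coverage(n_reads, read_length_bp, genome_size_bp, paired_end=True):
    """Average reads per base: total sequenced bases divided by genome size.

    With paired_end=True each counted read is assumed to be a pair (two mates);
    this is an assumption about how Box d counts reads.
    """
    mates = 2 if paired_end else 1
    return n_reads * mates * read_length_bp / genome_size_bp


def total_storage_pb(n_genomes, gb_per_genome):
    """Total storage in petabytes, using decimal units (1 PB = 1,000,000 GB)."""
    return n_genomes * gb_per_genome / 1_000_000


coverage = mean_coverage(READS_PER_SAMPLE, READ_LENGTH_BP, GENOME_SIZE_BP)
germline_only_pb = total_storage_pb(N_GENOMES, GB_PER_GENOME)

print(f"{coverage:.0f}x coverage, {germline_only_pb:.1f} PB")  # prints "30x coverage, 6.3 PB"
```

Under the read-pair assumption the result matches the ~30x median germline coverage in the footnote; if the 300 million figure instead counted individual reads, the same formula would give 15x. The 6.3 PB germline-only total is lower than the 9 PB predicted in Box d, which rests on the box's own footnoted assumptions about the tumour and germline mix.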


Figure d: Data Flow in the 100,000 Genomes Project

Improving opportunities for patients with rare disease

Rare diseases are collectively common

A rare disease is defined as one affecting fewer than 1 in 2000 people; there are 5000-8000 different rare diseases and it is estimated that overall 1 in 17 individuals is affected by a rare disease, equating to approximately 3 million people in the UK. The majority of rare diseases present in childhood, with significant rates of disability and early mortality. It is estimated that 80% of these rare disorders are monogenic (meaning that there is a single underlying gene defect). However, for over half of these disorders, the underlying gene(s) have yet to be identified22-24. Due to their individually low frequency, these ‘orphan diseases’ have historically been underserved with regard to research and development of therapies, an inequity that international collaborations such as Orphanet and EUCERD (now the European Commission Expert Group on Rare Diseases) seek to redress25 26.

Making a diagnosis in rare disease: a diagnostic odyssey

Establishing a robust genetic diagnosis in cases of rare disease is a critical foundation stone in the care of the affected child and their family. A precise molecular genetic diagnosis can enable the clinician to better estimate prognosis, pre-empt complications and apply the interventions and therapies most likely to be effective. Furthermore, if achieved in a timely fashion, genetic diagnosis facilitates reproductive decision-making in subsequent pregnancies, enabling provision of accurate risk of recurrence and potential options for pre-implantation or pre-natal genetic diagnosis. Historically, when genetic testing was slow, expensive and low throughput, the ‘diagnostic odyssey’ could span many years, involving investigation of multiple organ systems by different medical specialists and requiring serial testing of individual genes at different laboratories. Recent research projects such as Deciphering Developmental Disorders (DDD) have revealed the potential of exome sequencing to increase diagnoses for patients, and now through the 100,000 Genomes Project we have the opportunity to extend diagnostic yield27-29.

Recruiting patient groups with unmet diagnostic need to the 100,000 Genomes Project

Following nomination by clinicians of patient phenotypes particularly underserved by current clinical diagnostic testing and/or for which the genetic basis of the disorder is not well explained, there are >190 phenotypic categories under which patients with rare disease can be recruited to the Project. Eligibility for each category is defined by a set of clinical and/or family characteristics, pre-testing of well-established genes and an optimal family structure for recruitment (typically (i) trio including unaffected parents, (ii) multi-generational affected individuals or (iii) isolated proband).

Detailed clinical phenotyping improves diagnosis and research in rare disease

Working with disease experts, detailed clinical data models have been developed for each of the phenotypic groups under recruitment, which has enabled systematic collation of patient characteristics using the Human Phenotype Ontology30 31. Applying this standardised, internationally recognised and comprehensive ontology will enable ready comparison of patients across phenotypic groups as well as comparison of 100,000 Genomes Project phenotypes to outside datasets. Systematic and standardised clinical phenotyping, using data models such as these, will facilitate robust longitudinal study of rare disease cohorts by the newly established Public Health England National Congenital Anomaly and Rare Disease Registration Service (NCARDRS)32.
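One reason a shared ontology makes patients comparable is that each patient's phenotype becomes a set of standard term identifiers, which can be compared directly. The sketch below is a minimal illustration using Jaccard overlap of term sets; this is a deliberately simple stand-in chosen for illustration, not the method used by the Project, and it ignores the ontology's hierarchy. The term identifiers and the function name are illustrative assumptions.

```python
def jaccard_similarity(terms_a, terms_b):
    """Shared terms divided by all distinct terms across two patients (0.0 to 1.0)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0  # no terms recorded for either patient: treat as no evidence of similarity
    return len(a & b) / len(a | b)


# Phenotype term identifiers written in HPO style, used here purely as examples.
patient_1 = {"HP:0001250", "HP:0001263", "HP:0000252"}
patient_2 = {"HP:0001250", "HP:0001263"}

similarity = jaccard_similarity(patient_1, patient_2)  # 2 shared terms out of 3 distinct terms
```

Ontology-aware measures, which also credit related but non-identical terms, would be a natural refinement of this set-overlap sketch.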

Identifying causative variants through data analysis, interpretation and validation

In rare Mendelian disease, we seek to identify the single (or paired recessive) pathogenic variant(s) causative of disease in that patient. However, each genome will contain >4 million variants, many of which will be rare and may be candidates for being the causal variant(s). Therefore, we have crowd-sourced expert input from the relevant rare disease communities in the UK and beyond via interactive software, enabling us to create ‘virtual panels’ of genes robustly implicated in each included phenotype33. Semi-automated bioinformatics pipelines, incorporating variant impact, inheritance mechanism and clinical data, are then used to prioritise and tier called variants using these virtual panels. Clinicians and scientists within the NHS GMC Network then utilise interactive variant interpretation platforms to view, explore, share and collate expert opinion on the genomic data in order to determine which variants should be confirmed by technical validation in their local laboratories and reported back to participants.
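The prioritisation step described above combines three inputs: membership of the virtual panel, predicted variant impact and fit with the expected inheritance pattern. The sketch below shows one way such a rule could be written. It is a hypothetical illustration: the tier definitions, the `Variant` fields, the `assign_tier` function and the placeholder gene names are assumptions made for this example and are not the Project's actual tiering rules.

```python
from dataclasses import dataclass

# Consequence labels in Sequence Ontology style; the grouping into impact
# classes is an assumption made for this sketch.
HIGH_IMPACT = frozenset({"stop_gained", "frameshift_variant",
                         "splice_donor_variant", "splice_acceptor_variant"})
MODERATE_IMPACT = frozenset({"missense_variant", "inframe_deletion",
                             "inframe_insertion"})


@dataclass(frozen=True)
class Variant:
    gene: str
    consequence: str
    fits_inheritance: bool  # segregation consistent with the expected inheritance mode


def assign_tier(variant, virtual_panel):
    """Hypothetical rule: tier 1 and 2 require a panel gene and a fitting inheritance pattern."""
    if variant.gene in virtual_panel and variant.fits_inheritance:
        if variant.consequence in HIGH_IMPACT:
            return 1
        if variant.consequence in MODERATE_IMPACT:
            return 2
    return 3  # everything else is kept at lowest priority rather than discarded


# Placeholder gene names, not real panel content.
panel = frozenset({"GENE_A", "GENE_B"})
variants = [
    Variant("GENE_A", "stop_gained", True),
    Variant("GENE_B", "missense_variant", True),
    Variant("GENE_C", "stop_gained", True),    # gene not on the panel
    Variant("GENE_A", "stop_gained", False),   # inheritance pattern does not fit
]
tiers = [assign_tier(v, panel) for v in variants]  # [1, 2, 3, 3]
```

Keeping lower-priority variants in a third tier, rather than dropping them, mirrors the idea in the text that unexplained cases remain available for later expert review and research.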

Enabling research through studying cohorts of patients with rare disease

De-identified data from undiagnosed patients will be analysed by the relevant GeCIP domains in the

research data environment, whereby iterative analysis across the full data set can be performed to

identify previously unrecognised genomic causes that, once confirmed, can be fed back to

participants via the GMC Network. In addition, participants can be recalled up to four times per year,

offering opportunity for additional deeper phenotyping, working with parallel dedicated initiatives

such as the Translational Research Collaborative in Rare Disease34. This presents the opportunity to

identify previously unrecognised features, complications and genotype-phenotype correlations,

opening the way for better, more personalised management. Furthermore, as exemplified by

familiar genetic disorders such as cystic fibrosis, haemoglobinopathies and Down syndrome (Trisomy

21), even between patients who carry the identical genetic abnormality, wide phenotypic variation is

common35 36. Systematic whole genomic analyses of patient cohorts alongside deep longitudinal

phenotyping offer opportunity for identification of modifier genes influencing phenotypic spectrum

and severity, which are currently poorly understood37.

Studying Cancer via Whole Genome Sequencing

Cancer as a genomic disease

One in two people born in the UK after 1960 will be affected by cancer; >350,000 new diagnoses of

cancer were made in 2013 and the NHS annual spend on cancer services for 2020 is estimated to

be £13 billion38-40.

Cancer is a disease of disordered genomes, with serial acquisition of somatic genetic mutations

which result in progressive escape from the mechanisms which regulate cellular proliferation41 42.

Large-scale cancer sequencing research projects such as the International Cancer Genome

Consortium (ICGC) and the Cancer Genome Atlas (TCGA) have enabled detailed cataloguing of

mutated genes, allowing the important ‘driver mutations’ to be distinguished from the noise of

incidental ‘passenger mutations’43-46. Accordingly, ‘Molecular Oncology’ has emerged, with clinical

application of these genomic biomarkers used to predict tumour behaviour, prognosis and drug

response, along with increasing administration of bespoke targeted drugs which subvert and/or

switch off the oncogenes activated by particular ‘driver mutations’47 48. However, current

taxonomies of cancers are still largely defined by the organ of origin and histological description of

the aberrant cells and most patients are treated with empiric regimens of cytotoxic chemotherapy

and irradiation49.

A cancer programme to advance clinical research and benefit patients

In the 100,000 Genomes Project we are undertaking WGS across a range of tumour types, adding

substantially to the volume of whole genome data available for comprehensive molecular

characterisation of these tumour types50. Furthermore, through alignment of recruitment to clinical

studies and trials, collection of multiple patient samples in space and time and utilisation of

longitudinal clinical data to capture treatment and response, stratified analysis of genomic drivers of

response, progression and metastasis is possible. Through capturing serial blood samples for the

analysis of circulating cell-free DNA (cfDNA), circulating tumour DNA as a ‘liquid biopsy’ for

monitoring tumour progression can be further evaluated51. In the analysed genomes returned to the

GMC tumour sequencing boards, we highlight clinically relevant variants which range from small

base substitutions of well characterised ‘actionability’ to novel gene copy number variants or fusions

which may enable access to experimental drugs.

Molecular pathology: new approaches for the genomic era

Through initial pilot studies at 10 UK centres, we evaluated collection feasibility and sequencing

quality for both fresh frozen (FF) and formalin-fixed paraffin-embedded (FFPE) tumour tissue. Whilst

sample processing and DNA extraction protocols for FFPE tissue varied widely between centres, the

sequence quality generated was consistently lower than that of fresh frozen across a range of tumours.

Fresh frozen tissue was elected as the preferred sample processing pathway with which to initiate

the 100,000 Genomes Project cancer main programme. In consultation with the Royal College of

Pathologists, Cancer Research UK and the National Cancer Research Institute Cellular Molecular

Pathology (NCRI CM-Path), working through a molecular pathology network, the NHS GMCs are re-

engineering tissue handling to develop sustainable pathways for processing FF tissue at greater scale

which do not compromise sample diagnosis. New approaches include methods of snap-freezing

small biopsies and pathways through theatres involving vacuum packing and extended refrigeration

of fresh surgical tissue at 4°C. In conjunction with Health Education England (HEE), supported by

digital pathology approaches, online training resources are being developed to formally train

pathologists in tumour cellularity evaluation to facilitate provision of samples sufficiently rich in

neoplastic cells52. Together, these approaches are creating new pathways in sample handling ready

for the genomic era.

Complementing the molecular data with rich life-course clinical data

For each cancer patient, basic registration data, sample-handling data and a rich core clinical dataset

are collected, aligned to the National Cancer Registration and Analysis Service (NCRAS) reporting

dataset. This is complemented by additional tumour-type specific fields defined by NHS clinicians

and researchers. Follow-up data will be derived from nationally collected datasets, including the

Cancer Outcomes and Services Dataset (COSD), the Systemic Anti-Cancer Therapy Dataset (SACT)

and the Radiotherapy Dataset (RTDS)53-55. In partnership with NHS Digital, the longitudinal data will

be further enriched by life-long linkage to electronic health data from primary care, hospital

episodes, pharmacy data and social care records. Working with the Farr Institute, these datasets will

be integrated optimally for interrogation of both cancer outcomes and broader health-related

questions56 57.

Infection

Infectious diseases are responsible for seven percent of UK deaths, at a cost of £30 billion

per annum58. Sequencing of viral and bacterial genomes enables delineation of species taxonomy,

virulence, transmission and anti-microbial resistance, facilitating infection control and improved use

of anti-microbial agents59 60. Partnering with Public Health England, we have initiated a programme

of pathogen sequencing and have already completed WGS of 3000 multidrug-resistant tuberculosis

organisms.

Education, Training and Patient and Public Involvement (PPI)

It has been widely recognised that successful implementation of genomics within routine healthcare

will require substantial up-skilling of the general clinical workforce, additional development of the

specialist genetics workforce, and education of the patients and the public. Partnering with Health

Education England, Masters courses in Genomic Medicine have been established at ten universities,

with courses available full- or part-time to NHS clinical staff, or as modules for Continuing

Professional Development (CPD)61. This partnership has also yielded a new funded scientific training

programme to expand genetic counsellor numbers for the NHS workforce. In addition, MOOCs

(Massive Open Online Courses) and other online training resources in areas such as consent,

bioinformatics and molecular pathology have been developed62.

Ongoing Public and Patient Involvement has been central to development of the programme

alongside close consultation with several patient stakeholder and affiliated groups63-65. Open town-

hall style events were held at the inception of the project to facilitate contribution from stakeholder

groups, patients and members of the public in sculpting the programme. A year-long programme of

activities (the Genomics Conversation) was initiated in 2015 to engage the public and relevant

stakeholders on key topics relating to genomic medicine. Delivered through science and health

networks and charities, the Genomics Conversation involved public debates, roundtables, tailored

briefings and research with the aim to start a dialogue on the benefits as well as the barriers to

embedding genomic medicine into mainstream healthcare today. A national participant panel

convenes regularly and provides representation to the Data Access Committee, Ethics Committee

and GeCIP Board66.

Ethics and regulatory aspects

If the 100,000 Genomes Project is to truly advance utilisation of genomics in UK healthcare, the

programme must be ethical, transparent and supported by public and patient confidence. Through

our Ethics Advisory Committee, our National Participant Panel and our Research Ethics Committee,

we have addressed in detail issues such as life-long data linkage, the return of secondary (incidental)

findings, sharing of data for research with academia and industry, consenting of children and

adolescents, impact on insurance and diminution in mental capacity subsequent to enrolment50.

Secondary findings

As we have moved from single gene testing to large-scale sequencing, much debate has ensued

around handling of ‘additional’ or ‘incidental’ genomic findings, i.e. genetic variants identified which

are not related to the condition under investigation but which are informative to the risk of

unrelated but serious medical conditions67 68. For the 100,000 Genomes Project, we shall offer

participants the option to receive secondary findings on a short list of relatively well characterised

genes which are robustly linked to disease and established clinical management (Box e): the impact

of returning these results will be studied by GeCIP social sciences researchers. In addition, we also

offer reporting of secondary ‘reproductive findings’; for example, where both parents of a child with

a rare disease are found to each carry a pathogenic variant in CFTR, this would be of potential clinical

utility as they are at risk of having a subsequent child affected by cystic fibrosis.
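
The two kinds of additional finding described above can be sketched as set operations. The gene list is taken from Box e; everything else is an assumption made for illustration, including the function names, the representation of each person as a set of genes carrying a pathogenic variant, and the list of recessive genes. The sketch also ignores zygosity and inheritance mode for genes on the secondary findings list, which a real reporting policy would need to consider.

```python
from typing import Set

# Genes listed in Box e for which secondary findings are reported.
SECONDARY_FINDINGS_GENES = {
    "MLH1", "MSH2", "MSH6",   # Lynch syndrome
    "APC",                    # familial adenomatous polyposis
    "MUTYH",                  # MYH-associated polyposis
    "BRCA1", "BRCA2",         # hereditary breast and ovarian cancer
    "VHL",                    # Von Hippel-Lindau syndrome
    "MEN1",                   # multiple endocrine neoplasia type 1
    "RET",                    # multiple endocrine neoplasia type 2
    "LDLR", "APOB", "PCSK9",  # familial hypercholesterolaemia
}


def reportable_secondary(pathogenic_genes: Set[str], opted_in: bool) -> Set[str]:
    """Genes to report as secondary findings, only if the participant opted in."""
    if not opted_in:
        return set()
    return pathogenic_genes & SECONDARY_FINDINGS_GENES


def shared_carrier_genes(mother: Set[str], father: Set[str],
                         recessive_genes: Set[str]) -> Set[str]:
    """Recessive genes in which both parents carry a pathogenic variant."""
    return mother & father & recessive_genes


# Hypothetical couple: both carry a pathogenic CFTR variant.
couple_risk = shared_carrier_genes({"CFTR", "HBB"}, {"CFTR"}, {"CFTR", "HBB"})
```

Keeping the opt-in check inside the reporting function, rather than in the caller, is one way to make it harder to return a finding that the participant did not consent to receive.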

Data protection and data federation: a dynamic tension

In the 100,000 Genomes Project, we shall ensure strict security for patient-identifiable data but

through continued consultation with stakeholder groups and affiliation to the Global Alliance for

Genomics and Health we shall navigate how federation of de-identified data can be achieved to

benefit individual patients and clinical research69 70. For a patient with a very rare disease, the

causative variants and genes will only be established by locating the handful of other cases in the

world and comparing de-identified clinical and genomic data71 72. Cancer genomes are complex,

noisy and further complicated by spatial heterogeneity and mutational evolution over time:

large-scale combining of these genomic and longitudinal clinical data across projects and across

borders will enable real advances in precision oncology, a priority articulated unequivocally in USA

Vice President Biden’s recent Cancer Moonshot initiative73.

Box e: Conditions (genes) for which secondary findings are reported

• Hereditary non-polyposis colorectal cancer / Lynch syndrome (MLH1, MSH2, MSH6)

• Familial adenomatous polyposis (APC)

• MYH-associated polyposis (MUTYH)

• Hereditary breast and ovarian cancer (BRCA1, BRCA2)

• Von Hippel-Lindau syndrome (VHL)

• Multiple endocrine neoplasia type 1 (MEN1)

• Multiple endocrine neoplasia type 2 (RET)

• Familial hypercholesterolaemia (LDLR, APOB, PCSK9)

Progress in recruitment, sequencing and returning patient results

Following piloting in 2014 of patient recruitment, sample collection, sequencing and data analysis for

the rare disease and cancer programmes, the first patients from the NHS Genomic Medicine Centres

were recruited in February 2015 with return of the first results to patients in Newcastle in March

2015 (Box f). Through progressive scaling up of recruitment and sequencing throughput, results will

be returned to >1000 families in 2016.

Discussion

The 100,000 Genomes Project marks a substantial milestone in NHS genomic healthcare, advancing

our management of rare Mendelian disease, cancer and infection (Box c). In addition, potential

applications of genomics for public health and prevention (Box g) will be explored through the

Project by means of return of secondary and reproductive findings.

To date, NHS clinical diagnostic genetic testing and genetic research studies have operated under

quite distinct structures of governance and funding, often resulting in costly and time-consuming

duplication of patient consent, clinical data and sample collection and genomic data generation.

Only a minority of NHS patients are engaged in genetic research studies, whilst for the remainder

their clinical and molecular data remain siloed and inert within the NHS clinical record system.

Through the project, we aim to advance implementation of genomics in healthcare not only to bring

direct benefit to the patients of today but also to enable more efficient, synergistic and sustainable

alignment of patient care with clinical research. This will enable the NHS to become a hub for

genomic research, helping clinicians, academics and partners from industry to derive research

value and clinical insights to also benefit the patients of tomorrow (Box h).

Box f: Programme Landmarks for the 100,000 Genomes Project

Jan 2014: First patient recruited to 100,000 GP pilot

Dec 2014: Announcement of 11 Genomic Medicine Centres

Feb 2015: First patient recruited to the 100,000 GP Main Programme

March 2015: First results returned to pilot patients in Newcastle

Dec 2015: Announcement of two new Genomic Medicine Centres (13 NHS GMCs in total)

March 2016: Sequencing commences at Hinxton Sequencing centre

April 2016: Sequencing of 10,000 whole genomes completed

Box g: Potential applications of genomics in public health and prevention

• Identifying asymptomatic individuals at increased risk of disease: Genetic risk profiling from

cancer susceptibility genes and common risk variants, augmented by non-genetic and life-style

factors, can enable targeting of screening, preventative surgery and chemoprophylaxis to high-risk

groups, improving cancer prevention and early detection.

• Reproductive carrier screening for genetic diseases: To date only offered opportunistically to

those from ethnically high-risk populations, this could be offered more widely and systematically.

• Newborn screening for genetic diseases: The newborn screening programme is currently reliant

on assay of relevant metabolites: mutational screening of the genes implicated in these childhood

diseases may enable earlier and more accurate detection.

• Pharmacogenomic profiling: This can enable life-saving avoidance of toxicity from chemotherapeutic

drugs and precision dosing in widely-used drugs such as warfarin.

Acknowledgements

We should like to thank all patients and participants involved in the 100,000 Genomes Project.

We should like to acknowledge the work of the >1500 NHS staff across 85 NHS hospital trusts

within 13 Genomic Medicine Centres who are enacting this Project. We should like to

acknowledge Sir John Chisholm, Nick Maltby, Vivienne Parry and the full staff of Genomics

England. We should like to recognise the contribution of Sir John Bell, Dame Sally Davies and the

other members of the Genomics England Board, Science Advisory committee, Ethics Advisory

committee, Access Review Committee and numerous advisory and working groups, who have all

supplied considerable time and expertise to guide the project. We should like to recognise our

partnership with NHS England in delivery of this project and our close working with Professor

Sue Hill, James Fisher, Zandra Deans, Val Davison and their teams. We should like to recognise

our partnership with Health Education England in delivery of training and educational materials.

We should like to acknowledge the staff of the NIHR National Biosample Centre and Illumina Inc

in sample processing and sequencing. We should like to acknowledge the support for the

project provided by the National Institute for Health Research, the Wellcome Trust, the Medical

Research Council and Cancer Research UK. We should like to recognise Queen Mary University

of London for hosting Genomics England at their Charterhouse Square campus. The views

expressed in this publication are those of the author(s) and not necessarily those of NHS England

or the Department of Health.

Box h: Combining clinical diagnostics with research: 100,000GP as an exemplar

• Broad Consent: Approval from Health Research Authority Research Ethics Committee (REC) has been

granted for the consent to cover return of clinical results into the routine NHS clinical setting as well as

lifelong data storage and linkage, making participants’ data available to researchers from academia and

industry and re-contact to gain additional data and biosamples.

• Molecular pathology and tissue handling: Applying evidence-based sample handling protocols and rigorous

quality assurance processes in laboratories, the NHS GMCs have re-engineered and optimised sample

handling and molecular pathology pathways to generate the requisite high quality FF tumour tissue and

DNA suitable for high quality WGS.

• Data centralisation: A single central 100,000GP data repository containing linked clinical and full genome

data can be utilised (i) in the identifiable form by NHS clinicians to view the genomic data for individual

patient management and (ii) in the de-identified form by researchers from academia and industry to analyse

cohorts of patients for research and discovery.

• Biobanking of additional biosamples: Harnessing the single overarching consent and phlebotomy

opportunity, samples additional to the DNA including serum, plasma, RNA and cell-free DNA are being

collected and stored, to be made available for researchers to perform functional analyses to validate or

explore genomic findings.

“I Clare Turnbull, The Corresponding Author of this article contained within the original

manuscript which includes any diagrams & photographs within and any related or stand alone

film submitted (the “Contribution”) has the right to grant on behalf of all authors and does grant

on behalf of all authors, a licence to the BMJ Publishing Group Ltd and its licencees, to permit

this Contribution (if accepted) to be published in the BMJ and any other BMJ Group products

and to exploit all subsidiary rights, as set out in our licence set out at:

http://www.bmj.com/about-bmj/resources-authors/forms-policies-and-checklists/copyright-open-access-and-permission-reuse.”

Contribution

The manuscript was drafted by CT with support from MC, RHS, ET, LJ, AR, TH, DH, EM and KS. All

authors reviewed the final manuscript.

References

1. Auton A, Brooks LD, Durbin RM, et al. A global reference for human genetic variation. Nature

2015;526(7571):68-74.

2. Heather JM, Chain B. The sequence of sequencers: The history of sequencing DNA. Genomics

2016;107(1):1-8.

3. Hutchison CA, 3rd. DNA sequencing: bench to bedside and beyond. Nucleic acids research

2007;35(18):6227-37.

4. Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome. Nature

2001;409(6822):860-921.

5. Finishing the euchromatic sequence of the human genome. Nature 2004;431(7011):931-45.

6. Hayden EC. Technology: The $1,000 genome. Nature 2014;507(7492):294-5.

7. Genomic Medicine: Science & Technology Committee Report.

https://hansard.parliament.uk/Lords/2010-06-09/debates/10060963000015/GenomicMedicineSAndTCommitteeReport, 2010.

8. Building on our inheritance: Genomic technology in healthcare Human Genomics Strategy Group,

2012.

9. Lomas DA. Strategic Priorities for 100,000 Genomes Project, 2013.

10. An integrated encyclopedia of DNA elements in the human genome. Nature 2012;489(7414):57-74.

11. Pennisi E. Genomics. ENCODE project writes eulogy for junk DNA. Science (New York, NY)

2012;337(6099):1159, 61.

12. Spielmann M, Mundlos S. Looking beyond the genes: the role of non-coding variants in human

disease. Human molecular genetics 2016;25(R2):R157-65.

13. Smedley D, Schubach M, Jacobsen JO, et al. A Whole-Genome Analysis Framework for Effective

Identification of Pathogenic Regulatory Variants in Mendelian Disease. American journal of

human genetics 2016;99(3):595-606.

14. Ashley EA. Towards precision medicine. Nature reviews Genetics 2016;17(9):507-22.

15. Molecular diagnostic provision in England: For targeted cancer medicines (solid tumour) in the

NHS: Cancer Research UK, 2015.

16. Improving Outcomes: A Strategy for Cancer: Department of Health, 2011.

17. Ensuring equitable access to complex molecular diagnostic testing for cancer patients:

Department of Health, 2012.

18. Keogh B. Personalised Medicine Strategy: NHS England Board, 2015.

19. Improving outcomes through personalised medicine: Working at the cutting edge of science to

improve patients’ lives: NHS England, 2016.

20. Marx V. The DNA of a nation. Nature 2015;524(7566):503-5.

21. The Genomics England Clinical Interpretation Partnership. Nature, 2014.

22. Rare Disease UK [Available from: https://www.raredisease.org.uk/what-is-a-rare-disease/.

23. International Rare Diseases Research Consortium: Policies and Guidelines, 2013.

24. The UK Strategy for Rare Diseases: Department of Health, 2013.

25. Inserm Orphanet: Portal for rare diseases and orphan drugs [updated 6/10/2016. Available from:

http://www.orpha.net/consor/cgi-bin/index.php.

26. Commission expert group on rare diseases: European Commission, 2016.

27. Swaminathan GJ, Bragin E, Chatzimichali EA, et al. DECIPHER: web-based, community resource

for clinical interpretation of rare variants in developmental disorders. Human molecular

genetics 2012;21(R1):R37-44.

28. Large-scale discovery of novel genetic causes of developmental disorders. Nature

2015;519(7542):223-8.

29. Firth HV, Wright CF. The Deciphering Developmental Disorders (DDD) study. Developmental

medicine and child neurology 2011;53(8):702-3.

30. Human Phenotype Ontology: New HPO Release [updated September 3, 2016. Available from:

http://human-phenotype-ontology.github.io/.

31. Kohler S, Doelken SC, Mungall CJ, et al. The Human Phenotype Ontology project: linking

molecular biology and disease through phenotype data. Nucleic acids research

2014;42(Database issue):D966-74.

32. National Congenital Anomaly and Rare Disease Registration Service (NCARDRS): A service to

support clinicians and patients, service delivery, commissioning and public health.: Public

Health England, 2016.

33. Genomics England: PanelApp 2016 [Available from:

https://panelapp.extge.co.uk/crowdsourcing/PanelApp/.

34. Rare Diseases Translational Research Collaboration (TRC) 2014 [Available from:

http://www.nihr.ac.uk/about/rare-diseases-translational-research-collaboration.htm.

35. Lettre G. The search for genetic modifiers of disease severity in the beta-hemoglobinopathies.

Cold Spring Harbor perspectives in medicine 2012;2(10).

36. Gallati S. Disease-modifying genes and monogenic disorders: experience in cystic fibrosis. The

application of clinical genetics 2014;7:133-46.

37. Hamilton BA, Yu BD. Modifier genes and the plasticity of genetic networks in mice. PLoS genetics

2012;8(4):e1002644.

38. Ahmad AS, Ormiston-Smith N, Sasieni PD. Trends in the lifetime risk of developing cancer in

Great Britain: comparison of risk for those born from 1930 to 1960. British journal of cancer

2015;112(5):943-7.

39. Cancer incidence statistics [cited 2016. Available from:

http://www.cancerresearchuk.org/health-professional/cancer-statistics/incidence.

40. Progress in improving cancer services and outcomes in England: National Audit Office, 2015.

41. Yates LR, Campbell PJ. Evolution of the cancer genome. Nature reviews Genetics

2012;13(11):795-806.

42. Behjati S, Huch M, van Boxtel R, et al. Genome sequencing of normal cells reveals developmental

lineages and mutational processes. Nature 2014;513(7518):422-5.

43. Hudson TJ, Anderson W, Artez A, et al. International network of cancer genome projects. Nature

2010;464(7291):993-8.

44. Forbes SA, Beare D, Gunasekaran P, et al. COSMIC: exploring the world's knowledge of somatic

mutations in human cancer. Nucleic acids research 2015;43(Database issue):D805-11.

45. Vogelstein B, Papadopoulos N, Velculescu VE, et al. Cancer genome landscapes. Science (New

York, NY) 2013;339(6127):1546-58.

46. Weinstein JN, Collisson EA, Mills GB, et al. The Cancer Genome Atlas Pan-Cancer analysis project.

Nature genetics 2013;45(10):1113-20.

47. LeBlanc VG, Marra MA. Next-Generation Sequencing Approaches in Cancer: Where Have They

Brought Us and Where Will They Take Us? Cancers 2015;7(3):1925-58.

48. Pon JR, Marra MA. Driver and passenger mutations in cancer. Annual review of pathology

2015;10:25-50.

49. Schmidt KT, Chau CH, Price DK, et al. Precision Oncology Medicine: The Clinical Relevance of

Patient Specific Biomarkers Used to Optimize Cancer Treatment. Journal of clinical

pharmacology 2016.

50. The 100,000 Genomes Project Protocol: Genomics England, 2015.

51. O'Leary B, Turner NC. Science in Focus: Circulating Tumour DNA as a Liquid Biopsy. Clinical

oncology (Royal College of Radiologists (Great Britain)) 2016.

52. Tumour Assessment for Whole Genome Sequencing: Health Education England, 2016.

53. Public Health England: National Radiotherapy Dataset (RTDS) 2016 [Available from:

http://www.ncin.org.uk/collecting_and_using_data/rtds.

54. Public Health England: Systemic Anti-Cancer Therapy Dataset (Chemotherapy) 2016 [Available

from: http://www.ncin.org.uk/collecting_and_using_data/data_collection/chemotherapy.

55. Cancer Outcomes and Services Dataset (COSD) 2016 [Available from:

http://www.ncin.org.uk/collecting_and_using_data/data_collection/cosd.

56. NHS Digital: Hospital Episode Statistics 2016 [Available from: http://content.digital.nhs.uk/hes.

57. The Farr Institute [Available from: http://www.farrinstitute.org/about.

58. Surveillance of Infectious Disease: Parliamentary Office of Science and Technology, 2014.

59. Didelot X, Bowden R, Wilson DJ, et al. Transforming clinical microbiology with bacterial genome

sequencing. Nature reviews Genetics 2012;13(9):601-12.

60. Pankhurst LJ, Del Ojo Elias C, Votintseva AA, et al. Rapid, comprehensive, and affordable

mycobacterial diagnosis with whole-genome sequencing: a prospective study. The Lancet

Respiratory medicine 2016;4(1):49-58.

61. Genomics Education Programme: Health Education England, 2015.

62. The Genomics Era: the Future of Genetics in Medicine: St Georges University of London, 2015.

63. Earning Trust. Public Engagement and Patient Involvement Strategy 2015-17: Genomics England,

2015.

64. What do patients with rare genetic conditions think about whole genome sequencing in the

NHS? Research Findings for the 100,000 Genomes Project: Genetic Alliance UK, 2014.

65. Ethical issues relating to involvement of cancer patients in the 100,000 genomes project:

Genomics England, 2014.

66. Call for members of new Participant Panel [Available from:

https://www.genomicsengland.co.uk/participant-panel/.

67. ACMG policy statement: updated recommendations regarding analysis and reporting of

secondary findings in clinical genome-scale sequencing. Genetics in medicine : official

journal of the American College of Medical Genetics 2015;17(1):68-9.

68. Green RC, Berg JS, Grody WW, et al. ACMG recommendations for reporting of incidental findings

in clinical exome and genome sequencing. Genetics in medicine : official journal of the

American College of Medical Genetics 2013;15(7):565-74.

69. Genomic Data Sharing: Qualitative research report: Genomics England, 2014.

70. GENOMICS. A federated ecosystem for sharing genomic, clinical data. Science (New York, NY)

2016;352(6291):1278-80.

71. Buske OJ, Schiettecatte F, Hutton B, et al. The Matchmaker Exchange API: automating patient

matching through the exchange of structured phenotypic and genotypic profiles. Human

mutation 2015;36(10):922-7.

72. Philippakis AA, Azzariti DR, Beltran S, et al. The Matchmaker Exchange: a platform for rare

disease gene discovery. Human mutation 2015;36(10):915-21.

73. Kaiser J, Couzin-Frankel J. BIOMEDICAL RESEARCH. Biden seeks clear course for his cancer

moonshot. Science (New York, NY) 2016;351(6271):325-6.
