
Tackling rare disease with big and small data

January 2017


Big data promises to create valuable insights in rare disease. Technologies such as next-generation sequencing and natural-language processing, alongside whole-exome analyses and other novel scientific approaches, are helping clinicians treat patients who previously had no therapeutic options. At the same time, deep interrogation of smaller patient samples can provide information of great benefit to developers of orphan drugs. Realizing the full potential of big data will require models that can also integrate intelligence from datasets that are small, writes Pete Chan.


In 2012, a group of researchers organized a crowdsourcing competition to shed new light on amyotrophic lateral sclerosis (ALS), a rare neurodegenerative disease. Participants were given three months of data from ALS patients who had taken part in clinical trials and asked to predict how the disease would progress in the same individuals over the following nine months. More than 1,000 teams from over 60 countries stepped up to the challenge. Two winning groups created algorithms that outperformed the predictions of a panel of leading ALS clinicians, and they both picked up prize money of US$20,000 (Küffner et al., 2015; Zach et al., 2015). One algorithm discriminated perfectly between individuals with slow- and fast-progressing ALS: potentially useful insight for the stratification of patient cohorts in clinical trials. The organizers of the competition, known as the ALS Prediction Prize, estimated that by modeling the progression of disease in individual ALS patients, the two algorithms could help reduce the number of patients required for a hypothetical clinical study by up to 20%. (In diseases with a wide range in natural rates of progression, clinical trials need larger numbers of patients to help discern the effects of the investigational drug.)

Table 1: Clinical trials in PRO-ACT

1 Clinical trial of arimoclomol in ALS
2 Clinical trial of creatine in ALS
3 Clinical trial of celecoxib in ALS
4 Clinical trial of gabapentin in ALS
5 Clinical trial of lithium in combination with riluzole in ALS
6 Clinical trial of rHBDNF in ALS
7 Clinical trial of rHCNTF in ALS
8 Clinical trial of riluzole in ALS
9 Clinical trial of riluzole in the treatment of advanced ALS
10 Clinical trial of TCH346 in ALS
11 Clinical trial of talampanel in ALS
12 Clinical trial of topiramate in ALS
13 French prospective observational study in ALS
14 Clinical trial of vitamin E in ALS
15 Clinical trial of xaliproden in ALS: first Phase III trial
16 Clinical trial of xaliproden in ALS: second Phase III trial
17 Unpublished clinical trial of xaliproden in advanced ALS

Abbreviation: PRO-ACT = Pooled Resource Open-Access ALS Clinical Trials database

Source: Atassi et al., 2014

Table 2: PRO-ACT in numbers

Data category              No. subjects  No. records  No. values
Adverse events                    8,628       74,545     748,566
ALSFRS(R)                         6,844       60,775     791,473
Concomitant medications           7,656      111,848     376,098
Death report                      4,633        4,634       8,033
Demographics                     10,723       10,723      39,107
Family history                    1,007        1,071       2,452
Forced vital capacity             8,848       48,856     200,200
Laboratory data                   8,342    2,445,059   9,659,191
Riluzole use                      8,817        8,817      17,633
Slow vital capacity               2,717        9,525      25,532
Subject ALS history               9,394       12,058      35,967
Treatment group                   9,640        9,640      16,830
Vital signs                       9,973       72,422     717,715

Abbreviations: ALS = amyotrophic lateral sclerosis; ALSFRS(R) = revised ALS Functional Rating Scale; PRO-ACT = Pooled Resource Open-Access ALS Clinical Trials database

Source: PRO-ACT, 2011
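The link between variable progression rates and trial size, noted in the ALS Prediction Prize estimate above, can be illustrated with a standard textbook sample-size formula for comparing two group means. This is a minimal sketch only: the article does not say how the organizers arrived at their 20% figure, and the function name, the spread of progression rates (`sigma`), the treatment effect (`delta`) and the assumption that a prediction model explains 20% of the variance are all hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_per_arm(sigma: float, delta: float,
                     alpha: float = 0.05, power: float = 0.8) -> int:
    """Patients per arm needed to detect a mean difference `delta`
    when the outcome has standard deviation `sigma` (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical values: unexplained spread in progression rate, and the
# smallest drug effect the trial should be able to detect.
baseline = patients_per_arm(sigma=1.0, delta=0.4)

# Assumption: a prediction model accounts for 20% of the variance in
# progression, so only 80% of it remains as unexplained noise.
adjusted = patients_per_arm(sigma=sqrt(0.8), delta=0.4)

print(f"Patients per arm without model: {baseline}")
print(f"Patients per arm with model:    {adjusted}")
```

Because the required number of patients scales with the variance of the outcome, any share of the variance that a model can explain translates into a roughly equal proportional saving in enrolment.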

The pioneering application of machine-learning algorithms to ALS research was made possible by PRO-ACT, an open-access repository of longitudinal clinical trial data (Atassi et al., 2014; PRO-ACT, 2011). At the time, it held data on more than 8,600 people who had taken part in Phase II/III ALS studies between 1990 and 2010. Rival teams were given some sample data to design their algorithms, before putting these to work on the ALS Prediction Prize dataset. PRO-ACT was officially launched in December 2012 with eight million data points, growing since then to more than 10 million (see Tables 1 and 2). More than 400 researchers, including representatives of around 40 pharma companies, requested access to PRO-ACT within two years of its launch (Zach et al., 2015).

Great expectations from big data

PRO-ACT and the research projects it has enabled illustrate how big data approaches can be applied to biomedical research in rare disease. They will inspire those who are convinced of the role of big data in the orphan drug sector, not just in clinical trials but also R&D more broadly. Their excitement is understandable. On the one hand, they are faced with the familiar challenges of rare disease research, including: small patient cohorts; poor understanding of epidemiology; lack of natural history studies; and the variable quality of patient registries. On the other, they are bombarded with a steady stream of health-related, big data success stories, with benefits ranging from the prediction of patient responses to drugs and side effects through to better patient segmentation and the delivery of personalized medicine. They hope big data might do for rare disease what it has delivered in common medical conditions.


Delve a little deeper, though, and you’re just as likely to find skeptics who argue that big data and rare disease research are two different and incompatible worlds. Why the divergent views?

The main reason is lack of consensus about how to define big data: the UC Berkeley School of Information lists no fewer than 43 definitions (Dutcher, 2014). The definition that resonates with most is the principle that big data should have three Vs: volume, velocity and variety. Critics say rare disease data – collected from small patient populations, but difficult to source and often of dubious quality – certainly fails on the volume measure, and possibly the other two as well.

A more helpful perspective comes from Viktor Mayer-Schönberger and Kenneth Cukier, whose 2013 book, Big Data: A Revolution That Will Transform How We Live, Work, and Think, helped stoke big-data fervor among the masses. “When we talk about big data, we mean ‘big’ less in absolute than in relative terms: relative to the comprehensive set of data,” they wrote (Mayer-Schönberger and Cukier, 2013). Their argument is that data practitioners shouldn’t get hung up on the number of data points they gather; instead they should view big data as using “as much of the entire dataset as feasible”. By their logic, sequencing the entire genome of a person with a rare disease and using the data to help that individual qualifies as a big data approach. PRO-ACT, now bringing together 25 years’ worth of longitudinal data, is the single largest effort to assemble the entire dataset of clinical trials in ALS.

Daphna Laifenfeld, Director, Personalized Medicine and Pharmacogenomics at Teva Pharmaceutical Industries, defines big data as the combination of genetics, omics [a term used to describe disciplines of biology such as proteomics and transcriptomics], patient-reported and clinical data (Laifenfeld, 2016). Her list could be expanded further but, for researchers, it’s a helpful guide for dividing into familiar categories what people really mean when they talk about big data in medicine.

Viewed through this lens, it’s apparent that innovative big data methodologies are being implemented in rare disease. Key applications include: drug discovery; the discovery of disease-related genes, genetic mutations and biomarkers; matchmaking of rare disease cases to help diagnose patients; and drug repurposing.

Going deep into the genome

It’s a diverse list. But even a cursory glance at the literature reveals that most efforts are focussed on genomics. There are scientific and economic drivers at work here: the advent of next-generation sequencing has made it feasible to sequence the entire genomes of humans at a reasonable cost. The other factor is specific to rare disease: the fact that 80% of the known orphan conditions result from genetic defects, and that the majority of these are monogenic.

But, to understand which genetic variants are implicated in rare diseases, researchers first need to filter out those variants that occur normally. This is no trivial task, given the tens of thousands of genetic variants that occur in a typical exome (the 1-2% of a genome that codes for proteins). And it is here that big data analysis has proven invaluable.

In a stellar example of international academic collaboration, the Exome Aggregation Consortium (ExAC) has aggregated genetic sequencing data from around 20 separate research studies, creating an open-access database of genetic variants in more than 60,000 people; in other words, the genetic variation we might expect to find in a normal population (see Figure 1). Writing in Nature, the ExAC team described their undertaking as “the most comprehensive catalogue (to our knowledge) of human protein-coding genetic variation to date” (Lek et al., 2016). Since the launch of the ExAC database in 2014, researchers the world over have interrogated the resource, including the 10 million identified variants, principally to better understand the genetic variations seen in rare disease patients.

In a Broad Institute public lecture, Daniel MacArthur, the researcher who led the ExAC consortium, said: “We’ve now sequenced in our lab more than 1,000 families affected by a rare disease. For more than 400 of those families, we’ve been able to give them back a diagnosis and, for several dozen of those families, it’s been possible to convert what had previously been an untreatable disease into a disease where it’s actually possible to give a medication to alleviate at least some of those symptoms. That small fraction will grow as we begin to develop more and better drugs to treat rare diseases.” (Broad Institute, 2016)

One of Dr MacArthur’s case studies involved two sisters with a rare condition that led to extreme weakness in the facial muscles. Before the girls’ DNA was sent to Broad, the family had been through nine years of muscle biopsies, pathology tests and other procedures, none of which identified the cause of their disease. Thanks to the availability of the ExAC database, Dr MacArthur managed to trace it back to two extremely rare mutations in a gene known as LMOD3. The sisters were diagnosed with nemaline myopathy. And within six months, a dozen other families with the same mutation were identified, creating a small network of people who had been medically isolated just a year before.
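The filtering idea behind a diagnosis like this one, discarding variants that are common in a reference population and keeping the rare ones shared by affected relatives, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the Broad Institute's pipeline or the ExAC interface: the `Variant` class, the function names, the frequency cut-off and every gene, variant and frequency in the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    gene: str
    change: str  # hypothetical label, not real variant nomenclature


def rare_candidates(patient_variants, reference_freq, max_freq=0.0001):
    """Keep variants that are absent from, or very rare in, the reference."""
    return [v for v in patient_variants
            if reference_freq.get(v, 0.0) <= max_freq]


def shared_candidate_genes(per_patient_candidates):
    """Genes that carry a rare candidate variant in every affected relative."""
    gene_sets = [{v.gene for v in cands} for cands in per_patient_candidates]
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy reference population: variant -> allele frequency (made-up numbers).
reference_freq = {
    Variant("GENE_A", "v1"): 0.12,
    Variant("GENE_B", "v2"): 0.03,
    Variant("GENE_C", "v3"): 0.00001,
}

# Toy exomes for two affected siblings (made-up variants).
sibling_1 = [Variant("GENE_A", "v1"), Variant("GENE_B", "v2"),
             Variant("GENE_C", "v3"), Variant("GENE_C", "v4")]
sibling_2 = [Variant("GENE_A", "v1"), Variant("GENE_C", "v3"),
             Variant("GENE_C", "v4"), Variant("GENE_D", "v5")]

candidates = [rare_candidates(s, reference_freq) for s in (sibling_1, sibling_2)]
shared = shared_candidate_genes(candidates)
print(shared)
```

In the toy data, the common variants in GENE_A and GENE_B are filtered out, GENE_D is rare but seen in only one sibling, and GENE_C is left as the single shared candidate. A larger reference population makes the frequency estimates, and therefore the filter, more trustworthy.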

Reflecting on the limitations of their resource, Dr MacArthur and colleagues explained that most ExAC samples are not accompanied by detailed phenotypic data; that is, information on the symptoms and other observable properties in an individual (Lek et al., 2016). This is an important point. The ability to link genotypic and phenotypic data is precisely what’s needed if the troves of big data generated by DNA sequencing are to be interpreted correctly, and translated into patient benefit in clinical settings. Genotype-to-phenotype connections need to be made not only in individual cases, but also in unrelated people if scientists are to be confident in a condition’s genetic cause.

Figure 1: ExAC in numbers (figure not reproduced). Items shown: no. natural genetic variants identified; no. contributing authors in Nature paper; no. international research studies donating data; no. exomes in raw dataset; no. exomes in final dataset after filtering for quality. Sources: Broad Institute, 2016; Lek et al., 2016

Spyros Mousses, founder and president of Systems Imagination, a data analytics company, says researchers are routinely “looking at billions of measurements from an individual’s genome”: activities he calls “deep genotyping” (RARECast, 2016). But in his view, the depth of analysis being performed in genomics is absent from phenomes. “We’re measuring not billions but dozens of traits and clinical phenotypes,” said Dr Mousses. Andrew Morris, a director of the Farr Institute, a UK-based specialist in health informatics, wants to see the health-data debate shift towards “deep phenotyping” (Morris, 2016).


Matchmaking for clinicians in rare disease

Going some way to bridge this gap, a Canadian-led team has created PhenomeCentral, an online matchmaking service for clinicians and researchers working in rare disease, often those whose patients have yet to receive a diagnosis. PhenomeCentral aggregates phenotypic and genotypic data from FORGE Canada, CARE for RARE, the US NIH Undiagnosed Diseases Project and other rare disease-focused consortia (Buske et al., 2015). PhenomeCentral users query the database by submitting a patient record that includes clinical symptoms and any available information on patients’ genetic variants. PhenomeCentral’s algorithms mine the phenotypic data held in the repository, identifying patients most likely to have the same condition, and predicting which genes or genetic variants might be responsible. Users are then able to contact others whose patient cases match theirs, hopefully leading to a positive diagnosis. In 2015, PhenomeCentral held records on more than 1,000 deeply-phenotyped rare disease patients. Most had had their exomes sequenced and remained undiagnosed.
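To make the matchmaking idea concrete, the sketch below ranks stored patient records by how much their recorded symptoms overlap with a query record. It is a deliberately simple stand-in: the article does not describe PhenomeCentral's actual algorithms, so Jaccard similarity over plain-text symptom sets is swapped in for illustration, and the record identifiers, symptom terms and function names are all hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of symptoms two records have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_matches(query: set, records: dict, top_n: int = 3) -> list:
    """Return up to `top_n` record IDs with non-zero overlap, best first."""
    scored = [(jaccard(query, terms), rid) for rid, terms in records.items()]
    # Highest score first; ties broken alphabetically by record ID.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(rid, round(score, 2)) for score, rid in scored[:top_n] if score > 0]


# Toy repository of de-identified records (made-up data).
records = {
    "case-001": {"facial muscle weakness", "feeding difficulty", "low muscle tone"},
    "case-002": {"kidney failure", "proteinuria"},
    "case-003": {"facial muscle weakness", "low muscle tone", "scoliosis"},
}

query = {"facial muscle weakness", "low muscle tone"}
matches = rank_matches(query, records)
print(matches)
```

A production service would more plausibly work with a controlled vocabulary of phenotype terms and weight rare symptoms more heavily than common ones, but the ranking step has the same shape: score every stored case against the query and surface the closest ones to the clinician.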

Achieving scale is an acknowledged challenge in rare disease, but PhenomeCentral will surely be aided in this respect by its decision to join The MatchMaker Exchange (MME), a network of matchmaking services, each with its own cohort of users (Philippakis et al., 2015). Under this model, researchers have the option of querying not just PhenomeCentral but also other members of the MME network at the same time. More shots on goal give them a better chance of finding a patient match.

Elsewhere, two high-profile initiatives promise to integrate many more diverse sources of data beyond genotypes and phenotypes, and have received plenty of attention for their big data ambitions. Later this year, the UK’s 100,000 Genomes Project is expected to have sequenced the genomes of 25,000 cancer patients and around 17,000 people with rare diseases, as well as their families (Genomics England, 2015). Alongside genomic data, the project will also collect clinical data, pathology and histopathology results, imaging results, information on treatments and risk factors, hospital records, and other data gathered during the life course of patients (Hill, 2016). In other words, deep phenotypic and longitudinal data on the sort of scale that the country’s National Health Service (NHS), among comparable systems globally, is uniquely placed to provide.

And working at international level, RD-Connect is an EU FP7-funded project that aims to break down historical data silos in rare disease. A key objective is to make it easier for the rare disease research community to share data. To this end, it is creating a platform to integrate patient registries, biobanks and databases of genomic, phenotypic, natural history and clinical trial data (McCormack, 2016; Thompson et al., 2014).

RD-Connect piloted its model by pulling in data from two European research consortia: NeurOmics, with a focus on rare neurodegenerative and neuromuscular disorders, and EURenOmics in the field of rare kidney disorders, with each contributing around 1,000 sequenced exomes. The Broad Institute, Newcastle University and other international partners have come on board more recently (see Figure 2).

Figure 2: Initiatives contributing exome data to RD-Connect (figure not reproduced). Initiatives shown: SeqNMD (US); NCNP Japan; EURenOmics (EU); CMG Slovenia; CNAG Rare (Spain); NeurOmics (EU); MYO-SEQ (UK). Key: 1,000 exomes; 500 exomes; 300 exomes. Source: McCormack, 2016

Cognitive assistant for digital doctors

Meanwhile, one of the world’s best-known artificial intelligence systems is being piloted in two rare disease projects, with the aim of creating what some describe as a digital doctor’s assistant.

For the past year, orphan disease researchers at Boston Children’s Hospital have been training IBM Watson, the tech company’s flagship cognitive computing platform, to understand steroid-resistant nephrotic syndrome (SRNS), a rare kidney disease (IBM, 2015). Watson first gained fame by winning the gameshow Jeopardy! in 2011. Since then, it has gone on to capture the imagination of the data science community with its ability to analyze large quantities of data, to understand complex questions posed in natural language, and to propose evidence-based answers.

The Boston team fed medical literature and clinical data relating to SRNS into Watson, before adding genomic data from patients retrospectively. This is the first time Watson has been used to help doctors diagnose rare disease and identify treatment options – and the results will be eagerly awaited. If it proves successful in SRNS, the plan is to extend the approach to neurologic disorders and other rare pediatric diseases studied at Boston Children’s.

And at the end of 2016, researchers in Germany kicked off their own 12-month pilot project with Watson, to evaluate its potential to diagnose any rare disease (IBM, 2016; Marks, 2016). The Center for Undiagnosed and Rare Diseases at the University Hospital Marburg has been contacted by more than 6,000 patients since it opened in 2013. Most patients have brought with them years of unstructured data from their medical histories, including: lab test results; clinical reports; pathology reports; and drugs they’ve been prescribed. For the Marburg researchers to review all this information and combine it with their own knowledge and the medical literature to reach a diagnosis typically takes several days for each patient.

The hope is that Watson will be able to automate and accelerate the process, quickly presenting physicians with a list of possible hypotheses from which they can make their own data-driven diagnoses. In a further test of Watson’s capabilities in natural-language processing, the Marburg pilot will require patients’ medical histories recorded in German to be matched up with the body of rare disease-related literature published in English.

Time to downsize

All well and good. Yet a limitation that is common to virtually all the initiatives described above is the absence of the views of patients. This is an important missed opportunity, given that rare disease patients and families are in many cases experts in their own conditions, capable of interacting with health providers on a professional level, and contributing insights that only they possess.

Addressing this issue requires acceptance that while great insights can be gleaned from huge datasets, equally valuable and complementary intelligence can be derived from rigorous interrogation of datasets that are relatively small. As it happens, a small data movement has also emerged in the past few years, its loudest cheerleader being Martin Lindstrom, the Danish author of Small Data: The Tiny Clues That Uncover Huge Trends (Lindstrom, 2016). Mr Lindstrom’s world is that of marketing and branding, but it doesn’t take a huge leap to apply his principles of keen observation of small samples to people living with rare disease.

And recent work in the field of patient-reported outcomes (PROs) has provided evidence that patient-generated medical data can be of comparable quality to data gathered from traditional sources. A group of US researchers conducted a proof-of-concept study using the chronic lymphocytic leukemia (CLL) community of PatientsLikeMe, a patient-powered research network. There are several PRO instruments specific to CLL, meaning the supporting literature contains data the researchers could use as comparators. Using a combination of online surveys and telephone interviews, they found good alignment between the symptoms that members of PatientsLikeMe’s CLL community said were important to them, and those identified through traditional interviews and patient focus groups (McCarrier et al., 2016).

Raremark has also been exploring how to involve patients in the area of data sharing and donation, the reasons for doing so, and the implications for the patient community. In line with the small data model, we posed a series of well-defined questions to small groups of patients, both online and over the phone. The study sample comprised Raremark users with an interest in three rare diseases: adrenoleukodystrophy, myasthenia gravis and Sanfilippo syndrome. Work conducted from November 2016 to January 2017 revealed an understanding of the importance of data sharing for the benefit of others, and a willingness to do so: 94% of participants said they would feel comfortable sharing selected health-related information about themselves with the community and the pharmaceutical industry.

Raremark’s findings reflect the results of a larger RD-Connect study that included similar themes. As long as the right governance systems are in place, RD-Connect discovered, the rare disease patient community generally has a positive view on the sharing of data to support medical research. “All the participants understood the incentive for [rare disease] in sharing data and samples; in fact, there were several pleas for research systems to be standardised across the EU in order to make data sharing easier,” the authors wrote in the European Journal of Human Genetics (McCormack et al., 2016).

Intelligence: from artificial to human

Watch a presentation by Dr Mousses of Systems Imagination and you’ll be left with big data-driven visions of the future. Machines will be able to gather medical data, create their own models and test hypotheses in vast numbers without the help of humans. They will also be able to look at medical images and extract billions of features for interpretation: a level of resolution that would simply be impossible for pathologists. Pointing out that traditional evidence-based medicine has failed in rare disease, he uses the term “intelligence-based medicine” to describe the mining of deep data from rare disease patients – genomic, phenotypic and biometric – before these are integrated, using machine learning, and analyzed for the benefit of those individuals (Global Genes, 2016).

If the ALS Prediction Prize is any guide (remarkably, four-fifths of competitors had virtually no previous experience in the condition), injections of fresh thinking from smart people in non-health disciplines may reveal exciting possibilities yet to be imagined.

Machines will not be able to model some truly human things, such as how to explain to another human what it’s like to live day to day with a rare medical condition, or whether a drug’s supposed benefits deliver outcomes that are meaningful to the person taking it. For these insights, the only true source will be patients. For big data-derived intelligence to translate into real benefit for the rare disease community, we need workable models for combining very large datasets with the very small.

Pete Chan is Head of Research & Analysis at Raremark.

Email: [email protected].

References

Atassi, N. et al. (2014) ‘The PRO-ACT database: Design, initial analyses, and predictive features’, Neurology, 83(19), pp. 1719–1725. doi: 10.1212/wnl.0000000000000951.

Broad Institute (2016) Midsummer nights’ science: Using big data to understand rare diseases. Available at: https://www.youtube.com/watch?v=GFNn7z7OWU8&feature=youtu.be (Accessed: 7 January 2017).

Buske, O.J. et al. (2015) ‘PhenomeCentral: A portal for phenotypic and genotypic matchmaking of patients with rare genetic diseases’, Human Mutation, 36(10), pp. 931–940. doi: 10.1002/humu.22851.

Dutcher, J. (2014) What is big data? Available at: https://datascience.berkeley.edu/what-is-big-data (Accessed: 7 January 2017).

Genomics England (2015) Genomics England and the 100,000 Genomes Project. Available at: https://www.genomicsengland.co.uk/wp-content/uploads/2015/05/Genomics-Englad-Narrative-May-20152.pdf (Accessed: 7 January 2017).

Global Genes (2016) Big data and intelligence-based medicine. Available at: https://www.youtube.com/watch?v=cTTCDReujdE&feature=youtu.be (Accessed: 7 January 2017).

Hill, S. (2016) Beyond 100,000 Genomes: Transforming the NHS into a personalised medicine service. BioData World Congress 2016. Cambridge, UK. 27 October 2016.

IBM (2015) Boston Children’s Hospital to tap IBM Watson to tackle rare pediatric diseases. Available at: http://www-03.ibm.com/press/us/en/pressrelease/48031.wss (Accessed: 7 January 2017).

IBM (2016) Rhön-Klinikum hospitals to study how IBM Watson can support doctors in the diagnosis of rare diseases. Available at: https://www-03.ibm.com/press/us/en/pressrelease/50803.wss (Accessed: 7 January 2017).

Küffner, R. et al. (2015) ‘Crowdsourced analysis of clinical trial data to predict amyotrophic lateral sclerosis progression’, Nature Biotechnology, 33, pp. 51–57.

Laifenfeld, D. (2016) Preventive and predictive genetics: towards personalised medicine. BioData World Congress 2016. Cambridge, UK. 26 October 2016.

Lek, M. et al. (2016) ‘Analysis of protein-coding genetic variation in 60,706 humans’, Nature, 536(7616), pp. 285–291. doi: 10.1038/nature19057.

Lindstrom, M. (2016) Small Data: The Tiny Clues That Uncover Huge Trends. London, United Kingdom: John Murray Learning.

Marks, P. (2016) Dr House goes digital as IBM’s Watson diagnoses rare diseases. Available at: https://www.newscientist.com/article/2109354-dr-house-goes-digital-as-ibms-watson-diagnoses-rare-diseases/ (Accessed: 7 January 2017).

Mayer-Schönberger, V. and Cukier, K. (2013) Big data: A Revolution That Will Transform How We Live, Work, and Think. London: John Murray Publishers.

McCarrier, K.P. et al. (2016) ‘Concept elicitation within patient-powered research networks: A feasibility study in chronic lymphocytic leukemia’, Value in Health, 19(1), pp. 42–52. doi: 10.1016/j.jval.2015.10.013.

McCormack, P. (2016) RD-Connect: Big data for rare disease. Available at: http://www.geneticdisordersuk.org/static/media/up/GDLS2016_PaulineMccormack_RDConnect.pdf (Accessed: 7 January 2017).

McCormack, P. et al. (2016) ‘“You should at least ask”. The expectations, hopes and fears of rare disease patients on large-scale data and biomaterial sharing for genomics research’, European Journal of Human Genetics, 24(10), pp. 1403–1408. doi: 10.1038/ejhg.2016.30.

Morris, A. (2016) Options and opportunities for health data science in the UK. BioData World Congress 2016. Cambridge, UK. 27 October 2016.

Philippakis, A. et al. (2015) ‘The Matchmaker Exchange: A platform for rare disease gene discovery’, Human Mutation, 36(10), pp. 915–921. doi: 10.1002/humu.22858.

PRO-ACT (2011) Available at: https://nctu.partners.org/ProACT/ (Accessed: 7 January 2017).

RARECast (2016) Harnessing big data to work for rare disease patients by RARECast Global Genes. Available at: https://soundcloud.com/rarecast/harnessing-big-data-to-work-for-rare-disease-patients (Accessed: 7 January 2017).

Thompson, R. et al. (2014) ‘RD-Connect: An integrated platform connecting databases, registries, biobanks and clinical bioinformatics for rare disease research’, Journal of General Internal Medicine, 29(S3), pp. 780–787. doi: 10.1007/s11606-014-2908-8.

Zach, N. et al. (2015) ‘Being PRO-ACTive: What can a clinical trial database reveal about ALS?’, Neurotherapeutics, 12(2), pp. 417–423. doi: 10.1007/s13311-015-0336-z.