ELIMINATING BIAS IN CANCER RISK ESTIMATES A SIMULATION …osting/pub/SaradhaRajamaniThesis.pdf ·...

ELIMINATING BIAS IN CANCER RISK ESTIMATES A SIMULATION STUDY by SARADHA RAJAMANI A Project submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Master of Statistics Department of Mathematics The University of Utah October 2016



ELIMINATING BIAS IN CANCER RISK ESTIMATES

A SIMULATION STUDY

by

SARADHA RAJAMANI

A Project submitted to the faculty of

The University of Utah

in partial fulfillment of the requirements for the degree of

Master of Statistics

Department of Mathematics

The University of Utah

October 2016


Copyright © SARADHA RAJAMANI 2016

All Rights Reserved


The University of Utah Graduate School

STATEMENT OF DISSERTATION APPROVAL

THIS PAGE IS A PLACE HOLDER ONLY

Please use the updated form on the Thesis Office website

The dissertation of SARADHA RAJAMANI

has been approved by the following supervisory committee members:

BRAXTON OSTING, Chair(s), Date Approved: 17 Aug 2016

ELISHA HUGHES, Member, Date Approved: 17 Aug 2016

TOM ALBERTS, Member, Date Approved: 17 Aug 2016


ABSTRACT

Hereditary breast cancer is aggressive and is known to affect women at an early age. Lifetime and age-specific cancer risk estimates must be accurate because the management of the disease depends on them. The recommended actions for high-risk individuals range from frequent screening to prophylactic surgery, with major consequences for the patient's quality of life.

Case-control studies estimate the relative risk of cancer in individuals harboring genetic mutations that predispose them to disease. The prevalence of highly penetrant gene mutations is less than 1%. For such rare mutations, population-based random sampling would yield an underpowered study. Hence, most studies are conducted in high-risk families, which creates bias in the risk estimates.

In this project, we simulated the sampling conditions in hereditary genetic testing laboratories such as Myriad Genetics Inc., and the factors that contribute to bias in this type of study. We have shown that the bias is due to ascertainment on family history and personal cancer condition. We were able to demonstrate that, no matter its severity, the bias can be eliminated by properly accounting for it in the logistic regression model that predicts cancer risk based on mutation status. We also investigated non-hereditary factors, such as environmental exposure and age, that confound the relationship between genetic variant status and the occurrence of cancer.


To my beloved parents.

“No amount of experimentation can ever prove me right; a single experiment can prove me wrong.”

– Albert Einstein


CONTENTS

ABSTRACT

LIST OF TABLES

LIST OF FIGURES

ACKNOWLEDGMENTS

CHAPTERS

1. INTRODUCTION
1.1 Breast Cancer Types
1.2 Genes Associated With Hereditary Breast Cancer
1.3 Risks Associated With Hereditary Breast Cancer
1.3.1 Relative and Absolute Risk
1.4 Calculating Risk
1.4.1 Familial Risk
1.4.2 Risk Due To Rare Allele
1.4.3 Risk For High-Risk Cases
1.5 Bias in Risk Estimation
1.5.1 Berkson Bias
1.5.2 Adjustment For Bias
1.6 Risk Bias Due To Study Design
1.6.1 Population-Based Case-Control Study
1.6.2 Family-Based Case-Control Study
1.6.3 Kin Cohort Study
1.6.4 Clinic Based Study
1.6.5 Segregation Study
1.6.6 Prospective Cohort Study
1.6.7 Volunteer Study
1.7 Bias in Myriad Genetics data
1.7.1 Causal Pathway
1.8 Project Assumptions and Design

2. BIAS DUE TO FAMILY HISTORY OF CANCER
2.1 Family History (FH) Of Breast Cancer
2.1.1 High Risk Familial Criteria
2.2 Simulation Model
2.3 Bias and Increased Cancer Risk in Carriers

3. BIAS DUE TO ENVIRONMENTAL EXPOSURE AND FAMILY HISTORY OF CANCER
3.1 Environmental Exposure and Breast Cancer Risk
3.2 Environmental Factors and Breast Cancer
3.3 Simulation

4. BIAS DUE TO AGE AND FAMILY HISTORY OF CANCER
4.1 Age
4.1.1 Calculating Cumulative Lifetime Risk of breast cancer for carriers and non-carriers
4.1.2 Pedigree Simulation

5. SUMMARY OF FINDINGS
5.1 Discussion
5.2 Conclusion

APPENDICES

A. R CODE

Bibliography


LIST OF TABLES

1.1 Observed proportion in population and Myriad data

2.1 Breast Cancer (BC) incidence in ascertained data

2.2 Mutation Status and BC in whole data

4.1 Cumulative Lifetime Risk Distribution (%) by Age and Mutation Status

4.2 Number of cancer incidence in first degree relatives


LIST OF FIGURES

1.1 Proportion of familial hereditary risk of breast cancer [18].

1.2 Causal Diagram showing ascertainment bias.

2.1 Elimination of bias in ascertained set.

2.2 Bias and relation to personal mutation risk.

3.1 Bias in cancer risk estimate due to family history and exposure.

4.1 Breast cancer incidence and mortality risk by age [22].

4.2 BRCA genes cancer risk by age [7].

4.3 Cumulative risk estimates of relatives.

4.4 Cumulative risk estimates of proband.

4.5 Bias in ascertained set due to family history and age.


ACKNOWLEDGMENTS

Many thanks are due for making this project possible.

First, I would like to thank my project advisor, BRAXTON OSTING Ph.D., of the Department of Mathematics at the University of Utah. He was always eager to help and available whenever I needed his expertise. He guided me in the right direction and gave me a broader view of the project. He also gave me the opportunity to make this project my own.

I would also like to express my profound gratitude to ELISHA HUGHES Ph.D., Myriad Genetics Labs, for her constant support throughout this project. Without her passionate participation and input, the project could not have been successfully completed. Our inspiring conversations often led me to think more clearly about this project.

My appreciation extends to TOM ALBERTS Ph.D., of the Department of Mathematics at the University of Utah, for being a member of my committee and providing valuable comments.

I wish to thank DARL FLAKE Ph.D., Myriad Genetics Labs, for helping me with a basic understanding of simulations and sampling. His thoughts gave this project more structure and cohesiveness.

I would like to acknowledge ALEXANDER GUTIN Ph.D., Myriad Genetics Labs, for the conception of this project. His valuable comments proved to be the cornerstone of this project.

Lastly, I would like to thank my parents for their unwavering support throughout these years of my work and professional goals.


CHAPTER 1

INTRODUCTION

Breast cancer remains the most common form of cancer diagnosed worldwide. The incidence of breast cancer is also reported to be rising rapidly in a number of developing countries, possibly owing to the confluence of several factors, including changes in lifestyle, behavioral patterns, and improved diagnostics, all results of economic growth. Despite extensive research into breast cancer risk, the clinical validity and utility of such studies are often questioned. Recently, advances have been made toward providing accurate estimates of cancer risk by eliminating bias and conducting well-powered studies [19].

This chapter introduces basic concepts: breast cancer types, associated genes, cancer risk due to gene mutation, how to estimate risk, and bias in cancer risk estimates.

1.1 Breast Cancer Types

There are three main categories of breast cancer, depending on the type of risk factors [24].

• Hereditary Breast Cancer - Hereditary breast cancer is caused by gene variants that predispose the patient to cancer. Cancer onset occurs at a younger age in these patients, and first-degree relatives have a 50% risk of carrying the same variant.

• Sporadic Breast Cancer - Nearly 70-80% of breast cancers are of the sporadic type, where the factors that predispose the patient to breast cancer are unknown.

• Familial Breast Cancer - Familial clustering results from chance clustering of sporadic cases in families. There is no specific pattern of inheritance. This type of breast cancer can be caused by unknown genetic, environmental and/or lifestyle factors.


Figure 1.1. Proportion of familial hereditary risk of breast cancer [18].

Sporadic cancer is the most common type of breast cancer, and its risk factors are unknown. Familial cancer risk factors are unknown genetic, environmental or lifestyle factors. Hereditary cancer risk is due to genetic variants that predispose to breast cancer.

1.2 Genes Associated With Hereditary Breast Cancer

Most inherited cases of breast cancer are associated with abnormal changes, or mutations, in one of two genes, BRCA1 (BReast CAncer gene 1) and BRCA2 (BReast CAncer gene 2). BRCA proteins are tumor suppressor proteins that help repair DNA damage in cells. Certain mutations in BRCA genes either cause the protein to function improperly or prevent it from being produced at all. BRCA-deficient cells accumulate DNA damage, leading to cancer. Carriers of mutations that affect the BRCA protein have an increased risk of female breast and ovarian cancers. Abnormal BRCA genes may account for up to 10% of all breast cancers [3]. The measure of the effect of a mutation in a cancer gene is defined as penetrance, or the absolute risk of cancer in carriers.

1.3 Risks Associated With Hereditary Breast Cancer

The risk of breast cancer varies with age, the presence of breast-cancer-associated gene variants, hormonal status, familial breast cancer, etc. The reported risk also varies from one study to another depending on how cases and controls are selected for case-control studies.

A woman living in the US has a 12.3%, or 1 in 8, lifetime risk of being diagnosed with breast cancer [3]. The cumulative lifetime risk (CLTR) of developing a disease, also known as penetrance, is frequently reported as the probability of cancer by the age of 70 years. CLTR is used in genetic counseling, where options for cancer prevention include prophylactic removal of both breasts. In most studies, penetrance is estimated from the degree of familial aggregation of cancer. Early penetrance estimates were 71-85% [Easton2002, 6]. The validity of these studies has been questioned because of ascertainment bias.

For a clinic-based population of mutation carriers, the female breast cancer risk was 72.8% (95% CI = 68-78%) by age 70 and the ovarian cancer risk was 40.7% (95% CI = 36-46%). BRCA1 mutation carriers have an increased risk of colon cancer (twofold), pancreatic cancer (threefold), stomach cancer (threefold) and fallopian tube cancer (120-fold) compared with population-based estimates from the Surveillance, Epidemiology, and End Results (SEER) database. Familial risk estimates are 85% for breast cancer and 60% for ovarian cancer [4]. Population-based risk estimates range from 35-50% for breast cancer and are 15% for ovarian cancer [11, 9].

The age-adjusted risk of breast cancer is 77% in the clinic-based population and 56% for Ashkenazi Jewish volunteers [11]. The estimate is 85% for breast cancer studies in high-risk family cohorts. Risk is lower in the clinic-based population because it contains fewer families with multiple and early-onset breast cancer cases than familial studies do.

1.3.1 Relative and Absolute Risk

The relative risk or odds ratio is not an estimate of absolute risk unless adjustments are made for all known and unknown risk factors. Epidemiological studies present their estimates as an average relative risk or odds ratio. For counseling, the relative risk has to be converted to an absolute risk, which gives the risk over a lifetime, over 10 years, etc. Absolute risks are strongly influenced by risk factors for breast cancer such as family history of breast cancer, age at menopause and breast density on mammography. For a rare variant, relative risks of 2 and 4 translate to absolute risks of 18% and 32%, respectively, in the absence of other risk factors. A relative risk higher than 4 places the patient in the high-risk category. If the patient has a variant that confers a relative risk of 2 to 4, the patient will be in the high-risk category only if other factors are present [5].


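1.4 Calculating Risk

1.4.1 Familial Risk

Assume that a mother is a carrier, with the mutant allele, denoted by $A^*$, on one chromosome and the wild-type allele, $A_w$, on the other. The father's genotype is unknown and denoted by $A_1A_2$. The child's genotype will be $A^*A_1$, $A^*A_2$, $A_wA_1$ or $A_wA_2$, each with probability 0.25. For autosomal dominant inheritance, the probability that the child is a carrier is equal to 1, 1, $p$ or $p$, respectively, where $p$ is the prevalence of the mutation in the population. Averaging over the four genotypes gives a carrier probability of $(1 + 1 + p + p)/4 = p/2 + 1/2$. The probability that the mother is a carrier conditional on the offspring's mutation status is

$$
\begin{aligned}
P(\text{mother} = \text{carrier} \mid \text{offspring} = \text{mutant}) &= \frac{p}{2} + \frac{1}{2}, \\
P(\text{mother} = \text{carrier} \mid \text{offspring} = \text{non-mutant}) &= p.
\end{aligned}
\tag{1.1}
$$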

We assume Mendelian inheritance: an individual inherits two BRCA1 or BRCA2 alleles, one from each parent independently [8]. The mutations are inherited in autosomal dominant mode, where a mutation in one allele causes loss of protein function even though the other copy is intact. At each gene locus, an individual has zero, one or two mutations. If the frequency of BRCA1 mutations is denoted by $p$, then the probability that an individual inherits a mutation in both alleles is $P(\text{BRCA1} = 2) = p^2$. For an individual with one or no mutation, $P(\text{BRCA1} = 1) = 2p(1-p)$ and $P(\text{BRCA1} = 0) = (1-p)^2$. The distribution of an individual's BRCA1 genetic status given her family history is

$$
P[\text{BRCA1} \mid \text{Fam.Hist}] = \frac{P[\text{BRCA1}]\, P[\text{Fam.Hist} \mid \text{BRCA1}]}{P[\text{Fam.Hist}]}.
$$

Hardy-Weinberg equilibrium states that allele and genotype frequencies in a population will remain constant from generation to generation in the absence of other evolutionary influences. For two alleles $A$ and $a$ with frequencies $p(A) = p$ and $p(a) = q$, respectively, the expected genotype frequencies of the homozygotes are $p(AA) = p^2$ and $p(aa) = q^2$. The expected frequency of heterozygotes is $p(Aa) = 2pq$. These expected values are called Hardy-Weinberg proportions. Hence, $p^2 + 2pq + q^2 = 1$.
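The genotype probabilities and the Bayes update described above can be checked numerically. The following Python sketch is illustrative only: the allele frequency and the values used for P[Fam.Hist | BRCA1] are hypothetical numbers chosen for the example, not estimates from this project.

```python
def genotype_probs(p):
    """Hardy-Weinberg probabilities of carrying 0, 1 or 2 mutated alleles."""
    q = 1.0 - p
    return {0: q * q, 1: 2.0 * p * q, 2: p * p}


def posterior_given_family_history(prior, likelihood):
    """Bayes' rule: P[g | FH] = P[g] * P[FH | g] / P[FH]."""
    evidence = sum(prior[g] * likelihood[g] for g in prior)  # P[FH]
    return {g: prior[g] * likelihood[g] / evidence for g in prior}


# Hypothetical inputs (assumptions for illustration only).
allele_freq = 0.001
fh_likelihood = {0: 0.05, 1: 0.40, 2: 0.40}  # P[Fam.Hist | BRCA1 = g]

prior = genotype_probs(allele_freq)
posterior = posterior_given_family_history(prior, fh_likelihood)

print("prior    :", prior)
print("posterior:", posterior)
```

With these made-up inputs, observing a family history raises the probability of carrying one mutated allele well above its prior value, which is the behavior the formula is meant to capture.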

1.4.2 Risk Due To Rare Allele

If $A$ denotes the event that an individual has at least one copy of a rare allele and $D$ denotes the event that the individual has the disease, then the probability of observing the rare allele in a diseased individual can be expressed using Bayes' law as

$$
P(A \mid D) = \gamma^* P(A), \quad \text{where} \quad \gamma^* = \frac{P(D \mid A)}{P(D)} \tag{1.2}
$$

is a measure of the relative risk. The relative risk can also be expressed as

$$
\gamma = \frac{P(D \mid A)}{P(D \mid A^c)}, \tag{1.3}
$$

where $A^c$ is the event that an individual does not have the allele of interest. $\gamma^*$ and $\gamma$ are approximately the same for rare alleles with a modest difference in risk (attributable risk) between exposed and unexposed individuals.
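To see how close the two measures in Equations 1.2 and 1.3 are, the short Python sketch below evaluates both for a rare and a common allele. The allele prevalences and disease risks are hypothetical values picked for illustration.

```python
def relative_risk_measures(p_allele, risk_carrier, risk_noncarrier):
    """Return (gamma_star, gamma) as defined in Equations 1.2 and 1.3."""
    # Law of total probability: P(D) = P(A) P(D|A) + P(A^c) P(D|A^c).
    p_disease = p_allele * risk_carrier + (1.0 - p_allele) * risk_noncarrier
    gamma_star = risk_carrier / p_disease      # P(D|A) / P(D)
    gamma = risk_carrier / risk_noncarrier     # P(D|A) / P(D|A^c)
    return gamma_star, gamma


# Hypothetical scenarios: same disease risks, different allele prevalence.
rare = relative_risk_measures(p_allele=0.001, risk_carrier=0.20, risk_noncarrier=0.10)
common = relative_risk_measures(p_allele=0.30, risk_carrier=0.20, risk_noncarrier=0.10)

print("rare allele   (gamma*, gamma):", rare)
print("common allele (gamma*, gamma):", common)
```

For the rare allele, P(D) is dominated by non-carriers, so the two ratios nearly coincide; for the common allele they diverge noticeably.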

1.4.3 Risk For High-Risk Cases

Cases are at higher risk than controls due to non-genetic factors, and hence the risk is overestimated. Such an estimate of cancer risk is meaningful only for carrier women who do not yet have breast cancer. The distribution of risk in the population is different from the distribution of risk in incident cases [2]. Carriers identified from an incident series of breast cancer cases have, on average, a higher risk than the carrier population as a whole. Let $r$ denote the risk of a randomly selected carrier who is at risk in the population, and let $p(r)$ be the probability density of the risks in carriers, with mean risk $\mu$. The distribution of breast cancer risk for carriers with breast cancer is not $p(r)$ but $q(r)$, where

$$
q(r) = \frac{r\,p(r)}{\int r\,p(r)\,dr}. \tag{1.4}
$$

If the mean risk of this distribution is $\mu_c$, then

$$
\frac{\mu_c}{\mu} = 1 + \frac{v^2}{\mu^2}, \tag{1.5}
$$

where $v^2$ is the variance of risk in the population. This shows that the mean risk in carriers identified through breast cancer is greater than the mean risk of all carriers. This size-biased sampling is the basis of case-control studies in cancer research. For a binary exposure with prevalence $p$ and relative risk $\psi$, exposed and unexposed individuals appear among the cases in proportion to $p\psi$ and $(1-p)$. The prevalence of exposure among cases is

$$
q = \frac{p\psi}{1 - p + p\psi}. \tag{1.6}
$$

The odds ratio is calculated from the case-control data along with exposure status,

$$
\text{OR} = \frac{q(1-p)}{p(1-q)} = \psi. \tag{1.7}
$$

The above paradigm enables us to calculate the relative risk, but not the absolute risk without additional information about cancer incidence [2].
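The relations in Equations 1.4 to 1.7 can be verified numerically. The Python sketch below uses a discrete three-point risk distribution in place of the density p(r); the risk values, weights, prevalence and relative risk are all hypothetical and serve only to check the algebra.

```python
def size_biased_means(risks, probs):
    """Return (mu, mu_c): mean risk under p(r) and under q(r) = r p(r) / mu."""
    mu = sum(r * w for r, w in zip(risks, probs))
    q = [r * w / mu for r, w in zip(risks, probs)]  # Equation 1.4, discrete form
    mu_c = sum(r * w for r, w in zip(risks, q))
    return mu, mu_c


def variance(risks, probs, mu):
    """Variance of risk under p(r)."""
    return sum(w * (r - mu) ** 2 for r, w in zip(risks, probs))


def case_exposure_prevalence(p, psi):
    """Equation 1.6: prevalence of exposure among cases."""
    return p * psi / (1.0 - p + p * psi)


def odds_ratio(p, q):
    """Equation 1.7: odds ratio from exposure prevalence in controls (p) and cases (q)."""
    return q * (1.0 - p) / (p * (1.0 - q))


# Hypothetical risk distribution among carriers.
risks = [0.2, 0.5, 0.8]
probs = [0.5, 0.3, 0.2]

mu, mu_c = size_biased_means(risks, probs)
v2 = variance(risks, probs, mu)
print("mu_c / mu      :", mu_c / mu)
print("1 + v^2 / mu^2 :", 1.0 + v2 / mu ** 2)

# Hypothetical exposure prevalence and relative risk.
p, psi = 0.01, 5.0
q_cases = case_exposure_prevalence(p, psi)
print("exposure prevalence among cases:", q_cases)
print("recovered odds ratio           :", odds_ratio(p, q_cases))
```

The two printed ratios agree, and the odds ratio recovers the assumed relative risk, consistent with Equations 1.5 and 1.7.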

1.5 Bias in Risk Estimation

Bias is a lack of internal validity in a study: the association between exposure and disease is measured incorrectly. A study without internal validity also lacks external validity, that is, its results cannot be applied to everyone in that population. Bias arises from selection bias, information bias or confounding. Ascertainment bias is a kind of selection or sampling bias that results in a non-random sample. Depending on the severity of the bias, the odds ratio, which estimates the cancer risk due to exposure, varies from study to study. Accurate assessment of cancer risk is important for the clinical management of cancer, which ranges from more frequent MRI screening to preventive surgeries, with substantial consequences for the patient's life. Prophylactic oophorectomy decreases the BC risk in BRCA1 mutation carriers by 50%. Wide variation in risk estimates makes counseling in risk evaluation programs difficult.

1.5.1 Berkson Bias

When exposure and outcome both affect the sample selection process, the resulting bias is called Berkson bias. It causes a downward bias of the estimate even when the dependence of selection on disease and exposure is not perfect or intermediates are present [12]. Berkson's bias can arise in prospective or retrospective studies, and in randomized or observational settings.

1.5.2 Adjustment For Bias

The two most common methods of controlling for confounding are adjustment using multivariable regression models and stratified analysis after matching on the confounder [13]. For example, if gender is a confounder, then the cases and controls are matched on gender, and the disease risk is then calculated for each stratum. These methods work only if all confounders are accurately measured [12].
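As an illustration of stratified analysis, the Python sketch below pools stratum-specific 2x2 tables with the Mantel-Haenszel odds ratio. This estimator is one standard choice for combining strata and is not necessarily the method used in [13]; the counts are hypothetical and were chosen so that the odds ratio is 2 within each stratum while the unstratified (crude) odds ratio is inflated by confounding.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed cases, exposed controls;
    c, d = unexposed cases, unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts for two strata of a confounder (for example, gender).
strata = [
    (40, 20, 10, 10),  # stratum 1: OR = 2
    (5, 10, 20, 80),   # stratum 2: OR = 2
]

# Ignoring the confounder means summing the tables cell by cell.
pooled = tuple(sum(cell) for cell in zip(*strata))
crude = odds_ratio(*pooled)
adjusted = mantel_haenszel_or(strata)

print("crude OR   :", crude)
print("adjusted OR:", adjusted)
```

In this constructed example the crude estimate overstates the association, while the stratified estimate recovers the common within-stratum odds ratio.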

Ascertainment-adjusted likelihood approaches, which calculate a retrospective likelihood, allow unbiased estimation and use all the information available on individuals in the family; members who are not genotyped are still useful for risk analysis through their probability of being a carrier, whatever their phenotype [17].

1.6 Risk Bias Due To Study Design

1.6.1 Population-Based Case-Control Study

In population-based studies, cases are randomly selected on the basis of their breast cancer diagnosis and controls are population-matched. This study design provides a direct estimate of the relative risk or odds ratio that is not biased by familial genetic predisposition or by exposures that increase cancer risk. The most common mutations in BRCA1 and BRCA2 occur at a frequency of 1% or less in the general population.

To study such a mutation and estimate the cancer risk, a large number of samples would have to be collected; otherwise the study would be underpowered. Population-based risk estimates underestimate the risk, especially if the disease is rare and hence the mutation carriers are represented at low frequency. Population-based studies are not practical, especially for founder mutations like the ones in Ashkenazi Jewish families [5].
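A rough sense of why rare mutations demand large samples comes from simple binomial arithmetic. The Python sketch below is only a back-of-the-envelope illustration, assuming independent sampling and a hypothetical carrier prevalence of 0.1%; it is not a formal power calculation.

```python
def expected_carriers(n, prevalence):
    """Expected number of carriers among n randomly sampled individuals."""
    return n * prevalence


def prob_no_carriers(n, prevalence):
    """Probability that a random sample of size n contains no carrier at all."""
    return (1.0 - prevalence) ** n


# Hypothetical carrier prevalence (an assumption for illustration).
prevalence = 0.001

for n in (500, 5000, 50000):
    print(
        f"n = {n:6d}: expected carriers = {expected_carriers(n, prevalence):6.1f}, "
        f"P(no carriers) = {prob_no_carriers(n, prevalence):.3f}"
    )
```

Under these assumptions, a sample of a few hundred individuals is more likely than not to contain no carriers, so tens of thousands of participants would be needed before carrier counts become large enough to analyze.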

1.6.2 Family-Based Case-Control Study

For family-based case-control studies, the samples are selected on the basis of personal and familial cancer history. For rare mutations, this study design enriches the cases, thereby improving the power of the study. The patients are restricted to those who meet the eligibility criteria for genetic testing. Also, patients with a strong family history of breast cancer are more likely to seek genetic counseling or testing. Early penetrance studies of BRCA1 and BRCA2 mutations used high-risk families with multiple cases of breast cancer [2]. This leads to ascertainment bias, which may contribute to the difficulty of replicating genetic study findings. Also, additional assumptions about the modifying effects of other familial factors have to be made for bias correction [5].

1.6.3 Kin Cohort Study

The kin-cohort study design has widespread acceptance for studying rare autosomal dominant mutations [23]. The risks are estimated using maximum-likelihood methods [26]. The advantages of this study design are that it requires a smaller sample size than a cohort or case-control study and that it enables the investigator to study several disease outcomes.

The disadvantages are its dependence on the proband's decision to volunteer for the study and recall bias in the information supplied by the proband. Patients who volunteer for such studies are inherently different from people in the same population who do not. The patient's decision could be due to disease severity, cancer at a young age, a strong family history, or all of the above. Also, patients with the disease are more likely to recall information about family cancer than those without it. Risks may also be overestimated if all familial factors are not accounted for.

1.6.4 Clinic Based Study

Patients who are diagnosed with breast cancer at clinics are enrolled in this type of study. Information about relatives is obtained from the participants. Studies that compare cases ascertained for clinical genetic testing with general-population controls upwardly bias estimates of penetrance [5].

1.6.5 Segregation Study

Segregation studies are conducted on families that have a strong familial occurrence of cancer. This type of study does not require controls, but it does require a large number of individuals from the same family to participate. Segregation studies are usually underpowered.

1.6.6 Prospective Cohort Study

Individuals with and without the exposure of interest, such as genetic abnormalities, are observed over a long period of time for the incidence of cancer. This study design provides a direct estimate of absolute risk. Its key disadvantages are the long-term commitment required of both participants and investigators, its expense, and the confounding of risk estimates if other familial factors are not accounted for. Also, failure to report preventive surgeries would lead to underestimation of risk, since preventive surgeries reduce the incidence of breast cancer.

1.6.7 Volunteer Study

The penetrance of breast cancer in a population of 5318 Jewish volunteers is 56%, with 120 patients identified as having a familial cancer history; this is lower than in familial breast cancer studies. In these types of studies, the cases are still different from the controls, i.e., they do not have the same risk for breast cancer.

1.7 Bias in Myriad Genetics data

Myriad patients are ascertained directly on breast cancer and, for breast cancer variants, through family history of cancer. Both personal and familial cancer histories motivate participants to seek genetic testing and cancer management. Because the variables that explain differential baseline risk were captured for all patients before testing, ascertainment bias can be controlled using standard methods such as multivariate regression models and matched case-control analysis. The analysis would provide adjusted ORs that estimate the relative risk conferred by mutations after accounting for confounders such as family history of cancer, age and ancestry.

If all the confounders are measured accurately and adjusted for, then the resulting estimate should be reproducible in population-based studies. The odds ratio for breast cancer risk due to genetic variants is AD/BC in the general population (Table 1.1). In Myriad data, the observed odds ratio is (AS1)(DS4)/((BS2)(CS3)), so the bias is the multiplicative factor S1S4/(S2S3).

Table 1.1. Observed proportion in population and Myriad data

Carrier   Breast Cancer   General Population   P(Myriad testing)   Observed in Myriad
Yes       Affected        A                    S1                  AS1
Yes       Unaffected      B                    S2                  BS2
No        Affected        C                    S3                  CS3
No        Unaffected      D                    S4                  DS4

We expect the bias to arise mainly from S2 > S4 since, among unaffected patients in the general population, carriers are more likely to have a family history that motivates myRisk testing, which is Myriad's hereditary multigene cancer test. We could also have S1 > S3, which has the opposite effect, if breast cancer cases with family history are referred to myRisk testing more often (and have more variants) than breast cancer cases without family history. This seems to contradict the statement above regarding downward bias. However, the effect of S2 > S4 should be stronger than that of S1 > S3, since unaffected patients require a family history for myRisk testing, whereas affected patients do not.
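The decomposition in Table 1.1 can be checked numerically. The sketch below uses hypothetical values for the population proportions A, B, C, D and for the testing probabilities S1 to S4; none of these numbers come from Myriad data, and they are chosen only so that S2 > S4 and S1 > S3.

```r
# Hypothetical population proportions (carrier x breast cancer status); not real data
A <- 0.006   # carrier, affected
B <- 0.004   # carrier, unaffected
C <- 0.100   # non-carrier, affected
D <- 0.890   # non-carrier, unaffected

# Hypothetical probabilities of being referred for testing (S2 > S4 and S1 > S3)
S <- c(S1 = 0.30, S2 = 0.20, S3 = 0.25, S4 = 0.02)

# Odds ratio in the general population: AD / BC
or_population <- (A * D) / (B * C)

# Odds ratio among tested patients, built from the "Observed in Myriad" column
or_observed <- ((A * S[["S1"]]) * (D * S[["S4"]])) /
               ((B * S[["S2"]]) * (C * S[["S3"]]))

# Multiplicative bias factor S1*S4 / (S2*S3)
bias_factor <- (S[["S1"]] * S[["S4"]]) / (S[["S2"]] * S[["S3"]])

print(c(population = or_population, observed = or_observed, bias = bias_factor))
```

Under these assumed values the bias factor is below 1, so the OR observed among tested patients underestimates the population OR, which is the downward bias described above.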


1.7.1 Causal Pathway

Causal diagrams can be used to illustrate and quantify ascertainment bias in Myriad data. Pathogenic variants are synonymous with mutations. The figure shows our assumptions regarding causal dependence in the Myriad myRisk test population.

Figure 1.2. Causal diagram showing ascertainment bias.

• Pathogenic variants (PV) have direct causal effects on breast cancer (BC) and on family history (FH).

• Conditional on Myriad ascertainment through myRisk testing (mT), PV>FH>mT>BC is a biasing path. This will bias PV>BC associations.

• Conditioning on FH blocks the biasing path PV>FH>mT>BC, but opens a new biasing path PV>FH>Other>BC.

• There will be some bias from PV>FH>Other>BC in any study that estimates the effect of PV on BC after accounting for FH.

1.8 Project Assumptions and Design

In this project, I explore the bias due to family history, to family history in combination with exposure, and to family history in combination with age.

For this project, the following assumptions were made.

• Autosomal dominant mode of inheritance - a fact well established in the literature.


• The mother is a carrier and the father is wild-type - this assumption helps simplify the family tree.

• Hereditary gene mutations are the only cause of familial breast cancer - this limits the scope to the hereditary type of breast cancer.

• Risk is the same in every generation - also a fact well established in the literature.


CHAPTER 2

BIAS DUE TO FAMILY HISTORY OF CANCER

2.1 Family History (FH) Of Breast Cancer

2.1.1 High Risk Familial Criteria

Having two or more of the following criteria defines the family as high-risk:

• Breast cancer diagnosed at age 50 or younger

• Breast and Ovarian cancer on the same side of the family or in a single individual

• More than one cancer occurrence for the patient or relatives

• Two or more relatives with breast or ovarian cancer

• A known mutation in cancer susceptibility gene within the family

• Ashkenazi Jewish descent with familial breast or ovarian cancer

The bias in cancer risk estimates due to familial cancer occurrence has been well documented and was discussed in the previous chapter. In this chapter, family history is simulated to show how Odds Ratio (OR) estimates vary from the true estimate. Also shown is the relation between bias and personal cancer risk: with increasing risk, there is a larger downward shift in the OR. A logistic regression model is used to eliminate the bias when the dataset is ascertained on family history and personal cancer status.

2.2 Simulation Model

The population frequency of a genetic mutation is set at 1% with a sample size of 50,000.


P(MUT) = 0.01 (2.1)

For the first degree (FD) relative, the probability of being a carrier depends on the proband's mutation status. If the proband has a genetic variant that predisposes to breast cancer, then the probability follows Mendelian inheritance and the relative has a 50% chance of being a carrier. For a non-carrier proband, the relative's probability is half of the 1% general population frequency, since one shared allele between the relative and proband is known to be wild-type. The probability of the first degree relative being a carrier, conditional on the proband's carrier status, is

P(FD.MUT = 1|MUT = 1) = 0.50

P(FD.MUT = 1|MUT = 0) = 0.005 (2.2)
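Equations 2.1 and 2.2 can be sketched in a few lines of R. The variable names and the seed below are illustrative choices, not taken from the thesis code.

```r
set.seed(1)  # arbitrary seed, for reproducibility only
n <- 50000

# Proband carrier status, P(MUT) = 0.01 (equation 2.1)
mut <- rbinom(n, size = 1, prob = 0.01)

# First degree relative: 0.50 if the proband is a carrier, 0.005 otherwise (equation 2.2)
fd_mut <- rbinom(n, size = 1, prob = ifelse(mut == 1, 0.50, 0.005))

# Empirical carrier frequency in relatives, by proband carrier status
print(tapply(fd_mut, mut, mean))
```

Because `rbinom` accepts a vector of probabilities, each relative is drawn with the probability that matches the proband's status, without looping over individuals.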

The outcome, cancer status for the proband, is modeled as a function of β0, the log odds corresponding to the population prevalence of breast cancer (10%); β1, the coefficient for the mutation status of the proband; and β2, the coefficient for the first degree relative's mutation status, denoted FH in the model. All coefficients are on the log-odds scale since a logistic regression model is used. The model can be written as

logit(p) = β0 + β1MUT + β2FH (2.3)

where p is the probability of cancer. The probability p for the proband and the first degree relative is calculated from this model. A uniform random variable Y (N = 50000) is created with values between 0 and 1. If Y <= p, the individual's cancer status is classified as affected, and as unaffected otherwise.

A first degree relative being a carrier doubles the patient's odds of cancer; hence β2 is set at log(2). For patient mutation status, β1 is fixed at a six-fold difference in odds between carriers and non-carriers (log(6)).
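The outcome assignment described above can be sketched as follows. This is a minimal illustration under the stated coefficients; it uses the base R functions `qlogis` and `plogis` for the logit and its inverse, which is a stylistic choice of this sketch rather than the thesis's own code.

```r
set.seed(2)  # arbitrary seed
n <- 50000
mut    <- rbinom(n, 1, 0.01)
fd_mut <- rbinom(n, 1, ifelse(mut == 1, 0.50, 0.005))

beta0 <- qlogis(0.10)  # log odds for a 10% baseline prevalence, about -2.20
beta1 <- log(6)        # proband mutation status
beta2 <- log(2)        # first degree relative mutation status

# Linear predictor on the log-odds scale, converted to a probability
p <- plogis(beta0 + beta1 * mut + beta2 * fd_mut)

# Compare a uniform draw with p to assign cancer status (1 = affected)
y <- runif(n)
cancer <- as.integer(y <= p)

print(tapply(cancer, mut, mean))
```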

A biased set, ascertained on first degree cancer status and proband cancer status, is created by eliminating the 36,749 patients who have neither breast cancer nor a family history. The distribution of cancer in the relative and proband in the remaining set (N = 13251) is listed in Table 2.1.

Logistic regression models are used to calculate the crude OR in the whole sample set of 50000 and in the ascertained set, where cancer status is a function of the mutation status of the proband. The adjusted OR is obtained after adjusting for family history.
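The ascertainment step and the four models can be sketched as below. The sketch assumes the cancer risks used in the appendix listing (0.1 for non-carriers, 0.6 for carriers, for both proband and relative); the helper `or_mut` is a hypothetical name introduced here. In the ascertained set every unaffected proband has an affected relative, so the adjusted model is quasi-separated on family history and `glm` may warn; the sketch suppresses that warning and reads only the mutation coefficient.

```r
set.seed(3)  # arbitrary seed
n <- 50000
d <- data.frame(mut = rbinom(n, 1, 0.01))
d$fd_mut <- rbinom(n, 1, ifelse(d$mut == 1, 0.50, 0.005))
d$can    <- rbinom(n, 1, ifelse(d$mut == 1, 0.6, 0.1))
d$fd_can <- rbinom(n, 1, ifelse(d$fd_mut == 1, 0.6, 0.1))

# Ascertained set: proband affected or first degree relative affected
asc <- d[d$can == 1 | d$fd_can == 1, ]

# Hypothetical helper: OR for proband mutation status from a logistic model
or_mut <- function(formula, data) {
  fit <- suppressWarnings(glm(formula, data = data, family = binomial))
  unname(exp(coef(fit)["mut"]))
}

ors <- c(
  crude_full = or_mut(can ~ mut, d),
  adj_full   = or_mut(can ~ mut + fd_can, d),
  crude_asc  = or_mut(can ~ mut, asc),
  adj_asc    = or_mut(can ~ mut + fd_can, asc)
)
print(round(ors, 2))
```

With these risks the OR implied by the simulation is (0.6/0.4)/(0.1/0.9) = 13.5, which the crude estimate in the ascertained set falls well below.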


Table 2.1. Breast Cancer (BC) incidence in ascertained data

Patient BC Status   Kin has BC   Kin unaffected
0                   0            5999
1                   6124         1128

The above simulation is replicated 1000 times to obtain the distribution of the OR. Figure 2.1 shows the distribution of the OR with and without adjustment for the unascertained and ascertained sets.

[Figure 2.1: boxplots of the odds ratio (y-axis) for the crude and FH-adjusted models, in the unascertained and ascertained sets.]

Figure 2.1. Elimination of bias in ascertained set. The orange boxplots show the crude and adjusted OR in the unascertained set, where family history (FH) did not cause any bias. In the ascertained set, the crude OR is biased downward due to family history, whereas the bias is eliminated after adjusting.


The distribution of the OR (orange) with and without adjustment is the same in the unascertained set, since family history of cancer is not a confounder there. The distribution of the OR (blue) in the ascertained data before adjustment is lower than the true estimate. This downward bias occurs when the independent variable of interest is correlated with a confounder and the confounder is not accounted for. After adjusting for family history, the bias is eliminated.

2.3 Bias and Increased Cancer Risk in Carriers

The coefficient for patient mutation status, β1, is varied from two-fold to ten-fold (log(2) to log(10)). The correlation between proband and first degree mutation status is recorded for each value of β1.

[Figure 2.2: bias in the ascertained sample (y-axis) against beta for patient mutation status (x-axis: log2, log4, log6, log8, log10).]

Figure 2.2. Bias and relation to personal mutation risk. The figure shows that the bias in the estimate increases with the increase in cancer risk due to mutation.


The bias increases as the risk for carriers increases. Since the ascertainment is on first degree cancer status and proband cancer status, which in turn depend on the proband mutation status, the bias increases as the prevalence of cancer increases.

Table 2.2. Mutation status and BC in whole data

Beta for MUT   Dataset         MUT status   % of BC
log(2)         unascertained   0            10.0
log(2)         unascertained   1            27.4
log(2)         ascertained     0            53.4
log(2)         ascertained     1            69.1
log(10)        unascertained   0            10.0
log(10)        unascertained   1            68.1
log(10)        ascertained     0            52.5
log(10)        ascertained     1            76.9

In Table 2.2, the proportion of breast cancer among non-mutants remains constant, at ~10% in the unascertained and ~53% in the ascertained simulated data set. The proportion of breast cancer cases among carriers, however, increases in both datasets as β1 increases. In the unascertained set, the risk for mutants relative to non-mutants increases from about three-fold to about seven-fold. In the ascertained set, on the other hand, the relative risk (the % of breast cancer in mutants over the % of breast cancer in non-mutants) increases only marginally as β1 increases. Since the non-carriers, or controls, are enriched for breast cancer in the ascertained set, the relative risk is lower than in the unascertained set, which explains the downward shift of the OR.
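The relative risks discussed above follow directly from the percentages in Table 2.2. The short calculation below reproduces them; the data frame layout and column names are illustrative.

```r
# Percent of breast cancer from Table 2.2
bc <- data.frame(
  beta1   = rep(c("log(2)", "log(10)"), each = 4),
  dataset = rep(rep(c("unascertained", "ascertained"), each = 2), times = 2),
  mut     = rep(c(0, 1), times = 4),
  pct_bc  = c(10.0, 27.4, 53.4, 69.1, 10.0, 68.1, 52.5, 76.9)
)

# Relative risk = % BC in carriers / % BC in non-carriers,
# within each combination of beta1 and dataset
carriers     <- bc[bc$mut == 1, ]
non_carriers <- bc[bc$mut == 0, ]
rr <- data.frame(
  beta1   = carriers$beta1,
  dataset = carriers$dataset,
  rr      = round(carriers$pct_bc / non_carriers$pct_bc, 2)
)
print(rr)
```

The relative risk rises from 2.74 to 6.81 in the unascertained set, but only from 1.29 to 1.46 in the ascertained set.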


CHAPTER 3

BIAS DUE TO ENVIRONMENTAL EXPOSURE

AND FAMILY HISTORY OF CANCER

3.1 Environmental Exposure and Breast Cancer Risk

Only up to 30% of the familial occurrence of breast cancer is explained by known genetic markers [20, 15]. Rare variants with moderate to high penetrance in genes such as BRCA1, BRCA2, PALB2, ATM and CHEK2, along with over 90 common mutations, explain 37% of the excess familial risk [14].

Hereditary, genetic, lifestyle and environmental factors are responsible for breast cancer development [21]. Heterogeneity of risks caused by unknown genetic or environmental factors within families also leads to overestimation of risks.

If additional factors increase the breast cancer risk in BC families beyond genetic predisposition, then carrier women with a strong family history of breast cancer are likely to have a much higher risk than women with a gene mutation alone. Some of the environmental and lifestyle factors that influence cancer risk are

• age at menarche

• age at first birth and number of live births

• age at menopause

• using hormone replacement therapy

• drinking alcohol

• smoking cigarettes

• lack of exercise


• BMI

3.2 Environmental Factors and Breast Cancer

Women with genetic mutations are at higher risk for breast cancer if certain environmental factors are present. A gene-environment interaction is assumed if the odds ratio for the genetic and environmental factors combined is significantly different from the expected value obtained by multiplying the relative risks for the genetic and the environmental factor alone. This holds only if all genetic and environmental factors are accounted for. Interaction studies for rare variants with higher penetrance are often underpowered.
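As a small worked example of the multiplicative expectation described above, the sketch below compares an observed joint odds ratio with the product of the two single-factor odds ratios. All three odds ratios are hypothetical values chosen for illustration, not estimates from any study.

```r
# Hypothetical odds ratios (not estimates from any study)
or_gene  <- 6.0   # genetic factor alone
or_env   <- 2.0   # environmental factor alone
or_joint <- 12.5  # both factors present

# Expected joint OR if there is no interaction on the multiplicative scale
or_expected <- or_gene * or_env

# Ratio of observed to expected; 1 corresponds to no multiplicative interaction
interaction_ratio <- or_joint / or_expected

print(c(expected = or_expected, ratio = interaction_ratio))
```

In a real analysis, whether the ratio differs significantly from 1 would typically be assessed with an interaction term in a regression model rather than by direct comparison.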

A small meta-analysis found a 35% (95% CI 0.42-0.99) decrease in breast cancer risk for women with a BRCA1 mutation who had their first birth at 30 years of age or older [25]. This contradicts the finding in the general population, where older age at first birth is associated with increased breast cancer risk [10]. However, this apparent gene-environment interaction could be due to ascertainment bias, and the study has a small sample size. In this project, we assume that there is no gene-environment interaction, since no well-powered study has been able to establish the relationship.

3.3 Simulation

Our goal in simulating environmental exposure is to find the bias in our estimate when not all the factors are accounted for.

The population frequency of a genetic mutation is set at 1% with a sample size of 50,000, and the frequency of exposure is 10%, as shown in the equation below.

P(MUT) = 0.01

P(EXP) = 0.10 (3.1)

Among individuals without mutation or exposure, the prevalence of cancer is 10%. The probability of breast cancer is 0.20 for individuals with exposure and without mutation, and it increases to 0.60 for individuals who have a mutation and are not exposed. The probability of breast cancer is 1 if the patient is both a carrier and exposed.


P(BC = 1|MUT = 0, EXP = 0) = 0.10

P(BC = 1|MUT = 0, EXP = 1) = 0.20

P(BC = 1|MUT = 1, EXP = 0) = 0.60

P(BC = 1|MUT = 1, EXP = 1) = 1.00 (3.2)
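One way to sketch the assignment in equations 3.1 and 3.2 is to store the four conditional probabilities in a 2 x 2 matrix and look up each individual's cell. The matrix layout and variable names are choices made for this illustration.

```r
set.seed(4)  # arbitrary seed
n <- 50000
mut      <- rbinom(n, 1, 0.01)   # P(MUT) = 0.01
exposure <- rbinom(n, 1, 0.10)   # P(EXP) = 0.10

# P(BC = 1 | MUT, EXP) from equation 3.2; rows index MUT, columns index EXP
p_bc <- matrix(c(0.10, 0.20,
                 0.60, 1.00),
               nrow = 2, byrow = TRUE,
               dimnames = list(MUT = c("0", "1"), EXP = c("0", "1")))

# Look up each individual's probability and draw breast cancer status
p  <- p_bc[cbind(mut + 1, exposure + 1)]
bc <- rbinom(n, 1, p)

# Observed proportion affected in each MUT x EXP cell
print(round(tapply(bc, list(MUT = mut, EXP = exposure), mean), 2))
```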

For the first degree relative, the probability of being a carrier depends on the proband's mutation status. If the proband has a genetic variant that predisposes to breast cancer, then the probability follows Mendelian inheritance and the relative has a 50% chance of being a carrier. For a non-carrier proband, the relative's probability is half of the 1% general population frequency, since one shared allele between the relative and proband is known to be wild-type. The relative's breast cancer status follows the same paradigm as the proband's cancer assignment.

The first degree relative's exposure status also depends on the proband's exposure status. If the proband is exposed, the relative's probability of exposure is set to 0.20, twice the population exposure frequency. If the proband is not exposed, the probability is 0.09, so that by the law of total probability the overall exposure frequency among relatives stays close to 10% (0.20 × 0.10 + 0.09 × 0.90 = 0.101).

P(FD.MUT = 1|MUT = 1) = 0.50

P(FD.MUT = 1|MUT = 0) = 0.005

P(FD.EXP = 1|EXP = 1) = 0.20

P(FD.EXP = 1|EXP = 0) = 0.09 (3.3)
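The conditional exposure probabilities in equation 3.3 can be checked against the 10% population exposure frequency, both analytically and by simulation. Variable names and the seed are illustrative.

```r
# Marginal exposure frequency among relatives implied by equation 3.3
p_exp    <- 0.10
p_fd_exp <- 0.20 * p_exp + 0.09 * (1 - p_exp)
print(p_fd_exp)

# Simulated exposure status of relatives, conditional on the proband's status
set.seed(5)  # arbitrary seed
n <- 50000
exposure    <- rbinom(n, 1, p_exp)
fd_exposure <- rbinom(n, 1, ifelse(exposure == 1, 0.20, 0.09))
print(mean(fd_exposure))
```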

The ascertained set is created from all patients who have a family history of breast cancer or personal breast cancer. Logistic regression is used to estimate the cancer risk, with cancer as the outcome and proband mutation status as the predictor. The odds ratio calculated without adjustment is the crude OR. The model is then adjusted for first degree relative cancer status in the unascertained and ascertained data. Finally, in both datasets, the OR adjusted for first degree relative cancer status and exposure status is estimated.

The above simulation is replicated 100 times to get the median OR. The cancer risk due to exposure is varied from 10% to 50% to assess the magnitude of bias. In the unascertained set, the difference between the crude and adjusted OR increases as the cancer risk due to exposure increases. An exposure prevalence of 10% combined with a high cancer risk makes the exposure a confounder in the population. Hence the crude OR does not reflect the true measure of association between mutation status and cancer risk.


[Figure 3.1: OR (y-axis) for the crude, FH-adjusted, and FH- and exposure-adjusted models, with and without ascertainment, in panels for a probability of cancer due to exposure of 0.1, 0.2, 0.3, 0.4 and 0.5.]

Figure 3.1. Bias in cancer risk estimate due to family history and exposure. The bias due to exposure is seen in the crude and family history adjusted OR. The bias increases as the cancer risk due to exposure increases. The plot facet title indicates the probability of cancer due to exposure. The bias is eliminated only after adjusting for exposure in addition to family history.

Since exposure is a confounder, adjusting for family history alone is not enough to correct the bias due to family history and exposure.

The crude OR in the ascertained set is the same across the different exposure conditions, since the mutation prevalence and its cancer risk did not vary. Just as in the unascertained set, adjusting for family history alone is not enough to correct the bias due to family history and exposure. The bias is corrected after adjusting for both family history and exposure. However, the variance of the ascertained estimate increases after adjusting for the biases, probably due to the smaller number of samples in the ascertained set.


CHAPTER 4

BIAS DUE TO AGE AND FAMILY HISTORY OF

CANCER

4.1 Age

Age is the strongest risk factor for breast cancer. The older a woman is, the more likely she is to get breast cancer. The decrease in incidence rates that occurs in women 80 years of age and older may reflect lower rates of screening, the detection of cancers by mammography before 80 years of age, and/or incomplete detection. During 2008-2012, the median age at the time of breast cancer diagnosis was 61 [1]. Inherited changes in BRCA genes combined with age increase the risk of breast cancer beyond either component risk alone. The average age at female breast cancer diagnosis is 42 years (95% CI = 40-44 years) in the clinical population [16]. The ages are 20 and 10 years younger, compared to population averages. Also, women who are carriers and have breast cancer at a younger age are more likely to have an aggressive cancer type, which increases the mortality risk [1].

4.1.1 Calculating Cumulative Lifetime Risk of breast cancer for carriers and non-carriers

The relatives of volunteers who carry pathogenic variants are referred to as carrier kin, and the relatives of volunteers who have the wild-type allele as non-carrier kin. The cumulative risks of both carrier and non-carrier kin are weighted averages of the risks in carriers and non-carriers, conditional on risk factors. The weighted averages are further stratified by age to provide the age-specific penetrance of a mutation. The weights depend on the prevalence p of the mutation and the mode of inheritance [23].

Figure 4.1. Breast cancer incidence and mortality risk by age [22]. The breast cancer risk increases with age. After 70, the risk decreases due to lower screening or incomplete detection of cancer.

According to Mendelian autosomal dominant inheritance, the probability of a carrier kin being a carrier is 0.5 + p/2. In the case of BRCA genes, where the prevalence p is 1%, this probability is 0.5 + (0.01/2) = 0.505. The probability of a non-carrier kin being a carrier is p. Let R+ and R− be the proportions of individuals who develop disease before age t among carrier and non-carrier kin respectively. R+ and R− are weighted averages of S+ and S−, the cumulative risks of developing disease before age t in carriers and non-carriers [23].

\[
\begin{aligned}
R_{-} &= p\,S_{+} + (1 - p)\,S_{-} \\
R_{+} &= \left(\tfrac{p}{2} + \tfrac{1}{2}\right) S_{+} + \left(\tfrac{1}{2} - \tfrac{p}{2}\right) S_{-}
\end{aligned}
\tag{4.1}
\]

Solving the above,

\[
\begin{aligned}
S_{-} &= \frac{1 + p}{1 - p}\,R_{-} - \frac{2p}{1 - p}\,R_{+} \\
S_{+} &= 2R_{+} - R_{-}
\end{aligned}
\tag{4.2}
\]
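The algebra can be verified numerically: starting from assumed carrier and non-carrier risks, equation 4.1 gives the kin risks, and equation 4.2 should recover the starting values. The prevalence and risks below are illustrative inputs, not estimates.

```r
# Illustrative inputs: prevalence and cumulative risks at some age t
p       <- 0.01
s_plus  <- 0.60   # carriers
s_minus <- 0.12   # non-carriers

# Equation 4.1: cumulative risks in non-carrier kin and in carrier kin
r_minus <- p * s_plus + (1 - p) * s_minus
r_plus  <- (p / 2 + 1 / 2) * s_plus + (1 / 2 - p / 2) * s_minus

# Equation 4.2: recover the carrier and non-carrier risks from the kin risks
s_plus_hat  <- 2 * r_plus - r_minus
s_minus_hat <- (1 + p) / (1 - p) * r_minus - 2 * p / (1 - p) * r_plus

print(c(S_plus = s_plus_hat, S_minus = s_minus_hat))
```

Note that the carrier risk is recovered without using p, while the non-carrier risk depends on it.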

R+ and R− are obtained using Kaplan-Meier methods. Prevalence is not required to calculate the cumulative lifetime risk (CLTR) for carriers. For low prevalence, the CLTR difference between carriers and non-carriers can be approximated by

\[
S_{+} - S_{-} = \frac{2\,(R_{+} - R_{-})}{1 - p} \approx 2\,(R_{+} - R_{-})
\tag{4.3}
\]

Figure 4.2. BRCA genes cancer risk by age [7]. The risk for breast cancer increases due to BRCA1 or BRCA2 gene mutation and age.

Figure 4.3 shows the cumulative risk of breast or ovarian cancer in first-degree relatives of carriers of any of the three founder mutations, and in first-degree relatives of non-carriers, in a study of Ashkenazi Jewish volunteers [23].

Figure 4.4 shows the cumulative risk of breast or ovarian cancer in carriers of any of the three founder mutations, and in non-carriers, in the same study of Ashkenazi Jewish volunteers.

4.1.2 Pedigree Simulation

The pedigree is created with 50,000 probands, each being a carrier with a frequency of 1%. The probabilities of the mother and offspring being carriers depend on the proband and follow the same paradigm as the first degree relative simulation. The mutation status of the sister depends on the mother. If the mother is a carrier, then the probability of the sister being a carrier is 50%. If the mother is not a carrier, the probability of the sister being a carrier is half that of the general population, which is 0.5%.


Figure 4.3. Cumulative risk estimates of relatives. Figure 4.3 shows the age-dependent cumulative breast cancer risk of the first degree relatives of carriers and of non-carriers, which are R+ and R− in equation 4.1.

Figure 4.4. Cumulative risk estimates of proband. Figure 4.4 shows the age-dependent cumulative breast cancer risk of carriers and non-carriers, which are S+ and S− in equation 4.1.

The age range for mutants in the simulation is 35 to 62 years, and that for non-mutants is 40 to 82 years. The mother's age range is 56-102, the sister's 25-72 and the daughter's 20-66. Both mother and proband are assumed to have given birth between 15 and 30 years of age. The age-specific cumulative risk varies with mutation status (Table 4.1).

The risk decreases after 80 years of age in the population. For simulation purposes, the risk after eighty years of age is set at the same level as at 70. Based on the cancer risk, each individual in the pedigree is given a cancer status. The first degree cancer status is the sum of the mother, sister and daughter cancer statuses, cancer being a binary variable with 1 indicating cancer and 0 indicating no disease. Table 4.2 shows the first degree cancer status in a sample of 50000.

Table 4.1. Cumulative Lifetime Risk Distribution (%) by Age and Mutation Status

Status        30    40    50    60    70
non-carrier   0.5   2.6   4.9   8.4   12
carrier       5     10    30    45    60

Table 4.2. Number of cancer incidence in first degree relatives

Number of affected relatives   0       1       2     3
Number of probands             38621   10482   873   24
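A possible way to turn Table 4.1 into age-specific cancer assignments is sketched below. It is not the thesis code: the linear interpolation between tabulated ages, the uniform draw of ages within the quoted ranges, and the function name `cum_risk` are all assumptions of this illustration. Risk is held constant beyond age 70 (and below age 30) through `rule = 2` in `approx`.

```r
# Cumulative lifetime risk (%) by age, from Table 4.1
ages         <- c(30, 40, 50, 60, 70)
risk_non     <- c(0.5, 2.6, 4.9, 8.4, 12)
risk_carrier <- c(5, 10, 30, 45, 60)

# Cumulative risk (as a probability) at a given age and carrier status.
# Assumption: linear interpolation between tabulated ages; rule = 2 holds
# the risk constant outside the 30-70 range.
cum_risk <- function(age, carrier) {
  non <- approx(ages, risk_non,     xout = age, rule = 2)$y
  car <- approx(ages, risk_carrier, xout = age, rule = 2)$y
  ifelse(carrier == 1, car, non) / 100
}

set.seed(6)  # arbitrary seed
n       <- 50000
carrier <- rbinom(n, 1, 0.01)

# Age ranges quoted in the text: 35-62 for carriers, 40-82 for non-carriers.
# Assumption: ages are drawn uniformly within each range.
age <- ifelse(carrier == 1,
              sample(35:62, n, replace = TRUE),
              sample(40:82, n, replace = TRUE))

cancer <- rbinom(n, 1, cum_risk(age, carrier))
print(tapply(cancer, carrier, mean))
```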

The simulation is repeated 1000 times to get the median OR. In Figure 4.5, the orange boxplots show the distribution of the OR in the unascertained group, whereas the blue boxplots represent the ascertained set.

In the simulation, the age distribution is different for carriers and non-carriers in the whole population. Hence age is a confounder in the unascertained set, and the crude OR is biased relative to the OR adjusted for age and FH. When the ascertained set is adjusted for family history alone, the distribution matches the crude OR of the unascertained set. When the ascertained set is adjusted for both family history and age, the distribution matches that of the adjusted OR in the unascertained set, which is the true OR.


[Figure 4.5: boxplots of the odds ratio (y-axis) for the crude, FH-adjusted, and FH- and age-adjusted models, in the unascertained and ascertained sets.]

Figure 4.5. Bias in ascertained set due to family history and age.


CHAPTER 5

SUMMARY OF FINDINGS

5.1 Discussion

Active management of hereditary breast cancer risk plays a major role in preventing the incidence of cancer. A family history of breast cancer predisposes the individual to high risk for the disease. The risk estimates vary with the selection of cases and controls. Family-based studies provide higher risk estimates due to the enrichment of cases. Population-based studies, on the other hand, have fewer cases of breast cancer since these genetic variants are rare in the population. This leads to lower risk estimates. In hereditary cancer testing labs like Myriad Genetics Inc., the ascertainment is on personal cancer status and family history of cancer. In this study, I simulated a model to understand and eliminate ascertainment bias in risk estimates when the study is done in a laboratory setting.

In the first model, I studied the bias caused by family history. After controlling for the family history of cancer, the bias is eliminated and the odds ratio reflects the true strength of association between the personal mutation status and cancer status. I have also shown how the bias changes as the relationship between personal mutation status and cancer status strengthens. Strengthening this relationship also leads to increased correlation between the mutation status and family history of cancer.

In the second model, I simulated exposure as another variable that alters the relation between mutation and cancer status. Here exposure acts as a confounder, and it is accounted for in the unascertained as well as the ascertained set to get the true odds ratio. The magnitude of confounding increases as the cancer risk due to exposure increases. This causes an increased downward bias, similar to what is shown in the case of family history.


In the third model, age is simulated to be a confounder. Here the age distribution of

cancer onset is different for mutants and non-mutants. Mutants have early onset of cancer

and also higher risk of cancer. This causes a downward bias of the odds ratio as well.

Again, as in the case of family history and exposure, accounting for age eliminates the

bias.

The strength of this study is that the mutation prevalence and cancer risks are based on figures from the literature, and hence the odds ratios reflect a realistic study scenario. This study also shows what failing to account for other confounders, like exposure and age, can do to the estimates. The relationship of the bias to the predictor as well as to the confounders is defined. A common method of adjustment, logistic regression, is used to show that the bias can be eliminated.

The study does not reflect the odds ratio in every study type. Alleles of moderate penetrance are not studied here, since their prevalence tends to be higher. Another limitation of the study is that only one predictor is considered, and interaction between two predictor variables is not addressed.

5.2 Conclusion

An underestimate of cancer risk has huge implications for the clinical management of patients. Eliminating the bias allows the risk estimate to be applied to the general population. This simulation project studies in detail confounders and their relation to predictor and outcome variables. The magnitude of the bias is shown under varying conditions, along with how it can be eliminated. Three major risk factors are considered: family history, which addresses the hereditary component, environmental exposures, and age-specific risks. This project model can be further extended to multiple predictors and various other types of cancer.


APPENDIX A

R CODE

# Initialize dataset
data <- data.frame(seq(1, 1000))
colnames(data) <- "sample"

for (i in data$sample) {
  # unascertained set N = 50000
  df <- data.frame(seq(1, 50000))
  colnames(df) <- "sample"

  # Proband mutation status and exposure status
  df$mut.status <- rbinom(n = 50000, size = 1, prob = 0.01)
  df$exp.status <- rbinom(n = 50000, size = 1, prob = 0.10)

  # Breast cancer status for probands
  df$can.status[df$mut.status == 0] <-
    rbinom(nrow(df[df$mut.status == 0, ]), size = 1, 0.1)
  df$can.status[df$mut.status == 1] <-
    rbinom(nrow(df[df$mut.status == 1, ]), size = 1, 0.6)

  # assign family history mutation status
  df$FD.mut.status[df$mut.status == 0] <-
    rbinom(nrow(df[df$mut.status == 0, ]), size = 1, 0.005)
  df$FD.mut.status[df$mut.status == 1] <-
    rbinom(nrow(df[df$mut.status == 1, ]), size = 1, 0.5)

  # assign family history cancer status
  df$FD.can.status[df$FD.mut.status == 0] <-
    rbinom(nrow(df[df$FD.mut.status == 0, ]), size = 1, 0.1)
  df$FD.can.status[df$FD.mut.status == 1] <-
    rbinom(nrow(df[df$FD.mut.status == 1, ]), size = 1, 0.6)


  # ascertained set
  tmp <- df[df$FD.can.status == 1 | df$can.status == 1, ]

  # logistic regression model: crude OR, ascertained set
  model <- glm(can.status ~ mut.status, data = tmp, family = "binomial")
  data$crude.ascOR[data$sample == i] <- exp(model$coefficients[2])
  data$crude.ascSE[data$sample == i] <- summary(model)$coefficients[, 2][2]
  data$crude.ascP[data$sample == i] <- summary(model)$coefficients[, 4][2]

  # crude OR, unascertained set
  model <- glm(can.status ~ mut.status, data = df, family = "binomial")
  data$crudeOR[data$sample == i] <- exp(model$coefficients[2])
  data$crudeSE[data$sample == i] <- summary(model)$coefficients[, 2][2]
  data$crudeP[data$sample == i] <- summary(model)$coefficients[, 4][2]

  # OR adjusted for family history, ascertained set
  model <- glm(can.status ~ mut.status + FD.can.status, data = tmp,
               family = "binomial")
  data$adj.ascOR[data$sample == i] <- exp(model$coefficients[2])
  data$adj.ascSE[data$sample == i] <- summary(model)$coefficients[, 2][2]
  data$adj.ascP[data$sample == i] <- summary(model)$coefficients[, 4][2]

  # OR adjusted for family history, unascertained set
  model <- glm(can.status ~ mut.status + FD.can.status, data = df,
               family = "binomial")
  data$adjOR[data$sample == i] <- exp(model$coefficients[2])
  data$adjSE[data$sample == i] <- summary(model)$coefficients[, 2][2]
  data$adjP[data$sample == i] <- summary(model)$coefficients[, 4][2]
}

# boxplot for bias-FD
x <- data.frame(seq(1, 4000))
colnames(x) <- "sample"
x$OR[1:1000] <- data$crudeOR
x$OR[1001:2000] <- data$adjOR
x$OR[2001:3000] <- data$crude.ascOR
x$OR[3001:4000] <- data$adj.ascOR
x$group <- rep(1:4, each = 1000)

pdf("FH Dist20160813SR.pdf")
boxplot(OR ~ group, data = x, xaxt = "n", ylab = "Odds Ratio",
        col = c("chocolate1", "chocolate1", "dodgerblue", "dodgerblue"),
        outline = FALSE)
axis(1, at = seq(1, 4),

Page 41: ELIMINATING BIAS IN CANCER RISK ESTIMATES A SIMULATION …osting/pub/SaradhaRajamaniThesis.pdf · STATEMENT OF DISSERTATION APPROVAL THIS PAGE IS A PLACE HOLDER ONLY Please use the

31

l a b e l s =c ( ” crude ” , ” adjusted−FH” , ” crude ” , ” adjusted−FH” ) )legend ( ” t o p l e f t ” , legend=c ( ” Unascertained ” , ” Ascertained ” ) , pch =19 ,c o l =c ( ” choco la te1 ” , ” dodgerblue ” ) , bty=”n” )d e v . o f f ( )
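
The regression blocks above repeat the same three extractions (odds ratio, standard error, p-value) for every fitted model. A minimal sketch of how that step could be wrapped in a helper is given here; `extract_or` and the toy data frame `toy` are hypothetical names introduced only for illustration and are not part of the original listing.

```r
# Hypothetical helper (not in the original listing): return the odds ratio,
# standard error, and p-value for one coefficient of a fitted glm.
extract_or <- function(model, term = 2) {
  co <- summary(model)$coefficients
  c(OR = unname(exp(co[term, "Estimate"])),
    SE = unname(co[term, "Std. Error"]),
    P  = unname(co[term, "Pr(>|z|)"]))
}

# Toy data, simulated only to exercise the helper
set.seed(1)
toy <- data.frame(mut.status = rbinom(2000, size = 1, prob = 0.3))
toy$can.status <- rbinom(2000, size = 1,
                         prob = plogis(-2.2 + log(6) * toy$mut.status))

fit <- glm(can.status ~ mut.status, data = toy, family = "binomial")
res <- extract_or(fit)
```

With such a helper, each block would reduce to one model fit and one call, which lowers the chance of a copy-and-paste slip between the crude, adjusted, and ascertained variants.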

# correlation between FH & proband mut.status
# and its effect on bias
# beta1 values: log(2), log(4), log(6), log(8) and log(10)
beta1val <- c(log(2), log(4), log(6), log(8), log(10))
cor_bias <- NULL

for (j in beta1val) {
  bias <- data.frame(seq(1, 100))
  for (i in 1:100) {
    df <- data.frame(seq(1, 50000))
    colnames(df) <- "sample"

    # proband and FD mutation status
    df$mut.status <- rbinom(n = 50000, size = 1, prob = 0.01)
    df$FD.mut.status[df$mut.status == 0] <-
      rbinom(nrow(df[df$mut.status == 0, ]), size = 1, 0.005)
    df$FD.mut.status[df$mut.status == 1] <-
      rbinom(nrow(df[df$mut.status == 1, ]), size = 1, 0.5)

    # initialize betas (Beta1 takes the current value from beta1val)
    Beta0 <- -2.20
    Beta1 <- j
    Beta2 <- log(2)

    # create dichotomous outcome
    # Y <- rep(0, times = 50000)
    df$outcome1 <- runif(50000, min = 0, max = 1)
    df$outcome2 <- runif(50000, min = 0, max = 1)

    # logistic regression
    ln_p1 <- Beta0 + (Beta1 * (df$mut.status)) +
      (Beta2 * (df$FD.mut.status))
    ln_p2 <- Beta0 + (Beta1 * (df$mut.status))

    # probabilities (inverse logit)
    df$p1 <- exp(ln_p1) / (1 + exp(ln_p1))
    df$p2 <- exp(ln_p2) / (1 + exp(ln_p2))

    # define cancer status depending on outcome and p
    for (k in 1:nrow(df)) {
      if (df$outcome1[k] <= df$p1[k]) {
        df$can.status[k] <- 1
      } else {
        df$can.status[k] <- 0
      }
      if (df$outcome2[k] <= df$p2[k]) {
        df$FD.can.status[k] <- 1
      } else {
        df$FD.can.status[k] <- 0
      }
    }

    # save correlation
    bias$cor[i] <- cor(df$mut.status, df$FD.can.status)
    bias$beta1 <- Beta1

    # ascertained set
    tmp <- df[df$FD.can.status == 1 | df$can.status == 1, ]

    # logistic regression model to get OR
    model <- glm(can.status ~ mut.status, data = df, family = "binomial")
    bias$crudeOR[i] <- exp(model$coefficients[2])
    model <- glm(can.status ~ mut.status + FD.can.status, data = df,
                 family = "binomial")
    bias$adjOR[i] <- exp(model$coefficients[2])

    model <- glm(can.status ~ mut.status, data = tmp, family = "binomial")
    bias$ascOR[i] <- exp(model$coefficients[2])
    model <- glm(can.status ~ mut.status + FD.can.status, data = tmp,
                 family = "binomial")
    bias$adj.ascOR[i] <- exp(model$coefficients[2])
  }

  colnames(bias) <- c("Sample", "cor", "beta1", "crudeOR",
                      "adjOR", "ascOR", "adj.ascOR")
  cor_bias <- rbind(cor_bias, bias)
}

# OR for each beta1 value
cor_result <- aggregate(. ~ beta1, data = cor_bias, FUN = mean)
cor_result$bias <- cor_result$crudeOR - cor_result$ascOR

# plot of beta1 and OR bias
pdf("Beta_Bias20160805SR.pdf")
plot(cor_result$beta1, cor_result$bias, type = "l", xaxt = "n",
     xlab = "Beta for Patient Mutation Status",
     ylab = "Bias in ascertained sample")
points(cor_result$beta1, cor_result$bias, pch = 19, col = "black")
axis(1, at = cor_result$beta1,
     labels = c("log2", "log4", "log6", "log8", "log10"))
dev.off()
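
The simulation above assigns cancer status row by row, comparing a uniform draw with the logistic probability. The same rule can be written without the inner loop over `k`. The sketch below is an illustration under that assumption, not the code used to produce the thesis results; the standalone vectors `mut.status`, `p`, and `u` are introduced here only for the example.

```r
# Illustrative vectorized version of the "outcome <= p" rule.
set.seed(42)
n <- 50000
mut.status <- rbinom(n, size = 1, prob = 0.01)

Beta0 <- -2.20
Beta1 <- log(6)

# linear predictor and inverse logit; plogis(x) equals exp(x) / (1 + exp(x))
ln_p <- Beta0 + Beta1 * mut.status
p <- plogis(ln_p)

# one uniform draw per subject; status is 1 when the draw falls at or below p
u <- runif(n, min = 0, max = 1)
can.status <- as.numeric(u <= p)
```

Because the comparison is applied to whole vectors, a 50,000-row data set is processed in a single step, which matters when the simulation is repeated hundreds of times.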

### pedigree simulation
risk.est <- as.data.frame(seq(1:10))
risk.est$carrier <- rep(0:1, times = 5)
risk.est$age <- rep(seq(30, 70, by = 10), each = 2)
risk.est$can.risk <- c(0.005, 0.05, 0.026, 0.10, 0.049, 0.30,
                       0.084, 0.45, 0.12, 0.60)

# rename() with the c("old" = "new") form is assumed to come from plyr
library(plyr)

# results table, one row per repetition
age <- data.frame(seq(1, 1000))
colnames(age) <- "Sample"

# 1000 repetitions
for (i in 1:1000) {
  # initialize dataset
  ped <- as.data.frame(seq(1, 50000))
  colnames(ped) <- "SampleId"

  # proband mutation status
  ped$proband.mut <- rbinom(50000, size = 1, prob = 0.01)

  # FD mutation status
  ped$mother.mut[ped$proband.mut == 1] <-
    rbinom(nrow(ped[ped$proband.mut == 1, ]), size = 1, prob = 0.50)
  ped$mother.mut[ped$proband.mut == 0] <-
    rbinom(nrow(ped[ped$proband.mut == 0, ]), size = 1, prob = 0.005)

  ped$sister.mut[ped$mother.mut == 1] <-
    rbinom(nrow(ped[ped$mother.mut == 1, ]), size = 1, prob = 0.50)
  ped$sister.mut[ped$mother.mut == 0] <-
    rbinom(nrow(ped[ped$mother.mut == 0, ]), size = 1, prob = 0.005)

  ped$daughter.mut[ped$proband.mut == 1] <-
    rbinom(nrow(ped[ped$proband.mut == 1, ]), size = 1, prob = 0.50)
  ped$daughter.mut[ped$proband.mut == 0] <-
    rbinom(nrow(ped[ped$proband.mut == 0, ]), size = 1, prob = 0.005)

  # assign age
  ped$proband.age[ped$proband.mut == 1] <-
    rnorm(nrow(ped[ped$proband.mut == 1, ]), 45, 4)
  ped$proband.age[ped$proband.mut == 0] <-
    rnorm(nrow(ped[ped$proband.mut == 0, ]), 61, 5)

  ped$mother.age <- ped$proband.age + sample(15:30, 1, replace = TRUE)
  ped$sister.age <- ped$proband.age - sample(1:10, 1, replace = TRUE)
  ped$daughter.age <- ped$proband.age - sample(15:30, 1, replace = TRUE)

  # age in categories
  ped$proband.age.cat <- cut(ped$proband.age,
                             c(0, 30, seq(40, 60, 10), 70, 100),
                             labels = c(30, 40, 50, 60, 70, 70))
  ped$mother.age.cat <- cut(ped$mother.age,
                            c(0, 30, seq(40, 60, 10), 70, 112),
                            labels = c(30, 40, 50, 60, 70, 70))
  ped$sister.age.cat <- cut(ped$sister.age,
                            c(0, 30, seq(40, 60, 10), 70, 100),
                            labels = c(30, 40, 50, 60, 70, 70))
  ped$daughter.age.cat <- cut(ped$daughter.age,
                              c(0, 30, seq(40, 60, 10), 70, 100),
                              labels = c(30, 40, 50, 60, 70, 70))

  # merge all data into ped dataframe
  ped <- merge(ped, risk.est[, c("carrier", "age", "can.risk")],
               by.x = c("proband.mut", "proband.age.cat"),
               by.y = c("carrier", "age"), all.x = TRUE)
  ped <- rename(ped, c("can.risk" = "Proband.can.risk"))

  ped <- merge(ped, risk.est[, c("carrier", "age", "can.risk")],
               by.x = c("mother.mut", "mother.age.cat"),
               by.y = c("carrier", "age"), all.x = TRUE)
  ped <- rename(ped, c("can.risk" = "mother.can.risk"))

  ped <- merge(ped, risk.est[, c("carrier", "age", "can.risk")],
               by.x = c("sister.mut", "sister.age.cat"),
               by.y = c("carrier", "age"), all.x = TRUE)
  ped <- rename(ped, c("can.risk" = "sister.can.risk"))

  ped <- merge(ped, risk.est[, c("carrier", "age", "can.risk")],
               by.x = c("daughter.mut", "daughter.age.cat"),
               by.y = c("carrier", "age"), all.x = TRUE)
  ped <- rename(ped, c("can.risk" = "daughter.can.risk"))

  # cancer status based on risk
  for (j in 1:50000) {
    ped$proband.can[j] <- rbinom(1, 1, ped$Proband.can.risk[j])
    ped$mother.can[j] <- rbinom(1, 1, ped$mother.can.risk[j])
    ped$sister.can[j] <- rbinom(1, 1, ped$sister.can.risk[j])
    ped$daughter.can[j] <- rbinom(1, 1, ped$daughter.can.risk[j])
  }

  # FD cancer status
  ped$FD.can <- ped$mother.can + ped$sister.can + ped$daughter.can

  # ascertained set
  tmp <- ped[ped$proband.can | ped$FD.can, ]

  # logistic regression model
  model <- glm(proband.can ~ proband.mut, data = ped, family = "binomial")
  age$crudeOR[i] <- exp(model$coefficients[2])

  model <- glm(proband.can ~ proband.mut + proband.age, data = ped,
               family = "binomial")
  age$adjOR[i] <- exp(model$coefficients[2])

  model <- glm(proband.can ~ proband.mut, data = tmp, family = "binomial")
  age$ascOR[i] <- exp(model$coefficients[2])

  model <- glm(proband.can ~ proband.mut + FD.can, data = tmp,
               family = "binomial")
  age$adj.ascOR[i] <- exp(model$coefficients[2])

  model <- glm(proband.can ~ proband.mut + FD.can + proband.age, data = tmp,
               family = "binomial")
  age$adj.ascOR2[i] <- exp(model$coefficients[2])
}

# plot of OR distribution
pdf("age_bias20160808SR.pdf")
par(cex = 1.3)
plot(density(log(age$crudeOR)), lwd = 2, col = "navy", xlim = c(0, 3),
     ylim = c(0, 4),
     xlab = "Beta 1 values- Coefficient of Patient Mutation Status",
     main = "")
lines(density(log(age$ascOR)), lwd = 2, col = "red")
lines(density(log(age$adjOR)), lwd = 2, col = "orange")
lines(density(log(age$adj.ascOR)), lwd = 2, col = "green")
lines(density(log(age$adj.ascOR2)), lwd = 2, col = "black")
legend("topright",
       legend = c("crudeOR", "ascertained OR", "adjusted OR",
                  "asc OR adjusted for FH", "asc OR adjusted for FH & Age"),
       lty = 1, col = c("navy", "red", "orange", "green", "black"),
       lwd = 3, bty = "n", pch = 19)
dev.off()
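
In the pedigree simulation above, cancer status is drawn one subject at a time from each subject's own risk, and the ascertained set keeps families with an affected proband or relative. A minimal sketch of the same two steps in vectorized form follows; the toy table and its column values are simulated for illustration only, and only a proband and a mother are included to keep the example short.

```r
# Illustrative sketch only; the toy table is simulated, not thesis data.
set.seed(7)
n <- 10000
risks <- c(0.005, 0.05, 0.30, 0.60)
ped <- data.frame(
  proband.can.risk = sample(risks, n, replace = TRUE),
  mother.can.risk  = sample(risks, n, replace = TRUE)
)

# rbinom() accepts a vector of probabilities, so one call draws every row
ped$proband.can <- rbinom(n, size = 1, prob = ped$proband.can.risk)
ped$mother.can  <- rbinom(n, size = 1, prob = ped$mother.can.risk)

# ascertained set: proband or mother affected
tmp <- ped[ped$proband.can == 1 | ped$mother.can == 1, ]
```

Writing the ascertainment condition with explicit `== 1` comparisons gives the same rows as relying on nonzero counts being treated as `TRUE`, and makes the selection rule easier to read.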

# exposure and bias
exp_bias <- NULL

# cancer risk for exposed non-carriers, taken from the exp.risk grid
# assigned to exp_bias below (assumed to be the quantity that is varied)
for (exp.risk in seq(0.1, 0.5, by = 0.1)) {
  bias <- data.frame(seq(1, 100))

  # 100 simulations per exposure risk
  for (i in 1:100) {
    # unascertained set N = 50000
    df <- data.frame(seq(1, 50000))
    colnames(df) <- "sample"

    # proband mutation and exposure status
    df$mut.status <- rbinom(n = 50000, size = 1, prob = 0.01)
    df$exp.status <- rbinom(n = 50000, size = 1, prob = 0.10)

    # breast cancer status for probands
    df$can.status[df$exp.status == 1 & df$mut.status == 1] <- 1
    df$can.status[df$exp.status == 1 & df$mut.status == 0] <-
      rbinom(nrow(df[df$exp.status == 1 & df$mut.status == 0, ]), size = 1,
             prob = exp.risk)
    df$can.status[df$exp.status == 0 & df$mut.status == 0] <-
      rbinom(nrow(df[df$exp.status == 0 & df$mut.status == 0, ]), size = 1,
             0.1)
    df$can.status[df$exp.status == 0 & df$mut.status == 1] <-
      rbinom(nrow(df[df$exp.status == 0 & df$mut.status == 1, ]), size = 1,
             0.6)

    # assign family history
    df$FD.mut.status[df$mut.status == 0] <-
      rbinom(nrow(df[df$mut.status == 0, ]), size = 1, 0.005)
    df$FD.mut.status[df$mut.status == 1] <-
      rbinom(nrow(df[df$mut.status == 1, ]), size = 1, 0.5)

    df$FD.exp.status[df$exp.status == 0] <-
      rbinom(nrow(df[df$exp.status == 0, ]), size = 1, 0.09)
    df$FD.exp.status[df$exp.status == 1] <-
      rbinom(nrow(df[df$exp.status == 1, ]), size = 1, 0.20)

    df$FD.can.status[df$FD.exp.status == 1 & df$FD.mut.status == 1] <- 1
    df$FD.can.status[df$FD.exp.status == 1 & df$FD.mut.status == 0] <-
      rbinom(nrow(df[df$FD.exp.status == 1 & df$FD.mut.status == 0, ]),
             size = 1, prob = 0.50)
    df$FD.can.status[df$FD.exp.status == 0 & df$FD.mut.status == 0] <-
      rbinom(nrow(df[df$FD.exp.status == 0 & df$FD.mut.status == 0, ]),
             size = 1, 0.1)
    df$FD.can.status[df$FD.exp.status == 0 & df$FD.mut.status == 1] <-
      rbinom(nrow(df[df$FD.exp.status == 0 & df$FD.mut.status == 1, ]),
             size = 1, 0.6)

    # ascertained data
    tmp <- df[df$FD.can.status == 1 | df$can.status == 1, ]

    # logistic regression
    model <- glm(can.status ~ mut.status, data = df, family = "binomial")
    bias$crudeOR[i] <- exp(model$coefficients[2])

    model <- glm(can.status ~ mut.status + FD.can.status, data = df,
                 family = "binomial")
    bias$adjOR[i] <- exp(model$coefficients[2])

    model <- glm(can.status ~ mut.status + FD.can.status + exp.status,
                 data = df, family = "binomial")
    bias$adjOR2[i] <- exp(model$coefficients[2])

    model <- glm(can.status ~ mut.status, data = tmp, family = "binomial")
    bias$ascOR[i] <- exp(model$coefficients[2])

    model <- glm(can.status ~ mut.status + FD.can.status, data = tmp,
                 family = "binomial")
    bias$adj.ascOR[i] <- exp(model$coefficients[2])

    model <- glm(can.status ~ mut.status + FD.can.status + exp.status,
                 data = tmp, family = "binomial")
    bias$adj.ascOR2[i] <- exp(model$coefficients[2])
  }
  colnames(bias) <- c("Sample", "crudeOR", "adjOR", "adjOR2", "ascOR",
                      "adj.ascOR", "adj.ascOR2")

  # combining different bias
  exp_bias <- rbind(exp_bias, bias)
}
exp_bias$exp.risk <- rep(seq(0.1, 0.5, by = 0.1), each = 100)

# data aggregate by exp.risk and calculate bias
exp_bias_result <- aggregate(. ~ exp.risk, data = exp_bias, FUN = median)
exp_bias_result$bias <- exp_bias_result$crudeOR - exp_bias_result$adjOR
exp_bias_result$bias1 <- exp_bias_result$adjOR - exp_bias_result$ascOR
exp_bias_result$bias2 <- exp_bias_result$adjOR - exp_bias_result$adj.ascOR
exp_bias_result$bias3 <- exp_bias_result$adjOR - exp_bias_result$adj.ascOR2

# plot of exposure bias
pdf("Exp_Bias20160805SR.pdf")
plot(exp_bias_result$exp.risk, exp_bias_result$crudeOR, type = "l",
     xlab = "Probability of Cancer Risk for Environmental Exposure",
     ylab = "Bias in ascertained sample", ylim = c(0, 16))
points(exp_bias_result$exp.risk, exp_bias_result$crudeOR,
       pch = 19, col = "black")

points(exp_bias_result$exp.risk, exp_bias_result$adjOR,
       pch = 19, col = "blue")
lines(exp_bias_result$exp.risk, exp_bias_result$adjOR,
      pch = 19, col = "blue")

points(exp_bias_result$exp.risk, exp_bias_result$adjOR2,
       pch = 19, col = "orange")
lines(exp_bias_result$exp.risk, exp_bias_result$adjOR2,
      pch = 19, col = "orange")

points(exp_bias_result$exp.risk, exp_bias_result$ascOR,
       pch = 19, col = "red")
lines(exp_bias_result$exp.risk, exp_bias_result$ascOR,
      pch = 19, col = "red")

points(exp_bias_result$exp.risk, exp_bias_result$adj.ascOR,
       pch = 19, col = "green")
lines(exp_bias_result$exp.risk, exp_bias_result$adj.ascOR,
      pch = 19, col = "green")

points(exp_bias_result$exp.risk, exp_bias_result$adj.ascOR2,
       pch = 19, col = "cyan")
lines(exp_bias_result$exp.risk, exp_bias_result$adj.ascOR2,
      pch = 19, col = "cyan")

legend(0.15, 10,
       legend = c("crude OR", "adjusted OR for FH",
                  "adjusted OR for FH & Exp", "ascertained OR",
                  "ascertained OR adjusted for FH",
                  "ascertained OR adjusted for FH & Exp"),
       bty = "n", pch = 19, lty = 1,
       col = c("black", "blue", "orange", "red", "green", "cyan"), lwd = 2)
dev.off()

# save files
write.csv(data, file = "data_trial100020160729SR.csv")
write.csv(data, file = "data2_exp_status20160802SR.csv")
write.csv(data, file = "data3_FDexp0.5_20160802_SR.csv")
write.csv(data, file = "data4_Mut0.05_20160802_SR.csv")
write.csv(data, file = "data5_Can0.3_20160802_SR.csv")
write.csv(data, file = "data6_can0.3_expCan0.5_20160802_SR.csv")
write.csv(cor_bias, file = "cor_bias.csv")
write.csv(cor_result, file = "cor_result.csv")
write.csv(exp_bias, file = "exp_bias.csv")
write.csv(exp_bias_result, file = "exp_bias_result.csv")
write.csv(age, file = "age.csv")

