Identification of low frequency variants associated with gout and serum uric acid levels

Patrick Sulem1*, Daniel F. Gudbjartsson1*, G. Bragi Walters1*, Hafdis T. Helgadottir1, Agnar Helgason1, Sigurjon A. Gudjonsson1, Carlo Zanon1, Soren Besenbacher1, Gyda Bjornsdottir1, Olafur T. Magnusson1, Gisli Magnusson1, Eirikur Hjartarson1, Jona Saemundsdottir1, Arnaldur Gylfason1, Adalbjorg Jonasdottir1, Hilma Holm1, Ari Karason1, Thorunn Rafnar1, Hreinn Stefansson1, Ole A. Andreassen2, Jesper H. Pedersen3, Allan I. Pack4, Marieke C.H. de Visser5, Lambertus A. Kiemeney5,6,7, Arni J Geirsson8, Gudmundur I. Eyjolfsson9, Isleifur Olafsson10, Augustine Kong1, Gisli Masson1, Helgi Jonsson8,11, Unnur Thorsteinsdottir1,11, Ingileif Jonsdottir1,11,12 & Kari Stefansson1,11

1 deCODE genetics, Sturlugata 8, 101 Reykjavik, Iceland
2 Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway
3 Department of Cardiothoracic Surgery, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark
4 Center for Sleep and Circadian Neurobiology, Division of Sleep Medicine, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania
5 Department of Epidemiology, Biostatistics & HTA, Radboud University Nijmegen Medical Centre, 6500 HB Nijmegen, the Netherlands
6 Comprehensive Cancer Center IKO, 6501 BG Nijmegen, the Netherlands
7 Department of Urology, Radboud University Nijmegen Medical Centre, 6500 HB Nijmegen, the Netherlands
8 Landspitali, The National University Hospital of Iceland, Department of Medicine, Reykjavik, Iceland
9 Icelandic Medical Center (Laeknasetrid) Laboratory in Mjodd (RAM)
10 Landspitali, The National University Hospital of Iceland, Department of Clinical Biochemistry, Reykjavik, Iceland
11 University of Iceland, Faculty of Medicine, Reykjavik, Iceland
12 Landspitali, The National University Hospital of Iceland, Department of Immunology, Reykjavik, Iceland

* Authors with an equal contribution

Nature Genetics: doi:10.1038/ng.972


 

Supplementary Information

Contents:

Supplementary Note
Supplementary Table 1: Association with gout of genome-wide significant markers at the 19q13 locus
Supplementary Table 2: Association with gout of genome-wide significant markers at the ABCG2 locus
Supplementary Table 3: Novel and previously reported sequence variants associating with serum uric acid levels and gout
Supplementary Table 4: Sex-stratified uric acid levels association for novel and previously reported sequence variants associating with gout and serum uric acid levels
Supplementary Table 5: Sex-stratified gout association for novel and previously reported sequence variants associating with gout and serum uric acid levels
Supplementary Table 6: Association of uric acid sequence variants with age at visit to the clinician
Supplementary Figure 1: The sequencing depth of the 457 whole-genome sequenced individuals
Supplementary Figure 2: Quantile-quantile plot of the SNPs in the genome-wide association scan for gout
Supplementary Figure 3: Quantile-quantile plot of the SNPs in the genome-wide association scan for uric acid levels


Supplementary Note

Whole Genome Sequencing

SNPs were identified through the Icelandic whole-genome sequencing project. A total of 457 Icelanders were selected for sequencing based on having various neoplastic, cardiovascular and psychiatric conditions. All of the individuals were sequenced to a depth of at least 10X. A total of 15,957,390 SNPs were imputed based on this set of individuals.

Sample preparation. Paired-end libraries for sequencing were prepared according to the manufacturer's instructions (Illumina). In short, approximately 5 μg of genomic DNA, isolated from frozen blood samples, was fragmented to a mean target size of 300 bp using a Covaris E210 instrument. The resulting fragmented DNA was end repaired using T4 and Klenow polymerases and T4 polynucleotide kinase with 10 mM dNTP, followed by addition of an 'A' base at the ends using Klenow exo fragment (3′ to 5′-exo minus) and dATP (1 mM). Sequencing adaptors containing 'T' overhangs were ligated to the DNA products, followed by agarose (2%) gel electrophoresis. Fragments of about 400 bp were isolated from the gels (QIAGEN Gel Extraction Kit), and the adaptor-modified DNA fragments were PCR enriched for ten cycles using Phusion DNA polymerase (Finnzymes Oy) and PCR primers PE 1.0 and PE 2.0 (Illumina). Enriched libraries were further purified using agarose (2%) gel electrophoresis as described above. The quality and concentration of the libraries were assessed with the Agilent 2100 Bioanalyzer using the DNA 1000 LabChip (Agilent). Barcoded libraries were stored at −20 °C. All steps in the workflow were monitored using an in-house laboratory information management system with barcode tracking of all samples and reagents.

DNA sequencing. Template DNA fragments were hybridized to the surface of flow cells (Illumina PE flowcell, v4) and amplified to form clusters using the Illumina cBot. In brief, DNA (8–10 pM) was denatured, followed by hybridization to grafted adaptors on the flowcell. Isothermal bridge amplification using Phusion polymerase was then followed by linearization of the bridged DNA, denaturation, blocking of 3′ ends and hybridization of the sequencing primer. Sequencing-by-synthesis was performed on Illumina GAIIx instruments equipped with paired-end modules. Paired-end libraries were sequenced using 2 × 101 cycles of incorporation and imaging with Illumina sequencing kits, v4. Each library or sample was initially run on a single lane for validation, followed by further sequencing of ≥4 lanes with targeted cluster densities of 250–300 k/mm². Imaging and analysis of the data were performed using the SCS 2.6 and RTA 1.6 software packages from Illumina, respectively. Real-time analysis involved conversion of image data to base calls in real time.


Alignment. For each lane in the DNA sequencing output, the resulting qseq files were converted into fastq files using an in-house script. All output from sequencing was converted, and the Illumina quality filtering flag was retained in the output. The fastq files were then aligned against Build 36 of the human reference sequence using bwa version 0.5.7 (ref. 1).

BAM file generation. SAM file output from the alignment was converted into BAM format using samtools version 0.1.8 (ref. 2), and an in-house script was used to carry the Illumina quality filter flag over to the BAM file. The BAM files for each sample were then merged into a single BAM file using samtools. Finally, Picard version 1.17 (see http://picard.sourceforge.net/) was used to mark duplicates in the resulting sample BAM files.

SNP calling and genotyping in whole-genome sequencing

A two-step approach was applied. The first step was to detect SNPs by identifying sequence positions where at least one individual could be determined to be different from the reference sequence with confidence (quality threshold of 20) based on the SNP calling feature of the pileup tool in samtools (ref. 2). SNPs at which every individual was called as differing from the reference, either always as heterozygous or always as homozygous, were removed. The second step was to use the pileup tool to genotype the SNPs at the positions that were flagged as polymorphic. Because sequencing depth varies, and hence the certainty of genotype calls also varies, genotype likelihoods rather than deterministic calls were calculated (see below). Of the 2.5 million SNPs reported in the HapMap2 CEU samples, 96.3% were observed in the whole-genome sequencing data. Of the 6.9 million SNPs reported in the 1000 Genomes Project data, 89.4% were observed in the whole-genome sequencing data.

Long range phasing

Long range phasing of all chip-genotyped individuals was performed with methods described previously (refs. 3,4). In brief, phasing is achieved using an iterative algorithm which phases a single proband at a time given the available phasing information about everyone else that shares a long haplotype identically by state with the proband. Given the large fraction of the Icelandic population that has been chip-typed, accurate long range phasing is available genome-wide for all chip-typed Icelanders.

Genotype imputation

We imputed the SNPs identified and genotyped through sequencing into all Icelanders who had been phased with long range phasing using the same model as used by IMPUTE (ref. 5). The genotype data from sequencing can be ambiguous due to low sequencing coverage. In order to phase the sequencing genotypes, an iterative algorithm was applied for each SNP with alleles 0 and 1. We let $H$ be the long range phased haplotypes of the sequenced individuals and applied the following algorithm:

1. For each haplotype $h$ in $H$, use the Hidden Markov Model of IMPUTE to calculate, for every other $k$ in $H$, the likelihood, denoted $\gamma_{h,k}$, of $h$ having the same ancestral source as $k$ at the SNP.

2. For every $h$ in $H$, initialize the parameter $\theta_h$, which specifies how likely the one allele of the SNP is to occur on the background of $h$, from the genotype likelihoods obtained from sequencing. The genotype likelihood $L_g$ is the probability of the observed sequencing data at the SNP for a given individual assuming $g$ is the true genotype at the SNP. If $L_0$, $L_1$ and $L_2$ are the likelihoods of the genotypes 0, 1 and 2 in the individual that carries $h$, then set

$$\theta_h = \frac{L_2 + \tfrac{1}{2}L_1}{L_2 + L_1 + L_0}.$$

3. For every pair of haplotypes $h$ and $k$ in $H$ that are carried by the same individual, use the other haplotypes in $H$ to predict the genotype of the SNP on the backgrounds of $h$ and $k$:

$$\tau_h = \sum_{l \in H \setminus \{h\}} \gamma_{h,l}\,\theta_l \quad\text{and}\quad \tau_k = \sum_{l \in H \setminus \{k\}} \gamma_{k,l}\,\theta_l.$$

Combining these predictions with the genotype likelihoods from sequencing gives un-normalized updated phased genotype probabilities:

$$P_{00} = (1-\tau_h)(1-\tau_k)L_0,\quad P_{10} = \tau_h(1-\tau_k)\tfrac{1}{2}L_1,\quad P_{01} = (1-\tau_h)\tau_k\tfrac{1}{2}L_1 \quad\text{and}\quad P_{11} = \tau_h\tau_k L_2.$$

Now use these values to update $\theta_h$ and $\theta_k$ to

$$\theta_h = \frac{P_{10} + P_{11}}{P_{00} + P_{01} + P_{10} + P_{11}} \quad\text{and}\quad \theta_k = \frac{P_{01} + P_{11}}{P_{00} + P_{01} + P_{10} + P_{11}}.$$

4. Repeat step 3 while the maximum difference between iterations is greater than a convergence threshold $\varepsilon$. We used $\varepsilon = 10^{-7}$.

Given the long range phased haplotypes and $\theta$, the allele of the SNP on a new haplotype $h$ not in $H$ is imputed as $\sum_{l \in H} \gamma_{h,l}\,\theta_l$.
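The iterative scheme above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the copying likelihoods from step 1 are taken as a precomputed matrix `gamma` (the IMPUTE hidden Markov model itself is not implemented), the initialization is assumed to be $(L_2 + \tfrac{1}{2}L_1)/(L_0 + L_1 + L_2)$, individuals are updated one after another within a sweep, and the names `phase_snp` and `impute_new_haplotype` are hypothetical.

```python
from typing import List, Sequence, Tuple


def phase_snp(
    gamma: Sequence[Sequence[float]],
    likelihoods: Sequence[Tuple[float, float, float]],
    pairs: Sequence[Tuple[int, int]],
    eps: float = 1e-7,
    max_iter: int = 10000,
) -> List[float]:
    """Estimate theta_h, the probability of allele 1 on each haplotype.

    gamma[h][l]    : assumed precomputed likelihood that h copies from l (step 1)
    likelihoods[i] : (L0, L1, L2) genotype likelihoods of individual i
    pairs[i]       : indices (h, k) of the two haplotypes carried by individual i
    """
    n_hap = len(gamma)
    theta = [0.0] * n_hap

    # Step 2: initialise both haplotypes of an individual from its likelihoods.
    for (h, k), (l0, l1, l2) in zip(pairs, likelihoods):
        theta[h] = theta[k] = (l2 + 0.5 * l1) / (l0 + l1 + l2)

    def predict(h: int) -> float:
        # Prediction from all other haplotypes, weighted by gamma.
        return sum(gamma[h][l] * theta[l] for l in range(n_hap) if l != h)

    # Steps 3 and 4: sweep over individuals until the largest change is <= eps.
    for _ in range(max_iter):
        max_change = 0.0
        for (h, k), (l0, l1, l2) in zip(pairs, likelihoods):
            tau_h, tau_k = predict(h), predict(k)
            p00 = (1 - tau_h) * (1 - tau_k) * l0
            p10 = tau_h * (1 - tau_k) * 0.5 * l1
            p01 = (1 - tau_h) * tau_k * 0.5 * l1
            p11 = tau_h * tau_k * l2
            total = p00 + p01 + p10 + p11
            if total == 0:
                continue  # prediction and data contradict; keep current values
            new_h, new_k = (p10 + p11) / total, (p01 + p11) / total
            max_change = max(max_change, abs(new_h - theta[h]), abs(new_k - theta[k]))
            theta[h], theta[k] = new_h, new_k
        if max_change <= eps:
            break
    return theta


def impute_new_haplotype(gamma_new: Sequence[float], theta: Sequence[float]) -> float:
    """Expected allele count on a haplotype outside H, given its gamma row."""
    return sum(g * t for g, t in zip(gamma_new, theta))
```

In this sketch a heterozygous individual whose two haplotypes copy from a carrier and a non-carrier, respectively, has its phase resolved in a single sweep, which is the behaviour the algorithm is meant to produce.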

The above algorithm can easily be extended to handle simple family structures such as parent-offspring pairs and triads by letting the $P$ distribution run over all founder haplotypes in the family structure. The algorithm also extends trivially to the X chromosome. If source genotype data are only ambiguous in phase, such as chip genotype data, then the algorithm is still applied, but all but one of the $L$s will be 0. In some instances, the reference set was intentionally enriched for carriers of the minor allele of a rare SNP in order to improve imputation accuracy. In this case, expected allele counts will be biased toward the minor allele of the SNP. Call the enrichment of the minor allele $E$, let $\theta'$ be the expected minor allele count calculated from the naïve imputation method, and let $\theta$ be the unbiased expected allele count. Then

$$\theta' = \frac{E\theta}{1 - \theta + E\theta} \quad\text{and hence}\quad \theta = \frac{\theta'}{E + (1 - E)\theta'}.$$

This adjustment was applied to all imputations based on enriched imputation sets. We note that if $\theta'$ is 0 or 1, then $\theta$ will also be 0 or 1, respectively.
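As a small illustration of the enrichment adjustment, the Python sketch below assumes the relation $\theta' = E\theta/(1 - \theta + E\theta)$, that is, enrichment multiplies the odds of the minor allele by $E$, and inverts it. The function names are hypothetical.

```python
def biased_allele_count(theta: float, enrichment: float) -> float:
    """Naive expected count obtained from an enriched reference set (assumed form)."""
    return enrichment * theta / (1.0 - theta + enrichment * theta)


def unbiased_allele_count(theta_naive: float, enrichment: float) -> float:
    """Invert the enrichment bias: theta = theta' / (E + (1 - E) * theta')."""
    return theta_naive / (enrichment + (1.0 - enrichment) * theta_naive)
```

Under this assumed form the endpoints 0 and 1 are fixed points of the correction, in agreement with the remark above, and applying `unbiased_allele_count` to the output of `biased_allele_count` recovers the original value.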

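Genotype imputation information

The informativeness of genotype imputation was estimated by the ratio of the variance of imputed expected allele counts and the variance of the actual allele counts:

$$\frac{\mathrm{Var}\big(E(\theta \mid \text{chip data})\big)}{\mathrm{Var}(\theta)},$$

where $\theta \in \{0, 1\}$ is the allele count. $\mathrm{Var}\big(E(\theta \mid \text{chip data})\big)$ was estimated by the observed variance of the imputed expected counts and $\mathrm{Var}(\theta)$ was estimated by $p(1-p)$, where $p$ is the allele frequency.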
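The information measure can be computed directly from per-haplotype expected allele counts. The sketch below is illustrative only: it assumes the "observed variance" is the population variance (dividing by the number of haplotypes) and that the allele frequency is supplied by the caller; `imputation_info` is a hypothetical name.

```python
from statistics import pvariance
from typing import Sequence


def imputation_info(expected_counts: Sequence[float], allele_freq: float) -> float:
    """Ratio of the variance of imputed expected counts to p(1 - p).

    expected_counts: per-haplotype expected allele counts, each in [0, 1]
    allele_freq:     allele frequency p, strictly between 0 and 1
    """
    return pvariance(expected_counts) / (allele_freq * (1.0 - allele_freq))
```

With perfectly imputed haplotypes (all counts 0 or 1) the ratio is 1, and when every haplotype is simply assigned the population frequency the ratio is 0.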

In silico genotyping

In addition to imputing sequence variants from the whole genome sequencing effort into chip genotyped individuals, we also performed a second imputation step where genotypes were imputed into relatives of chip genotyped individuals, creating in silico genotypes. The inputs into the second imputation step are the fully phased (in particular, every allele has been assigned a parent of origin) imputed and chip-typed genotypes of the available chip-typed individuals. The algorithm used to perform the second imputation step consists of:

1. For each ungenotyped individual (the proband), find all chip genotyped individuals within two meioses of the individual. The six possible types of two-meiosis relatives of the proband are (ignoring more complicated relationships due to pedigree loops): parents, full and half siblings, grandparents, children and grandchildren. If all pedigree paths from the proband to a genotyped relative go through other genotyped relatives, then that relative is excluded. For example, if a parent of the proband is genotyped, then the proband's grandparents through that parent are excluded. If the number of meioses in the pedigree around the proband exceeds a threshold (we used 12), then relatives are removed from the pedigree until the number of meioses falls below 12, in order to reduce computational complexity.

2. At every point in the genome, calculate the probability for each genotyped relative sharing with the proband based on the autosomal SNPs used for phasing. A multipoint algorithm based on the hidden Markov model Lander-Green multipoint linkage algorithm using fast Fourier transforms is used to calculate these sharing probabilities (refs. 6,7). First, single point sharing probabilities are calculated by dividing the genome into 0.5 cM bins and using the haplotypes over these bins as alleles. Haplotypes that are the same, except at most at a single SNP, are treated as identical. When the haplotypes in the pedigree are incompatible over a bin, a uniform probability distribution is used for that bin. The most common causes for such incompatibilities are recombinations within the pedigree, phasing errors and genotyping errors. Note that since the input genotypes are fully phased, the single point information is substantially more informative than for unphased genotypes; in particular, one haplotype of the parent of a genotyped child is always known. The single point distributions are then convolved using the multipoint algorithm to obtain multipoint sharing probabilities at the center of each bin. Genetic distances were obtained from the most recent version of the deCODE genetic map (ref. 4).

3. Based on the sharing probabilities at the center of each bin, all the SNPs from the whole genome sequencing are imputed into the proband. Consider imputing the genotype of the paternal allele of a SNP located at $x$, flanked by bins with centers at $x_{\text{left}}$ and $x_{\text{right}}$. Starting with the left bin, going through all possible sharing patterns $v$, let $I_v$ be the set of haplotypes of genotyped individuals that share identically by descent within the pedigree with the proband's paternal haplotype given the sharing pattern $v$, let $P(v)$ be the probability of $v$ at the left bin (this is the output from step 2 above), and let $e_i$ be the expected allele count of the SNP for haplotype $i$. Then

$$e_v = \frac{\sum_{i \in I_v} e_i}{\sum_{i \in I_v} 1}$$

is the expected allele count of the paternal haplotype of the proband given $v$, and an overall estimate of the allele count given the sharing distribution at the left bin is obtained from $e_{\text{left}} = \sum_v P(v)\,e_v$. If $I_v$ is empty then no relative shares with the proband's paternal haplotype given $v$ and thus there is no information about the allele count. We therefore store the probability that some genotyped relative shared the proband's paternal haplotype,

$$O_{\text{left}} = \sum_{v:\, I_v \neq \emptyset} P(v),$$

and an expected allele count, conditional on the proband's paternal haplotype being shared by at least one genotyped relative:

$$c_{\text{left}} = \frac{\sum_{v:\, I_v \neq \emptyset} P(v)\,e_v}{\sum_{v:\, I_v \neq \emptyset} P(v)}.$$

In the same way calculate $O_{\text{right}}$ and $c_{\text{right}}$. Linear interpolation is then used to obtain estimates at the SNP from the two flanking bins:

$$O = O_{\text{left}} + \frac{x - x_{\text{left}}}{x_{\text{right}} - x_{\text{left}}}\,(O_{\text{right}} - O_{\text{left}}), \qquad c = c_{\text{left}} + \frac{x - x_{\text{left}}}{x_{\text{right}} - x_{\text{left}}}\,(c_{\text{right}} - c_{\text{left}}).$$

If $\theta$ is an estimate of the population frequency of the SNP then $Oc + (1 - O)\theta$ is an estimate of the allele count for the proband's paternal haplotype. Similarly, an expected allele count can be obtained for the proband's maternal haplotype.
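Step 3 can be illustrated with a short Python sketch. It is a simplified reading of the procedure, not the authors' implementation: each bin is summarized by the probability $O$ that some genotyped relative shares the proband's haplotype and by the conditional expected count $c$, both are linearly interpolated to the SNP position, and the result is $Oc + (1 - O)\theta$. The handling of a bin with no sharing at all (borrowing $c$ from the other bin) is an assumption introduced here, and the names `bin_summary` and `interpolate_allele_count` are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Sequence, Tuple

BinSummary = Tuple[float, Optional[float]]


def bin_summary(
    patterns: Sequence[Tuple[FrozenSet[int], float]],
    hap_counts: Dict[int, float],
) -> BinSummary:
    """Return (O, c) for one bin.

    patterns:   (I_v, P(v)) pairs, I_v being the haplotypes shared under pattern v
    hap_counts: expected allele count e_i of each genotyped haplotype i
    """
    shared_prob = 0.0
    weighted = 0.0
    for sharers, prob in patterns:
        if not sharers:
            continue  # no relative shares under this pattern: no information
        e_v = sum(hap_counts[i] for i in sharers) / len(sharers)
        shared_prob += prob
        weighted += prob * e_v
    c = weighted / shared_prob if shared_prob > 0 else None
    return shared_prob, c


def interpolate_allele_count(
    x: float, x_left: float, x_right: float,
    left: BinSummary, right: BinSummary, pop_freq: float,
) -> float:
    """Expected allele count at position x for one parental haplotype."""
    w = (x - x_left) / (x_right - x_left)
    (o_l, c_l), (o_r, c_r) = left, right
    # Assumption: a bin without any sharing borrows c from the other bin.
    if c_l is None:
        c_l = c_r
    if c_r is None:
        c_r = c_l
    if c_l is None:
        return pop_freq  # no sharing on either side: fall back to the frequency
    o = o_l + w * (o_r - o_l)
    c = c_l + w * (c_r - c_l)
    return o * c + (1.0 - o) * pop_freq
```

For example, if a carrier haplotype is shared with probability 0.5 at the left bin and 1.0 at the right bin, a SNP midway between the bin centers with population frequency 0.2 receives an expected count of 0.75 × 1 + 0.25 × 0.2 = 0.8.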

Case control association testing

Logistic regression was used to test for association between SNPs and disease, treating disease status as the response and expected genotype counts from imputation or allele counts from direct genotyping as covariates. Testing was performed using the likelihood ratio statistic. The conditional analysis of the chromosome 1 centromere and 19q13 loci was performed by adding the strongest SNP at each locus as a covariate while testing every SNP in the region for association with gout. When testing for association based on the in silico genotypes, controls were matched to cases based on the informativeness of the imputed genotypes, such that for each case $C$ controls of matching informativeness were chosen. Failing to match cases and controls will lead to a highly inflated genomic control factor, and in some cases may lead to spurious false positive findings. The informativeness of the imputation of each of an individual's haplotypes was estimated by taking the average of

$$a(e, \theta) = \begin{cases} \dfrac{e - \theta}{1 - \theta}, & e \geq \theta \\[2mm] \dfrac{\theta - e}{\theta}, & e < \theta \end{cases}$$

over all SNPs imputed for the individual, where $e$ is the expected allele count for the haplotype at the SNP and $\theta$ is the population frequency of the SNP. Note that $a(\theta, \theta) = 0$ and $a(0, \theta) = a(1, \theta) = 1$. The mean informativeness values cluster into groups corresponding to the most common pedigree configurations used in the imputation, such as imputing from parent into child or from child into parent. Based on this clustering of imputation informativeness we divided the haplotypes of individuals into seven groups of varying informativeness, which created 27 groups of individuals of similar imputation informativeness: 7 groups of individuals with both haplotypes having similar informativeness, plus 21 groups of individuals with the two haplotypes having different informativeness, minus the one group of individuals with neither haplotype being imputed well. Within each group we calculate the ratio of the number of controls to the number of cases, and choose the largest integer $C$ that is less than this ratio in all the groups. For example, if in one group there are 10.3 times as many controls as cases and if in all other groups this ratio is greater, then we would set $C = 10$ and within each group randomly select ten times as many controls as there are cases. For gout we used $C =$ [value not recoverable from this copy].
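The matching procedure can be sketched as follows in Python. The sketch assumes the per-SNP informativeness takes the piecewise form $(e-\theta)/(1-\theta)$ for $e \geq \theta$ and $(\theta-e)/\theta$ otherwise, which satisfies the boundary conditions noted in the text, and that the assignment of individuals to informativeness groups has already been made. All function names are hypothetical.

```python
import math
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def haplotype_informativeness(e: float, theta: float) -> float:
    """Per-SNP informativeness a(e, theta) for 0 < theta < 1 (assumed form)."""
    if e >= theta:
        return (e - theta) / (1.0 - theta)
    return (theta - e) / theta


def controls_per_case(counts: Dict[Hashable, Tuple[int, int]]) -> int:
    """Largest integer strictly below the controls/cases ratio in every group.

    counts maps a group label to (number of cases, number of controls).
    """
    min_ratio = min(n_ctrl / n_case for n_case, n_ctrl in counts.values() if n_case > 0)
    return math.ceil(min_ratio) - 1


def match_controls(
    cases: Dict[Hashable, Sequence], controls: Dict[Hashable, Sequence], seed: int = 0
) -> Tuple[int, List]:
    """Randomly pick C controls per case within each informativeness group."""
    rng = random.Random(seed)
    counts = {g: (len(cs), len(controls.get(g, []))) for g, cs in cases.items() if cs}
    c = controls_per_case(counts)
    selected: List = []
    for g, cs in cases.items():
        if cs:
            selected.extend(rng.sample(list(controls[g]), c * len(cs)))
    return c, selected
```

With the example from the text, a group holding 10.3 times as many controls as cases (and larger ratios elsewhere) yields `controls_per_case(...) == 10`.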

Quantitative trait association testing

A generalized form of linear regression was used to test for association of uric acid (UA) with SNPs. Let $y$ be the vector of quantitative measurements, and let $g$ be the vector of expected allele counts for the SNP being tested. We assume the quantitative measurements follow a normal distribution with a mean that depends linearly on the expected allele count at the SNP and a variance-covariance matrix proportional to the kinship matrix:

$$y \sim N(\alpha + \beta g,\; 2\sigma^2\Phi),$$

where

$$\Phi_{ij} = \begin{cases} \tfrac{1}{2}, & i = j \\ 2k_{ij}h^2, & i \neq j \end{cases}$$

is based on the kinship between individuals as estimated from the Icelandic genealogical database ($k_{ij}$) and an estimate of the heritability of the trait ($h^2$). It is not computationally feasible to use this full model, and we therefore split the individuals with in silico genotypes and UA measurements into smaller clusters. Here we chose to restrict the cluster size to at most 300 individuals.

The maximum likelihood estimates for the parameters $\alpha$, $\beta$ and $\sigma^2$ involve inverting the kinship matrix. If there are $n$ individuals in the cluster, then this inversion requires $O(n^3)$ calculations, but since these calculations only need to be performed once, the computational cost of doing a GWAS will only be $O(n^2)$ calculations per SNP tested; this is the cost of calculating the maximum likelihood estimates when the kinship matrix has already been inverted.
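To make the cost argument concrete, the pure-Python sketch below computes the estimates for one SNP from an already inverted matrix. It assumes a model of the form $y \sim N(\alpha + \beta g, 2\sigma^2\Phi)$ with a symmetric $\Phi^{-1}$ supplied by the caller; the per-SNP work is dominated by matrix-vector products, which are $O(n^2)$. The function names are hypothetical and no claim is made that this mirrors the authors' software.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def gls_fit(phi_inv: Matrix, y: Sequence[float], g: Sequence[float]) -> Tuple[float, float, float]:
    """Maximum likelihood estimates (alpha, beta, sigma^2) given the inverse of Phi."""
    n = len(y)
    ones = [1.0] * n
    w1 = _matvec(phi_inv, ones)  # independent of the SNP, could be cached
    wy = _matvec(phi_inv, y)     # independent of the SNP, could be cached
    wg = _matvec(phi_inv, g)     # the O(n^2) step repeated for every SNP

    # Normal equations of generalized least squares for the design [1, g].
    a11, a1g, agg = _dot(ones, w1), _dot(ones, wg), _dot(g, wg)
    b1, bg = _dot(ones, wy), _dot(g, wy)
    det = a11 * agg - a1g * a1g
    alpha = (agg * b1 - a1g * bg) / det
    beta = (a11 * bg - a1g * b1) / det

    # With covariance 2 * sigma^2 * Phi, the MLE of sigma^2 is r' Phi^-1 r / (2n).
    resid = [yi - alpha - beta * gi for yi, gi in zip(y, g)]
    sigma2 = _dot(resid, _matvec(phi_inv, resid)) / (2.0 * n)
    return alpha, beta, sigma2
```

For unrelated individuals ($\Phi = \tfrac{1}{2}I$, so $\Phi^{-1} = 2I$) the estimates reduce to ordinary least squares.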

Effective sample size estimation

In order to estimate the effective sample size of the case control and quantitative trait association analyses, we compared the variances of the logistic and generalized linear regression parameter estimates based on the in silico genotypes to their one-step imputation counterparts. For the quantitative trait association analysis, assume that a single-step imputation (SNPs are imputed, but in silico genotypes are not used) association analysis with $n_1$ subjects leads on average to an estimate of the regression parameter with variance $\sigma_1^2$, and that the corresponding in silico genotype association analysis leads to an estimate of the regression parameter with variance $\sigma_2^2$. Then, assuming that the variance is inversely proportional to sample size, we estimate the effective sample size in the in silico genotype association analysis as $n_2 = n_1\sigma_1^2/\sigma_2^2$. For the case control association analysis, the number of controls is much greater than the number of cases, and we use the same formula to estimate the effective number of cases, with the $n$-s representing the number of cases and the $\sigma^2$-s representing the variances of the logistic regression coefficient.
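The effective sample size calculation amounts to a single ratio. The sketch below assumes the estimate has the form $n_1\sigma_1^2/\sigma_2^2$, where $\sigma_1^2$ and $\sigma_2^2$ are the average variances of the regression coefficient in the single-step and in silico analyses; the function name and the numbers in the usage line are purely illustrative.

```python
def effective_sample_size(n_single_step: float, var_single_step: float, var_in_silico: float) -> float:
    """Effective n of the in silico analysis, assuming variance scales as 1/n."""
    return n_single_step * var_single_step / var_in_silico


# Illustrative numbers only: halving the variance doubles the effective size.
print(effective_sample_size(1000, 0.04, 0.02))
```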

 

References  

1. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-1760 (2009).

2. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009).

3. Kong, A. et al. Detection of sharing by descent, long-range phasing and haplotype imputation. Nat. Genet. 40, 1068-1075 (2008).

4. Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467, 1099-1103 (2010).

5. Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39, 906-913 (2007).

6. Lander, E.S. & Green, P. Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84, 2363-2367 (1987).

7. Kruglyak, L. & Lander, E.S. Faster multipoint linkage analysis using Fourier transforms. J. Comput. Biol. 5, 1-7 (1998).

 

       


Supplementary Table 1 – Association with gout of genome-wide significant markers at the 19q13 locus (top marker is bolded)

| P-value | Allelic odds ratio | Allelic frequency in cases | Allelic frequency in controls | Info | Coding allele | Other allele | Marker | Chr 19 position (Build 36) | Gene (if aa change) | Amino acid change |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.7×10⁻⁸ | 2.815 | 0.0251 | 0.0116 | 0.89 | T | G | chr19:54326383 | 54,326,383 | - | - |
| 4.6×10⁻⁸ | 2.509 | 0.0288 | 0.0144 | 0.87 | T | C | chr19:54400812 | 54,400,812 | - | - |
| 1.7×10⁻¹⁰ | 3.101 | 0.0273 | 0.0116 | 0.88 | C | A | chr19:54505919 | 54,505,919 | - | - |
| 1.5×10⁻¹⁶ | 3.122 | 0.0442 | 0.0183 | 0.89 | G | C | chr19:54660818 | 54,660,818 | ALDH16A1 | P476R |
| 2.7×10⁻⁸ | 1.660 | 0.0946 | 0.0658 | 0.92 | G | C | rs1064257 | 54,685,347 | - | - |
| 2.2×10⁻⁸ | 2.113 | 0.0506 | 0.0314 | 0.76 | C | G | rs62128084 | 54,706,613 | - | - |
| 3.2×10⁻¹¹ | 3.080 | 0.0293 | 0.0124 | 0.87 | C | G | chr19:54788061 | 54,788,061 | - | - |
| 1.3×10⁻¹¹ | 3.205 | 0.0288 | 0.0119 | 0.87 | G | A | chr19:54812310 | 54,812,310 | - | - |
| 4.7×10⁻¹¹ | 2.203 | 0.0566 | 0.0317 | 0.90 | C | T | rs62128132 | 54,909,767 | - | - |
| 1.8×10⁻⁹ | 2.941 | 0.0259 | 0.0114 | 0.88 | G | A | chr19:54991872 | 54,991,872 | - | - |
| 2.0×10⁻⁹ | 2.932 | 0.0259 | 0.0113 | 0.88 | A | G | chr19:55018776 | 55,018,776 | - | - |
| 1.8×10⁻⁹ | 2.939 | 0.0259 | 0.0113 | 0.89 | T | C | chr19:55068782 | 55,068,782 | - | - |
| 1.9×10⁻⁹ | 2.938 | 0.0259 | 0.0113 | 0.89 | G | T | chr19:55071043 | 55,071,043 | - | - |
| 1.9×10⁻⁹ | 2.938 | 0.0259 | 0.0113 | 0.89 | C | G | chr19:55071103 | 55,071,103 | - | - |
| 1.1×10⁻⁹ | 3.141 | 0.0218 | 0.0078 | 0.99 | C | T | chr19:55268031 | 55,268,031 | - | - |
| 1.1×10⁻⁹ | 3.136 | 0.0219 | 0.0079 | 0.98 | C | G | chr19:55270330 | 55,270,330 | - | - |
| 5.2×10⁻¹⁰ | 3.162 | 0.0225 | 0.0081 | 0.96 | C | T | chr19:55483086 | 55,483,086 | - | - |
| 1.8×10⁻¹⁰ | 3.240 | 0.0228 | 0.008 | 0.98 | T | C | chr19:55576372 | 55,576,372 | - | - |
| 1.9×10⁻⁸ | 2.611 | 0.027 | 0.0125 | 0.94 | G | A | chr19:55602702 | 55,602,702 | - | - |

The tests for association are based on an effective sample size of 968 gout cases and over 40,000 controls.


Supplementary Table 2 – Association with gout of genome-wide significant markers at the ABCG2 locus (top marker is bolded)

P-value   Allelic Odds Ratio   Allelic Frequency in Cases   Allelic Frequency in Controls   Info   Coding allele   Other allele   Marker   Chr 4 Build 36 position   Gene (if aa change)   Amino acid change
1.2×10^-9   1.35   0.479   0.419   0.97   T   C   rs2725261   89,255,377   -   -
5.3×10^-12   1.65   0.147   0.102   0.99   G   A   rs1481012   89,258,106   -   -
2.9×10^-12   1.66   0.145   0.100   1.00   C   G   rs45499402   89,262,658   -   -
3.0×10^-12   1.66   0.145   0.100   1.00   C   G   chr4:89263204   89,263,204   -   -
3.0×10^-12   1.66   0.145   0.100   1.00   T   A   chr4:89263336   89,263,336   -   -
3.0×10^-12   1.66   0.145   0.100   1.00   A   G   rs75544042   89,264,355   -   -
3.0×10^-12   1.66   0.145   0.100   1.00   T   C   chr4:89265226   89,265,226   -   -
2.8×10^-12   1.66   0.145   0.100   1.00   A   C   rs74904971   89,269,050   -   -
2.8×10^-12   1.66   0.145   0.100   1.00   T   G   rs2231142   89,271,347   ABCG2   Q141K
2.8×10^-12   1.66   0.145   0.100   1.00   G   A   rs4148155   89,273,691   -   -
2.3×10^-8   1.31   0.518   0.462   1.00   A   C   rs2622620   89,282,875   -   -
1.9×10^-9   1.34   0.562   0.502   0.99   A   C   rs2622627   89,284,377   -   -
2.0×10^-9   1.34   0.561   0.501   0.99   C   A   rs2725249   89,284,892   -   -
1.6×10^-9   1.35   0.565   0.505   0.99   A   C   rs2622626   89,285,739   -   -
4.6×10^-8   1.36   0.752   0.703   0.98   A   C   rs2725248   89,287,031   -   -
4.5×10^-10   1.36   0.513   0.451   0.99   A   G   rs2725247   89,287,281   -   -
2.8×10^-10   1.36   0.511   0.448   0.99   T   G   rs17731799   89,287,479   -   -
4.7×10^-10   1.36   0.513   0.451   0.99   A   G   rs2725246   89,287,522   -   -
2.7×10^-8   1.37   0.751   0.701   0.98   C   T   rs2622625   89,287,761   -   -
4.7×10^-10   1.36   0.513   0.451   0.99   A   G   rs2725245   89,287,762   -   -
1.8×10^-9   1.35   0.564   0.504   0.99   C   T   rs2725244   89,287,785   -   -
4.7×10^-10   1.36   0.512   0.449   0.99   C   T   rs2622624   89,288,430   -   -
1.9×10^-9   1.34   0.564   0.504   0.99   A   T   rs2725242   89,288,551   -   -
2.6×10^-9   1.34   0.565   0.506   0.99   C   T   chr4:89293003   89,293,003   -   -
4.0×10^-9   1.34   0.559   0.500   0.99   T   C   rs13109944   89,293,429   -   -
7.1×10^-10   1.35   0.508   0.447   0.99   G   A   rs28856119   89,293,627   -   -
4.4×10^-10   1.36   0.493   0.432   0.98   A   G   chr4:89293691   89,293,691   -   -
1.7×10^-9   1.35   0.491   0.431   0.97   T   C   chr4:89293698   89,293,698   -   -
2.1×10^-9   1.35   0.490   0.431   0.97   T   C   chr4:89293703   89,293,703   -   -
2.3×10^-9   1.34   0.490   0.431   0.97   G   T   chr4:89293711   89,293,711   -   -
2.1×10^-9   1.35   0.495   0.436   0.98   C   T   chr4:89293717   89,293,717   -   -
2.1×10^-9   1.34   0.498   0.438   0.98   C   G   chr4:89293718   89,293,718   -   -
7.1×10^-10   1.35   0.508   0.447   0.99   G   C   rs34633905   89,293,795   -   -
1.2×10^-9   1.35   0.505   0.444   1.00   A   C   rs2725239   89,294,647   -   -
1.3×10^-9   1.35   0.505   0.444   1.00   G   C   rs2622603   89,296,505   -   -
1.1×10^-9   1.35   0.504   0.443   0.99   T   C   rs2622605   89,298,410   -   -
1.5×10^-9   1.34   0.504   0.444   1.00   C   T   rs2622605   89,298,410   -   -
1.3×10^-9   1.35   0.504   0.444   1.00   C   T   rs3114020   89,302,690   -   -
7.5×10^-10   1.35   0.504   0.442   1.00   A   G   rs2725226   89,304,355   -   -
7.6×10^-10   1.35   0.506   0.444   0.99   T   A   rs2622608   89,305,768   -   -
6.9×10^-10   1.35   0.505   0.443   0.99   C   A   rs2622609   89,307,499   -   -

The tests for association are based on an effective sample size of 968 gout cases and over 40,000 controls.

Supplementary Table 3 – Novel and previously reported sequence variants associating with serum uric acid levels and gout

SNP   Chr   Pos   Allele (effect/other)   Freq   Info   Uric acid effect (95% CI)a   Uric acid P   Gout OR (95% CI)   Gout P

Novel SNP associations
chr1_142697422   1   142,697,422   C/T   0.986   0.55   0.48 (0.36, 0.60)   4.5×10^-16   1.92 (1.01, 3.63)   0.046
chr1_144539240   1   144,539,240   A/G   0.987   0.65   0.41 (0.30, 0.52)   2.5×10^-13   2.06 (1.11, 3.82)   0.023
c.1580C>G   19   54,660,818   G/C   0.019   0.89   0.36 (0.29, 0.44)   4.5×10^-21   3.12 (2.38, 4.17)   1.5×10^-16

Replication of previously reported SNP associations
rs1967017   1   144,435,002   T/C   0.449   0.93   0.03 (0.01, 0.05)   0.0016   1.09 (0.98, 1.19)   0.12
rs12129861   1   144,437,046   G/A   0.500   0.90   0.04 (0.01, 0.06)   0.0012   1.04 (0.94, 1.15)   0.46
rs780094   2   27,594,741   T/C   0.340   1.00   0.04 (0.02, 0.06)   0.00071   1.19 (1.08, 1.32)   0.00092
rs780093   2   27,596,107   T/C   0.342   1.00   0.04 (0.02, 0.06)   0.00082   1.18 (1.06, 1.30)   0.0012
rs734553   4   9,532,102   T/G   0.790   1.00   0.24 (0.22, 0.27)   1.0×10^-80   1.39 (1.23, 1.59)   2.4×10^-7
rs13129697   4   9,536,065   T/G   0.767   1.00   0.23 (0.21, 0.26)   1.6×10^-79   1.32 (1.18, 1.49)   5.1×10^-6
rs2199936   4   89,264,355   A/G   0.101   1.00   0.16 (0.12, 0.19)   1.9×10^-20   1.66 (1.44, 1.91)   3.0×10^-12
rs2231142   4   89,271,347   T/G   0.101   1.00   0.16 (0.12, 0.19)   2.3×10^-20   1.67 (1.43, 1.92)   2.8×10^-12
rs675209   6   7,047,083   T/C   0.260   1.00   0.04 (0.01, 0.06)   0.0022   1.03 (0.92, 1.15)   0.62
rs742132   6   25,715,550   A/G   0.716   1.00   0.01 (-0.01, 0.04)   0.25   1.09 (0.98, 1.21)   0.12
rs1165196   6   25,921,129   A/G   0.492   1.00   0.05 (0.03, 0.07)   2.8×10^-6   1.10 (1.00, 1.21)   0.059
rs1183201   6   25,931,423   T/A   0.479   1.00   0.05 (0.03, 0.07)   6.4×10^-6   1.06 (0.96, 1.18)   0.22
rs12356193   10   61,083,359   A/G   0.849   1.00   0.04 (0.01, 0.07)   0.0062   1.05 (0.92, 1.21)   0.46
rs17300741   11   64,088,038   A/G   0.499   1.00   0.04 (0.02, 0.06)   1.9×10^-5   1.06 (0.96, 1.16)   0.27
rs2078267   11   64,090,690   C/T   0.502   1.00   0.05 (0.03, 0.07)   1.1×10^-5   1.07 (0.97, 1.17)   0.18
rs505802   11   64,113,648   C/T   0.280   1.00   0.03 (0.01, 0.05)   0.0076   1.03 (0.93, 1.15)   0.57
rs1106766   12   56,095,723   C/T   0.689   0.98   0.04 (0.02, 0.06)   0.00084   1.06 (0.95, 1.17)   0.30

The tests for association are based on an effective sample size of 15,506 individuals with uric acid measurements and 968 gout cases and over 40,000 controls. Previously reported SNPs are from the Yang et al. and Kolz et al. publications.
a Effects on uric acid levels are in standard deviations.

Supplementary Table 4 – Sex-stratified uric acid levels association for novel and previously reported sequence variants associating with gout and serum uric acid levels

SNP   Chr   Pos   Allele (effect/other)   Freq   Info   Male effect (95% CI)   Male P   Female effect (95% CI)   Female P   Pdiff
chr1_142697422   1   142697422   C/T   0.986   0.55   0.58 (0.40, 0.76)   2.4×10^-10   0.41 (0.26, 0.55)   2.7×10^-8   0.14
rs1967017   1   144435002   T/C   0.449   0.93   0.04 (0.01, 0.07)   0.0086   0.03 (0.00, 0.05)   0.043   0.47
rs12129861   1   144437046   G/A   0.5   0.9   0.04 (0.01, 0.08)   0.0064   0.03 (0.00, 0.06)   0.035   0.46
rs780094   2   27594741   T/C   0.34   1   0.03 (0.00, 0.07)   0.041   0.04 (0.01, 0.07)   0.0036   0.78
rs780093   2   27596107   T/C   0.342   1   0.03 (0.00, 0.06)   0.053   0.04 (0.01, 0.07)   0.0033   0.70
rs734553   4   9532102   T/G   0.79   1   0.19 (0.15, 0.23)   5.1×10^-23   0.28 (0.25, 0.31)   2.1×10^-70   0.00031
rs13129697   4   9536065   T/G   0.767   1   0.18 (0.15, 0.22)   1.2×10^-22   0.27 (0.24, 0.30)   1.3×10^-69   0.00026
rs2199936   4   89264355   A/G   0.101   1   0.20 (0.15, 0.25)   7.0×10^-15   0.13 (0.09, 0.17)   3.3×10^-9   0.036
rs2231142   4   89271347   T/G   0.101   1   0.20 (0.15, 0.25)   6.5×10^-15   0.13 (0.08, 0.17)   4.1×10^-9   0.034
rs675209   6   7047083   T/C   0.26   1   0.06 (0.02, 0.09)   0.001   0.02 (-0.01, 0.05)   0.25   0.074
rs742132   6   25715550   A/G   0.716   1   0.01 (-0.03, 0.04)   0.66   0.02 (-0.01, 0.05)   0.19   0.62
rs1165196   6   25921129   A/G   0.492   1   0.05 (0.01, 0.08)   0.0038   0.05 (0.02, 0.08)   0.00013   0.82
rs1183201   6   25931423   T/A   0.479   1   0.04 (0.01, 0.07)   0.012   0.05 (0.03, 0.08)   0.0001   0.59
rs12356193   10   61083359   A/G   0.849   1   0.05 (0.01, 0.10)   0.018   0.03 (0.00, 0.07)   0.069   0.50
rs17300741   11   64088038   A/G   0.499   1   0.05 (0.01, 0.08)   0.0037   0.04 (0.02, 0.07)   0.0007   0.95
rs2078267   11   64090690   C/T   0.502   1   0.05 (0.02, 0.08)   0.003   0.05 (0.02, 0.07)   0.00048   0.96
rs505802   11   64113648   C/T   0.28   1   0.03 (0.00, 0.07)   0.062   0.03 (0.00, 0.06)   0.048   0.87
rs1106766   12   56095723   C/T   0.689   0.98   0.04 (0.01, 0.08)   0.0097   0.03 (0.00, 0.06)   0.023   0.59
c.1580C>G   19   54660818   G/C   0.019   0.89   0.35 (0.24, 0.46)   2.9×10^-10   0.38 (0.28, 0.48)   4.9×10^-14   0.71
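The Pdiff column compares the male and female estimates. The source does not state how Pdiff was computed, so the following is only a minimal sketch under an assumption: a two-sided z-test on the difference of the two estimates, with standard errors back-calculated from the reported 95% confidence intervals. The function names are hypothetical, and rounding in the tabulated values means the result will only approximate the reported figure.

```python
import math

Z95 = 1.959964  # normal quantile for a two-sided 95% interval


def se_from_ci(lower, upper):
    """Approximate standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2.0 * Z95)


def two_sided_p(z):
    """Two-sided P value for a standard normal z score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def heterogeneity_p(est_a, ci_a, est_b, ci_b):
    """Assumed test: z-test for the difference of two independent estimates."""
    se_a = se_from_ci(*ci_a)
    se_b = se_from_ci(*ci_b)
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    return two_sided_p(z)


# rs734553 in Supplementary Table 4: male 0.19 (0.15, 0.23), female 0.28 (0.25, 0.31)
p_diff = heterogeneity_p(0.19, (0.15, 0.23), 0.28, (0.25, 0.31))
print(f"approximate Pdiff for rs734553: {p_diff:.2g}")
```

For rs734553 this lands in the same order of magnitude as the tabulated Pdiff of 0.00031. For odds ratios, the same sketch would be applied to the natural logarithms of the OR and of its interval bounds.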

Supplementary Table 5 – Sex-stratified gout association for novel and previously reported sequence variants associating with gout and serum uric acid levels

SNP   Chr   Pos   Allele (effect/other)   Freq   Info   Male OR (95% CI)   Male P   Female OR (95% CI)   Female P   Pdiff
chr1_142697422   1   142697422   C/T   0.986   0.55   5.62 (2.01, 15.70)   0.00098   0.88 (0.40, 1.94)   0.75   0.0049
rs1967017   1   144435002   T/C   0.449   0.93   1.09 (0.95, 1.23)   0.21   1.08 (0.92, 1.25)   0.37   0.92
rs12129861   1   144437046   G/A   0.5   0.9   1.06 (0.93, 1.22)   0.32   0.99 (0.85, 1.15)   0.89   0.45
rs780094   2   27594741   T/C   0.34   1   1.19 (1.05, 1.35)   0.006   1.16 (1.00, 1.35)   0.057   0.82
rs780093   2   27596107   T/C   0.342   1   1.19 (1.04, 1.35)   0.0084   1.16 (1.00, 1.35)   0.058   0.82
rs734553   4   9532102   T/G   0.79   1   1.33 (1.14, 1.56)   0.00034   1.47 (1.22, 1.79)   7.6×10^-5   0.44
rs13129697   4   9536065   T/G   0.767   1   1.30 (1.11, 1.52)   0.00075   1.35 (1.14, 1.61)   0.001   0.74
rs2199936   4   89264355   A/G   0.101   1   1.79 (1.49, 2.13)   1.8×10^-10   1.44 (1.16, 1.80)   0.0011   0.13
rs2231142   4   89271347   T/G   0.101   1   1.79 (1.49, 2.13)   1.7×10^-10   1.45 (1.15, 1.79)   0.0011   0.15
rs675209   6   7047083   T/C   0.26   1   0.95 (0.82, 1.10)   0.47   1.15 (0.98, 1.35)   0.088   0.076
rs742132   6   25715550   A/G   0.716   1   1.07 (0.94, 1.23)   0.3   1.09 (0.93, 1.29)   0.3   0.86
rs1165196   6   25921129   A/G   0.492   1   1.08 (0.96, 1.22)   0.22   1.12 (0.97, 1.30)   0.13   0.71
rs1183201   6   25931423   T/A   0.479   1   1.03 (0.92, 1.18)   0.58   1.10 (0.95, 1.28)   0.19   0.48
rs12356193   10   61083359   A/G   0.849   1   1.13 (0.95, 1.35)   0.17   0.97 (0.79, 1.19)   0.76   0.25
rs17300741   11   64088038   A/G   0.499   1   1.04 (0.92, 1.18)   0.51   1.09 (0.94, 1.26)   0.24   0.62
rs2078267   11   64090690   C/T   0.502   1   1.05 (0.93, 1.19)   0.45   1.11 (0.96, 1.28)   0.16   0.57
rs505802   11   64113648   C/T   0.28   1   1.02 (0.89, 1.17)   0.75   1.04 (0.88, 1.22)   0.64   0.85
rs1106766   12   56095723   C/T   0.689   0.98   1.07 (0.93, 1.22)   0.33   1.04 (0.89, 1.22)   0.61   0.78
c.1580C>G   19   54660818   G/C   0.019   0.89   3.85 (2.86, 5.56)   1.7×10^-16   2.04 (1.28, 3.23)   0.0024   0.027

Supplementary Table 6 – Association of uric acid sequence variants with age at visit to the clinician

SNP   Chr   Pos   Allele (effect/other)   Effect (95% CI)   P
chr1_142697422   1   142697422   C/T   -14.94 (-30.00, 0.11)   0.052
rs1967017   1   144435002   T/C   -0.87 (-2.71, 0.97)   0.35
rs12129861   1   144437046   G/A   -0.80 (-2.66, 1.07)   0.40
rs780094   2   27594741   T/C   -0.23 (-1.98, 1.52)   0.80
rs780093   2   27596107   T/C   -0.23 (-1.98, 1.53)   0.80
rs734553   4   9532102   T/G   0.24 (-2.15, 2.64)   0.84
rs13129697   4   9536065   T/G   -0.35 (-2.60, 1.90)   0.76
rs2199936   4   89264355   A/G   -3.41 (-5.85, -0.97)   0.0061
rs2231142   4   89271347   T/G   -3.41 (-5.85, -0.97)   0.0062
rs675209   6   7047083   T/C   1.10 (-0.95, 3.14)   0.29
rs742132   6   25715550   A/G   0.83 (-1.14, 2.81)   0.41
rs1165196   6   25921129   A/G   -0.77 (-2.51, 0.97)   0.38
rs1183201   6   25931423   T/A   -0.72 (-2.48, 1.04)   0.42
rs12356193   10   61083359   A/G   0.56 (-1.97, 3.09)   0.67
rs17300741   11   64088038   A/G   0.08 (-1.62, 1.78)   0.93
rs2078267   11   64090690   C/T   0.20 (-1.50, 1.90)   0.82
rs505802   11   64113648   C/T   -0.25 (-2.11, 1.60)   0.79
rs1106766   12   56095723   C/T   0.55 (-1.42, 2.51)   0.58
c.1580C>G   19   54660818   G/C   -7.62 (-12.36, -2.89)   0.0016

Effect is expressed in years.

Supplementary Figure 1

The sequencing depth of the 457 whole-genome sequenced individuals.

Supplementary Figure 2

Quantile-quantile plot of the 15,957,390 SNPs in the genome-wide association scan for gout. The blue ‘x’s represent the P values scaled down by the genomic control inflation factor of 1.10. The diagonal red line represents where the dots are expected to fall under the null hypothesis of no association. The horizontal green line represents P = 5 × 10^-8.

Supplementary Figure 3

Quantile-quantile plot of the 15,957,390 SNPs in the genome-wide association scan for uric acid levels. The blue ‘x’s represent the P values scaled down by the genomic control inflation factor of 1.18. The diagonal red line represents where the dots are expected to fall under the null hypothesis of no association. The horizontal green line represents P = 5 × 10^-8.
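The captions for Supplementary Figures 2 and 3 mention adjustment by a genomic control inflation factor (1.10 for gout, 1.18 for uric acid). The source does not spell out the computation, so the following is a minimal sketch of the usual genomic-control convention, assumed here: the inflation factor is the median 1-df chi-square statistic divided by its expected median under the null, and each statistic is divided by that factor before the P value is recomputed. The function names and the example statistic are hypothetical.

```python
import math
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def chi2_1df_pvalue(stat):
    """Upper-tail P value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(stat / 2.0))


def inflation_factor(stats):
    """Genomic control lambda: observed median statistic over the null median."""
    return statistics.median(stats) / CHI2_1DF_MEDIAN


def gc_adjusted_pvalue(stat, lam):
    """P value after dividing the chi-square statistic by lambda."""
    return chi2_1df_pvalue(stat / lam)


# Hypothetical statistic, adjusted with the two inflation factors from the captions.
example_stat = 40.0
for lam in (1.10, 1.18):
    raw = chi2_1df_pvalue(example_stat)
    adjusted = gc_adjusted_pvalue(example_stat, lam)
    print(f"lambda={lam}: raw P={raw:.2g}, adjusted P={adjusted:.2g}")
```

Dividing by a factor above 1 always makes the adjusted P value larger (less significant) than the raw one, which is why a larger inflation factor demands a stronger raw signal to cross the 5 × 10^-8 line.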
