1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures … · 2017-04-13 ·...

1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of natural selection in modern humans

Marc Pybus 1, Giovanni M. Dall'Olio 1, Pierre Luisi 1, Manu Uzkudun 1, Angel Carreño-Torres 2, Pavlos Pavlidis 3, Hafid Laayouni 1, Jaume Bertranpetit 1,* and Johannes Engelken 1,4,*

1 Program for Population Genetics, Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), 08003 Barcelona, Spain, 2 Population Genomics Node, National Institute for Bioinformatics (INB), Universitat Pompeu Fabra, 08003 Barcelona, Spain, 3 Institute of Molecular Biology and Biotechnology-FORTH, Heraklion, Crete GR 700 13, Greece and 4 Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, 04103 Leipzig, Germany

Received July 31, 2013; Revised October 31, 2013; Accepted November 1, 2013

*To whom correspondence should be addressed. Tel: +34 933 160 840; Fax: +34 935 422 802; Email: [email protected]
Correspondence may also be addressed to Jaume Bertranpetit. Email: [email protected]

The authors wish it to be known that, in their opinion, the first four authors should be regarded as Joint First Authors.

Published online 25 November 2013. Nucleic Acids Research, 2014, Vol. 42, Database issue D903–D909. doi:10.1093/nar/gkt1188

© The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

ABSTRACT

Searching for Darwinian selection in natural populations has been the focus of a multitude of studies over the last decades. Here we present the 1000 Genomes Selection Browser 1.0 (http://hsb.upf.edu) as a resource for signatures of recent natural selection in modern humans. We have implemented and applied a large number of neutrality tests as well as summary statistics informative for the action of selection such as Tajima's D, CLR, Fay and Wu's H, Fu and Li's F* and D*, XPEHH, ΔiHH, iHS, FST, ΔDAF and XPCLR among others to low coverage sequencing data from the 1000 genomes project (Phase 1; release April 2012). We have implemented a publicly available genome-wide browser to communicate the results from three different populations of West African, Northern European and East Asian ancestry (YRI, CEU, CHB). Information is provided in UCSC-style format to facilitate the integration with the rich UCSC browser tracks and an access page is provided with instructions and for convenient visualization. We believe that this expandable resource will facilitate the interpretation of signals of selection on different temporal, geographical and genomic scales.

INTRODUCTION

Initiatives such as the 1000 Genomes Project (1,2) are generating resequencing data from world-wide human populations on a genome-wide scale. Resequencing data constitutes a major leap for population genomic analysis due to its higher information density and limited SNP ascertainment bias compared to genotyping data. Therefore such data is appropriate to calculate summary statistics that are based on the site frequency spectrum like CLR or Tajima's D. Using the neutral evolutionary model as a null hypothesis, diverse statistics can be applied to genetic data to identify deviations from neutrality (Table 1). These statistical tests show varying degrees of robustness to demographic events (e.g. population bottlenecks and expansions) and sensitivity to different types of selection (e.g. positive, purifying or balancing). For instance, population bottlenecks can lead to footprints that are similar to those caused by positive selection (21). Therefore, outlier approaches, which are commonly used to identify non-neutral loci in the extremes of a genome-wide distribution, are likely to contain a number of false positives in their extremes. Likewise, a number of false negatives, hence misidentified truly selected loci, are expected in a grey zone near the (arbitrary) outlier threshold (22).

Outlier approaches in genome scans have proven powerful, but certainly they should be interpreted carefully in order to avoid storytelling (23). Even more, a profound understanding of adaptive evolution requires the integration of biological function (24) and if possible, validation on an experimental basis (25). Molecular network approaches can also give a functional context to the specific genes under adaptive selection (26,27). In all studies, care should be taken in communicating putative loci under selection to the public in order to avoid racist misinterpretation (28).


Despite these limitations and the fact that complete selective sweeps may not be extremely widespread in humans (29), a large number of regions under strong positive selection can be expected in the genome (30).

DESCRIPTION OF APPLIED STATISTICAL TESTS

Due to linkage, neutral alleles in the surrounding region hitchhike with the selected allele. Maynard Smith and Haigh (31) described this process of genetic hitchhiking and the so-called selective sweep. More recent studies showed that genetic hitchhiking generates distinct polymorphism signatures on the genome, such as (i) reduction of polymorphism level and excess of low- and high-frequency derived variants (32), (ii) spatial patterns of linkage-disequilibrium (33) and (iii) increased genetic differentiation among populations (34). Taking advantage of these three theoretical expectations, several methods to detect positive selection have been developed in the last two decades. This variety of methods reflects the fact that no single statistic is enough to describe selection under various demographic models and modes of selection (22).

Here, we implemented a large number of statistical tests (Table 1) in order to allow for a more comprehensive analysis of natural selection, especially positive selection. In brief, we have assigned the statistical tests to different method families (Table 1). Within the first family, which is based on the allele frequency spectrum, Tajima's D (3) is a classical neutrality test that compares estimates of the number of segregating sites and the mean pair-wise difference between sequences. CLR is a multi-locus composite likelihood ratio test (4,35). Fay and Wu's H (5) uses another facet of the site-frequency spectrum by comparing the number of derived segregating sites at high frequencies to the number of variants at intermediate frequencies. Fu and Li's F* compares the number of singletons to the mean pair-wise difference between sequences, and Fu and Li's D* compares it to the total number of nucleotide variants in a genomic region (6). R2 (7) is a statistical test for detecting population growth based on the comparison of the difference between the number of singletons per sequence and the average number of nucleotide differences.
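As an illustration of the allele-frequency-spectrum family, the comparison made by Tajima's D can be sketched in a few lines of Python. This is a minimal sketch following the formulas in Tajima (3), not the implementation used for the browser (which is described in the Supplementary Material). The function name `tajimas_d` and the input format (equal-length haplotype strings for one window) are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for one window, given equal-length haplotype strings.

    Hypothetical helper for illustration only; the input format is an assumption.
    """
    n = len(haplotypes)
    # A site is segregating if more than one allele is observed in the sample.
    seg_sites = sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)
    if n < 2 or seg_sites == 0:
        return float("nan")

    # Mean pair-wise difference between sequences (pi).
    n_pairs = n * (n - 1) / 2
    pi = sum(
        sum(a != b for a, b in zip(h1, h2))
        for h1, h2 in combinations(haplotypes, 2)
    ) / n_pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two diversity estimates, scaled by its standard deviation.
    return (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
```

An excess of rare variants, as expected after a selective sweep, gives negative values, which is why Table 1 ranks Tajima's D by its lower tail; an excess of intermediate-frequency variants gives positive values.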

Among the linkage disequilibrium structure methods, XP-EHH (8) is a cross-population test based on extended haplotype homozygosity (EHH). ΔiHH considers the difference between the integrated haplotype homozygosity scores for each allele in a single population, while iHS (9) is defined as their log ratio. EHH average and EHH maximum (36), modified from (10), are based on the extended haplotype homozygosity. Wall's B (11) counts the number of pairs of adjacent segregating sites that are congruent (if the subset of the data consisting of the two sites contains only two different haplotypes), while Wall's Q (12) adds the number of partitions (two disjoint subsets whose union is the set of individuals in the sample) induced by congruent pairs to Wall's B. Fu's F (13) takes into account the haplotype diversity in the sample. Dh (14) is a summary statistic based on the number of different haplotypes in the sample.
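The relationship between EHH, the integrated score (iHH), ΔiHH and iHS can be made concrete with a small Python sketch. It assumes haplotypes coded as strings of '0' (ancestral) and '1' (derived), integrates EHH with the trapezoidal rule over all supplied markers, and omits both the EHH cut-off and the frequency-based standardization used in practice. The function names, the sign convention for ΔiHH (ancestral minus derived) and the absence of a cut-off are assumptions for illustration, not the modified procedures applied in the browser.

```python
from collections import Counter
from math import comb, log


def ehh(haplotypes, core, upto):
    """Probability that two random haplotypes are identical from core to upto."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    lo, hi = min(core, upto), max(core, upto)
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


def ihh(haplotypes, positions, core):
    """Area under the EHH curve on both sides of the core SNP (trapezoidal rule)."""
    total = 0.0
    for step in (-1, 1):
        prev_pos, prev_ehh = positions[core], 1.0  # EHH is 1 at the core itself
        j = core + step
        while 0 <= j < len(positions):
            cur = ehh(haplotypes, core, j)
            total += abs(positions[j] - prev_pos) * (prev_ehh + cur) / 2
            prev_pos, prev_ehh = positions[j], cur
            j += step
    return total


def delta_ihh_and_ihs(haplotypes, positions, core):
    """Return (iHH_A - iHH_D, ln(iHH_A / iHH_D)) for the core SNP; unstandardized."""
    ancestral = [h for h in haplotypes if h[core] == "0"]
    derived = [h for h in haplotypes if h[core] == "1"]
    ihh_a = ihh(ancestral, positions, core)
    ihh_d = ihh(derived, positions, core)
    return ihh_a - ihh_d, log(ihh_a / ihh_d)
```

With this convention, a long, uniform haplotype around the derived allele yields a negative unstandardized log ratio. The log ratio is undefined when either integral is zero, which this sketch does not guard against.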

The third family of methods is based on population differentiation. FST (37), calculated following the diploid method in Weir 1996 (p. 178), and ΔDAF (18) are estimates of population differentiation based on derived allele frequencies. XP-CLR (19) is a multi-locus allele-frequency-differentiation statistic between two populations.

Table 1. List of available summary statistics

Method                              Reference                           Window size               Rank scores tail

Allele frequency spectrum
  Tajima's D                        Tajima (3)                          30 kb                     Lower
  CLR                               Nielsen et al. (4)                  Variable size             Upper
  Fay and Wu's H                    Fay and Wu (5)                      30 kb                     Lower
  Fu and Li's F*                    Fu and Li (6)                       30 kb                     Lower
  Fu and Li's D*                    Fu and Li (6)                       30 kb                     Lower
  R2                                Ramos-Onsins and Rozas (7)          30 kb                     Lower

Linkage disequilibrium structure
  XP-EHH                            modified from Sabeti et al. (8)     SNP-specific              Upper
  ΔiHH                              modified from Voight et al. (9)     SNP-specific              Upper
  iHS                               modified from Voight et al. (9)     SNP-specific              Upper
  EHH_average                       modified from Sabeti et al. (10)    30 kb                     Upper
  EHH_max                           modified from Sabeti et al. (10)    30 kb                     Upper
  Wall's B                          Wall (11)                           30 kb                     Upper
  Wall's Q                          Wall (12)                           30 kb                     Upper
  Fu's F                            Fu (13)                             30 kb                     Lower
  Dh                                Nei (14)                            30 kb                     Upper
  Za                                Rozas et al. (15)                   30 kb                     Upper
  ZnS                               Kelly (16)                          30 kb                     Upper
  ZZ                                Rozas et al. (15)                   30 kb                     Upper

Population differentiation
  Fst (global and pairwise)         Weir and Cockerham (17)             SNP-specific              Upper
  ΔDAF (standard and absolute)      Hofer et al. (18)                   SNP-specific              Upper
  XP-CLR                            Chen et al. (19)                    0.1 cM (maximum window)   Upper

Descriptive statistics
  Segregating sites                 -                                   30 kb                     NA
  Singletons                        -                                   30 kb                     NA
  pi (nucleotide diversity)         Nei and Li (20)                     30 kb                     NA
  DAF (derived allele frequency)    -                                   SNP-specific              NA
  MAF (minor allele frequency)      -                                   SNP-specific              NA


Additional statistics, like segregating sites per 30-kb window, the nucleotide diversity and others (Table 1), are listed as descriptive statistics. A thorough description of the tests is given in the original literature (see Table 1) and in diverse excellent reviews on the topic (38,39).

COMPUTATIONAL FRAMEWORK AND DESCRIPTION OF 1000 GENOMES SOURCE DATA

A framework to calculate diverse summary statistics (Table 1) from 1000 genomes data was developed (Figure 1). A detailed description of how the statistics were implemented is given (Supplementary Material). A genome-wide overview of the results stored in the database for selected summary statistics is given (Supplementary Table S1). As described in the 1000 genomes Phase 1 paper (1), the quality of the 1000 genomes low coverage data has improved considerably over the pilot phase (2), but a number of limitations need to be kept in mind for population genomic analysis: (i) singletons and other rare variants are still underrepresented, (ii) the accessibility of the genome with the short-read sequencing technologies used is 94% and (iii) the reported phasing switch error every 250 kb (median; Supplementary Figure S5 in (1)) likely leads to an underestimation of the length of long shared haplotypes expected to occur around recent selective sweeps.

Despite these drawbacks, which are mainly due to the nature of the low coverage approach, the short-read technology and differences in read depth (40), this dataset has important advantages over genotyping data, most importantly: (i) a higher SNP density, (ii) the overcoming of ascertainment bias and (iii) a larger number of individuals per population when compared to previous datasets (HapMap II and HGDP). We used phased data from the CEU, the CHB and the YRI populations from the integrated Phase 1 variant set (April 2012) with 97, 85 and 88 individuals, respectively. From the input vcf (variant call format) file, we extracted exclusively the low-coverage VQSR SNP calls in order to avoid any bias that might result from differences between low-coverage calls and high-coverage exome SNP calls. Indels were not used. Ancestral states in this data set were identified using a 4-way alignment of human, chimp, orangutan and rhesus macaque provided by the 1000 genomes consortium (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/supporting/ancestral_alignments).

AVAILABILITY OF DATABASE

All data is available via our entry page http://hsb.upf.edu. A search mask gives the user easy access to the results for a specific gene or a genomic region of choice. The 'submit' button leads the user to a UCSC-style genome browser (http://pgb.ibe.upf.edu), which is a custom installation of the UCSC Genome Browser (41,42). This UCSC Genome Browser installation allows for a visual inspection of the data and for an integration of our data with many other available datasets. The raw scores of the tracks can be conveniently downloaded using the UCSC Table function (43), which is integrated with the Galaxy platform (galaxyproject.org). Using the 'configure' function on the browser page, the tracks can be further customized, and using 'right click', the visualized genomic regions can be downloaded as a picture in png format.

For every statistical test, we provide two tracks: one for the raw scores and one for ranked scores. The purpose of the rank score tracks is to provide a comparison to the rest of the genome. Conveniently, the rank scores are presented in such a way that they present a peak (instead of a valley) in regions under positive selection. They are calculated using an outlier approach (22,44) by sorting all the scores genome-wide and determining the log10 of the rank divided by the number of values in the distribution, taking the upper tail for most of the tests or the lower tail for Tajima's D, Fay and Wu's H, Fu and Li's F* and D*, R2 and Fu's F (see Table 1 and a more detailed description on the entry page).

The main purpose of the entry page is to provide a channel of communication with users, following the guidelines in (45). It serves as a platform for updates, questions and feedback (46). Therefore, the page also provides documentation on the tracks and on the tests implemented, as well as a FAQ and a feedback section.
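The rank-score transformation described above can be sketched as follows. This is a hedged illustration rather than the code behind the browser tracks: the function name `rank_scores` is hypothetical, ties are broken arbitrarily by input order, and the negative log10 is assumed here so that the most extreme values form peaks, as the text describes.

```python
import math


def rank_scores(raw_scores, tail="upper"):
    """Empirical rank scores: -log10(rank / N), rank 1 being the most extreme value.

    tail="upper" treats the largest raw scores as most extreme,
    tail="lower" the smallest (e.g. for Tajima's D).
    """
    n = len(raw_scores)
    # Indices of the raw scores, ordered from most to least extreme.
    order = sorted(range(n), key=lambda i: raw_scores[i], reverse=(tail == "upper"))
    ranked = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        ranked[idx] = -math.log10(rank / n)
    return ranked
```

For example, `rank_scores([0.1, 5.0, 2.0, 3.0])` assigns the highest rank score to the second window, while `rank_scores([0.1, 5.0, 2.0, 3.0], tail="lower")` assigns it to the first.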

Figure 1. Schematic workflow developed in order to calculate diverse genome-wide summary statistics informative for the action of selection and to build a database in order to share and visualize the results.


Figure 2. Examples of genomic regions under selection in the 1000 genomes selection browser. Tracks of statistics from different populations are visualized in colour (CEU in green, CHB in red and YRI in blue). Additional examples are given at http://hsb.upf.edu. (A) The p- and q-arms of chromosome 2 (log10 of empirically ranked scores). Recurrent peaks at around 72.5 Mb (left green arrow) and 109.5 Mb (right green arrow) indicate the loci CYP26B1/EXOC6B and EDAR, respectively. (B) Signature of positive selection around SLC45A2, another established skin colour gene, in the CEU population (0.5-Mb window; log10 of empirically ranked scores). (C) Widespread balancing selection in the HLA region indicated by strongly positive scores for Tajima's D in all three analysed human populations (0.5-Mb window).


EXAMPLE APPLICATIONS

First, we exemplify the use of the database by extracting results for a number of established loci under selection: EDAR (47), LCT (46), SLC45A2 (48), CD36 (49), HERC2 (50), SLC24A5 (51), CD5 (52) and APOL1 (53). A locus-specific summary of statistical tests is given (Supplementary Table S2). Interestingly, for any given locus, only a subset of statistical tests shows an extreme outlier score. This is consistent with differences in the architecture of selective sweeps. iHS scores near certain very pronounced selective sweeps (e.g. LCT and SLC24A5) failed to compute due to inherent properties of the statistic, because either (i) the selected haplotype was near fixation or (ii) the EHH did not drop below the defined threshold in a given window. Examples for both positive (SLC45A2) and balancing (HLA region) selection are visualized in Figure 2. As expected, Tajima's D scores around HLA (54) as well as the ABO locus (55) (data not shown) were pronouncedly elevated in all three analyzed populations, a pattern which is compatible with the action of balancing selection.

COMPARISON TO OTHER WEB RESOURCES

As for positive selection based on between-species comparisons, the Selectome database (http://bioinfo.unil.ch/selectome (56)) presents results based on the dN/dS method using a branch-site specific likelihood test. As for recent natural selection within modern humans, a number of web resources are available. For previous datasets, e.g. the HapMap 2 and HGDP projects, several positive selection statistics are available in the form of the haplotter tool (http://haplotter.uchicago.edu (24)) and in the form of the HGDP selection browser (http://hgdp.uchicago.edu (57)). For the 1000 genomes project data, the online tool ENGINES (http://spsmart.cesga.es (58)) is useful for the analysis of allele frequencies, and a recent study presented a method to calculate corrected summary statistics from low coverage sequencing data (40). dbPSHP (http://jjwanglab.org/dbpshp) offers a large number of statistical tests in a SNP-specific manner for HapMap 3 and 1000 genomes datasets. Complementary to these databases, our database gives a large number of region- and SNP-specific scores (depending on the test statistic) based on resequencing data (1000 genomes Phase 1), with a special focus on genome-wide significance (by the ranked scores) and the visualization of several statistics in parallel (Figure 2).

CONCLUSIONS

By applying a large number of summary statistics to data from the 1000 genomes project, we have built a timely and expandable resource for the population genomics research community. An associated user-friendly genome browser gives a visual impression of the genetic variation in a genomic region of interest and offers functionality for an array of down-stream analyses. While this resource will not replace a thorough case-by-case analysis of selection, we expect that it will prove useful for the research community through the large number of test statistics and the fine-grained character of resequencing data.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors thankfully acknowledge contributions from Anna Ramírez-Soriano, Arcadi Navarro, Francesc Calafell, Elena Bosch, Chris Tyler-Smith, Gonçalo Abecasis, Roger Bartomeus Peñalver and the 1000 Genomes Project (1000genomes.org). The authors also thank Txema Heredia and the National Institute of Bioinformatics (http://www.inab.org) for computational support.

FUNDING

Ministerio de Ciencia y Tecnología (Spain); Direcció General de Recerca, Generalitat de Catalunya (Grup de Recerca Consolidat 2009 SGR 1101); Subprogram BMC [BFU2010-19443 awarded to J.B.]; Post-doctoral scholarship from the Volkswagenstiftung [Az I85 198 to J.E.]; Spanish government [BFU-2008-01046, SAF2011-29239]; The Spanish government FPI scholarships [BES-2009-017731 and BES-2011-04502 to G.M.D. and M.P., respectively]; PhD fellowship from 'Acción Estratégica de Salud, en el marco del Plan Nacional de Investigación Científica, Desarrollo e Innovación Tecnológica 2008-2011' from Instituto de Salud Carlos III (to P.L.). Funding for open access charge: Prof. Jaume Bertranpetit.

Conflict of interest statement. None declared.

REFERENCES

1. The 1000 Genomes Project Consortium, Abecasis,G.R., Auton,A., Brooks,L.D., DePristo,M.A., Durbin,R.M., Handsaker,R.E., Kang,H.M., Marth,G.T. and McVean,G.A. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature, 491, 56–65.

2. The 1000 Genomes Project Consortium. (2010) A map of human genome variation from population-scale sequencing. Nature, 467, 1061–1073.

3. Tajima,F. (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics, 123, 585–595.

4. Nielsen,R., Williamson,S., Kim,Y., Hubisz,M.J., Clark,A.G. and Bustamante,C. (2005) Genomic scans for selective sweeps using SNP data. Genome Res., 15, 1566–1575.

5. Fay,J.C. and Wu,C.I. (2000) Hitchhiking under positive Darwinian selection. Genetics, 155, 1405–1413.

6. Fu,Y.X. and Li,W.H. (1993) Statistical tests of neutrality of mutations. Genetics, 133, 693–709.

7. Ramos-Onsins,S.E. and Rozas,J. (2002) Statistical properties of new neutrality tests against population growth. Mol. Biol. Evol., 19, 2092–2100.

8. Sabeti,P.C., Varilly,P., Fry,B., Lohmueller,J., Hostetter,E., Cotsapas,C., Xie,X., Byrne,E.H., McCarroll,S.A., Gaudet,R. et al. (2007) Genome-wide detection and characterization of positive selection in human populations. Nature, 449, 913–918.


9. Voight,B.F., Kudaravalli,S., Wen,X. and Pritchard,J.K. (2006) A map of recent positive selection in the human genome. PLoS Biol., 4, e72.

10. Sabeti,P.C., Reich,D.E., Higgins,J.M., Levine,H.Z.P., Richter,D.J., Schaffner,S.F., Gabriel,S.B., Platko,J.V., Patterson,N.J., McDonald,J.G. et al. (2002) Detecting recent positive selection in the human genome from haplotype structure. Nature, 419, 832–837.

11. Wall,J.D. (1999) Recombination and the power of statistical tests of neutrality. Genet. Res., 74, 65–79.

12. Wall,J.D. (2000) A comparison of estimators of the population recombination rate. Mol. Biol. Evol., 17, 156–163.

13. Fu,Y.X. (1997) Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics, 147, 915–925.

14. Nei,M. (1987) Molecular Evolutionary Genetics. Columbia University Press, New York, NY.

15. Rozas,J., Gullaud,M., Blandin,G. and Aguadé,M. (2001) DNA variation at the rp49 gene region of Drosophila simulans: evolutionary inferences from an unusual haplotype structure. Genetics, 158, 1147–1155.

16. Kelly,J.K. (1997) A test of neutrality based on interlocus associations. Genetics, 146, 1197–1206.

17. Weir,B.S. and Cockerham,C.C. (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358–1370.

18. Hofer,T., Ray,N., Wegmann,D. and Excoffier,L. (2009) Large allele frequency differences between human continental groups are more likely to have occurred by drift during range expansions than by selection. Ann. Hum. Genet., 73, 95–108.

19. Chen,H., Patterson,N. and Reich,D. (2010) Population differentiation as a test for selective sweeps. Genome Res., 20, 393–402.

20. Nei,M. and Li,W.H. (1979) Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences, 76, 5269–5273.

21. Barton,N.H. (1998) The effect of hitch-hiking on neutral genealogies. Genet. Res., 72, 123–133.

22. Akey,J.M. (2009) Constructing genomic maps of positive selection in humans: where do we go from here? Genome Res., 19, 711–722.

23. Pavlidis,P., Jensen,J.D., Stephan,W. and Stamatakis,A. (2012) A critical assessment of storytelling: gene ontology categories and the importance of validating genomic scans. Mol. Biol. Evol., 29, 3237–3248.

24. Sabeti,P.C., Schaffner,S.F., Fry,B., Lohmueller,J., Varilly,P., Shamovsky,O., Palma,A., Mikkelsen,T.S., Altshuler,D. and Lander,E.S. (2006) Positive natural selection in the human lineage. Science, 312, 1614–1620.

25. Barrett,R.D.H. and Hoekstra,H.E. (2011) Molecular spandrels: tests of adaptation at the genetic level. Nat. Rev. Genet., 12, 767–780.

26. Dall'olio,G.M., Laayouni,H., Luisi,P., Sikora,M., Montanucci,L. and Bertranpetit,J. (2012) Distribution of events of positive selection and population differentiation in a metabolic pathway: the case of asparagine N-glycosylation. BMC Evol. Biol., 12, 98.

27. Luisi,P., Alvarez-Ponce,D., Dall'olio,G.M., Sikora,M., Bertranpetit,J. and Laayouni,H. (2012) Network-level and population genetics analysis of the insulin/TOR signal transduction pathway across human populations. Mol. Biol. Evol., 29, 1–40.

28. Vitti,J.J., Cho,M.K., Tishkoff,S.A. and Sabeti,P.C. (2012) Human evolutionary genomics: ethical and interpretive issues. Trends Genet., 28, 137–145.

29. Hernandez,R.D., Kelley,J.L., Elyashiv,E., Melton,S.C., Auton,A., McVean,G., Sella,G. and Przeworski,M. (2011) Classic selective sweeps were rare in recent human evolution. Science, 331, 920–924.

30. Grossman,S.R., Andersen,K.G., Shlyakhter,I., Tabrizi,S., Winnicki,S., Yen,A., Park,D.J., Griesemer,D., Karlsson,E.K., Wong,S.H. et al. (2013) Identifying recent adaptations in large-scale genomic data. Cell, 152, 703–713.

31. Smith,J. and Haigh,J. (1974) The hitch-hiking effect of a favourable gene. Genet. Res., 23, 23–35.

32. Braverman,J.M., Hudson,R.R., Kaplan,N.L., Langley,C.H. and Stephan,W. (1995) The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics, 140, 783–796.

33. Kim,Y. and Nielsen,R. (2004) Linkage disequilibrium as a signature of selective sweeps. Genetics, 167, 1513–1524.

34. Barreiro,L.B., Laval,G., Quach,H., Patin,E. and Quintana-Murci,L. (2008) Natural selection has driven population differentiation in modern humans. Nat. Genet., 40, 340–345.

35. Williamson,S.H., Hubisz,M.J., Clark,A.G., Payseur,B.A., Bustamante,C.D. and Nielsen,R. (2007) Localizing recent adaptive evolution in the human genome. PLoS Genet., 3, e90.

36. Ramírez-Soriano,A., Ramos-Onsins,S.E., Rozas,J., Calafell,F. and Navarro,A. (2008) Statistical power analysis of neutrality tests under demographic expansions, contractions and bottlenecks with recombination. Genetics, 179, 555–567.

37. Weir,B.S. and Hill,W.G. (2002) Estimating F-statistics. Annu. Rev. Genet., 36, 721–750.

38. Nielsen,R. (2004) Population genetic analysis of ascertained SNP data. Hum. Genomics, 1, 218–224.

39. Bamshad,M. and Wooding,S.P. (2003) Signatures of natural selection in the human genome. Nat. Rev. Genet., 4, 99–111.

40. Korneliussen,T.S., Moltke,I., Albrechtsen,A. and Nielsen,R. (2013) Calculation of Tajima's D and other neutrality test statistics from low depth next-generation sequencing data. BMC Bioinform., 14, 289.

41. Kent,W.J., Sugnet,C.W., Furey,T.S., Roskin,K.M., Pringle,T.H., Zahler,A.M. and Haussler,A.D. (2002) The Human Genome Browser at UCSC. Genome Res., 12, 996–1006.

42. Dreszer,T.R., Karolchik,D., Zweig,A.S., Hinrichs,A.S., Raney,B.J., Kuhn,R.M., Meyer,L.R., Wong,M., Sloan,C.A., Rosenbloom,K.R. et al. (2012) The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res., 40, D918–D923.

43. Karolchik,D., Hinrichs,A.S., Furey,T.S., Roskin,K.M., Sugnet,C.W., Haussler,D. and Kent,W.J. (2004) The UCSC Table Browser data retrieval tool. Nucleic Acids Res., 32, D493–D496.

44. Kelley,J.L., Madeoy,J., Calhoun,J.C., Swanson,W. and Akey,J.M. (2006) Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res., 16, 980–989.

45. Dall'Olio,G.M., Marino,J., Schubert,M., Keys,K.L., Stefan,M.I., Gillespie,C.S., Poulain,P., Shameer,K., Sugar,R., Invergo,B.M. et al. (2011) Ten simple rules for getting help from online scientific communities. PLoS Comput. Biol., 7, e1002202.

46. Enattah,N.S., Sahi,T., Savilahti,E., Terwilliger,J.D., Peltonen,L. and Järvelä,I. (2002) Identification of a variant associated with adult-type hypolactasia. Nat. Genet., 30, 233–237.

47. Bryk,J., Hardouin,E., Pugach,I., Hughes,D., Strotmann,R., Stoneking,M. and Myles,S. (2008) Positive selection in East Asians for an EDAR allele that enhances NF-kappaB activation. PLoS One, 3, e2209.

48. Branicki,W., Brudnik,U., Draus-Barini,J., Kupiec,T. and Wojas-Pelc,A. (2008) Association of the SLC45A2 gene with physiological human hair colour variation. J. Hum. Genet., 53, 966–971.

49. Fry,A.E., Ghansa,A., Small,K.S., Palma,A., Auburn,S., Diakite,M., Green,A., Campino,S., Teo,Y.Y., Clark,T.G. et al. (2009) Positive selection of a CD36 nonsense variant in sub-Saharan Africa, but no association with severe malaria phenotypes. Hum. Mol. Genet., 18, 2683–2692.

50. Duffy,D.L., Montgomery,G.W., Chen,W., Zhao,Z.Z., Le,L., James,M.R., Hayward,N.K., Martin,N.G. and Sturm,R.A. (2007) A three-single-nucleotide polymorphism haplotype in intron 1 of OCA2 explains most human eye-color variation. Am. J. Hum. Genet., 80, 241–252.

51. Lamason,R.L., Mohideen,M.-A.P.K., Mest,J.R., Wong,A.C., Norton,H.L., Aros,M.C., Jurynec,M.J., Mao,X., Humphreville,V.R., Humbert,J.E. et al. (2005) SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science, 310, 1782–1786.

D908 Nucleic Acids Research 2014 Vol 42 Database issue

52. Carnero-Montoro E, Bonet L, Engelken J, Bielig T, Martínez-Florensa M, Lozano F and Bosch E (2012) Evolutionary and functional evidence for positive selection at the human CD5 immune receptor gene. Mol. Biol. Evol., 29, 811–823.

53. Genovese G, Friedman DJ, Ross MD, Lecordier L, Uzureau P, Freedman BI, Bowden DW, Langefeld CD, Oleksyk TK, Uscinski Knob AL et al. (2010) Association of trypanolytic ApoL1 variants with kidney disease in African Americans. Science, 329, 841–845.

54. Hedrick PW and Thomson G (1983) Evidence for balancing selection at HLA. Genetics, 104, 449–456.

55. Calafell F, Roubinet F, Ramírez-Soriano A, Saitou N, Bertranpetit J and Blancher A (2008) Evolutionary dynamics of the human ABO gene. Hum. Genet., 124, 123–135.

56. Proux E, Studer RA, Moretti S and Robinson-Rechavi M (2009) Selectome: a database of positive selection. Nucleic Acids Res., 37, D404–D407.

57. Pickrell JK, Coop G, Novembre J, Kudaravalli S, Li JZ, Absher D, Srinivasan BS, Barsh GS, Myers RM, Feldman MW et al. (2009) Signals of recent positive selection in a worldwide sample of human populations. Genome Res., 19, 826–837.

58. Amigo J, Salas A and Phillips C (2011) ENGINES: exploring single nucleotide variation in entire human genomes. BMC Bioinformatics, 12, 105.



Despite these limitations and the fact that complete selective sweeps may not be extremely widespread in humans (29), a large number of regions under strong positive selection can be expected in the genome (30).

DESCRIPTION OF APPLIED STATISTICAL TESTS

Due to linkage, neutral alleles in the surrounding region hitchhike with the selected allele. Maynard Smith and Haigh (31) described this process of genetic hitchhiking and the so-called selective sweep. More recent studies showed that genetic hitchhiking generates distinct polymorphism signatures on the genome, such as (i) a reduction of the polymorphism level and an excess of low- and high-frequency derived variants (32), (ii) spatial patterns of linkage disequilibrium (33) and (iii) increased genetic differentiation among populations (34). Taking advantage of these three theoretical expectations, several methods to detect positive selection have been developed in the last two decades. The variety of methods reflects the fact that no single statistic is enough to describe selection under various demographic models and modes of selection (22).

Here we implemented a large number of statistical tests (Table 1) in order to allow for a more comprehensive analysis of natural selection, especially positive selection. In brief, we have assigned the statistical tests to different method families (Table 1). Within the first family, which is based on the allele frequency spectrum, Tajima's D (3) is a classical neutrality test that compares estimates of the number of segregating sites and the mean pair-wise difference between sequences. CLR is a multi-locus composite likelihood ratio test (4,35). Fay and Wu's H (5) uses another facet of the site-frequency spectrum by comparing the number of derived segregating sites at high frequencies to the number of variants at intermediate frequencies. Fu and Li's F* compares the number of singletons to the mean pair-wise difference between sequences, and Fu and Li's D* compares it to the total number of nucleotide variants in a genomic region (6). R2 (7) is a statistical test for detecting population growth based on the difference between the number of singletons per sequence and the average number of nucleotide differences.
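To make the comparison behind Tajima's D concrete, the following is a minimal sketch of the standard formula from Tajima (3) for a single window. It is not the browser's implementation (which is described in the Supplementary Material); the function name `tajimas_d` and the input layout (one string of '0' for ancestral and '1' for derived alleles per haplotype) are assumptions made for illustration.

```python
from math import sqrt

def tajimas_d(haplotypes):
    """Tajima's D for one window.

    `haplotypes` is a list of equal-length strings of '0'/'1'
    (hypothetical input layout: one string per phased chromosome).
    Returns None when the statistic is undefined.
    """
    n = len(haplotypes)
    if n < 4:  # the variance term is zero for n = 2 and n = 3
        return None
    n_sites = len(haplotypes[0])
    # Number of '1' alleles at each site
    counts = [sum(h[j] == "1" for h in haplotypes) for j in range(n_sites)]
    seg = [k for k in counts if 0 < k < n]  # segregating sites only
    S = len(seg)
    if S == 0:
        return None
    # Mean number of pair-wise differences between sequences
    pi = sum(k * (n - k) for k in seg) / (n * (n - 1) / 2)
    # Constants from Tajima (1989)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference of the two diversity estimates, scaled by its standard deviation
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
```

An excess of rare variants (as after a sweep) drives the value below zero, whereas an excess of intermediate-frequency variants (as under balancing selection) drives it above zero.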

Among the linkage disequilibrium structure methods, XP-EHH (8) is a cross-population test based on extended haplotype homozygosity (EHH). ΔiHH considers the difference between the integrated haplotype homozygosity scores for each allele in a single population, while iHS (9) is defined as their log ratio. EHH average and EHH maximum (36), modified from (10), are based on the extended haplotype homozygosity. Wall's B (11) counts the number of pairs of adjacent segregating sites that are congruent (i.e. the subset of the data consisting of the two sites contains only two different haplotypes), while Wall's Q (12) adds the number of partitions (two disjoint subsets whose union is the set of individuals in the sample) induced by congruent pairs to Wall's B. Fu's F (13) takes into account the haplotype diversity in the sample. Dh (14) is a summary statistic based on the number of different haplotypes in the sample.
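As an illustration of the quantity underlying XP-EHH, ΔiHH and iHS, the sketch below computes EHH in its usual textbook form: the probability that two randomly chosen chromosomes carrying the core allele are identical over the interval from the core SNP to a given site. The function name `ehh` and the string-based haplotype layout are assumptions for this example, and the modifications used in the browser are not reproduced here.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity between a core SNP and `end_index`.

    `haplotypes` is a list of equal-length allele strings (hypothetical layout).
    Only chromosomes carrying `core_allele` at `core_index` are considered.
    Returns None if fewer than two chromosomes carry the core allele.
    """
    lo, hi = sorted((core_index, end_index))
    carriers = [h[lo:hi + 1] for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return None
    # Count pairs of carriers whose extended haplotypes are identical
    identical_pairs = sum(comb(c, 2) for c in Counter(carriers).values())
    return identical_pairs / comb(n, 2)
```

EHH starts at 1 at the core SNP and decays with distance; integrating it over distance gives iHH, and the log ratio of the iHH values of the two alleles is the basis of iHS as described above.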

The third family of methods is based on population differentiation. FST (37), calculated following the diploid method in Weir 1996 (p. 178), and ΔDAF (18) are estimates of population differentiation based on derived allele frequencies. XP-CLR (19) is a multi-locus allele-frequency-differentiation statistic between two populations.
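A SNP-specific difference in derived allele frequency can be sketched as below. This is only an illustration under stated assumptions: the helper names are hypothetical, the sign convention (first population minus second) is assumed, and the exact definitions of the standard and absolute ΔDAF variants used in the browser should be taken from Hofer et al. (18) and the Supplementary Material.

```python
VALID_BASES = {"A", "C", "G", "T"}

def derived_allele_frequency(alleles, ancestral):
    """Fraction of called alleles that differ from the ancestral state.

    `alleles` is a list of single-base strings, one per chromosome.
    Returns None if the ancestral state is unknown or nothing was called.
    """
    if ancestral not in VALID_BASES:
        return None
    called = [a for a in alleles if a in VALID_BASES]
    if not called:
        return None
    return sum(a != ancestral for a in called) / len(called)

def delta_daf(pop_a, pop_b, ancestral):
    """Return (signed, absolute) DAF difference between two populations."""
    daf_a = derived_allele_frequency(pop_a, ancestral)
    daf_b = derived_allele_frequency(pop_b, ancestral)
    if daf_a is None or daf_b is None:
        return None
    diff = daf_a - daf_b  # assumed convention: first population minus second
    return diff, abs(diff)
```

The example also shows why the ancestral state matters: without it, only the minor allele frequency (MAF) can be reported for a site.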

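Table 1. List of available summary statistics

Method family | Method | Reference | Window size | Rank scores tail
Allele frequency spectrum | Tajima's D | Tajima (3) | 30 kb | Lower
Allele frequency spectrum | CLR | Nielsen et al. (4) | Variable size | Upper
Allele frequency spectrum | Fay and Wu's H | Fay and Wu (5) | 30 kb | Lower
Allele frequency spectrum | Fu and Li's F* | Fu and Li (6) | 30 kb | Lower
Allele frequency spectrum | Fu and Li's D* | Fu and Li (6) | 30 kb | Lower
Allele frequency spectrum | R2 | Ramos-Onsins and Rozas (7) | 30 kb | Lower
Linkage disequilibrium structure | XP-EHH | modified from Sabeti et al. (8) | SNP-specific | Upper
Linkage disequilibrium structure | ΔiHH | modified from Voight et al. (9) | SNP-specific | Upper
Linkage disequilibrium structure | iHS | modified from Voight et al. (9) | SNP-specific | Upper
Linkage disequilibrium structure | EHH_average | modified from Sabeti et al. (10) | 30 kb | Upper
Linkage disequilibrium structure | EHH_max | modified from Sabeti et al. (10) | 30 kb | Upper
Linkage disequilibrium structure | Wall's B | Wall (11) | 30 kb | Upper
Linkage disequilibrium structure | Wall's Q | Wall (12) | 30 kb | Upper
Linkage disequilibrium structure | Fu's F | Fu (13) | 30 kb | Lower
Linkage disequilibrium structure | Dh | Nei (14) | 30 kb | Upper
Linkage disequilibrium structure | Za | Rozas et al. (15) | 30 kb | Upper
Linkage disequilibrium structure | ZnS | Kelly (16) | 30 kb | Upper
Linkage disequilibrium structure | ZZ | Rozas et al. (15) | 30 kb | Upper
Population differentiation | Fst (global and pairwise) | Weir and Cockerham (17) | SNP-specific | Upper
Population differentiation | ΔDAF (standard and absolute) | Hofer et al. (18) | SNP-specific | Upper
Population differentiation | XP-CLR | Chen et al. (19) | 0.1 cM (maximum window) | Upper
Descriptive statistics | Segregating sites | - | 30 kb | NA
Descriptive statistics | Singletons | - | 30 kb | NA
Descriptive statistics | pi (nucleotide diversity) | Nei and Li (20) | 30 kb | NA
Descriptive statistics | DAF (derived allele frequency) | - | SNP-specific | NA
Descriptive statistics | MAF (minor allele frequency) | - | SNP-specific | NA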


Additional statistics, such as the number of segregating sites per 30-kb window, the nucleotide diversity and others (Table 1), are listed as descriptive statistics. A thorough description of the tests is given in the original literature (see Table 1) and in several excellent reviews on the topic (38,39).

COMPUTATIONAL FRAMEWORK AND DESCRIPTION OF 1000 GENOMES SOURCE DATA

A framework to calculate diverse summary statistics (Table 1) from 1000 genomes data was developed (Figure 1). A detailed description of how the statistics were implemented is given in the Supplementary Material. A genome-wide overview of the results stored in the database for selected summary statistics is given in Supplementary Table S1. As described in the 1000 genomes Phase 1 paper (1), the quality of the 1000 genomes low coverage data has improved considerably over the pilot phase (2), but a number of limitations need to be kept in mind for population genomic analysis: (i) singletons and other rare variants are still underrepresented, (ii) the accessibility of the genome with the short-read sequencing technologies used is limited (94%) and (iii) the reported phasing switch error every 250 kb [median, Supplementary Figure S5 in (1)] likely leads to an underestimate of the length of the long shared haplotypes expected to occur around recent selective sweeps. Despite these drawbacks, which are mainly due to the nature of the low coverage approach, the short-read technology and differences in read depth (40), this dataset has important advantages over genotyping data, most importantly (i) a higher SNP density, (ii) the overcoming of ascertainment bias and (iii) a larger number of individuals per population when compared to previous datasets (HapMap II and HGDP).

We used phased data from the CEU, the CHB and the YRI populations from the integrated Phase 1 variant set (April 2012) with 97, 85 and 88 individuals, respectively. From the input VCF (variant call format) file, we extracted exclusively the low-coverage VQSR SNP calls in order to avoid any bias that might result from differences between low-coverage calls and high-coverage exome SNP calls. Indels were not used. Ancestral states in this data set were identified using a 4-way alignment of human, chimp, orangutan and rhesus macaque provided by the 1000 genomes consortium (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/supporting/ancestral_alignments).
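The extraction of low-coverage SNP calls described above could look roughly like the following sketch. It assumes that the INFO column of the input VCF carries a `VT` key for the variant type and a `SNPSOURCE` key listing the call sets (for example `LOWCOV` or `EXOME`); these key names and values are assumptions to be checked against the header of the actual file, and whether sites called in both sets should be kept is a choice the text does not specify. The function names are hypothetical.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'AA=C;VT=SNP' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info

def is_low_coverage_snp(vcf_line):
    """True if a VCF data line is a SNP with a low-coverage call.

    Assumes INFO keys VT and SNPSOURCE (to be verified in the VCF header).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    info = parse_info(fields[7])  # INFO is the eighth VCF column
    sources = info.get("SNPSOURCE", "").split(",")
    return info.get("VT") == "SNP" and "LOWCOV" in sources

def filter_vcf_lines(lines):
    """Yield header lines unchanged and only the matching data lines."""
    for line in lines:
        if line.startswith("#") or is_low_coverage_snp(line):
            yield line
```

Because `filter_vcf_lines` is a generator, it can be applied to an open (possibly gzip-decompressed) file handle without loading a whole chromosome into memory.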

AVAILABILITY OF DATABASE

All data are available via our entry page, http://hsb.upf.edu. A search mask gives the user easy access to the results for a specific gene or a genomic region of choice. The 'submit' button leads the user to a UCSC-style genome browser (http://pgb.ibe.upf.edu), which is a custom installation of the UCSC Genome Browser (41,42). This installation allows for a visual inspection of the data and for an integration of our data with many other available datasets. The raw scores of the tracks can be conveniently downloaded using the UCSC Table function (43), which is integrated with the Galaxy platform (galaxyproject.org). Using the 'configure' function on the browser page, the tracks can be further customized, and using 'right click', the visualized genomic regions can be downloaded as a picture in PNG format.

For every statistical test, we provide two tracks, one for the raw scores and one for the ranked scores. The purpose of the rank score tracks is to provide a comparison to the rest of the genome. Conveniently, the rank scores are presented in such a way that they show a peak (instead of a valley) in regions under positive selection. They are calculated using an outlier approach (22,44) by sorting all the scores genome-wide and determining the −log10 of the rank divided by the number of values in the distribution, taking the upper tail for most of the tests or the lower tail for Tajima's D, Fay and Wu's H, Fu and Li's F* and D*, R2 and Fu's F (see Table 1 and a more detailed description on the entry page).

The main purpose of the entry page is to provide a channel of communication with users, following the guidelines in (45). It serves as a platform for updates, questions and feedback (46). Therefore, the page also provides documentation on the tracks and on the tests implemented, as well as a FAQ and a feedback section.
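The rank score transformation described in this section can be sketched as follows. The sketch assumes that the most extreme score in the chosen tail receives rank 1, that the negative logarithm is taken so that extreme values form peaks, and that ties are broken by input order; the function name `rank_scores` and these details are illustrative assumptions, not the documented implementation.

```python
import math

def rank_scores(scores, tail="upper"):
    """Empirical rank scores: -log10(rank / N) for every input score.

    tail="upper": the largest raw score gets rank 1 (most tests).
    tail="lower": the smallest raw score gets rank 1 (e.g. Tajima's D).
    """
    if tail not in ("upper", "lower"):
        raise ValueError("tail must be 'upper' or 'lower'")
    n = len(scores)
    # Indices sorted from the most extreme to the least extreme score
    order = sorted(range(n), key=lambda i: scores[i], reverse=(tail == "upper"))
    ranked = [0.0] * n
    for rank, i in enumerate(order, start=1):
        ranked[i] = -math.log10(rank / n)
    return ranked
```

With this definition, a window among the most extreme 1% of the genome-wide distribution gets a rank score of at least 2, regardless of the scale of the underlying statistic, which is what makes tracks of different tests visually comparable.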

Figure 1. Schematic workflow developed in order to calculate diverse genome-wide summary statistics informative for the action of selection and to build a database in order to share and visualize the results.


Figure 2. Examples of genomic regions under selection in the 1000 genomes selection browser. Tracks of statistics from different populations are visualized in colour (CEU in green, CHB in red and YRI in blue). Additional examples are given at http://hsb.upf.edu. (A) The p- and q-arms of chromosome 2 (−log10 of empirically ranked scores). Recurrent peaks at around 72.5 Mb (left green arrow) and 109.5 Mb (right green arrow) indicate the loci CYP26B1/EXOC6B and EDAR, respectively. (B) Signature of positive selection around SLC45A2, another established skin colour gene, in the CEU population (0.5-Mb window; −log10 of empirically ranked scores). (C) Widespread balancing selection in the HLA region, indicated by strongly positive scores for Tajima's D in all three analysed human populations (0.5-Mb window).


EXAMPLE APPLICATIONS

First, we exemplify the use of the database by extracting results for a number of established loci under selection: EDAR (47), LCT (46), SLC45A2 (48), CD36 (49), HERC2 (50), SLC24A5 (51), CD5 (52) and APOL1 (53). A locus-specific summary of statistical tests is given in Supplementary Table S2. Interestingly, for any given locus, only a subset of statistical tests shows an extreme outlier score. This is consistent with differences in the architecture of selective sweeps. iHS scores near certain very pronounced selective sweeps (e.g. LCT and SLC24A5) failed to compute due to inherent properties of the statistic, because either (i) the selected haplotype was near fixation or (ii) the EHH did not drop below the defined threshold in a given window. Examples for both positive (SLC45A2) and balancing (HLA region) selection are visualized in Figure 2. As expected, Tajima's D scores around HLA (54) as well as the ABO locus (55) (data not shown) were markedly elevated in all three analysed populations, a pattern which is compatible with the action of balancing selection.

COMPARISON TO OTHER WEB RESOURCES

For positive selection based on between-species comparisons, the Selectome database [http://bioinfo.unil.ch/selectome (56)] presents results based on the dN/dS method using a branch-site specific likelihood test. For recent natural selection within modern humans, a number of web resources are available. For previous datasets, e.g. the HapMap 2 and HGDP projects, several positive selection statistics are available in the form of the Haplotter tool [http://haplotter.uchicago.edu (24)] and in the form of the HGDP selection browser [http://hgdp.uchicago.edu (57)]. For the 1000 genomes project data, the online tool ENGINES [http://spsmart.cesga.es (58)] is useful for the analysis of allele frequencies, and a recent study presented a method to calculate corrected summary statistics from low coverage sequencing data (40). dbPSHP (http://jjwanglab.org/dbpshp) offers a large number of statistical tests in a SNP-specific manner for the HapMap 3 and 1000 genomes datasets. Complementary to these databases, our database gives a large number of region- and SNP-specific scores (depending on the test statistic) based on resequencing data (1000 genomes Phase 1), with a special focus on genome-wide significance (by the ranked scores) and the visualization of several statistics in parallel (Figure 2).

CONCLUSIONS

By applying a large number of summary statistics to data from the 1000 genomes project, we have built a timely and expandable resource for the population genomics research community. An associated user-friendly genome browser gives a visual impression of the genetic variation in a genomic region of interest and offers functionality for an array of downstream analyses. While this resource will not replace a thorough case-by-case analysis of selection, we expect that it will prove useful for the research community through the large number of test statistics and the fine-grained character of resequencing data.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors thankfully acknowledge contributions from Anna Ramírez-Soriano, Arcadi Navarro, Francesc Calafell, Elena Bosch, Chris Tyler-Smith, Gonçalo Abecasis, Roger Bartomeus Penalver and the 1000 Genomes Project (1000genomes.org). The authors also thank Txema Heredia and the National Institute of Bioinformatics (http://www.inab.org) for computational support.

FUNDING

Ministerio de Ciencia y Tecnología (Spain); Direcció General de Recerca, Generalitat de Catalunya (Grup de Recerca Consolidat 2009 SGR 1101); Subprogram BMC [BFU2010-19443, awarded to J.B.]; Post-doctoral scholarship from the Volkswagenstiftung [Az I85 198 to J.E.]; Spanish government [BFU-2008-01046, SAF2011-29239]; the Spanish government FPI scholarships [BES-2009-017731 and BES-2011-04502 to G.M.D. and M.P., respectively]; PhD fellowship from 'Acción Estratégica de Salud, en el marco del Plan Nacional de Investigación Científica, Desarrollo e Innovación Tecnológica 2008-2011' from Instituto de Salud Carlos III (to P.L.). Funding for open access charge: Prof. Jaume Bertranpetit.

Conflict of interest statement. None declared.

REFERENCES

1. The 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT and McVean GA (2012) An integrated map of genetic variation from 1092 human genomes. Nature, 491, 56–65.

2. The 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature, 467, 1061–1073.

3. Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics, 123, 585–595.

4. Nielsen R, Williamson S, Kim Y, Hubisz MJ, Clark AG and Bustamante C (2005) Genomic scans for selective sweeps using SNP data. Genome Res., 15, 1566–1575.

5. Fay JC and Wu CI (2000) Hitchhiking under positive Darwinian selection. Genetics, 155, 1405–1413.

6. Fu YX and Li WH (1993) Statistical tests of neutrality of mutations. Genetics, 133, 693–709.

7. Ramos-Onsins SE and Rozas J (2002) Statistical properties of new neutrality tests against population growth. Mol. Biol. Evol., 19, 2092–2100.

8. Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, Cotsapas C, Xie X, Byrne EH, McCarroll SA, Gaudet R et al. (2007) Genome-wide detection and characterization of positive selection in human populations. Nature, 449, 913–918.


9. Voight BF, Kudaravalli S, Wen X and Pritchard JK (2006) A map of recent positive selection in the human genome. PLoS Biol., 4, e72.

10. Sabeti PC, Reich DE, Higgins JM, Levine HZP, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald JG et al. (2002) Detecting recent positive selection in the human genome from haplotype structure. Nature, 419, 832–837.

11. Wall JD (1999) Recombination and the power of statistical tests of neutrality. Genet. Res., 74, 65–79.

12. Wall JD (2000) A comparison of estimators of the population recombination rate. Mol. Biol. Evol., 17, 156–163.

13. Fu YX (1997) Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics, 147, 915–925.

14. Nei M (1987) Molecular Evolutionary Genetics. Columbia University Press, New York, NY.

15. Rozas J, Gullaud M, Blandin G and Aguade M (2001) DNA variation at the rp49 gene region of Drosophila simulans: evolutionary inferences from an unusual haplotype structure. Genetics, 158, 1147–1155.

16. Kelly JK (1997) A test of neutrality based on interlocus associations. Genetics, 146, 1197–1206.

17. Weir BS and Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358–1370.

18. Hofer T, Ray N, Wegmann D and Excoffier L (2009) Large allele frequency differences between human continental groups are more likely to have occurred by drift during range expansions than by selection. Ann. Hum. Genet., 73, 95–108.

19. Chen H, Patterson N and Reich D (2010) Population differentiation as a test for selective sweeps. Genome Res., 20, 393–402.

20. Nei M and Li WH (1979) Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences, 76, 5269–5273.

21. Barton NH (1998) The effect of hitch-hiking on neutral genealogies. Genet. Res., 72, 123–133.

22. Akey JM (2009) Constructing genomic maps of positive selection in humans: where do we go from here? Genome Res., 19, 711–722.

23. Pavlidis P, Jensen JD, Stephan W and Stamatakis A (2012) A critical assessment of storytelling: gene ontology categories and the importance of validating genomic scans. Mol. Biol. Evol., 29, 3237–3248.

24. Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, Shamovsky O, Palma A, Mikkelsen TS, Altshuler D and Lander ES (2006) Positive natural selection in the human lineage. Science, 312, 1614–1620.

25. Barrett RDH and Hoekstra HE (2011) Molecular spandrels: tests of adaptation at the genetic level. Nat. Rev. Genet., 12, 767–780.

26. Dall'Olio GM, Laayouni H, Luisi P, Sikora M, Montanucci L and Bertranpetit J (2012) Distribution of events of positive selection and population differentiation in a metabolic pathway: the case of asparagine N-glycosylation. BMC Evol. Biol., 12, 98.

27. Luisi P, Alvarez-Ponce D, Dall'Olio GM, Sikora M, Bertranpetit J and Laayouni H (2012) Network-level and population genetics analysis of the insulin/TOR signal transduction pathway across human populations. Mol. Biol. Evol., 29, 1–40.

28. Vitti JJ, Cho MK, Tishkoff SA and Sabeti PC (2012) Human evolutionary genomics: ethical and interpretive issues. Trends Genet., 28, 137–145.

29. Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, Sella G and Przeworski M (2011) Classic selective sweeps were rare in recent human evolution. Science, 331, 920–924.

30. Grossman SR, Andersen KG, Shlyakhter I, Tabrizi S, Winnicki S, Yen A, Park DJ, Griesemer D, Karlsson EK, Wong SH et al. (2013) Identifying recent adaptations in large-scale genomic data. Cell, 152, 703–713.

31. Smith J and Haigh J (1974) The hitch-hiking effect of a favourable gene. Genet. Res., 23, 23–35.

32. Braverman JM, Hudson RR, Kaplan NL, Langley CH and Stephan W (1995) The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics, 140, 783–796.

33. Kim Y and Nielsen R (2004) Linkage disequilibrium as a signature of selective sweeps. Genetics, 167, 1513–1524.

34. Barreiro LB, Laval G, Quach H, Patin E and Quintana-Murci L (2008) Natural selection has driven population differentiation in modern humans. Nat. Genet., 40, 340–345.

35. Williamson SH, Hubisz MJ, Clark AG, Payseur BA, Bustamante CD and Nielsen R (2007) Localizing recent adaptive evolution in the human genome. PLoS Genet., 3, e90.

36. Ramírez-Soriano A, Ramos-Onsins SE, Rozas J, Calafell F and Navarro A (2008) Statistical power analysis of neutrality tests under demographic expansions, contractions and bottlenecks with recombination. Genetics, 179, 555–567.

37. Weir BS and Hill WG (2002) Estimating F-statistics. Annu. Rev. Genet., 36, 721–750.

38. Nielsen R (2004) Population genetic analysis of ascertained SNP data. Hum. Genomics, 1, 218–224.

39. Bamshad M and Wooding SP (2003) Signatures of natural selection in the human genome. Nat. Rev. Genet., 4, 99–111.

40. Korneliussen TS, Moltke I, Albrechtsen A and Nielsen R (2013) Calculation of Tajima's D and other neutrality test statistics from low depth next-generation sequencing data. BMC Bioinform., 14, 289.


Page 3: 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures … · 2017-04-13 · 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of

Additional statistics like segregating sites per 30-kb windowand the nucleotide diversity and others (Table 1) are listedas descriptive statistics A thorough description of the testsis given in the original literature (see Table 1) and in diverseexcellent reviews on the topic (3839)

COMPUTATIONAL FRAMEWORK ANDDESCRIPTION OF 1000 GENOMES SOURCE DATA

A framework to calculate diverse summary statistics(Table 1) from 1000 genomes data was developed(Figure 1) A detailed description of how the statisticswere implemented is given (Supplementary Material)A genome-wide overview of the results stored in thedatabase for selected summary statistics is given(Supplementary Table S1) As described in the 1000genomes Phase 1 paper (1) the quality of the 1000genomes low coverage data has improved considerablyover the pilot phase (2) but a number of limitationsneed to be kept in mind for population genomicanalysis (i) singletons and other rare variants are stillunderrepresented (ii) the accessibility of the genomewith the used short-read-sequencing technologies 94and (iii) the reported phasing switch error every 250 kb(median Supplementary Figure S5 in (1)) likely underesti-mates the length of long-shared haplotypes expected tooccur around recent selective sweeps Despite of thesedrawbacks which are mainly due to the nature of thelow coverage approach the short-read technology and dif-ferences in read depth (40) this dataset has importantadvantages over genotyping data most importantly (i) ahigher SNP density (ii) the overcoming of ascertainmentbias and (iii) a larger number of individuals per popula-tion when compared to previous datasets (HapMap IIand HGDP) We used phased data from the CEU theCHB and the YRI populations from the integratedPhase 1 variant set (April 2012) with 97 85 and 88 indi-viduals respectively From the input vcf (variant callformat) file we extracted exclusively the low-coverageVSQR SNP calls in order to avoid any bias that mightresult from differences between low-coverage calls andhigh-coverage exome SNP calls Indels were not usedAncestral states in this data set were identified using a4-way alignment of humans chimp orangutan andrhesus macaque provided by the 1000 genomes consor-tium (ftpftp1000genomesebiacukvol1ftpphase1analysis_resultssupportingancestral_alignments)

AVAILABILITY OF DATABASE

All data is available via our entry page httphsbupfeduA search mask gives the user easy access to the results fora specific gene or a genomic region of choice The lsquosubmitrsquobutton leads the user to a UCSC-style genome browser(httppgbibeupfedu) which is a custom installationof the UCSC Genome Browser (4142) This UCSCGenome Browser installment allows for a visual inspec-tion of the data and for an integration of our data withmany other available datasets The raw scores of thetracks can be conveniently downloaded using the UCSC

Table function (43) and is integrated with the Galaxyplatform (galaxyprojectorg) Using the lsquoconfigurersquofunction on the browser page the tracks can be furthercustomized and using lsquoright clickrsquo the visualized genomicregions can be downloaded as a picture in png formatFor every statistical test we provide two tracks one forthe raw scores and one for ranked scores The purpose ofthe rank score tracks is to provide a comparison to the restof the genome Conveniently the rank scores are pre-sented in such a way that they present a peak (instead ofa valley) in regions under positive selection They arecalculated using an outlier approach (2244) by sortingall the scores genome-wide and determining the log10of the rank divided by the number of values in the distri-bution taking the upper tail for most of the tests or thelower tail for Tajimarsquos D Fay and Wursquos H Fu and Lirsquos Fand D R2 and Fursquos F (see Table 1 and a more detaileddescription on the entry page) The main purpose of theentry page is to provide a channel of communication withusers following the guidelines in (45) It serves as aplatform for updates questions and feedback (46)Therefore the page also provides documentation on thetracks and on the tests implemented as well as a FAQand a feedback section

Figure 1 Schematic workflow developed in order to calculate diversegenome-wide summary statistics informative for the action of selectionand to build a database in order to share and visualize the results

Nucleic Acids Research 2014 Vol 42 Database issue D905

Figure 2 Examples of genomic regions under selection in the 1000 genomes selection browser Tracks of statistics from different populations arevisualized in colour (CEU in green CHB in red and YRI in blue) Additional examples are given at httphsbupfedu (A) The p- and q-arms ofchromosome 2 (log10 of empirically ranked scores) Recurrent peaks at around 725Mb (left green arrow) and 1095Mb (right green arrow)indicate the loci CYP26B1EXOC6B and EDAR respectively (B) Signature of positive selection around SLC45A2 another established skin colourgene in the CEU population (05-Mb window log10 of empirically ranked scores) (C) Widespread balancing selection in the HLA region indicatedby strongly positive scores for Tajimarsquos D in all three analysed human populations (05-Mb window)

D906 Nucleic Acids Research 2014 Vol 42 Database issue

EXAMPLE APPLICATIONS

First we exemplify the use of the database by extract-ing results for a number of established loci under selectionEDAR (47) LCT (46) SLC45A2 (48) CD36 (49) HERC2(50) SLC24A5 (51) CD5 (52) and APOL1 (53) A loci-specific summary of statistical tests is given(Supplementary Table S2) Interestingly for any givenlocus only a subset of statistical tests shows an extremeoutlier score This is consistent with differences in thearchitecture of selective sweeps iHS scores near tocertain very pronounced selective sweeps (eg LCT andSLC24A5) failed to compute due to inherent propertiesof the statistics because either (i) the selected haplotypewas near fixation or (ii) the EHH did not drop below thedefined threshold in a given window Examples for bothpositive (SLC45A2) and balancing (HLA region) selectionare visualized in Figure 2 As expected Tajimarsquos D scoresaround HLA (54) as well as the ABO locus (55) (data notshown) were pronouncedly elevated in all three analyzedpopulations a pattern which is compatible with the actionof balancing selection

COMPARISON TO OTHER WEB RESOURCES

As for positive selection based on between-species com-parisons the Selectome database (httpbioinfounilchselectome (56)) presents results based on the dNdSmethod using a branch-site specific likelihood test Asfor recent natural selection within modern humans anumber of web resources are available For previousdatasets eg the HapMap 2 and HGDP projectsseveral positive selection statistics are available in formof the haplotter tool (httphaplotteruchicagoedu(24)) and in form of the HGDP selection browser(httphgdpuchicagoedu (57)) For the 1000 genomesproject data the online tool ENGINES (httpspsmartcesgaes (58)) is useful for the analysis of allele frequenciesand a recent study presented a method to calculate cor-rected summary statistics from low coverage sequencingdata (40) dbPSHP (httpjjwanglaborgdbpshp) offersa large number of statistical tests in a SNP-specificmanner for HapMap 3 and 1000 genomes datasetsComplementary to these databases our database gives alarge number of region- and SNP-specific scores (depend-ing on the test statistic) based on resequencing data (1000genomes Phase 1) with a special focus on genome-widesignificance (by the ranked scores) and the visualization ofseveral statistics in parallel (Figure 2)

CONCLUSIONS

By applying a large number of summary statistics to data from the 1000 Genomes Project, we have built a timely and expandable resource for the population genomics research community. An associated user-friendly genome browser gives a visual impression of the genetic variation in a genomic region of interest and offers functionality for an array of downstream analyses. While this resource will not replace a thorough case-by-case analysis of selection, we expect that it will prove useful for the research community through the large number of test statistics and the fine-grained character of resequencing data.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors thankfully acknowledge contributions from Anna Ramírez-Soriano, Arcadi Navarro, Francesc Calafell, Elena Bosch, Chris Tyler-Smith, Gonçalo Abecasis, Roger Bartomeus Peñalver and the 1000 Genomes Project (1000genomes.org). The authors also thank Txema Heredia and the National Institute of Bioinformatics (http://www.inab.org) for computational support.

FUNDING

Ministerio de Ciencia y Tecnología (Spain); Direcció General de Recerca, Generalitat de Catalunya (Grup de Recerca Consolidat 2009 SGR 1101); Subprogram BMC [BFU2010-19443 awarded to J.B.]; Post-doctoral scholarship from the Volkswagenstiftung [Az. I/85 198 to J.E.]; Spanish government [BFU-2008-01046, SAF2011-29239]; the Spanish government FPI scholarships [BES-2009-017731 and BES-2011-04502 to G.M.D. and M.P., respectively]; PhD fellowship from 'Acción Estratégica de Salud, en el marco del Plan Nacional de Investigación Científica, Desarrollo e Innovación Tecnológica 2008-2011' from Instituto de Salud Carlos III (to P.L.). Funding for open access charge: Prof. Jaume Bertranpetit.

Conflict of interest statement. None declared.

REFERENCES

1. The 1000 Genomes Project Consortium, Abecasis,G.R., Auton,A., Brooks,L.D., DePristo,M.A., Durbin,R.M., Handsaker,R.E., Kang,H.M., Marth,G.T. and McVean,G.A. (2012) An integrated map of genetic variation from 1,092 human genomes. Nature, 491, 56–65.

2. The 1000 Genomes Project Consortium. (2010) A map of human genome variation from population-scale sequencing. Nature, 467, 1061–1073.

3. Tajima,F. (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics, 123, 585–595.

4. Nielsen,R., Williamson,S., Kim,Y., Hubisz,M.J., Clark,A.G. and Bustamante,C. (2005) Genomic scans for selective sweeps using SNP data. Genome Res., 15, 1566–1575.

5. Fay,J.C. and Wu,C.I. (2000) Hitchhiking under positive Darwinian selection. Genetics, 155, 1405–1413.

6. Fu,Y.X. and Li,W.H. (1993) Statistical tests of neutrality of mutations. Genetics, 133, 693–709.

7. Ramos-Onsins,S.E. and Rozas,J. (2002) Statistical properties of new neutrality tests against population growth. Mol. Biol. Evol., 19, 2092–2100.

8. Sabeti,P.C., Varilly,P., Fry,B., Lohmueller,J., Hostetter,E., Cotsapas,C., Xie,X., Byrne,E.H., McCarroll,S.A., Gaudet,R. et al. (2007) Genome-wide detection and characterization of positive selection in human populations. Nature, 449, 913–918.

Nucleic Acids Research 2014 Vol 42 Database issue D907

9. Voight,B.F., Kudaravalli,S., Wen,X. and Pritchard,J.K. (2006) A map of recent positive selection in the human genome. PLoS Biol., 4, e72.

10. Sabeti,P.C., Reich,D.E., Higgins,J.M., Levine,H.Z.P., Richter,D.J., Schaffner,S.F., Gabriel,S.B., Platko,J.V., Patterson,N.J., McDonald,J.G. et al. (2002) Detecting recent positive selection in the human genome from haplotype structure. Nature, 419, 832–837.

11. Wall,J.D. (1999) Recombination and the power of statistical tests of neutrality. Genet. Res., 74, 65–79.

12. Wall,J.D. (2000) A comparison of estimators of the population recombination rate. Mol. Biol. Evol., 17, 156–163.

13. Fu,Y.X. (1997) Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics, 147, 915–925.

14. Nei,M. (1987) Molecular Evolutionary Genetics. Columbia University Press, New York, NY.

15. Rozas,J., Gullaud,M., Blandin,G. and Aguadé,M. (2001) DNA variation at the rp49 gene region of Drosophila simulans: evolutionary inferences from an unusual haplotype structure. Genetics, 158, 1147–1155.

16. Kelly,J.K. (1997) A test of neutrality based on interlocus associations. Genetics, 146, 1197–1206.

17. Weir,B.S. and Cockerham,C.C. (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358–1370.

18. Hofer,T., Ray,N., Wegmann,D. and Excoffier,L. (2009) Large allele frequency differences between human continental groups are more likely to have occurred by drift during range expansions than by selection. Ann. Hum. Genet., 73, 95–108.

19. Chen,H., Patterson,N. and Reich,D. (2010) Population differentiation as a test for selective sweeps. Genome Res., 20, 393–402.

20. Nei,M. and Li,W.H. (1979) Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences, 76, 5269–5273.

21. Barton,N.H. (1998) The effect of hitch-hiking on neutral genealogies. Genet. Res., 72, 123–133.

22. Akey,J.M. (2009) Constructing genomic maps of positive selection in humans: where do we go from here? Genome Res., 19, 711–722.

23. Pavlidis,P., Jensen,J.D., Stephan,W. and Stamatakis,A. (2012) A critical assessment of storytelling: gene ontology categories and the importance of validating genomic scans. Mol. Biol. Evol., 29, 3237–3248.

24. Sabeti,P.C., Schaffner,S.F., Fry,B., Lohmueller,J., Varilly,P., Shamovsky,O., Palma,A., Mikkelsen,T.S., Altshuler,D. and Lander,E.S. (2006) Positive natural selection in the human lineage. Science, 312, 1614–1620.

25. Barrett,R.D.H. and Hoekstra,H.E. (2011) Molecular spandrels: tests of adaptation at the genetic level. Nat. Rev. Genet., 12, 767–780.

26. Dall'olio,G.M., Laayouni,H., Luisi,P., Sikora,M., Montanucci,L. and Bertranpetit,J. (2012) Distribution of events of positive selection and population differentiation in a metabolic pathway: the case of asparagine N-glycosylation. BMC Evol. Biol., 12, 98.

27. Luisi,P., Alvarez-Ponce,D., Dall'olio,G.M., Sikora,M., Bertranpetit,J. and Laayouni,H. (2012) Network-level and population genetics analysis of the insulin/TOR signal transduction pathway across human populations. Mol. Biol. Evol., 29, 1–40.

28. Vitti,J.J., Cho,M.K., Tishkoff,S.A. and Sabeti,P.C. (2012) Human evolutionary genomics: ethical and interpretive issues. Trends Genet., 28, 137–145.

29. Hernandez,R.D., Kelley,J.L., Elyashiv,E., Melton,S.C., Auton,A., McVean,G., Sella,G. and Przeworski,M. (2011) Classic selective sweeps were rare in recent human evolution. Science, 331, 920–924.

30. Grossman,S.R., Andersen,K.G., Shlyakhter,I., Tabrizi,S., Winnicki,S., Yen,A., Park,D.J., Griesemer,D., Karlsson,E.K., Wong,S.H. et al. (2013) Identifying recent adaptations in large-scale genomic data. Cell, 152, 703–713.

31. Smith,J. and Haigh,J. (1974) The hitch-hiking effect of a favourable gene. Genet. Res., 23, 23–35.

32. Braverman,J.M., Hudson,R.R., Kaplan,N.L., Langley,C.H. and Stephan,W. (1995) The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics, 140, 783–796.

33. Kim,Y. and Nielsen,R. (2004) Linkage disequilibrium as a signature of selective sweeps. Genetics, 167, 1513–1524.

34. Barreiro,L.B., Laval,G., Quach,H., Patin,E. and Quintana-Murci,L. (2008) Natural selection has driven population differentiation in modern humans. Nat. Genet., 40, 340–345.

35. Williamson,S.H., Hubisz,M.J., Clark,A.G., Payseur,B.A., Bustamante,C.D. and Nielsen,R. (2007) Localizing recent adaptive evolution in the human genome. PLoS Genet., 3, e90.

36. Ramírez-Soriano,A., Ramos-Onsins,S.E., Rozas,J., Calafell,F. and Navarro,A. (2008) Statistical power analysis of neutrality tests under demographic expansions, contractions and bottlenecks with recombination. Genetics, 179, 555–567.

37. Weir,B.S. and Hill,W.G. (2002) Estimating F-statistics. Annu. Rev. Genet., 36, 721–750.

38. Nielsen,R. (2004) Population genetic analysis of ascertained SNP data. Hum. Genomics, 1, 218–224.

39. Bamshad,M. and Wooding,S.P. (2003) Signatures of natural selection in the human genome. Nat. Rev. Genet., 4, 99–111.

40. Korneliussen,T.S., Moltke,I., Albrechtsen,A. and Nielsen,R. (2013) Calculation of Tajima's D and other neutrality test statistics from low depth next-generation sequencing data. BMC Bioinform., 14, 289.

41. Kent,W.J., Sugnet,C.W., Furey,T.S., Roskin,K.M., Pringle,T.H., Zahler,A.M. and Haussler,A.D. (2002) The Human Genome Browser at UCSC. Genome Res., 12, 996–1006.

42. Dreszer,T.R., Karolchik,D., Zweig,A.S., Hinrichs,A.S., Raney,B.J., Kuhn,R.M., Meyer,L.R., Wong,M., Sloan,C.A., Rosenbloom,K.R. et al. (2012) The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res., 40, D918–D923.

43. Karolchik,D., Hinrichs,A.S., Furey,T.S., Roskin,K.M., Sugnet,C.W., Haussler,D. and Kent,W.J. (2004) The UCSC Table Browser data retrieval tool. Nucleic Acids Res., 32, D493–D496.

44. Kelley,J.L., Madeoy,J., Calhoun,J.C., Swanson,W. and Akey,J.M. (2006) Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res., 16, 980–989.

45. Dall'Olio,G.M., Marino,J., Schubert,M., Keys,K.L., Stefan,M.I., Gillespie,C.S., Poulain,P., Shameer,K., Sugar,R., Invergo,B.M. et al. (2011) Ten simple rules for getting help from online scientific communities. PLoS Comput. Biol., 7, e1002202.

46. Enattah,N.S., Sahi,T., Savilahti,E., Terwilliger,J.D., Peltonen,L. and Järvelä,I. (2002) Identification of a variant associated with adult-type hypolactasia. Nat. Genet., 30, 233–237.

47. Bryk,J., Hardouin,E., Pugach,I., Hughes,D., Strotmann,R., Stoneking,M. and Myles,S. (2008) Positive selection in East Asians for an EDAR allele that enhances NF-kappaB activation. PLoS One, 3, e2209.

48. Branicki,W., Brudnik,U., Draus-Barini,J., Kupiec,T. and Wojas-Pelc,A. (2008) Association of the SLC45A2 gene with physiological human hair colour variation. J. Hum. Genet., 53, 966–971.

49. Fry,A.E., Ghansa,A., Small,K.S., Palma,A., Auburn,S., Diakite,M., Green,A., Campino,S., Teo,Y.Y., Clark,T.G. et al. (2009) Positive selection of a CD36 nonsense variant in sub-Saharan Africa, but no association with severe malaria phenotypes. Hum. Mol. Genet., 18, 2683–2692.

50. Duffy,D.L., Montgomery,G.W., Chen,W., Zhao,Z.Z., Le,L., James,M.R., Hayward,N.K., Martin,N.G. and Sturm,R.A. (2007) A three-single-nucleotide polymorphism haplotype in intron 1 of OCA2 explains most human eye-color variation. Am. J. Hum. Genet., 80, 241–252.

51. Lamason,R.L., Mohideen,M.-A.P.K., Mest,J.R., Wong,A.C., Norton,H.L., Aros,M.C., Jurynec,M.J., Mao,X., Humphreville,V.R., Humbert,J.E. et al. (2005) SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science, 310, 1782–1786.


52. Carnero-Montoro,E., Bonet,L., Engelken,J., Bielig,T., Martínez-Florensa,M., Lozano,F. and Bosch,E. (2012) Evolutionary and functional evidence for positive selection at the human CD5 immune receptor gene. Mol. Biol. Evol., 29, 811–823.

53. Genovese,G., Friedman,D.J., Ross,M.D., Lecordier,L., Uzureau,P., Freedman,B.I., Bowden,D.W., Langefeld,C.D., Oleksyk,T.K., Uscinski Knob,A.L. et al. (2010) Association of trypanolytic ApoL1 variants with kidney disease in African Americans. Science, 329, 841–845.

54. Hedrick,P.W. and Thomson,G. (1983) Evidence for balancing selection at HLA. Genetics, 104, 449–456.

55. Calafell,F., Roubinet,F., Ramírez-Soriano,A., Saitou,N., Bertranpetit,J. and Blancher,A. (2008) Evolutionary dynamics of the human ABO gene. Hum. Genet., 124, 123–135.

56. Proux,E., Studer,R.A., Moretti,S. and Robinson-Rechavi,M. (2009) Selectome: a database of positive selection. Nucleic Acids Res., 37, D404–D407.

57. Pickrell,J.K., Coop,G., Novembre,J., Kudaravalli,S., Li,J.Z., Absher,D., Srinivasan,B.S., Barsh,G.S., Myers,R.M., Feldman,M.W. et al. (2009) Signals of recent positive selection in a worldwide sample of human populations. Genome Res., 19, 826–837.

58. Amigo,J., Salas,A. and Phillips,C. (2011) ENGINES: exploring single nucleotide variation in entire human genomes. BMC Bioinformatics, 12, 105.


Figure 2. Examples of genomic regions under selection in the 1000 Genomes Selection Browser. Tracks of statistics from different populations are visualized in colour (CEU in green, CHB in red and YRI in blue). Additional examples are given at http://hsb.upf.edu. (A) The p- and q-arms of chromosome 2 (log10 of empirically ranked scores). Recurrent peaks at around 72.5 Mb (left green arrow) and 109.5 Mb (right green arrow) indicate the loci CYP26B1/EXOC6B and EDAR, respectively. (B) Signature of positive selection around SLC45A2, another established skin colour gene, in the CEU population (0.5-Mb window, log10 of empirically ranked scores). (C) Widespread balancing selection in the HLA region, indicated by strongly positive scores for Tajima's D in all three analysed human populations (0.5-Mb window).

D906 Nucleic Acids Research 2014 Vol 42 Database issue

EXAMPLE APPLICATIONS

First we exemplify the use of the database by extract-ing results for a number of established loci under selectionEDAR (47) LCT (46) SLC45A2 (48) CD36 (49) HERC2(50) SLC24A5 (51) CD5 (52) and APOL1 (53) A loci-specific summary of statistical tests is given(Supplementary Table S2) Interestingly for any givenlocus only a subset of statistical tests shows an extremeoutlier score This is consistent with differences in thearchitecture of selective sweeps iHS scores near tocertain very pronounced selective sweeps (eg LCT andSLC24A5) failed to compute due to inherent propertiesof the statistics because either (i) the selected haplotypewas near fixation or (ii) the EHH did not drop below thedefined threshold in a given window Examples for bothpositive (SLC45A2) and balancing (HLA region) selectionare visualized in Figure 2 As expected Tajimarsquos D scoresaround HLA (54) as well as the ABO locus (55) (data notshown) were pronouncedly elevated in all three analyzedpopulations a pattern which is compatible with the actionof balancing selection

COMPARISON TO OTHER WEB RESOURCES

As for positive selection based on between-species com-parisons the Selectome database (httpbioinfounilchselectome (56)) presents results based on the dNdSmethod using a branch-site specific likelihood test Asfor recent natural selection within modern humans anumber of web resources are available For previousdatasets eg the HapMap 2 and HGDP projectsseveral positive selection statistics are available in formof the haplotter tool (httphaplotteruchicagoedu(24)) and in form of the HGDP selection browser(httphgdpuchicagoedu (57)) For the 1000 genomesproject data the online tool ENGINES (httpspsmartcesgaes (58)) is useful for the analysis of allele frequenciesand a recent study presented a method to calculate cor-rected summary statistics from low coverage sequencingdata (40) dbPSHP (httpjjwanglaborgdbpshp) offersa large number of statistical tests in a SNP-specificmanner for HapMap 3 and 1000 genomes datasetsComplementary to these databases our database gives alarge number of region- and SNP-specific scores (depend-ing on the test statistic) based on resequencing data (1000genomes Phase 1) with a special focus on genome-widesignificance (by the ranked scores) and the visualization ofseveral statistics in parallel (Figure 2)

CONCLUSIONS

By applying a large number of summary statistics to datafrom the 1000 genomes project we have built a timely andexpandable resource for the population genomics researchcommunity An associated user-friendly genome browsergives a visual impression of the genetic variation in agenomic region of interest and offers functionality for anarray of down-stream analyses While this resource willnot replace a thorough case by case analysis of selectionwe expect that it will prove useful for the research

community through the large number of test statisticsand the fine-grained character of resequencing data

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

The authors thankfully acknowledge contributions fromAnna Ramırez-Soriano Arcadi Navarro FrancescCalafell Elena Bosch Chris Tyler-Smith GoncaloAbecasis Roger Bartomeus Penalver and the 1000Genomes Project (1000genomesorg) The authors alsothank Txema Heredia and the National Institute ofBioinformatics (httpwwwinaborg) for computationalsupport

FUNDING

Ministerio de Ciencia y Tecnologıa (Spain) DireccioGeneral de Recerca Generalitat de Catalunya (Grup deRecerca Consolidat 2009 SGR 1101) Subprogram BMC[BFU2010-19443 awarded to JB] Post-doctoral scholar-ship from the Volkswagenstiftung [Az I85 198 to JE]Spanish government [BFU-2008-01046 SAF2011-29239]The Spanish government FPI scholarships [BES-2009-017731 and BES-2011-04502 to GMD and MPrespectively] PhD fellowship from lsquoAccion Estrategica deSalud en el marco del Plan Nacional de InvestigacionCientıfica Desarrollo e Innovacion Tecnologica 2008-2011rsquo from Instituto de Salud Carlos III (to PL)Funding for open access charge Prof Jaume Bertranpetit

Conflict of interest statement None declared

REFERENCES

1 The 1000 Genomes Project Consortium AbecasisGR AutonABrooksLD DePristoMA DurbinRM HandsakerREKangHM MarthGT and McVeanGA (2012) An integratedmap of genetic variation from 1092 human genomes Nature491 56ndash65

2 The 1000 Genomes Project Consortium (2010) A map of humangenome variation from population-scale sequencing Nature 4671061ndash1073

3 TajimaF (1989) Statistical method for testing the neutralmutation hypothesis by DNA polymorphism Genetics 123585ndash595

4 NielsenR WilliamsonS KimY HubiszMJ ClarkAG andBustamanteC (2005) Genomic scans for selective sweeps usingSNP data Genome Res 15 1566ndash1575

5 FayJC and WuCI (2000) Hitchhiking under positiveDarwinian selection Genetics 155 1405ndash1413

6 FuYX and LiWH (1993) Statistical tests of neutrality ofmutations Genetics 133 693ndash709

7 Ramos-OnsinsSE and RozasJ (2002) Statistical properties ofnew neutrality tests against population growth Mol Biol Evol19 2092ndash2100

8 SabetiPC VarillyP FryB LohmuellerJ HostetterECotsapasC XieX ByrneEH McCarrollSA GaudetR et al(2007) Genome-wide detection and characterization of positiveselection in human populations Nature 449 913ndash918

Nucleic Acids Research 2014 Vol 42 Database issue D907

9 VoightBF KudaravalliS WenX and PritchardJK (2006) Amap of recent positive selection in the human genome PLoSBiol 4 e72

10 SabetiPC ReichDE HigginsJM LevineHZP RichterDJSchaffnerSF GabrielSB PlatkoJV PattersonNJMcDonaldJG et al (2002) Detecting recent positive selection inthe human genome from haplotype structure Nature 419832ndash837

11 WallJD (1999) Recombination and the power of statistical testsof neutrality Genet Res 74 65ndash79

12 WallJD (2000) A comparison of estimators of the populationrecombination rate Mol Biol Evol 17 156ndash163

13 FuYX (1997) Statistical tests of neutrality of mutations againstpopulation growth hitchhiking and background selectionGenetics 147 915ndash925

14 NeiM (1987) Molecular Evolutionary Genetics ColumbiaUniversity Press New York NY

15 RozasJ GullaudM BlandinG and AguadeM (2001) DNAvariation at the rp49 gene region of Drosophila simulansevolutionary inferences from an unusual haplotype structureGenetics 158 1147ndash1155

16 KellyJK (1997) A test of neutrality based on interlocusassociations Genetics 146 1197ndash1206

17 WeirBS and CockerhamCC (1984) Estimating F-statistics forthe analysis of population structure Evolution 38 1358ndash1370

18 HoferT RayN WegmannD and ExcoffierL (2009) Largeallele frequency differences between human continental groups aremore likely to have occurred by drift during range expansionsthan by selection Ann Hum Genet 73 95ndash108

19 ChenH PattersonN and ReichD (2010) Populationdifferentiation as a test for selective sweeps Genome Res 20393ndash402

20 NeiM and LiWH (1979) Mathematical model for studyinggenetic variation in terms of restriction endonucleasesProceedings of the National Academy of Sciences 76 5269ndash5273

21 BartonNH (1998) The effect of hitch-hiking on neutralgenealogies Genet Res 72 123ndash133

22 AkeyJM (2009) Constructing genomic maps of positive selectionin humans where do we go from here Genome Res 19711ndash722

23 PavlidisP JensenJD StephanW and StamatakisA (2012) Acritical assessment of storytelling gene ontology categories andthe importance of validating genomic scans Mol Biol Evol 293237ndash3248

24 SabetiPC SchaffnerSF FryB LohmuellerJ VarillyPShamovskyO PalmaA MikkelsenTS AltshulerD andLanderES (2006) Positive natural selection in the humanlineage Science 312 1614ndash1620

25 BarrettRDH and HoekstraHE (2011) Molecular spandrelstests of adaptation at the genetic level Nat Rev Genet 12767ndash780

26 DallrsquoolioGM LaayouniH LuisiP SikoraM MontanucciLand BertranpetitJ (2012) Distribution of events of positiveselection and population differentiation in a metabolicpathway the case of asparagine N-glycosylation BMC EvolBiol 12 98

27 LuisiP Alvarez-PonceD DallrsquoolioGM SikoraMBertranpetitJ and LaayouniH (2012) Network-level andpopulation genetics analysis of the insulinTOR signaltransduction pathway across human populations Mol Biol Evol29 1ndash40

28 VittiJJ ChoMK TishkoffSA and SabetiPC (2012) Humanevolutionary genomics ethical and interpretive issues TrendsGenet 28 137ndash145

29 HernandezRD KelleyJL ElyashivE MeltonSC AutonAMcVeanG SellaG and PrzeworskiM (2011) Classic selectivesweeps were rare in recent human evolution Science 331920ndash924

30 GrossmanSR AndersenKG ShlyakhterI TabriziSWinnickiS YenA ParkDJ GriesemerD KarlssonEKWongSH et al (2013) Identifying recent adaptations in large-scale genomic data Cell 152 703ndash713

31 SmithJ and HaighJ (1974) The hitch-hiking effect of afavourable gene Genet Res 23 23ndash35

32 BravermanJM HudsonRR KaplanNL LangleyCH andStephanW (1995) The hitchhiking effect on the sitefrequency spectrum of DNA polymorphisms Genetics 140783ndash796

33 KimY and NielsenR (2004) Linkage disequilibrium as asignature of selective sweeps Genetics 167 1513ndash1524

34 BarreiroLB LavalG QuachH PatinE andQuintana-MurciL (2008) Natural selection has drivenpopulation differentiation in modern humans Nat Genet 40340ndash345

35 WilliamsonSH HubiszMJ ClarkAG PayseurBABustamanteCD and NielsenR (2007) Localizing recent adaptiveevolution in the human genome PLoS Genet 3 e90

36 Ramırez-SorianoA Ramos-OnsinsSE RozasJ CalafellF andNavarroA (2008) Statistical power analysis of neutrality testsunder demographic expansions contractions and bottlenecks withrecombination Genetics 179 555ndash567

37 WeirBS and HillWG (2002) Estimating F-statistics Annu RevGenet 36 721ndash750

38 NielsenR (2004) Population genetic analysis of ascertained SNPdata Hum Genomics 1 218ndash224

39 BamshadM and WoodingSP (2003) Signatures ofnatural selection in the human genome Nat Rev Genet 499ndash111

40 KorneliussenTS MoltkeI AlbrechtsenA and NielsenR(2013) Calculation of Tajimarsquos D and other neutrality teststatistics from low depth next-generation sequencing data BMCBioinform 14 289

41 KentWJ SugnetCW FureyTS RoskinKM PringleTHZahlerAM and HausslerAD (2002) The Human GenomeBrowser at UCSC Genome Res 12 996ndash1006

42 DreszerTR KarolchikD ZweigAS HinrichsASRaneyBJ KuhnRM MeyerLR WongM SloanCARosenbloomKR et al (2012) The UCSC Genome Browserdatabase extensions and updates 2011 Nucleic Acids Res 40D918ndashD923

43 KarolchikD HinrichsAS FureyTS RoskinKMSugnetCW HausslerD and KentWJ (2004) The UCSCTable Browser data retrieval tool Nucleic Acids Res 32D493ndashD496

44 KelleyJL MadeoyJ CalhounJC SwansonW andAkeyJM (2006) Genomic signatures of positive selection inhumans and the limits of outlier approaches Genome Res 16980ndash989

45 DallrsquoOlioGM MarinoJ SchubertM KeysKL StefanMIGillespieCS PoulainP ShameerK SugarR InvergoBMet al (2011) Ten simple rules for getting help from onlinescientific communities PLoS Comput Biol 7 e1002202

46 EnattahNS SahiT SavilahtiE TerwilligerJD PeltonenLand JarvelaI (2002) Identification of a variant associated withadult-type hypolactasia Nat Genet 30 233ndash237

47 BrykJ HardouinE PugachI HughesD StrotmannRStonekingM and MylesS (2008) Positive selection in EastAsians for an EDAR allele that enhances NF-kappaB activationPLoS One 3 e2209

48 BranickiW BrudnikU Draus-BariniJ KupiecT andWojas-PelcA (2008) Association of the SLC45A2 gene withphysiological human hair colour variation J Hum Genet 53966ndash971

49 FryAE GhansaA SmallKS PalmaA AuburnSDiakiteM GreenA CampinoS TeoYY ClarkTG et al(2009) Positive selection of a CD36 nonsense variant in sub-Saharan Africa but no association with severe malariaphenotypes Hum Mol Genet 18 2683ndash2692

50 DuffyDL MontgomeryGW ChenW ZhaoZZ LeLJamesMR HaywardNK MartinNG and SturmRA (2007)A three-single-nucleotide polymorphism haplotype in intron 1 ofOCA2 explains most human eye-color variation Am J HumGenet 80 241ndash252

51 LamasonRL MohideenM-APK MestJR WongACNortonHL ArosMC JurynecMJ MaoXHumphrevilleVR HumbertJE et al (2005) SLC24A5 aputative cation exchanger affects pigmentation in zebrafish andhumans Science 310 1782ndash1786

D908 Nucleic Acids Research 2014 Vol 42 Database issue

52 Carnero-MontoroE BonetL EngelkenJ BieligT Martınez-FlorensaM LozanoF and BoschE (2012) Evolutionary andfunctional evidence for positive selection at the human CD5immune receptor gene Mol Biol Evol 29 811ndash823

53 GenoveseG FriedmanDJ RossMD LecordierLUzureauP FreedmanBI BowdenDW LangefeldCDOleksykTK Uscinski KnobAL et al (2010) Association oftrypanolytic ApoL1 variants with kidney disease in AfricanAmericans Science 329 841ndash845

54 HedrickPW and ThomsonG (1983) Evidence for balancingselection at HLA Genetics 104 449ndash456

55 CalafellF RoubinetF Ramırez-SorianoA SaitouNBertranpetitJ and BlancherA (2008) Evolutionary dynamics ofthe human ABO gene Hum Genet 124 123ndash135

56 ProuxE StuderRA MorettiS and Robinson-RechaviM(2009) Selectome a database of positive selection Nucleic AcidsRes 37 D404ndashD407

57 PickrellJK CoopG NovembreJ KudaravalliS LiJZAbsherD SrinivasanBS BarshGS MyersRMFeldmanMW et al (2009) Signals of recent positive selection ina worldwide sample of human populations Genome Res 19826ndash837

58 AmigoJ SalasA and PhillipsC (2011) ENGINES exploringsingle nucleotide variation in entire human genomes BMCBioinformatics 12 105

Nucleic Acids Research 2014 Vol 42 Database issue D909

Page 5: 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures … · 2017-04-13 · 1000 Genomes Selection Browser 1.0: a genome browser dedicated to signatures of

EXAMPLE APPLICATIONS

First we exemplify the use of the database by extract-ing results for a number of established loci under selectionEDAR (47) LCT (46) SLC45A2 (48) CD36 (49) HERC2(50) SLC24A5 (51) CD5 (52) and APOL1 (53) A loci-specific summary of statistical tests is given(Supplementary Table S2) Interestingly for any givenlocus only a subset of statistical tests shows an extremeoutlier score This is consistent with differences in thearchitecture of selective sweeps iHS scores near tocertain very pronounced selective sweeps (eg LCT andSLC24A5) failed to compute due to inherent propertiesof the statistics because either (i) the selected haplotypewas near fixation or (ii) the EHH did not drop below thedefined threshold in a given window Examples for bothpositive (SLC45A2) and balancing (HLA region) selectionare visualized in Figure 2 As expected Tajimarsquos D scoresaround HLA (54) as well as the ABO locus (55) (data notshown) were pronouncedly elevated in all three analyzedpopulations a pattern which is compatible with the actionof balancing selection

COMPARISON TO OTHER WEB RESOURCES

As for positive selection based on between-species com-parisons the Selectome database (httpbioinfounilchselectome (56)) presents results based on the dNdSmethod using a branch-site specific likelihood test Asfor recent natural selection within modern humans anumber of web resources are available For previousdatasets eg the HapMap 2 and HGDP projectsseveral positive selection statistics are available in formof the haplotter tool (httphaplotteruchicagoedu(24)) and in form of the HGDP selection browser(httphgdpuchicagoedu (57)) For the 1000 genomesproject data the online tool ENGINES (httpspsmartcesgaes (58)) is useful for the analysis of allele frequenciesand a recent study presented a method to calculate cor-rected summary statistics from low coverage sequencingdata (40) dbPSHP (httpjjwanglaborgdbpshp) offersa large number of statistical tests in a SNP-specificmanner for HapMap 3 and 1000 genomes datasetsComplementary to these databases our database gives alarge number of region- and SNP-specific scores (depend-ing on the test statistic) based on resequencing data (1000genomes Phase 1) with a special focus on genome-widesignificance (by the ranked scores) and the visualization ofseveral statistics in parallel (Figure 2)

CONCLUSIONS

By applying a large number of summary statistics to datafrom the 1000 genomes project we have built a timely andexpandable resource for the population genomics researchcommunity An associated user-friendly genome browsergives a visual impression of the genetic variation in agenomic region of interest and offers functionality for anarray of down-stream analyses While this resource willnot replace a thorough case by case analysis of selectionwe expect that it will prove useful for the research

community through the large number of test statisticsand the fine-grained character of resequencing data

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

The authors thankfully acknowledge contributions fromAnna Ramırez-Soriano Arcadi Navarro FrancescCalafell Elena Bosch Chris Tyler-Smith GoncaloAbecasis Roger Bartomeus Penalver and the 1000Genomes Project (1000genomesorg) The authors alsothank Txema Heredia and the National Institute ofBioinformatics (httpwwwinaborg) for computationalsupport

FUNDING

Ministerio de Ciencia y Tecnologıa (Spain) DireccioGeneral de Recerca Generalitat de Catalunya (Grup deRecerca Consolidat 2009 SGR 1101) Subprogram BMC[BFU2010-19443 awarded to JB] Post-doctoral scholar-ship from the Volkswagenstiftung [Az I85 198 to JE]Spanish government [BFU-2008-01046 SAF2011-29239]The Spanish government FPI scholarships [BES-2009-017731 and BES-2011-04502 to GMD and MPrespectively] PhD fellowship from lsquoAccion Estrategica deSalud en el marco del Plan Nacional de InvestigacionCientıfica Desarrollo e Innovacion Tecnologica 2008-2011rsquo from Instituto de Salud Carlos III (to PL)Funding for open access charge Prof Jaume Bertranpetit

Conflict of interest statement None declared

REFERENCES

1 The 1000 Genomes Project Consortium AbecasisGR AutonABrooksLD DePristoMA DurbinRM HandsakerREKangHM MarthGT and McVeanGA (2012) An integratedmap of genetic variation from 1092 human genomes Nature491 56ndash65

2 The 1000 Genomes Project Consortium (2010) A map of humangenome variation from population-scale sequencing Nature 4671061ndash1073

3 TajimaF (1989) Statistical method for testing the neutralmutation hypothesis by DNA polymorphism Genetics 123585ndash595

4 NielsenR WilliamsonS KimY HubiszMJ ClarkAG andBustamanteC (2005) Genomic scans for selective sweeps usingSNP data Genome Res 15 1566ndash1575

5 FayJC and WuCI (2000) Hitchhiking under positiveDarwinian selection Genetics 155 1405ndash1413

6 FuYX and LiWH (1993) Statistical tests of neutrality ofmutations Genetics 133 693ndash709

7 Ramos-OnsinsSE and RozasJ (2002) Statistical properties ofnew neutrality tests against population growth Mol Biol Evol19 2092ndash2100

8 SabetiPC VarillyP FryB LohmuellerJ HostetterECotsapasC XieX ByrneEH McCarrollSA GaudetR et al(2007) Genome-wide detection and characterization of positiveselection in human populations Nature 449 913ndash918

Nucleic Acids Research 2014 Vol 42 Database issue D907

9 VoightBF KudaravalliS WenX and PritchardJK (2006) Amap of recent positive selection in the human genome PLoSBiol 4 e72

10 SabetiPC ReichDE HigginsJM LevineHZP RichterDJSchaffnerSF GabrielSB PlatkoJV PattersonNJMcDonaldJG et al (2002) Detecting recent positive selection inthe human genome from haplotype structure Nature 419832ndash837

11 WallJD (1999) Recombination and the power of statistical testsof neutrality Genet Res 74 65ndash79

12 WallJD (2000) A comparison of estimators of the populationrecombination rate Mol Biol Evol 17 156ndash163

13 FuYX (1997) Statistical tests of neutrality of mutations againstpopulation growth hitchhiking and background selectionGenetics 147 915ndash925

14 NeiM (1987) Molecular Evolutionary Genetics ColumbiaUniversity Press New York NY

15 RozasJ GullaudM BlandinG and AguadeM (2001) DNAvariation at the rp49 gene region of Drosophila simulansevolutionary inferences from an unusual haplotype structureGenetics 158 1147ndash1155

16. Kelly,J.K. (1997) A test of neutrality based on interlocus associations. Genetics, 146, 1197–1206.

17. Weir,B.S. and Cockerham,C.C. (1984) Estimating F-statistics for the analysis of population structure. Evolution, 38, 1358–1370.

18. Hofer,T., Ray,N., Wegmann,D. and Excoffier,L. (2009) Large allele frequency differences between human continental groups are more likely to have occurred by drift during range expansions than by selection. Ann. Hum. Genet., 73, 95–108.

19. Chen,H., Patterson,N. and Reich,D. (2010) Population differentiation as a test for selective sweeps. Genome Res., 20, 393–402.

20. Nei,M. and Li,W.H. (1979) Mathematical model for studying genetic variation in terms of restriction endonucleases. Proceedings of the National Academy of Sciences, 76, 5269–5273.

21. Barton,N.H. (1998) The effect of hitch-hiking on neutral genealogies. Genet. Res., 72, 123–133.

22. Akey,J.M. (2009) Constructing genomic maps of positive selection in humans: where do we go from here? Genome Res., 19, 711–722.

23. Pavlidis,P., Jensen,J.D., Stephan,W. and Stamatakis,A. (2012) A critical assessment of storytelling: gene ontology categories and the importance of validating genomic scans. Mol. Biol. Evol., 29, 3237–3248.

24. Sabeti,P.C., Schaffner,S.F., Fry,B., Lohmueller,J., Varilly,P., Shamovsky,O., Palma,A., Mikkelsen,T.S., Altshuler,D. and Lander,E.S. (2006) Positive natural selection in the human lineage. Science, 312, 1614–1620.

25. Barrett,R.D.H. and Hoekstra,H.E. (2011) Molecular spandrels: tests of adaptation at the genetic level. Nat. Rev. Genet., 12, 767–780.

26. Dall’Olio,G.M., Laayouni,H., Luisi,P., Sikora,M., Montanucci,L. and Bertranpetit,J. (2012) Distribution of events of positive selection and population differentiation in a metabolic pathway: the case of asparagine N-glycosylation. BMC Evol. Biol., 12, 98.

27. Luisi,P., Alvarez-Ponce,D., Dall’Olio,G.M., Sikora,M., Bertranpetit,J. and Laayouni,H. (2012) Network-level and population genetics analysis of the insulin/TOR signal transduction pathway across human populations. Mol. Biol. Evol., 29, 1–40.

28. Vitti,J.J., Cho,M.K., Tishkoff,S.A. and Sabeti,P.C. (2012) Human evolutionary genomics: ethical and interpretive issues. Trends Genet., 28, 137–145.

29. Hernandez,R.D., Kelley,J.L., Elyashiv,E., Melton,S.C., Auton,A., McVean,G., Sella,G. and Przeworski,M. (2011) Classic selective sweeps were rare in recent human evolution. Science, 331, 920–924.

30. Grossman,S.R., Andersen,K.G., Shlyakhter,I., Tabrizi,S., Winnicki,S., Yen,A., Park,D.J., Griesemer,D., Karlsson,E.K., Wong,S.H. et al. (2013) Identifying recent adaptations in large-scale genomic data. Cell, 152, 703–713.

31. Smith,J. and Haigh,J. (1974) The hitch-hiking effect of a favourable gene. Genet. Res., 23, 23–35.

32. Braverman,J.M., Hudson,R.R., Kaplan,N.L., Langley,C.H. and Stephan,W. (1995) The hitchhiking effect on the site frequency spectrum of DNA polymorphisms. Genetics, 140, 783–796.

33. Kim,Y. and Nielsen,R. (2004) Linkage disequilibrium as a signature of selective sweeps. Genetics, 167, 1513–1524.

34. Barreiro,L.B., Laval,G., Quach,H., Patin,E. and Quintana-Murci,L. (2008) Natural selection has driven population differentiation in modern humans. Nat. Genet., 40, 340–345.

35. Williamson,S.H., Hubisz,M.J., Clark,A.G., Payseur,B.A., Bustamante,C.D. and Nielsen,R. (2007) Localizing recent adaptive evolution in the human genome. PLoS Genet., 3, e90.

36. Ramírez-Soriano,A., Ramos-Onsins,S.E., Rozas,J., Calafell,F. and Navarro,A. (2008) Statistical power analysis of neutrality tests under demographic expansions, contractions and bottlenecks with recombination. Genetics, 179, 555–567.

37. Weir,B.S. and Hill,W.G. (2002) Estimating F-statistics. Annu. Rev. Genet., 36, 721–750.

38. Nielsen,R. (2004) Population genetic analysis of ascertained SNP data. Hum. Genomics, 1, 218–224.

39. Bamshad,M. and Wooding,S.P. (2003) Signatures of natural selection in the human genome. Nat. Rev. Genet., 4, 99–111.

40. Korneliussen,T.S., Moltke,I., Albrechtsen,A. and Nielsen,R. (2013) Calculation of Tajima’s D and other neutrality test statistics from low depth next-generation sequencing data. BMC Bioinform., 14, 289.

41. Kent,W.J., Sugnet,C.W., Furey,T.S., Roskin,K.M., Pringle,T.H., Zahler,A.M. and Haussler,A.D. (2002) The Human Genome Browser at UCSC. Genome Res., 12, 996–1006.

42. Dreszer,T.R., Karolchik,D., Zweig,A.S., Hinrichs,A.S., Raney,B.J., Kuhn,R.M., Meyer,L.R., Wong,M., Sloan,C.A., Rosenbloom,K.R. et al. (2012) The UCSC Genome Browser database: extensions and updates 2011. Nucleic Acids Res., 40, D918–D923.

43. Karolchik,D., Hinrichs,A.S., Furey,T.S., Roskin,K.M., Sugnet,C.W., Haussler,D. and Kent,W.J. (2004) The UCSC Table Browser data retrieval tool. Nucleic Acids Res., 32, D493–D496.

44. Kelley,J.L., Madeoy,J., Calhoun,J.C., Swanson,W. and Akey,J.M. (2006) Genomic signatures of positive selection in humans and the limits of outlier approaches. Genome Res., 16, 980–989.

45. Dall’Olio,G.M., Marino,J., Schubert,M., Keys,K.L., Stefan,M.I., Gillespie,C.S., Poulain,P., Shameer,K., Sugar,R., Invergo,B.M. et al. (2011) Ten simple rules for getting help from online scientific communities. PLoS Comput. Biol., 7, e1002202.

46. Enattah,N.S., Sahi,T., Savilahti,E., Terwilliger,J.D., Peltonen,L. and Jarvela,I. (2002) Identification of a variant associated with adult-type hypolactasia. Nat. Genet., 30, 233–237.

47. Bryk,J., Hardouin,E., Pugach,I., Hughes,D., Strotmann,R., Stoneking,M. and Myles,S. (2008) Positive selection in East Asians for an EDAR allele that enhances NF-kappaB activation. PLoS One, 3, e2209.

48. Branicki,W., Brudnik,U., Draus-Barini,J., Kupiec,T. and Wojas-Pelc,A. (2008) Association of the SLC45A2 gene with physiological human hair colour variation. J. Hum. Genet., 53, 966–971.

49. Fry,A.E., Ghansa,A., Small,K.S., Palma,A., Auburn,S., Diakite,M., Green,A., Campino,S., Teo,Y.Y., Clark,T.G. et al. (2009) Positive selection of a CD36 nonsense variant in sub-Saharan Africa, but no association with severe malaria phenotypes. Hum. Mol. Genet., 18, 2683–2692.

50. Duffy,D.L., Montgomery,G.W., Chen,W., Zhao,Z.Z., Le,L., James,M.R., Hayward,N.K., Martin,N.G. and Sturm,R.A. (2007) A three-single-nucleotide polymorphism haplotype in intron 1 of OCA2 explains most human eye-color variation. Am. J. Hum. Genet., 80, 241–252.

51. Lamason,R.L., Mohideen,M.-A.P.K., Mest,J.R., Wong,A.C., Norton,H.L., Aros,M.C., Jurynec,M.J., Mao,X., Humphreville,V.R., Humbert,J.E. et al. (2005) SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science, 310, 1782–1786.


52. Carnero-Montoro,E., Bonet,L., Engelken,J., Bielig,T., Martínez-Florensa,M., Lozano,F. and Bosch,E. (2012) Evolutionary and functional evidence for positive selection at the human CD5 immune receptor gene. Mol. Biol. Evol., 29, 811–823.

53. Genovese,G., Friedman,D.J., Ross,M.D., Lecordier,L., Uzureau,P., Freedman,B.I., Bowden,D.W., Langefeld,C.D., Oleksyk,T.K., Uscinski Knob,A.L. et al. (2010) Association of trypanolytic ApoL1 variants with kidney disease in African Americans. Science, 329, 841–845.

54. Hedrick,P.W. and Thomson,G. (1983) Evidence for balancing selection at HLA. Genetics, 104, 449–456.

55. Calafell,F., Roubinet,F., Ramírez-Soriano,A., Saitou,N., Bertranpetit,J. and Blancher,A. (2008) Evolutionary dynamics of the human ABO gene. Hum. Genet., 124, 123–135.

56. Proux,E., Studer,R.A., Moretti,S. and Robinson-Rechavi,M. (2009) Selectome: a database of positive selection. Nucleic Acids Res., 37, D404–D407.

57. Pickrell,J.K., Coop,G., Novembre,J., Kudaravalli,S., Li,J.Z., Absher,D., Srinivasan,B.S., Barsh,G.S., Myers,R.M., Feldman,M.W. et al. (2009) Signals of recent positive selection in a worldwide sample of human populations. Genome Res., 19, 826–837.

58. Amigo,J., Salas,A. and Phillips,C. (2011) ENGINES: exploring single nucleotide variation in entire human genomes. BMC Bioinformatics, 12, 105.
