ll l l 0.4 inarbitrarypopulationstructures 0

63
Relatedness and differentiation in arbitrary population structures Alejandro Ochoa, John D. Storey Lab, Princeton University DrAlexOchoa viiia.org/research/ [email protected] 1 / 35

Transcript of ll l l 0.4 inarbitrarypopulationstructures 0

Page 1: ll l l 0.4 inarbitrarypopulationstructures 0

●●

●●

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

● ●

● ● ●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●● ●

●●

● ●

0.2

0.3

0.4

Inbr

eedi

ng c

oeff.

Relatedness and differentiationin arbitrary population structures

Alejandro Ochoa, John D. Storey Lab, Princeton University7 DrAlexOchoa � viiia.org/research/ R [email protected]

1 / 35

Page 2: ll l l 0.4 inarbitrarypopulationstructures 0

My research areas / Contributions1. Stratified False Discovery Rate (FDR)

I Finding: per-stratum local FDR maximizes power controlling overall FDRI Improved power in protein domain predictionI Identified protein classes with problematic statistics

2. Genetic and other factors controlling the placebo responseI Collaboration with Otsuka PharmaceuticalI Mixed-effects modeling of longitudinal response in drug trials for

Schizophrenia, Bipolar Disorder, and Major Depressive DisorderI Genome-wide association study for individual-specific placebo response

3. Kinship and FST for arbitrary population structuresI Motivation: world-wide human population structureI Generalized definitions and modelsI Novel bias calculations validated by simulationsI Novel non-parametric estimator with greatly improved accuracy

2 / 35

Page 3: ll l l 0.4 inarbitrarypopulationstructures 0

My research areas / Contributions1. Stratified False Discovery Rate (FDR)

I Finding: per-stratum local FDR maximizes power controlling overall FDRI Improved power in protein domain predictionI Identified protein classes with problematic statistics

2. Genetic and other factors controlling the placebo responseI Collaboration with Otsuka PharmaceuticalI Mixed-effects modeling of longitudinal response in drug trials for

Schizophrenia, Bipolar Disorder, and Major Depressive DisorderI Genome-wide association study for individual-specific placebo response

3. Kinship and FST for arbitrary population structuresI Motivation: world-wide human population structureI Generalized definitions and modelsI Novel bias calculations validated by simulationsI Novel non-parametric estimator with greatly improved accuracy

2 / 35

Page 4: ll l l 0.4 inarbitrarypopulationstructures 0

My research areas / Contributions1. Stratified False Discovery Rate (FDR)

I Finding: per-stratum local FDR maximizes power controlling overall FDRI Improved power in protein domain predictionI Identified protein classes with problematic statistics

2. Genetic and other factors controlling the placebo responseI Collaboration with Otsuka PharmaceuticalI Mixed-effects modeling of longitudinal response in drug trials for

Schizophrenia, Bipolar Disorder, and Major Depressive DisorderI Genome-wide association study for individual-specific placebo response

3. Kinship and FST for arbitrary population structuresI Motivation: world-wide human population structureI Generalized definitions and modelsI Novel bias calculations validated by simulationsI Novel non-parametric estimator with greatly improved accuracy

2 / 35

Page 5: ll l l 0.4 inarbitrarypopulationstructures 0

Why study relatedness?

Human geneticsis fascinating!

Pop. structureconfoundsassociationstudies (GWAS)

Heritability ofcomplex traits

Animal and plantbreeding

3 / 35

Page 6: ll l l 0.4 inarbitrarypopulationstructures 0

Why study relatedness?

Human geneticsis fascinating!

Pop. structureconfoundsassociationstudies (GWAS)

Heritability ofcomplex traits

Animal and plantbreeding

3 / 35

Page 7: ll l l 0.4 inarbitrarypopulationstructures 0

Why study relatedness?

Human geneticsis fascinating!

Pop. structureconfoundsassociationstudies (GWAS)

Heritability ofcomplex traits

Animal and plantbreeding

3 / 35

Page 8: ll l l 0.4 inarbitrarypopulationstructures 0

Why study relatedness?

Human geneticsis fascinating!

Pop. structureconfoundsassociationstudies (GWAS)

Heritability ofcomplex traits

Animal and plantbreeding

3 / 35

Page 9: ll l l 0.4 inarbitrarypopulationstructures 0

Single Nucleotide Polymorphism (SNP) data

Genotype xijCC 0CT 1TT 2

Individuals

Loci

0 2 2 1 1 0 10 2 1 0 12 ...

X

4 / 35

Page 10: ll l l 0.4 inarbitrarypopulationstructures 0

Single Nucleotide Polymorphism (SNP) data

Genotype xijCC 0CT 1TT 2

Individuals

Loci

0 2 2 1 1 0 10 2 1 0 12 ...

X

4 / 35

Page 11: ll l l 0.4 inarbitrarypopulationstructures 0

Single Nucleotide Polymorphism (SNP) data

Genotype xijCC 0CT 1TT 2

Individuals

Loci

0 2 2 1 1 0 10 2 1 0 12 ...

X4 / 35

Page 12: ll l l 0.4 inarbitrarypopulationstructures 0

Hardy-Weinberg Equillibrium (HWE): Binomial draws

xij = genotype at locus i for individual j .

pTi = frequency of reference allele at locus i , (ancestral) population T .

Under HWE:

Pr(xij = 2|pTi ) =(pTi)2,

Pr(xij = 1|pTi ) = 2pTi(1− pTi

),

Pr(xij = 0|pTi ) =(1− pTi

)2.

HWE not valid under population structure!

5 / 35

Page 13: ll l l 0.4 inarbitrarypopulationstructures 0

Hardy-Weinberg Equillibrium (HWE): Binomial draws

xij = genotype at locus i for individual j .

pTi = frequency of reference allele at locus i , (ancestral) population T .

Under HWE:

Pr(xij = 2|pTi ) =(pTi)2,

Pr(xij = 1|pTi ) = 2pTi(1− pTi

),

Pr(xij = 0|pTi ) =(1− pTi

)2.

HWE not valid under population structure!

5 / 35

Page 14: ll l l 0.4 inarbitrarypopulationstructures 0

Hardy-Weinberg Equillibrium (HWE): Binomial draws

xij = genotype at locus i for individual j .

pTi = frequency of reference allele at locus i , (ancestral) population T .

Under HWE:

Pr(xij = 2|pTi ) =(pTi)2,

Pr(xij = 1|pTi ) = 2pTi(1− pTi

),

Pr(xij = 0|pTi ) =(1− pTi

)2.

HWE not valid under population structure!

5 / 35

Page 15: ll l l 0.4 inarbitrarypopulationstructures 0

Goal: measure dependence structure of genotype matrix columns

IndividualsLo

ci0 2 2 1 1 0 10 2 1 0 12 ...

X

High-dimensional binomial data

Population structure⇒ dependence between individuals (columns)

Linkage disequilibrium⇒ dependence between loci (rows)

6 / 35

Page 16: ll l l 0.4 inarbitrarypopulationstructures 0

Goal: measure dependence structure of genotype matrix columns

IndividualsLo

ci0 2 2 1 1 0 10 2 1 0 12 ...

X

High-dimensional binomial data

Population structure⇒ dependence between individuals (columns)

Linkage disequilibrium⇒ dependence between loci (rows)

6 / 35

Page 17: ll l l 0.4 inarbitrarypopulationstructures 0

Goal: measure dependence structure of genotype matrix columns

IndividualsLo

ci0 2 2 1 1 0 10 2 1 0 12 ...

X

High-dimensional binomial data

Population structure⇒ dependence between individuals (columns)

Linkage disequilibrium⇒ dependence between loci (rows)

6 / 35

Page 18: ll l l 0.4 inarbitrarypopulationstructures 0

Model parametersIBD(T ): “Identical By Descent” for ancestral population T — shared coin flips

f Tj : Inbreeding coefficientPr. that the two alleles at a random locus of individual j are IBD(T )

Var(xij |T ) = 2pTi(1− pTi

)(1 + f Tj )

ϕTjk : Kinship coefficient

Pr. that two alleles, one at random from each of individuals j and k , at onerandom locus are IBD(T )

Cov(xij , xik |T ) = 4pTi(1− pTi

)ϕTjk

FST: Fixation indexPr. that two random alleles in a subpopulation at a random locus are IBD(T )

7 / 35

Page 19: ll l l 0.4 inarbitrarypopulationstructures 0

Model parametersIBD(T ): “Identical By Descent” for ancestral population T — shared coin flips

f Tj : Inbreeding coefficientPr. that the two alleles at a random locus of individual j are IBD(T )

Var(xij |T ) = 2pTi(1− pTi

)(1 + f Tj )

ϕTjk : Kinship coefficient

Pr. that two alleles, one at random from each of individuals j and k , at onerandom locus are IBD(T )

Cov(xij , xik |T ) = 4pTi(1− pTi

)ϕTjk

FST: Fixation indexPr. that two random alleles in a subpopulation at a random locus are IBD(T )

7 / 35

Page 20: ll l l 0.4 inarbitrarypopulationstructures 0

Model parametersIBD(T ): “Identical By Descent” for ancestral population T — shared coin flips

f Tj : Inbreeding coefficientPr. that the two alleles at a random locus of individual j are IBD(T )

Var(xij |T ) = 2pTi(1− pTi

)(1 + f Tj )

ϕTjk : Kinship coefficient

Pr. that two alleles, one at random from each of individuals j and k , at onerandom locus are IBD(T )

Cov(xij , xik |T ) = 4pTi(1− pTi

)ϕTjk

FST: Fixation indexPr. that two random alleles in a subpopulation at a random locus are IBD(T )

7 / 35

Page 21: ll l l 0.4 inarbitrarypopulationstructures 0

Model parametersIBD(T ): “Identical By Descent” for ancestral population T — shared coin flips

f Tj : Inbreeding coefficientPr. that the two alleles at a random locus of individual j are IBD(T )

Var(xij |T ) = 2pTi(1− pTi

)(1 + f Tj )

ϕTjk : Kinship coefficient

Pr. that two alleles, one at random from each of individuals j and k , at onerandom locus are IBD(T )

Cov(xij , xik |T ) = 4pTi(1− pTi

)ϕTjk

FST: Fixation indexPr. that two random alleles in a subpopulation at a random locus are IBD(T )

7 / 35

Page 22: ll l l 0.4 inarbitrarypopulationstructures 0

Existing approaches

1. FST estimationI For independent subpopulations only!I Weir-Cockerham (WC) estimator (1984) — 15K citations!I “Hudson” pairwise estimator (2013) tweaks WCI BayeScan (2008) — 1.2K citations

2. Kinship estimationI “Standard” kinship estimator (1950s)

I Used by most modern GWAS approaches that control for population structure(PCA, LMM, adj. χ2; top paper 6K citations)

I GCTA heritability estimation (2 papers: 4K citations)I Our novel finding: accuracy requires unstructured population

(a minority of closely-related individuals)

8 / 35

Page 23: ll l l 0.4 inarbitrarypopulationstructures 0

Existing approaches

1. FST estimationI For independent subpopulations only!I Weir-Cockerham (WC) estimator (1984) — 15K citations!I “Hudson” pairwise estimator (2013) tweaks WCI BayeScan (2008) — 1.2K citations

2. Kinship estimationI “Standard” kinship estimator (1950s)

I Used by most modern GWAS approaches that control for population structure(PCA, LMM, adj. χ2; top paper 6K citations)

I GCTA heritability estimation (2 papers: 4K citations)I Our novel finding: accuracy requires unstructured population

(a minority of closely-related individuals)

8 / 35

Page 24: ll l l 0.4 inarbitrarypopulationstructures 0

Dataset: Human Origins

●●

●●●

●●

●●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●●

●● ●

● ●

● ●

●●●

●●

●●● ●

●●●

●●●

●●

●●

●●

●●

●●

● ●●

●●

●●

●●●

●●●●●●●●

● ●●

●●●

●●●●

●●●

●●●

●●●

●●●

●●

●●●

● ●

●●

●●

●●

● ●

●●

● ●●

●●

●●

●●

●●●

●●

●●●●

● ●● ●

●●

●●

● ●●

●● ●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●● ●

●●●

●●

● ●

●●

●●

●●

Ju_hoan_South

Ju_hoan_North

Taa_West

Taa_East

Taa_North

NaroGui

Hoan

Xuun

GanaTshwa

Khomani

Nama

Haiom

Kgalagadi

Shua

Khwe

Mbuti

Biaka

BantuSA

Tswana

Damara

Himba

Wambo

YorubaEsan

Mende

BantuKenya

Luhya

MandenkaGambian

Luo

Dinka

Kikuyu

Hadza

Sandawe

Masai

Datog

Somali

OromoJew_Ethiopian

Saharawi

MoroccanMozabite

Algerian Tunisian

Libyan Egyptian

Yemeni

Jew_Libyan

Jew_TunisianJew_Moroccan

Jew_Yemenite

Saudi

BedouinBBedouinA

PalestinianJordanian

SyrianLebanese_Muslim

Lebanese_Christian

Jew_Turkish

Assyrian

Druze

Lebanese

Jew_iraqi

Iranian_Bandari

Iranian

Jew_Iranian

Turkish

Jew_Ashkenazi

CypriotMaltese

Canary_Islander

Italian_SouthSicilian

Romanian

FrenchN

SpanishSW Greek

Italian_North

SpanishNE

FrenchS

Sardinian

Basque

Orcadian

English

Icelandic

Norwegian

Irish

Irish_UlsterScottish

Shetlandic

GermanSorb

Polish

Czech

Croatian

Hungarian

AlbanianBulgarian

Mordovian

FinnishRussian

Estonian

Ukrainian

BelarusianLithuanian

Armenian

Jew_GeorgianGeorgian

Abkhasian

Adygei

North_Ossetian

Chechen

Lezgin

KumykBalkar

Nogai

Makrani

BrahuiBalochi

Jew_Cochin

Tajik

Kalash

Pathan

Sindhi

Burusho

Brahmin_TiwariGujarati

Punjabi

Vishwabrahmin

Lodhi

Mala

BengaliKharia

Onge

Chuvash

Turkmen Uzbek

Hazara

Uygur

Tubalar

Mansi

Selkup

Kyrgyz

Altaian

Tuvinian

Nganasan

Dolgan

Yakut

Xibo

Hezhen

OroqenUlchi

Even

Yukagir

ItelmenKoryak

Chukchi

Eskimo

AleutAleut_Tlingit

Kusunda

Burmese

CambodianThai

Kalmyk

Ami

Atayal

Malay

Kinh

Vietnamese

Dai

Lahu

NaxiYi

TujiaHan

MiaoShe

Tu

Mongola

Daur

JapaneseKorean

Chipewyan

CreeAlgonquin

Ojibwa

Pima

Mixe

Mixtec

Zapotec

Mayan

Inga

Kaqchikel

Cabecar

Piapoco

Karitiana

Surui

Bolivian

Aymara

Quechua

Guarani

Chilote

Australian

Nasoi

Papuan

Ata

Baining_Malasait

Baining_Marabu

Bajo

Buka

Dusun

Ilocano

Kankanaey

Kol_New_Britain

Kove

Kuot_Kabil

Kuot_LamalauaLavongaiLebbo Madak

MamusiMamusi_Paleabu

Mangseng

Manus

Melamela

Mengen

Murut Mussau

Nakanai_Bileki

Nakanai_Loso

Nailik

Notsi

Saposa

Sulka

Tagalog

Teop

Tigak

Tolai

Visayan

Subpopulations

SAfricaMAfricaNAfricaMiddleEastEuropeCaucasusSAsiaNAsiaEAsiaAmericasOceania

2,922 indivs. from 244 locs. — 593,124 loci — SNP chipLazaridis et al. (2014), (2016); Skoglund et al. (2016)

9 / 35

Page 25: ll l l 0.4 inarbitrarypopulationstructures 0

Ju_h

oan_

Sou

thJu

_hoa

n_N

orth

Taa_

Wes

tTa

a_E

ast

Taa_

Nor

thN

aro

Gui

Hoa

nX

uun

Gan

aT

shw

aK

hom

ani

Nam

aH

aiom

Kga

laga

diS

hua

Khw

eM

buti

Bia

kaB

antu

SA

Tsw

ana

Dam

ara

Him

baW

ambo

Yoru

baE

san

Men

deB

antu

Ken

yaLu

hya

Man

denk

aG

ambi

an Luo

Din

kaK

ikuy

uH

adza

San

daw

eM

asai

Dat

ogS

omal

iO

rom

oJe

w_E

thio

pian

Sah

araw

iM

oroc

can

Moz

abite

Alg

eria

nTu

nisi

anLi

byan

Egy

ptia

nYe

men

iJe

w_L

ibya

nJe

w_T

unis

ian

Jew

_Mor

occa

nJe

w_Y

emen

iteS

audi

Bed

ouin

BB

edou

inA

Pal

estin

ian

Jord

ania

nS

yria

nLe

bane

se_M

uslim

Leba

nese

_Chr

istia

nJe

w_T

urki

shA

ssyr

ian

Dru

zeLe

bane

seJe

w_i

raqi

Iran

ian_

Ban

dari

Iran

ian

Jew

_Ira

nian

Turk

ish

Jew

_Ash

kena

ziC

yprio

tM

alte

seC

anar

y_Is

land

erIta

lian_

Sou

thS

icili

anR

oman

ian

Fre

nchN

Spa

nish

SW

Gre

ekIta

lian_

Nor

thS

pani

shN

EF

renc

hSS

ardi

nian

Bas

que

Orc

adia

nE

nglis

hIc

elan

dic

Nor

weg

ian

Iris

hIr

ish_

Uls

ter

Sco

ttish

She

tland

icG

erm

anS

orb

Pol

ish

Cze

chC

roat

ian

Hun

garia

nA

lban

ian

Bul

garia

nM

ordo

vian

Fin

nish

Rus

sian

Est

onia

nU

krai

nian

Bel

arus

ian

Lith

uani

anA

rmen

ian

Jew

_Geo

rgia

nG

eorg

ian

Abk

hasi

anA

dyge

iN

orth

_Oss

etia

nC

hech

enLe

zgin

Kum

ykB

alka

rN

ogai

Mak

rani

Bra

hui

Bal

ochi

Jew

_Coc

hin

Tajik

Kal

ash

Pat

han

Sin

dhi

Bur

usho

Bra

hmin

_Tiw

ari

Guj

arat

iP

unja

biV

ishw

abra

hmin

Lodh

iM

ala

Ben

gali

Kha

riaO

nge

Chu

vash

Turk

men

Uzb

ekH

azar

aU

ygur

Tuba

lar

Man

siS

elku

pK

yrgy

zA

ltaia

nTu

vini

anN

gana

san

Dol

gan

Yaku

tX

ibo

Hez

hen

Oro

qen

Ulc

hiE

ven

Yuka

gir

Itelm

enK

orya

kC

hukc

hiE

skim

oA

leut

Ale

ut_T

lingi

tK

usun

daK

alm

ykB

urm

ese

Cam

bodi

anT

hai

TuM

ongo

laD

aur

Kin

hV

ietn

ames

eD

aiLa

huN

axi

Yi

Tujia

Han

Mia

oS

heJa

pane

seK

orea

nA

mi

Ata

yal

Kan

kana

eyM

urut

Dus

unIlo

cano

Taga

log

Vis

ayan

Baj

oM

alay

Lebb

oC

hipe

wya

nC

ree

Alg

onqu

inO

jibw

aP

ima

Mix

eM

ixte

cZ

apot

ecM

ayan

Inga

Kaq

chik

elC

abec

arP

iapo

coK

ariti

ana

Sur

uiB

oliv

ian

Aym

ara

Que

chua

Gua

rani

Chi

lote

Mus

sau

Sap

osa

Buk

aTe

opT

igak

Aus

tral

ian

Kov

eM

elam

ela

Nak

anai

_Bile

kiM

angs

eng

Man

usN

asoi

Nai

likN

otsi

SW

_Bou

gain

ville

Kuo

t_K

abil

Kuo

t_La

mal

aua

Lavo

ngai

Mad

akTo

lai

Men

gen

Mam

usi

Mam

usi_

Pal

eabu

Nak

anai

_Los

oA

taS

ulka

Kol

_New

_Brit

ain

Bai

ning

_Mal

asai

tB

aini

ng_M

arab

uP

apua

n

Ju_hoan_SouthJu_hoan_North

Taa_WestTaa_East

Taa_NorthNaro

GuiHoanXuunGana

TshwaKhomani

NamaHaiom

KgalagadiShuaKhweMbutiBiaka

BantuSATswanaDamara

HimbaWamboYoruba

EsanMende

BantuKenyaLuhya

MandenkaGambian

LuoDinka

KikuyuHadza

SandaweMasaiDatog

SomaliOromo

Jew_EthiopianSaharawi

MoroccanMozabiteAlgerianTunisian

LibyanEgyptian

YemeniJew_Libyan

Jew_TunisianJew_MoroccanJew_Yemenite

SaudiBedouinBBedouinA

PalestinianJordanian

SyrianLebanese_Muslim

Lebanese_ChristianJew_Turkish

AssyrianDruze

LebaneseJew_iraqi

Iranian_BandariIranian

Jew_IranianTurkish

Jew_AshkenaziCypriotMaltese

Canary_IslanderItalian_South

SicilianRomanian

FrenchNSpanishSW

GreekItalian_North

SpanishNEFrenchS

SardinianBasque

OrcadianEnglish

IcelandicNorwegian

IrishIrish_Ulster

ScottishShetlandic

GermanSorb

PolishCzech

CroatianHungarian

AlbanianBulgarian

MordovianFinnish

RussianEstonian

UkrainianBelarusianLithuanianArmenian

Jew_GeorgianGeorgian

AbkhasianAdygei

North_OssetianChechen

LezginKumykBalkarNogai

MakraniBrahui

BalochiJew_Cochin

TajikKalashPathanSindhi

BurushoBrahmin_Tiwari

GujaratiPunjabi

VishwabrahminLodhiMala

BengaliKhariaOnge

ChuvashTurkmen

UzbekHazaraUygur

TubalarMansi

SelkupKyrgyzAltaian

TuvinianNganasan

DolganYakutXibo

HezhenOroqen

UlchiEven

YukagirItelmenKoryak

ChukchiEskimo

AleutAleut_Tlingit

KusundaKalmyk

BurmeseCambodian

ThaiTu

MongolaDaurKinh

VietnameseDai

LahuNaxi

YiTujiaHan

MiaoShe

JapaneseKorean

AmiAtayal

KankanaeyMurut

DusunIlocanoTagalogVisayan

BajoMalayLebbo

ChipewyanCree

AlgonquinOjibwa

PimaMixe

MixtecZapotec

MayanInga

KaqchikelCabecarPiapoco

KaritianaSurui

BolivianAymara

QuechuaGuaraniChilote

MussauSaposa

BukaTeop

TigakAustralian

KoveMelamela

Nakanai_BilekiMangseng

ManusNasoiNailikNotsi

SW_BougainvilleKuot_Kabil

Kuot_LamalauaLavongai

MadakTolai

MengenMamusi

Mamusi_PaleabuNakanai_Loso

AtaSulka

Kol_New_BritainBaining_MalasaitBaining_Marabu

Papuan

SAfrica MAfrica NAfrica MiddleEast Europe Caucasus SAsia NAsia EAsia Americas Oceania

SA

fric

aM

Afr

ica

NA

fric

aM

iddl

eEas

tE

urop

eC

auca

sus

SA

sia

NA

sia

EA

sia

Am

eric

asO

cean

ia

00.

10.

20.

3

Kin

ship

Indi

vidu

als

Our new kinshipestimatesGenotypes from “Human Origins”(Lazaridis et al. 2014, 2016;Skoglund et al. 2016)

Edited from Ephert [CC BY-SA 3.0], viaWikimedia Commons

*Inbreeding coeffs. on diagonal

10 / 35

Page 26: ll l l 0.4 inarbitrarypopulationstructures 0

Ju_h

oan_

Sou

thJu

_hoa

n_N

orth

Taa_

Wes

tTa

a_E

ast

Taa_

Nor

thN

aro

Gui

Hoa

nX

uun

Gan

aT

shw

aK

hom

ani

Nam

aH

aiom

Kga

laga

diS

hua

Khw

eM

buti

Bia

kaB

antu

SA

Tsw

ana

Dam

ara

Him

baW

ambo

Yoru

baE

san

Men

deB

antu

Ken

yaLu

hya

Man

denk

aG

ambi

an Luo

Din

kaK

ikuy

uH

adza

San

daw

eM

asai

Dat

ogS

omal

iO

rom

oJe

w_E

thio

pian

Sah

araw

iM

oroc

can

Moz

abite

Alg

eria

nTu

nisi

anLi

byan

Egy

ptia

nYe

men

iJe

w_L

ibya

nJe

w_T

unis

ian

Jew

_Mor

occa

nJe

w_Y

emen

iteS

audi

Bed

ouin

BB

edou

inA

Pal

estin

ian

Jord

ania

nS

yria

nLe

bane

se_M

uslim

Leba

nese

_Chr

istia

nJe

w_T

urki

shA

ssyr

ian

Dru

zeLe

bane

seJe

w_i

raqi

Iran

ian_

Ban

dari

Iran

ian

Jew

_Ira

nian

Turk

ish

Jew

_Ash

kena

ziC

yprio

tM

alte

seC

anar

y_Is

land

erIta

lian_

Sou

thS

icili

anR

oman

ian

Fre

nchN

Spa

nish

SW

Gre

ekIta

lian_

Nor

thS

pani

shN

EF

renc

hSS

ardi

nian

Bas

que

Orc

adia

nE

nglis

hIc

elan

dic

Nor

weg

ian

Iris

hIr

ish_

Uls

ter

Sco

ttish

She

tland

icG

erm

anS

orb

Pol

ish

Cze

chC

roat

ian

Hun

garia

nA

lban

ian

Bul

garia

nM

ordo

vian

Fin

nish

Rus

sian

Est

onia

nU

krai

nian

Bel

arus

ian

Lith

uani

anA

rmen

ian

Jew

_Geo

rgia

nG

eorg

ian

Abk

hasi

anA

dyge

iN

orth

_Oss

etia

nC

hech

enLe

zgin

Kum

ykB

alka

rN

ogai

Mak

rani

Bra

hui

Bal

ochi

Jew

_Coc

hin

Tajik

Kal

ash

Pat

han

Sin

dhi

Bur

usho

Bra

hmin

_Tiw

ari

Guj

arat

iP

unja

biV

ishw

abra

hmin

Lodh

iM

ala

Ben

gali

Kha

riaO

nge

Chu

vash

Turk

men

Uzb

ekH

azar

aU

ygur

Tuba

lar

Man

siS

elku

pK

yrgy

zA

ltaia

nTu

vini

anN

gana

san

Dol

gan

Yaku

tX

ibo

Hez

hen

Oro

qen

Ulc

hiE

ven

Yuka

gir

Itelm

enK

orya

kC

hukc

hiE

skim

oA

leut

Ale

ut_T

lingi

tK

usun

daK

alm

ykB

urm

ese

Cam

bodi

anT

hai

TuM

ongo

laD

aur

Kin

hV

ietn

ames

eD

aiLa

huN

axi

Yi

Tujia

Han

Mia

oS

heJa

pane

seK

orea

nA

mi

Ata

yal

Kan

kana

eyM

urut

Dus

unIlo

cano

Taga

log

Vis

ayan

Baj

oM

alay

Lebb

oC

hipe

wya

nC

ree

Alg

onqu

inO

jibw

aP

ima

Mix

eM

ixte

cZ

apot

ecM

ayan

Inga

Kaq

chik

elC

abec

arP

iapo

coK

ariti

ana

Sur

uiB

oliv

ian

Aym

ara

Que

chua

Gua

rani

Chi

lote

Mus

sau

Sap

osa

Buk

aTe

opT

igak

Aus

tral

ian

Kov

eM

elam

ela

Nak

anai

_Bile

kiM

angs

eng

Man

usN

asoi

Nai

likN

otsi

SW

_Bou

gain

ville

Kuo

t_K

abil

Kuo

t_La

mal

aua

Lavo

ngai

Mad

akTo

lai

Men

gen

Mam

usi

Mam

usi_

Pal

eabu

Nak

anai

_Los

oA

taS

ulka

Kol

_New

_Brit

ain

Bai

ning

_Mal

asai

tB

aini

ng_M

arab

uP

apua

n

Ju_hoan_SouthJu_hoan_North

Taa_WestTaa_East

Taa_NorthNaro

GuiHoanXuunGana

TshwaKhomani

NamaHaiom

KgalagadiShuaKhweMbutiBiaka

BantuSATswanaDamara

HimbaWamboYoruba

EsanMende

BantuKenyaLuhya

MandenkaGambian

LuoDinka

KikuyuHadza

SandaweMasaiDatog

SomaliOromo

Jew_EthiopianSaharawi

MoroccanMozabiteAlgerianTunisian

LibyanEgyptian

YemeniJew_Libyan

Jew_TunisianJew_MoroccanJew_Yemenite

SaudiBedouinBBedouinA

PalestinianJordanian

SyrianLebanese_Muslim

Lebanese_ChristianJew_Turkish

AssyrianDruze

LebaneseJew_iraqi

Iranian_BandariIranian

Jew_IranianTurkish

Jew_AshkenaziCypriotMaltese

Canary_IslanderItalian_South

SicilianRomanian

FrenchNSpanishSW

GreekItalian_North

SpanishNEFrenchS

SardinianBasque

OrcadianEnglish

IcelandicNorwegian

IrishIrish_Ulster

ScottishShetlandic

GermanSorb

PolishCzech

CroatianHungarian

AlbanianBulgarian

MordovianFinnish

RussianEstonian

UkrainianBelarusianLithuanianArmenian

Jew_GeorgianGeorgian

AbkhasianAdygei

North_OssetianChechen

LezginKumykBalkarNogai

MakraniBrahui

BalochiJew_Cochin

TajikKalashPathanSindhi

BurushoBrahmin_Tiwari

GujaratiPunjabi

VishwabrahminLodhiMala

BengaliKhariaOnge

ChuvashTurkmen

UzbekHazaraUygur

TubalarMansi

SelkupKyrgyzAltaian

TuvinianNganasan

DolganYakutXibo

HezhenOroqen

UlchiEven

YukagirItelmenKoryak

ChukchiEskimo

AleutAleut_Tlingit

KusundaKalmyk

BurmeseCambodian

ThaiTu

MongolaDaurKinh

VietnameseDai

LahuNaxi

YiTujiaHan

MiaoShe

JapaneseKorean

AmiAtayal

KankanaeyMurut

DusunIlocanoTagalogVisayan

BajoMalayLebbo

ChipewyanCree

AlgonquinOjibwa

PimaMixe

MixtecZapotec

MayanInga

KaqchikelCabecarPiapoco

KaritianaSurui

BolivianAymara

QuechuaGuaraniChilote

MussauSaposa

BukaTeop

TigakAustralian

KoveMelamela

Nakanai_BilekiMangseng

ManusNasoiNailikNotsi

SW_BougainvilleKuot_Kabil

Kuot_LamalauaLavongai

MadakTolai

MengenMamusi

Mamusi_PaleabuNakanai_Loso

AtaSulka

Kol_New_BritainBaining_MalasaitBaining_Marabu

Papuan

SAfrica MAfrica NAfrica MiddleEast Europe Caucasus SAsia NAsia EAsia Americas Oceania

SA

fric

aM

Afr

ica

NA

fric

aM

iddl

eEas

tE

urop

eC

auca

sus

SA

sia

NA

sia

EA

sia

Am

eric

asO

cean

ia

−0.0

50

0.05

0.1

0.15

Kin

ship

Indi

vidu

als

Standard kinshipestimatesGenotypes from “Human Origins”(Lazaridis et al. 2014, 2016;Skoglund et al. 2016)

Edited from Ephert [CC BY-SA 3.0], viaWikimedia Commons

11 / 35

Page 27: ll l l 0.4 inarbitrarypopulationstructures 0

Only our new estimator is accurate in simulations

True KinshipA New estimateB Standard est.C

00.

10.

2

Kin

ship

Indi

vidu

als

12 / 35

Page 28: ll l l 0.4 inarbitrarypopulationstructures 0

Population-level inbreeding increases with distance from Africa

●●

●●

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

● ●

● ● ●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●● ●

●●

● ●

0.2

0.3

0.4

Inbr

eedi

ng c

oeff.

13 / 35

Page 29: ll l l 0.4 inarbitrarypopulationstructures 0

Differentiation (FST) previously underestimated

0.0 0.1 0.2 0.3 0.4 0.5

020

40

Inbreeding Coefficient

Den

sity

SAfricaMAfricaNAfricaMiddleEastEuropeCaucasusSAsiaNAsiaEAsiaAmericasOceania

Subpopulations

SAfricaMAfricaNAfricaMiddleEastEuropeCaucasusSAsiaNAsiaEAsiaAmericasOceania

FST estimates

BayeScanWCHudsonKNew

14 / 35

Page 30: ll l l 0.4 inarbitrarypopulationstructures 0

Only our new method estimates generalized FST accurately

0.00

0.05

0.10

Ind. Subpops.A

Wei

r−C

ocke

rham

Hud

sonK

Bay

eSca

n

Sta

ndar

dK

insh

ip

New

− − −−

− AdmixtureB

Wei

r−C

ocke

rham

Hud

sonK

Bay

eSca

n

Sta

ndar

dK

insh

ip

New

− −−

True FST

FSTindep limit

Estimates−

FS

T e

stim

ate

FST estimator

15 / 35

Page 31: ll l l 0.4 inarbitrarypopulationstructures 0

Recently-admixed populations

African-Americans

Baharian et al. (2016)

Hispanics

Moreno-Estrada et al. (2013)16 / 35

Page 32: ll l l 0.4 inarbitrarypopulationstructures 0

Admixed siblings from different populations?

Lucy and Maria, UK

Ochoa brothers, MX

High Admixture LD:

Moreno-Estrada et al. (2013)

Solution: treat every individual as its own population!

17 / 35

Page 33: ll l l 0.4 inarbitrarypopulationstructures 0

Admixed siblings from different populations?

Lucy and Maria, UK

Ochoa brothers, MX

High Admixture LD:

Moreno-Estrada et al. (2013)

Solution: treat every individual as its own population!

17 / 35

Page 34: ll l l 0.4 inarbitrarypopulationstructures 0

Admixed siblings from different populations?

Lucy and Maria, UK

Ochoa brothers, MX

High Admixture LD:

Moreno-Estrada et al. (2013)

Solution: treat every individual as its own population!

17 / 35

Page 35: ll l l 0.4 inarbitrarypopulationstructures 0

Admixed siblings from different populations?

Lucy and Maria, UK

Ochoa brothers, MX

High Admixture LD:

Moreno-Estrada et al. (2013)

Solution: treat every individual as its own population!17 / 35

Page 36: ll l l 0.4 inarbitrarypopulationstructures 0

Dataset: 1000 Genomes Project (2013)

● ●

●●

●●

●●

ASW

ACB

ESN

GWD

LWK

MSLYRI

GBR

FIN

IBS

TSI

CEU

BEB

GIH

ITU

PJL

STU

CDX

CHB

JPT

KHV

CHS

CLM

MXL

PEL

PUR

Subpopulations

SAfricaEuropeSAsiaEAsiaAmr

2,504 indivs. from 26 locs. — 20,417,698 loci (asc. in YRI) — WGS trios, etc.

18 / 35

Page 37: ll l l 0.4 inarbitrarypopulationstructures 0

00.

10.

20.

30.

4

Kin

ship

Indi

vidu

als

0.0

0.5

1.0

Individuals

Anc

estr

y fr

ac.

x

PU

RC

LMP

EL

MX

L

Pop

ulat

ion

x

AF

RA

MR

EU

R

Anc

estr

y

Kinship driven byadmixture in Hispanics

Our new kinship estimates

Genotypes from the 1000 Genomes Project (2013)

19 / 35

Page 38: ll l l 0.4 inarbitrarypopulationstructures 0

Comparison of population structures in simulation

T

S1

f TS1

S2

f TS2

...SK

f TSK

Indep. Subpops.T

S1 S2 ... SK

A1 A2 ... An

Admixture

00.

10.

2

Kin

ship

Indi

vidu

als

20 / 35

Page 39: ll l l 0.4 inarbitrarypopulationstructures 0

FST in the independent subpopulation model

●●●

●●

0.4

0.6

Alle

le fr

eque

ncy

Illustration.

FST =Var(pSi∣∣T)

pTi(1− pTi

) .Here FST = proportion of variance explained by pop. structure

21 / 35

Page 40: ll l l 0.4 inarbitrarypopulationstructures 0

FST in the independent subpopulation model

●●●

●●

0.4

0.6

Alle

le fr

eque

ncy

Illustration.

FST =Var(pSi∣∣T)

pTi(1− pTi

) .Here FST = proportion of variance explained by pop. structure

21 / 35

Page 41: ll l l 0.4 inarbitrarypopulationstructures 0

Wright’s FST

T = Total, S = Subpopulation, I = Individual.

Total inbreeding: FIT =1|S |∑j∈S

f Tj ,

Local inbreeding: FIS =1|S |∑j∈S

f Sj ,

Structural inbreeding: FST =FIT − FIS1− FIS

.

22 / 35

Page 42: ll l l 0.4 inarbitrarypopulationstructures 0

Our generalized FSTNeed new “local” subpopulations Lj (separates total from local inbreeding):(

1− f Tj)

=(1− f

Ljj

)(1− f TLj

).

Generalized FST: applicable to arbitrary population structures, equals previousdefinition for non-overlapping subpopulations:

FST =n∑

j=1

wj fTLj.

Mean heterozygosity in a structured population:

H̄i =1n

n∑j=1

Pr(xij = 1|T ) = 2pTi(1− pTi

)(1− FST) .

23 / 35

Page 43: ll l l 0.4 inarbitrarypopulationstructures 0

Our generalized FSTNeed new “local” subpopulations Lj (separates total from local inbreeding):(

1− f Tj)

=(1− f

Ljj

)(1− f TLj

).

Generalized FST: applicable to arbitrary population structures, equals previousdefinition for non-overlapping subpopulations:

FST =n∑

j=1

wj fTLj.

Mean heterozygosity in a structured population:

H̄i =1n

n∑j=1

Pr(xij = 1|T ) = 2pTi(1− pTi

)(1− FST) .

23 / 35

Page 44: ll l l 0.4 inarbitrarypopulationstructures 0

Our generalized FSTNeed new “local” subpopulations Lj (separates total from local inbreeding):(

1− f Tj)

=(1− f

Ljj

)(1− f TLj

).

Generalized FST: applicable to arbitrary population structures, equals previousdefinition for non-overlapping subpopulations:

FST =n∑

j=1

wj fTLj.

Mean heterozygosity in a structured population:

H̄i =1n

n∑j=1

Pr(xij = 1|T ) = 2pTi(1− pTi

)(1− FST) .

23 / 35

Page 45: ll l l 0.4 inarbitrarypopulationstructures 0

FST measures population structure / differentiation

●●

●●

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

● ●

● ● ●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●● ●

●●

● ●

0.2

0.4

Alle

le fr

eque

ncy

Median diff. SNP in Human Origins (rs2650044; given MAF ≥ 10%).

F̂WCST ≈ 0.0961 using Weir-Cockerham estimator and K = 244.

24 / 35

Page 46: ll l l 0.4 inarbitrarypopulationstructures 0

FST measures population structure / differentiation

●●

●●

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●

●● ●

●●

●●

●●

●●

● ●

● ● ●

●●

●●

●●

● ●

●●

●●●

●●

●●

●●

●● ●

●●

● ●

0.2

0.4

Alle

le fr

eque

ncy

Median diff. SNP in Human Origins (rs2650044; given MAF ≥ 10%).

F̂WCST ≈ 0.0961 using Weir-Cockerham estimator and K = 244.

24 / 35

Page 47: ll l l 0.4 inarbitrarypopulationstructures 0

Comparison of population structures in simulation

T

S1

f TS1

S2

f TS2

...SK

f TSK

Indep. Subpops.T

S1 S2 ... SK

A1 A2 ... An

Admixture

00.

10.

2

Kin

ship

Indi

vidu

als

25 / 35

Page 48: ll l l 0.4 inarbitrarypopulationstructures 0

Our admixture simulation (R package ‘bnpsd’ on CRAN)Intermediate subpop. diff.

FS

T

01

A

0.0

0.2

Intermediate subpop. spread

dens

ity

B

Admixture proportions

ance

stry

01

C

Discrete subpop. approx.

ance

stry

01

D

position in 1D geography

26 / 35

Page 49: ll l l 0.4 inarbitrarypopulationstructures 0

Kinship model for genotypes

symbol meaningT ref ancestral populationi locus index

j , k individual indexespTi ref allele frequencyxij genotype (num ref alleles)ϕTjk kinship of j , k

f Tj inbreeding of j

Statistical model:

E[xij |T ] = 2pTi ,

Var(xij |T ) = 2pTi(1− pTi

)(1 + f Tj ),

Cov(xij , xik |T ) = 4pTi(1− pTi

)ϕTjk .

(Wright 1921, 1951; Malécot 1948; Jacquard 1970).

27 / 35

Page 50: ll l l 0.4 inarbitrarypopulationstructures 0

Problem: common estimators not consistent under structure

Estimate of ancestral allele frequency:

p̂Ti =12

n∑j=1

wjxij

Variance asymptotically non-zero under population structure:

Var(p̂Ti∣∣T) = pTi

(1− pTi

)ϕ̄T

(for independent individuals ϕ̄T = 12n(1 + FST). ⇒ nEff ≈ 4 in Human Origins!)

Therefore, naive estimators that use p̂Ti (next) are not consistent!

28 / 35

Page 51: ll l l 0.4 inarbitrarypopulationstructures 0

Problem: common estimators not consistent under structure

Estimate of ancestral allele frequency:

p̂Ti =12

n∑j=1

wjxij

Variance asymptotically non-zero under population structure:

Var(p̂Ti∣∣T) = pTi

(1− pTi

)ϕ̄T

(for independent individuals ϕ̄T = 12n(1 + FST). ⇒ nEff ≈ 4 in Human Origins!)

Therefore, naive estimators that use p̂Ti (next) are not consistent!

28 / 35

Page 52: ll l l 0.4 inarbitrarypopulationstructures 0

Problem: common estimators not consistent under structure

Estimate of ancestral allele frequency:

p̂Ti =12

n∑j=1

wjxij

Variance asymptotically non-zero under population structure:

Var(p̂Ti∣∣T) = pTi

(1− pTi

)ϕ̄T

(for independent individuals ϕ̄T = 12n(1 + FST). ⇒ nEff ≈ 4 in Human Origins!)

Therefore, naive estimators that use p̂Ti (next) are not consistent!

28 / 35

Page 53: ll l l 0.4 inarbitrarypopulationstructures 0

Bias in standard kinship estimator

ϕ̂T ,stdjk =

m∑i=1

(xij − 2p̂Ti

) (xik − 2p̂Ti

)4

m∑i=1

p̂Ti(1− p̂Ti

) , p̂Ti =12

n∑j=1

wjxij .

Bias varies by j , k :

ϕ̂T ,stdjk

a.s.−−−→m→∞

ϕTjk − ϕ̄T

j − ϕ̄Tk + ϕ̄T

1− ϕ̄T.

True KinshipA New estimateB Standard estimateC Limit of standardD

00.

10.

2

Kin

ship

Indi

vidu

als

29 / 35

Page 54: ll l l 0.4 inarbitrarypopulationstructures 0

Bias in standard kinship estimator

ϕ̂T ,stdjk =

m∑i=1

(xij − 2p̂Ti

) (xik − 2p̂Ti

)4

m∑i=1

p̂Ti(1− p̂Ti

) , p̂Ti =12

n∑j=1

wjxij .

Bias varies by j , k :

ϕ̂T ,stdjk

a.s.−−−→m→∞

ϕTjk − ϕ̄T

j − ϕ̄Tk + ϕ̄T

1− ϕ̄T.

True KinshipA New estimateB Standard estimateC Limit of standardD

00.

10.

2

Kin

ship

Indi

vidu

als

29 / 35

Page 55: ll l l 0.4 inarbitrarypopulationstructures 0

Bias in standard kinship estimator

ϕ̂T ,stdjk =

m∑i=1

(xij − 2p̂Ti

) (xik − 2p̂Ti

)4

m∑i=1

p̂Ti(1− p̂Ti

) , p̂Ti =12

n∑j=1

wjxij .

Bias varies by j , k :

ϕ̂T ,stdjk

a.s.−−−→m→∞

ϕTjk − ϕ̄T

j − ϕ̄Tk + ϕ̄T

1− ϕ̄T.

True KinshipA New estimateB Standard estimateC Limit of standardD

00.

10.

2

Kin

ship

Indi

vidu

als

29 / 35

Page 56: ll l l 0.4 inarbitrarypopulationstructures 0

Our new estimator (R package ‘popkin’ on CRAN)

Step 1: “pre-adjusted” kinship estimator with uniform bias.

ϕ̂T ,preadjjk =

m∑i=1

(xij − 1)(xik − 1)− 1

4m∑i=1

p̂Ti(1− p̂Ti

) + 1 a.s.−−−→m→∞

ϕTjk − ϕ̄T

1− ϕ̄T,

Step 2: Estimate minimum kinship, use to unbias “step 1” estimates.

ϕ̂T ,preadjmin

a.s.−−−→m→∞

− ϕ̄T

1− ϕ̄T, ϕ̂T ,new

jk =ϕ̂T ,preadjjk − ϕ̂T ,preadj

min

1− ϕ̂T ,preadjmin

a.s.−−−→m→∞

ϕTjk .

This yields consistent f̂ T ,newj , F̂ new

ST estimators!

30 / 35

Page 57: ll l l 0.4 inarbitrarypopulationstructures 0

Our new estimator (R package ‘popkin’ on CRAN)

Step 1: “pre-adjusted” kinship estimator with uniform bias.

ϕ̂T ,preadjjk =

m∑i=1

(xij − 1)(xik − 1)− 1

4m∑i=1

p̂Ti(1− p̂Ti

) + 1 a.s.−−−→m→∞

ϕTjk − ϕ̄T

1− ϕ̄T,

Step 2: Estimate minimum kinship, use to unbias “step 1” estimates.

ϕ̂T ,preadjmin

a.s.−−−→m→∞

− ϕ̄T

1− ϕ̄T, ϕ̂T ,new

jk =ϕ̂T ,preadjjk − ϕ̂T ,preadj

min

1− ϕ̂T ,preadjmin

a.s.−−−→m→∞

ϕTjk .

This yields consistent f̂ T ,newj , F̂ new

ST estimators!

30 / 35

Page 58: ll l l 0.4 inarbitrarypopulationstructures 0

Performance of new estimator

True KinshipA New estimateB Standard estimateC Limit of standardD

00.

10.

2

Kin

ship

Indi

vidu

als

31 / 35

Page 59: ll l l 0.4 inarbitrarypopulationstructures 0

Bias in FST estimators for independent subpopulationsPrevious estimator for n subpopulations, simplified for known AFs (πij):

F̂ indepST =

m∑i=1

σ̂2i

m∑i=1

p̂Ti(1− p̂Ti

)+ 1

nσ̂2i

,

p̂Ti =1n

n∑j=1

πij , σ̂2i =1

n − 1

n∑j=1

(πij − p̂Ti

)2.

Estimator is biased in dependent subpopulations:

F̂ indepST

a.s.−−−→m→∞

FST − 1n−1

(nθ̄T − FST

)1− 1

n−1

(nθ̄T − FST

) .

32 / 35

Page 60: ll l l 0.4 inarbitrarypopulationstructures 0

Bias in FST estimators for independent subpopulationsPrevious estimator for n subpopulations, simplified for known AFs (πij):

F̂ indepST =

m∑i=1

σ̂2i

m∑i=1

p̂Ti(1− p̂Ti

)+ 1

nσ̂2i

,

p̂Ti =1n

n∑j=1

πij , σ̂2i =1

n − 1

n∑j=1

(πij − p̂Ti

)2.

Estimator is biased in dependent subpopulations:

F̂ indepST

a.s.−−−→m→∞

FST − 1n−1

(nθ̄T − FST

)1− 1

n−1

(nθ̄T − FST

) .

32 / 35

Page 61: ll l l 0.4 inarbitrarypopulationstructures 0

Only our new method estimates generalized FST accurately

0.00

0.05

0.10

Ind. Subpops.A

Wei

r−C

ocke

rham

Hud

sonK

Bay

eSca

n

Sta

ndar

dK

insh

ip

New

− − −−

− AdmixtureB

Wei

r−C

ocke

rham

Hud

sonK

Bay

eSca

n

Sta

ndar

dK

insh

ip

New

− −−

True FST

FSTindep limit

Estimates−

FS

T e

stim

ate

FST estimator

33 / 35

Page 62: ll l l 0.4 inarbitrarypopulationstructures 0

The future: improved kinship has repercussions across genetics!

Accurate andefficientestimation,admixturemodeling

Associationstudies, selectiontests

Bias inheritability ofcomplex traits

Animal and plantbreeding

34 / 35

Page 63: ll l l 0.4 inarbitrarypopulationstructures 0

Acknowledgments

John D. StoreyAndrew BassIrineo CabrerosWei HaoRiley Skeen-Gaar

Neo Christopher ChungUniversity of Warsaw

Funding:National Institutes of HealthOtsuka Pharmaceutical

Lewis-Sigler Institute for Integrative Genomics

35 / 35