
Investigating the potential of ancestral state reconstruction algorithms in historical linguistics

Gerhard Jäger & Johann-Mattis List

Tübingen University & CRLAO / Team AIRE, Paris

Capturing Phylogenetic Algorithms for Linguistics, Leiden

October 28, 2015

Jäger & List (Tübingen/Paris) Ancestral state reconstruction Leiden 1 / 42

Introduction

What is Ancestral State Reconstruction?

While tree-building methods seek to find branching diagrams which explain how a language family has evolved, ASR methods use the branching diagrams in order to explain what has evolved concretely.

Ancestral state reconstruction is very common in evolutionary biology but only sporadically practiced in computational historical linguistics (Bouchard-Côté et al. 2013).

In classical historical linguistics, on the other hand, linguistic reconstruction of proto-forms and proto-meanings is very common and one of the main goals of the classical comparative method (Fox 1995).


ASR of Lexical Replacement Patterns

If we look for words corresponding to one meaning in a wordlist and know which of the words are cognate or not, we may ask which of the word forms was the most likely candidate to be used in the proto-language of all descendant languages.

This question resembles the task of “semantic reconstruction”, but in contrast to classical semantic reconstruction, we are only operating within one concept slot here, disregarding all words with a different meaning which may also be cognate with the words in our sample.

As a result of this restriction, it is quite likely that we cannot recover the original form from our data.

It is, however, very interesting to see to which degree we can propose a good candidate word form (cognate set) for the proto-language.


[Figure: tree over six words glossed "head" (Kopf, kop, head, tête, testa, cap). Internal nodes carry the proposed ancestral forms *kop, *haubud-, testa, caput and *kaput-, each glossed "head".]


This talk

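reconstruction of cognate class at the root

[Figure: tree whose six leaves carry the cognate classes A, A, B, B, C, C; the root, first shown as "?", is reconstructed as B.]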


Materials and Methods: Materials

Data


IELex

153 Indo-European doculects
207 concepts
entries for Proto-Indo-European for 135 concepts → used as gold standard
arbitrarily split into training set and test set:
  training set: 67 concepts, 1127 cognate classes (83 occur in PIE)
  test set: 68 concepts, 957 cognate classes (79 from PIE)

ABVD

743 Austronesian doculects → 100 were selected at random
210 concepts; for 154 of them entries for Proto-Austronesian
split into training set and test set:
  training set: 81 concepts, 1695 cognate classes (88 occur in PAn)
  test set: 74 concepts, 1584 cognate classes (79 occur in PAn)


Materials and Methods: Methods

Prerequisites: Trees

Trees

trees were inferred with the full data set (training + test data) via Bayesian inference
  IELex outgroup: Anatolian
  ABVD outgroup: Malayo-Polynesian
random samples of 1000 trees from the posterior distributions
maximum clade credibility trees

[Figure: two trees whose tips are the doculects of the sample, one for IELex (scale bar 600.0) and one for ABVD (scale bar 0.06).]


Phylogenetic uncertainty

proper way to deal with it: work with a posterior sample rather than with a single tree
poor man’s method:
  remove all short branches (shorter than some threshold)
  do ASR with the resulting multifurcating tree

[Figure: tree over the IELex doculects with Hittite at the bottom; scale bar 100.0.]
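The "poor man's method" above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the function name `collapse_short_branches`, the toy tree and the threshold value are all hypothetical, and the sketch assumes that only internal branches are removed (tips always stay) and that the removed length is added to the daughters so that root-to-tip distances are preserved.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal tree node; `length` is the branch leading to it."""
    name: str = ""
    length: float = 0.0
    children: list = field(default_factory=list)


def collapse_short_branches(node, threshold):
    """Return a copy of the tree without internal branches shorter than threshold."""
    new_children = []
    for child in node.children:
        collapsed = collapse_short_branches(child, threshold)
        if collapsed.children and collapsed.length < threshold:
            # Short internal branch: attach its daughters to the current node.
            for grandchild in collapsed.children:
                grandchild.length += collapsed.length
                new_children.append(grandchild)
        else:
            new_children.append(collapsed)
    return Node(node.name, node.length, new_children)


# Toy example: the short branch above Hindi and Urdu is removed,
# which turns the root into a three-way split.
toy_tree = Node("root", 0.0, [
    Node("", 0.01, [Node("Hindi", 1.0), Node("Urdu", 1.0)]),
    Node("Bengali", 1.2),
])
multifurcating = collapse_short_branches(toy_tree, threshold=0.05)
tip_names = [child.name for child in multifurcating.children]
```

ASR would then be run on `multifurcating` instead of on the fully resolved tree.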


Coding

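Multi-state

  one character: leaves A, A, B, B, C, C; root B

Binarized

  character A: leaves A, A, non-A, non-A, non-A, non-A; root non-A
  character B: leaves non-B, non-B, B, B, non-B, non-B; root B
  character C: leaves non-C, non-C, non-C, non-C, C, C; root non-C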
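The two codings can be related by a small conversion routine. The following is a minimal sketch, not the authors' code: the function name `binarize` and the doculect labels `L1` to `L6` are made up for illustration.

```python
def binarize(character):
    """Split one multi-state character into one 0/1 character per cognate class.

    `character` maps each doculect to the cognate class of its word.
    """
    classes = sorted(set(character.values()))
    return {
        cls: {doculect: int(state == cls) for doculect, state in character.items()}
        for cls in classes
    }


# Six hypothetical doculects with the classes from the slide: A, A, B, B, C, C.
multistate = {"L1": "A", "L2": "A", "L3": "B", "L4": "B", "L5": "C", "L6": "C"}
binary = binarize(multistate)
```

Each entry of `binary` is a presence/absence character for one cognate class, so ASR on binarized data reconstructs the classes independently of each other.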


Polymorphisms (a.k.a. synonyms)

[Figure: the tree of words for "head" (Kopf, kop, head, tête, testa, cap) with the additional forms Haupt and hoofd.]

problem for multistate coding
possible representations:
  epistemic: both observations have 50% (subjective) probability
  lifted model: states in the technical sense are sets of cognate classes
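As a rough illustration of the two representations, a polymorphic tip could be encoded as follows. The function names and the class labels are hypothetical; the slides do not say how either representation was implemented.

```python
def epistemic(observed, all_classes):
    """Spread the (subjective) probability evenly over the observed classes."""
    share = 1.0 / len(observed)
    return {cls: (share if cls in observed else 0.0) for cls in all_classes}


def lifted(observed):
    """In the lifted model the state is the set of observed classes itself."""
    return frozenset(observed)


# A doculect with two synonyms belonging to the hypothetical classes A and B.
tip = {"A", "B"}
tip_epistemic = epistemic(tip, ["A", "B", "C"])
tip_lifted = lifted(tip)
```

In the epistemic version the state space stays the same and only the tip data become uncertain, whereas in the lifted version the state space grows with every attested combination of classes.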


Parsimony reconstruction

[Figure: three alternative labellings of the internal nodes of a tree with leaves A, C, C, A, B, B, with parsimony scores 2, 3 and 3 respectively.]

Weighted parsimony reconstruction

[Figure: three alternative labellings of the internal nodes of the tree with leaves A, C, C, A, B, B, with weighted parsimony scores 3, 4 and 5 respectively.]

Weight matrix

     A  B  C
A    0  1  2
B    1  0  2
C    2  2  0

Dynamic Programming (Sankoff Algorithm)

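\[
\mathrm{wp}(\text{mother}, s) = \sum_{d \in \text{daughters}} \min_{s' \in \text{states}} \bigl( w(s, s') + \mathrm{wp}(d, s') \bigr)
\]

[Figure: example tree with leaves A, C, C, A, B, B.]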
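The recursion can be sketched directly in Python. The weight matrix below is the one from the weighted parsimony slides; the tree topology `((A,(C,C)),(A,(B,B)))` is an assumption, since the actual topology is not recoverable from this transcript, and the function name `sankoff` is illustrative.

```python
import math


def sankoff(tree, states, w):
    """Return {state: weighted parsimony score} for the root of `tree`.

    A tree is either a leaf label (str) or a tuple of subtrees.
    """
    if isinstance(tree, str):
        # An observed tip costs nothing in its own state and is impossible otherwise.
        return {s: (0 if s == tree else math.inf) for s in states}
    daughters = [sankoff(d, states, w) for d in tree]
    return {
        s: sum(min(w[s][t] + wp_d[t] for t in states) for wp_d in daughters)
        for s in states
    }


STATES = "ABC"
WEIGHTS = {
    "A": {"A": 0, "B": 1, "C": 2},
    "B": {"A": 1, "B": 0, "C": 2},
    "C": {"A": 2, "B": 2, "C": 0},
}
UNIT = {s: {t: int(s != t) for t in STATES} for s in STATES}

# Assumed topology with the leaf order A, C, C, A, B, B.
example_tree = (("A", ("C", "C")), ("A", ("B", "B")))

weighted_scores = sankoff(example_tree, STATES, WEIGHTS)
unweighted_scores = sankoff(example_tree, STATES, UNIT)
```

With this assumed topology the root scores come out as 3, 4 and 5 for A, B and C under the weight matrix, and as 2, 3 and 3 with unit weights, which are the same values as the scores shown on the parsimony slides. The state with the lowest root score is the one that is reconstructed.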


Weighted Parsimony reconstruction

the state with the lowest parsimony score wins
in case of ties, frequency at the leaves is the tie-breaker
binary characters:
  w(0 → 1) = 1; w(1 → 0) = 2
multi-state characters: all weights = 1
polymorphism only admitted at tips:
  w(a → {a, b}) = 0
  w(a → {b, c}) = 1
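The tip weights for polymorphic data and the tie-breaking rule can be written down as follows. This is a sketch with hypothetical names (`tip_cost`, `pick_state`); how ties that remain after the frequency comparison are resolved is not stated on the slide, so the sketch simply keeps the first candidate.

```python
from collections import Counter


def tip_cost(state, observed):
    """w(a -> S) for a polymorphic tip S: 0 if a is among the observed classes, else 1."""
    return 0 if state in observed else 1


def pick_state(scores, tips):
    """Lowest parsimony score wins; ties are broken by frequency at the leaves."""
    frequency = Counter(cls for observed in tips for cls in observed)
    best = min(scores.values())
    tied = [state for state, score in scores.items() if score == best]
    return max(tied, key=lambda state: frequency[state])


# Hypothetical data: three tips, one of them polymorphic.
tips = [{"a", "b"}, {"b"}, {"c"}]
root_scores = {"a": 2, "b": 2, "c": 3}
winner = pick_state(root_scores, tips)
```

Here `a` and `b` tie on the parsimony score, and `b` wins because it is attested at two tips rather than one.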


The MLN Method for ASR

The MLN method (List et al. 2014a) uses parsimony for ancestral state reconstruction.

In contrast to classical parsimony, MLN tests different weighting schemes for gains and losses and selects the optimal scheme with the help of the vocabulary size criterion.

The vocabulary size criterion states that the number of synonyms per word should be similar in the ancestral and the descendant languages.

[Figure: three trees in which node size reflects the number of synonyms per word, annotated "Too many synonyms in ancestral nodes!", "Too few synonyms in ancestral nodes!" and "Optimal amount of synonyms in ancestral nodes!".]

The vocabulary size criterion states that the number of synonyms per word (here reflected by the size of the nodes in the tree) should be similar across ancestral and descendant languages. With the help of this criterion, an optimal weighting scheme for gain-loss rates is chosen for individual datasets.
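One way to read the criterion is as a model selection step over candidate gain-loss weightings. The sketch below is an assumption-laden illustration and not the MLN implementation of List et al. (2014a): it compares mean vocabulary sizes, the function name and the (gain, loss) keys are hypothetical, and all numbers are invented.

```python
from statistics import mean


def pick_weighting_scheme(tip_sizes, ancestral_sizes_by_scheme):
    """Choose the scheme whose ancestral vocabulary sizes are closest to the tips.

    tip_sizes: number of cognate classes attested in each descendant doculect.
    ancestral_sizes_by_scheme: scheme -> number of classes reconstructed at
    each ancestral node under that scheme.
    """
    target = mean(tip_sizes)
    return min(
        ancestral_sizes_by_scheme,
        key=lambda scheme: abs(mean(ancestral_sizes_by_scheme[scheme]) - target),
    )


# Invented numbers; keys are hypothetical (gain weight, loss weight) pairs.
tip_sizes = [210, 205, 215]
candidates = {
    (1, 1): [400, 380],  # too many synonyms in ancestral nodes
    (2, 1): [212, 208],  # about the same as in the descendants
    (3, 1): [90, 120],   # too few synonyms in ancestral nodes
}
best_scheme = pick_weighting_scheme(tip_sizes, candidates)
```

In a full pipeline the ancestral sizes would come from running the parsimony reconstruction once per weighting scheme.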


Reconstruction on a posterior sample

if a sample of trees is used: a state is reconstructed if it is reconstructed in more than a proportion θ of the trees in the sample. θ is estimated using the training set.

values:

database  method              θ
IELex     Sankoff/binary      0.690
IELex     Sankoff/multistate  0.056
IELex     MLN                 0.464
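The thresholding over a tree sample amounts to a simple vote. A minimal sketch, assuming that θ is a proportion of trees; the function name and the toy sample are hypothetical, while the two θ values are taken from the table above.

```python
from collections import Counter


def reconstruct_from_sample(per_tree_states, theta):
    """Keep every state reconstructed in more than a proportion theta of the trees."""
    n_trees = len(per_tree_states)
    counts = Counter(state for states in per_tree_states for state in set(states))
    return {state for state, count in counts.items() if count / n_trees > theta}


# Root reconstructions on four hypothetical trees from a posterior sample.
sample = [{"A"}, {"A", "B"}, {"A"}, {"B"}]
strict = reconstruct_from_sample(sample, theta=0.690)   # Sankoff/binary value
lenient = reconstruct_from_sample(sample, theta=0.056)  # Sankoff/multistate value
```

A high θ favours precision (only A survives here), while a low θ favours recall (both A and B are accepted).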


Likelihood-based reconstruction

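\[
\log L(\text{tips below} \mid \text{mother} = s) = \sum_{d \in \text{daughters}} \log \sum_{s' \in \text{states}} P(s \to s' \mid \text{branch length}) \cdot L(\text{tips below } d \mid d = s')
\]

[Figure: example tree with leaves A, C, C, A, B, B.]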
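The likelihood recursion is the standard pruning scheme and can be sketched as follows. Several things here are assumptions for illustration: the equal-rates model with a single `rate` parameter (chosen because it has a closed-form transition probability), the tree encoding as lists of `(subtree, branch_length)` pairs, and all names. The sketch works with plain probabilities for readability; on large trees one would work in log space as in the formula.

```python
import math


def transition_prob(same, rate, t, k):
    """P(s -> s' | t) under an equal-rates model with k states."""
    decay = math.exp(-k * rate * t)
    return 1 / k + (k - 1) / k * decay if same else 1 / k - decay / k


def conditional_likelihoods(tree, states, rate):
    """Return {s: L(tips below | node = s)}.

    A tree is a leaf label (str) or a list of (subtree, branch_length) pairs.
    """
    if isinstance(tree, str):
        return {s: (1.0 if s == tree else 0.0) for s in states}
    k = len(states)
    result = {s: 1.0 for s in states}
    for subtree, t in tree:
        below = conditional_likelihoods(subtree, states, rate)
        for s in states:
            # Sum over the possible states s2 of this daughter.
            result[s] *= sum(
                transition_prob(s == s2, rate, t, k) * below[s2] for s2 in states
            )
    return result


# Two tips in state A, each on a branch of length 1.0.
cherry = [("A", 1.0), ("A", 1.0)]
lik = conditional_likelihoods(cherry, "AB", rate=0.5)
lik_long = conditional_likelihoods([("A", 5.0), ("A", 5.0)], "AB", rate=0.5)
```

Comparing `lik` with `lik_long` shows the dependence on branch lengths: with longer branches the tips say less about the mother node, and the likelihoods of A and B move closer together.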


note: likelihoods (unlike parsimony scores) depend on branch lengths!
likelihoods at the root give the likelihood of a reconstruction, given all observed data (for that character)
total likelihood is obtained by multiplying root state likelihoods with equilibrium probabilities given a rate matrix
rate matrix is optimized to maximize likelihood
  rates across characters are independently optimized
  for multistate characters, all rates are constrained to be equal (otherwise BayesTraits crashes…)
using equilibrium probabilities, you can derive expected state probabilities for root states
a state is likelihood-reconstructed if its expected probability > θ2
  again, threshold θ2 must be estimated from the training set
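The last two steps, from root likelihoods to a thresholded reconstruction, can be illustrated like this. The function names, the input numbers and the value of θ2 are all hypothetical; the sketch assumes that the expected state probabilities are the root likelihoods weighted by the equilibrium probabilities and normalised by the total likelihood.

```python
def root_state_probabilities(root_likelihoods, equilibrium):
    """Weight root likelihoods by equilibrium probabilities and normalise."""
    weighted = {s: root_likelihoods[s] * equilibrium[s] for s in root_likelihoods}
    total = sum(weighted.values())  # total likelihood of the character
    return {s: value / total for s, value in weighted.items()}


def likelihood_reconstructed(probabilities, theta2):
    """A state is reconstructed if its expected probability exceeds theta2."""
    return {s for s, p in probabilities.items() if p > theta2}


# Invented numbers for a binary character (1 = cognate class present).
root_likelihoods = {"0": 0.1, "1": 0.4}
equilibrium = {"0": 0.5, "1": 0.5}
probabilities = root_state_probabilities(root_likelihoods, equilibrium)
reconstructed = likelihood_reconstructed(probabilities, theta2=0.5)
```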


Results: General Results

Evaluation

[Figure: five bar-chart panels showing precision, recall and F-score (y-axis from 0.0 to 0.8), grouped by database (ABVD, IELex), algorithm (ML, MLN, Sankoff), character type (binary valued, multi-valued), tree type (bifurcating, multifurcating) and tree sample (posterior sample, summary tree).]


Evaluation: IELex

algorithm  characters  furcating       treeSample        precision  recall  F-score
ML         binary      bifurcating     summary tree      0.817      0.734   0.773
ML         binary      bifurcating     posterior sample  0.795      0.734   0.763
ML         binary      multifurcating  summary tree      0.792      0.722   0.755
ML         binary      multifurcating  posterior sample  0.756      0.747   0.752
Sankoff    binary      multifurcating  summary tree      0.716      0.734   0.725
Sankoff    binary      bifurcating     summary tree      0.704      0.722   0.712
Sankoff    binary      multifurcating  posterior sample  0.720      0.684   0.701
Sankoff    binary      bifurcating     posterior sample  0.720      0.684   0.701
ML         multi       bifurcating     posterior sample  0.642      0.772   0.701
MLN        multi       bifurcating     posterior sample  0.743      0.658   0.698
MLN        binary      multifurcating  posterior sample  0.743      0.658   0.698
MLN        binary      bifurcating     posterior sample  0.743      0.658   0.698
Sankoff    multi       bifurcating     summary tree      0.671      0.722   0.695
Sankoff    multi       multifurcating  posterior sample  0.671      0.722   0.695
Sankoff    multi       bifurcating     posterior sample  0.671      0.722   0.695
ML         multi       multifurcating  posterior sample  0.629      0.772   0.693
MLN        multi       multifurcating  posterior sample  0.758      0.633   0.690
Sankoff    multi       multifurcating  summary tree      0.735      0.633   0.680
ML         multi       multifurcating  summary tree      0.735      0.633   0.680
ML         multi       bifurcating     summary tree      0.721      0.620   0.667
MLN        multi       multifurcating  summary tree      0.584      0.658   0.619
MLN        binary      multifurcating  summary tree      0.584      0.658   0.619
MLN        multi       bifurcating     summary tree      0.742      0.291   0.418
MLN        binary      bifurcating     summary tree      0.742      0.291   0.418


Evaluation: ABVD

algorithm  characters  furcating       treeSample        precision  recall  F-score
ML         multi       bifurcating     posterior sample  0.738      0.747   0.742
ML         binary      bifurcating     posterior sample  0.682      0.759   0.719
ML         multi       bifurcating     summary tree      0.740      0.684   0.711
ML         binary      bifurcating     summary tree      0.757      0.681   0.711
Sankoff    multi       bifurcating     summary tree      0.691      0.709   0.700
Sankoff    binary      multifurcating  posterior sample  0.781      0.633   0.699
ML         binary      multifurcating  posterior sample  0.761      0.646   0.699
ML         multi       multifurcating  summary tree      0.726      0.671   0.697
Sankoff    binary      bifurcating     posterior sample  0.726      0.671   0.697
ML         binary      multifurcating  summary tree      0.732      0.658   0.693
Sankoff    multi       multifurcating  summary tree      0.679      0.696   0.688
MLN        multi       bifurcating     summary tree      0.655      0.722   0.687
MLN        binary      bifurcating     summary tree      0.655      0.722   0.687
Sankoff    binary      bifurcating     summary tree      0.629      0.557   0.591
Sankoff    multi       multifurcating  posterior sample  0.542      0.570   0.556
Sankoff    multi       bifurcating     posterior sample  0.542      0.570   0.556
MLN        multi       multifurcating  posterior sample  0.414      0.848   0.556
MLN        multi       bifurcating     posterior sample  0.414      0.848   0.556
MLN        binary      multifurcating  posterior sample  0.414      0.848   0.556
MLN        binary      bifurcating     posterior sample  0.414      0.848   0.556
ML         multi       multifurcating  posterior sample  0.421      0.709   0.528
Sankoff    binary      multifurcating  summary tree      0.469      0.570   0.514
MLN        multi       multifurcating  summary tree      0.667      0.405   0.504
MLN        binary      multifurcating  summary tree      0.667      0.405   0.504


Results: Specific Results

Summary on Indo-European ASR

Error Type               GS     ASR    Number
Missing forms            A      Ø      7
Different forms          A      B      9
Additional forms in ASR  A      A, B   5
Missing root in ASR      A, B   A      4
Summary                                25


Evaluating the Differences

We evaluate the differences qualitatively by checking

  the reflection of the proposed root in the branches, especially with semantically shifted word forms which may not occur in the wordlist data, using standard sources like Meier-Brügger (2002), Wodtko et al. (2008), Rix et al. (2002), and Pokorny (1959) for Indo-European in general, and specific sources like Vaan (2008) for Latin, Derksen (2008) and Vasmer (1986/1987) for Slavic, and Kroonen (2013) for Germanic.
  the likelihood of semantic shift of the given root with the help of the Database of Cross-Linguistic Colexifications (CLICS, List et al. 2013 and 2014b, http://clics.lingpy.org),
  whether the cognate sets in the data are really reflexes of the proposed PIE root.

Based on this check, we distinguish four grades of root quality: erroneous, problematic, possible, good


Indo-European ASR: Missing forms

Concept | Form | Meaning in reflexes | Comment
SEE | *derḱ- | to see | Only reflected in Indo-Iranian, cognates also problematic.
SEE | *weid- | to see or to know | Safe root for Indo-European.
SING | *kan- | to sing or the rooster | Root is proposed for PIE on the basis of Germanic reflexes meaning “rooster”, which is a highly unlikely semantic change.
SMELL | *h₃ed- | to smell | Potential root for PIE, but only reflected in Greek and Romance.
SMALL | *mei- | small | Wrong cognate judgments in the database, since neither Russian malenkij nor English small goes back to this root.
THINK | *teng- | to think or to feel | Root only reflected in Germanic languages with spurious reflexes in semantically shifted form in other branches. A better candidate for PIE would be *men- “the mind or to think”.
WASH | *leh₂w- | to wash or to pour | Wrong cognate assignment in the source since Romance and Albanian reflexes are not annotated.
WASH | *neigʷ- | to wash or water monster | Very unlikely PIE root, due to the extreme shift from “to wash” to “water monster” (cf. English nix) in the Germanic languages.
WET | *wed- | water or wet | Semantic change from “water” to “wet” is likely according to CLICS, but it is not clear why this should have already happened in PIE times.

Legend (row colours in the original slides): erroneous, problematic, possible, good


Indo-European ASR: Missing Forms in ASR

Concept | Form in GS | Comment
NOT | *meh₁ | This form is reflected in Ancient Greek as a prohibitive negation and also reconstructed as such. Whether it was the normal negation in PIE is less clear.
SLEEP | *drem | This form is mainly reflected in Latin and sporadically in Indian and Greek. It is much more likely that it meant something else in PIE and then shifted into this meaning.
VOMIT | *h₁rewg- | No need to reconstruct this form back to PIE, since it is only reflected in two languages of Romance.
YEAR | *ieHr- | This form has only reflexes in Germanic languages. Generally, the meaning “year” is difficult to reconstruct, due to the high potential for shift from “summer”, “winter”, “time”, etc. as shown in CLICS.

Legend (row colours in the original slides): erroneous, problematic, possible, good


Indo-European ASR: Different Forms

Concept | GS | ASR | Comment
RIVER | *h₂ekʷeh₂ | *h₂ep- | Form in GS meant “water” in PIE. Although a shift from “water” to “river” is likely according to CLICS, this meaning is an innovation in Germanic. The ASR form is reflected across multiple branches and a much better candidate.
RUB | *melh₁- | *terh₁- | Form in GS is not reflected in the standard literature (LIV and LIN), form in ASR is reflected in the meaning “to rub, to bore”.
SCRATCH | *gerbʰ- | *kes- | Form in GS is only reflected in few Germanic languages, probably with a wrong cognate assignment. Following Derksen (2008), the ASR form is a much better candidate for the PIE word for “scratch”.
SKIN | *pel | *(s)kewH- | Form in GS is a good PIE root, but not necessarily with the meaning “skin”, as the meaning of the reflexes differs greatly. The ASR form derives from a PIE verb meaning “to cover”, but the cognate set should not contain Slavic words (Derksen 2008).
WALK | *ǵʰeh₁ | *h₁ei- | The GS form is only reflected in Germanic. The ASR form is a clear PIE root, but the meaning may also have been “to go”.
WATER | *h₂ekʷeh₂ | *wódr̥ | The ASR form is a much better candidate for “water” in PIE, due to its high number of reflexes in all branches.
WHITE | *h₂elbʰós | *h₂erǵó- | The GS form is only reflected in Romance in this meaning and as meaning “cloud” in Hittite. The ASR form is a much better candidate, with a much more plausible connection between reflexes meaning “shine” and “white”, as also confirmed by CLICS.
WORM | *wrm̥i- | *kʷrm̥is | The ASR form is reflected in a larger number of Indo-European branches, while the GS form is only reflected in Germanic and Romance.

Legend (row colours in the original slides): erroneous, problematic, possible, good


Indo-European ASR: Additional Forms

Concept | Form in ASR | Comment
MOON | *lewk-s-nh₂ | This form would go back to a PIE root meaning “to shine” and is often said to have independently turned to mean “moon” in Romance and Slavic and other branches. The shift from “shine” to “moon” is however not very likely (no evidence in CLICS), so it is also possible that the word already meant “moon” in PIE as an epithet (Vaan 2008).
SNOW | *ǵʰéi-mn̥- | The form has probably independently shifted from the original meaning “frost, cold”, which is a very likely shift according to CLICS.
SUCK | *suḱ- | The root is present in this meaning in many subbranches and a good candidate for PIE in this meaning.
THIS | *so / *to | The root is a clear PIE demonstrative (Meier-Brügger 2010), but the reflexes in the daughter languages vary greatly, due to analogical levelling.
WITH | *sm̥ | A very good candidate for the meaning with reflexes in Greek, Indo-Iranian and Slavic.

Legend (row colours in the original slides): erroneous, problematic, possible, good


Evaluation against our manually created gold standard

precision: 0.986 (1 false positive)
recall: 0.895 (8 false negatives)
F-score: 0.938¹

¹ The IELex PIE entries have an F-score of 0.854.
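The three figures are linked by the usual definitions, which can be checked with a short calculation. The number of true positives is not given on the slide; the value 68 used below is inferred from the reported precision and recall and should be read as an assumption.

```python
def precision_recall_f(tp, fp, fn):
    """Precision, recall and F-score from true/false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# 1 false positive and 8 false negatives are reported; 68 true positives is inferred.
precision, recall, f_score = precision_recall_f(tp=68, fp=1, fn=8)
rounded = tuple(round(value, 3) for value in (precision, recall, f_score))
```

With these counts the rounded values are 0.986, 0.895 and 0.938, in agreement with the figures reported above.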


False positive

[Figure: Indo-European language tree with node markers for cognate class snow:D]


False negatives

[Figure: Indo-European language tree with node markers for cognate class river:O]


False negatives

[Figure: Indo-European language tree with node markers for cognate class smell:W]


False negatives

[Figure: Indo-European language tree with node markers for cognate class wet:I]


False negatives

[Figure: Indo-European language tree with node markers for cognate class skin:B]


False negatives

[Figure: Indo-European language tree with node markers for cognate class sleep:E]


False negatives

[Figure: Indo-European language tree with node markers for cognate class white:E]


False negatives

[Figure: Indo-European language tree with node markers for cognate class worm:B]


Summary on Indo-European

As the qualitative evaluation shows, the proto-forms that our best ASR method reconstructs for PIE are mostly as good as, if not better than, the candidates we found in the gold standard. Given the general and well-known uncertainties in semantic reconstruction in classical historical linguistics, it seems that ASR methods could provide actual help in semantic reconstruction by providing objective evolutionary scenarios for word evolution along a given tree which follow a specific evolutionary model.


Discussion

Benefits of ASR (?)

If the language family is well known
- ASR is of limited use in semantic reconstruction, since independent reconstructions by the comparative method are available, but
- it is quite useful to check data quality and reference tree topology in lexicostatistical datasets.

If the language family is less well known
- ASR is definitely useful as a preliminary analysis for semantic reconstruction, since it gives a more objective assessment of the consequences of a given theory of lexical replacement and external language change (a tree topology).


Benefits of ASR (!)

ASR may help
1. to identify loci of homoplasy, thus giving a first hint of parallel semantic change patterns and borrowing,
2. to quantify differential rates of lexical replacement for the concepts in a given wordlist,
3. to automatically identify sound change patterns and proto-form reconstructions.
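Points 1 and 2 above can be illustrated with a toy sketch. It uses Fitch parsimony, which is swapped in here for simplicity and is not claimed to be the ASR method evaluated in this talk. The tree, the tip names L1 to L6, the characters c1:A, c1:B, c2:A, c2:B and all function names are invented for illustration. A binary character that needs more than one change on the tree is a homoplasy candidate; summing changes per concept gives a crude proxy for its replacement rate (crude because one replacement shows up both as the loss of one class and the gain of another).

```python
from collections import defaultdict


def fitch(tree, tip_states):
    """Bottom-up Fitch pass on a binary tree.

    tree: a tip name (str) or a tuple of two subtrees.
    tip_states: dict mapping tip name -> observed state.
    Returns (candidate states at this node, minimal number of changes below it).
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, tip_states)
    right_set, right_cost = fitch(right, tip_states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree and data; nothing here comes from the IELex dataset.
TREE = ((("L1", "L2"), ("L3", "L4")), ("L5", "L6"))
TIPS = ["L1", "L2", "L3", "L4", "L5", "L6"]
PRESENT = {  # character -> tips in which the cognate class is attested
    "c1:A": {"L1", "L2"},
    "c1:B": {"L3", "L4", "L5", "L6"},
    "c2:A": {"L1", "L5"},
    "c2:B": {"L2", "L3", "L4", "L6"},
}


def analyse(tree, tips, present):
    """Count minimal changes per character, flag homoplasy, sum per concept."""
    changes = {}
    for char, where in present.items():
        tip_states = {tip: int(tip in where) for tip in tips}
        _, changes[char] = fitch(tree, tip_states)
    homoplastic = sorted(c for c, n in changes.items() if n > 1)
    per_concept = defaultdict(int)
    for char, n in changes.items():
        per_concept[char.split(":")[0]] += n
    return changes, homoplastic, dict(per_concept)


changes, homoplastic, per_concept = analyse(TREE, TIPS, PRESENT)
print(homoplastic)   # ['c2:A', 'c2:B']
print(per_concept)   # {'c1': 2, 'c2': 4}
```

In this toy data, class c2:A is attested in L1 and L5, which sit in different clades, so parsimony needs two independent changes to explain it; that is the kind of locus one would inspect for parallel change or borrowing.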


Caveats

Our current models are still very simplistic, insofar as they
- operate independently for each meaning slot,
- handle only binary (yes-no) cognate relations between words.

Future research will show whether it is possible to model lexical change across meanings and to allow for more fine-grained relations between cognate classes.
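The two restrictions can be made concrete with a small sketch of how a wordlist might be turned into binary characters, one per combination of meaning slot and cognate class (labelled like snow:D in the figures). This is an assumed encoding for illustration, not the authors' pipeline; the language names L1 to L3, the rows of the toy wordlist and the function name `binarize` are invented, and "?" marks a language with no entry for a concept.

```python
def binarize(wordlist):
    """Encode a wordlist as binary presence/absence characters.

    wordlist: list of (language, concept, cognate_class) triples.
    Returns (sorted character labels, dict language -> string over 0/1/?).
    """
    languages = sorted({lang for lang, _, _ in wordlist})
    characters = sorted({f"{concept}:{cls}" for _, concept, cls in wordlist})
    attested = {(lang, f"{concept}:{cls}") for lang, concept, cls in wordlist}
    covered = {(lang, concept) for lang, concept, _ in wordlist}

    matrix = {}
    for lang in languages:
        row = []
        for char in characters:
            concept = char.split(":")[0]
            if (lang, concept) not in covered:
                row.append("?")   # no word recorded for this meaning slot
            elif (lang, char) in attested:
                row.append("1")   # word belongs to this cognate class
            else:
                row.append("0")   # word belongs to a different class
        matrix[lang] = "".join(row)
    return characters, matrix


# Invented toy rows; class letters are arbitrary labels.
WORDLIST = [
    ("L1", "snow", "A"), ("L2", "snow", "A"), ("L3", "snow", "D"),
    ("L1", "river", "O"), ("L3", "river", "O"),
]

characters, matrix = binarize(WORDLIST)
print(characters)  # ['river:O', 'snow:A', 'snow:D']
print(matrix)      # {'L1': '110', 'L2': '?10', 'L3': '101'}
```

Each column is treated on its own, so the encoding cannot express that snow:A and snow:D compete for the same slot, nor that a word in one slot may be cognate with a word in another; these are the limitations named above.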


References

A. Bouchard-Côté, D. Hall, T. L. Griffiths, and D. Klein. Automated reconstruction of ancient languages using probabilistic models of sound change. Proceedings of the National Academy of Sciences of the United States of America, 110(11):4224–4229, 2013.

R. Derksen. Etymological dictionary of the Slavic inherited lexicon. Brill, Leiden and Boston, 2008.

G. Kroonen. Etymological dictionary of Proto-Germanic. Number 11 in Leiden Indo-European Etymological Dictionary Series. Brill, Leiden and Boston, 2013.

J.-M. List, A. Terhalle, and M. Urban. Using network approaches to enhance the analysis of cross-linguistic polysemies. In Proceedings of the 10th International Conference on Computational Semantics – Short Papers, pages 347–353, Stroudsburg, 2013. Association for Computational Linguistics.

J.-M. List, T. Mayer, A. Terhalle, and M. Urban. CLICS: Database of Cross-Linguistic Colexifications. Online Resource, 2014a. URL http://clics.lingpy.org.

J.-M. List, S. Nelson-Sathi, H. Geisler, and W. Martin. Networks of lexical borrowing and lateral gene transfer in language and genome evolution. Bioessays, 36(2):141–150, 2014b.

M. Meier-Brügger. Indogermanische Sprachwissenschaft. de Gruyter, Berlin and New York, 8th edition, 2002.

J. Pokorny. Indogermanisches etymologisches Wörterbuch, volume 1. Francke, Bern, 1959.

M. Vaan. Etymological dictionary of Latin and the other Italic languages. Number 7 in Leiden Indo-European Etymological Dictionary Series. Brill, Leiden and Boston, 2008.

M. Vasmer. Ėtimologičeskij slovar’ russkogo jazyka. Progress, Moscow, 1986/1987.

D. Wodtko, B. Irslinger, and C. Schneider. Nomina im Indogermanischen Lexikon. Winter, Heidelberg, 2008.