Language Typology and Areal Linguistics

Yiru

July 13, 2016

Yiru Language Typology July 13, 2016 1 / 26

Overview

1 Introduction

2 Typologically-based Clusters

3 Areal Linguistics

Language similarity

Why are some languages more alike than others?

the languages may be related "genetically": derived from a common ancestor language

the similarities may be due to chance or to linguistic universals

the languages may be related areally: due to sharing of features through contact

Language similarity

Differences between the concepts of genetic relatedness and language similarities lead us to the following questions:

If we cluster languages based only on their typological features, how do the induced clusters compare to phylogenetic groupings?

How well do induced clusters and genetic families perform in predicting values for typological features?

What typological features tend to stay the same within language families, and what features are likely to differ?

WALS (World Atlas of Language Structures)

The WALS project consists of a database that catalogs linguistic features for

over 2,556 languages in 208 language families

using 142 features in 11 different categories.

Data sparsity: only 16% of the cells are filled, which presents serious problems for clustering algorithms

Pruning Methods

Pruning the data to produce a smaller but denser subset

Prune Languages by Minimum Features: require that languages have a minimum of 25 features for the whole-world set, or 10 features for comparing across subfamilies

Prune Features by Minimum Coverage: prune features that do not cover more than 10% of the selected languages in the whole-world set, or 25% in comparisons across subfamilies

Use a Dense Language Family
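
The first two pruning steps can be sketched on a toy table. This is only an illustration under assumptions: the table layout (language mapped to a dictionary of filled features), the function names, and the tiny thresholds are all hypothetical and not taken from the original experiments.

```python
# Toy WALS-style table: language -> {feature: value}. A missing key is an empty cell.
# Layout, names, and thresholds are illustrative assumptions.

def prune(table, min_features, min_coverage):
    """Keep languages with at least `min_features` filled cells, then keep
    features filled for more than `min_coverage` of the surviving languages."""
    langs = {l: f for l, f in table.items() if len(f) >= min_features}
    if not langs:
        return {}
    counts = {}
    for feats in langs.values():
        for name in feats:
            counts[name] = counts.get(name, 0) + 1
    keep = {name for name, c in counts.items() if c / len(langs) > min_coverage}
    return {l: {k: v for k, v in f.items() if k in keep} for l, f in langs.items()}

def density(table, n_features):
    """Fraction of filled cells in a table with `n_features` columns."""
    filled = sum(len(f) for f in table.values())
    return filled / (len(table) * n_features) if table else 0.0

toy = {
    "A": {"f1": "SOV", "f2": "yes", "f3": "2"},
    "B": {"f1": "SVO", "f2": "no"},
    "C": {"f1": "SOV"},
    "D": {"f1": "SVO", "f2": "yes", "f4": "1"},
}
pruned = prune(toy, min_features=2, min_coverage=0.5)
print(density(toy, 4), density(pruned, 2))
```

Pruning trades coverage for density: the toy table loses one language and two features, but every remaining cell is filled.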

Features and Feature Values

the actual representation of the features in WALS

feature values cannot be treated as numbers by distance measures, so the features are binarized
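
One common way to binarize categorical features is to give each (feature, value) pair its own 0/1 column. The sketch below assumes that scheme and assumes that an unfilled feature maps to zeros in all of its columns; the slides do not specify either detail, and the names are hypothetical.

```python
def binarize(table):
    """Turn each (feature, value) pair into its own 0/1 column.
    Assumption: an unfilled feature yields 0 in all of its columns."""
    columns = sorted({(f, v) for feats in table.values() for f, v in feats.items()})
    vectors = {
        lang: [1 if feats.get(f) == v else 0 for f, v in columns]
        for lang, feats in table.items()
    }
    return columns, vectors

toy = {
    "A": {"word_order": "SOV", "tone": "none"},
    "B": {"word_order": "SVO"},
}
columns, vectors = binarize(toy)
for lang, vec in vectors.items():
    print(lang, vec)
```

After binarization, two languages with different values for the same feature differ in two columns, and no artificial ordering between values is implied.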

Experimental Setup

Q1: how do induced clusters compare to phylogenetic groupings?

Clustering Methods

k-medoids algorithm

methods from CLUTO: repeated-bisection (rb), a k-means implementation (direct), an agglomerative algorithm (agglo) using UPGMA to produce hierarchical clusters, and bagglo, a variant of agglo

Similarity Measures

CLUTO's default cosine similarity measure (cos)

$$\text{shared overlap} = \frac{\#\,\text{features with same values}}{\#\,\text{features both filled out in WALS}}$$
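
Both similarity measures are short enough to write out. The sketch below is a plain-Python illustration, not CLUTO's implementation; returning 0 when two languages share no filled features is an assumption, and the function names are hypothetical.

```python
import math

def shared_overlap(a, b):
    """Share of the features filled for both languages that have equal values."""
    both = [f for f in a if f in b]
    if not both:
        return 0.0  # assumption: nothing comparable means zero similarity
    same = sum(1 for f in both if a[f] == b[f])
    return same / len(both)

def cosine(u, v):
    """Cosine similarity of two equal-length numeric (e.g. binarized) vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

lang_a = {"f1": "SOV", "f2": "yes", "f3": "2"}
lang_b = {"f1": "SOV", "f2": "no", "f4": "1"}
print(shared_overlap(lang_a, lang_b))  # f1 and f2 are comparable, only f1 agrees
print(cosine([1, 0, 1], [0, 1, 0]))
```

Shared overlap ignores cells that are empty in either language, which matters when only a small fraction of the table is filled.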

Clustering Performance Metrics

The genetic families as the gold standard

Rand Index

Cluster Precision, Recall, and F-Score
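
The Rand index counts the language pairs on which the induced clustering and the gold families agree. The sketch below also computes a pairwise precision, recall, and F-score; that pairwise definition is an assumption and may differ from the cluster-level definition used in the original study. The languages and labels are a toy example.

```python
from itertools import combinations

def pair_counts(gold, pred):
    """Classify every language pair by whether the gold and the predicted
    grouping put the two languages in the same group."""
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(gold), 2):
        same_gold = gold[a] == gold[b]
        same_pred = pred[a] == pred[b]
        if same_gold and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_gold:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def rand_index(gold, pred):
    tp, fp, fn, tn = pair_counts(gold, pred)
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 1.0

def pairwise_prf(gold, pred):
    tp, fp, fn, _ = pair_counts(gold, pred)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = {"German": "IE", "Dutch": "IE", "Hindi": "IE",
        "Finnish": "Uralic", "Estonian": "Uralic"}
pred = {"German": 0, "Dutch": 0, "Hindi": 1, "Finnish": 1, "Estonian": 1}
print(rand_index(gold, pred), pairwise_prf(gold, pred))
```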

Prediction Accuracy

Q2: how do induced clusters and genetic families compare in predicting the values of features for languages in the same group?

Q3: what typological features tend to stay the same within related families?

Prediction accuracy:

use 90% of the filled cells to build clusters

predict the values of the remaining 10% of filled cells

the accuracy is calculated by comparing these predicted values with the actual values in the gold standard
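
A minimal sketch of the hold-out evaluation follows. It simplifies in two ways that are assumptions rather than details from the slides: the groups are passed in ready-made instead of being clustered from the 90% split, and a held-out cell is predicted by majority vote among the other languages in its group. All names are hypothetical.

```python
import random
from collections import Counter

def holdout_accuracy(table, groups, test_fraction=0.1, seed=0):
    """Hide a fraction of the filled cells and predict each one from the most
    common value of that feature among the other languages in the same group."""
    cells = sorted((lang, feat) for lang, feats in table.items() for feat in feats)
    rng = random.Random(seed)
    rng.shuffle(cells)
    n_test = max(1, round(len(cells) * test_fraction))
    held_out = set(cells[:n_test])
    correct = 0
    for lang, feat in held_out:
        votes = Counter(
            table[other][feat]
            for other in table
            if other != lang
            and groups[other] == groups[lang]
            and feat in table[other]
            and (other, feat) not in held_out  # hidden cells cannot vote
        )
        if votes and votes.most_common(1)[0][0] == table[lang][feat]:
            correct += 1
    return correct / len(held_out)

# Two toy groups of three languages; every feature is constant within a group.
table = {
    f"{g}{i}": {f"f{j}": f"{g}-value" for j in range(4)}
    for g in ("x", "y") for i in range(3)
}
groups = {lang: lang[0] for lang in table}
acc = holdout_accuracy(table, groups)
print(acc)
```

Running the same function once with genetic families and once with induced clusters as `groups` gives the comparison asked for in Q2.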

Results & Analysis

Cluster Similarity

Results & Analysis

Prediction Accuracy

Results & Analysis

Feature Selection

Error Analysis

Language Similarity vs. Genetic

Error Analysis

WALS as the Dataset

The Feature Set in WALS

Data Sparsity and Shared Features

Areal Linguistics

The use of areas improves genetic reconstruction of languages according to a variety of metrics.

Basic ideas: develop a Bayesian model of typology that allows for

the existence of linguistic areas

a preference for some features to be shared areally

Goal: show that reconstructing language family trees is significantly aided by knowledge of areal features

Areal Linguistics

some of the well-known linguistic areas

The Balkans: Albanian, Bulgarian, Greek, Macedonian, Rumanian and Serbo-Croatian. (Sometimes: Romani and Turkish)

The Baltic: Baltic languages, Baltic German, and Finnic languages (especially Estonian and Livonian).

linguistic features most easily shared areally

Ross (1988): nouns > verbs > adjectives > syntax > non-bound function words > bound morphemes > phonemes

Curnow (2001): 15 categories of borrowable features, phonetics (rare), phonology (common), lexical (very common)

A Bayesian Model for Areal Linguistics

Pitman-Yor process for modeling linguistic areas

Kingman's coalescent for modeling linguistic phylogeny
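
To give a feel for the two priors, the sketch below draws a random partition from a Pitman-Yor process (via the two-parameter Chinese restaurant process) and a random binary tree from Kingman's coalescent. It only samples from the priors and is not the full model or its inference procedure; function names and parameter values are illustrative assumptions.

```python
import random

def sample_pitman_yor_partition(n, discount, strength, seed=0):
    """Cluster sizes for n items under the two-parameter Chinese restaurant
    process. Assumes 0 <= discount < 1 and strength > 0."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(n):
        # An existing cluster has weight (size - discount); opening a new one
        # has weight (strength + discount * number of clusters so far).
        weights = [s - discount for s in sizes]
        weights.append(strength + discount * len(sizes))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
    return sizes

def sample_kingman_coalescent(leaves, seed=0):
    """Merge lineages pairwise going back in time. With k lineages left, the
    waiting time to the next merge is exponential with rate k * (k - 1) / 2."""
    rng = random.Random(seed)
    nodes = list(leaves)
    time = 0.0
    merge_times = []
    while len(nodes) > 1:
        k = len(nodes)
        time += rng.expovariate(k * (k - 1) / 2)
        i, j = sorted(rng.sample(range(k), 2))
        right = nodes.pop(j)  # pop the larger index first so i stays valid
        left = nodes.pop(i)
        nodes.append((left, right))
        merge_times.append(time)
    return nodes[0], merge_times

area_sizes = sample_pitman_yor_partition(50, discount=0.5, strength=1.0)
tree, times = sample_kingman_coalescent(["A", "B", "C", "D"])
print(area_sizes)
print(tree, times)
```

In this reading, each cluster of the partition plays the role of a candidate linguistic area and the nested tuples play the role of a family tree over the languages.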

Identifying Language Areas

Identifying Areal Features

Genetic Reconstruction

Conclusion

1. Comparing clusters derived from typological features to genetic groups in the world's languages

the induced clusters look very different from genetic grouping

despite the differences, induced clusters show similar, or even greater levels of typological similarity than genetic grouping

2. The use of areas improves genetic reconstruction of languages

References

Ryan Georgi, Fei Xia, William Lewis (2010)

Comparing Language Similarity across Genetic and Typologically-Based Groupings

Hal Daumé III (2009)

Non-Parametric Bayesian Areal Linguistics

Thank You!
