Loxodonta africana
subspecies distribution across African Elephant Database Input
Zones
Hyeon Jeong Kim and Samuel K Wasser Center for Conservation
Biology
Department of Biology University of Washington
March 15, 2019
Executive summary

The aim of this project is to identify the
distribution of savannah, forest, and hybrid elephant populations
within the IUCN/SSC African Elephant Specialist Group’s input zones and
range boundaries of the African Elephant Database (AED) using the
Center for Conservation Biology’s genetic information. We selected
for analysis a total of 2292 geo-referenced samples with genetic
information at a minimum of 10 out of 16 microsatellite loci, and
always including two highly subspecies-discriminating loci. Of
these samples, 1432, 519, and 171 samples were respectively
identified as savannah, forest, or hybrid samples. The remainder of
the samples did not meet our stringent criteria for subspecies
designation. The samples with subspecies status were found in 106
out of 411 AED input zones and 117 out of 975 AED range boundaries.
The 106 input zones were distributed into 57 savannah, 34 forest, 4
hybrid, and 11 mixed population input zones. To identify the
subspecies status of the remaining 305 input zones, the data were
analyzed using a k-nearest neighbor approach and a spatial
population genetic analysis. A total of 96 and 129 input zones were
respectively found to have only savannah or forest samples within
300 km of the polygon. Thirty-one of the remaining input zones had
a mix of savannah, forest, and/or hybrid samples whereas 32 input
zones did not have samples within 300 km of the polygon boundary.
Spatial population genetic analysis using the genetic information
of the geo-referenced samples resulted in a map of genetic ancestry
of elephants in Africa, from which to estimate the subspecies
status of the unknown input zones.
METHOD

Reference Sample Subspecies Identification

Sample collection

Elephant samples used in this project were collected between 2000
and 2018 as part of the University of Washington Center for
Conservation Biology (CCB) African elephant forensic database,
established in 2004 (Wasser et al. 2004, 2015). Samples from a
hybrid assessment project (Mondol et al. 2015) conducted by the CCB
were also included. Samples in the reference database consist of
fecal, blood, and hair samples. Whenever possible, consecutive
samples were collected at distances ≥1 km apart to minimize the
chance of obtaining multiple samples from the same family group.
Latitude and longitude coordinates for each sample or each batch of
samples were recorded at the time of collection.

Genetic analysis and data filtering

DNA was extracted from each sample and genotyped
at 16 di-nucleotide microsatellite loci following the methods of
Wasser et al. (2004). All samples were extracted in duplicate and
each extract was amplified 2-3 times per locus, using a
multiple-tubes approach to minimize allelic dropout (i.e., missing alleles
due to DNA amplification failure at a given locus). Stringent data
filter criteria were applied to the dataset: only samples with
accurate geographic information and genetic data at a minimum of 10
out of 16 loci, always including the two loci (FH71 and SO4) with
high subspecific differentiating power, were retained. The subspecies status of
the sample was identified following the methods of Mondol et al.
(2015) using EBhybrids v. 0.991, a program written specifically for
elephant subspecies and hybrid identification (Mondol et al. 2015;
available at https://github.com/stephenslab/EBhybrids). EBhybrids
uses allelic dropout rates for each subspecies and ancestry
proportions of each sample to calculate the posterior probability
that a given sample is a pure forest elephant, a pure savannah
elephant, or a hybrid between the two subspecies, including whether
the hybrid is F1 generation, F2 generation, or backcrossed to
either a savannah or forest elephant. The allelic dropout rates
were calculated using MicroDrop version 1.01 (Wang et al. 2012).
The ancestry proportions were estimated in two ways: with
STRUCTURE v. 2.3.4 (Pritchard et al. 2000; Falush et al. 2003,
2007; Hubisz et al. 2009), with results compiled using CLUMPP v.
1.1.2 (Jakobsson & Rosenberg 2007), and with TESS3 implemented in R
(Chen et al. 2007; Caye et al. 2015). Samples retained for further
analysis were those identified as either forest, savannah, or
hybrid, each with > 0.95 posterior probability of being in its
respective subgroup using EBhybrids analysis, based on both
STRUCTURE and TESS3.

Objective 1

Spatial analysis

The African
Elephant Database (AED) includes 411 input zones in 37 countries,
and the AED range layer consists of 975 polygons divided into three
occurrence categories: doubtful, possible,
and known. The reference samples were merged with the input zones
and range layers to identify the number of samples of each
subspecies in each of the polygons. Spatial analysis was conducted
using Geopandas in Python 3.6.5 and all other data manipulation was
conducted in R version 3.5.1 using the tidyverse packages.
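
The merge of samples with polygons described above amounts to a point-in-polygon spatial join followed by a per-zone tally. The sketch below illustrates that idea in plain Python with a ray-casting test; it is not the Geopandas workflow used for the report. The zone names, coordinates, and the functions `point_in_polygon` and `count_samples_per_zone` are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict


def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) falls inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a ray cast from the point toward increasing longitude.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_samples_per_zone(samples, zones):
    """Tally samples of each subspecies per zone; also count unmatched samples."""
    counts = defaultdict(Counter)
    outside = 0
    for lon, lat, subspecies in samples:
        matched = False
        for zone_name, ring in zones.items():
            if point_in_polygon(lon, lat, ring):
                counts[zone_name][subspecies] += 1
                matched = True
        if not matched:
            outside += 1
    return {name: dict(c) for name, c in counts.items()}, outside


# Hypothetical zones (lon, lat vertices) and samples, for illustration only.
zones = {
    "zone_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "zone_b": [(5, 5), (7, 5), (7, 7), (5, 7)],
}
samples = [
    (1.0, 1.0, "savannah"),
    (1.5, 0.5, "hybrid"),
    (6.0, 6.0, "forest"),
    (10.0, 10.0, "forest"),  # falls outside every zone
]

per_zone, n_outside = count_samples_per_zone(samples, zones)
print(per_zone)
print(n_outside)
```

A production workflow would normally rely on a spatial library that handles holes, multipart polygons, and coordinate reference systems, none of which this sketch covers.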
Objective 2

K-nearest-neighbor algorithm

A k-nearest-neighbor
analysis was conducted to identify the most likely subspecies to be
present in each of the input zones that had no overlap with
reference samples. The 20 closest samples within 300 km of the
input zone were identified, using only samples with unique
locations to maximize the number of samples identified as nearest
neighbors. This was conducted using a k-nearest neighbor algorithm
implemented in the R package nngeo. The number of samples of each
subspecies and the average distance to each subspecies were calculated.

Spatial inference based on genetic data

The software TESS3 uses spatial and
genetic information to assign the ancestry proportion of each
reference sample to either savannah or forest subspecies. The
ancestry proportions were inferred over geographic space to predict
the subspecies present in the unknown input zones. To plot the
genetic information over geographic space, a raster file of Africa
was downloaded
(http://membres-timc.imag.fr/Olivier.Francois/RasterMaps.zip) and
cropped to fit the boundaries of the AED Africa base layer.
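
The k-nearest-neighbor step of Objective 2 (up to 20 nearest samples within 300 km, unique locations only, then counts and mean distances per subspecies) can be sketched as follows. This is an illustrative Python approximation, not the nngeo-based R analysis. As a simplifying assumption, each input zone is reduced to a single representative point, whereas the report measures distance to the zone polygon. The function names and coordinates are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def knn_summary(zone_point, samples, k=20, max_km=300.0):
    """Return {subspecies: (count, mean distance in km)} for the k nearest
    samples that lie within max_km of the zone's representative point."""
    zlat, zlon = zone_point
    unique = set(samples)  # drop repeated (lat, lon, subspecies) records
    hits = sorted(
        (haversine_km(zlat, zlon, lat, lon), sub) for lat, lon, sub in unique
    )
    hits = [(d, sub) for d, sub in hits if d <= max_km][:k]
    dists = defaultdict(list)
    for d, sub in hits:
        dists[sub].append(d)
    return {sub: (len(ds), sum(ds) / len(ds)) for sub, ds in dists.items()}


# Hypothetical zone point (lat, lon) and samples (lat, lon, subspecies).
zone_point = (0.0, 30.0)
samples = [
    (0.5, 30.0, "forest"),
    (0.5, 30.0, "forest"),    # duplicate location, counted once
    (1.0, 30.0, "forest"),
    (0.0, 31.0, "savannah"),
    (5.0, 30.0, "savannah"),  # farther than 300 km, excluded
]

summary = knn_summary(zone_point, samples)
for subspecies, (count, mean_km) in sorted(summary.items()):
    print(subspecies, count, round(mean_km, 1))
```

A zone whose summary contains a single subspecies would be a candidate for that label, and an empty summary corresponds to a zone with no neighbors within 300 km.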
RESULTS

Reference Sample Subspecies Identification

After filtering for geographic coordinates and the above-mentioned
genetic criteria for subspecies assignment, 2292 elephant fecal
samples were retained. A total of 2122 samples were identified to
subspecies status: 1432 as savannah elephant, 519 as forest
elephant, and 171 as hybrids. These numbers represent samples and
not unique individuals. The remaining samples were excluded from all
subsequent analyses because they did not meet the criteria to be
assigned to one of the three categories.

Objective 1: To combine genetic data with input zones to identify
the species present in input zones and range boundaries.

Input zone

The African Elephant Database contains 411 input zones in 37
countries. All 2122 samples with clear subspecies status were
spatially merged with the input zones to identify the subspecies of
elephants present in each input zone (Figure 1).
Figure 1. Reference samples identified to subspecies status are
shown (forest = green; savannah = orange, hybrids = blue) with AED
input zones (light brown = input zones that contain no samples,
brown = input zones that contain samples).
In total, 1821 samples were located inside 106 input zones in 29
countries while 301 samples did not fall inside any input zones.
Table 1 shows the number of input zones classified as forest,
savannah, hybrid, or a combination of the three.

Table 1. Number of input zones classified as each subspecies and the
number of countries.

Subspecies   Number of input zones   Number of countries
Savannah     57                      17
Forest       34                      13
Hybrid       4                       3
Mixed        11                      6

Eleven of the 106 input zones included more than one subspecies of
elephants (Table 2). The detailed workflow of identifying
subspecies status is shown in Figure 2.
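
The labelling rule implied by Table 1 (a zone takes the label of the only category sampled in it, or "mixed" when more than one category is present) can be sketched as below. The two named parks and their counts come from Table 2. The remaining zones and the function `classify_zone` are hypothetical and serve only to illustrate the rule.

```python
from collections import Counter


def classify_zone(sample_counts):
    """Label a zone from its per-category sample counts."""
    present = {sub for sub, n in sample_counts.items() if n > 0}
    if not present:
        return "unknown"      # no reference samples fell inside the zone
    if len(present) == 1:
        return present.pop()  # a single category was sampled
    return "mixed"            # more than one category was sampled


zone_counts = {
    "Kibale National Park": {"hybrid": 70, "savannah": 4},   # from Table 2
    "Zakouma National Park": {"forest": 1, "savannah": 25},  # from Table 2
    "hypothetical zone X": {"savannah": 12},
    "hypothetical zone Y": {},
}

labels = {zone: classify_zone(counts) for zone, counts in zone_counts.items()}
tally = Counter(labels.values())
print(labels)
print(tally)
```

Under this rule a single sample of another category is enough to move a zone into the mixed class, which is how Zakouma National Park (1 forest, 25 savannah) ends up among the 11 mixed zones.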
Figure 2. Flow diagram detailing the number of input zones
identified to subspecies status at each analysis.
Table 2. List of 11 input zones with samples of more than one
subspecies status.

Input zone                                Subspecies   Number of samples
Garamba Ecosystem                         Forest       12
Garamba Ecosystem                         Hybrid       1
Gourma aerial survey zone                 Hybrid       7
Gourma aerial survey zone                 Savannah     10
Gourma: surrounding area                  Hybrid       1
Gourma: surrounding area                  Savannah     6
Kibale National Park                      Hybrid       70
Kibale National Park                      Savannah     4
Mekrou Hunting Zone                       Forest       3
Mekrou Hunting Zone                       Hybrid       1
Murchison Falls Conservation Area         Hybrid       1
Murchison Falls Conservation Area         Savannah     31
Queen Elizabeth Conservation Area         Forest       1
Queen Elizabeth Conservation Area         Hybrid       31
Queen Elizabeth Conservation Area         Savannah     38
Sudanian Area Hunting Blocks              Hybrid       1
Sudanian Area Hunting Blocks              Savannah     9
Virunga (North & Central) National Park   Hybrid       6
Virunga (North & Central) National Park   Savannah     2
W du Bénin National Park                  Forest       17
W du Bénin National Park                  Hybrid       1
Zakouma National Park                     Forest       1
Zakouma National Park                     Savannah     25

Range layer

A total of 1875 samples were found within a range polygon and 247
samples fell outside every range polygon. The 1875 samples were
found within 117 polygons of the range layer (Table 3). The majority
of samples, 1804, fell inside a known range polygon, while 25 and 46
samples fell inside possible and doubtful range polygons,
respectively (Figure 3).
Table 3. The number of samples found in each category of AED’s
range layer.

Range category   Number of polygons   Polygons with samples   Number of samples
Known            571                  95                      1804
Possible         190                  10                      25
Doubtful         214                  12                      46

Figure 3. The reference samples are shown in blue. The range
layer’s known, possible, doubtful polygons are shown in green,
yellow, and red, respectively. Polygons with samples found within
them are shown in darker green, yellow, and red.
Objective 2: To determine species distribution of savannah, forest,
hybrids and remaining unknown populations of African elephants
across the input zones and range boundaries using a statistical
model.

K-nearest neighbor

Out of 305 unknown input zones, 242 input zones matched with samples
of a single subspecies within 300 km of that zone, 31 input zones
matched with samples of multiple subspecies, and 32 input zones had
no nearest neighbors. Of the 242 input zones that matched with a
single subspecies, 96 could be identified as forest and 129 as
savannah input zones. The remaining 17 input zones matched with only
a single sample, and therefore their subspecies status could not be
determined. The detailed breakdown is shown in Figure 2. The average
distance from an unknown input zone to samples of each subspecies
was 147 km for savannah, 139 km for forest, and 78.1 km for hybrid
samples.

Spatial inference

There were 80 input zones for which subspecies status
could not be determined by merging the reference samples or by the
k-nearest neighbor approach (highlighted in yellow in Figure 2). A
spatially explicit population structure analysis was conducted to
estimate the ancestry proportion of a sample coming from each of
the 80 input zones. Each sample was estimated to be from either a
forest or savannah subspecies with varying levels of admixture
between the two subspecies. A value of 1 indicates a pure savannah
elephant sample while a value of 0 indicates a pure forest elephant
sample (Figure 4). These values were inferred across geographic
space and merged with the input zones to predict the subspecies
status of each of the input zones.
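
One way to turn the inferred surface into a per-zone reading is to average the ancestry values of the raster cells that fall inside a zone and compare the mean with cut-offs. The sketch below assumes that approach. The report does not state how cell values were summarized or which cut-offs, if any, were applied, so the averaging step, the thresholds `forest_max` and `savannah_min`, and both function names are hypothetical.

```python
from statistics import mean


def zone_ancestry(cell_values):
    """Mean savannah-ancestry proportion over the raster cells inside a zone."""
    return mean(cell_values)


def label_from_ancestry(q, forest_max=0.1, savannah_min=0.9):
    """Map an ancestry proportion (1 = pure savannah, 0 = pure forest) to a
    label. The cut-offs are illustrative assumptions, not values from the report."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("ancestry proportion must lie in [0, 1]")
    if q >= savannah_min:
        return "savannah"
    if q <= forest_max:
        return "forest"
    return "admixed"


# Hypothetical cell values for three zones.
zone_cells = {
    "zone_a": [0.95, 1.0, 0.98],
    "zone_b": [0.0, 0.05, 0.02],
    "zone_c": [0.4, 0.6],
}
zone_labels = {
    zone: label_from_ancestry(zone_ancestry(cells))
    for zone, cells in zone_cells.items()
}
print(zone_labels)
```

Any real cut-off would need to be justified against the reference samples, for example by checking how zones with known subspecies status score on the same surface.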
Figure 4. The distribution of forest and savannah genetic ancestry
proportions plotted across the African continent. A: Reference
samples identified as forest (green), savannah (orange), or hybrid
are plotted on the inferred spatial distribution.

Conclusions

The spatial genetic analysis of African elephant
samples shows distinct genetic and spatial separation between
savannah and forest subspecies with limited hybridization. Savannah
elephants are largely restricted to woodland-savannah habitat
whereas forest elephants are largely restricted to forest habitat.
Hybrids are clustered at the junction between savannah and forest
habitat. However, despite vast availability of savannah-forest
ecotone, hybrids are largely restricted to the borders of eastern
DRC, western Uganda and South Sudan, and secondarily along the
Mali-Burkina Faso and Benin-Burkina Faso borders. These findings
support the suggestion by Mondol et al. (2015) that this restricted
geospatial hybrid concentration is largely due to asymmetrical
poaching pressure, whereby the subspecies experiencing high
poaching pressure flees to safe haven in a neighboring country
despite the habitat change. Mondol et al. (2015) also found that there was
no parental sex-bias in the subspecies of these hybrids. Genome-wide
studies of extinct and extant Elephantidae by Palkopoulou et
al. (2018) also indicate that hybridization between forest and
savannah African elephants was extremely rare over their
evolutionary history despite a high overall occurrence of
hybridization among Elephantidae as a whole.
REFERENCES

Caye K, Deist TM, Martins H, Michel O, François O. 2015.
TESS3: fast inference of spatial
population structure and genome scans for selection. Molecular
Ecology Resources 16:540–548.
Caye K, and Francois O. (2016). tess3r: Inference of Spatial
Population Genetic Structure. R package version 1.1.0.
Chen C, Durand E, Forbes F, François O. 2007. Bayesian clustering
algorithms ascertaining spatial population structure: a new
computer program and a comparison study. Molecular Ecology Notes
7:747–756.
Dorman, M. (2018). nngeo: k-Nearest Neighbor Join for Spatial Data.
R package version 0.2.4.
https://CRAN.R-project.org/package=nngeo
Falush D, Stephens M, Pritchard JK. 2003. Inference of population
structure using multilocus genotype data: linked loci and
correlated allele frequencies. Genetics 164:1567–1587.
Falush D, Stephens M, Pritchard JK. 2007. Inference of population
structure using multilocus genotype data: dominant markers and null
alleles. Molecular Ecology Notes 7:574–578.
Hubisz MJ, Falush D, Stephens M, Pritchard JK. 2009. Inferring weak
population structure with the assistance of sample group
information. Molecular Ecology Resources 9:1322–1332.
Jakobsson M, Rosenberg NA. 2007. CLUMPP: a cluster matching and
permutation program for dealing with label switching and
multimodality in analysis of population structure. Bioinformatics
23:1801–1806.
Mondol S, Moltke I, Hart J, Keigwin M, Brown L, Stephens M, Wasser
SK. 2015. New evidence for hybrid zones of forest and savanna
elephants in Central and West Africa. Molecular Ecology
24:6134–6147.
Palkopoulou E, et al. 2018. A comprehensive genomic history of
extinct and living elephants. Proceedings of the National Academy
of Sciences 115:E2566–E2574.
https://doi.org/10.1073/pnas.1720554115
Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population
structure using multilocus genotype data. Genetics 155:945–959.
R Core Team (2018). R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria.
URL http://www.R-project.org/.
Wang C, Schroeder KB, Rosenberg NA. 2012. A maximum-likelihood
method to correct for allelic dropout in microsatellite data with
no replicate genotypes. Genetics 192:651–669.
Wasser SK, Mailand C, Mondol S, Clark W, Laurie C, Weir BS, Brown
L. 2015. Genetic assignment of large seizures of elephant ivory
reveals Africa's major poaching hotspots. Science 349:84–87.
Wasser SK, Shedlock AM, Comstock K, Ostrander EA, Mutayoba B,
Stephens M. 2004. Assigning African elephant DNA to geographic
region of origin: applications to the ivory trade. Proceedings of
the National Academy of Sciences 101:14847–14852.
Wickham, H. (2017). tidyverse: Easily Install and Load the
'Tidyverse'. R package version 1.2.1.
https://CRAN.R-project.org/package=tidyverse
SUPPLEMENTARY FILES

Spatial data files
1. Shapefile of elephant samples: “reference-samples.shp”
2. ASCII raster file of estimated elephant distribution: “distribution-raster.ascii”

Reference files
1. Bibliography as a Zotero RDF file: “elephant-library.rdf”
2. Bibliography as a BibTeX file: “elephant-library.bib”

Dataframes
1. Dataframe of input zones merged with samples and their subspecies designation: “sj_id_106_iz.csv”
2. Dataframe of k-nearest-neighbor analysis results: “knn_d_273_iz.csv”
3. Dataframe of 80 unknown input zones with genetic ancestry results: “ts_id_80_iz.csv”