Fruit Flies and Moduli: Interactions between Biology and ... · and, in some cases, our substantial...

7
Fruit Flies and Moduli: Interactions between Biology and Mathematics Ezra Miller P ossibilities for using geometry and topol- ogy to analyze statistical problems in biology raise a host of novel questions in geometry, probability, algebra, and com- binatorics that demonstrate the power of biology to influence the future of pure mathematics. This is a tour through some biological explorations and their mathematical ramifications. A Biological Hypothesis Evolution sometimes results in discrete morpho- logical differences among populations that diverge from a common source. This “saltation” can oc- cur with features quantified by integers—limbs, segments, petals, teeth, or digits (humans are occasionally born with six fingers)—or quantified by other discrete invariants, such as tesselation patterns—seeds in flowers or protomers in virus capsids. Biology has explanations of how popula- tions that already exhibit a varying trait can lead to populations in which one or the other dominates. The question here is: what mechanism generates topological variation in sufficient quantity for selection to act? Take the fruit fly, for example. The normal Drosophila melanogaster wing depicted in (a) of Figure 1 differs from the abnormal other two, (b) and (c), in topology as well as geometry. Indeed, mathematically the veins in each wing can be abstracted as an embedded planar graph, with a Ezra Miller is professor of mathematics and of statistical sci- ence at Duke University. His email address is ezra@math. duke.edu. Professor Miller acknowledges support from NSF grant DMS- 1001437. Images of the fruit fly wings are courtesy of David Houle. For permission to reprint this article, please contact: [email protected]. DOI: http://dx.doi.org/10.1090/noti1290 (a) (b) (c) Figure 1. location for each vertex and a contour for each arc. The graph in (b) has an extra edge, and hence two extra vertices, while the graph in (c) is lacking a vertex. These topological variants, along with many others, occur in natural D. melanogaster populations, but rarely. On the other hand, different species of Drosophila exhibit a range of wing vein topologies. How did that come to be? Wing veins serve several key purposes: as structural supports as well as conduits for airways, nerves, and blood cells, among other things [8]. Is it possible that some force causes aberrant vein topologies to occur more frequently than would otherwise be expected in a natural population—frequently enough for evolutionary processes to act? Results from biologist Kenneth Weber, and later with more power by David Houle’s lab, show that selecting for continuous wing deformations results in skews toward deformed wings with normal vein topology [39], [40], [20], but also, 1178 Notices of the AMS Volume 62, Number 10

Transcript of Fruit Flies and Moduli: Interactions between Biology and ... · and, in some cases, our substantial...

Page 1: Fruit Flies and Moduli: Interactions between Biology and ... · and, in some cases, our substantial understanding of their geometry and combinatorics, less is known about the probability

Fruit Flies and ModuliInteractions betweenBiology and MathematicsEzra Miller

Possibilities for using geometry and topol-ogy to analyze statistical problems inbiology raise a host of novel questions ingeometry probability algebra and com-binatorics that demonstrate the power of

biology to influence the future of pure mathematicsThis is a tour through some biological explorationsand their mathematical ramifications

A Biological Hypothesis

Evolution sometimes results in discrete morpho-logical differences among populations that divergefrom a common source This ldquosaltationrdquo can oc-cur with features quantified by integersmdashlimbssegments petals teeth or digits (humans areoccasionally born with six fingers)mdashor quantifiedby other discrete invariants such as tesselationpatternsmdashseeds in flowers or protomers in viruscapsids Biology has explanations of how popula-tions that already exhibit a varying trait can lead topopulations in which one or the other dominatesThe question here is what mechanism generatestopological variation in sufficient quantity forselection to act

Take the fruit fly for example The normalDrosophila melanogaster wing depicted in (a) ofFigure 1 differs from the abnormal other two (b)and (c) in topology as well as geometry Indeedmathematically the veins in each wing can beabstracted as an embedded planar graph with a

Ezra Miller is professor of mathematics and of statistical sci-ence at Duke University His email address is ezramathdukeeduProfessor Miller acknowledges support from NSF grant DMS-1001437

Images of the fruit fly wings are courtesy of David Houle

For permission to reprint this article please contactreprint-permissionamsorg

DOI httpdxdoiorg101090noti1290

(a) (b)

(c)

Figure 1

location for each vertex and a contour for eacharc The graph in (b) has an extra edge andhence two extra vertices while the graph in (c) islacking a vertex These topological variants alongwith many others occur in natural Dmelanogasterpopulations but rarely On the other hand differentspecies of Drosophila exhibit a range of wing veintopologies How did that come to be Wing veinsserve several key purposes as structural supportsas well as conduits for airways nerves and bloodcells among other things [8] Is it possible thatsome force causes aberrant vein topologies to occurmore frequently than would otherwise be expectedin a natural populationmdashfrequently enough forevolutionary processes to act

Results from biologist Kenneth Weber andlater with more power by David Houlersquos lab showthat selecting for continuous wing deformationsresults in skews toward deformed wings withnormal vein topology [39] [40] [20] but also

1178 Notices of the AMS Volume 62 Number 10

unexpectedly much higher rates of topologicalnovelty This latter claim which is unpublishedand has yet to be tested statistically suggestsa fundamental biological hypothesis topologicalnovelty arises at the extreme of selection forcontinuous shape characteristics

Wings to Modules

This hypothesis could potentially be tested usingpersistent homology a tool for data analysis thatuses computational topology to assign modulesover polynomial rings to subsets X sube Rn [12] Thistool is a good candidate because of its abilityto emphasize differences in stratification amongotherwise similar subsets of Rn

Take our case of an embedded graph X sube R2for example For any nonnegative real numbers rand s let Xsr sube X be the set of points at distanceat least r from every vertex and within s of someedge Thus Xsr is obtained by taking the union ofthe balls of radius r around the vertices away fromthe union of s-neighborhoods of the edges In themagnified portion of the middle wing (Figure 2) ris approximately twice s

Figure 2

The homology Hi(Xsr ) with coefficients in a field kcounts connected components or loops of Xsr wheni = 0 or 1 respectively Introducing a new vertex toan edge ofX tends to create connected componentsand destroy cycles when r s because the ballsaround vertices protect them from the expandingedges The precise relations between r and s thatalter the topology of Xsr depend on the geometryof X such as the angles between edges at agiven vertex and that is the point persistenthomology uses topological invariants as measuresof geometry

To keep the data structure finite the param-eters r and s can without loss of significantinformation be restricted to integer multiples of asmall positive length ε The persistent homology ofthe graphX with the two parameters r and s is thendefined to be the direct sum Mi(X) =

oplusr s Hi(Xsr )

of all of the homology groups It is a bigraded mod-ule over the polynomial ring k[x y] the variablesx and y act on Mi(X) by comparing the homologyof Xsr with that of Xsrminusε and Xs+εr respectively

Persistent homology with only one parameter[37] [15] instead of two or more results in amodule over a polynomial ring in one variableThis case is much better studied in part becauseit behaves more tamely In particular there is afinite computable set of topological featuresmdashconnected components loops or in the generalcase features of higher dimensionmdasheach of whichhas well-defined parameters where its ldquobirthrdquoand ldquodeathrdquo occur such that every homologyclass is a direct sum of these features [14] Forfly wings or arbitrary multiparameter situationswhere the homology groups record the topologyof several increasing chains of subsets of a singletopological space no such clean description ispossible [12] However alternative presentations ofmodules from combinatorial commutative algebra[31] (or see [33 Chapter 11]) based essentiallyon the theory of primary decomposition can beunderstood topologically in terms of birth anddeath parameters Such understanding is necessaryif statistics on sets of bigraded modules are to beinterpretable biologically

Moduli as Statistical Sample Spaces

Persistent homology summarizes a sample offly wings by transforming it into a sample ofmodules It is a general principle of statistics thatto analyze samples from a set of objects oneneeds sufficient understanding of the set of allobjects from metric probabilistic and sometimescombinatorial perspectives How far apart are pairsof objects How likely is each object to be selectedat random Mathematics excels at placing coherentstructures on sets of all objects of a given typeThe resulting ldquomoduli spacesrdquo pervade geometryof many sortsmdashdifferential algebraic arithmeticcomplex discretemdashand also theoretical physicsand topology though in the latter field they arecalled classifying spaces But despite their ubiquityand in some cases our substantial understandingof their geometry and combinatorics less is knownabout the probability and statistics of samplingfrom them

Like many moduli spaces the ones parametriz-ing bigraded modules over k[x y] are quotientsof algebraic varieties by continuous group actions[12] This makes the moduli spaces complicatedunions of manifolds of varying dimension Onepossibility covered in the next section is to de-velop geometric methods to analyze samples fromsuch ldquostratified spacesrdquo Another which tends tobe favored a priori for computational reasonsis to use discrete invariants as proxies for thecontinuous moduli For bigraded modules thesediscrete invariants include

November 2015 Notices of the AMS 1179

bull single-parameter persistence by tracing zigzagsthrough the groups Hi(Xsr ) [11]

bull Hilbert series meaning the dimensions of thevector spaces Hi(Xsr ) disregarding all of thehomomorphisms between them

bull rank invariants which take into account theranks of the homomorphisms but not theirprecise algebraic structure [12] [10]

bull Betti numbers which record discrete homolog-ical invariants of the module [29]

Any discrete invariant subdivides the modulispace into regions where the invariant is constantUnderstanding the nature of these subdivisions isa pragmatic matter for bigraded k[x y]-modulesgiven the fly wing context but (modifications of)some of these new sorts of questions make sensefor any moduli space or indeed any stratified space

1 What metric or combinatorial properties dothese subdivisions possess For exampledo the regions have equal dimension androughly equal size or are there a few bigregions (or only one) of top dimension and abunch of smaller ones

2 What distributions of discrete invariants areexpected from a given (biological) problemMight the discrete invariants be expected todistinguish between the modules producedby applied situations even if the regionsarenrsquot of similar size

3 General geometric statistical question what(natural) measures should be placed on thevalues of a discrete invariant given thegeometry of the moduli spaces

4 Can the continuous variation be captured dis-cretely to desired precision More preciselyis there a family (indexed by n) of sets ofdiscrete invariants such that letting n rarr infinresults in an increasingly fine subdivision

Geometric Probability on Stratified Spaces

As we have seen for fly wings statistical problemswhere the sample objects are more complicatedthan vectors in vector spaces naturally lead tosampling from stratified spaces The goal ofgeometric statistics in this setting is like inordinary linear statistics to identify describesummarize or make inferences about an unknownprobability distribution on the sample space fromwhich the sample points are assumed to bedrawn To that end it is crucial to understand theopposite problem from probability theory given adistribution on the relevant sample space how dosamples from that distribution behave

The simplest summary of a distribution isa pointmdashan average or population meanmdashaboutwhich the distribution is centered Laws of largenumbers assert that means of increasingly largerandom samples from a distribution converge to apopulation mean Statistics requires knowledge of

the expected difference between a sample meanand a population mean Central limit theoremshelp quantify that difference by describing thevariation of sample means around populationmeans In ordinary statistics for example whenthe sample space is the real line the central limittheorem dictates that sample means vary aroundthe population mean according to a distributionthat is in the limit of infinite sample size Gaussian

In Euclidean space basic concepts such as meanexpectation and average coincide and thereforeadmit multiple equivalent characterizations suchas via least squares or arithmetic average Alreadythinking about asymptotics of samples fromsmooth manifoldsmdashlet alone singular spaces suchas the moduli spaces relevant to fly wingsmdashrequiresa radical shift in perspective as compared withsamples from linear spaces because differentcharacterizations lift to different notions in curvedsettings [23] Moreover for many of these notionssuch as a Freacutechet mean defined by least squaresthe minimizer is not unique what is the averageof the north pole and south pole on the sphere Itis the entire equator (This explains the phrase ldquoapopulation meanrdquo in the previous paragraph asopposed to ldquothe population meanrdquo) Nonethelesslaws of large numbers hold [41] [5] and centrallimit theorems exist in various situations [26][17] [18] [6] [22] such as when the data areconcentrated near a Freacutechet mean (Many of thesetheorems were motivated by biology read the titleof [22] for instance) For additional backgroundand references concerning statistics on manifoldssee [23]

Statistics on smooth manifolds relies on approx-imation of the manifold by its tangent space whichis Euclidean Once a metric on the manifold hasbeen specifiedmdasha necessary and often nontrivialstep for statistics because of the need to knowhow far apart sample objects aremdashthe exponen-tial map at a point x takes a neighborhood Uof 0 in the tangent space Tx to a neighborhoodexp(U) around x Ordinary probability in the vectorspace Tx is transformed into geometric probabilityon the smooth sample space via the exponentialmap at x which is close to an isometry when Uis small In particular central limit theorems onsmooth manifolds can be interpreted as describingvariation of sample means around a populationmean by pushing forward the linear setup alongan exponential map

In the singular setting of stratified spaces thereis no general method to compare with or reduceto ordinary linear probability and statistics in thetangent space at a mean The types of samplespaces M intended here are those possessinga topological stratification (see [16] or [35]) anexpression as a disjoint union M = M0 cup M1 cupmiddot middot middot cupMr of strata such that for all d isin 0 rbull the stratum Md is a manifold

1180 Notices of the AMS Volume 62 Number 10

0 micro micro

Figure 3

bull M0 cup middot middot middot cupMd is closed in M andbull for every pair x y isin Md there is a homeomor-

phism M rarr M that takes x to y and takes eachstratum to itself

The third condition ensures that the topology ofMand its stratification behave precisely the same waynear x as they do near y Examples include graphs(or networks) whose strata are vertices and edgespolytopes whose strata are (relatively open) facesand real (semi)algebraic varieties whose strataconsist of classes of singular points The tangentspace Tx to a stratified space M at a point x isa cone over a stratified space of dimension oneless than dim(M) If M is already a cone withapex x then Tx M is as complicated as M itselfcapturing the local structure of a stratified spacenear a point need not simplify the geometry theway it does in the smooth setting

Cases where stratified laws of large numbersand central limit theorems are known occurin specific simple examples where comparisonwith linear spaces is possible such as openbooks [19] (unions of Euclidean half-spaces gluedalong their boundary subspaces) isolated planarhyperbolic singularities (cones where the singularpoint has angle sum gt 2π instead of the icecream case of lt 2π ) [24] and binary trees [4]But in general de novo geometric constructionsare required Such has been the avenue for agood deal of probability theory including laws oflarge numbers that has been established in thegenerality of nonpositively curved spaces (see [38])which are defined as spaces whose triangles withgiven edge lengths are thinner than would beexpected from Euclidean geometry (see [9]) Themetric structures of nonpositively curved spacesinduce a number of simplifying consequencessuch as uniqueness of Freacutechet means which haveplayed important roles in the progress thus far ingeometric probability and statistics on stratifiedspaces Nonetheless the promising interactions ofnonpositive curvature with geometrically stratifiedprobability and statistics remain in their infancy

Sticky Means

The novelty of attempting statistics on stratifiedsample spaces is exemplified by nonclassicalldquostickyrdquo phenomena that can occur at singularitiesIn Euclidean statistics the mean of a finite set ofpoints moves slightly in any desired direction byperturbing the points This intuition extends to

manifolds by linear approximation [26] [17] [6][22] but it can fail even in the simplest singularsample spaces Consider the tripods for instancedepicted in Figure 3 In the center tripod theFreacutechet mean micro of the three points on the legs isthe origin 0 by symmetry But wiggling the threepoints does not move the Freacutechet mean at all oneof the points would have to be moved more thantwice as far from the center as in the final tripodto nudge the mean onto its leg

An open book with three pages is a product ofa tripod with vector space Rd (To get an arbitrarynumber of pages replace the tripod with a graphhaving an arbitrary number of rays emanatingfrom the center point It bears mentioning thatevery topologically stratified space M is locallyhomeomorphic to an open book near any point ona stratum of dimension dim(M)minus1 In other wordsthe tangent cones to points on codimension 1strata are open books Hence this example isuniversal in some sense) In an open book themean sticks to the spinemdashthe copy of Rd that iscontained in all three pagesmdashwhen three similarlysituated points are wiggled although that wigglingcan move the mean in arbitrary directions insidethe spine [19]

With these examples and others in mind aformal definition has been developed [24] letM be a set of measures on a metric space KAssume M has a given topology A mean is acontinuous assignmentMrarr closed subsets ofKA measure micro sticks to a closed subset C sube K if everyneighborhood of micro in M contains a nonemptyopen subset consisting of measures whose meansets are contained in C

Stickiness implies that it is possible for themeans of large samples from a distribution on astratified space to lie in a subset of low dimensionwith positive probability (equal to 1 in some casessuch as the tripod) even if the distribution beingsampled is well behaved [2] [4] [19] [24] Thiscontrasts with usual laws of large numbers wheresample means approach the population mean butalmost surely never land on itmdashor on any givensubset of low dimension containing it Thinking interms of central limit theorems whereas in usualcases the limiting distributions have full supportin sticky cases the limiting distributions canhave components supported on low-dimensionalsubsets of the sample space

Examples aside central limit theorems on strat-ified spaces of any generality have yet to beformulated let alone proved Subtle and deepbehavior associated with the boundary betweensticky and nonsticky is still being discovered Inparticular the distinction between positive andnegative curvature seems to be critical for stick-iness One common type of positive curvatureparticularly in statistical problems appears whena flat or positively curved smooth manifold is

November 2015 Notices of the AMS 1181

quotiented by a proper Lie group action Shapespaces (see [27] [28])mdashincluding those applicableto fly wings with constant topology by keepingtrack of the locations of the vertices of the graphmdashhave this form for instance being quotients ofmatrix spaces by actions of rotations scaling orprojective transformations Huckemann [23] hasshown essentially that when sampling from a strat-ified space that is a quotient of this form Freacutechetmeans run away from singularities and hence landalmost surely in the smooth locus On the otherhand every case exhibiting stickiness has negativecurvature (in the sense of Alexandrov curvaturebounded strictly above by 0 see [9]) It remains anopen problem to formulate a condition in terms ofsomething like negative sectional curvature thatallows means to run toward singularities of thespace for appropriate types of distributions on it

Further potential implications for pure mathe-matics emerge when considering how to recovercurvature invariants from asymptotics insteadof the other way around In the smooth casefor local samplesmdashthat is sufficiently nearby themeanmdashaccounting for curvature reduces centrallimit theorems to the Euclidean version Thus ifthe curvature is unknown but properties of the dis-tribution are known then the curvature ought to berecoverable In singular settings attempting sucha recovery could give rise to singular analogues ofsmooth Riemannian curvature invariants

Phylogenetic Trees

Moduli spaces of fly wing modules constitute justone of myriad ways that geometric probability onstratified spaces can arise Among those nothingis special about biology except perhaps that itsdiversity of forms and the nature of their variationlend themselves to geometric data analysis of thissort That said the genesis of stratified statisticscame directly from another evolutionary biologymoduli space

One of the principal aims of systematic andevolutionary biology is to determine relationshipsbetween species Trees representing these kinshipsare reconstructed from biological data such asDNA sequences or morphology Experimentalprocedures generate distributions on the set ofphylogenetic trees in multiple ways For examplethe evolutionary history of a single gene acrossmultiple individuals is represented by a ldquogene treerdquoNatural processes such as incomplete lineagesorting cause gene trees sampled from a set ofindividuals to differ in topology from one anotherand from the evolutionary history of their set ofspeciesmdashthe history of population bifurcationsleading to divergence (see [30] for example)Another crucial example occurs not from the databut from the method of inference the problemof phylogenetic tree reconstruction is intractableenough that probabilistic methods are commonly

used resulting in posterior distributions insteadof a single optimum [36] [25]

Mathematically speaking a phylogenetic tree ona given set of species is a connected metric graphwith no loops whose vertices of degree 1 (ldquoleavesrdquo)are labeled by the species The introduction byBillera Holmes and Vogtmann of an appropriatemoduli space for the problem namely the spaceof phylogenetic trees [7] initiated a surge ofactivity attempting to mine the combinatoricsand geometry of the space to devise statisticalmethods And the combinatorics is formidablefor n species the tree space Tn is a polyhedralstratified space composed of (2nminus 3) Euclideanorthants of dimension nminus2 Despite its complexityTn succumbed the advent of a polynomial timealgorithm for shortest paths in tree space [34] madeit possible to compute Freacutechet means in Tn [1][32] based on probability theory for nonpositivelycurved spaces [38]

The geometric probability on stratified spaces inthe previous two sections was initially developedspecifically to understand the behavior of Freacutechetmeans in the moduli spaceTn The simple examples[19] [24] deal with informative local subsets of TnIn addition to those efforts are under way to provelaws of large numbers and central limit theoremsin the global context of Tn itself [2] [3]

Stickiness in tree space has a concrete mean-ingful biological interpretation (although the juryis out on the extent to which the interpretationreflects reality) Points in strata of lower dimensionrepresent phylogenetic trees with one or more non-binary vertices Biologically these are unresolvedphylogenies one species diverges simultaneouslyinto three new species for example instead of firstsplitting into two new species followed by anotherbinary divergence event from one of the two Setsof phylogenetic trees from biological experimentsoften contain evidence for many or all of thepossible sequences of binary divergence eventsthat resolve a given multiple divergence Stickinessimplies that the mean tree contains an unresolveddivergence whenever there is insufficient strengthof pull toward any one of the resolving binarysequences to support the conclusion it representsThe picture to keep in mind is the tripod whosestickiness we saw earlier it is the tree space T3 onthree species

Conclusion

Spaces of biological forms provide an environmentin which mathematical methods can assign dis-tances between phenotypes The lines of inquiryhere fit into biologist David Houlersquos vision of phe-nomics [21] particularly the genotype-phenotypemap To make a long story short selection actson phenotype whereas descent and modificationact via genotype so it becomes desirable to com-pare phenotypic distance to genotypic distance

1182 Notices of the AMS Volume 62 Number 10

including some working definition of each distancein any given case study In general to grapplewith statistical correlations between genotypes andphenotypes requires a mathematical parametriza-tion for each of those biological concepts Forgenotypes that is likely to involve combinato-rial considerations since the basic quantum ofinformation is discrete Perhaps large-scale con-tinuous analogues or approximations might bemeaningful as they are for statistical mechanicsand that is another potential departure point formathematics On the other hand phenotype isoften continuous in nature what are the locationsof the vertices in a fly wing How do the arcsbetween the vertices bow outward or inward Howdo these characteristics change from wing to wingParametrization in such a context requires thinkingabout spaces of continuous objects the sort ofthinking that mathematics is designed to carryout The examples presented here demonstrate asample of the kinds of abstract structures in puremathematicsmdashalong with unexpected questionsabout themmdashthat biological investigations reveal

References[1] Miroslav Bacaacutek Computing medians and means in

Hadamard space Society for Industrial and AppliedMathematics (SIAM) Journal on Optimization 24 (2014)1542ndash1566

[2] Dennis Barden Huiling Le and Megan OwenCentral limit theorems for Freacutechet means in thespace of phylogenetic trees The Electronic Journalof Probability 8 (2013) no 25 1ndash25

[3] Limiting behaviour of Freacutechet means inthe space of phylogenetic trees preprint 2014arXivmathPR14097602v1

[4] B Basrak Limit theorems for the inductive mean onmetric trees Journal of Applied Probability 47 (2010)1136ndash1149

[5] R Bhattacharya and V Patrangenaru Large sam-ple theory of intrinsic and extrinsic sample means onmanifolds I The Annals of Statistics 31 (2003) no 11ndash29

[6] Large sample theory of intrinsic and extrin-sic sample means on manifolds II The Annals ofStatistics 33 (2005) no 3 1225ndash1259

[7] L Billera S Holmes and K Vogtmann Geometry ofthe space of phylogenetic trees Advances in AppliedMathematics 27 (2001) 733ndash767

[8] Seth S Blair Wing vein patterning in Drosophila andthe analysis of intercellular signaling The Annual Re-view of Cell and Developmental Biology 23 (2007)293ndash319

[9] Martin R Bridson and Andreacute Haefliger Met-ric Spaces of Nonpositive Curvature Grundlehrender Mathematischen Wissenschaften [Fundamen-tal Principles of Mathematical Sciences] vol 319Springer-Verlag Berlin 1999

[10] Francesca Cagliari Barbara Di Fabio andMassimo Ferri One-dimensional reduction of mul-tidimensional persistent homology Proceedings ofthe American Mathematical Society 138 (2010) no 83003ndash3017

[11] Gunnar Carlsson and Vin de Silva Zigzag persis-tence Foundations of Computational Mathematics 10(2010) no 4 367ndash405

[12] Gunnar Carlsson and Afra Zomorodian The the-ory of multidimensional persistence Discrete andComputational Geometry 42 (2009) 71ndash93

[13] James Degnan and Noah Rosenberg Discordance ofspecies trees with their most likely gene trees PublicLibrary of Science (PLoS) Genetics 3 (2006) 762ndash768

[14] Herbert Edelsbrunner David Letscher and AfraZomorodian Topological persistence and simpli-fication Discrete and Computational Geometry 28511ndash533

[15] Patrizio Frosini and Claudia Landi Size functionsand formal series Applicable Algebra in EngineeringCommunication and Computing 12 (2001) no 4 327ndash349

[16] Mark Goresky and Robert MacPherson StratifiedMorse Theory Ergebnisse der Mathematik und ihrerGrenzgebiete (3) [Results in Mathematics and RelatedAreas (3)] vol 14 Springer-Verlag 1988

[17] H Hendriks and Z Landsman Asymptotic behaviourof sample mean location for manifolds Statistics andProbability Letters 26 (1996) 169ndash178

[18] Mean location and sample mean location onmanifolds Asymptotics tests confidence regionsJournal of Multivariate Analysis 67 (1998) 227ndash243

[19] Thomas Hotz Stephan Huckemann Huiling LeJ S Marron Jonathan C Mattingly Ezra MillerJames Nolen Megan Owen Vic Patrangenaru andSean Skwerer Sticky central limit theorems on openbooks Annals of Applied Probability 23 (2013) no 62238ndash2258

[20] David Houle Jason Mezey Paul Galpern and Ash-ley Carter Automated measurement of Drosophilawings BioMed Central (BBMC) Evolutionary Biology 3(2003) 25ndash37

[21] David Houle Numbering the hairs on our heads Theshared challenge and promise of phenomics Proceed-ings of the National Academy of Sciences USA 107(2010) 1793ndash1799

[22] Stephan Huckemann Inference on 3D Procrustesmeans Tree boles growth rank-deficient diffusion ten-sors and perturbation models Scandinavian Journalof Statistics 38 (2011) no 3 424ndash446

[23] On the meaning of mean shape Manifoldstability locus and the two sample test Annals ofthe Institute of Statistical Mathematics 64 (2012)1227ndash1259

[24] Stephan Huckemann Jonathan C MattinglyEzra Miller and James Nolen Sticky centrallimit theorems at isolated hyperbolic planarsingularities Electronic Journal of Probability20 (2015) no 78 1ndash34 doi101214EJPv20-3887arXivmathPR14106879

[25] John P Huelsenbeck and Fredrik Ronquist Mr-Bayes Bayesian inference of phylogenetic treesBioinformatics 17 (2001) 754ndash755

[26] P Jupp Residuals for directional data Journal ofApplied Statistics 15 (1988) no 2 137ndash147

[27] David G Kendall Shape manifolds Procrustean met-rics and complex projective spaces Bulletin of theLondon Mathematical Society 16 (1984) 81ndash121

[28] David G Kendall Dennis Barden Thomas KCarne and Huiling Le Shape and Shape Theory

November 2015 Notices of the AMS 1183

A M E R I C A N M AT H E M AT I C A L S O C I E T Y

Sign up to be a member of the AMS today

wwwamsorgmembership

With your membership you will receive special discounts and free

shipping at any on-site AMS exhibit bookstore

Do you attend JMMHow about any AMS Sectional Meetings

Did you know that your savings from attending these

meetings will practically pay for your AMS membership

Wiley Series in Probability and Statistics John Wiley ampSons Ltd Chichester UK 1999

[29] Kevin Knudson A refinement of multi-dimensionalpersistence Homology Homotopy and Applications10 (2008) no 1 259ndash281

[30] Wayne P Maddison Gene trees in species treesSystematic Biology 46 (1997) 523ndash536

[31] Ezra Miller CohenndashMacaulay quotients of nor-mal semigroup rings via irreducible resolutionsMathematical Research Letters 9 (2002) no 1117ndash128

[32] Ezra Miller Megan Owen and Scott Provan Poly-hedral computational geometry for averaging metricphylogenetic trees Advances in Applied Mathematics15 (2015) 51ndash91

[33] Ezra Miller and Bernd Sturmfels CombinatorialCommutative Algebra Graduate Texts in Mathematicsvol 227 Springer-Verlag 2005

[34] Megan Owen and J S Provan A fast algorithm forcomputing geodesic distance in tree space IEEEACMTrans Comput Biol Bioinform 8 (2011) 2ndash13

[35] Markus J Pflaum Analytic and Geometric Study ofStratified Spaces Lecture Notes in Mathematics vol1768 Springer-Verlag 2001

[36] Bruce Rannala and Ziheng Yang Bayesian phyloge-netic inference using DNA sequences A Markov chainMonte Carlo method Molecular Biology and Evolution14 (1997) 717ndash724

[37] Vanessa Robins Towards computing homology fromfinite approximations Topology Proceedings 24 (1999)no 1 503ndash532

[38] K-T Sturm Probability measures on metric spaces ofnonpositive curvature in Heat Kernels and Analysison Manifolds Graphs and Metric Spaces lecture notesfrom a quarter program on heat kernels random walksand analysis on manifolds and graphs ContemporaryMathematics vol 338 American Mathematical Society2003 pp 357ndash390

[39] K E Weber Selection on wing allometry in Drosophilamelanogaster Genetics 126 (1990) 975ndash989

[40] How small are the smallest selectable domainsof form Genetics 130 (1992) 345ndash353

[41] H Ziezold On expected figures and a strong law oflarge numbers for random elements in quasi-metricspaces in Transactions of the Seventh Prague Con-ference on Information Theory Statistical DecisionFunctions Random Processes and of the Eighth Eu-ropean Meeting of Statisticians (Tech Univ PraguePrague 1974) Vol A Reidel Dordrecht 1977pp 591ndash602

1184 Notices of the AMS Volume 62 Number 10

Page 2: Fruit Flies and Moduli: Interactions between Biology and ... · and, in some cases, our substantial understanding of their geometry and combinatorics, less is known about the probability

unexpectedly much higher rates of topologicalnovelty This latter claim which is unpublishedand has yet to be tested statistically suggestsa fundamental biological hypothesis topologicalnovelty arises at the extreme of selection forcontinuous shape characteristics

Wings to Modules

This hypothesis could potentially be tested usingpersistent homology a tool for data analysis thatuses computational topology to assign modulesover polynomial rings to subsets X sube Rn [12] Thistool is a good candidate because of its abilityto emphasize differences in stratification amongotherwise similar subsets of Rn

Take our case of an embedded graph X sube R2for example For any nonnegative real numbers rand s let Xsr sube X be the set of points at distanceat least r from every vertex and within s of someedge Thus Xsr is obtained by taking the union ofthe balls of radius r around the vertices away fromthe union of s-neighborhoods of the edges In themagnified portion of the middle wing (Figure 2) ris approximately twice s

Figure 2

The homology Hi(Xsr ) with coefficients in a field kcounts connected components or loops of Xsr wheni = 0 or 1 respectively Introducing a new vertex toan edge ofX tends to create connected componentsand destroy cycles when r s because the ballsaround vertices protect them from the expandingedges The precise relations between r and s thatalter the topology of Xsr depend on the geometryof X such as the angles between edges at agiven vertex and that is the point persistenthomology uses topological invariants as measuresof geometry

To keep the data structure finite the param-eters r and s can without loss of significantinformation be restricted to integer multiples of asmall positive length ε The persistent homology ofthe graphX with the two parameters r and s is thendefined to be the direct sum Mi(X) =

oplusr s Hi(Xsr )

of all of the homology groups It is a bigraded mod-ule over the polynomial ring k[x y] the variablesx and y act on Mi(X) by comparing the homologyof Xsr with that of Xsrminusε and Xs+εr respectively

Persistent homology with only one parameter[37] [15] instead of two or more results in amodule over a polynomial ring in one variableThis case is much better studied in part becauseit behaves more tamely In particular there is afinite computable set of topological featuresmdashconnected components loops or in the generalcase features of higher dimensionmdasheach of whichhas well-defined parameters where its ldquobirthrdquoand ldquodeathrdquo occur such that every homologyclass is a direct sum of these features [14] Forfly wings or arbitrary multiparameter situationswhere the homology groups record the topologyof several increasing chains of subsets of a singletopological space no such clean description ispossible [12] However alternative presentations ofmodules from combinatorial commutative algebra[31] (or see [33 Chapter 11]) based essentiallyon the theory of primary decomposition can beunderstood topologically in terms of birth anddeath parameters Such understanding is necessaryif statistics on sets of bigraded modules are to beinterpretable biologically

Moduli as Statistical Sample Spaces

Persistent homology summarizes a sample offly wings by transforming it into a sample ofmodules It is a general principle of statistics thatto analyze samples from a set of objects oneneeds sufficient understanding of the set of allobjects from metric probabilistic and sometimescombinatorial perspectives How far apart are pairsof objects How likely is each object to be selectedat random Mathematics excels at placing coherentstructures on sets of all objects of a given typeThe resulting ldquomoduli spacesrdquo pervade geometryof many sortsmdashdifferential algebraic arithmeticcomplex discretemdashand also theoretical physicsand topology though in the latter field they arecalled classifying spaces But despite their ubiquityand in some cases our substantial understandingof their geometry and combinatorics less is knownabout the probability and statistics of samplingfrom them

Like many moduli spaces the ones parametriz-ing bigraded modules over k[x y] are quotientsof algebraic varieties by continuous group actions[12] This makes the moduli spaces complicatedunions of manifolds of varying dimension Onepossibility covered in the next section is to de-velop geometric methods to analyze samples fromsuch ldquostratified spacesrdquo Another which tends tobe favored a priori for computational reasonsis to use discrete invariants as proxies for thecontinuous moduli For bigraded modules thesediscrete invariants include

November 2015 Notices of the AMS 1179

bull single-parameter persistence by tracing zigzagsthrough the groups Hi(Xsr ) [11]

bull Hilbert series meaning the dimensions of thevector spaces Hi(Xsr ) disregarding all of thehomomorphisms between them

bull rank invariants which take into account theranks of the homomorphisms but not theirprecise algebraic structure [12] [10]

bull Betti numbers which record discrete homolog-ical invariants of the module [29]

Any discrete invariant subdivides the modulispace into regions where the invariant is constantUnderstanding the nature of these subdivisions isa pragmatic matter for bigraded k[x y]-modulesgiven the fly wing context but (modifications of)some of these new sorts of questions make sensefor any moduli space or indeed any stratified space

1 What metric or combinatorial properties dothese subdivisions possess For exampledo the regions have equal dimension androughly equal size or are there a few bigregions (or only one) of top dimension and abunch of smaller ones

2 What distributions of discrete invariants areexpected from a given (biological) problemMight the discrete invariants be expected todistinguish between the modules producedby applied situations even if the regionsarenrsquot of similar size

3 General geometric statistical question what(natural) measures should be placed on thevalues of a discrete invariant given thegeometry of the moduli spaces

4 Can the continuous variation be captured dis-cretely to desired precision More preciselyis there a family (indexed by n) of sets ofdiscrete invariants such that letting n rarr infinresults in an increasingly fine subdivision

Geometric Probability on Stratified Spaces

As we have seen for fly wings statistical problemswhere the sample objects are more complicatedthan vectors in vector spaces naturally lead tosampling from stratified spaces The goal ofgeometric statistics in this setting is like inordinary linear statistics to identify describesummarize or make inferences about an unknownprobability distribution on the sample space fromwhich the sample points are assumed to bedrawn To that end it is crucial to understand theopposite problem from probability theory given adistribution on the relevant sample space how dosamples from that distribution behave

The simplest summary of a distribution isa pointmdashan average or population meanmdashaboutwhich the distribution is centered Laws of largenumbers assert that means of increasingly largerandom samples from a distribution converge to apopulation mean Statistics requires knowledge of

the expected difference between a sample meanand a population mean Central limit theoremshelp quantify that difference by describing thevariation of sample means around populationmeans In ordinary statistics for example whenthe sample space is the real line the central limittheorem dictates that sample means vary aroundthe population mean according to a distributionthat is in the limit of infinite sample size Gaussian

In Euclidean space basic concepts such as meanexpectation and average coincide and thereforeadmit multiple equivalent characterizations suchas via least squares or arithmetic average Alreadythinking about asymptotics of samples fromsmooth manifoldsmdashlet alone singular spaces suchas the moduli spaces relevant to fly wingsmdashrequiresa radical shift in perspective as compared withsamples from linear spaces because differentcharacterizations lift to different notions in curvedsettings [23] Moreover for many of these notionssuch as a Freacutechet mean defined by least squaresthe minimizer is not unique what is the averageof the north pole and south pole on the sphere Itis the entire equator (This explains the phrase ldquoapopulation meanrdquo in the previous paragraph asopposed to ldquothe population meanrdquo) Nonethelesslaws of large numbers hold [41] [5] and centrallimit theorems exist in various situations [26][17] [18] [6] [22] such as when the data areconcentrated near a Freacutechet mean (Many of thesetheorems were motivated by biology read the titleof [22] for instance) For additional backgroundand references concerning statistics on manifoldssee [23]

Statistics on smooth manifolds relies on approx-imation of the manifold by its tangent space whichis Euclidean Once a metric on the manifold hasbeen specifiedmdasha necessary and often nontrivialstep for statistics because of the need to knowhow far apart sample objects aremdashthe exponen-tial map at a point x takes a neighborhood Uof 0 in the tangent space Tx to a neighborhoodexp(U) around x Ordinary probability in the vectorspace Tx is transformed into geometric probabilityon the smooth sample space via the exponentialmap at x which is close to an isometry when Uis small In particular central limit theorems onsmooth manifolds can be interpreted as describingvariation of sample means around a populationmean by pushing forward the linear setup alongan exponential map

In the singular setting of stratified spaces thereis no general method to compare with or reduceto ordinary linear probability and statistics in thetangent space at a mean The types of samplespaces M intended here are those possessinga topological stratification (see [16] or [35]) anexpression as a disjoint union M = M0 cup M1 cupmiddot middot middot cupMr of strata such that for all d isin 0 rbull the stratum Md is a manifold

1180 Notices of the AMS Volume 62 Number 10

0 micro micro

Figure 3

bull M0 cup middot middot middot cupMd is closed in M andbull for every pair x y isin Md there is a homeomor-

phism M rarr M that takes x to y and takes eachstratum to itself

The third condition ensures that the topology ofMand its stratification behave precisely the same waynear x as they do near y Examples include graphs(or networks) whose strata are vertices and edgespolytopes whose strata are (relatively open) facesand real (semi)algebraic varieties whose strataconsist of classes of singular points The tangentspace Tx to a stratified space M at a point x isa cone over a stratified space of dimension oneless than dim(M) If M is already a cone withapex x then Tx M is as complicated as M itselfcapturing the local structure of a stratified spacenear a point need not simplify the geometry theway it does in the smooth setting

Cases where stratified laws of large numbersand central limit theorems are known occurin specific simple examples where comparisonwith linear spaces is possible such as openbooks [19] (unions of Euclidean half-spaces gluedalong their boundary subspaces) isolated planarhyperbolic singularities (cones where the singularpoint has angle sum gt 2π instead of the icecream case of lt 2π ) [24] and binary trees [4]But in general de novo geometric constructionsare required Such has been the avenue for agood deal of probability theory including laws oflarge numbers that has been established in thegenerality of nonpositively curved spaces (see [38])which are defined as spaces whose triangles withgiven edge lengths are thinner than would beexpected from Euclidean geometry (see [9]) Themetric structures of nonpositively curved spacesinduce a number of simplifying consequencessuch as uniqueness of Freacutechet means which haveplayed important roles in the progress thus far ingeometric probability and statistics on stratifiedspaces Nonetheless the promising interactions ofnonpositive curvature with geometrically stratifiedprobability and statistics remain in their infancy

Sticky Means

The novelty of attempting statistics on stratifiedsample spaces is exemplified by nonclassicalldquostickyrdquo phenomena that can occur at singularitiesIn Euclidean statistics the mean of a finite set ofpoints moves slightly in any desired direction byperturbing the points This intuition extends to

manifolds by linear approximation [26] [17] [6][22] but it can fail even in the simplest singularsample spaces Consider the tripods for instancedepicted in Figure 3 In the center tripod theFreacutechet mean micro of the three points on the legs isthe origin 0 by symmetry But wiggling the threepoints does not move the Freacutechet mean at all oneof the points would have to be moved more thantwice as far from the center as in the final tripodto nudge the mean onto its leg

An open book with three pages is a product ofa tripod with vector space Rd (To get an arbitrarynumber of pages replace the tripod with a graphhaving an arbitrary number of rays emanatingfrom the center point It bears mentioning thatevery topologically stratified space M is locallyhomeomorphic to an open book near any point ona stratum of dimension dim(M)minus1 In other wordsthe tangent cones to points on codimension 1strata are open books Hence this example isuniversal in some sense) In an open book themean sticks to the spinemdashthe copy of Rd that iscontained in all three pagesmdashwhen three similarlysituated points are wiggled although that wigglingcan move the mean in arbitrary directions insidethe spine [19]

With these examples and others in mind aformal definition has been developed [24] letM be a set of measures on a metric space KAssume M has a given topology A mean is acontinuous assignmentMrarr closed subsets ofKA measure micro sticks to a closed subset C sube K if everyneighborhood of micro in M contains a nonemptyopen subset consisting of measures whose meansets are contained in C

Stickiness implies that it is possible for themeans of large samples from a distribution on astratified space to lie in a subset of low dimensionwith positive probability (equal to 1 in some casessuch as the tripod) even if the distribution beingsampled is well behaved [2] [4] [19] [24] Thiscontrasts with usual laws of large numbers wheresample means approach the population mean butalmost surely never land on itmdashor on any givensubset of low dimension containing it Thinking interms of central limit theorems whereas in usualcases the limiting distributions have full supportin sticky cases the limiting distributions canhave components supported on low-dimensionalsubsets of the sample space

Examples aside central limit theorems on strat-ified spaces of any generality have yet to beformulated let alone proved Subtle and deepbehavior associated with the boundary betweensticky and nonsticky is still being discovered Inparticular the distinction between positive andnegative curvature seems to be critical for stick-iness One common type of positive curvatureparticularly in statistical problems appears whena flat or positively curved smooth manifold is

November 2015 Notices of the AMS 1181

quotiented by a proper Lie group action Shapespaces (see [27] [28])mdashincluding those applicableto fly wings with constant topology by keepingtrack of the locations of the vertices of the graphmdashhave this form for instance being quotients ofmatrix spaces by actions of rotations scaling orprojective transformations Huckemann [23] hasshown essentially that when sampling from a strat-ified space that is a quotient of this form Freacutechetmeans run away from singularities and hence landalmost surely in the smooth locus On the otherhand every case exhibiting stickiness has negativecurvature (in the sense of Alexandrov curvaturebounded strictly above by 0 see [9]) It remains anopen problem to formulate a condition in terms ofsomething like negative sectional curvature thatallows means to run toward singularities of thespace for appropriate types of distributions on it

Further potential implications for pure mathe-matics emerge when considering how to recovercurvature invariants from asymptotics insteadof the other way around In the smooth casefor local samplesmdashthat is sufficiently nearby themeanmdashaccounting for curvature reduces centrallimit theorems to the Euclidean version Thus ifthe curvature is unknown but properties of the dis-tribution are known then the curvature ought to berecoverable In singular settings attempting sucha recovery could give rise to singular analogues ofsmooth Riemannian curvature invariants

Phylogenetic Trees

Moduli spaces of fly wing modules constitute justone of myriad ways that geometric probability onstratified spaces can arise Among those nothingis special about biology except perhaps that itsdiversity of forms and the nature of their variationlend themselves to geometric data analysis of thissort That said the genesis of stratified statisticscame directly from another evolutionary biologymoduli space

One of the principal aims of systematic andevolutionary biology is to determine relationshipsbetween species Trees representing these kinshipsare reconstructed from biological data such asDNA sequences or morphology Experimentalprocedures generate distributions on the set ofphylogenetic trees in multiple ways For examplethe evolutionary history of a single gene acrossmultiple individuals is represented by a ldquogene treerdquoNatural processes such as incomplete lineagesorting cause gene trees sampled from a set ofindividuals to differ in topology from one anotherand from the evolutionary history of their set ofspeciesmdashthe history of population bifurcationsleading to divergence (see [30] for example)Another crucial example occurs not from the databut from the method of inference the problemof phylogenetic tree reconstruction is intractableenough that probabilistic methods are commonly

used resulting in posterior distributions insteadof a single optimum [36] [25]

Mathematically speaking a phylogenetic tree ona given set of species is a connected metric graphwith no loops whose vertices of degree 1 (ldquoleavesrdquo)are labeled by the species The introduction byBillera Holmes and Vogtmann of an appropriatemoduli space for the problem namely the spaceof phylogenetic trees [7] initiated a surge ofactivity attempting to mine the combinatoricsand geometry of the space to devise statisticalmethods And the combinatorics is formidablefor n species the tree space Tn is a polyhedralstratified space composed of (2nminus 3) Euclideanorthants of dimension nminus2 Despite its complexityTn succumbed the advent of a polynomial timealgorithm for shortest paths in tree space [34] madeit possible to compute Freacutechet means in Tn [1][32] based on probability theory for nonpositivelycurved spaces [38]

The geometric probability on stratified spaces inthe previous two sections was initially developedspecifically to understand the behavior of Freacutechetmeans in the moduli spaceTn The simple examples[19] [24] deal with informative local subsets of TnIn addition to those efforts are under way to provelaws of large numbers and central limit theoremsin the global context of Tn itself [2] [3]

Stickiness in tree space has a concrete mean-ingful biological interpretation (although the juryis out on the extent to which the interpretationreflects reality) Points in strata of lower dimensionrepresent phylogenetic trees with one or more non-binary vertices Biologically these are unresolvedphylogenies one species diverges simultaneouslyinto three new species for example instead of firstsplitting into two new species followed by anotherbinary divergence event from one of the two Setsof phylogenetic trees from biological experimentsoften contain evidence for many or all of thepossible sequences of binary divergence eventsthat resolve a given multiple divergence Stickinessimplies that the mean tree contains an unresolveddivergence whenever there is insufficient strengthof pull toward any one of the resolving binarysequences to support the conclusion it representsThe picture to keep in mind is the tripod whosestickiness we saw earlier it is the tree space T3 onthree species

Conclusion

Spaces of biological forms provide an environmentin which mathematical methods can assign dis-tances between phenotypes The lines of inquiryhere fit into biologist David Houlersquos vision of phe-nomics [21] particularly the genotype-phenotypemap To make a long story short selection actson phenotype whereas descent and modificationact via genotype so it becomes desirable to com-pare phenotypic distance to genotypic distance

1182 Notices of the AMS Volume 62 Number 10

including some working definition of each distancein any given case study In general to grapplewith statistical correlations between genotypes andphenotypes requires a mathematical parametriza-tion for each of those biological concepts Forgenotypes that is likely to involve combinato-rial considerations since the basic quantum ofinformation is discrete Perhaps large-scale con-tinuous analogues or approximations might bemeaningful as they are for statistical mechanicsand that is another potential departure point formathematics On the other hand phenotype isoften continuous in nature what are the locationsof the vertices in a fly wing How do the arcsbetween the vertices bow outward or inward Howdo these characteristics change from wing to wingParametrization in such a context requires thinkingabout spaces of continuous objects the sort ofthinking that mathematics is designed to carryout The examples presented here demonstrate asample of the kinds of abstract structures in puremathematicsmdashalong with unexpected questionsabout themmdashthat biological investigations reveal

References[1] Miroslav Bacaacutek Computing medians and means in

Hadamard space Society for Industrial and AppliedMathematics (SIAM) Journal on Optimization 24 (2014)1542ndash1566

[2] Dennis Barden Huiling Le and Megan OwenCentral limit theorems for Freacutechet means in thespace of phylogenetic trees The Electronic Journalof Probability 8 (2013) no 25 1ndash25

[3] Limiting behaviour of Freacutechet means inthe space of phylogenetic trees preprint 2014arXivmathPR14097602v1

[4] B Basrak Limit theorems for the inductive mean onmetric trees Journal of Applied Probability 47 (2010)1136ndash1149

[5] R Bhattacharya and V Patrangenaru Large sam-ple theory of intrinsic and extrinsic sample means onmanifolds I The Annals of Statistics 31 (2003) no 11ndash29

[6] Large sample theory of intrinsic and extrin-sic sample means on manifolds II The Annals ofStatistics 33 (2005) no 3 1225ndash1259

[7] L Billera S Holmes and K Vogtmann Geometry ofthe space of phylogenetic trees Advances in AppliedMathematics 27 (2001) 733ndash767

[8] Seth S Blair Wing vein patterning in Drosophila andthe analysis of intercellular signaling The Annual Re-view of Cell and Developmental Biology 23 (2007)293ndash319

[9] Martin R Bridson and Andreacute Haefliger Met-ric Spaces of Nonpositive Curvature Grundlehrender Mathematischen Wissenschaften [Fundamen-tal Principles of Mathematical Sciences] vol 319Springer-Verlag Berlin 1999

[10] Francesca Cagliari Barbara Di Fabio andMassimo Ferri One-dimensional reduction of mul-tidimensional persistent homology Proceedings ofthe American Mathematical Society 138 (2010) no 83003ndash3017

[11] Gunnar Carlsson and Vin de Silva Zigzag persis-tence Foundations of Computational Mathematics 10(2010) no 4 367ndash405

[12] Gunnar Carlsson and Afra Zomorodian The the-ory of multidimensional persistence Discrete andComputational Geometry 42 (2009) 71ndash93

[13] James Degnan and Noah Rosenberg Discordance ofspecies trees with their most likely gene trees PublicLibrary of Science (PLoS) Genetics 3 (2006) 762ndash768

[14] Herbert Edelsbrunner David Letscher and AfraZomorodian Topological persistence and simpli-fication Discrete and Computational Geometry 28511ndash533

[15] Patrizio Frosini and Claudia Landi Size functionsand formal series Applicable Algebra in EngineeringCommunication and Computing 12 (2001) no 4 327ndash349

[16] Mark Goresky and Robert MacPherson StratifiedMorse Theory Ergebnisse der Mathematik und ihrerGrenzgebiete (3) [Results in Mathematics and RelatedAreas (3)] vol 14 Springer-Verlag 1988

[17] H Hendriks and Z Landsman Asymptotic behaviourof sample mean location for manifolds Statistics andProbability Letters 26 (1996) 169ndash178

[18] Mean location and sample mean location onmanifolds Asymptotics tests confidence regionsJournal of Multivariate Analysis 67 (1998) 227ndash243

[19] Thomas Hotz Stephan Huckemann Huiling LeJ S Marron Jonathan C Mattingly Ezra MillerJames Nolen Megan Owen Vic Patrangenaru andSean Skwerer Sticky central limit theorems on openbooks Annals of Applied Probability 23 (2013) no 62238ndash2258

[20] David Houle Jason Mezey Paul Galpern and Ash-ley Carter Automated measurement of Drosophilawings BioMed Central (BBMC) Evolutionary Biology 3(2003) 25ndash37

[21] David Houle Numbering the hairs on our heads Theshared challenge and promise of phenomics Proceed-ings of the National Academy of Sciences USA 107(2010) 1793ndash1799

[22] Stephan Huckemann Inference on 3D Procrustesmeans Tree boles growth rank-deficient diffusion ten-sors and perturbation models Scandinavian Journalof Statistics 38 (2011) no 3 424ndash446

[23] On the meaning of mean shape Manifoldstability locus and the two sample test Annals ofthe Institute of Statistical Mathematics 64 (2012)1227ndash1259

[24] Stephan Huckemann Jonathan C MattinglyEzra Miller and James Nolen Sticky centrallimit theorems at isolated hyperbolic planarsingularities Electronic Journal of Probability20 (2015) no 78 1ndash34 doi101214EJPv20-3887arXivmathPR14106879

[25] John P Huelsenbeck and Fredrik Ronquist Mr-Bayes Bayesian inference of phylogenetic treesBioinformatics 17 (2001) 754ndash755

[26] P Jupp Residuals for directional data Journal ofApplied Statistics 15 (1988) no 2 137ndash147

[27] David G Kendall Shape manifolds Procrustean met-rics and complex projective spaces Bulletin of theLondon Mathematical Society 16 (1984) 81ndash121

[28] David G Kendall Dennis Barden Thomas KCarne and Huiling Le Shape and Shape Theory

November 2015 Notices of the AMS 1183

A M E R I C A N M AT H E M AT I C A L S O C I E T Y

Sign up to be a member of the AMS today

wwwamsorgmembership

With your membership you will receive special discounts and free

shipping at any on-site AMS exhibit bookstore

Do you attend JMMHow about any AMS Sectional Meetings

Did you know that your savings from attending these

meetings will practically pay for your AMS membership

Wiley Series in Probability and Statistics John Wiley ampSons Ltd Chichester UK 1999

[29] Kevin Knudson A refinement of multi-dimensionalpersistence Homology Homotopy and Applications10 (2008) no 1 259ndash281

[30] Wayne P Maddison Gene trees in species treesSystematic Biology 46 (1997) 523ndash536

[31] Ezra Miller CohenndashMacaulay quotients of nor-mal semigroup rings via irreducible resolutionsMathematical Research Letters 9 (2002) no 1117ndash128

[32] Ezra Miller Megan Owen and Scott Provan Poly-hedral computational geometry for averaging metricphylogenetic trees Advances in Applied Mathematics15 (2015) 51ndash91

[33] Ezra Miller and Bernd Sturmfels CombinatorialCommutative Algebra Graduate Texts in Mathematicsvol 227 Springer-Verlag 2005

[34] Megan Owen and J S Provan A fast algorithm forcomputing geodesic distance in tree space IEEEACMTrans Comput Biol Bioinform 8 (2011) 2ndash13

[35] Markus J Pflaum Analytic and Geometric Study ofStratified Spaces Lecture Notes in Mathematics vol1768 Springer-Verlag 2001

[36] Bruce Rannala and Ziheng Yang Bayesian phyloge-netic inference using DNA sequences A Markov chainMonte Carlo method Molecular Biology and Evolution14 (1997) 717ndash724

[37] Vanessa Robins Towards computing homology fromfinite approximations Topology Proceedings 24 (1999)no 1 503ndash532

[38] K-T Sturm Probability measures on metric spaces ofnonpositive curvature in Heat Kernels and Analysison Manifolds Graphs and Metric Spaces lecture notesfrom a quarter program on heat kernels random walksand analysis on manifolds and graphs ContemporaryMathematics vol 338 American Mathematical Society2003 pp 357ndash390

[39] K E Weber Selection on wing allometry in Drosophilamelanogaster Genetics 126 (1990) 975ndash989

[40] How small are the smallest selectable domainsof form Genetics 130 (1992) 345ndash353

[41] H Ziezold On expected figures and a strong law oflarge numbers for random elements in quasi-metricspaces in Transactions of the Seventh Prague Con-ference on Information Theory Statistical DecisionFunctions Random Processes and of the Eighth Eu-ropean Meeting of Statisticians (Tech Univ PraguePrague 1974) Vol A Reidel Dordrecht 1977pp 591ndash602

1184 Notices of the AMS Volume 62 Number 10

Page 3: Fruit Flies and Moduli: Interactions between Biology and ... · and, in some cases, our substantial understanding of their geometry and combinatorics, less is known about the probability

bull single-parameter persistence by tracing zigzagsthrough the groups Hi(Xsr ) [11]

bull Hilbert series meaning the dimensions of thevector spaces Hi(Xsr ) disregarding all of thehomomorphisms between them

bull rank invariants which take into account theranks of the homomorphisms but not theirprecise algebraic structure [12] [10]

bull Betti numbers which record discrete homolog-ical invariants of the module [29]

Any discrete invariant subdivides the modulispace into regions where the invariant is constantUnderstanding the nature of these subdivisions isa pragmatic matter for bigraded k[x y]-modulesgiven the fly wing context but (modifications of)some of these new sorts of questions make sensefor any moduli space or indeed any stratified space

1 What metric or combinatorial properties dothese subdivisions possess For exampledo the regions have equal dimension androughly equal size or are there a few bigregions (or only one) of top dimension and abunch of smaller ones

2 What distributions of discrete invariants areexpected from a given (biological) problemMight the discrete invariants be expected todistinguish between the modules producedby applied situations even if the regionsarenrsquot of similar size

3 General geometric statistical question what(natural) measures should be placed on thevalues of a discrete invariant given thegeometry of the moduli spaces

4 Can the continuous variation be captured dis-cretely to desired precision More preciselyis there a family (indexed by n) of sets ofdiscrete invariants such that letting n rarr infinresults in an increasingly fine subdivision

Geometric Probability on Stratified Spaces

As we have seen for fly wings statistical problemswhere the sample objects are more complicatedthan vectors in vector spaces naturally lead tosampling from stratified spaces The goal ofgeometric statistics in this setting is like inordinary linear statistics to identify describesummarize or make inferences about an unknownprobability distribution on the sample space fromwhich the sample points are assumed to bedrawn To that end it is crucial to understand theopposite problem from probability theory given adistribution on the relevant sample space how dosamples from that distribution behave

The simplest summary of a distribution isa pointmdashan average or population meanmdashaboutwhich the distribution is centered Laws of largenumbers assert that means of increasingly largerandom samples from a distribution converge to apopulation mean Statistics requires knowledge of

the expected difference between a sample meanand a population mean Central limit theoremshelp quantify that difference by describing thevariation of sample means around populationmeans In ordinary statistics for example whenthe sample space is the real line the central limittheorem dictates that sample means vary aroundthe population mean according to a distributionthat is in the limit of infinite sample size Gaussian

In Euclidean space basic concepts such as meanexpectation and average coincide and thereforeadmit multiple equivalent characterizations suchas via least squares or arithmetic average Alreadythinking about asymptotics of samples fromsmooth manifoldsmdashlet alone singular spaces suchas the moduli spaces relevant to fly wingsmdashrequiresa radical shift in perspective as compared withsamples from linear spaces because differentcharacterizations lift to different notions in curvedsettings [23] Moreover for many of these notionssuch as a Freacutechet mean defined by least squaresthe minimizer is not unique what is the averageof the north pole and south pole on the sphere Itis the entire equator (This explains the phrase ldquoapopulation meanrdquo in the previous paragraph asopposed to ldquothe population meanrdquo) Nonethelesslaws of large numbers hold [41] [5] and centrallimit theorems exist in various situations [26][17] [18] [6] [22] such as when the data areconcentrated near a Freacutechet mean (Many of thesetheorems were motivated by biology read the titleof [22] for instance) For additional backgroundand references concerning statistics on manifoldssee [23]

Statistics on smooth manifolds relies on approx-imation of the manifold by its tangent space whichis Euclidean Once a metric on the manifold hasbeen specifiedmdasha necessary and often nontrivialstep for statistics because of the need to knowhow far apart sample objects aremdashthe exponen-tial map at a point x takes a neighborhood Uof 0 in the tangent space Tx to a neighborhoodexp(U) around x Ordinary probability in the vectorspace Tx is transformed into geometric probabilityon the smooth sample space via the exponentialmap at x which is close to an isometry when Uis small In particular central limit theorems onsmooth manifolds can be interpreted as describingvariation of sample means around a populationmean by pushing forward the linear setup alongan exponential map

In the singular setting of stratified spaces thereis no general method to compare with or reduceto ordinary linear probability and statistics in thetangent space at a mean The types of samplespaces M intended here are those possessinga topological stratification (see [16] or [35]) anexpression as a disjoint union M = M0 cup M1 cupmiddot middot middot cupMr of strata such that for all d isin 0 rbull the stratum Md is a manifold

1180 Notices of the AMS Volume 62 Number 10

0 micro micro

Figure 3

bull M0 cup middot middot middot cupMd is closed in M andbull for every pair x y isin Md there is a homeomor-

phism M rarr M that takes x to y and takes eachstratum to itself

The third condition ensures that the topology ofMand its stratification behave precisely the same waynear x as they do near y Examples include graphs(or networks) whose strata are vertices and edgespolytopes whose strata are (relatively open) facesand real (semi)algebraic varieties whose strataconsist of classes of singular points The tangentspace Tx to a stratified space M at a point x isa cone over a stratified space of dimension oneless than dim(M) If M is already a cone withapex x then Tx M is as complicated as M itselfcapturing the local structure of a stratified spacenear a point need not simplify the geometry theway it does in the smooth setting

Cases where stratified laws of large numbersand central limit theorems are known occurin specific simple examples where comparisonwith linear spaces is possible such as openbooks [19] (unions of Euclidean half-spaces gluedalong their boundary subspaces) isolated planarhyperbolic singularities (cones where the singularpoint has angle sum gt 2π instead of the icecream case of lt 2π ) [24] and binary trees [4]But in general de novo geometric constructionsare required Such has been the avenue for agood deal of probability theory including laws oflarge numbers that has been established in thegenerality of nonpositively curved spaces (see [38])which are defined as spaces whose triangles withgiven edge lengths are thinner than would beexpected from Euclidean geometry (see [9]) Themetric structures of nonpositively curved spacesinduce a number of simplifying consequencessuch as uniqueness of Freacutechet means which haveplayed important roles in the progress thus far ingeometric probability and statistics on stratifiedspaces Nonetheless the promising interactions ofnonpositive curvature with geometrically stratifiedprobability and statistics remain in their infancy

Sticky Means

The novelty of attempting statistics on stratifiedsample spaces is exemplified by nonclassicalldquostickyrdquo phenomena that can occur at singularitiesIn Euclidean statistics the mean of a finite set ofpoints moves slightly in any desired direction byperturbing the points This intuition extends to

manifolds by linear approximation [26] [17] [6][22] but it can fail even in the simplest singularsample spaces Consider the tripods for instancedepicted in Figure 3 In the center tripod theFreacutechet mean micro of the three points on the legs isthe origin 0 by symmetry But wiggling the threepoints does not move the Freacutechet mean at all oneof the points would have to be moved more thantwice as far from the center as in the final tripodto nudge the mean onto its leg

An open book with three pages is a product ofa tripod with vector space Rd (To get an arbitrarynumber of pages replace the tripod with a graphhaving an arbitrary number of rays emanatingfrom the center point It bears mentioning thatevery topologically stratified space M is locallyhomeomorphic to an open book near any point ona stratum of dimension dim(M)minus1 In other wordsthe tangent cones to points on codimension 1strata are open books Hence this example isuniversal in some sense) In an open book themean sticks to the spinemdashthe copy of Rd that iscontained in all three pagesmdashwhen three similarlysituated points are wiggled although that wigglingcan move the mean in arbitrary directions insidethe spine [19]

With these examples and others in mind aformal definition has been developed [24] letM be a set of measures on a metric space KAssume M has a given topology A mean is acontinuous assignmentMrarr closed subsets ofKA measure micro sticks to a closed subset C sube K if everyneighborhood of micro in M contains a nonemptyopen subset consisting of measures whose meansets are contained in C

Stickiness implies that it is possible for themeans of large samples from a distribution on astratified space to lie in a subset of low dimensionwith positive probability (equal to 1 in some casessuch as the tripod) even if the distribution beingsampled is well behaved [2] [4] [19] [24] Thiscontrasts with usual laws of large numbers wheresample means approach the population mean butalmost surely never land on itmdashor on any givensubset of low dimension containing it Thinking interms of central limit theorems whereas in usualcases the limiting distributions have full supportin sticky cases the limiting distributions canhave components supported on low-dimensionalsubsets of the sample space

Examples aside central limit theorems on strat-ified spaces of any generality have yet to beformulated let alone proved Subtle and deepbehavior associated with the boundary betweensticky and nonsticky is still being discovered Inparticular the distinction between positive andnegative curvature seems to be critical for stick-iness One common type of positive curvatureparticularly in statistical problems appears whena flat or positively curved smooth manifold is

November 2015 Notices of the AMS 1181

quotiented by a proper Lie group action Shapespaces (see [27] [28])mdashincluding those applicableto fly wings with constant topology by keepingtrack of the locations of the vertices of the graphmdashhave this form for instance being quotients ofmatrix spaces by actions of rotations scaling orprojective transformations Huckemann [23] hasshown essentially that when sampling from a strat-ified space that is a quotient of this form Freacutechetmeans run away from singularities and hence landalmost surely in the smooth locus On the otherhand every case exhibiting stickiness has negativecurvature (in the sense of Alexandrov curvaturebounded strictly above by 0 see [9]) It remains anopen problem to formulate a condition in terms ofsomething like negative sectional curvature thatallows means to run toward singularities of thespace for appropriate types of distributions on it

Further potential implications for pure mathe-matics emerge when considering how to recovercurvature invariants from asymptotics insteadof the other way around In the smooth casefor local samplesmdashthat is sufficiently nearby themeanmdashaccounting for curvature reduces centrallimit theorems to the Euclidean version Thus ifthe curvature is unknown but properties of the dis-tribution are known then the curvature ought to berecoverable In singular settings attempting sucha recovery could give rise to singular analogues ofsmooth Riemannian curvature invariants

Phylogenetic Trees

Moduli spaces of fly wing modules constitute justone of myriad ways that geometric probability onstratified spaces can arise Among those nothingis special about biology except perhaps that itsdiversity of forms and the nature of their variationlend themselves to geometric data analysis of thissort That said the genesis of stratified statisticscame directly from another evolutionary biologymoduli space

One of the principal aims of systematic andevolutionary biology is to determine relationshipsbetween species Trees representing these kinshipsare reconstructed from biological data such asDNA sequences or morphology Experimentalprocedures generate distributions on the set ofphylogenetic trees in multiple ways For examplethe evolutionary history of a single gene acrossmultiple individuals is represented by a ldquogene treerdquoNatural processes such as incomplete lineagesorting cause gene trees sampled from a set ofindividuals to differ in topology from one anotherand from the evolutionary history of their set ofspeciesmdashthe history of population bifurcationsleading to divergence (see [30] for example)Another crucial example occurs not from the databut from the method of inference the problemof phylogenetic tree reconstruction is intractableenough that probabilistic methods are commonly

used resulting in posterior distributions insteadof a single optimum [36] [25]

Mathematically speaking a phylogenetic tree ona given set of species is a connected metric graphwith no loops whose vertices of degree 1 (ldquoleavesrdquo)are labeled by the species The introduction byBillera Holmes and Vogtmann of an appropriatemoduli space for the problem namely the spaceof phylogenetic trees [7] initiated a surge ofactivity attempting to mine the combinatoricsand geometry of the space to devise statisticalmethods And the combinatorics is formidablefor n species the tree space Tn is a polyhedralstratified space composed of (2nminus 3) Euclideanorthants of dimension nminus2 Despite its complexityTn succumbed the advent of a polynomial timealgorithm for shortest paths in tree space [34] madeit possible to compute Freacutechet means in Tn [1][32] based on probability theory for nonpositivelycurved spaces [38]

The geometric probability on stratified spaces inthe previous two sections was initially developedspecifically to understand the behavior of Freacutechetmeans in the moduli spaceTn The simple examples[19] [24] deal with informative local subsets of TnIn addition to those efforts are under way to provelaws of large numbers and central limit theoremsin the global context of Tn itself [2] [3]

Stickiness in tree space has a concrete mean-ingful biological interpretation (although the juryis out on the extent to which the interpretationreflects reality) Points in strata of lower dimensionrepresent phylogenetic trees with one or more non-binary vertices Biologically these are unresolvedphylogenies one species diverges simultaneouslyinto three new species for example instead of firstsplitting into two new species followed by anotherbinary divergence event from one of the two Setsof phylogenetic trees from biological experimentsoften contain evidence for many or all of thepossible sequences of binary divergence eventsthat resolve a given multiple divergence Stickinessimplies that the mean tree contains an unresolveddivergence whenever there is insufficient strengthof pull toward any one of the resolving binarysequences to support the conclusion it representsThe picture to keep in mind is the tripod whosestickiness we saw earlier it is the tree space T3 onthree species

Conclusion

Spaces of biological forms provide an environmentin which mathematical methods can assign dis-tances between phenotypes The lines of inquiryhere fit into biologist David Houlersquos vision of phe-nomics [21] particularly the genotype-phenotypemap To make a long story short selection actson phenotype whereas descent and modificationact via genotype so it becomes desirable to com-pare phenotypic distance to genotypic distance

1182 Notices of the AMS Volume 62 Number 10

including some working definition of each distancein any given case study In general to grapplewith statistical correlations between genotypes andphenotypes requires a mathematical parametriza-tion for each of those biological concepts Forgenotypes that is likely to involve combinato-rial considerations since the basic quantum ofinformation is discrete Perhaps large-scale con-tinuous analogues or approximations might bemeaningful as they are for statistical mechanicsand that is another potential departure point formathematics On the other hand phenotype isoften continuous in nature what are the locationsof the vertices in a fly wing How do the arcsbetween the vertices bow outward or inward Howdo these characteristics change from wing to wingParametrization in such a context requires thinkingabout spaces of continuous objects the sort ofthinking that mathematics is designed to carryout The examples presented here demonstrate asample of the kinds of abstract structures in puremathematicsmdashalong with unexpected questionsabout themmdashthat biological investigations reveal

References[1] Miroslav Bacaacutek Computing medians and means in

Hadamard space Society for Industrial and AppliedMathematics (SIAM) Journal on Optimization 24 (2014)1542ndash1566

[2] Dennis Barden Huiling Le and Megan OwenCentral limit theorems for Freacutechet means in thespace of phylogenetic trees The Electronic Journalof Probability 8 (2013) no 25 1ndash25

[3] Limiting behaviour of Freacutechet means inthe space of phylogenetic trees preprint 2014arXivmathPR14097602v1

[4] B Basrak Limit theorems for the inductive mean onmetric trees Journal of Applied Probability 47 (2010)1136ndash1149

[5] R Bhattacharya and V Patrangenaru Large sam-ple theory of intrinsic and extrinsic sample means onmanifolds I The Annals of Statistics 31 (2003) no 11ndash29

[6] Large sample theory of intrinsic and extrin-sic sample means on manifolds II The Annals ofStatistics 33 (2005) no 3 1225ndash1259

[7] L Billera S Holmes and K Vogtmann Geometry ofthe space of phylogenetic trees Advances in AppliedMathematics 27 (2001) 733ndash767

[8] Seth S Blair Wing vein patterning in Drosophila andthe analysis of intercellular signaling The Annual Re-view of Cell and Developmental Biology 23 (2007)293ndash319

[9] Martin R Bridson and Andreacute Haefliger Met-ric Spaces of Nonpositive Curvature Grundlehrender Mathematischen Wissenschaften [Fundamen-tal Principles of Mathematical Sciences] vol 319Springer-Verlag Berlin 1999

[10] Francesca Cagliari Barbara Di Fabio andMassimo Ferri One-dimensional reduction of mul-tidimensional persistent homology Proceedings ofthe American Mathematical Society 138 (2010) no 83003ndash3017

[11] Gunnar Carlsson and Vin de Silva Zigzag persis-tence Foundations of Computational Mathematics 10(2010) no 4 367ndash405

[12] Gunnar Carlsson and Afra Zomorodian The the-ory of multidimensional persistence Discrete andComputational Geometry 42 (2009) 71ndash93

[13] James Degnan and Noah Rosenberg Discordance ofspecies trees with their most likely gene trees PublicLibrary of Science (PLoS) Genetics 3 (2006) 762ndash768

[14] Herbert Edelsbrunner David Letscher and AfraZomorodian Topological persistence and simpli-fication Discrete and Computational Geometry 28511ndash533

[15] Patrizio Frosini and Claudia Landi Size functionsand formal series Applicable Algebra in EngineeringCommunication and Computing 12 (2001) no 4 327ndash349

[16] Mark Goresky and Robert MacPherson StratifiedMorse Theory Ergebnisse der Mathematik und ihrerGrenzgebiete (3) [Results in Mathematics and RelatedAreas (3)] vol 14 Springer-Verlag 1988

[17] H Hendriks and Z Landsman Asymptotic behaviourof sample mean location for manifolds Statistics andProbability Letters 26 (1996) 169ndash178

[18] Mean location and sample mean location onmanifolds Asymptotics tests confidence regionsJournal of Multivariate Analysis 67 (1998) 227ndash243

[19] Thomas Hotz Stephan Huckemann Huiling LeJ S Marron Jonathan C Mattingly Ezra MillerJames Nolen Megan Owen Vic Patrangenaru andSean Skwerer Sticky central limit theorems on openbooks Annals of Applied Probability 23 (2013) no 62238ndash2258

[20] David Houle Jason Mezey Paul Galpern and Ash-ley Carter Automated measurement of Drosophilawings BioMed Central (BBMC) Evolutionary Biology 3(2003) 25ndash37

[21] David Houle Numbering the hairs on our heads Theshared challenge and promise of phenomics Proceed-ings of the National Academy of Sciences USA 107(2010) 1793ndash1799

[22] Stephan Huckemann Inference on 3D Procrustesmeans Tree boles growth rank-deficient diffusion ten-sors and perturbation models Scandinavian Journalof Statistics 38 (2011) no 3 424ndash446

[23] On the meaning of mean shape Manifoldstability locus and the two sample test Annals ofthe Institute of Statistical Mathematics 64 (2012)1227ndash1259

[24] Stephan Huckemann Jonathan C MattinglyEzra Miller and James Nolen Sticky centrallimit theorems at isolated hyperbolic planarsingularities Electronic Journal of Probability20 (2015) no 78 1ndash34 doi101214EJPv20-3887arXivmathPR14106879

[25] John P Huelsenbeck and Fredrik Ronquist Mr-Bayes Bayesian inference of phylogenetic treesBioinformatics 17 (2001) 754ndash755

[26] P Jupp Residuals for directional data Journal ofApplied Statistics 15 (1988) no 2 137ndash147

[27] David G Kendall Shape manifolds Procrustean met-rics and complex projective spaces Bulletin of theLondon Mathematical Society 16 (1984) 81ndash121

[28] David G Kendall Dennis Barden Thomas KCarne and Huiling Le Shape and Shape Theory

November 2015 Notices of the AMS 1183

A M E R I C A N M AT H E M AT I C A L S O C I E T Y

Sign up to be a member of the AMS today

wwwamsorgmembership

With your membership you will receive special discounts and free

shipping at any on-site AMS exhibit bookstore

Do you attend JMMHow about any AMS Sectional Meetings

Did you know that your savings from attending these

meetings will practically pay for your AMS membership

Wiley Series in Probability and Statistics John Wiley ampSons Ltd Chichester UK 1999

[29] Kevin Knudson A refinement of multi-dimensionalpersistence Homology Homotopy and Applications10 (2008) no 1 259ndash281

[30] Wayne P Maddison Gene trees in species treesSystematic Biology 46 (1997) 523ndash536

[31] Ezra Miller CohenndashMacaulay quotients of nor-mal semigroup rings via irreducible resolutionsMathematical Research Letters 9 (2002) no 1117ndash128

[32] Ezra Miller Megan Owen and Scott Provan Poly-hedral computational geometry for averaging metricphylogenetic trees Advances in Applied Mathematics15 (2015) 51ndash91

[33] Ezra Miller and Bernd Sturmfels CombinatorialCommutative Algebra Graduate Texts in Mathematicsvol 227 Springer-Verlag 2005

[34] Megan Owen and J S Provan A fast algorithm forcomputing geodesic distance in tree space IEEEACMTrans Comput Biol Bioinform 8 (2011) 2ndash13

[35] Markus J Pflaum Analytic and Geometric Study ofStratified Spaces Lecture Notes in Mathematics vol1768 Springer-Verlag 2001

[36] Bruce Rannala and Ziheng Yang Bayesian phyloge-netic inference using DNA sequences A Markov chainMonte Carlo method Molecular Biology and Evolution14 (1997) 717ndash724

[37] Vanessa Robins Towards computing homology fromfinite approximations Topology Proceedings 24 (1999)no 1 503ndash532

[38] K-T Sturm Probability measures on metric spaces ofnonpositive curvature in Heat Kernels and Analysison Manifolds Graphs and Metric Spaces lecture notesfrom a quarter program on heat kernels random walksand analysis on manifolds and graphs ContemporaryMathematics vol 338 American Mathematical Society2003 pp 357ndash390

[39] K E Weber Selection on wing allometry in Drosophilamelanogaster Genetics 126 (1990) 975ndash989

[40] How small are the smallest selectable domainsof form Genetics 130 (1992) 345ndash353

[41] H Ziezold On expected figures and a strong law oflarge numbers for random elements in quasi-metricspaces in Transactions of the Seventh Prague Con-ference on Information Theory Statistical DecisionFunctions Random Processes and of the Eighth Eu-ropean Meeting of Statisticians (Tech Univ PraguePrague 1974) Vol A Reidel Dordrecht 1977pp 591ndash602

1184 Notices of the AMS Volume 62 Number 10

Page 4: Fruit Flies and Moduli: Interactions between Biology and ... · and, in some cases, our substantial understanding of their geometry and combinatorics, less is known about the probability

0 micro micro

Figure 3

bull M0 cup middot middot middot cupMd is closed in M andbull for every pair x y isin Md there is a homeomor-

phism M rarr M that takes x to y and takes eachstratum to itself

The third condition ensures that the topology ofMand its stratification behave precisely the same waynear x as they do near y Examples include graphs(or networks) whose strata are vertices and edgespolytopes whose strata are (relatively open) facesand real (semi)algebraic varieties whose strataconsist of classes of singular points The tangentspace Tx to a stratified space M at a point x isa cone over a stratified space of dimension oneless than dim(M) If M is already a cone withapex x then Tx M is as complicated as M itselfcapturing the local structure of a stratified spacenear a point need not simplify the geometry theway it does in the smooth setting

Cases where stratified laws of large numbersand central limit theorems are known occurin specific simple examples where comparisonwith linear spaces is possible such as openbooks [19] (unions of Euclidean half-spaces gluedalong their boundary subspaces) isolated planarhyperbolic singularities (cones where the singularpoint has angle sum gt 2π instead of the icecream case of lt 2π ) [24] and binary trees [4]But in general de novo geometric constructionsare required Such has been the avenue for agood deal of probability theory including laws oflarge numbers that has been established in thegenerality of nonpositively curved spaces (see [38])which are defined as spaces whose triangles withgiven edge lengths are thinner than would beexpected from Euclidean geometry (see [9]) Themetric structures of nonpositively curved spacesinduce a number of simplifying consequencessuch as uniqueness of Freacutechet means which haveplayed important roles in the progress thus far ingeometric probability and statistics on stratifiedspaces Nonetheless the promising interactions ofnonpositive curvature with geometrically stratifiedprobability and statistics remain in their infancy

Sticky Means

The novelty of attempting statistics on stratifiedsample spaces is exemplified by nonclassicalldquostickyrdquo phenomena that can occur at singularitiesIn Euclidean statistics the mean of a finite set ofpoints moves slightly in any desired direction byperturbing the points This intuition extends to

manifolds by linear approximation [26] [17] [6][22] but it can fail even in the simplest singularsample spaces Consider the tripods for instancedepicted in Figure 3 In the center tripod theFreacutechet mean micro of the three points on the legs isthe origin 0 by symmetry But wiggling the threepoints does not move the Freacutechet mean at all oneof the points would have to be moved more thantwice as far from the center as in the final tripodto nudge the mean onto its leg

An open book with three pages is a product ofa tripod with vector space Rd (To get an arbitrarynumber of pages replace the tripod with a graphhaving an arbitrary number of rays emanatingfrom the center point It bears mentioning thatevery topologically stratified space M is locallyhomeomorphic to an open book near any point ona stratum of dimension dim(M)minus1 In other wordsthe tangent cones to points on codimension 1strata are open books Hence this example isuniversal in some sense) In an open book themean sticks to the spinemdashthe copy of Rd that iscontained in all three pagesmdashwhen three similarlysituated points are wiggled although that wigglingcan move the mean in arbitrary directions insidethe spine [19]

With these examples and others in mind aformal definition has been developed [24] letM be a set of measures on a metric space KAssume M has a given topology A mean is acontinuous assignmentMrarr closed subsets ofKA measure micro sticks to a closed subset C sube K if everyneighborhood of micro in M contains a nonemptyopen subset consisting of measures whose meansets are contained in C

Stickiness implies that it is possible for themeans of large samples from a distribution on astratified space to lie in a subset of low dimensionwith positive probability (equal to 1 in some casessuch as the tripod) even if the distribution beingsampled is well behaved [2] [4] [19] [24] Thiscontrasts with usual laws of large numbers wheresample means approach the population mean butalmost surely never land on itmdashor on any givensubset of low dimension containing it Thinking interms of central limit theorems whereas in usualcases the limiting distributions have full supportin sticky cases the limiting distributions canhave components supported on low-dimensionalsubsets of the sample space

Examples aside central limit theorems on strat-ified spaces of any generality have yet to beformulated let alone proved Subtle and deepbehavior associated with the boundary betweensticky and nonsticky is still being discovered Inparticular the distinction between positive andnegative curvature seems to be critical for stick-iness One common type of positive curvatureparticularly in statistical problems appears whena flat or positively curved smooth manifold is

November 2015 Notices of the AMS 1181

quotiented by a proper Lie group action Shapespaces (see [27] [28])mdashincluding those applicableto fly wings with constant topology by keepingtrack of the locations of the vertices of the graphmdashhave this form for instance being quotients ofmatrix spaces by actions of rotations scaling orprojective transformations Huckemann [23] hasshown essentially that when sampling from a strat-ified space that is a quotient of this form Freacutechetmeans run away from singularities and hence landalmost surely in the smooth locus On the otherhand every case exhibiting stickiness has negativecurvature (in the sense of Alexandrov curvaturebounded strictly above by 0 see [9]) It remains anopen problem to formulate a condition in terms ofsomething like negative sectional curvature thatallows means to run toward singularities of thespace for appropriate types of distributions on it

Further potential implications for pure mathe-matics emerge when considering how to recovercurvature invariants from asymptotics insteadof the other way around In the smooth casefor local samplesmdashthat is sufficiently nearby themeanmdashaccounting for curvature reduces centrallimit theorems to the Euclidean version Thus ifthe curvature is unknown but properties of the dis-tribution are known then the curvature ought to berecoverable In singular settings attempting sucha recovery could give rise to singular analogues ofsmooth Riemannian curvature invariants

Phylogenetic Trees

Moduli spaces of fly wing modules constitute justone of myriad ways that geometric probability onstratified spaces can arise Among those nothingis special about biology except perhaps that itsdiversity of forms and the nature of their variationlend themselves to geometric data analysis of thissort That said the genesis of stratified statisticscame directly from another evolutionary biologymoduli space

One of the principal aims of systematic andevolutionary biology is to determine relationshipsbetween species Trees representing these kinshipsare reconstructed from biological data such asDNA sequences or morphology Experimentalprocedures generate distributions on the set ofphylogenetic trees in multiple ways For examplethe evolutionary history of a single gene acrossmultiple individuals is represented by a ldquogene treerdquoNatural processes such as incomplete lineagesorting cause gene trees sampled from a set ofindividuals to differ in topology from one anotherand from the evolutionary history of their set ofspeciesmdashthe history of population bifurcationsleading to divergence (see [30] for example)Another crucial example occurs not from the databut from the method of inference the problemof phylogenetic tree reconstruction is intractableenough that probabilistic methods are commonly

used resulting in posterior distributions insteadof a single optimum [36] [25]

Mathematically speaking a phylogenetic tree ona given set of species is a connected metric graphwith no loops whose vertices of degree 1 (ldquoleavesrdquo)are labeled by the species The introduction byBillera Holmes and Vogtmann of an appropriatemoduli space for the problem namely the spaceof phylogenetic trees [7] initiated a surge ofactivity attempting to mine the combinatoricsand geometry of the space to devise statisticalmethods And the combinatorics is formidablefor n species the tree space Tn is a polyhedralstratified space composed of (2nminus 3) Euclideanorthants of dimension nminus2 Despite its complexityTn succumbed the advent of a polynomial timealgorithm for shortest paths in tree space [34] madeit possible to compute Freacutechet means in Tn [1][32] based on probability theory for nonpositivelycurved spaces [38]

The geometric probability on stratified spaces inthe previous two sections was initially developedspecifically to understand the behavior of Freacutechetmeans in the moduli spaceTn The simple examples[19] [24] deal with informative local subsets of TnIn addition to those efforts are under way to provelaws of large numbers and central limit theoremsin the global context of Tn itself [2] [3]

Stickiness in tree space has a concrete mean-ingful biological interpretation (although the juryis out on the extent to which the interpretationreflects reality) Points in strata of lower dimensionrepresent phylogenetic trees with one or more non-binary vertices Biologically these are unresolvedphylogenies one species diverges simultaneouslyinto three new species for example instead of firstsplitting into two new species followed by anotherbinary divergence event from one of the two Setsof phylogenetic trees from biological experimentsoften contain evidence for many or all of thepossible sequences of binary divergence eventsthat resolve a given multiple divergence Stickinessimplies that the mean tree contains an unresolveddivergence whenever there is insufficient strengthof pull toward any one of the resolving binarysequences to support the conclusion it representsThe picture to keep in mind is the tripod whosestickiness we saw earlier it is the tree space T3 onthree species

Conclusion

Spaces of biological forms provide an environmentin which mathematical methods can assign dis-tances between phenotypes The lines of inquiryhere fit into biologist David Houlersquos vision of phe-nomics [21] particularly the genotype-phenotypemap To make a long story short selection actson phenotype whereas descent and modificationact via genotype so it becomes desirable to com-pare phenotypic distance to genotypic distance

1182 Notices of the AMS Volume 62 Number 10

including some working definition of each distancein any given case study In general to grapplewith statistical correlations between genotypes andphenotypes requires a mathematical parametriza-tion for each of those biological concepts Forgenotypes that is likely to involve combinato-rial considerations since the basic quantum ofinformation is discrete Perhaps large-scale con-tinuous analogues or approximations might bemeaningful as they are for statistical mechanicsand that is another potential departure point formathematics On the other hand phenotype isoften continuous in nature what are the locationsof the vertices in a fly wing How do the arcsbetween the vertices bow outward or inward Howdo these characteristics change from wing to wingParametrization in such a context requires thinkingabout spaces of continuous objects the sort ofthinking that mathematics is designed to carryout The examples presented here demonstrate asample of the kinds of abstract structures in puremathematicsmdashalong with unexpected questionsabout themmdashthat biological investigations reveal

References[1] Miroslav Bacaacutek Computing medians and means in

Hadamard space Society for Industrial and AppliedMathematics (SIAM) Journal on Optimization 24 (2014)1542ndash1566

[2] Dennis Barden Huiling Le and Megan OwenCentral limit theorems for Freacutechet means in thespace of phylogenetic trees The Electronic Journalof Probability 8 (2013) no 25 1ndash25

[3] Limiting behaviour of Freacutechet means inthe space of phylogenetic trees preprint 2014arXivmathPR14097602v1

[4] B Basrak Limit theorems for the inductive mean onmetric trees Journal of Applied Probability 47 (2010)1136ndash1149

[5] R Bhattacharya and V Patrangenaru Large sam-ple theory of intrinsic and extrinsic sample means onmanifolds I The Annals of Statistics 31 (2003) no 11ndash29

[6] Large sample theory of intrinsic and extrin-sic sample means on manifolds II The Annals ofStatistics 33 (2005) no 3 1225ndash1259

[7] L Billera S Holmes and K Vogtmann Geometry ofthe space of phylogenetic trees Advances in AppliedMathematics 27 (2001) 733ndash767

[8] Seth S Blair Wing vein patterning in Drosophila andthe analysis of intercellular signaling The Annual Re-view of Cell and Developmental Biology 23 (2007)293ndash319

[9] Martin R Bridson and Andreacute Haefliger Met-ric Spaces of Nonpositive Curvature Grundlehrender Mathematischen Wissenschaften [Fundamen-tal Principles of Mathematical Sciences] vol 319Springer-Verlag Berlin 1999

[10] Francesca Cagliari Barbara Di Fabio andMassimo Ferri One-dimensional reduction of mul-tidimensional persistent homology Proceedings ofthe American Mathematical Society 138 (2010) no 83003ndash3017

[11] Gunnar Carlsson and Vin de Silva Zigzag persis-tence Foundations of Computational Mathematics 10(2010) no 4 367ndash405

[12] Gunnar Carlsson and Afra Zomorodian The the-ory of multidimensional persistence Discrete andComputational Geometry 42 (2009) 71ndash93

[13] James Degnan and Noah Rosenberg Discordance ofspecies trees with their most likely gene trees PublicLibrary of Science (PLoS) Genetics 3 (2006) 762ndash768

[14] Herbert Edelsbrunner David Letscher and AfraZomorodian Topological persistence and simpli-fication Discrete and Computational Geometry 28511ndash533

[15] Patrizio Frosini and Claudia Landi Size functionsand formal series Applicable Algebra in EngineeringCommunication and Computing 12 (2001) no 4 327ndash349

[16] Mark Goresky and Robert MacPherson StratifiedMorse Theory Ergebnisse der Mathematik und ihrerGrenzgebiete (3) [Results in Mathematics and RelatedAreas (3)] vol 14 Springer-Verlag 1988

[17] H Hendriks and Z Landsman Asymptotic behaviourof sample mean location for manifolds Statistics andProbability Letters 26 (1996) 169ndash178

[18] Mean location and sample mean location onmanifolds Asymptotics tests confidence regionsJournal of Multivariate Analysis 67 (1998) 227ndash243

[19] Thomas Hotz Stephan Huckemann Huiling LeJ S Marron Jonathan C Mattingly Ezra MillerJames Nolen Megan Owen Vic Patrangenaru andSean Skwerer Sticky central limit theorems on openbooks Annals of Applied Probability 23 (2013) no 62238ndash2258

[20] David Houle Jason Mezey Paul Galpern and Ash-ley Carter Automated measurement of Drosophilawings BioMed Central (BBMC) Evolutionary Biology 3(2003) 25ndash37

[21] David Houle Numbering the hairs on our heads Theshared challenge and promise of phenomics Proceed-ings of the National Academy of Sciences USA 107(2010) 1793ndash1799

[22] Stephan Huckemann Inference on 3D Procrustesmeans Tree boles growth rank-deficient diffusion ten-sors and perturbation models Scandinavian Journalof Statistics 38 (2011) no 3 424ndash446

[23] On the meaning of mean shape Manifoldstability locus and the two sample test Annals ofthe Institute of Statistical Mathematics 64 (2012)1227ndash1259

[24] Stephan Huckemann Jonathan C MattinglyEzra Miller and James Nolen Sticky centrallimit theorems at isolated hyperbolic planarsingularities Electronic Journal of Probability20 (2015) no 78 1ndash34 doi101214EJPv20-3887arXivmathPR14106879

[25] John P Huelsenbeck and Fredrik Ronquist Mr-Bayes Bayesian inference of phylogenetic treesBioinformatics 17 (2001) 754ndash755

[26] P Jupp Residuals for directional data Journal ofApplied Statistics 15 (1988) no 2 137ndash147

[27] David G Kendall Shape manifolds Procrustean met-rics and complex projective spaces Bulletin of theLondon Mathematical Society 16 (1984) 81ndash121

[28] David G Kendall Dennis Barden Thomas KCarne and Huiling Le Shape and Shape Theory

November 2015 Notices of the AMS 1183

A M E R I C A N M AT H E M AT I C A L S O C I E T Y

Sign up to be a member of the AMS today

wwwamsorgmembership

With your membership you will receive special discounts and free

shipping at any on-site AMS exhibit bookstore

Do you attend JMMHow about any AMS Sectional Meetings

Did you know that your savings from attending these

meetings will practically pay for your AMS membership

Wiley Series in Probability and Statistics John Wiley ampSons Ltd Chichester UK 1999

[29] Kevin Knudson A refinement of multi-dimensionalpersistence Homology Homotopy and Applications10 (2008) no 1 259ndash281

[30] Wayne P Maddison Gene trees in species treesSystematic Biology 46 (1997) 523ndash536

[31] Ezra Miller CohenndashMacaulay quotients of nor-mal semigroup rings via irreducible resolutionsMathematical Research Letters 9 (2002) no 1117ndash128

[32] Ezra Miller Megan Owen and Scott Provan Poly-hedral computational geometry for averaging metricphylogenetic trees Advances in Applied Mathematics15 (2015) 51ndash91

[33] Ezra Miller and Bernd Sturmfels CombinatorialCommutative Algebra Graduate Texts in Mathematicsvol 227 Springer-Verlag 2005

[34] Megan Owen and J S Provan A fast algorithm forcomputing geodesic distance in tree space IEEEACMTrans Comput Biol Bioinform 8 (2011) 2ndash13

[35] Markus J Pflaum Analytic and Geometric Study ofStratified Spaces Lecture Notes in Mathematics vol1768 Springer-Verlag 2001

[36] Bruce Rannala and Ziheng Yang Bayesian phyloge-netic inference using DNA sequences A Markov chainMonte Carlo method Molecular Biology and Evolution14 (1997) 717ndash724

[37] Vanessa Robins Towards computing homology fromfinite approximations Topology Proceedings 24 (1999)no 1 503ndash532

[38] K-T Sturm Probability measures on metric spaces ofnonpositive curvature in Heat Kernels and Analysison Manifolds Graphs and Metric Spaces lecture notesfrom a quarter program on heat kernels random walksand analysis on manifolds and graphs ContemporaryMathematics vol 338 American Mathematical Society2003 pp 357ndash390

[39] K E Weber Selection on wing allometry in Drosophilamelanogaster Genetics 126 (1990) 975ndash989

[40] How small are the smallest selectable domainsof form Genetics 130 (1992) 345ndash353

[41] H Ziezold On expected figures and a strong law oflarge numbers for random elements in quasi-metricspaces in Transactions of the Seventh Prague Con-ference on Information Theory Statistical DecisionFunctions Random Processes and of the Eighth Eu-ropean Meeting of Statisticians (Tech Univ PraguePrague 1974) Vol A Reidel Dordrecht 1977pp 591ndash602

1184 Notices of the AMS Volume 62 Number 10

Page 5: Fruit Flies and Moduli: Interactions between Biology and ... · and, in some cases, our substantial understanding of their geometry and combinatorics, less is known about the probability

quotiented by a proper Lie group action Shapespaces (see [27] [28])mdashincluding those applicableto fly wings with constant topology by keepingtrack of the locations of the vertices of the graphmdashhave this form for instance being quotients ofmatrix spaces by actions of rotations scaling orprojective transformations Huckemann [23] hasshown essentially that when sampling from a strat-ified space that is a quotient of this form Freacutechetmeans run away from singularities and hence landalmost surely in the smooth locus On the otherhand every case exhibiting stickiness has negativecurvature (in the sense of Alexandrov curvaturebounded strictly above by 0 see [9]) It remains anopen problem to formulate a condition in terms ofsomething like negative sectional curvature thatallows means to run toward singularities of thespace for appropriate types of distributions on it

Further potential implications for pure mathe-matics emerge when considering how to recovercurvature invariants from asymptotics insteadof the other way around In the smooth casefor local samplesmdashthat is sufficiently nearby themeanmdashaccounting for curvature reduces centrallimit theorems to the Euclidean version Thus ifthe curvature is unknown but properties of the dis-tribution are known then the curvature ought to berecoverable In singular settings attempting sucha recovery could give rise to singular analogues ofsmooth Riemannian curvature invariants

Phylogenetic Trees

Moduli spaces of fly wing modules constitute justone of myriad ways that geometric probability onstratified spaces can arise Among those nothingis special about biology except perhaps that itsdiversity of forms and the nature of their variationlend themselves to geometric data analysis of thissort That said the genesis of stratified statisticscame directly from another evolutionary biologymoduli space

One of the principal aims of systematic andevolutionary biology is to determine relationshipsbetween species Trees representing these kinshipsare reconstructed from biological data such asDNA sequences or morphology Experimentalprocedures generate distributions on the set ofphylogenetic trees in multiple ways For examplethe evolutionary history of a single gene acrossmultiple individuals is represented by a ldquogene treerdquoNatural processes such as incomplete lineagesorting cause gene trees sampled from a set ofindividuals to differ in topology from one anotherand from the evolutionary history of their set ofspeciesmdashthe history of population bifurcationsleading to divergence (see [30] for example)Another crucial example occurs not from the databut from the method of inference the problemof phylogenetic tree reconstruction is intractableenough that probabilistic methods are commonly

used resulting in posterior distributions insteadof a single optimum [36] [25]

Mathematically speaking a phylogenetic tree ona given set of species is a connected metric graphwith no loops whose vertices of degree 1 (ldquoleavesrdquo)are labeled by the species The introduction byBillera Holmes and Vogtmann of an appropriatemoduli space for the problem namely the spaceof phylogenetic trees [7] initiated a surge ofactivity attempting to mine the combinatoricsand geometry of the space to devise statisticalmethods And the combinatorics is formidablefor n species the tree space Tn is a polyhedralstratified space composed of (2nminus 3) Euclideanorthants of dimension nminus2 Despite its complexityTn succumbed the advent of a polynomial timealgorithm for shortest paths in tree space [34] madeit possible to compute Freacutechet means in Tn [1][32] based on probability theory for nonpositivelycurved spaces [38]

The geometric probability on stratified spaces inthe previous two sections was initially developedspecifically to understand the behavior of Freacutechetmeans in the moduli spaceTn The simple examples[19] [24] deal with informative local subsets of TnIn addition to those efforts are under way to provelaws of large numbers and central limit theoremsin the global context of Tn itself [2] [3]

Stickiness in tree space has a concrete mean-ingful biological interpretation (although the juryis out on the extent to which the interpretationreflects reality) Points in strata of lower dimensionrepresent phylogenetic trees with one or more non-binary vertices Biologically these are unresolvedphylogenies one species diverges simultaneouslyinto three new species for example instead of firstsplitting into two new species followed by anotherbinary divergence event from one of the two Setsof phylogenetic trees from biological experimentsoften contain evidence for many or all of thepossible sequences of binary divergence eventsthat resolve a given multiple divergence Stickinessimplies that the mean tree contains an unresolveddivergence whenever there is insufficient strengthof pull toward any one of the resolving binarysequences to support the conclusion it representsThe picture to keep in mind is the tripod whosestickiness we saw earlier it is the tree space T3 onthree species

Conclusion

Spaces of biological forms provide an environmentin which mathematical methods can assign dis-tances between phenotypes The lines of inquiryhere fit into biologist David Houlersquos vision of phe-nomics [21] particularly the genotype-phenotypemap To make a long story short selection actson phenotype whereas descent and modificationact via genotype so it becomes desirable to com-pare phenotypic distance to genotypic distance

1182 Notices of the AMS Volume 62 Number 10

including some working definition of each distancein any given case study In general to grapplewith statistical correlations between genotypes andphenotypes requires a mathematical parametriza-tion for each of those biological concepts Forgenotypes that is likely to involve combinato-rial considerations since the basic quantum ofinformation is discrete Perhaps large-scale con-tinuous analogues or approximations might bemeaningful as they are for statistical mechanicsand that is another potential departure point formathematics On the other hand phenotype isoften continuous in nature what are the locationsof the vertices in a fly wing How do the arcsbetween the vertices bow outward or inward Howdo these characteristics change from wing to wingParametrization in such a context requires thinkingabout spaces of continuous objects the sort ofthinking that mathematics is designed to carryout The examples presented here demonstrate asample of the kinds of abstract structures in puremathematicsmdashalong with unexpected questionsabout themmdashthat biological investigations reveal

References[1] Miroslav Bacaacutek Computing medians and means in

Hadamard space Society for Industrial and AppliedMathematics (SIAM) Journal on Optimization 24 (2014)1542ndash1566

[2] Dennis Barden Huiling Le and Megan OwenCentral limit theorems for Freacutechet means in thespace of phylogenetic trees The Electronic Journalof Probability 8 (2013) no 25 1ndash25

[3] Limiting behaviour of Freacutechet means inthe space of phylogenetic trees preprint 2014arXivmathPR14097602v1

[4] B Basrak Limit theorems for the inductive mean onmetric trees Journal of Applied Probability 47 (2010)1136ndash1149

[5] R Bhattacharya and V Patrangenaru Large sam-ple theory of intrinsic and extrinsic sample means onmanifolds I The Annals of Statistics 31 (2003) no 11ndash29

[6] Large sample theory of intrinsic and extrin-sic sample means on manifolds II The Annals ofStatistics 33 (2005) no 3 1225ndash1259

[7] L Billera S Holmes and K Vogtmann Geometry ofthe space of phylogenetic trees Advances in AppliedMathematics 27 (2001) 733ndash767

[8] Seth S Blair Wing vein patterning in Drosophila andthe analysis of intercellular signaling The Annual Re-view of Cell and Developmental Biology 23 (2007)293ndash319

[9] Martin R Bridson and Andreacute Haefliger Met-ric Spaces of Nonpositive Curvature Grundlehrender Mathematischen Wissenschaften [Fundamen-tal Principles of Mathematical Sciences] vol 319Springer-Verlag Berlin 1999

[10] Francesca Cagliari Barbara Di Fabio andMassimo Ferri One-dimensional reduction of mul-tidimensional persistent homology Proceedings ofthe American Mathematical Society 138 (2010) no 83003ndash3017

[11] Gunnar Carlsson and Vin de Silva Zigzag persis-tence Foundations of Computational Mathematics 10(2010) no 4 367ndash405

[12] Gunnar Carlsson and Afra Zomorodian The the-ory of multidimensional persistence Discrete andComputational Geometry 42 (2009) 71ndash93

[13] James Degnan and Noah Rosenberg Discordance ofspecies trees with their most likely gene trees PublicLibrary of Science (PLoS) Genetics 3 (2006) 762ndash768

[14] Herbert Edelsbrunner David Letscher and AfraZomorodian Topological persistence and simpli-fication Discrete and Computational Geometry 28511ndash533

[15] Patrizio Frosini and Claudia Landi Size functionsand formal series Applicable Algebra in EngineeringCommunication and Computing 12 (2001) no 4 327ndash349

[16] Mark Goresky and Robert MacPherson StratifiedMorse Theory Ergebnisse der Mathematik und ihrerGrenzgebiete (3) [Results in Mathematics and RelatedAreas (3)] vol 14 Springer-Verlag 1988

[17] H Hendriks and Z Landsman Asymptotic behaviourof sample mean location for manifolds Statistics andProbability Letters 26 (1996) 169ndash178

[18] Mean location and sample mean location onmanifolds Asymptotics tests confidence regionsJournal of Multivariate Analysis 67 (1998) 227ndash243

[19] Thomas Hotz Stephan Huckemann Huiling LeJ S Marron Jonathan C Mattingly Ezra MillerJames Nolen Megan Owen Vic Patrangenaru andSean Skwerer Sticky central limit theorems on openbooks Annals of Applied Probability 23 (2013) no 62238ndash2258

[20] David Houle Jason Mezey Paul Galpern and Ash-ley Carter Automated measurement of Drosophilawings BioMed Central (BBMC) Evolutionary Biology 3(2003) 25ndash37

[21] David Houle Numbering the hairs on our heads Theshared challenge and promise of phenomics Proceed-ings of the National Academy of Sciences USA 107(2010) 1793ndash1799

[22] Stephan Huckemann Inference on 3D Procrustesmeans Tree boles growth rank-deficient diffusion ten-sors and perturbation models Scandinavian Journalof Statistics 38 (2011) no 3 424ndash446

[23] On the meaning of mean shape Manifoldstability locus and the two sample test Annals ofthe Institute of Statistical Mathematics 64 (2012)1227ndash1259

[24] Stephan Huckemann Jonathan C MattinglyEzra Miller and James Nolen Sticky centrallimit theorems at isolated hyperbolic planarsingularities Electronic Journal of Probability20 (2015) no 78 1ndash34 doi101214EJPv20-3887arXivmathPR14106879

[25] John P Huelsenbeck and Fredrik Ronquist Mr-Bayes Bayesian inference of phylogenetic treesBioinformatics 17 (2001) 754ndash755

[26] P Jupp Residuals for directional data Journal ofApplied Statistics 15 (1988) no 2 137ndash147

[27] David G Kendall Shape manifolds Procrustean met-rics and complex projective spaces Bulletin of theLondon Mathematical Society 16 (1984) 81ndash121

[28] David G Kendall Dennis Barden Thomas KCarne and Huiling Le Shape and Shape Theory

November 2015 Notices of the AMS 1183

A M E R I C A N M AT H E M AT I C A L S O C I E T Y

Sign up to be a member of the AMS today

wwwamsorgmembership

With your membership you will receive special discounts and free

shipping at any on-site AMS exhibit bookstore

Do you attend JMMHow about any AMS Sectional Meetings

Did you know that your savings from attending these

meetings will practically pay for your AMS membership

Wiley Series in Probability and Statistics John Wiley ampSons Ltd Chichester UK 1999

[29] Kevin Knudson A refinement of multi-dimensionalpersistence Homology Homotopy and Applications10 (2008) no 1 259ndash281

[30] Wayne P Maddison Gene trees in species treesSystematic Biology 46 (1997) 523ndash536

[31] Ezra Miller CohenndashMacaulay quotients of nor-mal semigroup rings via irreducible resolutionsMathematical Research Letters 9 (2002) no 1117ndash128

[32] Ezra Miller Megan Owen and Scott Provan Poly-hedral computational geometry for averaging metricphylogenetic trees Advances in Applied Mathematics15 (2015) 51ndash91

[33] Ezra Miller and Bernd Sturmfels CombinatorialCommutative Algebra Graduate Texts in Mathematicsvol 227 Springer-Verlag 2005

[34] Megan Owen and J S Provan A fast algorithm forcomputing geodesic distance in tree space IEEEACMTrans Comput Biol Bioinform 8 (2011) 2ndash13

[35] Markus J Pflaum Analytic and Geometric Study ofStratified Spaces Lecture Notes in Mathematics vol1768 Springer-Verlag 2001

[36] Bruce Rannala and Ziheng Yang Bayesian phyloge-netic inference using DNA sequences A Markov chainMonte Carlo method Molecular Biology and Evolution14 (1997) 717ndash724

[37] Vanessa Robins Towards computing homology fromfinite approximations Topology Proceedings 24 (1999)no 1 503ndash532

[38] K-T Sturm Probability measures on metric spaces ofnonpositive curvature in Heat Kernels and Analysison Manifolds Graphs and Metric Spaces lecture notesfrom a quarter program on heat kernels random walksand analysis on manifolds and graphs ContemporaryMathematics vol 338 American Mathematical Society2003 pp 357ndash390

[39] K E Weber Selection on wing allometry in Drosophilamelanogaster Genetics 126 (1990) 975ndash989

[40] How small are the smallest selectable domainsof form Genetics 130 (1992) 345ndash353

[41] H Ziezold On expected figures and a strong law oflarge numbers for random elements in quasi-metricspaces in Transactions of the Seventh Prague Con-ference on Information Theory Statistical DecisionFunctions Random Processes and of the Eighth Eu-ropean Meeting of Statisticians (Tech Univ PraguePrague 1974) Vol A Reidel Dordrecht 1977pp 591ndash602

1184 Notices of the AMS Volume 62 Number 10

Page 6: Fruit Flies and Moduli: Interactions between Biology and ... · and, in some cases, our substantial understanding of their geometry and combinatorics, less is known about the probability

including some working definition of each distancein any given case study In general to grapplewith statistical correlations between genotypes andphenotypes requires a mathematical parametriza-tion for each of those biological concepts Forgenotypes that is likely to involve combinato-rial considerations since the basic quantum ofinformation is discrete Perhaps large-scale con-tinuous analogues or approximations might bemeaningful as they are for statistical mechanicsand that is another potential departure point formathematics On the other hand phenotype isoften continuous in nature what are the locationsof the vertices in a fly wing How do the arcsbetween the vertices bow outward or inward Howdo these characteristics change from wing to wingParametrization in such a context requires thinkingabout spaces of continuous objects the sort ofthinking that mathematics is designed to carryout The examples presented here demonstrate asample of the kinds of abstract structures in puremathematicsmdashalong with unexpected questionsabout themmdashthat biological investigations reveal

References[1] Miroslav Bacaacutek Computing medians and means in

Hadamard space Society for Industrial and AppliedMathematics (SIAM) Journal on Optimization 24 (2014)1542ndash1566

[2] Dennis Barden Huiling Le and Megan OwenCentral limit theorems for Freacutechet means in thespace of phylogenetic trees The Electronic Journalof Probability 8 (2013) no 25 1ndash25

[3] Limiting behaviour of Freacutechet means inthe space of phylogenetic trees preprint 2014arXivmathPR14097602v1

[4] B Basrak Limit theorems for the inductive mean onmetric trees Journal of Applied Probability 47 (2010)1136ndash1149

[5] R Bhattacharya and V Patrangenaru Large sam-ple theory of intrinsic and extrinsic sample means onmanifolds I The Annals of Statistics 31 (2003) no 11ndash29

[6] Large sample theory of intrinsic and extrin-sic sample means on manifolds II The Annals ofStatistics 33 (2005) no 3 1225ndash1259

[7] L Billera S Holmes and K Vogtmann Geometry ofthe space of phylogenetic trees Advances in AppliedMathematics 27 (2001) 733ndash767

[8] Seth S Blair Wing vein patterning in Drosophila andthe analysis of intercellular signaling The Annual Re-view of Cell and Developmental Biology 23 (2007)293ndash319

[9] Martin R Bridson and Andreacute Haefliger Met-ric Spaces of Nonpositive Curvature Grundlehrender Mathematischen Wissenschaften [Fundamen-tal Principles of Mathematical Sciences] vol 319Springer-Verlag Berlin 1999

[10] Francesca Cagliari Barbara Di Fabio andMassimo Ferri One-dimensional reduction of mul-tidimensional persistent homology Proceedings ofthe American Mathematical Society 138 (2010) no 83003ndash3017

[11] Gunnar Carlsson and Vin de Silva Zigzag persis-tence Foundations of Computational Mathematics 10(2010) no 4 367ndash405

[12] Gunnar Carlsson and Afra Zomorodian The the-ory of multidimensional persistence Discrete andComputational Geometry 42 (2009) 71ndash93

[13] James Degnan and Noah Rosenberg Discordance ofspecies trees with their most likely gene trees PublicLibrary of Science (PLoS) Genetics 3 (2006) 762ndash768

[14] Herbert Edelsbrunner David Letscher and AfraZomorodian Topological persistence and simpli-fication Discrete and Computational Geometry 28511ndash533

[15] Patrizio Frosini and Claudia Landi Size functionsand formal series Applicable Algebra in EngineeringCommunication and Computing 12 (2001) no 4 327ndash349

[16] Mark Goresky and Robert MacPherson StratifiedMorse Theory Ergebnisse der Mathematik und ihrerGrenzgebiete (3) [Results in Mathematics and RelatedAreas (3)] vol 14 Springer-Verlag 1988

[17] H Hendriks and Z Landsman Asymptotic behaviourof sample mean location for manifolds Statistics andProbability Letters 26 (1996) 169ndash178

[18] Mean location and sample mean location onmanifolds Asymptotics tests confidence regionsJournal of Multivariate Analysis 67 (1998) 227ndash243

[19] Thomas Hotz Stephan Huckemann Huiling LeJ S Marron Jonathan C Mattingly Ezra MillerJames Nolen Megan Owen Vic Patrangenaru andSean Skwerer Sticky central limit theorems on openbooks Annals of Applied Probability 23 (2013) no 62238ndash2258

[20] David Houle Jason Mezey Paul Galpern and Ash-ley Carter Automated measurement of Drosophilawings BioMed Central (BBMC) Evolutionary Biology 3(2003) 25ndash37

[21] David Houle Numbering the hairs on our heads Theshared challenge and promise of phenomics Proceed-ings of the National Academy of Sciences USA 107(2010) 1793ndash1799

[22] Stephan Huckemann Inference on 3D Procrustesmeans Tree boles growth rank-deficient diffusion ten-sors and perturbation models Scandinavian Journalof Statistics 38 (2011) no 3 424ndash446

[23] On the meaning of mean shape Manifoldstability locus and the two sample test Annals ofthe Institute of Statistical Mathematics 64 (2012)1227ndash1259

[24] Stephan Huckemann Jonathan C MattinglyEzra Miller and James Nolen Sticky centrallimit theorems at isolated hyperbolic planarsingularities Electronic Journal of Probability20 (2015) no 78 1ndash34 doi101214EJPv20-3887arXivmathPR14106879

[25] John P Huelsenbeck and Fredrik Ronquist Mr-Bayes Bayesian inference of phylogenetic treesBioinformatics 17 (2001) 754ndash755

[26] P Jupp Residuals for directional data Journal ofApplied Statistics 15 (1988) no 2 137ndash147

[27] David G Kendall Shape manifolds Procrustean met-rics and complex projective spaces Bulletin of theLondon Mathematical Society 16 (1984) 81ndash121

[28] David G Kendall Dennis Barden Thomas KCarne and Huiling Le Shape and Shape Theory

November 2015 Notices of the AMS 1183

A M E R I C A N M AT H E M AT I C A L S O C I E T Y

Sign up to be a member of the AMS today

wwwamsorgmembership

With your membership you will receive special discounts and free

shipping at any on-site AMS exhibit bookstore

Do you attend JMMHow about any AMS Sectional Meetings

Did you know that your savings from attending these

meetings will practically pay for your AMS membership

Wiley Series in Probability and Statistics John Wiley ampSons Ltd Chichester UK 1999

[29] Kevin Knudson A refinement of multi-dimensionalpersistence Homology Homotopy and Applications10 (2008) no 1 259ndash281

[30] Wayne P Maddison Gene trees in species treesSystematic Biology 46 (1997) 523ndash536

[31] Ezra Miller CohenndashMacaulay quotients of nor-mal semigroup rings via irreducible resolutionsMathematical Research Letters 9 (2002) no 1117ndash128

[32] Ezra Miller Megan Owen and Scott Provan Poly-hedral computational geometry for averaging metricphylogenetic trees Advances in Applied Mathematics15 (2015) 51ndash91

[33] Ezra Miller and Bernd Sturmfels CombinatorialCommutative Algebra Graduate Texts in Mathematicsvol 227 Springer-Verlag 2005

[34] Megan Owen and J S Provan A fast algorithm forcomputing geodesic distance in tree space IEEEACMTrans Comput Biol Bioinform 8 (2011) 2ndash13

[35] Markus J Pflaum Analytic and Geometric Study ofStratified Spaces Lecture Notes in Mathematics vol1768 Springer-Verlag 2001

[36] Bruce Rannala and Ziheng Yang Bayesian phyloge-netic inference using DNA sequences A Markov chainMonte Carlo method Molecular Biology and Evolution14 (1997) 717ndash724

[37] Vanessa Robins Towards computing homology fromfinite approximations Topology Proceedings 24 (1999)no 1 503ndash532

[38] K-T Sturm Probability measures on metric spaces ofnonpositive curvature in Heat Kernels and Analysison Manifolds Graphs and Metric Spaces lecture notesfrom a quarter program on heat kernels random walksand analysis on manifolds and graphs ContemporaryMathematics vol 338 American Mathematical Society2003 pp 357ndash390

[39] K E Weber Selection on wing allometry in Drosophilamelanogaster Genetics 126 (1990) 975ndash989

[40] How small are the smallest selectable domainsof form Genetics 130 (1992) 345ndash353

[41] H Ziezold On expected figures and a strong law oflarge numbers for random elements in quasi-metricspaces in Transactions of the Seventh Prague Con-ference on Information Theory Statistical DecisionFunctions Random Processes and of the Eighth Eu-ropean Meeting of Statisticians (Tech Univ PraguePrague 1974) Vol A Reidel Dordrecht 1977pp 591ndash602

1184 Notices of the AMS Volume 62 Number 10

Page 7: Fruit Flies and Moduli: Interactions between Biology and ... · and, in some cases, our substantial understanding of their geometry and combinatorics, less is known about the probability

A M E R I C A N M AT H E M AT I C A L S O C I E T Y

Sign up to be a member of the AMS today

wwwamsorgmembership

With your membership you will receive special discounts and free

shipping at any on-site AMS exhibit bookstore

Do you attend JMMHow about any AMS Sectional Meetings

Did you know that your savings from attending these

meetings will practically pay for your AMS membership

Wiley Series in Probability and Statistics John Wiley ampSons Ltd Chichester UK 1999

[29] Kevin Knudson A refinement of multi-dimensionalpersistence Homology Homotopy and Applications10 (2008) no 1 259ndash281

[30] Wayne P Maddison Gene trees in species treesSystematic Biology 46 (1997) 523ndash536

[31] Ezra Miller CohenndashMacaulay quotients of nor-mal semigroup rings via irreducible resolutionsMathematical Research Letters 9 (2002) no 1117ndash128

[32] Ezra Miller Megan Owen and Scott Provan Poly-hedral computational geometry for averaging metricphylogenetic trees Advances in Applied Mathematics15 (2015) 51ndash91

[33] Ezra Miller and Bernd Sturmfels CombinatorialCommutative Algebra Graduate Texts in Mathematicsvol 227 Springer-Verlag 2005

[34] Megan Owen and J S Provan A fast algorithm forcomputing geodesic distance in tree space IEEEACMTrans Comput Biol Bioinform 8 (2011) 2ndash13

[35] Markus J Pflaum Analytic and Geometric Study ofStratified Spaces Lecture Notes in Mathematics vol1768 Springer-Verlag 2001

[36] Bruce Rannala and Ziheng Yang Bayesian phyloge-netic inference using DNA sequences A Markov chainMonte Carlo method Molecular Biology and Evolution14 (1997) 717ndash724

[37] Vanessa Robins Towards computing homology fromfinite approximations Topology Proceedings 24 (1999)no 1 503ndash532

[38] K-T Sturm Probability measures on metric spaces ofnonpositive curvature in Heat Kernels and Analysison Manifolds Graphs and Metric Spaces lecture notesfrom a quarter program on heat kernels random walksand analysis on manifolds and graphs ContemporaryMathematics vol 338 American Mathematical Society2003 pp 357ndash390

[39] K E Weber Selection on wing allometry in Drosophilamelanogaster Genetics 126 (1990) 975ndash989

[40] How small are the smallest selectable domainsof form Genetics 130 (1992) 345ndash353

[41] H Ziezold On expected figures and a strong law oflarge numbers for random elements in quasi-metricspaces in Transactions of the Seventh Prague Con-ference on Information Theory Statistical DecisionFunctions Random Processes and of the Eighth Eu-ropean Meeting of Statisticians (Tech Univ PraguePrague 1974) Vol A Reidel Dordrecht 1977pp 591ndash602

1184 Notices of the AMS Volume 62 Number 10