
Abstract

Psychologists and behavioural scientists are increasingly collecting data that are drawn from continuous underlying processes. We describe a set of quantitative methods, Functional Data Analysis (FDA), which can answer a number of questions that traditional statistical approaches cannot. These methods are applicable for analyzing many datasets that are common in experimental psychology, including time series data, repeated measures, and data distributed over time or space as in neuroimaging experiments. The primary advantage of FDA is that it allows the researcher to ask questions about when in a time series differences may exist between two or more sets of observations. We discuss functional correlations, principal components, the derivatives of functional curves, and analysis of variance models.

Introduction to Functional Data Analysis

New types of data require new tools for analysis.

Our aim in this paper is to introduce functional data and the methods that have been recently developed to analyze these data. Although functional data themselves are not new, a new conception of them is necessary as a result of the increasing sophistication of data collection in the behavioural sciences. Data collection technology has evolved over the last two decades so as to permit observations densely sampled over time, space, and other continua; these data usually reflect the influence of certain smooth functions that are assumed to underlie and to generate the observations.

Classical multivariate statistical methods may be applied to such data, but they cannot take advantage of the additional information implied by the smoothness of the underlying functions. The functional data analysis (FDA) methods that we describe here can often extract additional information contained in the functions and their derivatives, not normally available through traditional methods.

The goals of this paper are to survey some of this methodology rather than to offer a detailed account, to provide a conceptual and intuitive introduction, and to provide links to further resources (e.g., Ramsay & Silverman, 2002, 2005; Vines, Nuzzo, & Levitin, 2005). We aim to show in broad strokes the kinds of data that can be treated, and the kinds of questions that can be answered within an FDA framework. We will illustrate how FDA works in an interesting experimental context on music and emotion that brings together much of the novelty and power of FDA (Vines, Nuzzo, Krumhansl, Wanderley, & Levitin, 2006). We chose this example because very different data from studies in dozens of other domains can be analyzed under a quite similar approach to that taken in our illustration. We do not assume any particular level of background for readers of the present article. We have attempted to make this introduction intuitive, rather than mathematical (with references to more mathematical treatments for those who are interested), and we have attempted to make this article conceptual rather than a "step-by-step" guide about how to perform FDA (with references to such resources also included). We introduce a few equations to establish a mathematical basis for some of the concepts, but these require nothing more complicated than first-year calculus, and nonmathematical readers can skip these with no loss of comprehension of the larger concepts we are trying to describe. By the end of this article, we hope that readers will know whether or not some of their own data may be considered functional data, what advantages they may see by analyzing their data using FDA, and finally, where to look next to begin conducting such analyses.

What are Functional Data?

A functional datum is not a single observation but rather a set of measurements along a continuum that, taken together, are to be regarded as a single entity, curve or image. Usually the continuum is time, and in this case the data are commonly called "longitudinal."

Introduction to Functional Data Analysis

DANIEL J. LEVITIN
McGill University

REGINA L. NUZZO
Gallaudet University, Washington, D.C.

BRADLEY W. VINES
University of California at Davis

J. O. RAMSAY
McGill University

Canadian Psychology, 2007, Vol. 48, No. 3, 135-155. DOI: 10.1037/cp2007014. Copyright 2007 by the Canadian Psychological Association.
Canadian Psychology/Psychologie canadienne, 2007, 48:3, 135-155


But any continuous domain is possible, and the domain may be multidimensional. In neuroimaging, for example, the activation levels observed at voxels in the brain are the responses over the two or three dimensions of space and possibly time as well. Or, data may be distributed over a continuous psychophysical space (such as vowel space, musical pitch space, or colour space).1 In this paper, we deal primarily with functional data measured over time, but the techniques described can be applied to other domains.

Many active domains of current research in psychology and the behavioural sciences produce functional data. Psychobiological functions that change over time, such as the levels of hormones in the blood, emotions, or the number and intensity of depressive episodes, are just a few of many examples. Human communication – including spoken and signed language, music performance and perception – is another domain in which the data evolve over time, and the researcher may wish to test hypotheses about the time at which certain events, or responses to those events, occur. Boker, Xu, Rotondo, and King (2002) note that human communication violates the assumption of stationarity (i.e., consecutive values are dependent over time in a nonpredictable way; for further discussion of stationarity see Box, Jenkins, & Reinsel, 1994; Hendry & Juselius, 2000; Shao & Chen, 1987) and suggest that understanding this nonstationarity may be crucial to understanding the phenomenon. We believe such cases can be profitably understood through the application of FDA, and that the nature of the resulting functions can provide crucial information for understanding human behaviour.

Figure 1 displays many of the features of functional data, and the experiment giving rise to them will be used throughout the paper as a source of illustrations. The aim of the experiment was to quantify the relative roles of auditory and visual information in conveying emotional and structural aspects of music. The experimental participants were presented with a videotaped musical performance under one of three conditions: audio-visual combined (the normal way in which a video of the performances would be viewed), audio only, and visual only. The dependent variable was the continuous record of the position of a slider (linear potentiometer) that participants controlled to indicate the tension that they were experiencing in real time. It is apparent from Figure 1 that there exists considerable detail in these records, and that they exhibit some features that are stable across participants, as well as variation that is participant-specific (including interparticipant differences in motor action planning, interpretation of the task, and decision variability), and undoubtedly some high-frequency variation that we will treat as experimental error resulting from nuisance variables such as measurement error, random factors, etc.

Figure 1. An example of functional data: Tension judgments made by four typical participants in the "auditory only" condition of the experiment described in the text. Each line represents one participant's judgment over the 80-second duration of the musical performance.

1 By "psychophysical space," we mean that model of human perception or mental representation that is derived from psychophysical experimentation, and is often represented as a manifold in n dimensions, and which does not necessarily possess the same dimensionality as the corresponding physical stimuli that gave rise to it.

For example, the physical representation of colour hue can be thought of as one-dimensional: As wavelength decreases, our perception of colour passes through the familiar rainbow from red to orange to yellow, etc. But our psychological representation is actually two-dimensional – circular – as evidenced by experimental findings that colours at the opposite ends of the physical continuum, red and violet, are judged to be more similar than colours that are closer together on the physical continuum such as red and green. Thus, the psychological (or psychophysical) space is two-dimensional and continuous, and data collected from judgments within this space could be considered functional in (psychophysical) space. Similarly, judgments of vowel similarity (Shepard, 1972; Sinnott, Brown, Malik, & Kressley, 1997), musical pitch (Krumhansl, 1990; Shepard, 1982), and musical timbre similarity (McAdams, Winsberg, Donnadieu, & De Soete, 1995; Plomp, 1970) suggest three-dimensional representations.

One traditional approach to data analysis would require that we obtain an average for each of the three conditions over the time course of the experiment, and compare those means. Yet it is apparent from Figure 1 that such an average would obscure interesting time-based variations in the ratings. Using an ANOVA, we could answer the simple question about which of the three conditions yielded the highest ratings, and with either planned orthogonal contrasts or post-hoc tests, we could ascertain which means, if any, differed from the others. A second traditional approach might be to treat each observation as a repeated measure, and conduct an RM ANOVA or MANOVA, neither of which is satisfactory because of the obvious autocorrelation between successive points: With the linear slider used by participants, observations are not independent because the slider must pass through intermediate points on its way from one desired position to another; the data violate stationarity. In addition, there are other factors to consider in such data, such as regression to the mean, and the potential underlying phasic attributes of the musical piece being judged. Finally, traditional GLM-based methods would not allow for the precise comparison of interparticipant responses, which may also be of interest. Although one could conduct correlational analyses among participants, factors such as differential reaction times and differential use of the scale could obscure relations in the data. Thus, traditional statistics would allow us to answer only simple questions, and then, only with certain independence assumptions violated.

To begin the FDA approach, we make the assumption that the underlying process generating the data is smooth. The tension in music experienced by a listener rises and falls in response to features in the music that evolve over time, and typically without abrupt, instantaneous changes. The observed data are subject to measurement error and other types of local disturbances that may mask this smoothness, but FDA includes techniques to identify the smooth sources of variation by filtering out the unwanted variability. Other examples of processes assumed to be smooth include the depression level of a patient over time, the evoked response potential (ERP) measured on the scalp with surface electrodes, finger tapping in response to mental imagery or synchronously with an auditory stimulus, body core temperature, and the flow of blood through a region in the cortex in response to event-related tasks.

Figure 2. Top panel: Tension judgments for a single participant from Figure 1. Bottom panel: The first derivative of the tension judgments shown in the top panel.

The continuous underlying process may be observed not only at multiple time points (the discrete observations which, in our music data, are taken 10 times per second), but at multiple manifestations as well (the tension curves are replicated across participants). FDA was designed to take advantage of replications, and in particular those produced by controlled experiments.

When we say that a curve is smooth, we usually mean that it is differentiable to a certain degree, implying that a number of derivatives can be derived or estimated from the data. These derivatives, especially the first two, "velocity" and "acceleration," often have interpretations relevant to the study, and thus analyzing these derivatives can be an important component of FDA (Vines et al., 2005). For example, we might want to know the rate of change in tension in response to a particular musical event, in addition to the level of tension. Useful derivative estimates require both a number of sampling points and a signal-to-noise ratio that is sufficiently high, features that are not often found in the kind of longitudinal data that are subjected to hierarchical or multilevel linear modeling.

Derivatives are used in other ways in FDA, and the exploitation of derivative information is a central theme. Figure 2 shows the functional observation of tension for a single participant; on the top, the discrete raw data are shown as points and an estimate of the smooth underlying tension curve is indicated by a continuous line. In the lower panel, the first derivative of this curve estimate is displayed. The use and interpretation of derivatives in functional data is a rather new, specialized topic in FDA (see Ramsay & Silverman, 2005; Vines et al., 2005). Intuitively, one can imagine the ways in which such derivative information can reveal patterns in a (functional) dataset that address important research questions. In the case of depression levels in patients, one might want to know not just when they reach a peak, but how rapidly they rise and fall (the first derivative). In studies of regional cerebral blood flow (rCBF), one might want to know not just which neural structures are being supplied by blood at a given point in time, and not just how fast the blood is traveling from one region to another, but whether the blood flow rate shows significant acceleration or deceleration in response to particular tasks.
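The idea of "velocity" and "acceleration" curves can be sketched numerically. The following is a minimal illustration on synthetic data (not the study's tension records), using central finite differences; in practice FDA software differentiates the fitted smooth curve rather than the raw samples.

```python
# Sketch: estimating the "velocity" (first derivative) and "acceleration"
# (second derivative) of a sampled curve by central finite differences.
# Synthetic, noise-free samples of x(t) = sin(t) stand in for a smoothed
# functional observation.
import math

dt = 0.01
t = [i * dt for i in range(629)]          # 0 .. roughly 2*pi
x = [math.sin(ti) for ti in t]

# Central differences at interior sampling points.
velocity = [(x[j + 1] - x[j - 1]) / (2 * dt) for j in range(1, len(x) - 1)]
acceleration = [(x[j + 1] - 2 * x[j] + x[j - 1]) / dt ** 2
                for j in range(1, len(x) - 1)]
# Near t = 0, velocity should be close to cos(0) = 1 and acceleration
# close to -sin(0) = 0, matching the analytic derivatives of sin(t).
```

With real, noisy data these difference quotients amplify measurement error, which is exactly why the text emphasizes smoothing before differentiating.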

An Overview of Functional Data Analyses

FDA extends the capabilities of traditional statistical techniques in a number of ways. Studying the variability of a dataset when the observations are curves instead of points requires that standard tools be adapted for this functional framework, and that new tools be created to take advantage of the unique characteristics of functional data. For example, the availability of derivatives permits the use of differential equations as models, and thus introduces descriptions of the dynamics of processes as well as their static features. Since the underlying processes one studies with FDA are, by definition, changing over time (or space, or some other continuum), the ability to quantify these dynamics is a principal advantage of these techniques over traditional static techniques.

A comparison of FDA with traditional multivariate statistics (MVA) is valid at many levels. In classical statistics, a sample of discrete points is taken from a large population. This sample may contain many sources of variation: Initial variability across the units being measured, measurement error, variability due to sampling, inability to reproduce the treatment conditions exactly from one unit to another, and the systematic variability that is the goal of the analysis. In a functional setting, a sample of curves is taken from a larger population of curves. This functional sample contains the same sources of variation as does a discrete sample; now, however, there is the additional challenge of quantifying variability within and across curves as opposed to discrete points.

The relationship between FDA and models for repeated measures or longitudinal data of the mixed effects and structural equation varieties is even closer. As a rule, FDA is concerned with data able to support the estimation of more complex curves than is usually the case with these older approaches. FDA data tend to have a higher signal-to-noise ratio in each observed value, and to be observed over a number of sampling points ranging from 20 or 30 to tens of thousands or more. As a result of this greater resolving power, the estimation and use of information in derivatives tends to be central to FDA. (For reviews of multivariate methods for longitudinal data see Cudeck, 1996, and Pinheiro & Bates, 2000; Ramsay, 2002 provides further detail on the relationship between FDA and other longitudinal data approaches.)

A typical analysis of functional data begins by using nonparametric smoothing techniques to represent each observation as a functional object. The original data are then set aside, and the estimated curves themselves are used for the rest of the analysis. Further processing of the curves is also possible, such as the taking of derivatives or performing transformations.

These functional objects are next usually represented in a Cartesian coordinate system with time along the x-axis and the value of the dependent variable along the y-axis. At this point it may be apparent that there are two sources of variability in the curves that comprise the functional data, each one corresponding to one of the dimensions of the graph of the curve: "amplitude variability," having to do with variation in the shapes of the curves as reflected in peaks and valleys (along the y-axis), and "phase variability," having to do with shifts in the timings of these features (along the x-axis). If appropriate, this amplitude and phase variability can be extracted and removed through scaling and curve registration techniques, respectively, after which these sources of variation can themselves be further analyzed and related to one another, or used to create individual difference profiles for experimental units or participants.
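A small synthetic sketch shows why phase variability matters: two "participants" produce the same kind of peak, one earlier and shorter, one later and taller. The naive pointwise mean is lower and broader than either individual peak, which is the distortion curve registration is designed to avoid. (The data here are invented for illustration, not taken from the study.)

```python
# Sketch: amplitude vs. phase variability on synthetic curves.
# curve_a peaks earlier and lower; curve_b peaks later and higher.
# Averaging pointwise across the phase shift flattens and widens the peak.
import numpy as np

t = np.linspace(0, 10, 500)

def peak(center, height):
    """Gaussian-shaped bump, a stand-in for a tension peak."""
    return height * np.exp(-(t - center) ** 2)

curve_a = peak(4.0, 1.0)              # earlier, shorter peak (phase + amplitude)
curve_b = peak(6.0, 1.5)              # later, taller peak
naive_mean = (curve_a + curve_b) / 2  # unregistered pointwise average

# The unregistered mean's maximum falls below both individual maxima,
# misrepresenting every curve in the sample.
```

Registering the curves (aligning the peaks in time) before averaging would recover a mean whose peak height lies between the two individual heights.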

As with traditional techniques, the exploration of functional data is often preliminary to more structured analyses, and we have found that plotting observed effects is a crucial stage in the interpretation of these analyses that can greatly aid in choosing further analyses (cf. Cleveland, 1993; Tufte, 1983; Tukey, 1977). Given a single functional variable, we may want to estimate the correlation between curve values taken at two time points s and t; the result is a smooth bivariate correlation function r(s,t) that is the analogue of the correlation matrix in MVA. Principal components analysis (PCA) is as valuable in FDA as it is in MVA for exploring the structure of the correlation function, and PCA has been adapted to permit the smoothing of estimated functional principal components.
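On curves sampled at a common grid, a raw (unsmoothed) estimate of r(s,t) is simply the matrix of correlations, computed across replications, between values at every pair of time points. A minimal sketch on synthetic curves:

```python
# Sketch: estimating the bivariate correlation function r(s, t) from N
# replicate curves observed at the same time points. Entry [a, b] is the
# ordinary Pearson correlation, across curves, of the values at times
# t[a] and t[b]. Curves are synthetic: a random amplitude times a fixed
# sinusoidal shape, plus noise.
import numpy as np

rng = np.random.default_rng(0)
n_curves, n_times = 40, 50
t = np.linspace(0, 1, n_times)
shape = np.sin(2 * np.pi * t)
curves = (rng.normal(1.0, 0.3, (n_curves, 1)) * shape
          + rng.normal(0, 0.05, (n_curves, n_times)))

# Columns are "variables" (time points), rows are replications.
r = np.corrcoef(curves, rowvar=False)   # (n_times, n_times) estimate of r(s, t)
```

Because each curve is one amplitude times a fixed shape, values at times where the shape has the same sign correlate positively across curves, and values at opposite-sign times correlate negatively; FDA would additionally smooth this raw matrix into a smooth function of s and t.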

A functional linear model may involve functional data on either side of the equation, and usually in conjunction with multivariate observations. For example, a functional analysis of variance (fANOVA) involves a functional response modulated by a set of experimental conditions, and in this context it can be useful to view indicators of effects such as F-ratios and squared multiple correlations as themselves functions of time or whatever other continua are involved. This introduces interesting new versions of the multiple comparison problem, and solutions are presently under development.
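The idea of an F-ratio that is itself a function of time can be sketched by computing an ordinary one-way F statistic independently at each sampled time point. This is a bare-bones illustration on synthetic group curves, not a full fANOVA (which would also smooth the curves and address the multiple comparison problem noted above):

```python
# Sketch of a pointwise functional ANOVA: three experimental conditions,
# one-way F-ratio computed at every time point, yielding an F(t) curve.
# The synthetic groups differ only in the second half of the interval,
# so F(t) should hover near 1 early and grow large after t = 0.5.
import numpy as np

rng = np.random.default_rng(1)
n_per_group, n_times = 20, 100
t = np.linspace(0, 1, n_times)
offsets = [0.0, 0.0, 1.0]                 # third condition departs after t = 0.5
groups = [rng.normal(0, 0.5, (n_per_group, n_times)) + off * (t > 0.5)
          for off in offsets]

def pointwise_f(groups):
    """One-way ANOVA F statistic computed independently at each time point."""
    k = len(groups)
    n = sum(g.shape[0] for g in groups)
    grand = np.vstack(groups).mean(axis=0)
    between = sum(g.shape[0] * (g.mean(axis=0) - grand) ** 2
                  for g in groups) / (k - 1)          # between-groups MS
    within = sum(((g - g.mean(axis=0)) ** 2).sum(axis=0)
                 for g in groups) / (n - k)           # within-groups MS
    return between / within

F = pointwise_f(groups)   # F(t): near 1 before t = 0.5, large afterwards
```

Plotting F against t shows *when* the conditions differ, which is exactly the kind of question the Abstract says traditional, single-number ANOVA cannot answer.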

Linear models involving functional covariates require the estimation of regression functions as opposed to MVA's regression coefficients. Functional independent variables also introduce the need for imposing smoothness on their regression functions, because a functional covariate can represent a number of degrees of freedom for fit that exceeds the sample size, and this could in principle produce over-fitting of the response data.

When a data set involves two or more variables, investigators are usually interested in how they co-vary. The cross-correlation function r(s,t) then consists of estimated correlations between one set of function values at time s and the other set of function values at time t. This information can be displayed graphically, but we usually want to supplement such displays by methods that look for more specific correlational structures. Canonical correlation analysis is perhaps more useful in FDA than it tends to be in MVA, in part because canonical variables can be smoothed to avoid having them reflect too much uninteresting high-frequency variation.

Boker et al. (2002) describe an interesting example of functional data in which windowed cross-correlations are used to identify linkages between functional variables that may be specific to certain time intervals, and that may be lagged by an amount that must be estimated from the data. Their experiments involving gestures and body movements for couples during conversation and dance are compelling illustrations of the spirit of FDA at work. We now consider the stages of FDA in more detail.
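A windowed, lagged cross-correlation in this spirit can be sketched in a few lines: correlate a short window of one series with lag-shifted windows of another, and scan the lag to find where the linkage peaks. The series below are synthetic (one is a lagged, noisy copy of the other); this is an illustration of the idea, not a reconstruction of Boker et al.'s procedure.

```python
# Sketch: windowed, lagged cross-correlation between two series.
# y is x delayed by lag_true samples plus noise, so within any window the
# correlation should peak when the scanning lag matches lag_true.
import numpy as np

rng = np.random.default_rng(2)
n, lag_true = 300, 5
x = np.cumsum(rng.normal(size=n))          # smooth-ish synthetic signal
y = np.roll(x, lag_true) + rng.normal(0, 0.1, n)

def windowed_xcorr(x, y, start, width, lag):
    """Correlation of x[start:start+width] with y shifted forward by `lag`."""
    xs = x[start:start + width]
    ys = y[start + lag:start + lag + width]
    return float(np.corrcoef(xs, ys)[0, 1])

# Scan candidate lags within one window; the peak sits at the true lag.
lags = range(0, 11)
corrs = [windowed_xcorr(x, y, start=100, width=50, lag=k) for k in lags]
best_lag = int(np.argmax(corrs))
```

Sliding the window along the series and repeating the lag scan produces a map of when, and at what delay, the two functional variables are linked.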

Applying FDA

Discrete Functional Data

We use the notation x(tj) for the value of a continuous underlying process at time tj, and the notation yj to indicate a corresponding noisy observation of the process x(t), where j = 1, …, n. A single functional observation therefore consists of n pairs (tj, yj), and we want to use these data to estimate characteristics of x(t) such as function or derivative values at unobserved time points.

Replications of this underlying process may be available. Again, by replications we mean either multiple samples of a single process (improving our estimates of the actual shape of the curve that represents the underlying process), or multiple participants providing data on the same dependent variable(s). Earlier, we pointed out that FDA is intended to be used on data that come from a smooth and continuous process. However, virtually all data collection that we know of produces noncontinuous observations, taken as samples at discrete points in time. Even when the sampling rate is very high (as in audio recordings routinely made at 44 kHz or even 192 kHz sampling rates), the points, technically, are not continuous. Thus, when we refer to discrete observations, the assumption is that enough observations exist to model the underlying process (cf. Nyquist, 1928; Shannon, 1949), and this is largely a matter of experience, experimenter intuition, and trial and error. Thus, because FDA attempts to model a smooth process from discrete points, it is generally inappropriate for small datasets, and its greatest advantages over traditional methods of analysis occur with especially large datasets.

The ith replication of the underlying process is written as xi(t), and data values corresponding to time points tij are written as yij (where t can take on any real value). Often the observations are taken at the same arguments tj for each replication; in this paper, we will assume for notational simplicity that this is true, that is, that tij = tj for each i, although in actual applications the locations and even the number of time points can vary from replication to replication. This might occur, for example, in an experiment in which a dependent variable was to be observed at precise points in time, but one or more measurements were taken somewhat off-schedule. This happens often in clinical settings. Imagine one wanted to measure blood serum concentrations of a particular pharmacological agent, and blood samples are scheduled to be drawn at 60-minute intervals. A given sample for a particular participant or trial might be drawn some minutes earlier or later than the nominal time due to factors external to the experiment or due to experimenter error (e.g., Nuzzo, 2002). Such variability can be handled with techniques described in the section "Registration of Functional Objects" below (see also Ramsay & Silverman, 2005).

Although measurement error usually does exist in the observations yj, we will assume for the moment that the tj are measured without error. We can relate the discrete observations to the underlying smooth process through the equation yij = xi(tj) + εij, where each εij is the disturbance or error term, assumed to be distributed with mean zero and finite variance σ2.
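This sampling model is easy to simulate, which can be useful for building intuition or for testing an analysis pipeline. A minimal sketch with an invented smooth curve standing in for xi(t):

```python
# Sketch of the sampling model y_ij = x_i(t_j) + e_ij from the text:
# a smooth underlying curve observed at discrete times with additive,
# zero-mean noise. The curve and all parameters are synthetic.
import numpy as np

rng = np.random.default_rng(3)
n_times = 200
t = np.linspace(0, 80, n_times)              # e.g., an 80-second trial
x_true = np.sin(2 * np.pi * t / 80) + 0.3 * np.sin(2 * np.pi * t / 10)
sigma = 0.2                                  # noise standard deviation
y = x_true + rng.normal(0, sigma, n_times)   # one replication's raw data

# Consistent with the zero-mean error assumption, the residuals
# y - x_true average out to nearly zero over many sampling points.
residual_mean = float((y - x_true).mean())
```

Replications xi(t) would be generated by repeating the last two lines (possibly with curve-to-curve variation in x_true), giving the kind of replicated functional dataset FDA is designed for.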

Building Smooth Curves From Discrete Data

The first step in a functional data analysis is to convert the raw data into functional objects. To do this, a curve is fit to the discrete observations, which approximates the continuous underlying process. Then the discrete points are set aside and the functional objects retained for subsequent analyses.

A common procedure in statistics, mathematics, and engineering for representing discrete data as a smooth function is the use of a basis expansion:

xi(t) = ci1 φ1(t) + … + cik φk(t) + … + ciK φK(t),     (1)

where K indicates the number of basis functions and φk(t) is the value of the kth basis function at argument value t. The basis functions φk(t) are a system of functions specially chosen to use as building blocks to represent a smooth curve. There are many different types of basis function systems, of which the best known are powers of t, or monomials, used to construct polynomials, and the Fourier series used for periodic curves. The cik are the basis coefficients and determine the relative weights of each basis function in constructing the built curve for curve i. Estimating a curve that is expressed as a basis function expansion then reduces to the essentially multivariate parameter estimation problem of estimating these coefficients.

When curves are not periodic and are complex, so that a low-order polynomial cannot capture their features, the most commonly used basis functions are the B-spline functions (de Boor, 2001; Shikin & Plis, 1995; Unser, 1999), and these will be used in this paper to represent the tension curves in our illustrative music and emotion study. B-splines are essentially polynomials joined end-to-end at a set of interval boundaries called knots. This structure allows for differing amounts of smoothness or roughness at various places along the curve; that is, each strand between knots can be parameterized in an optimal fashion. Knots are often chosen to be equally spaced, but they may also occur at the exact times the data were actually observed, or at prespecified time points of interest. B-splines are characterized by their order, which is one larger than the degree of the polynomials from which they are constructed; for instance, B-splines of Order 3 comprise quadratic curves joined at the knots, and B-splines of Order 2 are constructed from piecewise lines (i.e., line segments). A set of B-splines, composed of a B-spline which has been replicated and shifted down the number line, can serve as basis functions. Across a wide range of datasets, it has been found that Order 4 B-splines appear reasonably smooth to the eye, and thus they are often used in standard smoothing packages.

B-splines are normalized so that for any fixed value t, the sum of all B-spline basis functions at that point is one. This normalization means that a coefficient cik is approximately the value of the curve being fit at the location where the kth B-spline basis function achieves its maximum.
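The normalization property can be checked directly by evaluating B-spline basis functions with the standard Cox-de Boor recursion. This is a bare-bones sketch for illustration, not a replacement for a production spline library (which would also handle boundary knots and fitting):

```python
# Sketch: evaluating all B-spline basis functions at a point u via the
# Cox-de Boor recursion, then checking the normalization described in the
# text: at any point interior to the knot range, the basis values sum to 1.
import numpy as np

def bspline_basis(u, knots, order):
    """Values of the B-spline basis functions of the given order at u."""
    n = len(knots) - order                  # number of basis functions
    # Order 1: piecewise-constant indicator functions on knot intervals.
    b = np.array([1.0 if knots[i] <= u < knots[i + 1] else 0.0
                  for i in range(len(knots) - 1)])
    for k in range(2, order + 1):           # raise the order step by step
        new = np.zeros(len(knots) - k)
        for i in range(len(new)):
            left = ((u - knots[i]) / (knots[i + k - 1] - knots[i]) * b[i]
                    if knots[i + k - 1] > knots[i] else 0.0)
            right = ((knots[i + k] - u) / (knots[i + k] - knots[i + 1]) * b[i + 1]
                     if knots[i + k] > knots[i + 1] else 0.0)
            new[i] = left + right
        b = new
    return b[:n]

# Order 4 (cubic) B-splines on equally spaced knots, as in the text.
# The sum-to-one property holds for u between knots[order-1] and knots[n].
knots = np.arange(0.0, 11.0)               # knots at 0, 1, ..., 10
vals = bspline_basis(4.5, knots, order=4)  # nonneg values summing to 1
```

Stacking such basis evaluations at all observation times gives the design matrix whose least-squares solution yields the coefficients cik of Equation (1).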

The basis coefficient weights are chosen so that the constructed curve will optimally fit the data for a certain degree of smoothing. The amount of smoothing is controlled by the number of basis functions used: The greater the number of basis functions K is, the more the built curve will exactly fit or interpolate the discrete points; the smaller K is, the more the curve is a smoothed version of the points, but at a cost of a reduced capacity to capture sharper features in the curves.

There is a decision associated with choosing the number of basis functions, often framed in terms of the bias-variance trade-off common in many statistical analyses. A high number of basis functions will yield a curve that is more faithful to the observed data (low bias) but that is often less smooth (high variance). Using a small number of basis functions will produce a curve that places less importance on interpolating the discrete points (high bias) but more importance on smoothness (low variance). Put another way, under-smoothing of the curves leaves in artifacts and variability (“noise”) that are not truly part of the process being observed; the resulting curve may thus represent perturbations in the observation process that do not pertain to the analysis being sought or the research question being asked. On the other hand, over-smoothing discards small-scale, high-frequency behavioural data that may be part of the process we wish to observe and analyze – maybe even the most interesting part! Thus, there is an art associated with building a curve through basis expansion, and the analyst should cautiously experiment with various parameters, driven by theories about the underlying process whenever possible. Finally, a data-driven approach to basis function specification is also possible; functional principal components analysis, described below, can be viewed as a method for constructing an optimal orthogonal basis of fixed dimensionality, and the resulting functions are often referred to as empirical basis functions. Automatic methods for basis expansion are an area of active research (e.g., Koenker & Mizera, 2004), but in their present state of development they should not take precedence over informed and careful judgment.

Another way to build a smooth curve from discrete data is through the use of a roughness penalty (PEN below). Here we have direct control over the amount of smoothing, and the problem is formulated as a direct trade-off between fitting the data and producing a smooth curve. We often fit the data using as our goal the minimization of the least squares criterion SSE = ∑j {yj – x(tj)}2. The smoothness of the fitted curve can be quantified using the criterion PEN2 = ∫{D2x(s)}2 ds, which many readers will recognize as the integrated squared second derivative. This penalty describes the total curvature in the curve x, and is popular in the smoothing literature (Simonoff, 1996). The smaller PEN2 is, the closer to a straight line the curve will be. Of course, these two criteria, data fit and smoothness, are normally in conflict, and we can put them together to create a penalized residual sum of squares

PENSSEλ = SSE + λ PEN2. (2)

The smoothing parameter λ controls the amount of importance we place on each of the two competing goals: As λ approaches infinity, the curve approaches the straight line obtained from a linear regression; as λ approaches zero, the curve approaches a direct interpolant of the data, with cubic polynomials connecting the observed points. The roughness penalty method makes the bias-variance trade-off explicit, and allows us to quantify our priorities directly through the choice of λ. Again, automatic methods exist for choosing the best value of λ, the most popular of which involves a technique of partitioning the data called cross-validation (Ramsay & Silverman, 2005). However, knowledge of the process generating the data and a spirit of experimentation often prove to be the most important tools in the area of spline smoothing (as they often are in all data analysis).
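A discrete analogue of the criterion in (2) can be sketched in a few lines of Python: here the integral PEN2 is approximated by squared second differences of the sampled curve, so the penalized estimate reduces to solving the linear system (I + λP)x = y. The data vector and λ values below are hypothetical, chosen only to exhibit the two limits just described.

```python
def second_diff_penalty(n):
    """P = D2'D2, a discrete stand-in for the integrated squared second
    derivative of a curve sampled at n equally spaced points."""
    P = [[0.0] * n for _ in range(n)]
    for r in range(n - 2):
        d = [0.0] * n
        d[r], d[r + 1], d[r + 2] = 1.0, -2.0, 1.0
        for i in range(n):
            for j in range(n):
                P[i][j] += d[i] * d[j]
    return P

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def smooth(y, lam):
    """Minimize SSE + lam * PEN2, i.e. solve (I + lam * P) x = y."""
    n = len(y)
    P = second_diff_penalty(n)
    A = [[(1.0 if i == j else 0.0) + lam * P[i][j] for j in range(n)]
         for i in range(n)]
    return solve(A, y)

y = [0.0, 1.1, 1.9, 3.2, 3.8, 5.1, 5.9, 7.2, 8.0]   # hypothetical noisy data
tiny = smooth(y, 1e-8)   # lam -> 0: the curve interpolates the data
huge = smooth(y, 1e8)    # lam -> infinity: the curve approaches a straight line
```

With λ near zero the solution reproduces the observations almost exactly; with λ very large its second differences vanish, leaving the least squares straight line.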

The use of roughness penalties is not limited to the creation of a functional object from discrete data. The idea of making explicit the competing goals in an analysis is a powerful one in the field of statistics. Green and Silverman’s (1994) book contains many examples of applications of this idea. Since they lend themselves naturally to working with curves and smoothness, roughness penalties are also very helpful for adapting the tools of classical multivariate statistics to an FDA framework.

Basis function expansions, whether roughness penalized or not, are by no means the only options for smooth curve-fitting, and Fan and Gijbels (1996) can be consulted for information on local polynomial smoothing, for example.

Registration of Functional Objects

Consider a case in which we graph our dependent variable (on the y-axis) as a function of time (on the x-axis). As mentioned above, it is common in functional data analysis to speak of two types of variability in the observed curves: amplitude variation, which deals with the differences in height between the curves, and phase variation, which deals with the differences in timing of important features between the curves. For example, pure amplitude variation might occur when two individuals react simultaneously to the same musical event, but differ in the intensity of their reaction, giving rise to two curves of different heights (but with temporally synchronized peaks). Pure phase variation might occur when the individuals react with the same intensity to a specific musical event, but differ in when they react, giving rise to curves that are temporally misaligned (but with concordant amplitudes).

The possibility of phase variation is characteristic of functional data. Often in functional data, it is the amplitude variation that is of primary interest, since this describes how the heights or intensities of the curves vary among participants (or among any groups that may exist), and in this case it can be critical that phase variation is first removed. In other circumstances, the phase variation is actually the measure of interest and can be analyzed itself. For example, one researcher may be interested only in the extent of responses, and regard variation in the timings of reactions to be a nuisance (the result of differing decision times, reaction times, and neural clock speeds among participants); one would then seek a technique to correct event times for certain participants so that all reactions are simultaneous. Such an approach would be consistent with the notion that the dependent variable is tracking an underlying process that has peaks and valleys at certain objective time points, with the timing variations representing simply individual differences in reaction time that are not of relevance to the study. Another researcher, however, may wonder whether the phase variation is itself due in part to the experimental manipulations, and want to study this issue. In either case, the separation of these two curve characteristics in some way is required.

It is helpful to think of time as being a variable that can be manipulated. In the case of the tension ratings, clock time can be different than reaction time, and we would usually like to transform the clock time for each rater to reflect a standardized biological time. In time-warping registration, the idea is to scale, by locally shrinking or expanding clock time, the moment each response was recorded for each subject so that it conforms with a standardized biological time. In some simple situations, this may be a matter of a simple linear transformation of the timings for certain individuals. But often a more complex time-warping function is required that stretches time over some intervals and compresses it over others.

We call our time-warping function h(t). When a process is running faster than the standardized time, h(t) > t; if it is running more slowly, h(t) < t. If our original curve is x(t), the registered curve would be x[h(t)]. Mathematical procedures have been developed to estimate time-warping functions for a dataset, and they have the following characteristics. First, we normally expect the time-warping function to leave the beginnings and ends of the data curves intact. That is, h(t0) = t0 and h(tend) = tend. Second, we impose an assumption of monotonicity (an ordering assumption), in that we require that the events in the registered curves occur in the same order as those in the unregistered curves. In mathematical terms, we would then require that h(t2) > h(t1) if and only if t2 > t1, which implies that the time-warping function is monotonically increasing. Lastly, we have the goal that our registered curves all possess the same shape as the original curves – that is, only the amplitudes are different, but the peaks and valleys occur at the same time. This would mean that the registered curves in the dataset are proportional to one another. In mathematical terms, we say that if curves x1 and x2 are proportional, then x1(h(t)) = a x2(t) for a positive constant a.
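The requirements on h(t) are easy to check numerically. In the following sketch, both the curve x(t) and the warping function h(t) are hypothetical choices of ours; h fixes the endpoints, is monotonically increasing, and, by running "fast" early (h(t) > t), shifts the peak of the registered curve x[h(t)] to an earlier standardized time.

```python
import math

def x(t):
    # Hypothetical response curve: a single bump peaking at t = 0.6.
    return math.exp(-((t - 0.6) ** 2) / 0.02)

def h(t):
    # Hypothetical time-warping function on [0, 1]: h(0) = 0, h(1) = 1,
    # strictly increasing, and running "fast" early (h(t) > t).
    return t + 0.1 * math.sin(math.pi * t)

grid = [i / 100 for i in range(101)]

# The endpoint and monotonicity requirements described in the text.
assert abs(h(0.0)) < 1e-12 and abs(h(1.0) - 1.0) < 1e-12
assert all(h(b) > h(a) for a, b in zip(grid, grid[1:]))

# Registration: the peak of the registered curve x[h(t)] moves from
# clock time 0.6 to standardized time 0.5.
peak_before = max(grid, key=lambda t: x(t))
peak_after = max(grid, key=lambda t: x(h(t)))
print(peak_before, peak_after)  # 0.6 0.5
```

In practice h is estimated from the data rather than specified in advance, as described next.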

To estimate the time-warping functions for a dataset, we again call upon the idea of roughness penalties. As before with the curve estimation, we have two competing goals in registration: On the one hand, we want to line up all the features of the curves, and on the other hand, we do not want the time-warping function to be too rough, because taken to an extreme this would be “over-registering” the curves in the same way that it is possible to “over-fit” a model to data. In order to balance these two goals in our registration procedure, we need quantitative measures of each. For the first registration goal, researchers have developed a “misfit” measure MF, quantifying the amount of misalignment between features of the curves, which is analogous to the sums of squared errors in curve estimation (Kneip, Li, MacGibbon, & Ramsay, 2000; Ramsay & Li, 1998). To quantify the second goal of a smooth time-warping function, a standard roughness penalty R is used, which is usually the integrated squared second derivative of the time-warping function h. Again as in curve estimation, the estimation of the time-warping function is achieved by minimizing the overall criterion MF + λR, where λ is the smoothing parameter.

The registration procedure requires a standardized curve to which the observed curves will be aligned. In practice, a “gold standard” curve is often not available, and so one must be constructed from the data. A good method is an iterative one: Find the mean function of the original curves in the dataset; register the original curves to this mean curve; find a new mean function of these registered curves; and repeat. Usually only one or two iterations are necessary, and registering the curves to the initial mean function may even be satisfactory. As in traditional data analysis, a good rule of thumb is to inspect the results at each stage to make sure they make sense! The experimenter may have some knowledge about the underlying process that the curves are intended to represent, and this can inform smoothing and registration operations. For example, if the experimenter knows that two peaks of activity existed in the original stimulus set at particular points in time, she or he would obviously want to avoid any registration solution that collapses those two peaks into a single peak.


Adaptation of Classical Methods for Functional Data

Functional analogues for many multivariate statistical methods have already been developed and published in the FDA literature (Guo, 2002, 2003; Vidakovic, 2001). These include simple linear models, ANOVA, generalized linear models, generalized additive models, clustering, classification, principal components analysis, and canonical correlation analysis. In this section we will describe functional principal components analysis and the functional linear model; these are good representative examples of adapting multivariate methods to the functional framework.

Functional Principal Components Analysis

Principal components analysis (PCA) is a data reduction technique conducted when one wishes to a) reduce the number of (possibly) correlated variables to a smaller number of uncorrelated variables, and b) reveal latent structure in the relations between variables. PCA finds the linear combinations of variables that highlight the principal modes of variation.

For functional data, on the other hand, the principal components are defined by PC weight functions varying over the same range of t as the functional data. The PC weight functions serve the same purpose as the multivariate principal components: to characterize the principal types of variation among the observed curves. These components can also be rotated to enhance interpretability. Each unrotated component is associated with an eigenvalue that indicates the variance in the modality that the component defines, and the sum of these eigenvalues indicates the total variability accounted for by the associated PC weight functions. Principal component scores for each curve can also be computed for each rotated or unrotated principal component. Thus, the only critical change in moving from multivariate to functional PCs is that the components are functions rather than vectors.

Functional principal components analysis may also incorporate a roughness penalty into its methods in order to ensure that the estimated principal components are sufficiently smooth. A common roughness penalty is the integrated squared second derivative of the principal component curve; this penalty was also used in the section on curve-fitting.

As with principal component vectors in classical PCA, we may wish to rotate the principal component functions to increase interpretability. A functional analogue has been developed for the classical VARIMAX rotation strategy, which finds principal component curves whose values have greater variability about zero. However, as in multivariate analysis, there is no essential requirement that the rotation maintain orthogonality. Other tools of classical PCA can also be used in the functional case; scree plots are helpful to determine the number of principal component curves to retain, and plotting scores of the participants on the first two principal components can highlight hidden types of subject groups.

Finally, we may choose to apply functional PCA to various derivatives or mixtures of them in order to study variation in, for example, velocities and accelerations.
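As a rough illustration of how a PC weight function can be computed from discretized curves, the sketch below (our construction, not the algorithm used by FDA software) applies power iteration to the sample covariance of a small set of hypothetical curves that differ mainly in the amplitude of one common shape; the leading weight function recovers that shape.

```python
import math
import random

def first_pc(curves, iters=200):
    """Leading principal component weight function for curves sampled on a
    common grid, found by power iteration on the sample covariance."""
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centred = [[c[j] - mean[j] for j in range(m)] for c in curves]
    rng = random.Random(0)
    w = [rng.gauss(0, 1) for _ in range(m)]       # random start vector
    for _ in range(iters):
        # Apply the covariance operator: project each centred curve on w,
        # then recombine the curves weighted by their projections.
        scores = [sum(c[j] * w[j] for j in range(m)) for c in centred]
        w = [sum(scores[i] * centred[i][j] for i in range(n)) / n
             for j in range(m)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w

# Hypothetical curves: one shared shape whose amplitude varies, plus noise.
m = 50
shape = [math.sin(2 * math.pi * j / m) for j in range(m)]
data_rng = random.Random(1)
curves = [[(1 + a) * s + data_rng.gauss(0, 0.01) for s in shape]
          for a in (-0.5, -0.2, 0.0, 0.3, 0.6)]

pc1 = first_pc(curves)
# pc1 is (up to sign) proportional to the shared shape.
```

Real functional PCA would also smooth the weight function and extract further components; this sketch shows only the core eigenproblem.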

Functional Linear Models

Most aspects of the multivariate linear model are readily applicable to the case of functional data. This section will discuss simple linear models, but the same tools can then be generalized to create functional nonlinear models, functional generalized linear models, and so on. In the functional setting, either the response, the predictors, or both variables are curves.

The simplest case is when the response yi(t) is functional, and the predictors xij are multivariate. In this case the model is

yi(t) = ∑j xij βj(t) + εi(t), (3)

and we note that this is simply a conventional regression analysis for each value of t. Consequently, the regression coefficients βj(t) are functions of time. These regression coefficient curves show how the effect of the predictor changes the response at each time t. In fact, the conventional univariate regression problem simply describes how the predictor affects the response in the snapshot of time when the data were collected. However, a new feature in the functional situation is the possibility of forcing the regression functions to be smooth by adding one or more roughness penalties to the usual least squares criterion, in much the same manner as we did in (2). We will apply this form of the functional linear model to the tension data in Section 3, where we discuss the effects of experimental conditions on the tension curves.
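Because the model in (3) is an ordinary regression at each value of t, it can be sketched with nothing more than pointwise least squares. The simulated data below are hypothetical, built so that the true coefficient function β1(t) grows linearly as t/10.

```python
import random

def pointwise_regression(xs, ys):
    """Fit yi(t) = b0(t) + b1(t) * xi by ordinary least squares carried
    out separately at each sampled time point t, as in Equation (3)."""
    n, m = len(ys), len(ys[0])
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b0, b1 = [], []
    for j in range(m):
        yj = [ys[i][j] for i in range(n)]
        ybar = sum(yj) / n
        sxy = sum((xs[i] - xbar) * (yj[i] - ybar) for i in range(n))
        slope = sxy / sxx
        b1.append(slope)
        b0.append(ybar - slope * xbar)
    return b0, b1

# Hypothetical data: intercept 1, the predictor's effect grows as t/10,
# plus a little noise.
random.seed(2)
m = 20
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [[1.0 + (j / 10) * x + random.gauss(0, 0.01) for j in range(m)]
      for x in xs]

b0, b1 = pointwise_regression(xs, ys)
# b1[j] recovers the coefficient curve beta1(t) = t/10.
```

A full functional analysis would additionally smooth b0 and b1 with a roughness penalty, as the text describes.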

Functional predictors xi(t) with a scalar response yi are also possible. Here the coefficients are also curves, and they describe the contribution (at every point in time) of the pattern of independent variables toward the scalar dependent variable. The model looks like this:

yi = ∫xi(s) β(s) ds + εi (4)

where β(s) is a regression coefficient function and εi is a disturbance or error term. The integral of the product replaces the sum of products in the classical linear model.

But a new problem now intrudes: Since, in effect, each time point is analogous to a new independent variable, implying in turn that we have an infinity of independent variables at our disposal, we have the problem that this model can fit any finite set of data exactly. To keep the model from being trivial and therefore impractical, we must impose a roughness penalty on the regression function β(s). By ensuring that β(s) is sufficiently smooth, we can also discover whether the model works or not.

The case of functional predictors and functional responses is still more complex: The regression coefficients are then two-dimensional functional surfaces β(s,t) rather than one-dimensional functional curves. These surfaces detail how each point in time of the predictor curve xi(s) affects each point in time of the response curve yi(t). The model can allow for delayed or lagged influence; for a given point in time, the past pattern of the predictor can influence the present point in the response curve. Future behaviour of the predictor variables may also affect the response curve at any given point; we call such predictors oracles. In general, the influence of the predictor can be moderated so that any portion of the predictor curve is included in the linear model. The coefficient surface will reflect the windows of influence chosen. Further details can be found in Ramsay and Silverman (2002, 2005).

Goodness-of-fit diagnostic techniques from classical linear models can be adapted for the functional case as well. The difference is the dependence of these procedures on time t in the functional setting. In a functional residual analysis, the residuals are now functions of time, and a graphical plot of residual functions that show a trend over time is suggestive of a model with a poor fit. A functional principal components analysis of the residuals may reveal the presence of previously unknown patterns in the data that were not captured by the linear model.

Furthermore, goodness-of-fit statistics can be calculated by using functional analogues of the sums of squares. In the functional response/multivariate predictor situation, the sums of squared errors function, for instance, is

SSE(t) = ∑i [yi(t) – Zi β(t)]2, (5)

where Zi is a row vector of covariate values. Similarly, the total sums of squares function, measuring variation about the mean, is

SSY(t) = ∑i [yi(t) – µ̂(t)]2, (6)

where µ̂(t) is the overall mean function. From this we can calculate the squared multiple correlation function RSQ with values

RSQ(t) = [SSY(t) – SSE(t)]/SSY(t). (7)

We can also calculate the mean square error function MSE with values MSE(t) = SSE(t)/df(error), where df(error) is the degrees of freedom for error, or the number of observed curves N less the number of coefficient functions β in the model. Similarly, the mean square regression function can be determined, with values MSR(t) = [SSY(t) – SSE(t)]/df(model). Finally, we can compute the F-ratio function F-ratio(t) = MSR(t)/MSE(t).
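These pointwise fit functions are straightforward to compute once observed and predicted curves are in hand. The sketch below uses toy curves of our own devising; a model that fits exactly gives RSQ(t) = 1 everywhere, while a model that predicts only the mean function gives RSQ(t) = 0.

```python
def fit_functions(ys, yhats):
    """Pointwise fit functions SSE(t), SSY(t), and RSQ(t) from observed
    curves ys and model-predicted curves yhats on a common grid."""
    n, m = len(ys), len(ys[0])
    sse, ssy, rsq = [], [], []
    for j in range(m):
        mu = sum(ys[i][j] for i in range(n)) / n        # mean function
        e = sum((ys[i][j] - yhats[i][j]) ** 2 for i in range(n))
        tot = sum((ys[i][j] - mu) ** 2 for i in range(n))
        sse.append(e)
        ssy.append(tot)
        rsq.append((tot - e) / tot if tot > 0 else 0.0)
    return sse, ssy, rsq

ys = [[1.0, 2.0], [3.0, 6.0], [5.0, 4.0]]     # three toy observed curves
perfect = [row[:] for row in ys]              # a model that fits exactly
mean_fit = [[3.0, 4.0]] * 3                   # a model predicting only the mean

_, _, rsq_perfect = fit_functions(ys, perfect)
_, _, rsq_mean = fit_functions(ys, mean_fit)
print(rsq_perfect, rsq_mean)  # [1.0, 1.0] [0.0, 0.0]
```

The MSE, MSR, and F-ratio functions follow from SSE(t) and SSY(t) by dividing by the appropriate degrees of freedom at each t.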

Inferential Statistics

Determination of statistical significance for functional models is an open area of research, and Ramsay and Silverman (2005) discuss a variety of issues and summarize current work.

A simple method for expressing statistically significant regions of regression coefficient curves involves a pointwise comparison of the F-ratio function values to the critical values of the F distribution with appropriate degrees of freedom. Regions in which the F-ratio function exceeds the critical value threshold are statistically significant. This pointwise method is not fully in the spirit of functional data analysis, however, and more appropriate methods might be found in the simulation and resampling of the observed functions.

Differential Equation Models

A central theme of FDA is the use of information in derivatives such as velocities and accelerations. This leads naturally to models for data in the form of differential equations. A differential equation expresses a relationship that holds between a function and one or more of its derivatives. For example, the function x(t) = ekt can also be expressed as the equation Dx(t) = kx(t), where the notation “Dx(t)” means “the velocity or rate of change in x at time t” or “the derivative of x at t.”

Differential equations are standard mathematical tools for physical and biological scientists and for engineers, but seldom appear in behavioural science literatures. Nevertheless, they offer some exciting possibilities as modeling tools. Methods for estimating differential equations, such as principal differential analysis (Ramsay, 1996), are a part of the FDA arsenal. See Ramsay (2000) for an example of the use of a differential equation to model some exceedingly complex handwriting data.
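The example Dx(t) = kx(t) can be verified numerically; the brief sketch below checks, by central differences, that x(t) = ekt satisfies the equation (the value of k is an arbitrary choice).

```python
import math

k = 0.3   # arbitrary rate constant

def x(t):
    return math.exp(k * t)       # x(t) = e^(kt)

def deriv(f, t, h=1e-6):
    """Central-difference estimate of the derivative Df(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

# At each test point, the rate of change equals k times the value:
# Dx(t) = k x(t).
for t in (0.0, 1.0, 2.5):
    assert abs(deriv(x, t) - k * x(t)) < 1e-6
print("Dx(t) = k x(t) verified numerically")
```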


Autocorrelation

Let fn, n = 0, 1, …, N – 1, be a periodic sequence; then the autocorrelation of the sequence is the sequence

ρk = ∑n fn fn+k, (8)

where the final subscript is taken modulo N. The autocorrelation is simply the correlation of the sequence against a time-shifted version of itself. For a function F, the autocorrelation RF(t) is defined as

RF(t) = ∫ F(s) F(s + t) ds. (9)
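The discrete autocorrelation translates directly into code; in this sketch the sequence is a hypothetical one, and the modulo operation supplies the periodic wraparound of the final subscript.

```python
def circular_autocorrelation(f):
    """Autocorrelation of a periodic sequence: the k-th term correlates f
    against itself shifted by k, with subscripts taken modulo N."""
    N = len(f)
    return [sum(f[n] * f[(n + k) % N] for n in range(N)) for k in range(N)]

f = [1.0, 2.0, 0.0, -1.0]            # hypothetical periodic sequence
rho = circular_autocorrelation(f)
print(rho)  # [6.0, 1.0, -4.0, 1.0] -- maximal at zero lag, symmetric
```

Note that the result is itself periodic and is always largest at zero lag, where the sequence is compared with an unshifted copy of itself.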

Techniques do not presently exist for accurately calculating the correlation between two functions that are themselves serially correlated and nonstationary (Box et al., 1994, but see Huitema & McKean, 1998 for a discussion of the loosening of restrictions in some cases). In many functional datasets, successive observations of the dependent variable are intercorrelated, as is often the case when the investigator is taking measurements of a continuous variable that, when changing from one value to another, must pass through intermediate values. Examples include neurophysiological measures such as heart and respiration rate, hormone levels or serum blood concentration levels of a substance, or the movement of a human-controlled input device (such as a slider). In data from brain imaging experiments, both the spatial extent and the level of the blood oxygenation level-dependent (BOLD) signal are intercorrelated: The likelihood that a given voxel will be activated in an experiment is dependent on whether an adjacent voxel is activated (spatial extent), and the amount of activation at a particular time t is dependent on the activation at time t – 1. Thus, in all these cases, a given observation is dependent on adjacent observations, violating the independence assumptions in classical MVA.

Alternative windowed cross-correlation methods for functional data that violate the conditions of stationarity are currently in development (by our laboratories and others). Key to our approach is the setting of boundary conditions, that is, computing a correlation in two different ways, one of which is known to overestimate and the other of which is known to underestimate the true degree of relation between two or more variables. Boker et al. (2002) introduced a windowed cross-correlation method that can compute correlations under such circumstances, and that does not assume stationarity. (If we let a vector represent the sequence of observations obtained in an experiment, a subset of contiguous observations sampled from that vector is often called a window.) With functional data, one might consider the minimum and maximum correlation values obtained from such moving window analyses as boundary conditions for the true value of the correlation between the variables of interest. Often with functional data, there are a priori reasons to choose a window of a certain size based on a consideration that it likely contains stable or stationary data due to the time course of the underlying process. Haemodynamic lag, for example, suggests that observations within a time period of 1.5 seconds or so can be windowed for such a cross-correlational analysis.
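A minimal sketch of the moving-window idea (not Boker et al.'s full method, which also searches over lags): compute the Pearson correlation inside every window and take the minimum and maximum as the boundary values. The two series below are hypothetical and deliberately nonstationary, correlated early and anticorrelated late.

```python
def pearson(a, b):
    """Pearson product-moment correlation of two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db)

def windowed_correlations(x, y, width):
    """Correlation of x and y inside every contiguous window of fixed width."""
    return [pearson(x[i:i + width], y[i:i + width])
            for i in range(len(x) - width + 1)]

# Hypothetical nonstationary pair: y tracks x early, mirrors it late.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1]

r = windowed_correlations(x, y, 4)
lo, hi = min(r), max(r)   # boundary conditions on the true correlation
print(round(lo, 6), round(hi, 6))  # -1.0 1.0
```

The window width here is arbitrary; as the text notes, in practice it would be chosen from knowledge of the underlying process, such as haemodynamic lag.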

Bootstrapping

As with conventional statistical analyses, bootstrapping (Efron, 1979; Efron & Tibshirani, 1993) can be an effective tool for providing estimates of the sampling distribution of an estimator by resampling with replacement from the original sample. Two articles on bootstrapping offer a good starting point for the interested reader. McKnight, McKean, and Huitema (2000) introduced a method based on Freedman (1984) for bootstrapping datasets with autoregressive properties. Bootstrapping was recently combined with FDA to evaluate the range of forecasts of sulfur dioxide levels near a power plant (Fernández de Castro, Guillas, & González Manteiga, 2005).

Practicalities

Functional data share characteristics with other types of data such as time series or multivariate datasets. Yet in real-world situations, data can rarely be classified exclusively as one type or another. Some datasets may be unequivocally functional. In these cases, for example, derivatives play an important role in the analyses; data are collected at a very fine resolution; the observed function is very smooth; and many replications of the process are available. Other datasets, when some of those characteristics are missing, may straddle the fence between “functional” and “multivariate” (or “time-series”). Still other datasets may be decidedly nonfunctional. So for most datasets, a variety of tools are available for analysis. Functional data analysis tools are best used when there are too many data points for a repeated measures ANOVA, for instance; when traditional methods ignore correlations between datapoints; or – and perhaps most importantly – when the research question focuses on precisely when in a time series (or where in a multidimensional psychophysical or Cartesian space) a significant result occurs, rather than collapsing across all the observations for a mean or summary difference between conditions. FDA methods are also most helpful when derivatives of the observed functions are important and physically meaningful. Note that the previously mentioned techniques of Boker and colleagues (2002) can also be used for these purposes.

Software for the analysis of functional data using standard FDA techniques, including those described here, is available for the R, S-PLUS, and Matlab programming languages. The functions with their documentation can be downloaded from the web site www.functionaldata.org. These languages are easily extensible for customized functions and more advanced techniques for researchers with some programming experience. New techniques and software are being developed and shared with other researchers. A background in introductory calculus and statistics is necessary for the basic use of these techniques; familiarity with differential equations, matrix algebra, and more advanced multivariate and nonparametric statistics would be helpful to better understand the concepts. Analysis of functional data using these techniques may require more time of the analyst than it would for multivariate data using standard techniques. Most of this extra time stems from the need to represent the discrete observations with functional curves, using the basis expansion techniques described earlier. This is not a fully automatic procedure; the analyst needs to use prior knowledge and expectations of the process to best represent the data with smooth curves. The rest of the analysis may follow along straightforward lines if, for example, a functional linear model or functional principal components analysis is all that is required. More complicated analyses may require additional programming and time from the analyst.

Case Study: Perceived Tension in Music Over Time

Vines et al. (2006) sought to understand how the physical gestures of professional musicians might contribute to the perception of emotion in a musical performance. Twenty-nine musically trained participants saw, heard, or both saw and heard performances of a Stravinsky concerto by a clarinetist. All participants performed the same task: a continuous judgment of experienced tension in the musical performance, made by moving a slider up and down to indicate minimum or maximum tension. The position of the slider was translated into a discrete value along a continuum from 0 to 127, and was sampled every 10 ms. This example illustrates the ways in which continuous measures (location and time) can be discretized for analysis purposes in FDA. The principal research question of interest was how similar the rated tension experience was for each sensory modality, as a way to understand the similarities and differences in musical tension conveyed by the different sense modalities. A subsidiary research question concerned where in the piece, if at all, the auditory versus visual modalities carried the most influence on the overall experience of tension by an audience member.

Figure 3. For the same four participants from Figure 1 in the audio-only condition, the functional objects obtained after expanding the scaled raw data with 6th-order B-splines as described in the text. Note the similarity to Figure 1, indicating the quality of the B-spline estimation.

The analyses in this section used FDA software implemented in Matlab. To begin, the raw data were appropriately scaled by transforming them to z scores, in order to account for differential use of the slider device. Each of the 30 observation records consisted of a vector of length 800, from data collected every 10 milliseconds for the duration of the 80-second performance (one participant had to be excluded due to equipment failure, resulting in a total n of 29). The values ranged from 0 to 127, depending on how much of the range of the slider each participant used. To account for differences in individual use of the slider, each vector was scaled by its median. That is, the median of a given vector was subtracted from each of the 800 observations in that vector. This procedure centres each participant’s ratings so that a score of zero refers to the middle-most tension felt by the participant during the performance. Then the vectors were scaled so that the maximum score corresponds to a value of 100 and the minimum a value of 0, so each participant’s ratings are comparable on an interval of [0, 100]. Note that this scaling does not equate the level of tension between two or more participants; rather, it allows us to compare the maximum and minimum tensions reported by participants during their experience of the performance.
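The median-centring and rescaling steps can be sketched directly; the slider values below are hypothetical, and the odd-length median shortcut is used for brevity.

```python
def rescale_ratings(v):
    """Median-centre a rating vector, then map it onto [0, 100] so that
    each participant's minimum and maximum reported tension are comparable."""
    med = sorted(v)[len(v) // 2]          # median shortcut for odd-length v
    centred = [x - med for x in v]
    lo, hi = min(centred), max(centred)
    return [100.0 * (x - lo) / (hi - lo) for x in centred]

raw = [10, 40, 64, 90, 127]               # hypothetical slider samples, 0-127
scaled = rescale_ratings(raw)
print(scaled[0], scaled[-1])  # 0.0 100.0
```

Each participant's minimum thus maps to 0 and maximum to 100, regardless of how much of the slider's physical range that participant used.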

Functional objects were then created from the discrete, scaled vectors. That is, each vector of observations was represented as a smooth curve. First, 150 Order 6 B-splines were used as a basis expansion for each curve. In a sense, this represented an initial reduction of data, since the 800 observations were now summarized using 150 basis coefficients. Order 6 splines were used because they ensure that the first and second derivatives of the resulting curves would be composed of Order 5 and Order 4 B-splines, respectively, and thus still be smooth (de Boor, 2001; Shikin & Plis, 1995). Then each curve was smoothed by a small amount to take out a bit of roughness.

The raw data were previously shown in Figure 1, and the functional objects created from them are now displayed in Figure 3. A noticeable alignment can be seen at about the 30-second mark in the Audio-Only group. This sudden rise and then drop in tension was seen less clearly in the Audio + Video and Video-Only groups (not shown). It is the goal of the analysis to find the times in the performance when the three groups differ most strongly and to quantify what form these differences take. To facilitate this next goal, mean functional curves across participants were created for each of the three conditions. These mean functional curves are shown in Figure 4. Among the first things we note in a visual inspection of the resulting figure is that the auditory-only judgments track quite closely to the auditory + visual judgments throughout most of the piece. Note also that the visual component diverges from the auditory and the auditory + visual at certain points in the piece, notably around Seconds 18 and 34. We will return to this shortly.

Functional Principal Components Analysis

As in a multivariate dataset, a good exploratory technique for these curves is a principal components analysis. In the functional case, the principal components are curves instead of vectors; however, the interpretation is very similar: The resulting principal component functions highlight the directions in which the dataset most varies. We performed a functional principal components analysis (fPCA) on the entire dataset to explore the modes of variation in all the participants. The principal component functions were smoothed slightly to aid in interpretation. Then the first three principal components were rotated using the VARIMAX criterion, three having been chosen here, as in multivariate analysis, on the basis of a satisfactory amount of variance having been explained and the components being interpretable. This procedure attempts to transform the principal component curves so that their values are either shrunk to zero or strongly positive and negative. This aids in the interpretation of the curves and does not affect the orthogonality of the components. The following graphical tool is helpful to visualize the effects of the principal components: The mean curve for the dataset is first plotted; then a small multiple of each principal component is added to and subtracted from this mean curve and plotted on the same axes. Thus we can more clearly see the times in the performance where the participants feel strongly differing tensions.

Figure 4. Functional mean curves (across participants) for the three conditions of the experiment.

CP 48-3 9/18/07 11:17 AM Page 147

148 Levitin, Nuzzo, Vines, and Ramsay

Figure 5. Effects of the rotated functional principal components (fPC) on tension judgments. In all three panels, the mean of participants’ tension judgments is shown as the black line; the purple line shows the mean plus the principal component; the yellow line shows the mean minus the principal component. Top panel: The first component, accounting for 48% of the variance, indexes the degree to which a subject’s judgments are expanded or compressed (amplified or attenuated) from the mean. Middle panel: The second component, accounting for 15% of the variance, indexes “start” effects, that is, the degree to which participants’ judgments started out high or low. Bottom panel: The third component, accounting for 14% of the variance, indexes edge effects, that is, the degree to which the beginning and ending of the piece are seen as more or less extreme than the middle of the piece.
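As an illustration of the idea (not the authors' Matlab code), functional PCA can be sketched by sampling each smoothed curve on a common grid and taking the SVD of the centred curve matrix; the right singular vectors are then the principal component functions, and the mean ± a multiple of a component reproduces the graphical tool just described. The synthetic curves below are our own invention, and the VARIMAX rotation step is omitted.

```python
import numpy as np

# Toy functional dataset: 29 tension curves on a common 800-point grid,
# varying only in how strongly they express one shared "tension shape".
rng = np.random.default_rng(2)
grid = np.linspace(0.0, 80.0, 800)
base = 50.0 + 30.0 * np.sin(grid / 8.0)
scores = rng.standard_normal(29)
curves = base + scores[:, None] * (10.0 * np.sin(grid / 8.0))

mean_curve = curves.mean(axis=0)
centred = curves - mean_curve

# SVD of the centred curve matrix: rows of Vt are principal component
# *functions*; squared singular values give the variance each explains.
U, s, Vt = np.linalg.svd(centred, full_matrices=False)
pc1 = Vt[0]
var_explained = s ** 2 / np.sum(s ** 2)

# The graphical tool from the text: mean curve +/- a small multiple of a PC.
c = 2.0 * s[0] / np.sqrt(curves.shape[0])
upper = mean_curve + c * pc1
lower = mean_curve - c * pc1
```

Because the toy data vary along a single mode, the first component absorbs essentially all of the variance; with real ratings, several components would share it, as in Figure 5.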

The effect of the first functional principal component is displayed in the top panel of Figure 5 alongside the mean of participants’ judgments. The principal components in FDA are curves. We can see that the effect of the first extracted component is that of attenuation or amplification – we might label this component “compression/expansion,” referring to the strength of an individual’s reactions to the underlying tension events. Participants with a high score on this component (purple line) experience an amplified effect as compared to the mean effect: a strong drop in tension at about Second 35. These participants also report a sharp increase in tension at about Second 65 after a period of low tension for 30 seconds. On the other hand, participants with a low score on this component (yellow line) report a more attenuated experience: The drop in tension at Second 30 is not as noticeable, and the tension remains high for the next 35 seconds and beyond.

The effect of the second principal component is displayed in the middle panel of Figure 5. It appears that this component identifies the “start effects” of the participants’ experiences. A high score on this component indicates a participant who reports a high level of tension in the beginning of the piece but then experiences typical tension throughout the rest. A participant with a low score on this component starts with a level of tension lower than average.

The effect of the third principal component is displayed in the bottom panel of Figure 5. This appears to identify the “edge effects” of the ratings. Participants with a high score on this component will begin with a high rating of tension, immediately compensate with a lower rating, and then conclude the piece with a high rating again. Those with a low score will begin with a low rating and also conclude with a low rating.

Functional Linear Model

A linear model of the tension ratings was built from the functional data to better quantify the effects suggested by the principal components analysis. Recall that in the musical tension study, participants were assigned to one of three conditions in a between-participants design: Audio + Video (the natural condition, in which the performance was both heard and seen), Audio Only (the sound was on but the image was turned off), and Visual Only (the sound was off but the image was turned on). The predictors were scalars: dummy variables indicating the group (or condition) membership of the participant producing each tension rating. Since the responses were functional objects, the resulting regression coefficients were also curves rather than scalars, and they allow us to determine where in the performance the experimental condition most affects the tension ratings. The coefficient curves were smoothed slightly to aid in interpretation.

The linear model can be written as

yi(t) = β0(t) + β1(t) ARi + β2(t) VRi + εi(t). (10)

Here the dummy variables AR and VR are such that ARi = 1 if the ith participant experienced the performance with the audio removed (Video Only), and 0 otherwise, and VRi = 1 if the ith participant experienced the performance with the video removed (Audio Only), and 0 otherwise. So an average tension rating for the participants in the Audio-Only group would be β0(t) + β2(t); for those in the Video-Only group it would be β0(t) + β1(t); and for those in the Audio + Video group it would be β0(t). Notice that this linear model is built with the baseline effect being the presence of both audio and video rather than the absence of both. Furthermore, the coefficients refer to the effects of the removal of sensory inputs rather than their presence. This is because the condition of no audio and no video could not be measured easily, so the standard of reference must be the “natural” condition of Audio + Video, with changes to this condition measured as subtractions rather than additions.
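A pointwise NumPy sketch of this kind of model (the simulated curves and effect shapes are invented for illustration; the indicator coding sets AR_i = 1 when audio is removed and VR_i = 1 when video is removed, so the Audio + Video group is the baseline):

```python
import numpy as np

rng = np.random.default_rng(3)
grid = np.linspace(0.0, 80.0, 800)
n = 10                                             # participants per condition

# Indicator coding with Audio + Video as baseline: AR_i = 1 when the audio
# was removed (Video Only), VR_i = 1 when the video was removed (Audio Only).
AR = np.r_[np.zeros(n), np.zeros(n), np.ones(n)]   # AV, AO, VO groups
VR = np.r_[np.zeros(n), np.ones(n), np.zeros(n)]
X = np.column_stack([np.ones(3 * n), AR, VR])      # [intercept, AR, VR]

# Simulated tension curves (effect shapes are our own, for illustration).
beta0 = 50.0 + 20.0 * np.sin(grid / 8.0)           # AV baseline curve
beta1 = -5.0 * np.cos(grid / 10.0)                 # effect of removing audio
beta2 = 2.0 * np.sin(grid / 20.0)                  # effect of removing video
Y = X @ np.vstack([beta0, beta1, beta2]) + rng.standard_normal((3 * n, 800))

# Ordinary least squares solved at every time point at once: each row of
# beta_hat is a regression *coefficient curve*, as in Equation (10).
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
```

In practice the fit is done on the functional objects and the coefficient curves are then lightly smoothed, but the pointwise view conveys why the coefficients come out as curves.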

The coefficient curves for this linear model are plotted in Figure 6. The baseline curve of the natural condition (blue line) shows a moderate drop in tension at about Second 30, as expected, with a subsequent increase in tension at Second 65 and beyond. The “video removed” coefficient (red line) shows the effect of removing the video component of the performance. As with a scalar regression coefficient, values near zero indicate a smaller effect of the predictor, and strongly positive or negative values point to a larger effect. From this we can see that, compared to the “nothing removed” (Audio + Video) condition, the “video removed” (Audio Only) group does not experience much difference in tension from the beginning of the performance until about Second 30. From this point until the end of the piece, the effect of removing the video is to decrease the tension as compared to the baseline group. This effect is moderately small, meaning that the Audio-Only group does not differ greatly from the Audio + Video group throughout the piece. On the other hand, the coefficient for the “audio removed” (Visual-Only) component (green line) shows a stronger effect. Areas of greater change in tension are seen specifically at about Second 5, Second 10, Second 20, Second 30, and especially from Second 35 to Second 65. A sharp decrease in tension at Second 30, reported by participants experiencing the audio component, was not seen by participants in the Video-Only group. And their subsequent experience from Second 35 to Second 65 was not one of substantially lowered tension, as it was for the Audio-Only and Audio + Video groups.

Figure 6. Coefficient curves for the functional linear model of tension. These curves allow one to visualize the effects of removing one of the sensory channels (audio or video) from the “natural” condition of audio + video.

Figure 7. The fitted curves from the linear model for the three experimental groups, averaged across participants.

Fitted values from the linear model for the three experimental groups are plotted in Figure 7. Here the net effect of the regression coefficient curves can be more clearly seen. The Audio-Only group (red line) showed judgments that were quite similar to those of the Audio + Video group (green line) for the duration of the performance, differing only in intensity at the peak at Second 30 and during the periods of lull and closure from Second 35 to Second 75.

The Video-Only group (purple line), on the other hand, shows a markedly different pattern of tension throughout the piece. Most noticeable are the attenuated peak at Second 30 and the report of average tension from Second 35 to Second 65. During this latter span of 30 seconds, the visual signal is evidently responsible for higher tension ratings than the audio or audio + visual. This led the experimenters to re-examine the original performance in an effort to understand why this might be so. The point at which the visual information first breaks apart most clearly from the other channels (around Second 32) coincides with a long rest in the piece. The performer draws the clarinet away from his mouth, relaxes his entire body for a second or so, and then takes a deep breath in preparation for playing again. There is no evidence of this activity in the audio-only channel, but in the visual channel it is quite obvious that the performer is making preparatory gestures in anticipation of playing. This gesture serves as a cue to the viewer that something is imminent, and may underlie the ratings of increased tension at this point. Why the ratings in the visual condition stay uniquely high for the next 30 seconds – or why the audio + video participants did not respond similarly to this visual cue – remain topics for further investigation.

Functional Analysis of Variance (fANOVA)

A functional analysis of variance (fANOVA) can be performed on the regression curves derived in the previous section. Here, for the sake of simplicity, we have chosen to conduct a one-way fANOVA with presentation condition as the between-participants factor, and we have analyzed two conditions: the audio-video (AV) and the audio-only (AO). H0 is that these two curves are the same; H1 is that they are different. Our research question asks whether the addition of visual information to the auditory information makes a significant impact on the experience of participants, and if so, how much? Perhaps most interestingly, we are now in a position to ask when in the performance these differences are significant.

Referring to Figure 8, the y-axis shows values of the F function, and the dashed line represents the .05 level of significance, at F = 4.4 (for df = 2,18). FDA tools calculated this F-value at every point in time; when the F-value is above 4.4, the difference between the AV and the AO groups is significant at .05. (At present, such tools are rather primitive and do not take into account family-wise error rates, nor do they have an easy means of determining power. These issues are under development.) The presence or absence of visual stimuli differentiates the experience for the two subject groups, so the visual component (or its interaction with the sound) is assumed responsible for any differences between the curves. We see clearly that there is a significant section lasting from 51.3 seconds to 53.8 seconds. The researchers closely examined the frames of the video corresponding to this section and discovered that the music and the performer’s gestures are in some sense at odds with one another, which may account for the statistically significant difference between AV and AO here. Musically, the piece becomes very soft. Visually, the performer takes on an intense facial expression with his eyebrows raised, and his body moves in a rising motion even beyond the sounding of notes, during rests. All of this visual activity contrasts with the sonic content, which conveys a sense of total calm and quiet. The performer’s body movements and facial expressions clearly indicate that he had a highly emotional experience during this portion of the performance. This high level of emotion was conveyed to the AV participants via the visual modality. The affective response of the AV group, as indicated by the tension judgments, maintained a significantly greater magnitude than that of the AO group because of the visual content in the performance. This is a subset of the period of time during which participants in the visual-only condition were also showing high levels of tension.

Figure 8. Comparing means for the Audio + Visual and Audio-Only groups. The dashed line represents the critical F-value for the hypothesis that the two group means differ, with alpha = .05. When the F-value rises above the dashed line, the difference between groups is significant.
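The pointwise F function underlying this kind of comparison can be sketched as follows (synthetic two-group data of our own, with a difference injected between Seconds 50 and 55; real FDA software computes the test from the fitted functional model rather than from raw grids):

```python
import numpy as np

rng = np.random.default_rng(4)
grid = np.linspace(0.0, 80.0, 800)

# Two synthetic groups of tension curves; the AV group is pushed upward
# between Second 50 and Second 55 to create a localized difference.
shape = 50.0 + 20.0 * np.sin(grid / 8.0)
av = shape + rng.standard_normal((10, 800)) + 5.0 * ((grid > 50) & (grid < 55))
ao = shape + rng.standard_normal((10, 800))

# Pointwise one-way ANOVA: the F statistic computed at every time point.
groups = [av, ao]
grand = np.vstack(groups).mean(axis=0)
sizes = [g.shape[0] for g in groups]
between = sum(m * (g.mean(axis=0) - grand) ** 2 for m, g in zip(sizes, groups))
within = sum(((g - g.mean(axis=0)) ** 2).sum(axis=0) for g in groups)
df1, df2 = len(groups) - 1, sum(sizes) - len(groups)
F = (between / df1) / (within / df2)               # an F *function* over time
```

Plotting `F` against `grid` with a horizontal line at the critical value reproduces the style of Figure 8: the curve rises above the line only where the groups genuinely differ.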

Summary of Findings

The auditory and visual components of this particular performance conveyed some of the same experience of tension during particular portions of the piece. It has long been suggested in theories of art and aesthetics that the gestures made by an artist reflect emotional intent, even when the gestures themselves are not the primary object of evaluation by the perceiver. For example, in Japanese ink paintings, the gestures used by the painter and the resulting brush strokes are believed to convey the emotional state of the artist at the precise moment of the gestures (Davey, 1999). The data here provide evidence that the gestures and movements of a performing musician – even in the absence of sound – convey some of the same information as the sound itself in many cases. In other cases, as we have seen, the visual channel provides complementary and different information than the audio channel. Much remains to be studied in this domain. We have included this example both because of its novelty and because it illustrates that certain interpretations of a dataset are made easier with FDA.

Other Examples

There are many other examples of functional data to be found in psychology, and many of the techniques just described would be applicable in these cases as well. Many studies in psychology, electrical engineering, and linguistics concern temporal synchrony/asynchrony. For example, Cummins (2003) studied speech synchronization: Native speakers of English were asked to simultaneously read an unfamiliar passage of text, and the degree to which their utterances were temporally synchronized was studied. Of interest was whether or not the participants improved with practice. Replications existed at the level of participants (n = 27) and at the level of trials (each participant read five text excerpts). For each paired reading (n = 27 × 5 = 135), the median asynchrony, in milliseconds, was computed at three temporal positions: the onset, medial, and final vowel onsets. FDA would permit the researcher to construct a synchrony curve for each participant pair – or for the participants all taken together – and to analyze the first versus last trial, or the five trials together, without having to conduct tests at only three time points. FDA thus allows the researcher to use all of the information available – not just the information at certain time points. This could reveal any latent information about when during the trials participants were maximally or minimally synchronized, and an analysis of derivatives could reveal information about the manner in which participants “sped up” or “slowed down” in order to achieve synchrony. These micro-variations in time synchrony could inform cognitive theories of time-keeping and central clock mechanisms.

In studies of motor planning and coordination, tapping tasks are often employed (Rosenbaum, Kenny, & Derr, 1983; Schulze, Lüders, & Jäncke, 2002). Participants are asked to tap their fingers either to a presented sequence, or at their own “preferred tapping rate.” Again, FDA allows the researcher to move beyond summary statistics that collapse over the time period of the experiment, to more closely analyze effects as a function of time, as well as the derivatives of any timing and motor responses obtained.
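For instance (a toy example of our own, not taken from the cited studies), once a tapping record is treated as a function of time, its derivatives can be examined directly:

```python
import numpy as np

# A hypothetical preferred-tapping-rate record: the participant slows
# steadily from 2.0 to 1.4 taps/s over a 30-second trial.
t = np.linspace(0.0, 30.0, 300)           # 100-ms sampling grid
rate = 2.0 - 0.02 * t                      # taps per second

# First derivative: the rate of "slowing down" (constant by construction
# here); with real data one differentiates the smoothed functional object.
d_rate = np.gradient(rate, t)
d2_rate = np.gradient(d_rate, t)           # acceleration of the tapping rate
```

The derivative curves, not just the raw rates, become objects of analysis: a nonzero second derivative, for example, would indicate that the participant's drift in tempo was itself speeding up or slowing down.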

Evoked Response Potentials (ERPs) are collected by placing electrodes on the surface of the scalp, and electrical activity is measured in response to various stimuli or tasks the subject is asked to complete. In such studies, the ERP signal is typically collected from 32, 64, or 128 channels distributed around the skull, and it is of interest to understand how the ERP signal varies across time and across sites. Somewhat similarly, positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) track the blood-oxygen-level-dependent signal in the brain, and reveal brain regions and neural networks that are activated during particular cognitive tasks. These studies produce millions of datapoints every few seconds, and the points are correlated both in space (a group of neurons is likely to be active if an adjacent group of neurons is) and in time (neurons active at any time t are likely to still be active at t + 1). These data thus consist of highly intercorrelated observations, and are most often analyzed using statistical methods that attempt to correct for multiple comparisons and such intercorrelations (e.g., Friston et al., 1995; http://www.fil.ion.ucl.ac.uk/spm/; http://www.mrc-cbu.cam.ac.uk/Imaging/Common/). FDA techniques are beginning to be applied to neuroimaging data (Beckmann & Smith, 2004; Valdes-Sosa, 2004; Viviani, Grön, & Spitzer, 2003).

Other recent applications of FDA include the analysis of esophageal bolus flow (Stier, Stein, Schwaiger, & Heidecke, 2004) and the analysis of gene expression arrays (Barra, 2004).

Conclusions

We described a number of areas in behavioural and life sciences research in which functional data may be found, and where the tools we have outlined here could potentially reveal relations not as easily seen with conventional statistical approaches. In many cases, FDA will allow the researcher to ask questions that either could not be asked, or would be computationally cumbersome to ask, using traditional statistical methods. In this respect, FDA represents both novelty in analysis and parsimony in execution. Moreover, functional models facilitate a visual representation of the data, and thus provide explanatory power.

We illustrated key aspects of FDA using real data from a study of musical emotion. Note that in this example, as in most functional datasets, several key features were present:

• The data were drawn from a continuous measure (the position of a slider that theoretically varied continuously both over its range and over time);

• We can assume that the process generating the data is a smooth, continuous one;

• The dataset contained replications (10 participants in each of three conditions);

• The curves are differentiable, and the derivatives are a potential source of additional information about the nature of the data; and

• Each functional data curve constitutes an observation (rather than the individual points that make up the curve being considered observations).

We showed that an advantage of employing FDA for datasets such as this one is that we can exploit these features to reveal information not otherwise attainable in the data. In particular, amplitude and phase variability in the curves can be examined or eliminated (depending on the researcher’s hypothesis), curve registration allows us to view and analyze the underlying process of interest, and derivatives of the curves can be analyzed to reveal dynamic aspects of the process. In addition, FDA contains methods for constructing correlations and principal components that are themselves curves, and thus allow us to track when in the time series (or where in a spatial series) any significant differences between and among curves exist. In constructing functional F-tests, we can avoid the problems of multiple comparisons inherent in treating each point on the curve as an observation, by creating F-function curves that relate directly to the dynamic character of the data.

FDA comprises a set of rapidly evolving tools. A good deal of this research energy goes into refining existing methods such as the functional linear model and principal components analysis, but we tend to see two problems specific to FDA that are attracting particular attention. The first of these is the simultaneous modeling of amplitude and phase variation, as opposed to the procedure described in this paper of using registration methods to remove phase variation before modeling amplitude variation. This integrated approach seems important because amplitude and phase can interact in interesting ways, and also because independent variables that affect amplitude variation can affect phase as well. This is obviously the case for the effect of gender on growth curves, for example. A second general trend is toward using dynamic models, often expressed in terms of systems of differential or difference equations, to study rates of change directly. A good sample of current work of this nature can be found in Walls and Schafer (2006). We have provided links to a web site, which is constantly updated, so that the reader can access software tools to perform his or her own functional data analyses.

This paper was supported by grants from NSERC to DJL and JOR.

Correspondence concerning this article should be addressed to Daniel J. Levitin, Department of Psychology, McGill University, 1205 Avenue Penfield, Montréal, Québec H3A 1B1 (Phone: (514) 398-8263; Fax: (514) 398-4896; E-mail: [email protected]).


Résumé

Psychologists and behavioural scientists are increasingly collecting data drawn from continuous underlying processes. In this article, we describe a set of quantitative methods, functional data analysis, that can answer many questions that traditional statistical approaches cannot. These methods are useful for analyzing data common in experimental psychology, notably time series data, repeated measures, and data distributed over time or space, as is the case for neuroimaging experiments. The principal advantage of functional data analysis is that it allows the researcher to examine, when analyzing time series, when differences may exist between two or more sets of observations. Our discussion covers functional correlations, principal components, derivatives of functional curves, and the analysis of variance models.

References

Barra, V. (2004). Analysis of gene expression data using functional principal components. Computer Methods and Programs in Biomedicine, 75, 1-9.

Beckmann, C. F., & Smith, S. M. (2004). Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Transactions on Medical Imaging, 23, 137-152.

Boker, S. M., Xu, M., Rotondo, J. L., & King, K. (2002). Windowed cross-correlation and peak picking for the analysis of variability in the association between behavioral time series. Psychological Methods, 7(1), 338-355.

Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. (1994). Time series analysis: Forecasting and control. Englewood Cliffs, NJ: Prentice-Hall.

Cleveland, W. S. (1993). Visualizing data. Murray Hill, NJ: AT&T Bell Laboratories; Summit, NJ: Hobart Press.

Cudeck, R. (1996). Mixed effects models in the study of individual differences with repeated measures data. Multivariate Behavioral Research, 31(3), 371-403.

Cummins, F. (2003). Practice and performance in speech produced synchronously. Journal of Phonetics, 31, 139-148.

Davey, H. E. (1999). Brush meditation: A Japanese way to mind & body harmony. St. Paul, MN: Stone Bridge Press.

De Boor, C. (2001). A practical guide to splines (rev. ed.). Berlin: Springer-Verlag.

Efron, B. (1979). Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7, 1-26.

Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York: Chapman and Hall.

Fan, J., & Gijbels, I. (1996). Local polynomial modeling and its applications. London: Chapman and Hall.

Fernández de Castro, B., Guillas, S., & Gonzáles Manteiga, W. (2005). Functional samples and bootstrap for predicting sulfur dioxide levels. Technometrics, 47(2), 212-222.

Freedman, D. (1984). On bootstrapping two-stage least squares estimates in regression problems. Annals of Statistics, 12, 827-842.

Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J. B., Frith, C. D., & Frackowiak, R. S. J. (1995). Statistical parametric maps in functional imaging: A general linear approach. Human Brain Mapping, 2, 189-210.

Gauss, C. F. (1809/1963). Theory of the motion of the heavenly bodies moving about the sun in conic sections. New York: Dover Publications.

Green, P. J., & Silverman, B. W. (1994). Nonparametric regression and generalized linear models. London: Chapman and Hall.

Guo, W. (2002). Functional mixed effects models. Biometrics, 58, 121-128.

Guo, W. (2003). Functional data analysis in longitudinal settings using smoothing splines. Statistical Methods in Medical Research, 13, 1-14.

Hendry, D., & Juselius, K. (2000). Explaining cointegration analysis: Part I. Energy Journal, 21, 1-42.

Huitema, B. E., & McKean, J. W. (1998). Irrelevant autocorrelation in least-squares intervention models. Psychological Methods, 3(1), 104-116.

Kneip, A., Li, X., MacGibbon, B., & Ramsay, J. (2000). Curve registration by local regression. Canadian Journal of Statistics, 28, 19-29.

Koenker, R., & Mizera, I. (2004). Penalized triograms: Total variation regularization for bivariate smoothing. Journal of the Royal Statistical Society, Series B, 66(1), 145-163.

Krumhansl, C. L. (1990). Cognitive foundations of musical pitch. New York: Oxford University Press.

McAdams, S., Winsberg, S., Donnadieu, S., & De Soete, G. (1995). Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychological Research, 58, 177-192.

McKnight, S. D., McKean, J. W., & Huitema, B. E. (2000). A double bootstrap method to analyze linear models with autoregressive error terms. Psychological Methods, 5(1), 87-101.

Nuzzo, R. L. (2002). Stochastic models for lipid biochemistry. Unpublished doctoral dissertation, Stanford University, Palo Alto, CA. (DAI-B, 64/03, UMI #AAT 3085218).

Nyquist, H. (1928). Certain topics in telegraph transmission theory. Transactions of the American Institute of Electrical Engineers, 47, 617-644.

Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-PLUS. New York: Springer.

Plomp, R. (1970). Timbre as a multidimensional attribute of complex tones. In R. Plomp & G. F. Smoorenburg (Eds.), Frequency analysis and periodicity detection in hearing (pp. 397-414). Leiden: Sijthoff.

Ramsay, J. O. (1995). A similarity-based smoothing approach to nondimensional item analysis. Psychometrika, 60, 323-339.

Ramsay, J. O. (1996). Principal differential analysis: Data reduction by differential operators. Journal of the Royal Statistical Society, Series B, 58, 495-508.

Ramsay, J. O. (2000). Functional components of variation in handwriting. Journal of the American Statistical Association, 95, 9-15.

Ramsay, J. O. (2002). Multilevel modeling of longitudinal and functional data. In D. S. Moskowitz & S. L. Hershberger (Eds.), Modeling intraindividual variability with repeated measures data. Mahwah, NJ: Lawrence Erlbaum Associates.

Ramsay, J., & Dalzell, C. (1991). Some tools for functional data analysis (with discussion). Journal of the Royal Statistical Society, Series B, 53, 539-572.

Ramsay, J. O., & Li, X. (1998). Curve registration. Journal of the Royal Statistical Society, Series B, 60, 351-363.

Ramsay, J. O., & Silverman, B. W. (2002). Applied functional data analysis: Methods and case studies. New York: Springer-Verlag.

Ramsay, J. O., & Silverman, B. W. (2005). Functional data analysis (2nd ed.). New York: Springer-Verlag.

Rosenbaum, D. A., Kenny, S., & Derr, M. A. (1983). Hierarchical control of rapid movement sequences. Journal of Experimental Psychology: Human Perception and Performance, 9, 86-102.

Schulze, K., Lüders, E., & Jäncke, L. (2002). Intermanual transfer in a simple motor task. Cortex, 38, 805-815.

Shannon, C. E. (1949). Communication in the presence of noise. Proceedings of the Institute of Radio Engineers, 37, 10-21.

Shao, X. M., & Chen, P. X. (1987). Normalized auto- and cross-covariance functions for neuronal spike train analysis. International Journal of Neuroscience, 34, 85-95.

Shepard, R. N. (1972). Psychological representation of speech sounds. In E. E. David & P. B. Denes (Eds.), Human communication: A unified view. New York: McGraw-Hill.

Shepard, R. N. (1982). Structural representations of musical pitch. In D. Deutsch (Ed.), Psychology of music (pp. 343-390). San Diego, CA: Academic Press.

Shikin, E. V., & Plis, A. I. (1995). Handbook on splines for the user. Boca Raton, FL: CRC Press.

Simonoff, J. S. (1996). Smoothing methods in statistics. New York: Springer.

Sinnott, J. M., Brown, C. H., Malik, W. T., & Kressley, R. A. (1997). A multidimensional scaling analysis of vowel discrimination in humans and monkeys. Perception & Psychophysics, 59, 1214-1224.

Stier, A. W., Stein, H. J., Schwaiger, M., & Heidecke, C. D. (2004). Modeling of esophageal bolus flow by functional data analysis of scintigrams. Diseases of the Esophagus, 17, 51-57.

Tufte, E. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.

Tukey, J. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.

Unser, M. (1999). Splines: A perfect fit for signal and image processing. IEEE Signal Processing Magazine, 16, 22-38.

Valdes-Sosa, P. A. (2004). Spatio-temporal autoregressive models defined over brain manifolds. Neuroinformatics, 2, 239-250.

Vidakovic, B., & Ruggeri, F. (2001). BAMS method: Theory and simulations. The Indian Journal of Statistics, Special Issue on Wavelet Methods, 63, 234-249.

Vines, B. W., Nuzzo, R. L., Krumhansl, C. L., Wanderley, M. M., & Levitin, D. J. (2006). Crossmodal interactions in the perception of musical affect and structure: The impact of seeing a clarinetist perform. Cognition, 101, 80-113.

Vines, B. W., Nuzzo, R. L., & Levitin, D. J. (2005). Analyzing temporal dynamics in music: Differential calculus, physics and functional data analysis techniques. Music Perception, 23(2), 137-152.

Viviani, R., Grön, G., & Spitzer, M. (2003, June). Functional principal component analysis as an explorative tool for fMRI data. Poster #970 presented at the Human Brain Mapping conference, New York.

Walls, T. A., & Schafer, J. L. (Eds.). (2006). Models for intensive longitudinal data. Oxford: Oxford University Press.
