
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 18, NO. 10, OCTOBER 2008 1397

Bayesian Tensor Approach for 3-D Face Modeling

Dacheng Tao, Member, IEEE, Mingli Song, Member, IEEE, Xuelong Li, Senior Member, IEEE, Jialie Shen, Jimeng Sun, Xindong Wu, Senior Member, IEEE, Christos Faloutsos, and Stephen J. Maybank, Senior Member, IEEE

Abstract—Effectively modeling a collection of three-dimensional (3-D) faces is an important task in various applications, especially facial expression-driven ones, e.g., expression generation, retargeting, and synthesis. These 3-D faces naturally form a set of second-order tensors—one modality for identity and the other for expression. The number of these second-order tensors is three times that of the vertices used for 3-D face modeling. As for algorithms, Bayesian data modeling, which is a natural data analysis tool, has been widely applied with great success; however, it works only for vector data. There is therefore a gap between tensor-based representation and vector-based data analysis tools. To bridge this gap and generalize conventional statistical tools to tensors, this paper proposes a decoupled probabilistic algorithm named Bayesian tensor analysis (BTA). Theoretically, BTA can automatically determine a suitable dimensionality for each modality of the tensor data. With BTA, a collection of 3-D faces can be modeled well. Empirical studies on expression retargeting also confirm the advantages of BTA.

Index Terms—Bayesian inference, Bayesian tensor analysis, face expression synthesis, 3-D face.

I. INTRODUCTION

THREE-DIMENSIONAL (3-D) facial expression-driven applications have become increasingly popular nowadays. This is mainly because of a wide range of requirements in film production and many challenges in producing realistic animated 3-D faces with different expressions. The objective

Manuscript received June 20, 2007; revised September 19, 2007 and November 15, 2007. First published September 26, 2008; current version published October 24, 2008. This material is based upon work supported by the National Natural Science Foundation of China under Grant 60703037 and by the National Science Foundation under Grant IIS-0705359. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Natural Science Foundation of China or the National Science Foundation. This paper was recommended by Associate Editor D. Schonfeld.

D. Tao is with the School of Computer Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798 (e-mail: [email protected]).

M. Song is with the Microsoft Visual Perception Laboratory, College of Computer Science, Zhejiang University, Hangzhou 310027, China (e-mail: [email protected]).

X. Li and S. J. Maybank are with the School of Computer Science and Information Systems, Birkbeck College, University of London, London WC1E 7HX, U.K. (e-mail: [email protected]; [email protected]).

J. Shen is with the School of Information Systems, Singapore Management University, Singapore (e-mail: [email protected]).

J. Sun is with the IBM T.J. Watson Laboratory, Hawthorne, NY 10523 USA (e-mail: [email protected]).

X. Wu is with the Department of Computer Science, University of Vermont, Burlington, VT 05405 USA (e-mail: [email protected]).

C. Faloutsos is with the Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213-3891 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2008.2002825

of these types of applications is to generate expressive and plausible animations of a 3-D face, e.g., in point cloud format, based on a number of 3-D training faces and specific application requirements. For example, in 3-D facial expression retargeting, apart from a number of 3-D training faces, users must also provide two source 3-D faces in two different expressions, e.g., neutral and smiling, and a target 3-D face in a specific expression, e.g., neutral. The objective of retargeting is to map the second expression (here, the smile) of the source 3-D face onto the target 3-D face based on all of the provided 3-D faces.

Since the 1970s, a great deal of work has been done in the field of 3-D facial animation; a good summary can be found in [19]. Beginning with Parke’s work [17], [18], researchers have tried to obtain realistic 3-D facial deformation with parameterized models. Meanwhile, performance-driven methods [20] have achieved results of good quality. However, neither approach is fully satisfying. The former tries to simulate the physical deformation of a face but fails to mimic details, e.g., wrinkles. The latter requires a large amount of human interaction. Moreover, both are specific rather than generic. Therefore, the facial action coding system (FACS) [7] and facial animation parameters (FAPs) [9] were developed to define more generic facial animation parameters, which can be applied to synthesize facial expressions flexibly. However, they fail to give subtle expression details on the target face because the input vertices are required to be sparse. For high-resolution data, Noh and Neumann [15] applied the source expression to the target face by considering the transformation of each vertex; however, this does not work well when the motions of neighboring vertices are inconsistent. Sumner et al. [21] transferred the deformation from the source to the target based on 3-D triangle meshes, but this works only on triangle meshes. Fu and Zheng [8] developed an appearance-based photorealistic model for face rendering, and Zeng et al. [38] proposed spontaneous emotional facial expression detection. Recently, Vlasic et al. [29] applied multilinear algebra to model a collection of 3-D face data in point cloud format with different identities and expressions. This method performs facial expression transfer flexibly.

In this paper, we start from Bayesian tensor analysis to model a collection of 3-D faces from a different point of view than the multilinear model for face transfer [29], which does not provide a full probabilistic tensor explanation. In detail, Vlasic et al. [29] applied probabilistic principal component analysis (PPCA) to form two projection matrices for two modalities (one for identity and the other for expression) independently over a collection of vectors obtained by unfolding a third-order tensor. The third-order tensor is constructed from a collection of 3-D face data with different identities and different expressions. The form of the data is a tensor, but the modeling methodology is the vector-based PPCA. Different from that work, we study tensor-form data directly and construct a new model for unsupervised learning. Not only does the model work for 3-D face modeling, but it can also be applied in computer vision, machine learning, and data mining. The only requirement is that the data are in tensor form, which frequently happens naturally, e.g., gray-level face images in biometrics, color images in scene classification, color video shots in multimedia information management, TCP flow records in computer networks, and the DBLP bibliography in data mining.

1051-8215/$25.00 © 2008 IEEE

In recent years, a large number of learning models have been developed for tensors, from unsupervised to supervised learning, e.g., high-order singular value decomposition [12], [1], [2], n-mode component analysis or tensor principal component analysis (TPCA) [12], [28], [22], three-mode data principal component analysis [11], Tucker decomposition [27], TensorFace [28], generalized low-rank approximations of matrices [37], 2-D linear discriminant analysis [36], dynamic tensor analysis [22], general tensor discriminant analysis [23], concurrent subspaces analysis [30], rank-one tensor decompositions [25], [31], tensor embedding methods [6], [25], tensor subspace analysis [10], multilinear discriminant analysis [35], and graph tensor embedding [34].

However, all of the above tensor-based learning models lack systematic justification at the probabilistic level. Therefore, it is impossible to conduct conventional statistical tasks, e.g., dimensionality determination, with existing tensor-based learning models. The difficulty of applying probability theory, Bayesian inference, and probabilistic graphical modeling to tensor-based learning models comes from the gap between the tensor-based datum representation and the vector-based input required by traditional probabilistic models. Is it possible to apply probability theory and to develop probabilistic graphical models for tensors? Is it possible to apply Bayesian inference to specific tensor-based learning models?

To answer these questions, we narrow our focus to generalizing Bayesian principal component analysis (BPCA) [4], which is an important generative model [3] for subspace selection or dimensionality reduction for vectors. This generalization yields Bayesian tensor analysis (BTA), a generative model for tensors. The significance of BTA is as follows.

1) It provides a full probabilistic analysis for tensors. Based on BTA, a probabilistic graphical model can be constructed, the dimensionality reduction procedure for tensors can be understood at the probabilistic level, and the parameter estimation can be formulated with the expectation maximization (EM) algorithm under the maximum likelihood framework [3].

2) It provides a method for automatic dimensionality determination based on the Bayesian framework [3]. Conventional tensor-based subspace analysis has no criterion for dimensionality determination, i.e., for determining the appropriate number of retained dimensions to model the original tensors. In BTA, the number of retained dimensions is chosen automatically by setting a series of hyper-parameters.

Fig. 1. A third-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$.

3) It provides a flexible framework for tensor data modeling. BTA assumes that the entries in tensor measurements are drawn from multivariate normal distributions. This assumption can be changed for different applications, e.g., to multinomial/binomial distributions in sparse tensor modeling. With different assumptions, different dimensionality reduction algorithms can be developed for different applications.

Based on the developed BTA, we compress a collection of 3-D face data with different identities and different expressions; that is, we construct a training data set of $3V$ second-order tensors in $\mathbb{R}^{I \times E}$, where $V$ is the number of vertices used to model a 3-D face, $I$ is the number of identities, and $E$ is the number of expressions. Based on BTA, these tensors are compressed into $\mathbb{R}^{I' \times E'}$, where $I' \le I$ and $E' \le E$. Tensors in $\mathbb{R}^{I' \times E'}$ are called core tensors, which are ready for reconstruction-based applications, e.g., 3-D facial expression generation, retargeting, and synthesis. The approach here is based on an alternating projection optimization procedure.

The remainder of this paper is organized as follows. Section II provides fundamental material on tensor algebra. Section III describes the novel BTA, and Section IV simplifies BTA to probabilistic tensor analysis (PTA). Empirical studies of BTA- and PTA-based dimensionality determination are presented in Section V based on the Tensor MATLAB Toolbox [1], [2], and Section VI applies BTA to 3-D face modeling for the facial expression retargeting problem. Section VII concludes.

II. TENSOR ALGEBRA

Tensors [12] are arrays of numbers that transform in certain ways under coordinate transformations. The order of a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, represented by a multidimensional array of real numbers, is $N$. An element of $\mathcal{A}$ is denoted by $A_{i_1 i_2 \cdots i_N}$, where $1 \le i_n \le I_n$ and $1 \le n \le N$. The $n$th dimension (or modality) of $\mathcal{A}$ is of size $I_n$. A scalar is a zeroth-order tensor, a vector is a first-order tensor, and a matrix is a second-order tensor. The structure of a third-order tensor is shown in Fig. 1. In the tensor terminology, we have the following definitions.

Definition 1 (Tensor Product or Outer Product): The tensor product $\mathcal{A} \otimes \mathcal{B}$ of a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ and another tensor $\mathcal{B} \in \mathbb{R}^{J_1 \times \cdots \times J_M}$ is defined by

$(\mathcal{A} \otimes \mathcal{B})_{i_1 \cdots i_N j_1 \cdots j_M} = A_{i_1 \cdots i_N} B_{j_1 \cdots j_M} \qquad (1)$

over all index values.


TAO et al.: BAYESIAN TENSOR APPROACH FOR 3-D FACE MODELING 1399

Fig. 2. Mode-1 matricizing of a third-order tensor $\mathcal{A}$.

Fig. 3. Mode-2 product between a third-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ and a matrix $U \in \mathbb{R}^{J \times I_2}$ results in a new tensor $\mathcal{B} \in \mathbb{R}^{I_1 \times J \times I_3}$.

For example, the tensor product of two vectors $a \in \mathbb{R}^{I_1}$ and $b \in \mathbb{R}^{I_2}$ is a matrix $M \in \mathbb{R}^{I_1 \times I_2}$, i.e., $M = a \otimes b = a b^{T}$.

Definition 2 (Mode-$n$ Matricizing or Matrix Unfolding): The mode-$n$ matricizing or matrix unfolding of an $N$th-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is a set of vectors in $\mathbb{R}^{I_n}$ obtained by varying the index $i_n$ while keeping the other indices fixed. Therefore, the mode-$n$ matricizing of an $N$th-order tensor is a matrix $\mathcal{A}_{(n)} \in \mathbb{R}^{I_n \times \bar{I}_n}$, where $\bar{I}_n = \prod_{j \neq n} I_j$. We denote the mode-$n$ matricizing of $\mathcal{A}$ as $\mathrm{mat}_n(\mathcal{A})$ or, briefly, $\mathcal{A}_{(n)}$. An example of mode-1 matricizing for a third-order tensor is given in Fig. 2.

Definition 3 (Mode-$n$ Product): The mode-$n$ product $\mathcal{A} \times_n U$ of a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ and a matrix $U \in \mathbb{R}^{J \times I_n}$ is an $I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N$ tensor defined by

$(\mathcal{A} \times_n U)_{i_1 \cdots i_{n-1} j i_{n+1} \cdots i_N} = \sum_{i_n = 1}^{I_n} A_{i_1 \cdots i_N} U_{j i_n} \qquad (2)$

The mode-$n$ product is a type of contraction. Fig. 3 shows the mode-2 product of a third-order tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ with a matrix $U \in \mathbb{R}^{J \times I_2}$. The result is a tensor in $\mathbb{R}^{I_1 \times J \times I_3}$.
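To make Definitions 2 and 3 concrete, the following NumPy sketch implements mode-$n$ matricizing and the mode-$n$ product (with 0-based mode indices). It is illustrative only; the function names `unfold` and `mode_n_product` are ours, not the paper's.

```python
import numpy as np

def unfold(A, n):
    """Mode-n matricizing: move axis n to the front, then flatten the rest.

    Returns a matrix of shape (I_n, product of the remaining dimensions)."""
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def mode_n_product(A, U, n):
    """Mode-n product A x_n U for a matrix U of shape (J, I_n)."""
    # Contract U's second index with A's nth index, then restore axis order.
    return np.moveaxis(np.tensordot(U, A, axes=(1, n)), 0, n)

A = np.arange(24, dtype=float).reshape(2, 3, 4)   # third-order tensor
U = np.ones((5, 3))                               # acts on mode 1 (size 3)
B = mode_n_product(A, U, 1)
print(unfold(A, 1).shape)   # (3, 8)
print(B.shape)              # (2, 5, 4)
```

This matches Fig. 3: a mode-2 product (mode index 1 here) with a matrix in $\mathbb{R}^{5 \times 3}$ replaces the second dimension (size 3) with 5.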

To simplify the notation in this paper, we denote

$\mathcal{A} \prod_{k=1}^{N} \times_k U_k \equiv \mathcal{A} \times_1 U_1 \times_2 U_2 \cdots \times_N U_N \qquad (3)$

$\mathcal{A} \prod_{k \neq n} \times_k U_k \equiv \mathcal{A} \times_1 U_1 \cdots \times_{n-1} U_{n-1} \times_{n+1} U_{n+1} \cdots \times_N U_N \qquad (4)$

Definition 4 (Frobenius Norm): The Frobenius norm of a tensor $\mathcal{A} \in \mathbb{R}^{I_1 \times \cdots \times I_N}$ is given by

$\|\mathcal{A}\| = \sqrt{\sum_{i_1=1}^{I_1} \cdots \sum_{i_N=1}^{I_N} A_{i_1 \cdots i_N}^2} \qquad (5)$

The Frobenius norm of a tensor measures the "size" of the tensor, and its square is the "energy" of the tensor.

Definition 5 (Tensor Principal Component Analysis or TPCA): TPCA [12], [28], [22] is a multidimensional extension of conventional principal component analysis (PCA). Given a number of measurements $\mathcal{X}_i \in \mathbb{R}^{I_1 \times \cdots \times I_N}$, $1 \le i \le M$, TPCA minimizes the following function over the projection matrices $\{U_k\}_{k=1}^{N}$:

$\min_{\{U_k\}} \sum_{i=1}^{M} \left\| (\mathcal{X}_i - \bar{\mathcal{X}}) - (\mathcal{X}_i - \bar{\mathcal{X}}) \prod_{k=1}^{N} \times_k \left( U_k U_k^{T} \right) \right\|^2 \qquad (6)$

where $\bar{\mathcal{X}} = (1/M) \sum_{i=1}^{M} \mathcal{X}_i$.
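The minimization in (6) has no closed-form joint solution, so it is usually attacked one mode at a time. The sketch below is a minimal alternating-projection TPCA in NumPy, under the assumption (ours, not stated in this excerpt) that each $U_k$ has orthonormal columns taken from the leading eigenvectors of the mode-$k$ covariance of the partially projected, centered data.

```python
import numpy as np

def unfold(A, n):
    return np.moveaxis(A, n, 0).reshape(A.shape[n], -1)

def tpca(X, ranks, n_iter=10):
    """Alternating-projection TPCA sketch.

    X: array of shape (M, I_1, ..., I_N); ranks: retained dims per mode.
    Returns projection matrices U_k of shape (I_k, ranks[k])."""
    Xc = X - X.mean(axis=0)                      # center the measurements
    N = Xc.ndim - 1
    U = [np.eye(Xc.shape[k + 1])[:, :ranks[k]] for k in range(N)]
    for _ in range(n_iter):
        for k in range(N):
            # Project every mode except k, then take the leading
            # eigenvectors of the mode-k covariance.
            Z = Xc
            for j in range(N):
                if j != k:
                    Z = np.moveaxis(
                        np.tensordot(U[j].T, Z, axes=(1, j + 1)), 0, j + 1)
            C = sum(unfold(Zi, k) @ unfold(Zi, k).T for Zi in Z)
            w, V = np.linalg.eigh(C)
            U[k] = V[:, ::-1][:, :ranks[k]]      # top-ranks[k] eigenvectors
    return U

X = np.random.default_rng(0).normal(size=(20, 6, 5))
U1, U2 = tpca(X, ranks=[3, 2])
print(U1.shape, U2.shape)   # (6, 3) (5, 2)
```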

III. BTA: BAYESIAN TENSOR ANALYSIS

Here, we first construct the latent tensor model, which is a multilinear mapping that relates the observed tensors to unobserved latent tensors. Based on the proposed latent tensor model, we develop BTA together with its dimensionality reduction and data reconstruction procedures.

A. Latent Tensor Analysis

Similar to the latent variable model [26], [4], the latent tensor model multilinearly relates observed tensors $\mathcal{X}_i$, for $1 \le i \le M$, in the high-dimensional space $\mathbb{R}^{I_1 \times \cdots \times I_N}$ to the corresponding latent tensors $\mathcal{Y}_i$ in the low-dimensional space $\mathbb{R}^{I_1' \times \cdots \times I_N'}$, i.e.,

$\mathcal{X}_i = \bar{\mathcal{X}} + \mathcal{Y}_i \prod_{k=1}^{N} \times_k U_k + \mathcal{E}_i \qquad (7)$

where $U_k \in \mathbb{R}^{I_k \times I_k'}$ is the $k$th projection matrix, $\bar{\mathcal{X}} = (1/M) \sum_{i=1}^{M} \mathcal{X}_i$ is the mean tensor of all observations, $\mathcal{E}_i$ is the $i$th residue tensor, and every entry of $\mathcal{E}_i$ follows $N(0, \sigma^2)$. Moreover, the number of effective dimensions of the latent tensors is upper bounded for the $k$th mode, i.e., $I_k' \le I_k$ for the $k$th projection matrix $U_k$. Here, $M$ is the number of observed tensors. The projection matrices $\{U_k\}_{k=1}^{N}$ construct the multilinear mapping between the observations and the latent tensors. In this model, we can define prior distributions over both the latent tensor $\mathcal{Y}_i$ and the noise $\mathcal{E}_i$, so the model parameters can be estimated in the maximum-likelihood scheme. This model is also a generative model, because tensor measurements can be generated by sampling from the distributions of $\mathcal{Y}$ and $\mathcal{E}$ based on (7).
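Because the model is generative, observations can be sampled directly from (7). For a second-order tensor the two mode products reduce to matrix multiplications, $\mathcal{X} = \bar{\mathcal{X}} + U_1 \mathcal{Y} U_2^{T} + \mathcal{E}$. The sketch below uses made-up shapes and parameter values purely for illustration.

```python
# Hedged sketch of the generative view in (7): sample a latent tensor Y,
# map it through the mode-k projection matrices, add the mean tensor and
# i.i.d. Gaussian noise. All shapes and names here are illustrative.
import numpy as np

rng = np.random.default_rng(0)
I, E = 8, 6          # observed dimensions (e.g., identity x expression)
Ip, Ep = 3, 2        # latent dimensions
U1 = rng.normal(size=(I, Ip))    # mode-1 projection matrix
U2 = rng.normal(size=(E, Ep))    # mode-2 projection matrix
mean = rng.normal(size=(I, E))   # mean tensor of the observations
sigma = 0.1                      # noise standard deviation

Y = rng.normal(size=(Ip, Ep))                 # latent tensor
X = mean + U1 @ Y @ U2.T + sigma * rng.normal(size=(I, E))
print(X.shape)   # (8, 6)
```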

B. BTA: Probabilistic Model

Armed with the latent tensor model and the Bayesian principal component analysis (BPCA) [4], a Bayesian treatment of TPCA is constructed by introducing hyper-parameters with a


prior distribution over the model parameters, where $\mathcal{X} = \{\mathcal{X}_i\}_{i=1}^{M}$ is the set containing all observed tensors. According to the Bayesian theorem, the corresponding posterior distribution is proportional to the product of the prior and the likelihood function, up to a normalizing constant. Therefore, the Bayesian modeling of TPCA, or BTA, is defined by the predictive density, which marginalizes over the parameters, i.e.,

$p(\mathcal{X}' \mid \mathcal{X}) = \int p(\mathcal{X}' \mid \theta)\, p(\theta \mid \mathcal{X})\, d\theta \qquad (8)$

where $p(\mathcal{X}' \mid \theta)$ is the predictive density under the given full probabilistic model and $p(\theta \mid \mathcal{X})$ is the posterior probability. The model parameters $\theta$ can be obtained by applying maximum a posteriori (MAP) estimation. In BTA, to obtain $p(\theta \mid \mathcal{X})$, we have the following concerns.

1) The probabilistic distributions are defined over vectors, not tensors. Although the vectorization operation makes it possible to use conventional probability theory and inference, the computational cost would be very high and almost intractable for practical applications. Moreover, for some multimodal data, e.g., a collection of 3-D faces with different identities and different expressions, the vectorization operation makes data modeling and generation difficult. Therefore, it is important to develop a method that obtains the probabilistic model in a computationally tractable way. In the next part, the decoupled probabilistic model is developed to significantly reduce the computational complexity.

2) We must determine the number of retained dimensions to model the observed tensors. In probability theory and statistics, the Akaike information criterion (AIC) [3] and the Bayesian information criterion (BIC) [3] are popular for model selection, e.g., the dimensionality determination here. However, both methods require enumerating all possible models for selection, so many candidate models must be evaluated, and the computational complexity of AIC- and BIC-based model selection is therefore high. In the Bayesian treatment, the dimensionality determination procedure can be carried out automatically by introducing hyper-parameters, i.e., a hierarchical prior over the projection matrix $U_k$. The $j$th entry of the hyper-parameter vector controls the energy of the $j$th row of $U_k$; in detail, the higher the value is, the lower the importance of the row is. The probabilistic graphical model is shown in Fig. 4.

Obtaining the projection matrices $\{U_k\}_{k=1}^{N}$ jointly is computationally intractable, because the model requires all projection matrices simultaneously. To construct a computationally tractable algorithm, we build a decoupled probabilistic model for (7), i.e., we obtain each projection matrix separately and form an alternating optimization procedure. The decoupled predictive density is defined as the product of a series of probabilistic functions, where the $k$th function is defined as the probability

Fig. 4. Probabilistic graphical model of BTA.

of a projected observation through all projection matrices but the $k$th one, with the given mean vector and the corresponding noise variance, i.e.,

(9)

where $\bar{x}_k$ denotes the mean vector for the $k$th mode and $\sigma_k^2$ the variance of the noise that models the residue of the $k$th mode.

The decoupled posterior distribution given the observed tensors is defined as the product of a series of probabilistic functions, where the $k$th function is defined as the probability of the $k$th mean vector, the $k$th projection matrix, and the corresponding noise variance given the observations and the other projection matrices, i.e.,

(10)

Based on the Bayesian theorem, the posterior over the $k$th modality is proportional to the product of its corresponding decoupled likelihood function and the decoupled prior distribution, i.e.,

(11)

Therefore, based on (9)–(11), the decoupled predictive density is obtained by marginalizing, over all parameters, the product of the decoupled predictive density, the decoupled likelihood function, and the decoupled prior distribution, i.e.,

(12)

Fig. 5. Decoupled probabilistic graphical model for BTA.

The key point in BTA is to determine the dimension of each modality of the latent space automatically; the mean and noise parameters will be determined automatically from the given observations, so the prior probability distribution can be simplified, i.e., we can simplify (12) as

(13)

To automatically determine the dimension of the latent space of each mode, i.e., the size of the $k$th projection matrix $U_k$, we introduce hyper-parameters $\alpha_k$, i.e.,

(14)

where $\alpha_{k,j}$ is the $j$th value in $\alpha_k$ and $u_j$ is the transpose of the $j$th row of $U_k$. Similar to BPCA, the definition of (14) is also motivated by the framework of automatic relevance determination (ARD) [14], [4] for dimensionality determination. With (13) and (14), we obtain the decoupled probabilistic graphical model shown in Fig. 5.

Here, $\alpha_{k,j}$ controls the inverse variance of the corresponding row of $U_k$. To reduce the computational cost in practical applications [4], it can be approximated as

(15)
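The practical effect of (14) and (15) is automatic pruning: a row of $U_k$ with little energy receives a large $\alpha$ and is effectively switched off. A toy numerical illustration follows; the matrix, the threshold, and the BPCA-style estimator $\alpha_j = d/\|u_j\|^2$ are our assumptions for the sketch, not values from the paper.

```python
# Illustrative ARD sketch: re-estimate alpha_j as d / ||u_j||^2 per row,
# so low-energy rows get a large alpha and are pruned. Numbers are made up.
import numpy as np

U = np.array([[0.9,  0.8,  1.1],
              [1.0,  0.7,  0.9],
              [1e-3, 2e-3, 1e-3]])   # third row has negligible energy
d = U.shape[1]                       # length of each row
alpha = d / np.sum(U**2, axis=1)     # one hyper-parameter per row
keep = alpha < 1e3                   # retain only low-alpha rows
print(keep)   # [ True  True False]
```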

Consider the column space of the projected data set: for the $k$th mode, collect the mode-$k$ column vectors of the observations after projection through all projection matrices except the $k$th one. Suppose the corresponding latent vectors follow a zero-mean Gaussian distribution; then the marginal distribution of an observed projected mode-$k$ vector is also Gaussian, with mean vector $\bar{x}_k$ and covariance matrix $C_k = U_k U_k^{T} + \sigma_k^2 I$. Based on the properties of the Gaussian distribution, we have

(16)

In this decoupled model, the objective is to find the $k$th projection matrix $U_k$ based on MAP with the given observations, i.e., to calculate $U_k$ by maximizing

(17)

where the logarithm is natural and the sample covariance matrix is computed from the projected mode-$k$ vectors. Equation (17) benefits from the decoupled definition of the probabilistic model in (13). Based on (17), the total log posterior distribution is

(18)

To implement MAP for estimating the $k$th projection matrix, the EM algorithm is applied here. This is because the latent tensors are unknown and can be treated as missing data, so alternating least squares estimation cannot be applied to obtain the projection matrices. Because the joint distribution of observations and latent tensors is available, we can calculate the expectation of the log likelihood of the complete data with respect to the posterior of the latent tensors as

(19)

Because the prior is defined in (14) and the marginal distribution is defined in (16), the posterior distribution of the latent vectors given the observations follows from the Bayes theorem as

(20)


It is impossible to maximize (19) with respect to all projection matrices and the corresponding hyper-parameters simultaneously, because the projection matrices are inter-related [23] during the optimization procedure, i.e., optimizing the $k$th projection matrix and its hyper-parameters requires knowing the others. Therefore, we need to apply an alternating optimization procedure [23], [24]. To optimize the $k$th projection matrix and the corresponding hyper-parameters with the other projection matrices fixed, we need the decoupled expectation of the log likelihood function on the $k$th mode

(21)

Based on (16), we have

(22)

(23)

Equations (22) and (23) form the expectation step, or E-step. They are obtained with respect to the posterior distribution of the latent tensors given the observations.

The maximization step, or M-step, is obtained by maximizing the expected complete-data log likelihood with respect to the parameters. This step always increases the likelihood of interest until the likelihood function reaches a local maximum. In detail, by setting the derivative with respect to the $k$th projection matrix to zero, we have

(24)

and, by setting the derivative with respect to the noise variance to zero, we have

(25)

According to (15), each hyper-parameter is approximated from the energy of the corresponding row of the projection matrix. In summary, Table I lists the algorithm for obtaining the projection matrices of BTA.

TABLE I
TRAINING STAGE OF BTA
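Within the alternating scheme of Table I, each mode-$k$ subproblem has the flavor of PPCA estimated by EM. As a self-contained reference point, the standard Tipping–Bishop EM updates for vector PPCA are sketched below; this is our own simplification of the per-mode building block, not the paper's exact updates.

```python
import numpy as np

def ppca_em(X, q, n_iter=100):
    """EM for vector PPCA (Tipping & Bishop): the per-mode building block."""
    M, d = X.shape
    Xc = X - X.mean(axis=0)
    rng = np.random.default_rng(0)
    W = rng.normal(size=(d, q))          # projection (loading) matrix
    sigma2 = 1.0                         # isotropic noise variance
    for _ in range(n_iter):
        # E-step: posterior moments of the latent vectors.
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ez = Xc @ W @ Minv                          # <z_n>, shape (M, q)
        Ezz = M * sigma2 * Minv + Ez.T @ Ez         # sum_n <z_n z_n^T>
        # M-step: update the projection matrix and the noise variance.
        W_new = Xc.T @ Ez @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(Xc**2) - 2 * np.sum(Ez * (Xc @ W_new))
                  + np.trace(Ezz @ W_new.T @ W_new)) / (M * d)
        W = W_new
    return W, sigma2

rng = np.random.default_rng(3)
X = (rng.normal(size=(500, 2)) @ rng.normal(size=(2, 8))
     + 0.1 * rng.normal(size=(500, 8)))
W, sigma2 = ppca_em(X, q=2)
print(W.shape)   # (8, 2)
```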

C. Dimension Reduction and Data Reconstruction

After obtaining the projection matrices $\{U_k\}_{k=1}^{N}$, the following operations are important for different applications.

Dimension reduction: given the projection matrices $\{U_k\}_{k=1}^{N}$ and an observed tensor $\mathcal{X}$ in the high-dimensional space $\mathbb{R}^{I_1 \times \cdots \times I_N}$, how do we find the corresponding latent tensor $\mathcal{Y}$ in the low-dimensional space $\mathbb{R}^{I_1' \times \cdots \times I_N'}$? From tensor algebra, the dimension reduction is given by $\mathcal{Y} = (\mathcal{X} - \bar{\mathcal{X}}) \prod_{k=1}^{N} \times_k U_k^{T}$. However, this method lacks the probabilistic perspective. Under the proposed decoupled probabilistic model, $\mathcal{Y}$ is obtained by maximizing the posterior of the latent tensor given the observation. Because the posterior is Gaussian and its maximum coincides with its mean


, the procedure for the dimension reduction is given by

(26)

Data reconstruction: given the projection matrices $\{U_k\}_{k=1}^{N}$ and a latent tensor $\mathcal{Y}$ in the low-dimensional space $\mathbb{R}^{I_1' \times \cdots \times I_N'}$, how do we approximate the corresponding observed tensor in the high-dimensional space $\mathbb{R}^{I_1 \times \cdots \times I_N}$? Based on (26), the data reconstruction procedure is given by

(27)

The reconstruction error rate is given by $\|\mathcal{X} - \hat{\mathcal{X}}\| / \|\mathcal{X}\|$.
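Dimension reduction and reconstruction for a second-order tensor can be sketched as follows. The projection matrices here are random orthonormal bases purely for illustration (not trained BTA matrices), and the error rate is computed as the ratio of Frobenius norms.

```python
# Sketch of (26)/(27)-style dimension reduction and reconstruction for a
# second-order tensor with orthonormal projection matrices. The data, the
# zero mean, and the retained ranks are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 6))
Xbar = np.zeros_like(X)            # pretend the training mean is zero

# Orthonormal U_k from the SVD, just for illustration.
U1, _, _ = np.linalg.svd(rng.normal(size=(8, 8)))
U2, _, _ = np.linalg.svd(rng.normal(size=(6, 6)))
U1, U2 = U1[:, :3], U2[:, :2]      # keep 3 and 2 dimensions

Y = U1.T @ (X - Xbar) @ U2         # dimension reduction
Xhat = Xbar + U1 @ Y @ U2.T        # reconstruction
err = np.linalg.norm(X - Xhat) / np.linalg.norm(X)
print(0.0 <= err <= 1.0)   # True
```

Since the reconstruction is an orthogonal projection of the centered data, the error rate always lies between 0 and 1 here.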

IV. PTA: PROBABILISTIC TENSOR ANALYSIS

The proposed BTA can be simplified by removing the hyper-parameters. This results in probabilistic tensor analysis (PTA), and the corresponding predictive density is given by

(28)

The projection matrices are obtained by maximizing the posterior distribution given the observed tensors, i.e.,

(29)

Similar to BTA, EM is applied to obtain the projection matrices, and the decoupled expectation of the log-likelihood function over a given modality is given by (30), shown at the bottom of the page. The E-step in PTA is the same as the E-step in BTA.

Based on (30), we have the M-step for PTA by maximizing the expected log-likelihood with respect to the projection matrix and the noise variance. In detail, by setting the derivative with respect to the projection matrix to zero, we have

(31)

and, by setting the derivative with respect to the noise variance to zero, we have

(32)

By replacing the M-step in Table I with (31) and (32), we obtain the optimization procedure for PTA. In BTA, the model is selected automatically, because the hyper-parameters control the energy of the rows of the projection matrices. In PTA, the conventional Bayesian information criterion (BIC) can be applied to determine the sizes of the projection matrices, using an exhaustive search over candidate dimensions. In detail, for BIC-based model selection, we need to calculate the BIC score as

(33)

for each modality, once per candidate dimension, because the number of rows in each projection matrix ranges from 1 to the dimension of that modality. In the determination stage, the optimal dimensionality is

(34)

where the minimization is taken over the candidate dimensions.

Recall that BPCA is an extension of PPCA [26] obtained by adding hyper-parameters for automatic model selection, e.g., the automatic dimensionality determination used here, while the proposed PTA is a simplification of BTA obtained by removing these hyper-parameters. Based on these two points, we have the relationships among BPCA, PPCA, BTA, and PTA shown in Fig. 6. In detail, PTA/BTA is a tensor generalization of PPCA/BPCA, and BTA/BPCA degenerates to PTA/PPCA when the hyper-parameters for automatic model selection are removed.
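For a single modality, the exhaustive BIC search of (33) and (34) amounts to scoring every candidate dimension and keeping the argmin. A vector-case sketch is given below; the eigenvalue form of the PPCA log-likelihood follows [26], while the parameter count d*q + 1 is a deliberate simplification:

```python
import numpy as np

def ppca_bic(X, q):
    """BIC score for a PPCA fit with q components (maximum-likelihood fit
    expressed through the sample-covariance eigenvalues)."""
    n, d = X.shape
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False, bias=True))[::-1]
    sigma2 = lam[q:].mean()                     # pooled variance of discarded directions
    loglik = -0.5 * n * (d * np.log(2 * np.pi) + np.log(lam[:q]).sum()
                         + (d - q) * np.log(sigma2) + d)
    return -2.0 * loglik + (d * q + 1) * np.log(n)   # simplified parameter count

def bic_search(X, q_max):
    """Score q = 1..q_max exhaustively and return the argmin (cf. (33)-(34))."""
    scores = [ppca_bic(X, q) for q in range(1, q_max + 1)]
    return int(np.argmin(scores)) + 1, scores

# demo: a 2-D signal in 6 dimensions; the search should recover q = 2
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((6, 2)))
A = Q * np.array([3.0, 2.0])                    # well-separated signal directions
X = rng.standard_normal((1000, 2)) @ A.T + 0.1 * rng.standard_normal((1000, 6))
q_hat, scores = bic_search(X, q_max=5)
```

For tensor data, this search must be repeated over every combination of per-modality dimensions, which is what makes the BIC route expensive compared with BTA.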

V. EMPIRICAL STUDIES FOR MODEL SELECTION

Here, we utilize a synthetic data model, as shown in Fig. 7, to evaluate BTA and PTA in terms of accuracy of dimensionality determination. The accuracy is measured by the dimensionality determination error, defined as the difference between the real model, i.e., the real dimension of each modality of the unobserved latent tensor, and the selected model, i.e., the dimension of that modality selected by BTA or by BIC with PTA. Fig. 7 shows the data generation model. A multilinear transformation is applied to map the tensor from

(30)



Fig. 6. Relationships of BPCA, PPCA, BTA, and PTA.

Fig. 7. Data generation model for empirical study.

the low-dimensional space to the high-dimensional space through a multilinear map with additive noise. Every entry of every unobserved latent tensor is generated from a Gaussian with mean zero and variance 1; every entry of the noise tensor is drawn from a zero-mean Gaussian whose scale is a scalar set to 0.01; the mean tensor is a random tensor whose entries are drawn from a uniform distribution on a fixed interval; and the projection matrices are random matrices whose entries are likewise drawn from a uniform distribution. In the following experiments, the remaining model parameter is set to 0.005, because this achieves reasonable performance for model justification; moreover, BTA works similarly for other small values. The same settings are used for all experiments.
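The generation model just described can be sketched as follows for second-order tensors; the uniform interval [-1, 1] is an assumption of ours, since the interval used in the paper is not recoverable here:

```python
import numpy as np

def generate_tensors(n, latent_shape, obs_shape, noise_std=0.01, seed=0):
    """Synthetic second-order tensors following the Fig. 7 model: a multilinear
    map of a standard-Gaussian latent tensor, plus a mean tensor and noise."""
    rng = np.random.default_rng(seed)
    Us = [rng.uniform(-1, 1, (o, l)) for o, l in zip(obs_shape, latent_shape)]
    M = rng.uniform(-1, 1, obs_shape)            # random mean tensor
    data = []
    for _ in range(n):
        Z = rng.standard_normal(latent_shape)    # latent entries ~ N(0, 1)
        X = Us[0] @ Z @ Us[1].T                  # multilinear map, 2nd-order case
        data.append(X + M + noise_std * rng.standard_normal(obs_shape))
    return np.stack(data), Us, M

data, Us, M = generate_tensors(10, (5, 5), (10, 12))
```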

In the first experiment, the data generator provides ten measurements. To determine the latent dimensions of the two modalities based on BIC for PTA, we need to run PTA once for every candidate pair of dimensions and obtain two BIC score matrices, one for the projection matrix of the first modality and one for the second projection matrix, as shown in Fig. 8. In this figure, every block corresponds to a BIC score, and the darker the block, the smaller the corresponding BIC score. We use a light rectangle to mark the darkest block in each BIC score matrix, which corresponds to the smallest value. In the first BIC score matrix, shown in the left subfigure of Fig. 8, the smallest value is located at the true dimension. Because this BIC score matrix is calculated for the projection matrix of the first modality based on (33), we set the first latent dimension accordingly, following (34). Similarly, we determine the second latent dimension according to the second BIC score matrix, shown on the right of Fig. 8, where the smallest value is again located at the true dimension. For this example, the dimensionality determination error is therefore 0. This means that the dimensionality determination criterion defined in (33) and (34) can select the correct model in PTA-based tensor modeling. However, the PTA model must be run once per candidate pair of dimensions, which is computationally expensive.

Fig. 8. BIC score matrices for the first and the second projection matrices. Each block corresponds to a BIC score; the darker the block, the smaller the BIC score. Based on this figure, we determine the two latent dimensions based on BIC obtained by PTA, and the dimensionality determination error is 0.

(a) (b)

Fig. 9. Hinton diagrams of (a) the first and (b) the second projection matrices obtained by PTA.

Fig. 9 shows the Hinton diagrams of the first and second projection matrices obtained by PTA in (a) and (b), respectively; the projection matrices are obtained from PTA with the latent dimensions fixed in advance. In this figure, each row corresponds to one potential principal direction in the measurement space of the first or second modality. Because no rows vanish, we can see that PTA cannot suppress the loadings that correspond to noise. This means that PTA cannot distinguish between signal and noise.

We then apply BTA for automatic dimensionality determination. The results of fitting the BTA model for the first and second modalities are shown as Hinton diagrams in Fig. 10(a) and (b), respectively. Each row in both subfigures represents one potential principal direction in the measurement space of the first or second modality. For the first and the second projection matrices, BTA selects dimensions equal to the true values, so the dimensionality determination error is 0. From the figure, we can see that many rows vanish, which means that BTA has the ability to suppress the loadings that correspond to noise; that is, BTA can distinguish between signal and noise. These results indicate that BTA can automatically determine effective dimensions for the principal component subspaces of the different modalities. Therefore, it is a practical alternative to exhaustive search for dimensionality determination, e.g., PTA combined with BIC.

1The Hinton diagram displays the values in a data matrix. Every value is represented by a square whose size measures its magnitude and whose colour indicates its sign; here, blue is negative and red is positive. The diagram is helpful for visualizing the projection matrices obtained from PTA and BTA. Each row in Figs. 9 and 10 represents a loading in PTA or BTA.



(a) (b)

Fig. 10. Hinton diagrams of (a) the first and (b) the second projection matrices obtained by BTA.

We repeat the experiment 30 times with a setting similar to that of the first experiment in this section, but with the tensor dimensions set randomly subject to fixed constraints. The total dimensionality determination errors for BTA and for BIC with PTA are both 0. We also conduct 30 experiments on third-order tensors, with a similar setting and with the dimensions again set randomly subject to analogous constraints. The total dimensionality determination errors are also both 0 for BTA and for BIC with PTA. The average reconstruction error rate is 4.78%.

VI. 3-D FACE DATA MODELING

For BTA-based 3-D facial data modeling, a suitable training set should be built in advance. In this dataset, the numbers of vertices in different 3-D faces should be identical, and each vertex in a 3-D face should be associated with its corresponding vertices in the other 3-D faces. Moreover, the numbers of expressions for different identities should be the same, and each expression of an identity should be associated with the corresponding expression of the other identities. Therefore, to build such a dataset, the following steps, as shown in Fig. 11, are taken in sequence.

1) Feature point localization on a 3-D face: the originally collected face data differ in scale, location, orientation, and number of vertices. Localizing the feature points is an important step in building the correspondence between different faces. Feature points are localized automatically based on heuristic rules, e.g., the tip of the nose is always the highest point along the depth coordinate, as shown in the first column of Fig. 11. Slight adjustments are usually carried out to improve the precision of the localization.

2) 3-D face alignment: with the aid of the feature points in both the 3-D training faces and the 3-D reference face, singular value decomposition (SVD) based rigid alignment [5] is carried out so that the 3-D faces have identical scale, translation, and orientation.

3) Cylindrical projection and parameterization: for each vertex, its cylindrical coordinates are computed and indexed by the feature points, as shown in the second column of Fig. 11. We can then find correspondences between vertices in different faces. For non-dense data, as shown in the top row of the third column of Fig. 11, vertex interpolation is carried out as in (4).

Fig. 11. Correspondence construction.

4) Mesh image construction: a GPU-based [13] interpolation is applied here. We treat the cylindrical coordinates of a vertex as its pixel coordinates in a texture image, and the vertex's 3-D coordinates as the corresponding RGB value for this pixel. This pseudo texture image is called a mesh image here, as shown at the bottom of the third column of Fig. 11.

5) Finding the correspondence for each vertex of the 3-D reference face, to obtain a rebuilt 3-D training face with an identical number of vertices, as shown in the fourth column of Fig. 11.

Steps 3)–5) build the correspondence of vertices between faces from different subjects. For a standard face with a fixed number of vertices, the parameterization can be carried out to obtain the barycentric coordinates of each vertex in a face, so we can find its corresponding vertex in another face by interpolation, as described in step 3). In this paper, we implement a GPU-based interpolation process to speed up the procedure. This process treats the cylindrically projected mesh as a texture image and renders it at a higher resolution. The R, G, and B values in the texture image are actually the coordinate values in the corresponding 3-D mesh, so we can resample this rendered texture image and reconstruct the 3-D mesh with a predefined number of vertices. All these steps are performed on the neutral faces of the different subjects. Finally, we obtain the facial expression database described above.
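The cylindrical parameterization of step 3) can be sketched as follows; the choice of y as the vertical, axis-aligned coordinate is our assumption about the face's orientation:

```python
import math

def cylindrical(x, y, z):
    """Map a vertex (x, y, z) to cylindrical coordinates (theta, h, r),
    assuming the face is roughly aligned with the vertical y axis."""
    theta = math.atan2(z, x)   # angle around the vertical axis
    r = math.hypot(x, z)       # radial distance from the axis
    return theta, y, r

theta, h, r = cylindrical(1.0, 2.0, 0.0)
```

Here (theta, h) serve as the pixel coordinates of the mesh image, while the vertex's 3-D coordinates are stored as that pixel's RGB value.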

A. Tensor Face Construction

Vlasic et al. [29] form a third-order tensor to model a collection of 3-D facial data with different identities and expressions: the first modality of the third-order tensor represents the vertices of a 3-D face, the second modality represents the expressions, and the third modality represents the identities. Unlike the modeling method used by Vlasic et al. [29], we collect a number of 3-D facial data, as shown in Fig. 12, and form a series of second-order tensors. One modality of each second-order tensor describes expression and the other modality describes identity.

The construction procedure for the series of second-order tensors is as follows, as shown in Fig. 13: 1) collect the x-coordinate values of a given vertex over all 3-D facial data and form a data tensor (a second-order tensor, i.e., a matrix) whose first modality is identity and whose second modality is expression; 2) similarly to step 1), construct the corresponding tensors for the y- and z-coordinates; and



Fig. 12. Collection of 3-D facial data with different identities and different expressions.

Fig. 13. Tensor set construction.

3) put the three tensors together for all vertices, forming a data set whose number of second-order tensors is three times the number of vertices.
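Given an aligned coordinate array, the three construction steps reduce to slicing one identity-by-expression matrix per vertex and per coordinate. A sketch, where the array layout is our assumption:

```python
import numpy as np

def build_tensor_set(coords):
    """coords: (n_identities, n_expressions, n_vertices, 3) array of aligned
    3-D faces. Returns 3 * n_vertices second-order tensors (identity x expression)."""
    n_id, n_expr, n_vert, _ = coords.shape
    tensors = []
    for v in range(n_vert):
        for c in range(3):                       # x, y, z channels
            tensors.append(coords[:, :, v, c])   # identity-by-expression matrix
    return tensors

# demo: 2 identities, 3 expressions, 4 vertices
coords = np.arange(2 * 3 * 4 * 3, dtype=float).reshape(2, 3, 4, 3)
tset = build_tensor_set(coords)
```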

Based on the constructed 3-D facial tensor set, we can apply the proposed BTA directly and obtain the core tensors based on the latent tensor model defined in (7), i.e.,

(35)

where the mean tensor is computed over the training tensors and the projection matrices are obtained from BTA, as described in Table I. For the facial expression retargeting task considered here, we only need to preserve the core tensors. Based on these core tensors, facial expression retargeting can be conducted by the following alternating projection optimization procedure.

B. 3-D Facial Expression Retargeting

As described at the beginning of the Introduction, 3-D facial expression retargeting relies on two source 3-D faces with different expressions and on a target face whose expression is the same as that of the first source face. The objective of retargeting, given the core tensors, is to transfer the expression of the second source face to the target face. With the core tensors, it is possible to generate a 3-D face with an arbitrary identity and an arbitrary expression from two specific projection vectors, one for identity and one for expression: each coordinate of the generated face is a bilinear form of the corresponding core tensor with these two vectors. Based on this point, the retargeting task can be formulated within an energy minimization framework, i.e., minimizing the sum of the reconstruction errors, with the core tensors given, for the first expression of the first and second identities and for the second expression of the first identity

(36)

where the two identity vectors contain the linear combination coefficients used to synthesize the source and target 3-D faces from the core tensors, the two expression vectors contain the linear combination coefficients used to synthesize the first and the second expressions from the core tensors, and a subscript denotes an entry of the corresponding vector. Based on (36), we can obtain the identity and expression coefficient vectors simultaneously, and each entry of the retargeted expression for the target 3-D facial data is given by the bilinear form of the corresponding core tensor with the target identity vector and the second expression vector.
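The bilinear synthesis rule can be sketched directly; which modality of the core tensor pairs with the expression vector and which with the identity vector is an assumption here:

```python
import numpy as np

def synthesize_vertex(core, u_expr, u_id):
    """One scalar coordinate of a synthesized face: a bilinear form of a
    per-vertex core tensor with an expression vector and an identity vector."""
    return float(u_expr @ core @ u_id)

def synthesize_face(cores, u_expr, u_id):
    """Apply the bilinear form to every per-vertex core tensor."""
    return np.array([synthesize_vertex(C, u_expr, u_id) for C in cores])

val = synthesize_vertex(np.ones((2, 2)), np.array([1.0, 1.0]), np.array([1.0, 1.0]))
```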

Based on the above description and according to (36), the key problem here is to find the coefficient vectors by minimizing the energy:

(37)

To describe the alternating projection optimization procedure conveniently, we adopt the following definitions: one matrix encodes the first identity information, a second encodes the second identity information, a third encodes the first expression information, and a fourth encodes the second expression information. The 3-D face model defined in (35) decouples the identity information from the expression information, i.e., the underlying assumption here is that the expression information is independent of the identity information. Therefore, it is reasonable to define the minimization procedure (36) to obtain the projection vectors.

Page 11: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO ...static.tongtianta.site/paper_pdf/43a5f624-5f6b-11e9-bfe5-00163e08b… · a set of second-order tensors—one modality for identity

TAO et al.: BAYESIAN TENSOR APPROACH FOR 3-D FACE MODELING 1407

According to the alternating projection optimization procedure used in [24] to obtain the solution of supervised tensor learning, we need to have

(38)

By setting this derivative to 0, we can obtain the solution as

(39)

Similar to (38) and (39), we also have

(40)

(41)

(42)

Based on (39)–(42), we find that the four solutions are mutually dependent: each coefficient vector depends on one or two of the others. Therefore, there is no closed-form solution for (36). According to [24], the alternating projection optimization procedure can be applied to find the solution. Each subproblem has a unique solution because the objective is a convex function with respect to each coefficient vector taken individually. The alternating projection optimization procedure is listed in Table II.
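The structure of the alternation can be illustrated on a reduced version of (36) with a single expression vector and a single identity vector: each subproblem is an ordinary linear least-squares fit, so the objective never increases. This is a sketch of the idea, not the full four-vector procedure of Table II:

```python
import numpy as np

def alternating_fit(cores, target, u_e, u_p, n_iter=100):
    """Alternately solve the two least-squares subproblems of
    min sum_i (target_i - u_e^T C_i u_p)^2 over u_e and u_p."""
    cores = np.stack(cores)                        # (n_vertices, d_e, d_p)
    for _ in range(n_iter):
        A = cores @ u_p                            # rows: C_i u_p; solve for u_e
        u_e, *_ = np.linalg.lstsq(A, target, rcond=None)
        B = np.einsum('e,iep->ip', u_e, cores)     # rows: u_e^T C_i; solve for u_p
        u_p, *_ = np.linalg.lstsq(B, target, rcond=None)
    return u_e, u_p

# demo: recover a noiseless bilinear target (up to the usual scale ambiguity)
rng = np.random.default_rng(0)
cores = rng.standard_normal((30, 3, 3))
target = np.einsum('e,iep,p->i', rng.standard_normal(3), cores, rng.standard_normal(3))
u_e, u_p = alternating_fit(cores, target, rng.standard_normal(3), rng.standard_normal(3))
residual = np.linalg.norm(np.einsum('e,iep,p->i', u_e, cores, u_p) - target)
```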

In practical applications, we may have many source expressions, so it would be time consuming to iteratively apply the alternating projection optimization procedure listed in Table II to generate all expressions for the target 3-D face, i.e., to apply the algorithm once per expression. Because there is a closed-form solution for the expression coefficients when the identity coefficients are given, the cost of computation in this case can be significantly reduced. Therefore, we only need to apply the alternating projection optimization procedure for the first expression and preserve the identity coefficients for future use.

C. Empirical Justification

For the empirical study, we select eight identities with ten expressions each to train the core tensors based on the proposed BTA, as described in Section III. Therefore, there are in total 80 3-D faces, and each face consists of 12 316 vertices. For BTA, the two hyper-parameters are set to 0.005 and 0.001. By applying BTA, the original second-order tensors are compressed into lower-dimensional core tensors. The size of the core tensors is comparable to that of the original tensors, because the number of training faces is limited and it is very expensive to obtain a large number of 3-D facial data. Experiments

TABLE II
ALTERNATING PROJECTION OPTIMIZATION PROCEDURE FOR (36)

are conducted in Matlab 2007a on an HP Pavilion dv6000 with an Intel Core 2 Duo 2.0-GHz CPU and 1 GB of memory.

At the retargeting stage, we select three identities that are not in the training set: one source with eight expressions including a neutral face, and two targets with the neutral expression only. By applying the algorithm listed in Table II, we can generate a series of transformed 3-D target faces that emulate the expressions of the 3-D source faces. From the resulting vertex list, 3-D Delaunay triangulation is carried out to reconstruct the corresponding 3-D surface. As shown in Fig. 14 (the first column contains the 3-D neutral faces), the results (bottom two rows) produced by our method are natural and comparable with the source faces (top row). In Fig. 14, as shown by the circled areas, our method reproduces subtle deformations as well as the source faces do. For example, in the second column, the subtle wrinkle at the left mouth corner is generated, similar to the left mouth corner in the fifth column; in the sixth column, the deformation of the cheekbone looks very realistic; and the detailed variations of the eyebrow are generated vividly in the last column.

As stated at the end of Section VI-B, with the identity coefficients of the 3-D source face and the 3-D target face given, we have a closed-form solution for the coefficients that determine the transformed expression of the 3-D target face. Based on this method, we obtain another group of experimental results, shown in Fig. 15, with a performance similar to that of Fig. 14.

To make a further evaluation, we compare our results with those from expression cloning [15], an example-based method for producing expression retargeting results. In this method, the affine transformation induced by the expression at each vertex is imposed on the target face from the source face. Though local



Fig. 14. 3-D face expression retargeting based on the algorithm in Table II. The top row shows 3-D source faces and the bottom two rows show 3-D target faces with two identities. The first column contains neutral faces. Retargeting one facial expression takes around 180 s.

Fig. 15. 3-D face expression retargeting experimental results. The top row shows 3-D source faces and the bottom two rows show 3-D target faces with two identities. The first column contains neutral faces. In the bottom two rows, the second column is generated based on the alternating projection optimization procedure listed in Table II and the other columns are generated based on the description at the end of Section VI-B. Retargeting one facial expression takes around 0.64 s.

transformations are performed to reduce the effects of improper scales and directions on the target face, some unnatural effects usually appear when the target face is significantly different from the source face. In the proposed approach, both the identity (i.e., the shape of a face) and its expressions are modelled together by the learnt core tensors, so it always gives natural results.

As shown in Fig. 16, some regions, indicated by red circles, are unnatural and look unrealistic. Especially in the eye areas, expression cloning produces rough deformations. However, there are no such effects in Figs. 14 and 15. This also demonstrates the strength of the proposed BTA for expression retargeting.

VII. CONCLUSION

A collection of 3-D faces with different identities and expressions naturally forms either a third-order tensor with modalities for vertices, identities, and expressions, or a set of second-order tensors with modalities for identities and expressions, where the size of the set is three times the number of vertices used in the 3-D face representation. In this paper, we start from the second view to obtain Bayesian inference over tensors, so that a dimensionality determination procedure can be carried out.

However, conventional probabilistic graphical models with Bayesian inference only model data in vector format. Is it

Fig. 16. Expression cloning-based 3-D face expression retargeting experimental results. The top row shows 3-D source faces and the bottom two rows show 3-D target faces with two identities. The first column contains neutral faces. In the bottom two rows, the second column is generated based on the expression cloning algorithm. Regions indicated by red circles show the unusual and unnatural effects in the retargeted faces.

possible to construct multimodal probabilistic graphical models for tensor-format data in order to conduct Bayesian inference, e.g., automatic dimensionality determination? This paper provides a positive answer based on the proposed decoupled probabilistic model, which leads to BTA, an algorithm that determines dimensionality automatically for tensor-format data. Moreover, BTA is ready for various applications in computer vision and data mining. Empirical studies demonstrate that BTA usually determines the correct number of dimensions for the different modalities of tensor data.

Based on the proposed BTA, the second-order tensors of the 3-D face representation are compressed into small second-order core tensors, which carry enough information to reconstruct an identity with a specific expression. On this basis, an alternating projection optimization procedure is developed for the 3-D facial expression retargeting task, and empirical studies show that the combination of BTA with this optimization procedure achieves reasonable results for facial-expression-driven applications.

ACKNOWLEDGMENT

The authors would like to thank Prof. A. Blake for encouraging us to develop the probabilistic graphical model for tensors, and the anonymous reviewers for their constructive comments.

REFERENCES

[1] B. W. Bader and T. G. Kolda, Efficient MATLAB Computations With Sparse and Factored Tensors, Sandia National Laboratories, Albuquerque, NM, Tech. Rep. SAND2006-7592, 2006.

[2] B. W. Bader and T. G. Kolda, "MATLAB tensor classes for fast algorithm prototyping," ACM Trans. Math. Software, vol. 32, no. 4, Dec. 2006.

[3] C. M. Bishop, Neural Networks for Pattern Recognition. Oxford, U.K.: Oxford Univ. Press, 1995.

[4] C. M. Bishop, "Bayesian PCA," in Advances in Neural Information Processing Systems 11, 1999, pp. 382–388.

[5] J. H. Challis, "A procedure for determining rigid body transformation parameters," J. Biomech., vol. 28, no. 6, pp. 733–737, 1995.

[6] G. Dai and D. Yeung, "Tensor embedding methods," in Proc. Nat. Conf. Artif. Intell., 2006, pp. 330–335.

[7] P. Ekman and W. V. Friesen, Facial Action Coding System. Palo Alto, CA: Consulting Psychologists Press, 1978.

[8] Y. Fu and N. Zheng, "M-face: An appearance-based photorealistic model for multiple facial attributes rendering," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 7, pp. 830–842, Jul. 2006.

[9] W. Gao, Y. Chen, R. Wang, S. Shan, and D. Jiang, "Learning and synthesizing MPEG-4 compatible 3-D face animation from video sequence," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 11, pp. 1119–1128, Nov. 2003.

[10] X. He, D. Cai, and P. Niyogi, "Tensor subspace analysis," Adv. Neural Inf. Process. Syst., vol. 18, pp. 1–8, Dec. 2005.

[11] P. Kroonenberg and J. de Leeuw, "Principal component analysis of three-mode data by means of alternating least squares algorithms," Psychometrika, vol. 45, pp. 69–97, 1980.

[12] L. De Lathauwer, "Signal processing based on multilinear algebra," Ph.D. dissertation, Dept. Elect. Eng., Katholieke Universiteit Leuven, Leuven, Belgium, 1997.

[13] A. E. Lefohn, I. Buck, P. McCormick, J. D. Owens, T. Purcell, and R. Strzodka, "GPGPU: General-purpose computation on graphics processors," in IEEE Visualization, Tutorial, 2005.

[14] D. J. C. MacKay, "Probable networks and plausible predictions—A review of practical Bayesian methods for supervised neural networks," Network: Computation in Neural Systems, vol. 6, no. 3, pp. 469–505, 1995.

[15] J. Y. Noh and U. Neumann, "Expression cloning," in Proc. ACM SIGGRAPH, 2001, pp. 277–288.

[16] B. Park, H. Chung, T. Nishita, and S. Y. Shin, "A feature-based approach to facial expression cloning," Comput. Animation Virtual Worlds, vol. 16, no. 3–4, pp. 291–303, 2005.

[17] F. I. Parke, "A parametric model for human faces," Ph.D. dissertation, Dept. Comput. Sci., Univ. Utah, Salt Lake City, UT, 1974.

[18] F. I. Parke, "Parameterized models for facial animation," IEEE Comput. Graph. Appl., vol. 2, pp. 61–68, 1982.

[19] F. I. Parke and K. Waters, Computer Facial Animation. Wellesley, MA: A K Peters, 1998.

[20] B. Robertson, "Locomotion," Computer Graphics World, vol. 27, no. 12, p. 1, 2005.

[21] R. W. Sumner and J. Popović, "Deformation transfer for triangle meshes," ACM Trans. Graphics, vol. 23, no. 3, pp. 399–405, 2004.

[22] J. Sun, D. Tao, and C. Faloutsos, "Beyond streams and graphs: Dynamic tensor analysis," in Proc. 12th ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, Philadelphia, PA, 2006, pp. 374–383.

[23] D. Tao, X. Li, X. Wu, and S. J. Maybank, "General tensor discriminant analysis and Gabor features for gait recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 10, pp. 1700–1715, Oct. 2007.

[24] D. Tao, X. Li, W. Hu, S. J. Maybank, and X. Wu, "Supervised tensor learning," Knowledge and Information Systems. Berlin, Germany: Springer-Verlag, 2007.

[25] D. Tao, X. Li, X. Wu, and S. J. Maybank, "Tensor rank one discriminant analysis," Neurocomputing, vol. 71, no. 10–12, pp. 1866–1882, 2008.

[26] M. E. Tipping and C. M. Bishop, "Probabilistic principal component analysis," J. Roy. Stat. Soc. B, vol. 61, no. 3, pp. 611–622, 1999.

[27] L. R. Tucker, "Some mathematical notes on three-mode factor analysis," Psychometrika, vol. 31, no. 3, 1966.

[28] M. A. O. Vasilescu and D. Terzopoulos, "Multilinear subspace analysis of image ensembles," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recogn., Madison, WI, 2003, vol. 2, pp. 93–99.

[29] D. Vlasic, M. Brand, H. Pfister, and J. Popović, "Face transfer with multilinear models," in Proc. ACM SIGGRAPH, 2005, pp. 426–433.

[30] D. Xu, S. Yan, L. Zhang, S. Lin, H. Zhang, and T. S. Huang, "Reconstruction and recognition of tensor-based objects with concurrent subspaces analysis," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 1, pp. 36–47, Jan. 2008.

[31] D. Xu, S. Lin, S. Yan, and X. Tang, "Rank-one projections with adaptive margin for face recognition," IEEE Trans. Syst., Man, Cybern. B, vol. 37, no. 5, pp. 1226–1236, Oct. 2007.

[32] D. Xu, S. Yan, L. Zhang, H. Zhang, Z. Liu, and H. Shum, "Concurrent subspace analysis," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recogn., 2005, pp. 203–208.

[33] S. Yan, D. Xu, S. Lin, T. Huang, and S.-F. Chang, "Element rearrangement for tensor-based subspace learning," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recogn., 2007, pp. 1–8.

[34] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: A general framework for dimensionality reduction," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 1, pp. 40–51, Jan. 2007.

[35] S. Yan, D. Xu, Q. Yang, L. Zhang, X. Tang, and H. Zhang, "Multilinear discriminant analysis for face recognition," IEEE Trans. Image Process., vol. 16, no. 1, pp. 212–220, Jan. 2007.

[36] J. Ye, R. Janardan, and Q. Li, "Two-dimensional linear discriminant analysis," in Proc. Neural Inf. Process. Syst., 2004.

[37] J. Ye, "Generalized low rank approximations of matrices," Mach. Learning, vol. 61, no. 1–3, pp. 167–191, 2005.

[38] Z. Zeng, Y. Fu, G. I. Roisman, Z. Wen, Y. Hu, and T. S. Huang, "Spontaneous emotional facial expression detection," J. Multimedia, vol. 1, no. 5, pp. 1–8, Aug. 2006.

Dacheng Tao (M'07) received the B.Eng. degree from the University of Science and Technology of China (USTC), Hefei, China, the M.Phil. degree from the Chinese University of Hong Kong (CUHK), and the Ph.D. degree from the University of London (UoL), London, U.K.

Currently, he is a Nanyang Assistant Professor with the School of Computer Engineering at Nanyang Technological University, a Visiting Professor at Xidian University, a Guest Professor at Wuhan University, and a Visiting Research Fellow at Birkbeck, University of London. His research mainly involves applying statistics and mathematics to data analysis problems in data mining, computer vision, machine learning, multimedia, and visual surveillance. He has published more than 90 scientific papers in venues including IEEE TPAMI, TKDE, TIP, TMM, TCSVT, TSMC, CVPR, ECCV, and ICDM, and ACM TKDD, Multimedia, and KDD, with one best paper runner-up award.

Dr. Tao has gained several Meritorious Awards in the International Interdisciplinary Contest in Modeling, the highest-level mathematical modeling contest in the world, organized by COMAP. He is an Associate Editor of the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, Neurocomputing, and Computational Statistics & Data Analysis, the official journal of the International Association for Statistical Computing. He has authored/edited six books and eight special issues, including for CVIU, PR, PRL, SP, and Neurocomputing. He has chaired or co-chaired special sessions, invited sessions, workshops, and conferences. He has served for more than 50 major international conferences, including CVPR, ICCV, ECCV, ICDM, KDD, and Multimedia, and for more than 15 top international journals, including TPAMI, TKDE, TOIS, TIP, TCSVT, TMM, TIFS, TSMC-B, Computer Vision and Image Understanding (CVIU), and Information Sciences.

Mingli Song (M'06) received the Ph.D. degree in computer science from Zhejiang University, Hangzhou, China, in 2006.

He is a Researcher with the Microsoft Visual Perception Laboratory, Zhejiang University. His research interests include face modeling and facial expression analysis.

Xuelong Li (M'02–SM'07) holds a permanent post with Birkbeck College, University of London, London, U.K., and is a Visiting Professor with Tianjin University, Tianjin, China.

His research focuses on cognitive computing, image/video processing, pattern recognition, and multimedia. His research activities are partly sponsored by the EPSRC, the British Council, the Royal Society, and the Chinese Academy of Sciences.

Dr. Li has over 100 scientific publications (more than 20 in IEEE TRANSACTIONS), with several Best Paper Awards and finalists. He is an author/editor of four books and an Associate Editor of the IEEE TRANSACTIONS ON IMAGE PROCESSING, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—B: CYBERNETICS, and the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—C: APPLICATIONS AND REVIEWS. He is also an associate editor (editorial board member) of ten other international journals and a guest co-editor of eight special issues. He has served as a chair of around twenty conferences and as a program committee member for more than eighty conferences. He has been a reviewer for over a hundred journals and conferences, including 11 IEEE TRANSACTIONS. He is an academic committee member of the China Society of Image and Graphics, and the Chair of the IEEE Systems, Man and Cybernetics Society Technical Committee
on Cognitive Computing, and a member of several other technical committees of the IEEE Systems, Man and Cybernetics Society and of the IEEE Signal Processing Society Technical Committee on Machine Learning for Signal Processing (MLSP). He is a Chapters Coordinator of the IEEE Systems, Man and Cybernetics Society.

Jialie Shen (M’07) is currently an Assistant Professor with Singapore Management University, Singapore. His research interests center on developing effective and efficient data analysis and retrieval techniques for novel data-intensive applications. In particular, he is interested in techniques for multimedia data mining, multimedia information retrieval, and database systems. His research results have been published in many international peer-reviewed journals. He has served as a reviewer for a number of major journals

and conferences, such as ICDE, IEEE TMM, IEEE TKDE, and IEEE TPAMI.

Mr. Shen is a member of ACM, SIGMOD, and SIGIR.

Jimeng Sun received the B.S. and M.Phil. degrees in computer science from the Hong Kong University of Science and Technology in 2002 and 2003, respectively, and the M.S. and Ph.D. degrees in computer science from Carnegie Mellon University, Pittsburgh, PA, in 2006 and 2007, respectively.

He is currently a Researcher with the IBM T. J. Watson Research Center, Hawthorne, NY. His research interests include data mining for streams and networks, databases, and service science. He has published over 20 refereed articles and one book chapter. He has filed four patents and has given four tutorials.

Dr. Sun was the recipient of the Best Research Paper Award at the SIAM Data Mining Conference (SDM). He has served as a program committee member for SIGKDD and SDM and as a reviewer for TKDE, VLDB, and ICDE.

Xindong Wu (SM’95) received the Ph.D. degree in artificial intelligence from the University of Edinburgh, Edinburgh, U.K.

He is a Professor and the Chair of the Department of Computer Science at the University of Vermont, Burlington, and a Visiting Chair Professor of Data Mining with the Department of Computing, Hong Kong Polytechnic University. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published over 160 refereed papers in these areas in various journals and conferences, as well as 18 books and conference proceedings.

Dr. Wu is the Editor-in-Chief of the IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING (TKDE), the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), the founder and a current Honorary Editor-in-Chief of Knowledge and Information Systems, the Founding Chair (2002–2006) of the IEEE Computer Society Technical Committee on Intelligent Informatics (TCII), and a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing. He was the recipient of the 2004 ACM SIGKDD Service Award, the 2006 IEEE ICDM Outstanding Service Award, and, in 2005, a Chair Professorship in the Cheung Kong (Yangtze River) Scholars Programme at the Hefei University of Technology, sponsored by the Ministry of Education of China and the Li Ka Shing Foundation. He has been an invited/keynote speaker at numerous international conferences.

Christos Faloutsos is a Professor with Carnegie Mellon University, Pittsburgh, PA.

He has received the Presidential Young Investigator Award from the National Science Foundation (1989), the Research Contributions Award at ICDM 2006, 12 Best Paper awards, and several teaching awards. He has served as a member of the executive committee of SIGKDD. He has published over 170 refereed articles, 11 book chapters, and one monograph. He holds five patents and has given over 20 tutorials and over 10 invited distinguished lectures. His research interests include data mining for streams and graphs, fractals, database performance, and indexing for multimedia and bioinformatics data.

Stephen J. Maybank (SM’05) received the B.A. degree in mathematics from King’s College, Cambridge, U.K., in 1976 and the Ph.D. degree in computer science from Birkbeck College, University of London, London, U.K., in 1988.

He was a Research Scientist with GEC from 1980 to 1995, first with MCCS, Frimley, and then, from 1989, with the GEC Marconi Hirst Research Centre, London. In 1995, he became a Lecturer with the Department of Computer Science, University of Reading, and, in 2004, he became a Professor with the School of Computer Science and Information Systems, Birkbeck College, University of London. His research interests include camera calibration, visual surveillance, tracking, filtering, applications of projective geometry to computer vision, and applications of probability, statistics, and information theory to computer vision. He is the author or coauthor of more than 100 scientific publications and one book.

Dr. Maybank is a Fellow of the Institute of Mathematics and its Applications and a Fellow of the Royal Statistical Society.