Nonparametric Object and Parts Modeling with Lie Group Dynamics
David S. Hayden, Jason Pacheco, John W. Fisher III
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
77 Massachusetts Ave., Cambridge, MA 02139
{dshayden, pachecoj, fisher}@csail.mit.edu
Abstract
Articulated motion analysis often utilizes strong prior knowledge such as a known or trained parts model for humans. Yet, the world contains a variety of articulating objects (mammals, insects, mechanized structures) where the number and configuration of parts for a particular object is unknown in advance. Here, we relax such strong assumptions via an unsupervised, Bayesian nonparametric parts model that infers an unknown number of parts with motions coupled by a body dynamic and parameterized by SE(D), the Lie group of rigid transformations. We derive an inference procedure that utilizes short observation sequences (image, depth, point cloud or mesh) of an object in motion without need for markers or learned body models. Efficient Gibbs decompositions for inference over distributions on SE(D) demonstrate robust part decompositions of moving objects under both 3D and 2D observation models. The inferred representation permits novel analysis, such as object segmentation by relative part motion, and transfers to new observations of the same object type.
1. Introduction
The world is full of moving objects comprised of articulating parts. Despite the wide range and complexity of such objects, humans have a remarkable ability to accurately discern both the number of articulating parts and their relation to the whole with few observations. We are interested in developing reasoning methods and algorithms that mimic this ability. While one might consider supervised methods that rely on large amounts of labeled training data about every conceivable object, articulated motion, and view, the task of data collection seems daunting and unnecessary.
Consequently, we develop a generative model that infers an object decomposition solely from brief observations of the object in motion. Specifically, we propose a parts-based representation that leverages Bayesian nonparametric dynamical models while eschewing strong assumptions about the number or configuration of parts.

Figure 1. The number, rotation, translation, and shape of an object's parts are learned from a small number of observations of that object in motion. Motion of the body and parts is parameterized by the Lie group of rigid transformations in 3D or 2D. Supported data sources include sequences of meshes / point clouds (A, human), depth data (B, marmoset), and 2D images (C, hand, spider).

The model simultaneously infers a dynamic body frame, the number of parts, and their motion relative to the body frame. Diverse inputs are supported (2D image sequences, 2.5D depth sequences, and 3D point cloud or mesh sequences) without need for restrictive assumptions about target appearance or the existence of specially-placed markers or sensors.
Objects and parts are assumed to rotate and translate smoothly in space, leading to a natural parameterization in SE(D), the Lie group of rigid transformations. By representing statistics of motion in the Lie algebra se(D), we derive closed-form Gibbs updates on translation dynamics, and an efficient sampler for rotation dynamics.
Contributions. We specify a novel Bayesian nonparametric model that is well-suited to the properties of articulated objects in motion (part persistence, rigid transformation dynamics, unknown number of parts). We demonstrate a novel decomposition of inferring translations and rotations in posterior distributions on SE(D) with concentrated Gaussian [35] priors. We validate our methods on 2D and 3D sequences containing different object types. We show that the parts in one data sequence transfer to other data sequences of the same object type (but different instance). Finally, we present novel analysis of the motion of different regions of an object, such as segmenting an object based on the motion of its parts relative to the body frame.

Figure 2. Simplified graphical model for an unknown number of time-varying parts $\{\theta_{tk}\}_{t=1,k=1}^{T,\infty}$ coupled by shared dynamics $\{x_t\}_{t=1}^{T}$. Observations $y_{tn}$ are generated by part $k$ if $z_{tn} = k$. Stick weights $\{\pi_k\}_{k=1}^{\infty}$ influence the observation counts for each part. Priors and $\{Q, \omega_k, S_k, W_k, E_k\}$ omitted for clarity.
2. Related Work
This work draws on body/parts models, Bayesian nonparametric dynamical models, and Lie groups. Each contains a rich literature, so we highlight only the most relevant details. Importantly, we are aware of no work that models body and part motion over time with Lie group dynamics while also being unsupervised and nonparametric in the parts.
Body and Part Models The many treatments of part-based modeling begin with the pioneering work on human models of pictorial structures [10] and cardboard people [18]. Later work on deformable parts models [9] removes the need to define object-specific part configurations. Building on the success of offline analysis, real-time human pose tracking is now possible as well [27, 13]. All of these methods require specifying the number of parts. More detailed shape and pose models have been developed for a variety of objects, using a combination of known body models, mesh representations, and sophisticated collection schemes including multiple cameras, IMUs, lasers and/or specially-painted targets [21, 41, 3, 24].
Unsupervised methods [36, 20, 26, 41, 40] have significant restrictions such as working for only 2D or only 3D data, or requiring annotated landmarks or point correspondences. In contrast, our unsupervised method works for 2D and 3D inputs, requires only a single sensor observing an object in motion, requires no distinctive or annotated object markings, and needs no observation correspondences.
Lie Groups Our work relies on the Lie group SE(D), the space of rigid transformations, for representing body and part motion. Lie groups have been used extensively in robotics and computer vision tasks such as SLAM [6], navigation [19], and parts-based models [5, 14, 17]. Defining observation models for Lie groups is challenging since the group is not a vector space. As such, notions of distance (and therefore distributions) require special care [23, 37]; e.g., simple additive noise models violate the group topology. Our approach defines a distribution (Gaussian) in the tangent plane about an element of the group [35]. Most works that model dynamics with SE(D) perform inference with approximate filters or smoothers, commonly the EKF [4] or UKF [6]. One exception that does full posterior inference is [28], though that work is not a dynamical model. See [8] for an accessible introduction to Lie groups, and [16] for a more thorough introduction.
Nonparametric Models Sequential models extending the well-known Dirichlet process (DP) [1] include the HDP-HMM [31], sticky HDP-HMM [12], infinite HMM [2], and infinite factorial HMM (ifHMM) [15]. Each of these permits an infinite number of states, but is restricted to discrete labels. Extensions to continuously-varying latent states include the HDP-SLDS [11], dynamic HDP [25], mixture of DPs [7], and the evolutionary HDP [38]. While each has the desirable property of shared global dynamics, none captures component persistence, instead allowing new atoms at each time instance. This is undesirable for parts modeling, as objects do not tend to acquire and lose parts over time, and nonparametric priors already risk creating duplicate parts [12].
Closely related is the infinite factorial dynamical model [32], a continuous extension of the ifHMM which only permits shared global binary on/off states, and the Transformed Dirichlet Process [30], a DP allowing multiple groups of observations to share the same set of atoms (but with no dynamics). Most relevant, and what we use for comparison, is the Bayesian nonparametric model of Zhou et al. [39], a linear dynamical model where parts are independently sampled from a Dirichlet process at each time (but with no part persistence or Lie group representation).
3. Model
Let $t = 1, \ldots, T$ index time, $k = 1, \ldots, \infty$ index parts, and $n = 1, \ldots, N_t$ index observations at time $t$. Most generally, our nonparametric parts model (Figure 2) takes as its sole input observations $\{y_t\}_{t=1}^{T}$, where the $t$th batch $y_t = \{y_{tn}\}_{n=1}^{N_t}$ contains $N_t$ observations with unknown correspondence. There is a global (body) dynamic with time-varying parameters $x_t$ and time-fixed parameter $Q$. There are an unknown number of components (parts) with time-varying parameters $\theta_{tk}$ and time-fixed parameters $\{\omega_k, S_k, E_k, W_k\}$. Stochastic dynamics models $f, g$ and stochastic observation model $h$ are, for each $t, k, n$,

$$x_t \sim f(x_{t-1}, Q) \qquad \theta_{tk} \sim g(\theta_{(t-1)k}, \omega_k, S_k)$$
$$y_{tn} \sim h(x_t, \theta_{t z_{tn}}, \omega_{z_{tn}}, E_{z_{tn}})$$

where $z_{tn} = k$ indicates that observation $y_{tn}$ was generated by component $k$. A prior probability of association is given by the discrete distribution of stick weights $\pi$ (for $\alpha > 0$):

$$z_{tn} \sim \pi \qquad \pi \sim \mathrm{GEM}(\alpha) \tag{1}$$
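As an illustration (the helper name `gem_sticks` and the truncation level are ours, not the paper's), the stick-breaking construction behind $\pi \sim \mathrm{GEM}(\alpha)$ can be sketched as:

```python
import numpy as np

def gem_sticks(alpha, trunc=100, rng=None):
    """Draw the first `trunc` stick weights of pi ~ GEM(alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    # Each weight is a Beta(1, alpha) fraction of the stick that remains.
    betas = rng.beta(1.0, alpha, size=trunc)
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * remaining

pi = gem_sticks(alpha=2.0, trunc=500, rng=np.random.default_rng(0))
# Sample associations z_tn ~ pi (renormalized over the truncation).
z = np.random.default_rng(1).choice(pi.size, p=pi / pi.sum(), size=10)
```

With a large truncation the weights sum to one up to negligible leftover mass, which is why the renormalization above is safe.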
To specialize for object and parts modeling we must further specify the domain of random variables $\{y_{tn}, x_t, \theta_{tk}, \omega_k, S_k, E_k, W_k, Q\}$, the form of priors $\{H_x, H_\theta, H_\omega, H_S, H_E, H_W, H_Q\}$, and the forms of $\{f, g, h\}$. First, we introduce distributions on Lie groups.
3.1. Lie Groups
A matrix Lie group $G$ is a continuous group whose elements can be described by matrices with special structure. In this work, $G = SE(D)$, the space of rigid transformations on $\mathbb{R}^D$, or $G = SO(D)$, the space of $D$-dimensional proper rotations. Associated to $G$ is the Lie algebra $\mathfrak{g}$, which can be viewed as a local vector space approximation about the identity element of $G$. This approximation can be made with respect to any element in $G$ because group elements compose via matrix multiplication and each element has an inverse. For $b, \mu \in G$ we call the local vector space approximation about $\mu$ the tangent space of $\mu$, denoted $T_\mu G$. Mappings to and from the tangent space of $\mu$ are accomplished via the left-invariant Riemannian logarithm and left-invariant Riemannian exponential,

$$\mathrm{Log} : G \times G \to \mathfrak{g}, \qquad \mathrm{Log}_\mu b = \log_G(\mu^{-1} b) \tag{2}$$
$$\mathrm{Exp} : G \times \mathfrak{g} \to G, \qquad \mathrm{Exp}_\mu v = \mu \exp_G(v) \tag{3}$$

where $v \in \mathfrak{g}$ is a tangent vector in the tangent space of $\mu$, and $\log_G, \exp_G$ are the Lie group logarithm and exponential maps, which can be computed using the matrix logarithm and matrix exponential. Note that $v$ is called a tangent vector even though it is represented as a matrix: this is because a bijective mapping exists between the matrix and vector representations of $v$. We omit additional notation for brevity.
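Since $\log_G$ and $\exp_G$ reduce to the matrix logarithm and exponential, Eqns. 2 and 3 can be sketched numerically (a minimal illustration on $SO(2)$, with our own helper names):

```python
import numpy as np
from scipy.linalg import expm, logm

def exp_mu(mu, v):
    """Left-invariant Riemannian exponential: Exp_mu(v) = mu expm(v) (Eqn. 3)."""
    return mu @ expm(v)

def log_mu(mu, b):
    """Left-invariant Riemannian logarithm: Log_mu(b) = logm(mu^{-1} b) (Eqn. 2)."""
    return np.real(logm(np.linalg.inv(mu) @ b))

def so2(theta):
    """Element of SO(2) from an angle."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

mu, b = so2(0.3), so2(1.0)
v = log_mu(mu, b)        # tangent "vector", represented as a 2x2 skew matrix
b_back = exp_mu(mu, v)   # the round trip recovers b
```

The round trip `Exp_mu(Log_mu(b)) = b` holds whenever `b` lies within the injectivity radius of the map, which is the case for these small rotations.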
3.1.1 Distributions on Lie Groups
Constructing a distribution on $G$ to reason over body and parts models is complicated by the fact that $G$ is not a vector space. Exploiting the maps between group elements and their tangent spaces, we can define a distribution with location parameter $\mu \in G$ by mapping its support to the tangent space of $\mu$. Define the left-invariant concentrated Gaussian $N_L(\cdot)$ in terms of the multivariate Gaussian $N(\cdot)$:

$$N_L(b \mid \mu, \Sigma) = N(\mathrm{Log}_\mu b \mid 0, \Sigma) \tag{4}$$

In similar fashion to [29], this can be thought of as a Gaussian in the tangent space about mean $\mu \in G$. The covariance $\Sigma$ exists in the tangent space and can be understood to operate the same as in the typical Euclidean case, except that vectors in $T_\mu G$ must be mapped back to the group by Eqn. 3. See Figure 3 for a visualization on SO(2).
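On $SO(2)$ the tangent space is one-dimensional, so Eqn. 4 can be evaluated with a scalar Gaussian (a sketch with our own helper names, not the paper's implementation):

```python
import numpy as np
from scipy.linalg import logm
from scipy.stats import multivariate_normal

def so2(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def log_so2(mu, b):
    """Log_mu(b) on SO(2), returned as the scalar tangent coordinate."""
    v = np.real(logm(mu.T @ b))   # mu^{-1} = mu^T for rotations
    return v[1, 0]                # skew matrix <-> angle

def concentrated_gaussian_pdf(b, mu, sigma2):
    """N_L(b | mu, Sigma) of Eqn. 4 as a Gaussian in the tangent space of mu."""
    return multivariate_normal.pdf(log_so2(mu, b), mean=0.0, cov=sigma2)

# The density peaks at the location parameter and decays with geodesic distance.
p_at_mu = concentrated_gaussian_pdf(so2(0.3), so2(0.3), 0.1)
p_away = concentrated_gaussian_pdf(so2(1.3), so2(0.3), 0.1)
```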
Figure 3. Location and scale distributions such as Gaussians can be locally defined about element $\mu$ in Lie group $G$ by mapping their support into the local vector space approximation $T_\mu G$.
3.1.2 SE(D) Actions
Represent $b, c \in G = SE(D)$ as real block matrices,

$$b = \begin{pmatrix} R_b & d_b \\ 0 & 1 \end{pmatrix} \qquad c = \begin{pmatrix} R_c & d_c \\ 0 & 1 \end{pmatrix} \tag{5}$$

The $R_*$ are $D \times D$ rotation matrices (with determinant $+1$) in $SO(D)$ and the $d_*$ are translations in $\mathbb{R}^D$. Henceforth, we use this notation to represent the rotation and translation components of any element in $SE(D)$; for example, if $x_t \in SE(D)$ then it has rotation $R_{x_t}$ and translation $d_{x_t}$.
Elements of $SE(D)$ compose via matrix multiplication (maintaining group closure), and act as a change of basis for homogeneous coordinates $\bar{p}$ of a point $p \in \mathbb{R}^D$:

$$bc\bar{p} = \begin{pmatrix} R_b (R_c p + d_c) + d_b \\ 1 \end{pmatrix} \tag{6}$$

If $\bar{p}$ are coordinates in frame $c$, then $c\bar{p}$ can be interpreted as its coordinates in frame $b$, and $bc\bar{p}$ can be interpreted as its coordinates in the standard (or world) basis. In general, changes of basis are best viewed as composing from left to right, but acting on points from right to left.
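The composition and point action above can be checked directly in 2D (an illustrative sketch; the frames and helper names are ours):

```python
import numpy as np

def se2(theta, d):
    """Homogeneous-matrix element of SE(2): rotation theta, translation d."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(3)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 2] = d
    return T

b = se2(np.pi / 2, [1.0, 0.0])   # frame b in world coordinates
c = se2(0.0, [0.0, 2.0])         # frame c expressed in frame b
p = np.array([1.0, 0.0, 1.0])    # homogeneous coordinates of a point in frame c

world = b @ c @ p                # Eqn. 6: compose left to right, act right to left

# Check against the unrolled form R_b (R_c p + d_c) + d_b:
Rb, db = b[:2, :2], b[:2, 2]
Rc, dc = c[:2, :2], c[:2, 2]
unrolled = Rb @ (Rc @ p[:2] + dc) + db
```

Both computations place the point at $(-1, 1)$ in world coordinates, and the homogeneous coordinate stays 1, confirming group closure of the composition.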
3.2. Body and Parts Model
Let $G = SE(D)$ for dimension $D \in \{2, 3\}$. We seek to infer a parts decomposition of an articulating object by directly observing it in motion. Specifically, we model the inputs $y_{tn} \in \mathbb{R}^D$ as random collections of points sampled within the object as it moves across time. Variable (including no) observations are supported at each time, and no correspondence between observations is assumed. Diverse inputs are supported, including foreground pixels of 2D image sequences, unprojected points from depth sequences, and 3D point clouds sampled within mesh sequences.

We assume part persistence: an object does not gain or lose parts over time. We also assume that parts move smoothly through space but remain close (in an L2 sense) to a common body which also moves smoothly. The relation between body and part motion could be modeled in
Figure 4. The frames that comprise an object at time $t$. Per-time body frames $x_t$ are rigid transformations from world frame $W$. Each part $k$ contains a time-fixed canonical part frame $\omega_k$ and a per-time part frame $\theta_{tk}$. $\omega_k$ is a rigid transformation from body frame $x_t$, while $\theta_{tk}$ is a rigid transformation from $\omega_k$. Using stabilized random walk dynamics, each per-time part frame $\theta_{tk}$ is designed to transform smoothly over time but remain near the origin of its respective canonical part frame $\omega_k$.
many ways: one naive extreme would be to model them as floating bodies with linear dynamics, while the other extreme would be to model them as existing in a skeletal network of joints. Linear dynamics would fail to capture part articulation, while skeletal networks are overly restrictive.

We take a middle ground: parts $\{\theta_{tk}, \omega_k, S_k, E_k, W_k\}$ are modeled as floating bodies that rotate and translate smoothly through space about a body frame $x_t \in G$, but whose origins tend to remain near the origin of a canonical part frame $\omega_k \in G$ through stabilized random walk dynamics. Canonical part frames are close to the body frame and remain fixed across time, but parts also have per-time frames $\theta_{tk} \in G$. Parts are not fixed in their spatial extent; instead, they have a probabilistic, ellipsoidal shape model governed by Gaussian covariance $E_k$. Part dynamics are governed by covariance $S_k$ and body dynamics by covariance $Q$. The dispersion of canonical part frames about the body frame is governed by covariance $W_k$. Figure 4 graphically depicts how body and part frames compose.
3.2.1 Body and Part Dynamics
Body frames $x_t$ and parts evolve independently, but are implicitly coupled through the observation model. In particular, the body frame stochastic dynamics model is:

$$x_t \sim N_L(x_t \mid x_{t-1}, Q) \tag{7}$$

Object dynamics are a non-linear random walk on $G$ whose noise covariance $Q$ exists in the tangent space about the body frame at the previous time. Canonical part frames $\omega_k$ are dispersed about the body frame with covariance $W_k$,

$$\omega_k \sim H_\omega = N_L(\cdot \mid I, W_k) \tag{8}$$

where $I \in G$ is the identity element (no translation or rotation) and covariance $W_k$ can be thought to (implicitly) exist in the tangent space of $x_t$. Each part has per-time dynamics $\theta_{tk}$ with driving noise covariance $S_k$ governed by:
$$\theta_{tk} = \begin{pmatrix} \mathrm{Exp}_{R_{\theta_{(t-1)k}}} \phi_{tk} & A\, d_{\theta_{(t-1)k}} + B\, m_{tk} \\ 0 & 1 \end{pmatrix} \tag{9}$$

with constants $A = \mathrm{diag}(\sqrt{a}, \ldots, \sqrt{a})$, $B = \mathrm{diag}(\sqrt{1-a}, \ldots, \sqrt{1-a})$, and $\mathrm{Exp}$ in Eqn. 9 the Riemannian exponential for $SO(D)$. $\phi_{tk} \in so(D)$ is a vector in the tangent space of $R_{\theta_{(t-1)k}}$. Part translation driving noise $m_{tk}$ and rotation driving noise $\phi_{tk}$ are jointly distributed:

$$(m_{tk}, \phi_{tk}) \sim N(0, S_k) \tag{10}$$
As proven in the supplemental, the chosen coefficients of matrices $A, B$ (with $a = 0.95$) cause the asymptotic covariance of the part translation $d_{\theta_{tk}}$ to equal the covariance of the translation driving noise $m_{tk}$. This form enables parts to transform smoothly while never straying far from their canonical location, and mitigates part confusion during inference.
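The asymptotic-covariance claim is easy to verify numerically in one dimension: with $d_t = \sqrt{a}\, d_{t-1} + \sqrt{1-a}\, m_t$ and $m_t \sim N(0, S)$, stationarity gives $\mathrm{Var} = a\,\mathrm{Var} + (1-a)S$, i.e. $\mathrm{Var} = S$. A quick simulation (our own illustrative check, not the paper's code):

```python
import numpy as np

# Scalar stabilized random walk on the translation component (cf. Eqn. 9):
#   d_t = sqrt(a) * d_{t-1} + sqrt(1 - a) * m_t,   m_t ~ N(0, S).
rng = np.random.default_rng(0)
a, S, T = 0.95, 2.0, 200_000
d = np.zeros(T)
m = rng.normal(0.0, np.sqrt(S), size=T)
for t in range(1, T):
    d[t] = np.sqrt(a) * d[t - 1] + np.sqrt(1.0 - a) * m[t]

empirical_var = d[T // 10:].var()   # discard burn-in; should be close to S
```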
All driving noise covariances are drawn from Inverse-Wishart distributions, where we note that our model supports arbitrary correlations between translation and rotation for object, canonical part, and part transformations:

$$Q \sim H_Q = \mathrm{IW}(\cdot \mid v_{Q_0}, \Lambda_{Q_0}) \tag{11}$$
$$S_k \sim H_S = \mathrm{IW}(\cdot \mid v_{S_0}, \Lambda_{S_0}) \tag{12}$$
$$W_k \sim H_W = \mathrm{IW}(\cdot \mid v_{W_0}, \Lambda_{W_0}) \tag{13}$$

And initial body and part frames are drawn according to:

$$x_1 \sim H_x = N_L(\cdot \mid x_0, \Sigma_x) \tag{14}$$
$$\theta_{1k} \sim H_\theta = N_L(\cdot \mid \theta_0, \Sigma_\theta) \tag{15}$$
3.2.2 Observation Models for 3D and 2D data
Input $y_{tn}$ is assumed to be in world coordinate system $W$, which is assumed to be aligned with the sensor's coordinate system (hence, $W$ has no rotation or translation and is henceforth omitted). Parts generate observations in their respective part coordinate systems, which are mapped to world coordinates via $\theta_{tk}$, $\omega_k$ and the body frame $x_t$. That is, part $k$ generates point $e_{tn} \sim N(0, E_k)$, which is then mapped to world coordinates $\bar{y}_{tn} = x_t \omega_k \theta_{tk} \bar{e}_{tn}$ if $z_{tn} = k$ (where $\overline{(\cdot)}$ is the homogeneous representation of $(\cdot)$). The transformation
is linear in $e_{tn}$, allowing straightforward mean and variance computations of the homogeneous point in world coordinates $\bar{y}_{tn}$, yielding the following observation model (for $z_{tn} = k$):

$$\bar{y}_{tn} \sim N(\bar{y}_{tn} \mid x_t \omega_k \theta_{tk} \bar{0},\; x_t \omega_k \theta_{tk} \bar{E}_k \theta_{tk}^\top \omega_k^\top x_t^\top) \tag{16}$$

where $\bar{0}$ is the homogeneous zero vector in $\mathbb{R}^D$ and $\bar{E}_k$ is a degenerate block covariance matrix of $E_k$ with a zero row and column (a covariance in homogeneous coordinates). Without homogeneous coordinates, this is (via Eqn. 5):

$$y_{tn} \sim N(y_{tn} \mid \mu_{tk}, \Sigma_{tk}) \tag{17}$$
$$\mu_{tk} = R_{x_t}(R_{\omega_k} d_{\theta_{tk}} + d_{\omega_k}) + d_{x_t} \tag{18}$$
$$\Sigma_{tk} = R_{x_t} R_{\omega_k} R_{\theta_{tk}} E_k R_{\theta_{tk}}^\top R_{\omega_k}^\top R_{x_t}^\top \tag{19}$$
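These moments follow from pushing $e_{tn} \sim N(0, E_k)$ through the composed frames, which can be checked numerically in 2D (the frames below are illustrative values we chose, not quantities from the paper):

```python
import numpy as np

def rot2(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Illustrative 2D frames (rotation, translation) for body x_t, canonical
# part omega_k, and per-time part theta_tk; E is a diagonal part shape.
Rx, dx = rot2(0.4), np.array([1.0, 2.0])
Rw, dw = rot2(-0.2), np.array([0.5, 0.0])
Rt, dt = rot2(0.1), np.array([0.1, -0.3])
E = np.diag([0.5, 0.1])

# Closed-form moments of y_tn in world coordinates (Eqns. 18-19):
mu = Rx @ (Rw @ dt + dw) + dx
Sigma = Rx @ Rw @ Rt @ E @ Rt.T @ Rw.T @ Rx.T

# Monte Carlo check: push part-frame samples e ~ N(0, E) through the frames.
rng = np.random.default_rng(0)
e = rng.multivariate_normal(np.zeros(2), E, size=200_000)
y = (Rx @ Rw @ (Rt @ e.T + dt[:, None]) + (Rx @ dw + dx)[:, None]).T
```

The empirical mean and covariance of `y` match `mu` and `Sigma` up to Monte Carlo error, confirming that the composed rigid transformations act linearly on the Gaussian part shape.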
While simple, it accommodates image plane observations in 2D, depth observations in 2.5D, and XYZ observations in 3D. Incorporating additional terms (e.g., appearance) is straightforward, but was not needed for our purposes. As with most generative models, robustness to missing data (common for depth sensors) is handled seamlessly.

The observation covariance $\Sigma_{tk}$ for $y_{tn}$ is some rotation of $E_k$ for $z_{tn} = k$ due to the composition of body and part frames. Consequently, $E_k$ is constrained to be diagonal (i.e., axis-aligned) so as to avoid ambiguity. While the use of $E_k$ implies a probabilistic, ellipsoidal part shape model, its primary function is to yield robust associations $z_{tn}$ of observations to parts. Here, we use the following prior:
$$E_k \sim H_E = \mathrm{IW}(\cdot \mid v_{E_0}, \Lambda_{E_0}) \tag{20}$$
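One simple way to draw an axis-aligned shape covariance consistent with the diagonal constraint above is to keep only the diagonal of an inverse-Wishart draw; the hyperparameter values and this particular construction are illustrative, not necessarily the paper's:

```python
import numpy as np
from scipy.stats import invwishart

# Draw a 2D part-shape covariance from an inverse-Wishart prior (cf. Eqn. 20)
# and impose the diagonal (axis-aligned) constraint by zeroing off-diagonals.
# Hyperparameters v_E0 and Lam_E0 are illustrative placeholders.
rng = np.random.default_rng(0)
v_E0, Lam_E0 = 5, np.eye(2)
E_full = invwishart(df=v_E0, scale=Lam_E0).rvs(random_state=rng)
E_k = np.diag(np.diag(E_full))   # axis-aligned ellipsoidal shape

# Ellipsoid semi-axes (1-sigma) along the part's coordinate axes:
semi_axes = np.sqrt(np.diag(E_k))
```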
4. Inference
We wish to sample from the posterior, which has likelihood proportional to the product of Equations (1, 7, 8, 10, 11, 12, 13, 14, 15, 17, 20) over all $t, k, n$. This is accomplished with Markov chain Monte Carlo (MCMC) inference that exploits Gaussian statistics in the tangent space for efficient updates while simultaneously respecting the geometry of the Lie group via Eqns. 2 and 4. We do so by sampling from the full conditional distributions of each latent variable, grouped in order of discussion:

$$(x_t, \theta_{tk}, \omega_k) \qquad z_{tn} \qquad (\pi, E_k, S_k, Q) \tag{21}$$
Figure 5. Left ("Body Frame Projection"): Object dynamics of the body frame at time t are projected into coordinates at time t − 1 by the Lie group operation x_{t−1}^{−1} x_t. Right ("Tangent Distribution"): The projection is in SE(3), with Gaussian statistics in the tangent plane of x_{t−1}. The figure notionally depicts two degrees of freedom, whereas SE(3) has six.
where t = 1, . . . , T , k = 1, . . . ,∞, n = 1, . . . , Nt, and omitted leading subscripts are taken to mean joint dependence (i.e. y = {yt}_{t=1}^{T} and yt = {ytn}_{n=1}^{Nt}). Inference
complexity is linear in the number of observations and parts.
In our experiments, chains were generally mixed after about
300 samples, with approximately 1 minute per sample being
the worst-case timing for any data we tested on.
In the sequel we sketch the sampling of body transforma-
tions xt. Full details are in the supplement, along with sam-
pling of the canonical parts ωk and part transformations θtk, which take a similar form. We also discuss sampling part associations ztn, which are conjugate except when sampling
assignments to the base measure. The conditionals in the
third grouping (π,Ek, Sk, Q) can be sampled analytically
due to conjugate priors, which we defer to the supplement.
Finally, parts {θtk, ωk, Sk,Wk, Ek} can be sampled in par-
allel across k and ztn can be sampled in parallel across t, n.
4.1. Decomposition of Lie Group dynamics
We exploit the Lie algebra to develop an efficient Gibbs
sampler for dynamical terms {xt, ωk, θtk}. For example,
the operation x_{t−1}^{−1} x_t transforms the body frame at time t into that of the body frame at time t − 1 (Fig. 5, left). This
operation is an element of SE(D):
x_{t−1}^{−1} x_t ≜ ( R_{x_{t−1}^{−1} x_t}   d_{x_{t−1}^{−1} x_t} ; 0   1 ),    (22)

where R_{x_{t−1}^{−1} x_t} = R_{x_{t−1}}^⊤ R_{x_t} and d_{x_{t−1}^{−1} x_t} = R_{x_{t−1}}^⊤ (d_{x_t} − d_{x_{t−1}}). Elements in the frame xt are mapped to the tangent
space of xt−1 via the Riemannian Log map (Fig. 5, right):
Log_{x_{t−1}} x_t ≜ log_G(x_{t−1}^{−1} x_t) = ( V_{x_{t−1}^{−1} x_t}^{−1} d_{x_{t−1}^{−1} x_t} ; φ_{x_{t−1}^{−1} x_t} )    (23)
The first entry, V_{x_{t−1}^{−1} x_t}^{−1} d_{x_{t−1}^{−1} x_t}, gives the tangent-space coordinates of translation, and the second entry, φ_{x_{t−1}^{−1} x_t}, is a rotation vector. The invertible linear operator V_{x_{t−1}^{−1} x_t}^{−1} is computable from the rotation R_{x_{t−1}^{−1} x_t} (or from φ_{x_{t−1}^{−1} x_t}). This is well-defined for x_{t−1}^{−1} x_t sufficiently close to identity and is consistent with small incremental motions.
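The decomposition above can be sketched in numpy/scipy, assuming 4×4 homogeneous transforms and using the standard closed form for V on SE(3) (see, e.g., [8]); this is an illustrative implementation, not the paper's code:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def se3_log(T_prev, T_curr):
    """Tangent-space coordinates of T_curr in the frame of T_prev (Eqs. 22-23).

    T_prev, T_curr: 4x4 homogeneous transforms. Returns (V^{-1} d, phi)."""
    # Eq. (22): relative transform x_{t-1}^{-1} x_t
    R_rel = T_prev[:3, :3].T @ T_curr[:3, :3]
    d_rel = T_prev[:3, :3].T @ (T_curr[:3, 3] - T_prev[:3, 3])
    # rotation vector phi from the matrix logarithm of R_rel
    phi = Rotation.from_matrix(R_rel).as_rotvec()
    th = np.linalg.norm(phi)
    W = np.array([[0, -phi[2], phi[1]],
                  [phi[2], 0, -phi[0]],
                  [-phi[1], phi[0], 0]])
    if th < 1e-8:                       # near identity: V ~ I
        V = np.eye(3)
    else:                               # closed-form V on SE(3)
        V = (np.eye(3) + (1 - np.cos(th)) / th**2 * W
             + (th - np.sin(th)) / th**3 * W @ W)
    return np.linalg.solve(V, d_rel), phi
```

For a pure translation the map returns the translation itself with a zero rotation vector, matching the near-identity regime assumed in the text.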
4.2. Gibbs Sampling Updates
Recall that Eqns. (22) and (23) map xt to the tangent
space of xt−1. When conditioned on rotation, this map-
ping is linear in the translation component dxt. This ob-
servation, combined with Gaussian statistics in the tangent
space, yields closed-form Gibbs updates for translation. To
see this, observe that the distribution over dynamics in the
tangent space is (Fig. 5, right),
NL(xt | xt−1, Q) = N( ( C d_{x_t} + u ; φ_{x_{t−1}^{−1} x_t} ) | 0, Q )    (24)
where C = V_{x_{t−1}^{−1} x_t}^{−1} R_{x_{t−1}}^⊤ and u = −V_{x_{t−1}^{−1} x_t}^{−1} R_{x_{t−1}}^⊤ d_{x_{t−1}}.
Conditioned on rotation Rxt and the previous body frame xt−1, the corresponding rotation vector φ_{x_{t−1}^{−1} x_t} and matrix V_{x_{t−1}^{−1} x_t} are fixed quantities. This renders C and u computable and yields a Gaussian conditional distribution for dxt. This conditional constitutes our prior belief about dxt given Rxt, xt−1 and covariance Q. Similar logic allows us to derive a Gaussian conditional on dxt given the future transformation xt+1. These can be analytically combined to provide a Gaussian distribution for dxt | Rxt, xt−1, xt+1. Because this is Gaussian, and the observation model is also a product of Gaussians whose parameters are known given {ωk, Ek, θtk}_{k=1}^{∞} and {ztn}_{n=1}^{Nt}, it follows that the posterior on dxt is also Gaussian and analytically computable.
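The analytic combination of the past- and future-conditioned Gaussians is the usual precision-weighted product of two Gaussian beliefs about the same variable; a generic sketch (not tied to the paper's parameterization):

```python
import numpy as np

def fuse_gaussians(mu1, S1, mu2, S2):
    """Combine two Gaussian beliefs about the same variable.

    The product N(x | mu1, S1) N(x | mu2, S2) is Gaussian with precision
    L = S1^{-1} + S2^{-1} and mean L^{-1} (S1^{-1} mu1 + S2^{-1} mu2)."""
    L1, L2 = np.linalg.inv(S1), np.linalg.inv(S2)
    S = np.linalg.inv(L1 + L2)
    mu = S @ (L1 @ mu1 + L2 @ mu2)
    return mu, S

# with equal covariances, the fused mean is the average of the two means
mu, S = fuse_gaussians(np.zeros(3), np.eye(3), np.ones(3), np.eye(3))
```

In the sampler, mu1/S1 would come from the xt−1-conditioned prior and mu2/S2 from the xt+1-conditioned term, with the Gaussian likelihood folded in the same way.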
In contrast, sampling of rotation parameters lacks a
closed form. We utilize univariate slice sampling [22] for
the full conditional of each rotation parameter, along with
a fixed number of MCMC proposals to correct for known
rotational symmetries. Details are in the supplement.
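For reference, one update of Neal's univariate slice sampler [22] with the stepping-out procedure looks roughly as follows (a generic sketch; the full conditional log_f of a rotation parameter is supplied by the model):

```python
import numpy as np

def slice_sample(x0, log_f, w=0.5, max_steps=50, rng=None):
    """One step of Neal's (2003) univariate slice sampler with stepping-out.

    x0: current value; log_f: unnormalized log full conditional; w: step width."""
    rng = rng or np.random.default_rng()
    log_y = log_f(x0) + np.log(rng.random())   # slice level under f(x0)
    L = x0 - w * rng.random()                  # random initial bracket around x0
    R = L + w
    for _ in range(max_steps):                 # step out until ends leave the slice
        if log_f(L) <= log_y:
            break
        L -= w
    for _ in range(max_steps):
        if log_f(R) <= log_y:
            break
        R += w
    while True:                                # shrink until a point is accepted
        x1 = rng.uniform(L, R)
        if log_f(x1) > log_y:
            return x1
        if x1 < x0:
            L = x1
        else:
            R = x1
```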
Part Association  The conditional distribution for a single assignment to an existing part k ≥ 1 is given by p(ztn = k | ytn, xt, ω, θt, π, E) ∝ πk p(ytn | xt, ωk, θtk, Ek). Conversely, association to a new part is given by

p(ztn = −1 | ytn, xt, ω, θt, π, E) ∝ π∗ ∫ p(ytn | xt, ω∗, θt∗, E∗) p(ω∗, θt∗, E∗) d(ω∗, θt∗, E∗),    (25)
where π∗ is the stick weight corresponding to the base mea-
sure (i.e. all uninstantiated parts). This is not analytic in our
model, but can be effectively approximated by Monte Carlo sampling of parts (which need only be done once) or by approximation with a constant (since the predictive distribution of parts will be broad, but centered at the object frame of reference). We obtain satisfactory results with both approaches.
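Putting the two cases together, a single association draw can be sketched as below, using a constant approximation log_p_new to the base-measure integral in Eq. (25); all names here are illustrative, not from the paper's code:

```python
import numpy as np

def sample_association(log_lik_k, log_pi, log_pi_star, log_p_new, rng):
    """Sample z_tn over existing parts k and the base measure (label -1).

    log_lik_k[k]: log p(y_tn | x_t, w_k, th_tk, E_k) for each existing part;
    log_pi: log stick weights; log_pi_star + log_p_new: new-part term."""
    logp = np.append(log_pi + log_lik_k, log_pi_star + log_p_new)
    logp -= logp.max()                 # stabilize before exponentiating
    p = np.exp(logp)
    p /= p.sum()
    k = rng.choice(len(p), p=p)
    return -1 if k == len(p) - 1 else k
```

Because parts are conditionally independent given the frames, these draws can run in parallel across t, n as noted above.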
5. Results
We compare quantitatively and qualitatively to nonpara-
metric and parametric baselines in 5.1. We present results
on dynamic mesh data in 5.2 and demonstrate object seg-
mentation based on relative part motion in 5.3. We show
transfer of learned representations to a novel dataset and
synthesize motion from the learned representation in the
supplement. The supplemental video animates these results.
5.1. Quantitative Comparison
We examine part discovery performance on three ob-
ject motion datasets and compare to manually-annotated
ground-truth. We emphasize that annotations are not in-
corporated into the inference procedure. We refer to the
datasets as hand, spider, and marmoset. hand and
spider are 2D image data, while marmoset is 3D data unprojected from a depth camera. Inference utilizes 12–44 frames (depending on the dataset) and results are
compared to five manually-annotated ground-truth frames
(where ground truth is the number of parts and their seg-
mentations – examples are in the supplement). In each
dataset, parts have nearly indistinguishable appearances and
none of the compared methods use an appearance model.
Consequently, part discovery is achieved via analysis of mo-
tion dynamics. Inputs only contain foreground (i.e. back-
ground is removed), as is done in related works [20].
We report multi-object tracking and segmentation
(MOTS) metrics [34], which measure how well the part
associations overlap with ground-truth part segmentations
(MOTSA, sMOTSA, MOTSP) and how stable the part as-
sociations are over time (IDS). These metrics are intended
for segmenting multiple targets, but we repurpose them to
segment multiple parts. Comparisons use an IoU threshold of 0.3.
We compare against two baselines: the Bayesian non-
parametric model of [39] (discussed in Section 2), which
we call the nonparametric extents model npe, and a para-
metric modification of [39] that is given the advantage of knowing the true number of parts, which we call the parametric extents model pe. Neither npe nor pe considers part persistence over time (as we do), so for these methods
we use the Hungarian algorithm to compute part correspondences between pairs of timesteps based on the distance (in the body frame) between component means.
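For the baselines, this correspondence step reduces to a bipartite assignment on pairwise mean distances; a minimal sketch with scipy's Hungarian solver (illustrative names, not the baselines' code):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_parts(means_a, means_b):
    """Match parts between two timesteps by body-frame mean distance.

    means_a, means_b: (K, D) arrays of component means; returns index pairs."""
    cost = np.linalg.norm(means_a[:, None, :] - means_b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return list(zip(rows.tolist(), cols.tolist()))

# a permuted copy of the means should be matched back to itself
a = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
pairs = match_parts(a, a[[2, 0, 1]])
```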
Taken together, our model and the two baselines constitute an ablation study in which we consider an unknown number of parts with Lie group dynamics, and unknown / known numbers of parts without Lie group dynamics. In all cases, we compute the mean and standard deviation of MOTS statistics on 100 samples taken from a Markov chain of 1000 samples, use data-dependent priors (specified in the supplemental), and set concentration parameter α = 0.1. Figure 6 (left) shows quantitative results while Figure 9 shows qualitative comparisons between our method and the baselines.
Dataset    Method  IDS            MOTSA           MOTSP         sMOTSA
hand       ours     0.00 ± 0.00    2.79 ± 0.30    0.71 ± 0.01    1.34 ± 0.24
           npe      4.45 ± 1.84    1.93 ± 0.80    0.51 ± 0.01   −4.20 ± 0.78
           pe       4.03 ± 2.11    1.57 ± 0.44    0.47 ± 0.01   −0.33 ± 0.37
spider     ours     5.14 ± 1.49    3.44 ± 0.25    0.55 ± 0.02    1.26 ± 0.18
           npe     19.60 ± 2.88   −4.40 ± 0.92    0.51 ± 0.01   −6.72 ± 0.90
           pe      17.28 ± 3.06    1.73 ± 0.31    0.52 ± 0.01   −0.24 ± 0.27
marmoset   ours     1.24 ± 0.65    1.39 ± 0.89    0.49 ± 0.02   −0.47 ± 0.71
           npe      3.18 ± 1.28  −32.44 ± 2.78    0.35 ± 0.01  −34.06 ± 2.72
           pe       0.43 ± 0.51    3.86 ± 0.17    0.48 ± 0.00    1.77 ± 0.17
average    ours     2.12 ± 0.71    2.54 ± 0.48    0.58 ± 0.02    0.71 ± 0.38
           npe      9.07 ± 2.00  −11.63 ± 1.50    0.46 ± 0.01  −15.00 ± 1.47
           pe       7.25 ± 1.89    2.39 ± 0.31    0.49 ± 0.01    0.39 ± 0.27

[Figure 6, right panel: plot of log cumulative energy (0–7) versus time (0–40).]
Figure 6. (Left): Quantitative comparison of our nonparametric parts model (ours) with nonparametric baseline npe and parametric
baseline pe using MOTS metrics. Lower IDS is better, higher MOTSA, MOTSP, and sMOTSA is better. Best-performing method is
emboldened. (Right): Object segmentation based on part motion across time. Whereas the parts nearest the body center exhibit little
motion (in the body frame), the extremities of spider exhibit large amounts of motion. (Top-Left): Part associations. (Top-Right): Part
segmentation based on motion energy. (Bottom): Log cumulative part motion energy across time (color-coordinated to associations).
Figure 7. Dynamic mesh segmentation. By using points sampled
inside a mesh as the input to our nonparametric parts model, then
computing associations to mesh vertices, our model can learn parts
and dynamics from mesh data. Additional views are shown in Figure 1.
Figure 8. Part posteriors for hand and spider. Dotted ellipses are
the mean part covariance, solid ellipses visualize the part posterior
location covariance. Points are observed part locations used for the
posterior updates. The leg locations of spider are smeared due to
their articulation whereas the fingers of the hand are concentrated.
Our model outperforms the nonparametric baseline in all
datasets and metrics. The pe baseline (which benefits from
knowing the number of parts) outperforms our method on
label switches (IDS) and overall quality (sMOTSA) on the
3D marmoset data. This is largely due to noisy data from
the depth sensor generating observations from the back-
ground that are distant from the object, but not so distant
as to be relegated to the base measure. We see very little ID
switching (IDS) and relatively high precision (MOTSP) in
our model, which we attribute to the canonical parts ωk en-
forcing that each part transformation θtk move stably. Visu-
ally, part assignments correspond best to ground-truth parts
that are extremities (fingers, legs, tails), but tend to over-
segment large object interiors (palms, bodies). We attribute
this to the ellipsoidal observation model but find that, for the
purposes of part analysis, it has no obvious negative impact.
5.2. Dynamic Mesh Segmentation
We apply our method to the squat1 sequence in the ar-
ticulated mesh dataset of [33], decomposing the mesh se-
quence into parts as shown in Figure 7. Note that legs are
segmented into two parts each, while arms are segmented
into one part. This is consistent with the movement in
this sequence where the legs bend, but the arms are held
straight. Minor artifacts appear in the lower-left leg (red), which has a small number of associations above the knee when the person is squatting, but not when standing upright. Qualitatively, the results conform to human part interpretation.
5.3. Motion Analysis
We show how our model facilitates novel object / part
analysis. Beginning with Figure 8, we visualize part dia-
grams for hand and spider. Dotted ellipses show the ob-
servation noise model Ek for each part (in the object frame),
while solid ellipses show the covariance for that part’s trans-
lation across time. Because the part translation covariances
are spatially separated, the model resists label switching be-
tween parts because they tend to stay proximate to their
canonical frame. We observe that the part translation co-
variances are tight for the hand, but horizontally smeared
for the spider–this is expected, because the fingers moved
Figure 9. Example part associations in hand, spider and marmoset. For each sequence, example frames from the original video
are shown (top-row) with part associations and object/part coordinate frames overlaid from our method (middle-row) and baseline NPE
associations (bottom-row). Parts estimated by our method are largely consistent over time, even for the highly-articulated spider legs.
very little in hand compared to the legs in spider.
One analysis that our model enables is the comparison of
part motions in the body frame (i.e. motion not from the ob-
ject moving, but from its parts). By integrating each part’s
motion over time within the body frame we can determine
which areas of an object experience high or low relative motion. Figure 6 (right) shows that, for spider, the legs can be segmented from the other parts.
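One plausible reading of this computation, sketched under our own assumptions about data layout (d_theta is a hypothetical (T, K, D) array of body-frame part translations, not the paper's exact quantity):

```python
import numpy as np

def cumulative_motion_energy(d_theta):
    """Log cumulative motion energy per part (cf. Fig. 6, right).

    d_theta: (T, K, D) body-frame part translations over time. Energy is the
    running sum of squared frame-to-frame displacement for each part."""
    steps = np.diff(d_theta, axis=0)                   # (T-1, K, D) displacements
    energy = np.cumsum((steps ** 2).sum(-1), axis=0)   # (T-1, K) running energy
    return np.log1p(energy)
```

Parts with flat curves (near-zero energy) belong to the rigid interior; rapidly growing curves mark articulating extremities such as the spider's legs.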
6. Discussion
In this work we demonstrated that our nonparametric
representation of kinematic bodies infers meaningful part
decompositions of objects in an unsupervised way, by sim-
ply observing them in motion. Furthermore, our Lie group
representation constrains articulations of moving parts to
physically plausible kinematic states, without the require-
ment of object-specific knowledge such as skeletal struc-
tures. Part decompositions are learnable on very short se-
quences, and generalize to other datasets and instances of
the same object type. In contrast to methods which rely on
extensive training data and/or object-specific 2D/3D mod-
els, we were able to demonstrate robust analysis by direct
observation of single instances of an object.
Our model simplifies inference and motion analysis
while suggesting straightforward extensions. For exam-
ple, part persistence ensures that the representation of parts
persists over a video sequence, even if parts become oc-
cluded. A hierarchical model over multiple videos of simi-
lar objects would thus be robust to occlusions in any single
video. Furthermore, Gaussian tangent-space conditionals
allow closed-form Gibbs updates for translation and efficient slice sampling of rotation, and prove sufficient for motion
analysis. Explicit models of part shape may avoid over-segmenting large regions and are the focus of current work.
Acknowledgements This work was partially supported
by ONR N00014-17-1-2072 and NIH 5R01MH111916.
References
[1] Charles E Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, pages 1152–1174, 1974. 2
[2] Matthew J Beal, Zoubin Ghahramani, and Carl E Rasmussen. The infinite hidden Markov model. In Advances in Neural Information Processing Systems, pages 577–584, 2002. 2
[3] Federica Bogo, Javier Romero, Gerard Pons-Moll, and Michael J Black. Dynamic FAUST: Registering human bodies in motion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6233–6242, 2017. 2
[4] Guillaume Bourmaud, Remi Megret, Marc Arnaudon, and Audrey Giremus. Continuous-discrete extended Kalman filter on matrix Lie groups using concentrated Gaussian distributions. Journal of Mathematical Imaging and Vision, 51(1):209–228, 2015. 2
[5] C. Bregler and J. Malik. Tracking people with twists and
exponential maps. Proceedings. 1998 IEEE Computer Soci-
ety Conference on Computer Vision and Pattern Recognition
(Cat. No.98CB36231), pages 8–15, 1998. 2
[6] Martin Brossard, Silvere Bonnabel, and Jean-Philippe Condomines. Unscented Kalman filtering on Lie groups. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2485–2491. IEEE, 2017. 2
[7] David B. Dunson. Bayesian dynamic modeling of latent trait
distributions. Biostatistics, 7(4):551–568, 2006. 2
[8] Ethan Eade. Lie groups for computer vision. Cambridge Univ., Cambridge, UK, Tech. Rep., 2014. 2
[9] Pedro F Felzenszwalb, Ross B Girshick, David McAllester,
and Deva Ramanan. Object detection with discriminatively
trained part-based models. IEEE transactions on pattern
analysis and machine intelligence, 32(9):1627–1645, 2010.
2
[10] Martin A Fischler and Robert A Elschlager. The representation and matching of pictorial structures. IEEE Transactions on Computers, (1):67–92, 1973. 2
[11] Emily Fox, Erik B Sudderth, Michael I Jordan, and Alan S
Willsky. Bayesian nonparametric inference of switching dy-
namic linear models. IEEE Transactions on Signal Process-
ing, 59(4):1569–1585, 2011. 2
[12] Emily B Fox, Erik B Sudderth, Michael I Jordan, Alan S Willsky, et al. A sticky HDP-HMM with application to speaker diarization. The Annals of Applied Statistics, 5(2A):1020–1056, 2011. 2
[13] Katerina Fragkiadaki, Sergey Levine, Panna Felsen, and Ji-
tendra Malik. Recurrent network models for human dynam-
ics. In Proceedings of the IEEE International Conference on
Computer Vision, pages 4346–4354, 2015. 2
[14] Oren Freifeld, Alexander Weiss, Silvia Zuffi, and Michael J
Black. Contour people: A parameterized model of 2d artic-
ulated human shape. In 2010 IEEE Computer Society Con-
ference on Computer Vision and Pattern Recognition, pages
639–646. IEEE, 2010. 2
[15] Jurgen V Gael, Yee W Teh, and Zoubin Ghahramani. The in-
finite factorial hidden markov model. In Advances in Neural
Information Processing Systems, pages 1697–1704, 2009. 2
[16] Brian C Hall. Lie groups, Lie algebras, and representations. In Quantum Theory for Mathematicians, pages 333–366. Springer, 2013. 2
[17] Søren Hauberg, Francois Lauze, and Kim Steenstrup Pedersen. Unscented Kalman filtering on Riemannian manifolds. Journal of Mathematical Imaging and Vision, 46(1):103–120, 2013. 2
[18] Shanon X Ju, Michael J Black, and Yaser Yacoob. Card-
board people: A parameterized model of articulated image
motion. In Proceedings of the Second International Con-
ference on Automatic Face and Gesture Recognition, pages
38–44. IEEE, 1996. 2
[19] Giuseppe Loianno, Michael Watterson, and Vijay Kumar.
Visual inertial odometry for quadrotors on SE(3). Proceed-
ings - IEEE International Conference on Robotics and Au-
tomation, 2016-June(3):1544–1551, 2016. 2
[20] Dominik Lorenz, Leonard Bereska, Timo Milbich, and Bjorn
Ommer. Unsupervised part-based disentangling of object
shape and appearance. In Proceedings of the IEEE Con-
ference on Computer Vision and Pattern Recognition, pages
10955–10964, 2019. 2, 6
[21] Naureen Mahmood, Nima Ghorbani, Nikolaus F Troje, Ger-
ard Pons-Moll, and Michael J Black. Amass: Archive of mo-
tion capture as surface shapes. In Proceedings of the IEEE
International Conference on Computer Vision, pages 5442–
5451, 2019. 2
[22] Radford M Neal et al. Slice sampling. The Annals of Statistics, 31(3):705–767, 2003. 6
[23] Frank C. Park. Distance Metrics on the Rigid-Body Mo-
tions with Applications to Mechanism Design. Journal of
Mechanical Design, 117(1):48–54, 1995. 2
[24] Gerard Pons-Moll, Andreas Baak, Thomas Helten,
Meinard Muller, Hans-Peter Seidel, and Bodo Rosen-
hahn. Multisensor-fusion for 3d full-body human motion
capture. In 2010 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, pages 663–670.
IEEE, 2010. 2
[25] Lu Ren, David B. Dunson, and Lawrence Carin. The dy-
namic hierarchical Dirichlet process. Proceedings of the 25th
international conference on Machine learning - ICML ’08,
pages 824–831, 2008. 2
[26] David A Ross, Daniel Tarlow, and Richard S Zemel. Un-
supervised learning of skeletons from motion. In European
Conference on Computer Vision, pages 560–573. Springer,
2008. 2
[27] J Shotton, A Fitzgibbon, M Cook, T Sharp, M Finocchio,
R Moore, A Kipman, and A Blake. Real-time human pose
recognition in parts from single depth images. In Proceed-
ings of the 2011 IEEE Conference on Computer Vision and
Pattern Recognition, pages 1297–1304. IEEE Computer So-
ciety, 2011. 2
[28] Julian Straub, Jason Chang, Oren Freifeld, and John Fisher III. A Dirichlet process mixture model for spherical data. In Artificial Intelligence and Statistics, pages 930–938, 2015. 2
[29] Julian Straub, Oren Freifeld, Guy Rosman, John J Leonard, and John W Fisher III. The Manhattan frame model: Manhattan world inference in the space of surface normals. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017. 3
[30] Erik B Sudderth, Antonio Torralba, William T Freeman, and Alan S Willsky. Describing visual scenes using transformed Dirichlet processes. In Advances in Neural Information Processing Systems, pages 1297–1304, 2006. 2
[31] Yee Whye Teh, Michael I. Jordan, Matthew J. Beal, and
David M. Blei. Hierarchical Dirichlet processes. Journal of
the American Statistical Association, 101(476):1566–1581,
2006. 2
[32] Isabel Valera, Francisco Ruiz, Lennart Svensson, and Fer-
nando Perez-Cruz. Infinite factorial dynamical model. In
Advances in Neural Information Processing Systems, pages
1666–1674, 2015. 2
[33] Daniel Vlasic, Ilya Baran, Wojciech Matusik, and Jovan
Popovic. Articulated mesh animation from multi-view sil-
houettes. In ACM Transactions on Graphics (TOG), vol-
ume 27, page 97. ACM, 2008. 7
[34] Paul Voigtlaender, Michael Krause, Aljosa Osep, Jonathon Luiten, Berin Balachandar Gnana Sekar, Andreas Geiger, and Bastian Leibe. MOTS: Multi-object tracking and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7942–7951, 2019. 6
[35] Yunfeng Wang and Gregory S Chirikjian. Error propaga-
tion on the euclidean group with applications to manipulator
kinematics. IEEE Transactions on Robotics, 22(4):591–602,
2006. 1, 2
[36] Zhenjia Xu, Zhijian Liu, Chen Sun, Kevin Murphy, William T Freeman, Joshua B Tenenbaum, and Jiajun Wu. Unsupervised discovery of parts, structure, and dynamics. International Conference on Learning Representations, 2019. 2
[37] Milos Zefran, Vijay Kumar, and Christopher Croke. Choice
of Riemannian Metrics for Rigid Body Kinematics. ASME
Design Engineering Technical Conference and Computers in
Engineering Conference, (3):1–11, 1996. 2
[38] Jianwen Zhang, Yangqiu Song, Changshui Zhang, and Shixia Liu. Evolutionary hierarchical Dirichlet processes for multiple correlated time-varying corpora. Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '10, page 1079, 2010. 2
[39] Yang Zhou, Jitendra Sharma, Qiong Ke, Rogier Landman, Jingli Yuan, Hong Chen, David S Hayden, John W Fisher, Minqing Jiang, William Menegas, et al. Atypical behaviour and connectivity in Shank3-mutant macaques. Nature, page 1, 2019. 2, 6
[40] Silvia Zuffi, Angjoo Kanazawa, and Michael J Black. Li-
ons and tigers and bears: Capturing non-rigid, 3d, articulated
shape from images. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pages 3955–
3963, 2018. 2
[41] Silvia Zuffi, Angjoo Kanazawa, David W Jacobs, and
Michael J Black. 3d menagerie: Modeling the 3d shape and
pose of animals. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, pages 6365–
6373, 2017. 2