Contribution of this talk
Fast GPU implementations of registration algorithms for 3D point sets:
• Softassign [Gold et al., 1998]
• EM-ICP [Granger et al., 2002]
• (Weighted) Horn’s method [Horn, 1987]
So, what is “registration”?
3D registration algorithm
Input: two point sets, $X$ and $Y$.
Output: rotation matrix $R$ and translation vector $\boldsymbol{t}$.
Algorithms for registration
Horn’s method
• Corresponding point sets are given.
• Estimate R and t.
ICP (Iterative closest point)
• Unknown correspondence.
• Fast, standard.
• Easily fails due to local minima.
• A lot of variants follow.
Softassign
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
EM-ICP
• Unknown correspondence.
• Robust.
• Very slow because of iterations.
Horn’s method: correspondence is known.
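Unknown correspondence: which point of $Y$ matches each point of $X$?
Known correspondence: $\boldsymbol{x}_i$ corresponds to $\boldsymbol{y}_i$, where
$X = \begin{pmatrix} \boldsymbol{x}_1^T \\ \boldsymbol{x}_2^T \\ \vdots \end{pmatrix}$, $Y = \begin{pmatrix} \boldsymbol{y}_1^T \\ \boldsymbol{y}_2^T \\ \vdots \end{pmatrix}$, $\boldsymbol{x}_1 = (x_{1x}, x_{1y}, x_{1z})^T$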
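1. Compute centers $\bar{\boldsymbol{x}}$ and $\bar{\boldsymbol{y}}$.
2. Centering: $\hat{X} = X - \bar{\boldsymbol{x}}$, $\hat{Y} = Y - \bar{\boldsymbol{y}}$ (subtract the center from every row).
3. $S = \hat{X}^T \hat{Y}$
4. Build $K$ from $S$.
5. Compute the 1st eigenvector of $K$: quaternion $\boldsymbol{q}$. Convert $\boldsymbol{q}$ to $R$.
Then $\boldsymbol{t} = \bar{\boldsymbol{x}} - R\bar{\boldsymbol{y}}$.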
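The five steps can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the talk's GPU code: the function name `horn`, the optional per-pair weights `w`, and the shifted power iteration used in place of a library eigensolver are all choices made here. The layout of the 4x4 matrix follows the quaternion formulation in [Horn, 1987].

```python
import math

def horn(X, Y, w=None, iters=1000):
    """Estimate (R, t) with x_i ~ R y_i + t from corresponding 3D points.

    X, Y: lists of [x, y, z]. w: optional per-pair weights (hypothetical
    argument; w=None gives the unweighted method).
    """
    n = len(X)
    w = [1.0] * n if w is None else w
    sw = sum(w)
    # Steps 1-2: (weighted) centers; centering is applied on the fly below.
    xc = [sum(w[i] * X[i][a] for i in range(n)) / sw for a in range(3)]
    yc = [sum(w[i] * Y[i][a] for i in range(n)) / sw for a in range(3)]
    # Step 3: S[a][b] = sum_i w_i * xhat_i[a] * yhat_i[b]  (3x3).
    S = [[sum(w[i] * (X[i][a] - xc[a]) * (Y[i][b] - yc[b]) for i in range(n))
          for b in range(3)] for a in range(3)]
    # Horn's sums pair a source (Y) coordinate with a target (X) coordinate,
    # which is the transpose of S.
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = \
        [[S[b][a] for b in range(3)] for a in range(3)]
    # Step 4: symmetric 4x4 matrix K.
    K = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    # Step 5: eigenvector of the largest eigenvalue by power iteration.
    # The shift makes K + shift*I positive semi-definite so that the
    # iteration converges to the largest (not the largest-magnitude) one.
    shift = sum(abs(v) for row in K for v in row) or 1.0
    q = [1.0, 1.0, 1.0, 1.0]
    for _ in range(iters):
        q = [sum(K[r][c] * q[c] for c in range(4)) + shift * q[r]
             for r in range(4)]
        norm = math.sqrt(sum(v * v for v in q))
        q = [v / norm for v in q]
    q0, qx, qy, qz = q
    # Convert the unit quaternion to a rotation matrix.
    R = [[q0*q0 + qx*qx - qy*qy - qz*qz, 2*(qx*qy - q0*qz), 2*(qx*qz + q0*qy)],
         [2*(qy*qx + q0*qz), q0*q0 - qx*qx + qy*qy - qz*qz, 2*(qy*qz - q0*qx)],
         [2*(qz*qx - q0*qy), 2*(qz*qy + q0*qx), q0*q0 - qx*qx - qy*qy + qz*qz]]
    t = [xc[a] - sum(R[a][b] * yc[b] for b in range(3)) for a in range(3)]
    return R, t
```

Power iteration keeps the sketch dependency-free; for a 4x4 symmetric matrix any standard eigensolver would do the same job.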
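ICP: correspondence is unknown.
1. For each $\boldsymbol{x}_i$ in $X$, find the closest (nearest) point $\boldsymbol{y}_j$ to $\boldsymbol{x}_i$ in $RY + \boldsymbol{t}$.
2. Put the point $\boldsymbol{y}_j$ into row $i$ of $Y^*$.
3. Horn’s method with $X$ and $Y^*$: estimate $R$ and $\boldsymbol{t}$.
4. Repeat.
Fast, but easy to fail due to hard correspondence.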
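The hard-correspondence step of ICP can be sketched as a brute-force nearest-neighbor search. The function names `transform` and `closest_points` are hypothetical, and real implementations usually replace the inner search with a spatial index such as a k-d tree.

```python
def transform(R, t, p):
    """Apply p -> R p + t for a 3x3 nested list R and 3-vectors t, p."""
    return [sum(R[a][b] * p[b] for b in range(3)) + t[a] for a in range(3)]

def closest_points(X, Y, R, t):
    """Build Y*: row i holds the point y_j whose transformed position
    R y_j + t is nearest to x_i (exactly one y_j per x_i)."""
    Yt = [transform(R, t, y) for y in Y]
    Y_star = []
    for x in X:
        # Index of the transformed point with the smallest squared distance.
        j = min(range(len(Y)),
                key=lambda j: sum((x[a] - Yt[j][a]) ** 2 for a in range(3)))
        Y_star.append(Y[j])
    return Y_star
```

Each $\boldsymbol{x}_i$ commits to a single $\boldsymbol{y}_j$, so one wrong match early on can pull the estimate into a local minimum; the soft methods replace this choice with weights.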
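Softassign: soft correspondence.
• $m_{ij}$: weight between $\boldsymbol{x}_i$ in $X$ and $\boldsymbol{y}_j$ in $Y$, computed from the distance $\|\boldsymbol{x}_i - (R\boldsymbol{y}_j + \boldsymbol{t})\|$.
• Each row and column of $M = (m_{ij})$ should be normalized to 1 by Sinkhorn iterations.
• Weighted Horn’s method with $X$ and $Y$, weighted by $M$: estimate $R$ and $\boldsymbol{t}$.
• Repeat.
GPU!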
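A minimal sketch of how the matrix $M$ might be filled in. The slide only shows that $m_{ij}$ depends on the distance between $\boldsymbol{x}_i$ and the transformed $\boldsymbol{y}_j$; the Gaussian-like form `exp(-beta * d**2)`, the parameter `beta`, and the function name `soft_weights` are assumptions made here in the spirit of [Gold et al., 1998].

```python
import math

def soft_weights(X, Y, R, t, beta):
    """Return M with M[i][j] = exp(-beta * ||x_i - (R y_j + t)||^2).

    The exponential form and beta are assumptions, not taken from the slide.
    """
    M = []
    for x in X:
        row = []
        for y in Y:
            ry = [sum(R[a][b] * y[b] for b in range(3)) + t[a]
                  for a in range(3)]
            d2 = sum((x[a] - ry[a]) ** 2 for a in range(3))
            row.append(math.exp(-beta * d2))
        M.append(row)
    return M
```

Every entry is independent of the others, which is why this step maps naturally onto one GPU thread per $(i, j)$ pair.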
Sinkhorn iterations
Each row and column of $M$ should be normalized to 1 by Sinkhorn iterations.
• Row normalization: the entries $m_{ij}$ of each row sum up to 1.
• Column normalization: the entries $m_{ij}$ of each column sum up to 1.
• Repeat row and column normalization until convergence.
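The alternating normalization can be sketched as below, assuming a square matrix with strictly positive entries. The function name `sinkhorn`, the iteration cap, and the row-sum stopping test are choices made for this sketch.

```python
def sinkhorn(M, max_iters=100, tol=1e-9):
    """Alternate row and column normalization of a positive square matrix."""
    M = [row[:] for row in M]  # work on a copy
    for _ in range(max_iters):
        # Row normalization: divide each row by its sum.
        for row in M:
            s = sum(row)
            for j in range(len(row)):
                row[j] /= s
        # Column normalization: divide each column by its sum.
        col_sums = [sum(row[j] for row in M) for j in range(len(M[0]))]
        for row in M:
            for j in range(len(row)):
                row[j] /= col_sums[j]
        # Columns now sum to 1; stop once the rows do as well.
        if all(abs(sum(row) - 1.0) < tol for row in M):
            break
    return M
```

Checking row sums needs only one vector per iteration, whereas comparing the whole matrix against its previous value needs a second copy of it.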
Sinkhorn.GPU (row normalization)
1. Row sums: $\boldsymbol{R}_M = M\boldsymbol{1}$, using sgemv of CUBLAS.
2. Row-wise division: divide each row of $M$ by the corresponding entry of $\boldsymbol{R}_M$, using a CUDA kernel.
Column normalization is done in the same way.
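Written out, the two GPU steps amount to the following. The column version is spelled out here as an assumption about what "the same way" means; the symbol $\boldsymbol{C}_M$ is introduced only for this illustration.

```latex
% Row normalization: one matrix-vector product, then an element-wise division
\boldsymbol{R}_M = M\boldsymbol{1}, \qquad
m_{ij} \leftarrow \frac{m_{ij}}{(\boldsymbol{R}_M)_i}

% Column normalization (assumed): the same with the transpose of M
\boldsymbol{C}_M = M^T\boldsymbol{1}, \qquad
m_{ij} \leftarrow \frac{m_{ij}}{(\boldsymbol{C}_M)_j}
```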
Weighted Horn’s method
• Normal version: $S = \hat{X}^T \hat{Y}$ (3 × 3)
• Weighted version: $S = \hat{X}^T M \hat{Y}$ (3 × 3)
Using CUBLAS sgemv twice.
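Expanding both products shows what the weighting does, assuming that the rows of $\hat{X}$ and $\hat{Y}$ are the centered points $\hat{\boldsymbol{x}}_i^T$ and $\hat{\boldsymbol{y}}_j^T$:

```latex
% Normal version: only matching indices contribute
S = \hat{X}^T \hat{Y} = \sum_i \hat{\boldsymbol{x}}_i \hat{\boldsymbol{y}}_i^T

% Weighted version: every pair (i, j) contributes with weight m_{ij}
S = \hat{X}^T M \hat{Y} = \sum_i \sum_j m_{ij}\, \hat{\boldsymbol{x}}_i \hat{\boldsymbol{y}}_j^T
```

Evaluating the weighted version as two successive products, first $M\hat{Y}$ and then $\hat{X}^T (M\hat{Y})$, avoids forming the double sum explicitly.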
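Centering.GPU (weighted version)
• Weighted sum: $X * \boldsymbol{R}_M$ with a CUDA kernel, then summed with CUBLAS sasum.
• Sum of weights: $\boldsymbol{R}_M$ summed with CUBLAS sasum.
• Weighted center $\bar{\boldsymbol{x}}$: the weighted sum divided by the sum of weights.
• Same as for $Y$.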
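A CPU-side sketch of the weighted center. It assumes the weights for $X$ are the row sums of $M$ and, as a guess at what "same as for" $Y$ means, that the weights for $Y$ are the column sums. The function names are hypothetical.

```python
def row_sums(M):
    """Weights for X: R_M = M 1."""
    return [sum(row) for row in M]

def col_sums(M):
    """Weights assumed for Y: M^T 1."""
    return [sum(row[j] for row in M) for j in range(len(M[0]))]

def weighted_center(P, w):
    """Weighted sum of the points in P divided by the sum of weights."""
    total = sum(w)
    return [sum(w[i] * P[i][a] for i in range(len(P))) / total
            for a in range(3)]
```

With these helpers, `weighted_center(X, row_sums(M))` plays the role of $\bar{\boldsymbol{x}}$ and `weighted_center(Y, col_sums(M))` that of $\bar{\boldsymbol{y}}$.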
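Pipeline of Softassign.GPU
1. Transfer $X$ and $Y$ from the CPU to the GPU.
2. Compute $M$ with a CUDA kernel.
3. Sinkhorn.GPU
4. Centering.GPU: $\bar{\boldsymbol{x}}$, $\bar{\boldsymbol{y}}$
5. Weighted Horn’s method: $S = \hat{X}^T M \hat{Y}$
6. Compute $K$ from $S$ and solve the eigenvalue problem: $R$ and $\boldsymbol{t}$.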
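EM-ICP: soft correspondence.
• $d_{ij} = \|\boldsymbol{x}_j - (R\boldsymbol{y}_i + \boldsymbol{t})\|$ between $\boldsymbol{y}_i$ in $Y$ and $\boldsymbol{x}_j$ in $X$.
• $A$: matrix computed from $d_{ij}$. Each row is normalized once.
• Pseudo correspondence: $X'$ with rows $\boldsymbol{x}'_i$, one for each $\boldsymbol{y}_i$.
• Weighted Horn’s method with $X'$ and $Y$: estimate $R$ and $\boldsymbol{t}$.
• Repeat.
GPU!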
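One way to read the pseudo correspondence is as a weighted mean of the points of $X$ for each $\boldsymbol{y}_i$. The sketch below assumes Gaussian weights `exp(-d**2 / sigma**2)` in the spirit of [Granger et al., 2002]; the exact weight function, the parameter `sigma`, and the function name are assumptions, and the sketch does not guard against all weights underflowing to zero.

```python
import math

def pseudo_correspondence(X, Y, R, t, sigma):
    """Return X': row i is the weighted mean of X seen from R y_i + t."""
    X_prime = []
    for y in Y:
        ry = [sum(R[a][b] * y[b] for b in range(3)) + t[a] for a in range(3)]
        # One row of A: a weight for every x_j (assumed Gaussian form).
        a_row = [math.exp(-sum((x[k] - ry[k]) ** 2 for k in range(3))
                          / sigma ** 2) for x in X]
        # The row is normalized once, so that its entries sum to 1.
        s = sum(a_row)
        a_row = [v / s for v in a_row]
        X_prime.append([sum(a_row[j] * X[j][k] for j in range(len(X)))
                        for k in range(3)])
    return X_prime
```

Unlike Softassign, no column normalization and no inner Sinkhorn loop are needed here, only a single pass over each row.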
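Weighted Horn’s method
• Weighted version: $S = \hat{X}'^T \operatorname{diag}(\lambda_1, \lambda_2, \ldots)\, \hat{Y}$ (3 × 3) (not efficient)
• Weighted version (2 steps):
  1. $\boldsymbol{\lambda} * \hat{X}'$: scale row $i$ of $\hat{X}'$ by $\lambda_i$, using a CUDA kernel.
  2. $S = (\boldsymbol{\lambda} * \hat{X}')^T \hat{Y}$ (3 × 3), using CUBLAS sgemm.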
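The two versions give the same $S$; the difference is only in the work done. A small sketch with hypothetical function names, in which `Xh` and `Yh` stand for the centered point lists and `lam` for the weights:

```python
def s_with_diag(Xh, Yh, lam):
    """S = Xh^T diag(lam) Yh with an explicit n x n diagonal matrix."""
    n = len(lam)
    D = [[lam[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    # n x n times n x 3: almost every multiplication is by zero.
    DY = [[sum(D[i][k] * Yh[k][b] for k in range(n)) for b in range(3)]
          for i in range(n)]
    return [[sum(Xh[i][a] * DY[i][b] for i in range(n)) for b in range(3)]
            for a in range(3)]

def s_two_steps(Xh, Yh, lam):
    """Step 1: scale row i of Xh by lam[i]. Step 2: one 3 x n by n x 3 product."""
    n = len(lam)
    scaled = [[lam[i] * Xh[i][a] for a in range(3)] for i in range(n)]
    return [[sum(scaled[i][a] * Yh[i][b] for i in range(n)) for b in range(3)]
            for a in range(3)]
```

The explicit diagonal matrix costs memory and arithmetic quadratic in the number of points, while the row scaling is linear.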
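Pipeline of EM-ICP.GPU
1. Transfer $X$ and $Y$ from the CPU to the GPU.
2. Compute $A$ with a CUDA kernel.
3. Row normalization on GPU.
4. Centering.GPU: $\bar{\boldsymbol{x}}$, $\bar{\boldsymbol{y}}$
5. 2-step weighted Horn’s method: $\boldsymbol{\lambda} * \hat{X}'$, then $S = (\boldsymbol{\lambda} * \hat{X}')^T \hat{Y}$
6. Compute $K$ from $S$ and solve the eigenvalue problem: $R$ and $\boldsymbol{t}$.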
Computing time over different number of points
• Successfully aligned 5000 points in less than 7 seconds.
• Slightly faster, but failed.
GPU: GeForce8800GT. CPU: Intel Core2 Quad + OpenMP (4 cores).
Summary
The 3D registration algorithms implemented on a GPU are Softassign, EM-ICP, and weighted Horn’s method.
EM-ICP.GPU is able to align 5000 points within 7 seconds; it is 60 times faster than EM-ICP.CPU and more robust than ICP.CPU.
Code, binary, and movies are available at: http://home.hiroshima-u.ac.jp/tamaki/study/cuda_softassign_emicp/
Limitations
• Number of points: should be less than 8000 for GeForce8800GT with 512MB memory. More memory, more points.
• Stopping condition: requires storing the whole matrix $M$ or $A$ and comparing it with the previous one, which is inefficient. Hence, currently, the number of iterations is fixed.
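A rough estimate suggests where the limit of about 8000 points comes from. It assumes that both point sets have the same size, that values are single precision (the `s` prefix of sgemv, sgemm, and sasum), and that the full correspondence matrix dominates memory use. The function name is hypothetical.

```python
def matrix_megabytes(n_points, bytes_per_value=4):
    """Size in MB (10**6 bytes) of an n_points x n_points float matrix."""
    return n_points * n_points * bytes_per_value / 1e6

print(matrix_megabytes(8000))  # 256.0
```

One matrix for 8000 points already takes about half of the 512MB, so keeping a second copy for a stopping test would leave almost no room for anything else.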