On the Preprocessing and Postprocessing of HRTF Individualization Based on Sparse Representation of...

Post on 26-Dec-2015

214 views 1 download

Tags:

Transcript of On the Preprocessing and Postprocessing of HRTF Individualization Based on Sparse Representation of...

On the Preprocessing and Postprocessing of HRTF Individualization Based on

Sparse Representation of Anthropometric Features

Jianjun HE, Woon-Seng Gan, and Ee-Leng Tan

24th

April 2015

Jhe007@e.ntu.edu.sg

Digital Signal Processing Lab,

School of Electrical and Electronic Engineering,

Nanyang Technological University, Singapore

WHY

2

1. Head-related transfer functions (HRTFs) are highly individualized;

2. HRTFs are closely related to Anthropometry (torso, head, pinna);

3. Anthropometry can be used for HRTF individualization.

IndividualizationAnthropometry

of a new person

HRTF of the

new person

Anthropometry database

HRTF database

In this paper, we aim to answer:

1. Whether the preprocessing and postprocessing methods affect the performance of HRTF individualization?

2. If so, what is the best preprocessing and postprocessing combination?

3. And, how good is it?

CIPIC Anthropometric data (35 subjects)

3V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, “The CIPIC HRTF database,” in Proc. IEEE WASPAA, New Paltz, NY, Oct. 2001.

Methodology

4

IndividualizationA1 H1?

A

H

Anthropometry database A : S subjects * 1 set of Anthropometry (F features)

Anthropometry of a new person A1: 1 subjects * 1 set of Anthropometry

HRTF database H : S subjects * 1 set of HRTF (D directions * K points)

HRTF of a new person H1 : 1 subjects * 1 set of HRTF

P. Bilinski, J. Ahrens, M. R. P. Thomas, I. Tashev, and J. C. Plata, “HRTF magnitude synthesis via sparse representation of anthropometric features,” in Proc. IEEE ICASSP, Florence, Italy,

pp. 4501-4505, May 2014.

Methodology

5

Preprocessing of Anthropometry i

1. Direct

2. Min-max normalization

3. Standard score

4. Standard deviation normalization

0 0

0

0

, 1

-min , 2

max min

1,-mean , 3

std

, 4std

i

f i

f fi

f f

f ff fi

f

fi

f

A

A A

A A

A A A

A

A

A

0 1

2,..., .

where

F

A A A

Preprocessing of HRTF m

1. Magnitude

2. Log magnitude

3. Power

10

2

, , 11,2,..., ;

, 20 log , , 2 1, 2,...,

, , 3

m

d k md D

d k d k mk K

d k m

H

H H

H

Sparse representation j

1. Direct

2. Nonnegative

,

,, ,

,

1

, 1

. , 2

i j

i ji j l

Si j

s

l

lw s

A

AH

A

w

ww

, ,

, ,

,, , , 20

1

, ,

, , 1

ˆ , 10 , 2 .

, , 3

i j l m

i j l m

d ki j l m

i j l m

d k m

d k m

d k m

H

H

w H

H

w H

H

w H

2,11

2 1

2,21

2 1

arg min ,

arg min , s.t. 0.

i

i

i i i i i

i i i i i i

A

A

A A Aw

A A A Aw

w A w A w

w A w A w w

1

ii i AwA A Postprocessing of Anthropometry l

1. Direct

2. Normalized

In total, we have variants of methods! 484× 3× 2× 2 =

PreprocessingH

PreprocessingA

HRTF of the new person

HRTF databasePreprocessing

A

SynthesisSparse

representation

Anthropometry database

Anthropometry of a new person wA

(i,j)

Postprocessing A

A(i)

A1(i)

H(m)

wH(i,j,l)

A

A1

, , ,1

ˆ i j l mH

H

PreprocessingA

PreprocessingA

Anthropometry database

Anthropometry of a new person

A(i)

A1(i)

A

A1

Sparse representation

wA(i,j)

PreprocessingH

HRTF database

H(m)H

Postprocessing A

wH(i,j,l)

HRTF of the new personSynthesis

, , ,1

ˆ i j l mH

Evaluation

6

, , , ,

2, , , ,

101 1 1

Spectral distortion SD

ˆ ,1 1 120log dB

,

test

i j l m n

i j l m nS D Ks

s d ktest s

d k

S D K d k

H

H

• CIPIC HRTF database;

• Cross validation technique to selection the regularization

parameter;

• Stest = 35 test cases, all 1250 directions, and full frequency

range.

Performance varies among different preprocessing and postprocessing methods!

Sparse representation PostA PreH

PreA

Direct Min-max Standard score

Standard deviation

Direct

Direct

Mag 6.37 6.57 81.00 6.23

Log mag 6.40 6.50 21.61 6.17

Power 6.56 6.60 78.94 6.46

Normalized

Mag 6.36 6.35 15.97 6.25

Log mag 6.37 6.26 8.89 6.17

Power 6.60 6.77 25.21 6.52

Nonnegative

Direct

Mag 6.32 6.32 6.47 6.23

Log mag 6.38 6.47 6.79 6.17

Power 6.52 6.37 6.55 6.46

Normalized

Mag 6.31 6.26 6.10 6.25

Log mag 6.35 6.20 5.86 6.17

Power 6.53 6.54 6.54 6.52

1 2 3 45.6

5.8

6

6.2

6.4

6.6

6.8(a)

SD

(dB

)

Preprocessing method for A1 2 3 4

5.6

5.8

6

6.2

6.4

6.6

6.8(b)

SD

(dB

)

Preprocessing method for A1 2 3 4

5.6

5.8

6

6.2

6.4

6.6

6.8(c)

SD

(dB

)

Preprocessing method for A1 2 3 4

5.6

5.8

6

6.2

6.4

6.6

6.8(d)

SD

(dB

)

Preprocessing method for A

Mag Log mag Power

1 2 3 45.6

5.8

6

6.2

6.4

6.6

6.8(a)

SD

(dB

)

Preprocessing method for A1 2 3 4

5.6

5.8

6

6.2

6.4

6.6

6.8(b)

SD

(dB

)

Preprocessing method for A1 2 3 4

5.6

5.8

6

6.2

6.4

6.6

6.8(c)

SD

(dB

)

Preprocessing method for A1 2 3 4

5.6

5.8

6

6.2

6.4

6.6

6.8(d)

SD

(dB

)

Preprocessing method for A

Mag Log mag Power

Results

7

Direct sparse representation

1. PreA: standard deviation best, standard score worst;

2. PreH: log mag best, power worst;

3. PostA, PostH: minimal effect for good PreA, PreH.

Nonnegative sparse representation

1. Better than corresponding direct sparse representation (especially for

standard score);

2. Trend in PreA/PreH not obvious;

3. Normalized PostA can improve the performance (especially for standard

score).

(a) Direct sparse; Direct PostA

(b) Direct sparse; Normalized PostA

(c) Nonnegative sparse; Direct PostA

(d) Nonnegative sparse; Normalized PostA

Method Specification SD (dB)

Single bestSelect one single set of HRTF

with the corresponding closest anthropometry

8.11

Tashev et alMin-max PreA

Magnitude PreHDirect sparse representation No reported postprocessing

6.57

Our bestStandard score PreALog magnitude PreH

Nonnegative sparse representationNormalized PostA

5.86

Lower boundLinear regression based HRTF

individualization 5.12

Comparison

8

opt 2 21

w H H

Conclusions

9

1. Introduced preprocessing and postprocessing in HRTF individualization based on sparse

representation of anthropometric features.

2. Investigated 48 variants of preprocessing and postprocessing methods, and found

a) Preprocessing and postprocessing methods do affect the performance of HRTF individualization, though the effects

differ in different combinations;

b) Adding nonnegative constraints in sparse representation improves the performance;

c) The best combination for HRTF individualization is

< standard score + log magnitude + nonnegative + normalized >.

3. Established the lower bound for this type of HRTF individualization and verified that “our best”

combination outperforms existing approaches and is quite close to the lower bound.

4. Future work: subjective evaluation of HRTF individualization.

References

10

[1] D. R. Begault, 3-D Sound for Virtual Reality and Multimedia, Cambridge, MA: Academic Press, 1994.

[2] J. Blauert, Spatial Hearing: The Psychophysics of Human Sound Localization, The MIT Press, revised edition, 1996.

[3] H. Møller, “Fundamentals of Binaural Technology,” Applied Acoustics, vol. 36, 171-218, 1992.

[4] W. G. Gardner, and K. D. Martin, “HRTF Measurements of a KEMAR,” J. Acoust. Soc. Amer., vol., vol. 97, pp. 3907-3908, 1995. See also http://www.sound.media.mit.edu/KEMAR.html.

[5] H. Møller, M. F. Sørensen, D. Hammershøi, and C. B. Jensen, “Head-Related Transfer Functions of Human Subjects,” J. Aud. Eng. Soc., vol. 43, pp. 300-321, 1995.

[6] V. R. Algazi, R. O. Duda, D. M. Thompson, and C. Avendano, “The CIPIC HRTF database,” in Proc. IEEE WASPAA, New Paltz, NY, USA, Oct. 2001.

[7] E. M. Wenzel, M. Arruda, D. J. Kistler, and F. L. Wightman, “Localization Using Non-individualized Head-Related Transfer Functions,” J. Acoust. Soc. Amer., vol. 94, pp. 111-123, 1993.

[8] S. Xu, Z. Li, and G. Salvendy, “Individualization of Head-related transfer function for three-dimensional virtual auditory display: a review,” in R. Shumaker (Ed.): Virtual Reality, HCII 2007,

LNCS 4563, pp. 397–407, 2007.

[9] K. Sunder, J. He, E. L. Tan, and W. S. Gan, “Natural sound rendering for headphones,” IEEE Signal Processing Magazine, vol. 32, no.2, pp. 100-113, Mar. 2015.

[26] P. Bilinski, J. Ahrens, M. R. P. Thomas, I. Tashev, and J. C. Plata, “HRTF magnitude synthesis via sparse representation of anthropometric features,” in Proc. IEEE ICASSP, Florence, Italy, pp.

4501-4505, May 2014.

[28] S. J. Kim, K. Koh, M. Lusig, S. Boyd, and D. Gorinevsky, “An interior-point method for large-scale l1-regularized least squares,” J. Selected topics in signal processing, vol. 1, no. 4, pp. 606-

617, Dec. 2007.

[30] J. Breebaart, F. Nater, and A. Kohlrausch, “Spectral and spatial parameter resolution requirements for parametric, filter-bank-based HRTF processing,” J. Audio Eng. Soc., vol. 58, no. 3, pp. 126-

140, Mar. 2010.

Acknowledgement

11

THIS WORK IS SUPPORTED BY THE SINGAPORE MINISTRY OF EDUCATION ACADEMIC RESEARCH FUND

TIER-2, UNDER RESEARCH GRANT MOE2010-T2-2-040.

On the Preprocessing and Postprocessing of HRTF Individualization Based on

Sparse Representation of Anthropometric Features

Jianjun HE

Jhe007@e.ntu.edu.sg

Nanyang Technological University, Singapore

Thank you!