Posted on 05-Jan-2016

Study on Frequency Domain Primary-Ambient Extraction (PAE)

HE Jianjun

PhD Candidate, DSP Lab, School of EEE, Nanyang Technological University, Singapore

Email: JHE007@e.ntu.edu.sg

Introduction – PAE-based Spatial Audio System

[Figure: block diagram of the PAE-based spatial audio system. Blocks: Input, Pre-processing, PAE, Primary components, Ambient components, Spatial attributes, Primary rendering, Ambient rendering, Post-processing, Output]

Primary components highly correlated

Ambient components uncorrelated

Primary and ambient components uncorrelated

Ambient power balanced

Stereo Signal Model

$x_L = p_L + a_L$

$x_R = p_R + a_R$

Signal = Primary + Ambient (L: left channel; R: right channel)

Assumptions

$p_R = k\,p_L$

$a_L \perp a_R$ (uncorrelated)

$P_{a_L} = P_{a_R}$

$(p_L, p_R) \perp (a_L, a_R)$

Stereo Signal Model

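Primary panning factor (PPF):

$$k = \frac{p_R}{p_L} = \frac{r_{RR} - r_{LL} + \sqrt{(r_{RR} - r_{LL})^2 + 4 r_{LR}^2}}{2 r_{LR}}$$

Primary power ratio (PPR):

$$\gamma = \frac{\text{Total primary power}}{\text{Total signal power}} = \frac{(1 + k^2)\, r_{LR}}{k\,(r_{LL} + r_{RR})}, \quad \gamma \in [0, 1]$$

$r_{LL}$, $r_{RR}$: autocorrelation of the left and right channel; $r_{LR}$: cross-correlation between the left and right channel.

[Figure: panning factor scale from k = 1/10 (left) through k = 1 (center) to k = 10 (right)]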
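As a quick check of the two formulas above, here is a minimal NumPy sketch (the function name `panning_and_ppr` is hypothetical). It assumes real-valued frames and a nonzero cross-correlation r_LR; raw sums are used instead of averages, which does not matter because k and γ are both scale-invariant ratios. Under the signal model, r_LL = P_p + P_a, r_RR = k²P_p + P_a and r_LR = k P_p (P_p: left primary power, P_a: per-channel ambient power), and substituting these into the formula for k returns exactly k.

```python
import numpy as np


def panning_and_ppr(x_l, x_r):
    """Estimate the primary panning factor k and primary power ratio gamma
    from the auto- and cross-correlations of a stereo frame (sketch)."""
    x_l = np.asarray(x_l, dtype=float)
    x_r = np.asarray(x_r, dtype=float)
    r_ll = np.dot(x_l, x_l)  # autocorrelation of the left channel
    r_rr = np.dot(x_r, x_r)  # autocorrelation of the right channel
    r_lr = np.dot(x_l, x_r)  # cross-correlation; assumed nonzero here
    diff = r_rr - r_ll
    k = (diff + np.sqrt(diff ** 2 + 4.0 * r_lr ** 2)) / (2.0 * r_lr)
    gamma = (1.0 + k ** 2) * r_lr / (k * (r_ll + r_rr))
    return float(k), float(gamma)
```

For a primary-only input with x_R = 3 x_L, the sketch returns k = 3 and γ = 1; adding uncorrelated, equal-power ambience lowers γ while leaving the estimate of k unbiased under the model assumptions.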

PAE in full band, time domain

[Figure: input stereo waveforms x_L, x_R (about 4×10^5 samples) and the extracted primary (p̂_L, p̂_R) and ambient (â_L, â_R) waveforms]

Compute parameters: k, γ

$$k = \frac{r_{RR} - r_{LL} + \sqrt{(r_{RR} - r_{LL})^2 + 4 r_{LR}^2}}{2 r_{LR}}, \qquad \gamma = \frac{(1 + k^2)\, r_{LR}}{k\,(r_{LL} + r_{RR})}$$

Extract the primary and ambient components:

$$\begin{bmatrix}\hat p_L \\ \hat p_R\end{bmatrix} = \frac{1}{1+k^2}\begin{bmatrix}1 & k \\ k & k^2\end{bmatrix}\begin{bmatrix}x_L \\ x_R\end{bmatrix}, \qquad \begin{bmatrix}\hat a_L \\ \hat a_R\end{bmatrix} = \frac{1}{1+k^2}\begin{bmatrix}k^2 & -k \\ -k & 1\end{bmatrix}\begin{bmatrix}x_L \\ x_R\end{bmatrix}$$
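The PCA extraction amounts to projecting each stereo sample onto the primary direction [1, k] and taking the residual as ambience. A minimal sketch, with the hypothetical function name `pca_extract`, assuming k has already been estimated from the correlations:

```python
import numpy as np


def pca_extract(x_l, x_r, k):
    """PCA-based primary/ambient extraction for a given panning factor k (sketch)."""
    x_l = np.asarray(x_l, dtype=float)
    x_r = np.asarray(x_r, dtype=float)
    scale = 1.0 / (1.0 + k ** 2)
    p_l = scale * (x_l + k * x_r)  # projection onto the primary direction [1, k]
    p_r = k * p_l                  # enforces p_R = k p_L
    a_l = x_l - p_l                # equals scale * (k^2 x_L - k x_R)
    a_r = x_r - p_r                # equals scale * (x_R - k x_L)
    return p_l, p_r, a_l, a_r
```

Because the ambient estimates always satisfy â_L = -k â_R, the PCA ambient components are fully (negatively) correlated, even though the model assumes uncorrelated ambience.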

PAE in full band, frequency domain

[Figure: the input waveforms are transformed by an FFT; the parameters are computed on the full-band spectrum, and the extracted primary and ambient components are shown]

Compute parameters: k, γ

PAE in subband, frequency domain

[Figure: left and right spectra X(f) divided along f into eight bands, numbered (1) to (8); a panning factor k is estimated in each band, giving k(f)]

k(f), estimated in each band of X(f), represents the panning of the source in that band.

Assumption: In each band, only one source is dominant; the spectral overlap among different sources should be minimal.
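Subband PAE estimates a separate k in each band. The sketch below uses uniform partitioning of a single FFT frame; the function name `subband_pca`, taking the real part of the complex cross-spectrum as r_LR, and omitting windowing and overlap-add are simplifying assumptions of this illustration, not the study's implementation. With `n_bands=1` it reduces to full-band frequency-domain PCA.

```python
import numpy as np


def subband_pca(x_l, x_r, n_bands):
    """Frequency-domain PCA with uniform partitioning (sketch).
    Returns the primary estimates and the per-band panning factors."""
    X_l = np.fft.rfft(x_l)
    X_r = np.fft.rfft(x_r)
    P_l = np.zeros_like(X_l)
    P_r = np.zeros_like(X_r)
    edges = np.linspace(0, len(X_l), n_bands + 1).astype(int)
    ks = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        xl, xr = X_l[lo:hi], X_r[lo:hi]
        r_ll = np.sum(np.abs(xl) ** 2)
        r_rr = np.sum(np.abs(xr) ** 2)
        # Assumption: real part of the complex cross-correlation, assumed nonzero.
        r_lr = np.real(np.sum(xl * np.conj(xr)))
        diff = r_rr - r_ll
        k = (diff + np.sqrt(diff ** 2 + 4.0 * r_lr ** 2)) / (2.0 * r_lr)
        P_l[lo:hi] = (xl + k * xr) / (1.0 + k ** 2)  # PCA projection in this band
        P_r[lo:hi] = k * P_l[lo:hi]
        ks.append(k)
    p_l = np.fft.irfft(P_l, n=len(x_l))
    p_r = np.fft.irfft(P_r, n=len(x_r))
    return p_l, p_r, np.array(ks)
```

With two sinusoids in different halves of the spectrum, panned by k = 3 and k = 0.5, two bands recover both panning factors, whereas a single full band can only return one compromise value.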

Correlation computation and time shifting

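In time domain:

$$c_{xy}(\tau) = \sum_{n=0}^{N-1} x(n)\, y(n+\tau), \quad \tau \in [-44, 44]$$

In frequency domain:

$$c_{xy}(\tau) = \begin{cases} \mathrm{IDFT}\{X^*(f)\,Y(f)\}(\tau), & 0 \le \tau \le 44 \\ \mathrm{IDFT}\{X^*(f)\,Y(f)\}(N + \tau), & -44 \le \tau < 0 \end{cases}$$

Find the inter-channel time difference (ICTD), with x = x_L and y = x_R: $\tau_o = \arg\max_{\tau} c_{xy}(\tau)$

Apply the ICTD in the frequency domain, using the DFT shift property $x(n - \tau_o) \xrightarrow{\mathrm{DFT}} X(f)\, e^{-j 2\pi f \tau_o / N}$:

$$\hat P_L(f) = w_{LpL}(f)\, X_L(f) + w_{LpR}(f)\, e^{j 2\pi f \tau_o / N}\, X_R(f)$$

$$\hat P_R(f) = w_{RpL}(f)\, e^{-j 2\pi f \tau_o / N}\, X_L(f) + w_{RpR}(f)\, X_R(f)$$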
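The ICTD search and the frequency-domain shift can be sketched in NumPy as below. The names `estimate_ictd` and `shift_spectrum` are hypothetical; the ±44-lag search range follows the slide, and zero-padding to at least N + 44 points (an assumption of this sketch) makes the circular FFT correlation equal the linear one over that range.

```python
import numpy as np


def estimate_ictd(x_l, x_r, max_lag=44):
    """Return tau_o = argmax_tau c(tau), c(tau) = sum_n x_l(n) x_r(n + tau)."""
    n = len(x_l)
    # Zero-pad so that lags |tau| <= max_lag are free of circular wrap-around.
    n_fft = 1 << int(np.ceil(np.log2(n + max_lag)))
    X = np.fft.rfft(x_l, n_fft)
    Y = np.fft.rfft(x_r, n_fft)
    c = np.fft.irfft(np.conj(X) * Y, n_fft)  # c[tau], negative lags at n_fft + tau
    lags = np.arange(-max_lag, max_lag + 1)
    return int(lags[np.argmax(c[lags % n_fft])])


def shift_spectrum(X, tau, n_fft):
    """Delay a signal by tau samples (circularly) via X(f) * exp(-j 2 pi f tau / N)."""
    f = np.arange(len(X))  # one-sided (rfft) bin indices
    return X * np.exp(-2j * np.pi * f * tau / n_fft)
```

If the right channel is the left channel delayed by 20 samples and scaled by 3, `estimate_ictd` returns 20, and delaying the left spectrum by that amount aligns it with the right channel before a PCA-style weighting is applied.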

How to partition the bands?

[Figure: left and right spectra X(f) along f; a band with ICC φ0 is split into two sub-bands with ICCs φ1 and φ2]

Ideally, the number of partitions = the number of sources.

Fixed partitioning: independent of the input
• Uniform (2, 4, 8, etc.)
• Non-uniform (e.g., ERB)

Adaptive partitioning: dependent on the input
• Top-down
• Bottom-up

Based on the inter-channel cross-correlation coefficient (ICC) φ, with two thresholds φL and φH. Conditions for partitioning a band with ICC φ0 into two sub-bands with ICCs φ1 and φ2:
• φ0 < φH
• max(φ1, φ2) > φ0
• min(φ1, φ2) > φL
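One possible reading of the top-down rule is a recursive binary split: test the three conditions on a band and its two halves, and recurse into the halves only if all of them hold. Splitting at the midpoint, the minimum band width `min_width`, and defining the ICC as the magnitude of the normalized complex cross-correlation are assumptions of this sketch (function names hypothetical); φL = 0.1 and φH = 0.8 are the values used in the simulations.

```python
import numpy as np


def icc(X_l, X_r):
    """ICC of one band (assumption: |normalized complex cross-correlation|)."""
    num = np.abs(np.sum(X_l * np.conj(X_r)))
    den = np.sqrt(np.sum(np.abs(X_l) ** 2) * np.sum(np.abs(X_r) ** 2)) + 1e-12
    return num / den


def top_down_partition(X_l, X_r, lo, hi, phi_l=0.1, phi_h=0.8, min_width=16):
    """Recursively halve the bin range [lo, hi) while the split conditions hold."""
    phi0 = icc(X_l[lo:hi], X_r[lo:hi])
    mid = (lo + hi) // 2
    if hi - lo >= 2 * min_width:
        phi1 = icc(X_l[lo:mid], X_r[lo:mid])
        phi2 = icc(X_l[mid:hi], X_r[mid:hi])
        if phi0 < phi_h and max(phi1, phi2) > phi0 and min(phi1, phi2) > phi_l:
            return (top_down_partition(X_l, X_r, lo, mid, phi_l, phi_h, min_width)
                    + top_down_partition(X_l, X_r, mid, hi, phi_l, phi_h, min_width))
    return [(lo, hi)]
```

For a spectrum whose lower half is panned hard right (k = 10) and upper half hard left (k = 0.1), the full-band ICC falls below φH while each half is fully coherent, so the sketch stops after a single split into two bands.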

Multiple (2) sources

Three cases for the directions of two sources:
1. At different sides (DS)
2. One at the center (C)
3. At the same side (SS)

Four ways to synthesize the source direction:
1. Amplitude panning (AP)
2. Time shifting (TS)
3. Amplitude panning and time shifting (APTS)
4. HRTF filtering (HRTF)
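The first three synthesis methods can be sketched for one mono source as follows (helper names hypothetical). The convention that the right channel is scaled by k follows p_R = k p_L from the signal model; a positive `shift` delays the right channel and a negative one delays the left. HRTF filtering would instead convolve the source with a measured left/right head-related impulse response pair and is not shown.

```python
import numpy as np


def delay(x, d):
    """Delay x by d >= 0 samples, zero-filling the start and keeping the length."""
    return np.concatenate([np.zeros(d), x[: len(x) - d]])


def synthesize_direction(s, k=1.0, shift=0):
    """AP: k != 1, shift == 0; TS: k == 1, shift != 0; APTS: both (sketch)."""
    s = np.asarray(s, dtype=float)
    if shift >= 0:
        p_l, p_r = s, k * delay(s, shift)   # right channel lags the left
    else:
        p_l, p_r = delay(s, -shift), k * s  # left channel lags the right
    return p_l, p_r
```

A two-source test signal is then the sum of two such pairs; for example, one source with k < 1 and another with k > 1 places the sources at different sides (DS).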

Simulation testing: setup

Primary components: speech, music
Ambient components: white Gaussian noise
Primary power ratio = 0.9
Frame length: 4096
Hanning window, 50% overlap

We test PCA and SPCA with different frequency partitioning:
• Time domain, full band (reference)
• Uniform partitioning with [1, 2, 4, 8, 16, 32, 64] partitions
• Non-uniform partitioning with 20 partitions (Faller, BCC [6])
• Top-down (TD) partitioning, with φL = 0.1, φH = 0.8

[6] C. Faller and F. Baumgarte, “Binaural cue coding-Part II: Schemes and applications,” IEEE Trans. Speech and Audio Processing, vol. 11, no. 6, Nov. 2003.
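A sketch of the test-signal setup: white Gaussian ambience is scaled so that the primary power ratio equals 0.9 with equal ambient power in both channels, and the mixture is cut into Hanning-windowed 4096-sample frames with 50% overlap. Function names are hypothetical; the ratio is enforced on the component powers, so the measured mixture power can differ slightly because of finite-sample cross-terms. Note that `np.hanning` returns a symmetric window; a periodic Hann window is the usual choice when exact overlap-add reconstruction is needed.

```python
import numpy as np


def mix_with_ambience(p_l, p_r, ppr=0.9, seed=0):
    """Add balanced white Gaussian ambience so that
    primary power / (primary + ambient power) == ppr (sketch)."""
    rng = np.random.default_rng(seed)
    a_l = rng.standard_normal(len(p_l))
    a_r = rng.standard_normal(len(p_r))
    primary_power = np.sum(p_l ** 2) + np.sum(p_r ** 2)
    target_ambient = primary_power * (1.0 - ppr) / ppr
    # Each channel receives half of the target ambient power (balanced ambience).
    a_l *= np.sqrt(0.5 * target_ambient / np.sum(a_l ** 2))
    a_r *= np.sqrt(0.5 * target_ambient / np.sum(a_r ** 2))
    return p_l + a_l, p_r + a_r, a_l, a_r


def frames(x, frame_len=4096, hop=2048):
    """Split x into Hanning-windowed frames; hop = frame_len / 2 gives 50% overlap.
    Assumes len(x) >= frame_len."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([window * x[i * hop : i * hop + frame_len] for i in range(n_frames)])
```

A 16384-sample signal yields 1 + (16384 - 4096) / 2048 = 7 frames.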

Performance measure: Error-to-Signal Ratio

$$ESR_L = \frac{\|\hat P_L - P_L\|^2}{\|P_L\|^2}, \qquad ESR_R = \frac{\|\hat P_R - P_R\|^2}{\|P_R\|^2}$$

$$ESR\,(\mathrm{dB}) = 10 \log_{10} \frac{ESR_L + ESR_R}{2}$$
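Assuming the ESR in dB is 10 log10 of the average of the two per-channel normalized squared errors, a minimal sketch (hypothetical function name) is:

```python
import numpy as np


def esr_db(p_hat_l, p_hat_r, p_l, p_r):
    """Error-to-signal ratio in dB, averaged over the left and right channels."""
    esr_l = np.sum(np.abs(p_hat_l - p_l) ** 2) / np.sum(np.abs(p_l) ** 2)
    esr_r = np.sum(np.abs(p_hat_r - p_r) ** 2) / np.sum(np.abs(p_r) ** 2)
    return 10.0 * np.log10((esr_l + esr_r) / 2.0)
```

For example, an estimate that is simply 0.9 times the true primary component gives ESR_L = ESR_R = 0.01, i.e. -20 dB; lower (more negative) values mean better extraction.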

Simulation Results: 1 source

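Primary component: speech shifted by 20 lags, panned by k=3.

Results (ESR, dB):
Method | T | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 20 non | TD
PCA | -3.69 | -3.72 | -3.38 | -3.45 | -3.34 | -3.32 | -3.16 | -3.19 | -3.33 | -3.72
SPCA | -14.78 | -14.85 | -12.34 | -12.05 | -11.52 | -11.35 | -10.63 | -9.30 | -10.34 | -14.38
(T: time domain, full band; 1 to 64: number of uniform partitions; 20 non: 20 non-uniform partitions; TD: top-down partitioning)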

1. Generally, SPCA performs better than PCA.
2. The time-domain PCA (SPCA) result is very close to the frequency-domain PCA (SPCA) result when there is only one partition.
3. Significantly worse performance is found for the frequency-domain approaches with fixed partitioning.
4. The performance of top-down partitioning is acceptable: it is close to the best result for both PCA (-3.72 dB) and SPCA (-14.38 dB vs. -14.85 dB).

[Figure: two panels, estimated vs. true ICTD and estimated vs. true primary panning factor (PPF), each plotted against frequency partition k]

Simulation Results: 2 sources

Four ways to synthesize the source direction:
1. Amplitude panning (AP)
2. Time shifting (TS)
3. Amplitude panning and time shifting (APTS)
4. HRTF filtering (HRTF)

Three cases for the directions of two sources:
1. At different sides (DS)
2. One at the center (C)
3. At the same side (SS)

Simulation Results: 2 sources-AP

1. Generally, the performance of SPCA and PCA is similar.

Case | Method | T | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 20 non | TD
DS | PCA | -7.95 | -8.10 | -8.13 | -8.22 | -8.34 | -8.56 | -8.56 | -9.80 | -10.86 |
DS | SPCA | -7.94 | -8.15 | -8.18 | -8.26 | -8.36 | -8.61 | -8.39 | -9.33 | -9.95 | -8.36
C | PCA | -10.15 | -10.25 | -10.14 | -10.22 | -10.27 | -10.38 | -10.34 | -11.16 | -11.99 |
C | SPCA | -10.14 | -10.33 | -10.22 | -10.24 | -10.30 | -10.43 | -10.04 | -10.29 | -10.12 | -10.38
SS | PCA | -13.04 | -13.10 | -11.82 | -11.88 | -11.75 | -11.53 | -11.31 | -11.40 | -11.93 |
SS | SPCA | -13.02 | -13.23 | -12.04 | -11.81 | -11.65 | -11.56 | -10.30 | -10.24 | -10.52 | -13.21

2. The performance is better when the two directions are closer.
3. The frequency-domain approaches with fixed partitioning show some advantage when the primary components are not on the same side.

4. The frequency-domain approach with top-down partitioning yields good performance.

Simulation Results: 2 sources-TS

1. Clearly, SPCA performs better than PCA.

Case | Method | T | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 20 non | TD
DS | PCA | -5.16 | -5.21 | -5.23 | -5.24 | -5.23 | -5.23 | -5.26 | -5.38 | -5.47 | -5.21
DS | SPCA | -7.98 | -8.44 | -8.43 | -8.59 | -8.69 | -8.73 | -8.62 | -9.12 | -8.58 | -8.91
C | PCA | -9.10 | -9.14 | -9.18 | -9.20 | -9.18 | -9.17 | -9.14 | -9.18 | -9.27 | -9.14
C | SPCA | -9.13 | -9.60 | -9.70 | -9.85 | -9.97 | -9.85 | -9.54 | -9.91 | -9.35 | -10.05
SS | PCA | -5.37 | -5.38 | -5.40 | -5.42 | -5.41 | -5.42 | -5.41 | -5.44 | -5.49 | -5.38
SS | SPCA | -11.15 | -11.65 | -11.71 | -11.78 | -11.94 | -11.86 | -11.02 | -11.20 | -9.42 | -11.68

2. SPCA performs better when neither direction is at the center.
3. The frequency-domain approaches with fixed partitioning show a slight advantage, and their performance does not vary much across partitionings.

4. The frequency-domain approach with top-down partitioning yields the best overall performance.

Simulation Results: 2 sources-APTS

1. Clearly, SPCA performs better than PCA.

Case | Method | T | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 20 non | TD
DS | PCA | -5.16 | -5.21 | -5.23 | -5.23 | -5.23 | -5.23 | -5.26 | -5.38 | -5.47 |
DS | SPCA | -7.99 | -8.44 | -8.43 | -8.59 | -8.69 | -8.73 | -8.62 | -9.12 | -8.58 | -8.91
C | PCA | -8.06 | -8.28 | -8.19 | -8.27 | -8.34 | -8.46 | -8.44 | -9.04 | -9.55 | -8.28
C | SPCA | -8.07 | -8.43 | -8.38 | -8.40 | -8.57 | -8.63 | -8.44 | -8.70 | -9.07 | -8.68
SS | PCA | -4.18 | -4.18 | -3.95 | -3.97 | -3.91 | -3.92 | -3.89 | -3.87 | -3.98 | -4.19
SS | SPCA | -10.16 | -10.60 | -9.89 | -9.75 | -9.80 | -9.77 | -9.07 | -8.68 | -7.29 | -10.82

2. The performance is better when the two directions are closer.
3. The frequency-domain approaches with fixed partitioning perform better when the two directions are not on the same side.

4. The frequency-domain approach with top-down partitioning yields the best overall performance for all cases.

Simulation Results: 2 sources-APTS

1. Clearly, SPCA performs better than PCA.

TD columns give the thresholds (φL, φH).

Case | Method | T | 1 | 2 | 8 | 32 | 20 non | TD (0.1, 0.8) | TD (0.05, 0.8) | TD (0.2, 0.8) | TD (0.2, 0.7) | TD (0.1, 0.7) | TD (0.05, 0.7) | TD (0.05, 0.9) | TD (0.1, 0.9) | TD (0.2, 0.9)
DS | PCA | -4.74 | -5.04 | -5.04 | -5.22 | -5.48 | -6.85 | -5.03 | -5.03 | -5.03 | -5.03 | -5.03 | -5.03 | -5.03 | -5.03 | -5.03
DS | SPCA | -6.45 | -6.85 | -6.85 | -7.11 | -7.25 | -7.73 | -6.85 | -8.36 | -6.85 | -6.85 | -6.85 | -7.93 | -8.58 | -6.85 | -6.85
C | PCA | -8.06 | -8.28 | -8.19 | -8.34 | -8.44 | -9.55 | -8.28 | -8.44 | -8.28 | -8.28 | -8.28 | -8.44 | -8.44 | -8.27 | -8.27
C | SPCA | -8.07 | -8.43 | -8.38 | -8.57 | -8.44 | -9.07 | -8.68 | -9.06 | -8.44 | -8.44 | -8.68 | -8.58 | -9.93 | -8.6 | -8.44
SS | PCA | -4.18 | -4.18 | -3.95 | -3.91 | -3.89 | -3.98 | -4.19 | -4.19 | -4.19 | -4.19 | -4.19 | -4.19 | -4.19 | -4.19 | -4.19
SS | SPCA | -10.16 | -10.60 | -9.89 | -9.80 | -9.07 | -7.29 | -10.82 | -10.11 | -10.6 | -10.6 | -10.6 | -10.41 | -8.53 | -10.27 | -10.6

2. The performance is better when the two directions are closer.
3. The frequency-domain approaches with fixed partitioning perform better when the two directions are not on the same side.

4. The frequency-domain approach with top-down partitioning yields the best overall performance for all cases.

Simulation Results: 2 sources-APTS

1. Clearly, SPCA performs better than PCA.

TD columns give the thresholds (φL, φH).

Case | Method | T | 2 | 8 | 32 | 20 non | TD (0.05, 0.7) | TD (0.1, 0.8) | TD (0.05, 0.8) | TD (0.2, 0.8) | TD (0.2, 0.7) | TD (0.1, 0.7) | TD (0.05, 0.7) | TD (0.05, 0.9) | TD (0.1, 0.9) | TD (0.2, 0.9)
DS | PCA | -4.74 | -5.04 | -5.22 | -5.48 | -6.85 | -5.03 | -5.03 | -5.03 | -5.03 | -5.03 | -5.03 | -5.03 | -5.03 | -5.03 | -5.03
DS | SPCA | -6.45 | -6.85 | -7.11 | -7.25 | -7.73 | -7.93 | -6.85 | -8.36 | -6.85 | -6.85 | -6.85 | -7.93 | -8.58 | -6.85 | -6.85
C | PCA | -8.06 | -8.19 | -8.34 | -8.44 | -9.55 | -8.44 | -8.28 | -8.44 | -8.28 | -8.28 | -8.28 | -8.44 | -8.44 | -8.27 | -8.27
C | SPCA | -8.07 | -8.38 | -8.57 | -8.44 | -9.07 | -8.58 | -8.68 | -9.06 | -8.44 | -8.44 | -8.68 | -8.58 | -9.93 | -8.6 | -8.44
SS | PCA | -4.18 | -3.95 | -3.91 | -3.89 | -3.98 | -4.19 | -4.19 | -4.19 | -4.19 | -4.19 | -4.19 | -4.19 | -4.19 | -4.19 | -4.19
SS | SPCA | -10.16 | -9.89 | -9.80 | -9.07 | -7.29 | -10.41 | -10.82 | -10.11 | -10.6 | -10.6 | -10.6 | -10.41 | -8.53 | -10.27 | -10.6

2. The performance is better when the two directions are closer.
3. The frequency-domain approaches with fixed partitioning perform better when the two directions are not on the same side.

4. The frequency-domain approach with top-down partitioning yields the best overall performance for all cases.

Simulation Results: 2 sources-HRTF

1. Clearly, SPCA performs better than PCA.

Case | Method | T | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 20 non | TD
DS | PCA | -3.28 | -3.46 | -3.46 | -3.44 | -3.50 | -3.74 | -3.77 | -3.89 | -3.85 | -2.47
DS | SPCA | -6.49 | -6.07 | -6.08 | -6.13 | -6.42 | -7.33 | -5.70 | -5.56 | -5.71 | -6.52
C | PCA | -6.96 | -7.16 | -7.09 | -7.12 | -7.21 | -7.37 | -7.50 | -7.60 | -7.65 | -7.16
C | SPCA | -7.41 | -7.97 | -7.89 | -7.92 | -7.87 | -8.33 | -6.97 | -6.15 | -7.17 | -8
SS | PCA | -1.14 | -1.26 | -1.14 | -1.15 | -1.12 | -1.06 | -1.04 | -1.07 | -1.19 | -1.88
SS | SPCA | -6.58 | -6.81 | -6.48 | -6.48 | -6.70 | -7.44 | -2.51 | -2.44 | -2.90 | -6.98

2. SPCA performs better when one direction is at the center.
3. The frequency-domain approaches with fixed partitioning show better performance only for some partitionings.

4. The frequency-domain approach with top-down partitioning yields the best overall performance for all cases.

Summary of Simulation Results: 2 sources

1. The overall performance of PAE: AP > TS > APTS > HRTF.

2. Generally, SPCA performs better than PCA in almost all cases.

3. The performance varies as the directions of the sources change.

4. The frequency-domain approaches with fixed partitioning do not always give better performance.

5. The frequency-domain approach with top-down partitioning yields the best overall performance in most cases.

How about perceptual testing? Usually, only one source (speech) is extracted better, because the spectrum of speech (compared with music) is more concentrated in certain bands.

Conclusions and thoughts

1. A study of frequency-domain PAE with different partitioning schemes was conducted, targeting multiple primary components that come from different directions concurrently. The two PAE approaches tested are PCA and shifted PCA (SPCA).

2. Careful consideration should be given to the partitioning of the frequency bands.

3. Generally, SPCA outperforms PCA in almost all cases.

4. The performance of frequency-domain PAE with fixed partitioning is inconsistent as the source directions and the number of partitions change.

5. The frequency-domain approach with top-down partitioning yields promising results in most cases.

Thoughts:
• A more robust partitioning method is required!
• How should the thresholds be determined for other input signals?
• More accurate estimation of the primary panning factor and of the ICTD/ICC is needed.
• How about other PAE approaches, such as least squares?