Study on Frequency Domain Primary-Ambient Extraction (PAE) HE Jianjun PhD Candidate, DSP Lab, School of EEE, Nanyang Technological University, Singapore Email: [email protected]


Page 2: Introduction: PAE-based spatial audio system

[Figure: block diagram of a PAE-based spatial audio system. Block labels: Input, Pre-processing, PAE, Primary components, Ambient components, Spatial attributes, Primary rendering, Ambient rendering, Post-processing, Output.]

Page 3: Stereo signal model

Signal = Primary + Ambient

$x_L = p_L + a_L$ (left channel)
$x_R = p_R + a_R$ (right channel)

Assumptions
• Primary components highly correlated: $p_R = k\,p_L$
• Ambient components uncorrelated: $a_L \perp a_R$
• Primary and ambient components uncorrelated: $p_L, p_R \perp a_L, a_R$
• Ambient power balanced: $P_{a_L} = P_{a_R}$
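To make the four assumptions concrete, the following is a minimal sketch that synthesizes a stereo pair obeying them. The helper name `make_stereo`, the use of Gaussian noise as a stand-in primary source, and the default values are illustrative assumptions, not part of the slides.

```python
import numpy as np

def make_stereo(n=50000, k=3.0, ppr=0.9, seed=0):
    """Synthesize x_L = p_L + a_L and x_R = k*p_L + a_R (illustrative helper)."""
    rng = np.random.default_rng(seed)
    p_l = rng.standard_normal(n)      # stand-in primary source (assumption)
    p_r = k * p_l                     # primary components fully correlated
    primary_power = np.mean(p_l**2) + np.mean(p_r**2)
    # Total ambient power follows from the primary power ratio;
    # it is split equally between the channels (balanced ambient power).
    amb_power = primary_power * (1.0 - ppr) / ppr / 2.0
    a_l = np.sqrt(amb_power) * rng.standard_normal(n)   # independent draws, so
    a_r = np.sqrt(amb_power) * rng.standard_normal(n)   # a_L and a_R are uncorrelated
    return p_l + a_l, p_r + a_r, (p_l, p_r, a_l, a_r)

x_l, x_r, (p_l, p_r, a_l, a_r) = make_stereo()
print("corr(a_L, a_R):", np.corrcoef(a_l, a_r)[0, 1])
print("ambient power ratio:", np.mean(a_l**2) / np.mean(a_r**2))
```

With finite-length signals the ambient correlation is only approximately zero and the power ratio only approximately one.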

Page 4: Stereo signal model

Primary panning factor (PPF), with $p_R = k\,p_L$:

$$k = \frac{r_{RR} - r_{LL}}{2 r_{LR}} + \sqrt{\left(\frac{r_{RR} - r_{LL}}{2 r_{LR}}\right)^2 + 1}$$

Primary power ratio (PPR):

$$\gamma = \frac{\text{Total primary power}}{\text{Total signal power}} = \frac{2 r_{LR} + (r_{RR} - r_{LL})\,k}{(r_{RR} + r_{LL})\,k}, \quad \gamma \in [0, 1]$$

$r_{LL}$, $r_{RR}$: autocorrelation of the left, right channel; $r_{LR}$: cross-correlation between the left and right channel.

[Figure: panning scale for $k$, marked 1/10, 1 and 10, with labels Left, Center and Right.]
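The two estimators can be checked numerically. The sketch below assumes zero-lag correlations computed as plain dot products and a synthetic Gaussian source; the function name `estimate_ppf_ppr` and the test signal are illustrative assumptions.

```python
import numpy as np

def estimate_ppf_ppr(x_l, x_r):
    """Estimate k (PPF) and gamma (PPR) from zero-lag correlations."""
    r_ll = np.dot(x_l, x_l)
    r_rr = np.dot(x_r, x_r)
    r_lr = np.dot(x_l, x_r)           # must be nonzero for the formula to apply
    d = (r_rr - r_ll) / (2.0 * r_lr)
    k = d + np.sqrt(d * d + 1.0)
    gamma = (2.0 * r_lr + (r_rr - r_ll) * k) / ((r_rr + r_ll) * k)
    return k, gamma

# Synthetic check: primary panned by k = 3, ambient power chosen for PPR = 0.9.
rng = np.random.default_rng(0)
n = 100000
p = rng.standard_normal(n)
amb_std = np.sqrt((1 + 3.0**2) * (1 - 0.9) / 0.9 / 2)
x_l = p + amb_std * rng.standard_normal(n)
x_r = 3.0 * p + amb_std * rng.standard_normal(n)
k_hat, gamma_hat = estimate_ppf_ppr(x_l, x_r)
print(k_hat, gamma_hat)   # should land close to 3 and 0.9
```

Under the signal model, $r_{LL} = P_{p_L} + P_a$, $r_{RR} = k^2 P_{p_L} + P_a$ and $r_{LR} = k P_{p_L}$, so $k$ is the positive root of $k^2 - \frac{r_{RR} - r_{LL}}{r_{LR}} k - 1 = 0$.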

Page 5: PAE in full band, time domain

[Figure: waveforms of the two input channels (about 4 × 10^5 samples, amplitude within ±0.02), a plot with values between 0 and 1, and waveforms of the two extracted output signals (amplitude within ±2 × 10^-3).]

Compute parameters: k, γ

$$k = \frac{r_{RR} - r_{LL}}{2 r_{LR}} + \sqrt{\left(\frac{r_{RR} - r_{LL}}{2 r_{LR}}\right)^2 + 1}, \qquad \gamma = \frac{2 r_{LR} + (r_{RR} - r_{LL})\,k}{(r_{RR} + r_{LL})\,k}$$

Extract the primary and ambient components:

$$\begin{bmatrix}\hat{p}_L \\ \hat{p}_R\end{bmatrix} = \frac{1}{1+k^2}\begin{bmatrix}1 & k \\ k & k^2\end{bmatrix}\begin{bmatrix}x_L \\ x_R\end{bmatrix}, \qquad \begin{bmatrix}\hat{a}_L \\ \hat{a}_R\end{bmatrix} = \frac{1}{1+k^2}\begin{bmatrix}k^2 & -k \\ -k & 1\end{bmatrix}\begin{bmatrix}x_L \\ x_R\end{bmatrix}$$
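A minimal sketch of full-band, time-domain extraction in this form is given here. It assumes the extraction is the PCA projection onto the direction $[1, k]$; the function name `pca_pae` and the synthetic test signal are illustrative assumptions.

```python
import numpy as np

def pca_pae(x_l, x_r):
    """Full-band PCA-style extraction using the estimated panning factor k."""
    r_ll, r_rr, r_lr = np.dot(x_l, x_l), np.dot(x_r, x_r), np.dot(x_l, x_r)
    d = (r_rr - r_ll) / (2.0 * r_lr)
    k = d + np.sqrt(d * d + 1.0)
    s = (x_l + k * x_r) / (1.0 + k * k)   # projection onto direction [1, k]
    p_l_hat, p_r_hat = s, k * s           # rows [1, k] and [k, k^2] over (1 + k^2)
    # The ambient estimate is the residual, equal to the second matrix above.
    a_l_hat, a_r_hat = x_l - p_l_hat, x_r - p_r_hat
    return p_l_hat, p_r_hat, a_l_hat, a_r_hat

rng = np.random.default_rng(0)
n = 100000
p = rng.standard_normal(n)                       # stand-in primary (assumption)
amb_std = np.sqrt((1 + 3.0**2) * (1 - 0.9) / 0.9 / 2)
x_l = p + amb_std * rng.standard_normal(n)
x_r = 3.0 * p + amb_std * rng.standard_normal(n)
p_l_hat, p_r_hat, a_l_hat, a_r_hat = pca_pae(x_l, x_r)
print("left-channel error-to-signal ratio:", np.mean((p_l_hat - p)**2) / np.mean(p**2))
```

Because the ambient estimate is the residual of the projection, the extracted primary and ambient components in each channel sum back to the input and are orthogonal to each other.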

Page 6: PAE in full band, frequency domain

[Figure: waveforms of the two input channels, an FFT stage, spectra of the two channels (frequency bins 500 to 4000), a plot with values between 0 and 1, and waveforms of the two extracted output signals.]

Compute parameters: k, γ
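In the full-band case the correlations needed for k and γ can be taken directly from the spectra. The sketch below relies on Parseval's relation for full-length DFTs; the function name `correlations_freq` and the test signal are illustrative assumptions.

```python
import numpy as np

def correlations_freq(X_l, X_r):
    """Zero-lag correlations from full-length DFTs (Parseval's relation)."""
    n = len(X_l)
    r_ll = np.sum(np.abs(X_l)**2) / n
    r_rr = np.sum(np.abs(X_r)**2) / n
    r_lr = np.real(np.vdot(X_r, X_l)) / n   # Re{sum_f X_L(f) conj(X_R(f))} / N
    return r_ll, r_rr, r_lr

rng = np.random.default_rng(1)
x_l = rng.standard_normal(4096)
x_r = 0.5 * x_l + 0.3 * rng.standard_normal(4096)
r_freq = correlations_freq(np.fft.fft(x_l), np.fft.fft(x_r))
r_time = (np.dot(x_l, x_l), np.dot(x_r, x_r), np.dot(x_l, x_r))
print(np.allclose(r_freq, r_time))
```

Since the correlations agree, a single full-band partition in the frequency domain yields the same k and γ as the time-domain computation on the same frame.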

Page 7: PAE in subband, frequency domain

[Figure: spectra X(f) of the left (L) and right (R) channels over frequency f, divided into eight bands with panning factors k(1) to k(8).]

k(f) represents the panning of the source.

Assumption: In each band, only one source is dominant. The overlap among the spectra of different sources should be minimal.
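The per-band idea can be sketched as follows: the same panning-factor estimator is applied to the DFT bins of each band. The helper names, the uniform band edges, and the two band-limited synthetic sources (panned by 0.5 and 2) are illustrative assumptions.

```python
import numpy as np

def band_edges_uniform(n_bins, n_bands):
    """Bin indices that split [0, n_bins) into n_bands nearly equal bands."""
    return np.linspace(0, n_bins, n_bands + 1).astype(int)

def subband_ppf(X_l, X_r, edges):
    """Estimate one panning factor k per band from the bins inside it."""
    ks = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        L, R = X_l[lo:hi], X_r[lo:hi]
        r_ll = np.sum(np.abs(L)**2)
        r_rr = np.sum(np.abs(R)**2)
        r_lr = np.real(np.vdot(R, L))      # Re{sum L(f) conj(R(f))}
        d = (r_rr - r_ll) / (2.0 * r_lr)
        ks.append(d + np.sqrt(d * d + 1.0))
    return np.array(ks)

rng = np.random.default_rng(0)
n = 4096
edges = band_edges_uniform(n // 2 + 1, 2)
S1 = np.fft.rfft(rng.standard_normal(n)); S1[edges[1]:] = 0   # low-band source
S2 = np.fft.rfft(rng.standard_normal(n)); S2[:edges[1]] = 0   # high-band source
A_l = 0.1 * np.fft.rfft(rng.standard_normal(n))               # weak ambient
A_r = 0.1 * np.fft.rfft(rng.standard_normal(n))
X_l = S1 + S2 + A_l
X_r = 0.5 * S1 + 2.0 * S2 + A_r
k_bands = subband_ppf(X_l, X_r, edges)
print(k_bands)   # one estimate per band
```

In this toy case the band edges coincide with the source spectra, which is exactly the single-dominant-source assumption; when sources overlap inside a band, a single k cannot describe both.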

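Page 8: Correlation computation and time shifting

Find the inter-channel time difference (ICTD): $\tau_o = \arg\max_{\tau} c_{xy}(\tau)$

In time domain:

$$c_{xy}(\tau) = \sum_{n=0}^{N-1} x(n)\,y(n+\tau), \quad \tau \in [-44, 44]$$

In frequency domain:

$$c_{xy}(\tau) = \begin{cases}\mathrm{IDFT}\{X^*(f)\,Y(f)\}(\tau), & 0 \le \tau \le 44 \\ \mathrm{IDFT}\{X(f)\,Y^*(f)\}(-\tau), & -44 \le \tau < 0\end{cases}$$

Apply ICTD in frequency domain:

$$x\big((n-\tau_o)_N\big) \;\xleftrightarrow{\ \mathrm{DFT}\ }\; X(f)\,e^{-j2\pi f\tau_o/N}$$

$$\begin{bmatrix}\hat{P}_L(f)\\ \hat{P}_R(f)\end{bmatrix} = \begin{bmatrix} w_{P_L,L}(f) & w_{P_L,R}(f)\,e^{j2\pi f\tau_o/N} \\ w_{P_R,L}(f)\,e^{-j2\pi f\tau_o/N} & w_{P_R,R}(f)\end{bmatrix}\begin{bmatrix}X_L(f)\\X_R(f)\end{bmatrix}$$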
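The two steps on this slide can be sketched with NumPy's FFT. The sketch assumes circular correlation over one frame, the convention $c_{xy}(\tau) = \sum_n x(n)\,y(n+\tau)$, and an integer shift; the function names are illustrative assumptions.

```python
import numpy as np

def estimate_ictd(x, y, max_lag=44):
    """Lag in [-max_lag, max_lag] that maximizes the circular cross-correlation."""
    n = len(x)
    X, Y = np.fft.fft(x), np.fft.fft(y)
    c = np.real(np.fft.ifft(np.conj(X) * Y))   # c[tau] = sum_n x[n] y[(n + tau) mod N]
    lags = np.arange(-max_lag, max_lag + 1)
    return lags[np.argmax(c[lags % n])]        # negative lags wrap to the end of c

def circular_shift_freq(x, tau):
    """Delay x by tau samples (circularly) using the DFT shift property."""
    n = len(x)
    f = np.arange(n)
    return np.real(np.fft.ifft(np.fft.fft(x) * np.exp(-2j * np.pi * f * tau / n)))

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = np.roll(x, 20)                  # y lags x by 20 samples
tau_hat = estimate_ictd(x, y)
y_aligned = circular_shift_freq(y, -tau_hat)   # undo the estimated shift
print(tau_hat, np.allclose(y_aligned, x))
```

The search range of ±44 lags is taken from the slide; for non-integer shifts or linear (non-circular) correlation, zero-padding and interpolation would be needed, which this sketch does not cover.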

Page 9: How to partition the bands?

[Figure: spectra X(f) of the left (L) and right (R) channels over frequency f; a band with ICC φ0 is split into two sub-bands with ICC φ1 and φ2.]

Ideally, the number of partitions = number of sources (unknown).

Fixed partitioning: independent of input
• Uniform (2, 4, 8, etc.)
• Non-uniform (e.g. ERB)

Adaptive partitioning: dependent on input
• Top-down
• Bottom-up

Adaptive partitioning is based on the inter-channel cross-correlation coefficient (ICC) φ, with two thresholds: φL, φH.

Conditions for partition:
• φ0 < φH
• max(φ1, φ2) > φ0
• min(φ1, φ2) > φL
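A top-down scheme using the three conditions could look like the sketch below. Several choices are assumptions made only for illustration: each band is split into two equal halves, the ICC is computed at zero lag as a normalized magnitude over the band's DFT bins, a minimum band width stops the recursion, and the test spectra are drawn directly in the frequency domain.

```python
import numpy as np

def icc(X_l, X_r):
    """Zero-lag inter-channel coherence of one band (simplified ICC, assumption)."""
    num = np.abs(np.vdot(X_r, X_l))
    den = np.sqrt(np.sum(np.abs(X_l)**2) * np.sum(np.abs(X_r)**2))
    return num / den if den > 0 else 0.0

def top_down(X_l, X_r, lo, hi, phi_l=0.1, phi_h=0.8, min_bins=8):
    """Return a list of (lo, hi) bands obtained by recursive halving."""
    if hi - lo < 2 * min_bins:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    phi0 = icc(X_l[lo:hi], X_r[lo:hi])
    phi1 = icc(X_l[lo:mid], X_r[lo:mid])
    phi2 = icc(X_l[mid:hi], X_r[mid:hi])
    # Split only if all three conditions from the slide hold.
    if phi0 < phi_h and max(phi1, phi2) > phi0 and min(phi1, phi2) > phi_l:
        return (top_down(X_l, X_r, lo, mid, phi_l, phi_h, min_bins)
                + top_down(X_l, X_r, mid, hi, phi_l, phi_h, min_bins))
    return [(lo, hi)]

def cnoise(rng, n):
    """Complex Gaussian noise used as a stand-in spectrum."""
    return rng.standard_normal(n) + 1j * rng.standard_normal(n)

rng = np.random.default_rng(0)
n_bins = 2048
S1 = cnoise(rng, n_bins); S1[1024:] = 0      # source 1 occupies the low half
S2 = cnoise(rng, n_bins); S2[:1024] = 0      # source 2 occupies the high half
X_l = S1 + S2 + 0.05 * cnoise(rng, n_bins)
X_r = 0.2 * S1 + 5.0 * S2 + 0.05 * cnoise(rng, n_bins)
bands = top_down(X_l, X_r, 0, n_bins)
print(bands)
```

Here the full band has a moderate ICC because the two sources are panned to opposite sides, while each half is dominated by one source and is highly coherent, so the recursion stops after one split.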

Page 10: Multiple (2) sources

Three cases for the directions of two sources:
1. At different sides (DS)
2. One at the center (C)
3. At the same side (SS)

Four ways to synthesize the source direction:
1. Amplitude panning (AP)
2. Time shifting (TS)
3. Amplitude panning and time shifting (APTS)
4. HRTF filtering (HRTF)
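The first three synthesis methods can be illustrated with a small helper. This is a sketch under assumptions: the helper name `synthesize_direction`, the circular delay, and the example values k = 3 and 20 samples are chosen for illustration only, and HRTF filtering is not covered because it requires measured filter responses.

```python
import numpy as np

def synthesize_direction(s, k=1.0, delay=0):
    """Return (p_L, p_R): the right channel is s scaled by k and circularly delayed."""
    return s, k * np.roll(s, delay)

rng = np.random.default_rng(0)
s = rng.standard_normal(1024)                      # stand-in source signal
ap = synthesize_direction(s, k=3.0)                # amplitude panning (AP)
ts = synthesize_direction(s, delay=20)             # time shifting (TS)
apts = synthesize_direction(s, k=3.0, delay=20)    # both combined (APTS)
print(ap[1][:3], ts[1][20:23], apts[1][20:23])
```

A two-source test signal is then the sum of two such pairs with different k or delay values, plus the ambient components.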

Page 11: Simulation testing: setup

• Primary components: speech, music
• Ambient components: white Gaussian noise
• Primary power ratio = 0.9
• Frame length: 4096
• Hanning window, 50% overlap

We test PCA and SPCA with different frequency partitioning:
• Time domain, full band (reference)
• Uniform partitioning with [1, 2, 4, 8, 16, 32, 64] partitions
• Non-uniform partitioning with 20 partitions (Faller, BCC [6])
• Top-down (TD) partitioning, with φL = 0.1, φH = 0.8

[6] C. Faller and F. Baumgarte, "Binaural cue coding-Part II: Schemes and applications," IEEE Trans. Speech and Audio Processing, vol. 11, no. 6, Nov. 2003.

Performance measure: error-to-signal ratio (ESR)

$$\mathrm{ESR}_L = \frac{P_{\hat{p}_L - p_L}}{P_{p_L}}, \qquad \mathrm{ESR}_R = \frac{P_{\hat{p}_R - p_R}}{P_{p_R}}, \qquad \mathrm{ESR}\,(\mathrm{dB}) = 10\log_{10}\frac{\mathrm{ESR}_L + \mathrm{ESR}_R}{2}$$
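The measure translates directly into code. In this minimal sketch the power $P$ of a signal is taken as its sum of squares (the normalization cancels in the ratio); the function name `esr_db` and the toy inputs are illustrative assumptions.

```python
import numpy as np

def esr_db(p_l_hat, p_r_hat, p_l, p_r):
    """Error-to-signal ratio in dB, averaged over the two channels."""
    esr_l = np.sum((p_l_hat - p_l)**2) / np.sum(p_l**2)
    esr_r = np.sum((p_r_hat - p_r)**2) / np.sum(p_r**2)
    return 10.0 * np.log10((esr_l + esr_r) / 2.0)

# Toy check: a 10 % amplitude error in both channels gives an ESR of 0.01 per channel.
p_l = np.ones(8)
p_r = 3.0 * np.ones(8)
value = esr_db(1.1 * p_l, 1.1 * p_r, p_l, p_r)
print(value)
```

Lower (more negative) values mean that the extracted primary component is closer to the true one.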

Page 12: Simulation results: 1 source

Primary component: speech shifted by 20 lags, panned by k = 3.

ESR in dB. T: time domain, full band; 1 to 64: number of uniform partitions; 20non: non-uniform, 20 partitions; TD: top-down.

ESR (dB)  T       1       2       4       8       16      32      64      20non   TD
PCA       -3.69   -3.72   -3.38   -3.45   -3.34   -3.32   -3.16   -3.19   -3.33   -3.72
SPCA      -14.78  -14.85  -12.34  -12.05  -11.52  -11.35  -10.63  -9.30   -10.34  -14.38

1. Generally, SPCA is better than PCA.
2. The time domain PCA (SPCA) is very close to the frequency domain PCA (SPCA) when there is only one partition.
3. Significantly worse performance is found in the frequency domain approaches with fixed partitioning.
4. The performance of the top-down partitioning is acceptable.

[Figure: ICTD versus frequency partition k (0 to 16), estimated and true values.]

[Figure: primary panning factor (PPF) versus frequency partition k (0 to 16), estimated and true values.]

Page 13: Simulation results: 2 sources

Four ways to synthesize the source direction:
1. Amplitude panning (AP)
2. Time shifting (TS)
3. Amplitude panning and time shifting (APTS)
4. HRTF filtering (HRTF)

Three cases for the directions of two sources:
1. At different sides (DS)
2. One at the center (C)
3. At the same side (SS)

Page 14: Simulation results: 2 sources, AP

ESR (dB)  T       1       2       4       8       16      32      64      20non   TD
DS PCA    -7.95   -8.10   -8.13   -8.22   -8.34   -8.56   -8.56   -9.80   -10.86
DS SPCA   -7.94   -8.15   -8.18   -8.26   -8.36   -8.61   -8.39   -9.33   -9.95   -8.36
C PCA     -10.15  -10.25  -10.14  -10.22  -10.27  -10.38  -10.34  -11.16  -11.99
C SPCA    -10.14  -10.33  -10.22  -10.24  -10.30  -10.43  -10.04  -10.29  -10.12  -10.38
SS PCA    -13.04  -13.10  -11.82  -11.88  -11.75  -11.53  -11.31  -11.40  -11.93
SS SPCA   -13.02  -13.23  -12.04  -11.81  -11.65  -11.56  -10.30  -10.24  -10.52  -13.21

The PCA rows contain only nine values in the source. They are listed left to right in source order, so the empty TD cell may not be the one actually missing.

1. Generally, the performance of SPCA and PCA is similar.
2. The performance is better when the two directions are closer.
3. The frequency domain approaches with fixed partitioning show some advantage when the primary components are not on the same side.
4. The frequency domain approach with top-down partitioning yields good performance.

Page 15: Simulation results: 2 sources, TS

ESR (dB)  T       1       2       4       8       16      32      64      20non   TD
DS PCA    -5.16   -5.21   -5.23   -5.24   -5.23   -5.23   -5.26   -5.38   -5.47   -5.21
DS SPCA   -7.98   -8.44   -8.43   -8.59   -8.69   -8.73   -8.62   -9.12   -8.58   -8.91
C PCA     -9.10   -9.14   -9.18   -9.20   -9.18   -9.17   -9.14   -9.18   -9.27   -9.14
C SPCA    -9.13   -9.60   -9.70   -9.85   -9.97   -9.85   -9.54   -9.91   -9.35   -10.05
SS PCA    -5.37   -5.38   -5.40   -5.42   -5.41   -5.42   -5.41   -5.44   -5.49   -5.38
SS SPCA   -11.15  -11.65  -11.71  -11.78  -11.94  -11.86  -11.02  -11.20  -9.42   -11.68

1. Clearly, SPCA performs better than PCA.
2. The performance of SPCA is better when neither direction is at the center.
3. The frequency domain approaches with fixed partitioning show a slight advantage, and their performance does not vary much across different partitionings.
4. The frequency domain approach with top-down partitioning yields the best overall performance.

Page 16: Simulation results: 2 sources, APTS

ESR (dB)  T       1       2       4       8       16      32      64      20non   TD
DS PCA    -5.16   -5.21   -5.23   -5.23   -5.23   -5.23   -5.26   -5.38   -5.47
DS SPCA   -7.99   -8.44   -8.43   -8.59   -8.69   -8.73   -8.62   -9.12   -8.58   -8.91
C PCA     -8.06   -8.28   -8.19   -8.27   -8.34   -8.46   -8.44   -9.04   -9.55   -8.28
C SPCA    -8.07   -8.43   -8.38   -8.40   -8.57   -8.63   -8.44   -8.70   -9.07   -8.68
SS PCA    -4.18   -4.18   -3.95   -3.97   -3.91   -3.92   -3.89   -3.87   -3.98   -4.19
SS SPCA   -10.16  -10.60  -9.89   -9.75   -9.80   -9.77   -9.07   -8.68   -7.29   -10.82

The DS PCA row contains only nine values in the source. They are listed left to right in source order, so the empty TD cell may not be the one actually missing.

1. Clearly, SPCA performs better than PCA.
2. The performance is better when the two directions are closer.
3. The frequency domain approaches with fixed partitioning perform better when the two directions are not on the same side.
4. The frequency domain approach with top-down partitioning yields the best overall performance for all cases.

Page 17: Simulation results: 2 sources, APTS

ESR in dB for each partitioning scheme (rows) and source-direction case (columns). Rows labeled with a pair of numbers give the top-down thresholds (φL, φH); the label "TD18" is kept as it appears in the source.

ESR (dB)    DS PCA   DS SPCA  C PCA    C SPCA   SS PCA   SS SPCA
T           -4.74    -6.45    -8.06    -8.07    -4.18    -10.16
1           -5.04    -6.85    -8.28    -8.43    -4.18    -10.60
2           -5.04    -6.85    -8.19    -8.38    -3.95    -9.89
8           -5.22    -7.11    -8.34    -8.57    -3.91    -9.80
32          -5.48    -7.25    -8.44    -8.44    -3.89    -9.07
20non       -6.85    -7.73    -9.55    -9.07    -3.98    -7.29
TD18        -5.03    -6.85    -8.28    -8.68    -4.19    -10.82
0.05,0.8    -5.03    -8.36    -8.44    -9.06    -4.19    -10.11
0.2,0.8     -5.03    -6.85    -8.28    -8.44    -4.19    -10.6
0.2,0.7     -5.03    -6.85    -8.28    -8.44    -4.19    -10.6
0.1,0.7     -5.03    -6.85    -8.28    -8.68    -4.19    -10.6
0.05,0.7    -5.03    -7.93    -8.44    -8.58    -4.19    -10.41
0.05,0.9    -5.03    -8.58    -8.44    -9.93    -4.19    -8.53
0.1,0.9     -5.03    -6.85    -8.27    -8.6     -4.19    -10.27
0.2,0.9     -5.03    -6.85    -8.27    -8.44    -4.19    -10.6

1. Clearly, SPCA performs better than PCA.
2. The performance is better when the two directions are closer.
3. The frequency domain approaches with fixed partitioning perform better when the two directions are not on the same side.
4. The frequency domain approach with top-down partitioning yields the best overall performance for all cases.

Page 18: Simulation results: 2 sources, APTS

ESR in dB for each partitioning scheme (rows) and source-direction case (columns). Rows labeled with a pair of numbers give the top-down thresholds (φL, φH); the label "TD18" is kept as it appears in the source. The pair 0.05,0.7 appears twice in the source header.

ESR (dB)    DS PCA   DS SPCA  C PCA    C SPCA   SS PCA   SS SPCA
T           -4.74    -6.45    -8.06    -8.07    -4.18    -10.16
2           -5.04    -6.85    -8.19    -8.38    -3.95    -9.89
8           -5.22    -7.11    -8.34    -8.57    -3.91    -9.80
32          -5.48    -7.25    -8.44    -8.44    -3.89    -9.07
20non       -6.85    -7.73    -9.55    -9.07    -3.98    -7.29
0.05,0.7    -5.03    -7.93    -8.44    -8.58    -4.19    -10.41
TD18        -5.03    -6.85    -8.28    -8.68    -4.19    -10.82
0.05,0.8    -5.03    -8.36    -8.44    -9.06    -4.19    -10.11
0.2,0.8     -5.03    -6.85    -8.28    -8.44    -4.19    -10.6
0.2,0.7     -5.03    -6.85    -8.28    -8.44    -4.19    -10.6
0.1,0.7     -5.03    -6.85    -8.28    -8.68    -4.19    -10.6
0.05,0.7    -5.03    -7.93    -8.44    -8.58    -4.19    -10.41
0.05,0.9    -5.03    -8.58    -8.44    -9.93    -4.19    -8.53
0.1,0.9     -5.03    -6.85    -8.27    -8.6     -4.19    -10.27
0.2,0.9     -5.03    -6.85    -8.27    -8.44    -4.19    -10.6

1. Clearly, SPCA performs better than PCA.
2. The performance is better when the two directions are closer.
3. The frequency domain approaches with fixed partitioning perform better when the two directions are not on the same side.
4. The frequency domain approach with top-down partitioning yields the best overall performance for all cases.

Page 19: Simulation results: 2 sources, HRTF

ESR (dB)  T       1       2       4       8       16      32      64      20non   TD
DS PCA    -3.28   -3.46   -3.46   -3.44   -3.50   -3.74   -3.77   -3.89   -3.85   -2.47
DS SPCA   -6.49   -6.07   -6.08   -6.13   -6.42   -7.33   -5.70   -5.56   -5.71   -6.52
C PCA     -6.96   -7.16   -7.09   -7.12   -7.21   -7.37   -7.50   -7.60   -7.65   -7.16
C SPCA    -7.41   -7.97   -7.89   -7.92   -7.87   -8.33   -6.97   -6.15   -7.17   -8
SS PCA    -1.14   -1.26   -1.14   -1.15   -1.12   -1.06   -1.04   -1.07   -1.19   -1.88
SS SPCA   -6.58   -6.81   -6.48   -6.48   -6.70   -7.44   -2.51   -2.44   -2.90   -6.98

1. Clearly, SPCA performs better than PCA.
2. The performance of SPCA is better when one direction is at the center.
3. The frequency domain approaches with fixed partitioning show better performance only for some partitionings.
4. The frequency domain approach with top-down partitioning yields the best overall performance for all cases.

Page 20: Summary of simulation results: 2 sources

1. The overall performance of PAE: AP > TS > APTS > HRTF.
2. Generally, SPCA performs better than PCA in almost all cases.
3. The performance varies as the directions of the sources change.
4. The frequency domain approaches with fixed partitioning do not always give better performance.
5. The frequency domain approach with top-down partitioning yields the best overall performance in most of the cases.

How about perceptual testing? Usually only one source (speech) is extracted better, because the spectrum of speech (as compared to music) is more concentrated in certain bands.

Page 21: Conclusions and thoughts

1. A study of frequency domain PAE with different partitioning is conducted. It targets multiple primary components that arrive from different directions concurrently. The two PAE approaches tested are PCA and shifted PCA (SPCA).
2. The partitioning of the frequency bands requires careful consideration.
3. Generally, SPCA outperforms PCA in almost all cases.
4. The performance of PAE in the frequency domain with fixed partitioning is not consistent as the directions of the sources and the number of partitions change.
5. The frequency domain approach with top-down partitioning yields promising results in most of the cases.

Thoughts:
• A more robust partitioning is required. How should the thresholds be determined for other input signals?
• More accurate estimation of the primary panning factor and ICTD/ICC is needed.
• How about other PAE approaches, such as least squares?