Estimation of Violin Bowing Features from Audio...



Estimation of Violin Bowing Features from Audio Recordings with Convolutional Networks

Alfonso Perez-Carrillo
Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
[email protected]

Hendrik Purwins
The School of Engineering and Science, Aalborg University Copenhagen, Copenhagen, DK
[email protected]

ML4Audio

The measurement (or direct acquisition) of musical gestures usually requires expensive sensing systems and complex setups that are generally intrusive in practice. In this work, we present an indirect acquisition method that estimates violin bowing controls from audio signal analysis. It is based on training Convolutional Neural Networks with a previously recorded database of multimodal data (bowing controls and sound features) of violin performances.

Inputs & Outputs

[Figure: sound and bowing signals. The sound is decomposed with a Sinusoidal Model (SMS) into harmonic and residual components. Panels: harmonic/residual spectrum with triangular analysis windows (axis: Frequency [Hz]); harmonic and residual band energies (band axis ticks 10 to 40) over samples.]

Outputs: Bowing Controls (measured with sensors): which string, bowing pressure, bowing speed, bow-bridge distance.

Inputs: Auditory EnergyGram. Energy in 40 harmonic + 40 residual frequency bands, computed with triangular analysis windows with logarithmic band centers and 50% overlap.
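As a rough illustration of the band-energy input features, the sketch below builds a bank of triangular windows with logarithmically spaced centers, where each window runs from the previous center to the next one so that neighbours overlap by half. The poster does not give the frequency range, sample rate, or FFT size, so `f_min`, `f_max`, `sample_rate`, `n_fft`, and both function names are assumptions chosen for illustration. The SMS harmonic/residual separation itself is not reproduced; random spectra stand in for its two outputs.

```python
import numpy as np

def log_triangular_filterbank(n_bands=40, n_fft=2048, sample_rate=44100,
                              f_min=100.0, f_max=10000.0):
    """Triangular windows with log-spaced centers (all parameters assumed)."""
    # n_bands centers plus one extra edge on each side, log-spaced in Hz.
    edges = np.geomspace(f_min, f_max, n_bands + 2)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)  # FFT bin frequencies
    bank = np.zeros((n_bands, freqs.size))
    for b in range(n_bands):
        lo, center, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (center - lo)    # 0 at lo, 1 at center
        falling = (hi - freqs) / (hi - center)   # 1 at center, 0 at hi
        bank[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def band_energies(magnitude_spectrum, bank):
    """Energy per band: triangular-weighted sum of the power spectrum."""
    return bank @ (magnitude_spectrum ** 2)

# Stand-in spectra for one analysis frame (placeholders, not real SMS output).
rng = np.random.default_rng(0)
bank = log_triangular_filterbank()
harmonic_mag = rng.random(bank.shape[1])
residual_mag = rng.random(bank.shape[1])

# 40 harmonic + 40 residual band energies for this frame.
frame_features = np.concatenate([
    band_energies(harmonic_mag, bank),
    band_energies(residual_mag, bank),
])
print(frame_features.shape)  # (80,)
```

Stacking such 80-value frames over time would give a two-dimensional energy image of the kind the poster calls an EnergyGram; how many frames form one network input is not stated here.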

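Network Architecture

[Figure: network diagram. Recoverable labels: input axes 40 and 18; three convolutional stages with kernels 2x2x1x9, 2x2x9x9, and 2x2x9x9, each drawn with 2 x 2 blocks and 9 feature maps; a final 9 x 3 x 2 feature volume that is flattened; a fully connected layer (100); a second fully connected layer; output: bow control. Remaining numeric labels (20, 5, 3, 25, "x + 100", "x + 18") cannot be assigned to specific layers from the extracted text.]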
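The sizes labelled in the diagram can be checked with a few lines of arithmetic. The sketch below assumes the kernel labels read as (height, width, input channels, output channels) and that the flattened 9 x 3 x 2 volume feeds the 100-unit fully connected layer; neither reading is confirmed by the poster, and bias terms are left out.

```python
import math

# Kernel labels from the diagram, read as (height, width, in_ch, out_ch).
# This axis order is an assumption; the products do not depend on it.
conv_kernels = [(2, 2, 1, 9), (2, 2, 9, 9), (2, 2, 9, 9)]
conv_weights = [math.prod(k) for k in conv_kernels]

# Final feature volume labelled "9 x 3 x 2" before the flatten step.
flat_size = math.prod((9, 3, 2))

# Assumption: the flattened vector is the input of the 100-unit layer.
fc1_weights = flat_size * 100

print(conv_weights)  # [36, 324, 324]
print(flat_size)     # 54
print(fc1_weights)   # 5400
```

Under these assumptions the convolutional stages are very small (684 weights in total), and most parameters sit in the first fully connected layer.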

Evaluation

Correlation Coefficient
Mean Absolute Error: avg. error in parameter units
Relative Absolute Error: unit-less avg. error percentage
Root Relative Squared Error: similar to RAE but weights outliers more heavily due to the square
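The four evaluation measures can be sketched in NumPy as follows. The poster names the measures but does not give formulas, so the definitions here follow common usage, with RAE and RRSE normalised by the error of always predicting the mean of the measured values. The function name `evaluate` and the toy data are illustrative assumptions.

```python
import numpy as np

def evaluate(predicted, measured):
    """Correlation, MAE, RAE, and RRSE under commonly used definitions."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    error = predicted - measured
    # Reference predictor: always output the mean of the measured values.
    baseline = measured - measured.mean()
    return {
        "cc": np.corrcoef(predicted, measured)[0, 1],
        "mae": np.mean(np.abs(error)),                       # parameter units
        "rae": np.sum(np.abs(error)) / np.sum(np.abs(baseline)),
        "rrse": np.sqrt(np.sum(error ** 2) / np.sum(baseline ** 2)),
    }

# Toy values standing in for one bowing control (not data from the poster).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
scores = evaluate(predicted, measured)
print(scores)
```

RAE and RRSE are returned as fractions; multiply by 100 to report them as percentages. A value of 1.0 means the estimate is no better than predicting the mean.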