

Groundwork for a Resource in Computational Hearing for Extended String Techniques

Amy V. Beeston¹ and Mark A. C. Summers²
¹Department of Computer Science, ²Department of Music, University of Sheffield, UK
{a.beeston, m.summers}@sheffield.ac.uk

Introduction
• Extended techniques (ETs) are now mainstream in contemporary music.
• ETs are variable by nature and can be problematic to perform consistently.
• Datasets addressing variability in instrumental performance are rare.
• A fruitful integration of acoustic instrument and signal processing technology is desirable [cf. 1–5].
• Human listeners are sensitive to context; we adapt to our environment and perceive ‘interesting’ variation in a signal [6–7].
• Machine listeners often rely on cues that vary unintentionally.
• We examine variability in recording strategy and performance iteration in order to enhance machine listening for live instrumental performance.

1. M Parker (2007). Proc. Verband Deutscher Tonmeister Symposium
2. MW Young (2007). Proc. ICMC, 508-511
3. D Van Nort, J Braasch & P Oliveros (2009). Proc. SMC, 131-135
4. W Hsu (2010). Leonardo Music J. 20, 33-39
5. PA Tremblay & D Schwarz (2010). Proc. NIME, 15-18
6. AJ Watkins (2005). J. Acoust. Soc. Am. 118 (1), 249-262
7. CE Stilp, JM Alexander, M Kiefte & KR Kluender (2010). Atten. Percept. Psycho. 72 (2), 470-480
8. P Strange & A Strange (2001). The contemporary violin: extended performance techniques. University of California Press, Berkeley
9. B Turetzky (1989). The contemporary contrabass. 2nd ed. University of California Press, Berkeley
10. Praat, http://www.praat.org
11. G Peeters, BL Giordano, P Susini, N Misdariis & S McAdams (2011). J. Acoust. Soc. Am. 130 (5), 2902-2916
12. A Francis (2004). Business mathematics and statistics. 6th ed., 150-155. Thomson Learning, London
13. JM Grey & JW Gordon (1978). J. Acoust. Soc. Am. 63 (5), 1493-1500

Methods  

1. Selection of techniques
• Small number of ETs for viola da gamba selected from survey [cf. 8–9].
• Fixed pitch (A3), loudness (RMS) and duration (2 seconds).
• Sound produced on 6 strings, with 5 bowing techniques.
• Click track and notated score to aid performance.
• 6 iterations (repetitions) of each technique.

2. Selection of microphones
• Recordings were made at The University of Sheffield Sound Studios in an acoustically isolated room (volume 34.7 m³).
• Three ‘close’ microphones and one ‘far’ microphone were used.
• Signals recorded to control room via RME Fireface 800 audio interface.


2 Methods

In the pilot study described below, a prototype corpus was used to examine the variation naturally arising in normal and extended performance techniques due to (i) the recording conditions and (ii) iteration of the technique by the performer. This section describes four main operations undertaken to gather data appropriate to the task: selection of performance techniques; selection of microphones and their placement; sample extraction and storage; automatic annotation with timbral descriptors.

2.1 Selection of Techniques

The current study draws its sound material from an ongoing project documenting the sound world of the viola da gamba. An instrument-specific list of techniques (normal and extended) has been compiled, informed by the performing background of one of the present authors (MS) with cross-reference to other surveys of extended techniques on string instruments [7], [8], [17]. A list of 90 individual techniques serves as the basis for the corpus.

A small number of these techniques have been picked for illustrative analyses in Section 3. Firstly, we fix the pitch, loudness and duration (as in typical timbre studies), and examine bowing this pitch normally on six different strings. Secondly, we use a single string to examine the effect of different bowing techniques.

2.2 Selection of Microphones and their Placement

Recordings were made in an acoustically isolated room in the University of Sheffield Sound Studios (volume 34.7 m³). Two walls were covered with heavy felt curtains, and there was an upright piano on another wall. The player sat in one corner pointing diagonally towards a ‘far’ room microphone at a distance of 3.6 meters. Three further ‘close’ microphones were placed on or near the instrument as described in Table 1.

The signal arriving at each microphone was recorded via an RME Fireface 800 audio interface connected to a MacBook in an adjoining control studio, running Audacity software [19]. Two DPA microphones were directly attached to the instrument itself, and represent the highest signal-to-noise ratio practicably

Table 1. Description of microphones selected, their directional characteristics and placement in regard to the instrument and room.

Microphone      Direction  Proximity  Placement
DPA 4060        omni       close      below bridge, under highest (1st) string
DPA 4060        omni       close      below bridge, under middle (4th) string
Neumann KM184   cardioid   close      0.1 m in front of instrument’s bridge
Neumann KM184   cardioid   far        3.6 m distant to front, raised 1.8 m

3. Sample extraction
• Individual samples extracted from the long audio recordings.
• 24 audio files for each technique (6 iterations × 4 microphones).
• Two-stage process of segmentation:
   1. Start/stop times of bow movement marked in Praat TextGrid [10].
   2. TextGrid read in Matlab to excise samples and equalise RMS level.
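The second segmentation stage (excising a sample and equalising its RMS level) can be illustrated with a minimal Python sketch. This is not the authors' Matlab code: the function name `excise_and_equalise`, the `target_rms` value of 0.1, and the synthetic 220 Hz test tone are all assumptions made for illustration, and the start/stop times are assumed to have been read from an annotation already.

```python
import numpy as np

def excise_and_equalise(signal, sample_rate, start_s, stop_s, target_rms=0.1):
    """Cut [start_s, stop_s) from a long recording and scale it to target_rms.

    target_rms is a hypothetical reference level, not a value from the study.
    """
    start = int(round(start_s * sample_rate))
    stop = int(round(stop_s * sample_rate))
    segment = np.asarray(signal[start:stop], dtype=float)
    if segment.size == 0:
        return segment  # nothing to scale
    rms = np.sqrt(np.mean(segment ** 2))
    if rms == 0.0:
        return segment  # silent segment: leave untouched
    return segment * (target_rms / rms)

# Synthetic stand-in for a long recording: 5 s of a 220 Hz tone at 44.1 kHz.
sample_rate = 44100
t = np.arange(5 * sample_rate) / sample_rate
recording = 0.5 * np.sin(2 * np.pi * 220.0 * t)

# Excise a 2-second sample, as if its bow start/stop times were 1.0 s and 3.0 s.
sample = excise_and_equalise(recording, sample_rate, start_s=1.0, stop_s=3.0)
sample_rms = np.sqrt(np.mean(sample ** 2))
```

Scaling every excised sample to the same RMS level removes overall level differences between microphones and iterations before any timbral comparison is made.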

4. Timbral annotation
• Automatic annotation achieved using the Timbre Toolbox [11].
• To match human audition, we reason that the best parameters should capture the ‘interesting’ variation.
• Thus a small variance is desired for unimportant changes in recording strategy and for unintended changes in performance repetition.
• Peeters et al. stress importance of parameters capturing the central tendency and temporal variability of spectro-temporal properties, the temporal envelope and the periodicity of the signal [11].
• We inspect variation according to the first and last of these, using spectral centroid and spectral flatness measures.

5. Variability measure
• Human variability was measured with a relative, dimensionless measure [12]: the quartile coefficient of dispersion (QCD).
• First, the median and inter-quartile range (iqr = Q3 – Q1) were derived for individual audio samples by time-varying, frame-based analysis methods [11].
• QCD quantifies quartile deviation (iqr/2) as a percentage of the median:

   QCD = (iqr / 2) × (100 / median)

• Stable parameters result in low QCD values (close to zero).
• A high QCD value implies a high degree of variability.
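As a worked illustration of the QCD definition, the following Python sketch computes it from a sequence of frame-wise descriptor values. The two input sequences are invented for the example and are not data from the study; the quartiles use NumPy's default linear interpolation, which is an assumption, since quartile conventions differ between tools.

```python
import numpy as np

def qcd(frame_values):
    """Quartile coefficient of dispersion: (iqr / 2) as a percentage of the median."""
    values = np.asarray(frame_values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    return (iqr / 2.0) * (100.0 / median)

# Hypothetical frame-wise descriptor values for two samples.
steady = [500.0, 505.0, 495.0, 502.0, 498.0]      # small spread around the median
unstable = [300.0, 900.0, 450.0, 1200.0, 600.0]   # large spread around the median

qcd_steady = qcd(steady)      # iqr = 4, median = 500, so QCD = 0.4
qcd_unstable = qcd(unstable)  # iqr = 450, median = 600, so QCD = 37.5
```

Because the quartile deviation is divided by the median, the result is dimensionless, so descriptors with different units or ranges can be compared on the same scale.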

1. Recording strategy
• The open string (2) showed a high centre of gravity or ‘brightness’, especially when recorded by the DPAs.
• The three ‘close’ microphones recorded consistently lower values than the ‘far’ microphone for the stopped strings (3–7).

2. Performance iteration
• Extended techniques were unstable and resulted in higher QCD scores.
• The standard bowing techniques were more consistent throughout the duration of the sound and achieved lower QCD values.

Discussion
• Much work has been done in recent years to extract control parameters from audio signals in live performance; however, signal variability arising from recording strategy and performance iteration is typically unreported.
• Two perceptually-correlated parameters were used to quantify variation attributable to the microphone setup and to human reproducibility for a range of normal and extended performance techniques.
• ETs were found to contain more inherent variation than normal bowing.

Illustrations

[Figure: spectral centroid median (Hz) against string number (2 to 7) for four microphones: DPA string 1, DPA string 4, Neumann close, Neumann far.]

Mean and standard error of the spectral centroid median of the Short-Term Fourier Transform (STFT) power spectrum for six iterations of standard bowing of the pitch A3 on strings 2 to 7.
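The summary statistics named in this caption (mean and standard error over six iterations) can be sketched as follows. The six centroid medians are hypothetical values chosen for the example, not measurements from the study, and the use of the sample standard deviation (`ddof=1`) is an assumption about how the standard error was formed.

```python
import numpy as np

def mean_and_standard_error(values):
    """Return the mean and the standard error of the mean for a set of values."""
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    # Sample standard deviation (ddof=1) divided by sqrt(n).
    sem = values.std(ddof=1) / np.sqrt(values.size)
    return mean, sem

# Hypothetical spectral centroid medians (Hz) for six iterations of one
# technique recorded on one microphone.
centroid_medians_hz = [410.0, 395.0, 402.0, 418.0, 399.0, 406.0]

mean_hz, sem_hz = mean_and_standard_error(centroid_medians_hz)
```

One such pair per string and microphone would give the points and error bars of a plot like the one described.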

[Figure: spectral flatness QCD (axis range 0 to 40) against technique (01 to 10).]

Technique labels:
01 = str 2, colegtratnot
02 = str 2, alf
03 = str 2, sulpont
04 = str 2, sultast
05 = str 5, bow
06 = str 2, bow
07 = str 3, bow
08 = str 4, bow
09 = str 6, bow
10 = str 7, bow

Ten versions of the pitch A3 ranked according to the QCD derived from the spectral flatness of the STFT power spectrum. Mean and standard error incorporate the three close microphone positions that might be used in performance.


Workflow  

Timbral parameters
• Spectral centroid: indicates centre of mass; correlates with brightness [13].
• Spectral flatness: 1 if noisy (flat spectra); 0 if tonal (peaky spectra) [11].
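Both descriptors can be sketched for a single analysis frame in Python. This is a textbook formulation under stated assumptions (Hann window, 2048-sample frame, flatness as the ratio of geometric to arithmetic mean of the power spectrum) and is not the Timbre Toolbox implementation; the function names and the test signals are illustrative only.

```python
import numpy as np

def power_spectrum(frame, sample_rate):
    """Return bin frequencies (Hz) and the power spectrum of one Hann-windowed frame."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hanning(frame.size)
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(frame.size, d=1.0 / sample_rate)
    return freqs, power

def spectral_centroid(freqs, power):
    """Power-weighted mean frequency: the spectrum's centre of mass in Hz."""
    return np.sum(freqs * power) / np.sum(power)

def spectral_flatness(power, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum (0 to 1)."""
    power = power + eps  # avoid log(0) in empty bins
    geometric_mean = np.exp(np.mean(np.log(power)))
    return geometric_mean / np.mean(power)

# Two synthetic frames: a pure 220 Hz tone and white noise.
sample_rate = 44100
n = 2048
t = np.arange(n) / sample_rate
tone = np.sin(2 * np.pi * 220.0 * t)
noise = np.random.default_rng(0).standard_normal(n)

freqs, tone_power = power_spectrum(tone, sample_rate)
_, noise_power = power_spectrum(noise, sample_rate)

tone_centroid = spectral_centroid(freqs, tone_power)  # close to 220 Hz
tone_flatness = spectral_flatness(tone_power)         # close to 0 (peaky spectrum)
noise_flatness = spectral_flatness(noise_power)       # much higher (flat spectrum)
```

Note that a single noisy frame does not reach a flatness of exactly 1 in this formulation, because its individual bins fluctuate; the value is nonetheless far above that of the tonal frame.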