IJCNN'99. International Joint Conference on Neural Networks, Washington, DC, USA (10-16 July 1999). Proceedings (Cat. No. 99CH36339)

Connectionist System for Music Interpretation

Alejandro Pazos, A. Santos del Riego, Julián Dorado and J. J. Romero Cardalda. Computer Science Dept., University of A Coruña

15071 A Coruña, Spain

Emails: [email protected], [email protected], [email protected], [email protected]. Phone: (34) 981-167000, ext. 1302

Abstract This paper applies an adaptive technique, Artificial Neural Networks (hereinafter ANNs), to musical interpretation, solving the problems shown by traditional approaches. Human rhythmic perception is studied through several experiments, the suitability of ANNs for solving this kind of problem is analyzed against other approaches, and a real-time connectionist model is presented together with its integration with the musician. This model uses relative inputs and temporal windows for information representation, and it can be used in collaboration with human musicians or independently.

Introduction The world of computer music is as old as the dream of creating a machine. The very mathematical origin of musical theory in the Greek world could make us suppose that a machine capable of remarkable mathematical treatment would also enable the creation of musical works. However, the traditional approaches of computer science in general [3] and Artificial Intelligence [10] in particular do not offer valid solutions in a world such as the musical one, characterized by its subjective and irrational character [11][21] and by a kind of learning based on examples.

This paper introduces research focused on one of the most characteristic and irrational parts of music: rhythmic interpretation. To carry out this research, a series of experiments was conducted with the purpose of clarifying the perception of rhythmic variation in musical performance. From these experiments, a model based on ANNs has been implemented which predicts these rhythmic variations in real time.

Objectives This research has two main objectives. On the one hand, from a scientific point of view, it intends to deepen the knowledge about music interpretation through its modeling via adaptive techniques. Besides, the study of complex cognitive tasks such as those related to music may inspire the creation of new adaptive models or variants of the present ones.

On the other hand, from an engineering point of view, this research intends to implement artificial interpretation systems which can be used in collaboration with musicians or which can make music software more fluent.

Tempo Tracking Two important concepts can be distinguished within western music: the musical theme and the musical performance. A musical theme constitutes an abstract conceptualization which can be represented by notation or language. Music scores are a typical example of this kind of representation; here, time is represented in a hierarchical organization, so that a fixed interval and location in the theme can be assigned to every note or musical event.

Musical performance is the interpretation of a musical theme. Each time a musical theme is interpreted by a musician (or group of musicians), a new performance is produced. If exact measurements of the notes played during a musical performance are made, it can be seen that their intervals do not match the reference of the musical theme; rather, they vary slightly. In the present study we propose the development of a system for the prediction of these variations. We use the gestures made by musicians and listeners while performing or listening to a musical performance. These gestures, or taps, establish a reliable and universal hint of the rhythmic perception that the musician has during the performance. The information is captured as Time Between Taps (TBT), expressing the time interval of each beat.

Related Research There are some approaches to musical interpretation, but they are based on traditional Artificial Intelligence paradigms such as problem-solving methods [4] or mathematical models [1]. The main problem of these

0-7803-5529-6/99/$10.00 ©1999 IEEE


models is their difficulty in adapting to particular circumstances, which results in overly general interpretation models.

Other works deal with the study of rhythm from the viewpoint of its structure, including the possibility of using real performances. However, these studies do not include the processing or understanding of the performances; they just add a time margin to the perception of music notes [7][9].

There are no works that, as Desain [6] suggests, face the problem of tempo tracking independently, so that they can be integrated into real-time situations and adapt to the circumstances of each performance.

Domain Study A number of experiments have been carried out with the purpose of extending the knowledge about music interpretation [2][5][17]. These experiments measured the errors made in different actions carried out by musicians of different skill levels. For example, one of these experiments was aimed at predicting the time of a sound produced by a metronome. The errors present an average of 50 milliseconds, with minimum and maximum errors of 25 and 109 milliseconds respectively. In this study, the importance of experience was proved at different levels: global, in the coordination of several musicians, and particular, at the level of the musical theme. Experienced musicians with previous experience with both the partner and the musical theme obtained better results in the experiments. The importance of visual and gestural information was also proved; this is a factor of great complexity when we try to include it in a computer system.

Applicability Study One of the main characteristics that a system for music interpretation must possess is the capability of real-time work [3]. ANNs solve this problem adequately, particularly the model used in the present research. Besides, as was said in the introduction, the characteristics of musical information make it impossible to formalize, especially in the case of rhythmic interpretation. Thus, the alternative of using a knowledge-based system is not available in this case [14].

Rhythmic variations present a continuous character; therefore, their use in a symbolic system is complex. Meanwhile, ANNs, due to their nature [8][12][13][22], make it possible to handle this type of data and to work with a high level of error, as in the case of human performances, where coordination and mechanical errors occur frequently.

Finally, it is important to take into account the characteristics observed in the initial experimentation, which stress the importance of experience and of adaptation to the circumstances of each performance. These conclusions agree with the type of music learning based on the use of examples. These characteristics can only be dealt with by using adaptive techniques such as connectionist ones.

Methods The musical information used in this work is introduced in the form of pulsations of a MIDI pedal made by the musicians. The TBTs that mark the duration between two consecutive pulsations are established from these pulses.
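As a minimal sketch (function and variable names are illustrative, not the authors' code, and the timestamps are made up), the TBTs can be derived from the pedal pulse timestamps as consecutive differences:

```python
# Sketch: deriving TBTs from MIDI pedal pulse timestamps.
# Timestamps are in milliseconds; names are illustrative.

def tbts_from_pulses(pulse_times_ms):
    """Return the Time Between Taps for consecutive pedal pulsations."""
    return [b - a for a, b in zip(pulse_times_ms, pulse_times_ms[1:])]

# Example: a performance that slows down slightly over four beats.
pulses = [0, 500, 1010, 1535]
print(tbts_from_pulses(pulses))  # [500, 510, 525]
```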

In this research we had to choose between absolute and relative magnitudes. The absolute magnitudes used in other research on ANNs in computer music [20] posed, in the present work, the problem of covering the tempo range of western music, which goes from 20 to 200 quarter notes per minute. Moreover, within a musical theme, depending on the theme's global tempo, a small variation may be very significant.

Therefore, in this research relative magnitudes are used in the inner representation of the pulses (TBTs). The inputs of the ANN represent accelerations of the current TBT with regard to its closest environment. This makes quick calculations possible and uses information from the environment. Moreover, the relation between variations in two performances played at different but similar global tempos can be observed directly, thanks to the information model used. The ANN inputs are calculated with the following equation:

I_i = (t_i - t_{i-1}) / t_{i-1}

where I_i is the input to processing element i and t_i is the i-th TBT.

The limit for the variation produced between consecutive pulses was set at a 20% increase or decrease. This was established after a series of experiments, part of the initial experimentation, in which different artificial variations were introduced and the responses of a group of musicians were measured.



The ANN of the Connectionist System for Music Interpretation (hereinafter CSMI) uses a temporal window that includes the latest accelerations and decelerations produced during the musician's performance. The final aim of the CSMI is to predict the next acceleration/deceleration from these data. For this reason, the output of the CSMI represents this value with a single processing element. The size of the temporal window used in the input was determined experimentally as seven accelerations/decelerations, each of them represented by a processing element in the input layer of the ANN.
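The windowing step can be sketched as follows; the function name and the sample acceleration values are illustrative, not taken from the paper.

```python
# Sketch: turning a stream of accelerations into training patterns.
# A temporal window of 7 values predicts the next acceleration.

WINDOW = 7  # size of the input window reported in the paper

def make_patterns(accels, window=WINDOW):
    """Return (inputs, target) pairs: the last `window` accelerations
    predict the following one."""
    patterns = []
    for i in range(len(accels) - window):
        patterns.append((accels[i:i + window], accels[i + window]))
    return patterns

accels = [0.01, -0.02, 0.00, 0.03, -0.01, 0.02, 0.00, -0.03, 0.01]
pats = make_patterns(accels)
print(len(pats))  # 2 patterns from 9 accelerations with a window of 7
```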

The ANN of the CSMI is a multilayer feedforward network. The interconnection between the processing elements is complete, with seven elements in the input layer, five in the hidden layer and one in the output layer. The training file is made up of 330 patterns, obtained by capturing the pulsations on a pedal executed by 6 musicians playing one theme. The Delta Normal rule, run for 16 epochs, has been used for the learning process.
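A hedged sketch of a network with this 7-5-1 shape, trained with plain backpropagation as a stand-in for the Delta Normal rule; the synthetic data and the toy target below are assumptions for illustration, not the paper's 330-pattern training file.

```python
# Sketch: a 7-5-1 feedforward network trained by gradient descent.
# Backpropagation stands in for the paper's Delta Normal rule.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(7, 5))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(5, 1))   # hidden -> output weights

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    h = sigmoid(x @ W1)
    return h, h @ W2                       # linear output unit

# Synthetic patterns: 7 recent accelerations -> next acceleration.
X = rng.uniform(-0.2, 0.2, size=(330, 7))
y = X.mean(axis=1, keepdims=True)          # toy target for illustration

lr = 0.05
for epoch in range(16):                    # 16 epochs, as in the paper
    h, out = forward(X)
    err = out - y
    grad_W2 = h.T @ err / len(X)
    grad_W1 = X.T @ ((err @ W2.T) * h * (1 - h)) / len(X)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

rms = float(np.sqrt(np.mean((forward(X)[1] - y) ** 2)))
print(f"training RMS error: {rms:.3f}")
```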

Conclusions The CSMI has proved to be efficient for the proposed task: prediction of rhythm in real time. This was assessed by observing the degree of convergence obtained in several cases, with an RMS error of 0.065. The CSMI results with another theme were worse than those obtained with the original one. This is normal, taking into account the importance of experience shown in the domain study.

The model can work in collaboration with a musician, in which case the reference is the pulses executed by the musician, or independently. In the latter case it feeds back on itself, so that its output is used as an ANN input.
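The autonomous feedback mode can be sketched like this, with a placeholder standing in for the trained ANN; all names, and the conversion of a predicted acceleration back into a TBT, are illustrative assumptions.

```python
# Sketch of the autonomous (feedback) mode: each predicted acceleration
# is converted back into a TBT and also pushed into the input window.

def predict(window):
    # placeholder model: predicts the mean of the recent accelerations
    return sum(window) / len(window)

def run_autonomously(seed_window, last_tbt, steps):
    """Generate `steps` beats, feeding each output back as an input."""
    window = list(seed_window)
    tbts = []
    for _ in range(steps):
        accel = predict(window[-7:])         # 7-value temporal window
        last_tbt = last_tbt * (1.0 + accel)  # invert I = (t - t_prev)/t_prev
        tbts.append(last_tbt)
        window.append(accel)                 # feedback: output -> next input
    return tbts

beats = run_autonomously([0.0] * 7, last_tbt=500.0, steps=4)
print(beats)  # a zero seed keeps the tempo constant at 500.0 ms
```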

Future Work In order to improve this functionality, the possibility of including other parameters in the training of the system is proposed (density, tonal information, velocity, harmony, etc.), with the aim of obtaining a better behaviour of the system on different themes.

Data acquisition poses an important problem in this kind of system. In order to make it easier, a version of the system that can be used over the Internet is being developed. This will facilitate the acquisition of musical information worldwide in a simple way.

We are also planning the possibility of integrating this tool into a system for artificial music composition, implementing a complete artificial musician. This system, which will integrate Genetic Algorithms and ANNs for the composition of musical themes, is being developed [15][16][18][19].

Acknowledgments The authors would like to thank Eugenia Camera Fornoso, Ana Belén Porto Pazos, José Luis Rodríguez Carballal, Manuel Riveiro Henno and Eva Celeiro Loureda for collaborating in the composition and translation of this article. This research was partly supported by grants from CICYT (TEL98-0291).

References [1] Canazza, S. & Rodà, A. 1999. "Adding Expressiveness in Musical Performance in Real Time". In Proceedings of the AISB'99 Symposium on Musical Creativity. Edinburgh: The Society for the Study of Artificial Intelligence and Simulation of Behaviour. pp. 134-140. [2] Clarke, E. 1987. "Categorical Rhythm Perception: An Ecological Perspective". In Gabrielsson, A., Ed., Action and Perception in Rhythm and Music. Stockholm: Royal Swedish Academy of Music. [3] Dannenberg, R. B. 1989. "Real-Time Scheduling and Computer Accompaniment". In Mathews, M. V. and Pierce, J. R., Eds., Current Directions in Computer Music Research. Cambridge, MA: The MIT Press. [4] Cañamero, D., Lluís A., J., Mántaras, R. 1999. "Imitating Human Performances to Automatically Generate Expressive Jazz Ballads". In Proceedings of the AISB'99 Symposium on Imitation in Animals and Artifacts. Edinburgh: The Society for the Study of Artificial Intelligence and Simulation of Behaviour. pp. 115-121.



[5] Desain, P., and Honing, H. 1993. "Tempo curves considered harmful". In Kramer, J. D., Ed., Time in Contemporary Musical Thought. Contemporary Music Review, 7(2): 123-138. [6] Desain, P., and Honing, H. 1995. "Music, Mind, Machine: Computational Modeling of Temporal Structure in Musical Knowledge and Music Cognition". Research proposal (manuscript).

[7] Devin, M. J. 1994. "Time as Phase: Dynamic Model of Time Perception". In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society. New Jersey: Lawrence Erlbaum Associates, Publishers. pp. 607-612. [8] Haykin, S. 1994. Neural Networks: A Comprehensive Foundation. New York: Macmillan College Publishing Company, Inc. [9] Large, E. W. 1994. "Models of Metrical Structure in Music". In Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society. New Jersey: Lawrence Erlbaum Associates, Publishers. pp. 537-542. [10] Lerdahl, F., and Jackendoff, R. 1983. A Generative Theory of Tonal Music. Cambridge, MA: MIT Press. [11] Minsky, M., and Laske, O. 1992. "Foreword: A Conversation with Marvin Minsky". In Balaban, M., Ebcioğlu, K., & Laske, O., Eds., Understanding Music with AI: Perspectives on Music Cognition. Cambridge, MA: The AAAI Press/MIT Press. [12] Pazos, A., et al. 1991. Estructura, dinámica y aplicaciones de las Redes de Neuronas Artificiales. Spain: Ed. Centro de Estudios Ramón Areces, S.A. In Spanish. [13] Pazos, S. A., Dorado, C. J., and Santos, R. A. 1996. "Detection of Pattern in Radiographs using ANN Designed and Trained with GA". In Koza, J. R., Goldberg, D. E., Fogel, D. B. and Riolo, R. L., Eds., Genetic Programming: Proceedings of the First Annual Conference. Cambridge, MA: MIT Press. pp. 432-433. [14] Pazos, A., Romero Cardalda, J., Santos, A. and Dorado, J. 1996. "Interactive Connectionist System for Rhythmic Prediction". In Progress in Neural Information Processing. Singapore: Springer. pp. 1108-1112. [15] Pazos, A., Romero Cardalda, J., Santos, A. and Dorado, J. 1999. "Adaptive Aspects of Rhythmic Composition: Genetic Music". To appear in Proceedings of the Genetic and Evolutionary Computation Conference 99. [16] Pazos, A., Romero Cardalda, J., Santos, A. and Dorado, J. 1999. "Genetic Music Compositor". To appear in Proceedings of the Congress on Evolutionary Computation 99. [17] Repp, B. H. 1992. "Diversity and commonality in music performance: An analysis of timing microstructure in Schumann's 'Träumerei'". Journal of the Acoustical Society of America, 92: 2546-2568. [18] Romero Cardalda, J. 1993. "Generador Interactivo de Frases Musicales". Graduation Project. Department of Computer Science, Faculty of Informatics, University of A Coruña. In Spanish. [19] Romero Cardalda, J. 1996. "Sistemas Adaptativos Musicales". Bachelor Project. Department of Computer Science, Faculty of Informatics, University of A Coruña. In Spanish. [20] Todd, N. P. 1985. "A Model of Expressive Timing in Tonal Music". Music Perception, 3(1). [21] Rowe, R. 1993. Interactive Music Systems: Machine Listening and Composing. Cambridge, MA: The MIT Press. [22] Rumelhart, D. E., and McClelland, J. L. 1986. Parallel Distributed Processing: Foundations, Vol. I. Cambridge, MA: MIT Press.
