Workshop Engineering and Music

"Human Supervision and Control in Engineering and Music"

Pierre-Yves Rolland

Music Information Retrieval:
a brief Overview of Current and Forthcoming Research

Abstract

This paper gives a synopsis of the exploding field of music information retrieval (MIR) through an overview of the current and future research themes that I regard as central. It also covers topics currently under discussion in the MIR community, such as standardization (evaluation, benchmark databases, etc.).
The bibliography provides a number of pointers to current (and past) MIR projects.

Music Information Retrieval: a brief Overview
of Current and Forthcoming Research

This paper deals with the rapidly growing research area of music information retrieval (MIR). Concepts, techniques and systems are being developed for retrieving music information from collections of music content, using some query-based or navigation-based paradigm. MIR is, or could be, used in diverse contexts, ranging from searches for pieces in music libraries to consumer-oriented e-commerce of music.

The field of MIR is exploding. Pioneering work includes ‘manual’ indexing efforts such as Barlow & Morgenstern’s well-known Dictionary of Musical Themes (1948). Computational approaches began to appear some 15 years later (see e.g. [Lincoln 1970]). Since then, an exponentially growing number of MIR systems have been developed. Many are designed to let users search by content. In this paradigm, called WYHIWYG (‘What You Hum is What You Get’) or “query-by-humming”, the user hums the search query into his/her computer’s microphone. In certain cases the query can also be whistled, sung (with lyrics) or played on some acoustic instrument. MelDex [McNab et al. 1996] was among the first WYHIWYG systems to offer an online interface. Several later systems are characterized by one or more significantly enhanced features, for instance Semex ([Lemström & Perttu 2000]: very fast string searching algorithm) or Melodiscov ([Rolland et al. 1999]: advanced monophonic transcription module, very sensitive search…).

As the following paragraphs show, MIR research is multidisciplinary by nature, and in particular builds on a number of 'engineering' areas. These areas include computational/artificial intelligence, human-computer interaction, acoustic signal processing and psychoacoustics. Additionally, its interactions with musicology, both ‘traditional’ and computational, are of course strong.

Key Research Directions and Topics

In many MIR systems, music is manipulated in a symbolic format such as MIDI or some score-oriented format, as opposed to an audio format. This means in particular that the searched database is in that symbolic format, and that search algorithms typically work by comparing a symbolic query sequence to a set of symbolic content sequences. One can thus capitalize on the large body of work that has been devoted to the development of fast and/or sensitive string algorithms applicable to musical data. For WYHIWYG paradigms, this implies that the hummed/sung/whistled/played query must be automatically transcribed, i.e. converted from audio to symbolic form, before the database search can be carried out. An active branch of foundational research for MIR therefore concerns automated transcription techniques. Part of this research concerns the refinement of the monophonic transcription algorithms typically used for query transcription (see e.g. [Rolland et al. 1999]); another part concerns the still largely open problem of polyphonic transcription. The latter has great potential for converting music databases in audio format into symbolic format, allowing the use of the above-mentioned advanced string searching techniques. It is nevertheless an extremely difficult problem: music in audio format is generally a mix of several different musical instruments, one or several singing voices (including lyrics), drums and percussion sounds, noise and so on. Separating all these sources with a view to transcribing them remains essentially an open problem today.
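
As an illustration of the symbolic comparison described above, the following Python sketch ranks database melodies by the edit distance between the query and the best-matching passage of each melody. It is a minimal sketch under stated assumptions, not the algorithm of any system cited here: melodies are assumed to be lists of MIDI pitch numbers, they are compared as semitone intervals so that a query hummed in another key can still match, and the function names and the toy database are invented for the example.

```python
from typing import Dict, List, Tuple

# Toy database (made up for this example): title -> MIDI pitch numbers.
EXAMPLE_DB: Dict[str, List[int]] = {
    "ode_to_joy": [64, 64, 65, 67, 67, 65, 64, 62, 60, 60, 62, 64, 64, 62, 62],
    "frere_jacques": [60, 62, 64, 60, 60, 62, 64, 60, 64, 65, 67],
}


def intervals(pitches: List[int]) -> List[int]:
    """Convert pitches to successive semitone intervals (transposition-invariant)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def best_match_cost(query: List[int], content: List[int]) -> int:
    """Smallest unit-cost edit distance between `query` and any substring of `content`.

    Dynamic programming for approximate substring matching: the first row
    is all zeros because a match may start anywhere in `content`.
    """
    prev = [0] * (len(content) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i] + [0] * len(content)
        for j, c in enumerate(content, start=1):
            curr[j] = min(
                prev[j] + 1,             # query element left unmatched
                curr[j - 1] + 1,         # extra element in the content
                prev[j - 1] + (q != c),  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return min(prev)  # a match may also end anywhere in `content`


def search(query_pitches: List[int],
           database: Dict[str, List[int]]) -> List[Tuple[str, int]]:
    """Return (title, cost) pairs sorted from best to worst match."""
    q = intervals(query_pitches)
    ranked = [(title, best_match_cost(q, intervals(pitches)))
              for title, pitches in database.items()]
    return sorted(ranked, key=lambda item: item[1])
```

Calling search([66, 66, 67, 69, 69, 67, 66, 64], EXAMPLE_DB) ranks the first melody at the top, since the query is its opening phrase transposed up a whole tone. A real system would add rhythm, weighted edit costs and indexing; these are left out to keep the sketch short.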

The problem with working on music databases in symbolic format is that much (or, rather, most) of the music content currently available is in audio format. To overcome this problem, one approach is the one mentioned above: developing techniques for automatically generating symbolic data from polyphonic audio data. But because many problems remain to be solved before this can be achieved satisfactorily, an alternative approach is to eliminate the symbolic stage entirely: audio-to-audio matching, in other words. In several recent (e.g. [Tzanetakis & Cook 2000]) and forthcoming papers (including several at ISMIR’2001, October), an audio query is used to ‘directly’ search an audio database.

A recent and emerging body of MIR research concerns the pre-processing of music data, essentially of the searched database(s), for several different purposes.

One is to extract an appropriate monody (or monophony) from polyphonic content. This makes it possible to translate the ‘monodic query vs. polyphonic database’ problem into the currently better-mastered ‘monodic query vs. monodic database’ one. Different extraction strategies have been investigated and compared [Uitdenbogerd & Zobel 1998, 1999].
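
One simple extraction strategy can be sketched as follows: at every onset time, keep only the highest-pitched note. This is offered as an illustrative assumption, not as the exact procedure evaluated in the papers cited above. The Note type and the function name are hypothetical, and notes sounding together are assumed to carry identical onset values.

```python
from typing import Dict, List, NamedTuple


class Note(NamedTuple):
    onset: float  # start time, e.g. in beats (assumed representation)
    pitch: int    # MIDI pitch number


def extract_top_line(notes: List[Note]) -> List[Note]:
    """Keep only the highest-pitched note at each onset, ordered by onset."""
    highest: Dict[float, Note] = {}
    for note in notes:
        current = highest.get(note.onset)
        if current is None or note.pitch > current.pitch:
            highest[note.onset] = note
    return [highest[onset] for onset in sorted(highest)]
```

The resulting monodic line can then be handed to the same string matching techniques used for monodic databases. Note durations and voice leading are ignored here, which a practical strategy would probably need to take into account.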

Another pre-processing aim is to reduce the complexity of the search process (in the algorithmic sense) by extracting redundant information from the content database before carrying out any search. In practice, recurring patterns and/or themes, which are not known to the system beforehand, are automatically extracted from the content database (see e.g. [Rolland 2001a, 1999; Liu et al. 1999]). There are two paradigms for using the result of pattern/theme extraction to speed up search. In the first, more radical one, the search is carried out only in the extracted database of themes, as opposed to the entire initial database. In other words, it is assumed that the user’s query will always be part of the theme of some song (or piece). While this paradigm potentially allows a huge reduction in search complexity, it has several potential shortcomings. First, it relies heavily on the theme extraction algorithm performing perfectly, which is of course risky. Second, it cannot handle the case where the user’s query is in fact part of one of the song’s non-theme passages. For these reasons, a second, more nuanced paradigm can be used. Using a pattern extraction algorithm such as FlExPat [Rolland 2001a, 1999], a prototype is obtained for every extracted pattern P. A prototype takes the form of a musical fragment that is maximally representative of the various occurrences of the pattern throughout the database. For every P it can be sufficient to compare the query once to its prototype instead of comparing it to all of its occurrences. Since there are, in practice, huge numbers of such patterns in e.g. classical/baroque or pop/rock databases, a significant reduction in search time can thus be achieved while avoiding the shortcomings of the first paradigm.
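
The second paradigm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not FlExPat itself: the extracted patterns are taken as given, sequences are assumed to be tuples of semitone intervals, a plain unit-cost edit distance stands in for whatever similarity measure a real system would use, and the Pattern class, function names and threshold parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Pattern:
    prototype: Tuple[int, ...]          # fragment representative of the pattern
    occurrences: List[Tuple[str, int]]  # (piece id, start position) pairs


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Unit-cost edit distance between two whole sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,            # delete x
                            curr[j - 1] + 1,        # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


def search_via_prototypes(query: Sequence[int],
                          patterns: List[Pattern],
                          max_distance: int) -> List[Tuple[str, int]]:
    """Compare the query once per pattern; a hit returns all its occurrences."""
    hits: List[Tuple[str, int]] = []
    for pattern in patterns:
        if edit_distance(query, pattern.prototype) <= max_distance:
            hits.extend(pattern.occurrences)
    return hits
```

With k occurrences per pattern on average, the number of comparisons against pattern material drops by roughly a factor of k. Passages not covered by any pattern would presumably still be searched directly, which is how the shortcomings of the theme-only paradigm are avoided; that part is omitted from the sketch.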

The potential of equipping MIR systems with user modeling capabilities has recently begun to be studied. In [Chai & Vercoe 2000], the use of general, essentially text field-based user models is suggested. In [Rolland 2001b], what is modeled is the user’s sense of musical similarity. A music psychology-derived model of musical similarity is used, taking into account the multiple simultaneous dimensions of music. In a user’s model, each dimension has an individual weight that depends on the user. A user’s model (a vector containing musical dimension weights) is incrementally enhanced throughout the user’s interaction with the system, using relevance feedback.
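
The idea of a per-user weight vector refined by relevance feedback can be sketched as follows. The update rule, the learning rate and the dimension names used in the usage note are assumptions made for the example; they are not taken from [Rolland 2001b]. Per-dimension similarities are assumed to lie between 0 and 1.

```python
from typing import Dict


def weighted_similarity(weights: Dict[str, float],
                        dim_sims: Dict[str, float]) -> float:
    """Overall similarity as the weighted mean of per-dimension similarities."""
    total = sum(weights.values())
    return sum(weights[d] * dim_sims[d] for d in weights) / total


def update_weights(weights: Dict[str, float],
                   dim_sims: Dict[str, float],
                   relevant: bool,
                   rate: float = 0.1) -> Dict[str, float]:
    """One relevance feedback step (illustrative rule, an assumption).

    If the user judges a result relevant, dimensions on which it was
    similar to the query (above 0.5) gain weight and the others lose
    weight; the opposite happens for a result judged irrelevant.
    The returned weights are normalized to sum to 1.
    """
    sign = 1.0 if relevant else -1.0
    raw = {d: max(1e-6, w + sign * rate * (dim_sims[d] - 0.5))
           for d, w in weights.items()}
    total = sum(raw.values())
    return {d: value / total for d, value in raw.items()}
```

For instance, starting from equal weights {"pitch": 0.5, "rhythm": 0.5}, a result judged relevant that resembles the query mostly in pitch shifts weight towards the pitch dimension, so later rankings for that user lean more on pitch similarity.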

Standardization is emerging as a key topic of reflection within the young but fast-growing MIR community [Downie 2000]. A consensually designed set of evaluation standards, benchmark/test databases or contents and/or queries, etc. could be very useful, if not indispensable, for the maturing of MIR research as a whole. In that respect, it would appear pertinent to build on the similar standardization efforts undertaken in past years in the text retrieval community (cf. TREC, http://trec.nist.gov/). Strong connections between MIR research and international standardization efforts such as MPEG-7/21 are also important to mention.

Concluding remarks

The relatively young MIR field is acquiring some maturity, one of the first obvious signs being the creation of an international event entirely devoted to it. The second edition of the International Symposium on Music Information Retrieval will be held this October. Consensual efforts within the MIR community, e.g. those related to standardization as mentioned in this paper, should help reinforce its maturity over the next few years.

References

Chai, Wei and Vercoe, Barry. (2000). Using user models in music information retrieval systems. Proc. International Symposium on Music Information Retrieval, Oct. 2000.

Downie, J. S. (2000). Thinking About Formal MIR System Evaluation: Some Prompting Thoughts. Proc. International Symposium on Music Information Retrieval.

Ghias, A., J. Logan, D. Chamberlin, B.C. Smith. (1995). Query by humming - Musical information retrieval in an audio database, ACM Multimedia'95 - Electronic Proceedings.

Lemström, K. and S. Perttu. (2000). SEMEX - An efficient Music Retrieval Prototype. Proc. International Symposium on Music Information Retrieval.

Lincoln, H. (ed.). (1970). The Computer and Music. Cornell University Press.

Liu, C.C., Hsu, J.L. & Chen, A.L.P. (1999). "Efficient Theme and Non-Trivial Repeating Pattern Discovering in Music Databases". Proc. IEEE International Conference on Data Engineering.

McNab, R.J., L.A. Smith, I.H. Witten, C.L. Henderson and S.J. Cunningham. (1996). Towards the digital music library: tune retrieval from acoustic input, Proceedings of ACM Digital Libraries'96, pp. 11-18.

Rolland, P.Y. (1999). Discovering Patterns in Musical Sequences. Journal of New Music Research 28:4, December 1999. Pages 334-350.

Rolland, P.Y. (2001a). FlExPat: Flexible Extraction of Sequential Patterns. Proceedings IEEE International Conference on Data Mining (ICDM’01). San Jose - Silicon Valley, USA, November 29 - December 2, 2001 (to appear).

Rolland, P.Y. (2001b). Adaptive User Modeling in a Content-Based Music Retrieval System. Proceedings 2nd International Symposium on Music Information Retrieval (ISMIR’01). Bloomington, USA, October 15-17, 2001 (to appear).

Rolland, P.Y., Ganascia, J.G. (1999). Musical Pattern Extraction and Similarity Assessment. In Miranda, E. (ed.). Readings in Music and Artificial Intelligence. New York and London: Gordon & Breach - Harwood Academic Publishers.

Rolland, P.Y., Raskinis, G., Ganascia, J.G. (1999). Musical Content-Based Retrieval: an Overview of the Melodiscov Approach and System. In Proceedings of the Seventh ACM International Multimedia Conference, Orlando, November 1999.

Shmulevich, I., Yli-Harja, O., Coyle, E., Povel D., Lemström, K. (2001). Perceptual issues in music pattern recognition – complexity of rhythm and key finding. In Rolland, P.Y., Cambouropoulos, E., Wiggins, G. (editors). Pattern Processing in Music Analysis and Creation. Special Issue of Journal: Computers and the Humanities Vol. 35, No. 1. Kluwer Academic Publishers.

Tzanetakis, G. & Cook, P. (2000). Audio Information Retrieval (AIR) Tools. Proc. International Symposium on Music Information Retrieval (ISMIR’00).

Uitdenbogerd, A.L. and Zobel, J. (1998). Manipulation of Music for Melody Matching, Proc. ACM Multimedia 1998, pp. 235-240.

Uitdenbogerd, A.L. and Zobel, J. (1999). Melodic Matching Techniques for Large Music Databases. Proc. ACM Multimedia 1999.