Multimedia Computing and Computer Vision Lab
















WS 07/08: Audio Information Retrieval

From Multimedia Computing Lab - University of Augsburg

Title: Audio Information Retrieval
Instructors: Prof. Dr. Rainer Lienhart, Gregor van den Boogaart
Synopsis: Audio information retrieval (AIR) deals with automatically deriving higher-level information from an audio signal by directly processing the signal's content. Typical applications are:
  • Classifying sounds (e.g. silence, applause, speech, music)
  • Artist or genre recognition
  • Music recommendation
  • Speech and speaker recognition

An AIR system normally uses techniques from signal processing and psychoacoustics combined with techniques from machine learning. The design is strongly driven by knowledge from the specific field of application (e.g. linguistics, music).

The lecture introduces the underlying techniques of AIR. Outline:

  • Digital signal processing (DSP)
  • Advanced applications of DSP
  • Basics for machine learning on audio signals (features, techniques, systems)
  • Applications of AIR
Language: The lecture will be held in German, but most of the literature will be in English.
Time: Fridays, 08:15-09:45 (lecture); Room 202, Eichleitnerstraße 30

The course starts on Friday, 19 October 2007, at 08:15 a.m.

Registration: Closed. Registration was handled via LectureReg.
Exam: Written exam at the end of the course. Registration (you have to register in both systems!):
  1. LectureReg: Open until 8 February 2008
  2. Studis: 3-15 January 2008

Date: 7 March 2008, 10:00 a.m.; Room 207, Eichleitnerstraße 30

Auxiliary material:

  • Two DIN A4 sheets, handwritten (!)
  • Pocket calculator (non-programmable)
Credits: 2+0 SWS (semester hours per week), certificate (Schein): yes, 4 credit points (LP)
Multimedia subject areas: Multimedia Methods (Multimedia-Methoden), System-Level Foundations of Multimedia (Systemnahe Grundlagen von Multimedia)
Prerequisites: Intended for students in the 5th semester and higher. Knowledge of linear algebra and analysis is required; knowledge of statistics and machine learning is useful.
Related Course: The AIR lecture is intended, though not exclusively, as preparation for the upcoming course "Praktikum Audio Signal Processing" in the summer term of 2008.


  • The lecture on Friday, 26 October 2007, is canceled. The next lecture will take place on Friday, 2 November 2007.
  • No lecture on Friday, 8 February 2008. Instead, a question-and-answer session will be held on Tuesday, 4 March 2008, 10:00-11:30 a.m., in Room 104, Eichleitnerstraße 30 (below Room 207).
  • The exam will be held on Friday, 7 March 2008, at 10:00 a.m. in Room 207, Eichleitnerstraße 30. Please register!

Online Material


Chapter   Slides (4 per page)
1         AIR_chap01_p4.pdf
2         AIR_chap02_p4.pdf
3         AIR_chap03_p4.pdf
4         AIR_chap04_p4.pdf
5         AIR_chap05_p4.pdf
6         AIR_chap06_p4.pdf
7         AIR_chap07_p4.pdf
8         AIR_chap08_p4.pdf
9         AIR_chap09_p4.pdf
10        AIR_chap10_p4.pdf
11        AIR_chap11_p4.pdf
12        AIR_chap12_p4.pdf


Resource                    File, Download
English-German dictionary   AIR_dict.pdf

Homework, Exercises

Due date Task
07.12.2007 Vikas Raykar, Igor Kozintsev, Rainer Lienhart. Position Calibration of Microphones and Loudspeakers in Distributed Computing Platforms. IEEE Transactions on Speech and Audio Processing, Vol. 13, No. 1, pp. 70-83, Jan. 2005. PDF alt.: PDF
07.12.2007 Exercise, Sheet 1: PDF
14.12.2007 Jonathan Foote. An Overview of Audio Information Retrieval. Multimedia Systems, Vol. 7, No. 1, pp. 2-10, 1999. PDF
ASAP Elena Ranguelova and Mark Huiskes. Pattern Recognition for Multimedia Content Analysis. In: Henk M. Blanken, Henk Ernst Blok, Ling Feng, Arjen P. de Vries, Multimedia Retrieval. Springer Verlag, pp. 53-95, 2007
21.12.2007 Beth Logan. Mel frequency cepstral coefficients for music modeling. Proceedings of the First International Symposium on Music Information Retrieval (ISMIR), 2000. PDF
11.01.2008 Burges, C., Platt, J., and Jana, S., Distortion discriminant analysis for audio fingerprinting. IEEE Transactions on Speech and Audio Processing, 11(3), pp. 165–174, 2003. PDF alt.: PDF
18.01.2008 Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989. PDF
18.01.2008 Bilmes, J. A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report, University of California, Berkeley, 1997. PDF
25.01.2008 Kyogu Lee, Malcolm Slaney. Automatic chord recognition from audio using a supervised HMM trained with audio-from-symbolic data. ACM Multimedia, pp. 11-20, 2006 PDF alt.: PDF
25.01.2008 Özgür Izmirli. Tonal Similarity from Audio Using a Template Based Attractor Model. ISMIR, 2005 PDF
01.02.2008 George Tzanetakis, Perry Cook, Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, 2002 PDF alt.: PDF
01.02.2008 Simon Dixon. Audio Beat Tracking Evaluation: BeatRoot. MIREX, 2006 PDF