Multimedia Computing and Computer Vision Lab
















WS 08/09: Audio Information Retrieval

From Multimedia Computing Lab - University of Augsburg


Title: Audio Information Retrieval
Instructors: Prof. Dr. Rainer Lienhart, Gregor van den Boogaart
Synopsis: Audio information retrieval (AIR) deals with the problem of automatically deriving higher-level information from an audio signal by directly processing the content of the signal. Typical applications are:
  • Classifying sounds (e.g. silence, applause, speech, music)
  • Artist or genre recognition
  • Music recommendation
  • Speech and speaker recognition

An AIR system typically combines techniques from signal processing and psychoacoustics with techniques from machine learning. Its design is strongly driven by knowledge from the specific field of application (e.g. linguistics, music).
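As an illustration of this combination, here is a minimal sketch of a toy sound classifier. It is not taken from the lecture material. It assumes two simple signal-processing features (short-time RMS energy and zero-crossing rate) and a nearest-centroid classifier as the machine-learning step. The sample rate, frame length, synthetic test signals, and all function names are hypothetical choices made for this example.

```python
import math
import random

SAMPLE_RATE = 8000  # Hz, assumed for this sketch
FRAME_LEN = 256     # samples per analysis frame, assumed


def frames(signal, frame_len=FRAME_LEN):
    """Split a signal into non-overlapping frames, dropping the remainder."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]


def rms(frame):
    """Root-mean-square amplitude of one frame (a rough loudness proxy)."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def features(signal):
    """Average the per-frame features into one 2-D vector per signal."""
    fs = frames(signal)
    return (sum(rms(f) for f in fs) / len(fs),
            sum(zero_crossing_rate(f) for f in fs) / len(fs))


def make_signal(kind, rng, n=2048):
    """Synthetic stand-ins for real recordings (hypothetical test data)."""
    if kind == "silence":
        return [rng.gauss(0.0, 0.001) for _ in range(n)]
    if kind == "tone":
        freq = rng.uniform(200.0, 400.0)
        return [0.5 * math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
                for t in range(n)]
    if kind == "noise":
        return [rng.gauss(0.0, 0.3) for _ in range(n)]
    raise ValueError(kind)


def train_centroids(examples):
    """Nearest-centroid training: mean feature vector per class label."""
    sums = {}
    for label, (energy, zcr) in examples:
        acc = sums.setdefault(label, [0.0, 0.0, 0])
        acc[0] += energy
        acc[1] += zcr
        acc[2] += 1
    return {label: (acc[0] / acc[2], acc[1] / acc[2])
            for label, acc in sums.items()}


def classify(vec, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids,
               key=lambda lab: math.hypot(vec[0] - centroids[lab][0],
                                          vec[1] - centroids[lab][1]))


if __name__ == "__main__":
    rng = random.Random(0)
    kinds = ("silence", "tone", "noise")
    training = [(k, features(make_signal(k, rng)))
                for k in kinds for _ in range(5)]
    centroids = train_centroids(training)
    for k in kinds:
        print(k, "->", classify(features(make_signal(k, rng)), centroids))
```

The two features are left unscaled here because the synthetic classes are well separated. A real system would normally normalize its features and use richer ones, chosen with knowledge of the application domain.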

The lecture introduces the underlying techniques of AIR. Outline:

  • Digital signal processing (DSP)
  • Advanced applications of DSP
  • Basics for machine learning on audio signals (features, techniques, systems)
  • Applications of AIR
Language: The lecture will be held in German, but most of the literature will be in English.
Dates:
  • Mon: 12:15-13:45 (lecture); Room 202, Eichleitnerstraße 30
  • Mon: 14:00-15:30 (exercises); Room 202, Eichleitnerstraße 30
  • The course starts on Monday, 13. October 2008, at 12:15 p.m. with the lecture. The start of the exercises will be announced in the lecture.
Registration: Closed. Registration was handled via LectureReg.
Exam: Written exam at the end of the course. Registration (you must register in both systems!):
  1. LectureReg: Open until 17. February 2009
  2. Studis: 5. - 15. January 2009

Date: 20. February 2009, 10:00 a.m.; Room 317, Eichleitnerstraße 30

Auxiliary material:

  • Two DIN A4 sheets, handwritten (!)
  • Pocket calculator (non-programmable)
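Credits: 2+1 SWS (weekly semester hours), Schein (course certificate): yes, 4 LP (credit points)
Multimedia subject areas (Multimedia Teilbereiche): Multimedia-Methoden (multimedia methods), Systemnahe Grundlagen von Multimedia (system-level foundations of multimedia)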
Prerequisites: Intended for students in their 5th semester or higher. Knowledge of linear algebra and analysis is required; knowledge of statistics and machine learning is useful.
Related Courses:
  • The seminar Audiosignalverarbeitung (audio signal processing) is also held this winter term.
  • The AIR lecture is also intended, though not exclusively, as preparation for the upcoming course "Multimedia Praktikum (Audio)" in the summer term 2009.


  • The lecture has been moved from Friday to Monday. It therefore already starts on Monday, 13. October 2008.
  • A question-and-answer session is held on Tuesday, 17. February 2009, 10:00-11:30 a.m. in room 309, Eichleitnerstraße 30.
  • The exam is held on Friday, 20. February 2009, 10:00 a.m. in room 317, Eichleitnerstraße 30. Please register!

Online Material


Chapter                           4 slides per page
1  Introduction                   AIR_chap01_p4.pdf
2  Mathematical and other basics  AIR_chap02_p4.pdf
3  Digital Signals                AIR_chap03_p4.pdf
4  Digital Systems                AIR_chap04_p4.pdf
5  Random Signals                 AIR_chap05_p4.pdf
6  Short Time Fourier Transforms  AIR_chap06_p4.pdf
7  Applications of DSP            AIR_chap07_p4.pdf
8  Machine Learning Basics        AIR_chap08_p4.pdf
9  Features                       AIR_chap09_p4.pdf
10 Techniques                     AIR_chap10_p4.pdf
11 Models and Classifiers         AIR_chap11_p4.pdf
12 Applications of AIR            AIR_chap12_p4.pdf


Resource                          File, Download
en - de dictionary                AIR_dict.pdf

Homework, Exercises

Due date Task
Get ASAP: Oppenheim, A. V., Schafer, R. W., and Buck, J. R. Discrete-Time Signal Processing. Prentice-Hall, Inc., 2nd edition, 1999.
  • Read chapters 2.1, 4.1 - 4.3, 4.8 from Oppenheim et al., 1999 (preparation for lecture 3).
  • Exercise, Sheet 1: PDF
  • Read chapters 2.2 - 2.4, 2.6 - 2.9, 5.1 from Oppenheim et al., 1999 (preparation for lecture 4).
  • Exercise, Sheet 2: PDF
  • Read chapters 2.10, A.1 - A.2, A.4 from Oppenheim et al., 1999 (preparation for lecture 5).
  • Read chapters 8.5 - 8.8, 10.1 - 10.5, 7.2 from Oppenheim et al., 1999 (preparation for lecture 6).
  • Exercise, Sheet 3: PDF
  • Exercise, Sheet 4: PDF
  • Read: Vikas Raykar, Igor Kozintsev, Rainer Lienhart. Position Calibration of Microphones and Loudspeakers in Distributed Computing Platforms. IEEE Transactions on Speech and Audio Processing, Vol. 13, No. 1, pp. 70-83, Jan. 2005. PDF alt.: PDF
  • Exercise, Sheet 5: PDF
05.12.2008 Read:
  • Jonathan Foote. An Overview of Audio Information Retrieval. Multimedia Systems, Vol. 7, No. 1, pp. 2-10, 1999. PDF
  • Elena Ranguelova and Mark Huiskes. Pattern Recognition for Multimedia Content Analysis. In: Henk M. Blanken, Henk Ernst Blok, Ling Feng, and Arjen P. de Vries, Multimedia Retrieval. Springer Verlag, pp. 53-95, 2007
  • Beth Logan. Mel frequency cepstral coefficients for music modeling. Proceedings of the First International Symposium on Music Information Retrieval (ISMIR), 2000. PDF
  • Read: Burges, C., Platt, J., and Jana, S., Distortion discriminant analysis for audio fingerprinting. IEEE Transactions on Speech and Audio Processing, 11(3), pp. 165–174, 2003. PDF alt.: PDF
  • Exercise, Sheet 6: PDF
  • Read: Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989. PDF
  • Read: Bilmes, J. A gentle tutorial on the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical report, University of California, Berkeley, 1997. PDF
  • Exercise, Sheet 7: PDF
  • Read: Kyogu Lee, Malcolm Slaney. Automatic chord recognition from audio using a supervised HMM trained with audio-from-symbolic data. ACM Multimedia, pp. 11-20, 2006 PDF alt.: PDF
  • Read: Özgür Izmirli. Tonal Similarity from Audio Using a Template Based Attractor Model. ISMIR, 2005 PDF
  • Exercise, Sheet 8: PDF
  • Read: George Tzanetakis, Perry Cook, Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, 2002 PDF alt.: PDF
  • Read: Simon Dixon. Audio Beat Tracking Evaluation: BeatRoot. MIREX, 2006 PDF
  • Exercise, Sheet 9: PDF