Multimedia Computing and Computer Vision Lab
















SS 18: Multimedia Projekt

From Multimedia Computing Lab - University of Augsburg

Registration is now open at

[Figures: obj_detect_coco1.png, obj_detect_coco2.png, obj_detect_coco3.png]

Images taken from [2].


Instructors: Stephan Brehm, Christian Eggert, Prof. Dr. Rainer Lienhart
Time Lecture:

Group A: Tuesday 8:15-9:45, Room 1020N. Starts Apr 10th.

Group B: Wednesday 8:15-9:45, Room 1020N. Starts Apr 11th.

Time Exercise:

Group A: Tuesday 10:00-13:45, Room 1020N.

Group B: Wednesday 10:00-13:45, Room 1020N.

Assignments are handed out every Tuesday at 8:15 and must be submitted by the following Tuesday, 23:59.

Credits: 6 SWS, 10 LP
Language: German


  • [02.03.2018] Page creation


The topic of this course is the detection of humans in images.

Object detection is one of the most challenging tasks in computer vision and machine learning. The difficulty arises because many objects have complex appearances; humans, for instance, adopt varying poses and appear at different sizes.

The goal of this project is the detection of object instances in images using local features and supervised learning methods. The students will implement a detector for humans which performs localization by specifying a tight bounding box around each instance.
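To illustrate what a "tight bounding box" means, here is a minimal sketch in plain Python. It assumes an instance is given as a binary pixel mask and that box corners are inclusive pixel indices; both the function name `tight_bounding_box` and this representation are illustrative assumptions, not part of the course material.

```python
def tight_bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the smallest axis-aligned box
    that encloses all non-zero pixels of a 2D mask (inclusive indices).
    Returns None if the mask contains no non-zero pixel.
    Hypothetical helper for illustration only."""
    # Indices of rows that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    # Column indices of every foreground pixel.
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), rows[0], max(cols), rows[-1])


# A small example mask with one object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(tight_bounding_box(mask))  # (1, 1, 3, 2)
```

The box is "tight" in the sense that shrinking it on any side would cut off part of the object.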

During this project you will learn

  • the basics of deep neural networks
  • an approach to object detection based on deep neural networks
  • to use the TensorFlow machine learning framework
  • how to objectively evaluate the performance of an object detector
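As an illustration of the last point, a common building block for evaluating detectors is the intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below is an assumption about what such an evaluation might look like, not the protocol used in this course. Boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples in continuous coordinates, and the 0.5 threshold follows a widespread convention (for example in PASCAL VOC).

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes,
    each given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(detection, ground_truth, threshold=0.5):
    """Count a detection as correct if its IoU with the ground truth
    reaches the threshold (0.5 is an assumed, commonly used value)."""
    return iou(detection, ground_truth) >= threshold


ground_truth = (0, 0, 10, 10)
print(is_match((1, 1, 10, 10), ground_truth))  # True  (IoU = 0.81)
print(is_match((5, 0, 15, 10), ground_truth))  # False (IoU = 1/3)
```

Counting matches and misses in this way over a whole test set yields precision and recall figures that can be compared across detectors.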

This course is divided into two phases:

  • Weekly assignments introduce students to programming with Python, build the detection pipeline step by step, and provide first hands-on experience in image processing and machine learning.
  • Student groups work on a project in weekly sessions.

In the end, each team will present the results of their implementation to the other teams.


The course is held in German but slides will be provided in English.


  1. Finding People in Images and Videos, N. Dalal, PhD Thesis, Institut National Polytechnique de Grenoble / INRIA Grenoble, July 2006.
  2. SSD: Single Shot MultiBox Detector, W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Fu, A. Berg, ECCV 2016.