Multimedia Computing and Computer Vision Lab


Research

From Multimedia Computing Lab - University of Augsburg


Deep Swim Pose

Sample detection and extracted stroke rate.

The success of a professional athlete depends quite strongly on the assessment and active improvement of his or her technique. In the field of competitive swimming, a quantitative evaluation is highly desirable to supplement the typical qualitative analysis. However, quantitative (manual) evaluations are very time-consuming and are therefore only used in individual cases.

In a joint project with the Institute of Applied Training Science in Leipzig (Institut für angewandte Trainingswissenschaften, IAT), we are developing a system for detecting a swimmer in a swimming channel and continuously estimating his or her pose in order to capture (inner-)cyclic structures and derive kinematic parameters for a biomechanical analysis. Human pose recovery in aquatic environments faces many challenges, from a heavily cluttered foreground and background to partial occlusion.

The purpose of this project is to build a human pose detector based on recent advancements in the field of deep learning. Accurately estimated joint positions are used for a precise and reliable derivation of different kinematic parameters.
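As an illustration of how a kinematic parameter can follow from estimated joint positions, the sketch below derives a stroke rate from the periodic trajectory of a single joint by finding the dominant frequency of its spectrum. This is a minimal assumption about the idea, not the project's actual pipeline; the function name and the synthetic wrist trajectory are hypothetical.

```python
import numpy as np

def stroke_rate_from_trajectory(y, fps):
    """Estimate the stroke rate (cycles/min) from a periodic joint trajectory.

    y   : 1-D array of one joint coordinate (e.g. wrist height) per frame
    fps : video frame rate
    """
    y = np.asarray(y, dtype=float)
    y = y - y.mean()                               # remove the DC offset
    spectrum = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / fps)
    dominant = freqs[1:][np.argmax(spectrum[1:])]  # skip the zero-frequency bin
    return dominant * 60.0                         # Hz -> cycles per minute

# Synthetic wrist trajectory: a 0.75 Hz stroke cycle filmed at 50 fps
t = np.arange(0, 20, 1.0 / 50)
wrist_y = np.sin(2 * np.pi * 0.75 * t)
print(stroke_rate_from_trajectory(wrist_y, 50))  # 45.0 cycles/min
```

A real system would first have to produce clean per-frame joint estimates; the spectral step itself is robust to moderate noise because it only needs the dominant periodicity.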


For more information, please visit the project page or contact Dan Zecha.

References:

  • Dan Zecha, Christian Eggert, Rainer Lienhart, Pose Estimation for Deriving Kinematic Parameters of Competitive Swimmers, Computer Vision Applications in Sports, part of IS&T Electronic Imaging 2017, Burlingame, California, January 2017. [PDF] (to appear)
  • Dan Zecha and Rainer Lienhart. Key-Pose Prediction in Cyclic Human Motion. IEEE Winter Conference on Applications of Computer Vision 2015 (WACV15), Waikoloa Beach, HI, January 6-9, 2015 [PDF]
  • Dan Zecha, Thomas Greif, and Rainer Lienhart. Swimmer Detection and Pose Estimation for Continuous Stroke Rate Determination. Multimedia Content Access: Algorithms and Systems VI, part of IS&T/SPIE Electronic Imaging, 23 January 2012, Burlingame, California, USA
    Also Technical Report 2011-13, University of Augsburg, Institute of Computer Science, July 2011. [PDF] [Video]
  • Dan Zecha and Rainer Lienhart. Bestimmung intrazyklischer Phasengeschwindigkeiten von Schwimmern im Schwimmkanal mittels vollautomatischer Videoanalyse. Technical Report 2014-04, University of Augsburg, Institute of Computer Science, July 2014. [PDF]


fertilized forests library

The fertilized forests project aims to provide an easy-to-use, easy-to-extend, yet fast library for decision forests. It summarizes the research in this field and provides a solid platform to extend it.

The library is thoroughly tested and highly flexible. It is available under the permissive 2-clause BSD license.

Feature highlights are:

  • Object-oriented model of the unified decision forest model of Antonio Criminisi and Jamie Shotton, as well as extensions (e.g., Hough forests).
  • Templated C++ classes for maximum memory and computational efficiency.
  • Compatible with the Microsoft Visual C++, GNU, and Intel compilers.
  • Platform independent serialization: train forests and trees on a Linux cluster and use them on a Windows PC.
  • Documented and consistent interfaces in C++, Python and Matlab.

First research results include the development of the newly introduced induced entropy and a successful application to uncertainty sampling in the context of self-organizing adaptive systems.
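To make the split-selection role of such entropies concrete, the sketch below shows information-gain computation at a decision tree node with a pluggable impurity measure. The `induced_p_entropy` function is only an illustrative p-norm-based impurity, not the exact definition from the referenced paper.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Classic Shannon entropy of a label multiset, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def induced_p_entropy(labels, p=2.0):
    """Illustrative p-norm-based impurity: 1 - ||q||_p for the class
    distribution q. NOT the paper's exact norm-induced entropy."""
    n = len(labels)
    q = [c / n for c in Counter(labels).values()]
    return 1.0 - sum(x ** p for x in q) ** (1.0 / p)

def information_gain(parent, left, right, impurity):
    """Impurity decrease achieved by splitting `parent` into two children."""
    n = len(parent)
    return impurity(parent) - (len(left) / n) * impurity(left) \
                            - (len(right) / n) * impurity(right)

labels = [0, 0, 0, 1, 1, 1]
gain = information_gain(labels, [0, 0, 0], [1, 1, 1], shannon_entropy)
print(gain)  # 1.0 -- a perfect split removes all uncertainty
```

Making the impurity a parameter mirrors the library's design goal of keeping the entropy pluggable, so that alternative measures can be dropped into the same training code.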

References:

  • Christoph Lassner and Rainer Lienhart. Norm-induced Entropies for Decision Forests. IEEE Winter Conference on Applications of Computer Vision 2015 (WACV15), Waikoloa Beach, HI, January 6-9, 2015.

For more information, see the project homepage or contact Christoph Lassner.


Swimmer Detection and Pose Estimation for Continuous Stroke Rate Determination

The success of a professional athlete depends quite strongly on the assessment and active improvement of his or her technique. In the field of competitive swimming, a quantitative evaluation is highly desirable to supplement the typical qualitative analysis. However, quantitative (manual) evaluations are very time-consuming and are therefore only used in individual cases.

In a joint project with the Institute of Applied Training Science in Leipzig (Institut für angewandte Trainingswissenschaften, IAT), we are developing a system for detecting a swimmer in a swimming channel and continuously estimating his or her pose in order to capture (inner-)cyclic structures and derive kinematic parameters for a biomechanical analysis. Human pose recovery in aquatic environments faces many challenges, from a heavily cluttered foreground and background to partial occlusion.

The purpose of this work is twofold: firstly, we are developing a robust method for accurately detecting individual key poses with specifically trained object detectors. The procedure is fully automatic and retrieves stroke frequency, stroke length, and inner-cycle intervals. Secondly, we optimize our approach in terms of runtime through algorithmic optimizations, parallelization, and GPU programming, allowing for real-time application of our system.

Sample detection and extracted stroke rate.



For more information, please visit the project page or contact Dan Zecha.

References:

  • Dan Zecha, Thomas Greif, and Rainer Lienhart. Swimmer Detection and Pose Estimation for Continuous Stroke Rate Determination. Multimedia Content Access: Algorithms and Systems VI, part of IS&T/SPIE Electronic Imaging, 23 January 2012, Burlingame, California, USA
    Also Technical Report 2011-13, University of Augsburg, Institute of Computer Science, July 2011. [PDF] [Video]
  • Dan Zecha and Rainer Lienhart. Bestimmung intrazyklischer Phasengeschwindigkeiten von Schwimmern im Schwimmkanal mittels vollautomatischer Videoanalyse. Technical Report 2014-04, University of Augsburg, Institute of Computer Science, July 2014. [PDF]


2D and 3D Human Pose Estimation in Single Images

We address the task of unconstrained 2D and 3D human pose estimation in single images. Both have a wide field of applicability, ranging from video indexing and security and safety applications to entertainment purposes and markerless motion capture. The recovery of a human pose in a single image, however, is still a challenging problem. Highly articulated human poses, cluttered background, and partial or complete occlusions require robust methods. The absence of a temporal model makes this particularly challenging.

Our research goal is to develop robust algorithms for this task. We aim to design methods that are generic and robust on the one hand, but on the other hand rely on very simple techniques so that the overall complexity of the models stays low. This is essential for reaching real-time-capable pose estimation in images.

Examples of recovered 2D and 3D body poses in single images.



For more information, please contact Thomas Greif.

References:

  • Thomas Greif, Debabrata Sengupta, Rainer Lienhart. Monocular 3D Human Pose Estimation by Classification. IEEE International Conference on Multimedia and Expo 2011 (ICME11), Barcelona, July 2011. [PDF]
  • Thomas Greif and Rainer Lienhart. A Kinematic Model for Bayesian Tracking of Cyclic Human Motion. IS&T/SPIE Electronic Imaging, San Jose, USA, January 2010. [PDF]

Software:
KAET (Kinect Annotation and Evaluation Tool)


Image Classification using Different Levels of Quality in Representation and Feedback

In this project, we consider the field of image classification with the help of a human expert. Image classification deals with the problem of determining the occurrence of known objects and concepts in an image. We want to extend the classic image classification approach significantly by introducing new paradigms of image representation and active learning with a human expert (i.e., suitable user interaction) in order to make it applicable to real-world image databases.

The problem with most image classification tasks today lies in the high complexity of calculating the object features as well as in the high number of possible classes and the costly annotation by a human expert. This complexity strongly influences both the training phase and the application phase. The goal of this project is to extend the conventional image classification approach by using different levels of quality in the description of an object/concept and different levels of quality in the feedback from the human expert.

This requires new algorithms that are designed to automatically determine the best level for the object description and the best form of feedback from the human expert. A central aspect is the balance of complexity and gain. The main advantage of the methods to be developed is the intelligent and adaptive use of resources, which is superior to static methods. The savings in memory and CPU resources will have a great impact on resource-intensive and time-critical tasks (e.g., real-time image classification in a robot). With new forms of feedback from a human expert, the interaction with a classification system will be simplified, which increases the speed and robustness of the training process.
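The balance of complexity and gain can be sketched as a budgeted query-selection rule: among the annotations the budget still allows, ask for the one with the best expected gain per unit cost. This is a generic active-learning heuristic, not the project's actual algorithm; the names and numbers are illustrative.

```python
def next_query(candidates, budget):
    """Pick the next annotation query, balancing expected gain against cost.

    candidates : list of (sample_id, uncertainty, annotation_cost) tuples
    budget     : remaining annotation budget
    Returns the chosen candidate tuple, or None if nothing is affordable.
    """
    affordable = [c for c in candidates if c[2] <= budget]
    if not affordable:
        return None
    # Value-per-cost heuristic: uncertain samples that are cheap to label win.
    return max(affordable, key=lambda c: c[1] / c[2])

pool = [("img1", 0.9, 3.0),   # quite uncertain, but expensive to annotate
        ("img2", 0.6, 1.0),   # moderately uncertain and cheap
        ("img3", 0.95, 5.0)]  # most uncertain, over budget
print(next_query(pool, budget=4.0)[0])  # img2 (best uncertainty per cost)
```

In the project's setting, the "cost" axis would also encode the *form* of feedback (coarse vs. fine labels) and the "gain" axis the quality level of the object description, but the same greedy trade-off applies.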

Different levels of quality in image representation and human feedback.


For more information, please contact Nicolas Cebron.

References:

  • Nicolas Cebron. Active Improvement of Hierarchical Object Features under Budget Constraints, 10th IEEE International Conference on Data Mining (ICDM), Dec. 2010, Sydney, Australia. DOI: 10.1109/ICDM.2010.74
    Also Technical Report 2011-01, University of Augsburg, Institute of Computer Science, Feb. 2011. [PDF]

Unsupervised One-class Image Classification

We are developing a classification framework for digital images which is capable of identifying images that belong to a certain class. In other words, we want to design filters which find images in a given database that feature certain content (e.g., brand logos).
However, our framework should learn class models in an unsupervised manner. The user is only required to provide images which contain some common object or concept as positive training examples, without further annotation or knowledge.
Our framework then finds common properties of the positive training images based on color and visual words. Thus, it consists of two main stages: a color-based pre-filter (or region-of-interest detector) and a classifier trained on histograms of visual words ("bags of words").

If we want to apply color-based filters, we have to assume that the objects we want to identify have a distinctive color distribution. That is, all instances of the object appear in a reasonably small number of different colors.
Since we want the learning process of the color model to be unsupervised, we are confronted with two major problems: first, we have to identify the colors of the object without manual annotation; second, we have to deal with color deviations due to different lighting conditions.
Moreover, it is not straightforward to classify images or localize objects based on color models.

Unsupervised detection of region of interest for brand logo based on color histogram.

The second stage of our framework uses bag-of-words models to classify images. We compute spatial histograms of visual words for positive and negative training images and then train a binary classifier on these histograms. Since we want to find positive images in large-scale databases, we aim for a very low false positive rate. Thus, for classification we opt for a cascade of AdaBoost classifiers.
Obviously, there is a vast number of choices to be made which influence the classification performance. For instance, many different local feature descriptors exist which can be used for the bag-of-words model. Also, the clustering process which yields our visual vocabulary and the AdaBoost classifier depend on many parameters. Therefore, our main research focus is on finding the optimal configuration and evaluating novel enhancements.
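The core of the bag-of-words stage, quantizing local descriptors against a visual vocabulary and histogramming the resulting words, can be sketched as follows. The random vocabulary and descriptors are toy stand-ins; a real vocabulary would come from clustering (e.g. k-means) a large pool of local descriptors.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Quantize each local descriptor to its nearest visual word and
    return the normalized word histogram ("bag of words") of the image.

    descriptors : (n_features, dim) local descriptors of one image
    vocabulary  : (n_words, dim) visual word centers
    """
    # Squared Euclidean distance from every descriptor to every word
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                       # nearest-word assignment
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()   # normalize so images of any size compare

rng = np.random.default_rng(0)
vocab = rng.normal(size=(8, 16))    # toy 8-word vocabulary of 16-D descriptors
desc = rng.normal(size=(100, 16))   # toy local descriptors of one image
h = bow_histogram(desc, vocab)
print(h.shape)  # (8,)
```

Such histograms (or spatially partitioned stacks of them) are exactly the feature vectors the AdaBoost cascade is trained on.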

For more information, please contact Christian Ries.

Automatic Detection of Offensive or Illegal Images

In a joint project with Advanced U.S. Technology Group (ATG) we are working on filtering and detection techniques for offensive and illegal images. This is a one-class image classification problem and thus closely related to the project on Unsupervised One-class Image Classification.

The purpose of this work is two-fold. Our first goal is to reliably and quickly filter offensive images from large databases of images, for instance in order to prevent minors from being exposed to such images.

The second application is the automatic detection of illegal image content. In this project we work jointly with Swiss authorities. For example, we want to facilitate the work of police officers and prosecutors who have to search a suspect's storage device for illegal content.

For more information, please contact Christian Ries.

Feature Bundling for Object Retrieval / Logo Recognition

Computer vision and image retrieval are inherently linked with methods that describe visual information and, with it, the spatial layout of image intensities and colors. Analogous to sentences, where the position of single words is subject to grammar rules, the position of visual structures in images is not arbitrary but depends on the depicted content. In other words, the spatial distribution of individual visual features does have a semantic meaning.
Derivation and storage of feature bundles for a logo brand.

In this project we explore feature bundling techniques suitable for object retrieval and logo recognition. Discriminative visual signatures that include both visual and spatial information are formed by bundling local features within a combined representation. Each bundle is stored in a hash-based index and associated with the underlying object class. Multi-class recognition of objects in unknown test images is then performed by testing if bundles of the test image are contained in this index.
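A minimal sketch of the hash-based bundle index described above: a bundle is represented order-invariantly by a central visual word plus the set of words in its neighborhood, and recognition becomes a dictionary lookup. The class and its API are hypothetical simplifications, not the project's actual data structure.

```python
from collections import defaultdict

class BundleIndex:
    """Hash-based index mapping feature bundles to object classes.

    A bundle = a central visual word plus the visual words found in its
    spatial neighborhood; the frozenset makes the key order-invariant.
    """

    def __init__(self):
        self.index = defaultdict(set)

    def _key(self, central, neighbors):
        return (central, frozenset(neighbors))

    def add(self, central, neighbors, label):
        """Index one training bundle under its object class."""
        self.index[self._key(central, neighbors)].add(label)

    def query(self, central, neighbors):
        """Return the set of classes whose training images contained
        this exact bundle (empty set if unseen)."""
        return self.index.get(self._key(central, neighbors), set())

idx = BundleIndex()
idx.add(central=17, neighbors=[3, 42, 8], label="brand_A")
print(idx.query(17, [8, 3, 42]))  # {'brand_A'} -- neighbor order is irrelevant
```

Multi-class recognition in a test image then amounts to extracting its bundles, querying each, and voting over the returned class labels; a real system would additionally hash only sketches of the neighbor sets to tolerate missing features.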

Several examples of feature bundles.

For more information, please contact Stefan Romberg.

References:

  • Stefan Romberg, Lluis Garcia Pueyo, Rainer Lienhart, Roelof van Zwol. Scalable Logo Recognition in Real-World Images. ACM International Conference on Multimedia Retrieval 2011 (ICMR11), Trento, April 2011.
    Also Technical Report 2011-04, University of Augsburg, Institute of Computer Science, March 2011 [PDF] [Slides] [Dataset]

Learning to Reassemble Shredded Documents

All images are taken from 'Bild der Wissenschaft 08/2010'
The problem of having to reconstruct shredded documents is often faced by historians and forensic investigators. For instance, there is currently ongoing work on reassembling documents related to the Stasi, the secret police of the GDR.

However, reconstructing documents is a difficult and laborious job due to the large number of possible arrangements of the fragments. For this reason, this project deals with automating the reassembly process, incorporating various local image features as well as combinatorial optimization strategies.
Our approach is evaluated on a real-world dataset consisting of magazine pages that have been shredded by hand.
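To illustrate the combinatorial core, the sketch below chains vertical strips greedily by matching the pixel columns along their cut edges. This greedy heuristic is only a stand-in; the referenced work formulates reassembly as a graph algorithmic problem, and the toy gradient "page" is fabricated for the demo.

```python
import numpy as np

def edge_cost(right_col, left_col):
    """Dissimilarity between the right edge of one strip and the left
    edge of another (sum of squared pixel differences)."""
    return float(((right_col - left_col) ** 2).sum())

def greedy_order(strips):
    """Greedily chain strips left-to-right by the cheapest edge match,
    starting from strip 0."""
    remaining = list(range(1, len(strips)))
    order = [0]
    while remaining:
        last_edge = strips[order[-1]][:, -1]        # rightmost pixel column
        best = min(remaining,
                   key=lambda i: edge_cost(last_edge, strips[i][:, 0]))
        order.append(best)
        remaining.remove(best)
    return order

# Toy "page": a smooth horizontal gradient cut into 3 strips, then shuffled
page = np.tile(np.arange(12.0), (5, 1))
strips = [page[:, :4], page[:, 8:], page[:, 4:8]]   # true order: 0, 2, 1
print(greedy_order(strips))  # [0, 2, 1]
```

Greedy chaining fails as soon as a locally cheap match is globally wrong, which is exactly why the project resorts to graph-based combinatorial optimization over all pairwise edge costs instead.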

For more information, please contact Fabian Richter.

References:

  • Fabian Richter, Christian X. Ries, Rainer Lienhart. A Graph Algorithmic Framework for the Assembly of Shredded Documents. IEEE International Conference on Multimedia and Expo 2011 (ICME11), Barcelona, July 2011
    Also Technical Report 2011-05, University of Augsburg, Institute of Computer Science, March 2011 [PDF]

Image Retrieval on Large Scale Image Databases

Nowadays there exist online image repositories containing hundreds of millions of images of all kinds of quality, size, and content.

These image repositories grow day by day, making efficient techniques for navigating, indexing, and searching them indispensable. Currently, indexing is mainly based on manually entered tags and/or individual and group usage patterns. Manually entered tags, however, are very subjective and do not necessarily refer to the shown image content. This subjectivity and ambiguity of tags makes image retrieval based on manually entered tags difficult.

In this project we employ the image content as the source of information to retrieve images and study the representation of images by topic models. The developed approaches are evaluated on real world, large scale image databases.
Main retrieval results.
References:
  • Rainer Lienhart, Eva Hörster, Stefan Romberg. Multilayer pLSA for Multimodal Image Retrieval. ACM International Conference on Image and Video Retrieval (CIVR 2009), July 8-10, 2009.
    Also Technical Report 2009-02, University of Augsburg, Institute of Computer Science Apr. 2009 [PDF]
  • Eva Hörster, Rainer Lienhart and Malcolm Slaney. Image Retrieval on Large-Scale Image Databases. ACM International Conference on Image and Video Retrieval (CIVR) 2007, pp. 17-24, Amsterdam, Netherlands, July 2007. Also Technical Report, Apr. 2007. [PDF]
  • Eva Hörster and Rainer Lienhart. Fusing Local Image Descriptors for Large-Scale Image Retrieval. International Workshop on Semantic Learning Applications in Multimedia (SLAM), Minneapolis, USA, June 2007. Also Technical Report. [PDF]
  • Rainer Lienhart and Malcolm Slaney. PLSA on Large Scale Image Databases. IEEE International Conference on Acoustics, Speech and Signal Processing 2007 (ICASSP 2007), Hawaii, USA, April 2007. Also Technical Report, Dec. 2006. [PDF]

An annotated data set for pose estimation of swimmers

In this work we present an annotated data set for two-dimensional pose estimation of swimmers. The data set contains fifteen cycles of swimmers swimming backstroke, with more than 1200 annotated video frames. A wide variety of subjects was used to create this data set, ranging from adult to teenage swimmers, both male and female. For each frame of a cycle, the absolute positions of fourteen points corresponding to human joints were manually labeled.

The data set proves to be very challenging with respect to partial occlusions and a high amount of background noise; however, it does not contain any out-of-plane motions that would further complicate the task of full-body pose estimation. It is thus aimed at pose estimation and pose tracking algorithms that try to advance the field of recovering human poses in videos with frequently missing parts and under difficult conditions.

We explain in detail the creation of the data set, discuss the difficulties we faced, and finally demonstrate how it is used to create a training data set containing normalized cycles for action-specific pose tracking.



The data set is available for download.

References:
  • Thomas Greif and Rainer Lienhart. An Annotated Data Set for Pose Estimation of Swimmers. Technical Report, 2009. [PDF]

For more information, please contact Thomas Greif.

On the Optimal Placement of Multiple Visual Sensors

Visual sensor arrays are used in many novel multimedia applications such as video surveillance, sensing rooms, assisted living or immersive conference rooms. Often several different types of cameras are available. They differ in their ranges of view, intrinsic parameters, image sensor resolutions, optics, and costs.

Most of the above-mentioned applications require the layout of video sensors to assure a minimum level of image quality or image resolution. Thus, an important issue in designing visual sensor arrays is the appropriate placement of the cameras such that they achieve one or multiple predefined goals. As video sensor arrays are getting larger, efficient camera placement strategies need to be developed.
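One standard strategy for this kind of placement problem, sketched below under strong simplifying assumptions, is greedy selection over a discretized space: each candidate camera configuration covers a set of grid cells, and we repeatedly pick the configuration that adds the most uncovered cells. The candidate sets here are toy data, not output of a real visibility model.

```python
def greedy_placement(candidates, k):
    """Greedily pick up to k camera configurations maximizing covered area.

    candidates : dict mapping a camera configuration id to the set of
                 grid cells it covers (from some visibility model)
    Returns the chosen ids and the union of their covered cells.
    """
    chosen, covered = [], set()
    for _ in range(k):
        best = max(candidates, key=lambda c: len(candidates[c] - covered))
        if not candidates[best] - covered:
            break                       # no configuration adds new coverage
        chosen.append(best)
        covered |= candidates[best]
    return chosen, covered

cams = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {5, 6}, "D": {1, 2}}
chosen, covered = greedy_placement(cams, k=2)
print(chosen, sorted(covered))  # ['A', 'C'] [1, 2, 3, 4, 5, 6]
```

Because coverage is a submodular set function, this greedy scheme carries the classic (1 - 1/e) approximation guarantee relative to the optimal placement, which is what makes it attractive for large sensor arrays where exhaustive search is infeasible.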
Resulting camera configuration computed by the greedy approach.
For more information on optimal camera placement, please contact Eva Hörster.

Audio Brush: What You See is What You Hear

Hearing, analyzing, and evaluating sounds is possible for everyone. The reference sensor for audio, the human ear, has amazing capabilities and high quality. In contrast, editing and synthesizing audio is an indirect and non-intuitive task requiring great expertise.

To overcome these limitations we are creating Audio Brush, a smart visual audio editing tool. Audio Brush allows editing the spectrogram of a sound in the visual domain, similar to editing bitmaps. At its core is a very flexible audio spectrogram based on Gabor analysis and synthesis. It gives maximum representation accuracy, is fully invertible, and enables manipulating the signal at any chosen time-frequency resolution.
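The edit-in-the-spectrogram-and-invert idea can be sketched with a plain windowed FFT and overlap-add synthesis. This is a much-simplified stand-in for the project's flexible Gabor transform: fixed window, fixed hop, and a toy "erase high frequencies" edit.

```python
import numpy as np

def stft(x, n=256):
    """Analysis: Hann-windowed FFT frames with 50% overlap."""
    hop = n // 2
    win = np.hanning(n)
    return np.array([np.fft.rfft(win * x[i:i + n])
                     for i in range(0, len(x) - n + 1, hop)])

def istft(S, n=256):
    """Synthesis by overlap-add; dividing by the summed window makes the
    round-trip exact wherever frames overlap."""
    hop = n // 2
    win = np.hanning(n)
    length = hop * (len(S) - 1) + n
    x = np.zeros(length)
    norm = np.zeros(length)
    for k, frame in enumerate(S):
        x[k * hop:k * hop + n] += np.fft.irfft(frame, n)
        norm[k * hop:k * hop + n] += win
    return x / np.maximum(norm, 1e-8)

fs = 8000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 440 * t)     # a 440 Hz tone
S = stft(x)
S[:, 40:] = 0                       # "visual" edit: erase all bins above 40
y = istft(S)                        # back to a (filtered) time signal
print(S.shape)  # (7, 129)
```

In Audio Brush the same loop runs behind a graphical editor: the user paints on |S|, and the modified coefficients are resynthesized into audio.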
Audio Brush screenshot.
For more information on Audio Brush, please contact Gregor van den Boogaart.


Real-Time Event Detection and Control in Live Video Streams

Nowadays it is very common for public places such as pubs, restaurants, and fitness clubs to have large TV screens to entertain their customers, especially during national or international sports championships. For venue owners, it would be desirable to control which commercials are shown to their audience. In other words, they may wish to replace untargeted commercials with targeted commercials of their choice.

In this joint project with Half Minute Media Ltd., we research algorithms for robust real-time commercial detection and control (such as replacement) in live streams. We are especially developing fast and extremely reliable algorithms for
  • mining video channels automatically in order to extract all commercials, and
  • detecting known commercials in live streams using highly compact but discriminative clip descriptors.
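A toy sketch of the compact-descriptor idea: reduce every frame to a small binary signature and detect a known commercial as a run of matching signatures in the live stream. The signature scheme and the toy frames are illustrative assumptions, not the project's actual descriptors.

```python
def frame_signature(frame, bits=8):
    """Compact per-frame descriptor: threshold each cell of a coarse
    intensity grid against the frame mean and pack the bits into an int.

    frame : list of cell intensities (a very coarse downsampled frame)
    """
    mean = sum(frame) / len(frame)
    sig = 0
    for v in frame[:bits]:
        sig = (sig << 1) | (v > mean)
    return sig

def find_clip(stream_sigs, clip_sigs):
    """Return the start index of a known clip's signature sequence inside
    the stream's signature sequence, or -1 if it does not occur."""
    n = len(clip_sigs)
    for i in range(len(stream_sigs) - n + 1):
        if stream_sigs[i:i + n] == clip_sigs:
            return i
    return -1

commercial = [[10, 200, 30, 220, 15, 210, 25, 230]] * 3  # 3 toy frames
stream = [[50] * 8] * 5 + commercial + [[50] * 8] * 4    # commercial at t=5
clip_sigs = [frame_signature(f) for f in commercial]
stream_sigs = [frame_signature(f) for f in stream]
print(find_clip(stream_sigs, clip_sigs))  # 5
```

A production system would hash such signatures for constant-time lookup and tolerate a few mismatching frames per clip; the exact-match scan here only conveys the matching principle.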

References:

  • Rainer Lienhart, Christoph Kuhmünch and Wolfgang Effelsberg. On the Detection and Recognition of Television Commercials. Proc. IEEE Conf. on Multimedia Computing and Systems, Ottawa, Canada, pp. 509-516, June 1997. Also Technical Report TR-96-016, University of Mannheim, December 1996.

Bayesian Face Recognition on Infrared Image Data

The availability of high-performance, low-cost desktop computing systems and digital camera equipment has given rise to public interest in applications that include the visual identification of human individuals. Examples of such applications are surveillance, biometric identification, and human-computer interaction.

To that effect, research in biometric technologies follows naturally. Compared to other methods, images of human faces offer a non-intrusive and easy-to-use means of identification. Although the recognition of faces is a problem that is effortlessly solved by human beings during their daily routine, it poses a challenge for researchers and scientists. Boundary conditions like illumination and occlusion, as well as the pose and expression of an individual, lead to intrapersonal variations that often exceed those between images of different persons under similar conditions.

In association with Falcontrol Security GmbH, we are researching reliable face recognition algorithms using Bayesian methods on infrared image data.

Parallel Algorithms for Fast Machine Learning

Machine learning applications are emerging as the most promising approaches to many current problems in computer science. However, machine learning algorithms typically require the processing of large data sets and thus long training times (sometimes on the order of several days or even weeks). Especially for newly developed approaches, high-performance implementations are not available; most implementations are designed with a serial model of execution in mind.
At the same time, shared-memory multiprocessing architectures are becoming more and more commonplace. The computational power of these machines could be used to solve machine learning problems much faster and in parallel, if we only knew how to properly exploit it.

The goal of our research is to reduce training times and speed up machine learning algorithms by developing design patterns and strategies for parallelizing them on multiprocessor computers.
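One of the simplest such patterns is data parallelism: split the training data into shards, compute partial results (here, gradients of a toy least-squares model) in separate processes, and reduce them into one update. This sketch illustrates the map-reduce pattern only; it is not one of the project's actual designs.

```python
from multiprocessing import Pool

def partial_gradient(args):
    """Gradient of the squared error for the model y = w*x on one shard."""
    w, shard = args
    return sum(2 * (w * x - y) * x for x, y in shard)

def parallel_gradient_step(w, shards, lr=0.01, workers=2):
    """Data parallelism: each shard's gradient is computed in its own
    process; the partial gradients are then summed (map-reduce)."""
    with Pool(workers) as pool:
        grads = pool.map(partial_gradient, [(w, s) for s in shards])
    n = sum(len(s) for s in shards)
    return w - lr * sum(grads) / n

data = [(x, 3.0 * x) for x in range(1, 9)]   # ground-truth weight is 3
shards = [data[:4], data[4:]]                # two shards, two worker processes
w = 0.0
for _ in range(20):
    w = parallel_gradient_step(w, shards)
print(round(w, 2))  # 3.0
```

Because the reduction is a plain sum, the parallel update is mathematically identical to the serial one; the engineering challenge the project targets is keeping such equivalences while actually saturating the available cores.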