Our lab is currently funded by several agencies. For past funding, please see below.

ERC Consolidator Grant

Robust, Explainable Deep Networks in Computer Vision

Our research lab is supported in significant part through an ERC Consolidator Grant.

This ERC-funded project is concerned with the development of more robust and explainable deep learning models. Particular goals include bridging the domain gap in images and increasing user trust, e.g., in the context of autonomous driving.

Funding duration: 2020 – 2025

Deep learning approaches, mostly in the form of convolutional neural networks (CNNs), have taken the field of computer vision by storm. While the progress in recent years has been astounding, it would be dangerous to believe that important problems in computer vision are close to being solved. Many canonical deep networks for vision tasks ranging from image understanding to 3D reconstruction or motion estimation perform incredibly well “on dataset”, i.e., in the very setting in which they have been trained. The generalization to novel, related scenarios is still lacking, however. Moreover, large amounts of labeled data are required for training, which are not available in all potential application areas. In addition, the majority of deep networks in computer vision show deficiencies in terms of explainability. That is, the role of network components is often opaque and most deep networks in vision do not output reliable quantifications of the uncertainty of the prediction, limiting the comprehension by users. In this project, we aim to significantly advance deep networks in computer vision toward improved robustness and explainability. To that end, we will investigate structured network architectures, probabilistic methods, and hybrid generative/discriminative models, all with the goal of increasing robustness and gaining explainability. This is accompanied by research on how to assess robustness and aspects of explainability via appropriate datasets and metrics. While we aim to develop a toolbox that is as independent of specific tasks as possible, the work program is grounded in concrete vision problems to monitor progress. We specifically consider the challenges of 3D scene analysis from images and video, including tasks such as panoptic segmentation, 3D reconstruction, and motion estimation. We expect the project to have significant impact in applications of computer vision where robustness is key, data is limited, and user trust is paramount.


The Adaptive Mind

Website: The Adaptive Mind

Funding duration: 2021 – 2025


The Third Wave of Artificial Intelligence

Researchers at Darmstadt University of Technology want to pioneer a new era in the development of artificial intelligence (AI): These AI systems will acquire human-like communication and thinking skills, recognize and classify new situations, and adapt to them autonomously. The novel AI systems will not only be able to learn, but they will also grasp facts and link them to forms of abstract thinking. They will draw logical conclusions and make contextual decisions and learn from them again. This new future perspective is referred to as the “third wave of AI” in reference to the two previous developmental thrusts in artificial intelligence.

Funding duration: 2021 – 2025


Explainable models for human and artificial intelligence

The LOEWE Research Cluster WhiteBox is aimed at developing methods at the intersection between Cognitive Science and AI to make human and artificial intelligence more understandable.

Funding duration: 2021 – 2024

Until a few years ago, intelligent systems such as robots and digital voice assistants had to be tailored towards narrow and specific tasks and contexts. Such systems needed to be programmed and fine-tuned by experts. But recent developments in artificial intelligence have led to a paradigm shift: instead of explicitly representing knowledge about all information processing steps at development time, machines are endowed with the ability to learn. With the help of machine learning, it is possible to leverage large numbers of data samples, which hopefully transfer to new situations via pattern matching. Groundbreaking achievements in performance have been obtained in recent years with deep neural networks, whose functionality is inspired by the structure of the human brain. A large number of artificial neurons, interconnected and organized in layers, process input data at large computational cost. Although experts understand the inner workings of such systems, as they have designed the learning algorithms, they are often unable to explain or predict the system’s intelligent behavior due to its complexity. Such systems end up as black boxes, raising the question of how their decisions can be understood and trusted.


Emergency Responsive Digital Cities

Started in 2020, the LOEWE center emergenCITY is researching resilient infrastructures of digital cities that can withstand crises and disasters. emergenCITY is organized as an interdisciplinary and multi-site cooperation led by Technische Universität Darmstadt, Universität Kassel, and Philipps-Universität Marburg as well as the Federal Office of Civil Protection and Disaster Assistance and the City of Darmstadt. The center partners with several other institutions from academia, industry, and public administration.

Funding duration: 2020 – 2023


Intel Network on Intelligent Systems (NIS)

Member since 2017

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.

Past Projects

The following projects have been completed:

ERC Starting Grant

Visual Learning and Inference in Joint Scene Models (VISLIM)

Our research lab was supported in significant part through an ERC Starting Grant.

This ERC-funded project is concerned with the joint estimation of several scene attributes from one or more images, with the aim of leveraging their dependencies. The project covers aspects of modeling, learning, and inference in such models.

Funding duration: 2013 – 2018

One of the principal difficulties in processing, analyzing, and interpreting digital images is that many attributes of visual scenes relate in complex ways. Despite that, the vast majority of today's top-performing computer vision approaches estimate a particular attribute (e.g., motion, scene segmentation, restored image, object presence, etc.) in isolation; other pertinent attributes are either ignored or crudely pre-computed by ignoring any mutual relation. But since estimating a single attribute of a visual scene from images is often highly ambiguous, there is substantial potential benefit in estimating several attributes jointly.

The goal of this project is to develop the foundations of modeling, learning and inference in rich, joint representations of visual scenes that naturally encompass several of the pertinent scene attributes. Importantly, this goes beyond combining multiple cues, but rather aims at modeling and inferring multiple scene attributes jointly to take advantage of their interplay and their mutual reinforcement, ultimately working toward a full(er) understanding of visual scenes. While the basic idea of using joint representations of visual scenes has a long history, it has only rarely come to fruition. VISLIM aims to significantly push the current state of the art by developing a more general and versatile toolbox for joint scene modeling that addresses heterogeneous visual representations (discrete and continuous, dense and sparse) as well as a wide range of levels of abstractions (from the pixel level to high-level abstractions). This is expected to lead joint scene models beyond conceptual appeal to practical impact and top-level application performance. No other endeavor in computer vision has attempted to develop a similarly broad foundation for joint scene modeling. In doing so we aim to move closer to image understanding, with significant potential impact in other disciplines of science, technology and humanities.

Smiths Detection

Collaborative research

Funding duration: 2016 – 2018


Faculty Support Program

Funding duration: 2015 – 2017

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.


Harvesting Dynamic 3D Worlds from Commodity Sensor Clouds (Harvest4D)

This EU-funded STREP is concerned with harvesting data from commodity sensor clouds, such as cell phones and inexpensive range sensors, for capturing 3D models of our dynamic world.

Partners: TU Vienna, TU Darmstadt, CNR, U Bonn, TelecomParisTech, TU Delft
Funding period: 2013 – 2016

Project page

The current acquisition pipeline for visual models of 3D worlds is based on a paradigm of planning a goal-oriented acquisition – sampling on site – processing. The digital model of an artifact (an object, a building, up to an entire city) is produced by planning a specific scanning campaign, carefully selecting the (often costly) acquisition devices, performing the on-site acquisition at the required resolution and then post-processing the acquired data to produce a beautified triangulated and textured model. However, in the future we will be faced with the ubiquitous availability of sensing devices that deliver different data streams that need to be processed and displayed in a new way, for example smartphones, commodity stereo cameras, cheap aerial data acquisition devices, etc.

We therefore propose a radical paradigm change in acquisition and processing technology: instead of a goal-driven acquisition that determines the devices and sensors, we let the sensors and resulting available data determine the acquisition process. Data acquisition might become incidental to other tasks carried out by the devices or people to which sensors are attached. A variety of challenging problems need to be solved to exploit this huge amount of data, including: dealing with continuous streams of time-dependent data, finding means of integrating data from different sensors and modalities, detecting changes in data sets to create 4D models, harvesting data to go beyond simple 3D geometry, and researching new paradigms for interactive inspection capabilities with 4D data sets. In this project, we envision solutions to these challenges, paving the way for affordable and innovative uses of information technology in an evolving world sampled by ubiquitous visual sensors.

Our approach is high-risk and an enabling factor for future visual applications. The focus is clearly on basic research questions to lay the foundation for the new paradigm of incidental 4D data capture.

DFG Research Training Group

Cooperative, Adaptive and Responsive Monitoring in Mixed Mode Environments (GRK 1362)

The DFG-funded Research Training Group addresses fundamental scientific and technological challenges arising from the collaboration of networked autonomous entities that accomplish a common task through actively monitoring the environment and through requisite responses via a variety of stationary and mobile sensors/actuators. The sensing/actuating entities (and the collaborative system thus formed) monitor, acquire, manage and disseminate data with the goal of deriving higher level (context/event) information upon which the system can respond appropriately.

Funding duration: 2006 – 2015

Microsoft Research Scholarship Program

Microsoft Research provided a PhD scholarship to Uwe Schmidt.

Project partners: Microsoft Research Cambridge
Funding duration: 2011 – 2013

German Ministry of Education & Research (BMBF)

Sicherheits-Untersuchungen mittels Röntgenbild-Analyse (SICURA)

This project was concerned with the automatic detection of objects in multiple-view X-ray images.

Project partners: Smiths Detection, U Kaiserslautern, U of Applied Sciences Rhein Main, U Frankfurt
Funding duration: 2010 – 2013

German Research Foundation (DFG)

Heinz Maier-Leibnitz-Prize

Funding duration: 2012

Adolf Messer Foundation

Adolf Messer Prize

Funding duration: 2011