Vision and Language

Vision-language models (VLMs) are multimodal architectures that jointly comprehend image and text data. Building on computer vision (CV) and natural language processing (NLP) models, they correlate information, including embeddings, across the visual and linguistic domains. Our research refines this synergy: we integrate scene graphs for contextual depth, deploy captioning techniques for detailed descriptions, advance visual question answering for nuanced understanding, and enhance the capabilities of VLMs through instruction tuning and synthetic data augmentation that leverages foundation models. A minimal sketch of how such cross-modal embedding alignment can be used in practice is given below.
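
To illustrate the idea of correlating visual and linguistic embeddings, the following sketch scores candidate captions against an image with a CLIP-style model. It is an illustrative example, not the group's own method: the checkpoint name "openai/clip-vit-base-patch32", the Hugging Face transformers API, and the blank placeholder image are all assumptions made for a self-contained demo.

    # Hedged sketch: rank captions by similarity to an image using a
    # CLIP-style dual encoder (illustrative, not this group's method).
    import torch
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # Blank placeholder image for a runnable demo; substitute a real photo.
    image = Image.new("RGB", (224, 224), color="white")
    captions = ["a dog playing fetch", "a city skyline at night"]

    # Encode both modalities in one pass and score each caption
    # against the image.
    inputs = processor(text=captions, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # logits_per_image has one row per image and one column per caption;
    # softmax converts the similarity scores into a distribution.
    probs = outputs.logits_per_image.softmax(dim=-1)
    for caption, p in zip(captions, probs[0].tolist()):
        print(f"{p:.3f}  {caption}")

The same aligned embedding space underlies the downstream tasks named above, such as captioning and visual question answering, where image features condition text generation rather than retrieval.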

Recent Publications

Please refer to DBLP or Google Scholar for reasonably current publication lists.

Experts

Name                          Contact
Jan-Martin Steitz M.Sc.       +49 6151 16-21424, S2|02 A326
Gopika Sudhakaran M.Sc.
Simone Schaub-Meyer Dr. Sc.   +49 6151 16-25411, S2|02 A306
Prof. Stefan Roth, Ph.D.      +49 6151 16-21425, S2|02 A304