Learning Scene and Video Understanding with Limited Labelled Data

Dec 2, 2022


The goal of image and video understanding is to make inferences about the surrounding world from the corresponding image or video data, e.g., identifying and localizing objects. It also extends to identifying actions and recognizing relations between objects. The area has attracted great attention in the research community, both because of its widespread applications (e.g., automated driving and robotics) and because of the fascinating scientific and engineering challenges it brings (e.g., designing a system that can learn about the 3D, time-varying world from a mere video). Deep learning has driven great advances in scene and video understanding in recent years. A major limitation of most such approaches is that they require large-scale labelled data for learning, and annotation can be expensive, especially for pixel-wise segmentation masks in videos. In this talk, I will focus on how to learn scene and video understanding from only a few labelled examples, and on using the interpretability of deep spatiotemporal models to gain insights into how to improve their generalization capabilities. This research direction has the potential to decolonize Computer Vision by enabling developing countries with limited resources and labelled data to contribute to the field and to work on applications that serve their own communities.

NeurIPS 2022