DriveCLIP: Zero-shot transfer for distracted driving activity understanding using CLIP

Dec 2, 2022



Recognizing distracted driving actions from naturalistic driving video is crucial to the safety of both drivers and pedestrians. However, traditional computer vision techniques often require heavy supervision in the form of large amounts of annotated training data to detect distracted driving activities. Recently, vision-language models have offered large-scale visual-textual pretraining that can be adapted to task-specific learning, such as distracted-activity recognition, without additional annotation. Contrastive image-text pretraining models such as CLIP have shown significant promise in learning visual representations guided by natural language. In this paper, we propose a CLIP-based driver activity recognition framework that predicts whether a driver is distracted while driving. CLIP's vision embedding enables zero-shot transfer, identifying distracted activities directly from driving videos. Our results suggest that this framework achieves state-of-the-art (SOTA) zero-shot performance for predicting driver state on three public datasets. We also develop DriveCLIP, a classifier built on top of CLIP's visual representations for distracted-driving detection, and report its results.
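The zero-shot transfer the abstract describes works by scoring a frame's CLIP image embedding against text embeddings of per-class prompts and picking the closest one. The sketch below illustrates that scoring step only, using NumPy placeholder vectors in place of real CLIP encoder outputs; the class prompts and the `zero_shot_predict` helper are hypothetical illustrations, not the paper's actual prompts or code.

```python
import numpy as np

def zero_shot_predict(image_emb, text_embs, temperature=100.0):
    """CLIP-style zero-shot scoring: cosine similarity between one image
    embedding and per-class text embeddings, softmaxed into class probabilities."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                              # cosine similarity per class
    logits = temperature * sims                   # CLIP uses a learned temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs

# Hypothetical driver-state prompts (illustrative, not from the paper)
prompts = ["a photo of a driver driving safely",
           "a photo of a driver texting on a phone",
           "a photo of a driver drinking while driving"]

# Placeholder embeddings standing in for CLIP encoder outputs
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 8))
image_emb = text_embs[1] + 0.05 * rng.normal(size=8)  # frame closest to class 1

pred, probs = zero_shot_predict(image_emb, text_embs)
print(prompts[pred])
```

In the real pipeline the two embeddings would come from CLIP's image and text encoders; because no distracted-driving labels are needed at training time, the classifier is defined entirely by the choice of prompts.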


Presented at NeurIPS 2022.