Zero-Shot Video Question Answering via Frozen Bidirectional Language Models - antoyang.github.io

## Metadata
- Author: **antoyang.github.io**
- Full Title: Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
- Category: #articles
- Tags: #ai
- URL: https://antoyang.github.io/frozenbilm.html

## Highlights
- In particular, (i) we combine visual inputs with the frozen BiLM using light trainable modules,
(ii) we train such modules using Web-scraped multi-modal data, and finally
(iii) we perform zero-shot VideoQA inference through masked language modeling, where the masked text is the answer to a given question.
Our proposed approach, FrozenBiLM, outperforms the state of the art in zero-shot VideoQA by a significant margin on a variety of datasets, including LSMDC-FiB, iVQA, MSRVTT-QA, MSVD-QA, ActivityNet-QA, TGIF-FrameQA, How2QA and TVQA.
It also demonstrates competitive performance in the few-shot and fully-supervised settings.
Our code and models will be made publicly available.
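
Step (iii) in the highlight above, answering through masked language modeling, can be illustrated with a small sketch. This is not the authors' code: the prompt template, the function names (`build_prompt`, `score_answers`, `predict_answer`), and the toy logits are all assumptions made for illustration. In the real system the logits at the mask position would come from the frozen bidirectional language model conditioned on the video features; here they are hard-coded so the example runs on its own.

```python
import math
from typing import Dict, List

MASK = "[MASK]"


def build_prompt(question: str) -> str:
    """Turn a question into a cloze-style prompt (hypothetical template)."""
    return f"Question: {question} Answer: {MASK}."


def score_answers(mask_logits: Dict[str, float], candidates: List[str]) -> Dict[str, float]:
    """Softmax over the candidate answers only, using logits at the mask position.

    `mask_logits` maps vocabulary tokens to logits; in a real pipeline it would
    be produced by the language model, here it is supplied by the caller.
    """
    logits = [mask_logits[c] for c in candidates]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - top) for value in logits]
    total = sum(exps)
    return {c: e / total for c, e in zip(candidates, exps)}


def predict_answer(mask_logits: Dict[str, float], candidates: List[str]) -> str:
    """Return the candidate answer with the highest probability at the mask."""
    probs = score_answers(mask_logits, candidates)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    prompt = build_prompt("What animal is running in the video?")
    # Toy logits standing in for the model output at the [MASK] position.
    toy_logits = {"the": 3.0, "dog": 2.0, "cat": 0.5, "car": -1.0}
    answer = predict_answer(toy_logits, ["dog", "cat", "car"])
    print(prompt.replace(MASK, answer))
```

Restricting the softmax to a fixed answer vocabulary means a frequent but irrelevant token (`"the"` in the toy logits) cannot be selected, even though its raw logit is the largest.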