# Metadata
Source URL:: https://antoyang.github.io/frozenbilm.html
Topics:: #ai
---
# Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
## Highlights
> [!quote]+ Updated on 061022_105714
>
> In particular, (i) we combine visual inputs with the frozen BiLM using light trainable modules,
> (ii) we train such modules using Web-scraped multi-modal data, and finally
> (iii) we perform zero-shot VideoQA inference through masked language modeling, where the masked text is the answer to a given question.
> Our proposed approach, FrozenBiLM, outperforms the state of the art in zero-shot VideoQA by a significant margin on a variety of datasets, including LSMDC-FiB, iVQA, MSRVTT-QA, MSVD-QA, ActivityNet-QA, TGIF-FrameQA, How2QA and TVQA.
> It also demonstrates competitive performance in the few-shot and fully-supervised setting.
> Our code and models will be made publicly available.
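To make steps (i) and (iii) of the highlight concrete, here is a minimal, self-contained sketch. It shows visual features passed through a small adapter, prepended to a text prompt, and answered zero-shot by filling a `[MASK]` token, with scores restricted to a set of candidate answers. Everything in it is a hypothetical toy and not the paper's architecture or code:

- `ToyFrozenBiLM` is a stand-in for the frozen bidirectional LM. Its fixed embeddings are pooled over all positions, which crudely mimics bidirectional context.
- `VisualAdapter` is a single linear map standing in for the "light trainable modules". Its weights are only initialised here, because the training step (ii) is not shown.
- The prompt template, vocabulary and candidate list are also assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _normalise(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


class ToyFrozenBiLM:
    """Hypothetical stand-in for a frozen bidirectional masked LM (NOT the real backbone).

    Each token has a fixed unit-length embedding. The [MASK] prediction scores every
    vocabulary item against the mean of ALL input vectors, so it pools context from
    both sides of the mask (a crude proxy for bidirectional attention).
    """

    def __init__(self, vocab: Sequence[str], dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.vocab = list(vocab)
        self.index = {tok: i for i, tok in enumerate(self.vocab)}
        # Frozen parameters: created once and never updated.
        self.emb = [_normalise([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in self.vocab]

    def embed(self, tokens: Sequence[str]) -> List[Vector]:
        return [self.emb[self.index[t]] for t in tokens]

    def mask_logits(self, inputs: Sequence[Vector]) -> List[float]:
        dim = len(inputs[0])
        ctx = [sum(v[d] for v in inputs) / len(inputs) for d in range(dim)]
        return [sum(c * e for c, e in zip(ctx, row)) for row in self.emb]


class VisualAdapter:
    """Hypothetical light trainable module: a linear map from visual features into the
    LM's embedding space. Only these weights would be trained; here they are just initialised."""

    def __init__(self, in_dim: int, out_dim: int, seed: int = 1) -> None:
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]

    def __call__(self, frames: Sequence[Vector]) -> List[Vector]:
        return [[sum(wi * x for wi, x in zip(row, f)) for row in self.w] for f in frames]


def zero_shot_answer(lm: ToyFrozenBiLM, adapter: VisualAdapter, frames: Sequence[Vector],
                     question: Sequence[str], candidates: Sequence[str]) -> Tuple[str, List[float]]:
    """Fill the [MASK] in a (hypothetical) prompt and pick the best candidate answer."""
    prompt = ["question"] + list(question) + ["answer", "[MASK]"]
    inputs = adapter(frames) + lm.embed(prompt)  # projected visual tokens prepended to text
    logits = lm.mask_logits(inputs)
    scores = [logits[lm.index[c]] for c in candidates]  # restrict to the answer candidates
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best], probs


VOCAB = ["[MASK]", "question", "answer", "what", "is", "the", "man", "doing",
         "cooking", "running", "swimming"]
CANDIDATES = ["cooking", "running", "swimming"]
QUESTION = ["what", "is", "the", "man", "doing"]

lm = ToyFrozenBiLM(VOCAB, dim=8)
adapter = VisualAdapter(in_dim=16, out_dim=8)
rng = random.Random(42)
frames = [[rng.random() for _ in range(16)] for _ in range(4)]  # fake per-frame features
answer, probs = zero_shot_answer(lm, adapter, frames, QUESTION, CANDIDATES)
print(answer, [round(p, 3) for p in probs])
```

The LM is never updated, and only the adapter's weights would be learned. Answering is therefore a masked-token prediction over a closed answer set rather than a separately trained classifier head, which mirrors the idea described in the highlight at toy scale.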