Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models - azshue.github.io

## Metadata
- Author: **azshue.github.io**
- Full Title: Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models
- Category: #articles
- Tags: #ai
- URL: https://azshue.github.io/TPT/
## Highlights
- Pre-trained vision-language models (e.g., CLIP) have shown impressive zero-shot generalization on various downstream tasks with properly designed text prompts. Instead of relying on hand-engineered prompts, recent works learn prompts using training data from downstream tasks, but this can be expensive and hard to generalize to new tasks and distributions. To this end, we propose test-time prompt tuning (TPT) as the first prompt tuning method that can learn adaptive prompts on the fly with a single test sample. TPT optimizes the prompt by minimizing the entropy with confidence selection, so that the model makes consistent predictions across different augmented views of each test sample.
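The objective described in the highlight can be sketched numerically: compute class probabilities for several augmented views, keep only the most confident (lowest-entropy) views, and take the entropy of their averaged prediction as the loss to minimize. This is a minimal NumPy sketch under stated assumptions; the function names and the selection fraction are illustrative, not the paper's exact implementation (which back-propagates this loss into CLIP's text prompt).

```python
import numpy as np

def entropy(p):
    # Shannon entropy (natural log) of a probability vector.
    return -np.sum(p * np.log(p + 1e-12))

def tpt_objective(view_probs, percentile=50):
    """Confidence-selected marginal entropy over augmented views.

    view_probs: (N, C) array of class probabilities, one row per
    augmented view of the single test sample.
    percentile: keep views whose entropy is at or below this
    percentile (low entropy = high confidence). Illustrative value;
    the paper filters to a small fraction of the most confident views.
    """
    ents = np.array([entropy(p) for p in view_probs])
    cutoff = np.percentile(ents, percentile)
    selected = view_probs[ents <= cutoff]          # drop unconfident views
    avg = selected.mean(axis=0)                    # marginal prediction
    return entropy(avg)                            # loss to minimize w.r.t. prompt

# Example: two confident views agree on class 0; one near-uniform view is filtered out.
probs = np.array([
    [0.90, 0.05, 0.05],
    [0.34, 0.33, 0.33],   # high-entropy view, excluded by selection
    [0.80, 0.10, 0.10],
])
loss = tpt_objective(probs)
```

In the full method this scalar would be minimized by gradient descent on the learnable prompt tokens while the rest of the model stays frozen; the selection step keeps noisy augmentations from dominating the average.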