# Metadata

Source URL:: https://azshue.github.io/TPT/
Topics:: #ai

---

# Test-Time Prompt Tuning for Zero-shot Generalization in Vision-Language Models

## Highlights

> [!quote]+ Updated on 160922_113359
>
> Pre-trained vision-language models (e.g., CLIP) have shown impressive zero-shot generalization in various downstream tasks with properly designed text prompts. Instead of relying on hand-engineered prompts, recent works learn prompts using training data from downstream tasks, but this can be expensive and hard to generalize to new tasks and distributions. To this end, we propose test-time prompt tuning (TPT) as the first prompt tuning method that can learn adaptive prompts on the fly with a single test sample. TPT optimizes the prompt by minimizing the entropy with confidence selection so that the model has consistent predictions across different augmented views of each test sample.
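The quoted abstract describes TPT's objective: filter out high-entropy (unconfident) augmented views of the test image, then minimize the entropy of the averaged prediction over the remaining views. Below is a minimal, dependency-free Python sketch of that confidence-selected entropy objective. The function names and the `rho` selection ratio are illustrative assumptions, not the paper's API, and the gradient step that TPT takes on the prompt embeddings (via CLIP) is omitted here for brevity.

```python
import math

def entropy(p):
    # Shannon entropy of a probability distribution (list of class probabilities)
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def confidence_select(view_probs, rho=0.5):
    # Keep the rho fraction of augmented views with the LOWEST entropy,
    # i.e. the views the model is most confident about (assumption: rho=0.5,
    # mirroring the "confidence selection" step described in the abstract).
    ranked = sorted(view_probs, key=entropy)
    k = max(1, int(len(ranked) * rho))
    return ranked[:k]

def marginal_entropy(view_probs):
    # Entropy of the averaged (marginal) prediction over the selected views.
    # In TPT this scalar would be minimized w.r.t. the prompt embeddings.
    n_classes = len(view_probs[0])
    avg = [sum(p[i] for p in view_probs) / len(view_probs) for i in range(n_classes)]
    return entropy(avg)

# Toy predictions for 4 augmented views of one test sample, 3 classes
views = [
    [0.70, 0.20, 0.10],  # confident view -> kept
    [0.34, 0.33, 0.33],  # near-uniform, unconfident -> filtered out
    [0.60, 0.30, 0.10],  # confident view -> kept
    [0.30, 0.40, 0.30],  # unconfident -> filtered out
]
selected = confidence_select(views, rho=0.5)
loss = marginal_entropy(selected)
print(len(selected), round(loss, 4))
```

In the actual method, `loss` would be backpropagated through CLIP's text encoder to update only the learnable prompt tokens for this single test sample, leaving the model weights frozen.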