# Metadata
Source URL:: https://arxiv.org/abs/2208.06366
---
# BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Masked image modeling (MIM) has demonstrated impressive results in
self-supervised representation learning by recovering corrupted image patches.
However, most methods still operate on low-level...
## Highlights
> [!quote]+ Updated on 240822_195914
>
> most methods still operate on low-level image pixels, which hinders
> the exploitation of high-level semantics for representation models. In this
> study, we propose to use a semantic-rich visual tokenizer as the reconstruction
> target for masked prediction, providing a systematic way to promote MIM from
> pixel-level to semantic-level. Specifically, we introduce vector-quantized
> knowledge distillation to train the tokenizer, which discretizes a continuous
> semantic space to compact codes
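
The highlighted passage describes the core mechanism: the tokenizer maps each continuous patch feature to the nearest entry of a learned codebook, so the masked-prediction target becomes a discrete, semantic-level code rather than raw pixels. The sketch below only illustrates that nearest-code lookup; the `quantize` helper, the cosine-similarity lookup, and all shapes are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def quantize(features: torch.Tensor, codebook: torch.Tensor):
    """Map continuous patch features to discrete codes (hypothetical sketch).

    features: (num_patches, dim) encoder outputs for the image patches
    codebook: (vocab_size, dim)  learnable code embeddings
    """
    # L2-normalize both sides so the nearest code is the one with the highest
    # cosine similarity (an assumption made for this sketch).
    f = F.normalize(features, dim=-1)
    c = F.normalize(codebook, dim=-1)
    sim = f @ c.t()              # (num_patches, vocab_size) similarity scores
    codes = sim.argmax(dim=-1)   # discrete token id per patch
    quantized = codebook[codes]  # quantized vectors passed on to a decoder
    return codes, quantized

# Toy usage: 196 patches of dim 32, codebook with 8192 entries.
feats = torch.randn(196, 32)
book = torch.randn(8192, 32)
ids, vectors = quantize(feats, book)
```

In tokenizer training, the quantized vectors would then be decoded to match a teacher model's semantic features (the knowledge-distillation part); that training loop is omitted here.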