CLIP
or: Learning Transferable Visual Models From Natural Language Supervision
Contrastive learning applied to matching images with their captions.
from a laptop in Sunnyvale
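
As a rough illustration of what "contrastive" means for image-caption pairs, here is a minimal NumPy sketch of a symmetric contrastive loss over a batch of paired embeddings. It is a sketch under assumptions, not the paper's implementation: the function names `log_softmax` and `contrastive_loss`, the fixed `temperature` value, and the use of plain arrays in place of real image and text encoders are all chosen here for illustration.

```python
import numpy as np

def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # image_emb, text_emb: arrays of shape (batch, dim); row i of each
    # is assumed to come from the same image-caption pair.
    # The fixed temperature is an assumption made for this sketch.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Cosine similarity of every image with every caption in the batch.
    logits = image_emb @ text_emb.T / temperature

    # The correct caption for image i sits on the diagonal.
    idx = np.arange(logits.shape[0])
    loss_image_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_text_to_image = -log_softmax(logits, axis=0)[idx, idx].mean()
    return (loss_image_to_text + loss_text_to_image) / 2
```

The loss is low when each image is most similar to its own caption and high when it is more similar to someone else's, which is the whole training signal: no class labels, only the pairing.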