Learning with Enriched Inductive Biases for Vision-Language Models

Research Background and Problem Statement

In recent years, Vision-Language Models (VLMs) have made significant progress in the fields of computer vision and natural language processing. These models are pre-trained on large-scale image-text pairs to construct a unified multimodal re...