blip-vqa-base
BLIP-VQA-Base is Salesforce's visual question answering model, the base-size variant from the BLIP (Bootstrapping Language-Image Pre-training) paper. It takes an image and a natural-language question as input and generates a short text answer. It is released under the BSD-3-Clause license, which permits commercial use, and is available for both PyTorch and TensorFlow.
416,948 downloads · 194 likes
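The image-plus-question-in, short-answer-out flow described above can be sketched with the Hugging Face `transformers` library. This is a minimal sketch, not the authors' reference usage: it assumes the checkpoint is published on the Hub as `Salesforce/blip-vqa-base`, that `transformers`, `torch`, and `Pillow` are installed, and it uses the PyTorch classes `BlipProcessor` and `BlipForQuestionAnswering`. The helper names `load_blip_vqa`, `answer_question`, and `demo`, and the file path in the usage note, are hypothetical.

```python
# Assumed Hub identifier for this model; adjust if the checkpoint lives elsewhere.
MODEL_ID = "Salesforce/blip-vqa-base"


def load_blip_vqa(model_id=MODEL_ID):
    """Load the processor and the PyTorch VQA model (downloads weights on first use)."""
    # Imported lazily so the rest of this module works without transformers installed.
    from transformers import BlipForQuestionAnswering, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_id)
    model = BlipForQuestionAnswering.from_pretrained(model_id)
    return processor, model


def answer_question(processor, model, image, question):
    """Return the model's short text answer to `question` about `image`."""
    # The processor prepares the image and tokenizes the question together.
    inputs = processor(image, question, return_tensors="pt")
    # The answer is generated as token ids, then decoded back to text.
    output_ids = model.generate(**inputs)
    return processor.decode(output_ids[0], skip_special_tokens=True)


def demo(image_path, question="How many dogs are in the picture?"):
    """Hypothetical helper: answer one question about a local image file."""
    from PIL import Image

    processor, model = load_blip_vqa()
    image = Image.open(image_path).convert("RGB")
    return answer_question(processor, model, image, question)
```

Calling `demo("photo.jpg")` with a local image of your own would print-ready return a short answer string. Loading is kept separate from answering so that one loaded model can serve many questions.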