Qwen3-VL-2B-Instruct
60 models · ranked by HuggingFace downloads
Qwen3-VL-2B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen2.5-VL-7B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.5-9B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
gemma-4-31B-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
gemma-4-26B-A4B-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.5-4B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Kimi-K2.5 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3-VL-8B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen2-VL-2B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.5-35B-A3B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen2.5-VL-3B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
gemma-4-26B-A4B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.5-27B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.5-0.8B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
llava-1.5-7b-hf is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
moondream2 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
gemma-3-12b-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3-VL-4B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
DeepSeek-OCR is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
gemma-3-4b-it is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen2-VL-7B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
gemma-4-31B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.6-35B-A3B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
gemma-4-E4B-it-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.6-35B-A3B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.6-35B-A3B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.5-2B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen2-VL-7B-Instruct-AWQ is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Phi-3.5-vision-instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
MinerU2.5-2509-1.2B is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.5-35B-A3B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
DeepSeek-OCR-2 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.5-397B-A17B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3.5-27B-FP8 is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3-VL-235B-A22B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
Qwen3-VL-32B-Instruct is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
InternVL2-2B is a compact vision-language model that combines a 300M-parameter vision encoder (InternViT) with a 1.8B-parameter language model (InternLM2), enabling multimodal understanding at roughly 2B total parameters. It is designed for efficient deployment while maintaining strong performance on vision-language tasks across multiple languages.
Qwen3.5-35B-A3B-GGUF is an open-source image-text-to-text model available on HuggingFace. Details are sourced from the public model registry.
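A listing like the one above, ranked by downloads, could in principle be rebuilt from the public registry. The following is a minimal sketch, not the method this page actually uses. It assumes the Hugging Face Hub's public `/api/models` endpoint accepts the `pipeline_tag`, `sort`, `direction`, and `limit` query parameters and returns a JSON array of records with a `downloads` field. The function names are hypothetical and chosen only for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint of the Hugging Face Hub model registry.
HUB_MODELS_API = "https://huggingface.co/api/models"


def build_query_url(task="image-text-to-text", limit=60):
    """Build a registry URL for the most-downloaded models of one task."""
    params = {
        "pipeline_tag": task,  # assumed name of the task filter parameter
        "sort": "downloads",
        "direction": -1,       # assumed convention: -1 means descending
        "limit": limit,
    }
    return f"{HUB_MODELS_API}?{urlencode(params)}"


def rank_by_downloads(models):
    """Sort model records by download count, highest first.

    Records without a "downloads" field are treated as having 0 downloads.
    """
    return sorted(models, key=lambda m: m.get("downloads", 0), reverse=True)


def fetch_ranked_models(task="image-text-to-text", limit=60):
    """Fetch model records and rank them locally; requires network access."""
    with urlopen(build_query_url(task, limit), timeout=30) as response:
        models = json.load(response)
    return rank_by_downloads(models)
```

Calling `fetch_ranked_models()` would return up to 60 records. Download counts change over time, so the order may differ from the list above.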