Gemma 4-26B-A4B-IT is Google DeepMind's 26-billion-total-parameter Mixture-of-Experts (MoE) vision-language model, with approximately 4 billion active parameters per token. The MoE design targets the quality of a 26B-parameter model while activating only ~4B parameters per forward pass, reducing per-token compute relative to a dense 26B model. Apache 2.0 licensed.
13,172,985 ↓ · 1,197 ♡
Gemma 4-31B-IT is Google DeepMind's 31-billion-parameter instruction-tuned vision-language model from the Gemma 4 family, supporting both image and text inputs. It offers strong multimodal reasoning at open-weight scale, with Apache 2.0 licensing making it directly deployable for commercial applications. Part of the gemma4 architecture with improvements over Gemma 2.
11,138,450 ↓ · 3,071 ♡
Qwen2.5-VL-7B-Instruct is Alibaba Cloud's 7-billion-parameter vision-language model from the Qwen2.5-VL series, accepting image and video inputs alongside text for visual question answering, document understanding, and grounding tasks. It supports multiple image resolutions dynamically and shows improved OCR and document reasoning compared to the earlier Qwen-VL series. Apache 2.0 licensed.
9,588,885 ↓ · 1,595 ♡
Qwen3.5-9B is a 9-billion-parameter instruction-tuned vision-language model from Alibaba Cloud's Qwen3.5 series, fine-tuned from Qwen3.5-9B-Base for multimodal conversational tasks. It accepts image and text inputs for visual reasoning, document understanding, and grounded question answering. Apache 2.0 licensed.
9,427,429 ↓ · 1,628 ♡
Qwen3.5-4B is Alibaba Cloud's 4-billion-parameter instruction-tuned vision-language model from the Qwen3.5 series, fine-tuned from Qwen3.5-4B-Base for multimodal conversational tasks. It handles image and text inputs at a scale deployable on consumer GPUs with 8-12GB VRAM. Apache 2.0 licensed.
9,224,801 ↓ · 693 ♡
FP8-quantized version of Qwen3.6-35B-A3B for deployment on hardware with FP8 support (H100/H200). Reduces memory footprint and inference latency compared to BF16 with minimal quality degradation on most benchmarks.
5,908,430 ↓ · 285 ♡
Qwen3.6-27B is a large checkpoint for vision-language understanding, distributed on the Hugging Face Hub. The Apache 2.0 license keeps Qwen3.6-27B unrestricted for commercial reuse. At roughly 27B parameters, Qwen3.6-27B trades some ceiling for cheaper, faster inference than frontier-scale models. Treat Qwen3.6-27B's published metrics as a starting point and validate against your workload.
5,641,436 ↓ · 1,826 ♡
Qwen3-VL-32B-Instruct is a qwen3-based open-weight model aimed at vision-language understanding. Qwen3-VL-32B-Instruct's 32B-parameter size keeps hosting requirements modest relative to frontier models. Permissive Apache 2.0 terms let Qwen3-VL-32B-Instruct go straight into commercial pipelines. Read Qwen3-VL-32B-Instruct's card for hardware requirements and licensing fine print before deploying.
5,638,104 ↓ · 207 ♡
Qwen 3.6 is a Mixture-of-Experts model with 35B total parameters but only 3B active per token, giving MoE inference efficiency at near-35B capacity. It handles image and text inputs and is competitive with dense 14–20B models on standard benchmarks.
5,612,440 ↓ · 2,261 ♡
Qwen2.5-VL-3B-Instruct is Alibaba's 3B parameter vision-language model from the Qwen2.5-VL series, supporting image and video frame understanding alongside text instruction-following. It targets edge and mobile deployment where 7B+ VL models are too memory-intensive, while maintaining reasonable accuracy on OCR, chart reading, and visual QA. Instruction-tuned for conversational use.
5,462,056 ↓ · 668 ♡
Qwen3-VL-8B-Instruct is Alibaba Cloud's 8-billion-parameter vision-language model from the Qwen3-VL series, extending the VL line with improved visual reasoning and document understanding. It targets mid-tier server GPU deployment where 2B VLMs are insufficient and 30B+ is impractical. Apache 2.0 licensed.
5,372,098 ↓ · 973 ♡
FP8-quantized version of Qwen 3.6 27B for H100/H200 serving. Reduces memory from ~54GB (BF16) to approximately 27GB while maintaining near-BF16 quality on most benchmarks for a dense multimodal model.
5,064,642 ↓ · 289 ♡
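The BF16-to-FP8 figures quoted above follow from simple bytes-per-parameter arithmetic. A minimal sketch, assuming weights dominate memory and ignoring KV cache, activations, and runtime overhead (the function name and dtype table are illustrative, not taken from any model card):

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# Assumption: only weights are counted; KV cache and activations are ignored.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params_billion: float, dtype: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("bf16", "fp8", "int4"):
        print(f"27B @ {dtype}: ~{weight_memory_gb(27, dtype):.1f} GB")
```

Real deployments need headroom beyond these numbers, so treat them as a lower bound when sizing a GPU.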
An AWQ 4-bit quantized version of Gemma 4's 26B MoE model (4B active parameters), reducing the memory footprint for local deployment on consumer hardware. Community-produced quantization targeting llama.cpp and vLLM compatibility.
4,740,178 ↓ · 82 ♡
Qwen3-VL 4B is Alibaba's compact vision-language instruction model supporting image and video understanding at 4B scale. It targets use cases where Qwen2-VL-7B quality is acceptable but deployment must fit tighter memory constraints.
4,048,243 ↓ · 403 ♡
Qwen2-VL-2B-Instruct is a 2B parameter vision-language model from Alibaba's Qwen team, supporting image and video understanding alongside text instruction-following. At 2B parameters it runs on consumer GPUs while retaining competitive OCR, chart reading, and visual QA accuracy. It is the instruction-tuned version of the Qwen2-VL-2B base.
3,755,149 ↓ · 512 ♡
Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive is a 35B-total-parameter MoE checkpoint (about 3B active per token) for vision-language understanding, distributed on the Hugging Face Hub. Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive is multilingual by design rather than English-only. The Apache 2.0 license keeps Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive unrestricted for commercial reuse. Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive is community-maintained, so track upstream changes and pin a known-good revision.
3,331,475 ↓ · 2,279 ♡
LLaVA 1.5 7B connects a CLIP ViT-L/14@336 vision encoder to Vicuna 7B via a simple MLP projection. It was a state-of-the-art open multimodal model at release and remains widely used as a baseline for vision-language research.
3,116,367 ↓ · 366 ♡
DeepSeek-OCR-2 is a deepseek-based open-weight model aimed at vision-language understanding. Permissive Apache 2.0 terms let DeepSeek-OCR-2 go straight into commercial pipelines. Training spans multiple languages, so DeepSeek-OCR-2 covers cross-lingual vision-language understanding from one checkpoint. Check the DeepSeek-OCR-2 model card for benchmarks and intended use before adopting it.
3,091,674 ↓ · 1,010 ♡
Built for vision-language understanding, Florence-2-base is a florence-based model with publicly available weights. Florence-2-base is MIT-licensed, clearing it for closed-source and paid products. Florence-2-base ships without a hosted SLA, so budget for self-managed deployment and monitoring.
2,620,147 ↓ · 382 ♡
Qwen 3.5 27B is a dense image-text-to-text model from Alibaba, positioned between the 14B and 72B variants for users who need more capacity than 14B but can't serve 72B. It handles both vision and language instructions.
2,592,374 ↓ · 994 ♡
Qwen 3.5 0.8B is Alibaba's smallest production language model in the 3.5 series, designed for on-device and edge inference. Despite its size, it supports the same instruction format as larger Qwen models and is suitable for simple classification, extraction, and short-form generation.
2,486,238 ↓ · 598 ♡
Kimi-K2.6 targets vision-language understanding and is shipped as an open-weight, self-hostable checkpoint. Licensing for Kimi-K2.6 is unspecified or custom — clear it before commercial use. Kimi-K2.6 is community-maintained, so track upstream changes and pin a known-good revision.
2,431,452 ↓ · 1,490 ♡
DeepSeek OCR is a vision-language model from DeepSeek optimized specifically for optical character recognition from natural scene and document images. It aims to handle mixed layouts, multi-language text, and complex typographic scenarios.
2,212,617 ↓ · 3,291 ♡
Qwen3-VL-2B-Instruct is a 2-billion-parameter vision-language model from Alibaba Cloud that jointly processes images and text for visual question answering, captioning, and document understanding. Its 2B scale positions it as one of the smaller instruction-tuned VLMs capable of zero-shot visual reasoning. Apache 2.0 licensed.
2,137,459 ↓ · 433 ♡
Qwen3.5-35B-A3B is a 35B total parameter mixture-of-experts multimodal model from Alibaba, with approximately 3B active parameters per token during inference. It combines vision and language understanding for image captioning, visual QA, and document analysis tasks at lower compute cost than a dense 35B model. Apache 2.0 licensed.
2,095,052 ↓ · 1,450 ♡
Gemma 3 12B is Google's mid-size instruction-tuned model in the Gemma 3 family, designed to balance capability and deployment cost. It handles text and image inputs for instruction following and is positioned between the 4B and 27B variants.
2,019,766 ↓ · 763 ♡
gemma-4-31B-it-FP8-block is an FP8 block-quantized version of Google's Gemma 4 multimodal (text + image) instruction-tuned model, for reduced VRAM on supported GPU backends (vLLM, llm-compressor). Its 31B parameters are stored as lower-precision weights for deployment on memory-constrained hardware, with quality degradation typically small for general chat tasks. The base model is Apache-2.0 licensed.
1,966,351 ↓ · 35 ♡
Qwen2-VL 7B is Alibaba's second-generation vision-language model, instruction-tuned to follow text+image prompts. It handles variable-resolution inputs natively and scores competitively against GPT-4V on standard multimodal benchmarks at the 7B scale.
1,910,265 ↓ · 1,281 ♡
AWQ 4-bit quantization of Qwen3.6-35B-A3B, a mixture-of-experts model that activates approximately 3B parameters per token despite 35B total parameters. The cyankiwi quantization uses compressed-tensors format compatible with vLLM. MoE architecture means memory footprint scales with total parameters, not active ones.
1,816,218 ↓ · 80 ♡
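The point that MoE memory tracks total parameters while compute tracks active parameters can be made concrete with a small calculation. This is a sketch under stated assumptions: the `MoEModel` class is hypothetical, and the "2 FLOPs per active parameter per token" figure is a common rule of thumb, not a measured number for any model listed here.

```python
from dataclasses import dataclass


@dataclass
class MoEModel:
    total_params_b: float   # all experts; every one must be resident in memory
    active_params_b: float  # parameters actually used for each token

    def weight_gb(self, bits_per_weight: float) -> float:
        """Weight memory in GB; depends on TOTAL parameters."""
        return self.total_params_b * bits_per_weight / 8

    def gflops_per_token(self) -> float:
        """Rule-of-thumb forward compute; depends on ACTIVE parameters."""
        return 2 * self.active_params_b


if __name__ == "__main__":
    moe = MoEModel(total_params_b=35, active_params_b=3)
    dense = MoEModel(total_params_b=35, active_params_b=35)
    for name, m in (("35B-A3B", moe), ("dense 35B", dense)):
        print(f"{name}: {m.weight_gb(4):.1f} GB at 4-bit, "
              f"~{m.gflops_per_token():.0f} GFLOPs/token")
```

Both configurations need the same weight memory at a given bit width; only the per-token compute differs.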
Qwen2-VL-7B-Instruct-AWQ is a mid-sized checkpoint for vision-language understanding, distributed on the Hugging Face Hub. The Apache 2.0 license keeps Qwen2-VL-7B-Instruct-AWQ unrestricted for commercial reuse. At roughly 7B parameters, Qwen2-VL-7B-Instruct-AWQ trades some ceiling for cheaper, faster inference. Qwen2-VL-7B-Instruct-AWQ is community-maintained, so track upstream changes and pin a known-good revision.
1,815,863 ↓ · 49 ♡
Qwen3-VL-235B-A22B-Instruct is a frontier-scale checkpoint for vision-language understanding, distributed on the Hugging Face Hub. The Apache 2.0 license keeps Qwen3-VL-235B-A22B-Instruct unrestricted for commercial reuse. With roughly 235B total parameters and about 22B active per token, Qwen3-VL-235B-A22B-Instruct keeps per-token compute well below that of a dense model of the same size, but all weights must still be held in memory, so plan for multi-GPU serving. Treat Qwen3-VL-235B-A22B-Instruct's published metrics as a starting point and validate against your workload.
1,717,380 ↓ · 398 ♡
Qwen3.5-2B targets vision-language understanding and is shipped as a mid-sized, self-hostable checkpoint. It is a fine-tune of qwen3.5-2b-base, inheriting that base model's general competence. Permissive Apache 2.0 terms let Qwen3.5-2B go straight into commercial pipelines. Qwen3.5-2B is community-maintained, so track upstream changes and pin a known-good revision.
1,708,773 ↓ · 320 ♡
AutoRound INT4 quantization of Qwen3.6-27B in a W4A16 configuration (4-bit weights, 16-bit activations) with a weight group size of 128 (W4G128). AutoRound uses sign gradient descent to minimize quantization error, generally outperforming GPTQ at the same bit-width. Includes a multi-token prediction (MTP) head for speculative decoding, which can increase throughput in runtimes that support it.
1,696,003 ↓ · 115 ♡
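To illustrate what "W4G128" means in practice, here is a minimal sketch of group-wise 4-bit weight quantization. It uses plain round-to-nearest with a per-group scale and offset; it is not AutoRound itself, which additionally tunes the rounding with sign gradient descent. All function names are hypothetical.

```python
import random


def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of weights."""
    qmax = (1 << bits) - 1                      # 15 for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax or 1.0             # fall back to 1.0 for constant groups
    codes = [min(qmax, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + zero for c in codes]


def quantize_row(row, group_size=128, bits=4):
    """Split a weight row into groups, each with its own scale and offset."""
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]


if __name__ == "__main__":
    random.seed(0)
    row = [random.uniform(-1, 1) for _ in range(256)]
    groups = quantize_row(row)                  # two groups of 128 weights
    restored = [w for g in groups for w in dequantize_group(*g)]
    worst = max(abs(a - b) for a, b in zip(row, restored))
    print(f"groups: {len(groups)}, max abs error: {worst:.4f}")
```

Smaller groups track local weight ranges more closely at the cost of storing more scales and offsets, which is the trade-off the group-size setting controls.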
chandra-ocr-2 is an open-weight checkpoint for vision-language understanding, distributed on the HuggingFace Hub. chandra-ocr-2 is subject to OpenRAIL terms, so confirm licensing before commercial use. Evaluate chandra-ocr-2 on your own data before trusting it in production.
1,669,187 ↓ · 417 ♡
As an internvl-based mid-sized model, InternVL2-2B focuses on vision-language understanding. Training spans multiple languages, so InternVL2-2B covers cross-lingual vision-language understanding from one checkpoint. The MIT license keeps InternVL2-2B unrestricted for commercial reuse. InternVL2-2B ships without a hosted SLA, so budget for self-managed deployment and monitoring.
1,642,315 ↓ · 80 ♡
Gemma 3 4B Instruct is Google's compact instruction-following model, targeting deployment on single-GPU and edge devices. It covers both text and image inputs and is suitable for conversational AI applications with moderate resource constraints.
1,641,090 ↓ · 1,383 ♡
Kimi-K2.5 is an open-weight model aimed at vision-language understanding. Kimi-K2.5 lists a non-standard license, so confirm permissions before deployment. Read Kimi-K2.5's card for hardware requirements and licensing fine print before deploying.
1,619,907 ↓ · 2,825 ♡
Qwen3.5-35B-A3B-FP8 is an openly licensed vision-language understanding model in the qwen3 family. Prebuilt FP8 weights spare users a separate quantization step before serving Qwen3.5-35B-A3B-FP8 on FP8-capable GPUs. Qwen3.5-35B-A3B-FP8 is Apache 2.0-licensed, clearing it for closed-source and paid products. Treat Qwen3.5-35B-A3B-FP8's published metrics as a starting point and validate against your workload.
1,556,640 ↓ · 152 ♡
Moondream2 is a 1.9B parameter vision-language model designed to be the smallest model that can meaningfully answer questions about images. It pairs a SigLIP vision encoder with a Phi-1.5 language backbone and achieves surprising capability at its size.
1,542,219 ↓ · 1,424 ♡
As a phi-based open-weight model, Phi-3.5-vision-instruct focuses on vision-language understanding. The MIT license keeps Phi-3.5-vision-instruct unrestricted for commercial reuse. Training spans multiple languages, so Phi-3.5-vision-instruct covers cross-lingual vision-language understanding from one checkpoint. Before relying on Phi-3.5-vision-instruct, reproduce its key numbers on representative inputs.
1,530,133 ↓ · 736 ♡
Built for vision-language understanding, Qwen3.5-122B-A10B-FP8 is a qwen3-based model with publicly available weights. At about 122B total parameters (roughly 10B active per token), Qwen3.5-122B-A10B-FP8 sits in the frontier-scale tier, which sets its memory and latency budget. Qwen3.5-122B-A10B-FP8 is Apache 2.0-licensed, clearing it for closed-source and paid products. Read Qwen3.5-122B-A10B-FP8's card for hardware requirements and licensing fine print before deploying.
1,459,817 ↓ · 107 ♡
gemma-4-26B-A4B-it-GGUF is Unsloth's GGUF quantization of Google's Gemma 4 26B mixture-of-experts instruction-tuned multimodal model. With approximately 4B active parameters per token, it runs on 16–24GB VRAM in GGUF format while retaining vision and text understanding capabilities. GGUF format provides llama.cpp and Ollama compatibility for local self-hosted deployment.
1,458,376 ↓ · 917 ♡
Qwen2.5-VL-7B-Instruct-AWQ targets vision-language understanding and is shipped as a mid-sized, self-hostable checkpoint. Permissive Apache 2.0 terms let Qwen2.5-VL-7B-Instruct-AWQ go straight into commercial pipelines. Prebuilt AWQ weights make local and edge inference of Qwen2.5-VL-7B-Instruct-AWQ straightforward. Like most open checkpoints, Qwen2.5-VL-7B-Instruct-AWQ rewards a quick in-domain eval before commitment.
1,381,338 ↓ · 106 ♡
gemma-4-12b-it-GGUF is a quantized GGUF version of Google's Gemma 4 multimodal (text + image) instruction-tuned model, for llama.cpp, LM Studio, and compatible runtimes. Its parameters are stored as lower-precision weights for deployment on memory-constrained hardware or Apple Silicon, with quality degradation typically small for general chat tasks. The base model is Apache-2.0 licensed.
1,360,553 ↓ · 705 ♡
Qwen3.5-27B-FP8 targets vision-language understanding and is shipped as a large, self-hostable checkpoint. Qwen3.5-27B-FP8's 27B-parameter size keeps hosting requirements modest relative to frontier models. Prebuilt FP8 weights spare users a separate quantization step before serving Qwen3.5-27B-FP8 on FP8-capable GPUs. Qwen3.5-27B-FP8 is community-maintained, so track upstream changes and pin a known-good revision.
1,353,794 ↓ · 135 ♡
Qwen3.6-27B-AWQ-INT4 is a large checkpoint for vision-language understanding, distributed on the HuggingFace Hub. The Apache 2.0 license keeps Qwen3.6-27B-AWQ-INT4 unrestricted for commercial reuse. Prebuilt AWQ/INT4 weights make local and edge inference of Qwen3.6-27B-AWQ-INT4 straightforward. Qwen3.6-27B-AWQ-INT4 is community-maintained, so track upstream changes and pin a known-good revision.
1,241,668 ↓ · 86 ♡
Llama-3.1-Nemotron-Nano-VL-8B-V1 is an open-weight vision-language understanding model in the nemotron family. At about 8B parameters, Llama-3.1-Nemotron-Nano-VL-8B-V1 sits in the mid-sized tier, which sets its memory and latency budget. Licensing for Llama-3.1-Nemotron-Nano-VL-8B-V1 is unspecified or custom, so clear it before commercial use. Like most open checkpoints, Llama-3.1-Nemotron-Nano-VL-8B-V1 rewards a quick in-domain eval before commitment.
1,177,023 ↓ · 181 ♡
diffusiongemma-26B-A4B-it is Google's experimental diffusion-based language model built on the Gemma 4 MoE architecture, applying masked diffusion to text generation instead of autoregressive decoding. At 26B total parameters (about 4B active per token) it explores whether diffusion LMs can match autoregressive quality on instruction-following tasks. It accepts text and image inputs and produces text through iterative denoising.
1,163,263 ↓ · 1,072 ♡
Unsloth's NVFP4 (NVIDIA FP4) quantization of Qwen3.6-27B, targeting inference on NVIDIA GPUs with native FP4 tensor-core support, which arrived with the Blackwell architecture. Hopper (H100/H200) and Ada Lovelace GPUs provide native FP8 but not native FP4 compute, so any gains over BF16 on those cards would come from the smaller weights rather than from FP4 math.
1,114,801 ↓ · 96 ♡
gemma-3-27b-it is a large checkpoint for vision-language understanding, distributed on the Hugging Face Hub. It is a fine-tune of gemma-3-27b-pt, inheriting that base model's general competence. At roughly 27B parameters, gemma-3-27b-it trades some ceiling for cheaper, faster inference than frontier-scale models. Like most open checkpoints, gemma-3-27b-it rewards a quick in-domain eval before commitment.
1,061,956 ↓ · 1,984 ♡
Qwen3.5-9B-GGUF is a qwen3-based open-weight model aimed at vision-language understanding. Qwen3.5-9B-GGUF's 9B-parameter size keeps hosting requirements modest relative to frontier models. Permissive Apache 2.0 terms let Qwen3.5-9B-GGUF go straight into commercial pipelines. Check the Qwen3.5-9B-GGUF model card for benchmarks and intended use before adopting it.
1,054,200 ↓ · 723 ♡
gemma-4-26B-A4B-it-QAT-MLX-4bit is an MLX 4-bit quantized version of Google's Gemma 4 MoE-based multimodal (text + image) instruction-tuned model, optimized for Apple Silicon inference. Its 26B parameters are stored as lower-precision weights for deployment on memory-constrained Apple Silicon hardware, with quality degradation typically small for general chat tasks. The base model is Apache-2.0 licensed.
1,014,048 ↓ · 1 ♡
Qwen3.6-27B-MLX-4bit is an MLX 4-bit quantized version of Qwen3.6-27B, packaged by lmstudio-community for inference on Apple Silicon via the MLX framework. MLX quantization converts the model to integer weights while preserving floating-point activations, enabling the 27B model to run within 16-24GB unified memory on M2/M3 Pro or Ultra configurations. Intended for use in LM Studio or direct MLX inference.
980,969 ↓ · 6 ♡
SmolVLM-256M-Instruct is a compact checkpoint for vision-language understanding, distributed on the HuggingFace Hub. The Apache 2.0 license keeps SmolVLM-256M-Instruct unrestricted for commercial reuse. Weighing in near 256M parameters, SmolVLM-256M-Instruct trades some ceiling for cheaper, faster inference. Like most open checkpoints, SmolVLM-256M-Instruct rewards a quick in-domain eval before commitment.
975,385 ↓ · 371 ♡
Qwen3.5-4B-GGUF targets vision-language understanding and is shipped as a mid-sized, self-hostable checkpoint. Qwen3.5-4B-GGUF's 4B-parameter size keeps hosting requirements modest relative to frontier models. Permissive Apache 2.0 terms let Qwen3.5-4B-GGUF go straight into commercial pipelines. Evaluate Qwen3.5-4B-GGUF on your own data before trusting it in production.
973,238 ↓ · 296 ♡
Qwen3.6-27B-MLX-8bit is an MLX 8-bit quantized version of Qwen3.6-27B, packaged by lmstudio-community for inference on Apple Silicon via the MLX framework. MLX quantization converts the model to integer weights while preserving floating-point activations. At 8 bits the weights alone come to roughly 27 GB, so this variant does not fit in 16-24GB of unified memory; plan for a configuration with 32GB or more. Intended for use in LM Studio or direct MLX inference.
957,785 ↓ · 3 ♡
llava-onevision-qwen2-0.5b-ov-hf is a compact checkpoint for vision-language understanding, distributed on the HuggingFace Hub. The Apache 2.0 license keeps llava-onevision-qwen2-0.5b-ov-hf unrestricted for commercial reuse. Weighing in near 500M parameters, llava-onevision-qwen2-0.5b-ov-hf trades some ceiling for cheaper, faster inference. Treat llava-onevision-qwen2-0.5b-ov-hf's published metrics as a starting point and validate against your workload.
956,325 ↓ · 55 ♡
gemma-4-31B-it-AWQ-4bit is an openly licensed vision-language understanding model in the gemma family. Prebuilt AWQ 4-bit weights make local inference of gemma-4-31B-it-AWQ-4bit straightforward. It is a quantization of gemma-4-31b-it, inheriting that base model's general competence. Evaluate gemma-4-31B-it-AWQ-4bit on your own data before trusting it in production.
954,151 ↓ · 50 ♡
QuantTrio's AWQ 4-bit quantization of Qwen3.6-27B, a dense (non-MoE) multimodal model supporting image and text inputs. Tagged for vLLM serving with compressed-tensors compatibility. Qwen3.5/3.6 dense variants trade MoE routing complexity for more predictable latency.
943,618 ↓ · 17 ♡
Unsloth's GGUF-converted and optionally quantized version of Qwen3.6-35B-A3B, optimized for local inference via llama.cpp and Ollama. Unsloth applies custom quantization recipes to reduce size while minimizing quality loss.
917,675 ↓ · 1,280 ♡
Qwen3.6-27B-MLX-6bit is an MLX 6-bit quantized version of Qwen3.6-27B, packaged by lmstudio-community for inference on Apple Silicon via the MLX framework. MLX quantization converts the model to integer weights while preserving floating-point activations. At 6 bits the weights alone come to roughly 20 GB, so 24GB of unified memory is a tight fit and 32GB leaves more headroom. Intended for use in LM Studio or direct MLX inference.
905,339 ↓ · 1 ♡
Qwen3.6-27B-MLX-5bit is an MLX 5-bit quantized version of Qwen3.6-27B, packaged by lmstudio-community for inference on Apple Silicon via the MLX framework. MLX quantization converts the model to integer weights while preserving floating-point activations. At 5 bits the weights alone come to roughly 17 GB, which suits a 24GB unified-memory configuration on M2/M3 Pro or Ultra machines. Intended for use in LM Studio or direct MLX inference.
902,774 ↓ · 0 ♡
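The MLX variants above differ only in bit width, so a back-of-envelope size estimate helps compare them against a given unified-memory budget. The sketch below assumes group-wise quantization with a 16-bit scale and a 16-bit bias stored per group of 64 weights, and that every parameter is quantized; both are assumptions for illustration, so check the actual quantization config of each repository.

```python
def effective_bits(bits: int, group_size: int = 64,
                   scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Bits per weight including per-group scale and bias metadata (assumed layout)."""
    return bits + (scale_bits + bias_bits) / group_size


def weight_gb(params_billion: float, bits: int, **layout) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * effective_bits(bits, **layout) / 8


if __name__ == "__main__":
    for bits in (4, 5, 6, 8):
        print(f"27B @ {bits}-bit: ~{weight_gb(27, bits):.1f} GB "
              f"({effective_bits(bits):.1f} effective bits/weight)")
```

The operating system, KV cache, and image inputs all need memory on top of the weights, so the usable budget is smaller than the machine's nominal unified memory.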
Unsloth's GGUF quantisation of Qwen3.6-27B with Multi-Token Prediction (MTP) heads, enabling speculative decoding with compatible runtimes like llama.cpp. MTP allows the model to predict multiple future tokens per step, increasing throughput on CPU and single-GPU machines. Unsloth applies imatrix-based importance weighting to reduce quality loss in lower-bit GGUF variants.
879,458 ↓ · 865 ♡
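The throughput gain from speculative decoding comes from verifying several cheaply proposed tokens at once. Here is a minimal greedy sketch of one step; the draft and target "models" are toy functions, and in an MTP setup the extra prediction heads would play the draft role. This is an illustration of the general technique, not llama.cpp's implementation.

```python
def speculative_step(target_next, draft_next, prefix, k=4):
    """One greedy speculative-decoding step.

    target_next / draft_next map a token list to the next token.
    Returns the tokens appended this step (between 1 and k + 1 of them).
    """
    # 1) The draft proposes k tokens autoregressively.
    proposal, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2) The target checks each proposed token. A real runtime scores all
    #    positions in one batched forward pass; here we loop for clarity.
    accepted, ctx = [], list(prefix)
    for tok in proposal:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)      # first mismatch: keep the target's token
            return accepted
        accepted.append(tok)
        ctx.append(tok)
    accepted.append(target_next(ctx))      # all matched: one bonus token from the target
    return accepted


def toy_target(ctx):
    """Hypothetical target model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx):
    """Hypothetical draft: agrees with the target except after token 3."""
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step(toy_target, toy_draft, [0], k=4))
```

Because every accepted token is one the target would have produced anyway, greedy output is unchanged; only the number of target passes per generated token drops.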
Qwen3.5-35B-A3B-GPTQ-Int4 is a GPTQ INT4-quantized MoE checkpoint for vision-language understanding, distributed on the Hugging Face Hub. With roughly 35B total parameters and about 3B active per token, Qwen3.5-35B-A3B-GPTQ-Int4 trades some ceiling for cheaper, faster inference. The Apache 2.0 license keeps Qwen3.5-35B-A3B-GPTQ-Int4 unrestricted for commercial reuse. Qwen3.5-35B-A3B-GPTQ-Int4 is community-maintained, so track upstream changes and pin a known-good revision.
869,434 ↓ · 90 ♡
Qwen3-VL-2B-Instruct-GGUF is Unsloth's GGUF distribution of Qwen3-VL-2B-Instruct, making the 2B vision-language model directly usable in llama.cpp, LM Studio, and Ollama. At 2B parameters, it is intended for on-device or memory-constrained deployment scenarios where a capable VLM must run locally. Quantization options (Q4, Q5, Q8) allow further trade-offs between quality and memory.
850,686 ↓ · 33 ♡
deepseek-vl2-tiny is a deepseek-based open-weight model aimed at vision-language understanding. deepseek-vl2-tiny lists a non-standard license, so confirm permissions before deployment. deepseek-vl2-tiny ships without a hosted SLA, so budget for self-managed deployment and monitoring.
848,254 ↓ · 248 ♡
Qwen3.5-397B-A17B-FP8 is an openly licensed vision-language understanding model in the qwen3 family. Prebuilt FP8 weights roughly halve weight memory relative to BF16, though at about 397B total parameters Qwen3.5-397B-A17B-FP8 remains a multi-GPU server deployment. Qwen3.5-397B-A17B-FP8 is Apache 2.0-licensed, clearing it for closed-source and paid products. Qwen3.5-397B-A17B-FP8 is community-maintained, so track upstream changes and pin a known-good revision.
838,924 ↓ · 181 ♡
Unsloth's GGUF quantisation of Qwen3.6-35B, a sparse MoE model with 35B total parameters but only ~3B active per token, enhanced with Multi-Token Prediction heads for speculative decoding. The imatrix calibration in Unsloth's quantisation pipeline reduces perplexity loss compared to uncalibrated GGUF. At 35B total capacity this is a large multimodal model that fits on consumer hardware only through aggressive quantisation.
830,418 ↓ · 595 ♡
AxionML's NVFP4 quantisation of Qwen3.5-9B using NVIDIA's ModelOpt toolkit, targeting sglang and vLLM serving on NVIDIA GPUs. Qwen3.5-9B is a multimodal model with image-text input capability; the NVFP4 format enables deployment at reduced memory cost. Native FP4 tensor-core throughput requires Blackwell-class GPUs, since Hopper (H100) tensor cores support FP8 but not 4-bit floating point. ModelOpt-based quantisation preserves calibration-aware weight scaling.
811,372 ↓ · 18 ♡
MiniCPM-V-4.6 is OpenBMB's lightweight on-device multimodal model, optimized for image+text tasks at minimal parameter count. Version 4.6 targets improved document OCR, mathematical diagram understanding, and multilingual captioning within the constraints of mobile or edge deployment. It is compatible with deployment via llama.cpp or the MiniCPM-specific inference stack.
803,289 ↓ · 1,128 ♡
Qwen3.5-122B-A10B targets vision-language understanding and is shipped as a frontier-scale, self-hostable checkpoint. Permissive Apache 2.0 terms let Qwen3.5-122B-A10B go straight into commercial pipelines. With about 122B total parameters and roughly 10B active per token, Qwen3.5-122B-A10B keeps per-token compute low for its size, but all weights must still be held in memory, so plan for multi-GPU serving. Like most open checkpoints, Qwen3.5-122B-A10B rewards a quick in-domain eval before commitment.
799,331 ↓ · 578 ♡
gemma-4-26B-A4B-it-qat-GGUF is a quantized GGUF version of Google's Gemma 4 MoE-based multimodal (text + image) instruction-tuned model, for llama.cpp, LM Studio, and compatible runtimes. Its 26B parameters are stored as lower-precision weights for deployment on memory-constrained hardware or Apple Silicon, with quality degradation typically small for general chat tasks. The base model is Apache-2.0 licensed.
789,040 ↓ · 225 ♡
LFM2.5-VL-450M is LiquidAI's 450M-parameter multimodal edge model from the LFM2.5-VL family, supporting 10 languages and designed for on-device deployment on mobile and embedded hardware. It uses LiquidAI's custom LFM2 architecture (not a standard transformer) for efficient inference at the sub-500M scale. Despite the small size, it handles image+text inputs across English, Japanese, Korean, French, Spanish, German, Arabic, Chinese, Portuguese, and others.
771,852 ↓ · 191 ♡
EXAONE-4.5-33B is an open-weight vision-language understanding model. EXAONE-4.5-33B is multilingual by design rather than English-only. Licensing for EXAONE-4.5-33B is unspecified or custom — clear it before commercial use. Treat EXAONE-4.5-33B's published metrics as a starting point and validate against your workload.
757,019 ↓ · 163 ♡
Built for vision-language understanding, gemma-4-E4B-it-GGUF is a gemma-based model with publicly available weights. At about 4B parameters, gemma-4-E4B-it-GGUF sits in the mid-sized tier, which sets its memory and latency budget. GGUF builds of gemma-4-E4B-it-GGUF are published alongside the full checkpoint for low-memory serving. Before relying on gemma-4-E4B-it-GGUF, reproduce its key numbers on representative inputs.
748,944 ↓ · 521 ♡
AWQ 4-bit quantization of Qwen3.5-9B, a dense image-text-to-text model. At 9B parameters with AWQ INT4, inference requires roughly 6-8 GB VRAM, placing it within reach of RTX 3080/4070-class cards. compressed-tensors format is vLLM-native.
742,996 ↓ · 33 ♡
Built for vision-language understanding, gemma-4-E2B-it-GGUF is a gemma-based model with publicly available weights. gemma-4-E2B-it-GGUF is Apache 2.0-licensed, clearing it for closed-source and paid products. At about 2B parameters, gemma-4-E2B-it-GGUF sits in the compact tier, which sets its memory and latency budget. gemma-4-E2B-it-GGUF ships without a hosted SLA, so budget for self-managed deployment and monitoring.
730,592 ↓ · 248 ♡
Llama 4 Scout is Meta's first MoE entry in the Llama series: roughly 17B active parameters per token, routed across 16 experts, for about 109B parameters in total. The instruct variant follows instructions and handles image-text inputs natively, supporting 12 languages. Scout targets deployments where multimodal capability is needed at a lower active-parameter cost than dense Llama 3 models.
728,950 ↓ · 1,314 ♡
llava-v1.6-mistral-7b-hf targets vision-language understanding and is shipped as a mid-sized, self-hostable checkpoint. llava-v1.6-mistral-7b-hf's 7B-parameter size keeps hosting requirements modest relative to frontier models. Permissive Apache 2.0 terms let llava-v1.6-mistral-7b-hf go straight into commercial pipelines. llava-v1.6-mistral-7b-hf is community-maintained, so track upstream changes and pin a known-good revision.
727,771 ↓ · 310 ♡
olmOCR-2-7B-1025-FP8 is AllenAI's FP8-quantized vision-language model for optical character recognition and document understanding, fine-tuned from Qwen2.5-VL-7B. It is optimized for extracting text from PDFs, research papers, and complex document layouts including tables, equations, and multi-column formats. The FP8 quantization allows deployment on a single A100 with reduced memory footprint.
725,023 ↓ · 242 ♡
Gemma-4-31B-it-qat-w4a16-ct is a W4A16 quantization-aware trained (QAT) version of Google's Gemma 4 31B instruction-tuned multimodal model, packaged in compressed-tensors format. QAT bakes quantization into the training process rather than applying it post-hoc, generally preserving more quality than standard PTQ at the same bit width. The model handles interleaved image and text inputs for instruction-following tasks.
722,730 ↓ · 32 ♡
Built for vision-language understanding, tiny-Qwen2_5_VLForConditionalGeneration is a qwen2-based model with publicly available weights. Read tiny-Qwen2_5_VLForConditionalGeneration's card for hardware requirements and licensing fine print before deploying.
713,219 ↓ · 0 ♡
blip2-opt-2.7b is a blip-based open-weight model aimed at vision-language understanding. Permissive MIT terms let blip2-opt-2.7b go straight into commercial pipelines. blip2-opt-2.7b's 2.7B-parameter size keeps hosting requirements modest relative to frontier models. Check the blip2-opt-2.7b model card for benchmarks and intended use before adopting it.
711,990 ↓ · 445 ♡
SmolVLM2-500M-Video-Instruct is an openly licensed vision-language understanding model. At about 500M parameters, SmolVLM2-500M-Video-Instruct sits in the compact tier, which sets its memory and latency budget. SmolVLM2-500M-Video-Instruct is Apache 2.0-licensed, clearing it for closed-source and paid products. SmolVLM2-500M-Video-Instruct is community-maintained, so track upstream changes and pin a known-good revision.
696,002 ↓ · 155 ♡
Built for vision-language understanding, Qwen3-VL-8B-Instruct-FP8 is a qwen3-based model with publicly available weights. Qwen3-VL-8B-Instruct-FP8 is Apache 2.0-licensed, clearing it for closed-source and paid products. At about 8B parameters, Qwen3-VL-8B-Instruct-FP8 sits in the mid-sized tier, which sets its memory and latency budget. Read Qwen3-VL-8B-Instruct-FP8's card for hardware requirements and licensing fine print before deploying.
685,743 ↓ · 72 ♡
Built for vision-language understanding, Molmo2-8B is an olmo-based model with publicly available weights. At about 8B parameters, Molmo2-8B sits in the mid-sized tier, which sets its memory and latency budget. Molmo2-8B is Apache 2.0-licensed, clearing it for closed-source and paid products. Before relying on Molmo2-8B, reproduce its key numbers on representative inputs.
680,025 ↓ · 189 ♡
Cosmos-Reason2-2B is NVIDIA's 2B visual reasoning model from the Cosmos series, fine-tuned from Qwen3-VL-2B for physical world understanding tasks. It is trained to reason about spatial relationships, object interactions, and temporal dynamics in images and videos, targeting robotics and autonomous system perception research. Despite the 2B scale, the Cosmos training pipeline includes extensive world-model data.
659,161 ↓ · 109 ♡
As a florence-based open-weight model, Florence-2-large focuses on vision-language understanding. The MIT license keeps Florence-2-large unrestricted for commercial reuse. Before relying on Florence-2-large, reproduce its key numbers on representative inputs.
658,971 ↓ · 1,825 ♡
This is a 4-bit MLX-format quantization of Alibaba's Qwen3.5-9B multimodal base model, packaged by the LM Studio community for Apple Silicon deployment. MLX is Apple's machine learning framework optimized for M-series chips, so this checkpoint is specifically intended for local inference on macOS without CUDA. The underlying Qwen3.5-9B supports interleaved image and text inputs for conversational tasks.
650,423 ↓ · 4 ♡
Built for vision-language understanding, gemma-4-E4B-it is a gemma-based model with publicly available weights. gemma-4-E4B-it is Apache 2.0-licensed, clearing it for closed-source and paid products. Before relying on gemma-4-E4B-it, reproduce its key numbers on representative inputs.
618,070 ↓ · 23 ♡
InternVL2-1B is an openly licensed vision-language understanding model in the internvl family. At about 1B parameters, InternVL2-1B sits in the mid-sized tier, which sets its memory and latency budget. InternVL2-1B is multilingual by design rather than English-only. Like most open checkpoints, InternVL2-1B rewards a quick in-domain eval before commitment.
613,452 ↓ · 82 ♡
dots.mocr is RedNote's multimodal OCR model based on a custom Transformer architecture, designed for high-accuracy text extraction from documents including complex layouts, tables, formulas, and mixed Chinese-English content. It goes beyond standard OCR by understanding document structure, making it suitable for parsing invoices, forms, and academic papers in both Chinese and English.
609,846 ↓ · 136 ♡
gemma-4-31B-it-GGUF targets vision-language understanding and is shipped as a large, self-hostable checkpoint. At about 31B parameters, gemma-4-31B-it-GGUF keeps hosting requirements below those of frontier-scale models. Prebuilt GGUF weights make local inference of gemma-4-31B-it-GGUF straightforward with llama.cpp-compatible runtimes. gemma-4-31B-it-GGUF is community-maintained, so track upstream changes and pin a known-good revision.
602,257 ↓ · 503 ♡
Gemma-4-E4B-Uncensored-HauhauCS-Aggressive is a gemma-based open-weight model aimed at vision-language understanding. Training spans multiple languages, so Gemma-4-E4B-Uncensored-HauhauCS-Aggressive covers cross-lingual vision-language understanding from one checkpoint. Because Gemma-4-E4B-Uncensored-HauhauCS-Aggressive is listed under Gemma license terms, vet those conditions against your deployment plan. Gemma-4-E4B-Uncensored-HauhauCS-Aggressive ships without a hosted SLA, so budget for self-managed deployment and monitoring.
593,362 ↓ · 844 ♡
medgemma-27b-it is Google's 27B medical vision-language model, fine-tuned from Gemma 3 on radiology reports, chest X-rays, histopathology slides, dermatology images, and ophthalmology fundus photographs. It is designed for medical image interpretation research, not clinical deployment. The model accepts image+text input and outputs clinical-style text descriptions, differentials, or structured findings.
592,624 ↓ · 372 ♡
granite-docling-258M is a 258M-parameter vision-language model fine-tuned specifically for document understanding tasks within the Docling pipeline. It handles OCR, layout parsing, table extraction, formula recognition, and chart reading in a single inference pass. The model is built on the Idefics3 architecture and integrates directly with the open-source Docling library.
592,037 ↓ · 1,184 ♡
GLM-4.1V-9B-Thinking is Zhipu AI's 9B-parameter vision-language model with an integrated chain-of-thought reasoning module. The 'Thinking' variant explicitly generates internal reasoning steps before producing final answers, improving performance on complex visual question answering and multi-step visual reasoning tasks. It supports English and Chinese natively.
587,684 ↓ · 776 ♡
Qwen3-VL-30B-A3B-Instruct is an openly licensed vision-language understanding model in the qwen3 family. It is a mixture-of-experts model with about 30B total parameters and, per the A3B suffix, roughly 3B active per token; the total sets its memory budget, while the active count drives per-token compute. Qwen3-VL-30B-A3B-Instruct is Apache 2.0-licensed, clearing it for closed-source and paid products. Like most open checkpoints, Qwen3-VL-30B-A3B-Instruct rewards a quick in-domain eval before commitment.
586,420 ↓ · 581 ♡
Qwen3.5-9B-AWQ is a 4-bit AWQ quantization of Qwen3.5-9B, packaged for vLLM deployment. Qwen3.5 is the multimodal variant of the Qwen3 series, and the 9B size targets a balance of quality and throughput. AWQ (Activation-aware Weight Quantization) calibrates quantization ranges to minimize output degradation, making this suitable for serving in production environments.
581,310 ↓ · 22 ♡
Kimi-K2.7-Code is Moonshot AI's code-focused multimodal model built on the kimi_k25 architecture, accepting both image and text inputs. It uses compressed-tensors for efficient weight storage and exposes custom model code, indicating non-standard architectural components beyond base Transformers.
576,927 ↓ · 1,003 ♡
A 4-bit AWQ quantisation of Qwen3.5-27B, a multimodal model combining image and text understanding at 27B parameters. AWQ identifies the weight channels that matter most to the activations and rescales them before quantisation, minimising accuracy loss compared to plain round-to-nearest quantisation. The result fits in significantly less GPU memory than the BF16 checkpoint while remaining compatible with vLLM and transformers backends.
573,310 ↓ · 41 ♡
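To make the AWQ idea in the entry above concrete, here is a minimal sketch on a four-element toy weight vector. It assumes a symmetric 4-bit grid, a hand-picked salient index, and a hand-picked scale `s`; in practice these come from calibration data, and none of the names below belong to a real AWQ library.

```python
def quantize_rtn(weights, levels=7):
    """Symmetric round-to-nearest onto integer levels in [-levels, levels]."""
    scale = max(abs(w) for w in weights) / levels
    return [round(w / scale) * scale for w in weights]

weights = [0.11, 1.90, -0.70, 0.33]
salient = 0      # hypothetical: index of the channel that sees large activations
s = 2.0          # hypothetical per-channel scale chosen from calibration data

# Plain round-to-nearest: the small salient weight collapses to zero.
plain = quantize_rtn(weights)
plain_error = abs(weights[salient] - plain[salient])

# AWQ-style: scale the salient weight up before rounding, then undo the scale
# (in a real layer the inverse scale is folded into the incoming activation).
scaled = list(weights)
scaled[salient] *= s
awq = quantize_rtn(scaled)
awq_error = abs(weights[salient] - awq[salient] / s)

print(f"plain error: {plain_error:.4f}, scaled error: {awq_error:.4f}")
```

The scaled weight lands on a non-zero grid point, so its reconstruction error drops, while the other weights are quantised exactly as before because the largest magnitude in the group is unchanged.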
LocateAnything-3B is a 3-billion-parameter vision-language model from NVIDIA that performs open-vocabulary object grounding and localization via natural language queries. It is fine-tuned from Qwen2.5-3B-Instruct using NVIDIA's Eagle visual encoder framework and targets conversational grounding workflows. The model supports referring expression comprehension and visual question answering with spatial outputs.
570,466 ↓ · 2,412 ♡
A 40B Qwen3.6 GGUF fine-tune from DavidAU's Heretic series, blending coding capability with extended thinking mode and abliterated safety filters. The GGUF files use imatrix (importance-matrix) calibrated quantization. The 'Deckard' suffix may indicate distillation from a larger teacher model, but that reading is not confirmed here, so check the model card.
569,962 ↓ · 468 ♡
Qwen3.5-9B-MLX-8bit is an 8-bit quantized version of Alibaba's Qwen3.5-9B multimodal model, packaged for Apple Silicon via the MLX framework. It handles interleaved image and text inputs for conversational use cases, making it accessible on consumer Mac hardware without requiring a discrete GPU.
565,851 ↓ · 1 ♡
Qwen3.6-35B-A3B-MLX-4bit is a 4-bit MLX-format quantization of Qwen3.6-35B-A3B, a mixture-of-experts image-text-to-text model targeting Apple Silicon inference via the MLX framework. With 35B total parameters but only ~3B active per forward pass, it is designed for efficient multimodal inference on M-series Macs. The model is Apache-2.0 licensed.
564,893 ↓ · 1 ♡
As a nemotron-based open-weight model, NVIDIA-Nemotron-Parse-v1.1 focuses on vision-language understanding. NVIDIA-Nemotron-Parse-v1.1 lists a non-standard license, so confirm permissions before deployment. Before relying on NVIDIA-Nemotron-Parse-v1.1, reproduce its key numbers on representative inputs.
560,822 ↓ · 170 ♡
Qwen3.6-35B-A3B-MLX-8bit is an 8-bit MLX quantization of Qwen's 35B-parameter Mixture-of-Experts model (qwen3_5_moe), in which only about 3B parameters are active per forward pass. The 8-bit weights reduce memory pressure on Apple Silicon, while sparse activation keeps per-token compute close to that of a 3B dense model.
555,671 ↓ · 0 ♡
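The split between total and active parameters in the MoE entries above can be sketched with two back-of-envelope numbers. The function name is hypothetical, and the figure of 2 FLOPs per active weight per token is a common rule of thumb rather than a measurement; KV cache and runtime overhead are ignored.

```python
def moe_footprint(total_params, active_params, bits_per_weight):
    """Return (weight memory in GB, approximate FLOPs per generated token)."""
    weight_gb = total_params * bits_per_weight / 8 / 1e9   # every expert stays resident
    flops_per_token = 2 * active_params                    # rule of thumb: 2 FLOPs per active weight
    return weight_gb, flops_per_token

memory_gb, flops = moe_footprint(total_params=35e9, active_params=3e9, bits_per_weight=8)
dense_flops = 2 * 35e9
print(f"weights: {memory_gb:.1f} GB, compute vs dense 35B: {flops / dense_flops:.1%}")
```

Under these assumptions, memory scales with the 35B total while compute scales with the 3B active subset, which is why quantization and sparsity address different bottlenecks.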
Qwen3.6-27B-GGUF is an openly licensed vision-language understanding model in the qwen family. Qwen3.6-27B-GGUF is Apache 2.0-licensed, clearing it for closed-source and paid products. At about 27B parameters, Qwen3.6-27B-GGUF sits in the large tier, which sets its memory and latency budget. Evaluate Qwen3.6-27B-GGUF on your own data before trusting it in production.
555,223 ↓ · 825 ♡
An 'aggressive' uncensored abliterated GGUF variant of Qwen3.6-27B, with safety refusal mechanisms removed via abliteration. Available in imatrix-calibrated GGUF quantizations. Safety removals affect the model's ability to decline harmful requests — this is a community fine-tune without safety evaluation.
551,914 ↓ · 473 ♡
As a gemma-based large model, gemma-4-31B-it-unsloth-bnb-4bit focuses on vision-language understanding. The Apache 2.0 license keeps gemma-4-31B-it-unsloth-bnb-4bit unrestricted for commercial reuse. With the weights of this roughly 31B-parameter model stored in 4-bit bitsandbytes format, gemma-4-31B-it-unsloth-bnb-4bit trades some accuracy ceiling for cheaper, faster inference. Before relying on gemma-4-31B-it-unsloth-bnb-4bit, reproduce its key numbers on representative inputs.
543,631 ↓ · 20 ♡
Qwen3-VL-4B-Thinking targets vision-language understanding and is shipped as a mid-sized, self-hostable checkpoint. Permissive Apache 2.0 terms let Qwen3-VL-4B-Thinking go straight into commercial pipelines. At about 4B parameters, Qwen3-VL-4B-Thinking keeps hosting requirements modest relative to frontier models. Qwen3-VL-4B-Thinking is community-maintained, so track upstream changes and pin a known-good revision.
542,482 ↓ · 111 ♡
Qwen3.6-35B-A3B-MLX-6bit is a 6-bit MLX quantization of Qwen/Qwen3.6-35B-A3B, a mixture-of-experts multimodal model targeting Apple Silicon via the MLX framework. With 35B total parameters and roughly 3B active per forward pass, it offers significantly reduced memory footprint compared to the full-precision base.
538,331 ↓ · 0 ♡
As a qwen3-based frontier-scale model, Qwen3.5-397B-A17B focuses on vision-language understanding. It has about 397B total parameters, of which roughly 17B are active per token according to the A17B suffix, so its memory needs are set by the full 397B even though per-token compute is far below that of a dense model of the same size. The Apache 2.0 license keeps Qwen3.5-397B-A17B unrestricted for commercial reuse. Before relying on Qwen3.5-397B-A17B, reproduce its key numbers on representative inputs.
536,934 ↓ · 1,521 ♡
Gemma 4 31B is Google's base (non-instruct) multimodal language model with image-text-to-text capability, released under Apache 2.0. As a base model, it is intended for fine-tuning and research rather than direct deployment as a chat assistant. The 31B parameter count puts it in a tier where it competes with Mistral-Medium and Llama 3.1 70B in terms of raw capability before instruction tuning.
518,969 ↓ · 433 ♡
MiniMax-M3-MXFP8 is an MXFP8-quantized variant of MiniMaxAI's MiniMax-M3 multimodal mixture-of-experts model, described in arxiv:2606.13392. It supports image, text, and video inputs and is designed for agent and coding workflows. The MXFP8 quantization reduces memory footprint compared to the full-precision base while targeting hardware that supports the MX floating-point standard.
496,836 ↓ · 42 ♡
Qwen3.6-35B-A3B-AWQ targets vision-language understanding and is shipped as a self-hostable mixture-of-experts checkpoint. Permissive Apache 2.0 terms let Qwen3.6-35B-A3B-AWQ go straight into commercial pipelines. With about 35B total parameters, roughly 3B active per token, and AWQ-quantized weights, Qwen3.6-35B-A3B-AWQ keeps hosting requirements modest relative to frontier models. Treat Qwen3.6-35B-A3B-AWQ's published metrics as a starting point and validate against your workload.
486,396 ↓ · 26 ♡
As a qwen3-based mixture-of-experts model, Qwen3.5-35B-A3B-AWQ-4bit focuses on vision-language understanding. With about 35B total parameters and roughly 3B active per token, the 4-bit AWQ weights of Qwen3.5-35B-A3B-AWQ-4bit trade some accuracy ceiling for cheaper, faster inference. The Apache 2.0 license keeps Qwen3.5-35B-A3B-AWQ-4bit unrestricted for commercial reuse. Qwen3.5-35B-A3B-AWQ-4bit ships without a hosted SLA, so budget for self-managed deployment and monitoring.
480,602 ↓ · 46 ♡
Qwen2.5-VL-72B-Instruct is Qwen's 72B vision-language model, the largest in the Qwen2.5-VL series, handling image, video, and text inputs with a 32K token context window. At 72B scale it targets document understanding, complex visual reasoning, and structured extraction from multi-page documents. It supports bounding box output for grounded visual answers.
478,956 ↓ · 630 ♡
AWQ 4-bit quantization of Qwen3.5-4B, a dense multimodal model supporting image-text-to-text tasks. At 4B parameters with AWQ compression, the weights fit within roughly 4 GB of VRAM, making the model accessible on mid-range consumer cards. The compressed-tensors format targets vLLM serving.
477,268 ↓ · 15 ♡
MinerU2.5-Pro is a 1.2B-parameter vision-language model from OpenDataLab fine-tuned for high-accuracy document parsing, built on the Qwen2-VL backbone. It handles mixed Chinese-English documents, extracting structured content from PDFs including formulas, tables, and figures. Version 2.5-Pro targets production-grade accuracy improvements over the earlier MinerU pipeline.
476,853 ↓ · 157 ♡
gemma-4-26B-A4B is a mixture-of-experts image-text-to-text model from Google with 26B total parameters and approximately 4B active parameters per forward pass. It is part of the Gemma 4 model family and supports multimodal inputs. The model is released under Apache 2.0 and is compatible with Transformers and HuggingFace inference endpoints.
468,613 ↓ · 327 ♡
An abliterated (refusal-removed) GGUF fine-tune of Qwen3.6-35B-A3B, produced via a Wasserstein-distance-guided weight adjustment technique to remove model refusal behaviour. The 'uncensored' label means safety filters have been deliberately removed. This is a community research release targeting users who need the model to engage with content that safety-tuned models decline.
468,102 ↓ · 96 ♡
MedGemma-4B-it is Google's 4B instruction-tuned multimodal model specialized for medical image and text understanding, covering radiology, dermatology, pathology, and ophthalmology. It accepts medical images (chest X-rays, skin images, histology slides, fundus photos) paired with clinical questions. Not cleared for clinical decision support — research and development only.
466,523 ↓ · 975 ♡
Qwen3-VL-4B-Instruct-FP8 is an FP8-quantized vision-language model from Alibaba, derived from the Qwen3-VL-4B-Instruct base and designed for image-grounded conversation. At 4B parameters it targets deployments where GPU memory is constrained but multimodal instruction following is required. The model is backed by multiple arXiv publications covering the VL and Qwen3 training methodologies.
463,134 ↓ · 63 ♡
Qwen3-VL-32B-Thinking-FP8 is a 32B FP8-quantized Qwen3 vision-language model with extended reasoning ('Thinking') mode, enabling multi-step chain-of-thought for complex visual analysis tasks. FP8 quantization allows it to run on a single 80GB GPU rather than requiring multi-GPU setup for the full BF16 model. The Thinking mode produces visible reasoning traces before the final answer, improving accuracy on math, logic, and diagram interpretation.
461,171 ↓ · 26 ♡
Qwen3.6-27B-AWQ-BF16-INT4 is an AWQ INT4 quantization of Alibaba's Qwen3.6-27B multimodal model, supporting both image and text inputs. The compressed-tensors format is used for weight storage, and the model targets deployments where the full BF16 27B weight set exceeds available VRAM. Being a community quantization, it inherits the Apache 2.0 license from the base model.
458,279 ↓ · 38 ♡
Step3-VL-10B is StepFun's 10B-parameter vision-language model with a custom transformer architecture (step_robotics). It targets multimodal understanding tasks including image captioning, visual QA, and document reading. The model uses safetensors weights with custom inference code and is positioned as a mid-size VLM in StepFun's model family.
455,127 ↓ · 410 ♡
Qwen2.5-VL-32B-Instruct is Alibaba's 32B-parameter vision-language instruction model, part of the Qwen2.5-VL series described in arxiv:2502.13923. It processes image and text inputs jointly and is optimized for instruction-following tasks including document understanding, chart analysis, and visual question answering. With nearly 500 likes and broad Azure deployment support, it occupies the mid-to-large scale segment of open-weight multimodal models.
455,098 ↓ · 491 ♡
Qwen3.6-35B-A3B-GPTQ-Int4 is a GPTQ INT4 quantization of Alibaba's Qwen3.6-35B-A3B sparse mixture-of-experts model, which activates approximately 3B parameters per token despite a 35B total parameter count. It supports speculative decoding via MTP (multi-token prediction) and covers English, Thai, and Chinese. The quantization targets GPU memory reduction while preserving the MoE efficiency advantage of the base model.
453,627 ↓ · 24 ♡
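The speculative decoding mentioned in the entry above can be sketched as a draft-then-verify loop. This is a greedy toy version under stated assumptions: `draft_next` stands in for a cheap predictor such as an MTP head, `target_next` for the full model, and both are hypothetical callables. Real implementations verify all drafted positions in one batched pass, may use sampling-based acceptance, and usually add a bonus token when every draft is accepted.

```python
def speculative_step(context, draft_next, target_next, k=4):
    """Draft k tokens cheaply, then keep the prefix the target model agrees with."""
    # 1. The draft predictor proposes k tokens autoregressively.
    proposal = []
    for _ in range(k):
        proposal.append(draft_next(context + proposal))

    # 2. The target checks each position in order.
    accepted = []
    for token in proposal:
        expected = target_next(context + accepted)
        if token != expected:
            accepted.append(expected)   # replace the first mismatch and stop
            break
        accepted.append(token)
    return accepted

# Toy stand-ins (hypothetical): the target counts upward; the draft agrees
# except that it mispredicts the token that should follow 3.
target = lambda seq: seq[-1] + 1
draft = lambda seq: 0 if seq[-1] == 3 else seq[-1] + 1

new_tokens = speculative_step([1], draft, target, k=4)
print(new_tokens)   # [2, 3, 4]
```

Two drafted tokens are accepted and the third is corrected, so one verification round yields three tokens while the output still matches what the target alone would have produced.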
This is a dynamically quantized FP8 variant of Google's Gemma-3-27B-IT, packaged by Red Hat AI for deployment with vLLM and compressed-tensors. The quantization reduces memory footprint compared to the BF16 base, enabling the 27B model to run on hardware that would otherwise require multi-GPU setups. It is derived from the base Gemma-3-27B-IT weights and inherits both its vision capabilities and its Gemma license terms.
444,062 ↓ · 13 ♡
Gemma-3-4B-PT is Google's 4B pre-trained (base, non-instruction-tuned) multimodal model from the Gemma-3 family, accepting image and text inputs. As a base model it requires fine-tuning or careful prompting for task-specific use. It is distributed under the Gemma license.
440,651 ↓ · 155 ♡
gemma-4-E4B-it-unsloth-bnb-4bit is a 4-bit bitsandbytes quantization of Google's Gemma-4-E4B instruction-tuned model, packaged by the Unsloth team. Unsloth quantizations are commonly used to enable fine-tuning and inference of Gemma models on consumer GPUs with limited VRAM. It inherits the Apache 2.0 license from the base model.
430,718 ↓ · 22 ♡
InternVL2-8B is an internvl-based open-weight model aimed at vision-language understanding. Permissive MIT terms let InternVL2-8B go straight into commercial pipelines. Training spans multiple languages, so InternVL2-8B covers cross-lingual vision-language understanding from one checkpoint. Before relying on InternVL2-8B, reproduce its key numbers on representative inputs.
429,442 ↓ · 187 ♡
Phi-3.5-vision-instruct-int8-ov is an INT8-quantized version of Microsoft's Phi-3.5-Vision-Instruct, converted to OpenVINO IR format for CPU and Intel GPU inference. It targets edge and on-premise deployments where NVIDIA GPUs are unavailable or cost-prohibitive. The MIT license makes it suitable for commercial use, though the OpenVINO format limits portability to non-Intel runtimes.
426,983 ↓ · 2 ♡
Idefics3-8B-Llama3 is HuggingFace's open multimodal model combining a SigLIP vision encoder with a Llama 3 8B language backbone. It is designed to follow instructions over interleaved image-text inputs and was released alongside training infrastructure on the HuggingFace Hub. The model is positioned as a fully open (weights + training code + datasets) alternative to commercial VLMs.
419,636 ↓ · 304 ♡
A 27B-parameter GGUF quantization of Qwen3.6 fine-tuned for creative writing, fiction, and code, with abliterated safety filters via the Heretic series. The imatrix quantization preserves perplexity better than naive integer quantization at the same bit-width.
418,023 ↓ · 355 ♡
MedGemma-1.5-4B-IT is Google's 4-billion-parameter multimodal model fine-tuned for medical imaging and clinical reasoning tasks, built on the Gemma 3 architecture. It targets radiology, dermatology, pathology, and ophthalmology use cases, including chest X-ray interpretation and conversational clinical support. The model ships under a non-standard license that restricts certain commercial deployments in medical contexts.
417,421 ↓ · 695 ♡
Built for vision-language understanding, MinerU2.5-2509-1.2B is a model with publicly available weights. Distribution of MinerU2.5-2509-1.2B is under AGPL-3.0, which is worth reading before you ship. At about 1.2B parameters, MinerU2.5-2509-1.2B sits in the mid-sized tier, which sets its memory and latency budget. Read MinerU2.5-2509-1.2B's card for hardware requirements and licensing fine print before deploying.
409,174 ↓ · 356 ♡
An AWQ 4-bit quantized version of Qwen/Qwen3.5-2B, adapted for image-text-to-text conversational use. The compressed-tensors format targets inference frameworks that support quantized loading such as vLLM. At 2B parameters in 4-bit precision, this checkpoint is aimed at deployments where memory is constrained.
406,368 ↓ · 3 ♡
As an open-weight model, surya-ocr-2 focuses on vision-language understanding. surya-ocr-2 is subject to OpenRAIL terms, so confirm licensing before commercial use. Read surya-ocr-2's card for hardware requirements and licensing fine print before deploying.
396,234 ↓ · 68 ♡
gemma-4-31B-it-NVFP4 is a large checkpoint for vision-language understanding, distributed on the HuggingFace Hub. With the weights of this roughly 31B-parameter model stored in NVFP4, gemma-4-31B-it-NVFP4 trades some accuracy ceiling for cheaper, faster inference. The Apache 2.0 license keeps gemma-4-31B-it-NVFP4 unrestricted for commercial reuse. Treat gemma-4-31B-it-NVFP4's published metrics as a starting point and validate against your workload.
387,108 ↓ · 51 ♡
Qwen2.5-VL-7B-Instruct-GGUF targets vision-language understanding and is shipped as a mid-sized, self-hostable checkpoint. Permissive Apache 2.0 terms let Qwen2.5-VL-7B-Instruct-GGUF go straight into commercial pipelines. At about 7B parameters, Qwen2.5-VL-7B-Instruct-GGUF keeps hosting requirements modest relative to frontier models. Treat Qwen2.5-VL-7B-Instruct-GGUF's published metrics as a starting point and validate against your workload.
383,212 ↓ · 192 ♡
Qwen3.5-122B-A10B-AWQ-4bit is an openly licensed vision-language understanding model in the qwen3 family. Qwen3.5-122B-A10B-AWQ-4bit is Apache 2.0-licensed, clearing it for closed-source and paid products. Its prebuilt 4-bit AWQ weights reduce the memory needed to serve a model with about 122B total parameters and, per the A10B suffix, roughly 10B active per token. Evaluate Qwen3.5-122B-A10B-AWQ-4bit on your own data before trusting it in production.
381,760 ↓ · 39 ♡
As a qwen-based large model, Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 focuses on vision-language understanding. The Apache 2.0 license keeps Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 unrestricted for commercial reuse. At about 27B parameters, Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2 trades some ceiling for cheaper, faster inference than frontier-scale models. Before relying on Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2, reproduce its key numbers on representative inputs.
380,877 ↓ · 122 ♡
Qwopus3.6-27B is a GGUF-quantised preview fine-tune of Qwen3.6-27B positioned as a 'Claude Opus-style' reasoning and instruction model — the name blends Qwen and Opus. It targets advanced instruction following, multilingual reasoning, and multimodal (vision-language) tasks. As a v1 preview, evaluation is still community-driven and production use should be preceded by task-specific benchmarking.
375,239 ↓ · 125 ♡
Qwen3-VL-8B-Thinking is a large checkpoint for vision-language understanding, distributed on the HuggingFace Hub. The Apache 2.0 license keeps Qwen3-VL-8B-Thinking unrestricted for commercial reuse. At about 8B parameters, Qwen3-VL-8B-Thinking trades some ceiling for cheaper, faster inference than larger models. Evaluate Qwen3-VL-8B-Thinking on your own data before trusting it in production.
374,948 ↓ · 210 ♡
Qwen3-VL-2B-Instruct-AWQ-4bit is a mid-sized checkpoint for vision-language understanding, distributed on the HuggingFace Hub. The Apache 2.0 license keeps Qwen3-VL-2B-Instruct-AWQ-4bit unrestricted for commercial reuse. Prebuilt 4-bit AWQ weights make local and edge inference of Qwen3-VL-2B-Instruct-AWQ-4bit straightforward. Treat Qwen3-VL-2B-Instruct-AWQ-4bit's published metrics as a starting point and validate against your workload.
373,889 ↓ · 1 ♡
Gemma-3n-E2B-it is Google's instruction-tuned 2B edge model from the Gemma-3n family, combining image, audio, video, and text understanding in a single model. The 'n' suffix indicates the next-generation architecture with per-layer embeddings for efficiency. Gemma license applies — allows research and commercial use with restrictions.
373,251 ↓ · 307 ♡
An 8-bit MLX quantization of Google's Gemma 4 31B instruct model, prepared by the LM Studio community for Apple Silicon local inference. Gemma 4 31B is a dense instruction-tuned model targeting the mid-to-high capability tier.
367,599 ↓ · 2 ♡
gemma-4-31B-it-AWQ is a large checkpoint for vision-language understanding, distributed on the HuggingFace Hub. Prebuilt AWQ weights make local and edge inference of gemma-4-31B-it-AWQ straightforward. The Apache 2.0 license keeps gemma-4-31B-it-AWQ unrestricted for commercial reuse. gemma-4-31B-it-AWQ is community-maintained, so track upstream changes and pin a known-good revision.
363,458 ↓ · 11 ♡
Kanana-1.5-V is Kakao's 3B vision-language instruct model, part of their Kanana model family targeting Korean-English bilingual multimodal tasks. Optimized for practical deployment at 3B parameters while maintaining decent visual understanding.
358,740 ↓ · 55 ♡
AIN is an open-weight checkpoint for vision-language understanding, distributed on the HuggingFace Hub. The MIT license keeps AIN unrestricted for commercial reuse. Evaluate AIN on your own data before trusting it in production.
356,535 ↓ · 21 ♡
Official Alibaba GPTQ INT4 quantization of Qwen3.5-27B, a dense multimodal model for image and text tasks. GPTQ INT4 reduces memory to approximately 15-18 GB, making the model accessible on A100 or RTX 4090-class hardware. Apache-2.0 licensed.
354,659 ↓ · 55 ♡
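The memory figure in the entry above can be sanity-checked with simple arithmetic. The sketch below computes a weights-only lower bound. The group size of 128 and the 16-bit scale per group are assumptions, not details taken from this checkpoint, and the helper name is hypothetical.

```python
def weight_memory_gb(num_params, bits_per_weight, group_size=None, scale_bits=16):
    """Weights-only footprint in GB (10^9 bytes), optionally counting group scales."""
    bits = bits_per_weight
    if group_size:
        bits += scale_bits / group_size   # one scale shared by each group of weights
    return num_params * bits / 8 / 1e9

params = 27e9
print(f"BF16:            {weight_memory_gb(params, 16):.1f} GB")
print(f"INT4:            {weight_memory_gb(params, 4):.1f} GB")
print(f"INT4, group 128: {weight_memory_gb(params, 4, group_size=128):.1f} GB")
```

These estimates come out at roughly 54 GB for BF16 and about 14 GB for INT4. A 15-18 GB working figure is plausible once any components kept at higher precision, the KV cache, and runtime buffers are added on top.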
As a mid-sized model, Nanonets-OCR2-3B focuses on vision-language understanding. Training spans multiple languages, so Nanonets-OCR2-3B covers cross-lingual vision-language understanding from one checkpoint. The weights start from qwen2.5-vl-3b-instruct and specialize it for the target task. Nanonets-OCR2-3B ships without a hosted SLA, so budget for self-managed deployment and monitoring.
349,723 ↓ · 509 ♡
As a gemma3-based large model, gemma-3-27b-it-quantized.w4a16 focuses on vision-language understanding. With the weights of this roughly 27B-parameter model stored in 4 bits and activations kept at 16 bits (w4a16), gemma-3-27b-it-quantized.w4a16 trades some accuracy ceiling for cheaper, faster inference. gemma-3-27b-it-quantized.w4a16 is subject to Gemma terms, so confirm licensing before commercial use. Read gemma-3-27b-it-quantized.w4a16's card for hardware requirements and licensing fine print before deploying.
349,612 ↓ · 13 ♡
SmolVLM-500M-Instruct is an openly licensed vision-language understanding model. At about 500M parameters, SmolVLM-500M-Instruct sits in the compact tier, which sets its memory and latency budget. SmolVLM-500M-Instruct is Apache 2.0-licensed, clearing it for closed-source and paid products. SmolVLM-500M-Instruct is community-maintained, so track upstream changes and pin a known-good revision.
349,125 ↓ · 195 ♡
InternVL3-8B-AWQ is a large checkpoint for vision-language understanding, distributed on the HuggingFace Hub. InternVL3-8B-AWQ is multilingual by design rather than English-only. With AWQ-quantized weights on a roughly 8B-parameter model, InternVL3-8B-AWQ trades some accuracy ceiling for cheaper, faster inference. Evaluate InternVL3-8B-AWQ on your own data before trusting it in production.
348,390 ↓ · 8 ♡
As a gemma3-based large model, gemma-3-27b-it-GPTQ-4b-128g focuses on vision-language understanding. gemma-3-27b-it-GPTQ-4b-128g is subject to Gemma terms, so confirm licensing before commercial use. This repository holds a 4-bit GPTQ build with group size 128, as the name indicates, intended for lower-memory serving than the full checkpoint. Check the gemma-3-27b-it-GPTQ-4b-128g model card for benchmarks and intended use before adopting it.
348,069 ↓ · 44 ♡
UI-TARS-1.5-7B is ByteDance's 7B GUI agent model built on Qwen2.5-VL, fine-tuned for autonomous interaction with graphical user interfaces. It can interpret screenshots, identify UI elements, and generate action sequences (click, type, scroll) to complete computer tasks from natural language instructions. Version 1.5 improves over 1.0 on web-based task completion and cross-platform generalization.
347,990 ↓ · 571 ♡
A GGUF-quantised Qwen3.5-9B fine-tuned with DeepSeek V4 Flash distillation — the model has been trained on reasoning traces from a larger DeepSeek teacher to improve chain-of-thought quality at 9B scale. It targets multilingual reasoning with long-context CoT traces, supporting English, Chinese, Korean, Japanese, Spanish, and Russian. The GGUF format enables llama.cpp local inference.
346,746 ↓ · 239 ♡
PaliGemma 3B pretrained at 224×224 resolution is Google's compact vision-language model checkpoint before instruction fine-tuning. The PT (pretrained) variant is intended as a foundation for task-specific fine-tuning rather than direct deployment.
345,806 ↓ · 484 ♡
LightOnOCR-2-1B is an open-weight model aimed at vision-language understanding. Training spans multiple languages, so LightOnOCR-2-1B covers cross-lingual vision-language understanding from one checkpoint. At about 1B parameters, LightOnOCR-2-1B keeps hosting requirements modest relative to frontier models. LightOnOCR-2-1B ships without a hosted SLA, so budget for self-managed deployment and monitoring.
344,735 ↓ · 691 ♡
An AWQ 4-bit quantization of Qwen3.5-35B-A3B (a 35B MoE with 3B active parameters) by QuantTrio, enabling memory-efficient inference on single high-VRAM GPUs. The MoE architecture means the 4-bit quantization applies to all expert weights rather than a dense 35B weight matrix.
343,576 ↓ · 18 ♡
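To illustrate why every expert's weights are quantized and kept in memory while only a few run per token, here is a minimal top-k routing sketch. The expert count, `k`, and the router scores are hypothetical and are not taken from the Qwen3.5 configuration.

```python
import math

def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    return {i: e / total for i, e in zip(top, exps)}

# Hypothetical router scores for one token over 8 experts.
logits = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9]
gates = route_top_k(logits, k=2)
print(gates)   # only experts 1 and 4 run for this token
```

A different token may select different experts, so all expert weights have to be loaded even though each token touches only a small subset.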
Built for vision-language understanding, Qwopus3.6-27B-v2-MTP-GGUF is a model with publicly available weights. Qwopus3.6-27B-v2-MTP-GGUF is Apache 2.0-licensed, clearing it for closed-source and paid products. At about 27B parameters, Qwopus3.6-27B-v2-MTP-GGUF sits in the large tier, which sets its memory and latency budget. Before relying on Qwopus3.6-27B-v2-MTP-GGUF, reproduce its key numbers on representative inputs.
342,530 ↓ · 340 ♡
Qwen3-VL-32B-Instruct-FP8 targets vision-language understanding and is shipped as a large, self-hostable checkpoint. At about 32B parameters with FP8 weights, Qwen3-VL-32B-Instruct-FP8 keeps hosting requirements modest relative to frontier models. Permissive Apache 2.0 terms let Qwen3-VL-32B-Instruct-FP8 go straight into commercial pipelines. Treat Qwen3-VL-32B-Instruct-FP8's published metrics as a starting point and validate against your workload.
342,456 ↓ · 46 ♡
Built for vision-language understanding, SmolVLM2-2.2B-Instruct is a model with publicly available weights. The weights start from smolvlm-instruct and specialize it for the target task. At about 2.2B parameters, SmolVLM2-2.2B-Instruct sits in the mid-sized tier, which sets its memory and latency budget. Before relying on SmolVLM2-2.2B-Instruct, reproduce its key numbers on representative inputs.
342,056 ↓ · 324 ♡
As a qwen3-based frontier-scale model, Qwen3.5-397B-A17B-AWQ-4bit focuses on vision-language understanding. The base model has about 397B total parameters with roughly 17B active per token, per the A17B suffix. This repository holds the 4-bit AWQ build, which cuts weight memory to roughly a quarter of a BF16 copy for lower-memory serving. Before relying on Qwen3.5-397B-A17B-AWQ-4bit, reproduce its key numbers on representative inputs.
340,681 ↓ · 3 ♡
Built for vision-language understanding, Qwen3.5-122B-A10B-GPTQ-Int4 is a qwen3-based model with publicly available weights. This repository holds the GPTQ INT4 build, intended for lower-memory serving than the full checkpoint. Qwen3.5-122B-A10B-GPTQ-Int4 is Apache 2.0-licensed, clearing it for closed-source and paid products. Check the Qwen3.5-122B-A10B-GPTQ-Int4 model card for benchmarks and intended use before adopting it.
339,293 ↓ · 41 ♡
InternVL3-1B-hf targets vision-language understanding and is shipped as a mid-sized, self-hostable checkpoint. Licensing for InternVL3-1B-hf is unspecified or custom, so clear it before commercial use. At about 1B parameters, InternVL3-1B-hf keeps hosting requirements modest relative to frontier models. InternVL3-1B-hf is community-maintained, so track upstream changes and pin a known-good revision.
338,460 ↓ · 10 ♡
As a large model, Cosmos-Reason2-8B focuses on vision-language understanding. It lists a non-standard license, so confirm permissions before deployment. The weights start from qwen3-vl-8b-instruct and are further tuned from that base. Before relying on Cosmos-Reason2-8B, reproduce its key numbers on representative inputs.
337,398 ↓ · 193 ♡
As an open-weight model, HunyuanOCR focuses on vision-language understanding. HunyuanOCR lists a non-standard license, so confirm permissions before deployment. Training spans multiple languages, so HunyuanOCR covers cross-lingual vision-language understanding from one checkpoint. Check the HunyuanOCR model card for benchmarks and intended use before adopting it.
337,307 ↓ · 758 ♡
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF is an openly licensed vision-language understanding model in the qwen family. At about 27B parameters, it sits in the large tier, which sets its memory and latency budget. Prebuilt GGUF weights make local and edge inference straightforward. Treat its published metrics as a starting point and validate against your workload.
332,853 ↓ · 656 ♡
Qwen3-VL 30B MoE vision-language model in FP8 precision with 3B active parameters per token, instruction-tuned for multimodal tasks. Combines Qwen3's language capability with vision understanding, optimized for H100-class GPU serving.
331,510 ↓ · 110 ♡
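For a mixture-of-experts checkpoint like this one, memory and compute scale with different numbers: all experts must be resident, so weight memory follows the total parameter count, while per-token compute follows the active count. The sketch below uses the 30B total and 3B active figures from the entry above and assumes one byte per weight for FP8. The function name `moe_profile` is hypothetical, and the estimate ignores the KV cache, activations, and any layers kept at higher precision.

```python
def moe_profile(total_b: float, active_b: float, bits_per_weight: float):
    """Return (approx. weight memory in GB, fraction of parameters active per token).

    Assumption: uniform bit width across all weights; 1 GB = 1e9 bytes.
    """
    weights_gb = total_b * bits_per_weight / 8   # billions of params -> GB
    active_fraction = active_b / total_b         # share of weights used per token
    return weights_gb, active_fraction


if __name__ == "__main__":
    gb, frac = moe_profile(total_b=30, active_b=3, bits_per_weight=8)
    print(f"weights: ~{gb:.0f} GB, active per token: {frac:.0%}")
```

In other words, the model needs roughly the memory of a 30B model even though each token touches only about a tenth of the weights.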
An abliterated version of Google's Gemma-3-27B-IT, with safety refusal mechanisms removed by mlabonne using directional activation manipulation. Gemma license applies to the underlying weights. The abliteration removes content restrictions while preserving the model's multimodal instruction-following capability.
324,867 ↓ · 322 ♡
Gemma 3n E4B Instruct repackaged by Unsloth for efficient local fine-tuning and inference. Gemma 3n is Google's on-device model family designed for mobile and edge hardware; E4B uses per-layer selective parameter activation to run with approximately 4B effective parameters while having a larger total capacity. Unsloth's repackage enables QLoRA fine-tuning of this model on consumer GPUs.
318,961 ↓ · 10 ♡
Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled is a large checkpoint for vision-language understanding, distributed on the Hugging Face Hub. At about 9B parameters, it trades some ceiling for cheaper, faster inference. It is multilingual by design rather than English-only. Evaluate it on your own data before trusting it in production.
318,006 ↓ · 60 ♡
TranslateGemma-4b-it is Google's Gemma 3-based 4B instruction-tuned model fine-tuned specifically for translation tasks. Unlike generic multilingual LLMs, it was trained with translation as a primary objective, an approach intended to yield more accurate and fluent translations than prompting a general-purpose model. It uses the standard Hugging Face transformers interface for translation inference.
317,340 ↓ · 780 ♡
Qwen3.5-35B-A3B-Uncensored-HauhauCS-Aggressive is a qwen3-based open-weight model aimed at vision-language understanding. GGUF builds are published alongside the full checkpoint for low-memory serving. With about 35B total parameters and, per the A3B suffix, roughly 3B active per token, it is much cheaper to run than frontier-scale models. Check the model card for benchmarks and intended use before adopting it.
316,109 ↓ · 1,407 ♡
Qianfan-OCR is Baidu's vision-language model specialized for optical character recognition and document intelligence, supporting multilingual text extraction from images. It combines a vision encoder with a language model for scene text understanding beyond simple character recognition. Apache-2.0 licensed with published benchmark results.
313,490 ↓ · 1,176 ♡
Qwopus3.6-35B-A3B-v1-GGUF is an openly licensed vision-language understanding model. Qwopus3.6-35B-A3B-v1-GGUF is multilingual by design rather than English-only. Qwopus3.6-35B-A3B-v1-GGUF is Apache 2.0-licensed, clearing it for closed-source and paid products. Qwopus3.6-35B-A3B-v1-GGUF is community-maintained, so track upstream changes and pin a known-good revision.
313,403 ↓ · 203 ♡
Qwen3.5-35B-A3B-GGUF is a qwen3-based open-weight model aimed at vision-language understanding. With about 35B total parameters and, per the A3B suffix, roughly 3B active per token, its hosting requirements are modest relative to frontier models. Permissive Apache 2.0 terms let it go straight into commercial pipelines. It ships without a hosted SLA, so budget for self-managed deployment and monitoring.
312,514 ↓ · 843 ♡
Unsloth's GGUF conversion of Moonshot AI's Kimi K2.6 MoE model, enabling local inference via llama.cpp. Kimi K2 is a large MoE model from Moonshot AI notable for its strong reasoning performance at a competitive compute cost.
309,012 ↓ · 157 ♡
gemma-3n-E4B-it-MLX-bf16 is a mid-sized checkpoint for vision-language understanding, distributed on the Hugging Face Hub. It is derived from gemma-3n-e4b-it; the name indicates an MLX conversion in bf16 rather than a retrained model, so it should inherit that base model's general competence. At about 4B effective parameters, it trades some ceiling for cheaper, faster inference. Like most open checkpoints, it rewards a quick in-domain eval before commitment.
308,673 ↓ · 3 ♡
Built for vision-language understanding, Qwen3-VL-235B-A22B-Instruct-FP8 is a qwen3-based model with publicly available weights. With about 235B total parameters and, per the A22B suffix, roughly 22B active per token, it sits in the frontier-scale tier, which sets its memory and latency budget. It is Apache 2.0-licensed, clearing it for closed-source and paid products. It ships without a hosted SLA, so budget for self-managed deployment and monitoring.
308,125 ↓ · 44 ♡
InternVL2_5-8B is an internvl-based open-weight model aimed at vision-language understanding. Permissive MIT terms let InternVL2_5-8B go straight into commercial pipelines. Training spans multiple languages, so InternVL2_5-8B covers cross-lingual vision-language understanding from one checkpoint. InternVL2_5-8B ships without a hosted SLA, so budget for self-managed deployment and monitoring.
308,120 ↓ · 104 ♡
QuantTrio's AWQ 4-bit quantisation of Qwen3.5-27B, a multimodal image-text model at 27 billion parameters. This variant uses vLLM-compatible AWQ serialisation and targets teams running the 27B model on GPU servers with constrained memory. QuantTrio maintains several AWQ quantisations of Qwen family models with consistent quantisation settings.
307,462 ↓ · 43 ♡
gemma-3n-E4B-it-MLX-8bit is an open-weight vision-language understanding model in the gemma family. At about 4B effective parameters, it sits in the mid-sized tier, which sets its memory and latency budget. Prebuilt MLX 8-bit weights make local and edge inference straightforward. Like most open checkpoints, it rewards a quick in-domain eval before commitment.
307,139 ↓ · 0 ♡
Built for vision-language understanding, NVIDIA-Nemotron-Nano-12B-v2-VL-FP8 is a nemotron-based model with publicly available weights. At about 12B parameters, it sits in the large tier, which sets its memory and latency budget. This repository is itself the FP8 build, intended for lower-memory serving. It ships without a hosted SLA, so budget for self-managed deployment and monitoring.
306,743 ↓ · 50 ♡
google_gemma-4-26B-A4B-it-GGUF is a gemma-based open-weight model aimed at vision-language understanding. This repository provides GGUF builds for low-memory serving. Permissive Apache 2.0 terms let it go straight into commercial pipelines. It ships without a hosted SLA, so budget for self-managed deployment and monitoring.
304,592 ↓ · 113 ♡
Built for vision-language understanding, Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit is a mistral-based model with publicly available weights. Training spans multiple languages, so Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit covers cross-lingual vision-language understanding from one checkpoint. Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit is Apache 2.0-licensed, clearing it for closed-source and paid products. Mistral-Small-3.2-24B-Instruct-2506-bnb-4bit ships without a hosted SLA, so budget for self-managed deployment and monitoring.
304,064 ↓ · 10 ♡
Built for vision-language understanding, Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GPTQ-int4 is a qwen3-based model with publicly available weights. It has about 35B total parameters and, per the A3B suffix, roughly 3B active per token; the total sets its memory budget and the active count its per-token compute. It is Apache 2.0-licensed, clearing it for closed-source and paid products. Before relying on it, reproduce its key numbers on representative inputs.
303,726 ↓ · 9 ♡
gemma-3n-E4B-it-MLX-6bit is an open-weight vision-language understanding model in the gemma family. It is distributed under the Gemma license terms, which are worth reading before you ship. Prebuilt MLX 6-bit weights make local and edge inference straightforward. Evaluate it on your own data before trusting it in production.
303,662 ↓ · 0 ♡
As a qwen3-based mid-sized model, Qwen3-VL-2B-Instruct-FP8 focuses on vision-language understanding. The Apache 2.0 license keeps it unrestricted for commercial reuse. This repository is itself the FP8 build, intended for lower-memory serving. Check the model card for benchmarks and intended use before adopting it.
303,557 ↓ · 39 ♡
RolmOCR is an openly licensed vision-language understanding model. It is a fine-tune of qwen2.5-vl-7b-instruct, inheriting that base model's general competence. It is Apache 2.0-licensed, clearing it for closed-source and paid products. RolmOCR is community-maintained, so track upstream changes and pin a known-good revision.
302,994 ↓ · 586 ♡
gemma-3-27b-it-AWQ-INT4 targets vision-language understanding and is shipped as a large, self-hostable checkpoint. Prebuilt AWQ INT4 weights reduce the memory needed to serve it. The listing reports Apache 2.0, but the underlying Gemma 3 weights are distributed under the Gemma license, so confirm the applicable terms before commercial use. Treat its published metrics as a starting point and validate against your workload.
300,315 ↓ · 7 ♡
gemma-3-4b-it-qat-4bit targets vision-language understanding and is shipped as a mid-sized, self-hostable checkpoint. It is multilingual by design rather than English-only. At about 4B parameters, its hosting requirements are modest relative to frontier models. Like most open checkpoints, it rewards a quick in-domain eval before commitment.
300,091 ↓ · 8 ♡
As a gemma-based mid-sized model, Gemma-4-E2B-Uncensored-HauhauCS-Aggressive focuses on vision-language understanding. Training spans multiple languages, so Gemma-4-E2B-Uncensored-HauhauCS-Aggressive covers cross-lingual vision-language understanding from one checkpoint. GGUF builds of Gemma-4-E2B-Uncensored-HauhauCS-Aggressive are published alongside the full checkpoint for low-memory serving. Check the Gemma-4-E2B-Uncensored-HauhauCS-Aggressive model card for benchmarks and intended use before adopting it.
299,361 ↓ · 165 ♡
gemma-4-31B-it-MLX-4bit targets vision-language understanding and is shipped as a large, self-hostable checkpoint. At about 31B parameters, it is considerably cheaper to host than frontier-scale models. Prebuilt MLX 4-bit weights make local and edge inference straightforward. Like most open checkpoints, it rewards a quick in-domain eval before commitment.
298,419 ↓ · 1 ♡
Qwen3.5-27B-GGUF is a qwen3-based open-weight model aimed at vision-language understanding. At about 27B parameters, its hosting requirements are modest relative to frontier models. This repository provides GGUF builds for low-memory serving. Check the model card for benchmarks and intended use before adopting it.
297,706 ↓ · 490 ♡
Unsloth's GGUF conversion of Qwen3.5-0.8B, the smallest model in the Qwen3.5 series. At 0.8B parameters, it targets tightly constrained inference environments, such as Raspberry Pi-class single-board computers or models embedded directly in applications.
297,370 ↓ · 178 ♡
Qwen3.5-2B-GGUF is a mid-sized checkpoint for vision-language understanding, distributed on the Hugging Face Hub. At about 2B parameters, it trades some ceiling for cheaper, faster inference. Prebuilt GGUF weights make local and edge inference straightforward. Treat its published metrics as a starting point and validate against your workload.
296,345 ↓ · 100 ♡
Built for vision-language understanding, Qwopus3.5-9B-v3 is a qwen-based model with publicly available weights. Training spans multiple languages, so Qwopus3.5-9B-v3 covers cross-lingual vision-language understanding from one checkpoint. Qwopus3.5-9B-v3 is Apache 2.0-licensed, clearing it for closed-source and paid products. Qwopus3.5-9B-v3 ships without a hosted SLA, so budget for self-managed deployment and monitoring.
294,775 ↓ · 88 ♡
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF is a qwen-based open-weight model aimed at vision-language understanding. Training spans multiple languages, so it covers cross-lingual vision-language understanding from one checkpoint. At about 27B parameters, its hosting requirements are modest relative to frontier models. Before relying on it, reproduce its key numbers on representative inputs.
292,155 ↓ · 601 ♡
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled targets vision-language understanding and is shipped as a large, self-hostable checkpoint. It is a fine-tune of qwen3.5-27b, inheriting that base model's general competence. Permissive Apache 2.0 terms let Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled go straight into commercial pipelines. Evaluate Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled on your own data before trusting it in production.
290,793 ↓ · 2,814 ♡
gemma-3n-E4B-it-MLX-4bit targets vision-language understanding and is shipped as a mid-sized, self-hostable checkpoint. Prebuilt MLX 4-bit weights make local and edge inference straightforward. At about 4B effective parameters, its hosting requirements are modest relative to frontier models. Treat its published metrics as a starting point and validate against your workload.
289,004 ↓ · 2 ♡
Qwen3.5-9B-FP8 is a qwen3-based open-weight model aimed at vision-language understanding. Permissive Apache 2.0 terms let it go straight into commercial pipelines. At about 9B parameters, its hosting requirements are modest relative to frontier models. It ships without a hosted SLA, so budget for self-managed deployment and monitoring.
287,785 ↓ · 10 ♡
google_gemma-4-31B-it-GGUF is a gemma-based open-weight model aimed at vision-language understanding. Permissive Apache 2.0 terms let it go straight into commercial pipelines. At about 31B parameters, it is considerably cheaper to host than frontier-scale models. It ships without a hosted SLA, so budget for self-managed deployment and monitoring.
285,205 ↓ · 62 ♡
LLaVA 1.5 7B is Haotian Liu et al.'s multimodal instruction-following model combining a CLIP vision encoder with a Vicuna-7B language model. At 7B, it was one of the strongest open VLMs at its release and remains a common fine-tuning starting point.
235,049 ↓ · 555 ♡
gemma-4-26B-A4B-it-qat-q4_0-gguf targets vision-language understanding and is shipped as a large, self-hostable checkpoint. Prebuilt GGUF weights make local and edge inference straightforward. With about 26B total parameters and roughly 4B active per token, its hosting requirements are modest relative to frontier models. Like most open checkpoints, it rewards a quick in-domain eval before commitment.
229,514 ↓ · 70 ♡