mask generation models

4 models · ranked by HuggingFace downloads

sam3

Segment Anything Model 3 (SAM 3) from Meta extends SAM 2's promptable image and video segmentation into a unified model that detects, segments, and tracks objects in images and video from text or visual prompts.

1,708,249 ↓ · 2,331 ♡

sam-vit-base

SAM (Segment Anything Model) with a ViT-Base image encoder is Meta's promptable segmentation model, trained on over 1 billion masks across 11 million images. It accepts point, box, or mask prompts and produces high-quality binary masks for arbitrary objects without task-specific fine-tuning. The base variant trades some accuracy for faster inference compared to the ViT-Large and ViT-Huge variants.

1,047,326 ↓ · 170 ♡
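As a rough illustration of the point-prompt interface described for the ViT-Base SAM entry above, the sketch below wraps the `SamModel` and `SamProcessor` classes from the `transformers` library. The Hub id `facebook/sam-vit-base` and the helper names `as_point_prompt` and `segment_with_points` are assumptions made for this example, not part of the listing; check the model card before relying on them.

```python
from typing import List, Sequence, Tuple

# Assumed Hub id for the ViT-Base checkpoint; not stated in the listing.
MODEL_ID = "facebook/sam-vit-base"


def as_point_prompt(points: Sequence[Tuple[float, float]]) -> List[List[List[float]]]:
    """Wrap (x, y) pixel coordinates for a single image as [image][point][xy]."""
    return [[[float(x), float(y)] for x, y in points]]


def segment_with_points(image, points: Sequence[Tuple[float, float]]):
    """Hypothetical helper: return candidate masks and predicted IoU scores."""
    # Heavy dependencies are imported lazily so as_point_prompt works without them.
    import torch
    from transformers import SamModel, SamProcessor

    processor = SamProcessor.from_pretrained(MODEL_ID)
    model = SamModel.from_pretrained(MODEL_ID)

    inputs = processor(image, input_points=as_point_prompt(points), return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)

    # Resize the low-resolution predictions back to the original image size.
    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu(),
    )
    return masks, outputs.iou_scores
```

Calling `segment_with_points(pil_image, [(450, 600)])` on a PIL image downloads the checkpoint on first use and returns the candidate masks together with the model's own quality estimates, so validate the result on your own images before depending on it.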

sam-vit-huge

SAM-ViT-Huge is Meta's largest Segment Anything Model variant, using a ViT-H image encoder trained on the SA-1B dataset of over 1 billion masks. It generates high-quality segmentation masks from point or box prompts and is designed for zero-shot segmentation across arbitrary image domains. The Apache 2.0 license and Azure deployment support make it accessible for both research and production workloads.

428,385 ↓ · 198 ♡

sam3.1

sam3.1 is an open-weight checkpoint for promptable segmentation, distributed on the HuggingFace Hub. Its licensing is unspecified or custom, so clear it before commercial use. Treat its published metrics as a starting point and validate them against your workload.

313,182 ↓ · 285 ♡
