sam3
Segment Anything Model 3 (SAM 3) from Meta extends SAM 2's video object segmentation into a unified model that handles both images and video through a single prompt interface.
4 models · ranked by HuggingFace downloads
SAM (Segment Anything Model) with a ViT-Base image encoder is Meta's promptable segmentation model trained on 1 billion masks across 11 million images. It accepts point, box, or mask prompts and produces high-quality binary masks for arbitrary objects without task-specific fine-tuning. The base variant trades some accuracy for faster inference compared to the ViT-Large and ViT-Huge variants.
SAM-ViT-Huge is Meta's largest Segment Anything Model variant, using a ViT-H image encoder trained on the SA-1B dataset of over 1 billion masks. It generates high-quality segmentation masks from point, box, or mask prompts and is designed for zero-shot segmentation across arbitrary image domains. The Apache 2.0 license and Azure deployment support make it accessible for both research and production workloads.
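The point and box prompts described for the two SAM checkpoints above can be tried with the `SamModel` and `SamProcessor` classes from the `transformers` library. The following is a minimal sketch, not an official recipe: the helper names `build_prompt_kwargs` and `segment` are hypothetical, and the Hub IDs `facebook/sam-vit-base` and `facebook/sam-vit-huge` are assumed rather than taken from this listing.

```python
from typing import Optional, Sequence


def build_prompt_kwargs(
    points: Optional[Sequence[Sequence[float]]] = None,
    box: Optional[Sequence[float]] = None,
) -> dict:
    """Wrap prompts for ONE image in the nested-list layout SamProcessor accepts.

    points: [[x, y], ...] in pixel coordinates, all describing the same object.
    box:    [x_min, y_min, x_max, y_max] in pixel coordinates.
    (Hypothetical helper, introduced only for this illustration.)
    """
    if points is None and box is None:
        raise ValueError("provide at least one point or a box")
    kwargs = {}
    if points is not None:
        # Layout: (images, points, 2)
        kwargs["input_points"] = [[[float(x), float(y)] for x, y in points]]
    if box is not None:
        # Layout: (images, boxes per image, 4)
        kwargs["input_boxes"] = [[[float(v) for v in box]]]
    return kwargs


def segment(image, checkpoint="facebook/sam-vit-base", points=None, box=None):
    """Return (masks, scores) for one PIL image.

    Heavy imports live inside the function so the module loads without them.
    The checkpoint name is an assumed Hub ID; weights download on first use.
    """
    import torch
    from transformers import SamModel, SamProcessor

    processor = SamProcessor.from_pretrained(checkpoint)
    model = SamModel.from_pretrained(checkpoint).eval()
    inputs = processor(image, return_tensors="pt", **build_prompt_kwargs(points, box))
    with torch.no_grad():
        outputs = model(**inputs)
    # Upscale the low-resolution mask logits back to the original image size.
    masks = processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu(),
    )
    # masks[0] holds the candidate masks for the image;
    # outputs.iou_scores[0] holds the model's own quality estimate per candidate.
    return masks[0], outputs.iou_scores[0]
```

Calling `segment(image, points=[[450, 600]])` with a PIL image returns several candidate masks along with their predicted quality scores. Passing `checkpoint="facebook/sam-vit-huge"` swaps in the larger encoder at the cost of slower inference.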
sam3.1 is an open-weight checkpoint for promptable segmentation, distributed on the HuggingFace Hub. Its licensing is unspecified or custom, so confirm the terms before commercial use. Treat sam3.1's published metrics as a starting point and validate them against your own workload.
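One common way to validate a segmentation checkpoint on your own data is to compare predicted masks with hand-labeled ones using intersection-over-union (IoU). The sketch below is a plain-Python illustration under stated assumptions: masks are same-shaped 2D grids of 0/1 values, the function names `mask_iou` and `mean_iou` are hypothetical, and scoring two empty masks as a perfect match is a convention chosen here, not something the listing specifies.

```python
def mask_iou(pred, target):
    """IoU of two same-shaped binary masks given as rows of 0/1 (or bool) values."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of rows")
    intersection = union = 0
    for pred_row, target_row in zip(pred, target):
        if len(pred_row) != len(target_row):
            raise ValueError("mask rows must have the same length")
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            intersection += p and t  # pixel is set in both masks
            union += p or t          # pixel is set in at least one mask
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


def mean_iou(pairs):
    """Average IoU over an iterable of (predicted_mask, reference_mask) pairs."""
    scores = [mask_iou(pred, target) for pred, target in pairs]
    if not scores:
        raise ValueError("need at least one mask pair")
    return sum(scores) / len(scores)
```

Running `mean_iou` over a small labeled sample from your own domain gives a number you can compare directly with the published figures.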