Distilled BERT model that encodes sentences into 384-dimensional vectors for measuring semantic similarity. Trained on over a billion sentence pairs spanning scientific papers, web QA, NLI datasets, and community forums. At 22M parameters and 6 transformer layers, it is fast enough for CPU inference while remaining competitive on standard sentence similarity benchmarks.

245,742,847 5,015
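Semantic similarity between sentence vectors like these is commonly measured with cosine similarity. The sketch below is a minimal, stdlib-only illustration; the 4-dimensional vectors are hypothetical stand-ins for real 384-dimensional embeddings, not model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors (assumption: real embeddings would have 384 dims).
query = [0.2, 0.1, 0.9, 0.0]
close = [0.25, 0.05, 0.85, 0.1]
far = [0.9, 0.4, 0.0, 0.1]

# A sentence with similar meaning should score higher than an unrelated one.
print(cosine_similarity(query, close) > cosine_similarity(query, far))
```

Scores near 1 indicate vectors pointing in nearly the same direction; scores near 0 indicate unrelated directions.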

ms-marco-MiniLM-L6-v2

text-ranking

Cross-encoder reranker trained on the MS MARCO passage retrieval dataset, designed to score query-document pairs jointly rather than encoding them independently. Distilled from a 12-layer cross-encoder into 6 layers to reduce latency while retaining re-ranking accuracy. Used as a second-stage ranker on top of fast first-stage retrieval (BM25 or bi-encoder).

80,423,371 269
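The two-stage pattern described above can be sketched as follows. This is an illustration under stated assumptions: `first_stage` is a crude word-overlap recall step standing in for BM25 or a bi-encoder, and `toy_pair_scorer` is a hypothetical stand-in for a cross-encoder that would score each query-document pair jointly.

```python
def first_stage(query, corpus, k):
    """Cheap recall step: rank documents by number of shared lowercase words."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def rerank(query, candidates, score_pair):
    """Precise step: score each (query, document) pair jointly, then sort."""
    return sorted(candidates, key=lambda doc: score_pair(query, doc), reverse=True)


def toy_pair_scorer(query, doc):
    """Hypothetical cross-encoder stand-in: Jaccard overlap of word sets."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d)


corpus = [
    "how to bake sourdough bread at home",
    "bread",
    "the history of bread making in europe and how to study it at home",
    "tuning a guitar by ear",
]
query = "how to bake bread"

candidates = first_stage(query, corpus, k=3)           # fast, approximate
reranked = rerank(query, candidates, toy_pair_scorer)  # slower, more precise
print(reranked)
```

Only the `k` candidates that survive the first stage are passed to the expensive scorer, which is what keeps the joint scoring affordable.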

bge-small-en-v1.5

feature-extraction

Small English dense embedding model from BAAI's BGE (BAAI General Embedding) series, producing 384-dimensional vectors and released under the MIT license. Tuned with a retrieval-focused training strategy, it achieves competitive MTEB retrieval scores relative to its parameter count. Suited for embedding workflows where throughput and cost matter more than peak accuracy.

61,803,330 497

bert-base-uncased

fill-mask

Google's original BERT base model in uncased form, pre-trained on BookCorpus and English Wikipedia via masked language modeling. Input text is lowercased before tokenization, making the model insensitive to capitalization. It remains a standard fine-tuning base for classification, NER, and extractive QA, though newer encoders outperform it on most benchmarks.

60,271,662 2,690

paraphrase-multilingual-MiniLM-L12-v2

sentence-similarity

Multilingual sentence embedding model covering 50+ languages, built on a 12-layer distilled MiniLM architecture. Produces 384-dimensional vectors designed for semantic similarity and paraphrase detection across language boundaries. Trained on multilingual paraphrase data to align semantically equivalent sentences even when expressed in different languages.

50,349,812 1,290

electra-base-discriminator

ELECTRA base discriminator from Google, pre-trained using replaced token detection rather than masked language modeling. A small generator produces candidate replacements, and this model learns to identify which tokens were swapped. Because the task draws a training signal from every token rather than only the masked ones, pre-training is more compute-efficient than BERT's. Intended as a fine-tuning base for classification and token-level tasks.

41,762,860 128
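The replaced-token-detection objective described above can be sketched as a labeling step. This is a simplified illustration: the corrupted sequence is written by hand here, whereas in actual pre-training a small generator would propose the replacements.

```python
def rtd_labels(original_tokens, corrupted_tokens):
    """Return 1 where a token was replaced and 0 where it is unchanged."""
    if len(original_tokens) != len(corrupted_tokens):
        raise ValueError("sequences must be the same length")
    return [int(o != c) for o, c in zip(original_tokens, corrupted_tokens)]


# Hand-written example (assumption: a generator would supply `corrupted`).
original = ["the", "chef", "cooked", "the", "meal"]
corrupted = ["the", "chef", "ate", "the", "meal"]

labels = rtd_labels(original, corrupted)
print(labels)  # [0, 0, 1, 0, 0]

# Every position carries a label, so every token contributes to the loss.
print(len(labels) == len(original))  # True
```

A replacement that happens to match the original token is labeled 0, since the comparison is on the resulting token rather than on whether the position was sampled.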

all-mpnet-base-v2

sentence-similarity

Sentence embedding model based on the MPNet architecture, producing 768-dimensional vectors. Trained on over a billion sentence pairs from MS MARCO, NLI datasets, and community QA forums, it is a frequent choice among English embedding models when accuracy matters more than inference speed. The MPNet backbone was pre-trained with masked and permuted language modeling, which is intended to yield stronger representations.

33,515,916 1,313

bge-m3

sentence-similarity

BAAI's BGE-M3 embedding model supports over 100 languages with a unified architecture capable of dense, sparse (lexical), and late-interaction (ColBERT-style) retrieval modes from a single checkpoint. Built on XLM-RoBERTa with large-scale multilingual training, it targets multilingual and cross-lingual retrieval where a single model must handle diverse language inputs.

31,360,936 3,158
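Of the three retrieval modes mentioned above, late interaction is the least self-explanatory. A minimal sketch of ColBERT-style MaxSim scoring follows, assuming each query and document is represented by one vector per token; the 2-dimensional vectors are hypothetical toy values, not output from this model.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: for each query token, take its best-matching
    document token, then sum those maxima over all query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical per-token vectors (assumption: real ones are much larger).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # covers both query tokens
doc_b = [[1.0, 0.0], [0.7, 0.1]]              # covers only the first

print(maxsim_score(query_vecs, doc_a) > maxsim_score(query_vecs, doc_b))
```

Dense retrieval compares one vector per text, while this scheme keeps token-level vectors and defers the comparison to query time, hence "late" interaction.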

Qwen3-0.6B

text-generation

Qwen3-0.6B is the 0.6-billion-parameter instruction-tuned model from Alibaba Cloud's Qwen3 series, fine-tuned from Qwen3-0.6B-Base for conversational and task-following use. It targets deployment in environments where even a 1B model is too large, such as edge hardware, mobile devices, or ultra-low-latency services. Apache 2.0 licensed.

27,739,500 1,362

clip-vit-base-patch32

zero-shot-image-classification

OpenAI's CLIP model using a ViT-B/32 image encoder, the smaller of the two widely deployed CLIP variants. Trained contrastively on 400 million image-text pairs, it aligns image and text representations in a shared embedding space for zero-shot classification and retrieval. The B/32 variant sacrifices accuracy versus ViT-L/14 for faster inference.

23,159,737 964
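Zero-shot classification in a shared embedding space can be sketched as follows: embed the image and one text prompt per label, compare them by cosine similarity, and apply a softmax. The vectors and the `logit_scale` value here are illustrative assumptions, not output or settings taken from the model.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def zero_shot_probs(image_vec, label_vecs, logit_scale=100.0):
    """Softmax over scaled cosine similarities between image and label texts."""
    img = normalize(image_vec)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, normalize(t)))
        for t in label_vecs
    ]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-dimensional embeddings standing in for real encoder output.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
image_vec = [0.9, 0.1, 0.2]
label_vecs = [[1.0, 0.0, 0.1], [0.1, 1.0, 0.0], [0.0, 0.2, 1.0]]

probs = zero_shot_probs(image_vec, label_vecs)
print(labels[probs.index(max(probs))])  # a photo of a cat
```

No classifier is trained for the label set; changing the candidate labels only means embedding different text prompts.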

mobilenetv3_small_100.lamb_in1k

image-classification

MobileNetV3 small model at 100% width multiplier, trained on ImageNet-1k using the LAMB optimizer via the timm library. At under 3M parameters, it targets image classification on mobile and edge hardware where latency and memory are primary constraints. Part of timm's standardized pretrained model zoo with consistent preprocessing and inference APIs.

21,222,099 80

xlm-roberta-base

fill-mask

XLM-RoBERTa base from Facebook AI, pre-trained on 2.5TB of filtered CommonCrawl text across 100 languages using the RoBERTa training procedure. Enables cross-lingual transfer: models fine-tuned on labeled English data can be applied to other languages without labeled data in those languages. A common starting point for multilingual classification and token-level tasks.

20,459,644 855

nomic-embed-text-v1.5

sentence-similarity

Nomic Embed Text v1.5 is a matryoshka-capable English embedding model from Nomic AI, built on a custom nomic-BERT architecture trained with contrastive learning on large-scale text pairs. Matryoshka Representation Learning allows truncating embeddings to shorter dimensions (e.g. 64, 128, 256) without retraining, enabling flexible precision-cost tradeoffs. The model is transformers.js-compatible for browser-side inference.

18,124,658 857
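The truncation that Matryoshka Representation Learning allows can be sketched as keeping the leading dimensions and re-normalizing. This is a minimal illustration with a hypothetical 8-dimensional vector; the model card may prescribe additional preprocessing, so treat the exact steps as an assumption.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize an all-zero prefix")
    return [x / norm for x in head]


# Hypothetical full-size embedding (assumption: real vectors are far longer).
full = [0.5, -0.3, 0.2, 0.1, 0.05, -0.02, 0.01, 0.0]

short = truncate_embedding(full, 4)
print(len(short))  # 4
```

Shorter vectors reduce storage and comparison cost, at the price of some retrieval precision.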

bge-reranker-v2-m3

text-classification

BGE-Reranker-v2-M3 is BAAI's multilingual cross-encoder reranker built on XLM-RoBERTa, designed for re-ranking retrieved passages in multilingual RAG or search pipelines. It jointly encodes query-passage pairs to produce relevance scores, providing higher accuracy than bi-encoder similarity for the same candidate set. Apache 2.0 licensed with text-embeddings-inference support.

16,278,800 1,056

Qwen3-4B

text-generation

Qwen3-4B is Alibaba's 4B parameter model from the Qwen3 series, which introduced a hybrid thinking mode allowing the model to switch between fast direct answering and extended chain-of-thought reasoning. It is a compact model capable of running on consumer hardware while outperforming many 7B predecessors on reasoning benchmarks. Apache 2.0 licensed.

15,932,949 641

clap-htsat-fused

audio-classification

LAION's CLAP (Contrastive Language-Audio Pretraining) model, pairing an HTSAT (Hierarchical Token-Semantic Audio Transformer) audio encoder with a text encoder to align audio and text in a shared embedding space. The "fused" variant adds feature fusion to handle variable-length audio inputs. Analogous to CLIP for images, it enables zero-shot audio classification and retrieval using natural language descriptions without task-specific labeled audio data.

15,828,115 107

Kokoro-82M

text-to-speech

Kokoro-82M is a compact 82-million-parameter text-to-speech model fine-tuned from StyleTTS2, targeting natural-sounding English speech synthesis at a size runnable on CPU or modest GPU. Released under Apache 2.0 with a HuggingFace DOI, it gained attention as a high-quality open TTS model at significantly smaller scale than most alternatives. It supports multiple English voice styles.

15,754,089 6,395

chronos-2

time-series-forecasting

Chronos-2 is Amazon's second-generation pretrained foundation model for zero-shot time-series forecasting. The original Chronos framed forecasting as a language modeling problem over quantized time-series tokens using a T5 encoder-decoder architecture; Chronos-2 keeps the goal of forecasting across diverse domains without per-dataset training but uses an encoder-only design that also handles multivariate and covariate-informed tasks. Released under Apache 2.0.

15,256,609 338

bge-large-en-v1.5

feature-extraction

BGE-Large-EN-v1.5 is BAAI's highest-capacity English embedding model in the v1.5 series, producing 1024-dimensional vectors. It achieves top MTEB retrieval scores among its generation of English-only embedding models, at the cost of higher compute and storage than BGE-small or BGE-base. MIT licensed with ONNX export support.

14,764,689 689

chronos-bolt-small

time-series-forecasting

Chronos-Bolt-Small is a small time-series foundation model from AutoGluon, using a T5-based encoder-decoder architecture for zero-shot forecasting. The 'Bolt' variant splits the historical context into patches and generates multi-step quantile forecasts directly instead of sampling tokens one step at a time, which improves both speed and accuracy over the original Chronos. Apache 2.0 licensed and part of the AutoGluon time-series forecasting ecosystem.

13,904,827 44

t5-small

translation

T5-small is the 60M-parameter variant of Google's Text-to-Text Transfer Transformer, which casts all NLP tasks as seq2seq problems. It was influential in establishing the unified text-to-text training paradigm, though newer models have largely superseded it for production use.

13,698,687 556

Qwen3-8B

text-generation

Qwen3-8B is the 8-billion-parameter instruction-tuned model from Alibaba Cloud's Qwen3 family, positioned between the 4B and 14B+ tiers. It targets deployment on single consumer or workstation GPUs while providing strong reasoning and multilingual capabilities. Apache 2.0 licensed with text-generation-inference compatibility.

13,501,708 1,166

gemma-4-26B-A4B-it

image-text-to-text

Gemma 4-26B-A4B-IT is Google DeepMind's 26-billion-total-parameter MoE (Mixture-of-Experts) vision-language model, with approximately 4 billion active parameters per token. The MoE design lets it draw on the capacity of 26B parameters while activating only ~4B per forward pass, reducing per-token compute relative to a dense 26B model. Apache 2.0 licensed.

13,172,985 1,197
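The idea of activating only a subset of parameters per token can be sketched with top-k expert routing. This is a generic illustration, not this model's actual router: the scalar "experts", the router logits, and `k=2` are all hypothetical.

```python
import math


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and softmax their logits into weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    m = max(router_logits[i] for i in top)
    exps = {i: math.exp(router_logits[i] - m) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    weights = route_top_k(router_logits, k)
    output = sum(w * experts[i](x) for i, w in weights.items())
    return output, sorted(weights)


# Hypothetical experts: simple scalar functions standing in for FFN blocks.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 2.0]

output, used = moe_forward(3.0, experts, router_logits, k=2)
print(used)    # [1, 3]
print(output)  # 7.5
```

Two of the four experts run for this input, so the compute per token scales with `k` rather than with the total number of experts.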

gpt2

text-generation

OpenAI's original GPT-2 at 124M parameters, an autoregressive language model trained on WebText (over 8 million web documents filtered from Reddit outlinks). It continues an English prompt by next-token prediction and was trained without any instruction tuning or RLHF. MIT licensed and runnable on commodity CPU hardware.

12,980,059 3,316
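Autoregressive next-token prediction can be sketched as a loop that repeatedly appends the most likely next token. The bigram table below is a hypothetical toy that looks only at the last token; a real language model conditions on the whole prefix, and greedy selection is just one of several decoding strategies.

```python
def generate(prompt_tokens, next_token_scores, max_new_tokens, eos="<eos>"):
    """Greedy decoding: append the highest-scoring next token until EOS."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        next_token = max(scores, key=scores.get)
        if next_token == eos:
            break
        tokens.append(next_token)
    return tokens


# Hypothetical toy model (assumption: scores depend only on the last token).
BIGRAMS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "<eos>": 0.3},
    "sat": {"down": 0.8, "<eos>": 0.2},
    "down": {"<eos>": 1.0},
}


def toy_scores(tokens):
    """Look up next-token scores for the final token of the prefix."""
    return BIGRAMS.get(tokens[-1], {"<eos>": 1.0})


result = generate(["the"], toy_scores, max_new_tokens=10)
print(" ".join(result))  # the cat sat down
```

Each generated token is fed back in as part of the prefix for the next step, which is what makes the process autoregressive.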

Open-source AI models, compared at a glance.

Browse by pipeline

text generation

image text to text

automatic speech recognition

sentence similarity

feature extraction

fill mask

text classification

image classification

time series forecasting

zero shot image classification

text ranking

any to any

translation

text to image

image feature extraction

token classification

text to speech

audio classification

image to text

object detection

image segmentation

zero shot classification

image to video

depth estimation

question answering

image to image

zero shot object detection

mask generation

summarization

audio to audio

audio text to text

image to 3d

video classification

voice activity detection

visual document retrieval

keypoint detection

text to audio

robotics

table question answering

text to video

other

tabular classification

visual question answering

tabular regression

image text to image

Top by downloads
