A VideoMAE-Small model fine-tuned on XD-Violence, a multi-scene violence detection dataset covering realistic violent video clips from films and surveillance footage. The model performs binary video classification (violent/non-violent) and builds on VideoMAE's self-supervised masked-autoencoder pre-training. That pre-training approach typically needs fewer labelled examples than supervised-only baselines for video tasks.
391,007 ↓ · 0 ♡
vjepa2-vitg-fpc64-256 is a ViT-based open-weight model for video classification. Its Apache 2.0 license permits commercial reuse. Read vjepa2-vitg-fpc64-256's card for hardware requirements and licensing fine print before deploying.
377,363 ↓ · 53 ♡
kandinsky-videomae-large-camera-motion targets video classification and ships as an open-weight, self-hostable checkpoint. It is a fine-tune of videomae-large, so it starts from that base model's pretrained video representations. Treat kandinsky-videomae-large-camera-motion's published metrics as a starting point and validate against your workload.
323,151 ↓ · 5 ♡