object detection models

9 models · ranked by HuggingFace downloads

table-transformer-detection

A DETR-based object detection model from Microsoft Research trained to locate tables in document images. It is the detection stage in a two-step pipeline — a separate structure recognition model then parses the detected table's rows and columns.

1,701,134 ↓ · 425 ♡
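
The hand-off between the two stages of such a pipeline can be sketched in plain Python. This is a minimal illustration, not the Table Transformer code: the helper names `pad_and_clip` and `to_page_coords`, the 10-pixel padding, and the example boxes are all assumptions made for this sketch. It shows how a detected table box might be padded and cropped before structure recognition, and how boxes predicted inside the crop are shifted back into page coordinates.

```python
from typing import Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def pad_and_clip(box: Box, page_w: float, page_h: float, pad: float = 10.0) -> Box:
    """Grow a detected table box by `pad` pixels and clip it to the page.

    The padding value is an assumption for illustration; tune it on your data.
    """
    x0, y0, x1, y1 = box
    return (
        max(0.0, x0 - pad),
        max(0.0, y0 - pad),
        min(float(page_w), x1 + pad),
        min(float(page_h), y1 + pad),
    )


def to_page_coords(inner_box: Box, crop_box: Box) -> Box:
    """Shift a box predicted inside the crop back into page coordinates."""
    offset_x, offset_y = crop_box[0], crop_box[1]
    x0, y0, x1, y1 = inner_box
    return (x0 + offset_x, y0 + offset_y, x1 + offset_x, y1 + offset_y)


# Hypothetical stage-one output: one table box on a 600x800 page.
table_box = (5.0, 100.0, 400.0, 300.0)
crop = pad_and_clip(table_box, page_w=600, page_h=800)

# Hypothetical stage-two output: one row box, relative to the crop.
row_in_crop = (20.0, 15.0, 120.0, 40.0)
row_on_page = to_page_coords(row_in_crop, crop)
print(crop, row_on_page)
```

Keeping the crop box alongside each stage-two prediction is what makes it possible to draw rows and columns on the original page image.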

table-transformer-structure-recognition

An openly licensed object detection model. It is released under the MIT license, which permits use in closed-source and paid products. As with most open checkpoints, run a quick in-domain evaluation before committing to it.

1,673,396 ↓ · 220 ♡

detr-resnet-50

DETR (Detection Transformer) with a ResNet-50 backbone reformulates object detection as a direct set prediction problem, eliminating anchor generation and NMS post-processing. Trained on COCO, it uses a transformer encoder-decoder to output a fixed set of object predictions in a single forward pass. With 957 likes and over 900k downloads, it remains one of the most widely referenced end-to-end detection baselines.

916,010 ↓ · 957 ♡
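
To make "a fixed set of predictions with no NMS" concrete, here is a minimal pure-Python sketch of DETR-style decoding. It assumes the usual DETR output layout: per-query class logits whose last entry is the "no object" class, and boxes as normalized (center x, center y, width, height). The function name `decode_queries`, the 0.9 threshold, and the toy inputs are assumptions for illustration, not the library's own post-processing code.

```python
import math
from typing import List, Sequence, Tuple

Detection = Tuple[int, float, Tuple[float, float, float, float]]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one query's class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_queries(
    logits: Sequence[Sequence[float]],
    boxes: Sequence[Tuple[float, float, float, float]],
    img_w: int,
    img_h: int,
    threshold: float = 0.9,
) -> List[Detection]:
    """Turn per-query outputs into detections by thresholding only (no NMS)."""
    detections = []
    for query_logits, (cx, cy, w, h) in zip(logits, boxes):
        # Drop the trailing "no object" probability before picking a label.
        probs = softmax(query_logits)[:-1]
        score = max(probs)
        if score < threshold:
            continue
        label = probs.index(score)
        # Convert normalized center format to pixel corner format.
        box = (
            (cx - w / 2) * img_w,
            (cy - h / 2) * img_h,
            (cx + w / 2) * img_w,
            (cy + h / 2) * img_h,
        )
        detections.append((label, score, box))
    return detections


# Two toy queries: the first is a confident class-0 object, the second is "no object".
toy_logits = [[8.0, 0.0, 0.0], [0.0, 0.0, 8.0]]
toy_boxes = [(0.5, 0.5, 0.2, 0.4), (0.1, 0.1, 0.1, 0.1)]
detections = decode_queries(toy_logits, toy_boxes, img_w=100, img_h=200)
print(detections)
```

Each query is judged independently, so no box suppresses another. The model is trained so that, ideally, only one query claims each object.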

yolos-small

An open-weight, self-hostable object detection checkpoint. Its permissive Apache 2.0 license allows direct use in commercial pipelines. Treat the published metrics as a starting point and validate against your own workload.

728,149 ↓ · 95 ♡
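
A workload check of the kind suggested above can start very small. The sketch below computes precision and recall for one image by greedily matching predicted boxes to ground-truth boxes at an IoU threshold. It is a simplified, class-agnostic stand-in and not the COCO mAP protocol; the function names, the 0.5 threshold, and the sample boxes are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: Sequence[Tuple[float, Box]],
    truths: Sequence[Box],
    iou_thr: float = 0.5,
) -> Tuple[float, float]:
    """Greedy matching: highest-scoring predictions claim ground truths first."""
    unmatched: List[Box] = list(truths)
    true_positives = 0
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best = max(unmatched, key=lambda t: iou(box, t), default=None)
        if best is not None and iou(box, best) >= iou_thr:
            unmatched.remove(best)  # each ground truth can be matched once
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(truths) if truths else 0.0
    return precision, recall


# Hypothetical annotations and model outputs for a single image.
truths = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
preds = [(0.9, (0.0, 0.0, 10.0, 10.0)), (0.8, (50.0, 50.0, 60.0, 60.0))]
print(precision_recall(preds, truths))
```

Running this over a few hundred representative images, per class if labels matter, usually shows quickly whether a checkpoint's published numbers carry over to your data.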

table-transformer-structure-recognition-v1.1-all

An open-weight, self-hostable object detection checkpoint. Its permissive MIT license allows direct use in commercial pipelines. Evaluate it on your own data before trusting it in production.

643,667 ↓ · 83 ♡

rtdetr_v2_r50vd

rtdetr_v2_r50vd is a Real-Time DEtection TRansformer v2 built on a ResNet-50vd backbone, trained on COCO. RT-DETRv2 improves over RT-DETRv1 with flexible denoising training and faster convergence, achieving real-time detection without NMS post-processing. The ResNet-50vd variant targets the speed-accuracy balance point for production deployment.

407,807 ↓ · 28 ♡

yolos-fashionpedia

An open-weight, self-hostable object detection checkpoint. Its permissive MIT license allows direct use in commercial pipelines. Treat the published metrics as a starting point and validate against your own workload.

402,585 ↓ · 145 ♡

PP-DocLayoutV3_safetensors

PP-DocLayoutV3 is PaddleOCR's third-generation document layout detection model, converted to safetensors format for HuggingFace compatibility. It performs object detection to identify layout regions — text blocks, tables, figures, formulas, headings — in document images using a transformer-based backbone. The model is a building block in PaddleOCR's full document parsing pipeline.

364,570 ↓ · 28 ♡

detr-doc-table-detection

An open-weight object detection model. Its permissive Apache 2.0 license allows direct use in commercial pipelines. Before relying on it, reproduce its key numbers on representative inputs.

232,864 ↓ · 63 ♡