ViDT models trained for 50 and 150 epochs
Pre-trained ViDT models are available for 50 and 150 training epochs, in several model sizes (from nano to base).
Auxiliary decoding loss and iterative box refinement were enabled during training.