Hello, this is normal. Because of the multi-scale training and the denoising queries, the model's memory usage is not stable; it may take more than 12GB, exceeding a 2080Ti. You can try fp16 training or lowering total_batch_size to work around this issue, or add activation checkpointing to reduce the memory usage of the whole model.
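The activation-checkpointing suggestion can be sketched with `torch.utils.checkpoint.checkpoint_sequential`, which recomputes intermediate activations during the backward pass instead of storing them. The layer stack below is a hypothetical stand-in, not the actual DINO model or the detrex config:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Hypothetical stand-in for a deep layer stack; in practice this would
# be the transformer encoder/decoder blocks of the detection model.
model = nn.Sequential(
    *[nn.Sequential(nn.Linear(256, 256), nn.ReLU()) for _ in range(6)]
)

x = torch.randn(4, 256, requires_grad=True)

# Split the 6 layers into 2 checkpointed segments: activations inside
# each segment are discarded after the forward pass and recomputed in
# backward, trading extra compute for lower peak memory.
out = checkpoint_sequential(model, 2, x, use_reentrant=False)
out.sum().backward()
```

With more segments, peak memory drops further at the cost of more recomputation; in a real training loop this would wrap the model's forward inside the existing optimizer step.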
Device info
sys.platform linux
Python 3.10.0 (default, Mar 3 2022, 09:58:08) [GCC 7.5.0]
numpy 1.22.4
detectron2 0.6 @/home/lolikonloli/code/detection/package/detrex/detectron2/detectron2
Compiler GCC 11.4
CUDA compiler CUDA 11.8
detectron2 arch flags 7.5
DETECTRON2_ENV_MODULE
PyTorch 2.0.1+cu118 @/home/lolikonloli/anaconda3/envs/pl_det/lib/python3.10/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0,1 NVIDIA GeForce RTX 2080 Ti (arch=7.5)
Driver version 535.104.05
CUDA_HOME /usr/local/cuda-11.8
Pillow 9.3.0
torchvision 0.15.2+cu118 @/home/lolikonloli/anaconda3/envs/pl_det/lib/python3.10/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20221221
iopath 0.1.9
cv2 4.8.0
PyTorch built with:
Describe the issue:
Memory continuously increases during DINO training with two 2080 Ti GPUs until the process is killed by the system.
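To distinguish a genuine leak from the normal multi-scale fluctuation described in the reply, one can log the peak allocated GPU memory every few iterations. This is a hypothetical helper, not part of detrex; a value that rises steadily across hundreds of steps suggests a leak, while a value that plateaus is expected fluctuation:

```python
import torch

def log_peak_gpu_memory(step: int) -> float:
    """Print and return peak allocated GPU memory in GB at this step.

    Returns 0.0 on CPU-only machines so it is safe to call anywhere.
    """
    if not torch.cuda.is_available():
        return 0.0
    peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"step {step}: peak allocated {peak_gb:.2f} GB")
    return peak_gb

# Call inside the training loop, e.g. every 50 iterations:
#   if step % 50 == 0:
#       log_peak_gpu_memory(step)
```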