# Problem
When I trained a YOLACT model with the following command, I got the error below.
$ python train.py --config=custom_config_ty --batch_size=3
Multiple GPUs detected! Turning off JIT.
Scaling parameters by 0.38 to account for a batch size of 3.
Per-GPU batch size is less than the recommended limit for batch norm. Disabling batch norm.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Initializing weights...
Begin training!
...
[ 0] 0 || B: 7.891 | C: 15.078 | M: 4.267 | S: 1.053 | T: 28.289 || ETA: 143 days, 6:44:08 || timer: 5.803
[ 0] 10 || B: 14.656 | C: 7.509 | M: 4.164 | S: 0.946 | T: 27.275 || ETA: 32 days, 11:15:16 || timer: 0.788
Traceback (most recent call last):
File "train.py", line 532, in <module>
train()
File "train.py", line 273, in train
for datum in data_loader:
File "/home/avs/anaconda3/envs/yolact/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 681, in __next__
data = self._next_data()
File "/home/avs/anaconda3/envs/yolact/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1359, in _next_data
idx, data = self._get_data()
File "/home/avs/anaconda3/envs/yolact/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1325, in _get_data
success, data = self._try_get_data()
File "/home/avs/anaconda3/envs/yolact/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1163, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/home/avs/anaconda3/envs/yolact/lib/python3.7/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/home/avs/anaconda3/envs/yolact/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 303, in rebuild_storage_fd
shared_cache[fd_id(fd)] = StorageWeakRef(storage)
File "/home/avs/anaconda3/envs/yolact/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 65, in __setitem__
self.free_dead_references()
File "/home/avs/anaconda3/envs/yolact/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 70, in free_dead_references
if storage_ref.expired():
File "/home/avs/anaconda3/envs/yolact/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 35, in expired
return torch.Storage._expired(self.cdata) # type: ignore[attr-defined]
File "/home/avs/anaconda3/envs/yolact/lib/python3.7/site-packages/torch/storage.py", line 757, in _expired
return eval(cls.__module__)._UntypedStorage._expired(*args, **kwargs)
AttributeError: module 'torch.cuda' has no attribute '_UntypedStorage'
# Solution
This error can be caused by a mismatch between the installed PyTorch build and the local CUDA version. Reinstalling PyTorch with the build that matches CUDA 11.3 fixed it:

conda install pytorch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
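To confirm that the reinstalled packages actually line up, a quick version check like the one below can help (a minimal sketch; the expected versions are simply the ones from the install command above):

```python
# Sanity check that the installed PyTorch build matches the intended CUDA version.
# Assumes the CUDA 11.3 build installed with the conda command above.
import torch
import torchvision

print("torch:", torch.__version__)              # expect 1.10.1
print("torchvision:", torchvision.__version__)  # expect 0.11.2
print("built for CUDA:", torch.version.cuda)    # expect 11.3
print("CUDA available:", torch.cuda.is_available())
```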
# Reference
- [PyTorch] Installing previous versions of PyTorch: https://pytorch.org/get-started/previous-versions/
- [NVIDIA Forums] AttributeError: module 'torch.cuda' has no attribute '_UntypedStorage': https://forums.developer.nvidia.com/t/attributeerror-module-torch-cuda-has-no-attribute-untypedstorage/224108