Aborted (core dumped)
Now, this has never happened to me before, and it occurs on two different systems.
(kraken-5.3.0) incognito@DESKTOP-FVRLETC:~/kraken-train/catmus-medieval$ ketos segtrain -d cuda:0 -f alto -t output.txt -q early --resize both --schedule reduceonplateau -i /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/kraken/blla.mlmodel -o catmus_seg /catmus_med_seg_v1 Training line types: HeadingLine 2 2081 MusicLine 3 167 DefaultLine 4 95194 DropCapitalLine 5 1280 InterlinearLine 13 2835 TironianSignLine 14 282 default 20 39 Training region types: MainZone 6 1523 NumberingZone 7 613 GraphicZone 8 134 DropCapitalZone 9 689 MusicZone 10 16 MarginTextZone 11 364 RunningTitleZone 12 398 TitlePageZone 15 5 QuireMarksZone 16 94 DigitizationArtefactZone 17 28 StampZone 18 39 DamageZone 19 13 text 21 39 SealZone 22 3 GPU available: True (cuda), used: True TPU available: False, using: 0 TPU cores HPU available: False, using: 0 HPUs `Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch.. You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision [03/30/25 17:31:47] WARNING Setting baseline location to baseline from unset model. train.py:1032 LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0] ┏━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓ ┃ ┃ Name ┃ Type ┃ Params ┃ Mode ┃ In sizes ┃ Out sizes ┃ ┡━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩ │ 0 │ net │ MultiParamSequential │ 1.3 M │ train │ [1, 3, 1800, 300] │ [[1, 23, 450, 75], '?'] │ │ 1 │ net.C_0 │ ActConv2D │ 9.5 K │ train │ [[1, 3, 1800, 300], '?'] │ [[1, 64, 900, 150], '?'] │ │ 2 │ net.Gn_1 │ GroupNorm │ 128 │ train │ [[1, 64, 900, 150], '?', '?'] │ [[1, 64, 900, 150], '?'] │ │ 3 │ net.C_2 │ ActConv2D │ 73.9 K │ train │ [[1, 64, 900, 150], '?', '?'] │ [[1, 128, 450, 75], '?'] │ │ 4 │ net.Gn_3 │ GroupNorm │ 256 │ train │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │ │ 5 │ net.C_4 │ ActConv2D │ 147 K │ train │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │ │ 6 │ net.Gn_5 │ GroupNorm │ 256 │ train │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │ │ 7 │ net.C_6 │ ActConv2D │ 295 K │ train │ [[1, 128, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │ │ 8 │ net.Gn_7 │ GroupNorm │ 512 │ train │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │ │ 9 │ net.C_8 │ ActConv2D │ 590 K │ train │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │ │ 10 │ net.Gn_9 │ GroupNorm │ 512 │ train │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │ │ 11 │ net.L_10 │ TransposedSummarizingRNN │ 74.2 K │ train │ [[1, 256, 450, 75], '?', '?'] │ [[1, 64, 450, 75], '?'] │ │ 12 │ net.L_11 │ TransposedSummarizingRNN │ 25.1 K │ train │ [[1, 64, 450, 75], '?', '?'] │ [[1, 64, 450, 75], '?'] │ │ 13 │ net.C_12 │ ActConv2D │ 2.1 K │ train │ [[1, 64, 450, 75], '?', '?'] │ [[1, 32, 450, 75], '?'] │ │ 14 │ net.Gn_13 │ GroupNorm │ 64 │ train │ [[1, 32, 450, 75], '?', '?'] │ [[1, 32, 450, 75], '?'] │ │ 15 │ net.L_14 │ TransposedSummarizingRNN │ 16.9 K │ train │ [[1, 32, 450, 75], '?', '?'] │ [[1, 64, 450, 75], '?'] │ │ 16 │ net.L_15 │ TransposedSummarizingRNN │ 25.1 K │ train │ [[1, 64, 450, 
75], '?', '?'] │ [[1, 64, 450, 75], '?'] │ │ 17 │ net.l_16 │ ActConv2D │ 1.5 K │ train │ [[1, 64, 450, 75], '?', '?'] │ [[1, 23, 450, 75], '?'] │ │ 18 │ val_px_accuracy │ MultilabelAccuracy │ 0 │ train │ ? │ ? │ │ 19 │ val_mean_accuracy │ MultilabelAccuracy │ 0 │ train │ ? │ ? │ │ 20 │ val_mean_iu │ MultilabelJaccardIndex │ 0 │ train │ ? │ ? │ │ 21 │ val_freq_iu │ MultilabelJaccardIndex │ 0 │ train │ ? │ ? │ └────┴───────────────────┴──────────────────────────┴────────┴───────┴───────────────────────────────┴──────────────────────────┘ Trainable params: 1.3 M Non-trainable params: 0 Total params: 1.3 M Total estimated model params size (MB): 5 Modules in train mode: 39 Modules in eval mode: 0 stage 0/∞ ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35/1374 0:00:32 • 0:18:51 1.18it/s early_stopping: 0/10 -inf ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮ │ /home/incognito/miniconda3/envs/kraken-5.3.0/bin/ketos:8 in <module> │ │ │ │ 5 from kraken.ketos import cli │ │ 6 if __name__ == '__main__': │ │ 7 │ sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0]) │ │ ❱ 8 │ sys.exit(cli()) │ │ 9 │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/click/core.py:1161 in │ │ __call__ │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/click/core.py:1082 in │ │ main │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/click/core.py:1697 in │ │ invoke │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/click/core.py:1443 in │ │ invoke │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/click/core.py:788 in │ │ invoke │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/click/decorators.py:33 │ │ in new_func │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/kraken/ketos/segmentat │ │ ion.py:366 in segtrain │ │ │ │ 363 │ │ │ │ │ │ │ **val_check_interval) │ │ 364 │ │ │ 365 │ with threadpool_limits(limits=threads): │ │ ❱ 366 │ │ trainer.fit(model) │ │ 367 │ │ │ 368 │ if model.best_epoch == -1: │ │ 369 │ │ logger.warning('Model did not improve during training.') │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/kraken/lib/train.py:12 │ │ 9 in fit │ │ │ │ 126 │ │ with warnings.catch_warnings(): │ │ 127 │ │ │ warnings.filterwarnings(action='ignore', category=UserWarning, │ │ 128 │ │ │ │ │ │ │ │ │ message='The dataloader,') │ │ ❱ 129 │ │ │ super().fit(*args, **kwargs) │ │ 130 │ │ 131 │ │ 132 class KrakenFreezeBackbone(BaseFinetuning): │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/trai │ │ ner/trainer.py:538 in fit │ │ │ │ 535 │ │ self.state.fn = TrainerFn.FITTING │ │ 536 │ │ self.state.status = TrainerStatus.RUNNING │ │ 537 │ │ self.training = True │ │ ❱ 538 │ │ call._call_and_handle_interrupt( │ │ 539 │ │ │ self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, │ │ 540 │ │ ) │ │ 541 │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/trai │ │ ner/call.py:47 in _call_and_handle_interrupt │ │ │ │ 44 │ try: │ │ 45 │ │ if trainer.strategy.launcher is not None: │ │ 46 │ │ │ return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, │ │ ❱ 47 │ │ return trainer_fn(*args, **kwargs) │ │ 48 │ │ │ 49 │ except _TunerExitException: │ │ 50 │ │ _call_teardown_hook(trainer) │ │ │ │ 
/home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/trai │ │ ner/trainer.py:574 in _fit_impl │ │ │ │ 571 │ │ │ model_provided=True, │ │ 572 │ │ │ model_connected=self.lightning_module is not None, │ │ 573 │ │ ) │ │ ❱ 574 │ │ self._run(model, ckpt_path=ckpt_path) │ │ 575 │ │ │ │ 576 │ │ assert self.state.stopped │ │ 577 │ │ self.training = False │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/trai │ │ ner/trainer.py:981 in _run │ │ │ │ 978 │ │ # ---------------------------- │ │ 979 │ │ # RUN THE TRAINER │ │ 980 │ │ # ---------------------------- │ │ ❱ 981 │ │ results = self._run_stage() │ │ 982 │ │ │ │ 983 │ │ # ---------------------------- │ │ 984 │ │ # POST-Training CLEAN UP │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/trai │ │ ner/trainer.py:1025 in _run_stage │ │ │ │ 1022 │ │ │ with isolate_rng(): │ │ 1023 │ │ │ │ self._run_sanity_check() │ │ 1024 │ │ │ with torch.autograd.set_detect_anomaly(self._detect_anomaly): │ │ ❱ 1025 │ │ │ │ self.fit_loop.run() │ │ 1026 │ │ │ return None │ │ 1027 │ │ raise RuntimeError(f"Unexpected state {self.state}") │ │ 1028 │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/loop │ │ s/fit_loop.py:205 in run │ │ │ │ 202 │ │ while not self.done: │ │ 203 │ │ │ try: │ │ 204 │ │ │ │ self.on_advance_start() │ │ ❱ 205 │ │ │ │ self.advance() │ │ 206 │ │ │ │ self.on_advance_end() │ │ 207 │ │ │ │ self._restarting = False │ │ 208 │ │ │ except StopIteration: │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/loop │ │ s/fit_loop.py:363 in advance │ │ │ │ 360 │ │ │ ) │ │ 361 │ │ with self.trainer.profiler.profile("run_training_epoch"): │ │ 362 │ │ │ assert self._data_fetcher is not None │ │ ❱ 363 │ │ │ self.epoch_loop.run(self._data_fetcher) │ │ 364 │ │ │ 365 │ def on_advance_end(self) -> None: │ │ 366 │ │ trainer = self.trainer │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/loop │ │ s/training_epoch_loop.py:140 in run │ │ │ │ 137 │ │ self.on_run_start(data_fetcher) │ │ 138 │ │ while not self.done: │ │ 139 │ │ │ try: │ │ ❱ 140 │ │ │ │ self.advance(data_fetcher) │ │ 141 │ │ │ │ self.on_advance_end(data_fetcher) │ │ 142 │ │ │ │ self._restarting = False │ │ 143 │ │ │ except StopIteration: │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/loop │ │ s/training_epoch_loop.py:212 in advance │ │ │ │ 209 │ │ │ batch_idx = data_fetcher._batch_idx │ │ 210 │ │ else: │ │ 211 │ │ │ dataloader_iter = None │ │ ❱ 212 │ │ │ batch, _, __ = next(data_fetcher) │ │ 213 │ │ │ # TODO: we should instead use the batch_idx returned by the fetcher, however │ │ 214 │ │ │ # fetcher state so that the batch_idx is correct after restarting │ │ 215 │ │ │ batch_idx = self.batch_idx + 1 │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/loop │ │ s/fetchers.py:133 in __next__ │ │ │ │ 130 │ │ │ │ self.done = not self.batches │ │ 131 │ │ elif not self.done: │ │ 132 │ │ │ # this will run only when no pre-fetching was done. 
│ │ ❱ 133 │ │ │ batch = super().__next__() │ │ 134 │ │ else: │ │ 135 │ │ │ # the iterator is empty │ │ 136 │ │ │ raise StopIteration │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/loop │ │ s/fetchers.py:60 in __next__ │ │ │ │ 57 │ │ assert self.iterator is not None │ │ 58 │ │ self._start_profiler() │ │ 59 │ │ try: │ │ ❱ 60 │ │ │ batch = next(self.iterator) │ │ 61 │ │ except StopIteration: │ │ 62 │ │ │ self.done = True │ │ 63 │ │ │ raise │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/util │ │ ities/combined_loader.py:341 in __next__ │ │ │ │ 338 │ │ │ 339 │ def __next__(self) -> _ITERATOR_RETURN: │ │ 340 │ │ assert self._iterator is not None │ │ ❱ 341 │ │ out = next(self._iterator) │ │ 342 │ │ if isinstance(self._iterator, _Sequential): │ │ 343 │ │ │ return out │ │ 344 │ │ out, batch_idx, dataloader_idx = out │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/lightning/pytorch/util │ │ ities/combined_loader.py:78 in __next__ │ │ │ │ 75 │ │ out = [None] * n # values per iterator │ │ 76 │ │ for i in range(n): │ │ 77 │ │ │ try: │ │ ❱ 78 │ │ │ │ out[i] = next(self.iterators[i]) │ │ 79 │ │ │ except StopIteration: │ │ 80 │ │ │ │ self._consumed[i] = True │ │ 81 │ │ │ │ if all(self._consumed): │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/torch/utils/data/datal │ │ oader.py:630 in __next__ │ │ │ │ 627 │ │ │ if self._sampler_iter is None: │ │ 628 │ │ │ │ # TODO(https://github.com/pytorch/pytorch/issues/76750) │ │ 629 │ │ │ │ self._reset() # type: ignore[call-arg] │ │ ❱ 630 │ │ │ data = self._next_data() │ │ 631 │ │ │ self._num_yielded += 1 │ │ 632 │ │ │ if self._dataset_kind == _DatasetKind.Iterable and \ │ │ 633 │ │ │ │ │ self._IterableDataset_len_called is not None and \ │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/torch/utils/data/datal │ │ oader.py:1344 in _next_data │ │ │ │ 1341 │ │ │ │ self._task_info[idx] += (data,) │ │ 1342 │ │ │ else: │ │ 1343 │ │ │ │ del self._task_info[idx] │ │ ❱ 1344 │ │ │ │ return self._process_data(data) │ │ 1345 │ │ │ 1346 │ def _try_put_index(self): │ │ 1347 │ │ assert self._tasks_outstanding < self._prefetch_factor * self._num_workers │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/torch/utils/data/datal │ │ oader.py:1370 in _process_data │ │ │ │ 1367 │ │ self._rcvd_idx += 1 │ │ 1368 │ │ self._try_put_index() │ │ 1369 │ │ if isinstance(data, ExceptionWrapper): │ │ ❱ 1370 │ │ │ data.reraise() │ │ 1371 │ │ return data │ │ 1372 │ │ │ 1373 │ def _mark_worker_as_unavailable(self, worker_id, shutdown=False): │ │ │ │ /home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/torch/_utils.py:706 in │ │ reraise │ │ │ │ 703 │ │ │ # If the exception takes multiple arguments, don't try to │ │ 704 │ │ │ # instantiate since we don't know how to │ │ 705 │ │ │ raise RuntimeError(msg) from None │ │ ❱ 706 │ │ raise exception │ │ 707 │ │ 708 │ │ 709 def _get_available_device_type(): │ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ RuntimeError: Caught RuntimeError in pin memory thread for device 0. 
Original Traceback (most recent call last):
  File "/home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/torch/utils/data/_utils/pin_memory.py", line 38, in do_one_step
    data = pin_memory(data, device)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/torch/utils/data/_utils/pin_memory.py", line 69, in pin_memory
    clone.update({k: pin_memory(sample, device) for k, sample in data.items()})
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/torch/utils/data/_utils/pin_memory.py", line 69, in <dictcomp>
    clone.update({k: pin_memory(sample, device) for k, sample in data.items()})
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/incognito/miniconda3/envs/kraken-5.3.0/lib/python3.11/site-packages/torch/utils/data/_utils/pin_memory.py", line 59, in pin_memory
    return data.pin_memory(device)
           ^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Aborted (core dumped)
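The exception is raised in the DataLoader's pin-memory thread, i.e. while page-locking a host-side batch, not during the forward/backward pass. As a sanity check, here is a minimal sketch in plain PyTorch (not kraken code; the tensor shape is simply the input size shown in the model summary above) that exercises the same `Tensor.pin_memory()` call:

```python
# Minimal sketch, plain PyTorch only (not part of kraken); the shape is taken
# from the "In sizes" column of the model summary, purely for illustration.
import torch

assert torch.cuda.is_available()
free, total = torch.cuda.mem_get_info(0)            # device memory as PyTorch sees it
print(f"cuda:0 free/total: {free / 2**20:.0f} / {total / 2**20:.0f} MiB")

x = torch.empty((1, 3, 1800, 300), dtype=torch.float32)  # one segmentation input
x = x.pin_memory()                                   # same call that raises in the worker
print("pinned:", x.is_pinned())
```

If even this small pinning test fails, the problem sits in the CUDA/driver layer rather than in kraken's training loop.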
This one was on a GeForce RTX 3090 Founders Edition (24 GB). I was watching the GPU with nvitop and it never reached 100% usage.
(nvitop) incognito@DESKTOP-FVRLETC:~$ nvidia-smi
Sun Mar 30 17:36:12 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.77.01              Driver Version: 566.36         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        On  |   00000000:04:00.0 Off |                  N/A |
|  0%   27C    P8              8W / 350W  |      0MiB /  24576MiB  |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
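nvidia-smi only shows the state after the crashed process is gone (0 MiB used, no processes), so it may be more informative to log what the training process itself sees while it is running. The following probe is just a sketch of my own (the function name is made up, and hooking it into training is left open); it uses only standard torch.cuda calls:

```python
# Sketch of a GPU-memory probe; function name and call sites are illustrative only.
import torch

def log_cuda_memory(tag: str, device: int = 0) -> None:
    free, total = torch.cuda.mem_get_info(device)      # driver-level free/total
    allocated = torch.cuda.memory_allocated(device)    # memory held by live tensors
    reserved = torch.cuda.memory_reserved(device)      # memory cached by the allocator
    print(f"[{tag}] free {free / 2**20:.0f} / {total / 2**20:.0f} MiB | "
          f"allocated {allocated / 2**20:.0f} MiB | reserved {reserved / 2**20:.0f} MiB")

log_cuda_memory("before training")
```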
Python modules:
(kraken-5.3.0) incognito@DESKTOP-FVRLETC:~/kraken-train/catmus-medieval$ pip list
Package                   Version
------------------------- -----------
aiohappyeyeballs          2.4.6
aiohttp                   3.11.13
aiosignal                 1.3.2
attrs                     25.1.0
cattrs                    24.1.2
certifi                   2025.1.31
cffi                      1.17.1
charset-normalizer        3.4.1
click                     8.1.8
coremltools               8.2
filelock                  3.17.0
frozenlist                1.5.0
fsspec                    2025.2.0
idna                      3.10
imageio                   2.37.0
importlib_resources       6.5.2
Jinja2                    3.1.5
joblib                    1.4.2
jsonschema                4.23.0
jsonschema-specifications 2024.10.1
kraken                    5.3.0
lazy_loader               0.4
lightning                 2.4.0
lightning-utilities       0.12.0
lxml                      5.3.1
markdown-it-py            3.0.0
MarkupSafe                3.0.2
mdurl                     0.1.2
mpmath                    1.3.0
multidict                 6.1.0
networkx                  3.4.2
numpy                     2.0.2
nvidia-cublas-cu12        12.1.3.1
nvidia-cuda-cupti-cu12    12.1.105
nvidia-cuda-nvrtc-cu12    12.1.105
nvidia-cuda-runtime-cu12  12.1.105
nvidia-cudnn-cu12         9.1.0.70
nvidia-cufft-cu12         11.0.2.54
nvidia-curand-cu12        10.3.2.106
nvidia-cusolver-cu12      11.4.5.107
nvidia-cusparse-cu12      12.1.0.106
nvidia-nccl-cu12          2.20.5
nvidia-nvjitlink-cu12     12.8.61
nvidia-nvtx-cu12          12.1.105
packaging                 24.2
pillow                    11.1.0
pip                       25.0
propcache                 0.3.0
protobuf                  5.29.3
pyaml                     25.1.0
pyarrow                   19.0.1
pycparser                 2.22
Pygments                  2.19.1
python-bidi               0.6.6
pytorch-lightning         2.5.0.post0
pyvips                    2.2.3
PyYAML                    6.0.2
referencing               0.36.2
regex                     2024.11.6
requests                  2.32.3
rich                      13.9.4
rpds-py                   0.23.1
scikit-image              0.24.0
scikit-learn              1.5.2
scipy                     1.13.1
setuptools                75.8.0
shapely                   2.0.7
sympy                     1.13.3
threadpoolctl             3.5.0
tifffile                  2025.2.18
torch                     2.4.1
torchmetrics              1.6.1
torchvision               0.19.1
tqdm                      4.67.1
triton                    3.0.0
typing_extensions         4.12.2
urllib3                   2.3.0
wheel                     0.45.1
yarl                      1.18.3
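The relevant CUDA stack from that list (torch 2.4.1 with the cu12 wheels) can also be double-checked from inside the environment with plain PyTorch calls, nothing kraken-specific:

```python
# Environment cross-check; these calls only report what the installed wheels provide.
import torch

print("torch:", torch.__version__)                     # 2.4.1 in the pip list above
print("built for CUDA:", torch.version.cuda)           # from the nvidia-*-cu12 wheels
print("cuDNN:", torch.backends.cudnn.version())
print("device:", torch.cuda.get_device_name(0))
print("capability:", torch.cuda.get_device_capability(0))
```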
The dataset is CATMuS/medieval-segmentation.
On the same GPU and dataset, I trained all yolo11 segmentation models (from nano to x-large) without any issue.
Exactly the same dataset on exactly the same GPU ran through PyLaia training with no crash:
(pylaia-py3.10) incognito@DESKTOP-H1BS9PO:~/pylaia-latest-train/Catmus_medieval$ pylaia-htr-train-ctc --config config_train_model.yaml
Global seed set to 74565
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name      | Type     | Params
---------------------------------------
0 | model     | LaiaCRNN | 5.4 M
1 | criterion | CTCLoss  | 0
---------------------------------------
5.4 M     Trainable params
0         Non-trainable params
5.4 M     Total params
21.678    Total estimated model params size (MB)
Global seed set to 74565
TR - E0: 10%|████████████████████▋ | 1875/19102 [04:06<37:40, 7.62it/s, running_loss=107]