
Error pulling images with the 5090.yml file #450


Open

xiaotang-12-ops opened this issue May 15, 2025 · 8 comments

Comments

@xiaotang-12-ops

The error is as follows:
Error response from daemon: Get "https://registry-1.docker.io/v2/guiji2025/fish-speech-5090/manifests/sha256:ec9aabf14419d10f3823f8a73bc6ded71cb6d112833018965dd618a88a3c9f85": EOF

I pulled this image once before without any problem. Later I noticed the voice was very hoarse, so I deleted all of the heygem images, uninstalled the client, re-cloned the code with git clone, and when I pulled the images from the 5090.yml file again, the error above appeared. Does anyone know what is going on?


@LegendaryM

The last error is EOF, right? An EOF when Docker pulls an image is usually a network issue. You can try again; that should solve it.
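
If the EOF keeps coming back, a simple retry loop is usually enough. The sketch below is only an illustration: it assumes the compose file is named 5090.yml as in the issue title, so adjust the file name and path to your checkout.

```bash
# Hedged sketch: retry the pull a few times, since an EOF from
# registry-1.docker.io is usually a transient network problem.
# Assumes the compose file is named 5090.yml; adjust to your checkout.
for i in 1 2 3; do
  docker compose -f 5090.yml pull && break
  echo "pull attempt $i failed, retrying in 10s..."
  sleep 10
done
```

If repeated retries never succeed, configuring a registry mirror or a proxy for the Docker daemon is the usual next step.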

@xiaotang-12-ops
Author


Thank you, it was indeed that problem. However, the voice in the digital-human videos trained on my 5090 machine is still very hoarse... I noticed that the 5090 config file has only two services; could that be related?

@LegendaryM


Could you please provide the problematic audio and video so that our developers can look into it in detail?

@xiaotang-12-ops
Author


It seems there is no way to upload long videos on GitHub...

@xiaotang-12-ops
Author


Can you download the video by visiting this link?
https://raw.githubusercontent.com/xiaotang-12-ops/my-videos/main/cbdd8806433f9f3bc87e25cb3524bb6c.mp4

@LegendaryM


OK, I can download it here without any problem. Thank you for providing the materials. The developers have been informed, and I believe it will be fixed soon.

@simplify123

After switching today from a 2080 GPU to a 5090D, this digital human no longer works for me; it reports a CUDA kernel error. How do I fix this? How did you get it running?
```
==========
== CUDA ==
==========

CUDA Version 12.1.1
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
Matplotlib is building the font cache; this may take a moment.
taskset: bad usage
Try 'taskset --help' for more information.
INFO:gjtts_server:加载自定义 姓名多音字 [tools/text_norm/front_end/utils/name_polyphone.json]
INFO: Started server process [1]
INFO: Waiting for application startup.
DEBUG:gjtts_server:语言类型 CN_EN
DEBUG:gjtts_server:加载自定义 单位 [/code/tools/text_norm/front_end/normalize/config/units.json]
DEBUG:gjtts_server:加载自定义 单位 [/code/tools/text_norm/front_end/normalize/config/units.json]
DEBUG:gjtts_server:加载自定义 单位 [/code/tools/text_norm/front_end/normalize/config/units.json]
2025-05-21 10:20:12.909 | INFO | tools.llama.generate:load_model:682 - Restored model from checkpoint
2025-05-21 10:20:12.910 | INFO | tools.llama.generate:load_model:688 - Using DualARTransformer
Exception in thread Thread-2 (worker):
Traceback (most recent call last):
File "/opt/conda/envs/python310/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/opt/conda/envs/python310/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/code/tools/llama/generate.py", line 916, in worker
model.setup_caches(
File "/code/fish_speech/models/text2semantic/llama.py", line 575, in setup_caches
super().setup_caches(max_batch_size, max_seq_len, dtype)
File "/code/fish_speech/models/text2semantic/llama.py", line 241, in setup_caches
b.attention.kv_cache = KVCache(
File "/code/fish_speech/models/text2semantic/llama.py", line 139, in init
self.register_buffer("k_cache", torch.zeros(cache_shape, dtype=dtype))
File "/opt/conda/envs/python310/lib/python3.10/site-packages/torch/utils/_device.py", line 78, in torch_function
return func(*args, **kwargs)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
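
For reference, "no kernel image is available for execution on the device" on an RTX 5090/5090D typically means the PyTorch build inside the image was compiled without sm_120 (Blackwell) kernels. A quick way to check is to ask the PyTorch bundled in the container which architectures it supports. This is only a sketch: the image name is taken from the error message earlier in this thread, and overriding the entrypoint with plain `python` is an assumption about the image layout.

```bash
# Hedged diagnostic sketch: print the CUDA build version, the GPU's compute
# capability, and the architectures this PyTorch build supports. An RTX 5090
# reports capability (12, 0), so the arch list must include 'sm_120'.
# Image name from the error above; the python entrypoint is an assumption.
docker run --rm --gpus all --entrypoint python guiji2025/fish-speech-5090 \
  -c "import torch; print(torch.version.cuda, torch.cuda.get_device_capability(0), torch.cuda.get_arch_list())"
```

If sm_120 is missing from that list, the container generally needs a PyTorch build targeting CUDA 12.8 or newer before it can run on Blackwell GPUs such as the 5090/5090D.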

@xiaotang-12-ops
Author


How strange, I remember replying to you, but I can't see the record here, even though I had already seen your reply.
