Very low GPU utilization when training Qwen3 MoE models #8117

Open
haichuan1221 opened this issue May 20, 2025 · 1 comment
Labels
bug (Something isn't working), pending (This problem is yet to be addressed)

Comments

haichuan1221 commented May 20, 2025

When LoRA fine-tuning Qwen3-235B-A22 with LLaMA Factory on two nodes with 8 H800 GPUs each, GPU SM utilization is very low and unstable.

Image

haichuan1221 added the bug and pending labels May 20, 2025

Kuangdd01 (Collaborator) commented

Currently, post-training MoE models through the transformers modeling code is much slower in general; see QwenLM/Qwen3#736 (comment).
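For anyone wondering why this path tends to be slow: one commonly cited reason (my assumption here, the thread itself only points to the linked comment) is that a straightforward MoE block loops over experts in Python and runs each expert on only the few tokens routed to it, so the GPU sees many small calls instead of one large one. Below is a rough, framework-free sketch of that dispatch pattern. The names `route_tokens` and `loop_dispatch_stats` are hypothetical, the router is replaced by random sampling, and the sizes are illustrative rather than taken from this issue.

```python
import random
from collections import defaultdict


def route_tokens(num_tokens, num_experts, top_k, seed=0):
    """Map expert id -> list of token ids routed to it.

    A random choice stands in for the learned router (assumption).
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for token in range(num_tokens):
        # Each token is sent to top_k distinct experts.
        for expert in rng.sample(range(num_experts), top_k):
            groups[expert].append(token)
    return dict(groups)


def loop_dispatch_stats(groups):
    """Summarize the work a per-expert Python loop would issue."""
    sizes = [len(tokens) for tokens in groups.values()]
    return {
        "expert_calls": len(sizes),           # one small call per active expert
        "total_assignments": sum(sizes),      # token-expert pairs overall
        "mean_tokens_per_call": sum(sizes) / len(sizes),
    }


# Illustrative sizes only, not taken from the issue.
groups = route_tokens(num_tokens=512, num_experts=128, top_k=8)
stats = loop_dispatch_stats(groups)
print(stats)
```

In this toy setup each expert call handles only a small slice of the token-expert assignments, whereas a fused or grouped kernel could process all of them in a single launch. That would be consistent with low and fluctuating SM utilization, though the real cause on this setup would need profiling to confirm.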
