Very low GPU utilization when training Qwen3 MoE models #8117

Open

haichuan1221 opened this issue May 20, 2025 · 1 comment

Labels

bug pending

haichuan1221 commented May 20, 2025 (edited)

When LoRA fine-tuning Qwen3-235B-A22 with LLaMA-Factory on 2 nodes with 8 H800 GPUs each, GPU SM utilization is very low and unstable.

haichuan1221 added bug pending labels

Collaborator

Kuangdd01 commented May 20, 2025

At the moment, post-training any MoE model through the transformers modeling code is much slower. See QwenLM/Qwen3#736 (comment).
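For anyone wondering why this can show up as low, jumpy SM utilization, here is a rough sketch. It assumes the MoE block visits experts one at a time in a Python loop, which is an assumption about the modeling code and is not confirmed in this thread. The function name `tokens_per_expert`, the uniform routing, and the sizes used are all hypothetical and chosen only for illustration. No GPU is needed; it just counts how the token batch gets split.

```python
import random
from collections import Counter


def tokens_per_expert(num_tokens, num_experts, top_k, seed=0):
    """Simulate top-k routing and count how many tokens each expert receives."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_tokens):
        # Each token goes to top_k distinct experts.
        # Uniform routing is an illustrative assumption, not real router behavior.
        for expert in rng.sample(range(num_experts), top_k):
            counts[expert] += 1
    return [counts[e] for e in range(num_experts)]


# Hypothetical sizes for illustration only.
sizes = tokens_per_expert(num_tokens=4096, num_experts=128, top_k=8)

# A per-expert loop would run one small matmul per non-empty entry below,
# instead of one large matmul over all routed tokens.
print("expert calls per MoE layer:", sum(1 for s in sizes if s > 0))
print("average tokens per expert call:", sum(sizes) / len(sizes))
print("smallest / largest expert batch:", min(sizes), max(sizes))
```

Under these assumptions, each layer turns one large batch into many small, unevenly sized pieces of work, which would be consistent with the symptom reported above. It does not prove that this is the cause here.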
