DeepSeek-R1-Distill-Qwen SFT training issue #6833


Closed
TW-NLP opened this issue Feb 6, 2025 · 18 comments · Fixed by #6866
Labels
solved This problem has been already solved

Comments

@TW-NLP commented Feb 6, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

The training data is in alpaca format and the template is set to deepseek3. Why is it that after LoRA training, the model can no longer handle ordinary Q&A and also performs poorly on the domain task? Is this a problem with the dataset format?
The data format is as follows:
{"instruction": "文本纠错", "input": "目前区次事件的细节还不清楚,伤亡人数也未确定。", "output": "目前这次事件的细节还不清楚,伤亡人数也未确定。"}. If this type of dataset is not suitable, could you provide an example of R1 SFT data?

Reproduction

Put your message here.

Others

No response
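
For reference, a minimal sketch of what a reasoning-style alpaca sample for an R1-distilled model might look like. The <think>...</think> wrapping of the reasoning inside the output field mirrors how the distilled models format their responses; the sample content itself is invented for illustration:

{
  "instruction": "Solve the equation 2x + 3 = 11.",
  "input": "",
  "output": "<think>Subtract 3 from both sides to get 2x = 8, then divide by 2 to get x = 4.</think>\nx = 4"
}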

@TW-NLP TW-NLP added bug Something isn't working pending This problem is yet to be addressed labels Feb 6, 2025
@Christoph-XJ

the same problem

@Amo5 commented Feb 6, 2025

The outputs of the R1-distilled models contain both a think part and an answer part. When doing SFT, does the dataset also need to include the think part? I only have paired instruction-tuning data at the moment, and I'm not sure whether it can be used for SFT on an R1-distilled model.

@datalee commented Feb 6, 2025

The outputs of the R1-distilled models contain both a think part and an answer part. When doing SFT, does the dataset also need to include the think part? I only have paired instruction-tuning data at the moment, and I'm not sure whether it can be used for SFT on an R1-distilled model.

If all you have is plain instruction data, why bother with a distilled reasoning model at all? What do you actually want it for?

@bluryar commented Feb 6, 2025

My problem is the opposite: the model no longer enters the reasoning process at all and has turned into an ordinary Qwen2.5: https://huggingface.co/bluryar/DeepSeek-R1-Distill-Qwen-1.5B-sft

Perhaps adding a system prompt for control, and mixing some R1 distillation data into the training set, would make the behavior more controllable.

@TW-NLP (Author) commented Feb 7, 2025

If a new data format is required, shouldn't an example dataset be provided under the data directory? @hiyouga

@WellAllIn

Could you share your training parameters? My training loss stays at 0, and tuning hyperparameters hasn't fixed it.

@TW-NLP (Author) commented Feb 8, 2025

Could you share your training parameters? My training loss stays at 0, and tuning hyperparameters hasn't fixed it.

train

per_device_train_batch_size: 4
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
The data is in alpaca format and the training loss does decrease, but I suspect this way of training is wrong. Should the data match the format of the distillation data?
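
As an aside, a custom file like this is registered in data/dataset_info.json before training. A minimal sketch for an alpaca-format file, assuming the column mapping described in the project's data README (the my_r1_sft entry name and r1_sft.json file are hypothetical):

{
  "my_r1_sft": {
    "file_name": "r1_sft.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}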

@TC10127 commented Feb 8, 2025

When you trained with the template set to deepseek3, did you hit this error?

(error screenshot)

@WellAllIn

When you trained with the template set to deepseek3, did you hit this error?

(error screenshot)

Pulling the latest code should fix it.

@WellAllIn

per_device_train_batch_size: 4
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

Thanks, I'm also looking into its exact data format requirements.

@hiyouga (Owner) commented Feb 8, 2025

@hiyouga hiyouga closed this as completed Feb 8, 2025
@hiyouga hiyouga added solved This problem has been already solved and removed bug Something isn't working pending This problem is yet to be addressed labels Feb 8, 2025
@lkj7b226

the same problem

@zzwtop1 commented Feb 13, 2025

I found a data format that should be suitable for fine-tuning with the deepseek3 template; see https://huggingface.co/datasets/horus-ai-labs/R1-Dstill-SFT-gsm8k?row=3
I'll try it out and report back on the results.

@TW-NLP (Author) commented Feb 14, 2025

I found a data format that should be suitable for fine-tuning with the deepseek3 template; see https://huggingface.co/datasets/horus-ai-labs/R1-Dstill-SFT-gsm8k?row=3 I'll try it out and report back on the results.

Thanks for sharing, I'll give it a try.

@zzwtop1 commented Feb 18, 2025

I fine-tuned with sharegpt-format data, and fine-tuning with chain-of-thought data seems to work reasonably well. This is the format I got from reorganizing an open-source traditional Chinese medicine dataset; you can change formatting details such as the "\" yourself. After fine-tuning, my final answers always start with a "\".

val_medical_o1_sft_Chinese1.json
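
For illustration, a minimal sketch of a sharegpt-style sample that keeps the chain of thought inside the assistant turn; the <think> wrapping and the sample content are assumptions for illustration, not taken from the attached file:

{
  "conversations": [
    {"from": "human", "value": "What lifestyle changes help with mild hypertension?"},
    {"from": "gpt", "value": "<think>The question asks about non-drug measures: salt reduction, weight control, exercise, and limiting alcohol are the standard first-line recommendations.</think>\nReduce salt intake, maintain a healthy weight, exercise regularly, and limit alcohol."}
  ]
}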

@TW-NLP (Author) commented Feb 18, 2025

I fine-tuned with sharegpt-format data, and fine-tuning with chain-of-thought data seems to work reasonably well. This is the format I got from reorganizing an open-source traditional Chinese medicine dataset; you can change formatting details such as the "\" yourself. After fine-tuning, my final answers always start with a "\".

val_medical_o1_sft_Chinese1.json

Hold off on SFT for now; try GRPO first and see how it performs.

@fxb392 commented Feb 26, 2025

@zzwtop1 Hi, how did you build this traditional Chinese medicine dataset?

@fxb392 commented Feb 26, 2025

For continued training of the Qwen distilled models, can't the qwen2 template be used? Why does everyone use the deepseek3 template?
