DeepSeek-R1-Distill-Qwen SFT训练问题 #6833

TW-NLP · 2025-02-06T07:25:57Z

Reminder

I have read the above rules and searched the existing issues.

System Info

训练数据是alpaca格式，模版设置为deepseek3，为什么LoRA训练后，正常的问答也不行了，领域任务也很差，请问是数据集的格式问题吗？
数据格式如下：
{"instruction": "文本纠错", "input": "目前区次事件的细节还不清楚，伤亡人数也未确定。", "output": "目前这次事件的细节还不清楚，伤亡人数也未确定。"}，如果不是这个类型的数据集，可以提供R1 SFT数据实例吗？

Reproduction

Put your message here.

Others

No response

The text was updated successfully, but these errors were encountered:

Christoph-XJ · 2025-02-06T07:59:24Z

the same problem

Amo5 · 2025-02-06T08:13:51Z

Reminder

I have read the above rules and searched the existing issues.

System Info

训练数据是alpaca格式，模版设置为deepseek3，为什么LoRA训练后，正常的问答也不行了，领域任务也很差，请问是数据集的格式问题吗？数据格式如下： {"instruction": "文本纠错", "input": "目前区次事件的细节还不清楚，伤亡人数也未确定。", "output": "目前这次事件的细节还不清楚，伤亡人数也未确定。"}，如果不是这个类型的数据集，可以提供R1 SFT数据实例吗？

Reproduction
Put your message here.
Others

No response

R1蒸馏过的模型输出都是包含think和answer两部分，那作SFT的时候，数据集里面也要包含think这部分吗？目前只有成对的指令微调数据，不知道能不能对R1蒸馏过的模型做SFT训练

datalee · 2025-02-06T09:09:01Z

Reminder

I have read the above rules and searched the existing issues.

System Info

训练数据是alpaca格式，模版设置为deepseek3，为什么LoRA训练后，正常的问答也不行了，领域任务也很差，请问是数据集的格式问题吗？数据格式如下： {"instruction": "文本纠错", "input": "目前区次事件的细节还不清楚，伤亡人数也未确定。", "output": "目前这次事件的细节还不清楚，伤亡人数也未确定。"}，如果不是这个类型的数据集，可以提供R1 SFT数据实例吗？

Reproduction
Put your message here.
Others

No response
R1蒸馏过的模型输出都是包含think和answer两部分，那作SFT的时候，数据集里面也要包含think这部分吗？目前只有成对的指令微调数据，不知道能不能对R1蒸馏过的模型做SFT训练

只有指令数据，你凑什么热闹，你想用他啥

bluryar · 2025-02-06T09:23:01Z

我反而是没法进入的reasoning过程了，变成普通的qwen2.5了：https://huggingface.co/bluryar/DeepSeek-R1-Distill-Qwen-1.5B-sft

也许加system prompt做控制，然后训练数据混合一些r1的蒸馏数据会可控一点。

TW-NLP · 2025-02-07T01:11:07Z

如果是新的数据格式，是不是应该在data目录下，给出实例数据集？ @hiyouga

WellAllIn · 2025-02-07T13:40:57Z

想请问一下您的训练参数呢?我训练loss一直为0，调参也没有解决。

TW-NLP · 2025-02-08T01:24:30Z

想请问一下您的训练参数呢?我训练loss一直为0，调参也没有解决。

train

per_device_train_batch_size: 4
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
数据是alpaca格式，训练的话，loss是递减的，但是我感觉我这样训练不对，数据是否应该和蒸馏数据格式保持一致？

TC10127 · 2025-02-08T02:56:23Z

请问您在进行训练的时候，template设置为deepseek3，有报这个错误吗？

WellAllIn · 2025-02-08T03:00:38Z

请问您在进行训练的时候，template设置为deepseek3，有报这个错误吗？

你拉去一下最新的代码应该就好了

WellAllIn · 2025-02-08T03:01:19Z

per_device_train_batch_size: 4
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

谢谢，我也在看它的具体数据格式要求。

hiyouga · 2025-02-08T16:37:46Z

see https://huggingface.co/datasets/ServiceNow-AI/R1-Distill-SFT?row=0

lkj7b226 · 2025-02-11T09:06:37Z

the same problem

zzwtop1 · 2025-02-13T09:52:33Z

我找到了一个应该适合deepseek3微调的数据格式，参考：https://huggingface.co/datasets/horus-ai-labs/R1-Dstill-SFT-gsm8k?row=3
后续我试下效果，再来反馈

TW-NLP · 2025-02-14T01:24:48Z

我找到了一个应该适合deepseek3微调的数据格式，参考：https://huggingface.co/datasets/horus-ai-labs/R1-Dstill-SFT-gsm8k?row=3 后续我试下效果，再来反馈
感谢分享，我去试试

zzwtop1 · 2025-02-18T02:11:04Z

我用的sharegpt格式的数据进行的微调，带思维链的数据进行微调感觉效果尚可，这是我拿开源中医数据整理成的数据格式，里面的“\”之类的格式可以自己改，我微调完最终答案总是开头带个\

val_medical_o1_sft_Chinese1.json

TW-NLP · 2025-02-18T06:44:09Z

我用的sharegpt格式的数据进行的微调，带思维链的数据进行微调感觉效果尚可，这是我拿开源中医数据整理成的数据格式，里面的“\”之类的格式可以自己改，我微调完最终答案总是开头带个\

val_medical_o1_sft_Chinese1.json

先不SFT，试试GRPO，看看效果

fxb392 · 2025-02-26T15:41:09Z

@zzwtop1 兄弟您好，这个中医数据您是咋做的呢？

fxb392 · 2025-02-26T15:42:53Z

继续训练qwen的蒸馏模型不可以用qwen2这个模板吗？为什么都用deepseek3模板呢？

TW-NLP added bug Something isn't working pending This problem is yet to be addressed labels Feb 6, 2025

hiyouga mentioned this issue Feb 8, 2025

[dataset] add openthought dataset #6866

Merged

2 tasks

hiyouga closed this as completed Feb 8, 2025

hiyouga added solved This problem has been already solved and removed bug Something isn't working pending This problem is yet to be addressed labels Feb 8, 2025

DeepSeek-R1-Distill-Qwen SFT训练问题 #6833

DeepSeek-R1-Distill-Qwen SFT训练问题 #6833

Comments

TW-NLP commented Feb 6, 2025

Reminder

System Info

Reproduction

Others

Christoph-XJ commented Feb 6, 2025

Uh oh!

Amo5 commented Feb 6, 2025

Reminder

System Info

Reproduction

Others

Uh oh!

datalee commented Feb 6, 2025

Reminder

System Info

Reproduction

Others

Uh oh!

bluryar commented Feb 6, 2025

Uh oh!

TW-NLP commented Feb 7, 2025

Uh oh!

WellAllIn commented Feb 7, 2025

Uh oh!

TW-NLP commented Feb 8, 2025

train

Uh oh!

TC10127 commented Feb 8, 2025

Uh oh!

WellAllIn commented Feb 8, 2025

Uh oh!

WellAllIn commented Feb 8, 2025

Uh oh!

hiyouga commented Feb 8, 2025

Uh oh!

lkj7b226 commented Feb 11, 2025

Uh oh!

zzwtop1 commented Feb 13, 2025

Uh oh!

TW-NLP commented Feb 14, 2025

Uh oh!

zzwtop1 commented Feb 18, 2025

Uh oh!

TW-NLP commented Feb 18, 2025

Uh oh!

fxb392 commented Feb 26, 2025

Uh oh!

fxb392 commented Feb 26, 2025

Uh oh!