
add Sequence Parallelism (reopened #6506) #7338

Open · wants to merge 3 commits into main

Conversation

HaoshengZou

What does this PR do?

add Sequence Parallelism (reopened #6506)


@hiyouga added the pending label (This problem is yet to be addressed) on Mar 17, 2025
@hiyouga (Owner) commented Mar 17, 2025

Nice work! We expect to add this feature to LLaMA-Factory v1.0.

@xiaosu-zhu (Contributor)

#7335 has a new implementation based on DeepSpeed Ulysses; please consider it as an alternative solution.

@HaoshengZou (Author)

@xiaosu-zhu Thanks for pointing it out!

DeepSpeed Ulysses has long been on our to-do list; we integrated it recently and have now pushed it here, supporting `--sequence_parallel_mode zigzag-ring / ulysses`.
Correctness has been thoroughly verified, and training speed was tested similarly here.
Our Ulysses integration is based on yunchang for now, but that dependency could be stripped, with the core functions adapted and their source acknowledged.
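For readers unfamiliar with the zigzag variant, here is a minimal, self-contained sketch of the load-balancing split it is named for. This is an editor's illustration of the general technique, not code from this PR or from yunchang:

```python
# Zigzag split used by zigzag ring attention: cut the sequence into
# 2 * world_size chunks and give rank i chunks i and (2*world_size - 1 - i),
# so every rank does a comparable amount of causal-attention work.
def zigzag_shard(tokens: list, world_size: int, rank: int) -> list:
    assert len(tokens) % (2 * world_size) == 0, "pad the sequence first"
    chunk = len(tokens) // (2 * world_size)
    chunks = [tokens[i * chunk:(i + 1) * chunk] for i in range(2 * world_size)]
    return chunks[rank] + chunks[2 * world_size - 1 - rank]

seq = list(range(16))  # toy sequence of 16 token positions
for r in range(4):
    print(r, zigzag_shard(seq, world_size=4, rank=r))
# rank 0 -> [0, 1, 14, 15]; rank 3 -> [6, 7, 8, 9]: early ranks pair cheap
# prefix chunks with expensive suffix chunks under the causal mask.
```

(Ulysses, by contrast, keeps the sequence split through most of the model and uses all-to-all communication to switch to a head-wise split inside attention.)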

This PR reopens #6506, which was opened on Jan 2 and accidentally closed last week (#6506 (comment)).

#7335 may offer nice alternative ways of splitting the SP data and doing the monkey patching. Still, getting SP 100% correct requires much more detailed work on padding, labels, loss computation, etc. (as we discussed with Swift here), followed by thorough testing; the sketch after this comment illustrates one such detail.

We are confident that our implementation is 100% correct up to the numerical differences inherent to SP, and it requires close-to-minimal, modular code changes to the original LLaMA-Factory.
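As one concrete example of the boundary details mentioned above, consider label shifting. This is an editor's simplified sketch, not this PR's actual code: causal-LM labels are the inputs shifted left by one, so if each SP rank shifted only its own shard, the last position of every shard would lose its label; the shift has to happen globally, before sharding.

```python
IGNORE_INDEX = -100  # conventional "no loss" label in HF-style trainers

def shift_then_shard(input_ids, world_size):
    labels = input_ids[1:] + [IGNORE_INDEX]  # shift once, globally
    shard = len(input_ids) // world_size     # assume length divides evenly
    return [
        (input_ids[r * shard:(r + 1) * shard],
         labels[r * shard:(r + 1) * shard])
        for r in range(world_size)
    ]

ids = [10, 11, 12, 13, 14, 15, 16, 17]
for r, (x, y) in enumerate(shift_then_shard(ids, world_size=2)):
    print(r, x, y)
# rank 0: x=[10, 11, 12, 13], y=[11, 12, 13, 14] -- its last label is 14,
# the first token of rank 1's shard, which a per-shard shift would miss.
```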

@zhijie-reallm

Does it support DPO?

@HaoshengZou (Author)

@zhijie-reallm yeah

@zhijie-reallm

Does it support liger_kernel?

@HaoshengZou (Author)

@zhijie-reallm SP is not compatible with liger_kernel right now, but we are working on it.

@burkun commented Apr 17, 2025

> @zhijie-reallm SP is not compatible with liger_kernel right now, but we are working on it.

Can SP also support `use_unsloth_gc: true`, `enable_liger_kernel: true`, and `packing: true`?

@burkun commented Apr 21, 2025

In the PackedSupervisedDatasetProcessor in src/llamafactory/data/processor/supervised.py, is it reasonable to generate position ids with `model_inputs["position_ids"].append(list(range(len(input_ids))))`? After different samples are packed together, don't the positions need special handling?

@HaoshengZou (Author)

> In the PackedSupervisedDatasetProcessor in src/llamafactory/data/processor/supervised.py, is it reasonable to generate position ids with `model_inputs["position_ids"].append(list(range(len(input_ids))))`? After different samples are packed together, don't the positions need special handling?

@burkun Which part do you mean? llama-factory's own code, or the part we added?

@burkun commented Apr 22, 2025

> > In the PackedSupervisedDatasetProcessor in src/llamafactory/data/processor/supervised.py, is it reasonable to generate position ids with `model_inputs["position_ids"].append(list(range(len(input_ids))))`? After different samples are packed together, don't the positions need special handling?
>
> @burkun Which part do you mean? llama-factory's own code, or the part we added?

Symptom: with packing enabled under sequence parallelism, the loss shows frequent spikes, and the grad norm is also high at those moments; on the same dataset, the phenomenon disappears once packing is turned off.
[screenshot: loss curve with frequent spikes]
Also, using `model_inputs["position_ids"].append(list(range(len(input_ids))))` directly in PackedSupervisedDatasetProcessor is itself problematic, because that variable does not exist inside the for loop; after I tried a fix, the loss spikes did not noticeably change.
[screenshot: the code in question]

@HaoshengZou (Author)

@burkun Are you using our repo? We have trained many SP + packing runs and, as far as I remember, never ran into this problem.
The TODO comment was already in llama-factory.
Adding position_ids is necessary for SP, because after the sequence is split, each shard needs the original positions recorded. RoPE itself is a relative position encoding, so there is no need to reset position ids when packing.
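For illustration, here is a minimal sketch of the two position-id schemes under discussion (an editor's example, not code from this PR):

```python
def continuous_position_ids(packed_lengths):
    # What a line like model_inputs["position_ids"].append(list(range(len(input_ids))))
    # produces for the whole packed sequence: one unbroken range.
    return list(range(sum(packed_lengths)))

def per_sample_position_ids(packed_lengths):
    # Reset to 0 at each sample boundary instead.
    ids = []
    for n in packed_lengths:
        ids.extend(range(n))
    return ids

print(continuous_position_ids([3, 2]))  # [0, 1, 2, 3, 4]
print(per_sample_position_ids([3, 2]))  # [0, 1, 2, 0, 1]
```

With a proper block-diagonal attention mask, tokens attend only within their own sample, and RoPE depends only on relative offsets, so within each sample the two schemes give identical attention scores; that appears to be the reasoning behind not resetting position ids here.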

@burkun commented May 8, 2025

> @burkun Are you using our repo? We have trained many SP + packing runs and, as far as I remember, never ran into this problem. The TODO comment was already in llama-factory. Adding position_ids is necessary for SP, because after the sequence is split, each shard needs the original positions recorded. RoPE itself is a relative position encoding, so there is no need to reset position ids when packing.

The TODO was indeed already there. For the past few days I have stopped using this pull request and switched to the version at https://github.com/Qihoo360/360-LLaMA-Factory; with the same config and data, the problem is gone. Perhaps this PR is not aligned with the original 360-LLaMA-Factory.

@HaoshengZou (Author)

@burkun Yes. This PR exists specifically to track updates in the llama-factory main repo, so it may not be tested thoroughly enough, and keeping it up to date is also a bit difficult. For now, you can generally use the default branch of https://github.com/Qihoo360/360-LLaMA-Factory.

@Zhaoyi-Yan

Any updates? I would really like to use this. I have written some changes on top of the original llamafactory, so switching over to the 360 llamafactory version would be a bit of a hassle. So I'd like to ask roughly when this can be merged; looking forward to it.

Labels: pending (This problem is yet to be addressed)

7 participants