
add Sequence Parallelism (reopened #6506) #7338

Open · wants to merge 3 commits into main

Conversation

HaoshengZou

What does this PR do?

add Sequence Parallelism (reopened #6506)


@hiyouga added the pending label (This problem is yet to be addressed) on Mar 17, 2025
@hiyouga (Owner) commented Mar 17, 2025

Nice work! We expect to add this feature to LLaMA-Factory v1.0.

@xiaosu-zhu (Contributor)

#7335 has a new implementation based on DeepSpeed Ulysses; please consider it as an alternative solution.

@HaoshengZou (Author)

@xiaosu-zhu Thanks for pointing it out!

DeepSpeed Ulysses has long been on our to-do list; we integrated it recently and have now pushed it here, supporting `--sequence_parallel_mode zigzag-ring / ulysses`.
Correctness has been thoroughly verified, and training speed was tested similarly here.
Our Ulysses integration is based on yunchang for now, but that dependency could be stripped, with the core functions adapted and their source acknowledged.
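For readers unfamiliar with the zigzag variant, here is a minimal, self-contained sketch of the load-balancing split it is named for. This is an editor's illustration of the general technique, not code from this PR or from yunchang:

```python
# Zigzag split used by zigzag ring attention: cut the sequence into
# 2 * world_size chunks and give rank i chunks i and (2*world_size - 1 - i),
# so every rank does a comparable amount of causal-attention work.
def zigzag_shard(tokens: list, world_size: int, rank: int) -> list:
    assert len(tokens) % (2 * world_size) == 0, "pad the sequence first"
    chunk = len(tokens) // (2 * world_size)
    chunks = [tokens[i * chunk:(i + 1) * chunk] for i in range(2 * world_size)]
    return chunks[rank] + chunks[2 * world_size - 1 - rank]

seq = list(range(16))  # toy sequence of 16 token positions
for r in range(4):
    print(r, zigzag_shard(seq, world_size=4, rank=r))
# rank 0 -> [0, 1, 14, 15]; rank 3 -> [6, 7, 8, 9]: early ranks pair cheap
# prefix chunks with expensive suffix chunks under the causal mask.
```

(Ulysses, by contrast, keeps the sequence split through most of the model and uses all-to-all communication to switch to a head-wise split inside attention.)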

This PR reopens #6506, which was opened on Jan 2 and accidentally closed last week (#6506 (comment)).

#7335 may offer nice alternative ways of splitting the SP data and doing the monkey patching. Still, getting SP 100% correct requires much more detailed work on padding, labels, loss computation, etc. (as we discussed with Swift here), followed by thorough testing; the sketch after this comment illustrates one such detail.

We are confident that our implementation is 100% correct up to the numerical differences inherent to SP, and it requires close-to-minimal, modular code changes to the original LLaMA-Factory.
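As one concrete example of the boundary details mentioned above, consider label shifting. This is an editor's simplified sketch, not this PR's actual code: causal-LM labels are the inputs shifted left by one, so if each SP rank shifted only its own shard, the last position of every shard would lose its label; the shift has to happen globally, before sharding.

```python
IGNORE_INDEX = -100  # conventional "no loss" label in HF-style trainers

def shift_then_shard(input_ids, world_size):
    labels = input_ids[1:] + [IGNORE_INDEX]  # shift once, globally
    shard = len(input_ids) // world_size     # assume length divides evenly
    return [
        (input_ids[r * shard:(r + 1) * shard],
         labels[r * shard:(r + 1) * shard])
        for r in range(world_size)
    ]

ids = [10, 11, 12, 13, 14, 15, 16, 17]
for r, (x, y) in enumerate(shift_then_shard(ids, world_size=2)):
    print(r, x, y)
# rank 0: x=[10, 11, 12, 13], y=[11, 12, 13, 14] -- its last label is 14,
# the first token of rank 1's shard, which a per-shard shift would miss.
```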

@zhijie-reallm

Does it support DPO?

@HaoshengZou (Author)

@zhijie-reallm yeah

@zhijie-reallm

Does it support liger_kernel?

@HaoshengZou (Author)

@zhijie-reallm SP is not compatible with liger_kernel right now, but we are working on it.

@burkun commented Apr 17, 2025

> @zhijie-reallm SP is not compatible with liger_kernel right now, but we are working on it.

Can SP also support `use_unsloth_gc: true`, `enable_liger_kernel: true`, and `packing: true`?

@burkun commented Apr 21, 2025

In the PackedSupervisedDatasetProcessor in src/llamafactory/data/processor/supervised.py, is it reasonable to generate position ids with `model_inputs["position_ids"].append(list(range(len(input_ids))))`? After different samples are packed together, don't the positions need special handling?

@HaoshengZou (Author)

> In the PackedSupervisedDatasetProcessor in src/llamafactory/data/processor/supervised.py, is it reasonable to generate position ids with `model_inputs["position_ids"].append(list(range(len(input_ids))))`? After different samples are packed together, don't the positions need special handling?

@burkun Which part do you mean? llama-factory's own code, or the part we added?

@burkun commented Apr 22, 2025

> > In the PackedSupervisedDatasetProcessor in src/llamafactory/data/processor/supervised.py, is it reasonable to generate position ids with `model_inputs["position_ids"].append(list(range(len(input_ids))))`? After different samples are packed together, don't the positions need special handling?
>
> @burkun Which part do you mean? llama-factory's own code, or the part we added?

Symptom: with packing enabled under sequence parallelism, the loss shows frequent spikes, and the grad norm is also high at those moments; on the same dataset, the phenomenon disappears once packing is turned off.
[screenshot: loss curve with frequent spikes]
Also, using `model_inputs["position_ids"].append(list(range(len(input_ids))))` directly in PackedSupervisedDatasetProcessor is itself problematic, because that variable does not exist inside the for loop; after I tried a fix, the loss spikes did not noticeably change.
[screenshot: the code in question]

@HaoshengZou (Author)

@burkun Are you using our repo? We have trained many SP + packing runs and, as far as I remember, never ran into this problem.
The TODO comment was already in llama-factory.
Adding position_ids is necessary for SP, because after the sequence is split, each shard needs the original positions recorded. RoPE itself is a relative position encoding, so there is no need to reset position ids when packing.
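For illustration, here is a minimal sketch of the two position-id schemes under discussion (an editor's example, not code from this PR):

```python
def continuous_position_ids(packed_lengths):
    # What a line like model_inputs["position_ids"].append(list(range(len(input_ids))))
    # produces for the whole packed sequence: one unbroken range.
    return list(range(sum(packed_lengths)))

def per_sample_position_ids(packed_lengths):
    # Reset to 0 at each sample boundary instead.
    ids = []
    for n in packed_lengths:
        ids.extend(range(n))
    return ids

print(continuous_position_ids([3, 2]))  # [0, 1, 2, 3, 4]
print(per_sample_position_ids([3, 2]))  # [0, 1, 2, 0, 1]
```

With a proper block-diagonal attention mask, tokens attend only within their own sample, and RoPE depends only on relative offsets, so within each sample the two schemes give identical attention scores; that appears to be the reasoning behind not resetting position ids here.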

@burkun commented May 8, 2025

> @burkun Are you using our repo? We have trained many SP + packing runs and, as far as I remember, never ran into this problem. The TODO comment was already in llama-factory. Adding position_ids is necessary for SP, because after the sequence is split, each shard needs the original positions recorded. RoPE itself is a relative position encoding, so there is no need to reset position ids when packing.

The TODO was indeed already there. For the past few days I have stopped using this pull request and switched to the version at https://github.com/Qihoo360/360-LLaMA-Factory; with the same config and data, the problem is gone. Perhaps this PR is not aligned with the original 360-LLaMA-Factory.

@HaoshengZou (Author)

@burkun Yes. This PR exists specifically to track updates in the llama-factory main repo, so it may not be tested thoroughly enough, and keeping it up to date is also a bit difficult. For now, you can generally use the default branch of https://github.com/Qihoo360/360-LLaMA-Factory.

@Zhaoyi-Yan

Any updates? I would really like to use this. I have written some changes on top of the original llamafactory, so switching over to the 360 llamafactory version would be a bit of a hassle. So I'd like to ask roughly when this can be merged; looking forward to it.

Labels: pending (This problem is yet to be addressed)

7 participants