Is there a simple way to not shuffle the training dataset? #1204


Closed
XuanRen4470 opened this issue Oct 17, 2023 · 4 comments · Fixed by #6388
Labels
solved This problem has been already solved

Comments


XuanRen4470 commented Oct 17, 2023

I know the trainer shuffles by default, but I don't want the training dataset to be shuffled.

@hiyouga hiyouga added the wontfix This will not be worked on label Oct 19, 2023
@hiyouga hiyouga closed this as not planned Oct 19, 2023
@histmeisah

Is there now a parameter to disable shuffling of the dataset? I would like to try curriculum learning.

@JerryDaHeLian

Same question here!

hiyouga (Owner) commented Mar 7, 2024

`--streaming --buffer_size 1` will not shuffle the dataset
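To see why a buffer size of 1 disables shuffling: streaming datasets are typically shuffled with a fixed-size reservoir buffer, and when the buffer holds only one element there is never a choice to make, so items come out in their original order. Below is a minimal, self-contained sketch of that buffer-shuffle idea (not LLaMA-Factory's actual code; the function name `buffer_shuffle` is illustrative):

```python
import random

def buffer_shuffle(stream, buffer_size, seed=0):
    """Approximate shuffling for a streamed dataset: keep a buffer of
    `buffer_size` items and yield a randomly chosen one, refilling from
    the stream. With buffer_size=1 the buffer always contains exactly
    the next item, so the original order is preserved."""
    rng = random.Random(seed)
    buffer = []
    for item in stream:
        buffer.append(item)
        if len(buffer) >= buffer_size:
            # pick a random element from the buffer and emit it
            yield buffer.pop(rng.randrange(len(buffer)))
    # drain whatever remains at the end of the stream
    rng.shuffle(buffer)
    yield from buffer

print(list(buffer_shuffle(range(5), buffer_size=1)))  # → [0, 1, 2, 3, 4]
```

With `buffer_size=1`, `rng.randrange(1)` always returns 0, so every item is yielded as soon as it arrives: effectively no shuffling.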

@hiyouga hiyouga added solved This problem has been already solved and removed wontfix This will not be worked on labels Mar 7, 2024
@hiyouga hiyouga closed this as completed Mar 7, 2024
@iaoxuesheng

> `--streaming --buffer_size 1` will not shuffle the dataset

Hello, when I set `--streaming True --buffer_size 1`, I get the following error:

```
Traceback (most recent call last):
  File "src/train_bash.py", line 14, in <module>
    main()
  File "src/train_bash.py", line 5, in main
    run_exp()
  File "/cephfs/renjinshan/work/LLaMA-Factory-0.7.0/src/llmtuner/train/tuner.py", line 33, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/cephfs/renjinshan/work/LLaMA-Factory-0.7.0/src/llmtuner/train/sft/workflow.py", line 33, in run_sft
    dataset = get_dataset(model_args, data_args, training_args, stage="sft", **tokenizer_module)
  File "/cephfs/renjinshan/work/LLaMA-Factory-0.7.0/src/llmtuner/data/loader.py", line 176, in get_dataset
    print_function(next(iter(dataset)))
  File "/cephfs/renjinshan/work/miniconda3/envs/llama_factory/lib/python3.8/site-packages/datasets/iterable_dataset.py", line 1384, in __iter__
    for key, example in ex_iterable:
  File "/cephfs/renjinshan/work/miniconda3/envs/llama_factory/lib/python3.8/site-packages/datasets/iterable_dataset.py", line 679, in __iter__
    yield from self._iter()
  File "/cephfs/renjinshan/work/miniconda3/envs/llama_factory/lib/python3.8/site-packages/datasets/iterable_dataset.py", line 718, in _iter
    transformed_batch.update(self.function(*function_args, **self.fn_kwargs))
  File "/cephfs/renjinshan/work/LLaMA-Factory-0.7.0/src/llmtuner/data/preprocess.py", line 79, in preprocess_supervised_dataset
    if len(examples["prompt"][i]) % 2 != 1 or len(examples["response"][i]) != 1:
TypeError: object of type 'NoneType' has no len()
```
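The `TypeError` above means `len()` was called on a field that is `None`, i.e. some example reached `preprocess_supervised_dataset` with a missing `prompt` or `response`. A common workaround for this class of error is to pre-filter such rows before preprocessing. The helper below is a hypothetical sketch of that guard, not code from LLaMA-Factory:

```python
def is_valid_pair(example):
    """Hypothetical pre-filter: skip examples whose prompt/response
    fields are missing (None), which would otherwise crash length
    checks like the one in preprocess_supervised_dataset."""
    prompt = example.get("prompt")
    response = example.get("response")
    return prompt is not None and response is not None

examples = [
    {"prompt": [{"role": "user", "content": "hi"}],
     "response": [{"role": "assistant", "content": "hello"}]},
    {"prompt": None, "response": None},  # malformed row that triggers the TypeError
]
print([is_valid_pair(e) for e in examples])  # → [True, False]
```

With the `datasets` library, such a predicate could be applied via `dataset.filter(is_valid_pair)` before the map step, so only well-formed rows reach the length check.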

@hiyouga hiyouga marked this as a duplicate and then as not a duplicate of #7276 Mar 12, 2025