How to set breakpoints and debug the SFT and DPO workflows #5337

Closed
ReycoLi opened this issue Sep 3, 2024 · 13 comments
Labels
good first issue (Good for newcomers), solved (This problem has been already solved)

Comments

@ReycoLi
ReycoLi commented Sep 3, 2024

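Hello,
I think everyone getting started with LLMs runs into this question: how do you set breakpoints in VS Code to debug the SFT and DPO workflows?
I spent a whole day on it without success, so I am asking for help here.

First, the llamafactory-cli command: in setup.py I see that it runs llamafactory.cli:main, so I started directly from the main function in cli.py.
However, I keep getting relative-import resolution errors; every import that uses a leading . fails.

So I would like to know whether there is a simple debugging tutorial or workflow. It would be very helpful for newcomers.

If the question is too basic, please bear with me.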

@github-actions github-actions bot added the pending This problem is yet to be addressed label Sep 3, 2024
@flyinghu123
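
llamafactory-cli corresponds to llamafactory.cli:main in setup.py.
[image]
Next, look at llamafactory.cli:main.
[image]
You can see that what actually runs is torchrun, so debugging torchrun is enough.
For example, launch.json in VS Code can be modified as follows:
[image]
The difference from debugging a plain Python script is that program is replaced with module.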
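
A configuration of this kind might look like the sketch below, which builds the launch entry in Python and prints it as JSON for pasting into .vscode/launch.json. The value torch.distributed.run is used because the torchrun command is the console entry point of that module, so debugging `python -m torch.distributed.run` is equivalent to debugging torchrun. Everything else is an assumption rather than something stated in this thread: the helper name, the "debugpy" type (older versions of the VS Code Python extension use "python"), the --nproc_per_node value, and the placeholder script and YAML paths.

```python
import json


def torchrun_launch_config(script, script_args, nproc_per_node=1):
    """Build a VS Code launch entry that debugs `python -m torch.distributed.run`.

    Hypothetical helper for illustration; not part of LLaMA-Factory.
    """
    return {
        "name": "Debug torchrun",
        "type": "debugpy",  # assumption: recent VS Code Python debugger
        "request": "launch",
        # "module" instead of "program": run the torchrun module, not a file
        "module": "torch.distributed.run",
        "args": ["--nproc_per_node", str(nproc_per_node), script, *script_args],
        "console": "integratedTerminal",
        "cwd": "${workspaceFolder}",
        # also stop at breakpoints inside installed libraries
        "justMyCode": False,
    }


# Placeholder paths: replace with the script that torchrun launches
# in your checkout and with your own training YAML.
config = torchrun_launch_config(
    script="path/to/launcher.py",
    script_args=["path/to/your_config.yaml"],
)
launch_json = {"version": "0.2.0", "configurations": [config]}
print(json.dumps(launch_json, indent=4))
```

The script to pass as the first positional argument is whatever cli.py hands to torchrun in your version of the code, so it is worth reading that call before filling in the placeholder.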

@ReycoLi
Author

ReycoLi commented Sep 3, 2024

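@flyinghu123 Thanks for the quick reply! I learned a lot!

If you don't mind, one more question: for the API (that is, run_api()) rather than train, how should module be set? Where does writing torchrun as module: torch.distributed.run come from, and how should I work out the right module in other cases?
[image]
[image]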

@flyinghu123
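
This path does not use subprocess, so you should be able to debug it like an ordinary Python script. Create a new debug.py with the following content:
[image]
Then set program in launch.json to debug.py and debug that file.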
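
A debug.py in this spirit could look like the sketch below. It imports llamafactory.cli by its absolute package name and calls main() in the current process, so the relative imports inside the package resolve normally; running cli.py directly as a script fails because the file then has no parent package. The helper names and the YAML path are hypothetical, and the "api" subcommand is shown only as an example of a command that does not go through torchrun.

```python
import importlib
import importlib.util
import sys


def build_argv(*cli_args):
    """Return the argv that `llamafactory-cli <cli_args>` would produce."""
    return ["llamafactory-cli", *cli_args]


def run_cli(*cli_args):
    """Call llamafactory.cli.main() in this process so breakpoints are hit."""
    sys.argv = build_argv(*cli_args)
    cli = importlib.import_module("llamafactory.cli")
    cli.main()


if __name__ == "__main__":
    if importlib.util.find_spec("llamafactory") is None:
        print("llamafactory is not importable; install it first, e.g. pip install -e .")
    else:
        # Placeholder config path: replace with your own YAML file.
        run_cli("api", "path/to/your_config.yaml")
```

With this file as program in launch.json, breakpoints set anywhere in the package should be reachable, provided the command stays in the same process.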

@hiyouga hiyouga added the good first issue Good for newcomers label Sep 3, 2024
@ReycoLi
Author

ReycoLi commented Sep 4, 2024

@flyinghu123 Thanks for your help!

@ReycoLi ReycoLi closed this as completed Sep 4, 2024
@YunweiDai

If I launch with llamafactory-cli, do changes to the source code have no effect? I tried editing src/llamafactory/train/tuner.py and saw no change at all; even an added print does nothing.

@hiyouga
Owner

hiyouga commented Sep 11, 2024

@YunweiDai pip install -e .
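
One plausible reading of this reply is that Python was importing a copy of the package installed into site-packages rather than the checkout being edited; an editable install makes the import point at the source tree. A small sketch for checking which file a package is actually loaded from (the helper name is hypothetical):

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# If this prints a path under site-packages instead of your clone's
# src/llamafactory directory, edits to the clone are not being imported.
print(module_origin("llamafactory"))
```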

@w-zhih
w-zhih commented Sep 12, 2024

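@flyinghu123 wrote: "llamafactory-cli corresponds to llamafactory.cli:main in setup.py. [image] Next, look at llamafactory.cli:main. [image] You can see that what actually runs is torchrun, so debugging torchrun is enough. For example, launch.json in VS Code can be modified as follows: [image] The difference from debugging a plain Python script is that program is replaced with module."

When I need to use DeepSpeed and run with this configuration, it only loads the model and does not use DeepSpeed. What is the reason? What do I need to do to use DeepSpeed the same way as when running the command line directly?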

@flyinghu123

@w-zhih wrote: "When I need to use DeepSpeed and run with this configuration, it only loads the model and does not use DeepSpeed. What is the reason? What do I need to do to use DeepSpeed the same way as when running the command line directly?"

I don't understand what you mean. If you mean launching with deepspeed instead of torchrun, you can try changing program in launch.json to the location of the deepspeed executable (the output of which deepspeed).
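
As a rough, untested sketch of that suggestion, the snippet below looks up the deepspeed executable the same way `which deepspeed` does and puts it in the program field of a launch entry. The helper name, the "debugpy" type, the fallback path and the script and YAML arguments are all placeholders; whether breakpoints are hit in the worker processes that the launcher starts is not something this thread confirms.

```python
import json
import shutil


def deepspeed_launch_config(executable, script, script_args):
    """Build a VS Code launch entry whose program is the deepspeed launcher.

    Hypothetical helper for illustration; not part of LLaMA-Factory or DeepSpeed.
    """
    return {
        "name": "Debug deepspeed",
        "type": "debugpy",  # assumption: recent VS Code Python debugger
        "request": "launch",
        "program": executable,
        "args": [script, *script_args],
        "console": "integratedTerminal",
        "justMyCode": False,
    }


# shutil.which mirrors `which deepspeed`; the fallback is only a placeholder
# so the sketch still prints something when deepspeed is not on PATH.
executable = shutil.which("deepspeed") or "/path/to/bin/deepspeed"
ds_config = deepspeed_launch_config(
    executable, "path/to/launcher.py", ["path/to/your_config.yaml"]
)
print(json.dumps(ds_config, indent=4))
```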

@verigle
verigle commented Sep 17, 2024

How can I fix breakpoints inside transformers not taking effect?

@flyinghu123

@verigle wrote: "How can I fix breakpoints inside transformers not taking effect?"

Set justMyCode: false in launch.json.

@hiyouga hiyouga added solved This problem has been already solved and removed pending This problem is yet to be addressed labels Nov 2, 2024
@karenzimeng
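
@YunweiDai wrote: "If I launch with llamafactory-cli, do changes to the source code have no effect? I tried editing src/llamafactory/train/tuner.py and saw no change at all; even an added print does nothing."

I seem to have run into this problem too. How can it be solved?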

@zhicong01

Hi all, I'd like to ask: suppose I want to use PyCharm to debug the following command:
llamafactory-cli train "xx.yaml"
How should I debug it? Many thanks!

Repository owner deleted a comment from flyinghu123 Apr 18, 2025
@JH95-ai
JH95-ai commented May 22, 2025

@zhicong01 Hi, I also want to debug remotely with PyCharm. Did you manage to solve it?

9 participants