Error while serving fine-tuned Qwen 2.5 VL model #8147
Comments
Same problem while serving a fine-tuned Qwen2.5 VL 3B model.
Before the 3B model, I had tried serving a fine-tuned 7B with LoRA and got no error.
Currently, there are some bugs in Transformers 4.52.* when using vLLM to run inference on fine-tuned models. We are working on a fix: huggingface/transformers#38385. As a temporary workaround, you can downgrade Transformers to version 4.51.3 and train again to avoid this issue.
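A minimal sketch of that workaround, assuming a LoRA SFT setup (the checkpoint path and training config name below are placeholders, not taken from this issue):

```bash
# Check which Transformers version the fine-tuned checkpoint was saved with;
# Hugging Face checkpoints record it in config.json as "transformers_version".
grep transformers_version /path/to/finetuned-qwen2_5vl/config.json

# Pin Transformers to 4.51.3 before retraining.
pip install "transformers==4.51.3"

# Retrain with your existing training config (placeholder name), then redeploy.
llamafactory-cli train your_qwen2_5vl_sft.yaml
API_PORT=8000 llamafactory-cli api examples/inference/qwen2_5vl.yaml infer_backend=vllm vllm_enforce_eager=true
```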
Has this issue been resolved? Is it now possible to run inference without retraining?
Reminder
System Info
[2025-05-23 13:50:43,655] [INFO] [real_accelerator.py:254:get_accelerator] Setting ds_accelerator to cuda (auto detect)
INFO 05-23 13:50:48 [importing.py:53] Triton module has been replaced with a placeholder.
INFO 05-23 13:50:48 [__init__.py:239] Automatically detected platform cuda.
llamafactory version: 0.9.3.dev0
Reproduction
I fine-tuned Qwen 2.5 VL 3B Instruct. Then, I tried to deploy it as follows:
API_PORT=8000 llamafactory-cli api examples/inference/qwen2_5vl.yaml infer_backend=vllm vllm_enforce_eager=true
This gave me an error. I was able to serve the base model with the same command, but not the fine-tuned version.
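For reference, the kind of inference config that command consumes looks roughly like the sketch below (field names follow LLaMA-Factory's example configs; the adapter path is a placeholder, so adjust it to your setup):

```bash
# Write a sketch of an inference config for a LoRA fine-tune (not the exact
# file from this issue); field names follow LLaMA-Factory's example configs.
cat > qwen2_5vl_finetuned.yaml <<'EOF'
model_name_or_path: Qwen/Qwen2.5-VL-3B-Instruct
adapter_name_or_path: saves/qwen2_5vl-3b/lora/sft   # placeholder adapter path
template: qwen2_vl                                   # per LLaMA-Factory's Qwen2-VL/2.5-VL examples
infer_backend: vllm
trust_remote_code: true
EOF

# Sanity check: if serving works with the plain HF backend but fails with vLLM,
# the failure is specific to the vLLM path.
API_PORT=8000 llamafactory-cli api qwen2_5vl_finetuned.yaml infer_backend=huggingface
```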
Here is the error:
Others
No response