updated phi4 template-update sync-unsloth-bugs #7413
This pull request introduces the following updates to the Phi‑4 chat template to address issues observed during fine-tuning and inference:
User Prompt Formatting:
Removed the extra assistant prompt prefix (<|im_start|>assistant<|im_sep|>) from the user formatter. User messages are now formatted as: <|im_start|>user<|im_sep|>{{content}}<|im_end|>
This change ensures that an assistant prompt is only added when explicitly needed (e.g., via add_generation_prompt=True).
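As a rough illustration (plain Python only, with a made-up example message; the project's formatter classes are not shown here), this is how a single user turn is rendered before and after the change:

```python
content = "What is the capital of France?"  # made-up example message

# Before this PR: the user formatter also appended the assistant prompt prefix.
before = f"<|im_start|>user<|im_sep|>{content}<|im_end|><|im_start|>assistant<|im_sep|>"

# After this PR: the user slot stops at <|im_end|>; the assistant prefix is
# appended only when requested via add_generation_prompt=True.
after = f"<|im_start|>user<|im_sep|>{content}<|im_end|>"
generation_prompt = "<|im_start|>assistant<|im_sep|>"

print(before)
print(after + generation_prompt)  # equivalent to add_generation_prompt=True
```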
EOS Token Replacement:
Ensured that the EOS token is properly replaced with <|im_end|> by setting replace_eos=True in the template registration.
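As a standalone sketch of the effect (not the project's actual code path, just the equivalent tokenizer-level change):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")

# replace_eos=True makes the template's stop word the tokenizer's EOS token.
tokenizer.eos_token = "<|im_end|>"
print(tokenizer.eos_token, tokenizer.eos_token_id)
```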
Dedicated PAD Token:
Modified the fix_special_tokens method so that if no PAD token is defined, a dedicated pad token (<|dummy_85|>) is assigned rather than reusing the EOS token. Reusing EOS as PAD masks the genuine end-of-sequence token during fine-tuning, so the model never learns to stop, which shows up as infinite generations at inference.
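A minimal sketch of the intended behavior, using the Hugging Face tokenizer directly rather than the project's fix_special_tokens helper:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")

# Only assign the dedicated pad token when none is defined, mirroring the
# described logic; <|dummy_85|> is an otherwise unused reserved token.
if tokenizer.pad_token is None:
    tokenizer.pad_token = "<|dummy_85|>"
print(tokenizer.pad_token, tokenizer.pad_token_id)
```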
Jinja Template Override:
Enabled replace_jinja_template=True so the registered template overrides the tokenizer's built-in jinja chat template, guaranteeing that the custom template is used without any default additions.
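To illustrate the intent, the hand-written jinja string below approximates the resulting template; it is not the exact template generated by the registration code:

```python
from transformers import AutoTokenizer

# Approximation of the fixed Phi-4 chat template: each turn is
# <|im_start|>{role}<|im_sep|>{content}<|im_end|>, and the assistant prefix
# appears only when add_generation_prompt is set.
PHI4_CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '<|im_sep|>' + message['content'] + '<|im_end|>' }}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ '<|im_start|>assistant<|im_sep|>' }}{% endif %}"
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")
tokenizer.chat_template = PHI4_CHAT_TEMPLATE
```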
Motivation:
These updates address issues observed in inference outputs (unexpected tokens and out-of-context responses) when serving Phi‑4 with vLLM. By refining the chat template and the tokenizer's special-token settings, the model's prompt structure now aligns with the intended design, leading to more accurate and consistent results.
Changes Made:
Updated the register_template call in the template registration module.
Revised the fix_special_tokens function to use <|dummy_85|> as the pad token.
Tested the changes with vLLM to confirm the prompt no longer includes the extra assistant prompt prefix.
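For reference, a similar check can be run offline with just the tokenizer (a hypothetical snippet, independent of vLLM; it assumes a tokenizer carrying the fixed chat template, e.g. via the override sketched above):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-4")
messages = [{"role": "user", "content": "Hello"}]

print(repr(tokenizer.apply_chat_template(messages, tokenize=False)))
print(repr(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)))
# With the fixed template, only the second rendering should end with
# <|im_start|>assistant<|im_sep|>.
```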
Reference Links:
https://huggingface.co/microsoft/phi-4/commit/6fbb3d3bbe726c99b4188087b4deeec1bceac5ae#d2h-782009
https://www.reddit.com/r/MachineLearning/comments/1i23zbo/p_how_i_found_fixed_4_bugs_in_microsofts_phi4/?rdt=63478
https://unsloth.ai/blog/phi4