Why does the reward model [reward trainer] use AutoModelForCausalLMWithValueHead instead of AutoModelForSequenceClassification? #6455
luoqishuai
started this conversation in
General
Replies: 1 comment
There is no special logic; it may be replaced in a later update.
@hiyouga I ask because the official trl example uses AutoModelForSequenceClassification [https://github.com/huggingface/trl].
I also could not find any related material on this.
Is there any special logic behind using AutoModelForCausalLMWithValueHead?
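For readers comparing the two classes, here is a minimal plain-Python sketch of the conceptual difference. It is not trl or transformers code. It assumes that a value head scores every token and that the reward trainer reads the score at the last non-padding token, while a single-label sequence-classification head pools that same token first and scores it once. All function names, inputs, and weights below are hypothetical.

```python
# Illustrative sketch only: plain-Python stand-ins, not trl or transformers code.
from typing import List

Vector = List[float]


def linear_score(hidden: Vector, weight: Vector, bias: float = 0.0) -> float:
    """Project one hidden-state vector to a single scalar."""
    return sum(h * w for h, w in zip(hidden, weight)) + bias


def last_real_token(attention_mask: List[int]) -> int:
    """Index of the last non-padding position (assumes right padding)."""
    return max(i for i, m in enumerate(attention_mask) if m == 1)


def value_head_reward(hidden_states: List[Vector], attention_mask: List[int],
                      weight: Vector, bias: float = 0.0) -> float:
    """Value-head style: score every token, then read the last real token's value."""
    values = [linear_score(h, weight, bias) for h in hidden_states]
    return values[last_real_token(attention_mask)]


def sequence_classification_reward(hidden_states: List[Vector], attention_mask: List[int],
                                   weight: Vector, bias: float = 0.0) -> float:
    """Classification-head style: pool the last real token, then score it once."""
    pooled = hidden_states[last_real_token(attention_mask)]
    return linear_score(pooled, weight, bias)


# Hypothetical toy inputs: 4 positions (the last one is padding), hidden size 3.
hidden_states = [[0.1, 0.2, 0.3], [0.5, -0.4, 0.0], [1.0, 0.5, -1.0], [9.0, 9.0, 9.0]]
attention_mask = [1, 1, 1, 0]
weight = [0.5, -1.0, 2.0]

r_value = value_head_reward(hidden_states, attention_mask, weight)
r_cls = sequence_classification_reward(hidden_states, attention_mask, weight)
print(r_value == r_cls)  # same weights and same pooled token give the same scalar
```

Under these assumptions the two heads yield the same sequence-level reward, which is consistent with the reply that the choice carries no special logic. Details of the real classes, such as dropout or bias in the head, may differ.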