
Adding new layers on top of an existing model and training with a redefined loss function #8208

Open
1 task done
Felixvillas opened this issue May 29, 2025 · 1 comment
Labels
bug (Something isn't working) · pending (This problem is yet to be addressed)

Comments


Felixvillas commented May 29, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-4.18.0-425.10.1.el8_7.x86_64-x86_64-with-glibc2.28
  • Python version: 3.10.15
  • PyTorch version: 2.5.1 (GPU)
  • Transformers version: 4.52.3
  • Datasets version: 3.3.0
  • Accelerate version: 1.4.0
  • PEFT version: 0.12.0
  • TRL version: 0.14.0
  • GPU type: NVIDIA A100-SXM4-80GB
  • DeepSpeed version: 0.15.4
  • Bitsandbytes version: 0.45.0
  • vLLM version: 0.7.3

Reproduction

Is it possible to add new network layers on top of an existing pretrained model and train it with a redefined loss function?

I want to add a new network layer on top of Qwen-VL. Its input is the last hidden state and its output is a future state that I want to predict. I have already defined my new model in transformers' models/qwen2_5_vl/modeling_qwen2_5_vl.py:

class MyQwen2_5_VLModel(Qwen2_5_VLModel):
    def __init__(self, config):
        super(MyQwen2_5_VLModel, self).__init__(config)
        self.image_predictor = nn.Sequential(
            nn.Linear(3584, 1024),
            nn.ReLU(),
            nn.Linear(1024, 2048),
            nn.ReLU(),
            nn.Linear(2048, 3 * 256 * 256),
            nn.Sigmoid()
        )
        # self.loss = nn.CrossEntropyLoss(ignore_index=-100)
        
    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[List[torch.FloatTensor]] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        pixel_values: Optional[torch.Tensor] = None,
        pixel_values_videos: Optional[torch.FloatTensor] = None,
        image_grid_thw: Optional[torch.LongTensor] = None,
        video_grid_thw: Optional[torch.LongTensor] = None,
        rope_deltas: Optional[torch.LongTensor] = None,
        cache_position: Optional[torch.LongTensor] = None,
        second_per_grid_ts: Optional[torch.Tensor] = None,
    ) -> Union[Tuple, Qwen2_5_VLModelOutputWithPast]:
    
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        
        # Pass arguments by keyword so the call stays correct even if the parent
        # signature order changes across transformers versions.
        output = super().forward(
            input_ids=input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
            pixel_values=pixel_values,
            pixel_values_videos=pixel_values_videos,
            image_grid_thw=image_grid_thw,
            video_grid_thw=video_grid_thw,
            rope_deltas=rope_deltas,
            cache_position=cache_position,
            second_per_grid_ts=second_per_grid_ts,
        )
        
        assert len(output.last_hidden_state.shape) == 3
        last_token_last_hidden_state = output.last_hidden_state[:, -1, :]  # hidden states
        prediction_image = self.image_predictor(last_token_last_hidden_state)
        

        return MyQwen2_5_VLModelOutputWithPast(
            last_hidden_state=output.last_hidden_state,
            past_key_values=output.past_key_values,
            hidden_states=output.hidden_states,
            attentions=output.attentions,
            rope_deltas=output.rope_deltas,
            prediction_image=prediction_image
        )
    
# ModelOutput subclasses must be decorated with @dataclass
# (dataclass is already imported at the top of modeling_qwen2_5_vl.py).
@dataclass
class MyQwen2_5_VLModelOutputWithPast(ModelOutput):

    last_hidden_state: torch.FloatTensor = None
    past_key_values: Optional[List[torch.FloatTensor]] = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None
    rope_deltas: Optional[torch.LongTensor] = None
    prediction_image: Optional[Tuple[torch.FloatTensor]] = None


__all__ = ["Qwen2_5_VLForConditionalGeneration", "Qwen2_5_VLModel", "Qwen2_5_VLPreTrainedModel", "Qwen2_5_VLTextModel", "MyQwen2_5_VLModel"]

How do I make the training code load this model instead of Qwen2_5_VLModel?

I think I can refer to #8084 and #3843 to define a custom loss function.
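
For reference, a minimal sketch of what such a custom loss could look like via a Trainer subclass (the ImagePredictionTrainer name, the target_images field, and the MSE objective are assumptions for illustration, not LLaMA-Factory APIs):

import torch.nn.functional as F
from transformers import Seq2SeqTrainer

class ImagePredictionTrainer(Seq2SeqTrainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # "target_images" is a hypothetical extra column produced by a custom data collator.
        target_images = inputs.pop("target_images")
        outputs = model(**inputs)
        # Supervise the new head's output; MSE against the flattened target image is only an example.
        loss = F.mse_loss(outputs.prediction_image, target_images.flatten(1))
        return (loss, outputs) if return_outputs else loss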

I hope you can clarify this for me, thanks.

Others

No response

Felixvillas added the bug (Something isn't working) and pending (This problem is yet to be addressed) labels on May 29, 2025
@Kuangdd01
Collaborator

You can add a hack for your custom model in this piece of code: if model_type == xxx, call load_your_custom_model().

if model is None and not lazy_load:
    init_kwargs["config"] = config
    init_kwargs["pretrained_model_name_or_path"] = model_args.model_name_or_path

    if model_args.mixture_of_depths == "load":
        model = load_mod_pretrained_model(**init_kwargs)
    else:
        if type(config) in AutoModelForVision2Seq._model_mapping.keys():  # image-text
            load_class = AutoModelForVision2Seq
        elif (
            is_transformers_version_greater_than("4.46.0")
            and type(config) in AutoModelForImageTextToText._model_mapping.keys()
        ):  # image-text
            load_class = AutoModelForImageTextToText
        elif type(config) in AutoModelForSeq2SeqLM._model_mapping.keys():  # audio-text
            load_class = AutoModelForSeq2SeqLM
        elif type(config) in AutoModelForTextToWaveform._model_mapping.keys():  # audio hack for qwen2_5_omni
            load_class = AutoModelForTextToWaveform
        else:
            load_class = AutoModelForCausalLM

        if model_args.train_from_scratch:
            model = load_class.from_config(config, trust_remote_code=model_args.trust_remote_code)
        else:
            model = load_class.from_pretrained(**init_kwargs)

        if getattr(model.config, "model_type", None) == "qwen2_5_omni":
            model = model.thinker  # use part of Omni model
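
A rough sketch of what that hack could look like, inserted before the load_class dispatch above (the MyQwen2_5_VLModel import path and the use_custom_qwen_vl flag are assumptions for illustration, not part of LLaMA-Factory):

from transformers.models.qwen2_5_vl.modeling_qwen2_5_vl import MyQwen2_5_VLModel  # assumes the class was added to this file

if model is None and not lazy_load:
    init_kwargs["config"] = config
    init_kwargs["pretrained_model_name_or_path"] = model_args.model_name_or_path

    # Hack: short-circuit for the custom model before the generic load_class dispatch.
    if getattr(config, "model_type", None) == "qwen2_5_vl" and getattr(model_args, "use_custom_qwen_vl", False):  # hypothetical flag
        model = MyQwen2_5_VLModel.from_pretrained(**init_kwargs)
    elif model_args.mixture_of_depths == "load":
        model = load_mod_pretrained_model(**init_kwargs)
    else:
        ...  # fall through to the original load_class logic shown above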
