
Adding new layers on top of an existing model and training with a redefined loss function #8208

Open
1 task done
Felixvillas opened this issue May 29, 2025 · 1 comment
Labels
bug (Something isn't working) · pending (This problem is yet to be addressed)

Comments


Felixvillas commented May 29, 2025

Reminder

  • I have read the above rules and searched the existing issues.

System Info

  • llamafactory version: 0.9.2.dev0
  • Platform: Linux-4.18.0-425.10.1.el8_7.x86_64-x86_64-with-glibc2.28
  • Python version: 3.10.15
  • PyTorch version: 2.5.1 (GPU)
  • Transformers version: 4.52.3
  • Datasets version: 3.3.0
  • Accelerate version: 1.4.0
  • PEFT version: 0.12.0
  • TRL version: 0.14.0
  • GPU type: NVIDIA A100-SXM4-80GB
  • DeepSpeed version: 0.15.4
  • Bitsandbytes version: 0.45.0
  • vLLM version: 0.7.3

Reproduction

Is it possible to add new network layers on top of an existing pretrained model and train it with a redefined loss function?

I want to add a new network layer on top of Qwen-VL. Its input is the last hidden state and its output is a future state that I want to predict. I have already defined my new model in transformers' models/qwen2_5_vl/modeling_qwen2_5_vl.py:

class MyQwen2_5_VLModel(Qwen2_5_VLModel):
    def __init__(self, config):
        super(MyQwen2_5_VLModel, self).__init__(config)
        self.image_predictor = nn.Sequential(
            nn.Linear(3584, 1024),
            nn.ReLU(),
            nn.Linear(1024, 2048),
            nn.ReLU(),
            nn.Linear(2048, 3 * 256 * 256),
            nn.Sigmoid()
        )
        # self.loss = nn.CrossEntropyLoss(ignore_index=-100)
        
    def forward(
        self,
        input_ids: torch.LongTensor = None,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
        past_key_values: Optional[List[torch.FloatTensor]] = None,
        inputs_embeds: Optional[torch.FloatTensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        pixel_values: Optional[torch.Tensor] = None,
        pixel_values_videos: Optional[torch.FloatTensor] = None,
        image_grid_thw: Optional[torch.LongTensor] = None,
        video_grid_thw: Optional[torch.LongTensor] = None,
        rope_deltas: Optional[torch.LongTensor] = None,
        cache_position: Optional[torch.LongTensor] = None,
        second_per_grid_ts: Optional[torch.Tensor] = None,
    ) -> Union[Tuple, Qwen2_5_VLModelOutputWithPast]:
    
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict
        
        # Pass arguments by keyword so the call stays correct even if the parent
        # signature order changes across transformers versions.
        output = super().forward(
            input_ids=input_ids,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_values=past_key_values,
            inputs_embeds=inputs_embeds,
            use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
            return_dict=return_dict,
            pixel_values=pixel_values,
            pixel_values_videos=pixel_values_videos,
            image_grid_thw=image_grid_thw,
            video_grid_thw=video_grid_thw,
            rope_deltas=rope_deltas,
            cache_position=cache_position,
            second_per_grid_ts=second_per_grid_ts,
        )
        
        assert len(output.last_hidden_state.shape) == 3
        last_token_last_hidden_state = output.last_hidden_state[:, -1, :]  # hidden states
        prediction_image = self.image_predictor(last_token_last_hidden_state)
        

        return MyQwen2_5_VLModelOutputWithPast(
            last_hidden_state=output.last_hidden_state,
            past_key_values=output.past_key_values,
            hidden_states=output.hidden_states,
            attentions=output.attentions,
            rope_deltas=output.rope_deltas,
            prediction_image=prediction_image
        )
    
# ModelOutput subclasses must be decorated with @dataclass
# (dataclass is already imported at the top of modeling_qwen2_5_vl.py).
@dataclass
class MyQwen2_5_VLModelOutputWithPast(ModelOutput):

    last_hidden_state: torch.FloatTensor = None
    past_key_values: Optional[List[torch.FloatTensor]] = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None
    rope_deltas: Optional[torch.LongTensor] = None
    prediction_image: Optional[Tuple[torch.FloatTensor]] = None


__all__ = ["Qwen2_5_VLForConditionalGeneration", "Qwen2_5_VLModel", "Qwen2_5_VLPreTrainedModel", "Qwen2_5_VLTextModel", "MyQwen2_5_VLModel"]

How do I make the training code load this model instead of Qwen2_5_VLModel?

I think I can refer to #8084 and #3843 to define a custom loss function.
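
For reference, a minimal sketch of what such a custom loss could look like via a Trainer subclass (the ImagePredictionTrainer name, the target_images field, and the MSE objective are assumptions for illustration, not LLaMA-Factory APIs):

import torch.nn.functional as F
from transformers import Seq2SeqTrainer

class ImagePredictionTrainer(Seq2SeqTrainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # "target_images" is a hypothetical extra column produced by a custom data collator.
        target_images = inputs.pop("target_images")
        outputs = model(**inputs)
        # Supervise the new head's output; MSE against the flattened target image is only an example.
        loss = F.mse_loss(outputs.prediction_image, target_images.flatten(1))
        return (loss, outputs) if return_outputs else loss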

I hope you can clarify this for me, thanks.

Others

No response

Felixvillas added the bug (Something isn't working) and pending (This problem is yet to be addressed) labels on May 29, 2025
@Kuangdd01
Collaborator

You can add a hack for your custom model in this piece of code: if model_type == xxx, call load_your_custom_model().

if model is None and not lazy_load:
    init_kwargs["config"] = config
    init_kwargs["pretrained_model_name_or_path"] = model_args.model_name_or_path

    if model_args.mixture_of_depths == "load":
        model = load_mod_pretrained_model(**init_kwargs)
    else:
        if type(config) in AutoModelForVision2Seq._model_mapping.keys():  # image-text
            load_class = AutoModelForVision2Seq
        elif (
            is_transformers_version_greater_than("4.46.0")
            and type(config) in AutoModelForImageTextToText._model_mapping.keys()
        ):  # image-text
            load_class = AutoModelForImageTextToText
        elif type(config) in AutoModelForSeq2SeqLM._model_mapping.keys():  # audio-text
            load_class = AutoModelForSeq2SeqLM
        elif type(config) in AutoModelForTextToWaveform._model_mapping.keys():  # audio hack for qwen2_5_omni
            load_class = AutoModelForTextToWaveform
        else:
            load_class = AutoModelForCausalLM

        if model_args.train_from_scratch:
            model = load_class.from_config(config, trust_remote_code=model_args.trust_remote_code)
        else:
            model = load_class.from_pretrained(**init_kwargs)

        if getattr(model.config, "model_type", None) == "qwen2_5_omni":
            model = model.thinker  # use part of Omni model
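
A rough sketch of what that hack could look like, inserted before the load_class dispatch above (the MyQwen2_5_VLModel import path and the use_custom_qwen_vl flag are assumptions for illustration, not part of LLaMA-Factory):

from transformers.models.qwen2_5_vl.modeling_qwen2_5_vl import MyQwen2_5_VLModel  # assumes the class was added to this file

if model is None and not lazy_load:
    init_kwargs["config"] = config
    init_kwargs["pretrained_model_name_or_path"] = model_args.model_name_or_path

    # Hack: short-circuit for the custom model before the generic load_class dispatch.
    if getattr(config, "model_type", None) == "qwen2_5_vl" and getattr(model_args, "use_custom_qwen_vl", False):  # hypothetical flag
        model = MyQwen2_5_VLModel.from_pretrained(**init_kwargs)
    elif model_args.mixture_of_depths == "load":
        model = load_mod_pretrained_model(**init_kwargs)
    else:
        ...  # fall through to the original load_class logic shown above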
