2025-3-8: Some related code, stored for reference

A few code snippets related to DETR, mainly the position encoding and the prediction head.

Nothing else here.

Carion, Nicolas, et al. "End-to-end object detection with transformers." European conference on computer vision. Cham: Springer International Publishing, 2020.

Reworking the spatial position encoding:

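import torch
import torch.nn as nn

# Replace the 1D absolute position encoding used for text with a learned
# 2D (row + column) position encoding for image feature maps
class ImagePositionEncoding(nn.Module):
    def __init__(self, d_model, h, w):
        super().__init__()
        # d_model must be even; h and w are the largest feature-map sizes supported
        self.row_embed = nn.Embedding(h, d_model // 2)  # row embedding
        self.col_embed = nn.Embedding(w, d_model // 2)  # column embedding

    def forward(self, x):
        B, C, H, W = x.shape  # requires C == d_model, H <= h, W <= w
        row = self.row_embed(torch.arange(H, device=x.device))  # (H, d/2)
        col = self.col_embed(torch.arange(W, device=x.device))  # (W, d/2)
        pos = torch.cat([row.unsqueeze(1).repeat(1, W, 1),
                         col.unsqueeze(0).repeat(H, 1, 1)], dim=-1)  # (H, W, d)
        return x + pos.permute(2, 0, 1).unsqueeze(0)  # (1, d, H, W), broadcast over the batch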
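
The class above learns its row and column embeddings. For comparison, DETR's default configuration uses a fixed sinusoidal 2D encoding instead. The following is a minimal sketch of that idea, not the reference implementation: the function name `sine_position_encoding_2d`, the absence of a padding mask, and the normalization of coordinates to (0, 2π] are simplifying assumptions made here.

```python
import math
import torch

def sine_position_encoding_2d(h, w, d_model, temperature=10000.0):
    """Fixed sinusoidal encoding of shape (d_model, h, w).

    Hypothetical helper for illustration; d_model must be divisible by 4
    so that each axis gets an even number of channels.
    """
    assert d_model % 4 == 0, "d_model must be divisible by 4"
    n = d_model // 2  # channels per axis (first half: rows, second half: columns)

    # Row / column coordinates normalized to (0, 2*pi]
    y = torch.arange(1, h + 1, dtype=torch.float32) / h * 2 * math.pi
    x = torch.arange(1, w + 1, dtype=torch.float32) / w * 2 * math.pi

    # Each consecutive pair of channels shares one frequency (one sin, one cos)
    i = torch.arange(n, dtype=torch.float32)
    dim_t = temperature ** (2 * torch.div(i, 2, rounding_mode="floor") / n)

    pos_y = y[:, None] / dim_t  # (h, n)
    pos_x = x[:, None] / dim_t  # (w, n)

    # sin on even channels, cos on odd channels, interleaved back to n channels
    pos_y = torch.stack([pos_y[:, 0::2].sin(), pos_y[:, 1::2].cos()], dim=2).flatten(1)
    pos_x = torch.stack([pos_x[:, 0::2].sin(), pos_x[:, 1::2].cos()], dim=2).flatten(1)

    pos = torch.cat([pos_y[:, None, :].expand(h, w, n),
                     pos_x[None, :, :].expand(h, w, n)], dim=-1)  # (h, w, d_model)
    return pos.permute(2, 0, 1)  # (d_model, h, w)
```

The result can be added to a `(B, d_model, H, W)` feature map via `x + pos.unsqueeze(0)`. Because it has no parameters, it is not tied to a maximum `h` or `w` the way the embedding tables are.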

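Multi-objective reward function

Reworking the action space:

Model bounding-box refinement as a continuous action space and train with DDPG or SAC.
Define the action vector as [Δx, Δy, Δw, Δh, cls_score], with each component normalized to [-1, 1].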
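
The note does not say how an action is applied to a box, so here is one possible sketch under stated assumptions: boxes are stored as normalized (cx, cy, w, h), center shifts are scaled by the current box size, sizes change multiplicatively, and `cls_score` is mapped from [-1, 1] to a confidence in [0, 1]. The function name `apply_box_action` and the `step` parameter are hypothetical.

```python
import torch

def apply_box_action(boxes, action, step=0.1):
    """Apply one refinement action per box (hypothetical helper).

    boxes:  (N, 4) normalized (cx, cy, w, h)
    action: (N, 5) = [dx, dy, dw, dh, cls_score], each in [-1, 1]
    step:   assumed maximum relative change per action
    """
    action = action.clamp(-1.0, 1.0)
    dx, dy, dw, dh, cls_score = action.unbind(dim=-1)
    cx, cy, w, h = boxes.unbind(dim=-1)

    # Shift the center by a fraction of the current box size
    new_cx = cx + step * dx * w
    new_cy = cy + step * dy * h
    # Scale width and height multiplicatively so they stay positive
    new_w = w * torch.exp(step * dw)
    new_h = h * torch.exp(step * dh)

    new_boxes = torch.stack([new_cx, new_cy, new_w, new_h], dim=-1).clamp(0.0, 1.0)
    confidence = (cls_score + 1) / 2  # map [-1, 1] to [0, 1]
    return new_boxes, confidence
```

A zero action leaves the box unchanged, which gives the agent a neutral "stay" option.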

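import torch
import torch.nn.functional as F
from torch.distributions import Categorical
from torchvision.ops import box_iou

def reward_function(pred_boxes, gt_boxes, pred_cls_prob, gt_cls_prob):
    # Boxes: (N, 4) as (x1, y1, x2, y2), already matched one-to-one
    # Class probabilities: (N, num_classes)

    # Localization reward (IoU of each matched pair)
    iou_reward = torch.diag(box_iou(pred_boxes, gt_boxes))

    # Classification reward (confidence calibration): 1 - KL(gt || pred), summed over classes
    kl = F.kl_div(pred_cls_prob.clamp_min(1e-8).log(), gt_cls_prob, reduction='none').sum(dim=-1)
    cls_reward = 1 - kl

    # Exploration reward (encourages moderate diversity)
    entropy_reward = Categorical(probs=pred_cls_prob).entropy() * 0.1

    return iou_reward + cls_reward + entropy_reward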
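
The localization term above takes the diagonal of a full N×N IoU matrix. If `box_iou` is `torchvision.ops.box_iou`, it also expects corner-format (x1, y1, x2, y2) boxes, while a sigmoid regression head in the DETR style typically outputs normalized (cx, cy, w, h). The sketch below, assuming that center format and already matched pairs, converts the boxes and computes only the N paired IoUs. The helper names are hypothetical.

```python
import torch

def cxcywh_to_xyxy(boxes):
    """Convert (cx, cy, w, h) boxes to (x1, y1, x2, y2)."""
    cx, cy, w, h = boxes.unbind(dim=-1)
    return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=-1)

def paired_iou(pred, gt, eps=1e-7):
    """IoU between pred[i] and gt[i] for (N, 4) boxes in (cx, cy, w, h)."""
    p = cxcywh_to_xyxy(pred)
    g = cxcywh_to_xyxy(gt)

    # Intersection rectangle: max of top-left corners, min of bottom-right corners
    lt = torch.max(p[:, :2], g[:, :2])
    rb = torch.min(p[:, 2:], g[:, 2:])
    wh = (rb - lt).clamp(min=0)  # zero when the boxes do not overlap
    inter = wh[:, 0] * wh[:, 1]

    area_p = (p[:, 2] - p[:, 0]) * (p[:, 3] - p[:, 1])
    area_g = (g[:, 2] - g[:, 0]) * (g[:, 3] - g[:, 1])
    return inter / (area_p + area_g - inter + eps)
```

This yields the same values as the diagonal of the pairwise matrix (up to `eps`) at O(N) rather than O(N²) cost.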

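Merging in the detection head:

Keep DeepSeek's Transformer Encoder to extract global features.
Add parallel prediction heads adapted for detection (replacing the original Decoder):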

import torch.nn as nn

class DetectionAdapter(nn.Module):
    def __init__(self, d_model, num_classes):
        super().__init__()
        self.cls_head = nn.Linear(d_model, num_classes)  # classification head (logits)
        self.reg_head = nn.Sequential(  # regression head
            nn.Linear(d_model, 4),
            nn.Sigmoid()  # outputs normalized coordinates in [0, 1]
        )

    def forward(self, x):
        # x: (B, L, d_model)
        # returns class logits (B, L, num_classes) and boxes (B, L, 4)
        return self.cls_head(x), self.reg_head(x)

The prediction heads here are modeled on DETR's.
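
For comparison, DETR's own heads differ from the adapter above in two details: the classifier has one extra output for the "no object" class, and the box head is a 3-layer MLP followed by a sigmoid. A minimal sketch of that variant follows. The class name `DETRStyleHead` is hypothetical, and the hidden width is assumed equal to `d_model`.

```python
import torch
import torch.nn as nn

class DETRStyleHead(nn.Module):
    """Parallel classification and box heads in the style of DETR (sketch)."""

    def __init__(self, d_model, num_classes):
        super().__init__()
        # +1 output for the "no object" class
        self.cls_head = nn.Linear(d_model, num_classes + 1)
        # 3-layer MLP for box regression
        self.reg_head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 4),
        )

    def forward(self, x):
        # x: (B, L, d_model)
        logits = self.cls_head(x)            # (B, L, num_classes + 1)
        boxes = self.reg_head(x).sigmoid()   # (B, L, 4), normalized (cx, cy, w, h)
        return logits, boxes
```

The extra class matters when the number of predictions L exceeds the number of objects in an image: surplus slots need a target to be assigned to.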