Schwertlilien
As a recoder: notes and ideas.

Mon Nov 24 2025 00:00:00 GMT+0800 (China Standard Time)

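Thoughts on the next steps for the dataset:

Because this is a segmentation dataset, I think follow-up work should also derive its outputs from the segmentation annotations. As mentioned before, on the method side the plan is to pursue downstream tasks: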

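  • Nutrition assessment: building on more precise segmentation localization and boundaries, explore combining 3D reconstruction techniques to estimate the actual volume of a dish from 2D images. By introducing prior knowledge such as density and ingredient composition, build a quantitative model that maps visual information to nutritional content (calories, carbohydrates, protein, fat, etc.).
  • Visual question answering (VQA): analyze or count the ingredients present in an image, possibly by plugging in a large model? This can help nutrition assessment reach more accurate results. Here the segmentation model only serves to improve recognition accuracy. Fusing the segmentation output with a large language model (LLM) enables fine-grained understanding of, and interactive Q&A about, the ingredients in a dish image. For example, the system could answer questions such as "Which ingredients are in the image?", "How many pieces of potato are there?", or "What is the meat-to-vegetable ratio?", which supports the nutrition assessment task and makes the system more interpretable and practical.
  • Image generation and detection: photo enhancement, text-to-image generation, and distinguishing generated images from real ones. This could serve as a data engine that dynamically supplements the dataset.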
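Once a volume estimate exists, the nutrition assessment idea above reduces to a short chain: volume × density gives mass, and mass × per-100 g composition gives nutrient totals. The following is a minimal Python sketch, assuming the 3D reconstruction step has already produced a per-ingredient volume in millilitres. `FoodPrior`, `estimate_nutrition`, `estimate_dish`, and every number below are hypothetical placeholders, not values from a real food composition table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FoodPrior:
    """Hypothetical per-ingredient prior knowledge (density and composition)."""
    density_g_per_ml: float
    kcal_per_100g: float
    carbs_g_per_100g: float
    protein_g_per_100g: float
    fat_g_per_100g: float


def estimate_nutrition(volume_ml: float, prior: FoodPrior) -> dict:
    """Convert an estimated volume into mass and nutrient totals."""
    mass_g = volume_ml * prior.density_g_per_ml
    scale = mass_g / 100.0  # composition values are given per 100 g
    return {
        "mass_g": mass_g,
        "kcal": scale * prior.kcal_per_100g,
        "carbs_g": scale * prior.carbs_g_per_100g,
        "protein_g": scale * prior.protein_g_per_100g,
        "fat_g": scale * prior.fat_g_per_100g,
    }


def estimate_dish(volumes_ml: dict, priors: dict) -> dict:
    """Sum the per-ingredient estimates over every segmented ingredient."""
    totals: dict = {}
    for name, volume_ml in volumes_ml.items():
        for key, value in estimate_nutrition(volume_ml, priors[name]).items():
            totals[key] = totals.get(key, 0.0) + value
    return totals


# Placeholder numbers for illustration only, NOT real nutrition data.
PRIORS = {
    "potato": FoodPrior(1.0, 80.0, 18.0, 2.0, 0.0),
    "pork": FoodPrior(1.0, 250.0, 0.0, 20.0, 18.0),
}
dish = estimate_dish({"potato": 150.0, "pork": 100.0}, PRIORS)
print(dish)
```

In this framing the segmentation model's job is to make the per-ingredient volume as accurate as possible. The priors are a lookup problem that can be improved independently.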

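On the dataset side:

  • Expand the dataset: build a comprehensive segmentation dataset that covers all or most Chinese dining scenarios, most dishes, and most ingredients, which can then be used to fine-tune large models.

Taken together, combining the two: a comprehensive large-scale dataset plus a food-specific large model can support the downstream tasks and yield better results than current general-purpose large models.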
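For the VQA-style questions listed among the downstream tasks (how many pieces, meat-to-vegetable ratio), part of the answer can be read directly off the segmentation output before any large model is involved. The sketch below assumes a semantic mask stored as a 2D list of integer class ids. The class ids, `count_pieces`, and `area_ratio` are hypothetical names chosen for illustration. It counts 4-connected regions, so touching pieces merge into one, and separating those would require instance-level annotations.

```python
from collections import deque


def count_pieces(mask, class_id):
    """Count 4-connected regions of `class_id` in a 2D label mask."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    pieces = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != class_id or seen[r][c]:
                continue
            pieces += 1  # found an unvisited pixel: start a new region
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of this region
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and mask[ny][nx] == class_id):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return pieces


def area_ratio(mask, group_a, group_b):
    """Pixel-area ratio of classes in `group_a` to classes in `group_b`."""
    area_a = sum(1 for row in mask for v in row if v in group_a)
    area_b = sum(1 for row in mask for v in row if v in group_b)
    return area_a / area_b if area_b else None


# Hypothetical class ids and a toy mask.
BACKGROUND, POTATO, PORK = 0, 1, 2
toy_mask = [
    [1, 1, 0, 2, 2],
    [1, 0, 0, 2, 2],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 2],
]
facts = {
    "potato_pieces": count_pieces(toy_mask, POTATO),
    "pork_pieces": count_pieces(toy_mask, PORK),
    "meat_to_veg_area": area_ratio(toy_mask, {PORK}, {POTATO}),
}
print(facts)
```

Structured facts like these could be passed to an LLM as context, so that its answers are grounded in the mask instead of being guessed from pixels. Note that the ratio is by image area, not by mass.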

graph TD
%% Style settings
classDef hsi fill:#e3f2fd,stroke:#1565c0,stroke-width:2px;
classDef other fill:#fff3e0,stroke:#e65100,stroke-width:2px;
classDef merge fill:#e8f5e9,stroke:#2e7d32,stroke-width:2px;

subgraph "Region 1: HSI branch (the long chain at the upper right of the figure)"
H1[hsi_encoder.spectral_conv.0]
H2[hsi_encoder.spectral_conv.4]
H3[hsi_encoder.spectral_conv.12]
end

subgraph "Region 2: Other branches (on both sides of the middle of the figure)"
R[rgb_encoder]
F[fluo_proj]
end

subgraph "Region 3: Fusion (the nodes in the middle of the figure)"
Cat((CatBackward))
Fusion[fusion.weight]
end

subgraph "Region 4: Regression head (at the very bottom of the figure)"
Reg1[regressor.0]
Reg2[regressor.4]
Reg3[regressor.8]
Out((Final output))
end

H1 --> H2 --> H3 --> Cat
R --> Cat
F --> Cat

Cat --> Fusion --> Reg1 --> Reg2 --> Reg3 --> Out

class H1,H2,H3 hsi;
class R,F other;
class Cat,Out merge;

Trainv9:

  Loss: 0.0152
MAE: 152280.8594, RMSE: 819617.6535
R²: 0.2345, MAPE: 30.21%
Learning rate: 0.000049

[Epoch 67/200] Testing:
Loss: 0.0237
MAE: 164034.4688, RMSE: 1002628.9268
R²: 0.1086, MAPE: 25.57%
Epoch time: 198.11s
[Epoch end] Allocated: 90.75 MB, Reserved: 2322.00 MB
--------------------------------------------------------------------------------
Epoch 68/200: 100%|█████████████████████████████████████████| 177/177 [02:37<00:00, 1.13it/s, train_loss=0.00591]
Evaluating: 100%|███████████████████████████████████████████████████████████████| 178/178 [00:41<00:00, 4.29it/s]

[Epoch 68/200] Training:
Loss: 0.0158
MAE: 157461.2188, RMSE: 836186.2535
R²: 0.2032, MAPE: 30.45%
Learning rate: 0.000049

[Epoch 68/200] Testing:
Loss: 0.0184
MAE: 152916.9688, RMSE: 982551.3786
R²: 0.1440, MAPE: 23.75%
Epoch time: 198.83s
[Epoch end] Allocated: 90.75 MB, Reserved: 2322.00 MB
--------------------------------------------------------------------------------
Epoch 69/200: 100%|██████████████████████████████████████████| 177/177 [02:39<00:00, 1.11it/s, train_loss=0.0272]
Evaluating: 100%|███████████████████████████████████████████████████████████████| 178/178 [00:40<00:00, 4.40it/s]

[Epoch 69/200] Training:
Loss: 0.0158
MAE: 162229.7500, RMSE: 814155.9914
R²: 0.2446, MAPE: 30.78%
Learning rate: 0.000048

[Epoch 69/200] Testing:
Loss: 0.0185
MAE: 149214.3750, RMSE: 975006.2056
R²: 0.1571, MAPE: 33.34%
Epoch time: 200.45s
[Epoch end] Allocated: 90.75 MB, Reserved: 2322.00 MB
--------------------------------------------------------------------------------
Epoch 70/200: 100%|██████████████████████████████████████████| 177/177 [02:36<00:00, 1.13it/s, train_loss=0.0291]
Evaluating: 100%|███████████████████████████████████████████████████████████████| 178/178 [00:40<00:00, 4.35it/s]

[Epoch 70/200] Training:
Loss: 0.0149
MAE: 150820.3438, RMSE: 835482.0722
R²: 0.2045, MAPE: 29.44%
Learning rate: 0.000048

[Epoch 70/200] Testing:
Loss: 0.0171
MAE: 155585.0156, RMSE: 978441.2433
R²: 0.1511, MAPE: 26.28%
Epoch time: 197.00s
[Epoch end] Allocated: 90.75 MB, Reserved: 2322.00 MB
--------------------------------------------------------------------------------
Epoch 71/200: 100%|███████████████████████████████████████████| 177/177 [02:37<00:00, 1.13it/s, train_loss=0.013]
Evaluating: 100%|███████████████████████████████████████████████████████████████| 178/178 [00:40<00:00, 4.39it/s]

[Epoch 71/200] Training:
Loss: 0.0159
MAE: 155937.4844, RMSE: 849066.0192
R²: 0.1785, MAPE: 29.84%
Learning rate: 0.000048

[Epoch 71/200] Testing:
Loss: 0.0174
MAE: 158067.4062, RMSE: 971696.0136
R²: 0.1628, MAPE: 27.29%
Epoch time: 197.61s
[Epoch end] Allocated: 90.75 MB, Reserved: 2322.00 MB
--------------------------------------------------------------------------------
Epoch 72/200: 100%|█████████████████████████████████████████| 177/177 [02:39<00:00, 1.11it/s, train_loss=0.00188]
Evaluating: 100%|███████████████████████████████████████████████████████████████| 178/178 [00:41<00:00, 4.29it/s]

[Epoch 72/200] Training:
Loss: 0.0139
MAE: 150853.8281, RMSE: 817690.9701
R²: 0.2381, MAPE: 28.62%
Learning rate: 0.000047

[Epoch 72/200] Testing:
Loss: 0.0178
MAE: 150282.4062, RMSE: 982290.0139
R²: 0.1444, MAPE: 23.90%
Epoch time: 200.96s
[Epoch end] Allocated: 90.75 MB, Reserved: 2322.00 MB
--------------------------------------------------------------------------------
Epoch 73/200: 100%|█████████████████████████████████████████| 177/177 [02:37<00:00, 1.13it/s, train_loss=0.00202]
Evaluating: 100%|███████████████████████████████████████████████████████████████| 178/178 [00:41<00:00, 4.26it/s]

[Epoch 73/200] Training:
Loss: 0.0145
MAE: 151134.2656, RMSE: 797966.7833
R²: 0.2744, MAPE: 29.58%
Learning rate: 0.000047

[Epoch 73/200] Testing:
Loss: 0.0185
MAE: 151976.2031, RMSE: 981240.0188
R²: 0.1462, MAPE: 26.04%
Epoch time: 199.02s
[Epoch end] Allocated: 90.75 MB, Reserved: 2322.00 MB
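
The four metrics reported in the log have standard definitions, sketched below in plain Python for reference. The training script's exact formulas are not shown here, so this is an assumption about what it computes, and `regression_metrics` and its `eps` guard are hypothetical. Two readings follow from the definitions: RMSE is never smaller than MAE, and an RMSE several times the MAE (as in the log) usually means a few very large errors dominate. A loss near 0.015 alongside an MAE near 1.5e5 also suggests that the loss is computed on scaled or transformed targets while the metrics are on the original scale, though the log alone does not confirm this.

```python
import math


def regression_metrics(y_true, y_pred, eps=1e-8):
    """Standard MAE, RMSE, R² and MAPE (in percent) for two equal-length lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # R² is undefined when all targets are identical.
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
        # eps guards against division by zero for targets equal to 0.
        "mape": 100.0 * sum(abs(e) / max(abs(t), eps)
                            for e, t in zip(errors, y_true)) / n,
    }


# Toy values for illustration only, unrelated to the run above.
metrics = regression_metrics([100.0, 200.0, 300.0, 400.0],
                             [110.0, 190.0, 330.0, 380.0])
print(metrics)
```

Because MAPE weights every sample by its relative error while R² is driven by squared absolute error, the two can move in opposite directions between epochs, as in the test results for epochs 68 and 69.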