2023.3.28

U-GAT-IT

Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization (AdaLIN) for Image-to-Image Translation

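The attention module: guides the model to focus on the regions that distinguish the source domain from the target domain (attention map).
New AdaLIN: an adaptive layer-instance normalization function that controls the amount of change in shape and texture (its parameters are learned depending on the dataset).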

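G (attention map): focuses attention on the regions that specifically distinguish the two domains (source domain & target domain).
D (attention map): helps fine-tuning by focusing on the difference between fake and real images within the target domain.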

==The choice of normalization function strongly affects the quality of the transformed results across datasets that differ in the amount of shape and texture change.==

Inspired by Batch-Instance Normalization (BIN), the authors propose AdaLIN (Adaptive Layer-Instance Normalization).

Unsupervised generative network

Goal: train a function $G_{s\rightarrow t}$ that maps images from the source domain $X_s$ to the target domain $X_t$ (==using only unpaired samples==)

Frameworks：

Generators：$G_{s\rightarrow t}$，$G_{t\rightarrow s}$
Discriminators：$D_s$，$D_t$

Model($G_{s\rightarrow t}$)

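$x\in \{X_s,X_t\}$: a sample from the source or target domain.

$G_{s\rightarrow t}=E_s+G_t+\eta_s$: $G_{s\rightarrow t}$ consists of an encoder $E_s$, a decoder $G_t$, and an auxiliary classifier $\eta_s$.

$\eta_s(x)$: the probability that $x$ comes from $X_s$ (source domain).

$E_s^k(x)$: the $k$-th activation map of the encoder.

$E_s^{k_{ij}}(x)$: the value of the $k$-th encoder activation map $E_s^k(x)$ at position $(i,j)$.

$w_s^k$: the weight of the $k$-th feature map for the source domain. ==Learned using global average pooling & global max pooling (following CAM)==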

$\color{red}i.e.\color{black}\ \eta_s(x)=\sigma(\sum_k w_s^k\sum_{ij}E_s^{k_{ij}}(x))$

$a_s(x)$: domain-specific attention feature map ($n$ is the number of encoder feature maps)

$a_s(x)=w_s*E_s(x)=\{w_s^k*E_s^k|1\le k\le n\}$

==Translation Model：$G_{s\rightarrow t}=G_t(a_s(x))$==
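
The two formulas above for $\eta_s(x)$ and $a_s(x)$ can be sketched in plain C++. This is only an illustration of the arithmetic, not the authors' implementation: the names `FeatureMap`, `auxiliaryClassifier` and `attentionFeatureMap` are made up for this note, and each feature map is assumed to be stored as a flat array of $H\times W$ values.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// One encoder feature map E^k, flattened to H*W values (assumed layout).
using FeatureMap = std::vector<double>;

// eta_s(x) = sigmoid( sum_k w_k * sum_ij E^k_ij(x) )
double auxiliaryClassifier(const std::vector<FeatureMap>& E,
                           const std::vector<double>& w) {
    double logit = 0.0;
    for (std::size_t k = 0; k < E.size(); ++k) {
        double pooled = 0.0;               // sum over all positions (i, j)
        for (double v : E[k]) pooled += v;
        logit += w[k] * pooled;
    }
    return 1.0 / (1.0 + std::exp(-logit)); // sigmoid
}

// a_s(x) = { w_k * E^k(x) | 1 <= k <= n }: scale every map by its weight.
std::vector<FeatureMap> attentionFeatureMap(const std::vector<FeatureMap>& E,
                                            const std::vector<double>& w) {
    std::vector<FeatureMap> a = E;
    for (std::size_t k = 0; k < a.size(); ++k)
        for (double& v : a[k]) v *= w[k];
    return a;
}
```

The decoder $G_t$ would then take the result of `attentionFeatureMap` as its input.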

AdaLIN

Its parameters $\gamma$ and $\beta$ are dynamically computed by a fully connected layer from the attention map.

$AdaLIN(a,\gamma,\beta)=\gamma\ \cdot \ (\rho\ \cdot \hat{a_I}+(1-\rho)\ \cdot\ \hat{a_L})+\beta\\ \hat{a_I}=\frac{a-\mu_I}{\sqrt{\sigma_I^2+\epsilon}}\\ \hat{a_L}=\frac{a-\mu_L}{\sqrt{\sigma_L^2+\epsilon}}\\ \rho\leftarrow clip_{[0,1]}(\rho-\tau \Delta \rho),\rho\in[0,1]$

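$\mu_I,\mu_L$: channel-wise mean, layer-wise mean
$\sigma_I,\sigma_L$: channel-wise & layer-wise standard deviation
$\tau$: learning rate
$\Delta \rho$: the parameter update vector (e.g., the gradient) determined by the optimizer

The value of $\rho$ is constrained to [0,1] simply by imposing bounds at the parameter update step.

G adjusts $\rho$: it moves close to 1 where instance normalization is important and close to 0 where layer normalization (LN) is important.

$\rho$ is initialized to 1 in the residual blocks of the decoder and to 0 in the up-sampling blocks of the decoder.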
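
To make the AdaLIN formula concrete, here is a minimal C++ sketch of the forward computation and the clipped $\rho$ update. Assumptions that are mine and not from the paper: the activation is stored as `[channel][H*W]`, the input is non-empty, $\rho$ is a single scalar, the variance is the population variance, and the names `Channels`, `adaLIN` and `updateRho` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Channels = std::vector<std::vector<double>>;  // [channel][H*W], assumed layout

// AdaLIN(a, gamma, beta) = gamma * (rho * a_I + (1 - rho) * a_L) + beta
Channels adaLIN(const Channels& a, const std::vector<double>& gamma,
                const std::vector<double>& beta, double rho, double eps = 1e-5) {
    // Layer-wise statistics: over every value of every channel.
    double sumL = 0.0;
    std::size_t countL = 0;
    for (const auto& ch : a) {
        for (double v : ch) sumL += v;
        countL += ch.size();
    }
    const double muL = sumL / countL;
    double varL = 0.0;
    for (const auto& ch : a)
        for (double v : ch) varL += (v - muL) * (v - muL);
    varL /= countL;

    Channels out(a.size());
    for (std::size_t c = 0; c < a.size(); ++c) {
        // Instance (channel-wise) statistics: over one channel only.
        double muI = 0.0, varI = 0.0;
        for (double v : a[c]) muI += v;
        muI /= a[c].size();
        for (double v : a[c]) varI += (v - muI) * (v - muI);
        varI /= a[c].size();

        for (double v : a[c]) {
            const double aI = (v - muI) / std::sqrt(varI + eps);
            const double aL = (v - muL) / std::sqrt(varL + eps);
            out[c].push_back(gamma[c] * (rho * aI + (1.0 - rho) * aL) + beta[c]);
        }
    }
    return out;
}

// rho <- clip_[0,1](rho - tau * deltaRho)
double updateRho(double rho, double tau, double deltaRho) {
    return std::min(1.0, std::max(0.0, rho - tau * deltaRho));
}
```

With `rho = 1` the result is pure instance normalization, with `rho = 0` pure layer normalization, which matches the two initial values mentioned above.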

Discriminator

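$x\in \{X_t,G_{s\rightarrow t}(X_s)\}$: a sample from the target domain or from the translated source domain.

$D_t=E_{D_t}+C_{D_t}+\eta_{D_t}$: an encoder, a classifier, and an auxiliary classifier. Both $\eta_{D_t}(x)$ and $D_t(x)$ are trained to tell whether $x$ comes from $X_t$ (target domain) or from $G_{s\rightarrow t}(X_s)$ (source images translated to the target domain, i.e. fake images).

$a_{D_t}(x)=w_{D_t}*E_{D_t}(x)$

==discriminator model：$D_t(x)=C_{D_t}(a_{D_t}(x))$==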

Loss function

It consists of the following four loss functions:

Adversarial loss: matches the distribution of the translated images to the distribution of the target images.
Cycle loss: to alleviate the mode collapse problem, a cycle consistency constraint is applied, so that an $x$ translated $X_s\rightarrow X_t$ and then back $X_t\rightarrow X_s$ is recovered in its original domain.
Identity loss: ensures that the color distributions of the input and output images are similar. Given an image $x \in X_t$, after the translation of $x$ using $G_{s\to t}$, the image should not change.
CAM loss: given a sample from $\{X_s,X_t\}$, $G_{s\to t}$ and $D_t$ get to know where they need to improve or what makes the most difference between two domains in the current state.

$L_{lsgan}^{s\to t}=(E_{x\sim X_t}[(D_t(x))^2]+E_{x\sim X_s}[(1-D_t(G_{s\to t}(x)))^2])\\ L_{cycle}^{s\to t}=E_{x\sim X_s}[|x-G_{t\to s}(G_{s\to t}(x))|_1]\\ L_{identity}^{s\to t}=E_{x\sim X_t}[|x-G_{s\to t}(x)|_1]\\ L_{cam}^{s\to t}=-(E_{x\sim X_s}[\log(\eta_s(x))]+E_{x\sim X_t}[\log(1-\eta_s(x))])\\ L_{cam}^{D_t}=E_{x\sim X_t}[(\eta_{D_t}(x))^2]+E_{x\sim X_s}[(1-\eta_{D_t}(G_{s\to t}(x)))^2]$

They are then trained jointly:

$\min_{G_{s\to t},G_{t\to s},\eta_s,\eta_t}\ \max_{D_s,D_t,\eta_{D_s},\eta_{D_t}}\lambda_1 L_{lsgan}+\lambda_2 L_{cycle}+\lambda_3 L_{identity}+\lambda_4 L_{cam}\\ L_{lsgan}=L_{lsgan}^{s\to t}+L_{lsgan}^{t\to s}\ \ (same\ for\ L_{cycle},L_{identity},L_{cam})\\ \lambda_1=1,\lambda_2=10,\lambda_3=10,\lambda_4=1000$
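
As a rough numeric illustration of how these terms combine, the sketch below replaces each expectation with a sample mean over a small batch. Everything here is a simplification of my own: `lsganTerm`, `l1Distance`, `LossWeights` and `totalLoss` are hypothetical names, discriminator outputs are plain numbers, and images are flat arrays of pixel values.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// E_{x~X_t}[D_t(x)^2] + E_{x~X_s}[(1 - D_t(G(x)))^2], expectations as sample means.
// dReal: discriminator outputs on real target images; dFake: outputs on translated images.
double lsganTerm(const std::vector<double>& dReal, const std::vector<double>& dFake) {
    double real = 0.0, fake = 0.0;
    for (double d : dReal) real += d * d;
    for (double d : dFake) fake += (1.0 - d) * (1.0 - d);
    return real / dReal.size() + fake / dFake.size();
}

// |x - y|_1 for two images of equal size; used by both the cycle and identity losses.
double l1Distance(const std::vector<double>& x, const std::vector<double>& y) {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) sum += std::fabs(x[i] - y[i]);
    return sum;
}

// The four lambda weights of the joint objective.
struct LossWeights {
    double adv, cycle, identity, cam;
};

// lambda_1 * L_lsgan + lambda_2 * L_cycle + lambda_3 * L_identity + lambda_4 * L_cam
double totalLoss(const LossWeights& w, double lsgan, double cycle,
                 double identity, double cam) {
    return w.adv * lsgan + w.cycle * cycle + w.identity * identity + w.cam * cam;
}
```

The weights are passed in rather than hard-coded, so the same function can be used to see how strongly each term dominates the objective.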

luogugu

P4017

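Given a food web, compute the number of maximal food chains in it.

(Here a "maximal food chain" is a food chain in the biological sense: its leftmost end is a producer that does not prey on any other organism, and its rightmost end is a consumer that is not preyed on by any other organism.)

Delia is in a big hurry, so you only have $1$ second.

Since the result may be very large, output only the total modulo $80112002$.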
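
One common way to approach this is a path-counting DP along a topological order. The sketch below assumes (my assumption, the input format is not written down in these notes) that species are numbered `1..n`, that an edge `(a, b)` means "a is eaten by b", and that the web has no cycles. `countFoodChains` is a name invented for this note.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Counts paths from every producer (in-degree 0) to every top consumer
// (out-degree 0), modulo 80112002, using Kahn's topological sort.
// Assumption: an isolated species would be counted as one chain here.
int countFoodChains(int n, const std::vector<std::pair<int, int>>& edges) {
    const int MOD = 80112002;
    std::vector<std::vector<int>> adj(n + 1);
    std::vector<int> indeg(n + 1, 0), ways(n + 1, 0);
    for (const auto& e : edges) {
        adj[e.first].push_back(e.second);  // e.first is eaten by e.second
        ++indeg[e.second];
    }

    std::queue<int> q;
    for (int v = 1; v <= n; ++v)
        if (indeg[v] == 0) {               // producer: starts exactly one chain
            ways[v] = 1;
            q.push(v);
        }

    int total = 0;
    while (!q.empty()) {
        const int u = q.front();
        q.pop();
        if (adj[u].empty())                // top consumer: every chain ending here is maximal
            total = (total + ways[u]) % MOD;
        for (int v : adj[u]) {
            ways[v] = (ways[v] + ways[u]) % MOD;
            if (--indeg[v] == 0) q.push(v);
        }
    }
    return total;
}
```

Each vertex and edge is processed once, so the running time is linear in the size of the food web, which should fit comfortably in the 1 second limit.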

P1177 - Quicksort

#include <bits/stdc++.h>
using namespace std;

int n, num;
vector<int> vec;

int Partition(int l, int h) {
	int pivot = vec[l];
	while (l < h) {
		while (l < h && pivot <= vec[h]) h--;
		swap(vec[l], vec[h]);
		while (l < h && pivot >= vec[l]) l++;
		swap(vec[l], vec[h]);
	}
	return l;// final pivot index: l == h, left side <= pivot <= right side; the sort is indeed not stable
}

void QuickSort(int l, int h) {
	if (l < h) {
		int pivot = Partition(l, h);
		QuickSort(l, pivot - 1);
		QuickSort(pivot + 1, h);
	}
}

int main() {
	cin >> n;
	for (int i = 1; i <= n; i++) {
		cin >> num;
		vec.push_back(num);
	}

	QuickSort(0, n - 1);
	for (int i = 0; i < n; i++)
		cout << vec[i] << " ";
}

Ha, three test points originally got TLE. After turning on O2 it got AC. Nice.
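
A likely reason for the time-outs (a guess, I did not check which tests failed): always taking `vec[l]` as the pivot degrades to $O(n^2)$ on already sorted input or on many equal values. A sketch of one standard remedy, a random pivot combined with a three-way partition, is below. `randomizedQuickSort` is a name made up for this note and this is not the template actually submitted.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Sorts v[l..h] in place with a random pivot and a three-way (Dijkstra) partition.
void randomizedQuickSort(std::vector<int>& v, int l, int h, std::mt19937& rng) {
    while (l < h) {
        // Move a randomly chosen element to the front and use it as the pivot.
        std::uniform_int_distribution<int> pick(l, h);
        std::swap(v[l], v[pick(rng)]);
        const int pivot = v[l];

        // Invariant: v[l..lt-1] < pivot, v[lt..i-1] == pivot, v[gt+1..h] > pivot.
        int lt = l, i = l + 1, gt = h;
        while (i <= gt) {
            if (v[i] < pivot) std::swap(v[lt++], v[i++]);
            else if (v[i] > pivot) std::swap(v[i], v[gt--]);
            else ++i;
        }

        // Recurse into the smaller side, loop on the larger one to bound stack depth.
        if (lt - l < h - gt) {
            randomizedQuickSort(v, l, lt - 1, rng);
            l = gt + 1;
        } else {
            randomizedQuickSort(v, gt + 1, h, rng);
            h = lt - 1;
        }
    }
}
```

A call would look like `std::mt19937 rng(12345); randomizedQuickSort(vec, 0, n - 1, rng);`, where the seed value is arbitrary.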

Really, though: there are all sorts of ways. So, switching to a different template:

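#include<iostream>
#include<utility>//std::swap
using namespace std;
int n,a[1000001];
void qsort(int l,int r){//divide and conquer
    int mid=a[(l+r)/2];//middle element, used as the pivot
    int i=l,j=r;
    do{
        while(a[i]<mid) i++;//skip left elements smaller than the pivot; stop at one that is >= pivot
        while(a[j]>mid) j--;//skip right elements larger than the pivot; stop at one that is <= pivot
        if(i<=j){//a pair that violates the order (small on the left, large on the right)
            swap(a[i],a[j]);//swap them
            i++;//--|these look optional but are required: without them the loop
            j--;//--|never ends when a[i]==a[j]==mid
        }
    }while(i<=j);//note: the = is needed here
    if(l<j) qsort(l,j);//recurse into the left part
    if(i<r) qsort(i,r);//recurse into the right part
}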

int main(){
    cin>>n;
    for(int i=1;i<=n;i++) cin>>a[i];
    qsort(1,n);
    for(int i=1;i<=n;i++) cout<<a[i]<<" ";
}

Come to think of it, have I really never come across bucket sort?

Not quite, I had just forgotten it again: it is like a hash table built from vectors, where each vec[i] is a bucket.
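
A minimal sketch of that idea, so I do not forget it again. The mapping from value to bucket index and the default of 16 buckets are choices made for this note, and `bucketSort` is a hypothetical name. Each bucket is simply sorted with `std::sort`.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Distributes values into buckets by range, sorts each bucket, then concatenates.
std::vector<int> bucketSort(const std::vector<int>& values, std::size_t bucketCount = 16) {
    if (values.empty()) return values;
    if (bucketCount == 0) bucketCount = 1;

    const auto mm = std::minmax_element(values.begin(), values.end());
    const long long lo = *mm.first, hi = *mm.second;
    const long long range = hi - lo + 1;   // number of distinct possible values

    // vec[i] is one bucket; a larger value never lands in a lower bucket.
    std::vector<std::vector<int>> buckets(bucketCount);
    for (int v : values) {
        const std::size_t idx = static_cast<std::size_t>(
            (v - lo) * static_cast<long long>(bucketCount) / range);
        buckets[idx].push_back(v);
    }

    std::vector<int> sorted;
    sorted.reserve(values.size());
    for (auto& b : buckets) {
        std::sort(b.begin(), b.end());
        sorted.insert(sorted.end(), b.begin(), b.end());
    }
    return sorted;
}
```

Because the bucket index grows monotonically with the value, concatenating the sorted buckets in order gives a fully sorted result.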