# Exploration Experiments

# Deep Learning Lab Report - CIFAR-10

## 0
Is a larger batch_size always better?

Answer: No.
With different batch sizes, the accuracy on the test set is:
| batch_size | accuracy |
| :--------: | :------: |
| 16 | 65% |
| 32 | 63% |
| 64 | 56% |
| 128 | 47% |

This shows that a larger batch_size is not necessarily better.

Possible reasons:
- A large batch makes the model more likely to settle into a local optimum rather than a better one; the gradient noise from small batches effectively acts like annealing.
- A large batch reduces the number of parameter updates per epoch, so convergence is slower for a fixed number of epochs.

In addition, a very large batch may not fit into GPU memory at all, making training impossible.
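
As a rough illustration of how the comparison was run, here is a minimal sketch: only the DataLoader's batch_size changes between runs, while the model, optimizer, and epoch count stay fixed. The `train_model` helper is hypothetical and stands in for the training loop used in this lab.

```python
import torch
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)

for batch_size in [16, 32, 64, 128]:
    # Rebuild only the DataLoader; everything else is identical across runs.
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                              shuffle=True, num_workers=2)
    # train_model(trainloader)  # hypothetical training loop, not shown here
```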


## 1
Does adding RandomHorizontalFlip to the training-set transform improve the results?

Answer:
We added RandomHorizontalFlip as a data augmentation. With this change the test-set accuracy is 56%, with no clear difference from before.
The average training loss in the last epoch is 1.226, slightly higher than the 1.209 of the original version. This is expected: augmentation reduces overfitting, so the training loss rises.
In practice this augmentation is fairly weak, so its effect is not noticeable.
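
The change amounts to one extra line at the front of the training transform; a minimal sketch is shown below, assuming the baseline pipeline is just ToTensor plus the usual 0.5/0.5 normalization.

```python
import torchvision.transforms as transforms

# Training transform with a horizontal flip added before the baseline pipeline.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),  # flip each image left-right with probability 0.5
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```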


## 2
Switch to a different optimizer to improve the results.

Answer: We can use Adam to substantially improve training efficiency.
```python
self.optimizer = torch.optim.Adam(self.model.parameters(), lr=Config.LEARNING_RATE)
```
With Adam, convergence is much faster and the final accuracy is also clearly higher.

Per-epoch test accuracy for Adam versus the original SGD:
| epoch | SGD | Adam |
| :---: | :---: | :---: |
| 1 | 16% | 48% |
| 2 | 29% | 54% |
| 3 | 36% | 57% |
| 4 | 40% | 60% |
| 5 | 43% | 62% |
| 6 | 46% | 62% |
| 7 | 48% | 63% |
| 8 | 50% | 64% |
| 9 | 51% | 63% |
| 10 | 53% | 65% |
| 11 | 55% | 64% |
| 12 | 56% | 65% |

This result is expected: Adam has become the default optimizer for most deep learning tasks. It rescales the first-moment (momentum) update by a running estimate of the second moment of the gradients, so it keeps converging well even when the loss landscape is poorly conditioned.

## 3
Keeping the number of epochs fixed, can adding a learning-rate scheduler improve the results?

Answer:
We use the simplest scheduler, StepLR, configured to halve the learning rate every 5 epochs with an initial learning rate of 0.002, and test it with both SGD and Adam:
| Group | SGD-Accuracy | SGD-Loss | Adam-Accuracy | Adam-Loss |
| :----------: | :----------: | :------: | :-----------: | :-------: |
| No-Scheduler | 56% | 1.211 | 65% | 0.779 |
| Step-LR | 60% | 1.068 | 65% | 0.668 |

With StepLR, SGD improves noticeably while Adam's accuracy barely changes, but the final training loss drops clearly for both optimizers. This shows the learning-rate schedule works, matching the expectation that lowering the learning rate late in training helps convergence; Adam's flat accuracy despite the lower loss is probably a sign of overfitting.
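
A minimal sketch of the scheduler setup described above; `Net()` is the baseline network from this lab (assumed importable here), and the epoch count of 12 matches the table in question 2.

```python
import torch

model = Net()  # baseline network from this lab, assumed defined elsewhere
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
# Halve the learning rate every 5 epochs, as described above.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(12):
    # train_one_epoch(model, optimizer)  # training loop omitted
    scheduler.step()  # step the scheduler once per epoch
```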

## 4
Based on Net(), create Net1() with three batch-normalization layers added, and show the test results.

Answer:
After adding a BN layer after each of the two convolutional layers and after the first linear layer, the test-set accuracy is 65%, a 9% improvement over the baseline, so BatchNorm helps very noticeably.
BatchNorm helps mainly because:
- It keeps the distribution of each layer's activations stable, which makes the model easier to train and less prone to vanishing or exploding gradients.
- It keeps the hidden-layer outputs concentrated in the effective nonlinear region of common activation functions.

The code is as follows:
```python
class BnModel(nn.Module):  # Add Batch Normalization (Net1)
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.bn1 = nn.BatchNorm2d(6)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.bn2 = nn.BatchNorm2d(16)

        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.bn3 = nn.BatchNorm1d(120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.bn1(x)
        x = self.pool(F.relu(self.conv2(x)))
        x = self.bn2(x)
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = self.bn3(x)
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```

## 5
Based on Net(), create Net2() that uses Kaiming initialization for the convolutional and fully connected layers, and show the test results.

Answer:
With Kaiming initialization, the test-set accuracy is 57%, a 1% improvement over the baseline.

In fact, `nn.Linear`, and `nn.Conv2d` through its base class `_ConvNd`, already use Kaiming initialization by default; their `reset_parameters` methods both call the following:
```python
def reset_parameters(self) -> None:
    # Setting a=sqrt(5) in kaiming_uniform is the same as initializing with
    # uniform(-1/sqrt(in_features), 1/sqrt(in_features)). For details, see
    # https://github.com/pytorch/pytorch/issues/57109
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
        init.uniform_(self.bias, -bound, bound)
```

The small improvement here probably comes from using different Kaiming parameters: our explicit call uses a=0 with nonlinearity='relu', rather than the default a=sqrt(5).

The code for this network is as follows:
```python
class KaimingInitModel(nn.Module):  # Using Kaiming Initialization (Net2)
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)

        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

        torch.nn.init.kaiming_uniform_(self.conv1.weight, a=0, mode='fan_in', nonlinearity='relu')
        torch.nn.init.kaiming_uniform_(self.conv2.weight, a=0, mode='fan_in', nonlinearity='relu')
        torch.nn.init.kaiming_uniform_(self.fc1.weight, a=0, mode='fan_in', nonlinearity='relu')
        torch.nn.init.kaiming_uniform_(self.fc2.weight, a=0, mode='fan_in', nonlinearity='relu')
        torch.nn.init.kaiming_uniform_(self.fc3.weight, a=0, mode='fan_in', nonlinearity='relu')

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```

## 6
Based on Net(), create Net3() with twice the number of channels, and show the test results.

Answer:
With the channel counts doubled, the test-set accuracy is 60%, a 4% improvement over the baseline, so widening the network helps somewhat. The main reason is that more channels means more convolution kernels, so the network can extract a richer set of features.

The code is as follows:
```python
class DoubleChannelModel(nn.Module):  # Doubling the Channels (Net3)
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 12, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(12, 32, 5)

        self.fc1 = nn.Linear(32 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```


## 7
Without changing the basic structure of Net() (the same number of convolutional and fully connected layers) or the number of training epochs, what is the best result you can get?

Answer:
We made the following changes:
- Use the Adam optimizer with a weight decay of 0.0001 and a `CosineAnnealingLR` cosine-annealing scheduler, with an initial learning rate of 0.002 (a sketch of this setup follows the transform code below).
- Add three BatchNorm layers as in the BN network above, plus three dropout layers with p=0.1, each placed immediately after a BatchNorm layer.
- Increase the channel counts of the two convolutional layers to 256 and 256.
- Apply heavy data augmentation, as follows (where AutoAugment() is taken from "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty", https://arxiv.org/abs/1912.02781):
```python
def get_transform(self):
    res = []
    res.append(transforms.RandomHorizontalFlip(p=0.5))
    res.extend([transforms.Pad(2, padding_mode='constant'),
                transforms.RandomCrop([32, 32])])
    res.append(transforms.RandomApply([AutoAugment()], p=0.6))
    res.append(transforms.ToTensor())
    res += [transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    return transforms.Compose(res)
```
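
A minimal sketch of the optimizer and scheduler setup from the first bullet above. `BetterBaselineModel` (Net4) is defined at the end of this section; the epoch count of 12 is an assumption carried over from the earlier table.

```python
import torch

model = BetterBaselineModel()  # Net4, defined below
# Adam with weight decay 0.0001 and an initial learning rate of 0.002.
optimizer = torch.optim.Adam(model.parameters(), lr=0.002, weight_decay=0.0001)
# Cosine annealing over the whole run (T_max = number of epochs, assumed 12 here).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=12)

for epoch in range(12):
    # train_one_epoch(model, optimizer)  # training loop omitted
    scheduler.step()  # anneal the learning rate once per epoch
```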

The final test-set accuracy is 83%, a 27% improvement over the baseline.

The model code is as follows:
```python
class BetterBaselineModel(nn.Module):  # Better Baseline Model (Net4)
    def __init__(self):
        super().__init__()
        channel1 = 256
        channel2 = 256
        self.conv1 = nn.Conv2d(3, channel1, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.bn1 = nn.BatchNorm2d(channel1)
        self.dropout1 = nn.Dropout(0.1)
        self.conv2 = nn.Conv2d(channel1, channel2, 5)
        self.bn2 = nn.BatchNorm2d(channel2)
        self.dropout2 = nn.Dropout(0.1)

        self.fc1 = nn.Linear(channel2 * 5 * 5, 120)
        self.bn3 = nn.BatchNorm1d(120)
        self.dropout3 = nn.Dropout(0.1)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.bn1(x)
        x = self.dropout1(x)
        x = self.pool(F.relu(self.conv2(x)))
        x = self.bn2(x)
        x = self.dropout2(x)
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = self.bn3(x)
        x = self.dropout3(x)
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```


## 8
Use ResNet18() and show the test results.

Answer:
Using ResNet-18 with all other settings kept from the previous question, the test-set accuracy is 83%, the same as before. Reaching this accuracy with relatively few parameters speaks to the strength of the architecture, likely because:
- The residual connections effectively alleviate the vanishing-gradient problem, making it easy to stack many layers.
- Learning a residual is easier for the network than learning the full mapping, since it does not have to re-learn an identity mapping on top of the new features.

The code is as follows:
```python
class ResNet(nn.Module):  # ResNet-18
    def __init__(self):
        super().__init__()
        self.model = torchvision.models.resnet18(pretrained=False)
        if Config.PRETRAINED:
            self.model.load_state_dict(torch.load(Config.RESNET_PRETRAINED_PATH))
            for param in self.model.parameters():
                param.requires_grad = False
        self.model.fc = nn.Linear(512, 10)
        self.model.fc.requires_grad_(True)

    def forward(self, x):
        x = self.model(x)
        return x
```