# Exploration Experiments

# Deep Learning Lab Report - CIFAR-10

## 0
Is a larger batch_size always better?

Answer: No.
With different batch sizes, the test-set accuracies are:
| batch_size | accuracy |
| :--------: | :------: |
| 16 | 65% |
| 32 | 63% |
| 64 | 56% |
| 128 | 47% |

This shows that a larger batch_size is not always better.

Possible reasons:
- A large batch makes the model more likely to settle into a local optimum rather than a better one; the gradient noise of a small batch effectively acts like annealing.
- A large batch reduces the number of parameter updates per epoch, so convergence is slower for a fixed epoch budget.

In addition, a very large batch may also exceed GPU memory and make training impossible.
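For context, a minimal sketch of how the batch size enters the pipeline; `Config.BATCH_SIZE`, `trainset`, and `testset` are placeholder names in the style of the rest of this report, not the repo's exact code:

```python
from torch.utils.data import DataLoader

# Only batch_size is varied between runs; everything else stays fixed.
train_loader = DataLoader(trainset, batch_size=Config.BATCH_SIZE,
                          shuffle=True, num_workers=2)
test_loader = DataLoader(testset, batch_size=Config.BATCH_SIZE,
                         shuffle=False, num_workers=2)
```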



## 1
Does adding RandomHorizontalFlip to the training-set transform improve the results?

Answer:
We added RandomHorizontalFlip augmentation to the training set. After this change the test-set accuracy is 56%, with no significant difference from the original.
The average training loss in the final epoch is 1.226, slightly higher than the original 1.209. This is expected: augmentation reduces overfitting, so the training loss goes up.
In practice this augmentation is fairly weak, so its effect is not pronounced.
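A minimal sketch of the change, assuming the baseline transform is the usual CIFAR-10 `ToTensor` + `Normalize` pipeline used elsewhere in this report:

```python
import torchvision.transforms as transforms

# Training-set transform with horizontal flipping added as the only new step.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),  # flip each image with probability 0.5
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```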



## 2
Switch to a different optimizer to improve the results.

Answer: We can use Adam to substantially improve training efficiency:
```python
self.optimizer = torch.optim.Adam(self.model.parameters(), lr=Config.LEARNING_RATE)
```
After training with it, convergence is much faster and the final accuracy is also considerably higher.

Per-epoch test accuracy of Adam compared with the original SGD:
| epoch | SGD | Adam |
| :---: | :---: | :---: |
| 1 | 16% | 48% |
| 2 | 29% | 54% |
| 3 | 36% | 57% |
| 4 | 40% | 60% |
| 5 | 43% | 62% |
| 6 | 46% | 62% |
| 7 | 48% | 63% |
| 8 | 50% | 64% |
| 9 | 51% | 63% |
| 10 | 53% | 65% |
| 11 | 55% | 64% |
| 12 | 56% | 65% |

This result is expected: Adam has become the de facto default optimizer for a wide range of deep learning tasks. It rescales each parameter's update by a bias-corrected estimate of the gradient's second moment, so it keeps converging well even when the loss landscape is badly conditioned.
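For reference, the Adam update rule (Kingma & Ba, 2015), where $g_t$ is the gradient, $\eta$ the learning rate, and the defaults are $\beta_1=0.9$, $\beta_2=0.999$, $\epsilon=10^{-8}$; the second-moment estimate $\hat v_t$ is what rescales each parameter's step:

```math
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, &
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, &
\hat v_t &= \frac{v_t}{1-\beta_2^t}, \\
\theta_t &= \theta_{t-1} - \frac{\eta\, \hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}
```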

## 3
Keeping the number of epochs unchanged, can adding a learning-rate scheduler improve the results?

Answer:
We use the simplest scheduler, StepLR: the learning rate is halved every 5 epochs, starting from an initial learning rate of 0.002. Testing it with both SGD and Adam gives:
| Group | SGD-Accuracy | SGD-Loss | Adam-Accuracy | Adam-Loss |
| :----------: | :----------: | :------: | :-----------: | :-------: |
| No-Scheduler | 56% | 1.211 | 65% | 0.779 |
| Step-LR | 60% | 1.068 | 65% | 0.668 |

With StepLR, SGD improves noticeably while Adam shows no clear change in accuracy, yet the final training loss drops for both optimizers. This shows that the learning-rate schedule works and matches the expectation that lowering the learning rate later in training helps convergence; the flat accuracy for Adam is likely due to overfitting.
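A minimal sketch of the scheduler setup with the numbers stated above; `train_one_epoch`, `Config.NUM_EPOCHS`, and the SGD momentum of 0.9 are placeholders/assumptions, not the repo's exact code:

```python
optimizer = torch.optim.SGD(model.parameters(), lr=0.002, momentum=0.9)
# Halve the learning rate every 5 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(Config.NUM_EPOCHS):
    train_one_epoch(model, optimizer)  # placeholder training loop
    scheduler.step()                   # advance the schedule once per epoch
```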


## 4
Based on Net(), create Net1() with three batch-normalization layers added, and report the test results.

Answer:
After adding BN layers after the two convolutional layers and the first linear layer, the test-set accuracy is 65%, a 9% improvement over the original, so BatchNorm gives a very significant boost.
BatchNorm helps mainly because:
- It stabilizes the distribution of each layer's activations, making the model easier to train and less prone to vanishing or exploding gradients.
- It keeps the hidden-layer outputs concentrated in the main non-linear region of typical activation functions.

The code is as follows:
```python
class BnModel(nn.Module):  # Add Batch Normalization (Net1)
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.bn1 = nn.BatchNorm2d(6)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.bn2 = nn.BatchNorm2d(16)

        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.bn3 = nn.BatchNorm1d(120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.bn1(x)
        x = self.pool(F.relu(self.conv2(x)))
        x = self.bn2(x)
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = self.bn3(x)
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```

## 5
Based on Net(), create Net2() that applies Kaiming initialization to the convolutional and fully connected layers, and report the test results.

Answer:
With Kaiming initialization added, the test-set accuracy is 57%, a 1% improvement over the original.

In fact, the default initialization of `nn.Linear`, and of `nn.Conv2d` via its parent class `nn._ConvNd`, already uses Kaiming initialization; both call the following function when constructed:
```python
def reset_parameters(self) -> None:
    # Setting a=sqrt(5) in kaiming_uniform is the same as initializing with
    # uniform(-1/sqrt(in_features), 1/sqrt(in_features)). For details, see
    # https://github.com/pytorch/pytorch/issues/57109
    init.kaiming_uniform_(self.weight, a=math.sqrt(5))
    if self.bias is not None:
        fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
        bound = 1 / math.sqrt(fan_in) if fan_in > 0 else 0
        init.uniform_(self.bias, -bound, bound)
```

The small improvement here is probably because our explicit Kaiming call uses different parameters (a=0, nonlinearity='relu') than the default a=sqrt(5).

The code for this network is as follows:
```python
class KaimingInitModel(nn.Module):  # Using Kaiming Initialization (Net2)
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)

        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

        torch.nn.init.kaiming_uniform_(self.conv1.weight, a=0, mode='fan_in', nonlinearity='relu')
        torch.nn.init.kaiming_uniform_(self.conv2.weight, a=0, mode='fan_in', nonlinearity='relu')
        torch.nn.init.kaiming_uniform_(self.fc1.weight, a=0, mode='fan_in', nonlinearity='relu')
        torch.nn.init.kaiming_uniform_(self.fc2.weight, a=0, mode='fan_in', nonlinearity='relu')
        torch.nn.init.kaiming_uniform_(self.fc3.weight, a=0, mode='fan_in', nonlinearity='relu')

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```

## 6
Based on Net(), create Net3() with the channel counts in Net() doubled, and report the test results.

Answer:
With the channel counts doubled, the test-set accuracy is 60%, a 4% improvement over the original, so widening the network does help. This is mainly because more channels mean more convolutional filters, and therefore more kinds of features can be extracted.

The code is as follows:
```python
class DoubleChannelModel(nn.Module):  # Doubling the Channel (Net3)
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 12, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(12, 32, 5)

        self.fc1 = nn.Linear(32 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```



## 7
Without changing the basic structure of Net() (the numbers of convolutional and fully connected layers) or the number of training epochs, what is the best result you can get?

Answer:
We made the following improvements:
- Use the Adam optimizer with a weight decay of 0.0001 and a `CosineAnnealingLR` cosine-annealing scheduler, with an initial learning rate of 0.002 (a sketch of this setup follows the transform code below).
- Add three BatchNorm layers as in the BN network above, plus three dropout layers with p=0.1 placed right after the BatchNorm layers.
- Change the channel counts of the two convolutional layers to 256 and 256.
- Apply heavier data augmentation, as follows (where AutoAugment() comes from the paper "AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty", https://arxiv.org/abs/1912.02781):
```python
def get_transform(self):
    res = []
    res.append(transforms.RandomHorizontalFlip(p=0.5))
    res.extend([transforms.Pad(2, padding_mode='constant'),
                transforms.RandomCrop([32, 32])])
    res.append(transforms.RandomApply([AutoAugment()], p=0.6))
    res.append(transforms.ToTensor())
    res += [transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
    return transforms.Compose(res)
```
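A minimal sketch of the optimizer and scheduler from the first bullet above; `model` and `Config.NUM_EPOCHS` are placeholders for the fixed model and epoch count, not the repo's exact names:

```python
optimizer = torch.optim.Adam(model.parameters(),
                             lr=0.002,           # initial learning rate
                             weight_decay=1e-4)  # WEIGHT_DECAY = 0.0001
# Anneal the learning rate along a cosine curve over the whole run.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                       T_max=Config.NUM_EPOCHS)
```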

The final test-set accuracy is 83%, a 27% improvement over the original.

The model code is as follows:
```python
class BetterBaselineModel(nn.Module):  # Better Baseline Model (Net4)
    def __init__(self):
        super().__init__()
        channel1 = 256
        channel2 = 256
        self.conv1 = nn.Conv2d(3, channel1, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.bn1 = nn.BatchNorm2d(channel1)
        self.dropout1 = nn.Dropout(0.1)
        self.conv2 = nn.Conv2d(channel1, channel2, 5)
        self.bn2 = nn.BatchNorm2d(channel2)
        self.dropout2 = nn.Dropout(0.1)

        self.fc1 = nn.Linear(channel2 * 5 * 5, 120)
        self.bn3 = nn.BatchNorm1d(120)
        self.dropout3 = nn.Dropout(0.1)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.bn1(x)
        x = self.dropout1(x)
        x = self.pool(F.relu(self.conv2(x)))
        x = self.bn2(x)
        x = self.dropout2(x)
        x = torch.flatten(x, 1)  # flatten all dimensions except batch
        x = F.relu(self.fc1(x))
        x = self.bn3(x)
        x = self.dropout3(x)
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```


## 8
Use ResNet18() and report the test results.

Answer:
Using ResNet-18 with the other settings from the previous question unchanged, the test-set accuracy is 83%, the same as the previous result. Reaching this level with relatively few parameters speaks to the architecture's strength, likely because (see the residual-block sketch after the code below):
- Its residual connections effectively mitigate the vanishing-gradient problem, which makes stacking many layers practical.
- Learning a residual is easier for the network than learning the raw mapping, since the identity part does not have to be learned explicitly.

The code is as follows:
```python
class ResNet(nn.Module):  # Resnet18
    def __init__(self):
        super().__init__()
        self.model = torchvision.models.resnet18(pretrained=False)
        if Config.PRETRAINED:
            self.model.load_state_dict(torch.load(Config.RESNET_PRETRAINED_PATH))
            for param in self.model.parameters():
                param.requires_grad = False
        self.model.fc = nn.Linear(512, 10)
        self.model.fc.requires_grad_(True)

    def forward(self, x):
        x = self.model(x)
        return x
```
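To make the residual idea above concrete, here is a minimal sketch of a basic residual block (simplified: the real torchvision `BasicBlock` also handles stride changes and channel mismatches with a downsampling shortcut):

```python
class BasicResidualBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # the skip connection adds the input back
```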