关于Mobilenet的内存使用情况
我正在使用 Pytorch 构建 MobileNetV1,每次训练模型时我的内存都会耗尽。 (pytorch 日志“被杀死!”然后突然崩溃)。
这是我的代码
配置文件:(yaml)
n_gpu: 0
arch:
type: MobileNet
args:
in_channels: 3
num_classes: 26
data_loader:
type: BallDataLoader
args:
data_dir: data/balls/
batch_size: 64
shuffle: true
validation_split: 0.2
num_workers: 0
resize:
- 224
- 224
optimizer:
type: Adam
args:
lr: 1.0e-2
weight_decay: 0
amsgrad: true
loss: nll_loss
metrics:
- accuracy
- top_k_acc
lr_scheduler:
type: StepLR
args:
step_size: 50
gamma: 0.1
trainer:
epochs: 50
save_dir: saved/
save_period: 2
verbosity: 2
monitor: min val_loss
early_stop: 10
tensorboard: true
模块.py:
class DepthwiseSeparableConv(nn.Module):
def __init__(self, in_channels, out_channels, kernel_size = 3, stride = 1, padding = None):
super().__init__()
if padding == None:
padding = kernel_size // 2
self.depth_wise_conv = nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, groups= in_channels)
self.bn1 = nn.BatchNorm2d(in_channels)
self.point_wise_conv = nn.Conv2d(in_channels, out_channels, (1,1), 1, 0)
self.bn2 = nn.BatchNorm2d(out_channels)
self.in_channels = in_channels
self.out_channels = out_channels
def forward(self, x):
x = self.depth_wise_conv(x)
x = self.bn1(x)
x = F.relu(x)
x = self.point_wise_conv(x)
x = self.bn2(x)
x = F.relu(x)
return x
模型.py
class MobileNet(ImageNet):
def __init__(self, in_channels = 3, num_classes = 1000):
super().__init__()
self.convs = nn.Sequential(
nn.Conv2d(in_channels, 32, kernel_size= 3, padding= 1, stride = 1 ),
nn.BatchNorm2d(32),
nn.ReLU(inplace = True),
DepthwiseSeparableConv(32, 64),
DepthwiseSeparableConv(64, 128, stride = 2),
DepthwiseSeparableConv(128, 128),
DepthwiseSeparableConv(128, 256),
DepthwiseSeparableConv(256, 256),
DepthwiseSeparableConv(256, 512, stride = 2),
DepthwiseSeparableConv(512, 512),
DepthwiseSeparableConv(512, 512),
DepthwiseSeparableConv(512, 512),
DepthwiseSeparableConv(512, 512),
DepthwiseSeparableConv(512, 512),
DepthwiseSeparableConv(512, 1024, stride = 1),
DepthwiseSeparableConv(1024, 1024, stride= 2),
nn.AdaptiveAvgPool2d(1)
)
self.fc = nn.Linear(1024, num_classes)
def forward(self, x):
x = self.convs(x)
x = x.view(-1, 1024)
x = self.fc(x)
x = F.log_softmax(x, dim = 1)
return x
I'm building MobileNetV1 with Pytorch and had my memory ran out every time I train the model. (The pytorch log "Killed!" and suddenly crashed).
This is my code
Config file: (yaml)
n_gpu: 0
arch:
type: MobileNet
args:
in_channels: 3
num_classes: 26
data_loader:
type: BallDataLoader
args:
data_dir: data/balls/
batch_size: 64
shuffle: true
validation_split: 0.2
num_workers: 0
resize:
- 224
- 224
optimizer:
type: Adam
args:
lr: 1.0e-2
weight_decay: 0
amsgrad: true
loss: nll_loss
metrics:
- accuracy
- top_k_acc
lr_scheduler:
type: StepLR
args:
step_size: 50
gamma: 0.1
trainer:
epochs: 50
save_dir: saved/
save_period: 2
verbosity: 2
monitor: min val_loss
early_stop: 10
tensorboard: true
modules.py:
class DepthwiseSeparableConv(nn.Module):
def __init__(self, in_channels, out_channels, kernel_size = 3, stride = 1, padding = None):
super().__init__()
if padding == None:
padding = kernel_size // 2
self.depth_wise_conv = nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, groups= in_channels)
self.bn1 = nn.BatchNorm2d(in_channels)
self.point_wise_conv = nn.Conv2d(in_channels, out_channels, (1,1), 1, 0)
self.bn2 = nn.BatchNorm2d(out_channels)
self.in_channels = in_channels
self.out_channels = out_channels
def forward(self, x):
x = self.depth_wise_conv(x)
x = self.bn1(x)
x = F.relu(x)
x = self.point_wise_conv(x)
x = self.bn2(x)
x = F.relu(x)
return x
model.py
class MobileNet(ImageNet):
def __init__(self, in_channels = 3, num_classes = 1000):
super().__init__()
self.convs = nn.Sequential(
nn.Conv2d(in_channels, 32, kernel_size= 3, padding= 1, stride = 1 ),
nn.BatchNorm2d(32),
nn.ReLU(inplace = True),
DepthwiseSeparableConv(32, 64),
DepthwiseSeparableConv(64, 128, stride = 2),
DepthwiseSeparableConv(128, 128),
DepthwiseSeparableConv(128, 256),
DepthwiseSeparableConv(256, 256),
DepthwiseSeparableConv(256, 512, stride = 2),
DepthwiseSeparableConv(512, 512),
DepthwiseSeparableConv(512, 512),
DepthwiseSeparableConv(512, 512),
DepthwiseSeparableConv(512, 512),
DepthwiseSeparableConv(512, 512),
DepthwiseSeparableConv(512, 1024, stride = 1),
DepthwiseSeparableConv(1024, 1024, stride= 2),
nn.AdaptiveAvgPool2d(1)
)
self.fc = nn.Linear(1024, num_classes)
def forward(self, x):
x = self.convs(x)
x = x.view(-1, 1024)
x = self.fc(x)
x = F.log_softmax(x, dim = 1)
return x
So I found a model from https://github.com/jmjeon94/MobileNet-Pytorch, and it worked. After hours I still can't find out why this happened as the models are nearly identical, and since the architect of mobilenet is farely light, this shouldn't take much space to run I supposed. Is there any chance that this is because of the python interpreter or there are actually something wrong with my code?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
我认为这是因为你的批量大小。尝试使用较小的批量大小,例如 32,16,8,4,2。
I think it's because of your batch size. Try using smaller batch size like 32,16,8,4,2.
我删除了 nn.Conv2d(in_channels, 32, kernel_size= 3, padding= 1, stride = 1 ) 行并重写了相同的代码并运行了代码。仍然不知道为什么,但似乎是解释器或文本编辑器导致了错误。感谢您的出席。
特别感谢 @Anmol Narang 先生的努力,我非常感激。
I delete the line
nn.Conv2d(in_channels, 32, kernel_size= 3, padding= 1, stride = 1 )
and rewrite the same and the code ran. Still don't know why but it seem to be the interpreter or the text editor which cause the error. Thank you for attending.And special thanks to Mr. @Anmol Narang for your efford, I'm very appreciated.