关于Mobilenet的内存使用情况

发布于 2025-01-12 14:16:50 字数 3299 浏览 6 评论 0原文

我正在使用 Pytorch 构建 MobileNetV1，每次训练模型时我的内存都会耗尽。（pytorch 日志“被杀死！”然后突然崩溃）。
这是我的代码

配置文件：(yaml)

n_gpu: 0

arch: 
    type: MobileNet
    args: 
        in_channels: 3
        num_classes: 26
    
data_loader: 
    type: BallDataLoader
    args:
        data_dir: data/balls/
        batch_size: 64
        shuffle: true
        validation_split: 0.2
        num_workers: 0
        resize: 
        - 224
        - 224
    
optimizer:
    type: Adam
    args:
        lr: 1.0e-2
        weight_decay: 0
        amsgrad: true
    
loss: nll_loss
metrics: 
    - accuracy
    - top_k_acc

lr_scheduler: 
    type: StepLR
    args: 
        step_size: 50
        gamma: 0.1
    

trainer: 
    epochs: 50
    save_dir: saved/
    save_period: 2
    verbosity: 2
    monitor: min val_loss
    early_stop: 10
    tensorboard: true

模块.py：

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size = 3, stride = 1, padding = None):
        super().__init__()
        if padding == None:
            padding = kernel_size // 2
        self.depth_wise_conv = nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, groups= in_channels)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.point_wise_conv = nn.Conv2d(in_channels, out_channels, (1,1), 1, 0)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.in_channels = in_channels
        self.out_channels = out_channels

    def forward(self, x):
        x = self.depth_wise_conv(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.point_wise_conv(x)
        x = self.bn2(x)
        x = F.relu(x)
        return x

模型.py

class MobileNet(ImageNet):
    def __init__(self, in_channels = 3, num_classes = 1000):
        super().__init__()
        
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size= 3, padding= 1, stride = 1 ),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace = True),
            DepthwiseSeparableConv(32, 64),
            DepthwiseSeparableConv(64, 128, stride = 2),
            DepthwiseSeparableConv(128, 128),
            DepthwiseSeparableConv(128, 256),
            DepthwiseSeparableConv(256, 256),
            DepthwiseSeparableConv(256, 512, stride = 2),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 1024, stride = 1),
            DepthwiseSeparableConv(1024, 1024, stride= 2),
            nn.AdaptiveAvgPool2d(1)
        )
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.convs(x)
        x = x.view(-1, 1024)
        x = self.fc(x)
        x = F.log_softmax(x, dim = 1)
        return x

所以我从 https://github.com/jmjeon94/MobileNet-Pytorch，并且成功了。几个小时后，我仍然无法找出为什么会发生这种情况，因为模型几乎相同，而且由于移动网络的架构师很轻，我想这应该不会占用太多空间来运行。这是否有可能是因为 python 解释器或者我的代码实际上有问题？

原文

I'm building MobileNetV1 with Pytorch and had my memory ran out every time I train the model. (The pytorch log "Killed!" and suddenly crashed).
This is my code

Config file: (yaml)

n_gpu: 0

arch: 
    type: MobileNet
    args: 
        in_channels: 3
        num_classes: 26
    
data_loader: 
    type: BallDataLoader
    args:
        data_dir: data/balls/
        batch_size: 64
        shuffle: true
        validation_split: 0.2
        num_workers: 0
        resize: 
        - 224
        - 224
    
optimizer:
    type: Adam
    args:
        lr: 1.0e-2
        weight_decay: 0
        amsgrad: true
    
loss: nll_loss
metrics: 
    - accuracy
    - top_k_acc

lr_scheduler: 
    type: StepLR
    args: 
        step_size: 50
        gamma: 0.1
    

trainer: 
    epochs: 50
    save_dir: saved/
    save_period: 2
    verbosity: 2
    monitor: min val_loss
    early_stop: 10
    tensorboard: true

modules.py:

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size = 3, stride = 1, padding = None):
        super().__init__()
        if padding == None:
            padding = kernel_size // 2
        self.depth_wise_conv = nn.Conv2d(in_channels, in_channels, kernel_size, stride, padding, groups= in_channels)
        self.bn1 = nn.BatchNorm2d(in_channels)
        self.point_wise_conv = nn.Conv2d(in_channels, out_channels, (1,1), 1, 0)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.in_channels = in_channels
        self.out_channels = out_channels

    def forward(self, x):
        x = self.depth_wise_conv(x)
        x = self.bn1(x)
        x = F.relu(x)
        x = self.point_wise_conv(x)
        x = self.bn2(x)
        x = F.relu(x)
        return x

model.py

class MobileNet(ImageNet):
    def __init__(self, in_channels = 3, num_classes = 1000):
        super().__init__()
        
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size= 3, padding= 1, stride = 1 ),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace = True),
            DepthwiseSeparableConv(32, 64),
            DepthwiseSeparableConv(64, 128, stride = 2),
            DepthwiseSeparableConv(128, 128),
            DepthwiseSeparableConv(128, 256),
            DepthwiseSeparableConv(256, 256),
            DepthwiseSeparableConv(256, 512, stride = 2),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 512),
            DepthwiseSeparableConv(512, 1024, stride = 1),
            DepthwiseSeparableConv(1024, 1024, stride= 2),
            nn.AdaptiveAvgPool2d(1)
        )
        self.fc = nn.Linear(1024, num_classes)

    def forward(self, x):
        x = self.convs(x)
        x = x.view(-1, 1024)
        x = self.fc(x)
        x = F.log_softmax(x, dim = 1)
        return x

So I found a model from https://github.com/jmjeon94/MobileNet-Pytorch, and it worked. After hours I still can't find out why this happened as the models are nearly identical, and since the architect of mobilenet is farely light, this shouldn't take much space to run I supposed. Is there any chance that this is because of the python interpreter or there are actually something wrong with my code?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

杯别 2025-01-19 14:16:50

我认为这是因为你的批量大小。尝试使用较小的批量大小，例如 32,16,8,4,2。

回复收藏 0 原文

若无相欠,怎会相见 2025-01-19 14:16:50

我删除了 nn.Conv2d(in_channels, 32, kernel_size= 3, padding= 1, stride = 1 ) 行并重写了相同的代码并运行了代码。仍然不知道为什么，但似乎是解释器或文本编辑器导致了错误。感谢您的出席。
特别感谢 @Anmol Narang 先生的努力，我非常感激。

回复收藏 0 原文

~没有更多了~