How to handle the hidden unit output of a 2-layer LSTM in PyTorch?

I have made a network with an LSTM and a fully connected layer in PyTorch. I want to test how an increase in the number of LSTM layers affects my performance.

Say my input is (6, 9, 14), meaning batch size 6, sequence length 9, and feature size 14, and I'm working on a task that has 6 classes, so I expect a 6-element one-hot-encoded tensor as the prediction for a single sequence. The output of this network after the FC layer should be (6, 6); however, if I use 2 LSTM layers it becomes (12, 6).

I don't understand how I should handle the output of the LSTM layer to reduce the batch dimension from 2 * batch_size back to batch_size. Also, I know I'm using the hidden state as the input to the FC layer; I want to try it this way for now.

Should I sum or concatenate every two batches, or do something else? Cheers!

    def forward(self, x):
        hidden_0 = torch.zeros((self.lstm_layers, x.size(0), self.hidden_size), dtype=torch.double, device=self.device)
        cell_0 = torch.zeros((self.lstm_layers, x.size(0), self.hidden_size), dtype=torch.double, device=self.device)

        y1, (hidden_1, cell_1) = self.lstm(x, (hidden_0, cell_0))
        hidden_1 = hidden_1.view(-1, self.hidden_size)

        y = self.linear(hidden_1)

        return y
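
To make the shape mismatch concrete, here is a small standalone sketch (hidden_size = 32 and the random input are illustrative assumptions, not taken from my actual model) that reproduces the (12, 6) output:

    import torch
    from torch import nn

    batch_size, seq_len, n_features, hidden_size, n_classes = 6, 9, 14, 32, 6

    lstm = nn.LSTM(n_features, hidden_size, num_layers=2, batch_first=True)
    linear = nn.Linear(hidden_size, n_classes)

    x = torch.randn(batch_size, seq_len, n_features)
    y1, (hidden_1, cell_1) = lstm(x)

    print(hidden_1.shape)                  # torch.Size([2, 6, 32]): (num_layers, batch_size, hidden_size)
    flat = hidden_1.view(-1, hidden_size)  # folds the layer dimension into the batch dimension
    print(linear(flat).shape)              # torch.Size([12, 6]): 2 * batch_size rows instead of batch_size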

Comments (2)

笑着哭最痛 2025-01-22 05:24:52

The hidden state of a multi-layer LSTM has shape (num_layers, batch_size, hidden_size); see the Outputs section of the nn.LSTM documentation. It contains the hidden state of each layer along the 0th dimension.

In your example, you flatten this into two dimensions here:

    hidden_1 = hidden_1.view(-1, self.hidden_size)

This transforms the shape into (num_layers * batch_size, hidden_size), which is why the first dimension doubles when you use two layers.

What you would want to do is only use the hidden state of the last layer:

    hidden = hidden_1[-1, :, :].view(-1, self.hidden_size)  # keep only the last layer: (num_layers, bs, hidden) -> (bs, hidden)
    y = self.linear(hidden)
    return y
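
As a quick sanity check (sizes follow the question, hidden_size is an arbitrary assumption): for a unidirectional LSTM, the last layer's hidden state is also the output at the final time step, so the FC layer sees exactly one vector per sequence.

    import torch
    from torch import nn

    lstm = nn.LSTM(input_size=14, hidden_size=32, num_layers=2, batch_first=True)
    x = torch.randn(6, 9, 14)                          # (batch, seq, features) as in the question
    y1, (hidden_1, _) = lstm(x)

    print(hidden_1.shape)                              # torch.Size([2, 6, 32]): one state per layer
    print(hidden_1[-1].shape)                          # torch.Size([6, 32]): last layer only
    print(torch.allclose(hidden_1[-1], y1[:, -1, :]))  # True: same as the last time step of y1
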
雨轻弹 2025-01-22 05:24:52

For a multi-layer LSTM, you can write it like this:

    import torch
    from torch import nn

    class LSTM(nn.Module):
        def __init__(self, input_size, hidden_size, num_layers, output_size, batch_size):
            super().__init__()
            self.input_size = input_size
            self.hidden_size = hidden_size
            self.num_layers = num_layers
            self.output_size = output_size
            self.batch_size = batch_size
            self.num_directions = 1  # unidirectional LSTM
            self.lstm = nn.LSTM(self.input_size, self.hidden_size, self.num_layers, batch_first=True)
            self.linear = nn.Linear(self.hidden_size, self.output_size)

        def forward(self, input_seq):
            # initial states: (num_directions * num_layers, batch_size, hidden_size)
            h_0 = torch.randn(self.num_directions * self.num_layers, self.batch_size, self.hidden_size).to(input_seq.device)
            c_0 = torch.randn(self.num_directions * self.num_layers, self.batch_size, self.hidden_size).to(input_seq.device)
            seq_len = input_seq.shape[1]
            # input_seq: (batch_size, seq_len, input_size)
            input_seq = input_seq.view(self.batch_size, seq_len, self.input_size)
            # output: (batch_size, seq_len, num_directions * hidden_size)
            output, _ = self.lstm(input_seq, (h_0, c_0))
            # apply the linear layer to every time step, then keep only the last one
            output = output.contiguous().view(self.batch_size * seq_len, self.hidden_size)
            pred = self.linear(output)
            pred = pred.view(self.batch_size, seq_len, -1)
            pred = pred[:, -1, :]
            return pred
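
A possible way to instantiate and run the class above (the sizes mirror the question; hidden_size and the random input are assumptions for illustration):

    import torch

    # 6 sequences, 9 time steps, 14 features, 6 classes
    model = LSTM(input_size=14, hidden_size=32, num_layers=2, output_size=6, batch_size=6)
    x = torch.randn(6, 9, 14)   # (batch_size, seq_len, input_size)
    pred = model(x)
    print(pred.shape)           # torch.Size([6, 6]): one prediction per sequence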