How to model this function in PyTorch

Posted on 2025-01-11 05:19:48

I want to train a feed-forward neural network with a single hidden layer that models the equations below.

h = g(W1.input1 + V1.input2 + b)
output1 = f(W2.h + b_w)
output2 = f(V2.h + b_v)

f and g are activation functions, h is the hidden representation, W1, W2, V1, V2 are Weight matrices, b, b_w, b_v are respective biases.

I can't concatenate the 2 inputs because that would result in a single weight matrix. I can't train two separate NNs because the latent representation would miss the interaction between the 2 inputs. Any help is much appreciated. I have also attached the NN diagram below.

NN Diagram

Comments (3)

一腔孤↑勇 2025-01-18 05:19:48

PyTorch lets you define your forward implementation in functional form, so you could do:

import torch
import torch.nn as nn


class NN(nn.Module):
    def __init__(self, in_ch, h_ch, out_ch):
        super().__init__()
        # project input1 to (h_ch - in_ch) features; input2 fills the rest of the hidden width
        self.in1 = nn.Linear(in_ch, h_ch - in_ch)
        self.fc1 = nn.Linear(h_ch, out_ch)
        self.fc2 = nn.Linear(h_ch, out_ch)
        self.act = nn.ReLU()

    def forward(self, i1, i2):
        # concatenate the transformed input1 with the raw input2 along the feature dimension
        i = torch.cat((self.in1(i1), i2), 1)
        h = self.act(i)
        o1 = self.act(self.fc1(h))
        o2 = self.act(self.fc2(h))
        return o1, o2

And then use it as:

>>> model = NN(5, 10, 3)
>>> model(torch.rand(2, 5), torch.rand(2, 5))
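
A minimal training-step sketch reusing the NN class above might look like the following (the MSE loss, the Adam optimizer, and the dummy inputs/targets are illustrative assumptions, not part of this answer):

import torch
import torch.nn as nn

model = NN(5, 10, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# Dummy inputs and targets with hypothetical shapes, just for illustration.
i1, i2 = torch.rand(2, 5), torch.rand(2, 5)
t1, t2 = torch.rand(2, 3), torch.rand(2, 3)

o1, o2 = model(i1, i2)
loss = criterion(o1, t1) + criterion(o2, t2)  # sum the losses of the two output heads
optimizer.zero_grad()
loss.backward()
optimizer.step()
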
魄砕の薆 2025-01-18 05:19:48

I decided to write my own Linear layer that calculates h = g(W1.input1 + V1.input2 + b). I do this by creating 2 parameters, W1 and V1, multiplying input1 and input2 by these 2 parameters, and then adding everything together. The code is given below:

import torch
import torch.nn as nn
import math


class MyLinearLayer(nn.Module):
    def __init__(self, size_in1, size_out1):
        super().__init__()
        self.size_in1, self.size_out1 = size_in1, size_out1
        self.W1 = nn.Parameter(torch.empty(size_out1, size_in1))
        self.V1 = nn.Parameter(torch.empty(size_out1, size_in1))
        self.bias = nn.Parameter(torch.empty(size_out1))
        # nn.Linear-style initialization, so the parameters don't start
        # from arbitrary uninitialized memory
        nn.init.kaiming_uniform_(self.W1, a=math.sqrt(5))
        nn.init.kaiming_uniform_(self.V1, a=math.sqrt(5))
        bound = 1 / math.sqrt(size_in1)
        nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, x):
        # x is a stacked tensor: x[0] is input1, x[1] is input2
        w_times_x = torch.mm(x[0], self.W1.t())
        v_times_x = torch.mm(x[1], self.V1.t())
        weight_times_x = torch.add(w_times_x, v_times_x)
        return torch.add(weight_times_x, self.bias)  # W1.input1 + V1.input2 + b


class NN(nn.Module):
    def __init__(self, in_ch, h_ch, out_ch):
        super().__init__()
        self.input = MyLinearLayer(in_ch, h_ch)
        self.W2 = nn.Linear(h_ch, out_ch)
        self.V2 = nn.Linear(h_ch, out_ch)
        self.act = nn.ReLU()

    def forward(self, i1, i2):
        # the two inputs are stacked here and unpacked inside MyLinearLayer
        inp = torch.stack([i1, i2])
        h = self.act(self.input(inp))
        o1 = self.act(self.W2(h))
        o2 = self.act(self.V2(h))
        return o1, o2


model = NN(5, 10, 5)
o1, o2 = model(torch.rand(2, 5), torch.rand(2, 5))
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name, '->', param.data.shape)

The output shows the 7 parameters to be trained:

input.W1 -> torch.Size([10, 5])
input.V1 -> torch.Size([10, 5])
input.bias -> torch.Size([10])
W2.weight -> torch.Size([5, 10])
W2.bias -> torch.Size([5])
V2.weight -> torch.Size([5, 10])
V2.bias -> torch.Size([5])

Thanks for all the inputs @aretor, @Ivan, and @DerekG.

梦巷 2025-01-18 05:19:48

I will point out that you CAN combine W_1 and V_1, provided that you simply tile them diagonally in a larger matrix and set all other values to 0. You can then clip the weights after each optimization step to constrain these parameters to remain 0. Or you could use a sparse tensor to represent only the weights you care to change. In any case:

C = [[W_1   0]
     [0   V_1]]

as in, let W1 be an m×n matrix and V1 be a q×p matrix:

C = [[w11 w12 ... w1n 0   0   ... 0  ]
     [w21 w22 ... w2n 0   0   ... 0  ]
     [... ... ... ... ... ... ... ...]
     [wm1 wm2 ... wmn 0   0   ... 0  ]
     [0   0   ... 0   v11 v12 ... v1p]
     [0   0   ... 0   v21 v22 ... v2p]
     [... ... ... ... ... ... ... ...]
     [0   0   ... 0   vq1 vq2 ... vqp] 

and:

a  = [[input1]
      [input2]]

(Please excuse dimension flip-flops if they exist; I didn't double-check.) The resulting multiplication gives you the intermediate value you care about: h = g(Ca + b)

And likewise, I believe the second operation is identical to a normal fully connected layer. You can concatenate the final bias terms too, as the bias term is already defined as one parameter per output. b_cat = [b_w, b_v]. That is, the following yield the same result:

f(W2.h + b_w) , f(V2.h + b_v)
f(C2.h + b_cat)

where C2 = [W2,V2]  # concatenated

So the only novelty is that you need to constrain some of the parameters in the initial concatenated weight matrix C to be 0.
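
A minimal PyTorch sketch of this masking idea (the sizes, the ReLU activations, the SGD optimizer, and the placeholder loss below are illustrative assumptions, not part of the answer) could look like:

import torch
import torch.nn as nn

# Hypothetical sizes: input1 has d1 features, input2 has d2, each block maps to h hidden units.
d1, d2, h, out = 5, 5, 10, 3

# First layer: one Linear over the concatenated input a = [input1, input2],
# with a 0/1 mask that keeps only the block-diagonal entries
# C = [[W_1, 0], [0, V_1]] trainable; everything else is held at 0.
first = nn.Linear(d1 + d2, 2 * h)
mask = torch.zeros_like(first.weight)
mask[:h, :d1] = 1.0   # the W_1 block
mask[h:, d1:] = 1.0   # the V_1 block
with torch.no_grad():
    first.weight.mul_(mask)          # zero the off-diagonal blocks once

# Second layer: W2 and V2 stacked into C2 with b_cat = [b_w, b_v];
# the two outputs are recovered by splitting the result.
second = nn.Linear(2 * h, 2 * out)

g, f = nn.ReLU(), nn.ReLU()

def forward(x1, x2):
    a = torch.cat([x1, x2], dim=1)
    hidden = g(first(a))
    o = f(second(hidden))
    return o[:, :out], o[:, out:]

opt = torch.optim.SGD(list(first.parameters()) + list(second.parameters()), lr=1e-2)

x1, x2 = torch.rand(4, d1), torch.rand(4, d2)
o1, o2 = forward(x1, x2)
loss = o1.sum() + o2.sum()           # placeholder loss, just to drive one optimization step
loss.backward()
opt.step()
with torch.no_grad():
    first.weight.mul_(mask)          # re-apply the mask so the zero blocks stay zero

Instead of re-applying the mask after each step, one could also zero the masked gradients with a hook such as first.weight.register_hook(lambda grad: grad * mask); either way the off-block entries stay at 0.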
