How to model this function in PyTorch

Posted on 2025-01-11 05:19:48

I want to train a feed-forward neural network with a single hidden layer that models the equations below.

h = g(W1.input1 + V1.input2 + b)
output1 = f(W2.h + b_w)
output2 = f(V2.h + b_v)

f and g are activation functions, h is the hidden representation, W1, W2, V1, V2 are Weight matrices, b, b_w, b_v are respective biases.

I can't concatenate the 2 inputs because that would result in a single weight matrix. I can't train two separate NNs because the latent representation would miss the interaction between the 2 inputs. Any help is much appreciated. I have also attached the NN diagram below.

NN Diagram

Comments (3)

一腔孤↑勇 2025-01-18 05:19:48

PyTorch lets you define your forward implementation in functional form, so you could do:

import torch
import torch.nn as nn


class NN(nn.Module):
    def __init__(self, in_ch, h_ch, out_ch):
        super().__init__()
        # project input1 to (h_ch - in_ch) features; input2 fills the rest of the hidden width
        self.in1 = nn.Linear(in_ch, h_ch - in_ch)
        self.fc1 = nn.Linear(h_ch, out_ch)
        self.fc2 = nn.Linear(h_ch, out_ch)
        self.act = nn.ReLU()

    def forward(self, i1, i2):
        # concatenate the transformed input1 with the raw input2 along the feature dimension
        i = torch.cat((self.in1(i1), i2), 1)
        h = self.act(i)
        o1 = self.act(self.fc1(h))
        o2 = self.act(self.fc2(h))
        return o1, o2

And then use it as:

>>> model = NN(5, 10, 3)
>>> model(torch.rand(2, 5), torch.rand(2, 5))
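
A minimal training-step sketch reusing the NN class above might look like the following (the MSE loss, the Adam optimizer, and the dummy inputs/targets are illustrative assumptions, not part of this answer):

import torch
import torch.nn as nn

model = NN(5, 10, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# Dummy inputs and targets with hypothetical shapes, just for illustration.
i1, i2 = torch.rand(2, 5), torch.rand(2, 5)
t1, t2 = torch.rand(2, 3), torch.rand(2, 3)

o1, o2 = model(i1, i2)
loss = criterion(o1, t1) + criterion(o2, t2)  # sum the losses of the two output heads
optimizer.zero_grad()
loss.backward()
optimizer.step()
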
魄砕の薆 2025-01-18 05:19:48

I decided to write my own Linear layer that calculates h = g(W1.input1 + V1.input2 + b). I do this by creating 2 parameters, W1 and V1, multiplying input1 and input2 by these 2 parameters, and then adding everything together. The code is given below:

import torch
import torch.nn as nn
import math


class MyLinearLayer(nn.Module):
    def __init__(self, size_in1, size_out1):
        super().__init__()
        self.size_in1, self.size_out1 = size_in1, size_out1
        self.W1 = nn.Parameter(torch.empty(size_out1, size_in1))
        self.V1 = nn.Parameter(torch.empty(size_out1, size_in1))
        self.bias = nn.Parameter(torch.empty(size_out1))
        # nn.Linear-style initialization, so the parameters don't start
        # from arbitrary uninitialized memory
        nn.init.kaiming_uniform_(self.W1, a=math.sqrt(5))
        nn.init.kaiming_uniform_(self.V1, a=math.sqrt(5))
        bound = 1 / math.sqrt(size_in1)
        nn.init.uniform_(self.bias, -bound, bound)

    def forward(self, x):
        # x is a stacked tensor: x[0] is input1, x[1] is input2
        w_times_x = torch.mm(x[0], self.W1.t())
        v_times_x = torch.mm(x[1], self.V1.t())
        weight_times_x = torch.add(w_times_x, v_times_x)
        return torch.add(weight_times_x, self.bias)  # W1.input1 + V1.input2 + b


class NN(nn.Module):
    def __init__(self, in_ch, h_ch, out_ch):
        super().__init__()
        self.input = MyLinearLayer(in_ch, h_ch)
        self.W2 = nn.Linear(h_ch, out_ch)
        self.V2 = nn.Linear(h_ch, out_ch)
        self.act = nn.ReLU()

    def forward(self, i1, i2):
        # the two inputs are stacked here and unpacked inside MyLinearLayer
        inp = torch.stack([i1, i2])
        h = self.act(self.input(inp))
        o1 = self.act(self.W2(h))
        o2 = self.act(self.V2(h))
        return o1, o2


model = NN(5, 10, 5)
o1, o2 = model(torch.rand(2, 5), torch.rand(2, 5))
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name, '->', param.data.shape)

The output shows the 7 parameters to be trained:

input.W1 -> torch.Size([10, 5])
input.V1 -> torch.Size([10, 5])
input.bias -> torch.Size([10])
W2.weight -> torch.Size([5, 10])
W2.bias -> torch.Size([5])
V2.weight -> torch.Size([5, 10])
V2.bias -> torch.Size([5])

Thanks for all the inputs @aretor, @Ivan, and @DerekG.

梦巷 2025-01-18 05:19:48

I will point out that you CAN combine W_1 and V_1, provided that you simply tile them diagonally in a larger matrix and set all other values to 0. You can then clip the weights after each optimization step to constrain these parameters to remain 0. Or you could use a sparse tensor to represent only the weights you care to change. In any case:

C = [[W_1   0]
     [0   V_1]]

as in, let W1 be an m×n matrix and V1 be a q×p matrix:

C = [[w11 w12 ... w1n 0   0   ... 0  ]
     [w21 w22 ... w2n 0   0   ... 0  ]
     [... ... ... ... ... ... ... ...]
     [wm1 wm2 ... wmn 0   0   ... 0  ]
     [0   0   ... 0   v11 v12 ... v1p]
     [0   0   ... 0   v21 v22 ... v2p]
     [... ... ... ... ... ... ... ...]
     [0   0   ... 0   vq1 vq2 ... vqp] 

and:

a  = [[input1]
      [input2]]

(Please excuse dimension flip-flops if they exist; I didn't double-check.) The resulting multiplication gives you the intermediate value you care about: h = g(Ca + b)

And likewise, I believe the second operation is identical to a normal fully connected layer. You can concatenate the final bias terms too, as the bias term is already defined as one parameter per output. b_cat = [b_w, b_v]. That is, the following yield the same result:

f(W2.h + b_w) , f(V2.h + b_v)
f(C2.h + b_cat)

where C2 = [W2,V2]  # concatenated

So the only novelty is that you need to constrain some of the parameters in the initial concatenated weight matrix C to be 0.
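
A minimal PyTorch sketch of this masking idea (the sizes, the ReLU activations, the SGD optimizer, and the placeholder loss below are illustrative assumptions, not part of the answer) could look like:

import torch
import torch.nn as nn

# Hypothetical sizes: input1 has d1 features, input2 has d2, each block maps to h hidden units.
d1, d2, h, out = 5, 5, 10, 3

# First layer: one Linear over the concatenated input a = [input1, input2],
# with a 0/1 mask that keeps only the block-diagonal entries
# C = [[W_1, 0], [0, V_1]] trainable; everything else is held at 0.
first = nn.Linear(d1 + d2, 2 * h)
mask = torch.zeros_like(first.weight)
mask[:h, :d1] = 1.0   # the W_1 block
mask[h:, d1:] = 1.0   # the V_1 block
with torch.no_grad():
    first.weight.mul_(mask)          # zero the off-diagonal blocks once

# Second layer: W2 and V2 stacked into C2 with b_cat = [b_w, b_v];
# the two outputs are recovered by splitting the result.
second = nn.Linear(2 * h, 2 * out)

g, f = nn.ReLU(), nn.ReLU()

def forward(x1, x2):
    a = torch.cat([x1, x2], dim=1)
    hidden = g(first(a))
    o = f(second(hidden))
    return o[:, :out], o[:, out:]

opt = torch.optim.SGD(list(first.parameters()) + list(second.parameters()), lr=1e-2)

x1, x2 = torch.rand(4, d1), torch.rand(4, d2)
o1, o2 = forward(x1, x2)
loss = o1.sum() + o2.sum()           # placeholder loss, just to drive one optimization step
loss.backward()
opt.step()
with torch.no_grad():
    first.weight.mul_(mask)          # re-apply the mask so the zero blocks stay zero

Instead of re-applying the mask after each step, one could also zero the masked gradients with a hook such as first.weight.register_hook(lambda grad: grad * mask); either way the off-block entries stay at 0.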
