Matrix multiplication in a TensorFlow model


I want to use matrix multiplication inside a TF model. My model is a NN with input shape = (1,9), and I want to compute the product of each input vector with itself (i.e. the matrix product of the transposed input vector with the vector itself, so the result has shape (9,9)).

Code example:

import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(1,9))
# tf.transpose with no perm argument reverses ALL axes, batch axis included
outputs = tf.keras.layers.Dense(1, activation='linear')(tf.transpose(inputs) @ inputs)

model = tf.keras.Model(inputs, outputs)

adam = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

model.compile(optimizer=adam, loss='mse', metrics=['mae'])

But I have a problem with the shape of the result. With the code above I get the following architecture:

[image: model summary of the resulting architecture]

If I understand correctly, the first dimension (None) in the input layer corresponds to the batch size of the input data. When I use the transpose operation, it applies to all dimensions of that shape, so after the transpose and multiplication I get a result of shape (9,1,9). But I think that is not correct, because I want the product of the transposed input vector with itself for every vector in the batch (i.e. the correct shape of the result I want is (None, 9, 9)).
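To check this, a minimal standalone sketch (a batch of 4 as an example) of how tf.transpose permutes every axis by default, and how an explicit perm argument keeps the batch axis in place:

import tensorflow as tf

x = tf.random.normal((4, 1, 9))        # a batch of 4 row vectors of shape (1, 9)

# default tf.transpose reverses all axes, batch axis included
print(tf.transpose(x).shape)           # (9, 1, 4)

# perm=[0, 2, 1] swaps only the last two axes, independently per batch element
xt = tf.transpose(x, perm=[0, 2, 1])   # (4, 9, 1)
print((xt @ x).shape)                  # (4, 9, 9)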

Computing this product outside the model and feeding it in as an input is not suitable, because I want to have both the original input vector and the result of the multiplication available inside the model for further operations (the architecture above is not complete and serves only as an example).
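To make that requirement concrete, a minimal sketch of the kind of structure I have in mind (using the per-element transpose from the sketch above for the product; the Flatten/Concatenate/Dense layers here only stand in for the real downstream operations):

import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(1, 9))

# batched product with the batch axis preserved: shape (None, 9, 9)
product = tf.transpose(inputs, perm=[0, 2, 1]) @ inputs

# both the original vector and the product must stay available in the model,
# e.g. flattened and concatenated for later layers
combined = tf.keras.layers.Concatenate()([
    tf.keras.layers.Flatten()(inputs),    # (None, 9)
    tf.keras.layers.Flatten()(product),   # (None, 81)
])
outputs = tf.keras.layers.Dense(1, activation='linear')(combined)

model = tf.keras.Model(inputs, outputs)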

How can I get the correct result? What is the correct way to multiply matrices and vectors in TF when the operation should be applied to every vector (matrix) in the batch?


Comments (2)

陌生 2025-01-19 16:06:53


Try tf.linalg.matmul, since it will respect the batch dimension:

import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(1,9))
# transpose_a=True transposes only the last two axes of the first argument,
# leaving the batch axis in place: (None, 9, 1) @ (None, 1, 9) -> (None, 9, 9)
outputs = tf.keras.layers.Dense(1, activation='linear')(tf.linalg.matmul(inputs, inputs, transpose_a=True))

model = tf.keras.Model(inputs, outputs)

adam = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0)

model.compile(optimizer=adam, loss='mse', metrics=['mae'])
print(model.summary())
Model: "model_3"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_5 (InputLayer)           [(None, 1, 9)]       0           []                               
                                                                                                  
 tf.linalg.matmul_3 (TFOpLambda  (None, 9, 9)        0           ['input_5[0][0]',                
 )                                                                'input_5[0][0]']                
                                                                                                  
 dense_4 (Dense)                (None, 9, 1)         10          ['tf.linalg.matmul_3[0][0]']     
                                                                                                  
==================================================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
__________________________________________________________________________________________________
None
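For completeness, a minimal sketch of an equivalent formulation with tf.einsum, which spells out the batch axis in the subscripts (an alternative to tf.linalg.matmul, not a correction of it):

import tensorflow as tf

inputs = tf.keras.layers.Input(shape=(1, 9))

# 'bij,bik->bjk': contract the length-1 axis i per batch element b,
# equivalent to tf.linalg.matmul(inputs, inputs, transpose_a=True)
outer = tf.einsum('bij,bik->bjk', inputs, inputs)   # shape (None, 9, 9)

outputs = tf.keras.layers.Dense(1, activation='linear')(outer)
model = tf.keras.Model(inputs, outputs)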
以酷 2025-01-19 16:06:53


I read your question as asking how to do the matrix multiplication inside the NN, where multiplying plain numbers is easy!
It is sequence-to-sequence, of which we have many examples (word-sentence inputs with a target multiplication dictionary).
There is no need to specify an output shape; a sequence output is still the answer!

  1. Using tf.where (or similar ops)!
    input:

    import numpy as np
    import tensorflow as tf

    array_1 = [0, 1, 1, 0]
    array_2 = np.concatenate((array_1, array_1), axis=0)
    temp = [0, 1, 1, 0]

    # the (1,)-shaped condition broadcasts against array_2, so all elements are kept
    print(np.asarray(tf.where([temp == [0, 1, 1, 0]], array_2, 0)))

    input('...')

output:

[0 1 1 0 0 1 1 0]
  2. Using tfa.seq2seq.BasicDecoder for the sum
    input:

    # input_word, decoder, start_tokens, end_token and initial_state are
    # assumed to be defined elsewhere (not shown in this answer)
    index = 1
    next_char = tf.strings.substr(
        input_word, index, len(input_word[0].numpy()) - index, unit="UTF8_CHAR", name=None
    )
    output, state, lengths = decoder(
        next_char, start_tokens=start_tokens, end_token=end_token, initial_state=initial_state)

    print('next_char[0].numpy(): ' + str(next_char[0].numpy()))

output:

input_word[0].numpy() length: tf.Tensor([b'Gl\xc3\xbccklicherweise '], shape=(1,), dtype=string)
input_word[0].numpy() length: 18
next_char[0].numpy(): b'Gl\xc3\xbccklicherweise '
next_char[0].numpy(): b'l\xc3\xbccklicherweise '
next_char[0].numpy(): b'\xc3\xbccklicherweise '
next_char[0].numpy(): b'cklicherweise '
next_char[0].numpy(): b'klicherweise '
next_char[0].numpy(): b'licherweise '

sum = G + L + L + ...
  3. Model multiplication: you use a dense input, and the output is a sequence of the desired target, as in the picture.