I am unable to interpret the results of get_weights from a GRU layer. Here's my code -
#Modified from - https://machinelearningmastery.com/understanding-simple-recurrent-neural-networks-in-keras/
from pandas import read_csv
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, GRU
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import math
import matplotlib.pyplot as plt
model = Sequential()
model.add(GRU(units = 2, input_shape = (3,1), activation = 'linear'))
model.add(Dense(units = 1, activation = 'linear'))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
initial_weights = model.layers[0].get_weights()
print("Shape = ",initial_weights)
I am familiar with GRU concepts. In addition, I understand how get_weights works for the Keras SimpleRNN layer, where the first array represents the input weights, the second the recurrent weights, and the third the bias. However, I am lost with the output for the GRU layer, which is given below -
Shape = [array([[-0.64266175, -0.0870676 , -0.25356603, -0.03685969, 0.22260845,
-0.04923642]], dtype=float32), array([[ 0.01929092, -0.4932567 , 0.3723044 , -0.6559699 , -0.33790302,
0.27062896],
[-0.4214194 , 0.46456426, 0.27233726, -0.00461334, -0.6533575 ,
-0.32483965]], dtype=float32), array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]], dtype=float32)]
I am assuming it has something to do with GRU gates.
Update 7/4 - This page says that the Keras GRU has 3 gates: update, reset and output. However, based on this, a GRU shouldn't have an output gate.
Best way I know would be to track the add_weight() calls in the build() function of the GRUCell. Let's take an example model,
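A minimal sketch of such a model; the input shape (5, 10) and units=32 are assumptions chosen so the shapes match the [10, 96] / [32, 96] / [2, 96] discussion below:

from keras.models import Sequential
from keras.layers import Dense, GRU

model = Sequential()
# 5 timesteps, 10 features per timestep, 32 GRU units (reset_after=True is the default)
model.add(GRU(units=32, input_shape=(5, 10), name='gru'))
model.add(Dense(units=1))
model.compile(loss='mean_squared_error', optimizer='adam')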
Here's how we'll print some metadata about what's returned by weights = model.get_layer('gru').get_weights().
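A minimal sketch, assuming the example model above:

weights = model.get_layer('gru').get_weights()
for i, w in enumerate(weights):
    print("Weight", i, "shape =", w.shape)

Which gives (shapes for the assumed 10-feature, 32-unit model):

Weight 0 shape = (10, 96)
Weight 1 shape = (32, 96)
Weight 2 shape = (2, 96)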
Let's go back to the weights defined by the GRUCell. We got,
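Paraphrasing (shapes only, not the exact Keras source), GRUCell.build() makes three add_weight() calls:

kernel           -> shape (input_dim, units * 3)    (Wz|Wr|Wh concatenated)
recurrent_kernel -> shape (units, units * 3)        (Uz|Ur|Uh concatenated)
bias             -> shape (2, units * 3) when reset_after=True, else (units * 3,)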
This is what you're seeing as weights (in that order). Here's why they are shaped like this. GRU computations are outlined here.

The first matrix in weights (of shape [10, 96]) is a concatenation of Wz|Wr|Wh (in that order). Each of these is a [10, 32] sized tensor. Concatenation gives a [10, 32*3=96] sized tensor.

Similarly, the second matrix is a concatenation of Uz|Ur|Uh. Each of these is a [32, 32] sized tensor, which becomes [32, 96] after concatenation.

You can see how they break this combined weight matrix into its z, r and h components here.
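If you want to do the same split yourself, here is a sketch, assuming the example model above (the names Wz, Wr, Wh, Uz, Ur, Uh are mine, chosen to match the notation in this answer):

import numpy as np

kernel, recurrent_kernel, bias = model.get_layer('gru').get_weights()

# Split the concatenated matrices into per-gate blocks, in z|r|h order
Wz, Wr, Wh = np.split(kernel, 3, axis=-1)            # each (input_dim, units), i.e. (10, 32)
Uz, Ur, Uh = np.split(recurrent_kernel, 3, axis=-1)  # each (units, units), i.e. (32, 32)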
Finally, the bias. It contains 2 biases, i.e. a [2, 96] sized tensor: input_bias and recurrent_bias. Again, the biases from all gates/weights are combined into a single tensor. Typically, only the input_bias is used. But if you have reset_after (which decides how the reset gate is applied) set to True, then the recurrent_bias gets used. It's an implementation detail.
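Applying the same logic to the model in the question (units=2, one input feature) explains the printed arrays: the first is the (1, 6) kernel Wz|Wr|Wh, the second the (2, 6) recurrent kernel Uz|Ur|Uh, and the third the (2, 6) bias holding the input and recurrent biases (both initialized to zero). A quick check, assuming the question's model object:

kernel, recurrent_kernel, bias = model.layers[0].get_weights()
print(kernel.shape, recurrent_kernel.shape, bias.shape)  # (1, 6) (2, 6) (2, 6)
input_bias, recurrent_bias = bias                        # each of shape (6,)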