I am unable to interpret the results of get_weights from a GRU layer. Here's my code -
#Modified from - https://machinelearningmastery.com/understanding-simple-recurrent-neural-networks-in-keras/
from pandas import read_csv
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, GRU
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import math
import matplotlib.pyplot as plt
model = Sequential()
model.add(GRU(units = 2, input_shape = (3,1), activation = 'linear'))
model.add(Dense(units = 1, activation = 'linear'))
model.compile(loss = 'mean_squared_error', optimizer = 'adam')
initial_weights = model.layers[0].get_weights()
print("Shape = ",initial_weights)
I am familiar with GRU concepts. In addition, I understand how get_weights works for the Keras SimpleRNN layer, where the first array represents the input weights, the second the recurrent weights, and the third the bias. However, I am lost with the output for the GRU layer, which is given below -
Shape = [array([[-0.64266175, -0.0870676 , -0.25356603, -0.03685969, 0.22260845,
-0.04923642]], dtype=float32), array([[ 0.01929092, -0.4932567 , 0.3723044 , -0.6559699 , -0.33790302,
0.27062896],
[-0.4214194 , 0.46456426, 0.27233726, -0.00461334, -0.6533575 ,
-0.32483965]], dtype=float32), array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]], dtype=float32)]
I am assuming it has something to do with GRU gates.
Update 7/4 - This page says that the Keras GRU has 3 gates: update, reset and output. However, based on this, a GRU shouldn't have an output gate.
Best way I know would be to track the add_weight() calls in the build() function of the GRUCell. Let's take an example model,
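A minimal sketch of such a model; the input shape (5, 10) and units=32 are assumptions chosen so the shapes match the [10, 96] / [32, 96] / [2, 96] discussion below:

from keras.models import Sequential
from keras.layers import Dense, GRU

model = Sequential()
# 5 timesteps, 10 features per timestep, 32 GRU units (reset_after=True is the default)
model.add(GRU(units=32, input_shape=(5, 10), name='gru'))
model.add(Dense(units=1))
model.compile(loss='mean_squared_error', optimizer='adam')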
Here's how we'll print some metadata about what's returned by weights = model.get_layer('gru').get_weights().
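A minimal sketch, assuming the example model above:

weights = model.get_layer('gru').get_weights()
for i, w in enumerate(weights):
    print("Weight", i, "shape =", w.shape)

Which gives (shapes for the assumed 10-feature, 32-unit model):

Weight 0 shape = (10, 96)
Weight 1 shape = (32, 96)
Weight 2 shape = (2, 96)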
Let's go back to the weights defined by the GRUCell. We got,
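Paraphrasing (shapes only, not the exact Keras source), GRUCell.build() makes three add_weight() calls:

kernel           -> shape (input_dim, units * 3)    (Wz|Wr|Wh concatenated)
recurrent_kernel -> shape (units, units * 3)        (Uz|Ur|Uh concatenated)
bias             -> shape (2, units * 3) when reset_after=True, else (units * 3,)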
This is what you're seeing as weights (in that order). Here's why they are shaped like this. GRU computations are outlined here.

The first matrix in weights (of shape [10, 96]) is a concatenation of Wz|Wr|Wh (in that order). Each of these is a [10, 32] sized tensor. Concatenation gives a [10, 32*3=96] sized tensor.

Similarly, the second matrix is a concatenation of Uz|Ur|Uh. Each of these is a [32, 32] sized tensor, which becomes [32, 96] after concatenation.

You can see how they break this combined weight matrix into its z, r and h components here.
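If you want to do the same split yourself, here is a sketch, assuming the example model above (the names Wz, Wr, Wh, Uz, Ur, Uh are mine, chosen to match the notation in this answer):

import numpy as np

kernel, recurrent_kernel, bias = model.get_layer('gru').get_weights()

# Split the concatenated matrices into per-gate blocks, in z|r|h order
Wz, Wr, Wh = np.split(kernel, 3, axis=-1)            # each (input_dim, units), i.e. (10, 32)
Uz, Ur, Uh = np.split(recurrent_kernel, 3, axis=-1)  # each (units, units), i.e. (32, 32)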
Finally, the bias. It contains 2 biases, i.e. a [2, 96] sized tensor: input_bias and recurrent_bias. Again, the biases from all gates/weights are combined into a single tensor. Typically, only the input_bias is used. But if you have reset_after (which decides how the reset gate is applied) set to True, then the recurrent_bias gets used. It's an implementation detail.
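Applying the same logic to the model in the question (units=2, one input feature) explains the printed arrays: the first is the (1, 6) kernel Wz|Wr|Wh, the second the (2, 6) recurrent kernel Uz|Ur|Uh, and the third the (2, 6) bias holding the input and recurrent biases (both initialized to zero). A quick check, assuming the question's model object:

kernel, recurrent_kernel, bias = model.layers[0].get_weights()
print(kernel.shape, recurrent_kernel.shape, bias.shape)  # (1, 6) (2, 6) (2, 6)
input_bias, recurrent_bias = bias                        # each of shape (6,)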