Partial derivatives of a neural network output with respect to its inputs
I have trained a deep neural network for regression, with 2 input neurons, 1 output neuron and some hidden layers, as in the following (TensorFlow 2):
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import losses
import tensorflow as tf
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt  # needed for the plots below
#Creation of a "synthetic" dataset
x1 = np.linspace(0, 6*np.pi, 2000)
x2 = 1.5 * np.linspace(0, 6*np.pi, 2000)
y = np.sin(x1) + np.cos(x2)
data = pd.DataFrame(np.array([x1, x2, y]).transpose(), columns = ['x1', 'x2', 'y'])
# train/test split and definition of the normalization over the training set
train_df, test_df = train_test_split(data, test_size=0.2, random_state=0)
normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(np.array(train_df.iloc[:, :-1]))
#Definition of the DNN structure
def build_and_compile_model(norm):
    model = keras.Sequential([
        norm,
        layers.Dense(64, input_dim=2, activation='LeakyReLU'),
        layers.Dense(64, activation='LeakyReLU'),
        layers.Dense(32, activation='LeakyReLU'),
        layers.Dense(32, activation='LeakyReLU'),
        layers.Dense(16, activation='LeakyReLU'),
        layers.Dense(16, activation='LeakyReLU'),
        layers.Dense(8, activation='LeakyReLU'),
        layers.Dense(1, activation='linear')
    ])
    model.compile(loss='mean_absolute_error',
                  optimizer=tf.keras.optimizers.Adam(0.001))
    return model
model = build_and_compile_model(normalizer)
%%time
# Training of the DNN
history = model.fit(
    train_df.iloc[:, :-1],
    train_df.iloc[:, -1],
    validation_split=0.2,
    verbose=2, epochs=100)
Now, if y is the prediction of the network, I want to compute the partial derivatives dy/dx1 and dy/dx2. To achieve this, I have tried
x = tf.constant(data.iloc[:, :-1].values)
with tf.GradientTape(persistent=True) as t:
    t.watch(x)
    y = model(x)
dy_dx = t.gradient(y, x)
dy_dx.numpy()
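(As a side note of mine, not part of the original attempt: the per-input partials can also be extracted explicitly with tf.GradientTape.batch_jacobian, which for a single output neuron returns one row of partials per sample. This is only a sketch for reference.)
with tf.GradientTape(persistent=True) as t2:
    t2.watch(x)
    y2 = model(x)
# batch_jacobian has shape (n_samples, n_outputs, n_inputs) = (2000, 1, 2) here
jac = t2.batch_jacobian(y2, x)
dy_dx1 = jac[:, 0, 0].numpy()  # partial derivative of the output w.r.t. x1, per sample
dy_dx2 = jac[:, 0, 1].numpy()  # partial derivative of the output w.r.t. x2, per sample
With one output neuron, these should coincide with the columns dy_dx[:, 0] and dy_dx[:, 1] above.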
If I plot y as a function of x1 (or of x2) and compare it with the analytical result from the definition given above, I get good agreement:
plt.figure(figsize = (5, 3), dpi = 190)
plt.plot(x1, model.predict(x), label = 'model prediction')
plt.plot(x1, np.sin(x1) + np.cos(x2), label = 'analytical result')
plt.xlabel('$x_1$')
plt.legend()
plt.show()
On the contrary, if I plot the first column of the vector dy_dx and compare it with the analytical derivative (dy/dx1 = cos(x1)), they do not match (the situation is similar for the other partial derivative):
plt.figure(figsize = (5, 3), dpi = 190)
plt.plot(x1, dy_dx[:, 0], label = 'autodiff result')
plt.plot(x1, np.cos(x1), label = 'analytical result')
plt.xlabel('$x_1$')
plt.legend()
plt.show()
If I compare this gradient with finite differences, I get
plt.figure(figsize = (5, 3), dpi = 190)
plt.plot(x1, dy_dx[:, 0], label = 'autodiff result')
plt.plot(x1[0:-1], np.diff(y.numpy()[:, 0])/.1e-1, label = 'finite differences')
plt.xlabel('$x_1$')
plt.legend()
plt.show()
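(As a side check of my own: the hard-coded step .1e-1 only approximates the actual grid spacing of x1, which is 6*pi/1999 ≈ 0.0094. A sketch using the true spacing and central differences would be:)
dx1 = x1[1] - x1[0]  # actual grid spacing, ~0.0094 rather than 0.01
dy_numeric = np.gradient(y.numpy()[:, 0], dx1)  # central finite differences along the sampled points

plt.figure(figsize=(5, 3), dpi=190)
plt.plot(x1, dy_dx[:, 0], label='autodiff result')
plt.plot(x1, dy_numeric, label='finite differences (np.gradient)')
plt.xlabel('$x_1$')
plt.legend()
plt.show()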
So, since the autodiff result and the finite-difference result agree up to a scaling constant, this means that autodiff is not computing the partial derivative dy/dx1, but only the total derivative, plotted against one of the variables.
So, my question remains: how do I compute the partial derivatives?
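(One way to make the partial-versus-total distinction above concrete, as a sketch of mine under the assumption that dy_dx[:, 0] and dy_dx[:, 1] are the network's ∂y/∂x1 and ∂y/∂x2 in input order: since the dataset samples the line x2 = 1.5*x1, the chain rule along that line gives dy/dx1 along the line = ∂y/∂x1 + 1.5*∂y/∂x2, which is what a finite difference over the sampled points estimates and can be compared with directly.)
# Sketch: total derivative along the sampled line x2 = 1.5 * x1, built from the
# autodiff columns via the chain rule (assumes column order [d/dx1, d/dx2]).
total_along_line = dy_dx[:, 0] + 1.5 * dy_dx[:, 1]

plt.figure(figsize=(5, 3), dpi=190)
plt.plot(x1, total_along_line, label='chain rule from autodiff columns')
plt.plot(x1, dy_numeric, label='finite differences along the line')  # dy_numeric from the sketch above
plt.xlabel('$x_1$')
plt.legend()
plt.show()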