Partial derivatives of a neural network output with respect to its inputs
I have trained a deep neural network for regression, with 2 input neurons, 1 output neuron and some hidden layers, as in the following (TensorFlow 2):
import numpy as np
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import losses
import tensorflow as tf
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt  # needed for the plots below
#Creation of a "synthetic" dataset
x1 = np.linspace(0, 6*np.pi, 2000)
x2 = 1.5 * np.linspace(0, 6*np.pi, 2000)
y = np.sin(x1) + np.cos(x2)
data = pd.DataFrame(np.array([x1, x2, y]).transpose(), columns = ['x1', 'x2', 'y'])
# train/test split and definition of the normalization over the training set
train_df, test_df = train_test_split(data, test_size=0.2, random_state=0)
normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(np.array(train_df.iloc[:, :-1]))
#Definition of the DNN structure
def build_and_compile_model(norm):
    model = keras.Sequential([
        norm,
        layers.Dense(64, input_dim=2, activation='LeakyReLU'),
        layers.Dense(64, activation='LeakyReLU'),
        layers.Dense(32, activation='LeakyReLU'),
        layers.Dense(32, activation='LeakyReLU'),
        layers.Dense(16, activation='LeakyReLU'),
        layers.Dense(16, activation='LeakyReLU'),
        layers.Dense(8, activation='LeakyReLU'),
        layers.Dense(1, activation='linear')
    ])
    model.compile(loss='mean_absolute_error',
                  optimizer=tf.keras.optimizers.Adam(0.001))
    return model
model = build_and_compile_model(normalizer)
%%time
# Training of the DNN
history = model.fit(
    train_df.iloc[:, :-1],
    train_df.iloc[:, -1],
    validation_split=0.2,
    verbose=2, epochs=100)
Now, if y is the prediction of the network, I want to compute the partial derivatives dy/dx1 and dy/dx2. To achieve this, I have tried
x = tf.constant(data.iloc[:, :-1].values)
with tf.GradientTape(persistent=True) as t:
    t.watch(x)
    y = model(x)
dy_dx = t.gradient(y, x)
dy_dx.numpy()
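(As a side note of mine, not part of the original attempt: the per-input partials can also be extracted explicitly with tf.GradientTape.batch_jacobian, which for a single output neuron returns one row of partials per sample. This is only a sketch for reference.)
with tf.GradientTape(persistent=True) as t2:
    t2.watch(x)
    y2 = model(x)
# batch_jacobian has shape (n_samples, n_outputs, n_inputs) = (2000, 1, 2) here
jac = t2.batch_jacobian(y2, x)
dy_dx1 = jac[:, 0, 0].numpy()  # partial derivative of the output w.r.t. x1, per sample
dy_dx2 = jac[:, 0, 1].numpy()  # partial derivative of the output w.r.t. x2, per sample
With one output neuron, these should coincide with the columns dy_dx[:, 0] and dy_dx[:, 1] above.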
If I plot y as a function of x1 (or of x2) and compare it with the analytical result from the definition given above, I get good agreement:
plt.figure(figsize = (5, 3), dpi = 190)
plt.plot(x1, model.predict(x), label = 'model prediction')
plt.plot(x1, np.sin(x1) + np.cos(x2), label = 'analytical result')
plt.xlabel('$x_1$')
plt.legend()
plt.show()
On the contrary, if I plot the first column of the vector dy_dx and compare it with the analytical derivative (dy/dx1 = cos(x1)), they do not match (the situation is similar for the other partial derivative):
plt.figure(figsize = (5, 3), dpi = 190)
plt.plot(x1, dy_dx[:, 0], label = 'autodiff result')
plt.plot(x1, np.cos(x1), label = 'analytical result')
plt.xlabel('$x_1$')
plt.legend()
plt.show()
If I compare this gradient with finite differences, I get
plt.figure(figsize = (5, 3), dpi = 190)
plt.plot(x1, dy_dx[:, 0], label = 'autodiff result')
plt.plot(x1[0:-1], np.diff(y.numpy()[:, 0])/.1e-1, label = 'finite differences')
plt.xlabel('$x_1$')
plt.legend()
plt.show()
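(As a side check of my own: the hard-coded step .1e-1 only approximates the actual grid spacing of x1, which is 6*pi/1999 ≈ 0.0094. A sketch using the true spacing and central differences would be:)
dx1 = x1[1] - x1[0]  # actual grid spacing, ~0.0094 rather than 0.01
dy_numeric = np.gradient(y.numpy()[:, 0], dx1)  # central finite differences along the sampled points

plt.figure(figsize=(5, 3), dpi=190)
plt.plot(x1, dy_dx[:, 0], label='autodiff result')
plt.plot(x1, dy_numeric, label='finite differences (np.gradient)')
plt.xlabel('$x_1$')
plt.legend()
plt.show()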
So, since the autodiff result and the finite-difference result agree up to a scaling constant, this means that autodiff is not computing the partial derivative dy/dx1, but only the total derivative, plotted against one of the variables.
So, my question remains: how do I compute the partial derivatives?
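(One way to make the partial-versus-total distinction above concrete, as a sketch of mine under the assumption that dy_dx[:, 0] and dy_dx[:, 1] are the network's ∂y/∂x1 and ∂y/∂x2 in input order: since the dataset samples the line x2 = 1.5*x1, the chain rule along that line gives dy/dx1 along the line = ∂y/∂x1 + 1.5*∂y/∂x2, which is what a finite difference over the sampled points estimates and can be compared with directly.)
# Sketch: total derivative along the sampled line x2 = 1.5 * x1, built from the
# autodiff columns via the chain rule (assumes column order [d/dx1, d/dx2]).
total_along_line = dy_dx[:, 0] + 1.5 * dy_dx[:, 1]

plt.figure(figsize=(5, 3), dpi=190)
plt.plot(x1, total_along_line, label='chain rule from autodiff columns')
plt.plot(x1, dy_numeric, label='finite differences along the line')  # dy_numeric from the sketch above
plt.xlabel('$x_1$')
plt.legend()
plt.show()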