How do I use GradientTape to calculate the gradient of each element in a tensor?
I would like to calculate the gradient of each element in a tensor with respect to a list of watched tensors.
When I use GradientTape's gradient() on y directly, the resulting dy_dx has the dimension of my x. For example:
import tensorflow as tf

x = [ tf.constant(3.0), tf.constant(4.0), tf.constant(5.0) ]

with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y_as_list = [ x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0] ]
    y_as_tensor = tf.stack(y_as_list, axis=0)

print("---------------------------")
print("x:", x)
print("y:", y_as_tensor)
print("y:", y_as_list)

dy_dx_from_tensor = g.gradient(y_as_tensor, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
dy_dx_from_list = g.gradient(y_as_list, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)

print("---------------------------")
print("dy_dx_from_tensor:", dy_dx_from_tensor)
print("dy_dx_from_list:", dy_dx_from_list)
results in:
---------------------------
x: [<tf.Tensor: shape=(), dtype=float32, numpy=3.0>, <tf.Tensor: shape=(), dtype=float32, numpy=4.0>, <tf.Tensor: shape=(), dtype=float32, numpy=5.0>]
y: tf.Tensor([ 60. 180.], shape=(2,), dtype=float32)
y: [<tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=180.0>]
---------------------------
dy_dx_from_tensor: [<tf.Tensor: shape=(), dtype=float32, numpy=140.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=48.0>]
dy_dx_from_list: [<tf.Tensor: shape=(), dtype=float32, numpy=140.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=48.0>]
Note that both the tensor and the list versions' results have the same dimension as the watched x.
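These numbers can be checked by hand; the quick sketch below (plain Python, nothing TensorFlow-specific) reproduces them, which is consistent with gradient() summing over the elements of y when the target is not a scalar:

# Hand-computed per-element gradients of y0 = x0*x1*x2 and y1 = x0**2*x1*x2
# at x = (3.0, 4.0, 5.0).
dy0_dx = [ 4.0*5.0, 3.0*5.0, 3.0*4.0 ]              # [20.0, 15.0, 12.0]
dy1_dx = [ 2*3.0*4.0*5.0, 3.0**2*5.0, 3.0**2*4.0 ]  # [120.0, 45.0, 36.0]
print([ a + b for a, b in zip(dy0_dx, dy1_dx) ])    # [140.0, 60.0, 48.0]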
When I try to call the tape's gradient method for each element, I get what I want for the list, but for the tensor all the gradients are zero:
dy_dx_from_tensor_elements = [ g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in y_as_tensor ]
dy_dx_from_list_elements = [ g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO) for y_i in y_as_list ]
print("---------------------------")
print("dy_dx_from_tensor_elements:", dy_dx_from_tensor_elements)
print("dy_dx_from_list_elements:", dy_dx_from_list_elements)
yields:
dy_dx_from_tensor_elements: [[<tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>], [<tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0>]]
dy_dx_from_list_elements: [[<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=15.0>, <tf.Tensor: shape=(), dtype=float32, numpy=12.0>], [<tf.Tensor: shape=(), dtype=float32, numpy=120.0>, <tf.Tensor: shape=(), dtype=float32, numpy=45.0>, <tf.Tensor: shape=(), dtype=float32, numpy=36.0>]]
The dy_dx_from_list_elements values are what I am looking for, but I would really like to be able to get them from the tensor, because my real-world model outputs the y values as a tensor.
Any suggestion to how I could generate gradients for every element in a tensor would be much appreciated!
Comments (1)
I think the problem is coming from iterating over a tensor. A tf.unstack or similar operation might be running internally, and all tf operations need to be within the scope of the gradient tape for them to be taken into account. Gradients will be calculated only for a tensor in relation to another tensor that was involved in its calculation. A couple of examples:
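As a minimal sketch of the idea (reusing the x and y from the question; an illustration, not a definitive implementation): iterating over the stacked tensor while the tape is still recording keeps each element connected to x, whereas doing the same iteration after the block has exited does not:

import tensorflow as tf

x = [ tf.constant(3.0), tf.constant(4.0), tf.constant(5.0) ]

with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y_as_tensor = tf.stack([ x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0] ], axis=0)
    # The implicit unstack from iterating runs while the tape is recording,
    # so each y_i stays connected to x.
    y_elems_inside = [ y_i for y_i in y_as_tensor ]

# The same iteration outside the context manager is not recorded,
# so the resulting tensors are unconnected to x.
y_elems_outside = [ y_i for y_i in y_as_tensor ]

print([ g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
        for y_i in y_elems_inside ])   # per-element gradients: [20, 15, 12] and [120, 45, 36]
print([ g.gradient(y_i, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
        for y_i in y_elems_outside ])  # all zeros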
The same applies when you, for example, use tf.split:
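Again a rough sketch under the same assumptions (same x as above; the split sizes are only for illustration):

import tensorflow as tf

x = [ tf.constant(3.0), tf.constant(4.0), tf.constant(5.0) ]

with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y_as_tensor = tf.stack([ x[0]*x[1]*x[2], x[0]*x[1]*x[2]*x[0] ], axis=0)
    # Splitting while the tape is recording keeps each piece connected to x.
    y_split_inside = tf.split(y_as_tensor, num_or_size_splits=2, axis=0)

# Splitting after the tape has stopped recording leaves the pieces unconnected to x.
y_split_outside = tf.split(y_as_tensor, num_or_size_splits=2, axis=0)

print([ g.gradient(piece, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
        for piece in y_split_inside ])   # per-element gradients
print([ g.gradient(piece, x, unconnected_gradients=tf.UnconnectedGradients.ZERO)
        for piece in y_split_outside ])  # all zeros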
According to the docs, operations are recorded by the tape only if they are executed within its context manager and at least one of their inputs is being watched.

Also, tf.stack is generally not differentiable.