A question about TensorFlow model reproducibility
I am currently developing a TensorFlow model and have run into a problem with its reproducibility.
I built a simple dense model whose weights are initialized with a constant value, and trained it on dummy data.
import tensorflow as tf

weight_init = tf.keras.initializers.Constant(value=0.001)

inputs = tf.keras.Input(shape=(5,))
layer1 = tf.keras.layers.Dense(5,
                               activation=tf.nn.leaky_relu,
                               kernel_initializer=weight_init,
                               bias_initializer=weight_init)
outputs = layer1(inputs)

model = tf.keras.Model(inputs=inputs, outputs=outputs, name="test")
model.compile(loss='mse', optimizer='Adam')

model.fit([[111, 1.02, -1.98231, 1, 1],
           [112, 1.02, -1.98231, 1, 1],
           [113, 1.02, -1.98231, 1, 1],
           [114, 1.02, -1.98231, 1, 1],
           [115, 1.02, -1.98231, 1, 1],
           [116, 1.02, -1.98231, 1, 1],
           [117, 1.02, -1.98231, 1, 1]],
          [1, 1, 1, 1, 1, 1, 2], epochs=3, batch_size=1)
Even though I set the model's initial weights to 0.001, the training loss changes on every run...
What am I missing? Is there some other value I need to fix?
Even more surprisingly, if I change batch_size to 16, the loss no longer changes between runs.
...please teach me, guys...
1 Answer
Since keras.Model.fit() has the default kwarg shuffle=True, the data is shuffled across batches. If you change batch_size to an integer larger than the data length, the shuffle has no effect, because only one batch remains. So adding shuffle=False to model.fit() achieves reproducibility here.

Additionally, as your model grows bigger, the real reproducibility problem arises: the results of two successive runs will differ slightly even though you use no randomness or shuffling, but just click run, then click run again. We describe this as determinism rather than reproducibility. Determinism is a good question that is easily overlooked by many users. Let's start with the conclusion: reproducibility is influenced by operation_seed + hidden_global_seed.

How to do it? TensorFlow's determinism documentation states this precisely: add the following code before building or restoring the model.
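The code the answer refers to was omitted here; a minimal sketch of the setup TensorFlow recommends (assuming TF 2.9 or later, where both calls exist; the seed value 42 is arbitrary):

```python
import tensorflow as tf

# Set the global seed and the per-operation seeds in one call,
# covering Python's random, NumPy, and TensorFlow.
tf.keras.utils.set_random_seed(42)

# Force TensorFlow to select deterministic kernel implementations.
# Note: this can slow training down significantly.
tf.config.experimental.enable_op_determinism()
```

Both calls must run before any model is built or restored, since ops created earlier may already have non-deterministic kernels bound.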
But use it only if you rely heavily on reproducibility, since tf.config.experimental.enable_op_determinism() will reduce speed significantly. The deeper reason is that hardware sacrifices some precision in order to speed up computation, which usually does not affect our algorithms. In deep learning, however, models are very large, so rounding errors occur easily, and training cycles are very long, so those errors accumulate. In a regression model any extra error may be unacceptable, so we need deterministic algorithms in that situation.
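Putting the first fix together with the question's own model, a minimal sketch (same toy data as above; the key change is shuffle=False, plus seeding everything for good measure):

```python
import tensorflow as tf

tf.keras.utils.set_random_seed(0)  # seed Python, NumPy and TF RNGs

weight_init = tf.keras.initializers.Constant(value=0.001)
inputs = tf.keras.Input(shape=(5,))
outputs = tf.keras.layers.Dense(
    5,
    activation=tf.nn.leaky_relu,
    kernel_initializer=weight_init,
    bias_initializer=weight_init,
)(inputs)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="test")
model.compile(loss="mse", optimizer="Adam")

x = [[111 + i, 1.02, -1.98231, 1, 1] for i in range(7)]
y = [1, 1, 1, 1, 1, 1, 2]

# shuffle=False removes the cross-batch shuffling that made each run differ.
history = model.fit(x, y, epochs=3, batch_size=1, shuffle=False, verbose=0)
```

With shuffling disabled, batches arrive in the same order every run, so the per-epoch losses repeat across runs on the same hardware.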