Layer Normalization in training vs. evaluation mode in TensorFlow/Keras



I am trying to understand how Layer Normalization behaves in training vs. evaluation mode. I thought it works like Batch Normalization, which normalizes with the current batch's statistics during training and with its accumulated moving statistics during evaluation (gamma and beta being the learned scale and offset). Hence I was expecting different results with training=True vs. training=False, but the results are the same. Am I missing something?
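For reference, this is a rough NumPy sketch of the mode-dependent behavior I expected, based on my understanding of Batch Normalization (illustration only, not the real layer; the actual layer also updates the moving statistics during training, and eps here matches the Keras default of 1e-3):

import numpy as np

def batch_norm_sketch(x, gamma, beta, moving_mean, moving_var, training, eps=1e-3):
    if training:
        # Training mode: normalize with the current batch's statistics.
        mean, var = x.mean(axis=0), x.var(axis=0)
    else:
        # Inference mode: normalize with the accumulated moving statistics.
        mean, var = moving_mean, moving_var
    return gamma * (x - mean) / np.sqrt(var + eps) + beta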

Here is the working code:

import tensorflow as tf

mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

tf.random.set_seed(42)

class MLPModel(tf.keras.Model):
    def __init__(self):
        super(MLPModel, self).__init__()
        # input_shape is unnecessary on Flatten in a subclassed model.
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = tf.keras.layers.Dense(128, activation='relu')
        self.dense2 = tf.keras.layers.Dense(10)
        self.norm = tf.keras.layers.LayerNormalization()

    def call(self, inputs, training=None):
        x = self.flatten(inputs)
        x = self.dense1(x)
        x = self.dense2(x)
        # Pass the training flag through explicitly so the normalization
        # layer sees the mode requested at call time.
        x = self.norm(x, training=training)
        return x

model = MLPModel()

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])

model.fit(x_train, y_train, epochs=2)
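Listing the normalization layer's variables after training, I only see the trainable gamma and beta; there is no moving_mean/moving_variance state the way a BatchNormalization layer would have:

# LayerNormalization carries only the trainable gamma and beta;
# there are no non-trainable moving statistics to inspect.
for w in model.norm.weights:
    print(w.name, w.shape, w.trainable)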

When I call the model with training=True vs. training=False, the results are the same. But when I replace the layer with BatchNormalization, the results differ as expected.

# Both calls return identical outputs, regardless of the training flag.
print(model(x_train[:1], training=False).numpy())
print(model(x_train[:1], training=True).numpy())
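For completeness, this is roughly how I ran the BatchNorm comparison (a hypothetical sketch; MLPModelBN and bn_model are just names for the swapped-in variant):

import numpy as np

class MLPModelBN(MLPModel):
    # Same MLP, but with BatchNormalization swapped in for the comparison.
    def __init__(self):
        super().__init__()
        self.norm = tf.keras.layers.BatchNormalization()

bn_model = MLPModelBN()
bn_model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
bn_model.fit(x_train, y_train, epochs=2)

# Here the two modes genuinely differ: training=True normalizes with the
# statistics of this one sample, while training=False uses the moving
# averages accumulated during fit.
out_eval = bn_model(x_train[:1], training=False).numpy()
out_train = bn_model(x_train[:1], training=True).numpy()
print(np.allclose(out_eval, out_train))  # expected: False, the modes differ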
