Layer Normalization training vs. evaluation mode in TensorFlow/Keras
I am trying to understand how Layer Normalization behaves in training vs. evaluation mode. I thought it was similar to Batch Normalization (training updates the beta/gamma values, and evaluation uses their global values), so I was expecting different results with training=True vs. training=False, but the results are the same. Am I missing something?
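One way to probe this (a minimal sketch, assuming the default layer arguments) is to inspect the variables each layer creates: LayerNormalization holds only the learned gamma/beta, while BatchNormalization additionally tracks moving_mean/moving_variance, which are what training=False switches to:

import tensorflow as tf

ln = tf.keras.layers.LayerNormalization()
ln.build((None, 10))
# Only the learned scale/offset; no running statistics are kept.
print([w.name for w in ln.weights])

bn = tf.keras.layers.BatchNormalization()
bn.build((None, 10))
# gamma/beta plus moving_mean/moving_variance (exact names vary by version).
print([w.name for w in bn.weights])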
Here is the working code:
import tensorflow as tf

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel values to [0, 1]
x_train, x_test = x_train / 255.0, x_test / 255.0
tf.random.set_seed(42)

class MLPModel(tf.keras.Model):
    def __init__(self):
        super(MLPModel, self).__init__()
        self.flatten = tf.keras.layers.Flatten(input_shape=(28, 28))
        self.dense1 = tf.keras.layers.Dense(128, activation='relu')
        self.dense2 = tf.keras.layers.Dense(10)
        self.norm = tf.keras.layers.LayerNormalization()

    def call(self, inputs):
        x = self.flatten(inputs)
        x = self.dense1(x)
        x = self.dense2(x)
        # Normalize the logits across the feature axis
        x = self.norm(x)
        return x

model = MLPModel()
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer='adam', loss=loss_fn, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=2)
When I use training=True vs. training=False, the results are the same. But when I replace the LayerNormalization layer with BatchNormalization, the results are different, as expected.
model(x_train[:1], training=False).numpy()
model(x_train[:1], training=True).numpy()
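For completeness, here is a minimal check (assuming the model defined above) that the two calls really do match for LayerNormalization; swapping self.norm for tf.keras.layers.BatchNormalization() and rerunning it should make the two outputs diverge, since BatchNormalization uses batch statistics when training=True and moving averages when training=False:

import numpy as np

out_eval = model(x_train[:1], training=False).numpy()
out_train = model(x_train[:1], training=True).numpy()
# LayerNormalization computes per-sample statistics from the input itself
# in both modes, so the two outputs are expected to be identical.
print(np.allclose(out_eval, out_train))  # True with LayerNormalization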