批量尺寸> 1使用TensorFlow 1.x给出了错误
我正在使用这个 vae。
我做出的唯一区别是将损失从二进制交叉熵更改为MSE,如这样:
class OptimizerVAE(object):
def __init__(self, model, learning_rate=1e-3):
"""
OptimizerVAE initializer
:param model: a model object
:param learning_rate: float, learning rate of the optimizer
"""
# binary cross entropy error
self.bce = tf.keras.losses.mse(model.x, model.logits)
self.reconstruction_loss = tf.reduce_mean(tf.reduce_sum(self.bce, axis=-1))
if model.distribution == 'normal':
# KL divergence between normal approximate posterior and standard normal prior
self.p_z = tf.distributions.Normal(tf.zeros_like(model.z), tf.ones_like(model.z))
kl = model.q_z.kl_divergence(self.p_z)
self.kl = tf.reduce_mean(tf.reduce_sum(kl, axis=-1))*0.1
elif model.distribution == 'vmf':
# KL divergence between vMF approximate posterior and uniform hyper-spherical prior
self.p_z = HypersphericalUniform(model.z_dim - 1, dtype=model.x.dtype)
kl = model.q_z.kl_divergence(self.p_z)
self.kl = tf.reduce_mean(kl)*0.1
else:
raise NotImplemented
self.ELBO = - self.reconstruction_loss - self.kl
self.train_step = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(-self.ELBO)
self.print = {'recon loss': self.reconstruction_loss, 'ELBO': self.ELBO, 'KL': self.kl}
运行原始体系结构时,该型号的运行完美(2 MLP层),无论批处理的大小如何(指定为“无”, GitHub代码)。
我正在尝试将其更改为卷积模型,但是当我仅更改编码器时:
def _encoder(self, x):
"""
Encoder network
:param x: placeholder for input
:return: tuple `(z_mean, z_var)` with mean and concentration around the mean
"""
# 2 hidden layers encoder
#h0 = tf.layers.dense(x, units=self.h_dim * 2, activation=self.activation)
#h1 = tf.layers.dense(h0, units=self.h_dim, activation=self.activation)
h1 = tf.layers.conv1d(x, filters = 32, kernel_size = 7, activation = tf.nn.relu)
h1 = tf.layers.conv1d(h1, filters = 64, kernel_size = 7, activation =tf.nn.relu)
h1 = tf.layers.conv1d(h1, filters = 64, kernel_size = 7, activation = tf.nn.relu)
h1 = tf.layers.flatten(h1)
h1 = tf.layers.dense(h1, 32, activation = tf.nn.relu)
if self.distribution == 'normal':
# compute mean and std of the normal distribution
z_mean = tf.layers.dense(h1, units=self.z_dim, activation=None, name = 'z_output')
z_var = tf.layers.dense(h1, units=self.z_dim, activation=tf.nn.softplus)
elif self.distribution == 'vmf':
# compute mean and concentration of the von Mises-Fisher
z_mean = tf.layers.dense(h1, units=self.z_dim, activation=lambda x: tf.nn.l2_normalize(x, axis=-1))
# the `+ 1` prevent collapsing behaviors
z_var = tf.layers.dense(h1, units=1, activation=tf.nn.softplus) + 1
else:
raise NotImplemented
return z_mean, z_var
运行模型时,我会得到错误:
InvalidArgumentError: Incompatible shapes: [32,1] vs. [32,512,1]
[[{{node gradients/SquaredDifference_grad/BroadcastGradientArgs}}]]
32是运行模型时的batch_size。令人困惑的是,当我用batch_size = 1运行此操作时,模型运行!
这出错了哪里?是优化器和平均方式吗?
I am using this example of a VAE.
The only difference I made was change the loss from binary cross entropy to MSE, like this:
class OptimizerVAE(object):
def __init__(self, model, learning_rate=1e-3):
"""
OptimizerVAE initializer
:param model: a model object
:param learning_rate: float, learning rate of the optimizer
"""
# binary cross entropy error
self.bce = tf.keras.losses.mse(model.x, model.logits)
self.reconstruction_loss = tf.reduce_mean(tf.reduce_sum(self.bce, axis=-1))
if model.distribution == 'normal':
# KL divergence between normal approximate posterior and standard normal prior
self.p_z = tf.distributions.Normal(tf.zeros_like(model.z), tf.ones_like(model.z))
kl = model.q_z.kl_divergence(self.p_z)
self.kl = tf.reduce_mean(tf.reduce_sum(kl, axis=-1))*0.1
elif model.distribution == 'vmf':
# KL divergence between vMF approximate posterior and uniform hyper-spherical prior
self.p_z = HypersphericalUniform(model.z_dim - 1, dtype=model.x.dtype)
kl = model.q_z.kl_divergence(self.p_z)
self.kl = tf.reduce_mean(kl)*0.1
else:
raise NotImplemented
self.ELBO = - self.reconstruction_loss - self.kl
self.train_step = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(-self.ELBO)
self.print = {'recon loss': self.reconstruction_loss, 'ELBO': self.ELBO, 'KL': self.kl}
and when running the original architecture, the model runs perfectly (2 MLP layers), no matter the size of the batches (specified as "None" in the github code).
I am trying to change this to a convolutional model, but when I change just the encoder to this:
def _encoder(self, x):
"""
Encoder network
:param x: placeholder for input
:return: tuple `(z_mean, z_var)` with mean and concentration around the mean
"""
# 2 hidden layers encoder
#h0 = tf.layers.dense(x, units=self.h_dim * 2, activation=self.activation)
#h1 = tf.layers.dense(h0, units=self.h_dim, activation=self.activation)
h1 = tf.layers.conv1d(x, filters = 32, kernel_size = 7, activation = tf.nn.relu)
h1 = tf.layers.conv1d(h1, filters = 64, kernel_size = 7, activation =tf.nn.relu)
h1 = tf.layers.conv1d(h1, filters = 64, kernel_size = 7, activation = tf.nn.relu)
h1 = tf.layers.flatten(h1)
h1 = tf.layers.dense(h1, 32, activation = tf.nn.relu)
if self.distribution == 'normal':
# compute mean and std of the normal distribution
z_mean = tf.layers.dense(h1, units=self.z_dim, activation=None, name = 'z_output')
z_var = tf.layers.dense(h1, units=self.z_dim, activation=tf.nn.softplus)
elif self.distribution == 'vmf':
# compute mean and concentration of the von Mises-Fisher
z_mean = tf.layers.dense(h1, units=self.z_dim, activation=lambda x: tf.nn.l2_normalize(x, axis=-1))
# the `+ 1` prevent collapsing behaviors
z_var = tf.layers.dense(h1, units=1, activation=tf.nn.softplus) + 1
else:
raise NotImplemented
return z_mean, z_var
and when running the model, I get the error:
InvalidArgumentError: Incompatible shapes: [32,1] vs. [32,512,1]
[[{{node gradients/SquaredDifference_grad/BroadcastGradientArgs}}]]
32 is the batch_size when running the model. The thing that is confusing me is when I run this with batch_size = 1, the model runs!
Where is this going wrong? is it the optimizer and the way it averages?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我通过以形式重塑解码器的输出来解决问题:( win_size,1),因为MLP未能在其中添加额外的dim'n!
I solved the issue by reshaping the output from the decoder in form: (win_size, 1), since the MLP fails to add that extra dim'n in!