Batch size > 1 gives an error with TensorFlow 1.x

Posted 2025-01-24 07:52:02

I am using this example of a VAE.

The only change I made was switching the loss from binary cross-entropy to MSE, like this:

class OptimizerVAE(object):

    def __init__(self, model, learning_rate=1e-3):
        """
        OptimizerVAE initializer
        :param model: a model object
        :param learning_rate: float, learning rate of the optimizer
        """

        # mean squared error (changed from the original binary cross-entropy)
        self.bce = tf.keras.losses.mse(model.x, model.logits)
        self.reconstruction_loss = tf.reduce_mean(tf.reduce_sum(self.bce, axis=-1))

        if model.distribution == 'normal':
            # KL divergence between normal approximate posterior and standard normal prior
            self.p_z = tf.distributions.Normal(tf.zeros_like(model.z), tf.ones_like(model.z))
            kl = model.q_z.kl_divergence(self.p_z)
            self.kl = tf.reduce_mean(tf.reduce_sum(kl, axis=-1)) * 0.1
        elif model.distribution == 'vmf':
            # KL divergence between vMF approximate posterior and uniform hyper-spherical prior
            self.p_z = HypersphericalUniform(model.z_dim - 1, dtype=model.x.dtype)
            kl = model.q_z.kl_divergence(self.p_z)
            self.kl = tf.reduce_mean(kl) * 0.1
        else:
            raise NotImplementedError

        self.ELBO = -self.reconstruction_loss - self.kl

        self.train_step = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(-self.ELBO)

        self.print = {'recon loss': self.reconstruction_loss, 'ELBO': self.ELBO, 'KL': self.kl}
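
For reference, here is a minimal shape sanity check for this MSE term (a sketch assuming TF 1.x graph mode and a window length of 512; x and logits below are placeholder stand-ins for model.x and model.logits). tf.keras.losses.mse averages over the last axis, so the reconstruction sum only behaves as intended when both tensors have the same static shape.

import tensorflow as tf  # TF 1.x, graph mode

# hypothetical placeholders standing in for model.x and model.logits,
# both shaped [batch, win_size, 1] so no broadcasting is involved
x = tf.placeholder(tf.float32, [None, 512, 1])
logits = tf.placeholder(tf.float32, [None, 512, 1])

per_step = tf.keras.losses.mse(x, logits)                 # averages over the last axis -> shape [None, 512]
recon = tf.reduce_mean(tf.reduce_sum(per_step, axis=-1))  # scalar, as in OptimizerVAE above
print(per_step.get_shape().as_list())                     # [None, 512]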

When running the original architecture (2 MLP layers), the model runs perfectly no matter the batch size (the batch dimension is specified as "None" in the GitHub code).

I am trying to change this to a convolutional model, but when I change just the encoder to this:

def _encoder(self, x):
    """
    Encoder network
    :param x: placeholder for input
    :return: tuple `(z_mean, z_var)` with mean and concentration around the mean
    """

    # original 2-hidden-layer MLP encoder, kept for reference
    # h0 = tf.layers.dense(x, units=self.h_dim * 2, activation=self.activation)
    # h1 = tf.layers.dense(h0, units=self.h_dim, activation=self.activation)

    # convolutional encoder: conv1d expects a rank-3 input [batch, steps, channels]
    h1 = tf.layers.conv1d(x, filters=32, kernel_size=7, activation=tf.nn.relu)
    h1 = tf.layers.conv1d(h1, filters=64, kernel_size=7, activation=tf.nn.relu)
    h1 = tf.layers.conv1d(h1, filters=64, kernel_size=7, activation=tf.nn.relu)
    h1 = tf.layers.flatten(h1)
    h1 = tf.layers.dense(h1, 32, activation=tf.nn.relu)

    if self.distribution == 'normal':
        # compute mean and std of the normal distribution
        z_mean = tf.layers.dense(h1, units=self.z_dim, activation=None, name='z_output')
        z_var = tf.layers.dense(h1, units=self.z_dim, activation=tf.nn.softplus)
    elif self.distribution == 'vmf':
        # compute mean and concentration of the von Mises-Fisher
        z_mean = tf.layers.dense(h1, units=self.z_dim, activation=lambda x: tf.nn.l2_normalize(x, axis=-1))
        # the `+ 1` prevents collapsing behaviors
        z_var = tf.layers.dense(h1, units=1, activation=tf.nn.softplus) + 1
    else:
        raise NotImplementedError

    return z_mean, z_var

and then, when running the model, I get this error:

InvalidArgumentError: Incompatible shapes: [32,1] vs. [32,512,1]
 [[{{node gradients/SquaredDifference_grad/BroadcastGradientArgs}}]]

Here, 32 is the batch_size used when running the model. What confuses me is that when I run this with batch_size = 1, the model runs!

Where is this going wrong? Is it the optimizer and the way it averages?
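
The two operand shapes in the error message do not look broadcast-compatible at batch size 32, while the corresponding shapes at batch size 1 are, which seems to line up with the behaviour above. A small self-contained check with NumPy (which follows the same broadcasting rules TF uses for SquaredDifference):

import numpy as np

a = np.zeros((32, 1))        # one operand shape from the error message
b = np.zeros((32, 512, 1))   # the other operand shape
try:
    _ = (a - b) ** 2         # same broadcasting rules as SquaredDifference
except ValueError as e:
    print(e)                 # operands could not be broadcast together ...

# with batch size 1 the same ranks do broadcast, so the graph runs
_ = (np.zeros((1, 1)) - np.zeros((1, 512, 1))) ** 2   # -> shape (1, 512, 1)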

Comments (1)

倦话 2025-01-31 07:52:02

I solved the issue by reshaping the output from the decoder to shape (win_size, 1), since the MLP decoder does not add that extra dimension on its own!
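
For reference, a minimal sketch of that reshape in TF 1.x graph mode, assuming win_size = 512 (taken from the error shapes above) and using placeholders as stand-ins for the real input and decoder output:

import tensorflow as tf  # TF 1.x, graph mode

win_size = 512                                               # assumed from the error shapes above
x = tf.placeholder(tf.float32, [None, win_size, 1])          # conv1d input carries a channel axis
decoder_out = tf.placeholder(tf.float32, [None, win_size])   # stand-in for the flat MLP decoder output

# give the reconstruction the same rank as x before the MSE term is built,
# so SquaredDifference no longer has to broadcast mismatched shapes
logits = tf.reshape(decoder_out, [-1, win_size, 1])          # or: tf.expand_dims(decoder_out, axis=-1)
recon = tf.reduce_mean(tf.reduce_sum(tf.keras.losses.mse(x, logits), axis=-1))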
