Handling tensor shapes when using Model.fit()

Published 2025-02-08 16:41:30


I'm implementing Tacotron2 in TensorFlow for my own purposes, and I fail to train it using the Model.fit() method.
My tf.data.Dataset yields two tuples (input and output) per element, as follows: (phonemes, mel_spec), (mel_spec, gates), where phonemes are strings of various sizes, mel_specs are 2D spectrogram tensors of varying length with a fixed number of channels (80 here), and gates are 1D tensors with the same length as mel_spec (representing a stop prediction). Tacotron uses teacher forcing, which is why mel_spec appears in both the input and output tuples.

Since the inputs have different lengths, I use the padded_batch method as follows:

dataset = dataset.padded_batch(batch_size, 
        padding_values=((None, None), (None, 1.)) )

With a padding value of 1 only for the gates. When I check what it returns with print(next(iter(dataset))), everything looks good: shapes, padding, etc. Moreover, when I feed a full batch manually like this:

x, y = next(iter(dataset.padded_batch(batch_size, padding_values=((None, None), (None, 1.)) )))
mels, gates = tac(x)

Everything works, and it returns tensors with the correct shapes.
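For reference, here is a self-contained toy version of the pipeline described above (synthetic data; the exact element values and names are assumptions) showing the same padded_batch behaviour:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for the pipeline: each element is
# ((phonemes, mel_spec), (mel_spec, gates)), with 80 mel channels
# and a variable-length time axis.
def gen():
    for n in (13, 17, 20):
        phonemes = "h e l o"                            # dummy phoneme string
        mel = np.random.rand(80, n).astype(np.float32)  # (channels, time)
        gates = np.zeros(n, dtype=np.float32)           # stop-token targets
        yield (phonemes, mel), (mel, gates)

signature = (
    (tf.TensorSpec((), tf.string), tf.TensorSpec((80, None), tf.float32)),
    (tf.TensorSpec((80, None), tf.float32), tf.TensorSpec((None,), tf.float32)),
)
dataset = tf.data.Dataset.from_generator(gen, output_signature=signature)

# Pad every batch up to its longest element; gates are padded with 1.0,
# everything else with the defaults (0.0 for floats, "" for strings).
batched = dataset.padded_batch(3, padding_values=((None, None), (None, 1.0)))

(x_phon, x_mel), (y_mel, y_gates) = next(iter(batched))
print(x_mel.shape)     # (3, 80, 20) -- padded to the longest sequence
print(y_gates[0][-1])  # padded tail of the shortest element is 1.0
```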

However, it turns out that I can't get through the fit method.
When I do this,

dataset = dataset.padded_batch(batch_size, 
        padding_values=((None, None), (None, 1.)) )

"""
train
"""
optimizer = conf["train"]["optimizer"]
epochs = conf["train"]["epochs"]

tac.compile(optimizer=optimizer, loss=tac.criterion)
tac.fit(dataset, epochs=epochs)

I get:

Tacotron2.py:177 call  *
        crop = mels.shape[2] - mels.shape[2]%self.config["n_frames_per_step"]#max_len must be a multiple of n_frames_per_step
TypeError: unsupported operand type(s) for %: 'NoneType' and 'int'
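This error can be reproduced in isolation (toy function, not the real model): under fit(), call() is traced as a graph function, and an axis that varies between batches has static shape None, so Python's % operator fails on it:

```python
import tensorflow as tf

# The time axis is declared variable-length (None), as it is when fit()
# traces a model fed from a padded dataset.
@tf.function(input_signature=[tf.TensorSpec((None, 80, None), tf.float32)])
def buggy_crop(mels):
    # During tracing, mels.shape[2] is None, so None % 3 raises TypeError.
    return mels[:, :, : mels.shape[2] - mels.shape[2] % 3]

try:
    buggy_crop(tf.zeros((2, 80, 17)))
    raised = False
except TypeError:
    raised = True  # unsupported operand type(s) for %: 'NoneType' and 'int'
```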

and here is my call function:

def call(self, batch, training=False):
    phon, mels = batch
    x = self.tokenizer(phon)
    x = self.char_embedding(x)
    y = self.encoder(x)
    print(y.shape)

    # max_len must be a multiple of n_frames_per_step
    crop = mels.shape[2] - mels.shape[2] % self.config["n_frames_per_step"]
    mels, gates = self.decoder(y, mels[:, :, :crop])

    residual = self.decoder.postnet(mels)
    mels_post = mels + residual
    return (mels, mels_post), gates

It seems that fit is giving an empty tensor to my model, and since I need to access the shape of my tensor in order to crop it, this raises an error. I have no idea why TensorFlow does this; I've mainly worked with PyTorch so far, so I'm still learning the basics of TF. Does anybody know what's wrong with my way of doing it?
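For completeness, here is a trace-safe sketch of just the cropping step that I'm considering (the names and the value 3 for n_frames_per_step are assumptions, not my actual code): tf.shape(...) returns a runtime tensor, which stays defined even when the static shape is None:

```python
import tensorflow as tf

N_FRAMES_PER_STEP = 3  # assumed hyperparameter

@tf.function(input_signature=[tf.TensorSpec((None, 80, None), tf.float32)])
def crop_to_multiple(mels):
    # Dynamic shape: a runtime tensor, defined even while tracing
    # with an unknown (None) time axis.
    length = tf.shape(mels)[2]
    crop = length - length % N_FRAMES_PER_STEP
    return mels[:, :, :crop]

out = crop_to_multiple(tf.zeros((2, 80, 17)))
print(out.shape)  # 17 frames cropped down to 15
```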
