Input layer is incompatible with TensorFlow 2D CNN

Posted 2025-02-05 14:19:46

I'm trying to train a CNN model for a speech emotion recognition task using spectrograms as input. I've reshaped the spectrograms to have the shape (num_frequency_bins, num_time_frames, 1), which I thought would be sufficient, but upon trying to fit the model to the dataset, which is stored in a TensorFlow Dataset, I got the following error:

Input 0 of layer "sequential_12" is incompatible with the layer: expected shape=(None, 257, 1001, 1), found shape=(257, 1001, 1)
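The leading None in the expected shape is the batch dimension. As a minimal illustration (assuming TensorFlow is imported as tf), a single (257, 1001, 1) spectrogram only matches that spec once a batch axis is added, which a tf.data pipeline normally gets from Dataset.batch():

example = tf.zeros((257, 1001, 1))          # one spectrogram
batched = tf.expand_dims(example, axis=0)   # shape (1, 257, 1001, 1)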

I tried reshaping the spectrograms to have the shape (1, num_frequency_bins, num_time_frames, 1), but that produced an error when creating the Sequential model:

ValueError: Exception encountered when calling layer "resizing_14" (type Resizing).

'images' must have either 3 or 4 dimensions.

Call arguments received:
  • inputs=tf.Tensor(shape=(None, 1, 257, 1001, 1), dtype=float32)
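As an illustration of where the extra dimension comes from: Input(shape=...) already prepends a batch axis, so passing a shape that itself starts with 1 leaves the Resizing layer looking at a 5-D tensor:

inputs = tf.keras.Input(shape=(1, 257, 1001, 1))
print(inputs.shape)   # (None, 1, 257, 1001, 1) -- five dimensions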

So I passed the shape in as (num_frequency_bins, num_time_frames, 1) when creating the model, and then fitted the model to the training data with the 4-dimensional data, but that raised this error:

InvalidArgumentError: slice index 0 of dimension 0 out of bounds. [Op:StridedSlice] name: strided_slice/
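One quick way to see what the pipeline is actually feeding the model (a debugging sketch, not part of the original code) is to print the dataset's element_spec before calling fit; an unknown spectrogram shape or a missing batch dimension there often points to errors like the ones above:

print(specgram_train_ds.element_spec)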

So I'm kind of at a loss now. I genuinely have no idea what to do or how to go about fixing this. I've read around but haven't come across anything useful. Would really appreciate any help.

Here's some of the code for context.

dataset = [[specgram_files[i], labels[i]] for i in range(len(specgram_files))]
specgram_files_and_labels_dataset = tf.data.Dataset.from_tensor_slices((specgram_files, labels))

def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # item() returns numpy array of size 1 as a suitable python scalar.
    # data.item() then returns the bytes string stored in the numpy array.
    # decode() is then called on the bytes string to decode it from a bytes string to a regular string
    # so that it can be passed as a parameter in np.load()
    data = np.load(data.item().decode())
    # Shape of data is now (1, rows, columns)
    # Needs to be reshaped to (rows, columns, 1):
    data = np.reshape(data, (data.shape[0], data.shape[1], 1))
    return data.astype(np.float32)

specgram_dataset = specgram_files_and_labels_dataset.map(
                    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
                    num_parallel_calls=tf.data.AUTOTUNE)
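# Note (sketch, not part of the original code): tf.numpy_function returns
# tensors with unknown static shape, so the elements produced by this map
# come out with shape <unknown>. One possible fix, assuming 257 frequency
# bins and 1001 time frames as in the error message above, is to re-attach
# the shape right after loading:
#
#   def load_specgram(file, label):
#       specgram = tf.numpy_function(read_npy_file, [file], tf.float32)
#       specgram.set_shape((257, 1001, 1))
#       return specgram, label
#
#   specgram_dataset = specgram_files_and_labels_dataset.map(
#       load_specgram, num_parallel_calls=tf.data.AUTOTUNE)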

num_files = len(train_df)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)

specgram_dataset.shuffle(buffer_size=1000)
specgram_train_ds = specgram_dataset.take(num_train)
specgram_test_ds = specgram_dataset.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)

batch_size = 32
specgram_train_ds.batch(batch_size)
specgram_val_ds.batch(batch_size)

specgram_train_ds = specgram_train_ds.cache().prefetch(tf.data.AUTOTUNE)
specgram_val_ds = specgram_val_ds.cache().prefetch(tf.data.AUTOTUNE)

for specgram, label in specgram_train_ds.take(1):
    input_shape = specgram.shape

num_emotions = len(train_df["emotion"].unique())

model = models.Sequential([
    layers.Input(shape=input_shape),
    # downsampling the input. 
    layers.Resizing(32, 128),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="softmax"),
    layers.Dense(num_emotions)
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.01),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"]
)

EPOCHS = 10

model.fit(
    specgram_train_ds,
    validation_data=specgram_val_ds,
    epochs=EPOCHS,
    callbacks=tf.keras.callbacks.EarlyStopping(verbose=1, patience=2)
)

Comments (1)

毁我热情 2025-02-12 14:19:46

Assuming you know your input_shape, I would recommend first hard-coding it into your model:

model = models.Sequential([
    layers.Input(shape=(257, 1001, 1)),
    # downsampling the input. 
    layers.Resizing(32, 128),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="softmax"),
    layers.Dense(num_emotions)
])

Also, when using tf.data.Dataset.batch, you should assign the Dataset output to a variable:

batch_size = 32
specgram_train_ds = specgram_train_ds.batch(batch_size)
specgram_val_ds = specgram_val_ds.batch(batch_size)

Afterwards, make sure that specgram_train_ds really does have the correct shape:

specgrams, _ = next(iter(specgram_train_ds.take(1)))
assert specgrams.shape == (32, 257, 1001, 1)
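
Note that shuffle also returns a new dataset rather than shuffling in place, so that call needs to be assigned back as well:

specgram_dataset = specgram_dataset.shuffle(buffer_size=1000)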
