Accuracy and loss stay the same across epochs

Posted on 2025-01-17 05:16:46

I'm working on a project to classify the hand gestures of the game "rock, paper, scissors". I know TensorFlow already provides the "rps dataset", but because of how it was built, classification only works well on a white background.

To generalize better, I created a new dataset of 11,700 images and structured it like the "rps dataset".
[Image: some examples of the images I created for the paper class]

Each class contains 3900 images: 3096 for training, 650 for testing, and 154 for validation. To build my convolutional network, I use TensorFlow/Keras in Google Colab, connected to a local runtime through Jupyter. First, I create three data generators for training, validation, and testing:

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

target_size = (300, 300)
batch_size = 32
num_epochs = 50
num_classes = 3
learning_rate = .001

path_name = 'C:/Users/andre/Desktop/new_dataset_bg'
save_path_name = 'C:/Users/andre/Desktop/rps_model/'
save_history_path_name = save_path_name + "lastModel_history_def.csv"

# data augmentation parameters (training set only)
datagenerator_train = ImageDataGenerator(
    rescale=1. / 255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    brightness_range=(0.3, 1.0),
    horizontal_flip=True,
    vertical_flip=True,
    fill_mode='nearest',
)

#datagenerator for test and validation (normalization only)
datagenerator_test = ImageDataGenerator(rescale=1. / 255)

# create the training, validation, and test sets
train_generator = datagenerator_train.flow_from_directory(
    directory = path_name + "/rps_train",
    target_size = target_size,
    color_mode = "rgb",
    batch_size = batch_size,
    class_mode="categorical",
    shuffle=True,
    seed=42
)

test_generator = datagenerator_test.flow_from_directory(
    directory = path_name + "/rps_test",
    target_size = target_size,
    color_mode = "rgb",
    batch_size = batch_size,
    class_mode="categorical",
    shuffle=False,
)

# validation should use the normalization-only generator, like the test set
val_generator = datagenerator_test.flow_from_directory(
    directory = path_name + "/rps_val",
    target_size = target_size,
    color_mode = "rgb",
    batch_size = batch_size,
    class_mode="categorical",
    shuffle=False,
)

train_steps = train_generator.n // train_generator.batch_size
test_steps = test_generator.n // test_generator.batch_size
val_steps = val_generator.n // val_generator.batch_size
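As a quick consistency check on the numbers above, the per-class split and the `//` step computation can be reproduced in plain Python. This is only arithmetic on the figures stated in the question (3 classes, 3096/650/154 images per class, batch size 32); nothing here touches Keras, and the variable names are illustrative.

```python
num_classes = 3
batch_size = 32
# Per-class split stated in the question
per_class = {"train": 3096, "test": 650, "val": 154}

# Each class should hold 3900 images, the whole dataset 11,700
assert sum(per_class.values()) == 3900
total_images = num_classes * sum(per_class.values())

steps_per_split = {}
for split, n_per_class in per_class.items():
    n = num_classes * n_per_class  # images in this split
    # Same formula as train_steps / test_steps / val_steps: full batches only
    steps_per_split[split] = n // batch_size
    leftover = n - steps_per_split[split] * batch_size  # images after the last full batch
    print(f"{split}: {n} images -> {steps_per_split[split]} steps, {leftover} left over")

print("total images:", total_images)
```

The 290 training steps agree with the `290/290` progress bars in the training log. Using `math.ceil(n / batch_size)` instead of `//` would also cover the final partial batch.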

I chose 32 as the batch size because with larger values the GPU ran out of memory and TensorFlow raised an OOM (out-of-memory) error.
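A rough back-of-the-envelope estimate shows why larger batches exhaust GPU memory quickly. The sketch below counts only the float32 output of the first `Conv2D` layer, shape `(batch, 300, 300, 32)` as in the model summary; it ignores weights, gradients, optimizer state, the other layers, and framework workspace, so real usage is considerably higher. The helper `activation_mib` is hypothetical.

```python
def activation_mib(batch, height, width, channels, bytes_per_value=4):
    """MiB needed for one float32 activation tensor of shape (batch, h, w, c)."""
    return batch * height * width * channels * bytes_per_value / 2**20

# Output of the first Conv2D layer: (None, 300, 300, 32)
for batch in (16, 32, 64, 128):
    print(batch, round(activation_mib(batch, 300, 300, 32), 1), "MiB")
```

Every activation tensor scales linearly with the batch size, so doubling the batch doubles this figure for each layer at once.
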
After that, I created a function that returns the net model:

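from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from tensorflow.keras.optimizers import Adam

def neural_struct():
    newModel = tf.keras.models.Sequential([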
        Conv2D(32, (3,3), activation='relu', padding="same", input_shape=(300, 300, 3)),
        MaxPooling2D(2,2),
        Conv2D(64, (3,3), activation='relu', padding="same"),
        MaxPooling2D(2,2),
        Dropout(0.2),
        Conv2D(64, (3,3), activation='relu', padding="same"),
        MaxPooling2D(2,2),
        Dropout(0.2),
        Conv2D(128, (3,3), activation='relu', padding="same"),
        MaxPooling2D(2,2),
        Dropout(0.2),
        Conv2D(128, (3,3), activation='relu', padding="same"),
        MaxPooling2D(2,2),
        Dropout(0.2),
        Flatten(),
        Dense(512, activation='relu'),
        Dropout(0.2),
        Dense(3, activation='softmax')
    ])
    
    newModel.compile(
        loss='categorical_crossentropy',
        optimizer=Adam(learning_rate=learning_rate),
        metrics=['accuracy']
        )
    
    return newModel

newModel =  neural_struct()
newModel.summary()
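With `padding="same"` the convolutions keep the spatial size, so only the pooling layers shrink it, and `MaxPooling2D` with its default `padding="valid"` rounds down: `(size - pool) // stride + 1`. The sketch below traces height/width through the five conv/pool stages, assuming 2×2 pooling with stride 2 at every stage (as the printed summary shows), and computes the length of the `Flatten` output. The helper `pooled_size` is illustrative.

```python
def pooled_size(size, pool=2, stride=None):
    """Output length along one axis of a 'valid' max-pooling window."""
    stride = pool if stride is None else stride  # Keras defaults strides to pool_size
    return (size - pool) // stride + 1

size = 300
sizes = [size]
for _ in range(5):  # five Conv2D + MaxPooling2D stages
    size = pooled_size(size)
    sizes.append(size)

flatten_len = size * size * 128  # the last Conv2D has 128 filters
print(sizes, flatten_len)
```

For comparison, a 3×3 pool on the 300-pixel input gives `pooled_size(300, pool=3) == 100`, not the 150 listed for `max_pooling2d_44`.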

This is the summary printed by TensorFlow:

Model: "sequential_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_47 (Conv2D)          (None, 300, 300, 32)      896       
                                                                 
 max_pooling2d_44 (MaxPoolin  (None, 150, 150, 32)     0         
 g2D)                                                            
                                                                 
 conv2d_48 (Conv2D)          (None, 150, 150, 64)      18496     
                                                                 
 max_pooling2d_45 (MaxPoolin  (None, 75, 75, 64)       0         
 g2D)                                                            
                                                                 
 dropout_39 (Dropout)        (None, 75, 75, 64)        0         
                                                                 
 conv2d_49 (Conv2D)          (None, 75, 75, 64)        36928     
                                                                 
 max_pooling2d_46 (MaxPoolin  (None, 37, 37, 64)       0         
 g2D)                                                            
                                                                 
 dropout_40 (Dropout)        (None, 37, 37, 64)        0         
                                                                 
 conv2d_50 (Conv2D)          (None, 37, 37, 128)       73856     
                                                                 
 max_pooling2d_47 (MaxPoolin  (None, 18, 18, 128)      0         
 g2D)                                                            
                                                                 
 dropout_41 (Dropout)        (None, 18, 18, 128)       0         
                                                                 
 conv2d_51 (Conv2D)          (None, 18, 18, 128)       147584    
                                                                 
 max_pooling2d_48 (MaxPoolin  (None, 9, 9, 128)        0         
 g2D)                                                            
                                                                 
 dropout_42 (Dropout)        (None, 9, 9, 128)         0         
                                                                 
 flatten_10 (Flatten)        (None, 10368)             0         
                                                                 
 dense_20 (Dense)            (None, 512)               5308928   
                                                                 
 dropout_43 (Dropout)        (None, 512)               0         
                                                                 
 dense_21 (Dense)            (None, 3)                 1539      
                                                                 
=================================================================
Total params: 5,588,227
Trainable params: 5,588,227
Non-trainable params: 0
_________________________________________________________________
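The parameter counts in this summary follow from two formulas: a `Conv2D` layer has `k*k*in_channels*filters + filters` parameters and a `Dense` layer has `inputs*units + units`. Reproducing them by hand is a quick way to confirm which kernel sizes the summarized model used; the helper names below are illustrative.

```python
def conv2d_params(kernel, in_ch, filters):
    # one kernel x kernel x in_ch filter per output channel, plus one bias each
    return kernel * kernel * in_ch * filters + filters

def dense_params(inputs, units):
    # full weight matrix plus one bias per unit
    return inputs * units + units

layers = [
    conv2d_params(3, 3, 32),         # conv2d_47
    conv2d_params(3, 32, 64),        # conv2d_48
    conv2d_params(3, 64, 64),        # conv2d_49
    conv2d_params(3, 64, 128),       # conv2d_50
    conv2d_params(3, 128, 128),      # conv2d_51
    dense_params(9 * 9 * 128, 512),  # dense_20, fed by the 10368-long Flatten
    dense_params(512, 3),            # dense_21
]
print(layers, sum(layers))
```

A 2×2 first kernel would give `conv2d_params(2, 3, 32) == 416`, so the 896 reported for `conv2d_47` corresponds to a 3×3 kernel. About 95% of all parameters sit in `dense_20`.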

Finally, this is the code I used for training (I use early stopping and the GPU to reduce training time):

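from tensorflow.keras.callbacks import EarlyStopping

# stop once val_loss has not improved by more than min_delta for 5 epochs,
# then roll back to the weights of the best epoch
earlyStop = EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5, verbose=1, restore_best_weights=True)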

with tf.device("/device:GPU:0"):
    historyModel = newModel.fit(train_generator,
              steps_per_epoch = train_steps,
              epochs = num_epochs,
              validation_data = val_generator,
              validation_steps = val_steps,
              callbacks=[earlyStop],
              )
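To see why training stops at epoch 6, the improvement rule can be replayed on the `val_loss` values from the log. This is a simplified sketch of how `EarlyStopping(monitor='val_loss', min_delta=1e-3, patience=5)` is understood to behave (an epoch counts as an improvement only if `val_loss` drops more than `min_delta` below the best value so far); it is not the Keras source, and `simulate_early_stopping` is a hypothetical name.

```python
def simulate_early_stopping(val_losses, min_delta=1e-3, patience=5):
    """Return (stop_epoch, best_epoch), both 1-based; stop_epoch is None if never triggered."""
    best = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last real improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:  # improved by more than min_delta
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch

# val_loss values copied from the training log
logged = [1.0990, 1.0997, 1.0984, 1.0988, 1.0984, 1.0984]
print(simulate_early_stopping(logged))
```

The best `val_loss` after epoch 1 (1.0984) is only 0.0006 below epoch 1's 1.0990, less than `min_delta`, so no later epoch counts as an improvement.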

However, when I fit the network, accuracy and loss stay the same across epochs:

Epoch 1/50
290/290 [==============================] - 218s 747ms/step - loss: 1.0995 - accuracy: 0.3313 - val_loss: 1.0990 - val_accuracy: 0.3125
Epoch 2/50
290/290 [==============================] - 226s 778ms/step - loss: 1.0990 - accuracy: 0.3440 - val_loss: 1.0997 - val_accuracy: 0.3125
Epoch 3/50
290/290 [==============================] - 224s 773ms/step - loss: 1.0988 - accuracy: 0.3318 - val_loss: 1.0984 - val_accuracy: 0.3438
Epoch 4/50
290/290 [==============================] - 226s 780ms/step - loss: 1.0988 - accuracy: 0.3280 - val_loss: 1.0988 - val_accuracy: 0.3125
Epoch 5/50
290/290 [==============================] - 227s 784ms/step - loss: 1.0989 - accuracy: 0.3332 - val_loss: 1.0984 - val_accuracy: 0.3438
Epoch 6/50
290/290 [==============================] - ETA: 0s - loss: 1.0988 - accuracy: 0.3296Restoring model weights from the end of the best epoch: 1.
290/290 [==============================] - 227s 784ms/step - loss: 1.0988 - accuracy: 0.3296 - val_loss: 1.0984 - val_accuracy: 0.3438
Epoch 6: early stopping
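The flat values in this log can be checked against two reference points. With 3 classes and categorical cross-entropy, a model that outputs 1/3 for every class has loss -ln(1/3) = ln 3 ≈ 1.0986, and accuracy near 0.33 is chance level. The `val_accuracy` values are also exact fractions of the 14 × 32 = 448 validation images seen per epoch. The sketch assumes that, with `shuffle=False`, validation images arrive in class-folder order (`flow_from_directory` sorts class folders alphabetically; the folder names `paper`, `rock`, `scissors` are assumed) and that each validation pass starts from the first batch.

```python
import math

# Cross-entropy of a uniform prediction over 3 classes
uniform_loss = -math.log(1 / 3)
print(round(uniform_loss, 4))  # compare with the logged loss/val_loss values

per_class_val, val_steps, batch_size = 154, 14, 32
seen = val_steps * batch_size  # images evaluated per epoch

# How many images of each class fall inside the first `seen` images
order = ["paper", "rock", "scissors"]  # assumed alphabetical folder names
counts, remaining = {}, seen
for name in order:
    counts[name] = min(per_class_val, remaining)
    remaining -= counts[name]

# Accuracy of a model that predicts the same single class for every image
for name, count in counts.items():
    print(name, count, "/", seen, "=", round(count / seen, 4))
```

If these assumptions hold, the logged 0.3125 and 0.3438 are exactly the scores of a model predicting a single class for every image, which fits a loss sitting at ln 3.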

To solve the problem, I first tried an architecture with fewer filters per layer. I also switched the optimizer from Adam to RMSprop, but there was no improvement.

After that, I tried shrinking the dataset by removing useless images from each class. I made two attempts: the first reduced dataset contained 5000 images, the second 3000. In both cases, the loss and accuracy values stayed almost the same across epochs. What am I missing?
