TensorFlow predict() is accurate on a single sample, but gives strange results when predicting on a dataset

I'm experiencing strange results when predicting against a trained model.

If I use model.predict() with a dataset, I get spurious results.

If I iterate through the dataset one sample at a time, I get results in line with the accuracy of the trained model.

The model is built in TensorFlow to predict on the Stanford Dogs dataset, which I load like this:

dataset, info = tfds.load(name="stanford_dogs", with_info=True)
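
As an aside, tfds.load() returns a dict of tf.data.Dataset objects keyed by split, plus a tfds DatasetInfo object; the split names and the 120-class label count used below can be read off info directly:

# The splits and the label count (120 breeds) are available on DatasetInfo.
print(info.splits)                          # available splits: 'train', 'test'
print(info.features['label'].num_classes)   # 120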

Although not best practice, I'm using the test dataset for validation:

training_data = dataset['train']
val_data = dataset['test']

I pre-process the dataset like this:

IMG_LEN = 224
IMG_SHAPE = (IMG_LEN,IMG_LEN,3)
N_BREEDS = 120

def preprocess(ds_row):
    image = tf.image.convert_image_dtype(ds_row['image'], dtype=tf.float32)
    image = tf.image.resize(image, (IMG_LEN, IMG_LEN), method='nearest')
    
    label = tf.one_hot(ds_row['label'], N_BREEDS)  # TODO: Can remove one_hot and change loss function
    return image, label

def prepare(dataset, batch_size=None):
    ds = dataset.map(preprocess, num_parallel_calls=4)
    # ds = ds.shuffle(buffer_size=1000)
    if batch_size:
        ds = ds.batch(batch_size)
        ds = ds.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
    return ds

train_batches = prepare(training_data, batch_size=32)
val_batches = prepare(val_data, batch_size=32)
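
As a quick sanity check (a diagnostic sketch, not in the original code), one batch can be pulled from the pipeline to confirm shapes and dtypes:

# Inspect one batch to confirm what prepare() actually emits.
images, labels = next(iter(train_batches))
print(images.shape, images.dtype)  # expected: (32, 224, 224, 3) float32, values in [0, 1]
print(labels.shape, labels.dtype)  # expected: (32, 120) float32 one-hot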

My model is based on MobileNet V2:

base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

base_model.trainable = False

model = tf.keras.Sequential([
  base_model,
  tf.keras.layers.GlobalAveragePooling2D(),
  tf.keras.layers.Dense(N_BREEDS, activation='softmax')
])

model.compile(optimizer=tf.keras.optimizers.Adamax(0.0001),
              loss='categorical_crossentropy',
              metrics=['accuracy', 'top_k_categorical_accuracy'])
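
As the TODO in preprocess() hints, the tf.one_hot step could be dropped by keeping integer labels and switching to the sparse loss and metrics. A minimal sketch of that alternative (not the configuration used for the results below):

# Alternative: integer labels + sparse loss, so tf.one_hot is unnecessary.
def preprocess_sparse(ds_row):
    image = tf.image.convert_image_dtype(ds_row['image'], dtype=tf.float32)
    image = tf.image.resize(image, (IMG_LEN, IMG_LEN), method='nearest')
    return image, ds_row['label']  # raw integer label

model.compile(optimizer=tf.keras.optimizers.Adamax(0.0001),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy', 'sparse_top_k_categorical_accuracy'])

prepare() would then map preprocess_sparse instead of preprocess.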

During training I get the following levels of accuracy:

...
Epoch 25/30
375/375 [==============================] - 17s 45ms/step - loss: 0.9224 - accuracy: 0.7914 - top_k_categorical_accuracy: 0.9777 - val_loss: 1.0945 - val_accuracy: 0.7309 - val_top_k_categorical_accuracy: 0.9486
Epoch 26/30
375/375 [==============================] - 17s 45ms/step - loss: 0.8981 - accuracy: 0.7944 - top_k_categorical_accuracy: 0.9784 - val_loss: 1.0755 - val_accuracy: 0.7331 - val_top_k_categorical_accuracy: 0.9493
Epoch 27/30
375/375 [==============================] - 17s 44ms/step - loss: 0.8757 - accuracy: 0.7972 - top_k_categorical_accuracy: 0.9788 - val_loss: 1.0581 - val_accuracy: 0.7359 - val_top_k_categorical_accuracy: 0.9500
Epoch 28/30
375/375 [==============================] - 17s 44ms/step - loss: 0.8548 - accuracy: 0.8019 - top_k_categorical_accuracy: 0.9796 - val_loss: 1.0417 - val_accuracy: 0.7378 - val_top_k_categorical_accuracy: 0.9505
Epoch 29/30
375/375 [==============================] - 17s 45ms/step - loss: 0.8350 - accuracy: 0.8058 - top_k_categorical_accuracy: 0.9807 - val_loss: 1.0271 - val_accuracy: 0.7389 - val_top_k_categorical_accuracy: 0.9508
Epoch 30/30
375/375 [==============================] - 17s 44ms/step - loss: 0.8165 - accuracy: 0.8076 - top_k_categorical_accuracy: 0.9813 - val_loss: 1.0134 - val_accuracy: 0.7404 - val_top_k_categorical_accuracy: 0.9509

This is shown better here:

[Training accuracy plot: https://i.sstatic.net/7pexl.png]

If I run model.predict() using:

test_images = dataset['test'].map(
    lambda x: tf.image.resize(x['image'], (IMG_LEN, IMG_LEN), method='nearest')
).batch(1)

_test_labels = dataset['test'].map(
    lambda y: y['label']
)
test_labels = [l.numpy() for l in _test_labels]

tf_preds = model.predict(test_images)

I get the following output:

print(test_labels)
[67, 84, 57, 12, 88, 32, 55, 9, 68, 99, 10, 1, 60, 52, 96, 33, 108, 71, 11, 75, 77, 9, 50, 19, 41, 118, 30...]

print(tf_preds)
[12  65   0  65  56  65  91  65  12  50   0  65   0  65   0   0  65  65   58  65   0  65   0   0  65  65  65  65  65  50  65  65  65  97  65  65...]
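
Note that model.predict() itself returns a (num_samples, N_BREEDS) array of softmax probabilities, so the class IDs printed above were presumably produced by an argmax step along these lines (assumed; not shown in the original):

import numpy as np

# Collapse the (num_samples, 120) probability array to one class ID per row.
tf_pred_ids = np.argmax(tf_preds, axis=1)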

Here's the confusion matrix for this:

[Confusion matrix for the model.predict() results]

I initially thought the model was not training, or was overfitting to the training data, but if I iterate over the data one sample at a time, I get this:

_preds = []
_actuals = []

# n (number of samples to check) and get_name() (label ID -> breed name)
# are defined elsewhere in the notebook.
for dog, label in zip(dataset['test'].take(n), test_labels[:n]):

    pic, _ = preprocess(dog)  # Convert to float32 and resize

    img_tensor = tf.expand_dims(pic, 0)
    pred = model(img_tensor)

    top_components = tf.reshape(tf.math.top_k(pred, k=5).indices, shape=[-1])
    top_matches = [get_name(i) for i in top_components]
    actual = get_name(label)

    _preds.append(top_components[0])
    _actuals.append(label)
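
(get_name() isn't defined in the question; a plausible, purely hypothetical implementation using the DatasetInfo from tfds.load() would be:)

# Hypothetical helper, assuming the `info` object returned by tfds.load():
def get_name(label_id):
    return info.features['label'].int2str(int(label_id))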

print(_preds)
[ 67  84  57   8  88  32  55   9  68  99  15   1  17  52  96  33 108  71...]

Which, as you can see, is very close to the ground truth labels:

print(test_labels)
[67, 84, 57, 12, 88, 32, 55, 9, 68, 99, 10, 1, 60, 52, 96, 33, 108, 71, 11, 75, 77, 9, 50, 19, 41, 118, 30...]

I know this is using top_k(), but the output is the same with argmax().
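
For reference, a minimal sketch of that argmax() equivalent for the top-1 class inside the loop:

# Same top-1 class as top_k(pred, k=1), taken from the (1, 120) pred tensor.
top_1 = int(tf.argmax(pred, axis=1)[0].numpy())
_preds.append(top_1)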

Here's the confusion matrix for this, which is in line with what I would expect from a model with this accuracy:

[Expected confusion matrix]

For completeness, here's model.evaluate() on the validation data, which is in line with the expected results:

model.evaluate(val_batches)
269/269 [==============================] - 175s 637ms/step - loss: 1.0301 - accuracy: 0.7336 - top_k_categorical_accuracy: 0.9541
Out[93]:
[1.0301110744476318, 0.7335664629936218, 0.954079270362854]
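
For a like-for-like cross-check, model.predict() could also be run over the exact same val_batches pipeline that model.evaluate() consumed, then scored by hand (a diagnostic sketch; the dataset isn't shuffled, so the two passes stay aligned):

import numpy as np

# Predict over the same batched pipeline evaluate() used, then score top-1.
probs = model.predict(val_batches)                        # (N, 120) probabilities
pred_ids = np.argmax(probs, axis=1)
true_ids = np.concatenate([np.argmax(y, axis=1) for _, y in val_batches])
print('top-1 accuracy:', np.mean(pred_ids == true_ids))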

My question is: why would this be? It's the same model making predictions in both cases; it isn't erroring because the data is malformed or has the wrong dimensions, and it runs through the same pre-processing code in both scenarios. Yet one works as expected and the other is wildly inaccurate, as if model.predict() were using an untrained model.
