Error when computing the confusion matrix of a logistic model

Posted on 2025-01-12 19:04:58

I created a null logistic model in RStudio.

nullModel <- glm(train$bigFire ~ 1, data = train, family = binomial)

I then asked the model to make predictions on the test set.

nullModel.pred <- predict(nullModel, test, type = "response")

At this point I want to compute the confusion matrix to evaluate the model's performance.

CM <- table(test$bigFire, nullModel.pred>0.5)

The resulting output is:

    TRUE
  0   58
  1   46

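Even if I change the cutoff value (currently set to 0.5), the result is always the same. I don't understand why, since the model should perform differently at different cutoff values.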
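One thing worth checking here: if every prediction is the same number, then `nullModel.pred > cutoff` is either all TRUE or all FALSE, and `table()` only shows the column that actually occurs. The following is a minimal sketch, not the original code. The vectors `actual` and `pred` and the helper `confusion()` are hypothetical stand-ins built from the counts shown above (58 zeros, 46 ones) and a constant prediction of 0.542. Fixing the factor levels keeps both columns visible whatever the cutoff is.

```r
# Hypothetical stand-ins for test$bigFire and nullModel.pred
actual <- c(rep(0, 58), rep(1, 46))
pred   <- rep(0.542, length(actual))

# Hypothetical helper: confusion matrix with both predicted classes always shown
confusion <- function(actual, pred, cutoff) {
  # Fixing the levels keeps an all-zero column instead of dropping it
  predicted <- factor(pred > cutoff, levels = c(FALSE, TRUE))
  table(actual = actual, predicted = predicted)
}

cm_low  <- confusion(actual, pred, 0.5)  # cutoff below the constant prediction
cm_high <- confusion(actual, pred, 0.6)  # cutoff above the constant prediction

print(cm_low)
print(cm_high)
```

Under these assumptions, any cutoff below the constant prediction puts all 104 rows in the TRUE column, and any cutoff above it puts them all in the FALSE column, so the table can only change when the cutoff crosses that single value.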

The dataset looks like this:

  month day FFMC  DMC    DC  ISI temp RH wind rain zone bigFire
1   mar fri 86.2 26.2  94.3  5.1  8.2 51  6.7  0.0   75       0
2   oct tue 90.6 35.4 669.1  6.7 18.0 33  0.9  0.0   74       0
3   oct sat 90.6 43.7 686.9  6.7 14.6 33  1.3  0.0   74       0
4   mar fri 91.7 33.3  77.5  9.0  8.3 97  4.0  0.2   86       0
5   mar sun 89.3 51.3 102.2  9.6 11.4 99  1.8  0.0   86       0
6   aug sun 92.3 85.3 488.0 14.7 22.2 29  5.4  0.0   86       0

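It has 517 rows. The train and test sets were generated from this data frame with an 80% / 20% split (104 rows in the test set). The length of the prediction vector is: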

> length(nullModel.pred)
[1] 104

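and it always contains the same value, 0.542. That seems reasonable, since a null model can only estimate the expected value of the response, that is, the probability that it equals 1.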
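To illustrate why an intercept-only model returns one constant value, here is a small sketch on made-up data. The data frame `toy` and its 13 ones out of 24 rows are assumptions chosen only so that the proportion is close to 0.542; they are not the real training set. The intercept of the fit is the log-odds of the observed proportion of ones, and `predict(..., type = "response")` returns that same proportion for every new row.

```r
# Hypothetical training response: 13 ones out of 24 rows (proportion about 0.542)
toy <- data.frame(bigFire = c(rep(1, 13), rep(0, 11)))

# Intercept-only (null) logistic model
fit <- glm(bigFire ~ 1, data = toy, family = binomial)

# Back-transform the intercept from log-odds to a probability
p_hat <- unname(plogis(coef(fit)))

# Every new row gets the same predicted probability
new_rows <- data.frame(bigFire = c(0, 1, 0))
preds <- predict(fit, newdata = new_rows, type = "response")

print(p_hat)
print(preds)
```

In this sketch the formula uses the bare column name (`bigFire ~ 1`) together with `data = toy`, which is the usual way to let `predict()` look variables up in `newdata`.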
