Machine learning - training a model with imbalanced data
I have two classes in my data.
This is what the class distribution looks like:
0.0 169072
1.0 84944
In other words, I have a 2:1 class distribution.
I believe I have two choices: downsample class 0.0, or upsample class 1.0. If I go with option 1, I'm losing data. If I go with option 2, I'm using non-real data.
Is there a way I can train the model without upsampling or downsampling?
This is what my classification_report looks like:
              precision  recall  f1-score  support
         0.0       0.68    1.00      0.81    51683
         1.0       1.00    0.00      0.00    24522
    accuracy                         0.68    76205
   macro avg       0.84    0.50      0.40    76205
weighted avg       0.78    0.68      0.55    76205
Answers (2)
Your data is slightly imbalanced, yes, but that does not mean you only have one of the two options (under- or over-sampling your data). You can leave the data as is and apply cost-sensitive training in your model. For example, if in your case the classes have a ratio of 2:1, then you give a weight of 2 to your minority class. In the example of an XGBoost classifier, this argument is called scale_pos_weight. See more in this excellent tutorial. Regarding model evaluation, you should use a classification report to get a full picture of your model's true and false predictions (precision and recall are your two best friends in this process!).
I would not recommend either approach.
I'm thinking about models to detect fraud. By definition, fraud should be a small percentage of outcomes - on the order of 1-5%. Changing the percentage for training would be a gross distortion of the problem being solved.
Better to leave the proportions as they are.
Make sure that your train, validation, and test data sets all have ratios that reflect the real problem.
Adjust your evaluation metric instead. Don't go for accuracy: a naive model that always predicts the 0 outcome will be correct 2/3 of the time. You want your model to do better than that, or than a weighted coin flip.
I'd recommend using recall as your criterion for success.
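To make the accuracy-vs-recall point concrete, here is a small sketch (using the support counts from the question's classification report) showing why a model that always predicts 0 looks fine on accuracy but fails on minority-class recall:

```python
# Support counts from the question's classification report.
n_class0 = 51683
n_class1 = 24522
total = n_class0 + n_class1  # 76205

# A naive model that always predicts 0.0 matches the reported accuracy...
naive_accuracy = n_class0 / total
print(round(naive_accuracy, 2))  # 0.68

# ...but its recall on the minority class is zero: it never finds a 1.0.
true_positives = 0  # the naive model never predicts class 1.0
naive_recall = true_positives / n_class1
print(naive_recall)  # 0.0
```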