Which model should I use for binary classification based on multiple continuous variables?

Posted on 2025-02-01 02:48:23

I am working with wastewater data. The data is collected every 5 minutes. This is the sample data.

(image: sample data, not included)

Thresholds for the individual parameters are provided. My question is which kind of model I should use to classify a sample as usable or not usable and, if possible, also report the anomaly that makes it unusable (the cause may be a combination of variables). The yes/no column does not exist yet, but it will be provided to me.

My other question is how to keep the model running, since the data is collected every 5 minutes.

Comments (1)

世界等同你 2025-02-08 02:48:23

Your data and use case seem like a good fit for a decision tree classifier. Decision trees are easy to train and interpret (which is one of your requirements, since you want to know why a given sample was classified as usable or not usable), do not require large amounts of labeled data, can be trained and used for prediction on most hardware, and are well suited to structured, low-dimensional data with no missing values. They also work well without normalizing your variables.

scikit-learn is very mature and easy to use, so you should be able to get something working without too much trouble.
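As a rough illustration of the above, here is a minimal sketch of training a decision tree with scikit-learn and reading back the reason for a prediction. The feature names (`ph`, `turbidity`, `cod`), the limits, and the synthetic labels are all assumptions made up for the example, since I haven't seen your real columns or your yes/no labels.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical feature names; replace with the real wastewater parameters.
FEATURES = ["ph", "turbidity", "cod"]

# Synthetic stand-in data (assumption): 500 readings drawn uniformly at random.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(4.0, 10.0, 500),   # ph
    rng.uniform(0.0, 10.0, 500),   # turbidity
    rng.uniform(0.0, 300.0, 500),  # cod
])

# Synthetic stand-in for the yes/no column (assumption): a reading is usable
# (1) only when every parameter is inside its made-up limit.
y = (
    (X[:, 0] > 6.0) & (X[:, 0] < 9.0) & (X[:, 1] < 5.0) & (X[:, 2] < 150.0)
).astype(int)

# A shallow tree stays readable; max_depth is a tuning choice, not a rule.
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# Print the learned rules as plain text.
print(export_text(clf, feature_names=FEATURES))


def explain(model, sample):
    """Return the split conditions a single sample passes through."""
    sample = np.asarray(sample, dtype=float).reshape(1, -1)
    tree = model.tree_
    steps = []
    for node in model.decision_path(sample).indices:
        # Leaves have identical left and right child ids; skip them.
        if tree.children_left[node] == tree.children_right[node]:
            continue
        feature = tree.feature[node]
        threshold = tree.threshold[node]
        value = sample[0, feature]
        op = "<=" if value <= threshold else ">"
        steps.append(f"{FEATURES[feature]} = {value:.2f} {op} {threshold:.2f}")
    return steps


# Example: a reading with high turbidity, which the made-up rule marks unusable.
bad_reading = [7.5, 8.0, 50.0]
print(clf.predict([bad_reading])[0], explain(clf, bad_reading))
```

The list returned by `explain` is the chain of comparisons that led to the prediction, which is one way to report which variable (or combination of variables) made a sample unusable.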

As for timing, I'm not sure how you or your employees will be taking samples, so I can't say much. If you will be receiving and reading one sample every 5 minutes, using your model to label the data should not be a problem, since a single prediction is cheap, but I'm not sure I have understood your situation.
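If the readings do arrive one at a time, one possible shape for the labeling loop is sketched below. This is only an assumption about your setup: `fetch_latest` is a hypothetical callable standing in for however you read the newest sample, the one-feature toy model stands in for your trained classifier, and in practice a scheduler such as cron may suit you better than a sleeping loop.

```python
import time
from sklearn.tree import DecisionTreeClassifier


def label_reading(model, reading):
    """Classify one reading (a sequence of feature values)."""
    return "usable" if model.predict([reading])[0] == 1 else "not usable"


def run(model, fetch_latest, on_result=print, interval_s=300,
        max_iterations=None, sleep=time.sleep):
    """Poll fetch_latest() every interval_s seconds and label each reading.

    fetch_latest is a hypothetical callable returning the newest reading.
    max_iterations=None keeps the loop running indefinitely.
    """
    done = 0
    while max_iterations is None or done < max_iterations:
        reading = fetch_latest()
        on_result(reading, label_reading(model, reading))
        done += 1
        sleep(interval_s)


# Toy model (assumption): one feature, low values usable, high values not.
toy_model = DecisionTreeClassifier(random_state=0).fit(
    [[1.0], [2.0], [8.0], [9.0]], [1, 1, 0, 0]
)

# Demo with two canned readings and a no-op sleep so it finishes at once.
canned = iter([[3.0], [7.0]])
run(toy_model, lambda: next(canned), max_iterations=2, sleep=lambda s: None)
```

With the default `sleep=time.sleep` and `max_iterations=None`, the same function would wake up every 300 seconds, which matches the 5-minute collection interval.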

Note that Stack Overflow is aimed at questions of the form "here's my code, how do I fix this?" rather than general questions such as this one. There are other Stack Exchange sites dedicated specifically to statistics and data science. If you don't find what you need here, you could try those sites.
