Machine-learning class prediction

Published 2024-11-19 23:22:34 · 208 characters · 3 views · 0 comments

I have the following problem. I have a training dataset consisting of a range of numbers. Each number belongs to a certain class. There are five classes.

Range:
1...10

Training Dataset:
{1,5,6,6,10,2,3,4,1,8,6,...}

Classes:
[1,2][3,4][5,6][7,8][9,10]

Is it possible to use a machine learning algorithm to find likelihoods for class prediction and what algorithm would be suited for this?

best, US


Comments (1)

蹲在坟头点根烟 2024-11-26 23:22:34

As described in the question's comment ("I want to calculate the likelihood of a certain class to appear based on the given distribution of the training set"), the problem is trivial and hardly a machine-learning one:
Simply count the number of occurrences of each class in the "training set": Count_12, Count_34, ... Count_910. The likelihood that a given class xy would appear is then given by

   P(xy) = Count_xy  / Total Number of elements in the "training set"
         = Count_xy  / (Count_12 + Count_34 + Count_56 + Count_78 + Count_910)
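This counting approach can be sketched in a few lines of Python. The values below are the explicit codes from the question's example training set (it is truncated with "..." there, so only the listed codes are used), and the class mapping is the [1,2][3,4][5,6][7,8][9,10] pairing:

```python
from collections import Counter

# The codes explicitly listed in the question's training set (it is truncated with "...").
training = [1, 5, 6, 6, 10, 2, 3, 4, 1, 8, 6]

# Classes are consecutive pairs of codes: [1,2] -> 0, [3,4] -> 1, ..., [9,10] -> 4.
def code_to_class(code):
    return (code - 1) // 2

counts = Counter(code_to_class(c) for c in training)
total = sum(counts.values())

# P(class) = count of that class / total number of elements in the "training set".
likelihoods = {cls: counts[cls] / total for cls in range(5)}
```

For this truncated sample, class [5,6] is the most likely (4 of the 11 codes fall in it).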

A more interesting problem...
...would be to consider the training set as a sequence and to guess what the next item in that sequence would be. The probability that the next item comes from a given category would then not only be based on the prior for that category (the P(xy) computed above), but would also take into account the items which precede it in the sequence. One of the interesting parts of this problem would then be to figure out how "far back" to look and how much "weight" to give to the preceding sequence of items.
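A minimal instance of this idea is a first-order Markov model over the raw codes, which looks back exactly one step. The sketch below is illustrative (the helper names and the add-one smoothing are assumptions, not something specified above):

```python
from collections import Counter, defaultdict

# The codes explicitly listed in the question's training set (truncated with "...").
training = [1, 5, 6, 6, 10, 2, 3, 4, 1, 8, 6]

# Count bigram transitions: how often each code follows each other code.
transitions = defaultdict(Counter)
for prev, nxt in zip(training, training[1:]):
    transitions[prev][nxt] += 1

def next_code_probs(prev, alphabet=range(1, 11), alpha=1.0):
    """Add-one-smoothed estimate of P(next = c | previous = prev) for each code c."""
    counts = transitions[prev]
    total = sum(counts.values()) + alpha * len(alphabet)
    return {c: (counts[c] + alpha) / total for c in alphabet}
```

Looking "further back" would mean conditioning on longer prefixes (higher-order n-grams), with the usual trade-off that longer contexts need far more training data.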

Edit (now that OP has indicated interest in the "more interesting problem"):
This "prediction-given-preceding-sequence" problem maps almost directly to the
machine-learning-algorithm-for-predicting-order-of-events StackOverflow question.
The slight differences are that the alphabet here has 10 distinct codes (versus 4 in the other question) and that here we try to predict a class of codes, rather than just the code itself. Regarding this aggregation of (here) 2 codes per class, we have several options:

  • work with classes from the start, i.e. replace each code read in the sequence by its class, and only consider and keep track of classes from then on.
  • work with codes only, i.e. create a predictor over the 1-thru-10 codes, and only consider the class at the very end, adding the probabilities of the two codes which comprise a class to produce the likelihood of the next item being of that class.
  • some hybrid solution: consider / work with the codes but sometimes aggregate to the class.

My personal choice would be to try the code predictor first (only aggregating at the very end), and perhaps adapt from there if insight gained from this initial attempt suggests that the logic or its performance could be simplified or improved by aggregating earlier. Indeed, the very same predictor could be used to try both approaches; one would simply need to alter the input stream, replacing each even number with the odd number preceding it. My guess is that valuable information (for the purpose of guessing upcoming codes) is lost when we aggregate early.
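The stream rewrite and the two aggregation points described above can be sketched as follows (the helper names are assumptions for illustration):

```python
# Early aggregation: rewrite the input stream so each even code is replaced
# by the odd code preceding it, collapsing every 2-code class to one symbol.
def aggregate_early(stream):
    return [c - 1 if c % 2 == 0 else c for c in stream]

# Late aggregation: run a 1-thru-10 code-level predictor first, then sum the
# probabilities of the two codes that make up each class.
def aggregate_late(code_probs):
    return {(odd, odd + 1): code_probs.get(odd, 0.0) + code_probs.get(odd + 1, 0.0)
            for odd in range(1, 11, 2)}
```

With `aggregate_early`, the same predictor code runs unchanged on a 5-symbol alphabet; with `aggregate_late`, it runs on the full 10-symbol alphabet and the class likelihoods are formed only at the end.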
