Multi-label classification done right?
Let's say I have a dataset that can be neatly classified using Weka's J48 or randomForest in R.
Now let's say I have another training file, which contains two classifications per data point.
How could I combine these two to be able to classify new data points into these two classes?
(So I'd need a "two-pass" training.)
Should I use an MLP (like a restricted Boltzmann machine) instead?
Comments (1)
I'm assuming your two data sets look like this...
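Data set 1: features X with a single label, L1, per data point.
Data set 2: features X with two labels, L1 and L2, per data point.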
Assuming that is what your problem looks like, I would split it into two problems, one for each of the two labels. I think this can be justified by the probability formula:
p(L1,L2|X) = p(L2|L1,X) p(L1|X)
where L1 and L2 are the two class labels and X is the data.
My suggestion is to train a model for p(L1|X) using both data sets 1 and 2, with L1 as your target variable, and then train a model for p(L2|L1,X) using data set 2 only, with L1 as an additional input feature and L2 as your target variable. To predict a new pair of labels, you apply the first model to get an estimate of L1, and then apply the second model, feeding it that estimate of L1, to get an estimate of L2.
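The two-step procedure above can be sketched as follows. This is only a minimal illustration under stated assumptions: the toy data, the integer encoding of L1, the NearestNeighbour class, and the predict_pair helper are all hypothetical, and the 1-nearest-neighbour classifier is just a stand-in for whatever learner you actually use (J48, randomForest, and so on).

```python
class NearestNeighbour:
    """Tiny 1-nearest-neighbour classifier (stand-in for any real learner)."""

    def fit(self, rows, labels):
        self.rows = [tuple(r) for r in rows]
        self.labels = list(labels)
        return self

    def predict_one(self, row):
        # Squared Euclidean distance from `row` to every training row.
        dists = [sum((a - b) ** 2 for a, b in zip(row, r)) for r in self.rows]
        return self.labels[dists.index(min(dists))]


# Hypothetical toy data. Data set 1 holds (X, L1); data set 2 holds (X, L1, L2).
# L1 is encoded as 0/1 so it can be appended to X as a numeric feature.
dataset1 = [((0.1, 0.2), 0), ((0.9, 0.8), 1), ((0.2, 0.9), 0), ((0.8, 0.1), 1)]
dataset2 = [
    ((0.15, 0.1), 0, "low"),
    ((0.85, 0.9), 1, "high"),
    ((0.1, 0.8), 0, "high"),
    ((0.9, 0.2), 1, "low"),
]

# Model for p(L1|X): trained on X from BOTH data sets, target L1.
x_for_l1 = [x for x, _ in dataset1] + [x for x, _, _ in dataset2]
y_for_l1 = [l1 for _, l1 in dataset1] + [l1 for _, l1, _ in dataset2]
model_l1 = NearestNeighbour().fit(x_for_l1, y_for_l1)

# Model for p(L2|L1,X): trained on data set 2 only, with L1 appended to X.
x_for_l2 = [x + (l1,) for x, l1, _ in dataset2]
y_for_l2 = [l2 for _, _, l2 in dataset2]
model_l2 = NearestNeighbour().fit(x_for_l2, y_for_l2)


def predict_pair(x):
    """Predict L1 first, then feed that estimate into the L2 model."""
    l1 = model_l1.predict_one(x)
    l2 = model_l2.predict_one(tuple(x) + (l1,))
    return l1, l2


print(predict_pair((0.05, 0.15)))
print(predict_pair((0.95, 0.85)))
```

Note that the second model is trained on the true L1 values but sees only estimated ones at prediction time, so any mistake made by the first model is passed on to the second.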
I suppose an argument against this approach is that, although the formula always holds, it may be the case that p(L1,L2|X) is easier to learn directly than the two factors p(L2|L1,X) and p(L1|X). However, without more details about your data I really can't say.
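For comparison, learning p(L1,L2|X) directly can be sketched by treating each (L1, L2) pair as a single combined class. Again, this is only an illustration: the toy data and the nearest_label and predict_joint helpers are hypothetical, and the 1-nearest-neighbour rule stands in for a real classifier.

```python
def nearest_label(train_rows, train_labels, row):
    """Return the label of the training row closest to `row` (1-NN)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, r)) for r in train_rows]
    return train_labels[dists.index(min(dists))]


# Hypothetical toy version of data set 2: (X, L1, L2).
dataset2 = [
    ((0.15, 0.1), 0, "low"),
    ((0.85, 0.9), 1, "high"),
    ((0.1, 0.8), 0, "high"),
    ((0.9, 0.2), 1, "low"),
]

rows = [x for x, _, _ in dataset2]
# Each distinct (L1, L2) pair becomes one class of the joint problem.
joint_labels = [(l1, l2) for _, l1, l2 in dataset2]


def predict_joint(x):
    """Predict both labels at once as a single combined class."""
    return nearest_label(rows, joint_labels, x)


print(predict_joint((0.05, 0.15)))
print(predict_joint((0.12, 0.75)))
```

The trade-off is that this joint model can only be trained on data set 2, since data set 1 has no L2, and the number of classes grows to the number of distinct (L1, L2) pairs.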