How to optimize the weights of a linear combination in a model?
I am making a predictor that generates 3 values: A, B, C for each prediction. I have made predictions on a dataset of ~7000 samples and built a Pandas dataframe that looks like this:
Sample | A | B | C | Correct |
---|---|---|---|---|
Sample_1 | 0.8 | 0.4 | 0.9 | True |
Sample_2 | 0.2 | 0.9 | 0.5 | False |
Sample_3 | 0.3 | 1.0 | 0.1 | True |
I want to be able to interpret the values A, B, C in my predictor to judge the quality of a prediction. How do I do this?
I can only think of combining them like this somehow: X = a*A + b*B + c*C with X being a measure of confidence in the prediction. But I wouldn't know how to get the optimal weights a, b, c.
Comments (1)
I think the right methodology for doing this type of task would be to follow these steps:
Encode the values in the "Correct" column as True -> 1 and False -> -1, then split the dataset into training and test sets.
Train a random forest to classify the target from A, B, C.
On the test set, get the probability of each prediction with predict_proba(X) and take the mean. To go deeper, you can look at the feature importances to see which of A, B, or C matters most.
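The steps above could be sketched roughly as follows. This assumes scikit-learn (the answer only mentions predict_proba, so the library is an assumption), and the dataframe here is synthetic stand-in data with a made-up labelling rule, not the asker's real ~7000 samples.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataframe (assumption: the real one has
# numeric columns A, B, C and a boolean "Correct" column).
rng = np.random.default_rng(0)
n = 7000
df = pd.DataFrame(rng.random((n, 3)), columns=["A", "B", "C"])
# Hypothetical rule so the label depends mostly on A, a little on B, not on C.
df["Correct"] = (0.7 * df["A"] + 0.2 * df["B"] + 0.1 * rng.random(n)) > 0.5

X = df[["A", "B", "C"]]
y = np.where(df["Correct"], 1, -1)  # step 1: True -> 1, False -> -1

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Step 2: train a random forest on A, B, C.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Step 3: probability that each test prediction is "Correct" (class 1).
pos_col = list(clf.classes_).index(1)
confidence = clf.predict_proba(X_test)[:, pos_col]
print("mean predicted probability of Correct:", confidence.mean())
print("test accuracy:", clf.score(X_test, y_test))

# Feature importances: which of A, B, C the forest relies on most.
importances = pd.Series(clf.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

The per-sample value in `confidence` plays the role of the X the question asks for, although a random forest is not a linear combination of A, B, C.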
Don't hesitate to read the random forest documentation. I think this way you can see how A, B, and C act in the prediction. Afterwards, if you want another method, you could try an ANOVA test to see whether A, B, C and the target are independent.
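For the ANOVA suggestion, one possible sketch uses scikit-learn's f_classif, which runs a one-way ANOVA F-test per feature against the class label. The choice of f_classif is an assumption, and the data is again synthetic with a made-up labelling rule. Note that this test compares the group means of each feature, so a large p-value is weak evidence of no mean difference rather than proof of independence.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_classif

# Synthetic stand-in data (assumption: columns A, B, C and boolean "Correct").
rng = np.random.default_rng(0)
n = 7000
df = pd.DataFrame(rng.random((n, 3)), columns=["A", "B", "C"])
# Hypothetical rule: the label depends mostly on A, a little on B, not on C.
df["Correct"] = (0.7 * df["A"] + 0.2 * df["B"] + 0.1 * rng.random(n)) > 0.5

features = ["A", "B", "C"]
# One-way ANOVA F-test of each feature against the Correct / not Correct groups.
f_stats, p_values = f_classif(df[features], df["Correct"].astype(int))

anova = pd.DataFrame({"F": f_stats, "p_value": p_values}, index=features)
print(anova)
```

A small p-value for a column suggests its mean differs between correct and incorrect predictions, so that column carries some information about prediction quality.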