PyCaret does not handle multicollinearity well

Posted 2025-01-14 06:14:25

I have a pandas DataFrame df that I pass as input to the PyCaret library.
The df has:

3 categorical variables:
    LIB_SOURCE  : values 'arome_001', 'gfs_025' and 'arpege_01'
    MonthNumber : values from 1 to 12
    origine     : values 'Sencrop' and 'Visiogreen'

3 continuous variables:

    TEMPERATURE_PREDITE  DIFF_HOURS  TEMPERATURE_OBSERVEE

I let PyCaret encode the categorical features as 0/1 and handle multicollinearity:

from pycaret.regression import setup

regression = setup(data=dataset_predictions_meteo,
                   target='TEMPERATURE_PREDITE',
                   categorical_features=['MonthNumber', 'origine', 'LIB_SOURCE'],
                   numeric_features=['DIFF_HOURS', 'TEMPERATURE_OBSERVEE'],
                   session_id=123,
                   train_size=0.8,
                   normalize=True,
                   # transform_target=True,
                   remove_perfect_collinearity=True
                  )

enter image description here

enter image description here

But as you can see in the screenshots above, PyCaret does not handle multicollinearity well: it should drop one of the three columns 'arome_001', 'gfs_025' and 'arpege_01' on its own (see get_config('X')), yet it keeps all three.

Why doesn't PyCaret remove one of the three columns?
Thanks.


Comments (2)

荒芜了季节 2025-01-21 06:14:25

Multicollinearity means that two or more features are correlated, i.e. their correlation coefficient is close to +1.0 or -1.0. If two features are correlated, they change together: when one changes, the other changes as well. This can affect model performance negatively. PyCaret can handle multicollinearity internally to help produce well-performing models.

In the case of multicollinearity, PLS (Partial Least Squares Regression) and PCA (Principal Component Analysis) can be used to remove correlation among the features. PLS regression reduces the features to a smaller set of latent components that are uncorrelated with each other and are chosen with the target in mind. PCA, on the other hand, creates new uncorrelated features and replaces the old features with them.
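As a rough illustration of the PCA part of this, here is a minimal NumPy sketch on made-up data (the features x1, x2, x3 are hypothetical and have nothing to do with the question's dataset). It performs PCA by hand through an SVD and compares the correlation matrix before and after; it is not how PyCaret implements anything internally.

```python
import numpy as np

rng = np.random.default_rng(123)

# Hypothetical data: x2 is almost a copy of x1, x3 is independent.
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# PCA by hand: center the columns, then take the SVD.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
scores = U * S  # the new features (principal component scores)

corr_before = np.corrcoef(X, rowvar=False)
corr_after = np.corrcoef(scores, rowvar=False)

print(np.round(corr_before, 2))  # x1 and x2 are strongly correlated
print(np.round(corr_after, 2))   # off-diagonal entries are numerically zero
```

The price of this decorrelation is interpretability: each new column is a mix of all original features, so it no longer corresponds to a single measured variable.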

I am not sure why you think that one of the three columns 'arome_001', 'gfs_025' and 'arpege_01' should be removed; my guess is that PyCaret works as expected.
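One possible way to reconcile the two views is to look at the one-hot columns directly. The NumPy sketch below uses made-up, balanced data for LIB_SOURCE (an assumption, not the asker's data). It shows that the three dummy columns always sum to 1, so they are linearly dependent once an intercept is present, while no pair of them has a correlation of +1 or -1. If remove_perfect_collinearity only drops features whose pairwise correlation equals 1, which is how the PyCaret 2.x documentation appears to describe it, then keeping all three columns would be consistent with that rule.

```python
import numpy as np

# Hypothetical sample: 9 rows, each belonging to exactly one LIB_SOURCE level.
levels = np.array(["arome_001", "gfs_025", "arpege_01"])
lib_source = np.array(["arome_001", "gfs_025", "arpege_01"] * 3)

# One-hot encode by hand: one 0/1 column per level.
dummies = (lib_source[:, None] == levels[None, :]).astype(float)

# Pairwise Pearson correlations between the three dummy columns.
pairwise_corr = np.corrcoef(dummies, rowvar=False)

# Every row sums to 1, so intercept + 3 dummies are linearly dependent.
row_sums = dummies.sum(axis=1)
design = np.column_stack([np.ones(len(dummies)), dummies])
rank = np.linalg.matrix_rank(design)

print(np.round(pairwise_corr, 2))  # off-diagonal values are far from +/-1
print(row_sums)                    # all ones
print(design.shape[1], rank)       # 4 columns, but rank is lower
```

In other words, the redundancy the question describes involves all three columns together, and a pairwise correlation check cannot see it.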

风透绣罗衣 2025-01-21 06:14:25

I suppose that collinearity is being calculated for floats and integers. These columns are indeed categorical.
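If that is the case, one workaround would be to do the encoding yourself and drop one level per categorical column before handing the data to PyCaret. The sketch below uses pandas get_dummies with drop_first=True on a hypothetical miniature of the question's data; whether PyCaret then leaves the pre-encoded columns untouched depends on the version and is not verified here.

```python
import pandas as pd

# Hypothetical miniature of the question's data.
df = pd.DataFrame({
    "LIB_SOURCE": ["arome_001", "gfs_025", "arpege_01", "gfs_025"],
    "origine": ["Sencrop", "Visiogreen", "Sencrop", "Sencrop"],
    "DIFF_HOURS": [1.0, 3.0, 6.0, 12.0],
})

# drop_first=True drops one level per categorical column, so the
# remaining dummy columns of each variable no longer sum to 1.
encoded = pd.get_dummies(df, columns=["LIB_SOURCE", "origine"], drop_first=True)

print(list(encoded.columns))
```

The dropped level becomes the reference category: a row with zeros in all remaining LIB_SOURCE columns belongs to it.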
