Selecting the best K features using the chi-square test

Posted on 2025-02-10 12:07:27

I have been trying to implement chi-square feature selection, in which I select the best k features, i.e. the features that depend most strongly on the label.
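For reference, the statistic that `scipy.stats.chi2_contingency` computes for one feature is Pearson's chi-square on the r × c table of counts $O_{ij}$, where the feature has r distinct values and the label has c classes. The expected counts come from the row totals $R_i$, column totals $C_j$ and sample size $N$. By default scipy applies Yates' continuity correction when the table has exactly one degree of freedom; the formula below is the uncorrected form.

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij}-E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i\,C_j}{N}, \qquad \text{dof} = (r-1)(c-1)$$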

So far I am doing this:

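import pandas as pd
from scipy.stats import chi2_contingency

scores = {}
for col in all_cols:
    # Counts of each feature value (rows) per class label (columns)
    contingency_table = pd.crosstab(data[col], y)
    stat, _, _, _ = chi2_contingency(contingency_table.values)
    scores[col] = stat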
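For reference, here is a self-contained sketch of the same loop. The toy DataFrame, the labels and `k = 2` are made up for illustration. The loop keeps each statistic in a dict and then takes the k columns with the largest values.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up toy data standing in for `data`, `y` and `all_cols`.
data = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red", "green", "blue"],
    "size": ["S", "M", "S", "L", "M", "S", "L", "S"],
    "shape": ["a", "b", "a", "b", "a", "b", "a", "b"],
})
y = pd.Series([1, 0, 1, 0, 0, 1, 0, 1], name="label")
all_cols = list(data.columns)
k = 2  # number of features to keep (arbitrary for this example)

scores = {}
for col in all_cols:
    # Rows: distinct values of the feature; columns: class labels.
    contingency_table = pd.crosstab(data[col], y)
    stat, _, _, _ = chi2_contingency(contingency_table.values)
    scores[col] = stat

# Rank columns by the chi-square statistic and keep the k largest.
top_k = sorted(scores, key=scores.get, reverse=True)[:k]
print(scores)
print(top_k)
```

On this toy data `size` scores 8.0, `color` about 5.33 and `shape` 0, so `top_k` is `['size', 'color']`.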

I then select as the top features the ones with the highest stat values.
Since sklearn already provides this functionality through SelectKBest(chi2, ...), is my implementation correct, i.e. does it give the same result as the built-in approach?
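For comparison, here is a minimal sketch of the built-in route on made-up, non-negative integer-coded features. The data and the helper `sklearn_style_chi2` are hypothetical and exist only for illustration. As far as I understand scikit-learn's `chi2`, it does not build a value-by-class contingency table. Instead it sums each feature's values within each class and compares those sums with the split the class frequencies would predict, which is why it requires non-negative, count-like features. The sketch re-derives that computation by hand and prints it next to the `chi2_contingency` statistic for the same columns.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.feature_selection import SelectKBest, chi2

# Made-up data: two non-negative integer-coded features and a binary label.
X = pd.DataFrame({
    "f1": [0, 1, 2, 2, 1, 0, 2, 1],
    "f2": [1, 1, 0, 0, 1, 0, 1, 0],
})
y = pd.Series([0, 1, 1, 1, 0, 0, 1, 0])

# Built-in route: score every column with sklearn's chi2 and keep the best one.
sk_stats, sk_pvalues = chi2(X, y)
selector = SelectKBest(chi2, k=1).fit(X, y)


def sklearn_style_chi2(col, labels):
    """Hypothetical helper re-deriving sklearn's chi2 score for one column."""
    classes = np.unique(labels)
    # Observed: sum of the feature's values within each class.
    observed = np.array([col[labels == c].sum() for c in classes])
    # Expected: the feature's total split according to class frequencies.
    expected = np.array([(labels == c).mean() for c in classes]) * col.sum()
    return ((observed - expected) ** 2 / expected).sum()


manual_stats = [sklearn_style_chi2(X[c].to_numpy(), y.to_numpy()) for c in X.columns]

# Question's route: Pearson chi-square on the feature-value x class table.
ct_stats = [chi2_contingency(pd.crosstab(X[c], y).values)[0] for c in X.columns]

print("sklearn chi2:     ", sk_stats)
print("re-derived:       ", np.round(manual_stats, 4))
print("chi2_contingency: ", np.round(ct_stats, 4))
print("SelectKBest keeps:", list(X.columns[selector.get_support()]))
```

On this toy data the two statistics for `f1` differ (about 2.78 from `chi2` versus about 5.33 from `chi2_contingency`), even though both rank `f1` first. The degrees of freedom differ too (number of classes minus one versus (r - 1)(c - 1)), so the p-values are not comparable either. Nothing in the two definitions guarantees that the rankings agree on other data.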
