Why does CondensedNearestNeighbour() never finish on large data?

Posted 2025-01-10 16:48:32

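I am running the CondensedNearestNeighbour() undersampling method in a Jupyter notebook on 1 million rows, using a single feature variable and the target. It is taking a very long time: almost two days have passed and it is still running with no result.

If it does not work on data this large, I really don't understand what it is meant for. I need undersampling to reduce the number of samples, and I don't want to use random sampling. Any suggestions would be appreciated. My code is below: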

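from collections import Counter
from imblearn.under_sampling import CondensedNearestNeighbour

# Single feature column and target as NumPy arrays
X = df1[['var1']].to_numpy()
y = df1['target'].to_numpy()

counter = Counter(y)  # class counts before resampling
undersample = CondensedNearestNeighbour(random_state=44, n_neighbors=1)
X1, y1 = undersample.fit_resample(X, y)
sample_counter = Counter(y1)  # class counts after resampling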
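
For reference, here is my rough understanding of what the condensing rule does, written as a small pure-Python sketch. It is a simplified, single-pass version of Hart's condensed nearest neighbour idea for one feature, not imbalanced-learn's actual implementation; the function name `condense` and its parameters are made up for illustration. Each majority sample is classified by a 1-NN search over the samples kept so far, so the work grows with both the number of rows and the size of the kept set, which may be why 1 million rows takes so long.

```python
import random


def condense(X, y, minority_label, seed=0):
    """Simplified Hart-style condensing for a single numeric feature.

    Hypothetical helper for illustration only. Keeps every minority
    sample, seeds the store with one random majority sample, then adds
    each majority sample that 1-NN on the current store misclassifies.
    """
    rng = random.Random(seed)
    store = [(x, label) for x, label in zip(X, y) if label == minority_label]
    majority = [(x, label) for x, label in zip(X, y) if label != minority_label]
    if not majority:
        return store
    rng.shuffle(majority)
    store.append(majority[0])
    for x, label in majority[1:]:
        # Linear scan for the nearest stored sample. Repeating this search
        # for every majority sample is what dominates the running time.
        nearest = min(store, key=lambda s: abs(s[0] - x))
        if nearest[1] != label:
            store.append((x, label))
    return store


kept = condense([0.0, 0.1, 0.2, 0.3, 5.0, 5.1], [0, 0, 0, 0, 1, 1], minority_label=1)
print(len(kept))
```

In this toy input the two classes are well separated, so only one majority sample is kept alongside the two minority samples. With overlapping classes far more samples are kept, and each later search scans a larger store.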

