Why doesn't CondensedNearestNeighbour() ever finish on large data?
I am running the CondensedNearestNeighbour() undersampling method in a Jupyter notebook on 1 million rows, using one feature variable and the target. It is taking a very long time: almost two days have passed and it is still running without a result.
I really don't understand: if it doesn't work for large data, what is it for? I need undersampling to reduce the number of samples, and I don't want to use random sampling. If you have any suggestions, I would appreciate them. My code sample is below:
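from collections import Counter
from imblearn.under_sampling import CondensedNearestNeighbour

# df1 is the existing DataFrame; one feature column and the target
X = df1[['var1']].to_numpy()
y = df1['target'].to_numpy()
counter = Counter(y)  # class counts before resampling
undersample = CondensedNearestNeighbour(random_state=44, n_neighbors=1)
# call the object defined above (the original called an undefined name, undersample1)
X1, y1 = undersample.fit_resample(X, y)
sample_counter = Counter(y1)  # class counts after resampling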
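To get a feel for why this kind of method can run for so long, here is a toy, pure-Python sketch of the condensing idea for a single numeric feature. It is a simplified one-pass version written only for illustration, not the imbalanced-learn implementation, and it assumes a binary target. The function name `condense_1d` and the `comparisons` counter are made up for this example.

```python
import random


def condense_1d(xs, ys, seed=44):
    """Toy one-pass condensing for one numeric feature (illustrative only)."""
    rng = random.Random(seed)

    # Find the minority label by counting occurrences.
    counts = {}
    for label in ys:
        counts[label] = counts.get(label, 0) + 1
    minority = min(counts, key=counts.get)

    # The store starts with every minority sample plus one random majority sample.
    store = [(x, y) for x, y in zip(xs, ys) if y == minority]
    majority = [(x, y) for x, y in zip(xs, ys) if y != minority]
    store.append(majority.pop(rng.randrange(len(majority))))

    comparisons = 0
    for x, y in majority:
        # Brute-force 1-NN prediction: one distance per stored point.
        nearest = min(store, key=lambda p: abs(p[0] - x))
        comparisons += len(store)
        if nearest[1] != y:
            # Misclassified points join the store, so later searches cost more.
            store.append((x, y))
    return store, comparisons


xs = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 5.3, 5.4]
ys = [1, 1, 1, 0, 0, 0, 0, 0]
store, comparisons = condense_1d(xs, ys)
print(len(store), comparisons)
```

In this sketch the work is roughly the number of majority samples times the size of the store, and the store grows every time a point is misclassified. With a single feature and heavily overlapping classes, many points may be misclassified, so on 1 million rows the loop can become very expensive. Timing the real method on progressively larger subsamples (for example 1,000, then 10,000 rows) is one way to estimate whether the full run is feasible.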