Partition into classes: jenks vs kmeans
I want to partition a vector (length around 10^5) into five classes. With the function classIntervals from package classInt I wanted to use style = "jenks" natural breaks, but this takes an inordinate amount of time even for a much smaller vector of only 500. Setting style = "kmeans" executes almost instantaneously.
library(classInt)

my_n <- 100
set.seed(1)
# 100 draws for each of the five means: 500 values in total
x <- mapply(rnorm, n = my_n, mean = (1:5) * 5)

R> system.time(classIntervals(x, n = 5, style = "jenks"))
   user  system elapsed
  13.46    0.00   13.45

R> system.time(classIntervals(x, n = 5, style = "kmeans"))
   user  system elapsed
   0.02    0.00    0.02
What makes the Jenks algorithm so slow, and is there a faster way to run it?
If need be I will move the last two parts of the question to stats.stackexchange.com:
- Under what circumstances is kmeans a reasonable substitute for Jenks?
- Is it reasonable to define classes by running classInt on a random 1% subset of the data points?
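The subset idea in the last bullet could be sketched roughly as follows. This is only an illustration under assumptions, not a recommendation: the sample size, the widening of the outer breaks to the full data range, and the variable names (x_sub, brks, cls) are all choices made for this example, and the Jenks step can still take a while depending on the classInt version.

```r
library(classInt)

set.seed(1)
# Hypothetical full data set: 2e4 draws for each of five means, 1e5 values
x <- as.vector(mapply(rnorm, n = 2e4, mean = (1:5) * 5))

# Draw a random 1% subset and compute the Jenks breaks on that subset only
x_sub <- sample(x, size = length(x) %/% 100)
brks <- classIntervals(x_sub, n = 5, style = "jenks")$brks

# Assumption of this sketch: widen the outer breaks to the full data range
# so that every value of x falls into one of the five classes
brks[1] <- min(x)
brks[length(brks)] <- max(x)

# Apply the subset-derived breaks to the full vector
cls <- cut(x, breaks = brks, include.lowest = TRUE, labels = FALSE)
table(cls)
```

Whether breaks obtained this way are close enough to those from the full data is exactly the open question above; comparing them across several random subsets would be one way to check.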
Comments (2)
To answer your original question:
In the meantime a faster way to apply the Jenks algorithm has become available: the getJenksBreaks function in the BAMMtools package.
However, be aware that you have to set the number of breaks differently: if you set the breaks to 5 in the classIntervals function of the classInt package, you have to set the breaks to 6 in the getJenksBreaks function of the BAMMtools package to get the same results. The speed-up is huge.
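A comparison along these lines might look like the sketch below. It reuses the data from the question and assumes both packages are installed; the function in BAMMtools is getJenksBreaks, whose k argument counts break points (both ends included), which is why 6 is passed where classIntervals gets n = 5. The names brks_classint and brks_bamm are made up for this example, and no timings are shown because they depend on the machine and package versions.

```r
library(classInt)
library(BAMMtools)

my_n <- 100
set.seed(1)
# Same data as in the question, flattened to a plain vector of 500 values
x <- as.vector(mapply(rnorm, n = my_n, mean = (1:5) * 5))

# classInt: n = 5 classes, returned as 6 break points
brks_classint <- classIntervals(x, n = 5, style = "jenks")$brks

# BAMMtools: k = 6 break points for the same five classes
brks_bamm <- getJenksBreaks(x, k = 6)

# Time both implementations on the same input
system.time(classIntervals(x, n = 5, style = "jenks"))
system.time(getJenksBreaks(x, k = 6))

# Check whether the two sets of breaks agree
all.equal(brks_classint, brks_bamm)
```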
From ?BAMMtools::getJenksBreaks:
The two programs are the same; one is faster than the other because of their implementation (C vs R).