knnsearch from Matlab to Julia
I am trying to run a nearest-neighbour search in Julia using the NearestNeighbors.jl package. The corresponding Matlab code is:
```matlab
X = rand(10);
Y = rand(100);
Z = zeros(size(Y));
Z = knnsearch(X, Y);
```
This generates Z, a vector of length 100 whose i-th element is the index of the element of X nearest to the i-th element of Y, for all i = 1:100.
Could really use some help converting the last line of the Matlab code above to Julia!
Comments (1)
Use:
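The code block from the original answer was lost in extraction; a minimal sketch of what it likely contained, assuming `X` and `Y` as in the question (the variable names `kdtree`, `idxs`, and `dists` are mine):

```julia
using NearestNeighbors

X = rand(10)
Y = rand(100)

# NearestNeighbors.jl expects one observation per column, so a 1×10
# matrix here means 10 one-dimensional points.
kdtree = KDTree(permutedims(X))

# nn returns, for each query point, the index of its nearest neighbour
# in the tree and the corresponding distance.
idxs, dists = nn(kdtree, permutedims(Y))
```

Here `idxs` plays the role of `Z` in the Matlab code: a vector of 100 indices into `X`.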
Storing the intermediate
KDTree
object is useful if you want to reuse it in the future (it improves the efficiency of repeated queries). Now, what is the crucial point of my example? NearestNeighbors.jl accepts the following input data:

- a matrix, in which each column is one observation (point);
- a vector of points, where each point is itself a vector.
I have used the first approach. The point is that observations must be in columns (not in rows, as in your original code). Remember that in Julia vectors are columnar, so
rand(10)
is treated by NearestNeighbors.jl as 1 observation with 10 dimensions, while
rand(1, 10)
is treated as 10 observations with 1 dimension each. However, for your original data, since you only need the nearest neighbour and the data is one-dimensional and small, it is enough to write the search directly (here I assume
X
and
Y
are the original data you have stored in vectors), without using any extra packages.
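The base-Julia snippet was also stripped from the answer; a sketch of what it likely looked like (the comprehension is my reconstruction):

```julia
X = rand(10)
Y = rand(100)

# For each element of Y, find the index of the closest element of X.
# abs.(X .- y) is the vector of distances from y to every element of X,
# and argmin picks the index of the smallest one.
Z = [argmin(abs.(X .- y)) for y in Y]
```

This mirrors Matlab's `knnsearch(X, Y)` for one-dimensional data using only base Julia.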
NearestNeighbors.jl is very efficient for working with high-dimensional data that has very many elements.
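For instance, a sketch of a higher-dimensional query (the dimensions, point counts, and k are arbitrary choices of mine):

```julia
using NearestNeighbors

data = rand(3, 10_000)    # 10_000 points in 3 dimensions, one per column
queries = rand(3, 100)    # 100 query points

tree = KDTree(data)

# For each query point, find its 5 nearest neighbours;
# the final `true` asks for results sorted by distance.
idxs, dists = knn(tree, queries, 5, true)
```

Each entry of `idxs` is a vector of 5 indices into the columns of `data`, sorted from nearest to farthest.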