Are KD trees still efficient when most or all attributes are discrete and equidistant?

Posted 2024-10-06 17:32:20

It's always touted that KD trees are great for nearest neighbor searches. However, if your data set is all discrete values, with no real distance metric, are they still efficient?

For example, suppose your attributes were something like [black, blue, red], [bread, milk, cheese], [right, left, straight, curved]. There is no continuity, and the only way to measure distance would be Hamming distance (where we count how many attributes differ from the test example). Do KD trees still hold up efficiently in these scenarios? How come?
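A minimal Python sketch of the Hamming distance described in the question, together with the brute-force linear scan that any index (a KD tree or otherwise) has to beat. The records and the helper names `hamming` and `nearest_brute_force` are hypothetical, chosen only to mirror the example attributes.

```python
from typing import Sequence, Tuple

# Hypothetical records: (colour, food, direction), as in the question.
Record = Tuple[str, str, str]

def hamming(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the attribute positions where a and b differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))

def nearest_brute_force(data: Sequence[Record], query: Record) -> Record:
    """Linear scan: n distance computations, the baseline an index must beat."""
    return min(data, key=lambda r: hamming(r, query))

data = [
    ("black", "bread", "right"),
    ("blue", "milk", "left"),
    ("red", "cheese", "straight"),
    ("blue", "bread", "curved"),
]
query = ("blue", "bread", "left")
print(nearest_brute_force(data, query))
```

Here two records tie at distance 1 from the query and `min` simply returns the first one. Because Hamming distance can only take a handful of integer values, ties like this are common.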

Comments (2)

清晨说晚安 2024-10-13 17:32:20

I think it might be appropriate to consider what a (nearest) "neighbor" would be if there is no metric on your set of values. Specifically, how does one define whether elements in the set are near or far from one another without a measure of distance?
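To make "near" and "far" concrete for the question's data: Hamming distance over those attributes is a genuine metric, and because the domain is tiny this can be checked exhaustively. The sketch below (the names `DOMAINS`, `is_metric` and the chosen query are illustrative assumptions) verifies the metric axioms on all 36 possible records and tallies how many records lie at each distance from one query.

```python
from collections import Counter
from itertools import product

# Attribute domains from the question's example.
DOMAINS = [
    ["black", "blue", "red"],
    ["bread", "milk", "cheese"],
    ["right", "left", "straight", "curved"],
]

def hamming(a, b):
    """Number of attribute positions where records a and b differ."""
    return sum(x != y for x, y in zip(a, b))

# Every possible record: 3 * 3 * 4 = 36 points.
points = list(product(*DOMAINS))

def is_metric(pts, d):
    """Exhaustively check identity, symmetry and the triangle inequality."""
    for p, q in product(pts, repeat=2):
        if (d(p, q) == 0) != (p == q):  # zero distance iff identical
            return False
        if d(p, q) != d(q, p):
            return False
    for p, q, r in product(pts, repeat=3):
        if d(p, r) > d(p, q) + d(q, r):
            return False
    return True

query = ("black", "bread", "right")
distance_counts = Counter(hamming(p, query) for p in points)

print(is_metric(points, hamming))
print(sorted(distance_counts.items()))
```

So a metric does exist, but it is coarse: every record sits at one of only four distances (0 to 3) from the query, which means many records are equally "near".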

That being said, KD-trees can work for discrete sets. Some of the efficiency comes from being able to divide the data so that a single comparison eliminates whole chunks of elements, as in any other balanced tree. But the most natural use is on sets that have a useful and meaningful topology.
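The "eliminate chunks of elements with one comparison" idea can be sketched with a textbook KD tree over discrete points on an integer grid, using squared Euclidean distance, i.e. a discrete set that does have a meaningful topology. This is a minimal illustration rather than a production index; the `Node`, `build` and `nearest` names and the random data are assumptions for the example.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[int, ...]

@dataclass
class Node:
    point: Point
    axis: int                 # coordinate this node splits on
    left: Optional["Node"]    # points with coordinate <= point[axis]
    right: Optional["Node"]   # points with coordinate >= point[axis]

def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a balanced KD tree by splitting at the median, cycling axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build(ordered[:mid], depth + 1),
                build(ordered[mid + 1:], depth + 1))

def nearest(root: Optional[Node], query: Point) -> Tuple[Optional[Point], float, int]:
    """Return (nearest point, squared distance, number of nodes visited)."""
    best_point: Optional[Point] = None
    best_d2 = math.inf
    visited = 0

    def search(node: Optional[Node]) -> None:
        nonlocal best_point, best_d2, visited
        if node is None:
            return
        visited += 1
        d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
        if d2 < best_d2:
            best_point, best_d2 = node.point, d2
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        search(near)
        # One comparison: the far subtree lies at least |diff| away along this
        # axis, so skip it entirely unless it could hold a closer point.
        if diff * diff < best_d2:
            search(far)

    search(root)
    return best_point, best_d2, visited

# Discrete data: 1000 random points on a 100 x 100 integer grid.
random.seed(0)
pts = [(random.randint(0, 99), random.randint(0, 99)) for _ in range(1000)]
root = build(pts)
point, d2, visited = nearest(root, (42, 17))
print(point, d2, f"visited {visited} of {len(pts)} nodes")
```

The `visited` count reports how many nodes the search actually touched; on low-dimensional grid data like this it is typically well below the number of points, because each `diff * diff < best_d2` test can discard an entire subtree.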

思念绕指尖 2024-10-13 17:32:20

KD trees still require a notion of dimensions. Your examples do not describe data points in terms of dimensions, discrete or not, so a KD tree does not apply directly. Furthermore, KD trees rely on certain inequalities (an ordering along each axis to split the data, and a lower bound on distance to prune subtrees) that a mapping of such data onto dimensions may not preserve.

That being said, discrete data isn't a problem in itself if it maps neatly onto dimensions as described above; computers only store discrete approximations anyway.
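One mapping that does carry Hamming distance over neatly is one-hot encoding (a swapped-in technique, not something either answer names): each attribute becomes a block of 0/1 coordinates, and the squared Euclidean distance between two encoded records is exactly twice their Hamming distance, so nearest-neighbour rankings agree. The check below uses the question's attributes; `DOMAINS` and `one_hot` are illustrative names.

```python
from itertools import product

# Attribute domains from the question's example.
DOMAINS = [
    ["black", "blue", "red"],
    ["bread", "milk", "cheese"],
    ["right", "left", "straight", "curved"],
]

def one_hot(rec):
    """Concatenate one 0/1 indicator block per attribute."""
    vec = []
    for value, domain in zip(rec, DOMAINS):
        vec.extend(1 if value == v else 0 for v in domain)
    return vec

def hamming(a, b):
    """Number of attribute positions where records a and b differ."""
    return sum(x != y for x, y in zip(a, b))

def sq_euclid(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

records = list(product(*DOMAINS))
# Each mismatched attribute flips two coordinates (one 1->0, one 0->1).
assert all(sq_euclid(one_hot(a), one_hot(b)) == 2 * hamming(a, b)
           for a, b in product(records, repeat=2))
print(len(one_hot(records[0])))
```

The price is dimensionality: three attributes become ten coordinates here, and KD-tree pruning generally weakens as the number of dimensions grows, so whether this beats a linear scan depends on the data.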
