What is the most efficient way to sort a collection of items into buckets?

Posted 2024-10-06 16:45:23, 126 words, 12 views, 0 comments

I have an array of arbitrary hashes, each of which has an integer element (call it 'id'). I want to sort these hashes into a number of buckets (the set of buckets is constant over the array), where each bucket covers an arbitrary range of ids (e.g. 1-10, 15-20, 20-30). What is the best sorting strategy for this? Is it possible to do it without a nested loop?

Comments (3)

じее 2024-10-13 16:45:23

If the number of buckets is small, you are probably better off with nested loops: the outer loop over the hashes and the inner loop over the buckets. That is O(n*m) for n hashes and m buckets.
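A minimal Python sketch of the nested-loop approach, for illustration only. The function name `bucket_nested`, the representation of buckets as inclusive `(low, high)` tuples, and the sample data are assumptions, not part of the question.

```python
def bucket_nested(hashes, ranges):
    """Assign each hash to the first range containing its 'id'. O(n*m)."""
    buckets = {r: [] for r in ranges}
    for h in hashes:                  # outer loop: n hashes
        for lo, hi in ranges:         # inner loop: m buckets
            if lo <= h["id"] <= hi:   # assumed: bounds are inclusive
                buckets[(lo, hi)].append(h)
                break                 # stop at the first matching range
    return buckets


# Hypothetical sample data; id 12 falls in no range and is left out.
hashes = [{"id": 3}, {"id": 17}, {"id": 25}, {"id": 12}]
ranges = [(1, 10), (15, 20), (21, 30)]
result = bucket_nested(hashes, ranges)
```

With only a handful of ranges the inner loop is short, so this is often fast enough, and it needs no sorting.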

If both the number of hashes and the number of buckets are large, you can:

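hashes = sort(hashes)   # sort by id
buckets = sort(buckets) # sort by lower bound of bucket (assumes buckets do not overlap)
i = 0

foreach (hash in hashes) {
  # skip buckets that end before this id
  while (i < buckets.length && buckets[i].upper_bound < hash.id) {
    i = i + 1
  }
  if (i == buckets.length) break   # no buckets left for this or any later id
  if (buckets[i].lower_bound <= hash.id) {   # ids in a gap between buckets are skipped
    buckets[i].add(hash)
  }
}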

This basically loops through the sorted hashes, adding each one to the current bucket and advancing to the next bucket when needed. The pass itself is linear, so the two sorts dominate the cost: O(n*log(n) + m*log(m)).
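The sorted sweep described in this answer could look roughly like this in Python. This is a sketch under assumptions: buckets are non-overlapping inclusive `(low, high)` tuples, ids that fall in a gap between buckets are dropped, and the name `bucket_sweep` is hypothetical.

```python
def bucket_sweep(hashes, ranges):
    """Sort both sides, then walk them together with a single bucket cursor."""
    ordered = sorted(ranges)                  # sort buckets by lower bound
    buckets = {r: [] for r in ordered}
    i = 0
    for h in sorted(hashes, key=lambda h: h["id"]):
        # Advance past buckets that end before this id.
        while i < len(ordered) and ordered[i][1] < h["id"]:
            i += 1
        if i == len(ordered):
            break                             # remaining ids lie past the last bucket
        lo, hi = ordered[i]
        if lo <= h["id"]:                     # otherwise the id sits in a gap
            buckets[(lo, hi)].append(h)
    return buckets


# Hypothetical sample data.
hashes = [{"id": n} for n in (25, 3, 17, 12, 8, 40)]
swept = bucket_sweep(hashes, [(15, 20), (1, 10), (21, 30)])
```

The cursor `i` only ever moves forward, so the sweep after sorting costs O(n + m) and the loops are not nested in the O(n*m) sense.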

梦太阳 2024-10-13 16:45:23

If the hashes are good quality, they will exhibit an even distribution, so you can use evenly-distributed buckets to partition the collection in a single pass.

If you also want the hashes sorted within the buckets, use a normal sorting algorithm after everything is in buckets. This would be an unusual use of hashes, however. (If you aren't trying to sort within buckets, then the word "sort" is a misnomer. What you really wanted was partitioning.)
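As a rough illustration of single-pass partitioning into evenly sized buckets, here is a Python sketch. It assumes the integer ids lie in a known half-open interval `[low, high)`; the name `partition_uniform` and the `sort_within` flag are hypothetical. It does not handle the arbitrary ranges from the question.

```python
def partition_uniform(hashes, low, high, n_buckets, sort_within=False):
    """Partition integer ids in [low, high) into n_buckets equal-width buckets."""
    buckets = [[] for _ in range(n_buckets)]
    span = high - low
    for h in hashes:                          # single pass, O(n)
        if low <= h["id"] < high:             # ids outside the interval are skipped
            # Integer arithmetic keeps the index in 0..n_buckets-1.
            index = (h["id"] - low) * n_buckets // span
            buckets[index].append(h)
    if sort_within:                           # optional: order each bucket afterwards
        for bucket in buckets:
            bucket.sort(key=lambda h: h["id"])
    return buckets


# Hypothetical sample data: four buckets of width 25 over [0, 100).
hashes = [{"id": n} for n in (7, 42, 19, 3, 99, 55)]
parts = partition_uniform(hashes, 0, 100, 4, sort_within=True)
```

Because the bucket index is computed directly from the id, no search over buckets is needed at all.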

阳光的暖冬 2024-10-13 16:45:23

You don't mention a language/platform, but for efficiency in terms of keystrokes (C#):

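        // requires: using System.Linq;
        var histogram = new[] { 0, 10, 15, 20, 30, 40 };  // ascending bucket boundaries
        var values = new[] { 12, 14, 5, 6, 7, 1, 34, 26, 17 };
        // Group each value under the first boundary greater than it (the bucket's exclusive upper bound).
        // First() throws if a value is >= the last boundary.
        var bars = values.GroupBy(v => histogram.First(b => v < b));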
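The same boundary-list idea can be sketched in Python. Note that this version swaps the linear scan over the boundaries for a binary search with the standard `bisect` module, which is a different technique from the one-liner above and costs O(n*log(m)). The function name `group_by_boundary` is hypothetical, and the boundaries are assumed to be sorted in ascending order.

```python
from bisect import bisect_right
from collections import defaultdict


def group_by_boundary(values, boundaries):
    """Group values under the first boundary strictly greater than them."""
    bars = defaultdict(list)
    for v in values:
        i = bisect_right(boundaries, v)       # index of the first boundary > v
        if i == len(boundaries):
            raise ValueError(f"{v} is not below any boundary")
        bars[boundaries[i]].append(v)
    return dict(bars)


histogram = [0, 10, 15, 20, 30, 40]
values = [12, 14, 5, 6, 7, 1, 34, 26, 17]
bars = group_by_boundary(values, histogram)
```

Each group is keyed by the exclusive upper bound of its bucket, matching the grouping key used in the C# snippet.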