HashSet slower than brute force in Rust

Posted 2025-02-13 19:32:58

I'm doing some coding practice in Rust, and I found a somewhat weird test result. Maybe I misunderstood something.

Test Result
Time elapsed in function bruteforce is: 2.387µs
Time elapsed in function hashset is: 24.413µs // Why does this take relatively long?
Time elapsed in function rusty is: 1.13µs
test arrays::contains_common_item::test_time_measure ... ok

use std::collections::HashSet;

// O(n^2) time, O(1) extra space
fn contains_common_item_bruteforce(arr1: Vec<char>, arr2: Vec<char>) -> bool {
    for arr1_item in arr1.iter() {
        for arr2_item in arr2.iter() {
            if arr1_item == arr2_item {
                return true;
            }
        }
    }
    false
}

// O(n) time (two linear passes), O(n) extra space
fn contains_common_item_hashset(arr1: Vec<char>, arr2: Vec<char>) -> bool {
    let mut contained_items_map = HashSet::new();

    // iter() iterates over the items by reference
    // iter_mut() iterates over the items, giving a mutable reference to each item
    // into_iter() iterates over the items, moving them into the new scope
    for item in arr1.iter() {
        contained_items_map.insert(*item);
    }

    for item in arr2.iter() {
        if contained_items_map.contains(item) {
            return true;
        }
    }
    false
}

fn contains_common_item_rusty(arr1: Vec<char>, arr2: Vec<char>) -> bool {
    arr1.iter().any(|x| arr2.contains(x)) // Vec::contains is O(n), so this is O(n*m) overall
}

#[test]
fn test_time_measure() {
    let arr1 = vec!['a', 'b', 'c', 'x'];
    let arr2 = vec!['z', 'y', 'i'];
    let start = std::time::Instant::now();
    contains_common_item_bruteforce(arr1, arr2);
    let duration: std::time::Duration = start.elapsed();
    eprintln!("Time elapsed in function bruteforce is: {:?}", duration);

    let arr1 = vec!['a', 'b', 'c', 'x'];
    let arr2 = vec!['z', 'y', 'i'];
    let start = std::time::Instant::now();
    contains_common_item_hashset(arr1, arr2);
    let duration: std::time::Duration = start.elapsed();
    eprintln!("Time elapsed in function hashset is: {:?}", duration);

    let arr1 = vec!['a', 'b', 'c', 'x'];
    let arr2 = vec!['z', 'y', 'i'];
    let start = std::time::Instant::now();
    contains_common_item_rusty(arr1, arr2);
    let duration: std::time::Duration = start.elapsed();
    eprintln!("Time elapsed in function rusty is: {:?}", duration);
}

Answered by 月亮是我掰弯的, 2025-02-20 19:32:58

There are a few things wrong with both your testing methodology and with your expectations.

First of all: optimizations are variable, CPUs are variable, caches are variable... Running a single pass, in a combined test, with fixed values does not account for these variables. You should be using a proper benchmarking framework if you want practical results. Look into using Criterion.

Also, either your computer is quite slow, or you're testing in debug mode. The Rust Playground gives 210ns, 4.25µs, and 170ns respectively. Benchmarking in debug mode is fairly useless since the performance wouldn't reflect how it'd behave in a release environment.
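
As a concrete starting point, here is a minimal sketch of what a Criterion benchmark for these three functions could look like. The bench file name, the your_crate path, and the Cargo.toml wiring are assumptions about your project layout, not something from the original post:

// benches/common_item.rs (hypothetical file name)
//
// Cargo.toml would need something along these lines:
//   [dev-dependencies]
//   criterion = "0.5"
//
//   [[bench]]
//   name = "common_item"
//   harness = false
//
// The three functions also need to be `pub` so the bench target can reach them.

use criterion::{criterion_group, criterion_main, BatchSize, Criterion};

// Adjust `your_crate` to the real crate name; the module path matches the
// test name shown in the output above.
use your_crate::arrays::contains_common_item::{
    contains_common_item_bruteforce, contains_common_item_hashset, contains_common_item_rusty,
};

fn bench_contains_common_item(c: &mut Criterion) {
    let arr1 = vec!['a', 'b', 'c', 'x'];
    let arr2 = vec!['z', 'y', 'i'];

    // The functions take their Vecs by value, so iter_batched clones the
    // inputs in an untimed setup step and only times the call itself.
    c.bench_function("bruteforce", |b| {
        b.iter_batched(
            || (arr1.clone(), arr2.clone()),
            |(a1, a2)| contains_common_item_bruteforce(a1, a2),
            BatchSize::SmallInput,
        )
    });

    c.bench_function("hashset", |b| {
        b.iter_batched(
            || (arr1.clone(), arr2.clone()),
            |(a1, a2)| contains_common_item_hashset(a1, a2),
            BatchSize::SmallInput,
        )
    });

    c.bench_function("rusty", |b| {
        b.iter_batched(
            || (arr1.clone(), arr2.clone()),
            |(a1, a2)| contains_common_item_rusty(a1, a2),
            BatchSize::SmallInput,
        )
    });
}

criterion_group!(benches, bench_contains_common_item);
criterion_main!(benches);

Running this with cargo bench compiles in release mode, does warm-up passes, and reports a statistical estimate per function, which addresses both the single-pass problem and the debug-mode problem described above.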


Second, HashSet purports an O(1) access time, but there's no such thing as a free lunch. For one thing, it is a variable sized collection that must be built before you can even use it. A similar crude test shows that this step alone is 4x as costly as the other two functions in their entirety. This would include allocation time, hashing time, and any other record-keeping that the HashSet does.
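
To see that fixed cost in isolation, the same crude Instant-based measurement from the question can be pointed at the build step alone. This is only a sketch of the kind of test being described here, not the exact one that produced the 4x figure:

use std::collections::HashSet;
use std::time::Instant;

fn main() {
    let arr1 = vec!['a', 'b', 'c', 'x'];

    // Time only the part where the HashSet is allocated and filled,
    // i.e. the fixed cost paid before a single lookup happens.
    let start = Instant::now();
    let set: HashSet<char> = arr1.iter().copied().collect();
    eprintln!("building the HashSet took: {:?}", start.elapsed());

    // Use the set afterwards so the build above is not optimized away.
    eprintln!("the set holds {} items", set.len());
}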

You may have been surprised because the Big-O complexity suggests the HashSet should perform better, but Big-O notation only gives an upper bound on the work being done and expresses how that work scales as n grows larger. Here you have fixed sets of 4-5 items, so the time is dominated by the fixed costs that Big-O notation leaves out (like building the HashSet). Big-O notation also says nothing about how much work each step actually costs: computing a hash, looking up the bucket, potentially handling collisions, and checking whether an item exists is far more work than comparing two chars. The n in the O(n^2) of the bruteforce method and the n in the O(n) of the hashset method are not directly comparable.
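
To make that per-element cost concrete, here is roughly what each approach pays per membership check. The hash_of helper is only illustrative, not how HashSet works internally, but DefaultHasher (SipHash) is the same hasher a default HashSet<char> uses:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// What the bruteforce inner loop does per pair: one integer comparison.
fn compare(a: char, b: char) -> bool {
    a == b
}

// Roughly the extra work a HashSet lookup adds per item: run the char through
// the default hasher before the right bucket can even be located.
fn hash_of(c: char) -> u64 {
    let mut hasher = DefaultHasher::new();
    c.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    println!("compare: {}, hash: {:#x}", compare('a', 'z'), hash_of('a'));
}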


In summary, if your usage means doing intersection checks on small datasets, then there's a pretty good chance that bruteforce will be faster. But you should verify with realistic data in a proper benchmark.
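
If you do end up needing one function that covers both small and large inputs, a common compromise is to branch on size, as in the hypothetical sketch below; the cutoff of 32 is a guess and would need to be tuned with a real benchmark:

use std::collections::HashSet;

// Hybrid: brute force for tiny inputs, HashSet once the inputs grow.
fn contains_common_item(arr1: &[char], arr2: &[char]) -> bool {
    if arr1.len().min(arr2.len()) <= 32 {
        // Cheap per element, quadratic overall: fine for small n.
        arr1.iter().any(|x| arr2.contains(x))
    } else {
        // Pay the HashSet's fixed cost only when linear scaling matters.
        let set: HashSet<char> = arr1.iter().copied().collect();
        arr2.iter().any(|x| set.contains(x))
    }
}

fn main() {
    assert!(!contains_common_item(&['a', 'b'], &['z', 'y']));
    assert!(contains_common_item(&['a', 'b', 'c'], &['z', 'c']));
}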
