HashSet is slower than brute force in Rust
I'm doing some coding practice in Rust.
I found a somewhat weird test result. Maybe I misunderstood something.
Test Result
Time elapsed in function bruteforce is: 2.387µs
Time elapsed in function hashset is: 24.413µs // Why does it take relatively long?
Time elapsed in function rusty is: 1.13µs
test arrays::contains_common_item::test_time_measure ... ok
use std::collections::HashSet;

// time O(N^2), space O(1)
fn contains_common_item_bruteforce(arr1: Vec<char>, arr2: Vec<char>) -> bool {
    for arr1_item in arr1.iter() {
        for arr2_item in arr2.iter() {
            if arr1_item == arr2_item {
                return true;
            }
        }
    }
    false
}

// time O(N), space O(N)
fn contains_common_item_hashset(arr1: Vec<char>, arr2: Vec<char>) -> bool {
    let mut contained_items_map = HashSet::new();
    // iter() iterates over the items by reference
    // iter_mut() iterates over the items, giving a mutable reference to each item
    // into_iter() iterates over the items, moving them into the new scope
    for item in arr1.iter() {
        contained_items_map.insert(*item);
    }
    for item in arr2.iter() {
        if contained_items_map.contains(item) {
            return true;
        }
    }
    false
}

// time O(N^2), space O(1)
fn contains_common_item_rusty(arr1: Vec<char>, arr2: Vec<char>) -> bool {
    arr1.iter().any(|x| arr2.contains(x)) // Vec::contains is O(N)
}

#[test]
fn test_time_measure() {
    let arr1 = vec!['a', 'b', 'c', 'x'];
    let arr2 = vec!['z', 'y', 'i'];
    let start = std::time::Instant::now();
    contains_common_item_bruteforce(arr1, arr2);
    let duration: std::time::Duration = start.elapsed();
    eprintln!("Time elapsed in function bruteforce is: {:?}", duration);

    let arr1 = vec!['a', 'b', 'c', 'x'];
    let arr2 = vec!['z', 'y', 'i'];
    let start = std::time::Instant::now();
    contains_common_item_hashset(arr1, arr2);
    let duration: std::time::Duration = start.elapsed();
    eprintln!("Time elapsed in function hashset is: {:?}", duration);

    let arr1 = vec!['a', 'b', 'c', 'x'];
    let arr2 = vec!['z', 'y', 'i'];
    let start = std::time::Instant::now();
    contains_common_item_rusty(arr1, arr2);
    let duration: std::time::Duration = start.elapsed();
    eprintln!("Time elapsed in function rusty is: {:?}", duration);
}
1 Answer
There are a few things wrong with both your testing methodology and with your expectations.
First of all: optimizations are variable, CPUs are variable, caches are variable... Running a single pass, in a combined test, with fixed values does not account for these variables. You should be using a proper performance benchmarking framework if you want practical results. Look into using Criterion.
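As a minimal sketch of what that could look like (the file path, benchmark names, and the two stand-in functions below are my own, not from your code; in a real project you would import your actual functions, add criterion under [dev-dependencies], and declare a [[bench]] target with harness = false in Cargo.toml):

// benches/common_item.rs -- rough sketch, assumptions noted above
use criterion::{criterion_group, criterion_main, Criterion};
use std::hint::black_box;

// Stand-ins for the functions in the question; import your real ones instead.
fn bruteforce(arr1: &[char], arr2: &[char]) -> bool {
    arr1.iter().any(|a| arr2.contains(a))
}

fn with_hashset(arr1: &[char], arr2: &[char]) -> bool {
    let set: std::collections::HashSet<char> = arr1.iter().copied().collect();
    arr2.iter().any(|c| set.contains(c))
}

fn bench_contains(c: &mut Criterion) {
    let arr1 = vec!['a', 'b', 'c', 'x'];
    let arr2 = vec!['z', 'y', 'i'];

    // Criterion warms up, runs each closure many times, and reports statistics,
    // instead of trusting a single Instant::now() measurement.
    c.bench_function("bruteforce", |b| {
        b.iter(|| bruteforce(black_box(&arr1), black_box(&arr2)))
    });
    c.bench_function("hashset", |b| {
        b.iter(|| with_hashset(black_box(&arr1), black_box(&arr2)))
    });
}

criterion_group!(benches, bench_contains);
criterion_main!(benches);

You would run it with cargo bench, which also compiles with optimizations.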
Also, either your computer is quite slow, or you're testing in debug mode. The Rust Playground gives 210ns, 4.25µs, and 170ns respectively. Benchmarking in debug mode is fairly useless, since the performance won't reflect how the code behaves in a release build (running the test with cargo test --release, or a benchmark with cargo bench, compiles with optimizations).
Second, HashSet purports an O(1) access time, but there's no such thing as a free lunch. For one thing, it is a variable-sized collection that must be built before you can even use it. A similar crude test shows that this step alone is 4x as costly as the other two functions in their entirety. This would include allocation time, hashing time, and any other record-keeping that the HashSet does.
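If you want to see that setup cost in isolation, a crude sketch in the same spirit as your test (still a single, noisy measurement, so treat it only as a rough indication) could time just the construction of the set:

use std::collections::HashSet;
use std::time::Instant;

fn main() {
    let arr1 = vec!['a', 'b', 'c', 'x'];

    // Time only building the set: allocation plus hashing each item,
    // before a single lookup has happened.
    let start = Instant::now();
    let set: HashSet<char> = arr1.iter().copied().collect();
    let build_time = start.elapsed();
    eprintln!("building the HashSet alone took: {:?}", build_time);

    // Use the set so the compiler can't discard the work entirely.
    assert!(set.contains(&'a'));
}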
You may have been surprised that the Big-O complexity indicates the HashSet should perform better, but you're probably missing that Big-O notation only shows an upper bound on the work being done and expresses how that work scales as n grows larger. Here, you have fixed sets of 3-4 items, so the time will be dominated by the fixed costs that Big-O notation leaves out (like the HashSet creation). Big-O notation also leaves out how much work each n actually takes: it takes much more work to compute a hash, look up the bucket, potentially handle collisions, and check whether an item exists than it does to compare two chars. The n in the O(n^2) of the brute-force method and the n in the O(n) of the hashset method are not directly comparable.

In summary, if your usage means doing intersection checks on small datasets, then there's a pretty good chance that brute force will be faster. But you should use realistic data in a proper benchmarking test to verify.
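As a rough illustration of that last point, here is a sketch that reruns the same comparison with larger inputs (the sizes and contents are made up for illustration, and a single Instant measurement is still crude, so treat any numbers as approximate):

use std::collections::HashSet;
use std::time::Instant;

fn bruteforce(arr1: &[char], arr2: &[char]) -> bool {
    arr1.iter().any(|a| arr2.contains(a))
}

fn with_hashset(arr1: &[char], arr2: &[char]) -> bool {
    let set: HashSet<char> = arr1.iter().copied().collect();
    arr2.iter().any(|c| set.contains(c))
}

fn main() {
    // Disjoint alphabets so neither function can return early.
    let arr1: Vec<char> = ('a'..='m').cycle().take(5_000).collect();
    let arr2: Vec<char> = ('n'..='z').cycle().take(5_000).collect();

    let start = Instant::now();
    let found = bruteforce(&arr1, &arr2);
    eprintln!("bruteforce: {:?} (found: {})", start.elapsed(), found);

    let start = Instant::now();
    let found = with_hashset(&arr1, &arr2);
    eprintln!("hashset:    {:?} (found: {})", start.elapsed(), found);
}

With inputs this size the quadratic scan has to do millions of comparisons while the set-based version does a few thousand hashes, so the trade-off flips; a proper Criterion benchmark over your real data sizes is still the way to confirm it.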