当前位置：文江博客话题详情

博耶摩尔寻找小钥匙

发布于 2024-12-07 07:25:50 字数 165 浏览 5 评论 0原文

首先，我对算法知之甚少，所以请耐心等待。

据我了解，Boyer Moore 算法在长密钥下速度最快。如果我有一个非常短的密钥（例如 10 个字符）和大量要搜索的文本（超过 10,000 个字符）怎么办？ Boyer Moore 会是这种情况下的最佳搜索算法吗？

如果不是的话会是什么？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

同尘 2024-12-14 07:25:50

根据字符串搜索算法，“Boyer–Moore 字符串搜索算法已成为实用字符串搜索文献。”它并不总是最快的，但总的来说这是可行的方法。

当您谈论像 10,000 个字符这样的小型文本缓冲区时，Knuth-Morris-Pratt 和 Boyer-Moore 的运行时间非常接近。当在现代计算机上运行时，即使是简单的字符串搜索在 10K 缓冲区上也会快得令人眼花缭乱。我怀疑您会发现 KMP 和 Boyer-Moore 两者在 10,000 个字符缓冲区中搜索 10 个字符的字符串之间的差异约为纳秒。

这种情况下最好的搜索算法是什么？这将取决于您需要调用它的频率。如果它每秒被调用几次（最多），我可能会编写一个简单的搜索并保留它。与程序的运行时间相比，Boyer-Moore 搜索和在那个小缓冲区上的朴素搜索之间的差异是微不足道的，并且您的优化工作最好花在其他地方。如果我必须每秒调用它数百或数千次，我会花时间编写优化的 Boyer-Moore 搜索。

回复收藏 0 原文

风启觞 2024-12-14 07:25:50

要回答您的问题，您只需访问一个链接： http ://www.codeproject.com/Articles/250566/Fastest-strstr-like-function-in-C

事实上最快的文本搜索器的主页是这样的：
http://www.sanmayce.com/Railgun/index.html

> ；如果我有一个非常短的密钥（例如 10 个字符）和大量要搜索的文本（超过 10,000 个字符）怎么办？
正是这个关键范围（10-11 个字符）在我的重型 (1TB) strstr 摊牌中进行了测试。在 300,000 个单词中搜索了 400,000 个单词，逐一搜索，这对于 strstr 函数来说是一个极好的负载！

>Boyer Moore 会是这种情况下的最佳搜索算法吗？
根据我的测试，最快的文本搜索器（使用 Microsoft V16 /Ox）是 Railgun_Quadruplet_7Gulliver，但当使用 /Ox 和 Intel 12.1 时，它是第二好的，您可以自己测试一下，见下文。

如果您有一台速度很快的机器，比如 i7 37*，我想看看您最新的控制台基准测试包的结果（测试 Microsoft v16 与 Intel 12.1 编译器）：
http://www.sanmayce.com/Downloads/_KAZE_Benchmark_2013.zip

在我的 T7500 上测试2200Mhz笔记本：

E:\_KAZE_Benchmark_2013>RUNME.bat

E:\_KAZE_Benchmark_2013>SKYFALL_TXT2HTML_MicrosoftV16.exe MIX_Tesla_BABAJI_Castaneda_Poirot_Holmes_Sunnah_Hadith_Quran_Bible_Patanjali_Dao_Simplici
ssimus.wrd.txt MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd /Gulliver
SKYFALL_TXT2HTML, revision 2, written by Kaze.
Size of 1st input file: 8916987
Size of 2nd input file: 3869529

Doing Search for each word with Railgun_Quadruplet_7Gulliver ...
Railgun_Quadruplet_7Gulliver performance: 1944+KB/clock
Average Pattern Length: 11
Function Invocations: 420,640
Function Inner-Loop Iterations: 131,873,881,926
Function Really Traversed: 1,142,740,439KB

E:\_KAZE_Benchmark_2013>SKYFALL_TXT2HTML_MicrosoftV16.exe MIX_Tesla_BABAJI_Castaneda_Poirot_Holmes_Sunnah_Hadith_Quran_Bible_Patanjali_Dao_Simplici
ssimus.wrd.txt MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
SKYFALL_TXT2HTML, revision 2, written by Kaze.
Size of 1st input file: 8916987
Size of 2nd input file: 3869529

Doing Search for each word with Railgun_Quadruplet_7 ...
Railgun_Quadruplet_7 performance: 1747+KB/clock
Average Pattern Length: 11
Function Invocations: 420,640
Function Inner-Loop Iterations: 146,826,792,351
Function Really Traversed: 1,142,740,439KB

E:\_KAZE_Benchmark_2013>SKYFALL_TXT2HTML_MicrosoftV16.exe MIX_Tesla_BABAJI_Castaneda_Poirot_Holmes_Sunnah_Hadith_Quran_Bible_Patanjali_Dao_Simplici
ssimus.wrd.txt MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd /Hasherezade
SKYFALL_TXT2HTML, revision 2, written by Kaze.
Size of 1st input file: 8916987
Size of 2nd input file: 3869529

Doing Search for each word with Railgun_Hasherezade ...
Railgun_Hasherezade performance: 1774+KB/clock
Average Pattern Length: 11
Function Invocations: 420,640
Function Inner-Loop Iterations: 128,900,655,391
Function Really Traversed: 1,142,740,439KB

使用32位和64位代码时有一些起伏也很重要微软与微软之间的利润率英特尔。

Gulliver的“引擎”是由我调校的2号BM-Horspool。

正如您在我简陋的笔记本电脑上看到的那样，Gulliver 以 1898MB/s 的速度搜索模式（平均模式长度：11），甚至是超级漂亮的 BNDM在这里弯曲膝盖。

To answer your question you need to visit one link only: http://www.codeproject.com/Articles/250566/Fastest-strstr-like-function-in-C

In fact the homepage of fastest textual searcher is this:
http://www.sanmayce.com/Railgun/index.html

>So what if I have a very shor key (example 10 chars), and alot of text to search (over 10,000 chars).
Exactly this key range (10-11 chars) is tested in my heavy (1TB) strstr showdown. There 400,000 words are searched in 300,000 words, ONE-BY-ONE, an excellent load for a strstr function!

>Would Boyer Moore be the best search algorithm for this scenario?
According to my tests the fastest text searcher (using Microsoft V16 /Ox) is Railgun_Quadruplet_7Gulliver, yet it is second best when /Ox is used and Intel 12.1, you can test it yourself, see below.

If you have a fast machine, say i7 37*, I would like to see your results of my latest console benchmark package (testing Microsoft v16 vs Intel 12.1 compilers):
http://www.sanmayce.com/Downloads/_KAZE_Benchmark_2013.zip

Test on my T7500 2200Mhz notebook:

E:\_KAZE_Benchmark_2013>RUNME.bat

E:\_KAZE_Benchmark_2013>SKYFALL_TXT2HTML_MicrosoftV16.exe MIX_Tesla_BABAJI_Castaneda_Poirot_Holmes_Sunnah_Hadith_Quran_Bible_Patanjali_Dao_Simplici
ssimus.wrd.txt MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd /Gulliver
SKYFALL_TXT2HTML, revision 2, written by Kaze.
Size of 1st input file: 8916987
Size of 2nd input file: 3869529

Doing Search for each word with Railgun_Quadruplet_7Gulliver ...
Railgun_Quadruplet_7Gulliver performance: 1944+KB/clock
Average Pattern Length: 11
Function Invocations: 420,640
Function Inner-Loop Iterations: 131,873,881,926
Function Really Traversed: 1,142,740,439KB

E:\_KAZE_Benchmark_2013>SKYFALL_TXT2HTML_MicrosoftV16.exe MIX_Tesla_BABAJI_Castaneda_Poirot_Holmes_Sunnah_Hadith_Quran_Bible_Patanjali_Dao_Simplici
ssimus.wrd.txt MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
SKYFALL_TXT2HTML, revision 2, written by Kaze.
Size of 1st input file: 8916987
Size of 2nd input file: 3869529

Doing Search for each word with Railgun_Quadruplet_7 ...
Railgun_Quadruplet_7 performance: 1747+KB/clock
Average Pattern Length: 11
Function Invocations: 420,640
Function Inner-Loop Iterations: 146,826,792,351
Function Really Traversed: 1,142,740,439KB

E:\_KAZE_Benchmark_2013>SKYFALL_TXT2HTML_MicrosoftV16.exe MIX_Tesla_BABAJI_Castaneda_Poirot_Holmes_Sunnah_Hadith_Quran_Bible_Patanjali_Dao_Simplici
ssimus.wrd.txt MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd /Hasherezade
SKYFALL_TXT2HTML, revision 2, written by Kaze.
Size of 1st input file: 8916987
Size of 2nd input file: 3869529

Doing Search for each word with Railgun_Hasherezade ...
Railgun_Hasherezade performance: 1774+KB/clock
Average Pattern Length: 11
Function Invocations: 420,640
Function Inner-Loop Iterations: 128,900,655,391
Function Really Traversed: 1,142,740,439KB

There are some up-and-downs when 32bit and 64bit code is used also significant margins between Microsoft & Intel.

Gulliver's "engine" is order 2 BM-Horspool tuned by me.

As you can see on my humble laptop Gulliver searches for patterns (Average Pattern Length: 11) at 1898MB/s, even the super beautiful BNDM bends a knee here.

回复收藏 0 原文

~没有更多了~