Any Perl standard library for checking whether a string contains a given substring?

Posted on 2024-12-05 02:34:55

Given a query, I would like to check whether it contains a given substring (which may consist of more than one word). I don't want an exhaustive search, though, because the substring can only begin at the start of a word.

Is there a standard Perl library for this, so that I get something efficient and don't have to reinvent the wheel?

Thanks,

Comments (5)

泪眸﹌ 2024-12-12 02:34:55

Maybe you'll find the builtin index() suited to the job.

It's a very fast substring search function (it implements the Boyer-Moore algorithm).

Just check its documentation with perldoc -f index.
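
Since the question only wants hits that begin a word, index() alone is not quite enough. Here is a minimal sketch of one way to add that check. The helper name index_at_word_start is made up for illustration, and it assumes that "start of a word" means offset 0 or a position right after whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: return the offset of the first occurrence of $needle
# in $haystack that begins at the start of a word, or -1 if there is none.
sub index_at_word_start {
    my ($haystack, $needle) = @_;
    return -1 if $needle eq '';
    my $pos = 0;
    while (($pos = index($haystack, $needle, $pos)) != -1) {
        # Accept the hit if it is at offset 0 or preceded by whitespace.
        return $pos if $pos == 0 || substr($haystack, $pos - 1, 1) =~ /\s/;
        $pos++;    # otherwise resume the search one character later
    }
    return -1;
}

print index_at_word_start('cheap new york hotels', 'new york'), "\n";  # 6
print index_at_word_start('renew york', 'new york'), "\n";             # -1
```

The third argument to index() is the position to start searching from, which is what lets the loop skip past a hit that falls in the middle of a word.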

探春 2024-12-12 02:34:55

I would make a hash whose keys are the first words of the 9000 substrings and whose values are arrays of all the substrings that start with that first word. If many substrings share the same first word, you could key on the first two words instead.

Then, for each word of each query, I would check whether that word is in the hash. If it is, only the strings in that hash entry's array need to be matched, starting at that point in the query, using the index function.

Assuming that matches are sparse, this would be pretty efficient: one hash lookup per word and minimal searching for potential matches.
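
A rough sketch of that idea might look like the following. The sample list, the %by_first_word hash, and the find_substrings helper are all invented for illustration. It assumes words are separated by whitespace and that the first word of a substring must equal a whole word of the query. It compares with substr and eq at the word's offset rather than calling index, since the candidate position is already known.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real list would hold the ~9000 substrings.
my @substrings = ('new york', 'new jersey', 'san francisco', 'boston');

# Map each first word to the substrings that begin with it.
my %by_first_word;
for my $s (@substrings) {
    my ($first) = split ' ', $s;
    push @{ $by_first_word{$first} }, $s;
}

# Return every listed substring that occurs in $query starting at a word.
sub find_substrings {
    my ($query) = @_;
    my @found;
    while ($query =~ /(\S+)/g) {
        my $word  = $1;
        my $start = $-[1];    # offset where this word begins
        my $candidates = $by_first_word{$word} or next;
        for my $s (@$candidates) {
            # Compare the text at this offset with the candidate.
            push @found, $s if substr($query, $start, length $s) eq $s;
        }
    }
    return @found;
}

print join(', ', find_substrings('cheap hotels in new york and boston')), "\n";
```

Each query word costs one hash lookup, and only the few candidates sharing that first word are compared against the query text.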

As I write this, it reminds me of an Aho-Corasick search. (See Algorithm::AhoCorasick on CPAN.) I've never used the module, but the algorithm spends a lot of time up front building a finite state machine from the search keys, so that finding a match is very efficient. I don't know whether the CPAN implementation handles word-boundary issues.

°如果伤别离去 2024-12-12 02:34:55

You can use this approach:

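# init: build one alternation of all substrings, anchored at a word boundary
my $re = join '|', map quotemeta, sort @substrings;
$re = qr/\b(?:$re)/;

# usage: act on the first match in each input line
while (<>) {
  found($1) if /($re)/;
}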

where found is whatever action you want to take when a substring is found.
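
For reference, here is a self-contained variation on the same idea, offered as a sketch rather than a drop-in replacement. It makes two changes that are assumptions on my part. The alternatives are sorted longest first so that a longer phrase wins over a shorter one that is its prefix. The anchor is (?<!\S) instead of \b, so a match must start at the beginning of the string or right after whitespace. The sample list and the first_match helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample list standing in for the real substrings.
my @substrings = ('new york', 'new', 'c++');

# Longest alternatives first, so "new york" is preferred over "new".
my $alt = join '|',
          map  { quotemeta($_) }
          sort { length($b) <=> length($a) } @substrings;

# (?<!\S) requires that the match is not preceded by a non-space character,
# i.e. it starts at the beginning of the string or right after whitespace.
my $re = qr/(?<!\S)($alt)/;

# Return the first listed substring found in $query, or undef if none.
sub first_match {
    my ($query) = @_;
    return $query =~ $re ? $1 : undef;
}

print first_match('cheap new york hotels'), "\n";   # new york
```

With \b, a phrase following a hyphen (as in "brand-new york") would also count as starting a word. Which behaviour is right depends on how words are defined for the queries.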

空袭的梦i 2024-12-12 02:34:55

The builtin index function is the fastest general purpose way to check if a string contains a substring.

my $find = 'abc';

my $str = '123 abc xyz';

if (index($str, $find) != -1) {
    # process matching $str here
}

If index is still not fast enough, and you know where in the string the substring might be, you can narrow the search with substr and then use eq for the actual comparison:

my $find = 'abc';

my $str = '123 abc xyz';

if (substr($str, 4, 3) eq $find) {
    # process matching $str here
}

You are not going to get faster than that in Perl without dropping down to C.

绝影如岚 2024-12-12 02:34:55

This sounds like the perfect job for regular expressions:

use feature 'say';    # say is not enabled by default

if ($string =~ m/your substring/) {
    say "substring found";
} else {
    say "nothing found";
}
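
If the substring comes from a variable rather than being typed into the pattern, it probably needs escaping, and the word-start requirement from the question needs an anchor. A minimal sketch, assuming \b is an acceptable definition of a word start (the helper name contains_at_word_start is made up):

```perl
use strict;
use warnings;

# Hypothetical helper: true if $substring occurs in $string at a word boundary.
sub contains_at_word_start {
    my ($string, $substring) = @_;
    # \Q...\E escapes regex metacharacters in the interpolated value;
    # \b anchors the match at a word boundary.
    return $string =~ /\b\Q$substring\E/ ? 1 : 0;
}

print contains_at_word_start('cheap hotels in new york', 'new york')
    ? "substring found\n"
    : "nothing found\n";
```

Note that \b only behaves as a word-start anchor here when the substring itself begins with a word character (a letter, digit, or underscore).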