Efficient way to search multiple texts with the same regular expression

Posted 2024-12-19 14:27:12

I have multiple text fields, each a paragraph of text, and I want to search those fields for a specific pattern using a regular expression. For example:

$text1 =~ /(my pattern)/ig;
$text2 =~ /(my pattern)/ig;
...
$textn =~ /(my pattern)/ig;

Is there an efficient way to search multiple texts with the same regular expression in Perl, or should I use the format above?

似狗非友 2024-12-26 14:27:13

my $pattern = qr/((?i)my pattern)/;   # (?i) turns on case-insensitive matching
my @matches;
push @matches, $text1 =~ /$pattern/g;
push @matches, $text2 =~ /$pattern/g;
push @matches, $textn =~ /$pattern/g;

That's about as efficient as I can think of: in theory it pre-compiles the regex once, though I'm not sure whether interpolating it back into // to get the 'g' modifier undoes any of that compilation. Of course, I also have to wonder whether this is really a bottleneck, or whether you're just looking at premature optimisation.
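As a minimal sketch of the same idea, the repeated push lines can be folded into one loop. The @texts array and its sample strings are assumptions made for illustration; they stand in for $text1 through $textn.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for $text1 .. $textn.
my @texts = (
    'My pattern appears here, and my pattern appears again.',
    'Nothing relevant in this paragraph.',
    'One more MY PATTERN at the end.',
);

# Build the regex object once; /i makes the match case-insensitive.
my $pattern = qr/(my pattern)/i;

# In list context, /g returns every captured match found in $_.
my @matches;
push @matches, /$pattern/g for @texts;

print scalar(@matches), " matches: @matches\n";
```

With this sample data the loop collects three matches, two from the first text and one from the last.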

执笏见 2024-12-26 14:27:13

The answer to this question depends on whether your pattern contains any variables. If it does not, perl is already smart enough to only build the RE once, as long as it's identical everywhere.

Now, if you do use variables, then @Tanktalus's answer is close, but adds unnecessary complexity, by compiling the RE an additional time.

Use this:

my @matches;
push @matches, $text1 =~ /((?i)my pattern with a $variable)/o;
push @matches, $text2 =~ /((?i)my pattern with a $variable)/o;
push @matches, $textn =~ /((?i)my pattern with a $variable)/o;

Why?

By using a variable in the RE pattern, perl is forced to re-compile for every instance, even when that variable is a pre-compiled RE as in @Tanktalus's answer. The /o ensures that each match is compiled only once, the first time it's encountered, but the pattern must still be compiled once for every occurrence in the code. This is because Perl has no way of knowing whether $pattern changed between the different uses.

Other considerations

In practice, as @Tanktalus also said, I suspect this is a big fat case of premature optimization. /o only matters when your pattern contains variables (otherwise perl is smart enough to only compile once anyway!)

The far more useful reason to use a pre-compiled RE, as @Tanktalus has suggested, is to improve code readability. If you have a big hairy RE, then using $pattern everywhere will greatly improve readability, with only a minor cost in performance (one you're not likely to ever notice).

Conclusion

Just use /o for your REs if they contain variables (unless you actually need the variables to change the RE on every run), and don't worry about it otherwise.
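The caveat in parentheses is worth seeing in action. The sketch below is illustrative only: count_with_o and the sample texts are made-up names and data, not part of the original answer. It shows that a match written with /o keeps the pattern from its first execution and ignores later changes to the variable.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the texts that contain $word as a whole word.
sub count_with_o {
    my ($word, @texts) = @_;
    my $count = 0;
    for my $text (@texts) {
        # /o: $word is interpolated only the first time this match runs.
        $count++ if $text =~ /\b$word\b/o;
    }
    return $count;
}

my @texts = ('a cat sat', 'a dog ran', 'cat nap');

my $cats = count_with_o('cat', @texts);   # pattern is fixed as \bcat\b here
my $dogs = count_with_o('dog', @texts);   # still searches for "cat"

print "cats=$cats dogs=$dogs\n";
```

Only one of the sample texts contains "dog", yet the second call reports 2 because it is still matching "cat". Dropping the /o gives the expected count of 1.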

旧时浪漫 2024-12-26 14:27:12

Use a topicaliser.

for ($text1, $text2, $textn) {
    /(my pattern)/ig && do { ... };
}
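If every occurrence in each text is needed rather than just a yes/no per text, one possible variation on the loop above is to iterate the /g match with while. This is a sketch under assumed sample data; @texts and @found are illustrative names.

```perl
use strict;
use warnings;

# Hypothetical sample texts; each one is aliased to $_ by the for loop.
my @texts = (
    'my pattern, then My Pattern again',
    'no match here',
);

my @found;
for (@texts) {
    # In scalar context, /g resumes where the previous match ended,
    # so the while loop visits each occurrence in turn.
    while (/(my pattern)/ig) {
        push @found, $1;
    }
}

print "$_\n" for @found;
```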

When you have numbered variables, it's a red flag that you should consider a compound data structure instead. With a simple array it looks nearly the same: