Any suggestions for improving (optimizing) existing string substitutions in Perl code?

Posted on 2024-11-01 15:49:37

Perl 5.8

I'm looking for improvements to some fairly straightforward string substitutions in an existing Perl script.
The intent of the code is clear, and the code works.

For a given string, replace every occurrence of a TAB, LF or CR character with a single space, and replace every occurrence of a double quote with two double quotes. Here's a snippet from the existing code:


# replace all tab, newline and return characters with single space
$val01  =~s/[\t\n\r]/ /g;
$val02  =~s/[\t\n\r]/ /g;
$val03  =~s/[\t\n\r]/ /g;

# escape all double quote characters by replacing with two double quotes
$val01  =~s/"/""/g;
$val02  =~s/"/""/g;
$val03  =~s/"/""/g;

Question: Is there a better way to perform these string manipulations?

By "better way", I mean performing them more efficiently, either by avoiding regular expressions (possibly using tr/// to replace the tab, newline and carriage return characters) or by using qr// to avoid recompilation.

NOTE: I've considered moving the string manipulation operations to a subroutine, to reduce the repetition of the regular expressions.

NOTE: This code works; it isn't really broken. I just want to know if there is a more appropriate coding convention.

NOTE: These operations are performed in a loop with a large number (>10000) of iterations.

NOTE: This script currently executes under perl v5.8.8. (The script has a require 5.6.0, but this can be changed to require 5.8.8. Installing a later version of Perl is not currently an option on the production server.)


    > perl -v
    This is perl, v5.8.8 built for sun4-solaris-thread-multi
    (with 33 registered patches, see perl -V for more detail)

Comments (4)

半世蒼涼 2024-11-08 15:49:37

Your existing solution looks fine to me.

As for avoiding recompilation, you don't need to worry about that. Perl's regular expressions are compiled only once as it is, unless they contain interpolated expressions, which yours don't.

For the sake of completeness, I should mention that even if interpolated expressions are present, you can tell Perl to compile the regex once only by supplying the /o flag.

$var =~ s/foo/bar/;    # compiles once
$var =~ s/$foo/bar/;   # compiles each time
$var =~ s/$foo/bar/o;  # compiles once, using the value $foo has
                       # the first time the expression is evaluated
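
Since the question also mentions qr//, here is a minimal sketch of that alternative to /o: the pattern is built once from run-time text and the resulting pattern object is reused inside the loop. The variable names and sample data are made up for illustration, and whether this saves measurable time on perl 5.8.8 is an assumption that would need benchmarking.

```perl
use strict;
use warnings;

# Hypothetical case: the set of characters is only known at run time.
my $chars = "\t\n\r";
my $ws_re = qr/[$chars]/;    # pattern object built once, outside the loop

# Made-up sample values standing in for the real data.
my @vals = ("a\tb", "c\r\nd");
for my $val (@vals) {
    $val =~ s/$ws_re/ /g;    # reuses the pattern object
}

print join('|', @vals), "\n";    # a b|c  d
```

For the literal patterns in the question there is nothing to interpolate, so neither /o nor qr// is needed there.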

你丑哭了我 2024-11-08 15:49:37

TMTOWTDI

You could use tr///, index, substr, or split as alternatives. But you must take measurements to identify the best method for your particular system.
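
One way to take such measurements is the core Benchmark module. The following is only a sketch: the sample string is invented, and the results will depend on the real data and on the perl build.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data; substitute strings typical of the real input.
my $sample = "col\t1\twith \"quotes\"\r\nand a second line\n" x 10;

my %candidates = (
    's///'  => sub { my $s = $sample; $s =~ s/[\t\n\r]/ /g; return $s },
    'tr///' => sub { my $s = $sample; $s =~ tr/\t\n\r/ /;   return $s },
);

# Sanity check: both candidates must produce the same result.
die "results differ"
    unless $candidates{'s///'}->() eq $candidates{'tr///'}->();

# A negative count means "run each candidate for at least that many CPU seconds".
cmpthese(-1, \%candidates);
```

cmpthese prints a table of rates and relative percentages, which makes it easy to see whether the difference matters for a loop of around 10000 iterations.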

◇流星雨 2024-11-08 15:49:37

You might be prematurely optimizing. Have you tried using a profiler, such as Devel::NYTProf, to see where your program spends most of its time?

伴我心暖 2024-11-08 15:49:37

My guess would be that tr/// would be (slightly) quicker than s/// for your first substitution. How much faster depends, of course, on factors that I don't know about your program and your environment. Profiling and benchmarking will answer that question.
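
As a sketch of what the tr/// version might look like (the sample string is made up):

```perl
use strict;
use warnings;

my $val = "one\ttwo\r\nthree \"four\"";

# tr/// maps characters one-to-one. When the replacement list is shorter
# than the search list, its last character is repeated, so tab, newline
# and carriage return all become a space.
$val =~ tr/\t\n\r/ /;

# tr/// cannot turn one character into two, so doubling the quotes
# still needs s///.
$val =~ s/"/""/g;

print "$val\n";    # one two  three ""four""
```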

But if you're interested in any kind of improvement to your code, can I suggest a maintainability fix? You run the same substitution (or set of substitutions) on three variables. This means that when you change that substitution, you need to change it three times - and doing the same thing three times is always dangerous :)

You might consider refactoring the code to look something like this:

foreach ($val01, $val02, $val03) {
    s/[\t\n\r]/ /g;
    s/"/""/g;
}

Also, it would probably be a good idea to keep those values in an array rather than in three similarly named variables.

foreach (@vals) {
    s/[\t\n\r]/ /g;
    s/"/""/g;
}
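
If a subroutine is preferred, as the question considers, the same loop can be wrapped in one. This is only a sketch: the name clean_fields is hypothetical, and it returns cleaned copies instead of modifying its arguments in place. A sub call adds some overhead per iteration, so it is worth measuring before using it in a hot loop.

```perl
use strict;
use warnings;

# Hypothetical helper; the name clean_fields is not from the original script.
# Returns cleaned copies and leaves its arguments untouched.
sub clean_fields {
    my @copy = @_;
    for (@copy) {
        s/[\t\n\r]/ /g;    # tab, newline, carriage return -> single space
        s/"/""/g;          # double each double quote
    }
    return @copy;
}

my ($val01, $val02, $val03) = clean_fields("a\tb", 'say "hi"', "x\r\ny");
print join('|', $val01, $val02, $val03), "\n";    # a b|say ""hi""|x  y
```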