Which situations can benefit from Perl's study?
I'm playing around with study, a Perl feature to examine a string to make subsequent regular expressions potentially much speedier:
while( <> ) {
    study;
    $count++ if /PATTERN/;
    $count++ if /OTHER/;
    $count++ if /PATTERN2/;
}
There's not much said about which situations will benefit from this. A few things you can tease out of the docs:
- Patterns with constant strings
- Multiple patterns
- Shorter target strings might be better (takes less time to study)
I'm looking for concrete cases where I not only can demonstrate a big advantage, but also cases that I can slightly tweak to lose that advantage. One of the warnings in the docs is that you should benchmark individual cases. I want to find some of the edge cases where a small difference in a string (or pattern) makes a big difference in performance.
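As a starting point for that kind of per-case benchmarking, here is a minimal sketch using the core Benchmark module. The sample lines, the three patterns, and the count_matches helper are all made up for illustration; swap in your own data. It copies each line into the global $_ before studying it, mirroring the loop above. On perls where study is a no-op, the two variants should come out about the same.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data: 1000 short lines and three constant-string patterns.
my @lines    = map { "line $_: the quick brown fox jumps over the lazy dog" } 1 .. 1_000;
my @patterns = (qr/quick/, qr/lazy dog/, qr/absent/);

# Count pattern matches over all lines, optionally calling study first.
sub count_matches {
    my ($use_study) = @_;
    my $count = 0;
    for my $line (@lines) {
        local $_ = $line;          # fresh copy in the global $_ for each line
        study if $use_study;
        for my $re (@patterns) {
            $count++ if /$re/;
        }
    }
    return $count;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    with_study    => sub { count_matches(1) },
    without_study => sub { count_matches(0) },
});
```

To hunt for edge cases, change one thing at a time (line length, number of patterns, constant versus non-constant patterns) and rerun.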
If you haven't used study, please don't answer. I'd rather have well-formed correct answers instead of fast guesses. There's no urgency here, and this isn't holding up any work.
And, as a bonus, I've been playing with a benchmarking tool comparing two NYTProf runs, which I'd rather use than the usual benchmarking tool. If I come up with a way to automate that, I'll share that too.
Comments (4)
Google turned up this lovely test scenario:
Note that the reported gain is only about 2% for the most profitable case (protein matches):
I'm going to leave notes as an answer, and later I'll develop it into an actual answer:
In pp.c's PP(pp_study), it has these curious lines (minus a comment):
It looks like scalars with the UTF8 flag set aren't studied at all.
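If you want to check whether a given string falls into that category, the built-in utf8::is_utf8 function reports whether a scalar has the UTF8 flag set. This is only a sketch to show the flag; the variable names are invented, and it says nothing about what study does internally.

```perl
use strict;
use warnings;

my $bytes    = "protein";
my $upgraded = $bytes;
utf8::upgrade($upgraded);    # same characters, but the UTF8 flag is now set

for my $pair ( [ bytes => $bytes ], [ upgraded => $upgraded ] ) {
    my ($name, $string) = @$pair;
    printf "%-8s UTF8 flag %s\n", $name,
        utf8::is_utf8($string) ? "set" : "not set";
}
```

The two strings compare equal with eq, so the flag is easy to flip by accident, for example by concatenating with decoded input.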
None. Since 2012, study does nothing.
Currently the code has
which means that study returns true in the case where it would formerly have done something, and false otherwise, but it never actually does anything.
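You can look at that return value on your own perl with something like the sketch below. The study_result helper and the sample strings are made up, and no output is shown because the results depend on your perl version. The version check assumes 5.16 as the cut-off, which matches what the answers here report.

```perl
use strict;
use warnings;

# Report study's return value for a string as the word 'true' or 'false'.
sub study_result {
    my ($string) = @_;
    return study($string) ? 'true' : 'false';
}

# $] is the numeric perl version; 5.016 is assumed to be where study became a no-op.
printf "perl %vd: study is %s\n", $^V,
    $] >= 5.016 ? "expected to be a no-op" : "expected to do real work";

my $flagged = "protein";
utf8::upgrade($flagged);     # set the UTF8 flag without changing the characters

printf "plain string:        study returned %s\n", study_result("protein");
printf "empty string:        study returned %s\n", study_result("");
printf "UTF8-flagged string: study returned %s\n", study_result($flagged);
```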
Not really. If you search for it, most results are in the Perl test suite, which suggests that nobody uses it. Also, because of a bug, you could only notice speed benefits on global variables. It actually brought some speed improvements when dealing with English text (sometimes even 2 times faster), but you had to make the variable global.
It also sometimes caused infinite loops or false positives (study could add bugs to your program, even though it was only supposed to make it faster), and because of that it was removed (or rather, made a no-op) in Perl 5.16. Nobody wanted to maintain a part nobody cares about anyway.