How do I profile Perl regexes?
What's the best way to profile Perl regexes to determine how expensive they are?
Comments (3)
Perl comes with the Benchmark module, which can take a number of code samples and answer the question "which one is faster?". I've got a Perl Tip on Benchmarking Basics, and while that doesn't use regexps per se, it does give a quick and useful introduction to the topic, along with further references.
brian d foy also has an excellent chapter on benchmarking in his Mastering Perl book. He's been kind enough to put the chapter on-line as a draft, which is well worth the read. I really can't recommend it enough.
Paul
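As a rough illustration of the "which one is faster?" comparison described above, here is a minimal sketch using `cmpthese` from the core Benchmark module. The sample lines and the two candidate regexes are made up for this example and do not come from the answer; substitute your own patterns and data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample data, for illustration only.
my @lines = map { "user$_\@example.com logged in from 10.0.0.$_" } 1 .. 200;

# Two hypothetical candidate regexes that should match the same lines.
my %candidates = (
    greedy     => qr/^(.*)\@(.*) logged in/,
    char_class => qr/^([^\@]+)\@(\S+) logged in/,
);

# Run one regex over every line and return how many lines matched.
sub count_matches {
    my ($re) = @_;
    my $count = 0;
    for my $line (@lines) {
        $count++ if $line =~ $re;
    }
    return $count;
}

# Build one timing sub per candidate.
my %tests;
for my $name (keys %candidates) {
    my $re = $candidates{$name};
    $tests{$name} = sub { count_matches($re) };
}

# A negative count tells Benchmark to run each sub for at least
# that many CPU seconds, then print a comparison table.
cmpthese(-1, \%tests);
```

Checking that every candidate returns the same match count before trusting the timings is a cheap way to make sure you are comparing equivalent regexes.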
Just saying "use the Benchmark module" doesn't really answer the question, though. Benchmarking a regex is different from benchmarking a calculation; you need a large amount of realistic data so you can stress the regex as real data would. If most of your data will match, you want a regex that matches quickly; if most will fail, you want a regex that fails quickly. They could wind up being the same regex, but maybe not.
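One way to act on this is to benchmark the same candidates against a matching data set and a failing data set separately. The sketch below assumes two small, made-up data sets and two hypothetical regexes; real measurements would need a large sample of your actual data.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: one set where every line matches, one where none do.
my @all_match = map { "2024-01-$_ ERROR disk full" } 10 .. 28;
my @all_fail  = map { "2024-01-$_ INFO all good, nothing to report here" } 10 .. 28;

# Hypothetical candidates: the same pattern with and without an anchor.
my %regexes = (
    unanchored => qr/\d{4}-\d\d-\d\d ERROR/,
    anchored   => qr/^\d{4}-\d\d-\d\d ERROR/,
);

# Return the number of lines in $data that match $re.
sub count_matches {
    my ($re, $data) = @_;
    return scalar grep { $_ =~ $re } @$data;
}

# Time every regex against each data set in turn.
for my $set ( [ 'matching data', \@all_match ], [ 'failing data', \@all_fail ] ) {
    my ($label, $data) = @$set;
    my %tests;
    for my $name (keys %regexes) {
        my $re = $regexes{$name};
        $tests{$name} = sub { count_matches($re, $data) };
    }
    print "== $label ==\n";
    cmpthese(-1, \%tests);
}
```

If the ranking differs between the two tables, pick the regex that wins on the kind of input you expect to see most often.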
My preferred way would be to have a large set of input data for the RE, then process that data N times (e.g., 100,000) to see how long it takes.
Then tweak the RE and try again. Keep all the old REs as comments in case you need to benchmark them again in future; who knows what wondrous optimizations may appear in Perl 7?
There may well be tools which can analyze REs to give you execution paths for specific inputs (like the analysis tools in a DBMS) but, since Perl is the language of the lazy (a commandment handed down by Larry himself), I couldn't be bothered going to find one :-).
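The "process the data N times and see how long it takes" approach can be sketched with the core Time::HiRes module. The input lines, both regexes, and the helper name `time_regex` are assumptions made for this illustration; the pass count is kept small here so the sketch runs quickly.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical input; in practice, load a large sample of real data.
my @input = map { "key$_ = value$_" } 1 .. 100;

# Older attempts kept as comments so they can be re-benchmarked later:
# my $re = qr/^(.*) = (.*)$/;
my $re = qr/^(\w+) = (\w+)$/;

# Number of passes over the data; raise this (e.g. to 100_000) for real runs.
my $n = 1_000;

# Run $regex over every line of $data, $passes times.
# Returns the elapsed wall-clock seconds and the total match count.
sub time_regex {
    my ($regex, $data, $passes) = @_;
    my $matches = 0;
    my $start   = [gettimeofday];
    for (1 .. $passes) {
        for my $line (@$data) {
            $matches++ if $line =~ $regex;
        }
    }
    return (tv_interval($start), $matches);
}

my ($elapsed, $matches) = time_regex($re, \@input, $n);
printf "%d passes, %d matches, %.3f s\n", $n, $matches, $elapsed;
```

Printing the match count alongside the time makes it easier to notice when a tweaked regex got faster only because it stopped matching the same lines.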