How do you debug a regular expression?
Regular expressions can become quite complex. The lack of whitespace makes them difficult to read. I can't step through a regular expression with a debugger. So how do experts debug complex regular expressions?
Comments (21)
Buy RegexBuddy and use its built-in debug feature. If you work with regexes more than twice a year, you will earn the money back in saved time almost immediately. RegexBuddy will also help you create simple and complex regular expressions, and can even generate the code for you in a variety of languages.
Also, according to the developer, this tool runs nearly flawlessly on Linux when used with WINE.
With Perl 5.10, add use re 'debug'; to your program. (Or use debugcolor, but I can't format its output properly on Stack Overflow.)
Also, you can add whitespace and comments to regexes to make them more readable. In Perl, this is done with the /x modifier. With pcre, there is the PCRE_EXTENDED flag.
I'll add another so that I don't forget it: debuggex
It's good because it's very visual.
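For readers outside Perl, Python's re module has a roughly comparable pair of features: re.VERBOSE plays the role of /x, and re.DEBUG prints the parsed pattern while compiling (its output is not the same as Perl's use re 'debug'). A minimal sketch; the date pattern and the name DATE_RE are illustrative assumptions, not something from the answers above:

```python
import re

# re.VERBOSE is Python's counterpart to Perl's /x: whitespace in the
# pattern is ignored and '#' starts a comment.
DATE_RE = re.compile(
    r"""
    (?P<year>  \d{4} )  -   # four-digit year
    (?P<month> \d{2} )  -   # two-digit month
    (?P<day>   \d{2} )      # two-digit day
    """,
    re.VERBOSE,
)

match = DATE_RE.fullmatch("2024-03-15")
parts = match.groupdict() if match else None
print(parts)

# Uncomment to have Python print the parsed pattern to stdout while
# compiling; the exact output format varies between Python versions.
# re.compile(r"\d{4}-\d{2}", re.DEBUG)
```

Laying the pattern out one piece per line makes it much easier to see which piece rejects an input that you expected to match.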
When I get stuck on a regex I usually turn to this:
https://regexr.com/
It's perfect for quickly testing where something is going wrong.
I use Kodos - The Python Regular Expression Debugger.
It runs on Linux, Unix, Windows, and Mac.
I think they don't. If your regexp is so complicated and problematic that you need a debugger, you should write a dedicated parser or use another method. It will be much more readable and maintainable.
There is an excellent free tool, the Regex Coach. The latest version is only available for Windows; its author, Dr. Edmund Weitz, stopped maintaining the Linux version because too few people downloaded it, but there is an older version for Linux on the download page.
I've just seen a presentation of Regexp::Debugger by its creator, Damian Conway.
Very impressive stuff: run in place or using a command-line tool (rxrx), interactively or on a "logged" execution file (stored in JSON), step forward and backward at any point, stop on breakpoints or events, colored output (user-configurable), heat maps on the regexp and the string for optimization, etc.
Available on CPAN for free:
http://search.cpan.org/~dconway/Regexp-Debugger/lib/Regexp/Debugger.pm
As for me, I usually use the pcretest utility, which can dump the byte code of any regex; that is usually much easier to read (for me at least). Example:
I use this online tool to debug my regexes:
https://www.regextester.com/
But yeah, it can't beat RegexBuddy.
I debug my regexes with my own eyes. That's why I use the /x modifier, write comments for them, and split them into parts. Read Jeffrey Friedl's Mastering Regular Expressions to learn how to develop fast and readable regular expressions. Various regex debugging tools just provoke voodoo programming.
I use:
http://regexlib.com/RETester.aspx
You can also try Regex Hero (uses Silverlight):
http://regexhero.net/tester/
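The "split them into parts" advice from the /x answer above can be sketched in Python as follows. This is only one way to do it: the names OCTET, IPV4 and check are hypothetical, and the IPv4 pattern is an illustrative assumption rather than anything from the thread. Each part is small enough to verify on its own before it is combined.

```python
import re

# Hypothetical sub-patterns, each small enough to check by eye.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"   # one number from 0 to 255
IPV4 = rf"{OCTET}(?:\.{OCTET}){{3}}"              # four octets joined by dots


def check(pattern, accepted, rejected):
    """Return the inputs on which the pattern misbehaves."""
    compiled = re.compile(pattern)
    failures = [s for s in accepted if not compiled.fullmatch(s)]
    failures += [s for s in rejected if compiled.fullmatch(s)]
    return failures


# Test the small part first, then the pattern that is built from it.
octet_failures = check(OCTET, ["0", "9", "99", "199", "255"], ["256", "01", "-1", ""])
ipv4_failures = check(IPV4, ["127.0.0.1", "255.255.255.255"], ["1.2.3", "1.2.3.256"])
print(octet_failures, ipv4_failures)
```

When the combined pattern fails, a non-empty list for one of the parts points at the piece to look at, which often removes the need for a step-through debugger.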
If I'm feeling stuck, I like to work backward and generate the regex directly from a sample text using txt2re (although I usually end up tweaking the resulting regex by hand).
If you're a Mac user, I just came across this one:
http://atastypixel.com/blog/reginald-regex-explorer/
It's free and simple to use, and it's been a great help for me in getting to grips with regexes in general.
Have a look at the (non-free) tools on regular-expressions.info, RegexBuddy in particular. Here is Jeff Atwood's post on the subject.
Writing regexes in a notation like PCRE's is like writing assembler: it's fine if you can see the corresponding finite state automaton in your head, but it becomes difficult to maintain very quickly.
The reasons for not using a debugger are much the same as for not using a debugger with a programming language: you can fix local mistakes, but it won't help you solve the design problems that led you to make those mistakes in the first place.
The more reflective way is to use data representations to generate regexps in your programming language, and to have appropriate abstractions for building them. Olin Shivers's introduction to his Scheme regexp notation gives an excellent overview of the issues faced in designing these data representations.
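As a rough illustration of generating regexps from program-level abstractions, here is a small Python sketch. It is not Shivers's notation: the helper names (lit, seq, alt, many1, named) and the version-string example are all hypothetical, chosen only to show the idea of composing named, individually testable pieces.

```python
import re

# Hypothetical mini-combinators: each returns a regex fragment as a string.
def lit(text):
    """Match text literally, escaping any metacharacters."""
    return re.escape(text)


def seq(*parts):
    """Match the parts one after another."""
    return "".join(parts)


def alt(*options):
    """Match any one of the options."""
    return "(?:" + "|".join(options) + ")"


def many1(part):
    """Match one or more repetitions of part."""
    return "(?:" + part + ")+"


def named(name, part):
    """Capture part under the given group name."""
    return f"(?P<{name}>{part})"


DIGITS = many1(r"\d")
VERSION = seq(
    named("major", DIGITS), lit("."),
    named("minor", DIGITS),
    named("tag", alt(lit("-beta"), lit("-rc"), "")),  # empty option: tag is optional
)

m = re.fullmatch(VERSION, "3.12-rc")
print(VERSION)                       # the generated regex text
print(m.groupdict() if m else None)  # the named pieces of the match
```

The generated pattern is as unreadable as a hand-written one, but nobody has to read it: the structure lives in the code that builds it.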
I often use pcretest. It's hardly a "debugger", but it works over a text-only SSH connection and parses exactly the regex dialect I need: my (C++) code links to libpcre, so there's no difficulty with subtle differences in what's magic and what isn't, etc.
In general I agree with the guy above, for whom needing a regex debugger is a code smell. For me the hardest part of using regexes is usually not the regex itself, but the multiple layers of quoting needed to make them work.
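The quoting problem is easy to show in Python (used here only as an illustration; the answer above is about C++ and libpcre, where the same doubling applies to ordinary string literals). The variable names and the sample text are assumptions made for this sketch:

```python
import re

# Goal: match one literal backslash followed by the letter n.
# The regex engine must receive three characters: backslash, backslash, n.
raw_pattern = r"\\n"        # raw string: what you type is what the engine sees
escaped_pattern = "\\\\n"   # ordinary string: every backslash doubled again

text = "path\\name"         # the actual characters are: path\name

print(raw_pattern == escaped_pattern)   # both spell the same pattern
print(raw_pattern)                      # shows what reaches the regex engine
print(re.search(raw_pattern, text).group())
```

Printing the pattern string just before compiling it is a cheap way to see which layer of quoting ate a backslash.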
I often use the Ruby-based regexp tester Rubular, and in Emacs I use M-x re-builder.
Firefox also has a useful extension.
I use the Rx Toolkit included with ActiveState Komodo.
You could try this one:
http://www.pagecolumn.com/tool/regtest.htm
For me, after having eyeballed the regex (I'm fairly fluent, and nearly always use /x or an equivalent), I might debug rather than test if I'm unsure whether I'll hit some degenerate matching (i.e. something that backtracks excessively), to see whether I can solve such issues by, for example, modifying the greediness of an operator.
To do that, I'd use one of the methods mentioned above: pcretest, RegexBuddy (if my current workplace has licensed it) or similar, and sometimes I time it in LINQPad if I'm working with C# regexes.
(The Perl trick is a new one for me, so I will probably add that to my regex toolkit too.)
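The timing approach can be sketched in Python instead of LINQPad. The patterns and the helper time_match are hypothetical, and the fix shown here is removing a nested quantifier rather than changing greediness; the point is only that timing a near-miss input exposes excessive backtracking.

```python
import re
import time


def time_match(pattern, text):
    """Return (matched, seconds) for one fullmatch attempt."""
    start = time.perf_counter()
    matched = re.fullmatch(pattern, text) is not None
    return matched, time.perf_counter() - start


# Nested quantifiers: on a failing input, a backtracking engine may try
# a very large number of ways to split the run of 'a' characters.
DEGENERATE = r"(?:a+)+b"
# Accepts the same strings, but leaves only one way to split the run.
LINEAR = r"a+b"

subject = "a" * 20 + "c"   # almost matches, which forces backtracking

slow = time_match(DEGENERATE, subject)
fast = time_match(LINEAR, subject)
print("degenerate:", slow)
print("linear:    ", fast)
```

Adding a few more characters to the subject and re-running shows how quickly the time grows for each pattern; keep the input short at first, since the degenerate case can take a long time.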