How do you debug a regular expression?

Posted on 2024-08-23 08:36:57

Regular expressions can become quite complex. The lack of white space makes them difficult to read. I can't step through a regular expression with a debugger. So how do experts debug complex regular expressions?

三生一梦 2024-08-30 08:36:57

You buy RegexBuddy and use its built-in debug feature. If you work with regexes more than twice a year, the time saved will pay for it almost immediately. RegexBuddy will also help you create simple and complex regular expressions, and can even generate the code for you in a variety of languages.

Also, according to the developer, this tool runs nearly flawlessly on Linux when used with WINE.

浅唱ヾ落雨殇 2024-08-30 08:36:57

With Perl 5.10, use re 'debug';. (Or debugcolor, but I can't format the output properly on Stack Overflow.)

$ perl -Mre=debug -e'"foobar"=~/(.)\1/'
Compiling REx "(.)\1"
Final program:
   1: OPEN1 (3)
   3:   REG_ANY (4)
   4: CLOSE1 (6)
   6: REF1 (8)
   8: END (0)
minlen 1
Matching REx "(.)\1" against "foobar"
   0 <> <foobar>             |  1:OPEN1(3)
   0 <> <foobar>             |  3:REG_ANY(4)
   1 <f> <oobar>             |  4:CLOSE1(6)
   1 <f> <oobar>             |  6:REF1(8)
                                  failed...
   1 <f> <oobar>             |  1:OPEN1(3)
   1 <f> <oobar>             |  3:REG_ANY(4)
   2 <fo> <obar>             |  4:CLOSE1(6)
   2 <fo> <obar>             |  6:REF1(8)
   3 <foo> <bar>             |  8:END(0)
Match successful!
Freeing REx: "(.)\1"
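
For readers working in Python rather than Perl, the standard library's re.DEBUG flag gives a loosely comparable view: it prints the parsed pattern when the regex is compiled. It does not trace the match attempt the way Perl's output above does. A minimal sketch, where dump_regex is a helper name made up for this example:

```python
import contextlib
import io
import re


def dump_regex(pattern: str) -> str:
    """Return the text that re.DEBUG prints while compiling `pattern`.

    dump_regex is a hypothetical helper, not part of the re module.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        # With re.DEBUG set, re.compile prints its internal parse tree.
        re.compile(pattern, re.DEBUG)
    return buf.getvalue()


if __name__ == "__main__":
    # Same pattern as the Perl example: a character followed by itself.
    print(dump_regex(r"(.)\1"))
```

The exact layout of the dump differs between Python versions, so it is better read by eye than parsed.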

Also, you can add whitespace and comments to regexes to make them more readable. In Perl, this is done with the /x modifier. With pcre, there is the PCRE_EXTENDED flag.

"foobar" =~ /
    (.)  # any character, followed by a
    \1   # repeat of previously matched character
/x;

pcre *pat = pcre_compile("(.)  # any character, followed by a\n"
                         "\\1  # repeat of previously matched character\n",
                         PCRE_EXTENDED,
                         ...);
pcre_exec(pat, NULL, "foobar", ...);
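
Python has the same facility under the name re.VERBOSE (alias re.X). A small sketch of the same pattern, assuming Python's re module rather than Perl or PCRE; the constant name REPEATED_CHAR is only illustrative:

```python
import re

# With re.VERBOSE, unescaped whitespace in the pattern is ignored and
# "#" starts a comment that runs to the end of the line.
REPEATED_CHAR = re.compile(
    r"""
    (.)   # any character, followed by a
    \1    # repeat of the previously matched character
    """,
    re.VERBOSE,
)

match = REPEATED_CHAR.search("foobar")
print(match.group(0) if match else None)  # prints "oo"
```

To match a literal space or "#" in this mode, escape it or put it in a character class.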

暖树树初阳… 2024-08-30 08:36:57

I'll add another so that I don't forget it: Debuggex.

It's good because it's very visual.

嘦怹 2024-08-30 08:36:57

When I get stuck on a regex I usually turn to this:
https://regexr.com/

It's perfect for quickly testing where something is going wrong.

最佳男配角 2024-08-30 08:36:57

I use Kodos - The Python Regular Expression Debugger:

Kodos is a Python GUI utility for creating, testing and debugging regular expressions for the Python programming language. Kodos should aid any developer to efficiently and effortlessly develop regular expressions in Python. Since Python's implementation of regular expressions is based on the PCRE standard, Kodos should benefit developers in other programming languages that also adhere to the PCRE standard (Perl, PHP, etc...).

(...)

Runs on Linux, Unix, Windows, Mac.

感性不性感 2024-08-30 08:36:57

I think they don't. If your regexp is too complicated, and problematic to the point you need a debugger, you should create a specific parser, or use another method. It will be much more readable and maintainable.

沉默的熊 2024-08-30 08:36:57

There is an excellent free tool, the Regex Coach. The latest version is only available for Windows; its author Dr. Edmund Weitz stopped maintaining the Linux version because too few people downloaded it, but there is an older version for Linux on the download page.

不语却知心 2024-08-30 08:36:57

I've just seen a presentation of Regexp::Debugger by its creator: Damian Conway.
Very impressive stuff: run in place or using a command-line tool (rxrx), interactively or on a "logged" execution file (stored in JSON), step forward and backward at any point, stop on breakpoints or events, colored output (user configurable), heat maps on the regexp and the string for optimization, etc.

Available on CPAN for free:
http://search.cpan.org/~dconway/Regexp-Debugger/lib/Regexp/Debugger.pm

半山落雨半山空 2024-08-30 08:36:57

As for me, I usually use the pcretest utility, which can dump the byte code of any regex; usually that is much easier to read (for me at least). Example:

PCRE version 8.30-PT1 2012-01-01

  re> /ab|c[de]/iB
------------------------------------------------------------------
  0   7 Bra
  3  /i ab
  7  38 Alt
 10  /i c
 12     [DEde]
 45  45 Ket
 48     End
------------------------------------------------------------------

一杯敬自由 2024-08-30 08:36:57

I use this online tool to debug my regex:

https://www.regextester.com/

But yeah, it can't beat RegexBuddy.

酒浓于脸红 2024-08-30 08:36:57

I debug my regexes with my own eyes. That's why I use the /x modifier, write comments for them and split them into parts. Read Jeffrey Friedl's "Mastering Regular Expressions" to learn how to develop fast and readable regular expressions. Various regex debugging tools just provoke voodoo programming.
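
As an illustration of "comment them and split them into parts", here is a rough Python sketch (the answer itself is about Perl's /x; Python's equivalent is re.VERBOSE). The names YEAR, MONTH, DAY and ISO_DATE are invented for this example:

```python
import re

# Each piece is small enough to check by eye and to test on its own.
YEAR = r"(?P<year>\d{4})"
MONTH = r"(?P<month>0[1-9]|1[0-2])"
DAY = r"(?P<day>0[1-9]|[12]\d|3[01])"

# The pieces are assembled in verbose mode so the layout stays readable.
ISO_DATE = re.compile(
    rf"""
    ^ {YEAR}     # four-digit year
    - {MONTH}    # month, 01 to 12
    - {DAY} $    # day, 01 to 31 (not validated against the month)
    """,
    re.VERBOSE,
)

m = ISO_DATE.match("2024-08-30")
print(m.group("year"), m.group("month"), m.group("day"))  # prints "2024 08 30"
```

When the whole pattern misbehaves, each part can be tried separately, for example with re.fullmatch(MONTH, "13"), which narrows down the faulty piece without any tooling.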

白日梦 2024-08-30 08:36:57

I use:

http://regexlib.com/RETester.aspx

You can also try Regex Hero (uses Silverlight):

http://regexhero.net/tester/

仅此而已 2024-08-30 08:36:57

If I'm feeling stuck, I like to go backward and generate the regex directly from a sample text using txt2re (although I usually end up tweaking the resulting regex by hand).

If you're a Mac user, I just came across this one:

http://atastypixel.com/blog/reginald-regex-explorer/

It's free, and simple to use, and it's been a great help for me to get to grips with RegExs in general.

烟─花易冷 2024-08-30 08:36:57

Writing regexes in a notation like PCRE's is like writing assembler: it's fine if you can just see the corresponding finite state automaton in your head, but it can get difficult to maintain very quickly.

The reasons for not using a debugger are much the same as for not using a debugger with a programming language: you can fix local mistakes, but they won't help you solve the design problems that led you to make the local mistakes in the first place.

The more reflective way is to use data representations to generate regexps in your programming language, and have appropriate abstractions to build them. Olin Shivers' introduction to his Scheme regexp notation gives an excellent overview of the issues faced in designing these data representations.
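
To make the idea of building regexps from abstractions concrete, here is a very small Python sketch. It is not Shivers' Scheme notation, only a loose illustration of the approach, and every function name in it (lit, seq, alt, one_or_more, capture) is invented for the example:

```python
import re


def lit(text: str) -> str:
    """A literal string, escaped so that no character is treated as magic."""
    return re.escape(text)


def seq(*parts: str) -> str:
    """Match the sub-patterns one after another."""
    return "".join(parts)


def alt(*parts: str) -> str:
    """Match any one of the sub-patterns."""
    return "(?:" + "|".join(parts) + ")"


def one_or_more(part: str) -> str:
    """Repeat a sub-pattern one or more times."""
    return "(?:" + part + ")+"


def capture(name: str, part: str) -> str:
    """Wrap a sub-pattern in a named capturing group."""
    return f"(?P<{name}>{part})"


DIGITS = one_or_more(r"\d")
VERSION = seq(
    alt(lit("v"), lit("version ")),
    capture("major", DIGITS),
    lit("."),
    capture("minor", DIGITS),
)

m = re.fullmatch(VERSION, "version 3.11")
print(m.group("major"), m.group("minor"))  # prints "3 11"
```

Because grouping and escaping live inside the helpers, precedence and quoting mistakes are made (and fixed) in one place rather than in every pattern.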

微暖i 2024-08-30 08:36:57

I often use pcretest - hardly a "debugger" but it works over a text-only SSH connection and parses exactly the regex dialect I need: my (C++) code links to libpcre, so there's no difficulty with subtle differences in what's magic and what isn't, etc.

In general I agree with the guy above who considers needing a regex debugger a code smell. For me the hardest part of using regexes is usually not the regex itself, but the multiple layers of quoting needed to make them work.
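
The quoting problem can be seen in miniature in Python, used here only as an illustration (the answer's own setting is C++ with libpcre). Matching one literal backslash takes two layers of escaping, and a raw string removes one of them:

```python
import re

text = r"C:\temp\notes.txt"  # raw string: contains two single backslashes

# Regex layer: the engine needs \\ to match one literal backslash.
# String-literal layer: in an ordinary literal each backslash is escaped again.
plain = "\\\\"
# A raw string drops the string-literal layer, leaving only the regex layer.
raw = r"\\"

assert plain == raw  # both are the two-character pattern \\
print(len(re.findall(raw, text)))  # prints 2
```

Printing the pattern string just before it is compiled is a cheap way to see what the regex engine actually receives after the host language (and any shell) has processed the quoting.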

咋地 2024-08-30 08:36:57

I often use the Ruby-based regexp tester Rubular,

and in Emacs I use M-x re-builder.

Firefox also has a useful extension.

半世晨晓 2024-08-30 08:36:57

I use the Rx Toolkit included with ActiveState Komodo.

很酷不放纵 2024-08-30 08:36:57

For me, after having eyeballed the regex (as I'm fairly fluent, and nearly always use /x or equivalent), I might debug rather than test if I am unsure whether I would hit some degenerate matching (i.e. something that backtracks excessively), to see if I could solve such issues by, for example, modifying the greediness of an operator.

To do that, I'd use one of the methods mentioned above: pcretest, RegexBuddy (if my current workplace has licensed it) or similar, and sometimes I time it in Linqpad if I'm working in C# regexes.

(The perl trick is a new one for me, so will probably add that to my regex toolkit too.)
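
The kind of degenerate matching described above can be demonstrated with a simple timing check. This sketch uses Python's re module instead of the tools named in the answer; time_match is a made-up helper and the input length of 20 is an arbitrary choice to keep the slow case short:

```python
import re
import time


def time_match(pattern: str, text: str) -> float:
    """Return the seconds taken by one match attempt (hypothetical helper)."""
    compiled = re.compile(pattern)
    start = time.perf_counter()
    compiled.match(text)
    return time.perf_counter() - start


# The trailing "b" makes the overall match fail, which forces the engine
# to try every way of splitting the run of "a" characters.
subject = "a" * 20 + "b"

nested = time_match(r"(a+)+$", subject)  # nested quantifiers: heavy backtracking
flat = time_match(r"a+$", subject)       # accepts the same strings, no nesting

print(f"nested: {nested:.4f}s  flat: {flat:.6f}s")
```

Each extra "a" in the subject roughly doubles the time for the nested version, so growing the input a few characters at a time is a quick way to tell this problem apart from an ordinary slow pattern.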
