StackOverflowError with the Checkstyle 4.4 RegExp check
Hello,
Background:
I'm using Checkstyle 4.4.2 with a RegExp checker module to detect when the file name in our Java source headers does not match the name of the class or interface in which it resides. This can happen when a developer copies a header from one class to another and does not modify the "File:" tag.
The regular expression used in the RegExp checker has been through many incarnations and (though it is possibly overkill at this point) looks like this:
File: (\w+)\.java\n(?:.*\n)*?(?:[\w|\s]*?(?: class | interface )\1)
The basic form of the files I am checking (though greatly simplified) looks like this:
/*
*
* Copyright 2009
* ...
* File: Bar.java
* ...
*/
package foo
...
import ..
...
/**
* ...
*/
public class Bar
{...}
The Problem:
When no match is found (i.e. when a header containing "File: Bar.java" is copied into file Bat.java), I receive a StackOverflowError on very long files (my test case is about 1300 lines).
I have experimented with several visual regular expression testers and can see that, in the non-matching case, once the regex engine passes the line containing the class or interface name it starts searching again on the next line and does some backtracking, which probably causes the StackOverflowError.
The Question:
How can I prevent the StackOverflowError by modifying the regular expression?
Is there some way to modify my regular expression so that, in the non-matching case (i.e. when a header containing "File: Bar.java" is copied into file Bat.java), the matching stops once it examines the line containing the interface or class name and sees that "\1" does not match the first group?
Alternatively, if that cannot be done, is it possible to minimize the searching and matching that takes place after it examines the line containing the interface or class, thus minimizing processing and (hopefully) avoiding the StackOverflowError?
Comments (2)
Try a greedy-dot version of the pattern in dot-matches-all mode. Rationale:
[\w\s]
(the | doesn't belong there) matches whitespace as well as word characters, so it also matches line breaks. This results in a lot of backtracking back up into the lines that the previous part of the regex had already matched. If you let the greedy dot gobble up everything up to the end of the file (which is quick) and then backtrack until you find a line that starts with word characters or spaces/tabs (but no newlines), followed by class or interface and \1, that doesn't require as much stack space.
A different, and probably even better, solution would be to split the problem into parts. First match the
File: (\w+)\.java
part. Then do a second search on the same file with
^[\w \t]+(?:class|interface)
plus the \1 match from the first search.
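The two-step approach described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class name TwoPassHeaderCheck and the method headerMatchesType are hypothetical, the code uses java.util.regex directly and is not a Checkstyle module, and the trailing \b is an addition so that "Bar" does not also match "Barn".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Checkstyle: splits the check into two simple searches.
final class TwoPassHeaderCheck {
    // First pass: capture the name given in the "File:" tag of the header.
    private static final Pattern FILE_TAG = Pattern.compile("File: (\\w+)\\.java");

    /** Returns true when a "File:" tag exists and a class or interface of that name is declared. */
    static boolean headerMatchesType(String source) {
        Matcher tag = FILE_TAG.matcher(source);
        if (!tag.find()) {
            return false; // no "File:" tag at all
        }
        String name = tag.group(1);
        // Second pass: a line that starts with word characters, spaces or tabs,
        // followed by "class" or "interface" and the captured name.
        // Like the pattern in the answer, this requires at least one character
        // (e.g. a modifier such as "public ") before the keyword.
        Pattern declaration = Pattern.compile(
                "^[\\w \\t]+(?:class|interface) " + Pattern.quote(name) + "\\b",
                Pattern.MULTILINE);
        return declaration.matcher(source).find();
    }

    public static void main(String[] args) {
        String good = "/*\n * File: Bar.java\n */\npackage foo;\n\npublic class Bar\n{\n}\n";
        String copied = good.replace("public class Bar", "public class Bat");
        System.out.println(headerMatchesType(good));
        System.out.println(headerMatchesType(copied));
    }
}
```

Because neither search has to span lines, the character class never runs across line breaks, which is the backtracking the answer identifies as the problem.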
Follow up:
I plugged in Tim Pietzcher's suggestion above and his greedy solution did indeed fail faster and without a StackOverflowError when no match was found. However, in the positive case, the StackOverflowError still occurred.
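For readers who want to experiment, a greedy single-pattern variant along the lines of the answer's description might look like the sketch below. The exact pattern from the answer is not reproduced here; this one is an assumption reconstructed from its rationale (greedy dot in dot-matches-all mode, then a line of word characters, spaces or tabs before class or interface and \1). The class name GreedyHeaderPattern is hypothetical, and the code runs against java.util.regex directly, so it says nothing about how Checkstyle's RegexpCheck behaves with the same pattern.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: a greedy-dot pattern reconstructed from the answer's description.
final class GreedyHeaderPattern {
    // DOTALL lets ".*" run across line breaks to the end of the input;
    // MULTILINE lets "^" match at the start of every line.
    static final Pattern GREEDY = Pattern.compile(
            "File: (\\w+)\\.java\\n.*^[\\w \\t]+(?:class|interface) \\1\\b",
            Pattern.DOTALL | Pattern.MULTILINE);

    /** Returns true when the header's file name matches a declared class or interface. */
    static boolean matches(String source) {
        return GREEDY.matcher(source).find();
    }

    public static void main(String[] args) {
        String good = "/*\n * File: Bar.java\n */\npackage foo;\n\npublic class Bar\n{\n}\n";
        String copied = good.replace("public class Bar", "public class Bat");
        System.out.println(matches(good));
        System.out.println(matches(copied));
    }
}
```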
I took a look at the source code of RegexpCheck.java. The class's pattern is constructed in multiline mode, so that the expressions ^ and $ match just after or just before, respectively, a line terminator or the end of the input sequence. It then reads the entire class file into a string and does a recursive search for the pattern (see findMatch()). That is undoubtedly the source of the StackOverflowError.
In the end I didn't get it to work (and gave up). Since Maven 2 released the maven-checkstyle-plugin-2.4/Checkstyle 5.0 about 6 weeks ago, we've decided to upgrade our tools. This may not solve the StackOverflowError problem, but it will give me something else to work on until someone decides that we need to pursue this again.