Java RegEx API "Look-behind group does not have an obvious maximum length near index ..."

Posted 2024-08-30 15:20:17

I'm parsing SQL WHERE clauses and designed a working RegEx that finds a column outside string literals. I built it in "Rad Software Regular Expression Designer", which uses the .NET API. To make sure the RegEx also works in Java, I tested it against the Java API (1.5 and 1.6). But guess what, it doesn't work. I get the message

"Look-behind group does not have an obvious maximum length near index 28".

The string that I'm trying to get parsed is

Column_1='test''the''stuff''all''day''long' AND Column_2='000' AND  TheVeryColumnIWantToFind      =    'Column_1=''test''''the''''stuff''''all''''day''''long'' AND Column_2=''000'' AND  TheVeryColumnIWantToFind   =    ''   TheVeryColumnIWantToFind   =    '' AND (Column_3 is null or Column_3 = ''Not interesting'') AND ''1'' = ''1''' AND (Column_3 is null or Column_3 = 'Still not interesting') AND '1' = '1'

As you may have guessed, I tried to create some kind of worst case to ensure the RegEx won't fail on more complicated SQL where clauses.

The RegEx itself looks like this:

(?i:(?<!=\s*'(?:[^']|(?:''))*)((?<=\s*)TheVeryColumnIWantToFind(?=(?:\s+|=))))

I'm not sure if there is a more elegant RegEx (there'll most likely be one), but that's not important right now as it does the trick.

To explain the RegEx in a few words:
If it finds the column I'm after, it does a negative look-behind to figure out if the column name is used in a string literal. If so, it won't match. If not, it'll match.
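
Both look-behinds in this pattern contain a `*` quantifier, so the text they may span has no upper limit, and that is the construct the Java error message points at. The following is only a sketch under stated assumptions: the bounds of 10 whitespace characters and 50 literal characters are arbitrary values picked for illustration, and the class and method names are hypothetical. It shows that the same negative look-behind compiles once every quantifier inside it has an explicit upper limit, at the price of not seeing anything further back than that limit. The unbounded form is deliberately not asserted on, since other JDK versions may treat it differently from the 1.5 and 1.6 runtimes mentioned above.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class LookBehindBoundSketch {
    // Same idea as the look-behind in the question, but every quantifier inside
    // it has an upper limit (10 and 50 are illustrative assumptions).
    static final String BOUNDED =
        "(?<!=\\s{0,10}'(?:[^']|''){0,50})(?i:TheVeryColumnIWantToFind)";

    /** Returns true if the regex compiles on the running JDK. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /** Returns true if the column name occurs where the bounded look-behind does not see an open literal. */
    static boolean foundOutsideLiteral(String sql) {
        return Pattern.compile(BOUNDED).matcher(sql).find();
    }

    public static void main(String[] args) {
        System.out.println(compiles(BOUNDED));
        System.out.println(foundOutsideLiteral("a = 'x' AND TheVeryColumnIWantToFind = 1"));
        System.out.println(foundOutsideLiteral("a = 'it''s TheVeryColumnIWantToFind'"));
    }
}
```

A literal longer than the chosen bound would slip past this check, so the sketch illustrates the length restriction rather than a robust parser.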

Back to the question. As I mentioned before, the RegEx doesn't work in Java. What would work and give me the result I want?
I found out that Java does not seem to support look-behinds of unlimited length, but even knowing that I couldn't get it to work.
Isn't a look-behind always limited anyway, since it can reach back no further than from the current search position to the search offset? Wouldn't its maximum length then be something like "position - offset"?
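
One way to sidestep the look-behind entirely is a different technique from the RegEx above: let a single pattern match either a whole string literal or the column name, and keep only the matches where the column branch fired. The sketch below assumes that literals are well formed (every opening quote has a closing one) and that a doubled quote is the only escape; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OutsideLiteralFinderSketch {
    /** Returns the start index of every occurrence of the column outside string literals. */
    static List<Integer> findOutsideLiterals(String sql, String column) {
        Pattern p = Pattern.compile(
            // Branch 1: consume a complete literal, '' being an escaped quote.
            "'(?:[^']|'')*'"
            // Branch 2: the column as a whole word, captured in group 1.
            + "|(?<!\\w)(" + Pattern.quote(column) + ")(?!\\w)",
            Pattern.CASE_INSENSITIVE);

        List<Integer> starts = new ArrayList<>();
        Matcher m = p.matcher(sql);
        while (m.find()) {
            // Group 1 is null whenever branch 1 (a literal) matched.
            if (m.group(1) != null) {
                starts.add(m.start(1));
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        String sql = "Column_1='a''b' AND TheVeryColumnIWantToFind = "
            + "'TheVeryColumnIWantToFind = ''x''' AND '1' = '1'";
        System.out.println(findOutsideLiterals(sql, "TheVeryColumnIWantToFind"));
    }
}
```

Because the matcher scans left to right and the literal branch swallows each literal as a whole, a column name inside a literal is never offered to the second branch. The only look-behind left, `(?<!\w)`, is exactly one character long.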

病毒体 2024-09-06 15:20:17

I finally found a solution, and since I asked the question here, I'll of course share it with you.

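import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One SQL string literal; a doubled quote ('') inside it is an escaped quote.
private static final Pattern SQL_STRING_LITERAL = Pattern.compile("'(?:[^']|'')*'");
private static final String DOT = ".";

// Returns the {start, end} index pairs (end exclusive) of the parts of the
// clause that lie outside any string literal.
private List<int[]> getNonStringLiteralRegions(String exclusion) {
    List<int[]> regions = new ArrayList<int[]>();

    int lastEnd = 0;
    Matcher m = SQL_STRING_LITERAL.matcher(exclusion);
    while (m.find()) {
        regions.add(new int[] {lastEnd, m.start()});
        lastEnd = m.end();
    }
    if (lastEnd < exclusion.length()) {
        // The tail after the last literal is not covered yet.
        regions.add(new int[] {lastEnd, exclusion.length()});
    }

    return regions;
}

// Prefixes every whole-name occurrence of the given columns with the alias,
// leaving the contents of string literals untouched.
protected final String getFixedExclusion(String exclusion, String[] columns, String alias) {
    if (alias == null) {
        throw new NullPointerException("Alias must not be null.");
    }
    if (alias.length() == 0) {
        throw new IllegalArgumentException("Alias must not be empty.");
    }
    if (!alias.endsWith(DOT)) {
        alias += DOT;
    }

    StringBuilder b = new StringBuilder(exclusion);
    List<int[]> regions = getNonStringLiteralRegions(exclusion);
    for (int i = regions.size() - 1; i >= 0; --i) {
        // Reverse iteration keeps the indices of the lower regions valid.
        int start = regions.get(i)[0];
        int end = regions.get(i)[1];
        String s = exclusion.substring(start, end);
        for (String column : columns) {
            // (?<!\w) and (?!\w) require a whole name, like the original
            // (?<=^|[\W&&\D]) and (?=[\W&&\D]|$). Pattern.quote and
            // Matcher.quoteReplacement keep special characters literal.
            String regex = "(?<!\\w)(?i:" + Pattern.quote(column) + ")(?!\\w)";
            s = s.replaceAll(regex, Matcher.quoteReplacement(alias + column));
        }
        b.replace(start, end, s);
    }

    return b.toString();
}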

This time the trick is simply to find all SQL string literals and skip them when replacing the columns with "Alias.ColumnName". It is important to match only whole column names when replacing. So if we were to replace the column "Column_1" in the where clause

WHERE Column_1 = Column_2 AND Column_11 = Column_22

"Column_11" must be left untouched. (I think this is important to keep in mind, which is why I mention it here for anyone who faces a similar problem.)
Still, I think this is only a workaround, and if you can avoid the need for this logic altogether, it is best to do so.
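
The whole-name requirement can be tried in isolation on the where clause quoted above. This is a minimal sketch that assumes the alias "t" and uses hypothetical class and method names; it shows only the boundary check and does not skip string literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeColumnNameSketch {
    /** Prefixes whole-name occurrences of the column with "alias.", ignoring case. */
    static String qualify(String clause, String column, String alias) {
        // No word character may directly precede or follow the column name,
        // so "Column_1" does not match inside "Column_11".
        String regex = "(?<!\\w)(?i:" + Pattern.quote(column) + ")(?!\\w)";
        return clause.replaceAll(regex, Matcher.quoteReplacement(alias + "." + column));
    }

    public static void main(String[] args) {
        String clause = "WHERE Column_1 = Column_2 AND Column_11 = Column_22";
        // prints: WHERE t.Column_1 = Column_2 AND Column_11 = Column_22
        System.out.println(qualify(clause, "Column_1", "t"));
    }
}
```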

OK, thanks for the help anyway. I'd be glad to answer any follow-up questions.
