Splitting a string (especially in Java, using java.util.regex or something else)

Posted on 2024-07-18 08:11:15

Does anyone know how to split a string on a character taking into account its escape sequence?

For example, if the character is ':', then "a:b" is split into two parts ("a" and "b"), whereas "a\:b" is not split at all.

I think this is hard (impossible?) to do with regular expressions.

Thank you in advance,

Kedar

Comments (2)

作死小能手 2024-07-25 08:11:15

Since Java supports variable-length look-behinds (as long as they are finite), you could do it like this:

import java.util.regex.*;

public class RegexTest {
    public static void main(String[] argv) {

        Pattern p = Pattern.compile("(?<=(?<!\\\\)(?:\\\\\\\\){0,10}):");

        String text = "foo:bar\\:baz\\\\:qux\\\\\\:quux\\\\\\\\:corge";

        String[] parts = p.split(text);

        System.out.printf("Input string: %s\n", text);
        for (int i = 0; i < parts.length; i++) {
            System.out.printf("Part %d: %s\n", i+1, parts[i]);
        }

    }
}
  • (?<=(?<!\\)(?:\\\\){0,10}) looks behind for an even number of backslashes: zero to ten pairs (so at most 20 backslashes), not themselves preceded by another backslash.

Output:

Input string: foo:bar\:baz\\:qux\\\:quux\\\\:corge
Part 1: foo
Part 2: bar\:baz\\
Part 3: qux\\\:quux\\\\
Part 4: corge

Another way would be to match the parts themselves, instead of splitting at the delimiters.

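// Continues inside main() above and reuses `text`.
// Also needs: import java.util.List; import java.util.LinkedList;
Pattern p2 = Pattern.compile("(?<=\\A|\\G:)((?:\\\\.|[^:\\\\])*)");
List<String> parts2 = new LinkedList<String>();
Matcher m = p2.matcher(text);
while (m.find()) {
    // Group 1 holds one piece, with its escape sequences left intact.
    parts2.add(m.group(1));
}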

The strange syntax stems from the need to handle empty pieces at the start and end of the string. When a match spans exactly zero characters, the next attempt starts one character past its end. If it didn't, it would match another empty string, and another, ad infinitum…

  • (?<=\A|\G:) will look behind for either the start of the string (the first piece), or the end of the previous match, followed by the separator. If we did (?:\A|\G:), it would fail if the first piece is empty (input starts with a separator).
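  • \\. matches any escaped character (a backslash followed by any one character).
  • [^:\\] matches any character other than the delimiter or a backslash, i.e. any character outside an escape sequence (because \\. has already consumed both characters of each escape).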
  • ((?:\\.|[^:\\])*) captures all characters up until the first non-escaped delimiter into capture-group 1.
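
To see how the empty-piece cases play out, here is a minimal self-contained sketch of the matching approach. The class name MatchPieces and the method pieces are made up for illustration; the regex is the one from the snippet above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; only the regex comes from the answer above.
final class MatchPieces {
    // Regex: (?<=\A|\G:)((?:\\.|[^:\\])*)
    static final Pattern PIECE =
            Pattern.compile("(?<=\\A|\\G:)((?:\\\\.|[^:\\\\])*)");

    // Collects every piece between unescaped ':' delimiters,
    // including empty pieces at the start or end of the input.
    static List<String> pieces(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PIECE.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // escape sequences are kept as-is
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(pieces("a:b"));
        System.out.println(pieces(":a"));      // leading empty piece
        System.out.println(pieces("a:"));      // trailing empty piece
        System.out.println(pieces("a\\:b:c")); // escaped ':' stays inside a piece
    }
}
```

Unlike Pattern.split, which drops trailing empty strings by default, this keeps a trailing empty piece, so "a:" yields two pieces.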
溺孤伤于心 2024-07-25 08:11:15

(?<=^|[^\\]): gets you close, but doesn't handle escaped backslashes. (That's a literal regex; of course, you have to escape the backslashes in it to put it into a Java string.)

(?<=(^|[^\\])(\\\\)*): How about that? I think that should match any ':' that is preceded by an even number of backslashes.
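
As the other answer points out, Java look-behinds have to be finite, so the unbounded * in this pattern may be rejected when the pattern is compiled. A possible sketch with the repetition capped at ten pairs (an arbitrary limit borrowed from that answer; the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical wrapper class around a bounded variant of the regex above.
final class BoundedLookBehindSplit {
    // Regex: (?<=(?:^|[^\\])(?:\\\\){0,10}):
    // The {0,10} bound replaces the original *, so the look-behind
    // has a known maximum length.
    static final Pattern UNESCAPED_COLON =
            Pattern.compile("(?<=(?:^|[^\\\\])(?:\\\\\\\\){0,10}):");

    static String[] split(String input) {
        return UNESCAPED_COLON.split(input);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("a:b")));       // plain delimiter
        System.out.println(Arrays.toString(split("a\\:b")));     // escaped delimiter
        System.out.println(Arrays.toString(split("a\\\\:b")));   // escaped backslash, then delimiter
    }
}
```

Running main should print [a, b], then [a\:b], then [a\\, b]. A ':' preceded by more than 20 backslashes is never treated as a delimiter by this variant, which is the price of the bound.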

Edit: don't vote this up. MizardX's solution is better :)
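
For the "or something else" part of the question, a plain character scan avoids regular expressions entirely. This is only a sketch under a few assumptions: the escape character protects exactly one following character, escape sequences are left in the pieces unchanged, and the names EscapeAwareSplitter and split are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits on delimiters that are not escaped.
final class EscapeAwareSplitter {
    static List<String> split(String input, char delimiter, char escape) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                // Copy the escape character and the character it protects.
                current.append(c).append(input.charAt(++i));
            } else if (c == delimiter) {
                // Unescaped delimiter: close the current piece.
                parts.add(current.toString());
                current.setLength(0);
            } else {
                // Ordinary character (or a lone escape at the very end).
                current.append(c);
            }
        }
        parts.add(current.toString()); // last piece, possibly empty
        return parts;
    }

    public static void main(String[] args) {
        String text = "foo:bar\\:baz\\\\:qux\\\\\\:quux\\\\\\\\:corge";
        for (String part : split(text, ':', '\\')) {
            System.out.println(part);
        }
    }
}
```

There is no cap on the number of consecutive backslashes here, and empty pieces at either end are kept.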
