Create a String-aware Guava Splitter
I would like to create a Guava Splitter for Java that can handle quoted Java strings as one block. For instance, I would like the following assertion to be true:
@Test
public void testSplitter() {
    String toSplit = "a,b,\"c,d\\\"\",e";
    List<String> expected = ImmutableList.of("a", "b", "c,d\"", "e");
    Splitter splitter = Splitter.onPattern(...);
    List<String> actual = ImmutableList.copyOf(splitter.split(toSplit));
    assertEquals(expected, actual);
}
I can write a regex that finds all the elements and ignores the ',' inside quotes, but I can't find a regex that would act as a separator for use with a Splitter.
If it's impossible, please just say so, and I'll build the list from the find-all regex instead.
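The find-all fallback mentioned above could look roughly like the sketch below. It uses only java.util.regex, and the class name, method name and pattern are made up for illustration. It assumes backslash is the escape character inside quotes and that fields are either fully quoted or contain no quotes at all; on malformed input it simply stops at the first field it cannot match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFieldFinder {
    // \G anchors each match where the previous one ended.
    // Group 1: body of a quoted field (may contain backslash escapes).
    // Group 2: an unquoted field (no commas, no quotes).
    // Group 3: the delimiter, either "," or the empty string at end of input.
    private static final Pattern FIELD = Pattern.compile(
            "\\G(?:\"((?:[^\"\\\\]|\\\\.)*)\"|([^,\"]*))(,|\\z)");

    public static List<String> findAll(String input) {
        List<String> fields = new ArrayList<>();
        Matcher m = FIELD.matcher(input);
        while (m.find()) {
            String quoted = m.group(1);
            if (quoted != null) {
                // Drop the escaping backslashes: \" becomes ", \\ becomes \
                fields.add(quoted.replaceAll("\\\\(.)", "$1"));
            } else {
                fields.add(m.group(2));
            }
            if (m.group(3).isEmpty()) {
                break; // the field ended at end of input, not at a comma
            }
        }
        return fields;
    }
}
```

Unlike a separator-based split, this strips the surrounding quotes and unescapes the content, so it can produce the list the test expects.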
Comments (5)
This seems like something you should use a CSV library such as opencsv for. Separating values and handling cases like quoted blocks are what they're all about.
This is a Guava feature request: http://code.google.com/p/guava-libraries/issues/detail?id=412
I have the same problem (except that I don't need to support escaping of the quote character). I didn't want to include another library for such a simple thing. Then it occurred to me that what I need is a mutable CharMatcher. As with Bart Kiers's solution, it keeps the quote characters.
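The stateful-matcher idea can be sketched without any dependency as follows. The names are hypothetical, and the loop stands in for the splitter. With Guava, the same matches logic would presumably sit in a CharMatcher subclass passed to Splitter.on(CharMatcher), which only works if the splitter consults the matcher once per character, in order; that is an assumption about its internals.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteAwareSplit {
    /** Stateful matcher: it must see every character exactly once, in order. */
    static final class SeparatorMatcher {
        private boolean insideQuotes = false;

        boolean matches(char c) {
            if (c == '"') {
                insideQuotes = !insideQuotes; // every quote toggles the state
                return false;                 // a quote is never a separator
            }
            return c == ',' && !insideQuotes; // commas inside quotes are ignored
        }
    }

    public static List<String> split(String input) {
        // A fresh matcher per call, so no state leaks between calls or threads.
        SeparatorMatcher matcher = new SeparatorMatcher();
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < input.length(); i++) {
            if (matcher.matches(input.charAt(i))) {
                parts.add(input.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(input.substring(start)); // the last field has no trailing comma
        return parts;
    }
}
```

Because the matcher carries state, a single instance should not be shared between concurrent splits or reused for a second input, which is why the sketch creates one per call. Escaped quotes are not handled, and the quotes stay in the resulting tokens.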
You could split on the following pattern:
which might look (a bit) friendlier with the (?x) flag:
But even in this commented version, it is still a monster. In plain English, this regex could be explained as follows:
So, after seeing this, you might agree with ColinD (I do!) that using some sort of a CSV parser is the way to go in this case.
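As a rough illustration of this kind of separator pattern (not necessarily the one from this answer), a comma can be treated as a separator only when everything after it, up to the end of the input, consists of unquoted characters, escaped characters and complete quoted strings. The class and constant names below are made up. The example uses java.util.regex directly, though the same pattern string should also be usable with Splitter.onPattern(String).

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class QuoteAwareSeparator {
    // A comma, followed (lookahead only) by any mix of
    //   [^"\\]                  a character that is neither a quote nor a backslash
    //   \\.                     an escaped character
    //   "(?:[^"\\]|\\.)*"       a complete quoted string
    // up to the very end of the input (\z).
    public static final String SEPARATOR =
            ",(?=(?:[^\"\\\\]|\\\\.|\"(?:[^\"\\\\]|\\\\.)*\")*\\z)";

    private static final Pattern PATTERN = Pattern.compile(SEPARATOR);

    public static List<String> split(String input) {
        // Limit -1 keeps trailing empty fields instead of dropping them.
        return Arrays.asList(PATTERN.split(input, -1));
    }
}
```

A comma inside a quoted block fails the lookahead because the remainder of the input then contains an unmatched closing quote. The lookahead rescans the rest of the input for every comma, so the cost grows quadratically with input length, which is one more argument for a real CSV parser.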
Note that the regex above will leave the quotes around the tokens, i.e., the string a,b,"c,d\"",e (as a literal: "a,b,\"c,d\\\"\",e") will be split as follows:
a
b
"c,d\""
e
Improving on @Rage-Steel's answer a bit.
And then,
Pay attention to thread safety (or, put simply, there isn't any).