How do I keep the delimiters when splitting with a regular expression?
I asked a question about punctuation and regex before, but it was confusing.
Suppose I have this text:
String text = "wor.d1, :word2. wo,rd3? word4!";
I'm doing this:
String parts[] = text.split(" ");
And I get this:
wor.d1, | :word2. | wo,rd3? | word4!
What do I need to do to get this instead? (Keep the symbols at the word borders as separate tokens, but only the ones I specify: .,!?: and not all punctuation.)
wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !
UPDATE
I'm getting some good results with this regex, but it produces an empty string before every split on punctuation at the start of a word.
Is there a way to avoid this empty string at the start?
Is this regex good, or is there a simpler way?
public static final String PUNCTUATION_SEPARATOR =
"("
+ "("
+ "(?=^[\"'!?.,;:(){}\\[\\]]+)"
+ "|"
+ "(?<=^[\"'!?.,;:(){}\\[\\]]+)"
+ ")"
+ "|"
+ "("
+ "(?=[\"'!?.,;:(){}\\[\\]]+($|\n))"
+ "|"
+ "(?<=[\"'!?.,;:(){}\\[\\]]+($|\n))"
+ ")"
+ ")";
Comments (5)
Are you sure you want to use a regex?
There's a faster implementation for splitting on single characters: StringTokenizer.
It can also return the delimiters.
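A minimal sketch of the StringTokenizer suggestion, for illustration only. The class name TokenizerDemo and the helper tokenize are made up for this example; the only library feature relied on is the three-argument StringTokenizer constructor, whose returnDelims flag makes delimiters come back as one-character tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of the original answer.
class TokenizerDemo {
    // Splits on any character in `delimiters` (and on spaces), keeping each
    // delimiter as its own token. Spaces themselves are dropped.
    static List<String> tokenize(String text, String delimiters) {
        // returnDelims = true: delimiter characters are returned as tokens.
        StringTokenizer st = new StringTokenizer(text, delimiters + " ", true);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            if (!token.equals(" ")) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("word1, :word2. word3? word4!", ".,!?:");
        // Prints: word1 | , | : | word2 | . | word3 | ? | word4 | !
        System.out.println(String.join(" | ", tokens));
    }
}
```

Note that StringTokenizer splits on every delimiter character, including ones inside a word: "wor.d1," comes back as wor, ., d1 and , rather than wor.d1 and ,. So on its own it does not meet the "only at the borders" requirement from the question.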
For simple separators I recommend StringTokenizer. But here's a solution using a regex and an auxiliary separator:
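As a rough sketch of that idea (not necessarily the answerer's code): first wrap each border punctuation character in an auxiliary separator, then split on whitespace or that separator. The class name AuxSeparatorSplit, the SEP character and the punctuation set are assumptions chosen for this example, and SEP is assumed never to occur in the input.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class illustrating the "auxiliary separator" approach.
class AuxSeparatorSplit {
    // Auxiliary separator; assumed never to appear in the input text.
    private static final String SEP = "\u0001";

    // One punctuation character at the start of a word (not preceded by a
    // non-whitespace character) or at the end of a word (not followed by one).
    private static final String BORDER_PUNCT = "(?<!\\S)[.,!?:]|[.,!?:](?!\\S)";

    static List<String> split(String text) {
        // Step 1: surround each border punctuation character with SEP.
        String marked = text.replaceAll(BORDER_PUNCT, SEP + "$0" + SEP);
        // Step 2: split on runs of whitespace and/or SEP.
        String[] parts = marked.split("(?:\\s|" + Pattern.quote(SEP) + ")+");
        // A separator at the very start yields an empty first element; drop it.
        return Arrays.stream(parts)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Prints: wor.d1 | , | : | word2 | . | wo,rd3 | ? | word4 | !
        System.out.println(String.join(" | ", split("wor.d1, :word2. wo,rd3? word4!")));
    }
}
```

This version only detaches a single punctuation character per word border, so a word ending in "?!" would come out as "word?" and "!". Filtering out empty strings at the end is what avoids the empty first element mentioned in the question.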
Here's a regex that I think will work:
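One possible regex of this kind, as a sketch under my own assumptions rather than the answerer's exact pattern: split on whitespace, and also at the zero-width positions just after punctuation that starts a word and just before punctuation that ends a word. The class name LookaroundSplit and the punctuation set are illustrative.

```java
import java.util.Arrays;

// Hypothetical demo class illustrating a lookaround-based split.
class LookaroundSplit {
    // Punctuation that should become its own token at a word border.
    private static final String P = "[.,!?:]";

    private static final String SPLIT_REGEX =
            // whitespace between words (consumed by the split)
            "\\s+"
            // zero-width: right after punctuation that starts a word
            + "|(?<=(?<!\\S)" + P + ")(?=\\S)"
            // zero-width: right before punctuation that ends a word
            + "|(?<=\\S)(?=" + P + "(?!\\S))";

    static String[] split(String text) {
        // trim() avoids an empty first element caused by leading whitespace.
        return text.trim().split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        // Prints: [wor.d1, ,, :, word2, ., wo,rd3, ?, word4, !]
        System.out.println(Arrays.toString(split("wor.d1, :word2. wo,rd3? word4!")));
    }
}
```

The extra `(?=\S)` and `(?<=\S)` conditions stop the zero-width alternatives from matching next to whitespace, which is what would otherwise produce empty strings in the result. As written, the pattern detaches only one punctuation character per word border.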
In my opinion, you want this. First explode your string, then use the implode function.