Java: parsing a string with lots of spaces
I have a string with multiple spaces, but when I use the tokenizer it breaks the string apart at all of those spaces. I need the tokens to contain those spaces. How can I use StringTokenizer to return the values together with the delimiters (the spaces) I am splitting on?
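To make the question concrete, here is a small sketch of what StringTokenizer itself does, assuming an input made only of words and ASCII spaces. With the default delimiters the spaces are discarded. With the three-argument constructor and returnDelims set to true, the delimiters come back, but each space is returned as its own one-character token rather than grouped into runs. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerDemo {
    // Drains a tokenizer into a list so the tokens are easy to inspect.
    public static List<String> tokens(StringTokenizer st) {
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "a  b   c";

        // Default delimiters (space, tab, newline, carriage return, form feed):
        // the runs of spaces are swallowed and only the words are returned.
        System.out.println(tokens(new StringTokenizer(input))); // [a, b, c]

        // returnDelims = true: every delimiter character becomes a token of its
        // own, so this yields 8 tokens: a, " ", " ", b, " ", " ", " ", c
        List<String> withDelims = tokens(new StringTokenizer(input, " ", true));
        System.out.println(withDelims.size()); // 8
    }
}
```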
You'll note in the docs for StringTokenizer that it is a legacy class whose use is discouraged in new code, and that String.split(regex) is what you want.

Edit to add: Or, if you need more than a simple split, use the Pattern and Matcher classes for more complex regular expression matching and extraction.

Edit again: If you want to preserve your spaces, knowing a bit about regular expressions really helps:

This will split on word boundaries, preserving the whitespace between each word as a separate String. Output:
Sounds like you may need to use regular expressions (http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/package-summary.html) instead of StringTokenizer.
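One way to do this with java.util.regex is sketched below, assuming the goal is to get every run of non-whitespace and every run of whitespace as separate tokens. The pattern "\\S+|\\s+" is an illustrative choice: its two alternatives cover every character, so concatenating the tokens reproduces the original string, including leading and trailing whitespace. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RunExtractor {
    // Either a run of non-whitespace or a run of whitespace; every character
    // of the input falls into exactly one match.
    private static final Pattern RUN = Pattern.compile("\\S+|\\s+");

    public static List<String> runs(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = RUN.matcher(s);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Brackets make the whitespace tokens visible:
        // [  ] [a] [  ] [bc] [   ] [d] [ ]
        for (String r : runs("  a  bc   d ")) {
            System.out.print("[" + r + "] ");
        }
        System.out.println();
    }
}
```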
Use String.split("\\s+") instead of StringTokenizer.

Note that this will only extract the non-whitespace characters separated by at least one whitespace character. If you want leading/trailing whitespace characters included with the non-whitespace characters, that will be a completely different solution!

This requirement isn't clear from your original question, and there is an edit pending that tries to clarify it.

In almost every non-contrived case, StringTokenizer is the wrong tool for the job.
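A quick sketch of the String.split("\\s+") approach, assuming ordinary Java strings. One detail worth knowing: if the string starts with whitespace, that leading run is a positive-width match at index 0, so split returns an empty first element; calling trim() first avoids it. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class WhitespaceSplit {
    // Splits on runs of one or more whitespace characters; the whitespace
    // itself is discarded.
    public static String[] words(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("a   b \t c")));    // [a, b, c]
        // Leading whitespace produces an empty first element.
        System.out.println(Arrays.toString(words("   a b")));        // [, a, b]
        // Trimming first removes it.
        System.out.println(Arrays.toString(words("   a b".trim()))); // [a, b]
    }
}
```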
I think it would be good to first use the replaceAll method to replace every run of multiple spaces with a single space, and then tokenize using the split method.
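A minimal sketch of the collapse-then-split idea, assuming the separators are plain ASCII spaces only (tabs and other whitespace are not touched by the " +" pattern used here). The class and method names are hypothetical.

```java
import java.util.Arrays;

public class CollapseThenSplit {
    public static String[] tokenize(String s) {
        // Step 1: collapse every run of one or more spaces into a single space.
        String collapsed = s.replaceAll(" +", " ");
        // Step 2: split on that single space.
        return collapsed.split(" ");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("one   two    three"))); // [one, two, three]
    }
}
```

The two steps give the same result as a single split(" +"), and like any split on spaces, a leading space still yields an empty first element unless the string is trimmed first.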