How do I extract numbers from a string?
I'm using a Java StreamTokenizer to extract the words and numbers from a String, but I have run into a problem with numbers that include commas: e.g. 10,567 is read as 10.0 and ,567.
I also need to remove all non-numeric characters from numbers where they might occur, e.g. $678.00 should become 678.00 and -87 should become 87.
I believe this can be achieved via the whitespaceChars and wordChars methods, but does anyone have an idea how to do it?
The basic StreamTokenizer code at present is:
BufferedReader br = new BufferedReader(new StringReader(text));
StreamTokenizer st = new StreamTokenizer(br);
st.parseNumbers();
st.wordChars(44, 46); // ASCII comma, - , dot.
st.wordChars(48, 57); // ASCII 0 - 9.
st.wordChars(65, 90); // ASCII upper case A - Z.
st.wordChars(97, 122); // ASCII lower case a - z.
while (st.nextToken() != StreamTokenizer.TT_EOF) {
    if (st.ttype == StreamTokenizer.TT_WORD) {
        System.out.println("String: " + st.sval);
    }
    else if (st.ttype == StreamTokenizer.TT_NUMBER) {
        System.out.println("Number: " + st.nval);
    }
}
br.close();
Or could someone suggest a regular expression to achieve this? I'm not sure a regex is useful here, given that any parsing would take place after the tokens are read from the string.
Thanks
Mr Morgan.
Comments (5)
StreamTokenizer is outdated; it is better to use Scanner. This is sample code for your problem:
If you want to use a comma as the decimal separator, use
fi.useLocale(Locale.FRANCE);
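A minimal sketch of the Scanner approach described in this answer, since no sample code appears above. The class name ScannerNumbers and the helper tokens are hypothetical names chosen for illustration; the variable name fi follows the answer. It assumes tokens are separated by whitespace and relies on Scanner accepting the locale's grouping separator, so with Locale.US the token 10,567 is read as a single number.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class ScannerNumbers {
    // Hypothetical helper: splits text on whitespace. Tokens that Scanner
    // recognises as numbers in the given locale are collected as Double,
    // everything else is collected unchanged as a String.
    static List<Object> tokens(String text, Locale locale) {
        List<Object> out = new ArrayList<>();
        try (Scanner fi = new Scanner(text)) {
            fi.useLocale(locale);
            while (fi.hasNext()) {
                if (fi.hasNextDouble()) {
                    out.add(fi.nextDouble());
                } else {
                    out.add(fi.next());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Object t : tokens("Paid 10,567 for 3 items", Locale.US)) {
            System.out.println((t instanceof Double ? "Number: " : "String: ") + t);
        }
    }
}
```

Note that a token such as $678.00 is not recognised as a number by Scanner and would come back as a word, so the currency symbol still has to be stripped separately. A leading minus sign is kept, so -87 is read as a negative number.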
Try this:
SanitizedText will contain only alphanumerics and whitespace; tokenizing it after that should be a breeze.
EDIT
Edited to retain the decimal point as well (at the end of the bracket).
. is "special" to regexp, so it needs a backslash escape (inside a bracket expression the escape is optional, but harmless).
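The sanitizing step described in this answer might look roughly like the sketch below. The snippet the answer refers to is not shown, so this is only an assumption about its shape: the class name Sanitize, the method sanitize and the exact character class are illustrative, not the answerer's code.

```java
public class Sanitize {
    // Removes every character that is not an ASCII letter, a digit,
    // whitespace or a dot. The dot sits at the end of the bracket
    // and is backslash-escaped, as the answer suggests.
    static String sanitize(String text) {
        return text.replaceAll("[^A-Za-z0-9\\s\\.]", "");
    }

    public static void main(String[] args) {
        String sanitizedText = sanitize("Cost: $678.00, was 10,567 or -87");
        // Tokenize the cleaned text on runs of whitespace.
        for (String token : sanitizedText.trim().split("\\s+")) {
            System.out.println(token);
        }
    }
}
```

Because dots are retained, a full stop at the end of a sentence also survives and stays attached to the last word, so it may need separate handling.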
This worked for me:
Sure, this can be done with a regexp:
However, notice that it eats all commas, which is probably what you want if you are using the American number format, where the comma only separates thousands. In some languages the comma is used instead of the point as the decimal separator, so take care when parsing international data.
I leave it to you to translate this to Java.
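One possible Java translation of the regexp idea is sketched below. The pattern and the names RegexNumbers and numbers are assumptions made for illustration, not the expression from this answer. It assumes the American format, where commas only separate thousands, and it drops every comma inside a match before parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexNumbers {
    // A digit, then any mix of digits and commas, then an optional
    // decimal part. Signs and currency symbols are never part of a match.
    private static final Pattern NUMBER =
            Pattern.compile("\\d[\\d,]*(?:\\.\\d+)?");

    // Hypothetical helper: returns every number found in the text.
    static List<Double> numbers(String text) {
        List<Double> out = new ArrayList<>();
        Matcher m = NUMBER.matcher(text);
        while (m.find()) {
            // Remove the thousands separators before parsing.
            out.add(Double.parseDouble(m.group().replace(",", "")));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(numbers("10,567 $678.00 -87"));
    }
}
```

For data in which the comma is the decimal separator, this sketch would give wrong values; a locale-aware parser such as java.text.NumberFormat is likely a better fit there.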