How do I extract numbers from a string?

Posted on 2024-09-10 11:12:53

I'm using a Java StreamTokenizer to extract the words and numbers from a String, but numbers that contain commas get split: 10,567 is read as 10.0 and ,567.

I also need to strip all non-numeric characters from numbers where they occur, e.g. $678.00 should become 678.00 and -87 should become 87.

I believe this can be achieved with the whitespaceChars and wordChars methods, but does anyone know how to do it?

The basic StreamTokenizer code at present is:

        BufferedReader br = new BufferedReader(new StringReader(text));
        StreamTokenizer st = new StreamTokenizer(br);
        st.parseNumbers();
        st.wordChars(44, 46); // ASCII comma, - , dot.
        st.wordChars(48, 57); // ASCII 0 - 9.
        st.wordChars(65, 90); // ASCII upper case A - Z.
        st.wordChars(97, 122); // ASCII lower case a - z.
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            if (st.ttype == StreamTokenizer.TT_WORD) {                    
                System.out.println("String: " + st.sval);
            }
            else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                System.out.println("Number: " + st.nval);
            }
        }
        br.close(); 

Or could someone suggest a regular expression to achieve this? I'm not sure whether a regex is useful here, given that any parsing would take place after the tokens have been read from the string.

Thanks

Mr Morgan.

Comments (5)

原谅过去的我 2024-09-17 11:12:53

StreamTokenizer is outdated; it is better to use Scanner. Here is sample code for your problem:

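    // Requires: import java.util.Scanner;
    String s = "$23.24 word -123";
    Scanner fi = new Scanner(s);
    // Any run of characters other than alphanumerics, comma,
    // dot or minus sign is treated as a delimiter and skipped.
    // The trailing + keeps consecutive delimiters from producing empty tokens.
    fi.useDelimiter("[^\\p{Alnum},.-]+");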
    while (true) {
        if (fi.hasNextInt())
            System.out.println("Int: " + fi.nextInt());
        else if (fi.hasNextDouble())
            System.out.println("Double: " + fi.nextDouble());
        else if (fi.hasNext())
            System.out.println("word: " + fi.next());
        else
            break;
    }

If you want the comma to be treated as the decimal separator, call fi.useLocale(Locale.FRANCE);
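As a rough sketch of how this approach maps onto the inputs from the question, the variant below pins the Scanner to Locale.US so that the comma in 10,567 is read as a thousands separator, and uses Math.abs to drop the sign. The class name ScannerNumbers and the helper describe are made up for illustration, and the choice of Locale.US is an assumption about the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

public class ScannerNumbers {

    // Hypothetical helper: returns one description per token found in the text.
    static List<String> describe(String text) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(text)) {
            // Assumption: US formatting, i.e. ',' groups thousands and '.' is the decimal point.
            sc.useLocale(Locale.US);
            // Runs of anything other than alphanumerics, comma, dot or minus are delimiters.
            sc.useDelimiter("[^\\p{Alnum},.-]+");
            while (sc.hasNext()) {
                if (sc.hasNextInt()) {
                    out.add("Int: " + Math.abs(sc.nextInt()));
                } else if (sc.hasNextDouble()) {
                    out.add("Double: " + Math.abs(sc.nextDouble()));
                } else {
                    out.add("Word: " + sc.next());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe("$678.00 word -87 10,567")) {
            System.out.println(line);
        }
    }
}
```

Tokens that consist only of punctuation, such as a lone "-", are reported as words here; filter them out if that matters for your input.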

夏了南城 2024-09-17 11:12:53

Try this:

    String sanitizedText = text.replaceAll("[^\\w\\s\\.]", "");

sanitizedText will contain only word characters (letters, digits and underscore), whitespace and dots; tokenizing it after that should be a breeze.

EDIT

Edited to retain the decimal point as well (at the end of the bracket). `.` is special in a regexp outside a character class; inside the brackets the backslash escape is optional but harmless.
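A minimal sketch of the sanitize-then-tokenize idea, assuming whitespace-separated tokens and plain decimal numbers. The class name SanitizeThenSplit and the helpers sanitize and numbers are illustrative, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;

public class SanitizeThenSplit {

    // Removes everything except word characters, whitespace and dots.
    static String sanitize(String text) {
        return text.replaceAll("[^\\w\\s.]", "");
    }

    // Splits the sanitized text on whitespace and keeps tokens that look like
    // plain integers or decimals (digits, optionally a dot and more digits).
    static List<String> numbers(String text) {
        List<String> out = new ArrayList<>();
        for (String token : sanitize(text).trim().split("\\s+")) {
            if (token.matches("\\d+(\\.\\d+)?")) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Paid $678.00, owed -87 of 10,567";
        System.out.println(sanitize(text));
        System.out.println(numbers(text));
    }
}
```

Note that a sentence-ending period stays attached to its token (for example "87."), so such a token would not pass the numeric check in this sketch.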

她说她爱他 2024-09-17 11:12:53

This worked for me:

    String onlyNumericText = text.replaceAll("\\D", "");
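One caveat, shown in the small sketch below: `\D` matches every non-digit, so the decimal point is removed along with the currency symbol, comma and minus sign. The class and method names are hypothetical.

```java
public class StripNonDigits {

    // Removes every character that is not a digit.
    static String stripNonDigits(String text) {
        return text.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(stripNonDigits("10,567"));  // thousands separator removed
        System.out.println(stripNonDigits("-87"));     // sign removed
        System.out.println(stripNonDigits("$678.00")); // decimal point removed as well
    }
}
```

This suits integer-like tokens such as 10,567 or -87, but it turns $678.00 into 67800, so it should be applied only where no decimal part is expected.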
烟酉 2024-09-17 11:12:53
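    // Keeps only the digits of str, e.g. "1,222" becomes "1222".
    // The method name digitsOnly is illustrative.
    static String digitsOnly(String str) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (Character.isDigit(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }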
筱果果 2024-09-17 11:12:53

Sure, this can be done with a regexp:

s/[^\d\.]//g

Note, however, that it strips all commas, which is probably what you want with the American number format, where the comma only separates thousands. In some locales the comma is used instead of the point as the decimal separator, so take care when parsing international data.
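To illustrate the locale point, here is a small sketch using java.text.NumberFormat, which interprets the same token differently depending on the locale it is given. The class LocaleParse and its parse helper are hypothetical, and wrapping the checked ParseException in an IllegalArgumentException is a choice made only to keep the sketch short.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class LocaleParse {

    // Parses a numeric token according to the conventions of the given locale.
    static double parse(String token, Locale locale) {
        try {
            return NumberFormat.getInstance(locale).parse(token).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + token, e);
        }
    }

    public static void main(String[] args) {
        // In the US locale the comma groups thousands.
        System.out.println(parse("10,567", Locale.US));
        // In the German locale the comma is the decimal separator.
        System.out.println(parse("10,567", Locale.GERMANY));
    }
}
```

NumberFormat.parse stops at the first character it cannot interpret, so tokens should be cleaned of stray symbols before they are passed in.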

I'll leave translating this to Java to you.
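One possible Java translation of that substitution uses String.replaceAll with the same character class. This is only a sketch; the class name KeepDigitsAndDot and its helper are made up for illustration.

```java
import java.math.BigDecimal;

public class KeepDigitsAndDot {

    // Java equivalent of s/[^\d\.]//g: drop everything except digits and dots.
    static String keepDigitsAndDot(String token) {
        return token.replaceAll("[^\\d.]", "");
    }

    public static void main(String[] args) {
        // BigDecimal preserves the scale, so "678.00" is not shortened to 678.0.
        System.out.println(new BigDecimal(keepDigitsAndDot("$678.00")));
        System.out.println(new BigDecimal(keepDigitsAndDot("10,567")));
        System.out.println(new BigDecimal(keepDigitsAndDot("-87")));
    }
}
```

Apply it to one token at a time rather than to the whole input; otherwise the digits of separate numbers are concatenated once the whitespace between them is removed.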
