Python: identifying a numeric string?

Posted 2024-08-20 23:31:23

I tried a couple of approaches; I am really only concerned with performance, not correctness. I noticed that the regex-based implementation is about 3-4x slower than the one that uses type coercion. Is there another, more efficient way of doing this?

def IsNumber(x):
    try:
        _ = float(x)
    except ValueError:
        return False
    return True

def IsNumber2(x):
    import re
    if re.match("^\d*.?\d*$", x) == None:
        return False
    return True

Thanks!

北笙凉宸 2024-08-27 23:31:23

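First of all, they're not doing the same thing. Floats can be specified as "1e3", for example, and float() will accept that. It's also not coercion, but conversion.

Secondly, don't import re inside IsNumber2, especially if you're trying to time it with timeit. Do the import outside of the function.

Finally, it doesn't surprise me that float() is faster. It's a dedicated routine written in C for one specific purpose, while a regex must first be converted into a form that is then interpreted.

Is your first version, the one that uses float(), fast enough? It should be, and I don't know of a better way to do the same thing in Python.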
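To make the timing comparison fair, something like the following could be used. It is only a sketch: the names is_number_float, is_number_regex and compare are made up for illustration, the sample strings are arbitrary, and the pattern is the question's own, left unchanged and compiled once at module level so that neither the import nor the compilation is counted in the measurement.

```python
import re
import timeit

# Compiled once at import time, so the timing below measures matching only.
# This is the question's pattern, unchanged (including its unescaped '.').
_NUMBER_RE = re.compile(r"^\d*.?\d*$")


def is_number_float(x):
    """Return True if float() accepts x."""
    try:
        float(x)
    except ValueError:
        return False
    return True


def is_number_regex(x):
    """Return True if the precompiled pattern matches x."""
    return _NUMBER_RE.match(x) is not None


def compare(samples=("123.456", "abc123", "1e3"), number=50_000):
    """Return {(function name, sample): seconds taken for `number` calls}."""
    results = {}
    for func in (is_number_float, is_number_regex):
        for s in samples:
            results[func.__name__, s] = timeit.timeit(lambda: func(s), number=number)
    return results


if __name__ == "__main__":
    for (name, sample), seconds in compare().items():
        print(f"{name}({sample!r}): {seconds:.4f}s")
```

The absolute numbers depend on the machine and the Python version, so no expected timings are given here. It is worth timing both valid and invalid inputs, because the float() version pays for raising and catching an exception on every rejected string.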

烟火散人牵绊 2024-08-27 23:31:23

Not really. Coercion is the accepted way to do this.

长亭外，古道边 2024-08-27 23:31:23

The answer depends a lot on what you mean by 'numeric string'. If your definition of numeric string is 'anything that float accepts', then it's difficult to improve on the try-except method.

But bear in mind that float may be more liberal than you want it to be: on most machines, it'll accept strings representing infinities and nans. On my machine, it accepts 'nan(dead!$#parrot)', for example. It will also accept leading and trailing whitespace. And depending on your application, you may want to exclude exponential representations of floats. In these cases, using a regex would make sense. To just exclude infinities and nans, it might be quicker to use the try-except method and then use math.isnan and math.isinf to check the result of the conversion.
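A minimal sketch of that last suggestion, assuming the definition you want is "anything float accepts, except infinities and nans". The name is_finite_number is made up for illustration.

```python
import math


def is_finite_number(x):
    """Return True if float() accepts x and the result is neither inf nor nan."""
    try:
        value = float(x)
    except ValueError:
        return False
    return not (math.isnan(value) or math.isinf(value))
```

With this, is_finite_number("1e3") and is_finite_number(" 2.5 ") are True, while "inf", "nan" and "abc" are rejected. Note that a string such as "1e999" is also rejected, because float() overflows to infinity on it rather than raising. Leading and trailing whitespace is still accepted; if that matters, an extra check such as x == x.strip() could be added.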

Writing a correct regex for numeric strings is a surprisingly error-prone task. Your IsNumber2 function accepts the string '.', for example. You can find a battle-tested version of a numeric-string regex in the decimal module source. Here it is (with some minor edits):

import re

_parser = re.compile(r"""        # A numeric string consists of:
    (?P<sign>[-+])?              # an optional sign, followed by either...
    (
        (?=\d|\.\d)              # ...a number (with at least one digit)
        (?P<int>\d*)             # having a (possibly empty) integer part
        (\.(?P<frac>\d*))?       # followed by an optional fractional part
        (E(?P<exp>[-+]?\d+))?    # followed by an optional exponent, or...
    |
        Inf(inity)?              # ...an infinity, or...
    |
        (?P<signal>s)?           # ...an (optionally signaling)
        NaN                      # NaN
        (?P<diag>\d*)            # with (possibly empty) diagnostic info.
    )
    \Z
""", re.VERBOSE | re.IGNORECASE | re.UNICODE).match

This pretty much matches exactly what float accepts, except for the leading and trailing whitespace and some slight differences for nans (the extra 's' for signaling nans, and the diagnostic info). When I need a numeric regex, I usually start with this one and edit out the bits I don't need.
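As an illustration of "editing out the bits", here is one possible cut-down version that keeps only the sign, integer part and fractional part, and drops the exponent, infinities and nans. This is a sketch, not the decimal module's code; the names _plain_decimal and is_plain_decimal are made up for the example.

```python
import re

# Trimmed from the pattern above: no exponent, no Inf, no NaN, no named groups.
_plain_decimal = re.compile(r"""
    [-+]?            # an optional sign
    (?=\d|\.\d)      # lookahead: there must be at least one digit
    \d*              # a (possibly empty) integer part
    (\.\d*)?         # an optional fractional part
    \Z               # end of string (unlike $, not before a trailing newline)
""", re.VERBOSE).match


def is_plain_decimal(s):
    """Return True for strings like '12', '-3.5', '.5' or '5.'."""
    return _plain_decimal(s) is not None
```

This accepts "1.5", ".5", "5." and "-3", and rejects ".", "", "1e3", "nan" and " 1" (no whitespace is allowed). In Python 3, \d also matches non-ASCII decimal digits for str patterns; passing re.ASCII as well would restrict it to 0-9 if that is what you want.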

N.B. It's conceivable that float could be slower than a regex, since it not only has to parse the string, but also turn it into a float, which is quite an involved computation; it would still be a surprise if it were, though.
