Parsing scientific notation sensibly?

Posted on 2024-07-15 08:58:12

I want to be able to write a function which receives a number in scientific notation as a string and splits out of it the coefficient and the exponent as separate items. I could just use a regular expression, but the incoming number may not be normalised and I'd prefer to be able to normalise and then break the parts out.

A colleague has got part of the way to a solution using VB6, but it's not quite there, as the transcript below shows.

cliVe> a = 1e6
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 10 exponent: 5

should have been 1 and 6

cliVe> a = 1.1e6
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.1 exponent: 6

correct

cliVe> a = 123345.6e-7
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.233456 exponent: -2

correct

cliVe> a = -123345.6e-7
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.233456 exponent: -2

should be -1.233456 and -2

cliVe> a = -123345.6e+7
cliVe> ? "coeff: " & o.spt(a) & " exponent: " & o.ept(a)
coeff: 1.233456 exponent: 12

exponent correct, but the coefficient should be -1.233456

Any ideas? By the way, Clive is a CLI based on VBScript and can be found on my weblog.

明月松间行 2024-07-22 08:58:12

A Google search for "scientific notation regexp" turns up a number of matches, including this one (http://www.regular-expressions.info/floatingpoint.html; don't use it!), which uses

*** warning: questionable ***
/[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?/

which includes cases such as -.5e7 and +00000e33 (both of which you may not want to allow).

Instead, I would highly recommend you use the syntax on Doug Crockford's JSON website which explicitly documents what constitutes a number in JSON. Here's the corresponding syntax diagram taken from that page:

[Syntax diagram of a JSON number; source: json.org]

If you look at line 456 of his json2.js script (safe conversion to/from JSON in JavaScript), you'll see this portion of a regexp:

/-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/

which, ironically, doesn't match his syntax diagram: \d+ permits leading zeros and \.\d* permits a dot with no digits after it. (Looks like I should file a bug.) I believe a regexp that does implement that syntax diagram is this one:

/-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?/

and if you want to allow an initial + as well, you get:

/[+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?/

Add capturing parentheses to your liking.
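
As one possible way of adding those parentheses, here is a minimal sketch that anchors the signed pattern above and captures the coefficient and the exponent as separate strings. The helper name splitSci and the choice to report a missing exponent as "0" are assumptions made for illustration; they are not part of the original answer. The sketch only splits the text and does not normalise the coefficient.

```javascript
// Signed JSON-style number, anchored, with two capture groups:
//   group 1 = coefficient (sign, integer part, optional fraction)
//   group 2 = exponent with optional sign (undefined when absent)
const SCI = /^([+\-]?(?:0|[1-9]\d*)(?:\.\d+)?)(?:[eE]([+\-]?\d+))?$/;

// Hypothetical helper: returns null when the input is not accepted.
function splitSci(text) {
  const m = SCI.exec(text.trim());
  if (m === null) return null;
  return {
    coefficient: m[1],
    exponent: m[2] === undefined ? "0" : m[2], // assumption: no exponent means "0"
  };
}

console.log(splitSci("-4.70e+9")); // coefficient "-4.70", exponent "+9"
console.log(splitSci("37.e88"));   // null (dot with no digits after it)
```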

I would also highly recommend you flesh out a bunch of test cases, to ensure you include those possibilities you want to include (or not include), such as:

allowed:
+3
3.2e23
-4.70e+9
-.2E-4
-7.6603

not allowed:
+0003   (leading zeros)
37.e88  (dot before the e)
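
A list like this can be turned into a small table-driven check. The sketch below runs an anchored copy of the signed pattern above over these cases; the cases array and the helper name accepts are illustrative assumptions. Note that this particular pattern rejects -.2E-4, because it requires a digit before the decimal point, so a list that allows that input needs a looser pattern.

```javascript
// Anchored copy of the signed JSON-style pattern from this answer.
const NUMBER = /^[+\-]?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?$/;

// Hypothetical helper: true when the whole string is accepted.
const accepts = (s) => NUMBER.test(s);

const cases = ["+3", "3.2e23", "-4.70e+9", "-.2E-4", "-7.6603", "+0003", "37.e88"];
for (const s of cases) {
  console.log(s.padEnd(10), accepts(s) ? "accepted" : "rejected");
}
// accepted: +3, 3.2e23, -4.70e+9, -7.6603
// rejected: -.2E-4, +0003, 37.e88
```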

Good luck!

回复收藏 0 原文

遥远的她 2024-07-22 08:58:12

Building off of the highest rated answer, I modified the regex slightly to be /^[+\-]?(?=.)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?$/.

The benefits this provides are:

allows matching numbers like .9 (I made the (?:0|[1-9]\d*) optional with ?)
prevents matching just the operator at the beginning and prevents matching zero-length strings (uses lookahead, (?=.))
prevents matching e9 because it requires the \d before the scientific notation

My goal is to use it for capturing significant figures and doing significant-figure arithmetic. So I'm also going to slice it up with capturing groups like so: /^[+\-]?(?=.)(0|[1-9]\d*)?(\.\d*)?(?:(\d)[eE][+\-]?\d+)?$/.

An explanation of how to get significant figures from this:

The entire match is the number you can hand to parseFloat().
Groups 1-3 will each be either undefined or a string, so concatenating them (with undefined replaced by '') gives the mantissa as written, without sign or exponent, from which significant figures can be extracted.
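
The recombination step can be sketched as follows. The helper names mantissa and sigFigs are hypothetical, and the counting rule (every digit after the leading zeros is significant, trailing zeros included) is an assumption; the answer does not say which convention it uses.

```javascript
// Capturing form of the pattern quoted above; groups 1-3 hold the mantissa pieces.
const SIG = /^[+\-]?(?=.)(0|[1-9]\d*)?(\.\d*)?(?:(\d)[eE][+\-]?\d+)?$/;

// Hypothetical helper: rebuild the mantissa text from groups 1-3,
// treating groups that did not participate (undefined) as empty strings.
function mantissa(text) {
  const m = SIG.exec(text);
  if (m === null) return null;
  return [m[1], m[2], m[3]].map((g) => (g === undefined ? "" : g)).join("");
}

// Assumption: every digit after the leading zeros counts as significant,
// including trailing zeros (so "4.70" has three significant figures).
function sigFigs(text) {
  const digits = mantissa(text);
  if (digits === null) return null;
  return digits.replace(".", "").replace(/^0+/, "").length;
}

console.log(mantissa("-4.70e+9"), sigFigs("-4.70e+9")); // 4.70 3
console.log(mantissa(".0025"), sigFigs(".0025"));       // .0025 2
```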

This regex also prevents matching left-padded zeros, which JavaScript sometimes accepts but which I have seen cause issues and which add nothing to significant figures, so I see preventing left-padded zeros as a benefit (especially in forms). However, I'm sure the regex could be modified to gobble up left-padded zeros.

Another problem I see with this regex is it won't match 90.e9 or other such numbers. However, I find this or similar matches highly unlikely as it is the convention in scientific notation to avoid such numbers. Though you can enter it in JavaScript, you can just as easily enter 9.0e10 and achieve the same significant figures.

UPDATE

In my testing, I also caught the error that it could match '.'. So the look-ahead should be modified to (?=\.\d|\d) which leads to the final regex:

/^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?$/
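
To illustrate the effect of the changed look-ahead, the sketch below runs both versions of the pattern over a few edge cases. The variable names before and after and the sample inputs are chosen here for illustration only.

```javascript
// Pattern with the original look-ahead (?=.) and with the updated (?=\.\d|\d).
const before = /^[+\-]?(?=.)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?$/;
const after = /^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d*)?(?:\d[eE][+\-]?\d+)?$/;

for (const s of [".", "+.", ".9", "+", "", "e9", "1e6"]) {
  console.log(JSON.stringify(s), before.test(s), after.test(s));
}
// "." and "+." are accepted by the first pattern only;
// ".9" and "1e6" are accepted by both; "+", "" and "e9" by neither.
```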

伤感在游骋 2024-07-22 08:58:12

Building on @Troy Weber's answer, I would suggest

/^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d+)?(?:(?<=\d)(?:[eE][+\-]?\d+))?$/

to avoid matching 3., per @Jason S's rules
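
A quick way to see what this variant accepts is to run it over the cases discussed in this thread, as in the sketch below. The helper name check is hypothetical. The (?<=\d) look-behind needs a JavaScript engine with ES2018 look-behind support.

```javascript
// The look-behind variant suggested above.
const STRICT = /^[+\-]?(?=\.\d|\d)(?:0|[1-9]\d*)?(?:\.\d+)?(?:(?<=\d)(?:[eE][+\-]?\d+))?$/;

// Hypothetical helper: true when the whole string is accepted.
const check = (s) => STRICT.test(s);

console.log(check("3."), check("37.e88"), check("+0003")); // false false false
console.log(check("3.0"), check(".9"), check("-.2E-4"));   // true true true
```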

于我来说 2024-07-22 08:58:12

Here is some Perl code I just hacked together quickly.

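use strict;
use warnings;
use feature 'say';

# Number to split; the default is only an example input.
my $str = @ARGV ? $ARGV[0] : '-123345.6e-7';

# Captures: sign, integer digits, optional fraction (with its dot), exponent.
my ($sign, $coeffl, $coeffr, $exp) =
  $str =~ /^\s*([-+])?(\d+)(\.\d*)?[eE]([-+]?\d+)\s*$/
  or die "not in scientific notation: $str\n";

# Moving the point to just after the first digit raises the exponent
# by the number of integer digits that end up behind the point.
my $shift = length($coeffl) - 1;

my $coeff = substr($coeffl, 0, 1);
my $frac  = substr($coeffl, 1) . (defined $coeffr ? substr($coeffr, 1) : '');
$coeff .= ".$frac" if length $frac;

$coeff = $sign . $coeff if $sign;

$exp += $shift;

say "coeff: $coeff exponent: $exp";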
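
For comparison with the regex answers in this thread, the same normalise-then-split idea can be sketched in JavaScript. This is a minimal sketch under stated assumptions, not a port of anyone's code: the function name normalise, the lenient input pattern, the handling of leading zeros, and the decision to keep trailing digits exactly as written are all choices made here for illustration. It works on the digit string only, so no floating-point rounding is involved.

```javascript
// Hypothetical helper: normalise a decimal or scientific-notation string so
// the coefficient has exactly one non-zero digit before the point.
function normalise(text) {
  const m = /^\s*([+\-]?)(\d*)(?:\.(\d*))?(?:[eE]([+\-]?\d+))?\s*$/.exec(text);
  if (m === null) return null;
  const intPart = m[2];
  const fracPart = m[3] === undefined ? "" : m[3];
  const digits = intPart + fracPart;
  if (digits === "") return null; // inputs such as "." or "e5"

  const firstNonZero = digits.search(/[1-9]/);
  if (firstNonZero === -1) return { coefficient: "0", exponent: 0 };

  const significant = digits.slice(firstNonZero);
  const sign = m[1] === "-" ? "-" : "";
  const rest = significant.slice(1);
  const coefficient = sign + significant[0] + (rest === "" ? "" : "." + rest);

  // The point was written after intPart.length digits; it now sits right
  // after the digit at index firstNonZero.
  const written = m[4] === undefined ? 0 : parseInt(m[4], 10);
  const exponent = written + intPart.length - 1 - firstNonZero;
  return { coefficient, exponent };
}

console.log(normalise("1e6"));          // coefficient "1", exponent 6
console.log(normalise("-123345.6e-7")); // coefficient "-1.233456", exponent -2
console.log(normalise("-123345.6e+7")); // coefficient "-1.233456", exponent 12
```

Keeping the digits as written means an input such as 10e5 comes out as 1.0 and 6; stripping trailing zeros instead would be an equally reasonable choice if significant figures do not matter.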
