Printing thousands-separated floats with GAWK

Posted on 2024-07-17 06:46:20

I have to process some huge files with gawk. My main problem is that I have to print some floats with thousands separators. E.g. 10000 should appear as 10.000, and 10000,01 as 10.000,01, in the output.

I (and Google) came up with this function, but it fails for floats:

function commas(n) {
  gsub(/,/,"",n)
  point = index(n,".") - 1
  if (point < 0) point = length(n)
  while (point > 3) {
    point -= 3
    n = substr(n,1,point)"."substr(n,point + 1)
  }
  sub(/-\./,"-",n)
  return d n
}

But it fails with floats.

Now I'm thinking of splitting the input into an integer part and a fractional (< 1) part, formatting the integer part, and then gluing the two back together. But isn't there a better way to do it?

Disclaimer:

I'm not a programmer.
I know that the thousands separator can be set via some shell environment variables, but this has to work in different environments with different language and/or locale settings.
English is my second language; sorry if I'm using it incorrectly.

我不会写诗 2024-07-24 06:46:20

It fails with floats because you're passing in European-style numbers (1.000.000,25 for a million and a quarter). The function you've given should work if you just swap the commas and periods. I'd test the current version first with 1000000.25 to see if it works with non-European numbers.

The following awk script can be called with "echo 1 | awk -f xx.gawk" (the echo only supplies one line of input so that the main rule runs once) and it will show you both the "normal" and the European version in action. It outputs:

123,456,789.1234
123.456.789,1234

Obviously, you're only interested in the functions; real-world code would pass values from the input stream to them rather than a fixed string.

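# "Normal" style: "," as separator, "." as decimal. point is a local variable.
function commas(n,    point) {
    gsub(/,/,"",n)                      # drop existing separators
    point = index(n,".") - 1            # characters before the decimal
    if (point < 0) point = length(n)    # no decimal: start from the end
    while (point > 3) {
        point -= 3
        n = substr(n,1,point)","substr(n,point + 1)
    }
    sub(/^-,/,"-",n)                    # no separator right after a minus sign
    return n
}
# European style: "." as separator, "," as decimal.
function commaseuro(n,    point) {
    gsub(/\./,"",n)
    point = index(n,",") - 1
    if (point < 0) point = length(n)
    while (point > 3) {
        point -= 3
        n = substr(n,1,point)"."substr(n,point + 1)
    }
    sub(/^-\./,"-",n)
    return n
}
{ print commas("1234,56789.1234") "\n" commaseuro("12.3456789,1234") }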

The functions are identical except in their handling of commas and periods. We'll call them separators and decimals in the following description:

gsub removes all of the existing separators, since we'll be putting them back.
point holds the position of the last character before the decimal, since that's our starting point.
If there's no decimal, the if statement makes us start at the end of the string.
We loop while there are more than three characters left before the insertion point.
Inside the loop, we move the insertion point three characters to the left and insert a separator there.
Once the loop is finished, we return the adjusted value.
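
Since the two functions differ only in which characters play the separator and decimal roles, they could probably be folded into one function that takes both as parameters. What follows is a minimal sketch under some assumptions, not part of the answer's code: it assumes a POSIX shell and a POSIX awk, and the names fmt_thousands and group, as well as the s and d parameters, are made up for this illustration. It reads one value per line from standard input, and it sets a leading minus sign aside before grouping so that a negative number cannot end up with a separator right after the sign.

```shell
# fmt_thousands SEP DEC: hypothetical wrapper (name and arguments are assumptions).
# Reads one number per line from stdin. SEP is the thousands separator to insert,
# DEC is the decimal mark already used in the input.
fmt_thousands() {
    awk -v sep="$1" -v dec="$2" '
    function group(n, s, d,    i, point, sign) {
        # Drop existing separators with index/substr so that "." is taken literally.
        while ((i = index(n, s)) > 0)
            n = substr(n, 1, i - 1) substr(n, i + length(s))
        # Set the sign aside so it is not counted as a digit.
        if (substr(n, 1, 1) == "-") { sign = "-"; n = substr(n, 2) }
        point = index(n, d) - 1            # characters before the decimal mark
        if (point < 0) point = length(n)   # no decimal mark: start from the end
        while (point > 3) {
            point -= 3
            n = substr(n, 1, point) s substr(n, point + 1)
        }
        return sign n
    }
    { print group($0, sep, dec) }'
}

printf "%s\n" 10000 10000,01 -1234567,5 | fmt_thousands . ,
# prints:
# 10.000
# 10.000,01
# -1.234.567,5
```

Because the separator and decimal are passed in explicitly, the output does not depend on the locale of the environment the script runs in, which seems to be what the question asks for.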
