How do I add leading zeros?

Posted on 2024-11-03 21:40:24

I have a set of data which looks something like this:

anim <- c(25499,25500,25501,25502,25503,25504)
sex  <- c(1,2,2,1,2,1)
wt   <- c(0.8,1.2,1.0,2.0,1.8,1.4)
data <- data.frame(anim,sex,wt)

data
   anim sex  wt anim2
1 25499   1 0.8     2
2 25500   2 1.2     2
3 25501   2 1.0     2
4 25502   1 2.0     2
5 25503   2 1.8     2
6 25504   1 1.4     2

I would like a zero to be added before each animal id:

data
   anim sex  wt anim2
1 025499   1 0.8     2
2 025500   2 1.2     2
3 025501   2 1.0     2
4 025502   1 2.0     2
5 025503   2 1.8     2
6 025504   1 1.4     2

And for interest's sake, what if I need to add two or three zeros before the animal IDs?

Comments (8)

荒路情人 2024-11-10 21:40:24

The short version: use formatC or sprintf.


The longer version:

There are several functions available for formatting numbers, including adding leading zeroes. Which one is best depends upon what other formatting you want to do.

The example from the question is quite easy since all the values have the same number of digits to begin with, so let's try a harder example of making powers of 10 width 8 too.

anim <- 25499:25504
x <- 10 ^ (0:5)

paste (and its variant paste0) are often the first string manipulation functions that you come across. They aren't really designed for manipulating numbers, but they can be used for that. In the simple case where we always have to prepend a single zero, paste0 is the best solution.

paste0("0", anim)
## [1] "025499" "025500" "025501" "025502" "025503" "025504"

For the case where there are a variable number of digits in the numbers, you have to manually calculate how many zeroes to prepend, which is horrible enough that you should only do it out of morbid curiosity.
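For the morbidly curious, the manual calculation might look something like the sketch below. It uses base R only; the target width of 8 and the names digits, n_zeros and padded are illustrative choices, not part of the original answer.

```r
# Manual zero-padding with paste0: a sketch, assuming a target width of 8.
x <- 10 ^ (0:5)

# Convert to fixed (non-scientific) notation first, without padding spaces.
digits <- format(x, scientific = FALSE, trim = TRUE)

# Number of zeroes each value needs; pmax() guards against values
# that are already wider than the target width.
n_zeros <- pmax(0, 8 - nchar(digits))

padded <- paste0(strrep("0", n_zeros), digits)
padded
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "00100000"
```

It works, but every step here is something the dedicated formatting functions do for you.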


str_pad from stringr works similarly to paste, making it more explicit that you want to pad things.

library(stringr)
str_pad(anim, 6, pad = "0")
## [1] "025499" "025500" "025501" "025502" "025503" "025504"

Again, it isn't really designed for use with numbers, so the harder case requires a little thinking about. We ought to just be able to say "pad with zeroes to width 8", but look at this output:

str_pad(x, 8, pad = "0")
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "0001e+05"

You need to set the scientific penalty option so that numbers are always formatted using fixed notation (rather than scientific notation).

library(withr)
with_options(
  c(scipen = 999), 
  str_pad(x, 8, pad = "0")
)
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "00100000"

stri_pad in stringi works exactly like str_pad from stringr.


formatC is an interface to the C function printf. Using it requires some knowledge of the arcana of that underlying function (see link). In this case, the important points are the width argument, format being "d" for "integer", and a "0" flag for prepending zeroes.

formatC(anim, width = 6, format = "d", flag = "0")
## [1] "025499" "025500" "025501" "025502" "025503" "025504"
formatC(x, width = 8, format = "d", flag = "0")
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "00100000"

This is my favourite solution, since it is easy to tinker with changing the width, and the function is powerful enough to make other formatting changes.


sprintf is an interface to the C function of the same name; like formatC but with a different syntax.

sprintf("%06d", anim)
## [1] "025499" "025500" "025501" "025502" "025503" "025504"
sprintf("%08d", x)
## [1] "00000001" "00000010" "00000100" "00001000" "00010000" "00100000"

The main advantage of sprintf is that you can embed formatted numbers inside longer bits of text.

sprintf(
  "Animal ID %06d was a %s.", 
  anim, 
  sample(c("lion", "tiger"), length(anim), replace = TRUE)
)
## [1] "Animal ID 025499 was a tiger." "Animal ID 025500 was a tiger."
## [3] "Animal ID 025501 was a lion."  "Animal ID 025502 was a tiger."
## [5] "Animal ID 025503 was a tiger." "Animal ID 025504 was a lion." 
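If the width is only known at run time, sprintf can also take it as an argument: the R documentation for sprintf describes using an asterisk in place of the field width. The snippet below is a small sketch of that idea; the variable name width is just an illustrative choice.

```r
anim <- 25499:25504

# Hypothetical target width, e.g. computed elsewhere in a script.
width <- 8L

# "*" takes the field width from the next argument, so this should
# produce the same strings as sprintf("%08d", anim).
padded <- sprintf("%0*d", width, anim)
padded
```

This avoids building the format string with paste0 whenever the width changes.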

See also goodside's answer.


For completeness it is worth mentioning the other formatting functions that are occasionally useful, but have no method of prepending zeroes.

format, a generic function for formatting any kind of object, with a method for numbers. It works a little bit like formatC, but with yet another interface.

prettyNum is yet another formatting function, mostly for creating manual axis tick labels. It works particularly well for wide ranges of numbers.

The scales package has several functions such as percent, date_format and dollar for specialist format types.
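As a quick illustration of the two base R functions above (a sketch only; the input values are arbitrary), format pads to a width with spaces rather than zeroes, and prettyNum is handy for things like thousands separators:

```r
# format() pads with spaces, not zeroes.
spaced <- format(25499L, width = 8)
spaced
## [1] "   25499"

# prettyNum() can insert a thousands separator.
grouped <- prettyNum(1234567, big.mark = ",")
grouped
## [1] "1,234,567"
```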

初见 2024-11-10 21:40:24

For a general solution that works regardless of how many digits are in data$anim, use the sprintf function. It works like this:

sprintf("%04d", 1)
# [1] "0001"
sprintf("%04d", 104)
# [1] "0104"
sprintf("%010d", 104)
# [1] "0000000104"

In your case, you probably want: data$anim <- sprintf("%06d", data$anim)
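Applied to the data frame from the question, one, two or three leading zeros correspond to widths of 6, 7 and 8. The sketch below writes the results to new columns (anim_1, anim_2, anim_3 are hypothetical names chosen for illustration) so the original IDs stay intact. Note that the padded values are character strings, not numbers.

```r
data <- data.frame(
  anim = c(25499, 25500, 25501, 25502, 25503, 25504),
  sex  = c(1, 2, 2, 1, 2, 1),
  wt   = c(0.8, 1.2, 1.0, 2.0, 1.8, 1.4)
)

# The IDs have 5 digits, so width 6 adds one zero, 7 adds two, 8 adds three.
data$anim_1 <- sprintf("%06d", data$anim)
data$anim_2 <- sprintf("%07d", data$anim)
data$anim_3 <- sprintf("%08d", data$anim)

data
```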

筑梦 2024-11-10 21:40:24

Expanding on @goodside's response:

In some cases you may want to pad a string with zeros (e.g. fips codes or other numeric-like factors). In OSX/Linux:

> sprintf("%05s", "104")
[1] "00104"

But because sprintf() calls the OS's C sprintf() command, discussed here, in Windows 7 you get a different result:

> sprintf("%05s", "104")
[1] "  104"

So on Windows machines the workaround is:

> sprintf("%05d", as.numeric("104"))
[1] "00104"

怪我闹别瞎闹 2024-11-10 21:40:24

str_pad from the stringr package is an alternative.

library(stringr)
anim <- 25499:25504
str_pad(anim, width = 6, pad = "0")

情徒 2024-11-10 21:40:24

Here's a generalizable base R function:

pad_left <- function(x, len = 1 + max(nchar(x)), char = '0'){

    unlist(lapply(x, function(x) {
        paste0(
            paste(rep(char, len - nchar(x)), collapse = ''),
            x
        )
    }))
}

pad_left(1:100)

I like sprintf but it comes with caveats like:

however the actual implementation will follow the C99 standard and fine details (especially the behaviour under user error) may depend on the platform

a√萤火虫的光℡ 2024-11-10 21:40:24

Here is another alternative for adding leading 0s to strings such as CUSIPs, which can sometimes look like numbers and which many applications such as Excel will corrupt by removing the leading 0s or converting them to scientific notation.

When I tried the answer provided by @metasequoia the vector returned had leading spaces and not 0s. This was the same problem mentioned by @user1816679 -- and removing the quotes around the 0 or changing from %d to %s did not make a difference either. FYI, I am using RStudio Server running on an Ubuntu Server. This little two-step solution worked for me:

gsub(pattern = " ", replacement = "0", x = sprintf(fmt = "%09s", ids[,CUSIP]))

Using the %>% pipe function from the magrittr package, it could look like this:

sprintf(fmt = "%09s", ids[,CUSIP]) %>% gsub(pattern = " ", replacement = "0", x = .)

I'd prefer a one-function solution, but it works.
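For what it's worth, a one-function version that avoids the platform-dependent "%09s" behaviour could be sketched in base R as below. The name pad_zeros and the example codes are made up for illustration; this is one possible approach, not something from the answers above.

```r
# Left-pad character codes with zeroes, independent of the platform's sprintf.
# pad_zeros is a hypothetical helper name.
pad_zeros <- function(x, width) {
  x <- as.character(x)
  # Zeroes needed per element; never negative, so longer codes pass through.
  n_zeros <- pmax(0, width - nchar(x))
  out <- paste0(strrep("0", n_zeros), x)
  out[is.na(x)] <- NA_character_  # keep missing values missing
  out
}

codes <- c("104", "00123456", "AB12")  # made-up example codes
pad_zeros(codes, 9)
## [1] "000000104" "000123456" "00000AB12"
```

Unlike the gsub approach, this leaves any spaces inside a code untouched.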

爱冒险 2024-11-10 21:40:24

For other circumstances in which you want the number string to be consistent, I made a function.

Someone may find this useful:

idnamer<-function(x,y){#Alphabetical designation and number of integers required
    id<-c(1:y)
    for (i in 1:length(id)){
         if(nchar(id[i])<2){
            id[i]<-paste("0",id[i],sep="")
         }
    }
    id<-paste(x,id,sep="")
    return(id)
}
idnamer("EF",28)

Sorry about the formatting.
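The same kind of labels can probably be produced without a loop by letting formatC do the padding. This is a sketch under the assumption that a configurable width is wanted; id_names and its width argument are hypothetical, not part of the function above.

```r
# Vectorised sketch: prefix plus zero-padded sequence numbers.
# id_names and the width argument are illustrative names.
id_names <- function(prefix, n, width = 2) {
  paste0(prefix, formatC(seq_len(n), width = width, format = "d", flag = "0"))
}

ids <- id_names("EF", 28)
head(ids, 3)
## [1] "EF01" "EF02" "EF03"
```

Raising width to 3 would give "EF001" and so on, which the hard-coded check against 2 characters in the loop version does not cover.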

李白 2024-11-10 21:40:24
data$anim <- sapply(0, paste0,data$anim)