How to use sep2 = "" with data.table's fwrite

Posted on 2025-02-12 21:55:43

Is there a way to use data.table::fwrite to write the values of a column without any separation between them?

For example:

library("data.table")
geno <- data.table(
  IID = 1:10,
  SNP = lapply(1:10, function(i) sample(0:2, 10, replace = TRUE))
)
fwrite(geno, "Geno.txt", col.names = FALSE, sep = " ", sep2 = c("","",""))

But sep2 does not allow this and gives me the following error:

Error in fwrite(geno, "Geno.txt", col.names = FALSE, row.names = FALSE,  : 
  is.character(sep2) && length(sep2) == 3L && nchar(sep2[2L]) ==  .... is not TRUE

I would like to have the following result, without having to collapse all values before writing it to a file.

1 2221210202
2 0020010221
3 1010022212
4 0120121221
5 1212211202
6 2100002010
7 1110011210
8 1212012121
9 2221121021
10 1122220101

Thank you.

非要怀念 2025-02-19 21:55:43

According to ?fwrite, sep2[2] must be a single character. Therefore you have to collapse the list, rather than use sep2.

You can use

fwrite(geno[, .(IID, SNP=sapply(SNP, paste0, collapse=''))], 'test.txt', sep=' ')
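
To make the collapse step easier to check, here is a minimal sketch on a small hand-made table. The three-row geno below and the temporary file are stand-ins I made up for illustration, not the question's random data, and vapply is swapped in for sapply only so that the result is guaranteed to be a character vector.

```r
library(data.table)

# Hypothetical, deterministic stand-in for the question's `geno` table.
geno <- data.table(
  IID = 1:3,
  SNP = list(c(2L, 2L, 1L), c(0L, 0L, 2L), c(1L, 0L, 1L))
)

# Collapse each list element into a single string before writing.
# vapply() enforces that every element becomes exactly one string.
collapsed <- geno[, .(IID, SNP = vapply(SNP, paste, character(1), collapse = ""))]

# Write to a temporary file (assumed location, chosen only for the example).
out_file <- tempfile(fileext = ".txt")
fwrite(collapsed, out_file, col.names = FALSE, sep = " ")

written <- readLines(out_file)
print(written)
```

The collapse happens on a copy of the two columns, so the original list column in geno is left untouched.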

人间不值得 2025-02-19 21:55:43

Alternative: write it with a character known to not exist in the data and then remove it programmatically on the file. The second step here can be done in R, but frankly command-line tools are much faster at this. I'll use tr here, as it is likely to be the fastest.

fwrite(geno, "Geno.txt", col.names = FALSE, sep = " ", sep2 = c("","\037",""))
readLines("Geno.txt", n=2)
# [1] "1 1\0372\0371\0372\0372\0370\0371\0370\0372\0370" "2 1\0370\0372\0372\0371\0370\0372\0372\0371\0370"

system2("tr", c("-d", "\037"), stdin="Geno.txt", stdout="Geno2.txt")
readLines("Geno2.txt", n=2)
# [1] "1 1212201020" "2 1022102210"

tr should be available on all Unix-like OSes, including macOS, and on Windows within Rtools 4.0 under "c:\\rtools40\\usr\\bin\\tr.exe" or whichever path matches your install.

For this, I chose \037 (the octal escape for the ASCII unit separator), which many things use as a delimiter and which seems unlikely to appear in most datasets. However, other characters work just as easily, for example sep2 = c("", "|", "") with system2("tr", c("-d", shQuote("|")), ...); the shQuote() keeps the shell from treating | as a pipe.
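
For completeness, the second step can also be done in R rather than with tr, as mentioned above. This is only a sketch under assumptions: the three-row geno and the temporary file names are made up for illustration, and reading the whole file with readLines will be slower and more memory-hungry than tr on large files.

```r
library(data.table)

# Hypothetical, deterministic stand-in for the question's `geno` table.
geno <- data.table(
  IID = 1:3,
  SNP = list(c(2L, 2L, 1L), c(0L, 0L, 2L), c(1L, 0L, 1L))
)

# Temporary files, assumed names used only for this example.
raw_file   <- tempfile(fileext = ".txt")
clean_file <- tempfile(fileext = ".txt")

# Step 1: write the list column with the unit separator between values.
fwrite(geno, raw_file, col.names = FALSE, sep = " ", sep2 = c("", "\037", ""))

# Step 2: strip the separator in R; fixed = TRUE matches it literally.
raw_lines   <- readLines(raw_file)
clean_lines <- gsub("\037", "", raw_lines, fixed = TRUE)
writeLines(clean_lines, clean_file)

print(readLines(clean_file))
```

This avoids depending on an external tool, at the cost of holding every line in memory at once.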
