How to use sep2 = "" with data.table fwrite
Is there a way to use data.table::fwrite
to write the values of a column without any separation between them?
For example:
library("data.table")
geno <- data.table(
IID = 1:10,
SNP = lapply(1:10, function(i) sample(0:2, 10, replace = TRUE))
)
fwrite(geno, "Geno.txt", col.names = FALSE, sep = " ", sep2 = c("","",""))
But the sep2 does not allow it and gives me the following error:
Error in fwrite(geno, "Geno.txt", col.names = FALSE, row.names = FALSE, :
is.character(sep2) && length(sep2) == 3L && nchar(sep2[2L]) == .... is not TRUE
I would like to have the following result, without having to collapse all values before writing it to a file.
1 2221210202
2 0020010221
3 1010022212
4 0120121221
5 1212211202
6 2100002010
7 1110011210
8 1212012121
9 2221121021
10 1122220101
Thank you.
Comments (2)
According to ?fwrite, sep2[2] must be a single character. Therefore you have to collapse the list rather than use sep2. You can use
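One way to do the collapsing is sketched below. This is an illustration rather than the answerer's exact code: it assumes the geno table from the question, and the seed and the temporary output path are additions made only so the sketch is reproducible.

```r
library(data.table)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
geno <- data.table(
  IID = 1:10,
  SNP = lapply(1:10, function(i) sample(0:2, 10, replace = TRUE))
)

# Collapse each list element into one string, e.g. c(2, 0, 1) -> "201".
# Assigning the whole column replaces the list column with a character column.
geno[, SNP := vapply(SNP, paste, character(1), collapse = "")]

out <- tempfile(fileext = ".txt")  # hypothetical output path
fwrite(geno, out, col.names = FALSE, sep = " ", quote = FALSE)

lines <- readLines(out)
```

Because SNP is now a character column, leading zeros such as in "0020010221" are written as-is.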
Alternative: write the file with a separator character that is known not to exist in the data, then remove that character from the file programmatically. The second step can be done in R, but frankly command-line tools are much faster at this. I'll use tr here, as it is likely to be the fastest. tr should be available on all Unix-like OSes, including macOS, and within Rtools-4.0 for Windows under "c:\\rtools40\\usr\\bin\\tr.exe" or whichever path is closest for your install.
For this, I chose the character \037 (the ASCII unit separator), which is used by many things as a delimiter and seems unlikely to be found in most datasets. However, others will work just as easily, including sep2 = c("", "|", "") with system2("tr", c("-d", "|"), ...).
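A rough sketch of this write-then-strip approach follows. It assumes a Unix-like system with tr on the PATH; the seed and the temporary file names are placeholders and not part of the original answer. shQuote is used so that the shell hands the octal escape \037 to tr unchanged.

```r
library(data.table)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
geno <- data.table(
  IID = 1:10,
  SNP = lapply(1:10, function(i) sample(0:2, 10, replace = TRUE))
)

tmp <- tempfile(fileext = ".txt")  # hypothetical intermediate file
out <- tempfile(fileext = ".txt")  # hypothetical final file

# Step 1: write the list column with the unit separator between items.
fwrite(geno, tmp, col.names = FALSE, sep = " ", sep2 = c("", "\037", ""))

# Step 2: delete every \037 byte with tr, reading tmp and writing out.
# Assumes `tr` is available; on Windows the full path to tr.exe would be needed.
status <- system2("tr", c("-d", shQuote("\\037")), stdin = tmp, stdout = out)

lines <- readLines(out)
```

If a printable character such as "|" is used instead, it would likewise need quoting (for example with shQuote) so the shell does not treat it as a pipe.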