Export UTF-8 BOM to .csv in R

Posted on 2024-12-04 14:20:30

I am reading data through RJDBC from a MySQL database, and R displays all letters correctly (e.g., נווה שאנן).
However, even when I export it using write.csv with fileEncoding="UTF-8", the output looks like

<U+0436>.<U+043A>. <U+041B><U+043E><U+0437><U+0435><U+043D><U+0435><U+0446>

(in this case this is not the string above but a Bulgarian one). This happens for Bulgarian, Hebrew, Chinese and so on. Other special characters such as ã and ç work fine.

I suspect this is because of the UTF-8 BOM, but I did not find a solution online.

My OS is a German Windows 7.

Edit: I tried

con <- file("file.csv", encoding = "UTF-8")
write.csv(x, con, row.names = FALSE)

and the (as far as I know) equivalent

write.csv(x, file = "file.csv", fileEncoding = "UTF-8", row.names = FALSE)
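
One possible way to see why ã and ç survive while Cyrillic or Hebrew text does not is to check how R marks the strings and whether a Western European single-byte encoding can represent them. The sketch below is illustrative only: the sample strings and the use of "latin1" as a stand-in for the native encoding of a German Windows session are assumptions, not taken from the question's data.

```r
# Sample strings (assumed for illustration, not the question's data)
cyr <- "\u0436"   # Cyrillic small letter zhe
lat <- "\u00e3"   # Latin small letter a with tilde

# Both literals are marked as UTF-8 inside R
enc_marks <- Encoding(c(cyr, lat))

# Convert to Latin-1, used here as a stand-in for a Western European
# native encoding; iconv() returns NA when a string cannot be converted
# (behaviour can vary with the platform's iconv implementation)
as_latin1 <- iconv(c(cyr, lat), from = "UTF-8", to = "latin1")
convertible <- !is.na(as_latin1)

# Whether the current session itself runs in a UTF-8 locale
in_utf8_locale <- isTRUE(l10n_info()[["UTF-8"]])
```

If the affected strings turn out not to be convertible, that would be consistent with the text passing through the native encoding on its way to the file and being escaped as <U+XXXX> there.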

Comments (2)

霊感 2024-12-11 14:20:30

The accepted answer did not help me in a similar situation (R 3.1 on Windows, where I was trying to open the file in Excel). Anyway, based on this part of the file documentation:

If a BOM is required (it is not recommended) when writing it should be written explicitly, e.g. by writeChar("\ufeff", con, eos = NULL) or writeBin(as.raw(c(0xef, 0xbb, 0xbf)), binary_con)

I came up with the following workaround:

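write.csv.utf8.BOM <- function(df, filename)
{
    con <- file(filename, "w")
    tryCatch({
        # Re-encode every column to UTF-8; iconv() returns character
        # vectors, so all columns are written as text.
        for (i in seq_len(ncol(df)))
            df[, i] <- iconv(df[, i], to = "UTF-8")
        # Write the byte order mark explicitly, then the data.
        writeChar(iconv("\ufeff", to = "UTF-8"), con, eos = NULL)
        write.csv(df, file = con)
    }, finally = {
        close(con)
    })
}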

Note that df is the data.frame and filename is the path to the csv file.
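
The writeBin variant mentioned in the quoted documentation could be sketched as follows. This is a minimal illustration and not the answerer's method: write_lines_utf8_bom and has_utf8_bom are hypothetical helper names, and the sketch writes ready-made lines of text instead of a data frame.

```r
# Hypothetical helpers, not part of base R or of the answer above.

utf8_bom <- as.raw(c(0xef, 0xbb, 0xbf))

# Write the BOM followed by already-formatted lines of text as UTF-8 bytes.
write_lines_utf8_bom <- function(lines, path) {
  con <- file(path, open = "wb")   # binary mode: bytes are written as given
  on.exit(close(con))
  writeBin(utf8_bom, con)
  # useBytes = TRUE writes the UTF-8 bytes without re-encoding them
  writeLines(enc2utf8(lines), con, sep = "\r\n", useBytes = TRUE)
  invisible(path)
}

# Check whether a file starts with the three BOM bytes.
has_utf8_bom <- function(path) {
  identical(readBin(path, what = "raw", n = 3L), utf8_bom)
}

path <- tempfile(fileext = ".csv")
write_lines_utf8_bom(c("v1", "\u0436"), path)
has_utf8_bom(path)
```

A file written this way can be read back in R with read.csv(path, fileEncoding = "UTF-8-BOM"), which removes the BOM on input.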

猫弦 2024-12-11 14:20:30

The help page for Encoding (help("Encoding")) describes a special encoding: bytes.

Using this, I was able to generate the CSV file with:

v <- "נווה שאנן"
X <- data.frame(v1=rep(v,3), v2=LETTERS[1:3], v3=0, stringsAsFactors=FALSE)

Encoding(X$v1) <- "bytes"
write.csv(X, "test.csv", row.names=FALSE)

Mind the difference between factor and character columns. The following should work:

# any() reduces the per-element comparison to a single TRUE/FALSE for &&
id_characters <- which(sapply(X,
    function(x) is.character(x) && any(Encoding(x) == "UTF-8")))
for (i in id_characters) Encoding(X[[i]]) <- "bytes"

id_factors <- which(sapply(X,
    function(x) is.factor(x) && any(Encoding(levels(x)) == "UTF-8")))
for (i in id_factors) Encoding(levels(X[[i]])) <- "bytes"

write.csv(X, "test.csv", row.names=FALSE)