Reliably importing a CSV column as "double"
I am trying to import multiple CSV files in a for loop. After iteratively fixing the errors the code produced, I arrived at the code below.
for (E in EDCODES) {
  # Build the path to the current file
  Filename <- paste("$. Data/2. Liabilities/", E, sep = "")
  # Strip the file extension to get the name of the data frame
  Framename <- gsub("\\..*", "", E)
  # Read the file and store it under that name
  assign(Framename,
         read.csv(Filename,
                  header = TRUE,
                  sep = ",",
                  stringsAsFactors = FALSE,
                  na.strings = c("\"ND", "ND,5", "5\""),
                  colClasses = c("BAA35" = "double"),
                  encoding = "UTF-8",
                  quote = ""))
}
First I realized that the code does not always recognize the most important column, "BAA35", as numeric, so I added the colClasses argument. Then I realized that the data has multiple versions of "NA", so I added the na.strings argument. The most common NA value is "ND,5", which contains the separator ",". So if I add the na.strings argument as defined above, I get a lot of "EOF within quoted string" warnings. The other NA values are also variants of "ND,[NUMBER]" or "ND,4,[YYYY-MM]".
If I then try to address that with the most common recommendation I could find, adding quote = "", I just end up with a "more columns than column names" error instead.
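One way to see where the extra columns come from is to count the fields per line with and without quoting. The sketch below uses count.fields() on a small hypothetical three-column sample (csv_lines and the temporary file are stand-ins, not the real data) and assumes the ND values are wrapped in double quotes in the files, which the "\"ND" and "5\"" entries in na.strings seem to suggest.

```r
# Hypothetical sample standing in for one of the real files.
csv_lines <- c(
  'ID,BAA35,NOTE',
  '1,12.5,ok',
  '2,"ND,5",missing',
  '3,"ND,4,2019-03",missing',
  '4,7,ok'
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# With quoting enabled, a quoted "ND,5" counts as a single field.
fields_quoted <- count.fields(tmp, sep = ",", quote = "\"")

# With quoting disabled, the commas inside the quotes split the field.
fields_unquoted <- count.fields(tmp, sep = ",", quote = "")

print(fields_quoted)
print(fields_unquoted)

unlink(tmp)
```

Lines whose count exceeds the header's count in the second result are the ones that would trigger the "more columns than column names" error when quote = "" is used.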
The data has 78 columns, so I don't believe posting it here will display in a usable way.
Can somebody recommend a way to reliably import this column as a numeric value and have R recognize the NAs in the data correctly?
I think the issue might be that the na.strings values contain commas: in some cases "ND,5" is read as two columns, one with ND and one with 5, and in other cases it is matched as the na.string. Is there any way to tell R not to split "ND,5" into two columns?
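A possible approach, sketched under assumptions rather than tested on the real files: the read.table documentation notes that quoting is only considered for columns read as character, so forcing "BAA35" to "double" may itself be what splits a quoted "ND,5". Reading every column as character with the default quote, and converting afterwards, keeps such a value in one field. The sample data and the helper nd_to_numeric are hypothetical, and the sketch assumes every missing marker starts with "ND" and is quoted in the file.

```r
# Hypothetical sample standing in for one of the real 78-column files.
csv_lines <- c(
  'ID,BAA35,NOTE',
  '1,12.5,ok',
  '2,"ND,5",missing',
  '3,"ND,4,2019-03",missing',
  '4,7,ok'
)

# Read all columns as character so the default quote = "\"" is honoured
# and a quoted "ND,5" stays in a single field.
raw <- read.csv(text = csv_lines,
                header = TRUE,
                colClasses = "character",
                encoding = "UTF-8")

# Hypothetical helper: treat any value starting with "ND" as missing,
# then convert the remaining values to numeric.
nd_to_numeric <- function(x) {
  x <- trimws(x)
  x[grepl("^ND", x)] <- NA_character_
  as.numeric(x)
}

raw$BAA35 <- nd_to_numeric(raw$BAA35)
str(raw)
```

Inside the loop, the same idea would mean dropping quote = "" and the comma-containing na.strings, and applying the conversion to the column after read.csv() returns.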