Why is an X added to data frame variable names when using read.csv?
When I use the read.csv() function in R to load data, I often find that an X has been added to variable names. I think I almost always see it in the first variable, but I could be wrong.
At first, I thought R might be doing this because I had a space at the beginning of the variable name, but I don't.
Second, I had read somewhere that if you have a variable that starts with a number, or is a very short variable name, R would add the X. The variable name is all text and is 12 characters long, so it's not short.
Now, this is purely an annoyance. I can rename the column, but it does add a step, albeit a small one.
Is there a way to prevent this rogue X from infiltrating my data frame?
Here is my original code:
df <- read.csv("/file/location.filecsv", header=T, sep=",")
Here is the variable in question:
str(orders)
'data.frame': 2620276 obs. of 26 variables:
$ X.OrderDetailID : Factor w/ 2620193 levels "(2620182 row(s) affected)",..: 105845
read.table and read.csv have a check.names= argument that you can set to FALSE. For example, compare reading an input consisting of just a header with the default settings versus with check.names = FALSE.
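The comparison can be sketched as follows. This is a minimal illustration, not necessarily the exact input the answer used: the header string "a,1,b" and the variable names are assumptions chosen because the middle field, 1, is not a valid R name.

```r
# Header-only input (assumed for illustration); the field "1" is not a
# syntactically valid R name, so read.csv will alter it by default.
header_only <- "a,1,b"

# Default behaviour: check.names = TRUE, names are passed through make.names().
default_names <- names(read.csv(text = header_only))

# With check.names = FALSE the header is used verbatim.
raw_names <- names(read.csv(text = header_only, check.names = FALSE))

print(default_names)  # "a"  "X1" "b"
print(raw_names)      # "a" "1" "b"
```

Note that columns with non-syntactic names then have to be accessed with backticks or string indexing, for example df[["1"]].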
It is surprising behavior, but I think we would need a reproducible example. Perhaps you have some invisible/special characters hiding in your file?
A simple test with a clean header behaves fine. Can you make an example along these lines that demonstrates your problem?
As described in the other answer, check.names=FALSE is a possible workaround. You can experiment with make.names to determine the behavior ...
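One way to experiment with make.names is to feed it a few candidate headers and see which ones come back changed. The names below are made up for illustration; only OrderDetailID comes from the question, and the leading "#" merely stands in for whatever unexpected character might be hiding at the start of a header.

```r
# Hypothetical header values: one clean, three with problems.
candidates <- c(
  "OrderDetailID",    # already valid, returned unchanged
  "#OrderDetailID",   # invalid first character
  "1OrderDetailID",   # starts with a digit
  "Order Detail ID"   # contains spaces
)

fixed <- make.names(candidates)

# Show each original name next to what read.csv would turn it into.
print(data.frame(original = candidates, fixed = fixed))
```

An invalid leading character yields the X. prefix seen in the question: make.names prepends X and then replaces the invalid character with a dot.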
As Gabor said, by default read.csv converts the names in your header row to valid variable names (use check.names = FALSE to turn this off). This is done using the function make.names. The help page for that function explains what constitutes a valid variable name. The list of reserved words is found on the help page ?reserved.
The other condition is that the variable name must be 10000 characters or less, but make.names won't shorten it. So be careful of being really verbose with your variable names.
You can also check whether a name is a valid variable name.
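As a rough sketch of such a check (not necessarily the approach this answer had in mind), a name can be treated as valid when make.names leaves it unchanged. The helper name is_valid_name and the sample names are hypothetical, and the length limit mentioned above is not tested here.

```r
# Hypothetical helper: a name counts as valid if make.names() does not alter it.
is_valid_name <- function(x) {
  x == make.names(x)
}

# Sample names, assumed for illustration.
sample_names <- c("OrderDetailID", "1st_item", "if", "order id")
validity <- is_valid_name(sample_names)

print(setNames(validity, sample_names))
```

Here "if" is rejected because it is a reserved word, which make.names rewrites as "if.".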