Why is an X added to data frame variable names when using read.csv?

Posted 2025-01-01 09:25:24

When I use the read.csv() function in R to load data, I often find that an X has been added to variable names. I think I almost always see it in the first variable, but I could be wrong.

At first, I thought R might be doing this because I had a space at the beginning of the variable name - I don't.

Second, I had read somewhere that if you have a variable that starts with a number, or is a very short variable name, R would add the X. The variable name is all text and the length of the name of this variable is 12 characters, so it's not short.

Now, this is purely an annoyance. I can rename the column, but it does add a step, albeit a small one.

Is there a way to prevent this rogue X from infiltrating my data frame?

Here is my original code:

df <- read.csv("/file/location.filecsv", header=T, sep=",")

Here is the variable in question:

str(orders)
'data.frame':   2620276 obs. of  26 variables:
 $ X.OrderDetailID    : Factor w/ 2620193 levels "(2620182 row(s) affected)",..: 105845

Comments (3)

枯叶蝶 2025-01-08 09:25:24

read.table and read.csv have a check.names= argument that you can set to FALSE.

For example, try it with this input consisting of just a header:

> read.csv(text = "a,1,b")
[1] a  X1 b 
<0 rows> (or 0-length row.names)

versus

> read.csv(text = "a,1,b", check.names = FALSE)
[1] a 1 b
<0 rows> (or 0-length row.names)
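
To make the trade-off concrete, here is a minimal sketch with a few data rows. The CSV text and its column names are made up for illustration and are not taken from the question's file. It shows that with check.names = FALSE the original names are preserved, but non-syntactic names then need backticks or [[ ]] to be accessed.

```r
# Hypothetical CSV content, used only to illustrate check.names
csv_text <- "order id,2024,total\n1,10,3.5\n2,20,4.5"

# Default: header names are passed through make.names()
checked <- read.csv(text = csv_text)
print(names(checked))

# check.names = FALSE keeps the header exactly as written
raw <- read.csv(text = csv_text, check.names = FALSE)
print(names(raw))

# Non-syntactic names must be quoted with backticks or accessed via [[ ]]
print(raw$`2024`)
print(raw[["order id"]])
```

Whether the extra quoting is less annoying than a renaming step is a matter of taste; the sketch only shows what each setting produces.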

卸妝后依然美 2025-01-08 09:25:24

It is surprising behavior, but I think we would need a reproducible example. Perhaps you have some invisible/special characters hiding in your file?

names(read.csv(textConnection(
"abcdefghijkl, a1,2x")))

behaves fine. Can you make an example along these lines that demonstrates your problem?

As described in the other answer, check.names=FALSE is a possible workaround. You can experiment with make.names to determine the behavior ...
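
One way to experiment along those lines is to call make.names directly on a handful of candidate names. The names below are hypothetical, and the leading "-" is only a visible stand-in for whatever unexpected character might precede the real header; the question does not say what that character is.

```r
# Hypothetical header names, chosen only to exercise make.names()
raw_names <- c("OrderDetailID", "-OrderDetailID", "1st", "a b", "if")

# make.names() prepends "X" when a name does not start with a letter
# (or with a dot not followed by a digit), replaces invalid characters
# with ".", and appends "." to reserved words
fixed_names <- make.names(raw_names)
print(data.frame(raw = raw_names, fixed = fixed_names))

# Names that make.names() had to change
changed <- raw_names[raw_names != fixed_names]
print(changed)
```

A name such as X.OrderDetailID is what this produces when one invalid character sits in front of OrderDetailID, which fits the hidden-character guess but does not prove it for the asker's file.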

久隐师 2025-01-08 09:25:24

As Gabor said, by default read.csv converts the names in your header row to valid variable names (use check.names = FALSE to turn this off). This is done using the function make.names. The help page for that function explains what constitutes a valid variable name.

A syntactically valid name consists of letters, numbers and the dot or
underline characters and starts with a letter or the dot not followed
by a number. Names such as ".2way" are not valid, and neither are the
reserved words.

The list of reserved words is found on the help page ?reserved.

The other condition is that the variable name must be 10000 characters or less, but make.names won't shorten it. So be careful of being really verbose with your variable names.

You can check for valid variable names using

library(assertive.code)
is_valid_variable_name(x)
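
If installing a package is not an option, a rough base R approximation is to test whether make.names leaves a name unchanged. The helper name is_syntactic below is hypothetical, and this sketch is not equivalent to is_valid_variable_name: it ignores the length limit and may miss some special cases.

```r
# Hypothetical helper: TRUE when make.names() would leave the name as-is.
# An approximation only; it does not check the length limit.
is_syntactic <- function(x) {
  x <- as.character(x)
  make.names(x) == x
}

candidates <- c("abc", ".2way", "if", "a_b", "2x")
result <- is_syntactic(candidates)
print(setNames(result, candidates))
```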