Imported a CSV dataset into R, but the values became factors

Posted on 2024-10-20 18:01:14

I am very new to R and am having trouble working with a dataset I've imported. I'm using RStudio: I imported my CSV file with the Import Dataset function and pasted the resulting line from the console window into the source window. The code looks as follows:

setwd("c:/kalle/R")
stuckey <- read.csv("C:/kalle/R/stuckey.csv")
point <- stuckey$PTS
time <- stuckey$MP

However, the data isn't integer or numeric, as I am used to, but factors, so when I try to plot the variables I only get histograms rather than the usual plot. When I check the data it seems to be in order; I just can't use it because it's in factor form.


愛放△進行李 2024-10-27 18:01:14

Both the data import function (here: read.csv()) and the global options offer stringsAsFactors=FALSE, which should fix this.
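As a minimal sketch of this suggestion, the example below reads a small inline CSV with stringsAsFactors = FALSE. The data is hypothetical and only stands in for stuckey.csv; the "DNP" entry is an assumed non-numeric value, not something known from the original file. Since R 4.0.0 this setting is already the default, so the argument mainly matters on older versions. Note that the column then arrives as character, not numeric, so it still needs an explicit conversion.

```r
# Hypothetical sample data; "DNP" is an assumed non-numeric entry.
csv_text <- "PTS,MP\n12,30\nDNP,0\n8,22"

# Keep text columns as character instead of converting them to factors.
stuckey <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(stuckey$PTS)  # character rather than factor
class(stuckey$MP)   # integer, because every entry is a whole number

# Convert explicitly; entries that are not numbers become NA.
point <- suppressWarnings(as.numeric(stuckey$PTS))
```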


つ低調成傷 2024-10-27 18:01:14

By default, read.csv inspects each column of your data to decide whether to treat it as numeric. If it finds any non-numeric values, it assumes the variable is character data, and (in R versions before 4.0.0, where stringsAsFactors defaults to TRUE) character variables are converted to factors.

It looks like the PTS and MP variables in your dataset contain non-numeric values, which is why you're getting unexpected results. You can force these variables to numeric with

point <- as.numeric(as.character(point))
time <- as.numeric(as.character(time))

but any values that can't be converted will become missing (NA). (The R FAQ gives a slightly different method for factor -> numeric conversion, but I can never remember what it is.)
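The conversion can be sketched as follows. The factor here is hypothetical (with "DNP" as an assumed non-numeric entry), built by hand so the example does not depend on the original file. It shows why calling as.numeric() directly on a factor is a trap, the conversion via as.character() used above, and the levels-based variant that the R FAQ describes.

```r
# Hypothetical factor, like the one an older read.csv would produce for a
# column containing a non-numeric entry ("DNP" is an assumed placeholder).
point <- factor(c("12", "DNP", "8", "25"))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the values printed on screen.
codes <- as.numeric(point)

# Going through character recovers the values; "DNP" becomes NA.
via_character <- suppressWarnings(as.numeric(as.character(point)))

# The R FAQ variant: convert the levels once, then index by the codes.
via_levels <- suppressWarnings(as.numeric(levels(point))[as.integer(point)])

# List the original entries that could not be converted.
bad_values <- as.character(point)[is.na(via_character)]
```

Inspecting bad_values is usually the quickest way to find out what is stopping a column from being read as numeric.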


从来不烧饼 2024-10-27 18:01:14

You can set this globally for all read.csv/read.* commands with:
options(stringsAsFactors = FALSE)

Then read the file as follows:
my.tab <- read.table("filename.csv", header = TRUE, sep = ",", as.is = TRUE)


数理化全能战士 2024-10-27 18:01:14

When importing CSV files, the import command should reflect both the separator between columns (;) and the decimal separator used in your numeric values (for a value written as 2,5 this would be ",").

The command for importing the CSV therefore has to be a bit more explicit, with more arguments:

stuckey <- read.csv2("C:/kalle/R/stuckey.csv", header=TRUE, sep=";", dec=",")

This should import all variables as either integer or numeric.
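A small self-contained sketch of this case, assuming the file really is semicolon-separated with decimal commas (the inline data below is hypothetical). read.csv2() already uses sep = ";" and dec = "," as its defaults, so spelling them out is optional.

```r
# Hypothetical semicolon-separated data with decimal commas.
csv_text <- "PTS;MP\n12;30,5\n8;22,25"

# read.csv2() defaults to sep = ";" and dec = ",".
stuckey <- read.csv2(text = csv_text)

# Check the class of each column after import.
sapply(stuckey, class)
```

If the same text were read with read.csv(), everything would land in a single column, which is one way to spot a separator mismatch.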


浮世清欢 2024-10-27 18:01:14

None of these answers mention the colClasses argument, which is another way to specify the variable classes in read.csv.

stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "numeric") # all variables to numeric

Or you can specify which columns to convert:

stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = c("PTS" = "numeric", "MP" = "numeric")) # specific columns to numeric

Note that without colClasses, a variable that can't be converted to numeric is read as a factor by default (in R versions before 4.0.0), which makes it harder to convert to a number; with colClasses = "numeric", such a value makes read.csv stop with an error instead. Therefore, it can be advisable to read all variables in as character with colClasses = "character" and then convert the specific columns to numeric once the CSV is read in:

stuckey <- read.csv("C:/kalle/R/stuckey.csv", colClasses = "character")
point <- as.numeric(stuckey$PTS)
time <- as.numeric(stuckey$MP)
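To illustrate the difference between the two approaches, here is a sketch on hypothetical inline data (the PLAYER column and the "DNP" entry are assumptions made for the example). Forcing a column to numeric fails outright when it contains a non-numeric entry, whereas reading as character and converting afterwards lets you see which rows are the problem.

```r
# Hypothetical sample data; "DNP" is an assumed non-numeric entry.
csv_text <- "PLAYER,PTS,MP\nA,12,30\nB,DNP,0\nC,8,22"

# Named colClasses: only PTS and MP are forced, PLAYER is guessed as usual.
# Because PTS contains "DNP", this call stops with an error, captured here.
forced <- tryCatch(
  read.csv(text = csv_text, colClasses = c(PTS = "numeric", MP = "numeric")),
  error = function(e) conditionMessage(e)
)

# Reading everything as character succeeds regardless of content.
stuckey <- read.csv(text = csv_text, colClasses = "character")

# Converting afterwards turns unparseable entries into NA.
point <- suppressWarnings(as.numeric(stuckey$PTS))
time <- as.numeric(stuckey$MP)

# Rows whose PTS value could not be converted.
stuckey[is.na(point), ]
```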


乖乖兔^ω^ 2024-10-27 18:01:14

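I'm also new to R and ran into exactly the same problem. Then I looked at my data and found that it was caused by my CSV file using a comma as the thousands separator in all numeric columns (e.g. 1,233,444.56 instead of 1233444.56).

I removed the comma separators from the CSV file and reloaded it into R. My data frame now recognizes all columns as numeric.

I'm sure there is a way to handle this within the read.csv function itself.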
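One way to handle this from within R, without editing the file, might look like the sketch below. It does not use a dedicated read.csv option; instead it reads the affected column as character and strips the separators with gsub(). The data and the column names ITEM and AMOUNT are hypothetical, and the sketch assumes the amounts are quoted in the file, since otherwise the commas would be parsed as field separators.

```r
# Hypothetical sample: amounts use "," as a thousands separator, so the
# fields are quoted to survive comma-separated parsing.
csv_text <- 'ITEM,AMOUNT\nA,"1,233,444.56"\nB,"2,500.00"\nC,"75.10"'

# Read the affected column as character so it is not turned into a factor.
sales <- read.csv(text = csv_text, colClasses = c(AMOUNT = "character"))

# Drop the thousands separators, then convert to numeric.
sales$AMOUNT <- as.numeric(gsub(",", "", sales$AMOUNT, fixed = TRUE))
```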


奢欲 2024-10-27 18:01:14

This only worked for me once I included strip.white = TRUE in the read.csv command.

(I found the solution here.)


南渊 2024-10-27 18:01:14

For me, the solution was to include skip = 0
(the number of lines to skip at the top of the file; it can be set to a value > 0)

mydata <- read.csv(file = "file.csv", header = TRUE, sep = ",", skip = 22)
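The effect of skip can be sketched on a small hypothetical export that has two lines of preamble above the real header (the contents are made up for illustration). Without skipping them, the preamble would be taken as the header and the numbers would not be parsed as intended.

```r
# Hypothetical export with two lines of preamble above the real header.
csv_text <- "Exported report\nGenerated for illustration only\nPTS,MP\n12,30\n8,22"

# skip = 2 drops the preamble, so the third line is read as the header.
mydata <- read.csv(text = csv_text, header = TRUE, sep = ",", skip = 2)

# Check the class of each column after import.
sapply(mydata, class)
```

The right value for skip depends on the file, so it is worth opening the file in a text editor and counting the lines before the header.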



About the author

蓝眼泪
