How to read large datasets in R

Posted 2024-09-27 18:01:23, 1453 characters, 7 views, 0 comments

Possible Duplicate:
Quickly reading very large tables as dataframes in R

Hi,

When trying to read a large dataset in R, the console displayed the following errors:

> data <- read.csv("UserDailyStats.csv", sep=",", header=T, na.strings="-", stringsAsFactors=FALSE)
> data = data[complete.cases(data),]
> dataset <- data.frame(
+   user_id = as.character(data[,1]),
+   event_date = as.character(data[,2]),
+   day_of_week = as.factor(data[,3]),
+   distinct_events_a_count = as.numeric(as.character(data[,4])),
+   total_events_a_count = as.numeric(as.character(data[,5])),
+   events_a_duration = as.numeric(as.character(data[,6])),
+   distinct_events_b_count = as.numeric(as.character(data[,7])),
+   total_events_b = as.numeric(as.character(data[,8])),
+   events_b_duration = as.numeric(as.character(data[,9]))
+ )
Error: cannot allocate vector of size 94.3 Mb
In addition: Warning messages:
1: In data.frame(user_msisdn = as.character(data[, 1]), calls_date = as.character(data[,  :
  NAs introduced by coercion
2: In data.frame(user_msisdn = as.character(data[, 1]), calls_date = as.character(data[,  :
  NAs introduced by coercion
3: In class(value) <- "data.frame" :
  Reached total allocation of 3583Mb: see help(memory.size)
4: In class(value) <- "data.frame" :
  Reached total allocation of 3583Mb: see help(memory.size)

Does anyone know how to read large datasets? The size of UserDailyStats.csv is approximately 2GB.


Comments (3)

感情废物 2024-10-04 18:01:23

Sure:

  1. Get a bigger computer, in particular one with more RAM
  2. Run a 64-bit OS so that you can actually use the extra RAM from 1)
  3. Read only the columns you need
  4. Read fewer rows
  5. Read the data in binary form rather than re-parsing 2 GB of text every time (which is mighty inefficient).

There is also a manual for this at the R site.
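
Points 3 to 5 can be sketched in base R roughly as follows. This is only a minimal illustration on a tiny made-up file, not the asker's data: the three column names and the temporary file paths are assumptions. It relies on `read.csv` skipping any column whose `colClasses` entry is `"NULL"`, on `nrows` to cap the number of rows, and on `saveRDS`/`readRDS` for the binary copy.

```r
# Build a small stand-in for UserDailyStats.csv (hypothetical data, 3 columns).
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(user_id      = c("a", "b", "c", "d"),
             day_of_week  = c("Mon", "Tue", "Wed", "Thu"),
             total_events = c(1, 2, 3, 4)),
  csv_path, row.names = FALSE
)

# Point 3: a column whose class is "NULL" is skipped while reading.
# Point 4: nrows caps the number of data rows that are read.
subset_df <- read.csv(csv_path,
                      colClasses = c("character", "NULL", "numeric"),
                      nrows = 3)

# Point 5: keep the parsed result in R's binary format so that later
# sessions can load it without parsing the CSV text again.
rds_path <- tempfile(fileext = ".rds")
saveRDS(subset_df, rds_path)
reloaded <- readRDS(rds_path)
```

On a real 2 GB file the same calls apply, but whether the result fits in memory still depends on how many columns and rows are kept.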

嘦怹 2024-10-04 18:01:23

You could try specifying the data type in the read.csv call using colClasses.

data<-read.csv("UserDailyStats.csv", sep=",", header=T, na.strings="-", stringsAsFactors=FALSE, colClasses=c("character","character","factor",rep("numeric",6)))

Though with a dataset of this size it may still be problematic, and there won't be a great deal of memory left for any analysis you may want to do. Adding RAM and using 64-bit computing would provide more flexibility.
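
If the column types are not known in advance, one possible approach (a sketch, not something from the answer above) is to read a small sample of rows, look at the classes R infers, and pass those as `colClasses` for the full read. The file contents below are made up, and widening `integer` to `numeric` is a cautious assumption in case later rows contain decimals.

```r
# Hypothetical miniature of the CSV; "-" marks a missing value as in the question.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(user_id           = c("u1", "u2", "u3"),
             day_of_week       = c("Mon", "Tue", "Wed"),
             events_a_duration = c("10", "-", "12.5")),
  csv_path, row.names = FALSE, quote = FALSE
)

# Read only the first rows and record the class inferred for each column.
sample_rows <- read.csv(csv_path, nrows = 2, na.strings = "-",
                        stringsAsFactors = FALSE)
col_classes <- sapply(sample_rows, class)

# Assumption: treat integer columns as numeric, because rows beyond the
# sample may hold decimal values that an integer column would reject.
col_classes[col_classes == "integer"] <- "numeric"

# Full read with explicit classes, so no type guessing is needed.
full <- read.csv(csv_path, na.strings = "-", stringsAsFactors = FALSE,
                 colClasses = unname(col_classes))
```

A sample can mislead (for example, a column that is entirely missing in the first rows is inferred as `logical`), so the inferred classes are worth checking by eye before the full read.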

空城之時有危險 2024-10-04 18:01:23

If this is the output from the console, then the data was read successfully and the problem is in the transformations.

If you work interactively, then after read.csv save your data with save(data, file="data.RData"), close R, start a fresh instance, load the data with load("data.RData"), and see if it fails.

But from these error messages I see that you have a problem with the conversion, so you should look at that.
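
One way to look into the "NAs introduced by coercion" warnings is to list the raw strings that fail to convert. The sketch below uses a made-up vector rather than the asker's columns, and `find_unparseable` is a hypothetical helper name introduced only for this illustration.

```r
# Hypothetical helper: return the distinct non-missing strings that
# as.numeric() cannot parse (these are what trigger the coercion warning).
find_unparseable <- function(x) {
  x <- as.character(x)
  converted <- suppressWarnings(as.numeric(x))
  unique(x[!is.na(x) & is.na(converted)])
}

# Made-up example column containing two values that are not numbers.
raw_values <- c("12", "3.5", "-", "N/A", NA)
bad_values <- find_unparseable(raw_values)
```

Applied to a column such as `data[,4]`, the result shows which markers would need to be added to `na.strings` or cleaned before converting.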
