How to read large datasets in R
Possible Duplicate:
Quickly reading very large tables as dataframes in R
Hi,
While trying to read a large dataset in R, the console displayed the following errors:
> data <- read.csv("UserDailyStats.csv", sep=",", header=T, na.strings="-", stringsAsFactors=FALSE)
> data = data[complete.cases(data),]
> dataset <- data.frame(user_id = as.character(data[,1]),
+                       event_date = as.character(data[,2]),
+                       day_of_week = as.factor(data[,3]),
+                       distinct_events_a_count = as.numeric(as.character(data[,4])),
+                       total_events_a_count = as.numeric(as.character(data[,5])),
+                       events_a_duration = as.numeric(as.character(data[,6])),
+                       distinct_events_b_count = as.numeric(as.character(data[,7])),
+                       total_events_b = as.numeric(as.character(data[,8])),
+                       events_b_duration = as.numeric(as.character(data[,9])))
Error: cannot allocate vector of size 94.3 Mb
In addition: Warning messages:
1: In data.frame(user_msisdn = as.character(data[, 1]), calls_date = as.character(data[, :
NAs introduced by coercion
2: In data.frame(user_msisdn = as.character(data[, 1]), calls_date = as.character(data[, :
NAs introduced by coercion
3: In class(value) <- "data.frame" :
Reached total allocation of 3583Mb: see help(memory.size)
4: In class(value) <- "data.frame" :
Reached total allocation of 3583Mb: see help(memory.size)
Does anyone know how to read large datasets? The size of UserDailyStats.csv is approximately 2GB.
Comments (3)
Sure:
There is also a manual for this at the R site.
You could try specifying the data types in the read.csv call using colClasses. Even so, a dataset of this size may still be problematic, and there would not be much memory left for any analysis you may want to do. Adding RAM and using 64-bit computing would provide more flexibility.
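As a rough sketch of the colClasses suggestion: the column types below are only inferred from the conversions in the question, and the two-row file is a made-up stand-in for UserDailyStats.csv so that the snippet runs on its own. Declaring the types up front lets read.csv skip type guessing and should make the second, converted copy of the data unnecessary.

```r
# Tiny stand-in file (hypothetical rows); the real UserDailyStats.csv
# would be read instead.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  paste("user_id", "event_date", "day_of_week",
        "distinct_events_a_count", "total_events_a_count",
        "events_a_duration", "distinct_events_b_count",
        "total_events_b", "events_b_duration", sep = ","),
  "001,2012-01-02,Mon,3,10,120.5,1,2,30",
  "002,2012-01-03,Tue,-,4,60,2,5,45.5"
), tmp)

# Assumed types: two character columns, one factor, six numeric columns.
col_types <- c("character", "character", "factor", rep("numeric", 6))

# "-" is read as NA, as in the original call; columns arrive already typed.
dataset <- read.csv(tmp, header = TRUE, na.strings = "-",
                    colClasses = col_types)

# Drop incomplete rows, as the question does.
dataset <- dataset[complete.cases(dataset), ]
str(dataset)
```

With the types declared this way, user_id is kept as text, so any leading zeros survive, and no as.numeric(as.character(...)) pass is needed afterwards.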
If this is console output, then the data was read successfully and the problem is in the transformations.
If you work interactively, then after read.csv save your data with save(data, file="data.RData"), close R, start a fresh instance, load the data with load("data.RData"), and see whether it still fails. But from the error messages I can see that you have a problem with the conversion, so you should look into that.
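A minimal sketch of both suggestions, under stated assumptions: the small data frame, the "n/a" value, and the temporary file path are invented for illustration, and rm() only imitates restarting R. The second part shows one way to see which raw values trigger the "NAs introduced by coercion" warning.

```r
# Stand-in for the data frame returned by read.csv() (hypothetical values).
data <- data.frame(
  user_id = c("001", "002", "003"),
  total_events_a_count = c("10", "n/a", "7"),
  stringsAsFactors = FALSE
)

# 1. Save the freshly read data so that a new session can start from it.
rdata_path <- file.path(tempdir(), "data.RData")  # hypothetical location
save(data, file = rdata_path)

# In practice R would be closed and restarted here; removing the object
# only imitates that within a single session.
rm(data)
load(rdata_path)  # restores the object under its original name, data

# 2. Find the raw values that as.numeric() cannot parse.
raw <- data$total_events_a_count
converted <- suppressWarnings(as.numeric(raw))
bad <- raw[is.na(converted) & !is.na(raw)]
print(unique(bad))
```

Any strings that show up in bad could then be added to na.strings in the read.csv call, or cleaned before converting.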