Compressing large data in R into a CSV without NULLs or lists
First time posting:
I'm preparing data for read.transactions() from the arules package
and need to compress the invoice data (500k+ cases) so that each unique invoice and its associated info fits on a single line, like this:
Invoice001,CustomerID,Country,StockCodeXYZ,StockCode123
Invoice002...etc
However, the data reads in with the invoice repeated for each StockCode,
like this:
Invoice001,CustomerID,Country,StockCodeXYZ
Invoice001,CustomerID,Country,StockCode123
Invoice002....etc
I've been trying pivot_wider() and then unite(), but that generates 285M+ mostly NULL cells in a list, which I'm having a hard time resolving and cannot write to CSV or read into arules. I've also tried keep(~!is.null(.)), discard(is.null), and compact() without success, and I'm open to any method that achieves the desired outcome above.
However, I feel like I should be able to solve this with the read.transactions() function built into arules, but I'm getting various errors as I try different things there too.
The data is open data from the University of California, Irvine, and can be found here: https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx
Any help would be greatly appreciated.
library(readxl)
library(arules)  # provides read.transactions()
url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx"
destfile <- "Online_20Retail.xlsx"
curl::curl_download(url, destfile)
Online_20Retail <- read_excel(destfile)
trans <- read.transactions(????????????)
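One possible way to fill in the read.transactions() call is to skip the wide reshape entirely and hand arules the long data as it is. This is a rough sketch, not something verified against the full dataset. With format = "single", read.transactions() expects one (transaction id, item id) pair per line, which is the shape the sheet already has. The column names InvoiceNo and StockCode are assumed here, and the small data frame is a made-up stand-in for Online_20Retail.

```r
library(arules)

# Hypothetical toy data standing in for Online_20Retail; the column names
# InvoiceNo and StockCode are assumptions about the real sheet.
retail <- data.frame(
  InvoiceNo = c("Invoice001", "Invoice001", "Invoice002", "Invoice002", "Invoice002"),
  StockCode = c("StockCodeXYZ", "StockCode123", "StockCode123", "StockCodeABC", "StockCodeABC"),
  stringsAsFactors = FALSE
)

# Write only the two columns arules needs, still one row per invoice line.
long_file <- tempfile(fileext = ".csv")
write.csv(retail[, c("InvoiceNo", "StockCode")], long_file, row.names = FALSE)

# format = "single": each line is a (transaction id, item id) pair.
# cols given as names requires header = TRUE.
trans <- read.transactions(
  long_file,
  format = "single",
  header = TRUE,
  sep    = ",",
  cols   = c("InvoiceNo", "StockCode")
)

# Alternative without a file: split item ids by invoice and coerce the list.
# unique() drops repeated stock codes within an invoice before coercion.
baskets <- lapply(split(retail$StockCode, retail$InvoiceNo), unique)
trans_from_list <- as(baskets, "transactions")

summary(trans)
```

Either route stores the result as a sparse item matrix, so the empty cells that pivot_wider() produces are never created. CustomerID and Country are left out on purpose, because anything placed on a basket line is treated as an item.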
Comments (1)
This one invoice, "573585", has over 1,000 items, so it will generate the corresponding number of columns even if you only take the stock codes from the invoice items. That still leaves a bit over 1,000 columns.
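To see how much padding a one-row-per-invoice table would need, the items per invoice can be counted directly. This is only a sketch on made-up data: the column names InvoiceNo and StockCode are assumed, and the values are hypothetical. It uses base R only, and it also writes ragged basket lines, one invoice per line with no padding, which is the layout read.transactions(format = "basket", sep = ",") reads.

```r
# Hypothetical toy data; the real sheet is assumed to have the same two columns.
retail <- data.frame(
  InvoiceNo = c("A", "A", "A", "B", "C", "C"),
  StockCode = c("s1", "s2", "s3", "s1", "s2", "s2"),
  stringsAsFactors = FALSE
)

# Distinct stock codes per invoice.
items_by_invoice  <- lapply(split(retail$StockCode, retail$InvoiceNo), unique)
items_per_invoice <- lengths(items_by_invoice)

# The widest invoice sets the column count of a rectangular wide table.
widest <- max(items_per_invoice)

# Cells that would be empty padding in that rectangular table.
padding_cells <- length(items_per_invoice) * widest - sum(items_per_invoice)

# Ragged alternative: one comma-separated line per invoice, no padding.
basket_lines <- vapply(items_by_invoice, paste, character(1), collapse = ",")
basket_file  <- tempfile(fileext = ".csv")
writeLines(basket_lines, basket_file)

print(items_per_invoice)
print(c(widest = widest, padding_cells = padding_cells))
```

On the real data, the invoice with over 1,000 items would set the widest value, which is why the rectangular table ends up with so many empty cells.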