Converting big data to "transactions" from the arules package

Posted on 2024-12-02 04:03:54

The arules package in R uses the class 'transactions'. So, in order to use the function apriori(), I need to convert my existing data. I have a matrix with 2 columns and roughly 1.6 million rows, and I tried to convert the data like this:

transaction_data <- as(split(original_data[,"id"], original_data[,"type"]), "transactions")

where original_data is my data matrix. Because of the amount of data, I used the largest AWS machine, with 64 GB of RAM. After a while I got this error:

resulting vector exceeds vector length limit in 'AnswerType'

The memory usage of the machine was still 'only' at 60%. Is this an R-based limitation? Is there any way to work around it other than sampling? When I used only 1/4 of the data, the transformation worked fine.

Edit: As pointed out, one of the variables was a factor instead of a character vector. After converting it to character, the transformation ran quickly and correctly.
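
For anyone hitting the same error, the fix can be sketched roughly as follows. This is a minimal illustration, not the exact code that was run: the toy values are made up, and it assumes original_data is a data frame whose "id" column was read in as a factor. Only the column names and the split()/as() call are taken from the question.

```r
# Minimal sketch with hypothetical toy data; the column names "id" and "type"
# follow the question, everything else is an assumption for illustration.
library(arules)

original_data <- data.frame(
  id   = factor(c("a", "b", "a", "c", "b")),  # factor column, as in the failing case
  type = c("t1", "t1", "t2", "t2", "t3"),
  stringsAsFactors = FALSE
)

# Check the column types first; a factor here is the thing to look for.
str(original_data)

# Convert the factor to a plain character vector before splitting.
original_data$id <- as.character(original_data$id)

# Same conversion as in the question, now on character data:
# one transaction per "type", containing the "id" values seen for it.
item_list <- split(original_data[, "id"], original_data[, "type"])
transaction_data <- as(item_list, "transactions")

# Inspect the result before passing it to apriori().
summary(transaction_data)
```

Running str() on the real data before the conversion is a cheap way to confirm whether a column is a factor; if the data comes from read.csv(), passing stringsAsFactors = FALSE avoids the problem at load time.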
