Comparing datasets in R
I have gathered a set of transactions in a CSV file of the format:
{Pierre, lait, oeuf, beurre, pain}
{Paul, mange du pain,jambon, lait}
{Jacques, oeuf, va chez la crémière, pain, voiture}
I plan to do a simple association rule analysis, but first I want to exclude from each transaction any items that do not belong to ReferenceSet = {lait, oeuf, beurre, pain}.
Thus, in my example, my resulting dataset would be:
{Pierre, lait, oeuf, beurre, pain}
{Paul, lait}
{Jacques, oeuf, pain}
I'm sure this is quite simple, but would love to read suggestions/answers to help me a bit.
Comments (3)
Another answer references %in%, but in this case intersect is even handier (you may want to look at match, too, though I think it's documented in the same place as %in%). With lapply and intersect we can make the answer into a one-liner:
Data:
Answer:
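As a minimal sketch of that lapply and intersect approach (not the original poster's exact code), assume each transaction is held as a character vector in a list named by customer. The names transactions, reference_set, and filtered are illustrative only:

```r
# Hypothetical data: one character vector of items per customer,
# taken from the example in the question.
transactions <- list(
  Pierre  = c("lait", "oeuf", "beurre", "pain"),
  Paul    = c("mange du pain", "jambon", "lait"),
  Jacques = c("oeuf", "va chez la crémière", "pain", "voiture")
)
reference_set <- c("lait", "oeuf", "beurre", "pain")

# For each transaction, keep only the items that also occur in reference_set.
# lapply passes reference_set as the second argument to intersect.
filtered <- lapply(transactions, intersect, reference_set)

print(filtered)
```

Note that intersect also drops duplicated items within a transaction, which is usually what you want for association rules but is worth knowing about.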
One way is as follows. Because I'm keeping the structure as a matrix, I've left NAs where data has been removed (these could be dropped when exporting back to CSV). I'm also sure it's possible to do this without loops, which would make it faster (but, IMHO, less readable), and I'm sure there's a more efficient way to do the logic too. I'd also be interested in seeing someone else's view on this.
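A rough sketch of what such a loop-based matrix approach could look like is given below. It assumes the CSV has been read into a character matrix with the customer in the first column and shorter rows padded with NA; the names m and reference_set are illustrative, not from the original answer:

```r
# Hypothetical matrix built from the question's example:
# column 1 is the customer, remaining columns are items (NA = padding).
m <- rbind(
  c("Pierre",  "lait",          "oeuf",                "beurre", "pain"),
  c("Paul",    "mange du pain", "jambon",              "lait",   NA),
  c("Jacques", "oeuf",          "va chez la crémière", "pain",   "voiture")
)
reference_set <- c("lait", "oeuf", "beurre", "pain")

# Walk every item cell (skipping the customer column) and blank out
# anything that is not in the reference set.
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))[-1]) {
    if (!(m[i, j] %in% reference_set)) {
      m[i, j] <- NA
    }
  }
}

print(m)
```

The matrix keeps its shape, so removed items show up as NA rather than shifting the remaining items left.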
The %in% operator will come in handy.
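For illustration, here is a small sketch of %in% applied to a single transaction from the question. The variable names items, reference_set, and keep are assumptions made for this example:

```r
reference_set <- c("lait", "oeuf", "beurre", "pain")

# Paul's items from the question (customer name left out).
items <- c("mange du pain", "jambon", "lait")

# %in% returns one logical per element of items:
# TRUE where the item occurs in reference_set.
keep <- items %in% reference_set

# Logical indexing keeps only the matching items.
kept_items <- items[keep]
print(kept_items)
```

Unlike intersect, this keeps duplicates and preserves the original positions, which can matter if item counts are needed later.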