Comparing datasets in R

Posted on 2024-12-26 19:28:43

I have gathered a set of transactions in a CSV file of the format:

{Pierre, lait, oeuf, beurre, pain}
{Paul, mange du pain,jambon, lait}
{Jacques, oeuf, va chez la crémière, pain, voiture}

I plan to do a simple association rule analysis, but first I want to exclude from each transaction the items that do not belong to ReferenceSet = {lait, oeuf, beurre, pain}.

Thus, in my example, the resulting dataset would be:

{Pierre, lait, oeuf, beurre, pain}
{Paul, lait}
{Jacques, oeuf, pain}

I'm sure this is quite simple, but would love to read suggestions/answers to help me a bit.

Comments (3)

偷得浮生 2025-01-02 19:28:43

Another answer uses %in%, but in this case intersect is even handier. (You may want to look at match, too; it is documented on the same help page as %in%.) With lapply and intersect, the answer becomes a one-liner:

Data:

> L <- list(pierre=c("lait","oeuf","beurre","pain") ,
+           paul=c("mange du pain", "jambon", "lait"),
+           jacques=c("oeuf","va chez la crémière", "pain", "voiture"))
> reference <- c("lait", "oeuf", "beurre", "pain")

Answer:

> lapply(L,intersect,reference)
$pierre
[1] "lait"   "oeuf"   "beurre" "pain"  

$paul
[1] "lait"

$jacques
[1] "oeuf" "pain"
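
One difference is worth knowing before choosing between the two approaches: intersect treats its arguments as sets and drops duplicates, whereas subsetting with %in% keeps every occurrence. A minimal sketch, using a made-up transaction (the `basket` vector is an illustrative assumption, not part of the original data):

```r
reference <- c("lait", "oeuf", "beurre", "pain")

# Hypothetical transaction in which "lait" appears twice
basket <- c("lait", "jambon", "lait", "pain")

# intersect() works on sets: each matching item appears once
as_set <- intersect(basket, reference)

# %in% filters element by element: repeated items are kept
as_multiset <- basket[basket %in% reference]

print(as_set)
print(as_multiset)
```

If a transaction is meant to be a set of distinct items, which is the usual assumption for association rules, the de-duplication is probably what you want anyway.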
蹲在坟头点根烟 2025-01-02 19:28:43

One way is as follows. Because I keep the structure as a matrix, NAs are left where data has been removed (these could be dropped when exporting back to CSV). I'm sure it's possible to do this without loops, which would make it faster (but, IMHO, less readable), and there is probably a more efficient way to express the logic too. I'd be interested in seeing someone else's view on this.

ref <- c("lait","oeuf","beurre","pain")
input <- read.csv("info.csv",sep=",",header=FALSE,strip.white=TRUE)

> input
       V1            V2                  V3     V4      V5
1  Pierre          lait                oeuf beurre    pain
2    Paul mange du pain              jambon   lait        
3 Jacques          oeuf va chez la crémière   pain voiture

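input <- as.matrix(input)
# Same shape as the input; cells that are never filled stay NA
output <- matrix(nrow=nrow(input),ncol=ncol(input))

for(i in seq_len(nrow(input))) {
  j <- 2                      # next free column in this output row
  output[i,1] <- input[i,1]   # column 1 holds the customer name
  for(k in 2:ncol(input)) {
    # keep the item only if it belongs to the reference set
    if(input[i,k] %in% ref){
      output[i,j] <- input[i,k]
      j <- j+1
    }
  }
}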

> output
     [,1]      [,2]   [,3]   [,4]     [,5]  
[1,] "Pierre"  "lait" "oeuf" "beurre" "pain"
[2,] "Paul"    "lait" NA     NA       NA    
[3,] "Jacques" "oeuf" "pain" NA       NA    
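
The variant without explicit for loops mentioned above could look roughly like this. It is only a sketch under some assumptions: the matrix is typed in by hand to mirror the printed input (so no info.csv is needed), the name `kept` is made up for illustration, and the result is a named list instead of an NA-padded matrix, which is a design change from the code above.

```r
ref <- c("lait", "oeuf", "beurre", "pain")

# Hand-typed stand-in for as.matrix(read.csv("info.csv", ...))
input <- matrix(c("Pierre",  "lait",          "oeuf",                "beurre", "pain",
                  "Paul",    "mange du pain", "jambon",              "lait",   "",
                  "Jacques", "oeuf",          "va chez la crémière", "pain",   "voiture"),
                nrow = 3, byrow = TRUE)

# Drop the name column, then keep only the reference items of each row
items <- input[, -1, drop = FALSE]
kept <- lapply(seq_len(nrow(items)), function(i) items[i, items[i, ] %in% ref])
names(kept) <- input[, 1]

print(kept)
```

A list avoids the NA padding because each transaction can have its own length; lapply still iterates over the rows internally, so the gain is mainly in brevity.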
空城缀染半城烟沙 2025-01-02 19:28:43

The %in% operator will come in handy.

pierre <- c("lait","oeuf","beurre","pain")  
paul <- c("mange du pain", "jambon", "lait")  
jacques <- c("oeuf","va chez la crémière", "pain", "voiture")

reference <- c("lait", "oeuf", "beurre", "pain")

pierre_fixed <- pierre[pierre %in% reference]
paul_fixed <- paul[paul %in% reference]
jacques_fixed <- jacques[jacques %in% reference]  

pierre_fixed 
paul_fixed
jacques_fixed
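
With more than a handful of customers, writing one vector per person does not scale. The same %in% filter can be applied to every line of the file at once. The sketch below assumes that each line holds a name followed by comma-separated items, and that the braces in the question are only notation; `raw_lines` is a hypothetical stand-in for what readLines would return from the real file.

```r
reference <- c("lait", "oeuf", "beurre", "pain")

# Stand-in for readLines("<your file>"); the content is copied from the question
raw_lines <- c("Pierre, lait, oeuf, beurre, pain",
               "Paul, mange du pain,jambon, lait",
               "Jacques, oeuf, va chez la crémière, pain, voiture")

# Split each line on commas and trim the surrounding whitespace
fields <- lapply(strsplit(raw_lines, ",", fixed = TRUE), trimws)

# The first field is the name; the remaining fields are filtered with %in%
transactions <- lapply(fields, function(f) f[-1][f[-1] %in% reference])
names(transactions) <- vapply(fields, `[`, character(1), 1)

print(transactions)
```

The result is a named list with one character vector per customer, the same shape as the list L used in the intersect answer.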