Finding duplicate values in R

Posted on 2025-02-11 01:14:38

I have a table called Data with 10 columns, including columns named suitcases, columns, rows, and objects.

In the suitcases column, the values go from 1 to 10.

In the columns column, the values go from 1 to 20.

In the rows column, the values go from 1 to 20.

The numbering of values in the objects column starts again from 1 for each suitcase.

I tried the following method (a similar one appeared on the forum for a different question):

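library(tidyverse)

# Keep the rows whose value in column `var` (given as a string) occurs exactly twice.
duplicates <- function(data, var) {
  data |>
    add_count(!!sym(var)) |>
    filter(n == 2) |>
    select(-n)
}

# Attempted per-suitcase loop; x is never used inside the call.
for (x in unique(Data$suitcases)) {
  duplicates(Data, "objects")
}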

I want to get a new table containing only those rows whose value in the objects column occurs exactly twice (and not more), taking into account that the object numbering restarts in each suitcase, as well as the values in the columns and rows columns.
Because of the renumbering, the same object number may reappear in subsequent suitcases (even when the values in columns and rows are the same).

Unfortunately, I have no idea how to take the restarting numbering into account. I am therefore asking the forum for help, and for indulgence if the question is not well-formed; I am new here.

Example_data

Example_output

Data <- structure(list(rows = c(6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 5L, 
    5L, 5L, 5L, 6L, 6L), columns = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 6L, 6L, 6L, 6L, 6L, 3L, 3L), time.min = c(5L, 0L, 5L, 0L, 
    0L, 5L, 5L, 5L, 0L, 2L, 5L, 0L, 2L, 10L, 10L), status = c(38L, 
    66L, 57L, 38L, 57L, 20L, 20L, 3L, 58L, 58L, 14L, 14L, 5L, 5L, 
    27L), postion = c(38L, 17L, 6L, 7L, 31L, 31L, 32L, 21L, 2L, 67L, 
    1L, 31L, 6L, 35L, 37L), x = c(58L, 14L, 14L, 14L, 68L, 12L, 27L, 
    448L, 981L, 860L, 147L, 417L, 884L, 417L, 884L), y = c(216L, 
    212L, 483L, 520L, 234L, 515L, 521L, 795L, 93L, 668L, 75L, 787L, 
    310L, 827L, 144L), z = c(38L, 66L, 57L, 38L, 57L, 20L, 20L, 1L, 
    7L, 6L, 981L, 147L, 781L, 417L, 884L), suitcases = c(3L, 3L, 
    3L, 2L, 7L, 7L, 7L, 7L, 5L, 1L, 4L, 3L, 3L, 10L, 10L), objects = c(6L, 
    1L, 6L, 22L, 5L, 14L, 27L, 14L, 1L, 14L, 1L, 26L, 5L, 4L, 4L)), class = "data.frame", row.names = c(NA, 
    -15L))
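
To make the effect of the restarting numbering concrete, the following sketch counts object numbers in two ways on the example data. It assumes dplyr is installed; the data frame `keys` and the result names `by_object` and `by_suitcase` are illustrative and not part of the original post, and only the four key columns are reproduced.

```r
library(dplyr)

# Only the four key columns of the example data (values copied from the dput above).
keys <- data.frame(
  rows      = c(6, 6, 6, 6, 6, 6, 7, 7, 7, 5, 5, 5, 5, 6, 6),
  columns   = c(3, 3, 3, 3, 3, 3, 3, 3, 6, 6, 6, 6, 6, 3, 3),
  suitcases = c(3, 3, 3, 2, 7, 7, 7, 7, 5, 1, 4, 3, 3, 10, 10),
  objects   = c(6, 1, 6, 22, 5, 14, 27, 14, 1, 14, 1, 26, 5, 4, 4)
)

# Counting object numbers across the whole table mixes suitcases together.
by_object <- keys |>
  count(objects) |>
  filter(n == 2)

# Counting within each suitcase keeps each suitcase's numbering separate.
by_suitcase <- keys |>
  count(suitcases, objects) |>
  filter(n == 2)

print(by_object)
print(by_suitcase)
```

On this sample, the table-wide count reports object 5 as occurring twice even though the two occurrences sit in suitcases 7 and 3, and it misses object 14, which occurs twice in suitcase 7 and once more in suitcase 1. Counting per suitcase avoids both effects.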

Comments (1)

一直在等你来 2025-02-18 01:14:38

You can approach this through grouping and filtering. Just note that your expected output is a bit unclear about the grouping order (or resetting the numbering, as you put it): different groupings can give the same result on the provided sample, but on your real dataset you might expect something else:

library(dplyr)

# Keep rows whose (rows, columns, suitcases, objects) combination occurs exactly twice.
Data %>%
  group_by(rows, columns, suitcases, objects) %>%
  filter(n() == 2) %>%
  ungroup()

Result:

#> # A tibble: 4 × 10
#>    rows columns time.min status postion     x     y     z suitcases objects
#>   <int>   <int>    <int>  <int>   <int> <int> <int> <int>     <int>   <int>
#> 1     6       3        5     38      38    58   216    38         3       6
#> 2     6       3        5     57       6    14   483    57         3       6
#> 3     6       3       10      5      35   417   827   417        10       4
#> 4     6       3       10     27      37   884   144   884        10       4
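
Since the choice of grouping columns decides what counts as a duplicate, it may help to compare two groupings side by side. The sketch below wraps the group-and-filter pattern in a helper; `keep_n_times`, `keys`, and the `id` column are hypothetical names introduced for illustration, and only the key columns of the example data are reproduced.

```r
library(dplyr)

# Hypothetical helper (not from the original post): keep the rows whose
# combination of values in the columns named in `vars` occurs exactly `times` times.
keep_n_times <- function(data, vars, times = 2) {
  data |>
    group_by(across(all_of(vars))) |>
    filter(n() == times) |>
    ungroup()
}

# Key columns of the example data, plus a row id so the kept rows can be identified.
keys <- data.frame(
  id        = 1:15,
  rows      = c(6, 6, 6, 6, 6, 6, 7, 7, 7, 5, 5, 5, 5, 6, 6),
  columns   = c(3, 3, 3, 3, 3, 3, 3, 3, 6, 6, 6, 6, 6, 3, 3),
  suitcases = c(3, 3, 3, 2, 7, 7, 7, 7, 5, 1, 4, 3, 3, 10, 10),
  objects   = c(6, 1, 6, 22, 5, 14, 27, 14, 1, 14, 1, 26, 5, 4, 4)
)

# Same grouping as in the answer: position and suitcase must all match.
strict <- keep_n_times(keys, c("rows", "columns", "suitcases", "objects"))

# Looser grouping: only the suitcase and the object number must match.
loose <- keep_n_times(keys, c("suitcases", "objects"))

strict$id  # 1 3 14 15
loose$id   # 1 3 6 8 14 15
```

The looser grouping additionally keeps rows 6 and 8 (object 14 in suitcase 7), which share a suitcase but differ in the rows column. Which of the two results is wanted depends on whether position is part of what makes an object the same object.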