SQL:查找组内的重复条目(元素数量可变)

发布于 2024-12-19 09:31:41 字数 952 浏览 1 评论 0原文

我有一个由 group_id (PK) 和其他一些字段定义的组表 (GROUPS)。 每个组可以由可变数量的元素及其值组成。该组组合存储在第二个表(GROUP_COMPOSITION)中,该表具有 PK 字段(计数器)、group_id 字段、元素名称字段和元素名称值字段。

例如:

Table of groups:
groupId
g1
g2

Table of Group composition:

PK     groupID       Element_Name    Element_Value
1      g1            Material        A
2      g1            Temperature     37
3      g2            Color           white
4      g2            Temperature     50
5      g2            Material        B          
6      g3            Material        C
7      g4            Color           Red

因此,如果尝试插入由材料=B、颜色=白色和温度=50专门定义的“新组”(g5),我想将其标识为重复组(g2)。

我想防止在第二个表中重复插入“组组成”,其中组组成由元素总数及其值定义。

我正在考虑对要插入的所有可能元素进行 INTERSECT 查询,但不确定这是否是最佳方法。

这与帖子 SQL to find重复条目(在组内)相关,但在这种情况下,查找重复插入的标准是基于元素的数量(而不是其性质)。

我真的很感激任何帮助

谢谢

I have a table of groups (GROUPS) defined by the group_id (PK) and some other fields.
Each group may consist of a variable number of elements and their values. This group composition is stored in a second table (GROUP_COMPOSITION) that has a PK field (counter),a field for the group_id, a field for the element name and a field for the value of the element name.

For example:

Table of groups:
groupId
g1
g2

Table of Group composition:

PK     groupID       Element_Name    Element_Value
1      g1            Material        A
2      g1            Temperature     37
3      g2            Color           white
4      g2            Temperature     50
5      g2            Material        B          
6      g3            Material        C
7      g4            Color           Red

So, if trying to insert a "new group" (g5) exclusively defined by Material=B and Color=white and Temperature =50, i would like to identify it as a repeated group (g2).

I would like to prevent duplicate "group composition" insertion in the second table, where group composition is defined by the total number of elements and their values.

I am thinking of a INTERSECT query for all posible elements to be inserted, but not sure whether this is the optimal way.

This is related to the post SQL to find duplicate entries (within a group), but in this case, the criteria to find duplicate insertion was based on the number of elements (not on their nature).

I would really apprettiate any help

Thanks

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

如梦初醒的夏天 2024-12-26 09:31:41

我想我会构建一个函数并走交叉路口路线。我想您还可以构建一个字符串来将每个组与表示所有组数据的值关联起来。然后将相同的函数应用于您的候选组并检查元素是否匹配。下面是一个可能聚合的 Postgresql 示例:

SELECT g.groupid, array_to_string(g.element_array, ',') elements
  FROM (SELECT o.groupid, array_agg(o.element_name ||'='|| o.element_value) AS element_array
      FROM (SELECT groupid, element_name, element_value
              FROM composition
              ORDER BY 1, 2) o
      GROUP BY groupid) g
  ORDER BY groupid


 groupid |          elements
---------+---------------------------------------
 g1      | Material=A,Temperature=37
 g2      | Color=white,Material=B,Temperature=50
 g3      | Material=C
 g4      | Color=Red

内部排序是为了确保它生成一致。似乎 Oracle 11gR2 有一个用于字符串连接的 LISTAGG 函数,这可能很有用。或者您可以构建自己的聚合函数来执行此操作。如果此数据相对静态,您可能希望在插入时预先计算和存储,而不是在每次查询时重新生成。

I think I would build a function and go the intersection route. I guess you could also build a string to associate each group with a value representing all the group data. Then apply the same function to your candidate group and check for a match on the elements. Here is a Postgresql example of a possible aggegation:

SELECT g.groupid, array_to_string(g.element_array, ',') elements
  FROM (SELECT o.groupid, array_agg(o.element_name ||'='|| o.element_value) AS element_array
      FROM (SELECT groupid, element_name, element_value
              FROM composition
              ORDER BY 1, 2) o
      GROUP BY groupid) g
  ORDER BY groupid


 groupid |          elements
---------+---------------------------------------
 g1      | Material=A,Temperature=37
 g2      | Color=white,Material=B,Temperature=50
 g3      | Material=C
 g4      | Color=Red

The inner ordering is to ensure it generates consistently. Seems Oracle 11gR2 has a LISTAGG function for string concatenation which could be useful. Or you could build your own aggregate function to do this. If this data is relatively static, you might want to pre-compute and store on insert rather than regenerate with each query.

一口甜 2024-12-26 09:31:41

既然你的数据似乎表明这些群体的财产范围相当稳定,为什么不让他们成为一等公民并将他们提升为专栏呢?

groupid | Material | Temperature | Color
--------+----------+-------------+-----------
g1      | A        | 37          | <null>
g2      | B        | 50          | White
g3      | C        | <null>      | <null>
g4      | <null>   | <null>      | Red

如果您愿意的话,根据需要动态添加列并不会太难,但我有点好奇您想如何将组定义为“重复”?每当新组的属性在现有组中至少有一个对应的属性时?如果是这样,查询将非常简单:

select groupid from table where
Material=B and
Color=White

如果有任何返回的行,那么您已经拥有一个至少具有这些属性的组。

Since your data seems to indicate there is a fairly stable range of properties to these groups, why not make them first-class citizens instead and promote them to columns?

groupid | Material | Temperature | Color
--------+----------+-------------+-----------
g1      | A        | 37          | <null>
g2      | B        | 50          | White
g3      | C        | <null>      | <null>
g4      | <null>   | <null>      | Red

It wouldn't be too hard to dynamically add columns as needed if you were so inclined, but I'm a little curious as to how you want to define a group as "repeated"? Whenever the attributes of a new group has at least one corresponding atribute in the existing group? If so, the query would be really simple:

select groupid from table where
Material=B and
Color=White

If there are any returned rows, then you already have a group with at least those properties.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文