一种仅允许一个唯一输入的聚合函数

发布于 2024-10-07 18:18:53 字数 2359 浏览 1 评论 0原文

我经常发现自己在 group by 子句中添加了我确信是唯一的表达式。有时事实证明我错了——因为我的 SQL 中存在错误或错误的假设,并且该表达式并不是真正唯一的。

在很多情况下,我宁愿这会生成 SQL 错误,也不愿默默地(有时非常巧妙地)扩展我的结果集。

我希望能够做类似的事情:

select product_id, unique description from product group by product_id

但显然我自己无法实现这一点 - 但可以使用某些数据库上的用户定义的聚合来实现几乎同样简洁的东西。

一种只允许一个唯一输入值的特殊聚合通常对所有版本的 SQL 都有帮助吗?如果是这样,这样的事情现在可以在大多数数据库上实现吗? null 值应该像任何其他值一样被考虑 - 与内置聚合 avg 通常的工作方式不同。 (我已经添加了针对 postgres 和 Oracle 实现此方法的答案。)

以下示例旨在展示如何使用聚合,但这是一个简单的情况,其中很明显哪些表达式应该是唯一的。真正的使用更有可能是在较大的查询中,其中更容易对唯一性

表做出错误的假设:

 product_id | description
------------+-------------
          1 | anvil
          2 | brick
          3 | clay
          4 | door

 sale_id | product_id |  cost
---------+------------+---------
       1 |          1 | £100.00
       2 |          1 | £101.00
       3 |          1 | £102.00
       4 |          2 |   £3.00
       5 |          2 |   £3.00
       6 |          2 |   £3.00
       7 |          3 |  £24.00
       8 |          3 |  £25.00

查询:

> select * from product join sale using (product_id);

 product_id | description | sale_id |  cost
------------+-------------+---------+---------
          1 | anvil       |       1 | £100.00
          1 | anvil       |       2 | £101.00
          1 | anvil       |       3 | £102.00
          2 | brick       |       4 |   £3.00
          2 | brick       |       5 |   £3.00
          2 | brick       |       6 |   £3.00
          3 | clay        |       7 |  £24.00
          3 | clay        |       8 |  £25.00

> select product_id, description, sum(cost) 
  from product join sale using (product_id) 
  group by product_id, description;

 product_id | description |   sum
------------+-------------+---------
          2 | brick       |   £9.00
          1 | anvil       | £303.00
          3 | clay        |  £49.00

> select product_id, solo(description), sum(cost) 
  from product join sale using (product_id) 
  group by product_id;

 product_id | solo  |   sum
------------+-------+---------
          1 | anvil | £303.00
          3 | clay  |  £49.00
          2 | brick |   £9.00

错误情况:

> select solo(description) from product;
ERROR:  This aggregate only allows one unique input

I often find myself adding expressions in the group by clause that I am sure are unique. It sometimes turns out I am wrong - because of an error in my SQL or a mistaken assumption, and that expression is not really unique.

There are many cases when I would much rather this would generate a SQL error rather than expanding my result set silently and sometimes very subtly.

I would love to be able to do something like:

select product_id, unique description from product group by product_id

but obviously I can't implement that myself - but something nearly as concise can be implemented with user defined aggregates on some databases.

Would a special aggregate that only allows one unique input value be generally helpful in all versions of SQL? If so, could such a thing be implemented now on most databases? null values should be considered just like any other value - unlike the way the built-in aggregate avg typically works. (I have added answers with ways of implementing this for postgres and Oracle.)

The following example is intended to show how the aggregate would be used, but is a simple case where it is obvious which expressions should be unique. Real usage would more likely be in larger queries where it is easier to make mistaken assumptions about uniqueness

tables:

 product_id | description
------------+-------------
          1 | anvil
          2 | brick
          3 | clay
          4 | door

 sale_id | product_id |  cost
---------+------------+---------
       1 |          1 | £100.00
       2 |          1 | £101.00
       3 |          1 | £102.00
       4 |          2 |   £3.00
       5 |          2 |   £3.00
       6 |          2 |   £3.00
       7 |          3 |  £24.00
       8 |          3 |  £25.00

queries:

> select * from product join sale using (product_id);

 product_id | description | sale_id |  cost
------------+-------------+---------+---------
          1 | anvil       |       1 | £100.00
          1 | anvil       |       2 | £101.00
          1 | anvil       |       3 | £102.00
          2 | brick       |       4 |   £3.00
          2 | brick       |       5 |   £3.00
          2 | brick       |       6 |   £3.00
          3 | clay        |       7 |  £24.00
          3 | clay        |       8 |  £25.00

> select product_id, description, sum(cost) 
  from product join sale using (product_id) 
  group by product_id, description;

 product_id | description |   sum
------------+-------------+---------
          2 | brick       |   £9.00
          1 | anvil       | £303.00
          3 | clay        |  £49.00

> select product_id, solo(description), sum(cost) 
  from product join sale using (product_id) 
  group by product_id;

 product_id | solo  |   sum
------------+-------+---------
          1 | anvil | £303.00
          3 | clay  |  £49.00
          2 | brick |   £9.00

error case:

> select solo(description) from product;
ERROR:  This aggregate only allows one unique input

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

是你 2024-10-14 18:18:53

ORACLE 解决方案是,

select product_id, 
       case when min(description) != max(description) then to_char(1/0) 
            else min(description) end description, 
       sum(cost) 
  from product join sale using (product_id) 
  group by product_id;

您可以使用一个简单的函数,而不是 to_char(1/0) [这会引发 DIVIDE_BY_ZERO 错误),

CREATE OR REPLACE FUNCTION solo (i_min IN VARCHAR2, i_max IN VARCHAR2) 
RETURN VARCHAR2 IS
BEGIN
  IF i_min != i_max THEN
    RAISE_APPLICATION_ERROR(-20001, 'Non-unique value specified');
  ELSE
    RETURN i_min;
  END;
END;
/
select product_id, 
       solo(min(description),max(description)) end description, 
       sum(cost) 
from product join sale using (product_id) 
group by product_id;

您可以使用用户定义的聚合,但我担心在之间切换对性能的影响SQL 和 PL/SQL。

An ORACLE solution is

select product_id, 
       case when min(description) != max(description) then to_char(1/0) 
            else min(description) end description, 
       sum(cost) 
  from product join sale using (product_id) 
  group by product_id;

Rather than the to_char(1/0) [which raises a DIVIDE_BY_ZERO error), you can use a simple function which does

CREATE OR REPLACE FUNCTION solo (i_min IN VARCHAR2, i_max IN VARCHAR2) 
RETURN VARCHAR2 IS
BEGIN
  IF i_min != i_max THEN
    RAISE_APPLICATION_ERROR(-20001, 'Non-unique value specified');
  ELSE
    RETURN i_min;
  END;
END;
/
select product_id, 
       solo(min(description),max(description)) end description, 
       sum(cost) 
from product join sale using (product_id) 
group by product_id;

You can use a user defined aggregate, but I'd be worried about the performance impact of switching between SQL and PL/SQL.

如梦初醒的夏天 2024-10-14 18:18:53

这是我对 postgres 的实现(编辑为将 null 也视为唯一值):

create function solo_sfunc(inout anyarray, anyelement) 
       language plpgsql immutable as $
begin
  if $1 is null then
    $1[1] := $2;
  else
    if ($1[1] is not null and $2 is null) 
         or ($1[1] is null and $2 is not null) 
         or ($1[1]!=$2) then 
      raise exception 'This aggregate only allows one unique input'; 
    end if;
  end if;
  return;
end;$;

create function solo_ffunc(anyarray) returns anyelement 
       language plpgsql immutable as $
begin
  return $1[1];
end;$;

create aggregate solo(anyelement)
                     (sfunc=solo_sfunc, stype=anyarray, ffunc=solo_ffunc);

用于测试的示例表:

create table product(product_id integer primary key, description text);

insert into product(product_id, description)
values (1, 'anvil'), (2, 'brick'), (3, 'clay'), (4, 'door');

create table sale( sale_id serial primary key, 
                   product_id integer not null references product, 
                   cost money not null );

insert into sale(product_id, cost)
values (1, '100'::money), (1, '101'::money), (1, '102'::money),
       (2, '3'::money), (2, '3'::money), (2, '3'::money),
       (3, '24'::money), (3, '25'::money);

Here is my implementation for postgres (edited to treat null as a unique value too):

create function solo_sfunc(inout anyarray, anyelement) 
       language plpgsql immutable as $
begin
  if $1 is null then
    $1[1] := $2;
  else
    if ($1[1] is not null and $2 is null) 
         or ($1[1] is null and $2 is not null) 
         or ($1[1]!=$2) then 
      raise exception 'This aggregate only allows one unique input'; 
    end if;
  end if;
  return;
end;$;

create function solo_ffunc(anyarray) returns anyelement 
       language plpgsql immutable as $
begin
  return $1[1];
end;$;

create aggregate solo(anyelement)
                     (sfunc=solo_sfunc, stype=anyarray, ffunc=solo_ffunc);

example tables for testing:

create table product(product_id integer primary key, description text);

insert into product(product_id, description)
values (1, 'anvil'), (2, 'brick'), (3, 'clay'), (4, 'door');

create table sale( sale_id serial primary key, 
                   product_id integer not null references product, 
                   cost money not null );

insert into sale(product_id, cost)
values (1, '100'::money), (1, '101'::money), (1, '102'::money),
       (2, '3'::money), (2, '3'::money), (2, '3'::money),
       (3, '24'::money), (3, '25'::money);
杀お生予夺 2024-10-14 18:18:53

您应该在 (product_id,description) 上定义一个 UNIQUE 约束,这样您就不必担心一个产品有两个描述。

You should define a UNIQUE constraint on (product_id, description), then you never have to worry about there being two descriptions for one product.

画离情绘悲伤 2024-10-14 18:18:53

这是我对 Oracle 的实现 - 不幸的是,我认为每种基本类型都需要一个实现:

create type SoloNumberImpl as object
(
  val number, 
  flag char(1), 
  static function ODCIAggregateInitialize(sctx in out SoloNumberImpl) 
         return number,
  member function ODCIAggregateIterate( self in out SoloNumberImpl, 
                                        value in number )
         return number,
  member function ODCIAggregateTerminate( self in SoloNumberImpl, 
                                          returnValue out number, 
                                          flags in number ) 
         return number,
  member function ODCIAggregateMerge( self in out SoloNumberImpl, 
                                      ctx2 in SoloNumberImpl ) 
         return number
);
/

create or replace type body SoloNumberImpl is 
static function ODCIAggregateInitialize(sctx in out SoloNumberImpl)
       return number is 
begin
  sctx := SoloNumberImpl(null, 'N');
  return ODCIConst.Success;
end;

member function ODCIAggregateIterate( self in out SoloNumberImpl, 
                                      value in number ) 
       return number is
begin
  if self.flag='N' then
    self.val:=value;
    self.flag:='Y';
  else
    if (self.val is null and value is not null) 
         or (self.val is not null and value is null) 
         or (self.val!=value) then
      raise_application_error( -20001, 
                               'This aggregate only allows one unique input' );
    end if;
  end if;
  return ODCIConst.Success;
end;

member function ODCIAggregateTerminate( self in SoloNumberImpl, 
                                        returnValue out number, 
                                        flags in number )  
       return number is
begin
  returnValue := self.val;
  return ODCIConst.Success;
end;

member function ODCIAggregateMerge( self in out SoloNumberImpl, 
                                    ctx2 in SoloNumberImpl ) 
       return number is
begin
  if self.flag='N' then
    self.val:=ctx2.val;
    self.flag=ctx2.flag;
  elsif ctx2.flag='Y' then
    if (self.val is null and ctx2.val is not null) 
          or (self.val is not null and ctx2.val is null) 
          or (self.val!=ctx2.val) then
      raise_application_error( -20001, 
                               'This aggregate only allows one unique input' );
    end if;
  end if;
  return ODCIConst.Success;
end;
end;
/

create function SoloNumber (input number) 
return number aggregate using SoloNumberImpl;
/

And here is my implementation for Oracle - unfortunately I think you need one implementation for each base type:

create type SoloNumberImpl as object
(
  val number, 
  flag char(1), 
  static function ODCIAggregateInitialize(sctx in out SoloNumberImpl) 
         return number,
  member function ODCIAggregateIterate( self in out SoloNumberImpl, 
                                        value in number )
         return number,
  member function ODCIAggregateTerminate( self in SoloNumberImpl, 
                                          returnValue out number, 
                                          flags in number ) 
         return number,
  member function ODCIAggregateMerge( self in out SoloNumberImpl, 
                                      ctx2 in SoloNumberImpl ) 
         return number
);
/

create or replace type body SoloNumberImpl is 
static function ODCIAggregateInitialize(sctx in out SoloNumberImpl)
       return number is 
begin
  sctx := SoloNumberImpl(null, 'N');
  return ODCIConst.Success;
end;

member function ODCIAggregateIterate( self in out SoloNumberImpl, 
                                      value in number ) 
       return number is
begin
  if self.flag='N' then
    self.val:=value;
    self.flag:='Y';
  else
    if (self.val is null and value is not null) 
         or (self.val is not null and value is null) 
         or (self.val!=value) then
      raise_application_error( -20001, 
                               'This aggregate only allows one unique input' );
    end if;
  end if;
  return ODCIConst.Success;
end;

member function ODCIAggregateTerminate( self in SoloNumberImpl, 
                                        returnValue out number, 
                                        flags in number )  
       return number is
begin
  returnValue := self.val;
  return ODCIConst.Success;
end;

member function ODCIAggregateMerge( self in out SoloNumberImpl, 
                                    ctx2 in SoloNumberImpl ) 
       return number is
begin
  if self.flag='N' then
    self.val:=ctx2.val;
    self.flag=ctx2.flag;
  elsif ctx2.flag='Y' then
    if (self.val is null and ctx2.val is not null) 
          or (self.val is not null and ctx2.val is null) 
          or (self.val!=ctx2.val) then
      raise_application_error( -20001, 
                               'This aggregate only allows one unique input' );
    end if;
  end if;
  return ODCIConst.Success;
end;
end;
/

create function SoloNumber (input number) 
return number aggregate using SoloNumberImpl;
/
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文