Rails expands fields with scope, PG does not like it

Posted on 2024-11-03 00:34:30

I have a Widget model. Widgets belong to a Store model, which belongs to an Area model, which belongs to a Company. From the Company model, I need to find all associated widgets. Easy:

class Widget < ActiveRecord::Base
  def self.in_company(company)
    includes(:store => {:area => :company}).where(:companies => {:id => company.id})
  end
end

Which will generate this beautiful query:

> Widget.in_company(Company.first).count

SQL (50.5ms)  SELECT COUNT(DISTINCT "widgets"."id") FROM "widgets" LEFT OUTER JOIN "stores" ON "stores"."id" = "widgets"."store_id" LEFT OUTER JOIN "areas" ON "areas"."id" = "stores"."area_id" LEFT OUTER JOIN "companies" ON "companies"."id" = "areas"."company_id" WHERE "companies"."id" = 1
 => 15088 

But I later need to use this scope in a more complex scope. The problem is that AR expands the query by selecting individual fields, which fails in PG because every selected field must appear in the GROUP BY clause or be used in an aggregate function.

Here is the more complex scope.

def self.sum_amount_chart_series(company, start_time)
  orders_by_day = Widget.in_company(company).archived.not_void.
                  where(:print_datetime => start_time.beginning_of_day..Time.zone.now.end_of_day).
                  group(pg_print_date_group).
                  select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")

end

def self.pg_print_date_group
  "CAST((print_datetime + interval '#{tz_offset_hours} hours') AS date)"
end

And this is the select it is throwing at PG:

> Widget.sum_amount_chart_series(Company.first, 1.day.ago)

SELECT "widgets"."id" AS t0_r0, "widgets"."user_id" AS t0_r1,<...BIG SNIP, YOU GET THE IDEA...> FROM "widgets" LEFT OUTER JOIN "stores" ON "stores"."id" = "widgets"."store_id" LEFT OUTER JOIN "areas" ON "areas"."id" = "stores"."area_id" LEFT OUTER JOIN "companies" ON "companies"."id" = "areas"."company_id" WHERE "companies"."id" = 1 AND "widgets"."archived" = 't' AND "widgets"."voided" = 'f' AND ("widgets"."print_datetime" BETWEEN '2011-04-24 00:00:00.000000' AND '2011-04-25 23:59:59.999999') GROUP BY CAST((print_datetime + interval '-7 hours') AS date)

Which generates this error:

PGError: ERROR: column "widgets.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT "widgets"."id" AS t0_r0, "widgets"."user_id...

How do I rewrite the Widget.in_company scope so that AR does not expand the select query to include every Widget model field?

Comments (5)

·深蓝 2024-11-10 00:34:30

As Frank explained, PostgreSQL will reject any query which doesn't return a reproducible set of rows.

Suppose you've a query like:

select a, b, agg(c)
from tbl
group by a

PostgreSQL will reject it because b is left unspecified in the group by statement. Run that in MySQL, by contrast, and it will be accepted. In the latter case, however, fire up a few inserts, updates and deletes, and the order of the rows on disk pages ends up different.

If memory serves, MySQL's implementation actually sorts by a, b and returns the first b in the set. But as far as the SQL standard is concerned, the behavior is unspecified -- and sure enough, PostgreSQL does not always sort before running aggregate functions.

Potentially, this could produce different values of b in the result set in PostgreSQL. Thus, PostgreSQL raises an error unless you are more specific:

select a, b, agg(c)
from tbl
group by a, b

What Frank highlighted is that, in PostgreSQL 9.1, if a is the primary key, then you can leave b unspecified -- the planner has been taught to ignore subsequent group by fields when applicable primary keys imply a unique row.

For your problem in particular, you need to specify your group by as you currently do, plus every selected field that is not wrapped in an aggregate, i.e. "widgets"."id", "widgets"."user_id", [snip], but not aggregate function calls such as sum(amount).

As an off-topic side note, I'm not sure how your ORM/model works, but the SQL it's generating isn't optimal. Many of those left outer joins look like they should be inner joins, which would let the planner pick an appropriate join order where applicable.
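
To make the ambiguity concrete, here is a minimal Ruby sketch that mimics the two group by queries above on an in-memory array. The sample rows and the method names (sample_rows, group_by_a, group_by_a_and_b) are made up for illustration, and agg is assumed to be sum; nothing here talks to a database.

```ruby
# Hypothetical rows standing in for the table `tbl` from the answer.
def sample_rows
  [
    { a: 1, b: "x", c: 10 },
    { a: 1, b: "y", c: 5 },
    { a: 2, b: "z", c: 7 }
  ]
end

# Mimics "group by a": one output row per value of a.
# b can have several candidates inside a group, so "select b" is ambiguous.
def group_by_a(rows)
  rows.group_by { |r| r[:a] }.map do |a, group|
    {
      a: a,
      b_candidates: group.map { |r| r[:b] }.uniq,
      sum_c: group.sum { |r| r[:c] }
    }
  end
end

# Mimics "group by a, b": every selected column is fixed by the group key.
def group_by_a_and_b(rows)
  rows.group_by { |r| [r[:a], r[:b]] }.map do |(a, b), group|
    { a: a, b: b, sum_c: group.sum { |r| r[:c] } }
  end
end

group_by_a(sample_rows).each { |row| p row }
group_by_a_and_b(sample_rows).each { |row| p row }
```

For a = 1 the first method finds two candidates for b ("x" and "y"), which is the choice PostgreSQL refuses to make silently. In the second method each group carries exactly one b, so the result is reproducible.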

明媚如初 2024-11-10 00:34:30

PostgreSQL version 9.1 (beta at this moment) might fix your problem, but only if there is a functional dependency on the primary key.

From the release notes:

Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause (Peter Eisentraut)

Some other database system already allowed this behavior, and because of the primary key, the result is unambiguous.

You could run a test and see if it fixes your problem. If you can wait for the production release, this can fix the problem without changing your code.

暗地喜欢 2024-11-10 00:34:30

Firstly, simplify your life by storing all dates in a standard time zone. Converting dates between time zones should really be done in the view as a user convenience. This alone should save you a lot of pain.

If you're already in production, write a migration to create a normalised_date column wherever it would be helpful.

I propose that the other problem here is the use of raw SQL, which Rails won't poke around in for you. To avoid this, try the gem called Squeel (aka Metawhere 2) http://metautonomo.us/projects/squeel/

If you use this you should be able to remove hard coded SQL and let rails get back to doing its magic.

For example:

.select("#{pg_print_date_group} as print_date, sum(amount) as total_amount")

becomes (once you remove the need for normalising the date):

.select{sum(amount).as(total_amount)}
往昔成烟 2024-11-10 00:34:30

Sorry to answer my own question, but I figured it out.

First, let me apologize to those who thought I might be having an SQL or Postgres issue; I am not. The issue is with ActiveRecord and the SQL it is generating.

The answer is... use .joins instead of .includes. So I just changed the line in the top code and it works as expected.

class Widget < ActiveRecord::Base
  def self.in_company(company)
    joins(:store => {:area => :company}).where(:companies => {:id => company.id})
  end
end

I'm guessing that when using .includes, ActiveRecord is trying to be smart and use JOINS in the SQL, but it's not smart enough for this particular case and was generating that ugly SQL to select all associated columns.

However, all the replies have taught me quite a bit about Postgres that I did not know, so thank you very much.
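
For readers wondering why the swap helps: as far as I understand ActiveRecord, combining .includes with a condition on an included table makes it eager-load through one big JOIN and replace the select list with aliased columns (the t0_r0, t0_r1 seen in the failing query), so the custom select is dropped. .joins only adds INNER JOINs and leaves the select list alone. The sketch below is hand-written SQL approximating what the .joins version would be expected to send, not captured ActiveRecord output. The method name grouped_sum_sql is hypothetical, and the archived, voided and print_datetime filters are left out for brevity.

```ruby
# Hand-written approximation of the grouped query; NOT real ActiveRecord output.
# Integer() raises ArgumentError for non-numeric strings, so neither value
# can smuggle extra SQL into the string.
def grouped_sum_sql(company_id, tz_offset_hours)
  date_expr = "CAST((print_datetime + interval '#{Integer(tz_offset_hours)} hours') AS date)"
  <<~SQL
    SELECT #{date_expr} AS print_date, sum(amount) AS total_amount
    FROM "widgets"
    INNER JOIN "stores" ON "stores"."id" = "widgets"."store_id"
    INNER JOIN "areas" ON "areas"."id" = "stores"."area_id"
    INNER JOIN "companies" ON "companies"."id" = "areas"."company_id"
    WHERE "companies"."id" = #{Integer(company_id)}
    GROUP BY #{date_expr}
  SQL
end

puts grouped_sum_sql(1, -7)
```

Every column in this select list is either the grouped expression or an aggregate, which is exactly what PG asks for.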

傲鸠 2024-11-10 00:34:30

To sort by a given list of ids in MySQL:

> ids = [11,31,29]
=> [11, 31, 29]
> Page.where(id: ids).order("field(id, #{ids.join(',')})")

In Postgres:

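def self.order_by_ids(ids)
  # Builds "case WHEN id='3' THEN 0 ... end" so rows sort in the order of ids.
  # The ids are interpolated as-is, so only pass trusted integers.
  order_by = ["case"]
  ids.each_with_index do |id, index|
    order_by << "WHEN id='#{id}' THEN #{index}"
  end
  order_by << "end"
  order(order_by.join(" "))
end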

User.where(:id => [3,2,1]).order_by_ids([3,2,1]).map(&:id) 
#=> [3,2,1]
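
Since the ids end up inside a SQL string, it may be worth coercing them first. Below is a minimal sketch of the same CASE trick as a plain string builder; the name order_by_ids_sql and the column keyword are my own, not part of Rails.

```ruby
# Builds a CASE expression that ranks rows by their position in `ids`.
# Integer() raises ArgumentError for strings that are not integers,
# so text such as "1; DROP TABLE users" never reaches the SQL.
def order_by_ids_sql(ids, column: "id")
  raise ArgumentError, "ids must not be empty" if ids.empty?

  whens = ids.each_with_index.map do |id, index|
    "WHEN #{column} = #{Integer(id)} THEN #{index}"
  end
  "CASE #{whens.join(' ')} END"
end

puts order_by_ids_sql([3, 2, 1])
# prints: CASE WHEN id = 3 THEN 0 WHEN id = 2 THEN 1 WHEN id = 1 THEN 2 END
```

The resulting string can be handed to order the same way as above. I believe newer Rails versions want raw SQL fragments like this wrapped in Arel.sql, so check that against the version you are running.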