奇怪的周期性分组问题(ORA-00979)

发布于 2024-11-28 02:26:08 字数 2662 浏览 4 评论 0原文

我们使用以下 SQL 查询来生成月平均值。该语句被大量使用并且工作得很好,但每隔一两个月就会失败一次'ORA-00979:不是 GROUP BY 表达式',我们不知道为什么。

首先关于过程:

  • 我们每隔几分钟就有原始数据,
  • =>原始数据被平均为每小时、每天、每月和每年的值

原始 ->每小时

  • 在平均值表中创建平均类型为 2 的条目
  • 永远不会出现问题

每小时=>;每日/每日=>每月/每月=>每年

  • 报表非常相似,
  • “较低”类型的平均值被平均为较高类型
  • 平均类型为:2 小时、3 每天、(4 每周未使用)5 每月和 6 年

  • 该错误仅在步骤“每天=>”时出现。

查询:

  • 我们无法重现该错误,聚合作业的下一次运行通常不会出现问题。
  • 这些错误每 50-60 天就会发生一次,没有真正的模式
  • 环境:Oracle 10g

有人知道问题可能是什么吗?

INSERT INTO averages
SELECT averages_seq.NEXTVAL,
       avg.*
FROM (
  SELECT
      m.city_id,            m.city_name,
      m.state_id,           m.state_name,
      m.district_id,        m.district_name,
      m.country_id,         m.country_name,
      m.currency_id,        m.currency_name,
      m.category_id,        m.category_name,
      5 average_type, -- average_type 5 ==> monthly average
      0 analysis_type,
      TRUNC(m.average_date, 'MM')  average_date,
      AVG(m.value) value,
      SUM(m.sum) sum,
      NULL uncertainty,
      NULL uncertainty_type,
      MIN(m.value_min) value_min,
      MAX(m.value_max) value_max,
      SUM(number_of_measurements) number_of_measurements,
      -- 6 * 24 => measurements per day
      -- (ADD_MONTHS(...)) => days per month 
      100 * SUM(number_of_measurements) / 
           (6 * 24 *
           (ADD_MONTHS(TRUNC(average_date, 'MM'), 1)  - TRUNC(average_date, 'MM'))) coverage_percent,
      SUM(customers) customers,
      NULL dummy_field,
      CURRENT_TIMESTAMP calculation_date,
      CURRENT_TIMESTAMP creation_date,
      'AGGREGATION' creation_user,
      CURRENT_TIMESTAMP modification_date,
      'AGGREGATION' modification_user,
      'n' constant_1,
      3   constant_2,
      -1 average_state
  FROM averages m
  WHERE   m.average_type = 3 -- average type 3 ==> daily average
  AND     m.average_date
      BETWEEN
        TO_TIMESTAMP('2011-06-01T00:00:00Z', 'YYYY-MM-DD"T"HH24:MI:SS"Z"')
        AND
        TO_TIMESTAMP('2011-06-30T23:59:59Z', 'YYYY-MM-DD"T"HH24:MI:SS"Z"')
  AND     m.analysis_type = 0
  GROUP BY
        m.city_id,            m.city_name,
        m.state_id,           m.state_name,
        m.district_id,        m.district_name,
        m.country_id,         m.country_name,
        m.currency_id,        m.currency_name,
        m.category_id,        m.category_name,
        TRUNC(m.average_date, 'MM')
  ) avg

We are using the following SQL query to produce monthly averages. The statement is heavy used and works pretty well, but it fails with a 'ORA-00979: not a GROUP BY expression' every month or two and we have no idea why.

First about the process:

  • we have raw data every few minutes,
  • => the raw data is getting averaged to hourly, daily, monthly and yearly values

raw -> hourly

  • creates entries in the averages table with average_type 2
  • never problems

hourly => daily / daily => monthly / monthly => yearly

  • Statements are pretty similar
  • averages of a 'lower' type is being averaged to a higher type
  • Average types are: 2 hourly, 3 daily, (4 weekly not used) 5 monthly and 6 yearly

  • The bug only appears with the step "daily => monthly".

Query:

  • We can't reproduce the bug, the next run of the aggregation job usually works without problems.
  • The errors occurs every 50-60 days, without a real pattern
  • Environment: Oracle 10g

Does anybody have an idea what the problem could be?

INSERT INTO averages
SELECT averages_seq.NEXTVAL,
       avg.*
FROM (
  SELECT
      m.city_id,            m.city_name,
      m.state_id,           m.state_name,
      m.district_id,        m.district_name,
      m.country_id,         m.country_name,
      m.currency_id,        m.currency_name,
      m.category_id,        m.category_name,
      5 average_type, -- average_type 5 ==> monthly average
      0 analysis_type,
      TRUNC(m.average_date, 'MM')  average_date,
      AVG(m.value) value,
      SUM(m.sum) sum,
      NULL uncertainty,
      NULL uncertainty_type,
      MIN(m.value_min) value_min,
      MAX(m.value_max) value_max,
      SUM(number_of_measurements) number_of_measurements,
      -- 6 * 24 => measurements per day
      -- (ADD_MONTHS(...)) => days per month 
      100 * SUM(number_of_measurements) / 
           (6 * 24 *
           (ADD_MONTHS(TRUNC(average_date, 'MM'), 1)  - TRUNC(average_date, 'MM'))) coverage_percent,
      SUM(customers) customers,
      NULL dummy_field,
      CURRENT_TIMESTAMP calculation_date,
      CURRENT_TIMESTAMP creation_date,
      'AGGREGATION' creation_user,
      CURRENT_TIMESTAMP modification_date,
      'AGGREGATION' modification_user,
      'n' constant_1,
      3   constant_2,
      -1 average_state
  FROM averages m
  WHERE   m.average_type = 3 -- average type 3 ==> daily average
  AND     m.average_date
      BETWEEN
        TO_TIMESTAMP('2011-06-01T00:00:00Z', 'YYYY-MM-DD"T"HH24:MI:SS"Z"')
        AND
        TO_TIMESTAMP('2011-06-30T23:59:59Z', 'YYYY-MM-DD"T"HH24:MI:SS"Z"')
  AND     m.analysis_type = 0
  GROUP BY
        m.city_id,            m.city_name,
        m.state_id,           m.state_name,
        m.district_id,        m.district_name,
        m.country_id,         m.country_name,
        m.currency_id,        m.currency_name,
        m.category_id,        m.category_name,
        TRUNC(m.average_date, 'MM')
  ) avg

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

伪装你 2024-12-05 02:26:09

通过比较来看,average_date 是带时区的时间戳(本地时区?),但 TRUNC 适用于日期。我想知道如果出现一些奇怪的情况,即选定的日期从一个月“跳”到另一个月(例如,它发生在一个时区的一月,但发生在另一个时区的二月),会发生什么。

在此基础上,还要考虑客户端是否有影响(例如,当从与数据库设置处于不同时区的客户端运行时可能会出错)。

我会扩展 Gerrat 指定列名称的建议,以便您可以分离出常量

INSERT INTO averages
  (average_type, analysis_type, uncertainty, uncertainty_type,
  dummy_field, calculation_date, creation_date, creation_user, 
  modification_date, modification_user, constant_1, constant_2,
   ....
SELECT averages_seq.NEXTVAL,
      5 average_type, -- average_type 5 ==> monthly average
      0 analysis_type,
      NULL uncertainty,
      NULL uncertainty_type,
      NULL dummy_field,
      CURRENT_TIMESTAMP calculation_date,
      CURRENT_TIMESTAMP creation_date,
      'AGGREGATION' creation_user,
      CURRENT_TIMESTAMP modification_date,
      'AGGREGATION' modification_user,
      'n' constant_1,
      3   constant_2,
      -1 average_state
       avg.*
FROM (
  SELECT ...

Judging by the comparison, the average_date is a timestamp with time zone (local time zone ?), but TRUNC works on a date. I'm wondering what happens if there is some oddity where a selected date is 'jumping' from one month to another (eg it occurred in January in one timezone but Feb in another).

Based on that, also consider whether the client is having an effect, (eg maybe it errors when run from a client that is in a different timezone from the database setting).

I'd expand Gerrat's suggestion of specifying the column names so that you can separate out the constants

INSERT INTO averages
  (average_type, analysis_type, uncertainty, uncertainty_type,
  dummy_field, calculation_date, creation_date, creation_user, 
  modification_date, modification_user, constant_1, constant_2,
   ....
SELECT averages_seq.NEXTVAL,
      5 average_type, -- average_type 5 ==> monthly average
      0 analysis_type,
      NULL uncertainty,
      NULL uncertainty_type,
      NULL dummy_field,
      CURRENT_TIMESTAMP calculation_date,
      CURRENT_TIMESTAMP creation_date,
      'AGGREGATION' creation_user,
      CURRENT_TIMESTAMP modification_date,
      'AGGREGATION' modification_user,
      'n' constant_1,
      3   constant_2,
      -1 average_state
       avg.*
FROM (
  SELECT ...
蹲墙角沉默 2024-12-05 02:26:08

我会添加一个分组依据:

(ADD_MONTHS(TRUNC(average_date, 'MM'), 1)  - TRUNC(average_date, 'MM'))

我知道如果不更改 TRUNC(m.average_date, 'MM') 就无法更改,但它似乎是唯一不在 GROUP BY 中的非聚合列。

此外,您可以删除内部 sql 之外的所有非聚合、常量列,并显式命名要插入的列并同时选择这些常量:

例如。

INSERT INTO averages(city_id, city_name, ...average_type, analysis_type, ...)
SELECT averages_seq.NEXTVAL,
avg.city_id, avg.city_name, ...
5, 0, ...

...不能说这部分会解决问题,但它肯定会消除它们未聚合的嫌疑。

I would add a group by:

(ADD_MONTHS(TRUNC(average_date, 'MM'), 1)  - TRUNC(average_date, 'MM'))

I know this can't change without TRUNC(m.average_date, 'MM') changing, but it seems to be the only non-aggregate column not in your GROUP BY.

In addition, you could remove all the non-aggregated, constant, columns outside the inner sql, and explicitly name the columns you're inserting and select these constants at the same time:

eg.

INSERT INTO averages(city_id, city_name, ...average_type, analysis_type, ...)
SELECT averages_seq.NEXTVAL,
avg.city_id, avg.city_name, ...
5, 0, ...

...can't say this part will fix the issue, but it would certainly remove them from suspicion of not being aggregated.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文