从部分重复的数据中选择首选行

发布于 2024-11-29 10:36:53 字数 2623 浏览 0 评论 0原文

我有以下查询:

select 
  mb.id as meter_id
  ,ds.mydate as mydate
  ,mb.name as metergroup
  ,sum(ms.stand) as measured_cum_value 
  ,me.name as energy_medium
  ,e.name as unit_of_measure
  ,min(ms.source) as source
  ,count(*) as debugcount
FROM datumselect ds                            <<-- mem table with dates to query.
INNER JOIN metergroup mb ON (mb.building_id = 1)   
INNER JOIN meter m ON (m.metergroup_id = mb.id)  <<-- meters are grouped
INNER JOIN medium me ON (me.id = mb.medium_id)   <<-- lookuptables for normalization
INNER JOIN unit e ON (e.id = mb.unit_id)         <<-- ditto
INNER JOIN meterstand ms ON (ms.meter_id = m.id AND ms.mydate = ds.mydate)
group by ds.mydate, mb.id, ms.source  <<-- this is prob. broken.
having source = MIN(ms.source)   <<-- this `having` does not work !
ORDER BY mb.id, ds.mydate 

我从下表中进行选择:

CREATE TABLE meterstand(
  id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
  meter_id INT(11) UNSIGNED NOT NULL,
  mydate DATETIME NOT NULL,
  stand DECIMAL(16, 5) NOT NULL,
  source ENUM('calculated', 'read', 'manual') NOT NULL DEFAULT 'read',
  PRIMARY KEY (id),
  INDEX FK_meterstand_meter_id (meter_id),
  UNIQUE INDEX UK_meterstand (datum, meter_id, bron),
  CONSTRAINT FK_meterstand_meter_id FOREIGN KEY (meter_id)
  REFERENCES vaanstermeters.meter (id) ON DELETE RESTRICT ON UPDATE CASCADE
)
ENGINE = INNODB
AUTO_INCREMENT = 181
AVG_ROW_LENGTH = 105
CHARACTER SET latin1
COLLATE latin1_swedish_ci;

给出以下数据的更简单的查询是:

SELECT 
  meter_id
  , mydate
  , sum(stand)
  , count(*) as debugcount
FROM meterstand
WHERE mydate IN (list_of_dates_im_interested_in)
GROUP BY meter_id, my_date
HAVING the_best(source) 

给定当前数据 debugcount 应该始终为 1,但是如果一个表中有多个计量表上述查询中的 group debugcount 应该是组中的计量器数量。

我可以在不同来源的值之间进行选择,我有:
- 手册源代码,这是黄金;
- 从数据源、某处建筑物中的仪表中读取源;
- 计算数据,通过插值来弥补缺失的数据。

具有相同meter_id+mydate 的单个数据点可以有多个源。
查询应优先于手动源而不是读取,并且仅在没有其他数据可用时选择计算数据。

以下是 meterstand 中的数据示例:

id  meter_id mydate stand       source
------------------------------------------------------
179 6   1-12-2010   94,75886    calculated
180 7   1-12-2010   256,02618   calculated
164 7   1-1-2011    285,41800   manual <<--- Query should only consider this row.
183 7   1-1-2011    0,00000     read   <<-- and forget about this one

用于选择最佳数据点的正确查询语法是什么?

I have the following query:

select 
  mb.id as meter_id
  ,ds.mydate as mydate
  ,mb.name as metergroup
  ,sum(ms.stand) as measured_cum_value 
  ,me.name as energy_medium
  ,e.name as unit_of_measure
  ,min(ms.source) as source
  ,count(*) as debugcount
FROM datumselect ds                            <<-- mem table with dates to query.
INNER JOIN metergroup mb ON (mb.building_id = 1)   
INNER JOIN meter m ON (m.metergroup_id = mb.id)  <<-- meters are grouped
INNER JOIN medium me ON (me.id = mb.medium_id)   <<-- lookuptables for normalization
INNER JOIN unit e ON (e.id = mb.unit_id)         <<-- ditto
INNER JOIN meterstand ms ON (ms.meter_id = m.id AND ms.mydate = ds.mydate)
group by ds.mydate, mb.id, ms.source  <<-- this is prob. broken.
having source = MIN(ms.source)   <<-- this `having` does not work !
ORDER BY mb.id, ds.mydate 

I'm selecting from the following table:

CREATE TABLE meterstand(
  id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
  meter_id INT(11) UNSIGNED NOT NULL,
  mydate DATETIME NOT NULL,
  stand DECIMAL(16, 5) NOT NULL,
  source ENUM('calculated', 'read', 'manual') NOT NULL DEFAULT 'read',
  PRIMARY KEY (id),
  INDEX FK_meterstand_meter_id (meter_id),
  UNIQUE INDEX UK_meterstand (datum, meter_id, bron),
  CONSTRAINT FK_meterstand_meter_id FOREIGN KEY (meter_id)
  REFERENCES vaanstermeters.meter (id) ON DELETE RESTRICT ON UPDATE CASCADE
)
ENGINE = INNODB
AUTO_INCREMENT = 181
AVG_ROW_LENGTH = 105
CHARACTER SET latin1
COLLATE latin1_swedish_ci;

A simpler query given the below data would be:

SELECT 
  meter_id
  , mydate
  , sum(stand)
  , count(*) as debugcount
FROM meterstand
WHERE mydate IN (list_of_dates_im_interested_in)
GROUP BY meter_id, my_date
HAVING the_best(source) 

Given the current data debugcount should be 1 always, but if there are multiple meter's in a group in the above query debugcount should be the number of meters in the group.

I can choose between values from different source, I have:
- manual source, this is golden;
- read sources from a datasource, a meter in a building somewhere;
- calculated data, interpolated to make up for missing data.

A single datapoint having the same meter_id+mydate can have multiple sources.
The query should favour manual sources over read and only select calculated data if no other data is available.

Here is a sample of the data in meterstand:

id  meter_id mydate stand       source
------------------------------------------------------
179 6   1-12-2010   94,75886    calculated
180 7   1-12-2010   256,02618   calculated
164 7   1-1-2011    285,41800   manual <<--- Query should only consider this row.
183 7   1-1-2011    0,00000     read   <<-- and forget about this one

What's the proper query syntax to use to select the best data points?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

拥抱没勇气 2024-12-06 10:36:53

从表面上看,MySQL 将枚举的排序顺序定义为它们在定义中列出的顺序。鉴于您已将顺序定义为它们出现的相反顺序,我相信以下内容将按预期工作(尽管没有要测试的实例):(

SELECT * 
FROM meterstand as a
JOIN (SELECT meter_id, mydate, MAX(source) as source
      FROM meterstand
      GROUP BY meter_id, mydate) as b
ON b.meter_id = a.meter_id
AND b.mydate = a.mydate
AND b.source = a.source

假设 [meter_id, mydate, source] 是唯一的,课程)。

看起来确实有一个错误导致枚举按字符串值排序(考虑到字符串,这对您根本没有帮助)。
如果它仍然存在(或者您想要对使用顺序有更多控制),您可能需要定义一个表:

Meter_Reading_Type
========================
Id   Description   Priority
1    Manual        10
2    Calculated    30
3    Read          20

然后将其作为 fk 引用并按(最小)优先级排序。

From the looks of it, MySQL defines sort ordering for enums as being the order in which they were listed in the definition. Given that you've defined the order as the reverse of which they are to appear, I believe the following will work as expected (no instance to test against, though):

SELECT * 
FROM meterstand as a
JOIN (SELECT meter_id, mydate, MAX(source) as source
      FROM meterstand
      GROUP BY meter_id, mydate) as b
ON b.meter_id = a.meter_id
AND b.mydate = a.mydate
AND b.source = a.source

(Assuming that [meter_id, mydate, source] is unique, of course).

It does look like there was a bug at one point that was causing the enums to be sorted by their string values (which won't help you at all, given the strings).
If it sill exists (or you want a little more control over use-order), you may want to define a table:

Meter_Reading_Type
========================
Id   Description   Priority
1    Manual        10
2    Calculated    30
3    Read          20

Then reference it as a fk and sort by (min) priority.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文