从部分重复的数据中选择首选行
我有以下查询:
select
mb.id as meter_id
,ds.mydate as mydate
,mb.name as metergroup
,sum(ms.stand) as measured_cum_value
,me.name as energy_medium
,e.name as unit_of_measure
,min(ms.source) as source
,count(*) as debugcount
FROM datumselect ds <<-- mem table with dates to query.
INNER JOIN metergroup mb ON (mb.building_id = 1)
INNER JOIN meter m ON (m.metergroup_id = mb.id) <<-- meters are grouped
INNER JOIN medium me ON (me.id = mb.medium_id) <<-- lookuptables for normalization
INNER JOIN unit e ON (e.id = mb.unit_id) <<-- ditto
INNER JOIN meterstand ms ON (ms.meter_id = m.id AND ms.mydate = ds.mydate)
group by ds.mydate, mb.id, ms.source <<-- this is prob. broken.
having source = MIN(ms.source) <<-- this `having` does not work !
ORDER BY mb.id, ds.mydate
我从下表中进行选择:
CREATE TABLE meterstand(
id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
meter_id INT(11) UNSIGNED NOT NULL,
mydate DATETIME NOT NULL,
stand DECIMAL(16, 5) NOT NULL,
source ENUM('calculated', 'read', 'manual') NOT NULL DEFAULT 'read',
PRIMARY KEY (id),
INDEX FK_meterstand_meter_id (meter_id),
UNIQUE INDEX UK_meterstand (datum, meter_id, bron),
CONSTRAINT FK_meterstand_meter_id FOREIGN KEY (meter_id)
REFERENCES vaanstermeters.meter (id) ON DELETE RESTRICT ON UPDATE CASCADE
)
ENGINE = INNODB
AUTO_INCREMENT = 181
AVG_ROW_LENGTH = 105
CHARACTER SET latin1
COLLATE latin1_swedish_ci;
给出以下数据的更简单的查询是:
SELECT
meter_id
, mydate
, sum(stand)
, count(*) as debugcount
FROM meterstand
WHERE mydate IN (list_of_dates_im_interested_in)
GROUP BY meter_id, my_date
HAVING the_best(source)
给定当前数据 debugcount
应该始终为 1,但是如果一个表中有多个计量表上述查询中的 group debugcount
应该是组中的计量器数量。
我可以在不同来源的值之间进行选择,我有:
- 手册
源代码,这是黄金;
- 从数据源、某处建筑物中的仪表中读取
源;
- 计算
数据,通过插值来弥补缺失的数据。
具有相同
查询应优先于手动
源而不是读取
,并且仅在没有其他数据可用时选择计算
数据。
以下是 meterstand
中的数据示例:
id meter_id mydate stand source
------------------------------------------------------
179 6 1-12-2010 94,75886 calculated
180 7 1-12-2010 256,02618 calculated
164 7 1-1-2011 285,41800 manual <<--- Query should only consider this row.
183 7 1-1-2011 0,00000 read <<-- and forget about this one
用于选择最佳数据点的正确查询语法是什么?
I have the following query:
select
mb.id as meter_id
,ds.mydate as mydate
,mb.name as metergroup
,sum(ms.stand) as measured_cum_value
,me.name as energy_medium
,e.name as unit_of_measure
,min(ms.source) as source
,count(*) as debugcount
FROM datumselect ds <<-- mem table with dates to query.
INNER JOIN metergroup mb ON (mb.building_id = 1)
INNER JOIN meter m ON (m.metergroup_id = mb.id) <<-- meters are grouped
INNER JOIN medium me ON (me.id = mb.medium_id) <<-- lookuptables for normalization
INNER JOIN unit e ON (e.id = mb.unit_id) <<-- ditto
INNER JOIN meterstand ms ON (ms.meter_id = m.id AND ms.mydate = ds.mydate)
group by ds.mydate, mb.id, ms.source <<-- this is prob. broken.
having source = MIN(ms.source) <<-- this `having` does not work !
ORDER BY mb.id, ds.mydate
I'm selecting from the following table:
CREATE TABLE meterstand(
id INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
meter_id INT(11) UNSIGNED NOT NULL,
mydate DATETIME NOT NULL,
stand DECIMAL(16, 5) NOT NULL,
source ENUM('calculated', 'read', 'manual') NOT NULL DEFAULT 'read',
PRIMARY KEY (id),
INDEX FK_meterstand_meter_id (meter_id),
UNIQUE INDEX UK_meterstand (datum, meter_id, bron),
CONSTRAINT FK_meterstand_meter_id FOREIGN KEY (meter_id)
REFERENCES vaanstermeters.meter (id) ON DELETE RESTRICT ON UPDATE CASCADE
)
ENGINE = INNODB
AUTO_INCREMENT = 181
AVG_ROW_LENGTH = 105
CHARACTER SET latin1
COLLATE latin1_swedish_ci;
A simpler query given the below data would be:
SELECT
meter_id
, mydate
, sum(stand)
, count(*) as debugcount
FROM meterstand
WHERE mydate IN (list_of_dates_im_interested_in)
GROUP BY meter_id, my_date
HAVING the_best(source)
Given the current data debugcount
should be 1 always, but if there are multiple meter's in a group in the above query debugcount
should be the number of meters in the group.
I can choose between values from different source, I have:
- manual
source, this is golden;
- read
sources from a datasource, a meter in a building somewhere;
- calculated
data, interpolated to make up for missing data.
A single datapoint having the same meter_id+mydate can have multiple sources.
The query should favour manual
sources over read
and only select calculated
data if no other data is available.
Here is a sample of the data in meterstand
:
id meter_id mydate stand source
------------------------------------------------------
179 6 1-12-2010 94,75886 calculated
180 7 1-12-2010 256,02618 calculated
164 7 1-1-2011 285,41800 manual <<--- Query should only consider this row.
183 7 1-1-2011 0,00000 read <<-- and forget about this one
What's the proper query syntax to use to select the best data points?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
从表面上看,MySQL 将枚举的排序顺序定义为它们在定义中列出的顺序。鉴于您已将顺序定义为它们出现的相反顺序,我相信以下内容将按预期工作(尽管没有要测试的实例):(
假设 [meter_id, mydate, source] 是唯一的,课程)。
看起来确实有一个错误导致枚举按字符串值排序(考虑到字符串,这对您根本没有帮助)。
如果它仍然存在(或者您想要对使用顺序有更多控制),您可能需要定义一个表:
然后将其作为 fk 引用并按(最小)优先级排序。
From the looks of it, MySQL defines sort ordering for enums as being the order in which they were listed in the definition. Given that you've defined the order as the reverse of which they are to appear, I believe the following will work as expected (no instance to test against, though):
(Assuming that [meter_id, mydate, source] is unique, of course).
It does look like there was a bug at one point that was causing the enums to be sorted by their string values (which won't help you at all, given the strings).
If it sill exists (or you want a little more control over use-order), you may want to define a table:
Then reference it as a fk and sort by (min) priority.