如何使用 Group By 和自连接返回每日最低、最高、开盘价和收盘价结果集?
已解决
万岁 StackOverlow!
当我离开时,人们留下了两个解决方案(谢谢大家——为两个可行的解决方案分发业力的协议是什么?)
这是我回来发布的解决方案。它源自另一个 StackOver 解决方案:
How使用聚合函数在 MySQL 查询中获取分组记录的第一个和最后一个记录?
...我的修改是:
SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price,
SUBSTRING_INDEX(
GROUP_CONCAT(
PRICE
ORDER BY DTE ASC
)
, ',', 1 ) AS opn_price,
SUBSTRING_INDEX(
GROUP_CONCAT(
PRICE
ORDER BY DTE DESC
)
, ',', 1 ) AS cls_price
FROM `CHART_DATA`
GROUP BY trading_day
;
上面的“Q”开头的数据是我试图结束的数据。希望这对其他人有帮助,因为我怀疑我的日志相当常见。
我愿意打赌这三种解决方案之一具有性能优势。如果有人碰巧了解 MySQL 的内部工作原理和查询优化并愿意推荐“首选”解决方案,那么了解这些在将来将会很有用。
已解决
更新 #2
尝试使用以下方法从另一个方向解决问题:
http://forums.mysql.com/read.php?65,363723,363723
我get:
SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price,
(SELECT opn_price FROM
(SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS a_day,
PRICE AS opn_price,
MIN(DTE) AS opn
FROM `CHART_DATA`
GROUP BY a_day
ORDER BY opn ASC LIMIT 1) AS tblO) AS opnqt,
(SELECT cls_price FROM
(SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS a_day,
PRICE AS cls_price,
MIN(DTE) AS cls
FROM `CHART_DATA`
GROUP BY a_day
ORDER BY cls DESC LIMIT 1) AS tblC) AS clsqt
FROM `CHART_DATA` cht
GROUP BY trading_day;
这与下面第一个 UPDATE 中的查询有类似的功能障碍;返回的“clsqt”(cls_price)值是数据中找到的最后收盘价。布莱赫。
另外,我们开始再次进入“极其复杂”的查询空间,这对性能没有好处。
但如果有人看到“clsqt”值的修复,我会很乐意接受它并在稍后解决性能问题。 :)
END UPDATE #2
UPDATE
非常接近...这就是我今天所处的位置:
SELECT
DATE_FORMAT(`cht1`.`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`cht1`.`PRICE`) AS min_price,
MAX(`cht1`.`PRICE`) AS max_price,
MIN(cht1.DTE) AS opn_date1,
MIN(cht1.DTE) AS opn_date2,
`cht2`.`PRICE` AS opn_price,
MAX(cht1.DTE) AS cls_date1,
MAX(cht3.DTE) AS cls_date3,
`cht3`.`PRICE` AS cls_price
FROM `CHART_DATA` cht1
LEFT JOIN `CHART_DATA` cht2
ON cht2.DTE = cht1.DTE
LEFT JOIN `CHART_DATA` cht3
ON cht3.DTE = cht1.DTE
GROUP BY trading_day
HAVING opn_date1 = opn_date2
AND cls_date1 = cls_date3
;
这会正确地重新调整所有内容,但正确的“cls_price”(它返回与'cls_price' 为 'opn_price')。
但是,“cls_date1”和“cls_date3”是正确的值,所以我必须接近。
有人看到我不是什么吗?
结束更新
我一直在研究左连接和自连接……我必须承认我并没有摸索。
我发现这个“Q”似乎非常接近我想要的: 左连接与mysql中右表的条件
我想做的是从单个表中检索开盘价、收盘价、最低价格和最高价格天数(下面的示例数据)。
最小值和最大值很简单:
SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price
FROM `CHART_DATA`
GROUP BY trading_day;
我希望按日期分组返回结果,类似于:
'trading_day' 'opn_price' 'min_price' 'max_price' 'cls_price'
好的,所以我尝试仅使用一个连接进行“婴儿步骤”......
SELECT
DATE_FORMAT(`cht1`.`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`cht1`.`PRICE`) AS min_price,
MAX(`cht1`.`PRICE`) AS max_price,
`cht2`.`PRICE` AS opn_price
FROM `CHART_DATA` cht1
LEFT JOIN `CHART_DATA` cht2
ON cht2.DTE = MIN(cht1.DTE)
GROUP BY trading_day;
然后我收到消息“无效使用组函数
”当然,删除“GROUP BY”没有帮助,因为我需要返回聚合列。
我有一个非常复杂的解决方案,可以获取打开和关闭结果,但不能获取最小值和最大值 - 并且它们位于单独的结果集中。我感觉我已经让这个变得比必要的更加复杂,并且我可以掌握上面引用的“Q”中引用的自连接发生了什么,我的整体编码将得到不可估量的改进。但上周末我在这上面花了大约 12 个小时,我比以往任何时候都更加困惑。
在这一点上,欢迎所有的见解、解释和观察......
/* SAMPLE TABLE AND DATA */
CREATE TABLE `CHART_DATA` (
`ID` varchar(10) DEFAULT NULL,
`DTE` datetime DEFAULT NULL,
`PRICE` double DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/*Data for the table `CHART_DATA` */
INSERT INTO `chart_data`
(`id`,`dte`,`price`)
VALUES ('1','2011-01-01 00:10:00',0.65),
('2','2011-01-01 06:10:00',0.92),
('3','2011-01-01 12:10:00',0.59),
('4','2011-01-01 18:10:00',0.16),
('5','2011-01-02 00:10:00',0.28),
('6','2011-01-02 06:10:00',0.12),
('7','2011-01-02 12:10:00',0.92),
('8','2011-01-02 18:10:00',0.1),
('9','2011-01-03 00:10:00',0.34),
('10','2011-01-03 06:10:00',0.79),
('11','2011-01-03 12:10:00',1.23),
('12','2011-01-03 18:10:00',1.24),
('13','2011-01-04 00:10:00',1.12),
('14','2011-01-04 06:10:00',0.8),
('15','2011-01-04 12:10:00',0.65),
('16','2011-01-04 18:10:00',0.78),
('17','2011-01-05 00:10:00',0.65),
('18','2011-01-05 06:10:00',1.19),
('19','2011-01-05 12:10:00',0.89),
('20','2011-01-05 18:10:00',1.05),
('21','2011-01-06 00:10:00',0.29),
('22','2011-01-06 06:10:00',0.43),
('23','2011-01-06 12:10:00',0.26),
('24','2011-01-06 18:10:00',0.34),
('25','2011-01-07 00:10:00',0.22),
('26','2011-01-07 06:10:00',0.37),
('27','2011-01-07 12:10:00',1.22),
('28','2011-01-07 18:10:00',1.16),
('29','2011-01-08 00:10:00',0.3),
('30','2011-01-08 06:10:00',1.17),
('31','2011-01-08 12:10:00',0.62),
('32','2011-01-08 18:10:00',0.86),
('33','2011-01-09 00:10:00',0.84),
('34','2011-01-09 06:10:00',1.11),
('35','2011-01-09 12:10:00',0.92),
('36','2011-01-09 18:10:00',1.03),
('37','2011-01-10 00:10:00',1.13),
('38','2011-01-10 06:10:00',0.58),
('39','2011-01-10 12:10:00',1.03),
('40','2011-01-10 18:10:00',0.21),
('41','2011-01-11 00:10:00',0.12),
('42','2011-01-11 06:10:00',1.01),
('43','2011-01-11 12:10:00',0.19),
('44','2011-01-11 18:10:00',1.14),
('45','2011-01-12 00:10:00',0.55),
('46','2011-01-12 06:10:00',0.75),
('47','2011-01-12 12:10:00',0.66),
('48','2011-01-12 18:10:00',1.1),
('49','2011-01-13 00:10:00',0.68),
('50','2011-01-13 06:10:00',0.3),
('51','2011-01-13 12:10:00',0.9),
('52','2011-01-13 18:10:00',0.88),
('53','2011-01-14 00:10:00',0.64),
('54','2011-01-14 06:10:00',1.06),
('55','2011-01-14 12:10:00',1.12),
('56','2011-01-14 18:10:00',0.76),
('57','2011-01-15 00:10:00',0.18),
('58','2011-01-15 06:10:00',1.08),
('59','2011-01-15 12:10:00',0.66),
('60','2011-01-15 18:10:00',0.38),
('61','2011-01-16 00:10:00',1),
('62','2011-01-16 06:10:00',1.18),
('63','2011-01-16 12:10:00',1.15),
('64','2011-01-16 18:10:00',0.58),
('65','2011-01-17 00:10:00',1.04),
('66','2011-01-17 06:10:00',0.81),
('67','2011-01-17 12:10:00',0.35),
('68','2011-01-17 18:10:00',0.91),
('69','2011-01-18 00:10:00',0.14),
('70','2011-01-18 06:10:00',0.13),
('71','2011-01-18 12:10:00',1.03),
('72','2011-01-18 18:10:00',0.16),
('73','2011-01-19 00:10:00',1.05),
('74','2011-01-19 06:10:00',1.13),
('75','2011-01-19 12:10:00',1.21),
('76','2011-01-19 18:10:00',0.34),
('77','2011-01-20 00:10:00',0.63),
('78','2011-01-20 06:10:00',0.62),
('79','2011-01-20 12:10:00',0.19),
('80','2011-01-20 18:10:00',1.21),
('81','2011-01-21 00:10:00',0.83),
('82','2011-01-21 06:10:00',0.99),
('83','2011-01-21 12:10:00',0.83),
('84','2011-01-21 18:10:00',0.21),
('85','2011-01-22 00:10:00',0.8),
('86','2011-01-22 06:10:00',0.69),
('87','2011-01-22 12:10:00',0.87);
SOLVED
All hail StackOverlow!
While I was gone, people left 2 solutions (thanks guys--what is the protocol for handing out karma for two working solutions?)
Here is the solution that I came back to post. it is derived from yet ANOTHER StackOver solution:
...and my adaptation is:
SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price,
SUBSTRING_INDEX(
GROUP_CONCAT(
PRICE
ORDER BY DTE ASC
)
, ',', 1 ) AS opn_price,
SUBSTRING_INDEX(
GROUP_CONCAT(
PRICE
ORDER BY DTE DESC
)
, ',', 1 ) AS cls_price
FROM `CHART_DATA`
GROUP BY trading_day
;
The data the "Q" above starts with is the data with which I am trying to end. Hopefully, this helps someone else since I suspect that my log is fairly common.
I am willing to bet that one of these three solutions has a performance advantage. If anyone happens to know the inner workings of MySQL and query optimization and cares to recommend the "preferred" solution, that will be useful to know in the future.
END SOLVED
UPDATE #2
Tryhing to come at it from yet another direction using this:
http://forums.mysql.com/read.php?65,363723,363723
I get:
SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price,
(SELECT opn_price FROM
(SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS a_day,
PRICE AS opn_price,
MIN(DTE) AS opn
FROM `CHART_DATA`
GROUP BY a_day
ORDER BY opn ASC LIMIT 1) AS tblO) AS opnqt,
(SELECT cls_price FROM
(SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS a_day,
PRICE AS cls_price,
MIN(DTE) AS cls
FROM `CHART_DATA`
GROUP BY a_day
ORDER BY cls DESC LIMIT 1) AS tblC) AS clsqt
FROM `CHART_DATA` cht
GROUP BY trading_day;
This suffers from a similar dysfunction as the query in the first UPDATE below; the 'clsqt' (cls_price) value returned is the last closing price found in the data. Blech.
Plus, we are starting to get into "hideously complex" query space again and that CANNOT be good for performance.
But if anyone sees the fix for the 'clsqt' value, I would accept it gladly and solve the performance issue at a later day. :)
END UPDATE #2
UPDATE
So close...here's where I am today:
SELECT
DATE_FORMAT(`cht1`.`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`cht1`.`PRICE`) AS min_price,
MAX(`cht1`.`PRICE`) AS max_price,
MIN(cht1.DTE) AS opn_date1,
MIN(cht1.DTE) AS opn_date2,
`cht2`.`PRICE` AS opn_price,
MAX(cht1.DTE) AS cls_date1,
MAX(cht3.DTE) AS cls_date3,
`cht3`.`PRICE` AS cls_price
FROM `CHART_DATA` cht1
LEFT JOIN `CHART_DATA` cht2
ON cht2.DTE = cht1.DTE
LEFT JOIN `CHART_DATA` cht3
ON cht3.DTE = cht1.DTE
GROUP BY trading_day
HAVING opn_date1 = opn_date2
AND cls_date1 = cls_date3
;
This retuns everything correctly BUT the correct 'cls_price' (it is returnign the same value for 'cls_price' as 'opn_price').
However, 'cls_date1' and 'cls_date3' are the correct values, so I must be close.
Anyone see what I am not?
END UPDATE
I have been poring over SO with regard to left joins and self joins...and I must admit that I am not grokking.
I found this "Q" that seems very close to what I want: left join with condition for right table in mysql
What I am trying to do is retrieve open, close, min, and max price days from a single table (sample data below).
Min and Max are easy:
SELECT
DATE_FORMAT(`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`PRICE`) AS min_price,
MAX(`PRICE`) AS max_price
FROM `CHART_DATA`
GROUP BY trading_day;
I want the results returned group by date, somthing like:
'trading_day' 'opn_price' 'min_price' 'max_price' 'cls_price'
Okay, so I try 'baby steps' with just one join...
SELECT
DATE_FORMAT(`cht1`.`DTE`, "%m/%d/%Y") AS trading_day,
MIN(`cht1`.`PRICE`) AS min_price,
MAX(`cht1`.`PRICE`) AS max_price,
`cht2`.`PRICE` AS opn_price
FROM `CHART_DATA` cht1
LEFT JOIN `CHART_DATA` cht2
ON cht2.DTE = MIN(cht1.DTE)
GROUP BY trading_day;
...and I get the message "Invalid use of group function"
Of course, removing the "GROUP BY" is no help, since I need to return aggegate columns.
I have a really complex solution that gets the open and close results, but not the min and max--and they are in separate result sets. I get the feeling that I have made this more complex than is necessary and that fi I could just grasp what is going on with the self joins cited in the "Q" referenced above, that my overall coding would improvie immeasurably. But I have spent something like 12 hours on this during the past weekend and am more confusted than ever.
All insight and explantion and observation is welcome at this point...
/* SAMPLE TABLE AND DATA */
CREATE TABLE `CHART_DATA` (
`ID` varchar(10) DEFAULT NULL,
`DTE` datetime DEFAULT NULL,
`PRICE` double DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
/*Data for the table `CHART_DATA` */
INSERT INTO `chart_data`
(`id`,`dte`,`price`)
VALUES ('1','2011-01-01 00:10:00',0.65),
('2','2011-01-01 06:10:00',0.92),
('3','2011-01-01 12:10:00',0.59),
('4','2011-01-01 18:10:00',0.16),
('5','2011-01-02 00:10:00',0.28),
('6','2011-01-02 06:10:00',0.12),
('7','2011-01-02 12:10:00',0.92),
('8','2011-01-02 18:10:00',0.1),
('9','2011-01-03 00:10:00',0.34),
('10','2011-01-03 06:10:00',0.79),
('11','2011-01-03 12:10:00',1.23),
('12','2011-01-03 18:10:00',1.24),
('13','2011-01-04 00:10:00',1.12),
('14','2011-01-04 06:10:00',0.8),
('15','2011-01-04 12:10:00',0.65),
('16','2011-01-04 18:10:00',0.78),
('17','2011-01-05 00:10:00',0.65),
('18','2011-01-05 06:10:00',1.19),
('19','2011-01-05 12:10:00',0.89),
('20','2011-01-05 18:10:00',1.05),
('21','2011-01-06 00:10:00',0.29),
('22','2011-01-06 06:10:00',0.43),
('23','2011-01-06 12:10:00',0.26),
('24','2011-01-06 18:10:00',0.34),
('25','2011-01-07 00:10:00',0.22),
('26','2011-01-07 06:10:00',0.37),
('27','2011-01-07 12:10:00',1.22),
('28','2011-01-07 18:10:00',1.16),
('29','2011-01-08 00:10:00',0.3),
('30','2011-01-08 06:10:00',1.17),
('31','2011-01-08 12:10:00',0.62),
('32','2011-01-08 18:10:00',0.86),
('33','2011-01-09 00:10:00',0.84),
('34','2011-01-09 06:10:00',1.11),
('35','2011-01-09 12:10:00',0.92),
('36','2011-01-09 18:10:00',1.03),
('37','2011-01-10 00:10:00',1.13),
('38','2011-01-10 06:10:00',0.58),
('39','2011-01-10 12:10:00',1.03),
('40','2011-01-10 18:10:00',0.21),
('41','2011-01-11 00:10:00',0.12),
('42','2011-01-11 06:10:00',1.01),
('43','2011-01-11 12:10:00',0.19),
('44','2011-01-11 18:10:00',1.14),
('45','2011-01-12 00:10:00',0.55),
('46','2011-01-12 06:10:00',0.75),
('47','2011-01-12 12:10:00',0.66),
('48','2011-01-12 18:10:00',1.1),
('49','2011-01-13 00:10:00',0.68),
('50','2011-01-13 06:10:00',0.3),
('51','2011-01-13 12:10:00',0.9),
('52','2011-01-13 18:10:00',0.88),
('53','2011-01-14 00:10:00',0.64),
('54','2011-01-14 06:10:00',1.06),
('55','2011-01-14 12:10:00',1.12),
('56','2011-01-14 18:10:00',0.76),
('57','2011-01-15 00:10:00',0.18),
('58','2011-01-15 06:10:00',1.08),
('59','2011-01-15 12:10:00',0.66),
('60','2011-01-15 18:10:00',0.38),
('61','2011-01-16 00:10:00',1),
('62','2011-01-16 06:10:00',1.18),
('63','2011-01-16 12:10:00',1.15),
('64','2011-01-16 18:10:00',0.58),
('65','2011-01-17 00:10:00',1.04),
('66','2011-01-17 06:10:00',0.81),
('67','2011-01-17 12:10:00',0.35),
('68','2011-01-17 18:10:00',0.91),
('69','2011-01-18 00:10:00',0.14),
('70','2011-01-18 06:10:00',0.13),
('71','2011-01-18 12:10:00',1.03),
('72','2011-01-18 18:10:00',0.16),
('73','2011-01-19 00:10:00',1.05),
('74','2011-01-19 06:10:00',1.13),
('75','2011-01-19 12:10:00',1.21),
('76','2011-01-19 18:10:00',0.34),
('77','2011-01-20 00:10:00',0.63),
('78','2011-01-20 06:10:00',0.62),
('79','2011-01-20 12:10:00',0.19),
('80','2011-01-20 18:10:00',1.21),
('81','2011-01-21 00:10:00',0.83),
('82','2011-01-21 06:10:00',0.99),
('83','2011-01-21 12:10:00',0.83),
('84','2011-01-21 18:10:00',0.21),
('85','2011-01-22 00:10:00',0.8),
('86','2011-01-22 06:10:00',0.69),
('87','2011-01-22 12:10:00',0.87);
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
因此,您希望:
C
对应于给定日期组中的所有行,因此您可以使用MAX()
和MIN() 覆盖该组中的行。
C2
对应于给定日期的最后一行。C3
对应于同一天晚于C2
的行。如果没有找到,即C3.*
为NULL,则C2
是当天最新的。这通常被标记为
greatest-n-per-group
查询,并且经常出现在 Stack Overflow 上。这是我针对您的测试数据测试的解决方案,但您可以按照我添加到您的问题中的标签来获取其他解决方案和讨论。编辑:我错过了开盘价和收盘价的要求。现编辑如下。
So you want:
C
to correspond to all rows in the group for a given day, so you can useMAX()
andMIN()
over the rows in that group.C2
to correspond to the last row in a given day.C3
to correspond to a row later thanC2
on the same day. If none is found, i.e.C3.*
is NULL, thenC2
is the latest on that day.This is often labeled a
greatest-n-per-group
query, and it comes up frequently on Stack Overflow. Here's a solution that I tested for your test data, but you can follow the tag I added to your question for other solutions and discussion.edit: I missed the requirement for both opening price and closing price. The following is edited.
注意:如果您的开仓或平仓条目与另一行具有完全相同的日期/时间值(即开仓后或平仓前立即发生的交易),此解决方案可能会出现一些问题。为了解决这个问题,我建议您添加一个序列号,该序列号以保证唯一性并随时间递增的方式存储。如果您这样做,那么您将使用序列值代替
dte
来替换我用作连接的open_date
和close_date
我的示例中的字段。Note: this solution may present some problems if your opening or closing entry has the exact same date/time value as another row (i.e. the transaction that came immediately after opening, or immediately before closing). To address this, I would suggest that you add a sequence number that is stored in a way that guarantees uniqueness, and increasing-with-respect-to-time. If you do this, then you would use the sequence value in the place of
dte
to replace theopen_date
andclose_date
I've used as join fields in my example.