Merge records with subsequent dates in SQL?
I want to merge the following data using sql:
no  code  area  rate  startdate  enddate
1   0101  EU    0.1%  20050101   20051231
2   0101  EU    0.1%  20060101   20061231
3   0101  EU    0.1%  20080101   20081231
4   0101  EFTA  0.2%  20050101   20051231
5   0101  EFTA  0.1%  20060101   20061231
- In the above example, no 1 and 2 should be merged because they have the same code, the same rate and the same area, and their dates are subsequent (no 2 starts the day after no 1 ends).
- No 3 should not be merged because its dates are not subsequent to no 2.
- No 4 and 5 should not be merged because they have different rates.
The end result should look as follows:
no  code  area  rate  startdate  enddate
1   0101  EU    0.1%  20050101   20061231
3   0101  EU    0.1%  20080101   20081231
4   0101  EFTA  0.2%  20050101   20051231
5   0101  EFTA  0.1%  20060101   20061231
Is there a way to do this using only SQL? I am using Postgres.
Thanks
Comments (3)
Assuming this is a one-off job, and isn't part of some ongoing production solution, I think the only way you'll do this is with a cursor.
I don't know Postgres, but it seems to have cursors which work in a similar way to those in MS SQL Server: http://www.postgresql.org/docs/current/static/plpgsql-cursors.html
Cursors allow you to "loop through" a select statement, putting the values into temporary variables where you can then work on them. It's more akin to procedural programming.
Performance isn't great, which is why it's OK as a one-off thing, but there may be a more complex (but more performant) solution if you have performance issues.
I would start by creating a cursor to loop through the table, putting the code, rate, startdate and enddate into variables. (If you sort the select statement by code, rate and startdate, you'll guarantee that all your records appear in the right order.) For each record you can compare the current code and rate to the values in the variables, and if they're the same, update the end date variable.
Use a second table (or a table variable) for the results. If the code/rate isn't the same as the variables, then everything in the variables is a new row candidate. Write it out, then update the variables with the new data and continue.
In one pass you'll have created your new table. You can then either delete the original one and replace it, or store the result elsewhere.
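The loop described above could be sketched in PL/pgSQL roughly as follows. This is only an illustration under assumptions that are not in the question: the table name `rates`, the results table `merged_rates`, and `date` columns (so that "subsequent" can be tested as `enddate + 1`) are all made up here. It also compares area and checks that the dates are contiguous, since the question's rules require both. A `FOR ... IN SELECT` loop is used, which PL/pgSQL runs through an implicit cursor.

```sql
-- Hypothetical schema and sample data; names and types are assumptions.
CREATE TEMP TABLE rates (
    no int, code text, area text, rate text, startdate date, enddate date
);
INSERT INTO rates VALUES
    (1, '0101', 'EU',   '0.1%', '2005-01-01', '2005-12-31'),
    (2, '0101', 'EU',   '0.1%', '2006-01-01', '2006-12-31'),
    (3, '0101', 'EU',   '0.1%', '2008-01-01', '2008-12-31'),
    (4, '0101', 'EFTA', '0.2%', '2005-01-01', '2005-12-31'),
    (5, '0101', 'EFTA', '0.1%', '2006-01-01', '2006-12-31');

-- Second table that receives the merged rows.
CREATE TEMP TABLE merged_rates (LIKE rates);

DO $$
DECLARE
    r   rates%ROWTYPE;  -- row currently fetched by the loop
    cur rates%ROWTYPE;  -- merged row being built (the "variables")
BEGIN
    FOR r IN
        SELECT * FROM rates ORDER BY code, area, rate, startdate
    LOOP
        IF cur.no IS NOT NULL
           AND r.code = cur.code AND r.area = cur.area AND r.rate = cur.rate
           AND r.startdate = cur.enddate + 1 THEN
            -- Same group and the dates are contiguous: extend the end date.
            cur.enddate := r.enddate;
        ELSE
            -- Group changed or there is a gap: write out the candidate row.
            IF cur.no IS NOT NULL THEN
                INSERT INTO merged_rates
                VALUES (cur.no, cur.code, cur.area, cur.rate,
                        cur.startdate, cur.enddate);
            END IF;
            cur := r;
        END IF;
    END LOOP;
    -- Write out the last candidate row.
    IF cur.no IS NOT NULL THEN
        INSERT INTO merged_rates
        VALUES (cur.no, cur.code, cur.area, cur.rate,
                cur.startdate, cur.enddate);
    END IF;
END
$$;

SELECT * FROM merged_rates ORDER BY no;
```

With the sample data, the final select should return rows 1, 3, 4 and 5, with row 1 running from 2005-01-01 to 2006-12-31.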
Maybe I'm missing something, but to me it seems that you want:
I have faced the same problem. The approach I use is to divide the ordered list into 'groups' of rows having subsequent dates, by marking the first ('original') row of each group, which is the one that is not subsequent to its predecessor, and then finding the last row of each group. The end date of that last row is the 'aggregated' end date.
Create an additional table with a new column Subsequent (int). Copy all rows from the original table into it, filling the new column as follows: if the start date of the row directly follows the end date of the previous row (in the example, it is the next day), then Subsequent = 1 (this row is subsequent to the previous one); otherwise Subsequent = 0 (this row starts a new sequence). To find the end date of the previous row, use the lag() function, partitioned by code and area (and rate, since rows with different rates must not be chained) and ordered by start date. For instance, in the example above, line no. 2 will have Subsequent = 1 and all others Subsequent = 0.
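As a rough sketch of this flagging step, the query below computes the Subsequent column with lag(). It is written as a plain select over inline sample data instead of an additional table, to keep it self-contained. The relation name `rates` and the `date` column types are assumptions, and "directly follows" is taken to mean "starts the day after", which is what the example data suggests. Rate is included in the partition so that lines 4 and 5 are not chained.

```sql
-- Inline sample data; the name "rates" and the DATE types are assumptions.
WITH rates (no, code, area, rate, startdate, enddate) AS (
    VALUES
        (1, '0101', 'EU',   '0.1%', DATE '2005-01-01', DATE '2005-12-31'),
        (2, '0101', 'EU',   '0.1%', DATE '2006-01-01', DATE '2006-12-31'),
        (3, '0101', 'EU',   '0.1%', DATE '2008-01-01', DATE '2008-12-31'),
        (4, '0101', 'EFTA', '0.2%', DATE '2005-01-01', DATE '2005-12-31'),
        (5, '0101', 'EFTA', '0.1%', DATE '2006-01-01', DATE '2006-12-31')
)
SELECT no, code, area, rate, startdate, enddate,
       -- 1 when this row starts the day after the previous row in its
       -- partition ended; 0 otherwise (including the first row, where
       -- lag() yields NULL and the comparison is not true).
       CASE WHEN startdate = (lag(enddate) OVER w) + 1
            THEN 1 ELSE 0
       END AS subsequent
FROM rates
WINDOW w AS (PARTITION BY code, area, rate ORDER BY startdate)
ORDER BY no;
```

On this data only line no. 2 should come out with subsequent = 1.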
Write a query which selects from the above table only the rows where Subsequent = 0, i.e. the 'original' ones, with code and rate, and then, for each original row selected, finds the maximum end date among all subsequent rows that follow it.
The maximum end date can be found using a subquery like this:
(A) Try to find it among subsequent rows following this original one:
select maximum end date from all rows which meet criteria:
Next original row can be found by another subquery: select all rows where
(B) If above returned null, i.e. the original row has no subsequent rows, then end date for this row is its own end date.
The coalesce function makes the choice between (A) and (B) nicely.
The result of the query should be the list you wanted to get in your example.
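Put together, the steps above might look something like the following single query. It is a sketch and not the original answer's code: the relation name `rates`, the `date` column types and the "next day" test are assumptions, and the "next original row" boundary is expressed here with NOT EXISTS (no original row lies between the original row and the candidate) instead of a separate lookup of the next original row.

```sql
-- Inline sample data; the name "rates" and the DATE types are assumptions.
WITH rates (no, code, area, rate, startdate, enddate) AS (
    VALUES
        (1, '0101', 'EU',   '0.1%', DATE '2005-01-01', DATE '2005-12-31'),
        (2, '0101', 'EU',   '0.1%', DATE '2006-01-01', DATE '2006-12-31'),
        (3, '0101', 'EU',   '0.1%', DATE '2008-01-01', DATE '2008-12-31'),
        (4, '0101', 'EFTA', '0.2%', DATE '2005-01-01', DATE '2005-12-31'),
        (5, '0101', 'EFTA', '0.1%', DATE '2006-01-01', DATE '2006-12-31')
),
flagged AS (
    -- Mark each row as subsequent (1) or as the start of a sequence (0).
    SELECT r.*,
           CASE WHEN startdate = (lag(enddate) OVER w) + 1
                THEN 1 ELSE 0
           END AS subsequent
    FROM rates r
    WINDOW w AS (PARTITION BY code, area, rate ORDER BY startdate)
)
SELECT o.no, o.code, o.area, o.rate, o.startdate,
       COALESCE(
           -- (A) latest end date among the subsequent rows that follow this
           --     original row before the next original row of its group
           (SELECT max(s.enddate)
            FROM flagged s
            WHERE s.subsequent = 1
              AND s.code = o.code AND s.area = o.area AND s.rate = o.rate
              AND s.startdate > o.startdate
              AND NOT EXISTS (
                    SELECT 1
                    FROM flagged n
                    WHERE n.subsequent = 0
                      AND n.code = o.code AND n.area = o.area
                      AND n.rate = o.rate
                      AND n.startdate > o.startdate
                      AND n.startdate < s.startdate)),
           -- (B) no subsequent rows: keep the row's own end date
           o.enddate
       ) AS enddate
FROM flagged o
WHERE o.subsequent = 0
ORDER BY o.no;
```

On the sample data this should return rows 1, 3, 4 and 5, with row 1 ending on 2006-12-31. The correlated subqueries keep the query close to the description above; on large tables a running sum over the Subsequent flag followed by GROUP BY would likely scale better.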