Merge records in SQL with subsequent dates?

Posted on 2024-10-18 09:10:46

I want to merge the following data using sql:

no code area rate startdate enddate  
1  0101 EU   0.1% 20050101  20051231  
2  0101 EU   0.1% 20060101  20061231
3  0101 EU   0.1% 20080101  20081231  
4  0101 EFTA 0.2% 20050101  20051231  
5  0101 EFTA 0.1% 20060101  20061231

In the above example no 1 and 2 should be merged as they have the same code, same rate, have subsequent dates and are from the same area.
No 3 should not be merged as the dates are not subsequent.
4 and 5 should not be merged as they have different rates.

The end result should look as follows:

no code area rate startdate enddate  
1  0101 EU   0.1% 20050101  20061231  
3  0101 EU   0.1% 20080101  20081231  
4  0101 EFTA 0.2% 20050101  20051231  
5  0101 EFTA 0.1% 20060101  20061231

Is there a way to do this using only sql? I am using postgres.

Thanks

○闲身 2024-10-25 09:10:46

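Assuming this is a one-off thing you want to do, and not part of some ongoing production solution, I think the only way to do this is to use a cursor.

I don't know Postgres, but it looks like its cursors work similarly to those in MS-SQL: http://www.postgresql.org/docs/current/static/plpgsql-cursors.html

Cursors let you "loop" through a select statement, placing the values into temporary variables where you can then process them. It is closer to procedural programming.

Performance isn't great, which is why this is fine as a one-off, but there may be a more complex (and more performant) solution if you run into performance problems.

I'd start by creating a cursor to loop through the table, placing code, rate, start date and end date into variables. (If you order your select statement by code, rate and start date, you guarantee that all the records appear in the right order.) For each record, compare the current code and rate to the values in the variables, and if they are the same, update the end date variable.

Use a second table (or table variable) for the results. If the code/rate differ from the variables, then whatever is in the variables is a new row candidate. Write it out, then update the variables with the new data and continue.

In one pass you have created your new table. You can then delete the original and replace it, or store the result elsewhere.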
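The single ordered pass described above can be sketched outside the database to make the logic concrete. This is only an illustration in Python, not PL/pgSQL cursor code: the function name `merge_periods` and the tuple layout are made up for the example, and it assumes the dates have already been parsed into `date` values. It also compares the area and checks that the start date is the day after the previous end date, since the question requires both.

```python
from datetime import date, timedelta

# Sample rows from the question: (no, code, area, rate, startdate, enddate).
rows = [
    (1, "0101", "EU", "0.1%", date(2005, 1, 1), date(2005, 12, 31)),
    (2, "0101", "EU", "0.1%", date(2006, 1, 1), date(2006, 12, 31)),
    (3, "0101", "EU", "0.1%", date(2008, 1, 1), date(2008, 12, 31)),
    (4, "0101", "EFTA", "0.2%", date(2005, 1, 1), date(2005, 12, 31)),
    (5, "0101", "EFTA", "0.1%", date(2006, 1, 1), date(2006, 12, 31)),
]


def merge_periods(rows):
    """Merge adjacent periods in one ordered pass (hypothetical helper)."""
    # Order by code, area, rate, startdate so rows that may merge are adjacent.
    ordered = sorted(rows, key=lambda r: (r[1], r[2], r[3], r[4]))
    result = []
    current = None  # plays the role of the cursor loop's "variables"
    for no, code, area, rate, start, end in ordered:
        continues = (
            current is not None
            and current[1:4] == [code, area, rate]
            and start == current[5] + timedelta(days=1)
        )
        if continues:
            # Same code/area/rate and no gap: extend the pending row.
            current[5] = end
        else:
            # The pending row is complete: write it out and start a new one.
            if current is not None:
                result.append(tuple(current))
            current = [no, code, area, rate, start, end]
    if current is not None:
        result.append(tuple(current))
    return result


merged = merge_periods(rows)
for row in sorted(merged):
    print(row)
```

For the sample data this leaves four rows, with rows 1 and 2 combined into a single period running from 2005-01-01 to 2006-12-31.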


最终幸福 2024-10-25 09:10:46

Maybe I'm missing something, but to me it seems that you want:

SELECT DISTINCT ON (code, area, rate, startdate, enddate)
       no, 
       code, 
       area, 
       rate, 
       startdate, 
       enddate  
FROM your_table


笑梦风尘 2024-10-25 09:10:46

I have faced the same problem. The approach I use is to 'divide' the ordered list into 'groups' of rows having subsequent dates, by marking the first ('original') row of each 'group', which is the one that is not subsequent to its predecessor, and then finding the last row of each 'group'. The end date of that last row is the 'aggregated' end date.

Create an additional table with a new column Subsequent (int). Copy all rows from the original table into it, filling the new column as follows: if the start date of the row immediately follows the end date of the previous row (in the example, the next day), then Subsequent = 1 (this row is subsequent to the previous one); otherwise Subsequent = 0 (this row starts a new sequence). To find the end date of the previous row, use the lag() function partitioned by code, area and rate and ordered by start date. For instance, in the example above, line no. 2 will have Subsequent = 1 and all others Subsequent = 0.
Write a query that selects from the table above only the rows where Subsequent = 0, i.e. the 'original' ones, with their code and rate, and then, for each original row selected, finds the maximum end date among all subsequent rows that follow it.
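As a rough illustration of the marking step, the sketch below computes the Subsequent flag with lag() and then collapses each sequence. To keep it short it numbers the sequences with a running sum over the flag (a pattern often called "gaps and islands") instead of the correlated subqueries this answer goes on to describe, so treat it as a swapped-in variant rather than the answer's exact method. It runs on SQLite through Python's `sqlite3` module purely so that it is self-contained (window functions need SQLite 3.25 or later). The table name `rates`, the column name `row_no` and the ISO-formatted text dates are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; row_no stands in for the question's "no" column.
conn.execute(
    "CREATE TABLE rates (row_no INTEGER, code TEXT, area TEXT, rate TEXT,"
    " startdate TEXT, enddate TEXT)"
)
conn.executemany(
    "INSERT INTO rates VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "0101", "EU", "0.1%", "2005-01-01", "2005-12-31"),
        (2, "0101", "EU", "0.1%", "2006-01-01", "2006-12-31"),
        (3, "0101", "EU", "0.1%", "2008-01-01", "2008-12-31"),
        (4, "0101", "EFTA", "0.2%", "2005-01-01", "2005-12-31"),
        (5, "0101", "EFTA", "0.1%", "2006-01-01", "2006-12-31"),
    ],
)

query = """
WITH marked AS (
    -- subsequent = 1 when the row starts the day after the previous row ends
    SELECT *,
           CASE WHEN date(lag(enddate) OVER (
                         PARTITION BY code, area, rate ORDER BY startdate
                     ), '+1 day') = startdate
                THEN 1 ELSE 0 END AS subsequent
    FROM rates
),
grouped AS (
    -- every 'original' row (subsequent = 0) opens a new group number
    SELECT *,
           SUM(1 - subsequent) OVER (
               PARTITION BY code, area, rate ORDER BY startdate
           ) AS grp
    FROM marked
)
SELECT MIN(row_no), code, area, rate, MIN(startdate), MAX(enddate)
FROM grouped
GROUP BY code, area, rate, grp
ORDER BY 1
"""

merged = conn.execute(query).fetchall()
for row in merged:
    print(row)
```

The two window functions should carry over to Postgres unchanged. Only the date comparison is SQLite-specific: with real `date` columns in Postgres, the condition could be written as `lag(enddate) OVER (...) + 1 = startdate`.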

The maximum end date can be found using a subquery like this:

(A) Try to find it among the subsequent rows following this original one:

select the maximum end date from all rows which meet these criteria:

subsequent is 1, i.e. only subsequent rows
code equals the code of the original row
rate equals the rate of the original row
start date is greater than the start date of the original row
end date is less than the start date of the next original row or, if there is no next original row, end date = max of all end dates over this combination of code and rate

Next original row can be found by another subquery: select all rows where