SQL：匹配日期范围

发布于 2025-02-14 00:38:43 字数 2220 浏览 0 评论 0原文

我正在寻找随时间变化并由和 date字段定义的映射时，正在寻找最佳实践。例如，以此示例为例 postgres db db 使用员工信息。

它在员工和薪水之间有一个映射，可在不同的日期范围内有效：

> SELECT * FROM salaries LIMIT 5

|   emp_no |   salary | from_date   | to_date    |
|---------:|---------:|:------------|:-----------|
|    10001 |    60117 | 1986-06-26  | 1987-06-26 |
|    10001 |    62102 | 1987-06-26  | 1988-06-25 |
|    10001 |    66074 | 1988-06-25  | 1989-06-25 |
|    10001 |    66596 | 1989-06-25  | 1990-06-25 |
|    10001 |    66961 | 1990-06-25  | 1991-06-25 |

以及员工和头衔之间的映射：

> SELECT * FROM titles LIMIT 5

|   emp_no | title           | from_date   | to_date    |
|---------:|:----------------|:------------|:-----------|
|    10001 | Senior Engineer | 1986-06-26  | 9999-01-01 |
|    10002 | Staff           | 1996-08-03  | 9999-01-01 |
|    10003 | Senior Engineer | 1995-12-03  | 9999-01-01 |
|    10004 | Engineer        | 1986-12-01  | 1995-12-01 |
|    10004 | Senior Engineer | 1995-12-01  | 9999-01-01 |

将标题标题映射到薪水的最佳方法是什么，以便例如计算每个标题的平均薪水。请注意，雇员的薪水可能不会改变标题更改，反之亦然。

我目前的解决方案是将这些表与包含所有可能日期的表交叉加入：

WITH dates AS (
    SELECT date_trunc('day', dd):: date AS the_date
FROM generate_series
        ( '1985-01-01'::timestamp 
        , '2003-01-01'::timestamp
        , '1 day'::interval) dd
),
daily_salaries AS (
    SELECT
        emp_no,
        salary,
        the_date
    FROM salaries t
    CROSS JOIN dates d
    WHERE the_date BETWEEN from_date AND to_date
),
daily_titles AS (
    SELECT
        emp_no,
        title,
        the_date
    FROM titles t
    CROSS JOIN dates d
    WHERE the_date BETWEEN from_date AND to_date
)
SELECT
    title,
    AVG(salary) AS avg_salary
FROM daily_salaries
INNER JOIN daily_titles USING (emp_no, the_date)
GROUP BY 1
ORDER BY 2 DESC

但这似乎非常低效。做这件事的更好方法是什么？

编辑：正如卢克（Luuk）在答案中提到的那样，上面查询的问题是，十字架联接将产生大量记录。问题的症结在于，如何避免在每位员工的数天内扩大薪水和头衔，同时仍将员工的薪水与他们拥有头衔的时期相关联。

一般的问题是如何在开始和结束时定义的任意时间段，而无需每天在时间间隔内处理（如果可能）。

原文

I'm looking for best practices when dealing with mappings that vary in time and are defined by from and to date fields. For example, take this sample Postgres DB with employees info.

It has a mapping between employees and salaries that is valid for different date ranges:

> SELECT * FROM salaries LIMIT 5

|   emp_no |   salary | from_date   | to_date    |
|---------:|---------:|:------------|:-----------|
|    10001 |    60117 | 1986-06-26  | 1987-06-26 |
|    10001 |    62102 | 1987-06-26  | 1988-06-25 |
|    10001 |    66074 | 1988-06-25  | 1989-06-25 |
|    10001 |    66596 | 1989-06-25  | 1990-06-25 |
|    10001 |    66961 | 1990-06-25  | 1991-06-25 |

And a mapping between employees and titles:

> SELECT * FROM titles LIMIT 5

|   emp_no | title           | from_date   | to_date    |
|---------:|:----------------|:------------|:-----------|
|    10001 | Senior Engineer | 1986-06-26  | 9999-01-01 |
|    10002 | Staff           | 1996-08-03  | 9999-01-01 |
|    10003 | Senior Engineer | 1995-12-03  | 9999-01-01 |
|    10004 | Engineer        | 1986-12-01  | 1995-12-01 |
|    10004 | Senior Engineer | 1995-12-01  | 9999-01-01 |

What's the best way to map titles to salaries so I can, for instance, calculate average salaries per title. Note that an employee's salary may change without a title change and vice versa.

My current solution is to CROSS JOIN these tables with a table containing all possible dates if interest:

WITH dates AS (
    SELECT date_trunc('day', dd):: date AS the_date
FROM generate_series
        ( '1985-01-01'::timestamp 
        , '2003-01-01'::timestamp
        , '1 day'::interval) dd
),
daily_salaries AS (
    SELECT
        emp_no,
        salary,
        the_date
    FROM salaries t
    CROSS JOIN dates d
    WHERE the_date BETWEEN from_date AND to_date
),
daily_titles AS (
    SELECT
        emp_no,
        title,
        the_date
    FROM titles t
    CROSS JOIN dates d
    WHERE the_date BETWEEN from_date AND to_date
)
SELECT
    title,
    AVG(salary) AS avg_salary
FROM daily_salaries
INNER JOIN daily_titles USING (emp_no, the_date)
GROUP BY 1
ORDER BY 2 DESC

But this seems terribly inefficient. What's a better way to accomplish this?

Edit: As Luuk mentioned in his answer, the problem with the query above is that the CROSS JOIN will generate a very large number of records. The crux of the question is how to avoid expanding salaries and titles over days for each employee while still associating an employee's salary to only the period in which they held a title.

The general question is how to merge over arbitrary time periods defined by a start and end without dealing with every day in the intervals, if at all possible.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

橙味迷妹 2025-02-21 00:38:44

有点不清楚为什么您从2844047记录中选择（从薪水中选择计数（*）），然后将结果数乘以365（现有年度的每天），从而导致1038077155+记录的临时结果（+，因为有些一年的时间超过365天...）。最后，您正在计算avg_salary，在当年的365天中，这总是相同的。

我确实认为无需为所有单个日期生成_eries，并且此查询应给出正确的结果：

select 
   t2.title, 
   avg(avg_salary) 
from (select 
         emp_no, 
         avg(salary) avg_salary 
      from salaries 
      group by emp_no) x 
inner join titles t2 on t2.emp_no=x.emp_no 
group by 1 order by 2 desc;

在上述查询中，我没有从_date和/或to_date中选择，因为这将作为您的待办事项

It is, a bit unclear why you are selecting from 2844047 records (select count(*) from salaries), and multiplying the number of results by 365 (for every day in the existing year), resulting in a temporary result of 1038077155+ records (+, because some year have more than 365 days...). Finally you are calculating the avg_salary, which will always be the same for all of the 365 days in that year.

I do think there is no need to generate_series for all individual dates, and this query should give the correct results:

select 
   t2.title, 
   avg(avg_salary) 
from (select 
         emp_no, 
         avg(salary) avg_salary 
      from salaries 
      group by emp_no) x 
inner join titles t2 on t2.emp_no=x.emp_no 
group by 1 order by 2 desc;

In above query I did not select from_date and/or to_date, because that will be left as a TODO for you ????

Results of above query:

       title        |        avg
--------------------+--------------------
 Senior Staff       | 69119.550582564534
 Staff              | 66956.829691848575
 Manager            | 66044.384223847367
 Senior Engineer    | 59144.768351940127
 Engineer           | 57244.458456267581
 Technique Leader   | 57034.814130272188
 Assistant Engineer | 56963.530432485845
(7 rows)

EDIT:

First you need to get the salary of an employee with the correct starting date and ending date, this deals with changing functions somewhere during the year

select 
    s.emp_no, 
    s.salary ,
    t.title,
    case when t.from_date >=s.from_date  then t.from_date else s.from_date end StartSal,
    case when t.from_date >=s.from_date  then s.to_date else case when t.to_date>s.to_date then s.to_date else t.to_date end end EndSal
from salaries s 
left join titles t on s.emp_no =t.emp_no and s.from_date <t.to_date  and s.to_date >t.from_date 
-- where s.emp_no =10005 and extract(year from s.from_date) in(1995,1996,1997)
order by s.emp_no, StartSal

example output (with the WHERE enabled):

emp_no|salary|title       |startsal  |endsal    |
------+------+------------+----------+----------+
 10005| 88448|Staff       |1995-09-11|1996-09-10|
 10005| 88063|Staff       |1996-09-10|1996-09-12|
 10005| 88063|Senior Staff|1996-09-12|1997-09-10|
 10005| 89724|Senior Staff|1997-09-10|1998-09-10|

When you want to know the average salary on 1996-01-01, you can do:

select title, round(avg(salary),2)
from (
select 
    s.emp_no, 
    s.salary ,
    t.title,
    case when t.from_date >=s.from_date  then t.from_date else s.from_date end StartSal,
    case when t.from_date >=s.from_date  then s.to_date else case when t.to_date>s.to_date then s.to_date else t.to_date end end EndSal
from salaries s 
left join titles t on s.emp_no =t.emp_no and s.from_date <t.to_date  and s.to_date >t.from_date 
order by s.emp_no, StartSal
) sal
where '1996-01-01'::date between StartSal and EndSal
group by title;

output:

title             |round   |
------------------+--------+
Engineer          |55341.77|
Senior Engineer   |61998.56|
Manager           |67784.00|
Assistant Engineer|54692.42|
Staff             |64678.06|
Senior Staff      |72139.55|
Technique Leader  |58156.86|

P.S. There is one small error in this, because the to_date is the same as the start_date. I did not handle this in this script.

回复收藏 0 原文

~没有更多了~