How to generate a sequence of dates between given start and end dates using AWK or a Bash script?

Posted 2024-10-06 03:34:11

I have a data set in the following format:

The first and second fields denote the dates (M/D/YYYY) of starting and ending of a study.

How can one expand the data into the desired output format using AWK or Bash scripts, taking leap years into account?

Your help is very much appreciated.

Input

  7/2/2009   7/7/2009
  2/28/1996  3/3/1996
  12/30/2001 1/4/2002

Desired Output

中性美 2024-10-13 03:34:12

I prefer ISO 8601 dates, so here is a solution that uses them.
You can adapt it to American (M/D/YYYY) format easily enough if you wish.
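One way to do that adaptation, sketched here under the assumption that the input is exactly the question's "M/D/YYYY M/D/YYYY" layout, is to convert each line to zero-padded ISO 8601 before it reaches dates.awk. The shell function name us_to_iso is hypothetical and not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "M/D/YYYY M/D/YYYY" lines on stdin and
# write zero-padded ISO 8601 "YYYY-MM-DD YYYY-MM-DD" lines on stdout.
us_to_iso() {
    awk '{
        split($1, s, "/")   # s[1] = month, s[2] = day, s[3] = year
        split($2, e, "/")
        printf "%04d-%02d-%02d %04d-%02d-%02d\n", s[3], s[1], s[2], e[3], e[1], e[2]
    }'
}

# Usage, assuming dates.awk from this answer is in the current directory:
#   us_to_iso < input.txt | awk -f dates.awk
```

The zero padding matters: dates.awk compares dates as strings and builds them with "%04d-%02d-%02d", so an unpadded start date would never match and the loop would not stop.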

AWK Script

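BEGIN {
    # Days per month in a non-leap year
    days[ 1] = 31; days[ 2] = 28; days[ 3] = 31;
    days[ 4] = 30; days[ 5] = 31; days[ 6] = 30;
    days[ 7] = 31; days[ 8] = 31; days[ 9] = 30;
    days[10] = 31; days[11] = 30; days[12] = 31;
}
# Gregorian rule: divisible by 4, except centuries not divisible by 400
function leap(y){
    return ((y % 4) == 0 && (y % 100 != 0 || y % 400 == 0));
}
# Last day of month m; l is 1 in a leap year, otherwise 0
function last(m, l,  d){
    d = days[m] + (m == 2) * l;
    return d;
}
# Return the day before an ISO 8601 date (YYYY-MM-DD)
function prev_day(date,   y, m, d){
    y = substr(date, 1, 4)
    m = substr(date, 6, 2)
    d = substr(date, 9, 2)
    if (d+0 == 1 && m+0 == 1){      # 1 January: go back to 31 December of the previous year
        d = 31; m = 12; y--;
    }
    else if (d+0 == 1){             # first of the month: go back to the last day of the previous month
        m--; d = last(m, leap(y));
    }
    else
        d--
    return sprintf("%04d-%02d-%02d", y, m, d);
}
# For each "start end" line (both zero-padded YYYY-MM-DD), print end back to start
{
    d1 = $1; d2 = $2;
    if (d1 > d2) next;              # start after end would never be reached: skip the line
    print d2;
    while (d2 != d1){
        d2 = prev_day(d2);
        print d2;
    }
}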

Call this file: dates.awk

Data

2009-07-02 2009-07-07
1996-02-28 1996-03-03
2001-12-30 2002-01-04

Call this file: dates.txt

Results

Command executed:

awk -f dates.awk dates.txt

Output:

辞旧 2024-10-13 03:34:12

You can convert the dates to Unix timestamps and then generate the sequence on those; you can even go down to nanosecond granularity if you want (with '%N' in date).

The following example prints times from 2020-11-07 00:00:00 to 2020-11-07 01:00:00 in 5-minute intervals:

# Seconds since 1970-01-01 00:00:00 UTC, with each timestamp read as UTC
# To read the timestamps in your own time zone, change the TZ value, e.g. TZ="Asia/Kolkata"

start_time=$(date -u -d 'TZ="UTC" 2020-11-07 00:00:00' '+%s')
end_time=$(date -u -d 'TZ="UTC" 2020-11-07 01:00:00' '+%s')


# 60 seconds * 5 (i.e. 5 minutes)
# Change the interval to suit your needs, or omit it (seq then steps by 1) to show every second

interval=$((60 * 5))


# Generate the sequence of epoch seconds and convert each one back to a UTC timestamp
# (to print in another time zone, drop -u and set the TZ environment variable)

seq "${start_time}" "${interval}" "${end_time}" |
xargs -I{} date -u -d '@{}' '+%F %T'
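Applied to the question's data, the same epoch technique only needs a one-day interval (86400 seconds). The sketch below assumes GNU date (for -d, the @SECONDS form, and the %-m/%-d flags that drop leading zeros) and uses -u on both conversions so that every step is exactly one UTC day; leap days then come out of date itself. The function name epoch_day_range is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every date from START to END (inclusive),
# oldest first, by stepping through UTC epoch seconds one day at a time.
# Requires GNU date (for -d, @SECONDS and the %-m / %-d flags).
epoch_day_range() {
    local start=$1 end=$2 step=$((60 * 60 * 24)) t
    local start_s end_s
    start_s=$(date -u -d "$start" '+%s')   # midnight UTC of the start date
    end_s=$(date -u -d "$end" '+%s')       # midnight UTC of the end date
    for t in $(seq "$start_s" "$step" "$end_s"); do
        date -u -d "@$t" '+%-m/%-d/%Y'
    done
}

epoch_day_range 2/28/1996 3/3/1996
# prints 2/28/1996, 2/29/1996, 3/1/1996, 3/2/1996, 3/3/1996 (one per line)
```

Swapping the seq arguments and using a step of -86400 gives the newest-first order instead.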

浮光之海 2024-10-13 03:34:11

It can be done nicely with bash alone (plus GNU date, for the -d option):

# Prints 2017-12-02 through 2017-12-06 (offsets 1 to 5 from 2017-12-01)
for i in $(seq 1 5)
do
  date -d "2017-12-01 $i days" +%Y-%m-%d
done

or with pipes:

seq 1 5 | xargs -I {} date -d "2017-12-01 {} days" +%Y-%m-%d
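Both versions need the number of days up front. A sketch of an open-ended variant (not from the original answer) keeps adding one more day with the same "DATE N days" syntax until the end date comes out. It assumes GNU date and zero-padded YYYY-MM-DD input, since the stop test compares strings; the function name day_range is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print START..END inclusive (zero-padded YYYY-MM-DD),
# using GNU date's relative "N days" syntax, so the day count is not needed.
day_range() {
    local start=$1 end=$2 i=0 d
    # ISO dates sort as strings; refuse END before START (it would never match).
    if [[ $end < $start ]]; then
        return 1
    fi
    while :; do
        # TZ=UTC sidesteps local time-zone quirks around midnight.
        d=$(TZ=UTC date -d "$start $i days" +%Y-%m-%d)
        echo "$d"
        [[ $d == "$end" ]] && break
        i=$((i + 1))
    done
}

day_range 1996-02-28 1996-03-01
# prints 1996-02-28, 1996-02-29, 1996-03-01 (one per line)
```

Each date is computed from the fixed start plus an offset, so leap days and month or year boundaries are handled by date rather than by the script.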

oО清风挽发oО 2024-10-13 03:34:11

If you have gawk:

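#!/usr/bin/gawk -f
{
    split($1,s,"/")    # s[1] = month, s[2] = day, s[3] = year
    split($2,e,"/")
    st=mktime(s[3] " " s[1] " " s[2] " 0 0 0")   # local midnight of the start date
    et=mktime(e[3] " " e[1] " " e[2] " 0 0 0")   # local midnight of the end date
    for (i=et;i>=st;i-=60*60*24) print strftime("%m/%d/%Y",i)
}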

Demonstration:

./daterange.awk inputfile

Output:

Edit:

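The script above makes a naive assumption: that every day is exactly 86400 seconds (60*60*24) long. Across a daylight saving time change a local day lasts 23 or 25 hours, so stepping back from midnight in fixed 86400-second steps can skip a date. It's a minor nit, but it can produce unexpected results under those circumstances. At least one other answer here has the same problem. Presumably, the date command's arithmetic for subtracting (or adding) a number of days doesn't have this issue.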
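To make that concrete, here is a bash sketch (not part of the original answer) that replays the gawk loop's arithmetic: start at the end date's local midnight and subtract 86400 seconds while still at or after the start date. It assumes GNU date and uses a POSIX TZ string for US Eastern time (DST from the second Sunday in March to the first Sunday in November), under which DST began on 2009-03-08, so that local day had only 23 hours. The names naive_range and DST_TZ are hypothetical.

```shell
#!/usr/bin/env bash
# US Eastern time as a POSIX TZ rule: EST (UTC-5) / EDT (UTC-4),
# DST from the 2nd Sunday in March to the 1st Sunday in November.
DST_TZ='EST5EDT,M3.2.0,M11.1.0'

# Hypothetical helper mirroring the gawk loop: step back from END's local
# midnight in fixed 86400-second "days" while still at or after START.
naive_range() {
    local st et i
    st=$(TZ=$DST_TZ date -d "$1" +%s)
    et=$(TZ=$DST_TZ date -d "$2" +%s)
    for ((i = et; i >= st; i -= 86400)); do
        TZ=$DST_TZ date -d "@$i" +%m/%d/%Y
    done
}

naive_range 2009-03-07 2009-03-10
# prints 03/10/2009, 03/09/2009, 03/07/2009: 03/08/2009 is skipped
```

The third step lands at 23:00 on 03/07 rather than at midnight on 03/08, because 03/08 was one hour short.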

Some answers require you to know the number of days in advance.

Here's another method which hopefully addresses those concerns:

while read -r d1 d2
do
    t1=$(date -d "$d1 12:00 PM" +%s)
    t2=$(date -d "$d2 12:00 PM" +%s)
    if ((t2 > t1)) # swap times/dates if needed
    then
        temp_t=$t1; temp_d=$d1
        t1=$t2;     d1=$d2
        t2=$temp_t; d2=$temp_d
    fi
    t3=$t1
    days=0
    while ((t3 > t2))
    do
        read -r -u 3 d3 t3 3<<< "$(date -d "$d1 12:00 PM - $days days" '+%m/%d/%Y %s')"
        ((++days))
        echo "$d3"
    done
done < inputfile

來不及說愛妳 2024-10-13 03:34:11

You can do this in the shell without awk, assuming you have GNU date (which is needed for the date -d @nnn form, and possibly for stripping leading zeros from single-digit days and months with %-m and %-d):

while read -r start end ; do
    for d in $(seq "$(date +%s -d "$end")" -86400 "$(date +%s -d "$start")") ; do
        date +%-m/%-d/%Y -d "@$d"
    done
done

If your time zone observes daylight saving time, this can go wrong when a DST switch falls inside the requested date range. Use -u to force UTC, which strictly observes 86400 seconds per day, like this:

while read -r start end ; do
    for d in $(seq "$(date -u +%s -d "$end")" -86400 "$(date -u +%s -d "$start")") ; do
        date -u +%-m/%-d/%Y -d "@$d"
    done
done

Just feed this your input on stdin.

The output for your data is:

7/7/2009
7/6/2009
7/5/2009
7/4/2009
7/3/2009
7/2/2009
3/3/1996
3/2/1996
3/1/1996
2/29/1996
2/28/1996
1/4/2002
1/3/2002
1/2/2002
1/1/2002
12/31/2001
12/30/2001
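The point that -u keeps every day at exactly 86400 seconds can be checked directly. The sketch below assumes GNU date and a POSIX TZ string for US Eastern time (DST began 2009-03-08), and measures the gap between two consecutive midnights with and without -u. The names day_seconds and DST_TZ are hypothetical.

```shell
#!/usr/bin/env bash
# US Eastern time as a POSIX TZ rule: EST (UTC-5) / EDT (UTC-4),
# DST from the 2nd Sunday in March to the 1st Sunday in November.
DST_TZ='EST5EDT,M3.2.0,M11.1.0'

# Hypothetical helper: seconds between the midnights that start two dates.
# An optional third argument (e.g. -u) is passed straight through to date.
day_seconds() {
    local a b
    a=$(TZ=$DST_TZ date ${3:+"$3"} -d "$1" +%s)
    b=$(TZ=$DST_TZ date ${3:+"$3"} -d "$2" +%s)
    echo $((b - a))
}

day_seconds 2009-03-08 2009-03-09      # local time: 82800 (a 23-hour day)
day_seconds 2009-03-08 2009-03-09 -u   # UTC: 86400
```

In local time, a -86400 step across that Sunday would land an hour away from midnight, which is how the first loop can go wrong; in UTC it cannot.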

琴流音 2024-10-13 03:34:11

Another option is to use dateseq from dateutils (http://www.fresse.org/dateutils/#dateseq). The -i option sets the input format and -f sets the output format. When the first date is later than the second, an increment of -1 must be specified.

$ dateseq -i %m/%d/%Y -f %m/%d/%Y 7/7/2009 -1 7/2/2009
07/07/2009
07/06/2009
07/05/2009
07/04/2009
07/03/2009
07/02/2009
$ dateseq 2017-04-01 2017-04-05
2017-04-01
2017-04-02
2017-04-03
2017-04-04
2017-04-05
