Humanize dates with awk?

Posted on 2024-07-26 18:20:29

I have this awk script that runs through a file and counts every occurrence of a given date. The dates in the original file are in the default output format of the date command, like this:

Thu Mar 5 16:46:15 EST 2009

I use awk to throw away the weekday, time, and timezone, and then do my counting by pumping the dates into an associative array with the dates as indices.

In order to get the output sorted by date, I converted the dates to a different format that I could sort with the sort command.
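A minimal sketch of that count-then-sort approach in portable (POSIX) awk, assuming every input line has the shape of the example above (weekday, month, day, time, zone, year). The function name count_dates and the month lookup table are illustrative assumptions, not part of the original script. Zero-padded YYYYMMDD keys sort correctly as plain text, so no date arithmetic is needed for the ordering.

```shell
# Hypothetical helper: count date occurrences, keyed by a sortable YYYYMMDD string.
# Assumes lines like "Thu Mar 5 16:46:15 EST 2009" ($2=month, $3=day, $6=year).
count_dates() {
    awk '
    BEGIN {
        # Build a month-name -> month-number lookup table.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++) mon[names[i]] = i
    }
    {
        # Zero-padded keys sort chronologically as plain text.
        key = sprintf("%04d%02d%02d", $6, mon[$2], $3)
        total[key]++
    }
    END {
        for (key in total) print key, total[key]
    }' "$@" | sort
}

printf '%s\n' \
    'Fri Mar 6 09:00:00 EST 2009' \
    'Thu Mar 5 16:46:15 EST 2009' \
    'Thu Mar 5 18:02:11 EST 2009' | count_dates
```

The keys can then be turned back into readable dates when the counts are printed.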

Now, my output looks like this:

Date    Count
03/05/2009   2
03/06/2009   1
05/13/2009   7
05/22/2009  14
05/23/2009   7
05/25/2009   7
05/29/2009  11
06/02/2009  12
06/03/2009  16

I'd really like the output to have more human-readable dates, like this:

Mar  5, 2009
Mar  6, 2009
May 13, 2009
May 22, 2009
May 23, 2009
May 25, 2009
May 29, 2009
Jun  2, 2009
Jun  3, 2009

Any suggestions for a way I could do this? If I could do this on the fly when I output the count values, that would be best.
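One way to do the conversion on the fly, without gawk's time functions, is a month-name lookup at print time. This is a minimal sketch assuming the current output format shown above (a header line, then MM/DD/YYYY and a count on each line); the function name humanize is hypothetical.

```shell
# Hypothetical filter: rewrite "MM/DD/YYYY  count" lines as "Mon  D, YYYY  count".
# Works with any POSIX awk; no strftime()/mktime() needed.
humanize() {
    awk '
    BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", mon, " ") }
    NR == 1 { print; next }                    # pass the "Date Count" header through
    {
        split($1, d, "/")                      # d[1]=month, d[2]=day, d[3]=year
        # "+ 0" turns "03" into the number 3 so it can index mon[]
        printf "%s %2d, %s %3d\n", mon[d[1] + 0], d[2], d[3], $2
    }' "$@"
}

printf 'Date    Count\n03/05/2009   2\n05/22/2009  14\n' | humanize
```

The %2d conversion space-pads single-digit days, which gives the "Mar  5, 2009" alignment from the desired output.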

UPDATE:
Here's my solution incorporating ghostdog74's example code:

grep -i "E[DS]T 2009" original.txt | awk '{printf "%s %2.d, %s\r\n",$2,$3,$6}' >dates.txt #outputs dates for counting
date -f dates.txt +'%Y %m %d' | awk ' #reformat dates as "YYYY MM DD" for future sort
  {++total[$0]} #pump dates into associative array
  END { 
    for (item in total) printf "%s\t%s\r\n", item, total[item] #output dates as yyyy mm dd with counts
  }' | sort | awk ' #send to sort (fixed-width "YYYY MM DD" keys sort correctly as plain text), then to cleanup
  BEGIN {printf "%s\t%s\r\n","Date","Count"}
  {t=$1" "$2" "$3" 0 0 0" #cleanup using example by ghostdog74
   printf "%s\t%2.d\r\n",strftime("%b %e, %Y",mktime(t)),$4 #%e space-pads the day, e.g. "Mar  5, 2009"
  }'
rm dates.txt

Sorry this looks so messy. I've tried to put clarifying comments in.

Comments (5)

岁吢 2024-08-02 18:20:29

Use awk's sort and date's stdin to greatly simplify the script

date -f - reads dates from stdin, so you can eliminate one pipe to awk and the temporary file. You can also eliminate the pipe to sort by using gawk's asort() array sort, which in turn eliminates another pipe to awk. Also, there's no need for a coprocess.

This script uses date to convert the month names, which would presumably continue to work in other languages (ignoring the time zone and month/day order issues, though).
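To see the time zone caveat concretely: date converts each timestamp into the time zone in $TZ before formatting it, so a late-evening EST entry can be counted on the next day. A minimal sketch assuming GNU coreutils date (for -f -); the US Eastern POSIX rule string is an assumption about where the log was written.

```shell
# GNU date: "-f -" reads one timestamp per line from stdin and prints it in $TZ.
lines='Thu Mar 5 16:46:15 EST 2009
Thu Mar 5 22:30:00 EST 2009'

# Converted to UTC, the 22:30 EST entry moves to the next day.
printf '%s\n' "$lines" | TZ=UTC0 date -f - +'%Y %m %d'

# Pinned to US Eastern time (POSIX TZ rule string), both stay on March 5.
printf '%s\n' "$lines" | TZ='EST5EDT,M3.2.0,M11.1.0' date -f - +'%Y %m %d'
```

Setting TZ explicitly on the date command makes the per-day counts reproducible regardless of the machine's local zone.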

The end result looks like "grep|date|awk". I have broken it into separate lines for readability (it would be about half as big if the comments were eliminated):

grep -i "E[DS]T 2009" original.txt |
date -f - +'%Y %m %d' | #reformat dates as "YYYY MM DD" for future sort
awk '
BEGIN { printf "%s\t%s\r\n","Date","Count" }

{ ++total[$0] } #pump dates into associative array

END {
    idx=1
    for (item in total) {
        d[idx]=item;idx++ # copy the array indices into the contents of a new array
    }
    c=asort(d) # sort the contents of the copy (asort is a gawk extension)
    for (i=1;i<=c;i++) { # use the contents of the copy to index into the original
        printf "%s\t%2.d\r\n",strftime("%b %e, %Y",mktime(d[i]" 0 0 0")),total[d[i]]
    }
}'
南城追梦 2024-08-02 18:20:29

I get testy when I see someone using grep and awk (and sed, cut, ...) in a pipeline. Awk can fully handle the work of many utilities.

Here's a way to clean up your updated code to run in a single instance of awk (well, gawk), and using sort as a co-process:

gawk '
    BEGIN {
        IGNORECASE = 1
    }
    function mon2num(mon) {
        return(((index("JanFebMarAprMayJunJulAugSepOctNovDec", mon)-1)/3)+1)
    }
    / E[DS]T [[:digit:]][[:digit:]][[:digit:]][[:digit:]]/ {
        month=$2
        day=$3
        year=$6
        date=sprintf("%4d%02d%02d", year, mon2num(month), day)
        total[date]++
        human[date] = sprintf("%3s %2d, %4d", month, day, year)
    }
    END {
        sort_coprocess = "sort"
        for (date in total) {
            print date |& sort_coprocess
        }
        close(sort_coprocess, "to")
        print "Date\tCount"
        while ((sort_coprocess |& getline date) > 0) {
            print human[date] "\t" total[date]
        }
        close(sort_coprocess)
    }
' original.txt
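The mon2num() trick works because each abbreviation occupies three characters of the lookup string, so index() returns 1, 4, 7, ..., 34 and (pos - 1) / 3 + 1 maps those to 1 through 12. A quick check of the same arithmetic in portable awk (without IGNORECASE, so names must be capitalized as in the table); the wrapper name check_mon2num is hypothetical.

```shell
# Same arithmetic as mon2num() above, checked for every month.
check_mon2num() {
    awk '
    function mon2num(mon) {
        return ((index("JanFebMarAprMayJunJulAugSepOctNovDec", mon) - 1) / 3) + 1
    }
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++)
            printf "%s=%d%s", names[i], mon2num(names[i]), (i < n ? " " : "\n")
        # An unknown name gives index() == 0, i.e. 2/3, which %d truncates to 0.
        printf "Foo=%d\n", mon2num("Foo")
    }'
}

check_mon2num
```

Since an unrecognized month silently becomes 0 (and "%02d" would print "00"), it may be worth rejecting such lines in real data.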
坚持沉默 2024-08-02 18:20:29

If you are using gawk:

gawk 'BEGIN{
    s="03/05/2009"
    m=split(s,date,"/")                     # date[1]=month, date[2]=day, date[3]=year
    t=date[3]" "date[1]" "date[2]" 0 0 0"   # mktime() expects "YYYY MM DD HH MM SS"
    print strftime("%b %e, %Y",mktime(t))
}'

The above is just an example; since you did not show your actual code, I can't incorporate it into your code.

锦欢 2024-08-02 18:20:29

Why not prepend your sortable awk-generated date to the original, human-readable date? That gives you a sortable key while keeping the readable date on the same line.

(Note: to sort correctly, the key should be in yyyymmdd order.)

If needed, cut can remove the prepended column.
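A minimal sketch of that suggestion, assuming input lines shaped like the question's example: awk prints a tab-separated sortable key, the human-readable date and the count; sort orders the lines on the key; cut -f2- drops it again. The function name daily_counts is hypothetical, and a month lookup table stands in for gawk-only time functions.

```shell
# Hypothetical pipeline: sortable key + readable date + count, then drop the key.
daily_counts() {
    awk '
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++) mon[names[i]] = i
    }
    { total[$6 " " $2 " " $3]++ }              # count by "year month day" as written
    END {
        for (k in total) {
            split(k, p, " ")                   # p[1]=year, p[2]=month name, p[3]=day
            printf "%04d%02d%02d\t%s %2d, %s\t%d\n", p[1], mon[p[2]], p[3], p[2], p[3], p[1], total[k]
        }
    }' "$@" | sort | cut -f2-
}

printf '%s\n' \
    'Tue Jun 2 10:00:00 EDT 2009' \
    'Thu Mar 5 16:46:15 EST 2009' \
    'Thu Mar 5 18:02:11 EST 2009' | daily_counts
```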

默嘫て 2024-08-02 18:20:29

Gawk has strftime(). You can also call the date command to format the dates (see man date). Linux Forums gives some examples.
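Calling date from inside awk works through awk's "command" | getline form. A minimal sketch assuming GNU date (for -d) and input lines of the form "MM/DD/YYYY count"; the function name humanize_with_date is hypothetical. It starts one date process per line and passes the field to the shell unquoted beyond simple double quotes, so it suits small, trusted inputs better than strftime() does on large ones.

```shell
# Hypothetical filter: let GNU date format each MM/DD/YYYY field from inside awk.
humanize_with_date() {
    LC_ALL=C awk '
    {
        cmd = "date -d \"" $1 "\" +\"%b %e, %Y\""  # e.g. date -d "03/05/2009" +"%b %e, %Y"
        cmd | getline pretty                        # read the formatted date
        close(cmd)                                  # one process per line; close it
        printf "%s\t%s\n", pretty, $2
    }' "$@"
}

printf '03/05/2009 2\n06/03/2009 16\n' | humanize_with_date
```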
