Shell script to number lines in a file

Posted 2024-07-09 08:11:01

I need to find a faster way to number lines in a file in a specific way using tools like awk and sed. I need the first character on each line to be numbered in this fashion: 1,2,3,1,2,3,1,2,3 etc.

For example, if the input was this:

line 1
line 2
line 3
line 4
line 5
line 6
line 7

The output needs to look like this:

1line 1
2line 2
3line 3
1line 4
2line 5
3line 6
1line 7

Here is a chunk of what I have. $lines is the number of lines in the data file divided by 3. So for a file of 21000 lines I process this loop 7000 times.

export i=0
while [ $i -le $lines ]
do
    export start=`expr $i \* 3 + 1`
    export end=`expr $start + 2`
    awk NR==$start,NR==$end $1 | awk '{printf("%d%s\n", NR,$0)}' >> data.out
    export i=`expr $i + 1`
done

Basically this grabs 3 lines at a time, numbers them, and adds to an output file. It's slow...and then some! I don't know of another, faster, way to do this...any thoughts?

情场扛把子 2024-07-16 08:11:01

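Try the nl command.

See https://linux.die.net/man/1/nl (or another link to the documentation that comes up when you Google "man nl", or the text version that comes up when you run man nl at a shell prompt).

The nl utility reads lines from the named file, or from the standard input if the file argument is omitted, applies a configurable line numbering filter operation and writes the result to the standard output.

Edit: No, that's wrong, my apologies. The nl command has no option to restart numbering every n lines; it only has an option to restart numbering after it finds a pattern. I'll make this a community wiki answer because it might help someone to know about nl.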

昨迟人 2024-07-16 08:11:01

It's slow because you are reading the same lines over and over. Also, you are starting up an awk process only to shut it down and start another one. Better to do the whole thing in one shot:

awk '{print ((NR-1)%3)+1 $0}' "$1" > data.out

If you prefer to have a space after the number:

awk '{print ((NR-1)%3)+1, $0}' "$1" > data.out
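
The same single pass can be generalized to any cycle length. The following is a minimal sketch, not part of the original answer: the function name `number_cycle` and the awk variable `n` are hypothetical names chosen for illustration.

```shell
# number_cycle N [FILE...]: prefix each line with 1..N, repeating.
# The function name and the awk variable n are illustrative assumptions.
number_cycle() {
    n=$1
    shift
    # One awk process reads the input exactly once; NR is the current
    # line number, so (NR - 1) % n + 1 cycles through 1..n.
    awk -v n="$n" '{ printf "%d%s\n", (NR - 1) % n + 1, $0 }' "$@"
}

# Example use (file names are placeholders):
# number_cycle 3 data.in > data.out
```

With no file arguments the function reads standard input, so it can also sit at the end of a pipeline.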

路还长，别太狂 2024-07-16 08:11:01

Perl comes to mind:

perl -pe '$_ = (($.-1)%3)+1 . $_'

should work. No doubt there is an awk equivalent. Basically, ((line# - 1) MOD 3) + 1.
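
To see why the formula cycles, the arithmetic for the first seven line numbers can be sketched in shell. The helper name `cycle_pos` is hypothetical and exists only for this illustration.

```shell
# cycle_pos LINE_NUMBER: print ((line# - 1) MOD 3) + 1.
# Subtracting 1 first maps lines 1,2,3 to remainders 0,1,2;
# adding 1 afterwards shifts them back to 1,2,3.
cycle_pos() {
    echo $(( ($1 - 1) % 3 + 1 ))
}

for line in 1 2 3 4 5 6 7; do
    printf '%s -> %s\n' "$line" "$(cycle_pos "$line")"
done
```

A plain `line# MOD 3` would produce 1,2,0 instead of 1,2,3, which is why the subtraction and addition are needed.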

深巷少女 2024-07-16 08:11:01

This might work for you:

 sed 's/^/1/;n;s/^/2/;n;s/^/3/' input
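
A short sketch of how that script walks the input, wrapped in a function whose name, `sed_cycle`, is an assumption made for illustration. The comments describe behavior as observed with GNU sed; other sed implementations may treat `n` at end of input differently.

```shell
# sed_cycle [FILE...]: the same sed script, annotated.
sed_cycle() {
    # s/^/1/  prefix the current line with 1
    # n       print it, then load the next input line
    # s/^/2/  prefix that line with 2
    # n       print it, then load the next input line
    # s/^/3/  prefix that line with 3; sed prints it at the end of the
    #         cycle and starts over with s/^/1/ on the following line
    # When n finds no further input, the current line is still printed
    # and sed stops (observed with GNU sed).
    sed 's/^/1/;n;s/^/2/;n;s/^/3/' "$@"
}

printf 'line 1\nline 2\nline 3\nline 4\n' | sed_cycle
```

Unlike the modulo-based answers, the cycle length here is fixed by the number of `s` commands in the script.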

走走停停 2024-07-16 08:11:01

Another way is just to use grep and match everything. For example this will enumerate files:

grep -n '.*' <<< "$(ls -1)"

Output will be:

1:file.a
2:file.b
3:file.c

公布 2024-07-16 08:11:01

awk '{printf "%d%s\n", ((NR-1) % 3) + 1, $0;}' "$@"

够运 2024-07-16 08:11:01

Python

import sys
for count, line in enumerate(sys.stdin):
    # line keeps its trailing newline, so none is added here
    sys.stdout.write("%d%s" % (1 + (count % 3), line))

明月松间行 2024-07-16 08:11:01

You don't need to leave bash for this:

i=0; while read -r; do echo "$((i++ % 3 + 1)) $REPLY"; done < input
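
A variant of the shell-only loop, written as a sketch under a few assumptions: the function name `number_cycle_sh` and its cycle-length argument are hypothetical, it uses only POSIX sh constructs, and it omits the space after the number to match the output requested in the question.

```shell
# number_cycle_sh N: read stdin, prefix each line with 1..N, repeating.
# The function name and the N argument are illustrative assumptions.
number_cycle_sh() {
    n=$1
    i=0
    # IFS= keeps leading and trailing whitespace; -r keeps backslashes
    # literal. The || test also handles a final line that has no
    # trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s%s\n' "$(( i % n + 1 ))" "$line"
        i=$(( i + 1 ))
    done
}

# Example use (file names are placeholders):
# number_cycle_sh 3 < input > data.out
```

A shell read loop is typically slower than a single awk process on large files, so this is mainly useful when no external tool should be started.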

失退 2024-07-16 08:11:01

This should solve the problem. In awk, _ is an uninitialized variable that evaluates to 0, so $_ is the same as $0 and prints the whole line.

awk '{print ((NR-1)%3+1) $_}' < input
1line 1
2line 2
3line 3
1line 4
2line 5
3line 6
1line 7

# cat input
line 1
line 2
line 3
line 4
line 5
line 6
line 7
