Using awk (or sed) to remove newlines based on the first character of the next line

Posted on 2024-08-21 03:52:00

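Here's my situation: I have a large text file from which I want to extract certain information. I used sed to pull out everything relevant with a regular expression, but each piece of "information" I extracted ends up on its own line, and I'd like each "record" to be on its own line so that it can easily be imported into a database.
Here is a sample of my data as it stands now: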

92831,499,000
,0644321
79217,999,000
,5417178
,PK91622
,PK90755

Ideally, I would like the output to look like this:

92831,499,000 ,0644321
79217,999,000 ,5417178 ,PK91622
79217,999,000 ,5417178 ,PK90755

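That is probably harder to do, so I would settle for output where the last "record" appears only once, with the additional "PK..." value as the fourth "field" on that line.
In the end, the simplest approach I can think of is that if a line starts with a comma ( ^, ), the newline before it should be removed... I'm not very familiar with awk, though, so if you can give me somewhere to start it would be greatly appreciated! Thanks!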
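
The idea in the last paragraph (drop the newline in front of any line that starts with a comma) can be sketched in a few lines of awk. This is only a minimal sketch, not a definitive solution: the function name `join_continuations` is made up for illustration, and the input is assumed to look like the sample in the question.

```shell
# join_continuations is a hypothetical helper name, not part of any tool.
join_continuations() {
    awk '
        NR > 1 && !/^,/ { printf "\n" }          # a new record starts: finish the previous output line
                        { printf "%s", $0 }      # emit the line without its trailing newline
        END             { if (NR) printf "\n" }  # terminate the last output line
    '
}

# Sample data from the question.
printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' \
    ',5417178' ',PK91622' ',PK90755' | join_continuations
```

On the sample this produces the fallback form the question says is acceptable, with each record on one line. It does not split the two PK values into separate rows.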


指尖凝香 2024-08-28 03:52:00

$ perl -0pe 's/\n,/,/g' < test.dat
92831,499,000,0644321
79217,999,000,5417178,PK91622,PK90755

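Translation: with -0, perl reads the input in bulk instead of line by line (the record separator becomes the NUL character), and the substitution swaps each newline followed by a comma for just a comma.

Shortest code here!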


就像说晚安 2024-08-28 03:52:00

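Well, I guess I should have looked more closely at using records in awk when I was trying to solve this last night... after 10 minutes of reading up on them I got it working. For anyone interested, here's how I did it:
In my original sed script I put an extra newline in front of each record, so there is now a blank line separating the records. Then I used the following awk command: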

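awk 'BEGIN {RS = ""; FS = "\n"}   # blank-line-separated records, one field per line
{
    if (NF >= 3)
        for (i = 3; i <= NF; i++)
            print $1,$2,$i        # repeat the first two lines for each extra line
}'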

Works like a charm and outputs exactly what I wanted!
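
To make the record-based approach concrete, here is a small sketch run against the sample data from the question, with a blank line in front of each new record as described. It is a variation, not the answer's exact command: it assumes that records with fewer than three lines should be kept too (the question's desired output suggests so), whereas the command as posted prints nothing for them. The name `expand_records` is hypothetical.

```shell
# expand_records is a hypothetical helper name.
expand_records() {
    awk 'BEGIN { RS = ""; FS = "\n" }   # paragraph mode: blank lines separate records
    NF < 3 { $1 = $1; print; next }     # short record: join its lines with OFS and keep it
    { for (i = 3; i <= NF; i++) print $1, $2, $i }   # one output row per extra line
    '
}

# Sample data with a blank line between the two records.
printf '%s\n' '92831,499,000' ',0644321' '' '79217,999,000' \
    ',5417178' ',PK91622' ',PK90755' | expand_records
```

Assigning `$1 = $1` makes awk rebuild the record with the output field separator, which is how the two lines of the short record end up on one row.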


满意归宿 2024-08-28 03:52:00

sedsed -d -n ':t;/^,/!x;H;n;/^,/{x;$!bt;x;H};x;s/\n//g;p;${x;/^,/!p}' filename


ぇ气 2024-08-28 03:52:00

Without special-casing field 3, easy.

awk '
    !/^,/   { if (NR > 1) print x ; x = $0 }
    /^,/    { x = x OFS $0 }
    END     { if (NR) print x }
'

With the special case, more complex but still not too hard.

awk '
    !/^,/   { if (n && n < 3) print x ; x = $0 ; n = 1 }
    /^,/    { if (++n > 2) { print x, $0 } else { x = x OFS $0 } }
    END     { if (n && n < 3) print x }
'
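
Both scripts join the pieces with OFS, which defaults to a single space, so the rows come out as `92831,499,000 ,0644321`. If the target is a plain comma-separated import, one option is to set OFS to the empty string on the command line. The sketch below wraps the second script that way; the function name `join_records_nospace` is made up, and whether the space is wanted at all is an assumption about the import format.

```shell
# join_records_nospace is a hypothetical helper name.
# Same logic as the second script, with OFS set to the empty string.
join_records_nospace() {
    awk -v OFS= '
        !/^,/ { if (n && n < 3) print x; x = $0; n = 1 }             # new record: flush the pending short one
        /^,/  { if (++n > 2) { print x, $0 } else { x = x OFS $0 } } # third line onward: one row each
        END   { if (n && n < 3) print x }                            # flush a trailing short record
    '
}

printf '%s\n' '92831,499,000' ',0644321' '79217,999,000' \
    ',5417178' ',PK91622' ',PK90755' | join_records_nospace
```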


飘落散花 2024-08-28 03:52:00

This might work for you:

# sed ':a;N;s/\n,/,/;ta;P;D' test.dat | sed 's/,/\n/5;s/\(.*,\).*\n/&\1/'
92831,499,000,0644321
79217,999,000,5417178,PK91622
79217,999,000,5417178,PK90755

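Explanation:

This comes in two parts:

Append the next line; if the appended line begins with a , then delete the embedded newline \n and start again. If not, print up to the newline and then delete up to the newline. Repeat.

Replace the 5th , with a newline. Then insert the first four fields between the embedded newline and the sixth field.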
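
As far as I can tell, the second step relies on `s/,/\n/5`, so it covers a joined line with exactly six comma-separated fields. A hedged alternative for that step, sketched in awk, repeats the first four fields for however many extra values follow. The function name `split_extra_fields` is hypothetical, and treating the first four comma-separated fields as the fixed part of the record is an assumption taken from the sample data.

```shell
# split_extra_fields is a hypothetical helper name.
# Input: lines already joined by the first sed command.
split_extra_fields() {
    awk -F, '
        NF <= 5 { print; next }                         # at most one extra value: pass through
        {
            key = $1 FS $2 FS $3 FS $4                  # the first four comma-separated fields
            for (i = 5; i <= NF; i++) print key FS $i   # one row per extra value
        }
    '
}

printf '%s\n' '92831,499,000,0644321' \
    '79217,999,000,5417178,PK91622,PK90755' | split_extra_fields
```

It could be used in place of the second sed command, for example `sed ':a;N;s/\n,/,/;ta;P;D' test.dat | split_extra_fields`.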
