Using sed to remove a newline from the middle of a field

Posted 2024-11-05 17:58:34

The data I have looks something like this:

a,b,c,d
a,b1
b2,c,d
A,B,C,D

Occasionally the second field contains a newline character, so the record gets split over two lines.

So far I have found a sed script that joins the lines back together. It reads:

cat file| sed ':a;N;$!ba;s/\(\(b1\)\)\n/\1/g'

But I am struggling to find the right combination of (.*,) to make this work in general, so I hard-coded b1 to get this example working. In the real world A, B, C and D are fields of mixed length and content.

The end result I'm looking for is this:

a,b,c,d
a,b1b2,c,d
A,B,C,D

Any help is much appreciated

Thanks
Matt

Comments (2)

青春有你 2024-11-12 17:58:34

I have a solution here that is not yet perfect, but I will think about it further. If your version of sed supports extended regular expressions, you could do:

cat file | sed -r ':a;N;$!ba;s/((^|\n)([^,]+,){0,2}[^,]+)\n/\1/g'

That will work if there is a line break in one of the first three columns. It does not yet handle multiple line breaks within a single record.

Explanation:
The :a;N;$!ba prefix first reads the whole file into the pattern space, so that \n can be matched at all
(^|\n) matches the start of the pattern space or a line break, i.e. the beginning of a line
[^,]+, means: at least one character other than "," (+ means one or more), followed by a ","
([^,]+,){0,2} matches zero to two such comma-terminated columns
The final [^,]+ matches the column after those: at least one character other than ","
The trailing \n matches the line break

In summary, the s command matches every line that has only one to three columns before its line break and replaces it with itself (\1) minus the trailing line break, so the line is joined to the one that follows.
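To check the command against the sample data from the question, a minimal sketch follows. It assumes GNU sed (where a label can be terminated by ; on one line) and uses -E, which GNU sed accepts as an equivalent of -r. The temporary file and the variable names are only for illustration.

```shell
# Recreate the sample data from the question in a temporary file.
sample=$(mktemp)
printf 'a,b,c,d\na,b1\nb2,c,d\nA,B,C,D\n' > "$sample"

# Slurp the whole file, then drop the line break after any line
# that has only one to three comma-separated columns.
result=$(sed -E ':a;N;$!ba;s/((^|\n)([^,]+,){0,2}[^,]+)\n/\1/g' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

For this input the output is the three lines a,b,c,d then a,b1b2,c,d then A,B,C,D, which is the result the question asks for.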

腹黑女流氓 2024-11-12 17:58:34

In awk

awk -F, 'NF < 4 {getline nextline; $0 = $0 nextline} 1' filename
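The one-liner above joins at most one following line. A possible variant, sketched here under the assumptions that every complete record has exactly 4 fields, that fields contain no embedded commas, and that a break never falls inside the last field, keeps appending lines until the record is complete. The sample data and the variable names are made up for illustration.

```shell
# Hypothetical input: one record broken in two places, then an intact record.
sample=$(mktemp)
printf 'a,b1\nb2,c1\nc2,d\nA,B,C,D\n' > "$sample"

joined=$(awk -F, '
  {
    # Append following lines until the record has 4 fields
    # or getline reports end of input (return value <= 0).
    while (NF < 4 && (getline nextline) > 0)
      $0 = $0 nextline
    print
  }' "$sample")

printf '%s\n' "$joined"
rm -f "$sample"
```

Assigning to $0 makes awk re-split the record, so NF is up to date on each pass through the loop. Checking the return value of getline avoids looping forever when the file ends in the middle of a broken record.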