Using sed to remove a newline from the middle of a field
The data I have looks something like this:
a,b,c,d
a,b1
b2,c,d
A,B,C,D
What is happening is that field 2 occasionally contains a newline character, so the record gets split over two lines.
So far I have found a sed script that will do this. It reads like:
cat file| sed ':a;N;$!ba;s/\(\(b1\)\)\n/\1/g'
but I am struggling to get the correct combination of (.*,) to make this work, so I've substituted it with b1 to get this example to work. In the real world A, B, C and D are fields of mixed length and content.
The end result I'm looking for is this:
a,b,c,d
a,b1b2,c,d
A,B,C,D
Any help is much appreciated
Thanks
Matt
Comments (2)
I have a solution here that is not yet perfect, but I will think about it further. If your version of sed supports extended regular expressions, this can be done with a single s command.
It works if there is a line break in one of the first three columns. So far it does not handle multiple line breaks within one "line".
Explanation:
(^|\n) matches the beginning of the line (or a line break).
[^,]+, means at least one character other than "," followed by a "," (+ means one or more).
([^,]+,){0,2} matches 0-2 columns, each terminated by a ",".
[^,]+ means that at least one character other than "," follows those 0-2 columns.
The trailing \n matches a line break.
In summary, the s command matches every line that contains up to three columns and ends with a line break, and replaces it with itself (\1) without the trailing line break.
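The sed command that this explanation refers to did not survive in the answer as captured here. The following is a sketch reconstructed from the pieces described above, not necessarily the answerer's exact command. It assumes GNU sed (for -E, the one-line label syntax, and \n inside the pattern), and the function name join_split_fields is made up for illustration.

```shell
# join_split_fields is a hypothetical helper name, not from the original answer.
# Assumes GNU sed. Reads the named files, or stdin if none are given.
join_split_fields() {
  # :a;N;$!ba   slurp the whole input into the pattern space
  # s/.../\1/g  re-emit each short line (group 1) without its trailing newline
  sed -E ':a;N;$!ba;s/((^|\n)([^,]+,){0,2}[^,]+)\n/\1/g' "$@"
}

# Sample data from the question.
printf 'a,b,c,d\na,b1\nb2,c,d\nA,B,C,D\n' | join_split_fields
```

On the sample data this should join the second and third lines into a,b1b2,c,d and leave the other two lines alone. As the answer says, a record that is broken in more than one place is not handled correctly.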
In awk
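The awk code for this second answer is not present in the text above. One way it could be done is to keep appending physical lines to a buffer until the buffer holds the expected number of comma-separated fields. This is a sketch under that assumption, not the answerer's code; the function name join_by_field_count and the variable want are made up for illustration.

```shell
# join_by_field_count is a hypothetical helper name, not from the original answer.
# Usage: join_by_field_count N [file...]   (N = number of fields in a complete record)
join_by_field_count() {
  want=$1
  shift
  awk -v want="$want" '
    {
      buf = buf $0                           # glue this physical line onto the pending record
      if (split(buf, parts, ",") >= want) {  # enough fields: the record is complete
        print buf
        buf = ""
      }
    }
    END { if (buf != "") print buf }         # flush a trailing incomplete record, if any
  ' "$@"
}

# Sample data from the question; each record should have 4 fields.
printf 'a,b,c,d\na,b1\nb2,c,d\nA,B,C,D\n' | join_by_field_count 4
# prints:
# a,b,c,d
# a,b1b2,c,d
# A,B,C,D
```

Because it counts fields rather than matching a fixed pattern, this approach also copes with a record that is broken in several places. It cannot detect a break inside the last field, since the record already looks complete at that point.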