Circumventing the sed backreference limit of \1 through \9
The sed manual clearly states that the backreferences available for the replacement string in a substitute command are numbered \1 through \9. I'm trying to parse a log file that has 10 fields.
I have the regex formed for it but the tenth match (and anything after) isn't accessible.
Does anyone have an elegant way to circumvent this limitation in KSH (or any language that perhaps I can port to shell scripting)?
Comments (5)
Can you use
perl -pe 's/(match)(str)/$2$1/g;'
in place of sed? The way to circumvent the backreference limit is to use something other than sed. Also, I suppose you could do your substitution in two steps, but I don't know your pattern, so I can't help you with how.
Split the stream into several -e expressions, as long as the elements being replaced fall within the group you split them into. When I did a date split so I could reorganize the date-time into a 14-digit string, I had to split the stream up 3 times.
20130205161449
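The commenter's actual input format and expressions are not shown, so the following is only a sketch of the idea under an assumed input of the form `MM/DD/YYYY HH:MM:SS`. Each `-e` expression is its own substitution with its own `\1` to `\9` numbering, so no single expression needs more than nine groups. The function name `to_14_digits` is hypothetical, and this sketch uses two expressions rather than the three the commenter mentions.

```shell
# Hypothetical helper: convert "MM/DD/YYYY HH:MM:SS" (assumed format)
# into a 14-digit YYYYMMDDHHMMSS string.
# First expression: reorder the date part and drop the trailing space.
# Second expression: strip the colons from the time part.
to_14_digits() {
    printf '%s\n' "$1" | sed \
        -e 's#^\([0-9]\{2\}\)/\([0-9]\{2\}\)/\([0-9]\{4\}\) #\3\1\2#' \
        -e 's/\([0-9]\{2\}\):\([0-9]\{2\}\):\([0-9]\{2\}\)$/\1\2\3/'
}
```

With that assumed format, `to_14_digits '02/05/2013 16:14:49'` should print `20130205161449`.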
You're asking for a shell script solution - that means you're not limited to using just sed, correct? Most shells support arrays, so perhaps you can parse the line into a shell array variable? If need be, you could even parse the same line multiple times, extracting different bits of information on each pass.
Would that do?
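One way this might look, assuming whitespace-separated fields: the sketch below uses the positional parameters as a portable stand-in for a shell array, since array syntax differs between ksh and bash. The function name `tenth_field` is hypothetical. Note that the shell has a similar quirk to sed here: `$10` means `$1` followed by a literal `0`, so the braces in `${10}` are required.

```shell
# Hypothetical helper: print the tenth whitespace-separated field of a line.
# Assumes fields are separated by spaces or tabs (the default IFS).
tenth_field() {
    set -f          # disable globbing so a field like "*" is not expanded
    set -- $1       # word-split the line into positional parameters
    set +f          # re-enable globbing
    printf '%s\n' "${10}"
}
```

For example, `tenth_field 'a b c d e f g h i j'` should print `j`. If the line has fewer than ten fields, the output is an empty line.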
If you have GNU awk, you can do this with much more control. For this you need the match(source, /regex/, array) construct.
sed works fine up to \9. It breaks when \10 is added, because \10 is interpreted as \1 followed by a literal 0. awk comes to the rescue when a backreference beyond 9 is needed, such as a 10th reference.
Consider a solution that doesn't require the use of regular expression backreferences. For example, if you have a simple field delimiter, use split, or even use awk for your processing instead of perl.
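The two awk-based suggestions above could be sketched as follows. Both functions are hypothetical and assume a line of ten space-separated fields. The first relies only on awk's built-in field splitting, so it needs no regular expression at all. The second uses the three-argument form of `match()`, which is a GNU awk extension and requires `gawk` to be installed.

```shell
# Hypothetical helper: print the tenth and first fields using awk's
# default whitespace field splitting (works with any POSIX awk).
tenth_by_fields() {
    printf '%s\n' "$1" | awk '{ print $10, $1 }'
}

# Hypothetical helper: the same result via capture groups.
# The third argument to match() is gawk-only; m[10] holds the tenth group.
tenth_by_match() {
    printf '%s\n' "$1" | gawk '{
        if (match($0, /^([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+) ([^ ]+)$/, m))
            print m[10], m[1]
    }'
}
```

Either function, called with `'a b c d e f g h i j'`, should print `j a`. The field-splitting version is the simpler choice when the delimiter is regular; the `match()` version is closer to the original regex-based approach.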