How do I remove leading and trailing whitespace?
I'm using awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt > out.txt
to remove both leading and trailing whitespace.
The problem is the output file actually has trailing whitespace! All lines are of the same length: they are right-padded with spaces.
What am I missing?
UPDATE 1
The problem is probably due to the fact that the trailing spaces are not "normal" spaces but \x20 characters (DC4).
UPDATE 2
I used gsub (/'[[:cntrl:]]|[[:space:]]|\x20/,"")
and it worked.
Two strange things:
Why isn't \x20 considered a control character?
Using
'[[:cntrl:][:space:]\x20
does NOT work. Why?
Comments (4)
This command works for me:
Your code is OK for me.
You may have something other than space and tab characters... hexdump -C may help you to check what is wrong:

UPDATE:
OK, you identified DC4 (there may be some other control characters...).
Then, you can improve your command:

See the awk manpage:
[:alnum:] Alphanumeric characters.
[:alpha:] Alphabetic characters.
[:blank:] Space or tab characters.
[:cntrl:] Control characters.
[:digit:] Numeric characters.
[:graph:] Characters that are both printable and visible. (A space is printable, but not visible, while an a is both.)
[:lower:] Lower-case alphabetic characters.
[:print:] Printable characters (characters that are not control characters.)
[:punct:] Punctuation characters (characters that are not letters, digits, control characters, or space characters).
[:space:] Space characters (such as space, tab, and formfeed, to name a few).
[:upper:] Upper-case alphabetic characters.
[:xdigit:] Characters that are hexadecimal digits.
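
The improved command is not shown above. A minimal sketch using the character classes just listed, assuming the padding is made of whitespace and control characters such as DC4 (byte 0x14) and that your awk supports POSIX character classes (gawk and recent mawk do). The function name and the sample line are made up for illustration:

```shell
# Strip leading/trailing runs of whitespace or control characters from each line.
# [[:space:][:cntrl:]] is a single bracket expression holding two POSIX classes.
strip_ws_cntrl() {
  awk '{ gsub(/^[[:space:][:cntrl:]]+|[[:space:][:cntrl:]]+$/, ""); print }'
}

# Sample line: two spaces, "abc", one DC4 byte (octal 024), two more spaces.
printf '  abc\024  \n' | strip_ws_cntrl
```

Note that the classes only work inside an enclosing pair of brackets, so the expression needs both the outer `[` and the outer `]`.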
Leading/Trailing 0x20 removal
For me the command is OK, I have tested like this:
However, if you have 0x20 in the middle of your text, then it is not removed.
But this is not your question, is it?
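
The test itself is not shown above. A sketch of such a check, wrapping the question's command in a hypothetical function and feeding it made-up input, could be:

```shell
# The question's original command, wrapped in a function for reuse.
trim_blank() {
  awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'
}

# Leading and trailing 0x20 bytes are removed; the one between the words stays.
# The sed step only adds brackets so the line boundaries are visible.
printf '   foo bar   \n' | trim_blank | sed 's/.*/[&]/'
# prints: [foo bar]
```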
Your files probably have Windows line endings. That means that they end with \r\n, so matching a sequence of tabs and spaces at the end of the line won't work: awk tries to match all the tabs and spaces that come after the \r, and the padding sits before it. Try running the file through tr -d "\r" before sending it to awk.
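
A sketch of that pipeline, reusing the question's awk command. The function names and the sample line are made up for illustration; `od -c` is used for the byte dump, and `hexdump -C` would show the same bytes:

```shell
# A padded line with a Windows line ending: "abc", three spaces, CR, LF.
sample() { printf 'abc   \r\n'; }

# od -c prints each byte; the \r sits between the padding and the newline.
sample | od -c

# Delete every CR first, then trim tabs and spaces as before.
trim_crlf() {
  tr -d '\r' | awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'
}
sample | trim_crlf
```

Without the `tr` step the trailing spaces survive, because the last character on the line is the carriage return rather than a space.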
Perl could be used:

s/foo/bar/   substitute using regular expressions
^            beginning of string
\s*          zero or more spaces
(.*\S)       any characters ending with a non-whitespace. Capture it into $1
\s*          zero or more spaces
$            end of string