How do I remove leading and trailing whitespace?

Posted on 2025-01-03 23:34:53

I'm using awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt > out.txt to remove both leading and trailing whitespaces.

The problem is the output file actually has trailing whitespaces! All lines are of the same length - they are right padded with spaces.

What am I missing?

UPDATE 1

The problem is probably due to the fact that the trailing spaces are not "normal" spaces but \x20 characters (DC4).

UPDATE 2

I used gsub (/'[[:cntrl:]]|[[:space:]]|\x20/,"") and it worked.
Two strange things:

  1. Why isn't \x20 considered a control character?

  2. Using '[[:cntrl:][:space:]\x20 does NOT work. Why?


Comments (4)

风吹过旳痕迹 2025-01-10 23:34:53

This command works for me:

$ awk '{$1=$1}1' file.txt
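
One caveat worth knowing about this approach: assigning `$1=$1` makes awk rebuild the record from its fields, joined by the output field separator (a single space by default), so interior runs of whitespace are collapsed as well as the ends being trimmed. The sketch below contrasts the two behaviours; the function names `trim_fields` and `trim_edges` are made up for illustration.

```shell
# Hypothetical helper names; both read stdin and write stdout.

# Rebuilds each record from its fields: trims the ends AND squeezes
# interior whitespace down to single spaces.
trim_fields() {
    awk '{ $1 = $1 } 1'
}

# Strips only leading and trailing spaces/tabs; interior spacing is kept.
trim_edges() {
    awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'
}

printf '  a   b  \n' | trim_fields   # prints "a b"
printf '  a   b  \n' | trim_edges    # prints "a   b"
```

If the spacing inside each line matters (fixed-width columns, for example), the `gsub` form is probably the safer choice.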
善良天后 2025-01-10 23:34:53

Your code works for me.
You may have something other than spaces and tabs in there...
hexdump -C can help you see what is wrong:

awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt | hexdump -C | less

UPDATE:

OK, you identified DC4 (there may be other control characters too...).
Then you can improve your command:

awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}' in.txt > out.txt

See awk manpage:

[:alnum:] Alphanumeric characters.
[:alpha:] Alphabetic characters.
[:blank:] Space or tab characters.
[:cntrl:] Control characters.
[:digit:] Numeric characters.
[:graph:] Characters that are both printable and visible. (A space is printable, but not visible, while an "a" is both.)
[:lower:] Lower-case alphabetic characters.
[:print:] Printable characters (characters that are not control characters.)
[:punct:] Punctuation characters (characters that are not letters, digits, control characters, or space characters).
[:space:] Space characters (such as space, tab, and formfeed, to name a few).
[:upper:] Upper-case alphabetic characters.
[:xdigit:] Characters that are hexadecimal digits.

Leading/Trailing 0x20 removal

The command works for me; I tested it like this:

$ echo -e "\x20 \tTEXT\x20 \t" | hexdump -C
00000000  20 20 09 54 45 58 54 20  20 09 0a                 |  .TEXT  ..|
0000000b
$ echo -e "\x20 \tTEXT\x20 \t" | awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}' | hexdump -C
00000000  54 45 58 54 0a                                    |TEXT.|
00000005

However, if you have 0x20 in the middle of your text, it is not removed.
But that is not your question, is it?
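
Note that 0x20 is the ordinary ASCII space, whereas DC4 is 0x14 (octal 024, decimal 20), so the test above only exercises spaces and tabs. Here is a small sketch of the same check with a real DC4 byte. It assumes an awk that supports POSIX character classes (some older mawk builds do not), and the function name `trim_ctrl` is hypothetical.

```shell
# Hypothetical helper: strip leading/trailing control and whitespace characters.
trim_ctrl() {
    awk '{ gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/, ""); print }'
}

# \024 is the octal escape for DC4 (0x14); printf octal escapes are portable.
# od -c lets you confirm that only TEXT and the newline remain.
printf '\024 TEXT \024\024\n' | trim_ctrl | od -c
```

This would also explain why \x20 is not matched by [:cntrl:]: it is a space, which belongs to [:space:] and [:blank:] instead.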

无法言说的痛 2025-01-10 23:34:53

Your files probably have Windows line endings. That means that they end with \r\n, so matching a sequence of tabs and spaces at the end of the line won't work -- awk tries to match all the tabs and spaces that come after the \r. Try running the file through tr -d "\r" before sending it to awk.
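
A minimal sketch of that suggestion, assuming the only stray character is a carriage return before each newline. The function names are hypothetical. The second variant simply treats `\r` as one more trailing character to strip, instead of using a separate `tr` step.

```shell
# Hypothetical helpers; both read stdin and write stdout.

# Delete every carriage return first, then trim spaces and tabs.
trim_crlf_tr() {
    tr -d '\r' | awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'
}

# Alternative: let awk treat \r as trailing whitespace as well.
trim_crlf_awk() {
    awk '{ gsub(/^[ \t]+|[ \t\r]+$/, ""); print }'
}

# od -c makes any leftover \r or padding visible.
printf '  TEXT  \r\n' | trim_crlf_tr | od -c
printf '  TEXT  \r\n' | trim_crlf_awk | od -c
```

Typical usage would be something like `trim_crlf_tr < in.txt > out.txt`. Keep in mind that the `tr` variant removes carriage returns anywhere in a line, not only at the end.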

凉世弥音 2025-01-10 23:34:53

Perl could be used:

perl -lpe 's/^\s*(.*\S)\s*$/$1/' in.txt > out.txt

s/foo/bar/ substitute using regular expressions
^ beginning of string
\s* zero or more whitespace characters
(.*\S) any characters, ending with a non-whitespace character; captured into $1
\s* zero or more whitespace characters
$ end of string
