How do I remove leading and trailing whitespace?

Posted 2025-01-03 23:34:53

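I am using awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt > out.txt to remove leading and trailing whitespace.

The problem is that the output file still has trailing spaces! All lines have the same length; they are all padded with spaces.

What am I missing?

UPDATE 1

The problem is probably due to the fact that the trailing spaces are not "normal" spaces but \x20 characters (DC4).

UPDATE 2

I used gsub (/'[[:cntrl:]]|[[:space:]]|\x20/,"") and it worked. Two strange things:

Why isn't \x20 treated as a control character?
Using '[[:cntrl:][:space:]\x20 does not work. Why?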


风吹过旳痕迹 2025-01-10 23:34:53

This command works for me:

$ awk '{$1=$1}1' file.txt
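A minimal sketch of why this works, and of its side effect: assigning to a field makes awk rebuild the record from its fields, joined by the output field separator (a single space by default). That drops leading and trailing blanks, but it also collapses runs of blanks inside the line. The helper name squeeze_trim is made up for this illustration.

```shell
# Hypothetical helper (name invented for this example):
# rebuild each record from its fields, then print it.
squeeze_trim() {
    awk '{$1=$1}1'
}

# Leading/trailing blanks are dropped; the inner run of blanks
# between "foo" and "bar" is collapsed to one space as well.
printf '  foo \t  bar  \n' | squeeze_trim
# prints: foo bar
```

If the inner spacing has to be preserved, a gsub anchored at ^ and $ is probably the safer choice.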


善良天后 2025-01-10 23:34:53

Your code works fine for me.
You may have something other than spaces and tabs in there...
hexdump -C may help you check what is wrong:

awk '{gsub(/^[ \t]+|[ \t]+$/,""); print;}' in.txt | hexdump -C | less

UPDATE:

OK, you identified DC4 (there may be some other control characters too...)
Then you can improve your command:

awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}' in.txt > out.txt

See the awk man page:

[:alnum:] Alphanumeric characters.
[:alpha:] Alphabetic characters.
[:blank:] Space or tab characters.
[:cntrl:] Control characters.
[:digit:] Numeric characters.
[:graph:] Characters that are both printable and visible. (A space is printable, but not visible, while an a is both.)
[:lower:] Lower-case alphabetic characters.
[:print:] Printable characters (characters that are not control characters.)
[:punct:] Punctuation characters (characters that are not letters, digits, control characters, or space characters).
[:space:] Space characters (such as space, tab, and formfeed, to name a few).
[:upper:] Upper-case alphabetic characters.
[:xdigit:] Characters that are hexadecimal digits.

Leading/Trailing `0x20` removal

The command works for me; I tested it like this:

$ echo -e "\x20 \tTEXT\x20 \t" | hexdump -C
00000000  20 20 09 54 45 58 54 20  20 09 0a                 |  .TEXT  ..|
0000000b
$ echo -e "\x20 \tTEXT\x20 \t" | awk '{gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/,""); print;}' | hexdump -C
00000000  54 45 58 54 0a                                    |TEXT.|
00000005

However, if you have 0x20 in the middle of your text, it is not removed.
But that is not what you are asking, is it?
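A small sketch of the same idea with an actual DC4 byte (octal 024, hex 0x14) mixed into the padding. It assumes an awk that supports POSIX character classes (gawk does; very old mawk builds may not), and the helper name trim_edges is invented for this example.

```shell
# Hypothetical helper (name invented for this example):
# strip control and whitespace characters at both ends of each line.
# Assumes the awk in use understands [[:cntrl:]] and [[:space:]].
trim_edges() {
    awk '{ gsub(/^[[:cntrl:][:space:]]+|[[:cntrl:][:space:]]+$/, ""); print }'
}

# \024 is DC4; the padding mixes spaces, tabs and DC4 bytes.
# The double space inside the text is left untouched.
printf ' \t\024TEXT with  inner space\024 \t\n' | trim_edges
```

Piping the result through hexdump -C, as above, is a quick way to confirm that no padding bytes remain.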


无法言说的痛 2025-01-10 23:34:53

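Your file may have Windows line endings. That means each line ends with \r\n, so matching a run of tabs and spaces at the end of the line will not work: awk tries to match the tabs and spaces that come after the \r. Try running the file through tr -d "\r" before sending it to awk.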
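As a rough sketch of that suggestion, the carriage returns can be dropped in the same pipeline before awk sees the line. The helper name strip_cr_and_trim is made up here, and the regex is the one from the question.

```shell
# Hypothetical helper (name invented for this example):
# delete every carriage return, then trim spaces and tabs at both ends.
strip_cr_and_trim() {
    tr -d '\r' | awk '{ gsub(/^[ \t]+|[ \t]+$/, ""); print }'
}

# A CRLF-terminated line padded with spaces on both sides.
printf '  padded line   \r\n' | strip_cr_and_trim
# prints: padded line

# For the files in the question this would be:
#   strip_cr_and_trim < in.txt > out.txt
```

Note that tr -d '\r' removes every carriage return in the input, not only the ones at line ends, which is usually fine for text files.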


凉世弥音 2025-01-10 23:34:53

Perl could be used:

perl -lpe 's/^\s*(.*\S)\s*$/$1/' in.txt > out.txt

s/foo/bar/ substitute using regular expressions
^ beginning of string
\s* zero or more whitespace characters
(.*\S) any characters ending with a non-whitespace character; captured into $1
\s* zero or more whitespace characters
$ end of string
