IFS variable problem in a Unicode script

Posted on 2024-10-20 18:40:18


Comments (1)

欲拥i 2024-10-27 18:40:18


You're either having a problem with Unicode, or with the shell you're trying to use, the former being more likely.

The character you chose as separator (¬) is outside the ASCII set, and a computer will generally represent it in one of two ways: either it'll be encoded as latin1 or similar, where the character occupies one octet, or it'll be encoded as UTF-8 and use two octets. There are other possibilities, but these two are the most likely, so bear with me.
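To make the two encodings concrete, here is a minimal sketch that prints the raw octets of ¬ in each one. The helper name hex_bytes is made up for this illustration; the octal escapes are the Latin-1 and UTF-8 encodings of U+00AC.

```shell
#!/bin/sh
# hex_bytes: print the bytes read from stdin as space-separated hex pairs.
# (Hypothetical helper, for illustration only.)
hex_bytes() {
    # od pads its output with variable whitespace; let the shell normalise it.
    set -- $(od -An -tx1)
    echo "$*"
}

printf '\254' | hex_bytes        # ¬ as Latin-1: a single octet
printf '\302\254' | hex_bytes    # ¬ as UTF-8: two octets
```

Running od on the script file itself in the same way shows which of the two encodings your editor actually saved.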

If you saved your script encoded as UTF-8 and you're trying to run it in a non-Unicode locale, the shell will get two (wrong) characters as separators instead of one. To test for this, try using an ASCII character as separator, like ~ for example.
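A rough way to run that test is to count how many fields a string splits into for a given IFS. This is only a sketch: count_fields is a hypothetical helper, and the sample data a, b, c is invented. With ~ the count should be 3 in any locale; with the UTF-8 bytes of ¬ the count depends on the locale and on the shell, and it comes out higher than 3 when the two bytes are treated as separate delimiters.

```shell
#!/bin/sh
# count_fields SEP DATA: print how many fields DATA splits into when IFS=SEP.
# (Hypothetical helper, for illustration only.)
count_fields() {
    (
        IFS=$1
        set -f          # disable globbing so only field splitting is applied
        set -- $2       # unquoted on purpose: this is where IFS takes effect
        echo "$#"
    )
}

# ASCII separator: behaves the same in every locale.
count_fields '~' 'a~b~c'

# Non-ASCII separator, built from its UTF-8 bytes so the result does not
# depend on how this file was saved. Run it once normally and once with
# LC_ALL=C to see whether the shell treats the two bytes as one character.
sep=$(printf '\302\254')
count_fields "$sep" "a${sep}b${sep}c"
```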

If you find that using ~ works, you'll have to take a look at the global configuration of your system, and make sure that the locale in the environment where the script runs is the same as in the environment you used to create it. You can check this by running the locale command. You may create a script that runs this command and stores its output in a file:

#!/bin/sh
locale > /tmp/locale-env

Then make it run from cron, for example, and take a look at the /tmp/locale-env file. Compare its contents with the output of locale when you run it from your interactive shell. Depending on your distribution, you may be able to set your global locale in /etc/environment, /etc/profile or another location. You may wish to go UTF-8 system-wide:

LANG=en_US.UTF-8
export LANG
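If you can't control the system-wide setting, a script could at least check the character set of the locale it is running under before relying on a non-ASCII separator. The sketch below assumes the locale utility supports the charmap keyword (it does on common Linux systems); is_utf8_charmap is a made-up helper name, and the spellings it accepts are a guess at what different systems report.

```shell
#!/bin/sh
# is_utf8_charmap NAME: succeed if NAME, as printed by `locale charmap`,
# looks like UTF-8. (Hypothetical helper, for illustration only.)
is_utf8_charmap() {
    case $1 in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_charmap "$(locale charmap 2>/dev/null)"; then
    echo "charmap is UTF-8"
else
    echo "charmap is not UTF-8; an ASCII separator such as ~ is safer" >&2
fi
```

Under cron this will typically take the second branch unless LANG or LC_ALL has been set for the job.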

This is a trap that we international users tend to know better than English-speaking ones: since ASCII and UTF-8 are exactly the same for English characters, these issues go unnoticed more often than not.
