Consistent implementation of tr?

Posted on 2024-09-15 19:40:18

I have a ksh script that generates a long, random string using /dev/urandom and tr:

STRING="$(cat /dev/urandom|tr -dc 'a-zA-Z0-9-_'|fold -w 64 |head -1)"

On the Linux and AIX servers where I used this it resulted in 64 characters of upper and lower case alpha chars, digits, dash and underscore characters. Example:

W-uch3_4fbnk34u2nc08w_nj23n089023ncNjxz979823n23-n88h30pmLCxkMKj

When I used the script on Solaris the ranges were interpreted as literals and it resulted in strings from the set aAzZ09-_. Example:

AA0z9_aZ-a-z00aZ9_azAZa0zZza9-Az0-_za-9aa0az_a0z-0a0z000-A9Z_0a

Oddly, on this Solaris server the man page for tr indicates that the syntax used should have produced the desired result.

The idea is to use /dev/urandom to produce a pseudo-random string from which we extract characters so that the result a) does not contain spaces and b) does not contain shell special characters. The string will be used on the command line as an argument later on in the script. We don't want to use classes like :alnum: because locale can convert these into multi-byte values that don't work on the command line. This ksh one-liner did the trick perfectly on a great many installations until we got to Solaris.

We have temporarily converted this to a somewhat nasty Perl regex. Is there a syntax for tr or some other utility or ksh built-in that will perform this task consistently across UNIX variants and is universally installed? Doesn't have to be a one-liner but simplicity is appreciated.

Update: We tried the Locale settings with no luck. Waiting on results of using xpg6 version.

$ uname -a
SunOS hostname 5.10 Generic_142900-04 sun4u sparc SUNW,SPARC-Enterprise
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
0-a9-z9a_zzZAa_a_0az-9_z0a_90Z_9az09aZzZAa-9aa_-__za0ZA9_ZzzZazA
$ set | grep '^L[AC]'
LANG=C
LC_ALL=C
LC_COLLATE=en_US
LC_CTYPE=en_US
LC_MESSAGES=en_US
LC_MONETARY=en_US
LC_NUMERIC=en_US
LC_TIME=en_US
$ export LC_CTYPE="$LC_ALL" LC_MESSAGES="$LC_ALL"
$ set | grep '^L[AC]'
LANG=C
LC_ALL=C
LC_COLLATE=en_US
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=en_US
LC_NUMERIC=en_US
LC_TIME=en_US
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
0900z9az99_a0za09__0zA0_Z--Z_-Aa-AaA9zAZz-Aa90A00z__ZzA9A-Z0aA_-
$ unset LC_ALL; export LC_COLLATE=C LC_NUMERIC=C LC_TIME=C
$ set | grep '^L[AC]'
LANG=C
LC_COLLATE=C
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=en_US
LC_NUMERIC=C
LC_TIME=C
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
_AA9aA_Za-A0-AZa_A-0ZA--a_za-a9zZZz__a0az_-0A-9-0aA-0za00A-__9-0
$ unset LANG LC_COLLATE LC_NUMERIC LC_TIME
$ set | grep '^L[AC]'
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=en_US
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
_-_9zz9Z-Z-Z-Z_0_a9zzzZZaAa--9_zAZaaAZz-ZaAZ09Z-_z-zz09ZZAzAz0Z0
$ unset LC_CTYPE LC_MESSAGES LC_MONETARY
$ set | grep '^L[AC]'
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
_0aAa9_Z_a_Z--_Az-aa0ZA0ZzZ-9Aa9-Z0--0A_Z0Zaz-AA_Zz0z---Z_99z_a9
$ export LANG=C LC_ALL=C LC_COLLATE=C LC_CTYPE=C LC_MESSAGES=C LC_MONETARY=C LC_NUMERIC=C LC_TIME=C
$ set | grep '^L[AC]'
LANG=C
LC_ALL=C
LC_COLLATE=C
LC_CTYPE=C
LC_MESSAGES=C
LC_MONETARY=C
LC_NUMERIC=C
LC_TIME=C
$ cat /dev/urandom | tr -dc "a-zA-Z0-9-_" | fold -w 64 | head -1 | sed 's/^-/_/'
Za_000z9aa--aA00zAAZza0AA90090--z0a00_zZ9ZA0_---aZZ09a0ZA0_0zZaa
$ cat /dev/urandom | tr -dc "[a-z][A-Z][0-9]-_" | fold -w 64 | head -1 | sed 's/^-/_/'
x7dni9gIXVF6AHQc3B-H6hjnBVHChJ9zM-z5EQ5UEruATI_NNFaCoVLOqM6gVaT5
$

Of course, on Linux that last version spits out square brackets.

心欲静而疯不止 2024-09-22 19:40:18

If you set your PATH so that /usr/xpg6/bin/ comes first, then it'll work as expected.
The locale seems to have no effect here. A cross-platform hack is:

tr -dc '[a-z][A-Z][0-9]_-' < /dev/urandom | tr -d '][' | fold -w64 | head -n1
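
Another way to sidestep the range problem, sketched here as an assumption rather than something tested on every platform in the question, is to spell out each allowed character so that neither style of tr has any range syntax to interpret. The function name random_token and its optional length argument are hypothetical, and the LC_ALL=C prefix is an added precaution that is not part of the hack above.

```shell
# Hypothetical helper: print one random token of the given length (default 64).
# Every allowed character is listed explicitly, so no tr range syntax is used.
random_token() {
  len=${1:-64}
  # LC_ALL=C asks tr to treat its input as single bytes (an added assumption).
  LC_ALL=C tr -dc 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-' \
    < /dev/urandom | fold -w "$len" | head -n 1
}

# Same variable name as in the question.
STRING=$(random_token)
```

The set is longer to read, but it needs no second tr pass to strip brackets.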

冷清清 2024-09-22 19:40:18

What you've observed is not a difference between operating systems, but a difference between machines with different locale settings. Your Solaris machine has LC_COLLATE set to a non-default value, which is a sure recipe for the kind of problems you have.

Locale settings are set from the environment as follows:

If the environment variable LC_ALL is set, its value is used for all categories.
Otherwise, if LC_FOO is set, its value is used for category LC_FOO.
Otherwise, if LANG is set, its value is used for categories that weren't explicitly set.
The default locale is called C. On Unix systems, POSIX is a synonym for C.
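
The precedence rules above can be sketched as a small shell function. The name effective_locale is hypothetical, and treating an empty variable the same as an unset one is an assumption layered on top of the rules as stated.

```shell
# Hypothetical helper: print the locale that applies to one category,
# following the order LC_ALL, then the category's own variable, then LANG,
# then the default C locale.
effective_locale() {
  category=$1                        # e.g. LC_COLLATE
  eval "specific=\${$category:-}"    # value of the category's own variable
  if [ -n "${LC_ALL:-}" ]; then
    printf '%s\n' "$LC_ALL"
  elif [ -n "$specific" ]; then
    printf '%s\n' "$specific"
  elif [ -n "${LANG:-}" ]; then
    printf '%s\n' "$LANG"
  else
    printf '%s\n' C
  fi
}

effective_locale LC_COLLATE
```

Running it for LC_COLLATE before calling tr or sort is a quick way to check which collation a script will actually get.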

The main locale categories are:

LC_CTYPE indicates the character set and encoding used for file names, file contents and terminal I/O. You should carefully preserve this setting unless you know it's inaccurate (e.g. because a particular file format specifies a particular encoding).
LC_MESSAGES is the language of the messages that the user sees. You should preserve this setting. If you really need to parse an error message, set LC_MESSAGES=C.
LC_COLLATE indicates the sorting order of characters. It's nearly always undesirable in scripts. Most values other than C cause trouble, such as A-Z matching lowercase letters.
Occasionally LC_NUMERIC may cause trouble because numbers may be printed with different punctuation, and LC_TIME influences the way some commands show a date and time. The other categories hardly ever matter in scripts.
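
When only one command needs predictable ordering, the locale can also be pinned for that command alone. This is a minimal sketch; the wrapper name sort_bytes is hypothetical.

```shell
# Hypothetical helper: sort stdin by byte value, whatever locale the caller uses.
sort_bytes() {
  LC_ALL=C sort
}

printf '%s\n' b A a B | sort_bytes
```

In the C locale the uppercase letters sort before the lowercase ones, so this prints A, B, a, b on separate lines, and the caller's own locale variables are left untouched.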

Here's a reasonable strategy for scripts (warning, typed directly into the browser):

unset LANGUAGE  # a GNU-specific setting
if [ -n "$LC_ALL" ]; then
  export LC_CTYPE="$LC_ALL" LC_MESSAGES="$LC_ALL"
  unset LC_ALL
elif [ -n "$LANG" ]; then
  export LC_COLLATE=C LC_NUMERIC=C LC_TIME=C
else
  unset LC_COLLATE LC_NUMERIC LC_TIME
fi

Standard shell utilities obey the locale settings. Perl doesn't unless you tell it to.
