How do I create a file with a non-ASCII name using "sh -c" redirection?

Posted on 2024-10-19 13:30:21

The same command, echo 1 > filename, creates different file names depending on the shell:

$ sh -c 'echo $LANG >=с=.sh' && ls *.sh | od -c
0000000   = 321   =   .   s   h  \n
0000007

and

$ bash -c 'echo $LANG >=с=.bash' && ls *.bash | od -c
0000000   = 321 201   =   .   b   a   s   h  \n
0000012

Here с is U+0441, CYRILLIC SMALL LETTER ES, which UTF-8 encodes as the two bytes 0xD1 0x81 (octal 321 201). Clearly sh drops the second byte.
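As a quick sanity check on the byte values, here is a minimal sketch that prints the UTF-8 bytes of U+0441 in octal. The function name es_bytes is made up for illustration. The bytes are written as octal escapes so that the result does not depend on the terminal or editor encoding.

```shell
# Hypothetical helper: print the two UTF-8 bytes of U+0441 in octal.
es_bytes() {
    # Word splitting on the unquoted substitution discards od's padding.
    set -- $(printf '\321\201' | od -An -to1)
    echo "$*"
}

es_bytes
```

The second value printed is the byte that goes missing in the file name created by sh.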

$ ls *sh
=?=.sh  =с=.bash

$LANG in both cases is:

$ cat *sh
en_US.utf8
en_US.utf8

sh is linked to dash on my system:

$ apt-cache show dash | grep -i version
Version: 0.5.5.1-7ubuntu1

stty iutf8 is set.

Is there any setting that allows dash not to mangle multi-byte characters?

I don't see any mentions about character encoding in the manual:

$ man dash | grep -i encoding
$ man dash | grep -Pi 'multi.*byte'

Update

The second byte '\201' of the 'с' character in utf-8 encoding is
-127 as signed char (or 129 as unsigned char) in C.
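The arithmetic can be checked in the shell itself. This is only a sketch, assuming an 8-bit two's-complement signed char; to_signed_char is a name invented for this example.

```shell
# Hypothetical helper: reinterpret an unsigned byte value (0..255) as an
# 8-bit two's-complement signed value.
to_signed_char() {
    if [ "$1" -ge 128 ]; then
        echo $(( $1 - 256 ))
    else
        echo "$1"
    fi
}

byte=$((0201))   # octal constant, i.e. 129 in decimal
echo "unsigned: $byte, signed: $(to_signed_char "$byte")"
```

With a signed char, the byte therefore has the same value as the CTLESC constant found in the source below.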

The search in the source code (apt-get source dash) for -127 results in:

src/parser.h:38:#define CTL_FIRST -127      /* first 'special' character */
src/parser.h:39:#define CTLESC -127     /* escape next character */

Searching for CTLESC leads to the rmescapes() macro, which in turn leads to the
following fragment of expandarg() in src/expand.c:

/*
 * TODO - EXP_REDIR
 */
if (flag & EXP_FULL) {
    ifsbreakup(p, &exparg);
    *exparg.lastp = NULL;
    exparg.lastp = &exparg.list;
    expandmeta(exparg.list, flag);
} else {
    if (flag & EXP_REDIR) /*XXX - for now, just remove escapes */
        rmescapes(p);
    sp = (struct strlist *)stalloc(sizeof (struct strlist));
    sp->text = p;
    *exparg.lastp = sp;
    exparg.lastp = &sp->next;
}
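The visible symptom can be imitated outside dash. The sketch below does not reproduce what rmescapes() does internally. It simply assumes that every 0201 byte in the redirection target is dropped, which matches the od output at the top of the question. The name strip_ctlesc is hypothetical.

```shell
# Hypothetical helper: delete every 0201 byte from stdin, then dump the
# remaining bytes in octal. LC_ALL=C keeps tr byte-oriented.
strip_ctlesc() {
    set -- $(LC_ALL=C tr -d '\201' | od -An -to1)
    echo "$*"
}

# The file name "=с=.sh", with с written as its two UTF-8 bytes.
printf '=\321\201=.sh' | strip_ctlesc
```

The remaining bytes are =, 0321, =, ., s, h: the same sequence that ls *.sh | od -c showed for the file created by sh.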

The TODO and XXX comments hint that a more recent version might
help. debian/dash.README.source points to:

$ git clone http://smarden.org/git/dash.git/
$ cd dash

There are two branches:

$ git br
* debian-sid
  release+patches

On debian-sid the byte is still stripped. On the release+patches
branch the byte survives, so grep finds it:

$ ./configure
$ make && rm *.dash -f; ./dash -c 'echo 1 >fсf.dash' && 
> ls *.dash | od -c | grep 201

git diff debian-sid...release+patches shows that the rmescapes() call was
removed in release+patches:

diff --git a/src/expand.c b/src/expand.c
index e4c4c8b..f2f964c 100644
--- a/src/expand.c
+++ b/src/expand.c
...
@@ -213,8 +210,6 @@ expandarg(union node *arg, struct arglist *arglist, int flag)
                exparg.lastp = &exparg.list;
                expandmeta(exparg.list, flag);
        } else {
-               if (flag & EXP_REDIR) /*XXX - for now, just remove escapes */
-                       rmescapes(p);
                sp = (struct strlist *)stalloc(sizeof (struct strlist));
                sp->text = p;
                *exparg.lastp = sp;
@@ -412,7 +407,7 @@ lose:
 }

It is unclear whether these changes will be included in dash 0.5.6.1 on Ubuntu.

For now the only way to make the command:

$ sh -c 'echo 1 >fсf.dash' &&  ls *.dash | od -c | grep 201

to work is to reconfigure sh back to bash:

$ sudo dpkg-reconfigure dash

Are there other alternatives?

Comments (1)

无法回应 2024-10-26 13:30:21

Of the several shells (or versions) I tried, only Dash and Busybox Ash failed.

$ for sh in bash2.05b bash3.2 bash4.0 bash4.1 bash4.2 dash zsh ksh pdksh mksh ash; do $sh -c 'locale > с.$0'; done
$ csh -c 'locale > с.csh'
$ fish -c 'locale > с.fish'
$ ls -1
?.ash
?.dash
с.bash2.05b
с.bash3.2
с.bash4.0
с.bash4.1
с.bash4.2
с.csh
с.fish
с.ksh
с.mksh
с.pdksh
с.zsh

The contents were all the same.
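The same comparison can be scripted so that it inspects the bytes instead of relying on how ls renders them. This is a rough sketch: check_shell and the list of shell names are assumptions made for illustration, mktemp -d is assumed to be available, and a shell that fails to create the file at all is reported as dropping the byte.

```shell
# Hypothetical helper: report whether a shell keeps byte 0201 in a
# redirection target written literally in the -c script.
check_shell() {
    shell_name=$1
    if ! command -v "$shell_name" >/dev/null 2>&1; then
        echo "$shell_name: not installed"
        return 0
    fi
    work_dir=$(mktemp -d) || return 1
    # Build the file name from octal escapes: f, 0xD1 0x81 (U+0441), f.
    file_name=$(printf 'f\321\201f.test')
    (cd "$work_dir" && "$shell_name" -c "echo 1 >$file_name") 2>/dev/null
    if ls "$work_dir" | od -An -to1 | grep -q 201; then
        echo "$shell_name: keeps byte 201"
    else
        echo "$shell_name: drops byte 201"
    fi
    rm -rf "$work_dir"
}

for candidate in sh bash dash zsh ksh mksh; do
    check_shell "$candidate"
done
```

The file name is placed literally in the -c script on purpose. Passing it as a positional parameter would go through a different expansion path and might not trigger the problem.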

From man dash:

Only features designated by POSIX, plus a few Berkeley extensions, are being incorporated
into this shell. This man page is not intended to be a tutorial or a
complete specification of the shell.

POSIX says:

The POSIX locale contains the characters in Portable Character Set, which have the properties listed in LC_CTYPE. In other locales, the presence, meaning, and representation of any additional characters are locale-specific.

and

Wide-character codes for other characters are locale and implementation-defined. ... POSIX.1-2008 provides no means of defining a wide-character codeset.
