How do I create a non-ASCII file name using a redirect with "sh -c"?
The same command, echo 1 > filename, creates different file names depending on the shell:
$ sh -c 'echo $LANG >=с=.sh' && ls *.sh | od -c
0000000 = 321 = . s h \n
0000007
and
$ bash -c 'echo $LANG >=с=.bash' && ls *.bash | od -c
0000000 = 321 201 = . b a s h \n
0000012
Here с is the character U+0441 (CYRILLIC SMALL LETTER ES). It is clear that sh eats the second byte of its UTF-8 encoding.
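To see which bytes are involved, the two-byte UTF-8 encoding of U+0441 can be dumped directly. This is a minimal sketch; the helper name es_bytes is made up for illustration, and the bytes are written as octal escapes so the result does not depend on the locale or on how the script file is encoded.

```shell
# Print the UTF-8 bytes of U+0441 in octal, separated by a single space.
es_bytes() {
    # od pads its output with blanks; the unquoted expansion drops them.
    set -- $(printf '\321\201' | od -An -to1)
    echo "$*"
}

es_bytes   # prints: 321 201
```

The second value, 201, is the byte that is missing from the file name created by sh.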
$ ls *sh
=?=.sh =с=.bash
$LANG is the same in both cases:
$ cat *sh
en_US.utf8
en_US.utf8
sh is linked to dash on my system:
$ apt-cache show dash | grep -i version
Version: 0.5.5.1-7ubuntu1
stty iutf8 is set.
Is there any setting that allows dash not to mangle multi-byte characters?
I don't see any mention of character encoding in the manual:
$ man dash | grep -i encoding
$ man dash | grep -Pi 'multi.*byte'
Update
The second byte '\201' of the 'с' character in UTF-8 encoding is -127 as a signed char (or 129 as an unsigned char) in C.
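The arithmetic behind those two values can be checked in the shell itself. The sketch below assumes an 8-bit two's-complement char, which is what C uses on platforms where plain char is signed; the function name to_signed8 is hypothetical.

```shell
# Interpret one byte value (0..255) as a signed 8-bit integer.
to_signed8() {
    if [ "$1" -ge 128 ]; then
        echo $(( $1 - 256 ))
    else
        echo "$1"
    fi
}

echo $(( 0201 ))          # octal 201 as an unsigned value, prints: 129
to_signed8 $(( 0201 ))    # the same byte as a signed value, prints: -127
```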
Searching the source code (apt-get source dash) for -127 gives:
src/parser.h:38:#define CTL_FIRST -127 /* first 'special' character */
src/parser.h:39:#define CTLESC -127 /* escape next character */
Searching for CTLESC leads to the rmescapes() macro, which in turn leads to the following fragment from src/expand.c:expandarg():
/*
* TODO - EXP_REDIR
*/
if (flag & EXP_FULL) {
ifsbreakup(p, &exparg);
*exparg.lastp = NULL;
exparg.lastp = &exparg.list;
expandmeta(exparg.list, flag);
} else {
if (flag & EXP_REDIR) /*XXX - for now, just remove escapes */
rmescapes(p);
sp = (struct strlist *)stalloc(sizeof (struct strlist));
sp->text = p;
*exparg.lastp = sp;
exparg.lastp = &sp->next;
}
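The observed file name is what you would get if the byte equal to CTLESC were simply dropped from the redirect target. The sketch below only mimics that result at the byte level with tr; it does not run or reimplement dash's rmescapes() logic, and the name strip_ctlesc is invented for illustration.

```shell
# Delete every byte with octal value 201 (the value of CTLESC) from stdin.
strip_ctlesc() {
    # LC_ALL=C makes tr operate on single bytes rather than characters.
    LC_ALL=C tr -d '\201'
}

# Feed the intended file name through the filter and dump the result.
printf '=\321\201=.sh\n' | strip_ctlesc | od -c
```

The dump shows the same seven bytes as the listing produced by sh: the lead byte 321 survives and the continuation byte 201 is gone.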
The TODO and XXX comments hint that a more recent version might help. debian/dash.README.source points to:
$ git clone http://smarden.org/git/dash.git/
$ cd dash
There are two branches:
$ git br
* debian-sid
release+patches
On debian-sid the escape byte is removed. On the release+patches branch, grep finds the missing byte:
$ ./configure
$ make && rm *.dash -f; ./dash -c 'echo 1 >fсf.dash' &&
> ls *.dash | od -c | grep 201
git diff debian-sid...release+patches shows that the rmescapes() call was removed in release+patches:
diff --git a/src/expand.c b/src/expand.c
index e4c4c8b..f2f964c 100644
--- a/src/expand.c
+++ b/src/expand.c
...
@@ -213,8 +210,6 @@ expandarg(union node *arg, struct arglist *arglist, int flag)
exparg.lastp = &exparg.list;
expandmeta(exparg.list, flag);
} else {
- if (flag & EXP_REDIR) /*XXX - for now, just remove escapes */
- rmescapes(p);
sp = (struct strlist *)stalloc(sizeof (struct strlist));
sp->text = p;
*exparg.lastp = sp;
@@ -412,7 +407,7 @@ lose:
}
It is unclear whether these changes will be included in dash 0.5.6.1 on Ubuntu.
For now the only way to make the command:
$ sh -c 'echo 1 >fсf.dash' && ls *.dash | od -c | grep 201
work is to reconfigure sh back to bash:
$ sudo dpkg-reconfigure dash
Are there other alternatives?
Comments (1)
Of the several shells (or versions) I tried, only Dash and Busybox Ash failed.
The contents were all the same.
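A comparison across shells like the one described above could be scripted roughly as follows. This is a sketch, not the commenter's actual procedure: the function name keeps_byte and the list of shells are assumptions, and mktemp -d is assumed to be available. The file name is built from octal escapes so the script itself stays pure ASCII.

```shell
# Usage: keeps_byte SHELL
# Exit status 0 if SHELL keeps the byte 201 in a redirect target,
# 1 if the byte is lost, 2 if the check itself could not be run.
keeps_byte() (
    dir=$(mktemp -d) || exit 2
    name=$(printf 'f\321\201f.test')
    ( cd "$dir" && "$1" -c "echo 1 >$name" ) || { rm -rf "$dir"; exit 2; }
    # Look for the adjacent bytes 321 201 in the directory listing.
    if ls "$dir" | od -An -to1 | grep -q '321 201'; then rc=0; else rc=1; fi
    rm -rf "$dir"
    exit "$rc"
)

for shell in sh bash; do
    command -v "$shell" >/dev/null 2>&1 || continue
    if keeps_byte "$shell"; then
        echo "$shell: ok"
    else
        echo "$shell: byte lost"
    fi
done
```

The function body runs in a subshell, so its variables and the cd do not leak into the calling shell.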
From man dash:
POSIX says:
and