Character encoding with msysgit

Posted on 2024-11-16 19:38:22

Commit messages created on my winXP box generate warnings when read on my Win7 box.

My name contains a special character (ö), and since my name appears in every commit, I suppose that is the source of the problem.
I saw the problem while trying to stash changes on top of a commit created on winXP:
Warning: commit message does not conform to UTF-8.

I would like to check which encoding was used to create the commit on winXP, but I can't find out how.

$ git config --get i18n.commitencoding
returns blank on both machines.

http://www.kernel.org/pub/software/scm/git/docs/git-commit.html seems to say that git checks the encoding in the commit objects.

git log, git show, git blame and
friends look at the encoding header of
a commit object, and try to re-code
the log message into UTF-8 unless
otherwise specified.

That is fine, but then why does git complain on win7 and not on winXP?


msysgit versions are identical on both machines: 1.7.4.msysgit.0.
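
For reference, one way to see which encoding a commit claims is to dump the raw commit object and look for an "encoding" header. The sketch below is only an illustration under stated assumptions: it builds a throwaway repository in a temporary directory, and the identity, file name and the ISO-8859-1 setting are all made up for the demonstration.

```shell
#!/bin/sh
# Sketch: create a scratch repository and inspect a raw commit object.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"          # hypothetical identity
git config user.email "user@example.com"     # hypothetical address
git config i18n.commitEncoding ISO-8859-1    # declare a legacy encoding

echo hello > file.txt
git add file.txt
git commit -q -m "first commit"

# Print the raw commit object: headers, a blank line, then the message.
raw=$(git cat-file commit HEAD)
printf '%s\n' "$raw"

# Keep only an "encoding" line from the header part (before the blank line).
encoding_header=$(printf '%s\n' "$raw" | sed -n '/^$/q; /^encoding /p')
echo "header found: ${encoding_header:-<none>}"
```

In this scratch repository the header reads "encoding ISO-8859-1". A commit created while i18n.commitEncoding is unset carries no such header, and git then treats its bytes as UTF-8.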

Comments (2)

九歌凝 2024-11-23 19:38:22

Just a wild guess, but I recently had a similar problem with letters in someone's name in a Rakefile, and I actually had to change the encoding of my CMD environment to run it.

Look at step number two on this wiki:

https://github.com/NancyFx/Nancy/wiki/Having-trouble-with-rake%3F

The Microsoft documentation on the chcp command is here:
http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/chcp.mspx?mfr=true
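
To illustrate why the active code page can matter, here is a small sketch showing that the same letter "ö" is a different byte sequence depending on the encoding, and that the single-byte form is not well-formed UTF-8. The name "Jörg" is a hypothetical example, and using iconv as a validity check is my own choice for the demonstration, not something git does internally.

```shell
#!/bin/sh
# "ö" in ISO-8859-1 is the single byte 0xF6 (octal 366);
# in UTF-8 it is the two bytes 0xC3 0xB6 (octal 303 266).
latin1_name=$(printf 'J\366rg')      # hypothetical name, ISO-8859-1 bytes
utf8_name=$(printf 'J\303\266rg')    # same name, UTF-8 bytes

# Succeeds only if standard input is well-formed UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

if printf '%s' "$latin1_name" | is_utf8; then latin1_ok=yes; else latin1_ok=no; fi
if printf '%s' "$utf8_name" | is_utf8; then utf8_ok=yes; else utf8_ok=no; fi

echo "ISO-8859-1 bytes accepted as UTF-8: $latin1_ok"
echo "UTF-8 bytes accepted as UTF-8:      $utf8_ok"
```

If a name typed under a legacy code page ends up in a commit without any declared encoding, its bytes fail this kind of check, which would be consistent with the warning quoted in the question.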

别挽留 2024-11-23 19:38:22

i18n.commitEncoding is better supported in modern Git (2019), but only Git 2.25 (Q1 2020) provides full support: handling of commit objects that use a non-UTF-8 encoding during "rebase -i" has been improved.

See commit 52f52e5, commit 5772b0c (11 Nov 2019), commit b375744, commit 019a9d8, commit 0798d16, commit e4b95b3, commit 1ba6e7a (08 Nov 2019), and commit 99b2ba3 (07 Nov 2019) by Doan Tran Cong Danh (congdanhqx-zz).
(Merged by Junio C Hamano -- gitster -- in commit 6511cb3, 01 Dec 2019)

sequencer: reencode old merge-commit message

Signed-off-by: Doan Tran Cong Danh

During rebasing, old merge's message (encoded in old encoding) will be used as message for new merge commit (created by rebase).

In case of the value of i18n.commitencoding has been changed after the old merge time. We will receive an unusable message for this new merge.

Correct it.


sequencer: reencode to utf-8 before arrange rebase's todo list

Signed-off-by: Doan Tran Cong Danh

On musl libc, ISO-2022-JP encoder is too eager to switch back to 1 byte encoding, musl's iconv always switch back after every combining character.
Comparing glibc and musl's output for this command

$ sed q t/t3900/ISO-2022-JP.txt| iconv -f ISO-2022-JP -t utf-8 |
        iconv -f utf-8 -t ISO-2022-JP | xxd

glibc: 
00000000: 1b24 4224 4f24 6c24 5224 5b24 551b 2842  .$B$O$l$R$[$U.(B
00000010: 0a                                       .

musl: 
00000000: 1b24 4224 4f1b 2842 1b24 4224 6c1b 2842  .$B$O.(B.$B$l.(B
00000010: 1b24 4224 521b 2842 1b24 4224 5b1b 2842  .$B$R.(B.$B$[.(B
00000020: 1b24 4224 551b 2842 0a                   .$B$U.(B.

Although musl iconv's output isn't optimal, it's still correct.
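
The claim that both outputs are correct can be checked by decoding the two byte sequences from the dumps above and comparing the results. This is a sketch of mine, not part of the commit; it assumes the local iconv supports ISO-2022-JP, and the variable names are arbitrary.

```shell
#!/bin/sh
# Bytes from the two hex dumps above, without the trailing newline.
# \033 is ESC; "ESC $ B" switches to JIS X 0208, "ESC ( B" back to ASCII.
glibc_bytes=$(printf '\033$B$O$l$R$[$U\033(B')
musl_bytes=$(printf '\033$B$O\033(B\033$B$l\033(B\033$B$R\033(B\033$B$[\033(B\033$B$U\033(B')

# Decode both sequences to UTF-8.
glibc_text=$(printf '%s\n' "$glibc_bytes" | iconv -f ISO-2022-JP -t UTF-8)
musl_text=$(printf '%s\n' "$musl_bytes" | iconv -f ISO-2022-JP -t UTF-8)

if [ -n "$glibc_text" ] && [ "$glibc_text" = "$musl_text" ]; then
    same_text=yes
else
    same_text=no
fi
echo "both encodings decode to the same text: $same_text"
```

The musl form is longer only because it returns to ASCII after each character; a byte-for-byte comparison of the encoded output therefore differs even though the decoded text is identical.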

From commit 7d509878b8 ("pretty.c: format string with truncate respects logOutputEncoding", 2014-05-21, Git v2.1.0-rc0 -- merge listed in batch #3), we're encoding the message to utf-8 first, then format it and convert the message to the actual output encoding on git commit --squash.

Thus, t3900::test_commit_autosquash_flags is failing on musl libc.

Reencode to utf-8 before arranging rebase's todo list.


configure.ac: define ICONV_OMITS_BOM if necessary

Signed-off-by: Doan Tran Cong Danh

From commit 79444c9294 ("utf8: handle systems that don't write BOM for UTF-16", 2019-02-12, Git v2.21.0-rc1 -- merge listed in batch #0), we're supporting those systems with iconv that omits BOM with:

make ICONV_OMITS_BOM=Yes

However, configure script wasn't taught to detect those systems.

Teach configure to do so.
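
As a rough command-line way to see what this setting is about, the sketch below checks whether the local iconv emits a byte order mark when converting to UTF-16. It is only an approximation for illustration and is not the test that configure.ac actually performs; the variable names are my own.

```shell
#!/bin/sh
# Convert a single "a" to UTF-16 and dump the bytes as plain hex.
first_bytes=$(printf 'a' | iconv -f UTF-8 -t UTF-16 2>/dev/null | od -An -tx1 | tr -d ' \n')

case "$first_bytes" in
    "")          omits_bom=unknown ;;  # conversion failed or is unsupported
    fffe*|feff*) omits_bom=no ;;       # output starts with a BOM
    *)           omits_bom=yes ;;      # output starts directly with the text
esac

echo "UTF-16 output bytes: ${first_bytes:-<none>}"
echo "iconv omits BOM:     $omits_bom"
```

On a system where this reports "yes", building with ICONV_OMITS_BOM=Yes as quoted above would be the relevant setting.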
