How to make a pre-commit hook that prevents non-UTF-8 file encodings

Published 2024-09-07 14:31:47

Is it possible to make a precommit hook for git or svn that can reject files not committed in a specific encoding?

I have worked on several projects where it seems to be a problem to stick to a certain file encoding (like UTF-8, for instance).

尾戒 2024-09-14 14:31:47

Your iconv may be able to tell you if something is not UTF-8, but other encodings may not be so easy (especially 8-bit, single byte encodings like ISO-8859-1).
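A quick way to see the asymmetry is to feed the same bytes to both checks. The sketch below relies only on iconv's exit status; the function names `is_valid_utf8` and `is_valid_latin1` are illustrative, not part of the answer. Decoding as UTF-8 fails on a malformed byte sequence, while decoding as ISO-8859-1 succeeds for any input, because every byte from 0x00 to 0xFF is a defined ISO-8859-1 character, so a hook has nothing to reject.

```shell
# is_valid_utf8: succeed (exit 0) if stdin is well-formed UTF-8.
# Converting UTF-8 to UTF-8 forces iconv to validate the input; output is discarded.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# is_valid_latin1: not a real test. Every byte value is a valid ISO-8859-1
# character, so this succeeds for any input.
is_valid_latin1() {
    iconv -f ISO-8859-1 -t UTF-8 >/dev/null 2>&1
}

# The byte 0xE9 (octal 351) is "é" in ISO-8859-1 but is not valid UTF-8 on its own.
printf 'caf\351\n' | is_valid_utf8   && echo "UTF-8: ok"   || echo "UTF-8: rejected"  # prints "UTF-8: rejected"
printf 'caf\351\n' | is_valid_latin1 && echo "Latin-1: ok" || echo "Latin-1: rejected" # prints "Latin-1: ok"
```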

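For Git, you may actually want an update hook instead of a pre-commit hook, so that it runs in a central repository and enforces the rule for everyone: a pre-commit hook lives in each developer's own .git/hooks directory, is not copied by git clone, and can be skipped with git commit --no-verify.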

Git pre-commit hook:

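#!/bin/sh
# List every file in the index (NUL-separated, so unusual file names are safe)
# and hand the names in batches to a small inline script.
git ls-files -z -- |
xargs -0 sh -c '

    e=""
    for f; do
        # ":$f" is the staged copy of the file, i.e. what is about to be committed.
        # Converting UTF-8 to UTF-8 fails if the input has invalid byte sequences.
        if ! git show :"$f" |
             iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            e=1
            echo "Not UTF-8: $f"
            #exit 255 # to abort after first non-UTF-8 file
        fi
    done
    # A non-zero status here makes xargs, and therefore the hook, fail.
    test -z "$e"

' -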

Put one or more Git pathspecs after the -- on the git ls-files command line to limit the pathnames that are checked.
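For example, the same check can be wrapped in a shell function that forwards its arguments as pathspecs. This is a sketch; the name `check_index_utf8` and the patterns in the example line are hypothetical. Restricting paths is useful because binary files (images, archives) would otherwise be reported as non-UTF-8. Quote the patterns so the shell does not expand them first; a Git pathspec such as `'*.java'` matches at any directory depth.

```shell
# check_index_utf8 [pathspec...]: hypothetical wrapper around the hook's check.
# Only staged files matching the given Git pathspecs are checked; with no
# arguments, every file in the index is checked, as in the original hook.
check_index_utf8() {
    git ls-files -z -- "$@" |
    xargs -0 sh -c '
        e=""
        for f; do
            # ":$f" names the staged version of the file.
            if ! git show :"$f" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
                e=1
                echo "Not UTF-8: $f"
            fi
        done
        test -z "$e"
    ' -
}

# Example for .git/hooks/pre-commit (the patterns are only an illustration):
# check_index_utf8 '*.java' '*.txt' 'docs/'
```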

To check the tip of the updated ref in an update hook (Git passes the ref name, the old object name, and the new object name as $1, $2, and $3), use git ls-tree --name-only -r -z $3 -- | to generate the pathnames (note: unlike git ls-files, it does not handle pattern pathspecs, so do any pattern-based filtering in the shell code) and git show "$3:$f" to extract the file contents. You might also want to check not only the tip commit but every new commit (loop over each commit in git rev-list ^$2 $3 instead of just $3).
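Putting those pieces together, a server-side update hook might look like the sketch below. Git invokes the update hook with three arguments: the ref name, the old object name, and the new object name. The function names are illustrative, and two cases not covered above are handled under stated assumptions: a deleted ref (new object name all zeros) is allowed through, and for a newly created ref (old object name all zeros) only commits not already reachable from an existing ref are checked.

```shell
#!/bin/sh
# Sketch of a server-side update hook (hooks/update in the central repository).
# Git calls it as: update <ref name> <old object name> <new object name>.

# is_null_oid <oid>: true if <oid> consists only of zeros, the value Git passes
# for the old side of a newly created ref or the new side of a deleted one.
is_null_oid() {
    case $1 in
        *[!0]*) return 1 ;;
        *)      return 0 ;;
    esac
}

# check_commit_utf8 <commit>: report every file in <commit> that is not UTF-8.
check_commit_utf8() {
    git ls-tree --name-only -r -z "$1" |
    COMMIT=$1 xargs -0 sh -c '
        e=""
        for f; do
            if ! git show "$COMMIT:$f" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
                e=1
                echo "Not UTF-8 in $COMMIT: $f"
            fi
        done
        test -z "$e"
    ' -
}

# check_update <ref> <old> <new>: check every commit the update would add.
check_update() {
    old=$2 new=$3
    # Deleting a ref adds no content, so there is nothing to check.
    is_null_oid "$new" && return 0
    if is_null_oid "$old"; then
        # New ref: skip commits already reachable from existing refs.
        commits=$(git rev-list "$new" --not --all)
    else
        commits=$(git rev-list "^$old" "$new")
    fi
    status=0
    for c in $commits; do
        check_commit_utf8 "$c" || status=1
    done
    return $status
}

# In the real hook file, finish with:
# check_update "$1" "$2" "$3"
```

With that last line in place, the hook's exit status is the result of the check, and a non-zero status makes Git refuse to update that ref. Remember to make the hook file executable.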

时常饿 2024-09-14 14:31:47

Precommit hooks are just scripts. So if you can tell the encoding in a script, then you can use that information to reject the wrong sort of file.

You could search the file for characters outside of the normal character range. If there's a magic number or a tag to tell you the encoding for a file, you can check that. Otherwise ask yourself "how would I know this file is in the wrong encoding?" Can you code that up?
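Both ideas can be scripted. The sketch below (helper names are hypothetical) looks for a byte-order mark, the closest thing plain-text files have to a magic number, and lists lines that contain bytes outside printable ASCII. A BOM is optional, so "no BOM" says nothing about the encoding, and the non-ASCII search only fits projects that want pure ASCII, since valid UTF-8 text with accented letters is flagged too.

```shell
# detect_bom <file>: print the encoding implied by a leading byte-order mark.
detect_bom() {
    # First four bytes as lowercase hex, e.g. "efbbbf23".
    bytes=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    case $bytes in
        efbbbf*)   echo "UTF-8 with BOM" ;;
        fffe0000*) echo "UTF-32LE" ;;   # must be tested before plain fffe
        0000feff*) echo "UTF-32BE" ;;
        fffe*)     echo "UTF-16LE" ;;
        feff*)     echo "UTF-16BE" ;;
        *)         echo "no BOM" ;;
    esac
}

# non_ascii_lines <file>: print the numbers of lines containing any byte outside
# printable ASCII (tabs and other whitespace are allowed). LC_ALL=C makes grep
# treat each byte as one character.
non_ascii_lines() {
    LC_ALL=C grep -n '[^[:print:][:space:]]' "$1" | cut -d: -f1
}
```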

秋凉 2024-09-14 14:31:47

You could use the iconv utility to convert the file from UTF-8 to another encoding, for example UTF-16. If the conversion fails, the source file is not valid UTF-8:

$ iconv -f UTF-8 -t UTF-16 Strings.java 
ÿþ
testing = iconv: illegal input sequence at position 11
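In a script you would test iconv's exit status rather than read its message. (The ÿþ at the start of the output above is most likely the UTF-16 byte-order mark, bytes FF FE, displayed as Latin-1 characters; GNU iconv writes a BOM when the target is plain UTF-16.) The sketch below uses a hypothetical function name and prints each given file that fails the conversion. For validation the target encoding hardly matters; UTF-8 to UTF-16 as above works as well as UTF-8 to UTF-8.

```shell
# not_utf8_files <file>...: print each file whose contents are not valid UTF-8.
# Only iconv's exit status is used; the UTF-16 output and the error message
# are discarded. Returns non-zero if any file failed.
not_utf8_files() {
    status=0
    for f in "$@"; do
        if ! iconv -f UTF-8 -t UTF-16 "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
            status=1
        fi
    done
    return $status
}
```

A call such as `not_utf8_files src/*.java` (a hypothetical path) could then serve as the body of a hook.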