How to make a pre-commit hook that prevents non-UTF-8 file encodings
Is it possible to make a pre-commit hook for Git or SVN that rejects files not committed in a specific encoding?
I have worked on several projects where sticking to one file encoding (UTF-8, for instance) turned out to be a problem.
Your iconv may be able to tell you that something is not UTF-8, but other encodings may not be so easy to detect (especially 8-bit, single-byte encodings like ISO-8859-1).
For Git, you may actually want an update hook instead of a pre-commit hook, so that it runs in the central repository and enforces the rule there.
Git pre-commit hook:
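The hook script itself did not survive on this page, so here is a minimal sketch of what such a hook could look like, not the original code. It assumes a POSIX shell, an `iconv` that rejects invalid UTF-8 input, and an `xargs` that supports `-0`; the function name `check_index` is made up for illustration.

```shell
#!/bin/sh
# Sketch of .git/hooks/pre-commit (assumption: not the original answer's code).
# Rejects the commit when any file in the index is not valid UTF-8.

# check_index [PATHSPEC...]: report offending paths on stderr and
# return non-zero if there are any. (Hypothetical helper name.)
check_index() {
    git ls-files -z -- "$@" |
    xargs -0 sh -c '
        bad=""
        for f; do
            # ":$f" names the staged (index) version of the file.
            if ! git show ":$f" | iconv -f UTF-8 -t UTF-16 >/dev/null 2>&1; then
                echo "Not UTF-8: $f" >&2
                bad=1
            fi
        done
        test -z "$bad"
    ' sh
}

# Run the check only when this file is installed under the name "pre-commit";
# the exit status of check_index then becomes the exit status of the hook.
case "${0##*/}" in
    pre-commit) check_index ;;
esac
```

Binary files such as images will be reported too, which is one reason to restrict the check to certain paths.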
Put one or more Git pathspecs after the `--` on the `git ls-files` command line to limit the pathnames that are checked.
To check the tip of the updated ref in an update hook, use `git ls-tree --name-only -r -z $3 -- |` to generate the pathnames (note: unlike `git ls-files`, it does not handle pattern pathspecs, so do any pattern-based filtering in the shell code) and `git show "$3:$f"` to extract the file contents. You might also want to check not only the tip commit but every new commit (loop over each commit in `git rev-list ^$2 $3` instead of using just `$3`).
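For the update hook variant, a rough sketch might look like the following. It is an illustration under assumptions, not the answerer's script: the helper names `check_commit` and `check_push` are invented, and the handling of brand-new refs (old value all zeros) via `git rev-list "$new" --not --all` is my addition, since `^$2` cannot be used in that case.

```shell
#!/bin/sh
# Sketch of hooks/update for the central repository (assumption: illustrative only).
# Git invokes it as: update <refname> <old-sha> <new-sha>

# check_commit COMMIT: fail if any file in COMMIT's tree is not valid UTF-8.
check_commit() {
    git ls-tree --name-only -r -z "$1" -- |
    rev="$1" xargs -0 sh -c '
        bad=""
        for f; do
            if ! git show "$rev:$f" | iconv -f UTF-8 -t UTF-16 >/dev/null 2>&1; then
                echo "Not UTF-8 in $rev: $f" >&2
                bad=1
            fi
        done
        test -z "$bad"
    ' sh
}

# check_push OLD NEW: check every commit the update would add.
check_push() {
    old="$1" new="$2"
    # An all-zero NEW means the ref is being deleted: nothing to check.
    case "$new" in *[!0]*) ;; *) return 0 ;; esac
    case "$old" in
        *[!0]*) commits=$(git rev-list "^$old" "$new") ;;
        # An all-zero OLD means a new ref: take commits not reachable from any existing ref.
        *)      commits=$(git rev-list "$new" --not --all) ;;
    esac || return 1
    for c in $commits; do
        check_commit "$c" || return 1
    done
}

# Run only when installed under the name "update".
case "${0##*/}" in
    update) check_push "$2" "$3" ;;
esac
```

Checking every file of every new commit can get slow on large pushes; checking only the tip with `check_commit "$3"` is the cheaper option.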
Pre-commit hooks are just scripts. So if you can determine the encoding in a script, you can use that information to reject the wrong sort of file.
You could search the file for characters outside the normal character range. If there is a magic number or a tag that tells you the file's encoding, you can check that. Otherwise, ask yourself: "How would I know this file is in the wrong encoding?" Can you code that up?
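As one possible way to code up those two ideas, the sketch below flags bytes outside the ASCII range and looks for a byte order mark as the "magic number". Both helper names are hypothetical, and treating ASCII as the "normal character range" is an assumption for the example.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of any tool.

# has_non_ascii FILE: succeed if FILE contains any byte above 0x7F.
has_non_ascii() {
    # Delete every ASCII byte and count what is left.
    n=$(LC_ALL=C tr -d '\000-\177' <"$1" | wc -c)
    [ "$((n))" -gt 0 ]
}

# bom_of FILE: print the encoding suggested by a leading byte order mark,
# or "none" when the file does not start with one.
bom_of() {
    sig=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        efbbbf) echo UTF-8 ;;
        fffe*)  echo UTF-16LE ;;   # could also be the start of a UTF-32LE mark
        feff*)  echo UTF-16BE ;;
        *)      echo none ;;
    esac
}
```

Neither check proves that a file is UTF-8: a file without a byte order mark that contains high bytes could be UTF-8, ISO-8859-1, or something else entirely.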
You could use the iconv utility to convert the file from UTF-8 to, for example, UTF-16. If the conversion fails, the source file is not in the correct encoding.
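The command that followed is missing from this page; a minimal sketch of the idea, assuming an `iconv` that exits non-zero on invalid input, could be written as below. The function names are made up for the example.

```shell
#!/bin/sh
# is_utf8_file FILE: succeed only if iconv can decode FILE as UTF-8.
# (Hypothetical helper; the converted output is discarded.)
is_utf8_file() {
    iconv -f UTF-8 -t UTF-16 "$1" >/dev/null 2>&1
}

# report_non_utf8 FILE...: print each file that fails the check and
# return non-zero if at least one did.
report_non_utf8() {
    rc=0
    for f in "$@"; do
        if ! is_utf8_file "$f"; then
            echo "Not UTF-8: $f"
            rc=1
        fi
    done
    return "$rc"
}
```

A hook could then call something like `report_non_utf8 src/*.txt || exit 1`. Keep in mind that pure ASCII text always passes, whatever encoding the author intended.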