How to find bidirectional Unicode control characters in a git repo?
I have a package on NPM that socket.dev reports as containing "Bidirectional unicode control characters".
I found an answer to the question "How to update GitHub Actions CI to detect Trojan Code commits (malicious [bidirectional] unicode chars, python)".
I've used:
git grep -oP "[^\x00-\x7F]*"
It found some matches in binary files, so I removed all the binary flags from .gitattributes. Now the only matches are in files from the __tests__ directory, which contains ANSI files and one image, but those are not published to NPM.
What is the proper way to find these "Bidirectional unicode control characters"?
Answer
The perl one-liner will print out the lines of the given UTF-8 encoded files that contain bidirectional control characters.
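For readers without Perl, here is a rough Python sketch of the same idea (it is not the one-liner itself). It assumes the files are UTF-8 and hard-codes the twelve code points that Unicode assigns the Bidi_Control property, because Python's unicodedata module does not expose that property directly. The names BIDI_CONTROL, find_bidi_lines and main are hypothetical, chosen for illustration.

```python
import sys

# Code points with the Unicode Bidi_Control property, listed explicitly
# because unicodedata has no lookup for this property (hypothetical name).
BIDI_CONTROL = frozenset(
    "\u061c"                          # ARABIC LETTER MARK
    "\u200e\u200f"                    # LRM, RLM
    "\u202a\u202b\u202c\u202d\u202e"  # LRE, RLE, PDF, LRO, RLO
    "\u2066\u2067\u2068\u2069"        # LRI, RLI, FSI, PDI
)


def find_bidi_lines(text):
    """Yield (line_number, line) for each line containing a bidi control char."""
    # split("\n") rather than splitlines(), so numbering matches grep's
    for lineno, line in enumerate(text.split("\n"), start=1):
        if any(ch in BIDI_CONTROL for ch in line):
            yield lineno, line


def main(paths):
    """Print offending lines; return 1 if any were found, else 0."""
    found = False
    for path in paths:
        # errors="replace" keeps non-UTF-8 (e.g. binary) files from aborting the scan
        with open(path, encoding="utf-8", errors="replace") as f:
            text = f.read()
        for lineno, line in find_bidi_lines(text):
            found = True
            # ascii() makes the invisible characters visible as \u202e etc.
            print(f"{path}:{lineno}: {ascii(line)}")
    return 1 if found else 0


# Run only when file paths are given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Usage would be something like python find_bidi.py $(git ls-files), where find_bidi.py is whatever you name the script. The non-zero exit status makes it usable as a CI check.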
Unfortunately, PCRE, which git grep -P uses, doesn't support anywhere near the level of Unicode properties that Perl regular expressions do. You can search for the control characters explicitly, though:

The -I option skips binary files. (The list of control characters is taken from perluniprops.)
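As a sketch of that explicit search, the snippet below builds a PCRE character class from the twelve code points Unicode gives the Bidi_Control property (this may not match the answer's exact list) and passes it to git grep -P -I. It assumes git was built with PCRE support and that \x{...} escapes above 0xFF are accepted, which generally requires git to run under a UTF-8 locale. The helper names bidi_pcre_class and git_grep_bidi are hypothetical.

```python
import subprocess

# Code points with the Unicode Bidi_Control property (hypothetical name).
BIDI_CONTROL_CODEPOINTS = (
    0x061C,                                  # ARABIC LETTER MARK
    0x200E, 0x200F,                          # LRM, RLM
    0x202A, 0x202B, 0x202C, 0x202D, 0x202E,  # LRE, RLE, PDF, LRO, RLO
    0x2066, 0x2067, 0x2068, 0x2069,          # LRI, RLI, FSI, PDI
)


def bidi_pcre_class():
    """Build a PCRE character class such as [\\x{061C}\\x{200E}...]."""
    return "[" + "".join(f"\\x{{{cp:04X}}}" for cp in BIDI_CONTROL_CODEPOINTS) + "]"


def git_grep_bidi(repo="."):
    """Run git grep -P -I -n in repo and return the matching lines.

    git grep exits with status 1 when nothing matches, so only
    statuses above 1 are treated as errors.
    """
    result = subprocess.run(
        ["git", "grep", "-P", "-I", "-n", bidi_pcre_class()],
        cwd=repo,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

Calling git_grep_bidi() from inside the repository returns lines in git grep's path:line:content form, and an empty list means no tracked text file contains any of these characters.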