Adding a custom tool to the toolchain to remove the UTF-8 BOM before compiling

Posted on 2024-09-07 04:24:19

My question is in the context of Code::Blocks and its tweaked version of MinGW, and Notepad++.

I want to be able to include Unicode literals in my source, and I can, so long as I use UTF-8 and do not use a BOM.

This works fine, up to a point, but it BOMs out (bad pun) whenever I reopen the file: without the BOM, the editor (not surprisingly) has the unnerving side effect of displaying the Unicode in its ANSI form. :(

Those very useful and yet very annoying three bytes have to be there, and then they have to go! (at compile time).

It sounds easy enough, just preprocess the source file(s), and discard the first three bytes (if they are a UTF-8 BOM)...

I'm certainly not going to be the processor (by manual removal) each time I compile, so I've even resorted to using BOM-less #include files for these literals, but this is problematic from several perspectives, not the least of which is that it is a pain in the proverbial, and I can't "see" them without a lot of juggling.

Is there some way I can tap into the toolchain with a custom preprocessor?
...or if I have missed some obvious solution, I'd very much appreciate hearing about it.

Comments (2)

心欲静而疯不止 2024-09-14 04:24:19

You might want to consider externalising all your string literals to a separate file anyway and using a loadLit() function (or similar) to get them at runtime.

This will allow you to have a single file (with a BOM) containing all your string literals and will make your life a lot easier if you ever have to internationalise your application.

We do that with our stuff but keep in mind our class 1 programs have to be i18n'ed for 21 different locales so we save a lot of work by doing it this way :-) Your mileage may vary.

咿呀咿呀哟 2024-09-14 04:24:19

I've fossicked around a bit more, and I've worked out a tentative solution. I'm not completely happy with it because it involves modifying the source, whereas I was actually looking for a piped solution, but it seems that g++.exe only accepts command line args (please correct me if I'm wrong).

My "solution" is a bit rough-and-ready, but it works, and it is certainly better (for me) than any other viable solution I've come across (which is none!). It requires paying due attention to your editor's "File has been externally modified" message box (if the file is open in the editor), but in fact the BOM is still in the editor's copy, so it is somewhat of a moot point.

It is a simple command line hack. I'd prefer a more-integrated option, but here is this one (and it works):

In Code::Blocks, go to: Settings -> Compiler and Debugger -> Other settings -> [Advanced options] -> Command line macro:

Make these mods to the command line.
They should all be on a single line (of course), but for clarity I've separated them out:

cmd /c DropTheBOM.exe $file
& $compiler $options $includes -c $file -o $object // (use your compiler cmdline)
& MakeTheBOM.exe $file
// Write your own utils, or try here: http://code.google.com/p/utf-bom-utils/
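For anyone writing their own utilities, here is a minimal sketch of what the core of DropTheBOM.exe and MakeTheBOM.exe might look like. This is an assumption about their behaviour, not the code of the linked utilities, and the names `has_bom`, `strip_bom`, `add_bom` and `rewrite_file` are invented for illustration.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// The three bytes of the UTF-8 byte order mark.
const std::string kUtf8Bom = "\xEF\xBB\xBF";

// True if the byte sequence starts with the UTF-8 BOM.
bool has_bom(const std::string& bytes) {
    return bytes.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
}

// Drops a leading BOM if present; otherwise returns the input unchanged.
std::string strip_bom(const std::string& bytes) {
    return has_bom(bytes) ? bytes.substr(kUtf8Bom.size()) : bytes;
}

// Prepends a BOM unless one is already there.
std::string add_bom(const std::string& bytes) {
    return has_bom(bytes) ? bytes : kUtf8Bom + bytes;
}

// Rewrites the file at `path` in place using `transform`
// (strip_bom or add_bom). Returns false on I/O failure.
bool rewrite_file(const std::string& path,
                  std::string (*transform)(const std::string&)) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    const std::string bytes((std::istreambuf_iterator<char>(in)),
                            std::istreambuf_iterator<char>());
    in.close();
    const std::string result = transform(bytes);
    if (result == bytes) return true;  // nothing to change; leave the file alone
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out.write(result.data(), static_cast<std::streamsize>(result.size()));
    out.close();
    return !out.fail();
}
```

Each tool would then be a `main` that calls `rewrite_file(argv[1], strip_bom)` or `rewrite_file(argv[1], add_bom)` and returns a non-zero exit code on failure. Opening the files in binary mode matters on Windows, where text mode would translate line endings.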

PS: #include files are not stripped of their BOM (if they have one).
A simple BOM y/n argument switch for the routine which #includes these files would solve this issue quite simply... (but it is only a Windows problem... maybe that's why it hasn't been catered for... or has it? Does anyone know?)
