UTF-8 vs. code page 1252 in Visual Studio 2008 for HTML and JavaScript that contains European characters
I have been developing a parser that takes JavaScript as input and creates a compressed version of that JavaScript as output.
I found initially that the parser failed when attempting to read the input JavaScript. I believe this has something to do with the fact that Visual Studio 2008 saves its files by default as UTF-8. And when doing so, VS includes a couple of hidden characters at the start of the UTF-8 file.
As a workaround, I used Visual Studio to save the file as code page 1252. After doing so, my parser was able to read the input JavaScript.
Note that I need to use special European characters that include accents.
So, here are my questions:
- Should I use code page 1252 or UTF-8?
- Why does Visual Studio save files as UTF-8 by default?
- If I choose to save files as 1252 will that lead to problems?
- It appears to me that Eclipse saves files as code page 1252 by default. Does that sound right?
Comments (5)
UTF-8 is the better option, as it supports all known characters, while with 1252 you might find that characters you need are missing from it (even in European languages).
Apparently, VS2008 saves UTF-8 with a byte order mark. It should be possible to either switch that off, have the parser recognize it, or strip the BOM somewhere in between.
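As a rough illustration of the "strip the BOM somewhere in between" option, here is a minimal Python sketch. The question does not say what language the parser is written in, so Python and the helper names `strip_utf8_bom` and `decode_js` are assumptions made for this example only.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_js(data: bytes) -> str:
    """Decode UTF-8 source bytes (hypothetical helper).

    The "utf-8-sig" codec drops a leading BOM if one is present and
    behaves like plain UTF-8 otherwise.
    """
    return data.decode("utf-8-sig")


with_bom = codecs.BOM_UTF8 + "var café = 1;".encode("utf-8")
without_bom = "var café = 1;".encode("utf-8")

print(strip_utf8_bom(with_bom) == without_bom)
print(decode_js(with_bom) == decode_js(without_bom))
```

Either approach leaves files without a BOM untouched, so it should be safe to apply to every input before parsing.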
UTF-8 files can carry a byte order mark (BOM) signature at the beginning of the file, which some editors, and evidently some libraries, don't understand: http://en.wikipedia.org/wiki/Byte-order_mark
If you can get around it, UTF-8 is by all means preferred today. Try stripping the leading BOM bytes before giving the JS code to that parser, or look for an option in your IDE to save the file without the BOM.
1252 doesn't cause this issue and you won't have problems with it, but you'll be publishing your site in an outdated format. I wouldn't do that today; there was a lot of encoding mess on the web in the past, with ISO vs. Windows code pages for different languages.
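To make the "hidden characters" concrete, the sketch below shows the three bytes of the UTF-8 BOM and how they look to a tool that reads the file as code page 1252. It is written in Python purely for illustration; `has_utf8_bom` is a hypothetical helper name, not part of any tool mentioned in this thread.

```python
import codecs

# The UTF-8 BOM is the code point U+FEFF encoded as three bytes.
bom = codecs.BOM_UTF8

# Decoded correctly, the BOM is a single invisible character.
as_utf8 = bom.decode("utf-8")

# Decoded as code page 1252, the same bytes become three visible
# characters at the very start of the file.
as_cp1252 = bom.decode("cp1252")


def has_utf8_bom(data: bytes) -> bool:
    """Report whether the data starts with a UTF-8 BOM (hypothetical helper)."""
    return data.startswith(codecs.BOM_UTF8)


print(bom)
print(repr(as_utf8))
print(as_cp1252)
print(has_utf8_bom(bom + b"var a;"))
```

A parser that expects JavaScript to begin with an identifier or keyword would likely trip over those leading characters, which matches the failure described in the question.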
Use UTF-8. 1252 does not cover the whole of Europe, so in some countries (Central Europe) you would have to use 1250 or, more correctly, ISO 8859-2. So the only real option is UTF-8.
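A small Python sketch of the coverage problem, under the assumption that a Hungarian place name is a fair stand-in for "Central European text" (the sample word is an arbitrary choice for this example):

```python
# Sample text containing U+0151 (o with double acute), which exists in
# code page 1250 but not in code page 1252.
word = "Győr"

# Encoded with the Central European code page, then misread as 1252:
# the same byte maps to a different letter.
cp1250_bytes = word.encode("cp1250")
misread = cp1250_bytes.decode("cp1252")

# Encoding directly as 1252 is not possible at all.
try:
    word.encode("cp1252")
    fits_cp1252 = True
except UnicodeEncodeError:
    fits_cp1252 = False

# UTF-8 round-trips the text without needing a per-region code page.
utf8_roundtrip = word.encode("utf-8").decode("utf-8")

print(misread)
print(fits_cp1252)
print(utf8_roundtrip == word)
```

The misread result is the kind of silent corruption that made mixed code pages painful, since nothing fails; the text is simply wrong.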
Will using 1252 cause issues?
That depends on the countries your app needs to work in.
Off the top of my head, 1252 (or ISO 8859-1) will work in
Oh, Wikipedia has a more comprehensive list:
http://en.wikipedia.org/wiki/ISO/IEC_8859-1
So you can use CP 1252 if your app is only used in the mentioned countries/languages.
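One way to check whether a given piece of text actually fits into a code page is to try encoding it character by character. The Python sketch below does that; `unencodable` is a hypothetical helper written for this example. It also shows that CP 1252 and ISO 8859-1 are close but not identical: the euro sign, for instance, exists in 1252 but not in ISO 8859-1.

```python
def unencodable(text: str, encoding: str) -> list:
    """Return the characters of text that encoding cannot represent
    (hypothetical helper, in order of appearance, duplicates kept)."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


# Western European text with a euro sign.
print(unencodable("café €5", "cp1252"))
print(unencodable("café €5", "latin-1"))

# A Polish place name, outside the 1252 repertoire.
print(unencodable("Łódź", "cp1252"))

# UTF-8 can encode all of it.
print(unencodable("café €5 Łódź", "utf-8"))
```

Running a check like this over the strings an app really uses is a quick way to find out whether 1252 is sufficient for it.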
The BOM was at the start of the file.
IMHO you should use UTF-8; it's very much the current standard nowadays.