Delphi 2009 Unicode + ANSI problem
I'm porting an ISAPI (page producers) application from Delphi 7 to Delphi 2009. The pages are based on HTML files encoded in UTF-8.
Everything works except when OnHTMLTag fires and I replace a transparent tag with a value that contains special characters such as accented letters (áé...). Those characters are replaced in the output with a � character.
What's wrong?
As part of your debugging procedure, you should find out exactly what byte value(s) the browser receives for the question-mark character.
As you should know, Delphi 2009's string type is Unicode, whereas in all previous versions it was ANSI. Delphi 7 already had the Utf8String type, but Delphi 2009 made that type special. If you're not using that type for holding strings that are encoded as UTF-8, then you should start doing so. Values held in Utf8String variables are converted to UnicodeString values automatically when you assign one to the other.
If you're storing your UTF-8-encoded strings in ordinary AnsiString variables, then they will be converted to Unicode using the default system code page when you assign them to a UnicodeString. That's not what you want.
If you're assigning UTF-8-encoded literals to variables of type string, stop that. That type expects its values to be encoded as UTF-16, just like WideString always has.
If you are loading your files into a TStrings descendant with LoadFromFile, then you need to start using that method's second parameter, which tells it what encoding to use. UTF-8-encoded files should use TEncoding.UTF8. If you omit it, the encoding is inferred from the file's byte-order mark, and a file without one is read with the system ANSI code page (TEncoding.Default), which is wrong for UTF-8 data.
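To make the difference between the string types concrete, here is a minimal console sketch, assuming Delphi 2009 or later. The helper CodeUnits and all variable names are made up for this illustration and are not from the original application. It round-trips the same UTF-8 bytes once through a Utf8String and once through a plain AnsiString.

```delphi
program Utf8AssignDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: renders the UTF-16 code units of S as hex.
function CodeUnits(const S: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[I]), 4);
  end;
end;

var
  Original, ViaUtf8, ViaAnsi: string;
  U8: UTF8String;
  A: AnsiString;
begin
  // "café"; #$00E9 avoids depending on the encoding of this source file.
  Original := 'caf'#$00E9;

  // UTF-16 -> UTF-8: the é becomes the two bytes C3 A9.
  U8 := UTF8String(Original);
  // UTF-8 -> UTF-16: the bytes are decoded as UTF-8.
  ViaUtf8 := string(U8);

  // Copy the same bytes into an AnsiString without converting them,
  // which mimics UTF-8 data being kept in the wrong string type.
  SetString(A, PAnsiChar(U8), Length(U8));
  // The bytes are now decoded with the system ANSI code page.
  ViaAnsi := string(A);

  Writeln('original:       ', CodeUnits(Original));
  Writeln('via Utf8String: ', CodeUnits(ViaUtf8));
  Writeln('via AnsiString: ', CodeUnits(ViaAnsi));
end.
```

The first two lines should list the same code units. The third depends on the machine's ANSI code page; on a Windows-1252 system I would expect the single é to come back as two unrelated characters, which is the kind of corruption described above.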
This is probably a character encoding issue.
The Delphi IDE usually uses Windows-1252 or UTF-16 to encode source code.
HTML often uses UTF-8.
You probably need to convert between those encodings.
For that, you need to find out exactly which encodings are in use (as Rob mentions).
Or fall back to HTML-escaping the accented characters (as Ralph mentions).
Can you post a small app that shows the problem? (You can email me: just about anything that has jeroen in the username and pluimers.com in the domain name will arrive in my mailbox.)
--jeroen
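Both suggestions above (checking which bytes are actually produced, and escaping accented characters instead of sending them raw) can be sketched in a few lines, assuming Delphi 2009 or later. HexBytes and EscapeNonAscii are hypothetical helpers written for this illustration, not RTL functions, and EscapeNonAscii only handles characters in the Basic Multilingual Plane (it does not combine surrogate pairs).

```delphi
program EscapeDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: lists the bytes of an encoded string as hex,
// so you can compare them with what the browser receives.
function HexBytes(const S: RawByteString): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(S[I]), 2);
  end;
end;

// Hypothetical helper: replaces every non-ASCII UTF-16 code unit with a
// numeric character reference. Assumption: no characters outside the BMP.
function EscapeNonAscii(const S: string): string;
var
  I: Integer;
begin
  Result := '';
  for I := 1 to Length(S) do
    if Ord(S[I]) < 128 then
      Result := Result + S[I]
    else
      Result := Result + '&#' + IntToStr(Ord(S[I])) + ';';
end;

var
  S: string;
begin
  S := 'caf'#$00E9;                      // "café"
  Writeln(HexBytes(UTF8String(S)));      // UTF-8 bytes: 63 61 66 C3 A9
  Writeln(EscapeNonAscii(S));            // caf&#233;
end.
```

The escaped form is plain ASCII, so it survives any ANSI or UTF-8 mix-up on the way to the browser, at the cost of larger and less readable output.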
Thank you for your help. After some testing, the problem turned out to be very, very simple (or rather stupid).
There is no need to convert manually between UnicodeString, Utf8String, AnsiString and WideString. Delphi 2009's string handling is close to perfect.