Convert Hi-ANSI characters to their ASCII equivalents (é -> e)
Is there a routine available in Delphi 2007 to convert the characters in the high range of the ANSI table (>127) to their equivalent ones in pure ASCII (<=127) according to a locale (codepage)?
I know some characters do not translate well, but most do, especially in the 192-255 range:
- À → A
- à → a
- Ë → E
- ë → e
- Ç → C
- ç → c
- – (en dash) → - (hyphen - that can be trickier)
- — (em dash) → - (hyphen)
Comments (4)
WideCharToMultiByte does best-fit mapping for any characters that aren't supported by the specified character set, including stripping diacritics. You can do exactly what you want by using that and passing 20127 (US-ASCII) as the codepage.
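As a rough illustration of that call for Delphi 2007, the sketch below widens the ANSI string to a WideString and then asks WideCharToMultiByte to convert it to code page 20127. The function name `StripToAscii` and the constant `CP_USASCII` are made up for this example, and the ANSI-to-wide step assumes the input is in the system default ANSI code page.

```delphi
uses
  Windows;

// Hypothetical helper (not an RTL routine): best-fit conversion to US-ASCII.
function StripToAscii(const S: AnsiString): AnsiString;
const
  CP_USASCII = 20127; // US-ASCII code page identifier
var
  W: WideString;
  Len: Integer;
begin
  Result := '';
  // ANSI -> UTF-16; assumes S uses the system default ANSI code page.
  W := S;
  if W = '' then
    Exit;
  // First call: query the required size of the output buffer in bytes.
  Len := WideCharToMultiByte(CP_USASCII, 0, PWideChar(W), Length(W),
    nil, 0, nil, nil);
  if Len <= 0 then
    Exit;
  SetLength(Result, Len);
  // Second call: perform the conversion into the result buffer.
  WideCharToMultiByte(CP_USASCII, 0, PWideChar(W), Length(W),
    PAnsiChar(Result), Len, nil, nil);
end;
```

Characters that have no best-fit match in the target code page come back as the default replacement character, so the output is worth checking for stray question marks.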
Calling that with your examples produces the results you're looking for, including the em-dash-to-hyphen case, which I don't think is handled by Jeroen's suggestion to convert to Normalization Form D. If you did want to take that approach, Michael Kaplan has a blog post that explicitly discusses stripping diacritics (rather than normalization in general), but it uses C# and an API that was introduced in Vista. You can get something similar using the FoldString API (any WinNT release).
Of course, if you're only doing this for one character set and you want to avoid the overhead of converting to and from a WideString, Padu is correct that a simple for loop and a lookup table would be just as effective.
Just to extend Craig's answer for Delphi 2009:
If you use Delphi 2009 or newer, you can get the same result with more readable code:
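The code this answer refers to is not shown here. As a rough sketch of what such Delphi 2009+ code could look like, assuming a code-page-typed AnsiString is used for the conversion (the type name `USASCIIString` and the function name `StripAccents` are illustrative, not taken from the answer):

```delphi
type
  // AnsiString bound to code page 20127 (US-ASCII); the type name is illustrative.
  USASCIIString = type AnsiString(20127);

// Hypothetical helper: round-trips the text through the US-ASCII code page.
function StripAccents(const S: string): string;
begin
  // UnicodeString -> US-ASCII (the RTL performs the conversion),
  // then back to UnicodeString.
  Result := string(USASCIIString(S));
end;
```

The explicit casts leave the code page conversion to the RTL, so the outcome depends on how the RTL implements that conversion on each platform.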
Unfortunately, this code works only on MS Windows. On Mac, the accented characters are not replaced by best-fit characters but by question marks.
Apparently, Delphi internally uses WideCharToMultiByte on Windows, whereas on Mac it uses iconv (see LocaleCharsFromUnicode in System.pas).
The question is whether this difference in behaviour between operating systems should be considered a bug and reported to CodeCentral.
I believe your best bet is creating a lookup table.
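A minimal sketch of the lookup-table idea, assuming the input is Windows-1252 and covering only the characters listed in the question. The function name `MapToAscii` is hypothetical, and a real table would need an entry for every character you care about.

```delphi
// Hypothetical helper: maps a few Windows-1252 characters to ASCII stand-ins.
function MapToAscii(const S: AnsiString): AnsiString;
var
  Table: array[AnsiChar] of AnsiChar;
  C: AnsiChar;
  I: Integer;
begin
  // Start from the identity mapping so unlisted characters pass through unchanged.
  for C := Low(AnsiChar) to High(AnsiChar) do
    Table[C] := C;
  // Windows-1252 codes for the examples in the question.
  Table[AnsiChar($C0)] := 'A'; // A with grave
  Table[AnsiChar($E0)] := 'a'; // a with grave
  Table[AnsiChar($CB)] := 'E'; // E with diaeresis
  Table[AnsiChar($EB)] := 'e'; // e with diaeresis
  Table[AnsiChar($C7)] := 'C'; // C with cedilla
  Table[AnsiChar($E7)] := 'c'; // c with cedilla
  Table[AnsiChar($96)] := '-'; // en dash
  Table[AnsiChar($97)] := '-'; // em dash
  // Translate a copy of the input one character at a time.
  Result := S;
  for I := 1 to Length(Result) do
    Result[I] := Table[Result[I]];
end;
```

The table is rebuilt on every call to keep the sketch short; building it once, for example in a unit's initialization section, would avoid that cost.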
What you are looking for is normalization.
Michael Kaplan wrote a nice blog article about normalization.
It does not immediately solve your problem, but points you in the right direction.
--jeroen