Convert Hi-ANSI characters to their ASCII equivalents (é -> e)

Posted 2024-09-06 21:39:28

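Is there a routine in Delphi 2007 that converts characters in the high range of the ANSI table (>127) to their equivalents in plain ASCII (<=127), according to a locale (code page)?

I know that some characters won't translate well, but most do, especially in the 192-255 range: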

À → A
à → a
Ë → E
ë → e
Ç → C
ç → c
– (en dash) → - (hyphen; this one may be tricky)
— (em dash) → - (hyphen)

简单气质女生网名 2024-09-13 21:39:28

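WideCharToMultiByte does best-fit mapping for any characters that aren't supported by the specified character set, including stripping diacritics. You can do exactly what you want by calling it with 20127 (US-ASCII) as the code page.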

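function BestFit(const AInput: AnsiString): AnsiString;
const
  CodePage = 20127; // 20127 = US-ASCII
var
  WS: WideString;
begin
  // Widen the input first, because the API converts from UTF-16.
  WS := WideString(AInput);
  // First call: a nil buffer of size 0 returns the required length in bytes.
  SetLength(Result, WideCharToMultiByte(CodePage, 0, PWideChar(WS),
    Length(WS), nil, 0, nil, nil));
  // Second call: do the conversion; flags = 0 leaves best-fit mapping on.
  WideCharToMultiByte(CodePage, 0, PWideChar(WS), Length(WS),
    PAnsiChar(Result), Length(Result), nil, nil);
end;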

procedure TForm1.Button1Click(Sender: TObject);
begin
   ShowMessage(BestFit('aÀàËëÇç–—€¢Š'));
end;
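
Since the question notes that some characters won't translate well, it may help to know when a substitution happened. The following is a minimal sketch, not part of the original answer: BestFitChecked and its ALossy parameter are hypothetical names, and it assumes the last parameter of WideCharToMultiByte (lpUsedDefaultChar) may be passed for code page 20127, which the API only forbids for UTF-7 and UTF-8.

```pascal
program BestFitCheckDemo;
{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical variant of BestFit: also reports whether the API fell back
// to the default character for at least one input character.
function BestFitChecked(const AInput: WideString;
  out ALossy: Boolean): AnsiString;
const
  CodePage = 20127; // US-ASCII
var
  UsedDefault: BOOL;
  Len: Integer;
begin
  Result := '';
  ALossy := False;
  if AInput = '' then
    Exit;
  // First call: ask for the required buffer size in bytes.
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(AInput),
    Length(AInput), nil, 0, nil, nil);
  SetLength(Result, Len);
  UsedDefault := False;
  // Second call: convert, and let the API set the used-default flag.
  WideCharToMultiByte(CodePage, 0, PWideChar(AInput), Length(AInput),
    PAnsiChar(Result), Len, nil, @UsedDefault);
  ALossy := UsedDefault;
end;

var
  Lossy: Boolean;
  Converted: AnsiString;
begin
  // U+00E9 is e with acute accent, U+20AC is the euro sign.
  Converted := BestFitChecked(
    WideString('caf') + WideChar($00E9) + WideChar($20AC), Lossy);
  Writeln(Converted);
  Writeln('Lossy: ', Lossy);
end.
```

The input is built from code points rather than a literal so that the result does not depend on the encoding of the source file.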

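Calling BestFit with your examples produces the results you're looking for, including the em dash to minus case, which I don't think is handled by Jeroen's suggestion to convert to Normalization Form D. If you did want to take that approach, Michael Kaplan has a blog post that explicitly discusses stripping diacritics (rather than normalization in general), but it uses C# and an API that was introduced in Vista. You can get something similar using the FoldString API (available on any WinNT release).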
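
As a rough illustration of the FoldString route, the sketch below decomposes accented characters with FoldStringW and the MAP_COMPOSITE flag, then drops anything in the Combining Diacritical Marks block (U+0300 to U+036F). StripDiacriticsViaFold is a hypothetical name, and limiting the filter to that one block is an assumption. Because this only removes combining marks, it would not be expected to turn an en dash or em dash into a hyphen.

```pascal
program FoldStringDemo;
{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical helper: decompose, then remove combining diacritical marks.
function StripDiacriticsViaFold(const S: WideString): WideString;
var
  Decomposed: WideString;
  Len, I, J: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // First call with a nil buffer returns the required length in WideChars.
  Len := FoldStringW(MAP_COMPOSITE, PWideChar(S), Length(S), nil, 0);
  if Len = 0 then
    Exit; // the call failed; GetLastError holds the reason
  SetLength(Decomposed, Len);
  Len := FoldStringW(MAP_COMPOSITE, PWideChar(S), Length(S),
    PWideChar(Decomposed), Len);
  // Copy every character that is not a combining mark (assumed range).
  SetLength(Result, Len);
  J := 0;
  for I := 1 to Len do
    if (Ord(Decomposed[I]) < $0300) or (Ord(Decomposed[I]) > $036F) then
    begin
      Inc(J);
      Result[J] := Decomposed[I];
    end;
  SetLength(Result, J);
end;

var
  Sample: WideString;
begin
  // "caf" followed by U+00E9 (e with acute accent).
  Sample := WideString('caf') + WideChar($00E9);
  Writeln(StripDiacriticsViaFold(Sample));
end.
```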

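Of course, if you're only doing this for one character set, and you want to avoid the overhead of converting to and from a WideString, Padu is correct that a simple for loop and a lookup table would be just as effective.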
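
A minimal sketch of that loop-and-table idea, assuming the input is encoded in Windows-1252. StripAccents1252 is a hypothetical name, a case statement stands in for the table, and the mapping is deliberately incomplete: characters such as Æ, Ø, ß or € have no single-character ASCII equivalent and are left untouched.

```pascal
program LookupTableDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: replaces selected Windows-1252 bytes with ASCII.
// Byte values are compared via Ord so the code does not depend on how the
// compiler types character literals above #127.
function StripAccents1252(const S: AnsiString): AnsiString;
var
  I: Integer;
begin
  Result := S;
  for I := 1 to Length(Result) do
    case Ord(Result[I]) of
      $C0..$C5: Result[I] := 'A';           // A with grave .. A with ring
      $C7:      Result[I] := 'C';           // C with cedilla
      $C8..$CB: Result[I] := 'E';
      $CC..$CF: Result[I] := 'I';
      $D1:      Result[I] := 'N';
      $D2..$D6: Result[I] := 'O';
      $D9..$DC: Result[I] := 'U';
      $DD:      Result[I] := 'Y';
      $E0..$E5: Result[I] := 'a';
      $E7:      Result[I] := 'c';
      $E8..$EB: Result[I] := 'e';
      $EC..$EF: Result[I] := 'i';
      $F1:      Result[I] := 'n';
      $F2..$F6: Result[I] := 'o';
      $F9..$FC: Result[I] := 'u';
      $FD, $FF: Result[I] := 'y';
      $8A:      Result[I] := 'S';           // S with caron
      $9A:      Result[I] := 's';
      $96, $97: Result[I] := '-';           // en dash, em dash
    end;
end;

var
  Sample: AnsiString;
begin
  // "caf" followed by byte $E9, which is e with acute in Windows-1252.
  Sample := AnsiString('caf') + AnsiChar($E9);
  Writeln(StripAccents1252(Sample)); // prints cafe
end.
```

The trade-off is that each supported code page needs its own mapping, whereas the WideCharToMultiByte approach leaves that to the operating system.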

月亮邮递员 2024-09-13 21:39:28

Just to extend Craig's answer for Delphi 2009:

If you use Delphi 2009 or newer, you can get the same result with more readable code:

function OStripAccents(const aStr: String): String;
type
  USASCIIString = type AnsiString(20127);//20127 = us ascii
begin
  Result := String(USASCIIString(aStr));
end;

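Unfortunately, this code works only on MS Windows. On Mac, accented characters are replaced by question marks rather than by their best-fit equivalents.

Apparently, Delphi uses WideCharToMultiByte internally on Windows, whereas on Mac it uses iconv (see LocaleCharsFromUnicode in System.pas).
The question is whether this difference in behaviour between operating systems should be considered a bug and reported to CodeCentral.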
