Using Delphi 6/7, how can I convert an AnsiString in a different charset to a hex string in UTF-8?

Posted on 2025-01-20 04:38:06


I need to draw a barcode (QR) with Delphi 6/7. The program can run in various Windows locales, and the data comes from an input box.

In this input box, the user can choose a charset and enter text in their own language. This works fine. The input data is only ever from a single codepage. Example configurations could be:

Windows is on Western Europe, Codepage 1252 for ANSI text
Input is done in Shift-JIS ANSI charset

I need to get the Shift-JIS across to the barcode. The most robust way is to use hex encoding.

So my question is: how do I go from Shift-JIS to a hex String in UTF-8 encoding, if the codepage is not the same as the Windows locale?

As an example: I have the string 能ラ. This needs to be converted to E883BDE383A9 as per UTF-8. I have tried the following, but the result is different and meaningless:

String2Hex(UTF8Encode(ftext))

Unfortunately I can't just have an inputbox for WideStrings. But if I can find a way to convert the ANSI text to a WideString, the barcode module can work with Unicode Strings as well.

If it's relevant: I am using the TEC-IT TBarcode DLL.


久而酒知 2025-01-27 04:38:06


Creating and accessing a Unicode text control

This is easier than you may think, and I did so in the past on the then brand-new Windows 2000, when convenient components like Tnt Delphi Unicode Controls were not available. Background knowledge of how to create a Windows GUI program without Delphi's VCL, creating everything manually, helps; otherwise this also serves as an introduction to it.

First add a field to your form, so we can access the new control easily later:

type
  TForm1= class(TForm)
...
  private
    hEdit: THandle;  // Our new Unicode control
  end;

Now just create it at your favorite event - I chose FormCreate:

  // Creating a child control, type "edit"
  self.hEdit:= CreateWindowW( PWideChar(WideString('edit')), PWideChar(WideString('myinput')), WS_CHILD or WS_VISIBLE, 10, 10, 200, 25, Handle, 0, HINSTANCE, nil );
  if self.hEdit= 0 then begin  // Failed. Get error code so we know why it failed.
    //GetLastError();
    exit;
  end;

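  // Add a sunken 3D edge (well, historically speaking).
  // SetWindowLong returns the PREVIOUS value, which can legitimately be 0 for a
  // control without extended styles, so failure must be detected via GetLastError.
  SetLastError( 0 );
  if (SetWindowLong( self.hEdit, GWL_EXSTYLE, WS_EX_CLIENTEDGE )= 0) and (GetLastError()<> 0) then begin
    exit;
  end;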

  // Applying new extended style: the control's frame has changed
  if not SetWindowPos( self.hEdit, 0, 0, 0, 0, 0, SWP_FRAMECHANGED or SWP_NOMOVE or SWP_NOZORDER or SWP_NOSIZE ) then begin
    //GetLastError();
    exit;
  end;

  // The system's default font is no help, let's use this form's font (hopefully Tahoma)
  SendMessage( self.hEdit, WM_SETFONT, self.Font.Handle, 1 );
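
As an alternative to the separate create, SetWindowLong and SetWindowPos calls, the extended style can be passed at creation time with CreateWindowExW. The following is only a sketch: the helper name CreateUnicodeEdit, its parameters and the added ES_AUTOHSCROLL style are my assumptions, not part of the original steps.

```pascal
uses Windows, Messages;

// Hypothetical helper (name and parameters are illustrative only):
// creates a Unicode EDIT child window with a sunken border in a single call.
function CreateUnicodeEdit(hParent: HWND; hFont: HFONT): HWND;
begin
  Result := CreateWindowExW(
    WS_EX_CLIENTEDGE,                            // extended style set right away
    'EDIT',                                      // window class
    '',                                          // initial text
    WS_CHILD or WS_VISIBLE or ES_AUTOHSCROLL,    // ES_AUTOHSCROLL is an assumption
    10, 10, 200, 25,                             // position and size as in the steps above
    hParent, 0, HInstance, nil);
  // On success, use the given font instead of the system default.
  if Result <> 0 then
    SendMessageW(Result, WM_SETFONT, WPARAM(hFont), 1);
end;
```

In FormCreate this could be called as `hEdit:= CreateUnicodeEdit( Handle, Font.Handle );`, with a result of 0 meaning that creation failed.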

At some point you want to get the edit's content. Again: how is this done without Delphi's VCL but instead directly with the WinAPI? This time I used a button's Click event:

var
  sText: WideString;
  iLen, iCopied: Integer;
begin
  // How many CHARACTERS to copy?
  SetLastError( 0 );  // Otherwise a stale error code could be mistaken for a failure
  iLen:= GetWindowTextLengthW( self.hEdit );
  if (iLen= 0) and (GetLastError()<> 0) then begin  // 0 could mean empty, could mean an error
    exit;
  end;

  SetLength( sText, iLen+ 1 );  // Reserve space, including the trailing #0
  iCopied:= GetWindowTextW( self.hEdit, PWideChar(sText), iLen+ 1 );  // Copy text
  SetLength( sText, iCopied );  // Drop the trailing #0 so Length() matches the text

  // Demonstrate that non-ANSI text was copied out of a non-ANSI control
  MessageBoxW( Handle, PWideChar(sText), nil, 0 );
end;

There are detail issues, like not being able to reach this new control via Tab, but we're basically re-inventing Delphi's VCL already, so those are details to take care of at another time.

Converting codepages

The WinAPI deals either in codepages (Strings) or in UTF-16 LE (WideStrings). For historical reasons (first UCS-2, later UTF-16) UTF-16 LE can represent everything, so it is always the implied target when converting from a codepage:

// Converting an ANSI charset (String) to UTF-16 LE (Widestring)
function StringToWideString( s: AnsiString; iSrcCodePage: DWord ): WideString;
var
  iLenDest, iLenSrc: Integer;
begin
  iLenSrc:= Length( s );
  iLenDest:= MultiByteToWideChar( iSrcCodePage, 0, PChar(s), iLenSrc, nil, 0 );  // How many CHARACTERS are needed?
  SetLength( result, iLenDest );
  if iLenDest> 0 then begin  // Otherwise we get the error ERROR_INVALID_PARAMETER
    if MultiByteToWideChar( iSrcCodePage, 0, PChar(s), iLenSrc, PWideChar(result), iLenDest )= 0 then begin
      //GetLastError();
      result:= '';
    end;
  end;
end;

The source codepage is up to you, for example:

1252 for "Windows-1252" = ANSI Latin 1 Multilingual (Western Europe)
932 for "Shift-JIS X-0208" = IBM-PC Japan MIX (DOS/V) (DBCS) (897 + 301)
28595 for "ISO 8859-5" = Cyrillic
65001 for "UTF-8"

However, if you want to convert from one codepage to another and neither source nor target is UTF-16 LE, you must go through UTF-16 LE as an intermediate step:

Convert from ANSI to WIDE
Convert from WIDE to a different ANSI

// Converting UTF-16 LE (Widestring) to an ANSI charset (String, hopefully you want 65001=UTF-8)
function WideStringToString( s: WideString; iDestCodePage: DWord= CP_UTF8 ): AnsiString;
var
  iLenDest, iLenSrc: Integer;
begin
  iLenSrc:= Length( s );
  iLenDest:= WideCharToMultiByte( iDestCodePage, 0, PWideChar(s), iLenSrc, nil, 0, nil, nil );
  SetLength( result, iLenDest );
  if iLenDest> 0 then begin  // Otherwise we get the error ERROR_INVALID_PARAMETER
    if WideCharToMultiByte( iDestCodePage, 0, PWideChar(s), iLenSrc, PChar(result), iLenDest, nil, nil )= 0 then begin
      //GetLastError();
      result:= '';
    end;
  end;
end;

Not every codepage is supported on every Windows installation, and different installations support different codepages, so conversion attempts may fail. It would be more robust to aim for a Unicode program right away, as that is what every Windows installation definitely supports (unless you still deal with Windows 95, Windows 98 or Windows ME).
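
Since a conversion can fail, it may help to check the codepage first and to decode strictly. This is a minimal sketch under my own assumptions: the helper name TryAnsiToWide is hypothetical, and how the MB_ERR_INVALID_CHARS flag behaves with codepage 65001 on older Windows versions is something I have not verified.

```pascal
uses Windows;

// Hypothetical helper (not from the post): strict decode that reports failure
// instead of silently returning an empty or partly substituted string.
function TryAnsiToWide(const s: AnsiString; CodePage: UINT; var W: WideString): Boolean;
var
  n: Integer;
begin
  W := '';
  Result := False;
  if not IsValidCodePage(CodePage) then Exit;  // codepage not usable on this system
  if s = '' then begin
    Result := True;                            // nothing to convert
    Exit;
  end;
  // MB_ERR_INVALID_CHARS makes the call fail on byte sequences that are
  // invalid in CodePage instead of substituting a default character.
  n := MultiByteToWideChar(CodePage, MB_ERR_INVALID_CHARS, PAnsiChar(s), Length(s), nil, 0);
  if n = 0 then Exit;
  SetLength(W, n);
  Result := MultiByteToWideChar(CodePage, MB_ERR_INVALID_CHARS, PAnsiChar(s), Length(s),
    PWideChar(W), n) = n;
  if not Result then W := '';
end;
```

A False result then tells the caller to show a message or fall back to another codepage, rather than passing an empty string on to the barcode.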

Combining everything

Now you have everything you need to put it together:

you can have a Unicode text control to directly get it in UTF-16 LE
you can use an ANSI text control to then convert the input to UTF-16 LE
you can convert from UTF-16 LE (WIDE) to UTF-8 (ANSI)
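
Applied to the question, the chain could look like the sketch below. In Delphi 6/7 UTF8Encode expects a WideString, so passing an AnsiString makes the compiler convert it with the system codepage first (1252 in the example configuration), which garbles Shift-JIS bytes. Naming the source codepage explicitly avoids that. The helper names BytesToHex and AnsiToUtf8Hex are hypothetical, and the sketch assumes the input string really holds Shift-JIS bytes.

```pascal
uses Windows;

// Hypothetical helper: hex-encode raw bytes, two uppercase digits per byte.
function BytesToHex(const s: AnsiString): AnsiString;
const
  HexDigits: array[0..15] of AnsiChar = '0123456789ABCDEF';
var
  i: Integer;
begin
  SetLength(Result, Length(s) * 2);
  for i := 1 to Length(s) do begin
    Result[2 * i - 1] := HexDigits[Ord(s[i]) shr 4];
    Result[2 * i]     := HexDigits[Ord(s[i]) and $0F];
  end;
end;

// Hypothetical helper: codepage bytes -> UTF-16 LE -> UTF-8 -> hex.
// Returns '' if the input is empty or any conversion step fails.
function AnsiToUtf8Hex(const s: AnsiString; SrcCodePage: UINT): AnsiString;
var
  Wide: WideString;
  Utf8: AnsiString;
  n: Integer;
begin
  Result := '';
  if s = '' then Exit;
  // Step 1: source codepage to UTF-16 LE
  n := MultiByteToWideChar(SrcCodePage, 0, PAnsiChar(s), Length(s), nil, 0);
  if n = 0 then Exit;
  SetLength(Wide, n);
  if MultiByteToWideChar(SrcCodePage, 0, PAnsiChar(s), Length(s), PWideChar(Wide), n) = 0 then Exit;
  // Step 2: UTF-16 LE to UTF-8
  n := WideCharToMultiByte(CP_UTF8, 0, PWideChar(Wide), Length(Wide), nil, 0, nil, nil);
  if n = 0 then Exit;
  SetLength(Utf8, n);
  if WideCharToMultiByte(CP_UTF8, 0, PWideChar(Wide), Length(Wide), PAnsiChar(Utf8), n, nil, nil) = 0 then Exit;
  // Step 3: UTF-8 bytes to hex text
  Result := BytesToHex(Utf8);
end;
```

With codepage 932 available, `AnsiToUtf8Hex( #$94#$5C#$83#$89, 932 )` (the Shift-JIS bytes of 能ラ) should give E883BDE383A9, the value expected in the question.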

Size

UTF-8 is usually the best choice, but size-wise UTF-16 may need fewer bytes in total when your target audience is Asian: in UTF-8 both 能 and ラ need 3 bytes each, but in UTF-16 each needs only 2 bytes. Size is presumably an important factor for your QR barcode.
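
The byte counts can be checked with a few lines. This sketch uses only the RTL's UTF8Encode; the function names are hypothetical.

```pascal
// Builds the sample text 能ラ from its UTF-16 code units (U+80FD, U+30E9).
function SampleText: WideString;
begin
  SetLength(Result, 2);
  Result[1] := WideChar($80FD);
  Result[2] := WideChar($30E9);
end;

// Bytes needed when the text is stored as UTF-16.
function Utf16ByteCount(const W: WideString): Integer;
begin
  Result := Length(W) * SizeOf(WideChar);
end;

// Bytes needed when the text is stored as UTF-8.
function Utf8ByteCount(const W: WideString): Integer;
begin
  Result := Length(UTF8Encode(W));
end;
```

For the sample text this gives 4 bytes in UTF-16 and 6 bytes in UTF-8. For mostly Latin text the relation reverses, since ASCII characters take 1 byte in UTF-8 and 2 bytes in UTF-16.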

Likewise, don't waste space by turning binary data (8 bits per byte) into hex text (each character carries only 4 bits but itself occupies 1 byte = 8 bits). Have a look at Base64, which encodes 6 bits into every byte. You have already encountered this concept countless times, because it's used for email attachments.
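
A minimal Base64 encoder can be written by hand, as sketched here with the standard alphabet and "=" padding. The function name Base64Encode is hypothetical, and whether Base64 is usable depends on whatever decodes the barcode content understanding it, which is an assumption on my side.

```pascal
// Hypothetical helper: standard Base64 (alphabet A-Z, a-z, 0-9, +, /) with '=' padding.
function Base64Encode(const s: AnsiString): AnsiString;
const
  Alphabet: array[0..63] of AnsiChar =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
var
  i, n, OutPos: Integer;
  b0, b1, b2: Byte;
begin
  n := Length(s);
  SetLength(Result, ((n + 2) div 3) * 4);  // every 3 input bytes become 4 characters
  OutPos := 1;
  i := 1;
  while i <= n do begin
    b0 := Ord(s[i]);
    if i + 1 <= n then b1 := Ord(s[i + 1]) else b1 := 0;
    if i + 2 <= n then b2 := Ord(s[i + 2]) else b2 := 0;
    Result[OutPos]     := Alphabet[b0 shr 2];
    Result[OutPos + 1] := Alphabet[((b0 and $03) shl 4) or (b1 shr 4)];
    if i + 1 <= n then
      Result[OutPos + 2] := Alphabet[((b1 and $0F) shl 2) or (b2 shr 6)]
    else
      Result[OutPos + 2] := '=';             // pad when the second byte is missing
    if i + 2 <= n then
      Result[OutPos + 3] := Alphabet[b2 and $3F]
    else
      Result[OutPos + 3] := '=';             // pad when the third byte is missing
    Inc(i, 3);
    Inc(OutPos, 4);
  end;
end;
```

The 6 UTF-8 bytes of 能ラ become 8 Base64 characters instead of 12 hex digits.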
