Using Delphi 6/7, how can I convert an AnsiString in a different charset to a hex string in UTF-8?
I need to draw a barcode (QR) with Delphi 6/7. The program can run in various Windows locales, and the data comes from an input box.
On this input box, the user can choose a charset, and input his own language. This works fine. The input data is only ever from the same codepage. Example configurations could be:
- Windows is on Western Europe, Codepage 1252 for ANSI text
- Input is done in Shift-JIS ANSI charset
I need to get the Shift-JIS across to the barcode. The most robust way is to use hex encoding.
So my question is: how do I go from Shift-JIS to a hex String in UTF-8 encoding, if the codepage is not the same as the Windows locale?
As an example: I have the string 能ラ. This needs to be converted to E883BDE383A9 as per UTF-8. I have tried this, but the result is different and meaningless:
String2Hex(UTF8Encode(ftext))
Unfortunately I can't just have an inputbox for WideStrings. But if I can find a way to convert the ANSI text to a WideString, the barcode module can work with Unicode Strings as well.
If it's relevant: I am using the TEC-IT TBarcode DLL.
Comments (1)
Creating and accessing a Unicode text control
This is easier than you may think, and I did so in the past with the then brand new Windows 2000, when convenient components like Tnt Delphi Unicode Controls were not available. Background knowledge on how to create a Windows GUI program without Delphi's VCL, creating everything manually, helps - otherwise this also serves as an introduction to it.
First add a property to your form, so we can later access the new control easily.
Now just create it in an event of your choice - I chose FormCreate.
At some point you want to get the edit's content. Again: how is this done without Delphi's VCL, directly with the WinAPI instead? This time I used a button's Click event.
There are detail issues, like not being able to reach this new control via Tab, but we're basically re-inventing Delphi's VCL already, so those are details to take care of another time.
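The three steps above could look roughly like the following. This is only a sketch under assumptions: the names TForm1, Button1 and FEditWnd are made up for illustration, the form's .dfm is assumed to contain a button named Button1 with both event handlers wired up, and a plain private handle field stands in for the property.

```pascal
unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Classes, Controls, Forms, StdCtrls;

type
  TForm1 = class(TForm)
    Button1: TButton;                 // assumed to exist in the .dfm
    procedure FormCreate(Sender: TObject);
    procedure Button1Click(Sender: TObject);
  private
    FEditWnd: HWND;                   // handle of the hand-made Unicode edit (hypothetical name)
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.FormCreate(Sender: TObject);
begin
  // The ...W function creates a Unicode window of the system class "EDIT".
  FEditWnd := CreateWindowExW(WS_EX_CLIENTEDGE, 'EDIT', nil,
    WS_CHILD or WS_VISIBLE or WS_TABSTOP or ES_AUTOHSCROLL,
    8, 8, 300, 21, Handle, 0, HInstance, nil);
  if FEditWnd = 0 then
    RaiseLastOSError;
  // Use the form's font instead of the bold system font.
  SendMessage(FEditWnd, WM_SETFONT, WPARAM(Font.Handle), 1);
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  Len: Integer;
  Content: WideString;
begin
  // Ask for the length in UTF-16 code units, then fetch the text itself.
  Len := GetWindowTextLengthW(FEditWnd);
  SetLength(Content, Len);
  if Len > 0 then
    SetLength(Content, GetWindowTextW(FEditWnd, PWideChar(Content), Len + 1));
  // Show the result with a Unicode message box to prove nothing was lost.
  MessageBoxW(Handle, PWideChar(Content), 'Edit content', MB_OK);
end;

end.
```

Content then holds the text as UTF-16, independent of the Windows locale. Note that the VCL may recreate the form's window (for example when certain properties change), which would destroy this child window; a real program would have to handle that as well.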
Converting codepages
The WinAPI deals either in codepages (Strings) or in UTF-16 LE (WideStrings). For historical reasons (first UCS-2, later UTF-16) UTF-16 LE can represent everything, so it is always the implied target when converting from a codepage.
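A minimal sketch of that direction, using the WinAPI function MultiByteToWideChar. The unit name and the function name CodePageToWide are my own choices for illustration:

```pascal
unit CodePageToWideSketch;

interface

uses
  Windows, SysUtils;

// Interprets the bytes in S as text in the given codepage and
// returns the same text as UTF-16 LE.
function CodePageToWide(const S: AnsiString; CodePage: UINT): WideString;

implementation

function CodePageToWide(const S: AnsiString; CodePage: UINT): WideString;
var
  Len: Integer;
begin
  Result := '';
  if S = '' then
    Exit;
  // First call without an output buffer: returns the required length in WideChars.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len = 0 then
    RaiseLastOSError;               // e.g. the codepage is not installed
  SetLength(Result, Len);
  // Second call: the actual conversion into the WideString's buffer.
  if MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S),
    PWideChar(Result), Len) = 0 then
    RaiseLastOSError;
end;

end.
```

The important point is that the codepage is passed explicitly, so the result does not depend on the ANSI codepage of the Windows locale.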
The source codepage is up to you, for example:
- 1252 for "Windows-1252" = ANSI Latin 1 Multilingual (Western Europe)
- 932 for "Shift-JIS X-0208" = IBM-PC Japan MIX (DOS/V) (DBCS) (897 + 301)
- 28595 for "ISO 8859-5" = Cyrillic
- 65001 for "UTF-8"
However, if you want to convert from one codepage to another, and neither source nor target is UTF-16 LE, then you must go forth and back.
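"Forth" is the MultiByteToWideChar step into a WideString; "back" is WideCharToMultiByte from that WideString into the target codepage. A sketch of the second half, where the unit name and the function name WideToCodePage are again hypothetical:

```pascal
unit WideToCodePageSketch;

interface

uses
  Windows, SysUtils;

// Encodes UTF-16 LE text in the given target codepage (65001 gives UTF-8).
function WideToCodePage(const W: WideString; CodePage: UINT): AnsiString;

implementation

function WideToCodePage(const W: WideString; CodePage: UINT): AnsiString;
var
  Len: Integer;
begin
  Result := '';
  if W = '' then
    Exit;
  // First call without an output buffer: returns the required length in bytes.
  Len := WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
    nil, 0, nil, nil);
  if Len = 0 then
    RaiseLastOSError;               // e.g. the codepage is not installed
  SetLength(Result, Len);
  // Second call: the actual conversion. With flags = 0, characters the target
  // codepage cannot represent are silently replaced by a default character.
  if WideCharToMultiByte(CodePage, 0, PWideChar(W), Length(W),
    PAnsiChar(Result), Len, nil, nil) = 0 then
    RaiseLastOSError;
end;

end.
```

The last two parameters are left as nil on purpose, because they must be nil when the target is UTF-8 (65001).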
Not every Windows installation supports every codepage, and different installations may support different sets of codepages, so conversion attempts may fail. It would be more robust to aim for a Unicode program right away, as that is what every Windows installation definitely supports (unless you still deal with Windows 95, Windows 98 or Windows ME).
Combining everything
Now you have everything you need to put it together.
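Put together for the example from the question, a sketch could look like this. The function name CodePageToUtf8Hex is made up, and I use the RTL's UTF8Encode for the UTF-16 to UTF-8 step instead of a second API call. The attempt in the question presumably failed because passing an AnsiString to UTF8Encode makes Delphi convert it to a WideString implicitly with the system's ANSI codepage (1252 in the example) rather than with 932.

```pascal
program SjisToUtf8Hex;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Takes text in the given codepage and returns its UTF-8 bytes as hex digits.
function CodePageToUtf8Hex(const S: AnsiString; CodePage: UINT): AnsiString;
var
  Len, I: Integer;
  W: WideString;
  U: UTF8String;
begin
  Result := '';
  if S = '' then
    Exit;
  // Step 1: codepage -> UTF-16 LE, with the codepage given explicitly.
  Len := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if Len = 0 then
    RaiseLastOSError;
  SetLength(W, Len);
  if MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S),
    PWideChar(W), Len) = 0 then
    RaiseLastOSError;
  // Step 2: UTF-16 LE -> UTF-8 (UTF8Encode takes a WideString in Delphi 6/7).
  U := UTF8Encode(W);
  // Step 3: every UTF-8 byte becomes two hex digits.
  for I := 1 to Length(U) do
    Result := Result + IntToHex(Ord(U[I]), 2);
end;

begin
  // $94 $5C $83 $89 are the Shift-JIS bytes of the question's sample text.
  Writeln(CodePageToUtf8Hex(#$94#$5C#$83#$89, 932));
end.
```

Provided codepage 932 is installed, this should print the E883BDE383A9 from the question. In the real program the input would be the AnsiString from the input box together with the codepage that belongs to the charset the user selected.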
Size
UTF-8 is mostly the best choice, but size-wise UTF-16 may need fewer bytes in total when your target audience is Asian: in UTF-8 both 能 and ラ need 3 bytes each, but in UTF-16 both only need 2 bytes each. For your QR barcode, size is an important factor, I guess.
Likewise, don't waste space by turning binary data (8 bits per byte) into ASCII hex text (displaying 4 bits per character, but itself needing 1 byte = 8 bits again). Have a look at Base64, which encodes 6 bits into every byte. It is a concept you have already encountered countless times in your life, because it's used for email attachments.
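To illustrate the difference, here is a small hand-written Base64 encoder (standard alphabet with "=" padding). It is only a sketch, the name Base64Encode is my own, and whether your barcode reader side can decode Base64 is something you would have to check.

```pascal
program Base64Sketch;

{$APPTYPE CONSOLE}

function Base64Encode(const Data: AnsiString): AnsiString;
const
  Alphabet: array[0..63] of AnsiChar =
    'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
var
  I, N: Integer;
  Group: LongWord;
begin
  Result := '';
  N := Length(Data);
  I := 1;
  while I <= N do
  begin
    // Pack up to three input bytes into one 24-bit group.
    Group := LongWord(Ord(Data[I])) shl 16;
    if I + 1 <= N then
      Group := Group or (LongWord(Ord(Data[I + 1])) shl 8);
    if I + 2 <= N then
      Group := Group or LongWord(Ord(Data[I + 2]));
    // Emit four characters of 6 bits each; pad with '=' where input is missing.
    Result := Result + Alphabet[(Group shr 18) and 63]
      + Alphabet[(Group shr 12) and 63];
    if I + 1 <= N then
      Result := Result + Alphabet[(Group shr 6) and 63]
    else
      Result := Result + '=';
    if I + 2 <= N then
      Result := Result + Alphabet[Group and 63]
    else
      Result := Result + '=';
    Inc(I, 3);
  end;
end;

var
  Utf8Bytes: AnsiString;
begin
  // The six UTF-8 bytes from the question's example.
  Utf8Bytes := #$E8#$83#$BD#$E3#$83#$A9;
  Writeln(Base64Encode(Utf8Bytes));
end.
```

For these 6 bytes the Base64 text is 8 characters long, while the hex text needs 12.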