Unicode RTF text in RichEdit

Posted on 2024-08-12 07:09:49

I'm having trouble getting a RichEdit control to display Unicode RTF text. My application is Unicode, so all strings are wchar_t strings.
If I create the control as "RichEdit20A", I can use e.g. SetWindowText, and the text is displayed with the proper formatting. If I create the control as "RichEdit20W", SetWindowText shows the text verbatim, i.e. all the RTF code is displayed. The same happens if I use the EM_SETTEXTEX message and specify code page 1200, which MSDN tells me is used to indicate Unicode.
I've tried using the StreamIn function, but this only seems to work if I stream in ASCII text. If I stream in widechars, I get empty text in the control. I use the SF_RTF|SF_UNICODE flags, and MSDN hints that this combination may not be allowed.

So what to do? Is there any way to get widechars into a RichEdit without losing RTF interpretation, or do I need to encode the text? I've thought about trying UTF-8, or perhaps using the encoding facilities in RTF, but I'm unsure which is the best choice.

Comments (3)

眉黛浅 2024-08-19 07:09:50

I had to do this recently, and ran into the same things you're observing.

It seems that, despite what MSDN almost suggests, the "RTF" parser will only work with 8-bit encodings. So what I ended up doing was using UTF-8, which is an 8-bit encoding but can still represent the full range of Unicode characters. You can get UTF-8 from a PWSTR via WideCharToMultiByte():

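#include <windows.h>
#include <stdlib.h>
#include <wchar.h>

PWSTR WideString = /* Some string... */;
int WideLength = (int)wcslen(WideString);  // UTF-16 code units, without the terminator
PSTR Utf8;
int Length;
int ReturnedLength;

// One UTF-16 code unit needs at most 3 bytes in UTF-8 (a surrogate pair,
// i.e. two units, becomes 4 bytes), so 4 bytes per unit plus one byte
// for the terminator is a safe overestimate.
Length = WideLength * 4 + 1;
Utf8 = (PSTR)malloc(Length);
if (!Utf8) { /* TODO: handle failure */ }

ReturnedLength = WideCharToMultiByte(CP_UTF8,
                                     0,
                                     WideString,
                                     WideLength,
                                     Utf8,
                                     Length - 1,
                                     NULL,
                                     NULL);
if (ReturnedLength)
{
   // With an explicit input length the output is not zero terminated,
   // so terminate it here.
   Utf8[ReturnedLength] = 0;
}
else { /* TODO: handle failure */ }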

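Once you have it in UTF-8, you can do:

SETTEXTEX TextInfo = {0};

// ST_SELECTION replaces the current selection and keeps rich-text
// formatting; ST_DEFAULT would replace the whole content instead.
TextInfo.flags = ST_SELECTION;
TextInfo.codepage = CP_UTF8;

// The buffer is parsed as RTF only if it starts with a valid RTF
// header such as "{\rtf".
SendMessage(hRichText, EM_SETTEXTEX, (WPARAM)&TextInfo, (LPARAM)Utf8);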

And of course (I left this out originally, but while I'm being explicit...):

free(Utf8);
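
The "4 times" buffer estimate above can be checked with a small portable sketch. The helper below is hypothetical (it is not a Win32 function, and the name `utf8_length_of_utf16` is an assumption for illustration); it counts how many UTF-8 bytes a run of UTF-16 code units needs. It counts an unpaired surrogate as 3 bytes, assuming it would be replaced by U+FFFD. On Windows you would more likely ask the API itself: calling WideCharToMultiByte() with an output size of 0 returns the required buffer size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, for illustration only: returns the number of UTF-8
 * bytes needed to encode `len` UTF-16 code units (terminator not included).
 * Assumption: an unpaired surrogate is counted as 3 bytes (U+FFFD). */
size_t utf8_length_of_utf16(const uint16_t *src, size_t len)
{
    size_t bytes = 0;
    for (size_t i = 0; i < len; ++i) {
        uint16_t c = src[i];
        if (c < 0x80) {
            bytes += 1;                      /* ASCII */
        } else if (c < 0x800) {
            bytes += 2;                      /* U+0080..U+07FF */
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < len &&
                   src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            bytes += 4;                      /* valid surrogate pair */
            ++i;                             /* skip the low surrogate */
        } else {
            bytes += 3;                      /* rest of the BMP */
        }
    }
    return bytes;
}
```

Since a surrogate pair spends two code units on four bytes, no input needs more than 3 bytes per code unit, so the 4x allocation is safe but slightly generous.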

本王不退位尔等都是臣 2024-08-19 07:09:50

RTF is ASCII; any character outside the ASCII range has to be encoded with an escape sequence.
RTF 1.9.1 specification (March 2008)

骑趴 2024-08-19 07:09:50

Take a look at the \uN control word in the RTF specification. You have to convert your wide string into a sequence of Unicode escapes such as \u902?\u300?\u888?
http://www.biblioscape.com/rtf15_spec.htm#Heading9
The number is the decimal code of the character, and the question mark is the fallback character shown in its place if the RichEdit control does not support Unicode (RichEdit v1.0).

For example, for the Unicode string L"TIME" the RTF data will be "\u84?\u73?\u77?\u69?"
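
The \uN conversion described above could be sketched roughly as follows. This is a hypothetical helper, not part of RichEdit or any library: the name `rtf_escape_utf16`, its signature, and the choice of `uint16_t` for UTF-16 code units are all assumptions made for illustration. Unlike the "TIME" example, it leaves printable ASCII as-is and only escapes the RTF special characters (backslash and braces) and everything outside printable ASCII. It relies on the RTF rule that N is a signed 16-bit decimal, so code units above 32767 are written as negative numbers. Line breaks are not turned into \par here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only: writes the RTF-escaped form of
 * `len` UTF-16 code units into `dst` (capacity `cap`, including terminator).
 * Returns the number of chars written, or (size_t)-1 if `dst` is too small. */
size_t rtf_escape_utf16(const uint16_t *src, size_t len, char *dst, size_t cap)
{
    size_t out = 0;
    if (cap == 0)
        return (size_t)-1;
    for (size_t i = 0; i < len; ++i) {
        uint16_t c = src[i];
        char buf[12];                       /* longest piece: "\u-32768?" */
        int n;
        if (c == '\\' || c == '{' || c == '}') {
            n = snprintf(buf, sizeof buf, "\\%c", (char)c);   /* RTF specials */
        } else if (c >= 0x20 && c <= 0x7E) {
            n = snprintf(buf, sizeof buf, "%c", (char)c);     /* plain ASCII */
        } else {
            /* \uN takes a signed 16-bit decimal; '?' is the fallback char. */
            int value = (c > 32767) ? (int)c - 65536 : (int)c;
            n = snprintf(buf, sizeof buf, "\\u%d?", value);
        }
        if (n < 0 || out + (size_t)n + 1 > cap)
            return (size_t)-1;              /* keep room for the terminator */
        memcpy(dst + out, buf, (size_t)n);
        out += (size_t)n;
    }
    dst[out] = '\0';
    return out;
}
```

The result is pure ASCII, so it can be placed inside an RTF document body and handed to the control through the 8-bit path that already works in the question.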
