Flash CS4/AS3: Different behavior between the console and a text area when printing UTF-16 characters

Posted 2024-10-29 05:03:07


trace(escape("д"));

will print "%D0%B4", the correct URL encoding for this character (the Cyrillic letter de, U+0434).

However, if I do:

myTextArea.htmlText += unescape("%D0%B4");

What gets printed is:

Ð´

which is of course incorrect. Simply tracing the above unescape returns the correct Cyrillic character, though! For this textarea, escaping "д" returns its Unicode code point, "%u0434".

I'm not sure what exactly is happening to mess this up, but...

UTF-16 Ð´ in web encoding, with a byte-order mark, is: %FE%FF%00%D0%00%B4

Whereas

UTF-16 Ð´ without a byte-order mark is: %00%D0%00%B4

So it's padding this value with something at the beginning. Why would a trace provide different text than a print to an (empty) textarea? What's goin' on?
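The two byte sequences above differ only by a leading FE FF, which is the UTF-16 big-endian byte-order mark (BOM). Both spell the two code units U+00D0 U+00B4, that is "Ð´"; "д" itself would be 04 34. The sketch below is in TypeScript as a stand-in, since ActionScript 3 shares ECMAScript's string model (`charCodeAt` returns a 16-bit code unit). `utf16beHex` is a hypothetical helper, and it assumes the string contains no characters outside the Basic Multilingual Plane.

```typescript
// Hypothetical helper: list the UTF-16 big-endian bytes of a string as %XX,
// optionally prefixed with the FE FF byte-order mark (BOM).
// Assumes BMP-only input (no surrogate pairs).
function utf16beHex(s: string, withBom: boolean): string {
  const bytes: number[] = withBom ? [0xfe, 0xff] : [];
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);        // one 16-bit code unit
    bytes.push(unit >> 8, unit & 0xff);  // high byte first (big-endian)
  }
  return bytes
    .map(b => "%" + (b < 16 ? "0" : "") + b.toString(16).toUpperCase())
    .join("");
}

const garbled = "\u00D0\u00B4"; // "Ð´", the text the textarea shows
console.log(utf16beHex(garbled, true));   // %FE%FF%00%D0%00%B4
console.log(utf16beHex(garbled, false));  // %00%D0%00%B4
console.log(utf16beHex("\u0434", false)); // %04%34 ("д" itself)
```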

The textarea in question has no weird encoding properties attached to it, if that sort of thing is even possible.


阪姬 2024-11-05 05:03:07


The problem is unescape (escape could also be a problem, but it's not the culprit in this case). These functions are not multibyte aware. What escape does is this: it takes a byte in the input string and returns its hex representation with a % prepended. unescape does the opposite. The key point here is that they work with bytes, not characters.

What you want is encodeURIComponent / decodeURIComponent. Both use UTF-8 to turn characters into percent-encoded bytes and back (the encoding used by Flash everywhere). Note that it's not UTF-16 (which you shouldn't need to care about as far as Flash is concerned).

encodeURIComponent("д"); //%D0%B4
decodeURIComponent("%D0%B4"); // д
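A minimal sketch of the suggested fix, written in TypeScript rather than ActionScript 3. Both languages inherit `encodeURIComponent` and `decodeURIComponent` from ECMAScript, but this was reasoned about with JavaScript semantics, not run in Flash Player. The two-letter word "да" is just an assumed sample input.

```typescript
// encodeURIComponent: characters -> UTF-8 bytes, each written as %XX.
// decodeURIComponent: %XX bytes -> characters, read as UTF-8.
const word = "\u0434\u0430";                  // "да": Cyrillic de + a
const encoded = encodeURIComponent(word);
const decoded = decodeURIComponent(encoded);

console.log(encoded);          // %D0%B4%D0%B0
console.log(decoded === word); // true
```

In the question's code, the decoded string (rather than unescape's output) is what would be appended to `myTextArea.htmlText`.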

Now, if you want to dig a bit deeper, here's what's going on (this assumes a basic knowledge of how utf-8 works).

escape("д")

This returns

%D0%B4

Why?

"д" is treated by Flash as UTF-8. The code point for this character is U+0434 (0x0434).

In binary:

0000 0100 0011 0100

It fits in two utf-8 bytes, so it's encoded thus (where e means encoding bit, and p means payload bit):

1101 0000 1011 0100
eeep pppp eepp pppp 
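The bit layout above can be reproduced with one shift and two masks. A minimal TypeScript sketch; `utf8TwoBytes` is a hypothetical helper that only handles the two-byte range U+0080 to U+07FF:

```typescript
// Two-byte UTF-8 form: 110ppppp 10pppppp
// ("110" and "10" are the fixed encoding bits, p are payload bits).
function utf8TwoBytes(codePoint: number): [number, number] {
  if (codePoint < 0x80 || codePoint > 0x7ff) {
    throw new RangeError("code point does not use the two-byte form");
  }
  const lead = 0xc0 | (codePoint >> 6);    // 110 + top 5 payload bits
  const trail = 0x80 | (codePoint & 0x3f); // 10 + low 6 payload bits
  return [lead, trail];
}

const [b1, b2] = utf8TwoBytes(0x0434);          // "д"
console.log(b1.toString(2), b2.toString(2));   // 11010000 10110100
console.log(b1.toString(16), b2.toString(16)); // d0 b4
```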

Converting it to hex, we get:

0xd0  0xb4

So, 0xd0,0xb4 is a utf-8 encoded "д".

This is fed to escape. escape sees two bytes, and gives you:

%D0%B4
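To make the byte-wise model concrete, here is a hypothetical `percentEncodeBytes` helper that writes each byte as `%XX`, which is what the answer says Flash's escape does with the UTF-8 bytes. It is a model, not a real Flash API. For comparison, standard JavaScript `escape` works on UTF-16 code units and yields `%u0434` for "д", the same form the question reports seeing for the textarea.

```typescript
// Hypothetical model: percent-encode every byte of a UTF-8 byte sequence.
function percentEncodeBytes(bytes: number[]): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const hex = bytes[i].toString(16).toUpperCase();
    out += "%" + (hex.length < 2 ? "0" + hex : hex); // always two hex digits
  }
  return out;
}

console.log(percentEncodeBytes([0xd0, 0xb4])); // %D0%B4
// Standard ECMAScript escape(), by contrast, encodes the code unit:
console.log(escape("\u0434"));                 // %u0434
```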

Now, you pass this to unescape. But unescape is a little bit brain-dead, so it thinks one byte is one and the same thing as one char, always. As far as unescape is concerned, you have two bytes, hence, you have two chars. If you look up the code points for 0xd0 and 0xb4 (U+00D0 and U+00B4), you'll see this:

0xd0 -> Ð
0xb4 -> ´

So, unescape returns a string consisting of two chars, Ð and ´ (instead of figuring out that the two bytes it got were actually just one char, UTF-8 encoded). Then, when you assign the text property, you are not really passing д but Ð´, and this is what you see in the text area.
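The byte-per-character behavior can be modelled in a few lines of TypeScript. `naiveUnescape` is a hypothetical re-implementation, compared here against JavaScript's built-in `unescape`, which treats `%XX` sequences the same way. Reversing the damage with `decodeURIComponent(escape(...))` recovers "д", which supports the explanation that the two characters are really the UTF-8 bytes of one character. That reverse trick is a common JavaScript idiom, assumed rather than verified to behave the same in Flash, and it only works when every character is below U+0100.

```typescript
// Hypothetical model of a byte-per-character unescape: every %XX becomes the
// single character with code 0xXX, so a UTF-8 multibyte sequence is split apart.
function naiveUnescape(s: string): string {
  return s.replace(/%([0-9A-Fa-f]{2})/g, (_match: string, hex: string) =>
    String.fromCharCode(parseInt(hex, 16)));
}

const garbled = naiveUnescape("%D0%B4");
console.log(garbled.length);                     // 2
console.log(garbled.charCodeAt(0).toString(16)); // d0 ("Ð")
console.log(garbled.charCodeAt(1).toString(16)); // b4 ("´")
console.log(garbled === unescape("%D0%B4"));     // true

// Undo it: escape() maps each char below U+0100 back to one %XX byte,
// and decodeURIComponent() then reads those bytes as UTF-8.
const repaired = decodeURIComponent(escape(garbled));
console.log(repaired === "\u0434");              // true ("д")
```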
