'𠂉' is not a valid Unicode character, but it is in the Unicode character set?

Posted 2024-09-06 04:08:52 · 15 words · 7 views · 0 comments

Short story: I can't get a character like “

回忆那么伤 2024-09-13 04:08:52

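Which version of MySQL are you using? If it is older than 5.5, you can't store that character at all: it needs four bytes in UTF-8, and MySQL's utf8 character set supports at most three bytes per character (i.e., only characters in the BMP). MySQL 5.5 (specifically 5.5.3) added support for four-byte UTF-8, but you have to specify utf8mb4 as the character set.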
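A quick way to tell whether a string will run into this limit is to look for code points above U+FFFF, because exactly those characters take four bytes in UTF-8. The Python sketch below makes no assumptions about your database driver; `needs_utf8mb4` and `max_utf8_bytes_per_char` are hypothetical helper names, not part of any MySQL API.

```python
def needs_utf8mb4(text: str) -> bool:
    """Hypothetical helper: True if any character lies outside the BMP.

    Code points above U+FFFF take four bytes in UTF-8, so a MySQL column
    using the three-byte `utf8` character set cannot hold them; the column
    needs `utf8mb4` instead.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


def max_utf8_bytes_per_char(text: str) -> int:
    """Hypothetical helper: longest UTF-8 encoding of any single character (0 if empty)."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


if __name__ == "__main__":
    # 'A' (ASCII), 'é' (Latin-1 range), '中' (BMP CJK), U+20089 (Extension B)
    for sample in ["A", "é", "中", "\U00020089"]:
        # prints 1, 2, 3, 4 bytes; only the last one needs utf8mb4
        print(f"U+{ord(sample):04X}", max_utf8_bytes_per_char(sample), needs_utf8mb4(sample))
```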

Reference: http://dev.mysql.com/doc/refman/5.5/en/charset-unicode.html

策马西风 2024-09-13 04:08:52

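U+20089 is a character defined in the Unicode set (CJK Unified Ideographs Extension B); encoded as UTF-8, it becomes the byte sequence F0 A0 82 89. The problem is probably not the character itself but how some piece of software somewhere in your stack handles it.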
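The encoding claim is easy to check. A minimal Python sketch using only the standard library (`unicodedata` supplies the character's name):

```python
import unicodedata

ch = "\U00020089"  # the character from the question, U+20089

# Code point and Unicode name from Python's Unicode database.
print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# UTF-8 needs four bytes for a character outside the BMP.
encoded = ch.encode("utf-8")
print(encoded.hex(" ").upper())  # F0 A0 82 89

# Decoding those bytes gives the original character back.
assert encoded.decode("utf-8") == ch
```

If this round trip works in isolation but the character is still mangled in your application, the fault lies in one of the layers between the application and the browser or database.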

If this character were a problem for some inherent technical reason, that would most likely be noted in the Unicode Standard or the Unicode FAQ.

倾｀听者〃 2024-09-13 04:08:52

What if you double-encode it and store that?

Encode it a second time and store the result. Later, when you retrieve it, decode it once and render it as HTML.
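The answer does not say which encoding it means. One possible reading, assumed here, is HTML character-reference encoding: the first pass turns the text into ASCII-only HTML (so U+20089 becomes `&#131209;`), the second pass escapes that again, and a single decode on retrieval leaves HTML that the browser renders as the original character. The helper names are hypothetical; this is a sketch of that interpretation, not the answerer's code.

```python
import html


def encode_for_storage(text: str) -> str:
    """Hypothetical helper: one interpretation of 'double encoding'."""
    # First encoding: escape HTML specials, then replace every non-ASCII
    # character with a numeric character reference (U+20089 -> "&#131209;").
    once = html.escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")
    # Second encoding: escape again, so each "&" becomes "&amp;".
    return html.escape(once)


def decode_for_html(stored: str) -> str:
    """Hypothetical helper: undo only the second encoding before rendering."""
    # The remaining references (e.g. "&#131209;") are resolved by the browser.
    return html.unescape(stored)


if __name__ == "__main__":
    stored = encode_for_storage("\U00020089")
    print(stored)                   # &amp;#131209;
    print(decode_for_html(stored))  # &#131209;
```

The trade-off: the stored value is pure ASCII, so it fits in a three-byte `utf8` column, but the database no longer holds the real text, so searching, sorting, and length limits all operate on the entity-encoded form.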

胡大本事 2024-09-13 04:08:52

I can't answer the question of why it is listed as both supported and unsupported; that's probably a question for the people running the fileformat.info site.

UTF-8 can be used to represent any Unicode character (code point). This is true of all of the UTFs. The number of bytes required varies (in UTF-8, for instance, the code point you identified needs four bytes, whereas the Roman letter 'A' needs only one), but every Unicode character can be represented in every UTF. That's what they're for.
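To make the byte counts concrete, here is a short Python sketch that encodes 'A' and U+20089 in UTF-8, UTF-16 and UTF-32. Using the little-endian codecs is an assumption chosen so that Python does not prepend a byte-order mark and inflate the counts.

```python
# The "-le" codecs are used so no byte-order mark is added to the output.
ENCODINGS = ["utf-8", "utf-16-le", "utf-32-le"]


def byte_lengths(ch: str) -> dict:
    """Number of bytes `ch` occupies in each encoding."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


for ch in ["A", "\U00020089"]:
    # Every UTF round-trips the character without loss.
    assert all(ch.encode(enc).decode(enc) == ch for enc in ENCODINGS)
    print(f"U+{ord(ch):04X}", byte_lengths(ch))
```

'A' takes 1, 2 and 4 bytes respectively, while U+20089 takes 4 bytes in all three (a surrogate pair in UTF-16).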

It sounds as though you're running into an encoding problem at one (or more) of the layers in your app. The first place to look would be the page served by your app: Does it say what charset it's using? It may be worth checking the headers being returned for your pages to see if they have:

Content-Type: text/html; charset="UTF-8"

...in them. If they don't, look for the equivalent meta tag in the HTML itself, though I seem to recall reading that meta isn't a good way to do this. If the headers don't specify a charset, the default applied will probably be ISO-8859-1 (though some browsers may use Windows-1252 instead), which won't work if your source text is encoded as UTF-8.
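A minimal Python sketch of that header logic, assuming you already have the raw `Content-Type` value and the response body as bytes. `charset_from_content_type` and `decode_body` are hypothetical names; the ISO-8859-1 fallback mirrors the guess above, and a real browser may pick a different default. Decoding the UTF-8 bytes of U+20089 without a declared charset shows the kind of garbling a missing header produces.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Hypothetical helper: charset parameter of a Content-Type value, lower-cased, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() strips the quotes, so '"UTF-8"' becomes 'utf-8'.
    return msg.get_content_charset()


def decode_body(body: bytes, header_value: str, fallback: str = "iso-8859-1") -> str:
    """Hypothetical helper: decode with the declared charset, else the fallback."""
    return body.decode(charset_from_content_type(header_value) or fallback)


if __name__ == "__main__":
    body = "\U00020089".encode("utf-8")  # what the server actually sends
    ok = decode_body(body, 'text/html; charset="UTF-8"')
    garbled = decode_body(body, "text/html")  # no charset declared
    print(ok == "\U00020089")  # True
    print(repr(garbled))       # four Latin-1 characters instead of one
```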

The next place to look is your database. MySQL doesn't necessarily store text as UTF-8 by default (before MySQL 8.0 the default character set was latin1), so you'll need to make sure your MySQL configuration does.

From your question, I don't think you need it, but I'll finish with the obligatory plug for the article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky (if only to save someone from plugging it in a comment). :-)
