Funny characters in my database
My web app breaks when I try to edit a certain content type, and I'm pretty sure it is because of some weird characters in my database. So when I do:
SELECT body FROM message WHERE id = 666
it returns:
<p>⢠<span></span></p><p><br /></p><p><em><strong>NOTE:</strong> Please remember to use your to participate in the discussion.</em></p>
However, when I try to count how many documents have those characters, postgres complains:
foo_450_prod=# SELECT COUNT(*) FROM message WHERE body LIKE'%â¢%';
ERROR: invalid byte sequence for encoding "UTF8": 0xe2a225
HINT: This error can also happen if the byte sequence does not match the encodi
Does anybody know what the issue is and how I can query for those funny characters?
Thanks in advance!
Comments (2)
It appears that your SELECT statement is being interpreted as ISO-8859-1 or windows-1252. In those encodings, 'â' == 0xE2, '¢' == 0xA2, and '%' == 0x25, which explains the 0xe2a225 byte sequence mentioned in the error message.
What's hard to figure out is why your first SELECT is returning an â¢ to begin with. It's an unlikely character combination to use on purpose, but it's also not a typical case of UTF-8/windows-1252 mojibake, because E2 A2 on its own isn't valid UTF-8. It could be the first 2 bytes of a 3-byte character, but that character would be a Braille dot pattern (U+2880 to U+28BF), which doesn't make sense there either.
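The byte arithmetic in this answer can be checked with a short Python sketch. It assumes the client sent the LIKE pattern as ISO-8859-1 or windows-1252 text; the variable names are only illustrative.

```python
# The LIKE pattern from the question, written with escapes so the
# source file's own encoding does not matter: 'â', '¢', '%'.
pattern = "\u00e2\u00a2%"

# Assumption: the client encoded the query text as ISO-8859-1 or windows-1252.
latin1_bytes = pattern.encode("iso-8859-1")
cp1252_bytes = pattern.encode("cp1252")
print(latin1_bytes.hex())  # the byte sequence PostgreSQL reported

# Those bytes are not valid UTF-8: 0xE2 opens a 3-byte sequence and
# 0xA2 is a legal continuation byte, but 0x25 ('%') is not.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)

# E2 A2 followed by any continuation byte (0x80..0xBF) decodes to a
# code point in the Braille range mentioned above.
code_points = [
    ord(bytes([0xE2, 0xA2, last]).decode("utf-8")) for last in range(0x80, 0xC0)
]
print(hex(min(code_points)), hex(max(code_points)))
```

Both encodings produce the same three bytes here, which is why the error message alone cannot tell you which of the two the client was using.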
There's already a long way between your DB and printing some of its data in your web page: your DB encoding may be fine, but you're probably displaying something that was originally UTF-8 as ISO-8859-1 (rather than truly "funny" characters). Do you have a charset declaration in the <head> tag of your HTML page?
Also, are you issuing SET NAMES 'utf8' when connecting to your DB?
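To illustrate what "UTF-8 shown as ISO-8859-1" looks like, here is a minimal Python sketch. The bullet character is a hypothetical example chosen because its UTF-8 bytes start with E2 and end with A2; whether that is what the row in question actually contains is an assumption.

```python
# Hypothetical original text: a bullet (U+2022).
original = "\u2022"
utf8_bytes = original.encode("utf-8")
print(utf8_bytes.hex())

# The same bytes read with the wrong single-byte encoding.
as_cp1252 = utf8_bytes.decode("cp1252")      # three visible characters
as_latin1 = utf8_bytes.decode("iso-8859-1")  # middle one is an invisible C1 control

# If nothing was lost along the way, reversing the wrong decode
# recovers the original text.
repaired = as_cp1252.encode("cp1252").decode("utf-8")
print(repaired == original)
```

In the ISO-8859-1 reading the middle byte becomes an invisible control character, so the text can look like just 'â¢' on screen even though three bytes are stored.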