Funny characters in my database

Posted 2024-09-14 06:37:01, 711 characters, 2 views, 0 comments

My web app is breaking when I try to edit a certain content type, and I'm pretty sure it is because of some weird characters in my database. So when I do:

SELECT body FROM message WHERE id = 666

it returns:

<p>⢠<span></span></p><p><br /></p><p><em><strong>NOTE:</strong> Please remember to use your to participate in the discussion.</em></p>

However, when I try to count how many documents contain those characters, Postgres complains:

foo_450_prod=# SELECT COUNT(*) FROM message WHERE body LIKE'%â¢%';

ERROR:  invalid byte sequence for encoding "UTF8": 0xe2a225
HINT:  This error can also happen if the byte sequence does not match the encodi

Does anybody know what the issue is and how I can query for those funny characters?

Thanks in advance!

Comments (2)

日暮斜阳 2024-09-21 06:37:01

It appears that your SELECT statement is being interpreted as ISO-8859-1 or windows-1252. In those encodings, 'â' == 0xE2, '¢' == 0xA2, and '%' == 0x25, which explains the 0xe2a225 byte sequence mentioned in the error message.
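The byte arithmetic above can be checked with a minimal Python sketch. It assumes the client sent the pattern encoded as windows-1252 (Python's `cp1252` codec); the variable names are illustrative only.

```python
# The tail of the LIKE pattern from the question: 'â', '¢', '%'.
pattern = "\u00e2\u00a2%"

# Assumption: the client encoded the statement as windows-1252.
raw = pattern.encode("cp1252")
hex_bytes = raw.hex()  # the three bytes, written as hex

# A server expecting UTF-8 cannot decode these bytes: 0xE2 opens a
# three-byte sequence, 0xA2 is a valid continuation byte, 0x25 is not.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

print(hex_bytes, is_valid_utf8)
```

The hex string matches the sequence quoted in the error message, which is consistent with the statement reaching the server in a single-byte encoding rather than UTF-8.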

What's hard to figure out is why your first SELECT is returning an ⢠to begin with. It's an unlikely character combination to use on purpose, but it's also not a typical case of UTF-8/windows-1252 mojibake because E2 A2 isn't valid UTF-8. It could be the first 2 bytes of a character, but that character would be a Braille dot pattern (U+2880 to U+28BF), which doesn't make sense there either.
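As a quick check of the Braille claim, the following Python sketch completes the two bytes E2 A2 with every possible UTF-8 continuation byte and looks at the resulting code points. It only illustrates the encoding arithmetic and says nothing about how the data got into the table.

```python
import unicodedata

lead = b"\xe2\xa2"  # the two bytes discussed above

# On their own, the two bytes are an incomplete UTF-8 sequence.
try:
    lead.decode("utf-8")
    complete = True
except UnicodeDecodeError:
    complete = False

# Appending each possible continuation byte (0x80..0xBF) yields every
# character whose UTF-8 encoding starts with E2 A2.
chars = [(lead + bytes([b])).decode("utf-8") for b in range(0x80, 0xC0)]
first, last = ord(chars[0]), ord(chars[-1])
all_braille = all(unicodedata.name(c).startswith("BRAILLE PATTERN") for c in chars)

print(complete, f"U+{first:04X}", f"U+{last:04X}", all_braille)
```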

为你拒绝所有暧昧 2024-09-21 06:37:01

There's already a long way between your DB and the data printed on your web page: your DB encoding may be fine, but you're probably displaying text that was originally UTF-8 as ISO-8859-1 (rather than storing "funny" characters). Do you have something like:

<meta content="text/html; charset=UTF-8" http-equiv="content-type" />

in the <head> tag of your HTML page?

Also, are you running SET NAMES 'utf8' when connecting to your DB?
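To illustrate the kind of mismatch described in this answer, here is a small Python sketch of UTF-8 text being read as ISO-8859-1. The sample string is hypothetical and not taken from the question; it only shows the general effect.

```python
# Illustrative sample text ("café"), not data from the question.
original = "caf\u00e9"

utf8_bytes = original.encode("utf-8")       # what a UTF-8 source stores or sends
garbled = utf8_bytes.decode("iso-8859-1")   # what a Latin-1 reader would display

# As long as no bytes were lost, the damage can be undone by reversing
# the wrong decoding step.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled, repaired == original)
```

One accented character turns into two when the bytes are read with the wrong charset, which is why declaring the same encoding at every layer (database connection, HTTP response, HTML page) matters.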
