Problem storing non-Latin characters in a MySQL database
I am currently running a honeypot to catch forum spammers, and I am having problems storing non-Latin characters in my database. I have utf8_unicode_ci set at the database and table level, and I use mysql_query("SET NAMES 'utf8'") to make sure the information is sent as UTF-8.
Information such as time is stored as INT. IP, username and the like are stored as VARCHAR and TEXT. The only differences for the spam data are that I apply base64_encode(htmlspecialchars()) before inserting it, and that the spam column is a MEDIUMBLOB for which I use COMPRESS() in the query.
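To help narrow down where the damage could happen, here is a minimal sketch of the two PHP encoding steps described above, run in isolation. It leaves out COMPRESS() and the database entirely, and the sample string and variable names are made up for illustration. Under those assumptions it only suggests that this part of the pipeline preserves UTF-8 input byte for byte; it says nothing about what happens before the data reaches it.

```php
<?php
// Sample input (hypothetical); this file is assumed to be saved as UTF-8.
$spam = "Уровня <b>конечного</b> & не";

// Encode as described in the question, naming the charset explicitly.
$stored = base64_encode(htmlspecialchars($spam, ENT_QUOTES, 'UTF-8'));

// Reverse the two steps in the opposite order.
$restored = htmlspecialchars_decode(base64_decode($stored), ENT_QUOTES);

// base64 output consists only of ASCII letters, digits, '+', '/' and '='.
var_dump(preg_match('/^[A-Za-z0-9+\/=]+$/', $stored) === 1);

// The round trip gives back the exact original bytes.
var_dump($restored === $spam);
```

If both checks print true, the string handed to base64_encode() is the same string that comes back out, so any corruption would have to be present in the input already.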
With Latin characters it returns the correct data, but with non-Latin characters such as Russian and Thai it does not.
For example:
Уровня конечного начальники или не
is returned as:
Ð£Ñ€Ð¾Ð²Ð½Ñ ÐºÐ¾Ð½ÐµÑ‡Ð½Ð¾Ð³Ð¾ начальнÐ
or just diamonds with question marks in them.
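The garbled sample above has the typical shape of UTF-8 bytes that were read as a single-byte Western charset and then re-encoded as UTF-8. The sketch below reproduces that effect with iconv(). It assumes ISO-8859-1 as the wrong charset because every byte value maps to a character there; the "€" in the sample hints that Windows-1252 may have been involved instead, so treat this as an illustration of the mechanism rather than an exact reconstruction.

```php
<?php
// First word of the sample text; this file is assumed to be saved as UTF-8.
$original = "Уровня";

// Misread the UTF-8 bytes as ISO-8859-1 and convert "to UTF-8":
// every single byte becomes a separate character.
$garbled = iconv('ISO-8859-1', 'UTF-8', $original);

// Undoing the wrong conversion recovers the original bytes.
$recovered = iconv('UTF-8', 'ISO-8859-1', $garbled);

echo strlen($original), "\n";           // 12 (6 Cyrillic letters, 2 bytes each)
echo strlen($garbled), "\n";            // 24 (each of the 12 bytes now takes 2 bytes)
var_dump($recovered === $original);     // bool(true)
```

The garbled string starts with "Ð£", matching the beginning of the output shown above.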
I managed to store this kind of data correctly years ago when I created a forum, but I cannot remember how I did it. I have been searching all day and have not found a solution that works for me.
Edit: extra info, in case it helps.
- Apache/2.2.14 (Ubuntu)
- MySQL client version: 5.1.41
- PHP extension: php5-mysql
Comments (1)
It turns out that the page that sends spam submissions from my domains to the main hub didn't have
header("Content-Type: text/html; charset=utf-8");
so when a query was made to that page, the data was getting corrupted there.
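For anyone wiring up a similar relay page, a rough sketch of the idea might look like the following. The helper name is_valid_utf8() is hypothetical and not part of the original setup. Note that the check only catches bytes that are not well-formed UTF-8 (for example text sent in a legacy charset); text that has already been double-encoded is still valid UTF-8 and would pass.

```php
<?php
// Hypothetical helper: true when $bytes is well-formed UTF-8.
function is_valid_utf8(string $bytes): bool
{
    // With the /u modifier, PCRE refuses subjects that are not valid UTF-8,
    // so preg_match() returns false instead of 1.
    return preg_match('//u', $bytes) === 1;
}

// Declare the charset before any output is sent, as in the fix above.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

$good = is_valid_utf8("Уровня");   // well-formed UTF-8
$bad  = is_valid_utf8("\xD0");     // lead byte with no continuation byte

var_dump($good);
var_dump($bad);
```

A page could log or reject a submission when the check fails, which makes it easier to see at which hop the encoding goes wrong.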