Django encoding problem with MySQL
Okay, so I have a MySQL database set up. Most of the tables are latin1 and Django handles them fine. But, some of them are UTF-8 and Django does not handle them.
Here's a sample table (these tables are all from django-geonames):
DROP TABLE IF EXISTS `geoname`;
SET @saved_cs_client = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `geoname` (
`id` int(11) NOT NULL,
`name` varchar(200) NOT NULL,
`ascii_name` varchar(200) NOT NULL,
`latitude` decimal(20,17) NOT NULL,
`longitude` decimal(20,17) NOT NULL,
`point` point default NULL,
`fclass` varchar(1) NOT NULL,
`fcode` varchar(7) NOT NULL,
`country_id` varchar(2) NOT NULL,
`cc2` varchar(60) NOT NULL,
`admin1_id` int(11) default NULL,
`admin2_id` int(11) default NULL,
`admin3_id` int(11) default NULL,
`admin4_id` int(11) default NULL,
`population` int(11) NOT NULL,
`elevation` int(11) NOT NULL,
`gtopo30` int(11) NOT NULL,
`timezone_id` int(11) default NULL,
`moddate` date NOT NULL,
PRIMARY KEY (`id`),
KEY `country_id_refs_iso_alpha2_e2614807` (`country_id`),
KEY `admin1_id_refs_id_a28cd057` (`admin1_id`),
KEY `admin2_id_refs_id_4f9a0f7e` (`admin2_id`),
KEY `admin3_id_refs_id_f8a5e181` (`admin3_id`),
KEY `admin4_id_refs_id_9cc00ec8` (`admin4_id`),
KEY `fcode_refs_code_977fe2ec` (`fcode`),
KEY `timezone_id_refs_id_5b46c585` (`timezone_id`),
KEY `geoname_52094d6e` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
SET character_set_client = @saved_cs_client;
Now, if I try to get data from the table directly using MySQLdb and a cursor, I get the text with the proper encoding:
>>> import MySQLdb
>>> from django.conf import settings
>>>
>>> conn = MySQLdb.connect(host="localhost",
...                        user=settings.DATABASES['default']['USER'],
...                        passwd=settings.DATABASES['default']['PASSWORD'],
...                        db=settings.DATABASES['default']['NAME'])
>>> cursor = conn.cursor()
>>> cursor.execute("select name from geoname where name like 'Uni%Hidalgo'")
1L
>>> g = cursor.fetchone()
>>> g[0]
'Uni\xc3\xb3n Hidalgo'
>>> print g[0]
Unión Hidalgo
However, if I try to use the Geoname model (which is actually a django.contrib.gis.db.models.Model), it fails:
>>> from geonames.models import Geoname
>>> g = Geoname.objects.get(name__istartswith='Uni',name__icontains='Hidalgo')
>>> g.name
u'Uni\xc3\xb3n Hidalgo'
>>> print g.name
UniÃ³n Hidalgo
There's clearly an encoding error here. In both cases the database is returning 'Uni\xc3\xb3n Hidalgo', but Django is (incorrectly?) turning the two bytes '\xc3\xb3' into the two characters 'Ã³' instead of the single character 'ó'.
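The symptom can be reproduced in plain Python 3 without a database: take the UTF-8 bytes for 'ó' (`\xc3\xb3`) and decode them as latin-1 instead of UTF-8. This is only a sketch of the mechanism, not Django's actual code path.

```python
# Reproduce the mojibake: UTF-8 bytes decoded with the wrong codec.
correct = "Unión Hidalgo"

# 'ó' (U+00F3) is encoded in UTF-8 as the two bytes 0xC3 0xB3.
raw = correct.encode("utf-8")
print(raw)  # b'Uni\xc3\xb3n Hidalgo'

# latin-1 maps every byte to exactly one character,
# so 0xC3 becomes 'Ã' and 0xB3 becomes '³'.
wrong = raw.decode("latin-1")
print(wrong)  # UniÃ³n Hidalgo

# Decoding the same bytes with the right codec restores the text.
print(raw.decode("utf-8"))  # Unión Hidalgo
```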
What can I do to fix this?
Update
Okay, so this is weird:
>>> c = unicode('Uni\xc3\xb3n Hidalgo','utf-8')
>>> c
u'Uni\xf3n Hidalgo'
>>> print c
Unión Hidalgo
If I force Python to decode the string from UTF-8 into Unicode, it works. However, this recreates the mistake:
>>> c = unicode('Unión Hidalgo','latin1')
>>> c
u'Uni\xc3\xb3n Hidalgo'
>>> print c
UniÃ³n Hidalgo
So my guess is that MySQL is sending UTF-8 but telling Python it is latin1?
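If the values are already stored double-encoded (UTF-8 bytes that were treated as latin1 on the way in), individual strings can be repaired by reversing the wrong step: encode back to latin-1 to recover the original bytes, then decode those bytes as UTF-8. The helper name `fix_double_encoded` is hypothetical, and the sketch assumes the damage happened exactly once via latin-1; it returns the input unchanged when that assumption clearly does not hold. MySQL's `latin1` is really closer to cp1252, so for a few characters `cp1252` may be the better codec. Recreating the tables with the right charset and re-importing the data is generally the more robust fix.

```python
def fix_double_encoded(text):
    """Undo one round of UTF-8 -> latin-1 mis-decoding (hypothetical helper).

    Assumes `text` was produced by decoding UTF-8 bytes as latin-1.
    If that cannot be reversed, the input is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either a character outside latin-1 is present, or the recovered
        # bytes are not valid UTF-8: the text was probably not double-encoded.
        return text


print(fix_double_encoded("UniÃ³n Hidalgo"))  # Unión Hidalgo
print(fix_double_encoded("Unión Hidalgo"))   # Unión Hidalgo (left untouched)
```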
Comments (3)
You could use it like this:
Looks like the problem was in MySQL after all. I dropped the tables, recreated them with the charset and collation set to UTF-8, and re-imported all of the data.
It's working now.
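On the Django side, the MySQL backend passes the `OPTIONS` dictionary through to `MySQLdb.connect()`, so the connection charset can be pinned explicitly. Below is a minimal settings fragment; the database name and credentials are placeholders. Django's MySQL backend typically already requests `utf8` for the connection, so this mainly makes the intent explicit and will not repair rows that were stored double-encoded.

```python
# settings.py fragment (placeholder values; adjust to your environment)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "geonames_db",      # hypothetical database name
        "USER": "django",           # hypothetical user
        "PASSWORD": "secret",       # hypothetical password
        "HOST": "localhost",
        "OPTIONS": {
            # Passed through to MySQLdb.connect(); sets the connection charset.
            "charset": "utf8",
        },
    }
}
```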
Django 1.10, MariaDB 5.5.47
A very important thing is to set the database's character set when you create the database:
Then you can check your MySQL configuration file, /etc/my.cnf (I use MariaDB):
Also remember to restart your SQL service:
ref:
https://docs.djangoproject.com/en/1.10/ref/databases/#creating-your-database
https://mariadb.com/kb/en/the-mariadb-library/setting-character-sets-and-collations/