Django encoding problem with MySQL

Posted 2024-09-01 15:56:41

Okay, so I have a MySQL database set up. Most of the tables are latin1, and Django handles them fine. But some of them are UTF-8, and Django does not handle them.

Here's a sample table (these tables are all from django-geonames):

DROP TABLE IF EXISTS `geoname`;
SET @saved_cs_client     = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `geoname` (
  `id` int(11) NOT NULL,
  `name` varchar(200) NOT NULL,
  `ascii_name` varchar(200) NOT NULL,
  `latitude` decimal(20,17) NOT NULL,
  `longitude` decimal(20,17) NOT NULL,
  `point` point default NULL,
  `fclass` varchar(1) NOT NULL,
  `fcode` varchar(7) NOT NULL,
  `country_id` varchar(2) NOT NULL,
  `cc2` varchar(60) NOT NULL,
  `admin1_id` int(11) default NULL,
  `admin2_id` int(11) default NULL,
  `admin3_id` int(11) default NULL,
  `admin4_id` int(11) default NULL,
  `population` int(11) NOT NULL,
  `elevation` int(11) NOT NULL,
  `gtopo30` int(11) NOT NULL,
  `timezone_id` int(11) default NULL,
  `moddate` date NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `country_id_refs_iso_alpha2_e2614807` (`country_id`),
  KEY `admin1_id_refs_id_a28cd057` (`admin1_id`),
  KEY `admin2_id_refs_id_4f9a0f7e` (`admin2_id`),
  KEY `admin3_id_refs_id_f8a5e181` (`admin3_id`),
  KEY `admin4_id_refs_id_9cc00ec8` (`admin4_id`),
  KEY `fcode_refs_code_977fe2ec` (`fcode`),
  KEY `timezone_id_refs_id_5b46c585` (`timezone_id`),
  KEY `geoname_52094d6e` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
SET character_set_client = @saved_cs_client;

Now, if I try to get data from the table directly using MySQLdb and a cursor, I get the text with the proper encoding:

>>> import MySQLdb
>>> from django.conf import settings
>>> 
>>> conn = MySQLdb.connect (host = "localhost",
... user = settings.DATABASES['default']['USER'],
... passwd = settings.DATABASES['default']['PASSWORD'],
... db = settings.DATABASES['default']['NAME'])
>>> cursor = conn.cursor ()
>>> cursor.execute("select name from geoname where name like 'Uni%Hidalgo'");
1L
>>> g = cursor.fetchone()
>>> g[0]
'Uni\xc3\xb3n Hidalgo'
>>> print g[0]
Unión Hidalgo

However, if I try to use the Geoname model (which is actually a django.contrib.gis.db.models.Model), it fails:

>>> from geonames.models import Geoname
>>> g = Geoname.objects.get(name__istartswith='Uni',name__icontains='Hidalgo')
>>> g.name
u'Uni\xc3\xb3n Hidalgo'
>>> print g.name
UniÃ³n Hidalgo

There's pretty clearly an encoding error here. In both cases the database is returning 'Uni\xc3\xb3n Hidalgo', but Django is (incorrectly?) turning the '\xc3\xb3' into 'Ã³' instead of 'ó'.
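The two 'Uni\xc3\xb3n Hidalgo' values are not quite the same thing: in Python 2 terms, g[0] from the raw cursor is a byte string (str), while g.name from the model is a unicode object whose code points happen to be U+00C3 and U+00B3. A minimal sketch in Python 3 (an assumption for illustration: bytes stands in for Python 2's str, and str for Python 2's unicode) shows why one displays correctly and the other does not:

```python
# Python 3 sketch: `bytes` here plays the role of Python 2's str,
# and `str` plays the role of Python 2's unicode.

raw_bytes = b"Uni\xc3\xb3n Hidalgo"   # like g[0] from the raw MySQLdb cursor
model_text = "Uni\xc3\xb3n Hidalgo"   # like g.name from the Django model

# Decoding the bytes as UTF-8 merges 0xC3 0xB3 into a single character, U+00F3.
decoded = raw_bytes.decode("utf-8")
print(ascii(decoded))      # 'Uni\xf3n Hidalgo'  (displays as: Unión Hidalgo)

# The text string already holds two separate code points, U+00C3 and U+00B3.
print(ascii(model_text))   # 'Uni\xc3\xb3n Hidalgo'  (displays as: UniÃ³n Hidalgo)

# The model's text is exactly what decoding the UTF-8 bytes as latin-1 produces.
print(raw_bytes.decode("latin-1") == model_text)   # True
```

In other words, the raw cursor hands back UTF-8 bytes, while the model hands back text in which each of those bytes has become its own character.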

What can I do to fix this?

Update

Okay, so this is weird:

>>> c = unicode('Uni\xc3\xb3n Hidalgo','utf-8')
>>> c
u'Uni\xf3n Hidalgo'
>>> print c
Unión Hidalgo

If I force Python to decode the byte string from UTF-8 into Unicode, it works. However, decoding it as latin1 instead reproduces the mistake:

>>> c = unicode('Unión Hidalgo','latin1')
>>> c
u'Uni\xc3\xb3n Hidalgo'
>>> print c
UniÃ³n Hidalgo

So my guess is that MySQL is sending UTF-8 but telling Python it is latin1?
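One explanation consistent with both observations is double encoding: the rows were imported over a connection declared as latin1 while the data was really UTF-8, so every UTF-8 byte was stored as its own character in the utf8 column. The sketch below simulates this without a database. The latin1 import connection and a latin1 default for the raw MySQLdb connection are assumptions, not facts from the question; the utf8 reading side reflects the fact that Django's MySQL backend requests a utf8 connection.

```python
# Hypothetical simulation of MySQL double encoding; no database involved.
# Assumption: the import client declared latin1 but actually sent UTF-8 bytes.

original = "Uni\u00f3n Hidalgo"                 # the intended text, 'Unión Hidalgo'
sent_bytes = original.encode("utf-8")            # b'Uni\xc3\xb3n Hidalgo'

# The server trusts the declared latin1 charset: one byte becomes one character.
server_text = sent_bytes.decode("latin-1")       # 'Uni\xc3\xb3n Hidalgo' (UniÃ³n)

# Stored in a utf8 column, each of those characters is re-encoded as UTF-8.
stored_bytes = server_text.encode("utf-8")       # b'Uni\xc3\x83\xc2\xb3n Hidalgo'

# Reading back over a utf8 connection (as Django does) yields the mojibake text.
via_utf8_client = stored_bytes.decode("utf-8")

# Reading back over a latin1 connection converts each character to one latin1 byte,
# which reproduces the original UTF-8 bytes, so they look right on a UTF-8 terminal.
via_latin1_client = stored_bytes.decode("utf-8").encode("latin-1")

print(ascii(via_utf8_client))     # 'Uni\xc3\xb3n Hidalgo'
print(via_latin1_client)          # b'Uni\xc3\xb3n Hidalgo'
```

Under this assumption the table contents themselves are wrong, and the raw cursor only looks correct because a second, opposite conversion cancels out the first one.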

Comments (3)

安穩 2024-09-08 15:56:41

You could use it like this:

>>> print g.name.encode('latin1')
Unión Hidalgo
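In Python 3 the same trick is usually written as a full round trip: encode the mojibake text back to latin-1 bytes, then decode those bytes as UTF-8. A minimal sketch with a hypothetical helper name, fix_mojibake, that leaves the text unchanged when the round trip fails (for example, when the string is already correct or contains characters outside latin-1):

```python
def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mis-decoded as latin-1 (hypothetical helper)."""
    try:
        # latin-1 maps U+0000..U+00FF back to single bytes, recovering the UTF-8 bytes.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # A character is outside latin-1, or the bytes are not valid UTF-8:
        # assume the text was not double-encoded and return it unchanged.
        return text


broken = "Uni\xc3\xb3n Hidalgo"        # what g.name held: 'UniÃ³n Hidalgo'
fixed = fix_mojibake(broken)
print(ascii(fixed))                    # 'Uni\xf3n Hidalgo'  (displays as: Unión Hidalgo)
print(fix_mojibake(fixed) == fixed)    # True: already-correct text is left alone
```

This only patches values after they are read; it can also misfire on genuine latin-1 text that happens to form valid UTF-8 sequences, so repairing the stored data is the more robust fix.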

梦在夏天 2024-09-08 15:56:41

Looks like the problem was in MySQL after all. I dropped the tables, recreated them with their character set and collation set to UTF-8, and re-imported all of the data.

It's working now.

感受沵的脚步 2024-09-08 15:56:41

Django 1.10, MariaDB 5.5.47


One very important thing is to set the database's character set when you create the database:

CREATE DATABASE `my_database` CHARACTER SET utf8;

Then check your MySQL configuration file, /etc/my.cnf (I use MariaDB):

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
collation-server=utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server=utf8

Also remember to restart the database service:

sudo systemctl restart mariadb.service
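On the Django side, the connection character set can also be stated explicitly in settings.py. Django's MySQL backend passes the OPTIONS dictionary through to MySQLdb.connect, which accepts a charset argument. A minimal sketch; the user name and password are hypothetical placeholders:

```python
# settings.py (sketch). USER and PASSWORD below are hypothetical placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "my_database",      # the database created with CHARACTER SET utf8
        "USER": "django_user",      # hypothetical
        "PASSWORD": "change-me",    # hypothetical
        "HOST": "localhost",
        "OPTIONS": {
            # Forwarded to MySQLdb.connect(); declares the connection charset
            # explicitly instead of relying on server or client defaults.
            "charset": "utf8",
        },
    }
}
```

Django already requests a utf8 connection by default, so this mainly documents the intent; it will not repair rows that were double-encoded when they were imported.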

ref:

https://docs.djangoproject.com/en/1.10/ref/databases/#creating-your-database
https://mariadb.com/kb/en/the-mariadb-library/setting-character-sets-and-collations/
