如何更改 postgres 数据库的字符编码?
我有一个使用默认字符集 SQL_ASCII 设置的数据库。我想将其切换为 UNICODE。有没有简单的方法可以做到这一点?
I have a database that was set up with the default character set SQL_ASCII. I want to switch it to UNICODE. Is there an easy way to do that?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
首先,丹尼尔的答案是正确、安全的选择。
对于从 SQL_ASCII 更改为其他内容的特定情况,您可以作弊并简单地戳 pg_database 目录来重新分配数据库编码。这假设您已经以预期的编码存储了任何非 ASCII 字符(或者您根本没有使用任何非 ASCII 字符)。
然后你可以这样做:
这不会改变数据库的排序规则,只是改变编码字节转换为字符的方式(所以现在
length('£123')
将返回 4 而不是 5)。如果数据库使用“C”排序规则,则 ASCII 字符串的排序不应发生变化。不过,您可能需要重建任何包含非 ASCII 字符的索引。买者自负。转储和重新加载提供了一种方法来检查数据库内容实际上是否采用您期望的编码,而事实并非如此。如果事实证明数据库中确实存在一些编码错误的数据,那么救援将会很困难。因此,如果可以的话,转储并重新初始化。
First off, Daniel's answer is the correct, safe option.
For the specific case of changing from SQL_ASCII to something else, you can cheat and simply poke the pg_database catalogue to reassign the database encoding. This assumes you've already stored any non-ASCII characters in the expected encoding (or that you simply haven't used any non-ASCII characters).
Then you can do:
This will not change the collation of the database, just how the encoded bytes are converted into characters (so now
length('£123')
will return 4 instead of 5). If the database uses 'C' collation, there should be no change to ordering for ASCII strings. You'll likely need to rebuild any indices containing non-ASCII characters though.Caveat emptor. Dumping and reloading provides a way to check your database content is actually in the encoding you expect, and this doesn't. And if it turns out you did have some wrongly-encoded data in the database, rescuing is going to be difficult. So if you possibly can, dump and reinitialise.
要更改数据库的编码:
确保在这一切过程中客户端编码设置正确。
资料来源: http://archives.postgresql.org/pgsql-novice/2006 -03/msg00210.php
To change the encoding of your database:
Make sure the client encoding is set correctly during all this.
Source: http://archives.postgresql.org/pgsql-novice/2006-03/msg00210.php
转储具有特定编码的数据库并尝试将其恢复到具有不同编码的另一个数据库上可能会导致数据损坏。
必须在将任何数据插入数据库之前设置数据编码。
检查这个:
复制任何其他数据库时,无法更改源数据库的编码和区域设置,因为这可能会导致数据损坏。
并且这个 :
某些区域设置类别在创建数据库时必须固定其值。您可以对不同的数据库使用不同的设置,但是一旦创建了数据库,您就无法再为该数据库更改它们。 LC_COLLATE 和 LC_CTYPE 就是这些类别。 它们会影响索引的排序顺序,因此必须保持固定,否则文本列上的索引将会损坏。(但是您可以使用排序规则来缓解此限制,如第 22.2 节中所述。 ) 这些类别的默认值是在运行 initdb 时确定的,并且在创建新数据库时使用这些值,除非在 CREATE DATABASE 命令中另有指定。
我宁愿重建所有内容从一开始就在您的 debian 操作系统上使用正确的本地编码,如所解释的 此处:
重新配置您的本地设置:
选择您的区域设置(例如瑞士的法语:fr_CH.UTF8)
正确卸载并清理 postgresql:
重新安装 postgresql:
现在将自动使用正确的内容创建任何新数据库编码、LC_TYPE(字符分类)和 LC_COLLATE(字符串排序顺序)。
Dumping a database with a specific encoding and try to restore it on another database with a different encoding could result in data corruption.
Data encoding must be set BEFORE any data is inserted into the database.
Check this :
When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data.
And this :
Some locale categories must have their values fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. LC_COLLATE and LC_CTYPE are these categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns would become corrupt. (But you can alleviate this restriction using collations, as discussed in Section 22.2.) The default values for these categories are determined when initdb is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE command.
I would rather rebuild everything from the begining properly with a correct local encoding on your debian OS as explained here :
Reconfigure your local settings :
Choose your locale (like for instance for french in Switzerland : fr_CH.UTF8)
Uninstall and clean properly postgresql :
Re-install postgresql :
Now any new database will be automatically be created with correct encoding, LC_TYPE (character classification), and LC_COLLATE (string sort order).
一旦一切正常,您可以删除数据库
myDB_wrong_encoding
Once everything runs fine, you can drop the database
myDB_wrong_encoding
Daniel Kutik 的答案是正确的,但通过数据库重命名可以更加安全。
因此,真正安全的方法是:
在紧急情况下,只需将数据库重命名回来
Daniel Kutik's answer is correct, but it can be even more safe, with database renaming.
So, the truly safe way is:
In case of emergency, just rename DBs back
我在 postgres 11 中遇到了同样的问题,我确实使用以下步骤更改了数据库编码,
以更新所有编码列表
SET client_encoding = 'UTF8';
更新 pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='postgres';< /代码>
更新 pg_database 设置编码 = pg_char_to_encoding(' UTF8') 其中 datname = 'dbname' ;
确保在
template0
和template0
中应用update
语句code>template1 和postgres
数据库I had the same issue in postgres 11 and I did change the database encoding using the below steps,
to update all the list of encoding
SET client_encoding = 'UTF8';
UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='postgres';
update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'dbname' ;
make sure to apply the
update
statment intemplate0
andtemplate1
andpostgres
Database