当前位置：文江博客话题详情

如何更改 postgres 数据库的字符编码？

发布于 2024-10-18 23:43:35 字数 68 浏览 2 评论 0原文

我有一个使用默认字符集 SQL_ASCII 设置的数据库。我想将其切换为 UNICODE。有没有简单的方法可以做到这一点？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

半夏半凉 2024-10-25 23:43:35

首先，丹尼尔的答案是正确、安全的选择。

对于从 SQL_ASCII 更改为其他内容的特定情况，您可以作弊并简单地戳 pg_database 目录来重新分配数据库编码。这假设您已经以预期的编码存储了任何非 ASCII 字符（或者您根本没有使用任何非 ASCII 字符）。

然后你可以这样做：

update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'thedb'

这不会改变数据库的排序规则，只是改变编码字节转换为字符的方式（所以现在 length('£123') 将返回 4 而不是 5）。如果数据库使用“C”排序规则，则 ASCII 字符串的排序不应发生变化。不过，您可能需要重建任何包含非 ASCII 字符的索引。

买者自负。转储和重新加载提供了一种方法来检查数据库内容实际上是否采用您期望的编码，而事实并非如此。如果事实证明数据库中确实存在一些编码错误的数据，那么救援将会很困难。因此，如果可以的话，转储并重新初始化。

First off, Daniel's answer is the correct, safe option.

For the specific case of changing from SQL_ASCII to something else, you can cheat and simply poke the pg_database catalogue to reassign the database encoding. This assumes you've already stored any non-ASCII characters in the expected encoding (or that you simply haven't used any non-ASCII characters).

Then you can do:

update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'thedb'

This will not change the collation of the database, just how the encoded bytes are converted into characters (so now length('£123') will return 4 instead of 5). If the database uses 'C' collation, there should be no change to ordering for ASCII strings. You'll likely need to rebuild any indices containing non-ASCII characters though.

Caveat emptor. Dumping and reloading provides a way to check your database content is actually in the encoding you expect, and this doesn't. And if it turns out you did have some wrongly-encoded data in the database, rescuing is going to be difficult. So if you possibly can, dump and reinitialise.

回复收藏 0 原文

乖乖公主 2024-10-25 23:43:35

要更改数据库的编码：

转储数据库
删除数据库，
使用不同的编码创建新数据库
重新加载数据。

确保在这一切过程中客户端编码设置正确。

资料来源： http://archives.postgresql.org/pgsql-novice/2006 -03/msg00210.php

回复收藏 0 原文

银河中√捞星星 2024-10-25 23:43:35

转储具有特定编码的数据库并尝试将其恢复到具有不同编码的另一个数据库上可能会导致数据损坏。
必须在将任何数据插入数据库之前设置数据编码。

检查这个：
复制任何其他数据库时，无法更改源数据库的编码和区域设置，因为这可能会导致数据损坏。

并且这个：
某些区域设置类别在创建数据库时必须固定其值。您可以对不同的数据库使用不同的设置，但是一旦创建了数据库，您就无法再为该数据库更改它们。 LC_COLLATE 和 LC_CTYPE 就是这些类别。 它们会影响索引的排序顺序，因此必须保持固定，否则文本列上的索引将会损坏。（但是您可以使用排序规则来缓解此限制，如第 22.2 节中所述。 ) 这些类别的默认值是在运行 initdb 时确定的，并且在创建新数据库时使用这些值，除非在 CREATE DATABASE 命令中另有指定。

我宁愿重建所有内容从一开始就在您的 debian 操作系统上使用正确的本地编码，如所解释的此处：

su root

重新配置您的本地设置：

dpkg-reconfigure locales

选择您的区域设置（例如瑞士的法语：fr_CH.UTF8）

正确卸载并清理 postgresql：

apt-get --purge remove postgresql\*
rm -r /etc/postgresql/
rm -r /etc/postgresql-common/
rm -r /var/lib/postgresql/
userdel -r postgres
groupdel postgres

重新安装 postgresql：

aptitude install postgresql-9.1 postgresql-contrib-9.1 postgresql-doc-9.1

现在将自动使用正确的内容创建任何新数据库编码、LC_TYPE（字符分类）和 LC_COLLATE（字符串排序顺序）。

Dumping a database with a specific encoding and try to restore it on another database with a different encoding could result in data corruption.
Data encoding must be set BEFORE any data is inserted into the database.

Check this :
When copying any other database, the encoding and locale settings cannot be changed from those of the source database, because that might result in corrupt data.

And this :
Some locale categories must have their values fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. LC_COLLATE and LC_CTYPE are these categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns would become corrupt. (But you can alleviate this restriction using collations, as discussed in Section 22.2.) The default values for these categories are determined when initdb is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE command.

I would rather rebuild everything from the begining properly with a correct local encoding on your debian OS as explained here :

su root

Reconfigure your local settings :

dpkg-reconfigure locales

Choose your locale (like for instance for french in Switzerland : fr_CH.UTF8)

Uninstall and clean properly postgresql :

apt-get --purge remove postgresql\*
rm -r /etc/postgresql/
rm -r /etc/postgresql-common/
rm -r /var/lib/postgresql/
userdel -r postgres
groupdel postgres

Re-install postgresql :

aptitude install postgresql-9.1 postgresql-contrib-9.1 postgresql-doc-9.1

Now any new database will be automatically be created with correct encoding, LC_TYPE (character classification), and LC_COLLATE (string sort order).

回复收藏 0 原文

朕就是辣么酷 2024-10-25 23:43:35

# dump into file
pg_dump myDB > /tmp/myDB.sql

# create an empty db with the right encoding (on older versions the escaped single quotes are needed!)
psql -c 'CREATE DATABASE "tempDB" WITH OWNER = "myself" LC_COLLATE = '\''de_DE.utf8'\'' TEMPLATE template0;'

# import in the new DB
psql -d tempDB -1 -f /tmp/myDB.sql

# rename databases
psql -c 'ALTER DATABASE "myDB" RENAME TO "myDB_wrong_encoding";' 
psql -c 'ALTER DATABASE "tempDB" RENAME TO "myDB";'

# see the result
psql myDB -c "SHOW LC_COLLATE"

一旦一切正常，您可以删除数据库myDB_wrong_encoding

# dump into file
pg_dump myDB > /tmp/myDB.sql

# create an empty db with the right encoding (on older versions the escaped single quotes are needed!)
psql -c 'CREATE DATABASE "tempDB" WITH OWNER = "myself" LC_COLLATE = '\''de_DE.utf8'\'' TEMPLATE template0;'

# import in the new DB
psql -d tempDB -1 -f /tmp/myDB.sql

# rename databases
psql -c 'ALTER DATABASE "myDB" RENAME TO "myDB_wrong_encoding";' 
psql -c 'ALTER DATABASE "tempDB" RENAME TO "myDB";'

# see the result
psql myDB -c "SHOW LC_COLLATE"

Once everything runs fine, you can drop the database myDB_wrong_encoding

回复收藏 0 原文

夜血缘 2024-10-25 23:43:35

Daniel Kutik 的答案是正确的，但通过数据库重命名可以更加安全。

因此，真正安全的方法是：

使用不同的编码和名称创建新数据库
转储数据库
将转储恢复到新数据库
测试应用程序是否使用新数据库正确运行将旧
数据库重命名为有意义的
名称重命名新数据库
再次测试应用程序
删除旧数据库

在紧急情况下，只需将数据库重命名回来

回复收藏 0 原文

吃素的狼 2024-10-25 23:43:35

我在 postgres 11 中遇到了同样的问题，我确实使用以下步骤更改了数据库编码，

以更新所有编码列表

SET client_encoding = 'UTF8';
更新 pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='postgres';< /代码>
更新 pg_database 设置编码 = pg_char_to_encoding(' UTF8') 其中 datname = 'dbname' ;

确保在 template0 和 template0 中应用 update 语句code>template1 和 postgres 数据库

postgres=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='postgres';
UPDATE 1
postgres=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='template0';
UPDATE 1
postgres=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='template1';
UPDATE 1

I had the same issue in postgres 11 and I did change the database encoding using the below steps,

to update all the list of encoding

SET client_encoding = 'UTF8';
UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='postgres';
update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'dbname' ;

make sure to apply the update statment in template0 and template1 and postgres Database

postgres=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='postgres';
UPDATE 1
postgres=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='template0';
UPDATE 1
postgres=# UPDATE pg_database SET datcollate='en_US.UTF-8', datctype='en_US.UTF-8' WHERE datname='template1';
UPDATE 1