UTF8 problem on Linux
I have some code that fetches data from a database whose codepage is UTF8. When I run the code on a Linux box, some characters come out as question marks (?), but when I run the same code on a Windows server, all characters appear correctly.
When I do:
$> $LANG
Following is returned
en_SG.UTF-8
en_SG doesn't look correct to me; it should be en_US,
but the latter part of the returned string is UTF-8, which is good. Is there anything else I can look into to fix the character corruption problem?
Comments (2)
Generally, a ? appears when the font you are using has no representation for that Unicode code point. What are you viewing the output in, and what font are you using?
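One way to tell a display problem from real data loss is to look at the code points rather than the rendered text. The sketch below is in Python only for illustration, since the question does not say which language is in use, and `describe_chars` is a hypothetical helper name. If the character really is U+003F QUESTION MARK, it was replaced during a conversion somewhere upstream, and changing fonts will not bring it back.

```python
import unicodedata


def describe_chars(text):
    """Return one 'U+XXXX NAME' string per character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
        for ch in text
    ]


# Assumption: "caf\u00e9" stands in for a value fetched from the database.
intact = "caf\u00e9"

# Simulate a lossy step: forcing the text through ASCII replaces the
# unencodable character with a literal question mark.
damaged = intact.encode("ascii", errors="replace").decode("ascii")

print(describe_chars(intact)[-1])   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
print(describe_chars(damaged)[-1])  # U+003F QUESTION MARK
```

Running the same kind of dump on the value right after it comes back from the database, and again just before it is written out, should show at which step the original code point disappears.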
Can you please provide information about the environment? What programming language are you working with, what library or methods are you using to connect to and pull information from the database, and what library or methods are you using to output the data to a file?
I am assuming that both instances of running your code (on Windows and Linux) are accessing the data from the same physical database.
The culprit I would look for is one of your I/O steps converting the Unicode data to some other (probably ASCII or Latin1) codepage.
It could be that the database itself is converting because the database methods are defaulting to a different encoding. It could be that the database methods are converting the incoming information because the language itself is defaulting to a different codepage. It could be that the output methods are converting.
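To make the kind of conversion described above concrete, here is a minimal Python sketch (Python is an assumption, since the language was not stated). `lossy_roundtrip` and `report_encodings` are hypothetical helper names. The first shows how pushing text through a narrower codec yields question marks, and the second lists the encodings the runtime itself has picked up from the environment, which is worth comparing between the Linux and Windows machines.

```python
import locale
import sys


def lossy_roundtrip(text, codec):
    """Encode text with codec, replacing unencodable characters, then decode."""
    return text.encode(codec, errors="replace").decode(codec)


def report_encodings():
    """Collect the encodings this Python process is currently using."""
    return {
        "stdout": sys.stdout.encoding,
        "filesystem": sys.getfilesystemencoding(),
        "locale_preferred": locale.getpreferredencoding(False),
    }


# Assumption: sample text mixing a Latin-1 character with two CJK characters.
sample = "Z\u00fcrich \u6771\u4eac"

# ASCII cannot represent any of the three non-ASCII characters.
print(lossy_roundtrip(sample, "ascii"))           # Z?rich ??

# Latin-1 keeps the u-umlaut but still loses the CJK characters.
# ascii() is used so the print works even on an ASCII-only terminal.
print(ascii(lossy_roundtrip(sample, "latin-1")))  # 'Z\xfcrich ??'

# UTF-8 round-trips without loss.
print(lossy_roundtrip(sample, "utf-8") == sample)  # True

print(report_encodings())
```

If the corruption pattern matches the ASCII or Latin-1 line, that hints at which codec some layer is defaulting to. The database driver's own connection charset is a separate setting and is not covered by `report_encodings`.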