Problem displaying Russian letters in the browser even though UTF-8 encoding is set
I am aware that there have been some similar questions. However, after reading the answers and googling the topic, I am still struggling to display Russian letters in the browser. I have them stored in a .csv file (encoded as UTF-8 without BOM). In the PHP file that reads the .csv (also encoded as UTF-8 without BOM), I declared the charset:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
To open and iterate through the .csv file, I am using the following code:
if (($handle = fopen($path, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, $delimiter)) !== FALSE) {
        ...
    }
}
and either nothing is displayed or something like this:
-ам-Зее
instead of
Целль-ам-Зее
Any ideas what else I can try?
UPDATE:
After setting the browser encoding to UTF-8, I get the correct Russian letters. However, some of the text is still not displayed at all. I suspect I am doing something incorrectly while reading the .csv file; the simplified version is:
if (($handle = fopen($path, "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, $delimiter)) !== FALSE) {
        echo $data[1];
    }
}
(I skip the first column and display the content of the second one, which is always filled.)
Answers (2)
Check Your Server Config
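Do you have Apache configured to honor the <meta> charset override? If it is set up to use a default charset such as ISO-8859-1, it sends that charset in the HTTP Content-Type header, and browsers give the header precedence over any <meta> override that appears in the web pages it serves up.
Solution #1 of 3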
For example, you can put this in the .htaccess file of an enclosing directory, and your web pages will then have their <meta> overrides honored:
AddDefaultCharset Off
The Apache documentation states:
Until I turned off AddDefaultCharset, I could not get my <meta> tags to work. It was quite mysterious and frustrating. Once I did, though, everything worked smoothly.
Solution #2 of 3
If you have write access to Apache’s configuration files, then you can change the server itself. However, you have to make sure nothing else relies on the old unoverridable setting. This is another reason to use .htaccess.
When All Else Fails: Solution #3 of 3
If you can neither change the overall server configuration nor create a .htaccess whose settings will be respected for everything underneath it, then your only option is to use numeric entities for all code points over 127. For example, instead of a literal Ц you must use &#1062; or &#x426;.
The advantage of that is that it no longer requires a <meta> override or fiddling with the server or with .htaccess files. The disadvantage is that it takes an extra translation pass, which interferes with directly editing the file in an editor that understands literal UTF-8.
Entities Ignore Encodings
It works because all HTML is always in Unicode, so character number 1062 is always CYRILLIC CAPITAL LETTER TSE, and so on. Entity numbers always represent Unicode code point numbers; they are never the numbers from the document encoding. Only encoded bytes count as being in the server or page encoding, not unencoded code point numbers, which are always Unicode.
That’s why we can use something like &#233; and it always means LATIN SMALL LETTER E WITH ACUTE, because code point 233 is always that character, even if the web page itself is in some other encoding (where the same letter would be byte 142 in MacRoman or 221 in NextStep).
The numbers of characters are always Unicode numbers and pay no attention to the encoding. That’s because markup languages like HTML, XHTML, and XML always use logical Unicode code point numbers, just as programming languages like Perl and Go do. (PHP is really just bytes with some UTF-8 APIs on top of it, but as you have learned yourself, one still has issues with it. This is partly because of its internal model, but also because of web servers and even web clients, all of which make everything more complicated in PHP than in most other languages.)
Even if you had encoded your web page in ISO-8859-5 (the Cyrillic member of the ISO 8859 family), where a literal 0xC6 byte encodes Unicode U+0426, CYRILLIC CAPITAL LETTER TSE, as a character entity you would use &#1062; or &#x426;, and not &#198;, which would be wrong since U+00C6 is LATIN CAPITAL LETTER AE.
Similarly, if you were using the MacCyrillic encoding, the literal 0x96 byte would be a CYRILLIC CAPITAL LETTER TSE, but because the numeric entity is always in Unicode, you must use &#1062; or &#x426;, and not &#150;.
I prefer using only UTF-8 for all web pages. Well, for new ones, that is. I do recognize that legacy non-Unicode pages exist. Those I just leave as is.
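To check which character an entity number actually names, you can decode it with PHP's built-in `html_entity_decode()` into UTF-8. This small sketch assumes PHP 8 and uses `ENT_QUOTES | ENT_HTML5` as the flags (a choice, not a requirement); it only illustrates the number-to-character mapping and does not reproduce how a browser treats a page declared as ISO-8859-5 or MacCyrillic.

```php
<?php
// Decode numeric entities into UTF-8 to confirm which Unicode character each number names.
$flags = ENT_QUOTES | ENT_HTML5;

foreach (['&#1062;', '&#x426;', '&#198;', '&#233;'] as $entity) {
    $char = html_entity_decode($entity, $flags, 'UTF-8');
    // bin2hex() shows the UTF-8 bytes of the decoded character.
    printf("%-8s -> %s (UTF-8 bytes %s)\n", $entity, $char, bin2hex($char));
}
// &#1062; and &#x426; both decode to Ц (bytes d0a6),
// &#198; decodes to Æ (bytes c386), &#233; decodes to é (bytes c3a9).
```

The decimal and hexadecimal forms agree because 0x426 is 1062; the byte that the same letter would occupy in a legacy encoding (0xC6 or 0x96) plays no part.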
You need to set the correct locale on your server.
Then you can check whether your server has accepted the needed locale.
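The commands for these two steps did not survive in this copy of the answer. As a minimal sketch, assuming PHP 8 and that the server has at least one UTF-8 locale installed (the names tried below are common spellings, not guaranteed to exist on any given system), you can switch `LC_CTYPE`, the locale category that `fgetcsv()` takes into account, and then read back what was actually set:

```php
<?php
// setlocale() accepts an array of candidate names, tries each one in turn,
// and returns the name that was set, or false if none is available.
// These locale names are guesses; run `locale -a` on the server to see real ones.
$chosen = setlocale(LC_CTYPE, ['ru_RU.UTF-8', 'ru_RU.utf8', 'en_US.UTF-8', 'C.UTF-8']);

// Passing "0" only queries the current setting without changing it.
$current = setlocale(LC_CTYPE, '0');

if ($chosen === false) {
    echo "No UTF-8 locale was accepted; LC_CTYPE is still: $current\n";
} else {
    echo "LC_CTYPE is now: $current\n";
}
```

Call this before the `fgetcsv()` loop; if `$chosen` is `false`, the server has none of the candidate locales and the explode-based workaround is the fallback.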
The problem is in the fgetcsv function, which is using an incorrect locale. If you have no way to change the locale, you could replace fgetcsv with your own function based on explode.
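A minimal sketch of that explode()-based replacement, using a hypothetical helper name `read_csv_with_explode()` and assuming a simple file format: one record per line, no quoted fields, and no delimiter or line break inside a field. explode() does not understand CSV quoting, so this is not a general CSV parser, but because it splits raw bytes it is not affected by the locale.

```php
<?php
// Hypothetical replacement for fgetcsv(): split each line on $delimiter with explode().
// Limitation: quoted fields and delimiters or line breaks inside fields are NOT handled.
function read_csv_with_explode(string $path, string $delimiter): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $rows = [];
    try {
        while (($line = fgets($handle)) !== false) {
            // Strip the line ending (handles both \n and \r\n).
            $line = rtrim($line, "\r\n");
            if ($line === '') {
                continue; // skip blank lines
            }
            // explode() is byte-based, so UTF-8 text passes through unchanged
            // regardless of the current locale.
            $rows[] = explode($delimiter, $line);
        }
    } finally {
        fclose($handle);
    }
    return $rows;
}

// Usage mirroring the question: print the second column of every row.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "1;Целль-ам-Зее\r\n2;Инсбрук\n");
foreach (read_csv_with_explode($path, ';') as $data) {
    echo $data[1], "\n"; // prints Целль-ам-Зее, then Инсбрук
}
unlink($path);
```

If the real file does contain quoted fields, this sketch will split them incorrectly; in that case fixing the locale is the better route.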