Accents in uploaded file being replaced with '?'
I am building a data import tool for the admin section of a website I am working on. The data is in both French and English, and contains many accented characters. Whenever I attempt to upload a file, parse the data, and store it in my MySQL database, the accents are replaced with '?'.
I have text files containing data (charset is iso-8859-1) which I upload to my server using CodeIgniter's file upload library. I then read the file in PHP.
My code is similar to this:
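// Move the uploaded file into the configured upload directory
$this->upload->do_upload();
$data = array('upload_data' => $this->upload->data());
// Read the stored file line by line (fgets returns the raw bytes, with no charset conversion)
$fileHandle = fopen($data['upload_data']['full_path'], "r");
while (($line = fgets($fileHandle)) !== false) {
    echo $line;
}
fclose($fileHandle);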
This produces lines with accents replaced with '?'. Everything else is correct.
If I download my uploaded file from my server over FTP, the charset is still ISO-8859-1, but a diff reveals that the file has changed. However, if I open the file in TextEdit, it displays properly.
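To confirm at the byte level whether the upload changed the file (rather than only how an editor displays it), one option is to compare checksums of the original and the uploaded copy. This is a minimal sketch using PHP's built-in `sha1_file()`; the helper name `sameBytes` and the paths you pass to it are hypothetical.

```php
<?php
// Hypothetical helper: returns true when both files contain exactly the same bytes.
function sameBytes(string $originalPath, string $uploadedPath): bool
{
    $a = sha1_file($originalPath);
    $b = sha1_file($uploadedPath);
    // sha1_file() returns false if a file cannot be read.
    if ($a === false || $b === false) {
        throw new RuntimeException('Could not read one of the files');
    }
    return $a === $b;
}
```

If this returns false, something in the upload path is rewriting the bytes; if it returns true, the stored file is intact and the '?' is introduced later, when the text is read, echoed, or stored.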
I attempted to use PHP's stream_encoding function to explicitly set my file stream to ISO-8859-1, but my build of PHP does not have it.
After running out of ideas, I tried wrapping my strings in both utf8_encode and utf8_decode. Neither worked.
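One reason wrapping strings in `utf8_encode` can appear not to work: it assumes its input is ISO-8859-1, so it only helps if the bytes really are Latin-1, and it mangles text that is already UTF-8 (é becomes Ã©). The sketch below shows both cases with `mb_convert_encoding()`, which performs the same conversion and is the usual replacement since `utf8_encode` was deprecated in PHP 8.2. It assumes the mbstring extension is installed; `latin1ToUtf8` is a hypothetical name.

```php
<?php
// Hypothetical helper: the same conversion utf8_encode() performs (ISO-8859-1 -> UTF-8).
function latin1ToUtf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "\xE9t\xE9";         // "été" in ISO-8859-1 (é is the single byte 0xE9)
$utf8   = "\xC3\xA9t\xC3\xA9"; // "été" in UTF-8 (é is the two bytes 0xC3 0xA9)

// Correct use: Latin-1 input becomes valid UTF-8.
echo bin2hex(latin1ToUtf8($latin1)), "\n"; // c3a974c3a9
// Wrong use: input that is already UTF-8 gets encoded a second time.
echo bin2hex(latin1ToUtf8($utf8)), "\n";   // c383c2a974c383c2a9
```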
If anyone has any suggestions about things I could try, I would be extremely grateful.
It's important to see whether the corruption happens before or after the query is issued to MySQL. There are too many possible causes here to pinpoint it. Can you output the query you send to MySQL to check this?
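One way to find out whether the text is already broken before it reaches MySQL is to inspect each value just before the query is built: check whether it is valid UTF-8 and look at the raw bytes. A minimal diagnostic sketch, assuming the mbstring extension; `describeBytes` is a hypothetical helper name.

```php
<?php
// Hypothetical diagnostic: summarize a string's encoding state before it is sent to MySQL.
function describeBytes(string $value): string
{
    if (!preg_match('/[\x80-\xFF]/', $value)) {
        // Pure ASCII: identical in UTF-8 and ISO-8859-1. An accent already shown as 3f ('?') was lost earlier.
        return 'ascii: ' . bin2hex($value);
    }
    if (mb_check_encoding($value, 'UTF-8')) {
        return 'utf-8: ' . bin2hex($value);
    }
    // Non-ASCII bytes that are not valid UTF-8, for example ISO-8859-1 accents such as e9.
    return 'not utf-8: ' . bin2hex($value);
}
```

For example, calling `error_log(describeBytes($line));` inside the import loop shows whether an accented value arrives as `e9` (Latin-1), `c3a9` (UTF-8), or `3f` (already replaced by '?').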
Assuming that your query IS properly formed (no corruption at the stage the query is being outputted) there are a couple of things that you should check.
What is the character encoding of the database itself? (collation)
What is the character set of the connection? This may not be set up correctly in your MySQL config, and it can be set manually with the 'SET NAMES' command.
In my own application, I issue 'SET NAMES utf8' as the first query after establishing a connection, because I am unable to change the MySQL config.
See this.
http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html
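If you connect through PDO rather than issuing `SET NAMES` yourself, the connection character set can be given in the DSN (the MySQL PDO driver honors it since PHP 5.3.6). A sketch under those assumptions: the host, database name, and the helper names `mysqlDsn` and `connect` are hypothetical, and `utf8mb4` is used because MySQL's `utf8` is limited to 3-byte characters (either one covers French accents).

```php
<?php
// Hypothetical helper: build a MySQL DSN that fixes the connection character set.
function mysqlDsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbName, $charset);
}

// Hypothetical connection helper; similar in effect to running SET NAMES after connecting.
function connect(string $host, string $dbName, string $user, string $password): PDO
{
    return new PDO(mysqlDsn($host, $dbName), $user, $password, [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}
```

With mysqli the equivalent is `$mysqli->set_charset('utf8mb4')`; in CodeIgniter, the `char_set` and `dbcollat` entries in the database config serve the same purpose.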
Edit: If the issue is not related to MySQL, I'd check the following:
You say the file's charset is ISO-8859-1. How are you sure of this?
What happens if you save the file itself as UTF-8 (without BOM) and try to reprocess it?
What is the encoding of the PHP file that performs the conversion? (What are you using to write your PHP? It may be 'managing' this for you in an undesired way.)
(An aside) Could the files you are processing be handled with fgetcsv instead?
http://php.net/manual/en/function.fgetcsv.php
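Putting these checks together, here is a sketch of an import reader that strips a UTF-8 byte-order mark if present, converts the content from ISO-8859-1 only when it is not already valid UTF-8, and then parses it with `fgetcsv`. It assumes comma-separated data and the mbstring extension; `readCsvAsUtf8` is a hypothetical name, and the validity check is a heuristic, not true encoding detection.

```php
<?php
// Hypothetical import reader: returns CSV rows as arrays of UTF-8 strings.
function readCsvAsUtf8(string $path): array
{
    $content = file_get_contents($path);
    if ($content === false) {
        throw new RuntimeException("Cannot read $path");
    }
    // Strip a UTF-8 byte-order mark (EF BB BF) if the editor added one.
    if (strncmp($content, "\xEF\xBB\xBF", 3) === 0) {
        $content = (string) substr($content, 3);
    }
    // Heuristic: content that is not valid UTF-8 is assumed to be ISO-8859-1.
    if (!mb_check_encoding($content, 'UTF-8')) {
        $content = mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');
    }
    // Parse from an in-memory stream so quoted fields containing newlines still work.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $content);
    rewind($stream);
    $rows = [];
    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv returns [null] for a blank line
        }
        $rows[] = $row;
    }
    fclose($stream);
    return $rows;
}
```

The PHP manual notes that `fgetcsv` takes the locale setting into account, so it is worth testing with your real data (for example, a field that starts with an accented letter).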
Files uploaded to your server should come back unchanged on download. That means the encoding of the file (which is just binary data) should not change. Instead, make sure you can store the file's binary content unchanged.
To achieve that with your database, create a BLOB field. That's the right column type for it. It's just binary data.
Assuming you're using MySQL, see the reference page "The BLOB and TEXT Types" and look for BLOB.
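A sketch of storing an uploaded file's bytes unchanged in a binary column, using a PDO prepared statement with `PDO::PARAM_LOB`. To keep it self-contained it uses an in-memory SQLite database (assumes the pdo_sqlite extension) as a stand-in for MySQL; with MySQL the column would be a `BLOB` type such as `MEDIUMBLOB`. The table name `uploads` and the helpers `storeFile` and `fetchFile` are hypothetical.

```php
<?php
// Hypothetical helper: store a file's raw bytes without any charset conversion.
function storeFile(PDO $db, string $name, string $path): void
{
    $bytes = file_get_contents($path); // read as-is; no decoding happens here
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $stmt = $db->prepare('INSERT INTO uploads (name, content) VALUES (?, ?)');
    $stmt->bindValue(1, $name, PDO::PARAM_STR);
    $stmt->bindValue(2, $bytes, PDO::PARAM_LOB); // bind as binary data
    $stmt->execute();
}

// Hypothetical helper: fetch the stored bytes, or null if no row matches.
function fetchFile(PDO $db, string $name): ?string
{
    $stmt = $db->prepare('SELECT content FROM uploads WHERE name = ?');
    $stmt->execute([$name]);
    $value = $stmt->fetchColumn();
    return $value === false ? null : $value;
}

// Stand-in database: in-memory SQLite instead of MySQL.
$db = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$db->exec('CREATE TABLE uploads (name TEXT PRIMARY KEY, content BLOB)');
```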
The problem is that you are using iso-8859-1 instead of utf-8. In order to encode it in the correct charset, you should use the iconv function, like so:
// Convert from the file's charset (ISO-8859-1) to UTF-8
$output_string = iconv("ISO-8859-1", "UTF-8", $input_string);
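ISO-8859-1 does include the common French accented letters (é, è, à, ç), but each is stored as a single byte that is not valid UTF-8, so software that expects UTF-8 can display or store them incorrectly, for example as '?'.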
It would be so much better if everything were utf-8, as it handles virtually every character known to man.
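Applied to the loop from the question, here is a sketch that converts each line from ISO-8859-1 to UTF-8 with `iconv` as it is read. It assumes the file really is ISO-8859-1 and that the page and the database connection then use UTF-8; `readLatin1Lines` is a hypothetical name.

```php
<?php
// Hypothetical reader: returns each line of an ISO-8859-1 file as UTF-8, without line endings.
function readLatin1Lines(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $lines = [];
    while (($line = fgets($handle)) !== false) {
        // Every byte value is a valid ISO-8859-1 character, so this conversion does not fail.
        $lines[] = iconv('ISO-8859-1', 'UTF-8', rtrim($line, "\r\n"));
    }
    fclose($handle);
    return $lines;
}
```

When echoing the result to a browser, send a matching header first, for example `header('Content-Type: text/html; charset=utf-8');`.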