How to write a file in UTF-8 format?

Posted 2024-10-14 15:52:41 · 496 words · 8 views · 0 comments

I have a bunch of files that are not in UTF-8 encoding, and I'm converting a site to UTF-8.

I'm using a simple script on the files I want to save as UTF-8, but the files are still saved in the old encoding:

header('Content-type: text/html; charset=utf-8');
mb_internal_encoding('UTF-8');
$fpath = "folder";
$d = dir($fpath);
while (False !== ($a = $d->read()))
{
    if ($a != '.' and $a != '..')
    {
        $npath = $fpath . '/' . $a;

        $data = file_get_contents($npath);

        file_put_contents('tempfolder/' . $a, $data);
    }
}

How can I save files in UTF-8 encoding?

Comments (10)

囍孤女 2024-10-21 15:52:41

Add BOM: UTF-8

file_put_contents($myFile, "\xEF\xBB\xBF".  $content); 
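
A minimal sketch of the same idea, written so the BOM is not added twice when the content already starts with one. The helper name writeWithBom is hypothetical. Keep in mind that a BOM only marks the file; it does not convert the bytes themselves to UTF-8.

```php
<?php
// Hypothetical helper: writes $content to $path with a UTF-8 BOM,
// without duplicating a BOM that is already present.
// Note: this only marks the file; it does not re-encode $content.
function writeWithBom(string $path, string $content): bool
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($content, $bom, 3) !== 0) {
        $content = $bom . $content;
    }
    return file_put_contents($path, $content) !== false;
}
```

Usage would look like writeWithBom($myFile, $content); where $content is assumed to be UTF-8 already.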
尛丟丟 2024-10-21 15:52:41

file_get_contents() and file_put_contents() will not magically convert encoding.

You have to convert the string explicitly; for example with iconv() or mb_convert_encoding().

Try this:

$data = file_get_contents($npath);
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
file_put_contents('tempfolder/' . $a, $data);

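Alternatively, use PHP's stream filters (the filter name is convert.iconv.<input-encoding>/<output-encoding>):

$fd = fopen($file, 'r');
// Read filter: decode from the old encoding and emit UTF-8
stream_filter_append($fd, 'convert.iconv.OLD-ENCODING/UTF-8', STREAM_FILTER_READ);
stream_copy_to_stream($fd, fopen($output, 'w'));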
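
Applied to the loop from the question, this could look roughly like the sketch below. The function name convertDirToUtf8 is hypothetical, and $fromEncoding is an assumption you have to supply yourself (for example 'Windows-1252'); nothing here detects it. Files that already validate as UTF-8 are copied unchanged, so a second run does not double-encode them.

```php
<?php
// Hypothetical helper: converts every regular file in $srcDir from
// $fromEncoding to UTF-8 and writes the result into $dstDir.
// Returns the number of files written.
function convertDirToUtf8(string $srcDir, string $dstDir, string $fromEncoding): int
{
    $written = 0;
    foreach (scandir($srcDir) as $name) {
        $src = $srcDir . '/' . $name;
        if (!is_file($src)) {
            continue; // skips '.', '..' and subdirectories
        }
        $data = file_get_contents($src);
        // Only convert bytes that are not valid UTF-8 yet. Plain ASCII
        // passes this check and is copied as-is.
        if (!mb_check_encoding($data, 'UTF-8')) {
            $data = mb_convert_encoding($data, 'UTF-8', $fromEncoding);
        }
        file_put_contents($dstDir . '/' . $name, $data);
        $written++;
    }
    return $written;
}
```

With the names from the question, the call would be convertDirToUtf8('folder', 'tempfolder', 'OLD-ENCODING'); with the placeholder replaced by the real source encoding.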
莫相离 2024-10-21 15:52:41
<?php
    function writeUTF8File($filename, $content) {
        $f = fopen($filename, "w");
        # Now UTF-8 - Add byte order mark
        fwrite($f, pack("CCC", 0xef, 0xbb, 0xbf));
        fwrite($f, $content);
        fclose($f);
    }
?>
巴黎盛开的樱花 2024-10-21 15:52:41

Iconv to the rescue.

百善笑为先 2024-10-21 15:52:41

On Unix/Linux, a simple shell command can be used instead to convert all files in a given directory:

recode L1..UTF8 dir/*

It could be started via PHP's exec() as well.
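
A rough sketch of starting it from PHP. It assumes the recode program is installed and that the files really are Latin-1 (L1); the helper names buildRecodeCommand and recodeFiles are hypothetical. Note that recode rewrites the files in place, so work on a copy first.

```php
<?php
// Hypothetical helper: builds the recode command line for a list of files.
// Each file name is shell-escaped so spaces and quotes are safe.
function buildRecodeCommand(array $files): string
{
    return 'recode L1..UTF8 ' . implode(' ', array_map('escapeshellarg', $files));
}

// Runs the command; returns true when recode exits with status 0.
function recodeFiles(array $files): bool
{
    if ($files === []) {
        return true; // nothing to do
    }
    exec(buildRecodeCommand($files), $output, $status);
    return $status === 0;
}
```

For the directory in the example above, this could be called as recodeFiles(glob('dir/*'));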

驱逐舰岛风号 2024-10-21 15:52:41
//add BOM to fix UTF-8 in Excel
fputs($fp, $bom =( chr(0xEF) . chr(0xBB) . chr(0xBF) ));

I got this line from Cool

独﹏钓一江月 2024-10-21 15:52:41

If you want to use recode recursively, and filter for type, try this:

find . -name "*.html" -exec recode L1..UTF8 {} \;
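
If recode is not available, the same recursive walk can be sketched in plain PHP. This is not what the command above does internally, just a comparable approach: the function name convertTreeToUtf8 is hypothetical, the source encoding is an assumption you pass in, and files are rewritten in place.

```php
<?php
// Hypothetical helper: walks $root recursively and converts every file
// with the given extension from $fromEncoding to UTF-8, in place.
// Returns the paths of the files that were rewritten.
function convertTreeToUtf8(string $root, string $extension, string $fromEncoding): array
{
    $changed = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if (!$file->isFile() || $file->getExtension() !== $extension) {
            continue;
        }
        $path = $file->getPathname();
        $data = file_get_contents($path);
        if (mb_check_encoding($data, 'UTF-8')) {
            continue; // already valid UTF-8 (or plain ASCII): leave untouched
        }
        file_put_contents($path, mb_convert_encoding($data, 'UTF-8', $fromEncoding));
        $changed[] = $path;
    }
    return $changed;
}
```

The equivalent of the find command would be convertTreeToUtf8('.', 'html', 'ISO-8859-1');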
嗼ふ静 2024-10-21 15:52:41

I put it all together and got an easy way to convert ANSI text files to "UTF-8 No Mark" (UTF-8 without a BOM):

function filesToUTF8($searchdir, $convdir, $filetypes) {
  $get_files = glob($searchdir . '*{' . $filetypes . '}', GLOB_BRACE);
  foreach($get_files as $file) {
    $expl_path = explode('/', $file);
    $filename = end($expl_path);
    $get_file_content = file_get_contents($file);
    $new_file_content = iconv(mb_detect_encoding($get_file_content, mb_detect_order(), true), "UTF-8", $get_file_content);
    $put_new_file = file_put_contents($convdir.$filename, $new_file_content);
  }
}

Usage:

filesToUTF8('C:/Temp/', 'C:/Temp/conv_files/', 'php,txt');
爱要勇敢去追 2024-10-21 15:52:41

This is quite a useful question. I think my solution, on Windows 10 with PHP 7, is useful for people who still have trouble with UTF-8 conversion.

Here are my steps. The PHP script that calls the following function, here utfsave.php, must itself be UTF-8 encoded; this is easily done by converting it in UltraEdit.

In utfsave.php we define a function that calls PHP's fopen($filename, "wb"), i.e. the file is opened in w (write) mode and, importantly, in b (binary) mode.

<?php
//
//  UTF-8 encoding:
//
// fnc001: save string as a file in UTF-8:
// The resulting file is UTF-8 only if $strContent is,
// with French accents, Chinese ideograms, etc.
//
function entSaveAsUtf8($strContent, $filename) {
  $fp = fopen($filename, "wb");
  fwrite($fp, $strContent);
  fclose($fp);
  return True;
}

//
// 0. write a UTF-8 string on the fly into a UTF-8 file:
//
$strContent = "My string contains UTF-8 chars ie 鱼肉酒菜 for un été en France";

$filename = "utf8text.txt";

entSaveAsUtf8($strContent, $filename);


//
// 2. convert CP936 ANSI/OEM - Chinese simplified GBK file into UTF-8 file
//
//   CP936: <https://en.wikipedia.org/wiki/Code_page_936_(Microsoft_Windows)>
//   GBK:   <https://en.wikipedia.org/wiki/GBK_(character_encoding)> 
//
$strContent = file_get_contents("cp936gbktext.txt");
$strContent = mb_convert_encoding($strContent, "UTF-8", "CP936");


$filename = "utf8text2.txt";

entSaveAsUtf8($strContent, $filename);

?>

The content of source file cp936gbktext.txt:

>>Get-Content cp936gbktext.txt
My string contains UTF-8 chars ie 鱼肉酒菜 for un été en France 936 (ANSI/OEM - chinois simplifié GBK)

Running utfsave.php on Windows 10 with PHP creates utf8text.txt and utf8text2.txt, both saved in UTF-8.

With this method, no BOM is required. The BOM solution is bad because it causes trouble when, for example, sourcing an SQL file into MySQL.

It's worth noting that I could not get file_put_contents($filename, utf8_encode($mystring)); to work for this purpose.
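
One possible reason (an assumption; the answer does not say what went wrong) is that utf8_encode() always treats its input as ISO-8859-1. It therefore mangles CP936 text and double-encodes text that is already UTF-8. The sketch below reproduces that behaviour with mb_convert_encoding(), since utf8_encode() is deprecated as of PHP 8.2; the helper name latin1ToUtf8 is hypothetical.

```php
<?php
// Does what utf8_encode() does (ISO-8859-1 -> UTF-8), to show the effect
// on input that is already UTF-8.
function latin1ToUtf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'ISO-8859-1');
}

$alreadyUtf8 = "\xC3\xA9";                 // "é" encoded as UTF-8
$doubleEncoded = latin1ToUtf8($alreadyUtf8);
echo bin2hex($doubleEncoded), "\n";        // c383c2a9, which displays as "Ã©"
```

Passing the real source encoding to mb_convert_encoding(), as in the script above, avoids this.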

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

If you don't know the encoding of the source file, you can list encodings with PHP:

print_r(mb_list_encodings());

This gives a list like this:

Array
(
  [0] => pass
  [1] => wchar
  [2] => byte2be
  [3] => byte2le
  [4] => byte4be
  [5] => byte4le
  [6] => BASE64
  [7] => UUENCODE
  [8] => HTML-ENTITIES
  [9] => Quoted-Printable
  [10] => 7bit
  [11] => 8bit
  [12] => UCS-4
  [13] => UCS-4BE
  [14] => UCS-4LE
  [15] => UCS-2
  [16] => UCS-2BE
  [17] => UCS-2LE
  [18] => UTF-32
  [19] => UTF-32BE
  [20] => UTF-32LE
  [21] => UTF-16
  [22] => UTF-16BE
  [23] => UTF-16LE
  [24] => UTF-8
  [25] => UTF-7
  [26] => UTF7-IMAP
  [27] => ASCII
  [28] => EUC-JP
  [29] => SJIS
  [30] => eucJP-win
  [31] => EUC-JP-2004
  [32] => SJIS-win
  [33] => SJIS-Mobile#DOCOMO
  [34] => SJIS-Mobile#KDDI
  [35] => SJIS-Mobile#SOFTBANK
  [36] => SJIS-mac
  [37] => SJIS-2004
  [38] => UTF-8-Mobile#DOCOMO
  [39] => UTF-8-Mobile#KDDI-A
  [40] => UTF-8-Mobile#KDDI-B
  [41] => UTF-8-Mobile#SOFTBANK
  [42] => CP932
  [43] => CP51932
  [44] => JIS
  [45] => ISO-2022-JP
  [46] => ISO-2022-JP-MS
  [47] => GB18030
  [48] => Windows-1252
  [49] => Windows-1254
  [50] => ISO-8859-1
  [51] => ISO-8859-2
  [52] => ISO-8859-3
  [53] => ISO-8859-4
  [54] => ISO-8859-5
  [55] => ISO-8859-6
  [56] => ISO-8859-7
  [57] => ISO-8859-8
  [58] => ISO-8859-9
  [59] => ISO-8859-10
  [60] => ISO-8859-13
  [61] => ISO-8859-14
  [62] => ISO-8859-15
  [63] => ISO-8859-16
  [64] => EUC-CN
  [65] => CP936
  [66] => HZ
  [67] => EUC-TW
  [68] => BIG-5
  [69] => CP950
  [70] => EUC-KR
  [71] => UHC
  [72] => ISO-2022-KR
  [73] => Windows-1251
  [74] => CP866
  [75] => KOI8-R
  [76] => KOI8-U
  [77] => ArmSCII-8
  [78] => CP850
  [79] => JIS-ms
  [80] => ISO-2022-JP-2004
  [81] => ISO-2022-JP-MOBILE#KDDI
  [82] => CP50220
  [83] => CP50220raw
  [84] => CP50221
  [85] => CP50222
)

If you cannot guess, try them one by one, because mb_detect_encoding() cannot do the job easily.
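
Trying them one by one can be semi-automated. The sketch below decodes the same bytes with each candidate and returns the results so you can see which one reads correctly. The function name previewDecodings is hypothetical, and the candidate names are assumed to come from mb_list_encodings(), since an unknown name raises an error in PHP 8.

```php
<?php
// Hypothetical helper: decodes $bytes with each candidate encoding and
// returns the UTF-8 results keyed by encoding name.
// Candidates in which the bytes are not valid are left out.
function previewDecodings(string $bytes, array $candidates): array
{
    $previews = [];
    foreach ($candidates as $encoding) {
        if (!mb_check_encoding($bytes, $encoding)) {
            continue;
        }
        $previews[$encoding] = mb_convert_encoding($bytes, 'UTF-8', $encoding);
    }
    return $previews;
}
```

For example: print_r(previewDecodings(file_get_contents('cp936gbktext.txt'), ['CP936', 'BIG-5', 'Windows-1252'])); and then pick the entry whose text looks right. Single-byte encodings accept almost any input, so a human still has to make the final call.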

嘴硬脾气大 2024-10-21 15:52:41
  1. Open your files in Windows Notepad
  2. Change the encoding to UTF-8
  3. Save your file
  4. Try again! :O)