Migrating web pages from different charsets to UTF-8

Posted on 2024-10-14 12:07:47

For the last few years I have used Notepad++ on Win XP SP2.
As I have just seen, Notepad++ is set to encode new files as "ANSI" in "Windows Format". So basically all files on my hard disk should be ANSI files, but I'm not sure.
Most .html files have a charset tag of "text/html; charset=iso-8859-1", but some have none.
For other files, especially text files (for example keyword lists) that I stored with the Firefox XPCOM system, I don't know how they are currently encoded.

On the server side I have Apache with PHP and MySQL.
For uploading I used FileZilla.

Now the problem is: I want to use Japanese characters (or Arabic, etc.). This only partly works.
I can get my self-made Firefox application to consistently write and read UTF-8. But I can't check every time which encoding each of the old files uses.

Having just read Joel Spolsky's old article about UTF-8, I am more convinced than before that I simply have to switch as much of my system as possible to UTF-8.
Once I have it running that way locally on my hard disk, I could just re-upload everything to the server.

So: How do I get all my local files converted to UTF-8?
And: Is it possible at all to have Win XP SP2 consistently use UTF-8 everywhere? Or do I have to check for every program, or even worse for every file, that the right encoding is used?
What about files I receive, for example, by e-mail or on a USB stick, or that I download in zip files? (Or a thousand other possibilities.)

Update:

Steps 1-4 went OK so far. I first tried with a BOM, but without one seems to be better.
Now to 5: I have to change something there too. As in 3, I changed the charset in the HTML template file, and the text coming from the template is displayed correctly. But the text coming from MySQL/PHP currently shows the unknown-character sign in some places, i.e. where there should be umlauts (äöü).
I have changed all collations for text fields in the MySQL database via phpMyAdmin to "utf8_unicode_ci", but that didn't do the trick.
Is it a PHP issue, or do I just have to convert the data in the MySQL database once somehow?

Comments (1)

风吹雪碎 2024-10-21 12:07:47
  1. The beauty of UTF-8 is that it is a superset of ASCII, so if your HTML and PHP files contain only ASCII characters (i.e. unaccented English text and programming/HTML syntax), you don't need to convert them at all. You can leave most of your files unchanged.
  2. Should you find a few exceptions that you want to convert manually, open them in Notepad++ and choose 'Encoding' - 'Convert to UTF-8 (No BOM)'.
  3. Yes, you do need to change or add the <meta> charset tag in all the HTML files (for example, changing "charset=iso-8859-1" to "charset=utf-8") to make sure the browser renders your files as UTF-8.
  4. In Notepad++ you can set new files to always be created as 'UTF-8 (No BOM), Unix'. Also tick "Apply to ANSI files" so that old files are correctly saved in the new encoding. I suggest this format because, even though you are working on a Windows machine, web servers usually run Linux/BSD, so it is their native format (keeping files in the native format is important, especially when you are using a version control system).
  5. Migrating a live site with a database is a different issue. Data in MySQL has its own encoding, and from your question I cannot tell whether you need to convert it, or how. I would need more specifics on that (if you need to).
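
If there turn out to be too many files to convert by hand as in step 2, the same idea can be scripted. The following is only a rough sketch, not a tested migration tool: the names `to_utf8` and `convert_tree`, the list of file suffixes, and the assumption that every non-UTF-8 file is Windows-1252 (what Notepad++ calls "ANSI" on a Western European Windows) are all illustrative guesses about your setup. It leaves files alone when their bytes are already valid UTF-8, which includes pure ASCII files as in step 1, and strips a leading BOM.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def to_utf8(raw: bytes, legacy: str = "cp1252") -> bytes:
    """Return the content re-encoded as UTF-8 without a BOM.

    Assumption: anything that is not valid UTF-8 is in the `legacy`
    encoding. This is a heuristic; a legacy file whose bytes happen to
    form valid UTF-8 would be left untouched.
    """
    if raw.startswith(UTF8_BOM):
        raw = raw[len(UTF8_BOM):]  # drop the BOM, keep the rest
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (pure ASCII ends up here too)
    except UnicodeDecodeError:
        # Raises UnicodeDecodeError if a byte is undefined in `legacy`.
        return raw.decode(legacy).encode("utf-8")


def convert_tree(root, suffixes=(".html", ".php", ".txt")):
    """Convert matching files under `root` in place; return changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            raw = path.read_bytes()
            converted = to_utf8(raw)
            if converted != raw:
                path.write_bytes(converted)
                changed.append(path)
    return changed
```

You would call something like `convert_tree("path/to/local/copy")` on a backup copy of the site first and look at the returned list before uploading anything. The script does not touch the charset tags from step 3 or the line endings from step 4, and it does nothing about the database from step 5.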