How do I convert text files from ANSI to UTF-8 with Delphi 7?

Posted 2024-07-16 10:41:25

I have written a program in Delphi 7 that searches for *.srt files on a hard drive and lists their paths and names in a memo. Now I need to convert these files from ANSI to UTF-8, but I haven't succeeded.

Comments (6)

肩上的翅膀 2024-07-23 10:41:25

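The UTF8Encode function takes a WideString as its parameter and returns a UTF-8 string.

Sample:

procedure ConvertANSIFileToUTF8File(const AInputFileName, AOutputFileName: TFileName);
var
  Strings: TStrings;
begin
  Strings := TStringList.Create;
  try
    // Load the file as ANSI text.
    Strings.LoadFromFile(AInputFileName);
    // In Delphi 7 the AnsiString is converted to a WideString implicitly
    // (using the system ANSI code page) before UTF8Encode encodes it.
    Strings.Text := UTF8Encode(Strings.Text);
    // The bytes are written as-is, so the output has no BOM.
    Strings.SaveToFile(AOutputFileName);
  finally
    Strings.Free;
  end;
end;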
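
Since the question mentions a memo that already lists the file paths, here is a minimal sketch of applying the same conversion to every listed file. It assumes Delphi 7 (non-Unicode strings) and one full path per line; the unit and procedure names are made up for illustration.

```pascal
unit SrtConvertSketch;

interface

uses
  Classes, SysUtils;

// Converts one file in place from the system ANSI code page to UTF-8 (no BOM).
procedure ConvertAnsiFileToUtf8InPlace(const AFileName: string);

// AFileNames is assumed to hold one full path per line, e.g. Memo1.Lines.
procedure ConvertListedFiles(AFileNames: TStrings);

implementation

procedure ConvertAnsiFileToUtf8InPlace(const AFileName: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(AFileName);
    // Delphi 7 assumption: Lines.Text is an AnsiString, which is widened
    // with the system ANSI code page before UTF8Encode runs.
    Lines.Text := UTF8Encode(Lines.Text);
    Lines.SaveToFile(AFileName);
  finally
    Lines.Free;
  end;
end;

procedure ConvertListedFiles(AFileNames: TStrings);
var
  I: Integer;
begin
  for I := 0 to AFileNames.Count - 1 do
    // Skip empty lines and paths that no longer exist.
    if FileExists(AFileNames[I]) then
      ConvertAnsiFileToUtf8InPlace(AFileNames[I]);
end;

end.
```

Usage would be something like ConvertListedFiles(Memo1.Lines). Note that running it twice on the same file encodes the text twice, so it is probably safer to write to a new file name or keep a backup.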

悲念泪 2024-07-23 10:41:25

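// Note: TEncoding was introduced in Delphi 2009, so this does not compile in Delphi 7.
// Code page 28591 is ISO-8859-1 (Latin-1), so this saves the list as Latin-1, not UTF-8.
var
  Latin1Encoding: TEncoding;
begin
  Latin1Encoding := TEncoding.GetEncoding(28591);
  try
    MyTStringList.SaveToFile('some file.txt', Latin1Encoding);
  finally
    // GetEncoding returns a new instance, which the caller must free.
    Latin1Encoding.Free;
  end;
end;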
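
For comparison, a minimal sketch of the same TEncoding approach aimed at UTF-8 instead of Latin-1. It assumes Delphi 2009 or later, so it does not apply to Delphi 7 itself; the unit and procedure names are hypothetical.

```pascal
unit Utf8SaveSketch;

interface

procedure ConvertAnsiFileToUtf8(const AInputFileName, AOutputFileName: string);

implementation

uses
  SysUtils, Classes;

procedure ConvertAnsiFileToUtf8(const AInputFileName, AOutputFileName: string);
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // TEncoding.Default is the system ANSI code page on Windows.
    Lines.LoadFromFile(AInputFileName, TEncoding.Default);
    // TEncoding.UTF8 is a shared instance, so it must not be freed here.
    Lines.SaveToFile(AOutputFileName, TEncoding.UTF8);
  finally
    Lines.Free;
  end;
end;

end.
```

As far as I know, saving with TEncoding.UTF8 also writes a UTF-8 BOM by default, which may or may not be what the consuming program wants.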

怪我太投入 2024-07-23 10:41:25

Take a look at GpTextStream, which appears to work with Delphi 7. It can read and write Unicode files in older versions of Delphi (and also works with Delphi 2009), and it should help with your conversion.

榆西 2024-07-23 10:41:25

Please read the whole answer before you start coding.

The proper answer to the question, which is not the easy one, basically consists of three steps:

  1. You have to determine the ANSI code page used on your computer. You can do this with the GetACP() function from the Windows API. (Important: retrieve the code page as soon as possible after retrieving the file names, because the user can change it.)
  2. You must convert your ANSI string to Unicode by calling the MultiByteToWideChar() Windows API function with the correct CodePage parameter (retrieved in the previous step). After this step you have a UTF-16 string (in practice a WideString) containing the file name list.
  3. You have to convert the Unicode string to UTF-8 using UTF8Encode() or the WideCharToMultiByte() Windows API function. This returns the UTF-8 string you need.
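
The three steps above can be sketched roughly as follows. This is a minimal illustration, assuming Delphi 7 on Windows; the unit name, the function name AnsiToUtf8 and the local constant Utf8CodePage are made up for the example.

```pascal
unit AnsiToUtf8Sketch;

interface

// Returns S (encoded in the active ANSI code page) re-encoded as UTF-8.
function AnsiToUtf8(const S: AnsiString): AnsiString;

implementation

uses
  Windows;

const
  Utf8CodePage = 65001; // numeric value of CP_UTF8

function AnsiToUtf8(const S: AnsiString): AnsiString;
var
  CodePage: UINT;
  WideLen, Utf8Len: Integer;
  Wide: WideString;
begin
  Result := '';
  if S = '' then
    Exit;

  // Step 1: determine the active ANSI code page.
  CodePage := GetACP;

  // Step 2: ANSI -> UTF-16. The first call only asks for the required length.
  WideLen := MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S), nil, 0);
  if WideLen = 0 then
    Exit;
  SetLength(Wide, WideLen);
  MultiByteToWideChar(CodePage, 0, PAnsiChar(S), Length(S),
    PWideChar(Wide), WideLen);

  // Step 3: UTF-16 -> UTF-8, again asking for the length first.
  Utf8Len := WideCharToMultiByte(Utf8CodePage, 0, PWideChar(Wide), WideLen,
    nil, 0, nil, nil);
  if Utf8Len = 0 then
    Exit;
  SetLength(Result, Utf8Len);
  WideCharToMultiByte(Utf8CodePage, 0, PWideChar(Wide), WideLen,
    PAnsiChar(Result), Utf8Len, nil, nil);
end;

end.
```

Step 3 could equally be Result := UTF8Encode(Wide); the explicit API call is shown only to mirror the list. On failure the sketch simply returns an empty string, which real code would probably want to report.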

However, this solution returns a UTF-8 string built from the input ANSI string. That is probably not the best way to solve your problem, because the file names may already have been corrupted by the time the ANSI functions returned them, so correct file names are not guaranteed.

The proper solution to your problem is considerably more complicated:

If you want to be sure that your file name list is completely clean, you have to make sure it is never converted to ANSI at all. You can do this by explicitly using the "W" versions of the file-handling APIs. In that case, of course, you cannot use TFileStream and other ANSI file-handling objects; you call the Windows API directly instead.
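
As an illustration of the "W" route for the file search itself, here is a minimal, non-recursive sketch built on FindFirstFileW and FindNextFileW. It assumes Delphi 7 on Windows; the unit name, TWideStringArray and FindSrtFilesW are hypothetical names chosen for the example.

```pascal
unit WideFileSearchSketch;

interface

uses
  Windows;

type
  TWideStringArray = array of WideString;

// Returns the full paths of all *.srt files directly inside ADirectory.
// ADirectory is assumed to have no trailing backslash.
function FindSrtFilesW(const ADirectory: WideString): TWideStringArray;

implementation

function FindSrtFilesW(const ADirectory: WideString): TWideStringArray;
var
  FindData: TWin32FindDataW;
  SearchHandle: THandle;
  Mask, FoundName: WideString;
  Count: Integer;
begin
  Result := nil;
  Count := 0;
  Mask := ADirectory + '\*.srt';
  SearchHandle := FindFirstFileW(PWideChar(Mask), FindData);
  if SearchHandle = INVALID_HANDLE_VALUE then
    Exit; // nothing found, or the directory does not exist
  try
    repeat
      // Ignore directories that happen to match the mask.
      if (FindData.dwFileAttributes and FILE_ATTRIBUTE_DIRECTORY) = 0 then
      begin
        FoundName := FindData.cFileName; // stays UTF-16, never goes through ANSI
        SetLength(Result, Count + 1);
        Result[Count] := ADirectory + '\' + FoundName;
        Inc(Count);
      end;
    until not FindNextFileW(SearchHandle, FindData);
  finally
    Windows.FindClose(SearchHandle);
  end;
end;

end.
```

Keep in mind that a standard Delphi 7 TMemo is an ANSI control, so displaying these names there would convert them again; the array is meant to be passed straight to other "W" calls.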

It is not that hard, but if you already have a complex framework built on e.g. TFileStream, it could be a bit of a pain in the @ss. In that case the best solution is to create a TStream descendant that uses the appropriate APIs.
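
One possible shape for such a descendant is sketched below: it opens the file with CreateFileW and lets THandleStream do the reading and writing. It assumes Delphi 7 (where THandleStream takes an Integer handle); TWideFileStream and its parameters are hypothetical, not an existing VCL class.

```pascal
unit WideFileStreamSketch;

interface

uses
  Windows, SysUtils, Classes;

type
  TWideFileStream = class(THandleStream)
  private
    FOpened: Boolean; // True once CreateFileW has succeeded
  public
    // ACreate = True creates or truncates the file; False opens it read-only.
    constructor Create(const AFileName: WideString; ACreate: Boolean);
    destructor Destroy; override;
  end;

implementation

constructor TWideFileStream.Create(const AFileName: WideString; ACreate: Boolean);
var
  Access, Disposition: DWORD;
  FileHandle: THandle;
begin
  if ACreate then
  begin
    Access := GENERIC_READ or GENERIC_WRITE;
    Disposition := CREATE_ALWAYS;
  end
  else
  begin
    Access := GENERIC_READ;
    Disposition := OPEN_EXISTING;
  end;
  // The "W" entry point takes the UTF-16 file name unchanged.
  FileHandle := CreateFileW(PWideChar(AFileName), Access, FILE_SHARE_READ, nil,
    Disposition, FILE_ATTRIBUTE_NORMAL, 0);
  if FileHandle = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  inherited Create(Integer(FileHandle));
  FOpened := True;
end;

destructor TWideFileStream.Destroy;
begin
  // Only close the handle if the constructor got as far as opening it.
  if FOpened then
    CloseHandle(THandle(Handle));
  inherited Destroy;
end;

end.
```

Existing code that works against TStream (for example TStrings.LoadFromStream and SaveToStream) could then be handed a TWideFileStream instead of a TFileStream.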

I hope my answer helps you or anyone who has to deal with the same problem. (I had to not so long ago.)

请叫√我孤独 2024-07-23 10:41:25

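I only did this:

// Strings is assumed to be a TStringList field declared on TForm1.
procedure TForm1.FormCreate(Sender: TObject);
begin
  Strings := TStringList.Create;
end;

procedure TForm1.Button3Click(Sender: TObject);
begin
  // UTF8Encode converts the memo text to UTF-8; SaveToFile writes the bytes as-is.
  Strings.Text := UTF8Encode(Memo1.Text);
  Strings.SaveToFile('new.txt');
end;

Verified with Notepad++: the result is UTF-8 without BOM.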
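
If the program that will read the files expects a BOM, one could prepend the three BOM bytes before saving. A minimal sketch, assuming Delphi 7 (where character literals above #$7F are single bytes); the unit and function names are hypothetical.

```pascal
unit Utf8BomSketch;

interface

// Returns AUtf8 with the UTF-8 BOM (EF BB BF) in front, unless already present.
function WithUtf8Bom(const AUtf8: AnsiString): AnsiString;

implementation

const
  Utf8Bom = #$EF#$BB#$BF;

function WithUtf8Bom(const AUtf8: AnsiString): AnsiString;
begin
  if Copy(AUtf8, 1, 3) = Utf8Bom then
    Result := AUtf8
  else
    Result := Utf8Bom + AUtf8;
end;

end.
```

With the code above this would read Strings.Text := WithUtf8Bom(UTF8Encode(Memo1.Text)); whether a BOM is wanted depends entirely on the consumer of the file.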

橘和柠 2024-07-23 10:41:25

Did you mean ASCII?

UTF-8 is backwards compatible with ASCII, so a file that contains only ASCII characters is already valid UTF-8.
http://en.wikipedia.org/wiki/UTF-8
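
To make that concrete: text needs no conversion at all when every byte is below 128. A minimal sketch of such a check; the unit and function names are made up for illustration.

```pascal
unit AsciiCheckSketch;

interface

// True when S contains only 7-bit ASCII bytes, i.e. it is already valid UTF-8.
function IsPureAscii(const S: AnsiString): Boolean;

implementation

function IsPureAscii(const S: AnsiString): Boolean;
var
  I: Integer;
begin
  Result := True;
  for I := 1 to Length(S) do
    if Ord(S[I]) > 127 then
    begin
      Result := False;
      Exit;
    end;
end;

end.
```

Subtitle files with accented or non-Latin characters will fail this check, and those are the ones that actually need the ANSI to UTF-8 conversion.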
