Delphi: Does TStringList not understand the BOM?

Posted on 2024-11-16 09:27:58

Does TStringList not understand the BOM?

Tf1 := TFileStream.Create(LIGALOG+'liga.log',fmOpenRead or fmShareDenyNone);

str:=tstringlist.Create;
str.LoadFromStream(tf1);

String1:='FStream '+inttostr(tf1.Size)+'/ String: '+(str.Text);

If the text file is saved as UTF-8 with a BOM, then Str.Count = 0 and Str.Text = ''. Without the BOM everything works.
Is this normal?

Comments (1)

甜心小果奶 2024-11-23 09:27:58

If you're using a version of Delphi prior to 2009, it doesn't support Unicode and the BOM is meaningless to TStringList.

If you're using D2009 or higher (which support Unicode) and you know the encoding ahead of time, you can use the overloaded TStringList.LoadFromStream(Stream: TStream; Encoding: TEncoding). If you don't, the RTL will try to figure it out using TEncoding.GetBufferEncoding. The Delphi XE documentation covers this topic.
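As a minimal sketch of the overload mentioned above, assuming Delphi 2009 or later: the program below builds a small in-memory stream that starts with the UTF-8 BOM and loads it with an explicit TEncoding.UTF8. The Sample bytes and the CountUtf8Lines helper are made up for illustration; a TFileStream could be passed in the same way.

```delphi
program LoadWithKnownEncoding;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Hypothetical sample data: UTF-8 BOM, then "one", CR LF, "two".
  Sample: array[0..10] of Byte =
    ($EF, $BB, $BF, $6F, $6E, $65, $0D, $0A, $74, $77, $6F);

// Loads the rest of Stream as UTF-8 and returns the number of lines read.
function CountUtf8Lines(Stream: TStream): Integer;
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // The encoding is stated explicitly instead of being detected.
    Lines.LoadFromStream(Stream, TEncoding.UTF8);
    Result := Lines.Count;
  finally
    Lines.Free;
  end;
end;

var
  Source: TMemoryStream;
begin
  Source := TMemoryStream.Create;
  try
    Source.WriteBuffer(Sample, SizeOf(Sample));
    Source.Position := 0; // LoadFromStream reads from the current position
    Writeln('Bytes: ', Source.Size, ' / Lines: ', CountUtf8Lines(Source));
  finally
    Source.Free;
  end;
end.
```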

If you don't know ahead of time, and the RTL isn't able to figure it out from the content, you can always read the BOM from the stream yourself, set Stream.Position to just after the BOM, and then load the TStringList from that position with the encoding you determined from that BOM.
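The manual approach described above might look roughly like this, again assuming Delphi 2009 or later. SkipBom is a hypothetical helper, not an RTL routine; it only recognises the UTF-8 and UTF-16 signatures, because those map directly onto encodings that TEncoding provides.

```delphi
program ManualBomDetection;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // Hypothetical sample data: UTF-8 BOM, then "one", CR LF, "two".
  Sample: array[0..10] of Byte =
    ($EF, $BB, $BF, $6F, $6E, $65, $0D, $0A, $74, $77, $6F);

// Reads a BOM at the current position and leaves the stream just after it.
// If no known BOM is found, rewinds and returns Fallback.
// UTF-32 is deliberately not handled in this sketch.
function SkipBom(Stream: TStream; Fallback: TEncoding): TEncoding;
var
  Start: Int64;
  Bom: array[0..2] of Byte;
  Count: Integer;
begin
  Start := Stream.Position;
  FillChar(Bom, SizeOf(Bom), 0);
  Count := Stream.Read(Bom, SizeOf(Bom));
  Stream.Position := Start;
  Result := Fallback;
  if (Count >= 3) and (Bom[0] = $EF) and (Bom[1] = $BB) and (Bom[2] = $BF) then
  begin
    Result := TEncoding.UTF8;
    Stream.Position := Start + 3;
  end
  else if (Count >= 2) and (Bom[0] = $FF) and (Bom[1] = $FE) then
  begin
    Result := TEncoding.Unicode; // UTF-16 little-endian
    Stream.Position := Start + 2;
  end
  else if (Count >= 2) and (Bom[0] = $FE) and (Bom[1] = $FF) then
  begin
    Result := TEncoding.BigEndianUnicode; // UTF-16 big-endian
    Stream.Position := Start + 2;
  end;
end;

var
  Source: TMemoryStream;
  Lines: TStringList;
  Detected: TEncoding;
begin
  Source := TMemoryStream.Create;
  try
    Source.WriteBuffer(Sample, SizeOf(Sample));
    Source.Position := 0;
    Lines := TStringList.Create;
    try
      Detected := SkipBom(Source, TEncoding.Default);
      // Loads from the position SkipBom left behind.
      Lines.LoadFromStream(Source, Detected);
      Writeln('Lines: ', Lines.Count);
    finally
      Lines.Free;
    end;
  finally
    Source.Free;
  end;
end.
```

The fallback of TEncoding.Default (the system ANSI code page) is only one possible choice; a file known to be UTF-8 without a BOM would call for TEncoding.UTF8 instead.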

Also, creating a TFileStream simply to then load into a TStringList is a waste; TStringList.LoadFromFile will handle the file itself, and is a lot less code if that's all you're going to do with the TStream.
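For comparison, a sketch of the LoadFromFile route, assuming Delphi 2009 or later. The file name is a placeholder and CountLinesInFile is a hypothetical helper. There is also a LoadFromFile overload that takes a TEncoding if you prefer to state the encoding explicitly.

```delphi
program LoadFromFileDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Loads FileName into a TStringList and returns the number of lines.
function CountLinesInFile(const FileName: string): Integer;
var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    // No stream to create or free; the list opens and closes the file itself.
    Lines.LoadFromFile(FileName);
    Result := Lines.Count;
  finally
    Lines.Free;
  end;
end;

var
  FileName: string;
begin
  FileName := 'liga.log'; // placeholder path, adjust as needed
  if FileExists(FileName) then
    Writeln('Lines: ', CountLinesInFile(FileName))
  else
    Writeln('File not found: ', FileName);
end.
```

One thing to check before switching: the question opens the file with fmShareDenyNone, and as far as I know LoadFromFile opens it with a stricter share mode, so for a log that another process is still writing the explicit TFileStream may still be the right tool.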

EDIT: After your comment, I thought I'd include a list of the BOMs I'm familiar with - there may be more I'm not aware of:

$00 $00 $FE $FF  UTF-32, big-endian (bytes must be swapped for Windows)
$FF $FE $00 $00  UTF-32, little-endian
$FF $FE          UTF-16, little-endian (2-byte code units)
$FE $FF          UTF-16, big-endian (2-byte code units)
$EF $BB $BF      UTF-8 (must be decoded before using Unicode data)
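A sketch of how these signatures could be checked in code. StartsWith and DescribeBom are hypothetical helpers. The order of the tests matters: the UTF-32 little-endian signature ($FF $FE $00 $00) begins with the UTF-16 little-endian one ($FF $FE), so the 4-byte signatures are tested first.

```delphi
program BomTable;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// True when Data begins with the bytes in Sig.
function StartsWith(const Data: TBytes; const Sig: array of Byte): Boolean;
var
  I: Integer;
begin
  Result := Length(Data) >= Length(Sig);
  if Result then
    for I := 0 to High(Sig) do
      if Data[I] <> Sig[I] then
      begin
        Result := False;
        Exit;
      end;
end;

// Names the BOM at the start of Data, or 'none' if no signature matches.
// Caveat: $FF $FE $00 $00 could also be UTF-16 LE text that starts with
// a NUL character; this sketch treats it as UTF-32 LE.
function DescribeBom(const Data: TBytes): string;
begin
  if StartsWith(Data, [$00, $00, $FE, $FF]) then
    Result := 'UTF-32 BE'
  else if StartsWith(Data, [$FF, $FE, $00, $00]) then
    Result := 'UTF-32 LE'
  else if StartsWith(Data, [$EF, $BB, $BF]) then
    Result := 'UTF-8'
  else if StartsWith(Data, [$FF, $FE]) then
    Result := 'UTF-16 LE'
  else if StartsWith(Data, [$FE, $FF]) then
    Result := 'UTF-16 BE'
  else
    Result := 'none';
end;

begin
  Writeln(DescribeBom(TBytes.Create($EF, $BB, $BF, $41)));
  Writeln(DescribeBom(TBytes.Create($FF, $FE, $00, $00)));
  Writeln(DescribeBom(TBytes.Create($41, $42)));
end.
```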

(For future reference: You should indicate in either the tags or the text of your question which version of Delphi you're using, as there are differences in the VCL and RTL between them. When it comes to things like Unicode/BOM type questions, these differences are extremely important.)
