How do I read a text file without losing odd characters?

Posted 2024-08-12 04:38:22 | 272 words | 1 view | 0 comments

I would like to read a text file into an array of strings using System.IO.File.ReadAllLines. However, ReadAllLines strips out some odd characters in the file that I would like to keep, such as chr(187). I've tried some different encoding options, but that doesn't help and I don't see an option for "no encoding."

I can use FileOpen and LineInput to read the file without modification, but this is quite a bit slower. Using FileSystemObject also works properly, but I would rather not use that.

What is the best way to read a text file into an array of strings without modification in .NET?

Comments (3)

謌踐踏愛綪 2024-08-19 04:38:22

There's no such concept as "no encoding". You must find out the right encoding, otherwise you can't possibly interpret the data correctly.

When you say "chr(187)" what Unicode character do you mean?

Some encodings you might want to try:

  • Encoding.Default - the system default encoding
  • Encoding.GetEncoding(28591) - ISO-8859-1 (Latin-1)
  • Encoding.UTF8 - very common in modern files
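
To illustrate why the choice of encoding matters, here is a minimal sketch in Python (not .NET; the thread itself contains no code, so this only shows the byte-level behavior). It decodes the single byte 187 under two of the encodings listed above. The variable names are made up for the example.

```python
# The single byte the question calls "chr(187)", i.e. 0xBB.
raw = bytes([187])

# Under ISO-8859-1 (code page 28591 in the list above), every byte maps
# directly to the Unicode code point with the same number.
as_latin1 = raw.decode("latin-1")

# Under UTF-8, 0xBB on its own is only a continuation byte, so it is not a
# valid character. errors="replace" substitutes U+FFFD instead of raising.
as_utf8 = raw.decode("utf-8", errors="replace")

print(hex(ord(as_latin1)))  # 0xbb
print(hex(ord(as_utf8)))    # 0xfffd
```

The same byte is a real character under one encoding and an invalid sequence under another, which is why a reader has to be told which encoding to use.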

轻许诺言 2024-08-19 04:38:22

It sounds like you want to read the raw bytes.

Use File.ReadAllBytes to read them into a byte array (avoid this for large files, since the whole file is held in memory at once), or use a FileStream to read the bytes one chunk at a time.
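
The two approaches can be sketched as follows. This is a Python illustration of the idea, not the .NET calls named above; `read_all_bytes`, `read_chunks`, and the sample data are hypothetical names chosen for the example.

```python
import os
import tempfile
from typing import Iterator


def read_all_bytes(path: str) -> bytes:
    """Read the whole file into memory, with no decoding applied."""
    with open(path, "rb") as f:
        return f.read()


def read_chunks(path: str, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield the file's bytes in pieces of at most chunk_size bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            yield chunk


# Demo on a temporary file whose first byte is 187 (0xBB).
sample = b"\xbbline one\r\nline two\r\n"
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(sample)

whole = read_all_bytes(demo_path)
pieces = list(read_chunks(demo_path, chunk_size=8))
os.remove(demo_path)

print(whole[0])                    # 187
print(b"".join(pieces) == whole)   # True
```

Because the file is opened in binary mode, no encoding is involved and every byte comes back unchanged. Splitting into lines and decoding then become separate, explicit steps.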

巾帼英雄 2024-08-19 04:38:22

The characters that were stripped out were at the beginning of the file. It turns out they were the UTF-8 byte order mark (BOM), the bytes EF BB BF; the middle byte, 0xBB, is the chr(187) mentioned in the question. File.ReadAllLines and File.ReadAllText strip the byte order mark, while the LineInput and FileSystemObject functions do not.
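
The behavior described here can be reproduced at the byte level. The sketch below is in Python, not .NET, and is only an analogy: Python's `utf-8-sig` codec drops a leading BOM much as the post says File.ReadAllLines does, while the plain `utf-8` codec keeps it as the character U+FEFF. The sample text is made up.

```python
import codecs

# A UTF-8 file that starts with a byte order mark, as in the post above.
data = codecs.BOM_UTF8 + "first\nsecond\n".encode("utf-8")

# The BOM is three bytes; the middle one is 187.
bom_bytes = list(codecs.BOM_UTF8)
print(bom_bytes)  # [239, 187, 191]

# "utf-8-sig" removes a leading BOM while decoding.
stripped = data.decode("utf-8-sig").splitlines()

# Plain "utf-8" keeps the BOM as U+FEFF at the start of the first line.
kept = data.decode("utf-8").splitlines()

print(stripped)  # ['first', 'second']
print(kept)      # ['\ufefffirst', 'second']
```

Whether the BOM counts as part of the text is therefore a decision made by the reader, which explains why different APIs returned different results for the same file.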

If I had explained in the question that the odd characters were at the beginning of the file, I imagine I would have gotten a quick answer. I'll give Jon Skeet credit for the best answer to the question I posed.
