How can I read a text file without losing odd characters?
I would like to read a text file into an array of strings using System.IO.File.ReadAllLines. However, ReadAllLines strips out some odd characters in the file that I would like to keep, such as chr(187). I've tried some different encoding options, but that doesn't help and I don't see an option for "no encoding."
I can use FileOpen and LineInput to read the file without modification, but this is quite a bit slower. Using FileSystemObject also works properly, but I would rather not use that.
What is the best way to read a text file into an array of strings without modification in .NET?
Comments (3)
There's no such concept as "no encoding". You must find out the right encoding, otherwise you can't possibly interpret the data correctly.
When you say "chr(187)" what Unicode character do you mean?
Some encodings you might want to try:
It sounds like you want to read the raw bytes.
Use File.ReadAllBytes to read them into an array (don't do this for large files), or use a FileStream to read chunks of bytes at a time.
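For illustration, here is a minimal C# sketch of both approaches from this answer. The class name `RawBytes`, the helper names, and the 4096-byte buffer size are assumptions made for the example; only `File.ReadAllBytes` and `FileStream` come from the answer itself.

```csharp
using System;
using System.IO;

static class RawBytes
{
    // Reads the whole file into memory. Suitable for small files only.
    public static byte[] ReadAll(string path)
    {
        return File.ReadAllBytes(path);
    }

    // Streams the file in fixed-size chunks and counts how often a byte value occurs.
    // The buffer size is an arbitrary choice for this sketch.
    public static long CountByte(string path, byte target, int bufferSize = 4096)
    {
        long count = 0;
        var buffer = new byte[bufferSize];
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int read;
            // Read returns 0 at end of file.
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == target) count++;
                }
            }
        }
        return count;
    }
}
```

Calling `RawBytes.CountByte(path, 187)` would show whether the byte that chr(187) refers to is really present in the file, independent of any text decoding.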
The characters that were stripped out were at the beginning of the file. It turns out they were the UTF-8 byte order mark. File.ReadAllLines and File.ReadAllText strip out the byte order mark, while the LineInput and FileSystemObject functions do not.
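The UTF-8 byte order mark is the byte sequence EF BB BF (239, 187, 191 in decimal), which is why chr(187) shows up when the file is read byte by byte. The following C# sketch shows one way to detect the mark and to keep it when reading lines, assuming the file really is UTF-8. The class and method names are hypothetical, and decoding the raw bytes by hand is a substitute technique for this example, not something the original posts describe.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class BomDemo
{
    // Returns true when the file begins with the UTF-8 byte order mark EF BB BF.
    public static bool StartsWithUtf8Bom(string path)
    {
        var head = new byte[3];
        int total = 0;
        using (var stream = File.OpenRead(path))
        {
            // Read may return fewer bytes than requested, so loop until 3 bytes or end of file.
            while (total < head.Length)
            {
                int n = stream.Read(head, total, head.Length - total);
                if (n == 0) break;
                total += n;
            }
        }
        return total == 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF;
    }

    // Decodes the raw bytes as UTF-8 without dropping the byte order mark.
    // If the mark is present, the first line starts with the character U+FEFF.
    // Assumption: the file is UTF-8 and small enough to load into memory.
    public static string[] ReadLinesKeepingBom(string path)
    {
        string text = new UTF8Encoding(false).GetString(File.ReadAllBytes(path));
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines.ToArray();
    }
}
```

Comparing the first element of `BomDemo.ReadLinesKeepingBom(path)` with the first element of `File.ReadAllLines(path)` makes the difference visible: the former keeps the leading U+FEFF, the latter does not.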
If I had explained in the question that the odd characters were at the file beginning, I imagine I would have gotten a quick answer. I'll give Jon Skeet credit for the best answer to the question I posed.