C# - Detect the encoding of a file and write changes back using the found encoding
I wrote a small program that iterates through a lot of files and applies some changes where a certain string match is found. The problem I have is that different files have different encodings, so what I would like to do is check the encoding and then overwrite the file in its original encoding.
What would be the prettiest way of doing that in C# .NET 2.0?
My code looks very simple as of now:
String f1 = File.ReadAllText(fileList[i]).ToLower();
if (f1.Contains(oPath))
{
    f1 = f1.Replace(oPath, nPath);
    File.WriteAllText(fileList[i], f1, Encoding.Unicode);
}
I took a look at Auto encoding detect in C# which made me realize how I could detect encoding, but I am not sure how I could use that information to write in the same encoding.
Would greatly appreciate any help here.
5 Answers
Unfortunately, encoding is one of those subjects where there is not always a definitive answer. In many cases it's much closer to guessing the encoding than detecting it. Raymond Chen did an excellent blog post on this subject that is worth the read.
The gist of the article is that reliable detection is often impossible, and the best you can do is make an educated guess.
However, I still think the best approach is the one Darin mentioned in the question you linked: let StreamReader guess for you rather than re-inventing the wheel. It only requires a very slight modification to your sample.
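A minimal sketch of that modification, reusing fileList, oPath and nPath from the question (the ANSI fallback via Encoding.Default for files without a BOM is an assumption on my part, since StreamReader can only guess):

using System.IO;
using System.Text;

string f1;
Encoding detected;

// Let StreamReader sniff the byte order mark and remember what it decided on.
// Encoding.Default (ANSI) is only an assumed fallback for BOM-less files.
using (StreamReader reader = new StreamReader(fileList[i], Encoding.Default, true))
{
    f1 = reader.ReadToEnd().ToLower();
    detected = reader.CurrentEncoding;
}

if (f1.Contains(oPath))
{
    f1 = f1.Replace(oPath, nPath);
    // Write back with the detected encoding instead of hard-coding Encoding.Unicode.
    File.WriteAllText(fileList[i], f1, detected);
}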
By default, .NET uses UTF-8. It is hard to detect the character encoding because most of the time .NET will read the file as UTF-8; I always have problems with ANSI.
My trick is to read the file as a stream, force it to be decoded as UTF-8, and look for the usual characters that should appear in the text. If they are found it is UTF-8, otherwise ANSI, and I tell the user they can only use two encodings, either ANSI or UTF-8. Auto-detection does not work very well for my language :p
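One way to implement that kind of check (a sketch, not the author's actual code) is to attempt a strict UTF-8 decode of the raw bytes and fall back to ANSI when it fails; the helper name GuessUtf8OrAnsi is made up for illustration:

using System.IO;
using System.Text;

// Hypothetical helper: decide between UTF-8 and ANSI by trying a strict UTF-8 decode.
static Encoding GuessUtf8OrAnsi(string path)
{
    byte[] raw = File.ReadAllBytes(path);
    try
    {
        // throwOnInvalidBytes = true makes the decoder reject malformed UTF-8.
        new UTF8Encoding(false, true).GetString(raw);
        return Encoding.UTF8;
    }
    catch (DecoderFallbackException)
    {
        return Encoding.Default; // ANSI code page of the current system
    }
}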
I am afraid you will have to know the encoding. For UTF-based encodings, though, you can use StreamReader's built-in functionality. Taken from here.
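The code sample the answer refers to is not reproduced here; a sketch of the same idea, BOM-based detection through StreamReader (with UTF-8 as an assumed fallback when there is no BOM), might look like this:

using System.IO;
using System.Text;

// Sketch: let StreamReader detect UTF encodings from the byte order mark.
// Files without a BOM simply keep the fallback passed in here (UTF-8), which is
// an assumption - StreamReader cannot identify ANSI code pages on its own.
static Encoding DetectUtfEncoding(string path)
{
    using (StreamReader reader = new StreamReader(path, Encoding.UTF8, true))
    {
        reader.Peek(); // force the reader to inspect the BOM
        return reader.CurrentEncoding;
    }
}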
Probably a bit late, but I encountered the same problem myself. Using the previous answers I found a solution that works for me: it reads in the text using StreamReader's default encoding, extracts the encoding used on that file, and uses a StreamWriter to write it back with the changes, using the found encoding. It also removes and re-adds the ReadOnly flag.
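A sketch of that flow (not the original answer's exact code), assuming the oPath/nPath replacement from the question is the change being written back:

using System.IO;
using System.Text;

static void ReplaceInFile(string path, string oPath, string nPath)
{
    string text;
    Encoding encoding;

    // Read with StreamReader's default BOM detection and remember the result.
    using (StreamReader reader = new StreamReader(path, true))
    {
        text = reader.ReadToEnd();
        encoding = reader.CurrentEncoding;
    }

    if (!text.Contains(oPath))
        return;

    // Temporarily clear the ReadOnly flag so the file can be overwritten.
    FileAttributes attributes = File.GetAttributes(path);
    bool wasReadOnly = (attributes & FileAttributes.ReadOnly) == FileAttributes.ReadOnly;
    if (wasReadOnly)
        File.SetAttributes(path, attributes & ~FileAttributes.ReadOnly);

    // Write the changed text back with the encoding that was found.
    using (StreamWriter writer = new StreamWriter(path, false, encoding))
    {
        writer.Write(text.Replace(oPath, nPath));
    }

    // Restore the ReadOnly flag afterwards.
    if (wasReadOnly)
        File.SetAttributes(path, attributes);
}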
The solution for all Germans => ÄÖÜäöüß
This function opens the file and determines the encoding by the BOM.
If the BOM is missing, the file will be interpreted as ANSI, but if it contains UTF-8 encoded German umlauts, it will be detected as UTF-8.
https://stackoverflow.com/a/69312696/9134997
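The linked answer contains the author's full function; a rough sketch of the same idea (not the original code) is shown below, checking the BOM first and then scanning for the two-byte UTF-8 sequences of the German umlauts when no BOM is present:

using System.IO;
using System.Text;

// Sketch: BOM check first, then a heuristic scan for UTF-8 encoded umlauts.
static Encoding DetectGermanFileEncoding(string path)
{
    byte[] bytes = File.ReadAllBytes(path);

    // UTF-8 BOM: EF BB BF / UTF-16 LE BOM: FF FE / UTF-16 BE BOM: FE FF
    if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
        return Encoding.UTF8;
    if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
        return Encoding.Unicode;
    if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
        return Encoding.BigEndianUnicode;

    // No BOM: look for the UTF-8 byte pairs of ÄÖÜäöüß (C3 84, C3 96, C3 9C, ...).
    // If any are present, treat the file as UTF-8 without BOM; otherwise assume ANSI.
    for (int i = 0; i < bytes.Length - 1; i++)
    {
        if (bytes[i] == 0xC3 &&
            (bytes[i + 1] == 0x84 || bytes[i + 1] == 0x96 || bytes[i + 1] == 0x9C ||
             bytes[i + 1] == 0xA4 || bytes[i + 1] == 0xB6 || bytes[i + 1] == 0xBC ||
             bytes[i + 1] == 0x9F))
            return new UTF8Encoding(false);
    }

    return Encoding.Default; // ANSI code page
}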