C# - Detect the encoding of a file and write changes back using the detected encoding

Posted on 2024-10-07 01:41:07

I wrote a small program that iterates through a lot of files and applies some changes wherever a certain string match is found. The problem is that different files have different encodings, so I would like to detect each file's encoding and then overwrite the file in its original encoding.

What would be the prettiest way of doing that in C# .NET 2.0?

My code looks very simple as of now:

String f1 = File.ReadAllText(fileList[i]).ToLower();

if (f1.Contains(oPath))
{
    f1 = f1.Replace(oPath, nPath);
    File.WriteAllText(fileList[i], f1, Encoding.Unicode);
}

I took a look at "Auto encoding detect in C#", which made me realize how I could detect the encoding, but I am not sure how to use that information to write the file back in the same encoding.

Would greatly appreciate any help here.

笑脸一如从前 2024-10-14 01:41:07

Unfortunately, encoding is one of those subjects where there is not always a definitive answer. In many cases it's much closer to guessing the encoding than detecting it. Raymond Chen wrote an excellent blog post on this subject that is worth a read:

http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx

The gist of the article is:

If the BOM (byte order mark) exists, then you're golden.
Otherwise it's guesswork and heuristics.

However, I still think the best approach is the one Darin mentioned in the question you linked: let StreamReader guess for you rather than re-inventing the wheel. It only requires a very slight modification to your sample.

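String f1;
Encoding encoding;
// StreamReader looks for a byte order mark and falls back to UTF-8 if none is found.
using (StreamReader reader = new StreamReader(fileList[i])) {
  // ToLower() is carried over from the question; note that the lowercased text is what gets written back.
  f1 = reader.ReadToEnd().ToLower();
  // CurrentEncoding is only reliable after the first read.
  encoding = reader.CurrentEncoding;
}

if (f1.Contains(oPath))
{
  f1 = f1.Replace(oPath, nPath);
  // Write back using the encoding the reader settled on.
  File.WriteAllText(fileList[i], f1, encoding);
}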
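
For reuse across many files, the same idea could be wrapped in a small helper. This is only a sketch, not part of the original answer: the class name, the method name, and the fallback parameter are hypothetical. It assumes that a file without a BOM should be read with a caller-supplied fallback encoding instead of StreamReader's UTF-8 default, and it leaves the text's case untouched.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingPreservingReplace
{
    // Hypothetical helper. Returns true when the file was rewritten.
    // 'fallback' is used only when the file has no byte order mark.
    public static bool ReplaceInFile(string path, string oldValue, string newValue, Encoding fallback)
    {
        string text;
        Encoding encoding;

        // Third argument = detectEncodingFromByteOrderMarks: a BOM wins over 'fallback'.
        using (StreamReader reader = new StreamReader(path, fallback, true))
        {
            text = reader.ReadToEnd();
            // Read first, then ask: CurrentEncoding may change after the first read.
            encoding = reader.CurrentEncoding;
        }

        if (!text.Contains(oldValue))
        {
            return false;
        }

        // Write back with whatever encoding the reader ended up using.
        File.WriteAllText(path, text.Replace(oldValue, newValue), encoding);
        return true;
    }
}
```

A call might look like EncodingPreservingReplace.ReplaceInFile(fileList[i], oPath, nPath, Encoding.Default). On the .NET Framework, Encoding.Default is the system ANSI code page, so BOM-less ANSI files would round-trip under that assumption; files in any other BOM-less encoding would still be guessed wrong.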

孤千羽 2024-10-14 01:41:07

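By default, .NET uses UTF-8. Detecting the character encoding is hard because most of the time .NET will simply read the file as UTF-8. I always have problems with ANSI.

My trick is to read the file as a Stream, force it to be read as UTF-8, and look for common characters that the text should contain. If they are found, the file is UTF-8; otherwise it is ANSI ... and I tell the user that only two encodings are supported, ANSI or UTF-8. Auto-detection doesn't work very well in my language :p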
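
A rough sketch of that trick, as I understand it, is below. The class name, the method name, and both parameters are hypothetical: expectedChars stands for the non-ASCII characters you expect in your language, and ansi is whatever code page you want to fall back to.

```csharp
using System;
using System.Text;

public static class Utf8OrAnsiGuess
{
    // Hypothetical helper: decides between UTF-8 and one ANSI code page only.
    public static Encoding Guess(byte[] data, string expectedChars, Encoding ansi)
    {
        // Force a UTF-8 decode. Byte sequences that are not valid UTF-8
        // come out as U+FFFD instead of the expected characters.
        string asUtf8 = Encoding.UTF8.GetString(data);

        // If any of the expected characters survived the decode, assume UTF-8.
        if (asUtf8.IndexOfAny(expectedChars.ToCharArray()) >= 0)
        {
            return Encoding.UTF8;
        }

        // Otherwise assume the ANSI code page supplied by the caller.
        return ansi;
    }
}
```

A pure-ASCII file ends up classified as ANSI here, which is harmless for ASCII-compatible code pages. Note also that writing with Encoding.UTF8 emits a BOM; new UTF8Encoding(false) could be used for writing if the original file had none.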

苍风燃霜 2024-10-14 01:41:07

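I'm afraid you have to know the encoding. For UTF-based encodings, though, you can use the functionality built into StreamReader.

Taken from here.

"Regarding encodings: you need to identify the encoding in order to use StreamReader. However, StreamReader itself can help if you create it with one of the constructor overloads that lets you supply the detectEncodingFromByteOrderMarks flag as true (or you can use Encoding.GetPreamble and look at the byte preamble yourself). Both of these methods only help to auto-detect UTF-based encodings, so any ANSI encoding with a specific code page may not be parsed correctly."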
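
Looking at the preamble yourself could be sketched roughly as follows. The class and method names are hypothetical, and the list of candidate encodings is my own assumption; only Encoding.GetPreamble comes from the quote above.

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // Hypothetical helper: returns the encoding whose preamble (BOM) matches
    // the start of 'head', or null when no known BOM is present.
    public static Encoding DetectFromBom(byte[] head)
    {
        // UTF-32 LE must come before UTF-16 LE: its BOM (FF FE 00 00)
        // starts with the UTF-16 LE BOM (FF FE).
        Encoding[] candidates = {
            new UTF32Encoding(false, true),   // UTF-32 LE
            new UTF32Encoding(true, true),    // UTF-32 BE
            Encoding.UTF8,
            Encoding.Unicode,                 // UTF-16 LE
            Encoding.BigEndianUnicode         // UTF-16 BE
        };

        foreach (Encoding candidate in candidates)
        {
            byte[] bom = candidate.GetPreamble();
            if (bom.Length > 0 && StartsWith(head, bom))
            {
                return candidate;
            }
        }
        return null; // No BOM: the caller has to guess or ask.
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length)
        {
            return false;
        }
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

Reading the first four bytes of a file is enough input for this check. As the quote says, a null result covers every ANSI code page as well as BOM-less UTF-8, so a fallback is still needed.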

好多鱼好多余 2024-10-14 01:41:07

Probably a bit late, but I ran into the same problem myself. Using the previous answers, I found a solution that works for me: it reads the text with a StreamReader using the default encoding, picks up the encoding the reader reports for that file, and uses a StreamWriter to write the text back with the changes in that encoding. It also removes and re-adds the ReadOnly flag.

        string file = "File to open";
        string text;
        Encoding encoding;
        string oldValue = "string to be replaced";
        string replacementValue = "New string";

        // Remember the original attributes and clear ReadOnly so the file can be overwritten.
        FileAttributes attributes = File.GetAttributes(file);
        File.SetAttributes(file, attributes & ~FileAttributes.ReadOnly);

        try
        {
            // Encoding.Default is only the fallback; a byte order mark, if present, takes precedence.
            using (StreamReader reader = new StreamReader(file, Encoding.Default))
            {
                text = reader.ReadToEnd();
                encoding = reader.CurrentEncoding;
            }

            if (text.Contains(oldValue))
            {
                text = text.Replace(oldValue, replacementValue);

                // Overwrite (append = false) with the encoding the reader reported.
                using (StreamWriter writer = new StreamWriter(file, false, encoding))
                {
                    writer.Write(text);
                }
            }
        }
        finally
        {
            // Restore the original attributes, including ReadOnly only if it was set before.
            File.SetAttributes(file, attributes);
        }
