C# - Detect the encoding of a file, then write changes to the file using the detected encoding

Posted on 2024-10-07 01:41:07

I wrote a small program that iterates through a lot of files and applies some changes wherever a certain string match is found. The problem is that different files have different encodings, so I would like to check each file's encoding and then overwrite the file in its original encoding.

What would be the prettiest way of doing that in C# .NET 2.0?

My code looks very simple as of now:

String f1 = File.ReadAllText(fileList[i]).ToLower();

if (f1.Contains(oPath))
{
    f1 = f1.Replace(oPath, nPath);
    File.WriteAllText(fileList[i], f1, Encoding.Unicode);
}

I took a look at Auto encoding detect in C#, which showed me how I could detect the encoding, but I am not sure how to use that information to write the file back in the same encoding.

I would greatly appreciate any help here.

Comments (5)

笑脸一如从前 2024-10-14 01:41:07

Unfortunately, encoding is one of those subjects where there is not always a definitive answer. In many cases it's much closer to guessing the encoding than to detecting it. Raymond Chen did an excellent blog post on this subject that is worth the read.

The gist of the article is:

  • If the BOM (byte order mark) exists, then you're golden
  • Otherwise, it's guesswork and heuristics
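
To make the first bullet concrete, here is a minimal sketch of BOM checking: compare the first bytes of the file with each candidate encoding's GetPreamble(). The names BomSniffer and DetectFromBom are hypothetical, only the UTF-8, UTF-16 and UTF-32 BOMs are covered, and a null result means there was no BOM, so you are in the "guesswork" case. It sticks to C# 2.0 syntax to match the question.

```csharp
using System.Text;

public static class BomSniffer
{
    // Ordered so the UTF-32 LE BOM (FF FE 00 00) is tested before the
    // UTF-16 LE BOM (FF FE), which is a prefix of it.
    private static readonly Encoding[] Candidates = new Encoding[]
    {
        new UTF32Encoding(false, true),   // UTF-32 LE: FF FE 00 00
        new UTF32Encoding(true, true),    // UTF-32 BE: 00 00 FE FF
        new UTF8Encoding(true),           // UTF-8:     EF BB BF
        new UnicodeEncoding(false, true), // UTF-16 LE: FF FE
        new UnicodeEncoding(true, true),  // UTF-16 BE: FE FF
    };

    // Returns the encoding whose BOM the data starts with, or null if none matches.
    public static Encoding DetectFromBom(byte[] data)
    {
        foreach (Encoding candidate in Candidates)
        {
            byte[] bom = candidate.GetPreamble();
            if (bom.Length > 0 && StartsWith(data, bom))
                return candidate;
        }
        return null;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }
}
```

A UTF-16 LE file whose first character is U+0000 also starts with FF FE 00 00 and would be reported as UTF-32 LE; that ambiguity is inherent to BOM sniffing.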

However, I still think the best approach is the one Darin mentioned in the question you linked: let StreamReader guess for you rather than reinventing the wheel. It only requires a very slight modification to your sample.

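String f1;
Encoding encoding;
// new StreamReader(path) falls back to UTF-8 but switches encoding if it finds a BOM.
using (StreamReader reader = new StreamReader(fileList[i]))
{
    // Note: ToLower() lowercases the whole file, and that text is what gets written back.
    f1 = reader.ReadToEnd().ToLower();
    // CurrentEncoding reflects BOM detection only after the first read.
    encoding = reader.CurrentEncoding;
}

if (f1.Contains(oPath))
{
    f1 = f1.Replace(oPath, nPath);
    // Write the file back in the encoding the reader detected.
    File.WriteAllText(fileList[i], f1, encoding);
}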
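
One thing worth checking with this approach on your runtime: when the file has no BOM, CurrentEncoding is simply the encoding the reader was constructed with (UTF-8 for the single-argument constructor), and writing with Encoding.UTF8 emits a BOM, so a BOM-less UTF-8 file may come back with one. The sketch below (the helper name ReplaceInFile is hypothetical) reads the bytes once, lets StreamReader detect a BOM, and writes back without a BOM if the original had none. It also uses an ordinal, case-sensitive match instead of lowercasing the whole file.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingPreservingReplace
{
    // Hypothetical helper: replaces every occurrence of oldValue (ordinal,
    // case-sensitive) and writes the file back in the encoding it was read with.
    // Returns true if the file was rewritten.
    public static bool ReplaceInFile(string path, string oldValue, string newValue)
    {
        if (string.IsNullOrEmpty(oldValue))
            throw new ArgumentException("oldValue must be non-empty", "oldValue");

        byte[] bytes = File.ReadAllBytes(path);
        string text;
        Encoding encoding;
        // Encoding.UTF8 is only the fallback; a BOM at the start overrides it.
        using (StreamReader reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8, true))
        {
            text = reader.ReadToEnd();
            encoding = reader.CurrentEncoding; // meaningful only after a read
        }

        if (text.IndexOf(oldValue, StringComparison.Ordinal) < 0)
            return false;

        // No BOM at the start means the reader used the UTF-8 fallback:
        // write BOM-less UTF-8 so the file does not gain a BOM.
        if (!StartsWith(bytes, encoding.GetPreamble()))
            encoding = new UTF8Encoding(false);

        File.WriteAllText(path, text.Replace(oldValue, newValue), encoding);
        return true;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (prefix.Length == 0 || data.Length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i]) return false;
        return true;
    }
}
```

Like the code above, this only recognizes UTF encodings via their BOM; a BOM-less ANSI file is still treated as UTF-8.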

孤千羽 2024-10-14 01:41:07

By default, .NET uses UTF-8. It is hard to detect the character encoding because most of the time .NET will read the file as UTF-8. I always have problems with ANSI.

My trick is to read the file as a Stream, force it to be read as UTF-8, and look for the usual characters that should appear in the text. If they are found, the file is UTF-8; otherwise it is ANSI. I also tell users that only two encodings are supported, ANSI or UTF-8. Auto-detection doesn't quite work for my language :p
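
A minimal sketch of that trick, under two assumptions: the caller supplies the characters it expects to see (expectedChars, hypothetical), and it also supplies the ANSI encoding to fall back to, because Encoding.Default is always UTF-8 on .NET Core and .NET 5+. It adds one check the answer does not mention: the UTF-8 decoder is created with throwOnInvalidBytes set, so bytes that are not valid UTF-8 go straight to the ANSI branch.

```csharp
using System.Text;

public static class Utf8OrAnsiGuess
{
    // expectedChars: hypothetical set of non-ASCII characters that real text in
    // your language should contain (for German, e.g. "äöüÄÖÜß").
    // ansi: the ANSI encoding to fall back to, passed in explicitly.
    public static Encoding Guess(byte[] data, string expectedChars, Encoding ansi)
    {
        string decoded;
        try
        {
            // throwOnInvalidBytes = true: malformed UTF-8 throws instead of
            // being silently replaced with U+FFFD.
            decoded = new UTF8Encoding(false, true).GetString(data);
        }
        catch (DecoderFallbackException)
        {
            return ansi; // not valid UTF-8, so it cannot be UTF-8
        }

        // Valid UTF-8 that contains characters we expect: call it UTF-8.
        if (decoded.IndexOfAny(expectedChars.ToCharArray()) >= 0)
            return new UTF8Encoding(false);

        // Nothing telltale found: fall back to ANSI.
        return ansi;
    }
}
```

Pure ASCII text contains none of the expected characters and is reported as ANSI, which is harmless with the usual Windows ANSI code pages because ASCII bytes decode the same way in both.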

苍风燃霜 2024-10-14 01:41:07

I am afraid you will have to know the encoding. For UTF-based encodings, though, you can use StreamReader's built-in functionality.

Taken from here:

With regard to encodings: you will need to have identified the encoding in order to use the StreamReader.

However, the StreamReader itself can help if you create it with one of the constructor overloads that allows you to supply the flag detectEncodingFromByteOrderMarks as true (or you can use Encoding.GetPreamble and look at the byte preamble yourself).

Both these methods will only help auto-detect UTF-based encodings, though, so any ANSI encodings with a specified code page will probably not be parsed correctly.
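
As a sketch of the constructor overload described above: pass the ANSI code page you already know as the fallback and set detectEncodingFromByteOrderMarks to true, so a UTF BOM still takes precedence. The class and method names are hypothetical. On .NET Core and .NET 5+, Windows code pages such as 1252 are only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance); ISO-8859-1 (code page 28591) works without it.

```csharp
using System.IO;
using System.Text;

public static class KnownCodePageReader
{
    // Reads the whole file. If it starts with a UTF BOM, that encoding is used;
    // otherwise the text is decoded with 'fallback', which must be the code page
    // you already know the file uses. 'used' reports which one applied.
    public static string ReadAll(string path, Encoding fallback, out Encoding used)
    {
        using (StreamReader reader = new StreamReader(path, fallback, true))
        {
            string text = reader.ReadToEnd();
            used = reader.CurrentEncoding; // set correctly only after the first read
            return text;
        }
    }
}
```

Passing the returned encoding to File.WriteAllText then writes the file back in whichever encoding was actually used for reading.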

好多鱼好多余 2024-10-14 01:41:07

Probably a bit late, but I ran into the same problem myself. Using the previous answers, I found a solution that works for me. It reads the text with a StreamReader that falls back to Encoding.Default, takes the encoding the reader actually used for that file, and uses a StreamWriter to write the changes back in that encoding. It also clears the ReadOnly flag while working on the file and sets it back afterwards.

        string file = "File to open";
        string oldValue = "string to be replaced";
        string replacementValue = "New string";
        string text;
        Encoding encoding;

        // Remember the original attributes so they can be restored exactly.
        FileAttributes attributes = File.GetAttributes(file);
        File.SetAttributes(file, attributes & ~FileAttributes.ReadOnly);
        try
        {
            // Encoding.Default (the ANSI code page on .NET Framework; UTF-8 on
            // .NET Core / .NET 5+) is only the fallback: this overload still
            // switches to a UTF encoding if the file starts with a BOM.
            using (StreamReader reader = new StreamReader(file, Encoding.Default))
            {
                text = reader.ReadToEnd();
                encoding = reader.CurrentEncoding; // valid only after the first read
            }

            if (text.Contains(oldValue))
            {
                text = text.Replace(oldValue, replacementValue);
                // false = overwrite instead of append; write with the detected encoding.
                using (StreamWriter writer = new StreamWriter(file, false, encoding))
                {
                    writer.Write(text);
                }
            }
        }
        finally
        {
            // Restore the original attributes (ReadOnly only if it was set before).
            File.SetAttributes(file, attributes);
        }

陌伤ぢ 2024-10-14 01:41:07

The solution for all Germans => ÄÖÜäöüß

This function opens the file and determines the encoding from the BOM.
If the BOM is missing, the file is interpreted as ANSI, but if it contains UTF-8-encoded German umlauts, it is detected as UTF-8.

https://stackoverflow.com/a/69312696/9134997
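
The linked function is not reproduced here; what follows is an independent, minimal sketch of the behaviour described: BOM first, then a scan for the UTF-8 byte pairs of Ä Ö Ü ä ö ü ß (C3 84, C3 96, C3 9C, C3 A4, C3 B6, C3 BC, C3 9F), otherwise ANSI. The class name and the ansi parameter are hypothetical, and only the UTF-8 and UTF-16 BOMs are checked.

```csharp
using System;
using System.Text;

public static class GermanUmlautGuess
{
    // Second byte of the two-byte UTF-8 sequence C3 xx for Ä Ö Ü ä ö ü ß.
    private static readonly byte[] UmlautTrailBytes =
        new byte[] { 0x84, 0x96, 0x9C, 0xA4, 0xB6, 0xBC, 0x9F };

    public static Encoding Detect(byte[] data, Encoding ansi)
    {
        // 1. A BOM decides (UTF-8, UTF-16 LE and UTF-16 BE only in this sketch).
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
            return new UTF8Encoding(true);
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Encoding.Unicode;
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Encoding.BigEndianUnicode;

        // 2. No BOM: any UTF-8 encoded umlaut or ß means UTF-8.
        for (int i = 0; i + 1 < data.Length; i++)
        {
            if (data[i] == 0xC3 && Array.IndexOf(UmlautTrailBytes, data[i + 1]) >= 0)
                return new UTF8Encoding(false);
        }

        // 3. Otherwise treat the file as ANSI.
        return ansi;
    }
}
```

A Windows-1252 file could in principle contain the same byte pairs (for example "Ã¤"), but that is rare in real German text, which is why this heuristic tends to work well there.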
