如何使用 XmlWriter 将编码属性添加到除 utf-16 之外的 xml 中?

发布于 2024-07-12 01:05:11 字数 1030 浏览 7 评论 0原文

我有一个创建一些 XmlDocument 的函数:

public string CreateOutputXmlString(ICollection<Field> fields)
{
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.Encoding = Encoding.GetEncoding("windows-1250");

    StringBuilder builder = new StringBuilder();
    XmlWriter writer = XmlWriter.Create(builder, settings);

    writer.WriteStartDocument();
    writer.WriteStartElement("data");
    foreach (Field field in fields)
    {
        writer.WriteStartElement("item");
        writer.WriteAttributeString("name", field.Id);
        writer.WriteAttributeString("value", field.Value);
        writer.WriteEndElement();
    }
    writer.WriteEndElement();
    writer.Flush();
    writer.Close();

    return builder.ToString();
}

我设置了编码,但在创建 XmlWriter 之后,它确实具有 utf-16 编码。 我知道这是因为字符串(我想是 StringBuilder)是用 utf-16 编码的,你无法更改它。
那么如何轻松创建这个将编码属性设置为“windows-1250”的 xml? 它甚至不必以这种编码方式进行编码,它只需要具有指定的属性即可。

编辑:它必须位于 .Net 2.0 中,因此无法使用任何新的框架元素。

I've got a function creating some XmlDocument:

public string CreateOutputXmlString(ICollection<Field> fields)
{
    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.Encoding = Encoding.GetEncoding("windows-1250");

    StringBuilder builder = new StringBuilder();
    XmlWriter writer = XmlWriter.Create(builder, settings);

    writer.WriteStartDocument();
    writer.WriteStartElement("data");
    foreach (Field field in fields)
    {
        writer.WriteStartElement("item");
        writer.WriteAttributeString("name", field.Id);
        writer.WriteAttributeString("value", field.Value);
        writer.WriteEndElement();
    }
    writer.WriteEndElement();
    writer.Flush();
    writer.Close();

    return builder.ToString();
}

I set an encoding but after i create XmlWriter it does have utf-16 encoding. I know it's because strings (and StringBuilder i suppose) are encoded in utf-16 and you can't change it.
So how can I easily create this xml with the encoding attribute set to "windows-1250"? it doesn't even have to be encoded in this encoding, it just has to have the specified attribute.

edit: it has to be in .Net 2.0 so any new framework elements cannot be used.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(5

静若繁花 2024-07-19 01:05:11

您需要使用具有适当编码的 StringWriter。 不幸的是 StringWriter 不允许您直接指定编码,因此您需要这样的类:

public sealed class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding (Encoding encoding)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}

(这个问题类似,但不完全重复。)

编辑:要回答评论:将 StringWriterWithEncoding 传递给 XmlWriter.Create 而不是 StringBuilder,然后在最后调用 ToString() 。

You need to use a StringWriter with the appropriate encoding. Unfortunately StringWriter doesn't let you specify the encoding directly, so you need a class like this:

public sealed class StringWriterWithEncoding : StringWriter
{
    private readonly Encoding encoding;

    public StringWriterWithEncoding (Encoding encoding)
    {
        this.encoding = encoding;
    }

    public override Encoding Encoding
    {
        get { return encoding; }
    }
}

(This question is similar but not quite a duplicate.)

EDIT: To answer the comment: pass the StringWriterWithEncoding to XmlWriter.Create instead of the StringBuilder, then call ToString() on it at the end.

缱绻入梦 2024-07-19 01:05:11

只是对为什么会这样的一些额外解释。

字符串是字符序列,而不是字节。 字符串本身并没有“编码”,因为它们使用的是存储为 Unicode 代码点的字符。 编码在字符串级别没有意义。

编码是从代码点(字符)序列到字节序列(用于存储在基于字节的系统(如文件系统或内存))的映射。 该框架不允许您指定编码,除非有令人信服的理由,例如使 16 位代码点适合基于字节的存储。

因此,当您尝试将 XML 写入 StringBuilder 时,您实际上是在构建 XML 字符序列并将它们写为字符序列,因此不会执行任何编码。 因此,没有 Encoding 字段。

如果要使用编码,XmlWriter 必须写入流。

关于您通过 MemoryStream 找到的解决方案,无意冒犯,但它只是在手臂上拍打并移动热空气。 您使用“windows-1252”对代码点进行编码,然后将其解析回代码点。 唯一可能发生的变化是 windows-1252 中未定义的字符被转换为“?” 过程中的性格。

对我来说,正确的解决方案可能是以下一种。 根据函数的用途,您可以将 Stream 作为参数传递给函数,以便调用者决定是否应将其写入内存或文件。 所以它会写成这样:


        public static void WriteFieldsAsXmlDocument(ICollection fields, Stream outStream)
        {
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Indent = true;
            settings.Encoding = Encoding.GetEncoding("windows-1250");

            using(XmlWriter writer = XmlWriter.Create(outStream, settings)) {
                writer.WriteStartDocument();
                writer.WriteStartElement("data");
                foreach (Field field in fields)
                {
                    writer.WriteStartElement("item");
                    writer.WriteAttributeString("name", field.Id);
                    writer.WriteAttributeString("value", field.Value);
                    writer.WriteEndElement();
                }
                writer.WriteEndElement();
            }
        }

Just some extra explanations to why this is so.

Strings are sequences of characters, not bytes. Strings, per se, are not "encoded", because they are using characters, which are stored as Unicode codepoints. Encoding DOES NOT MAKE SENSE at String level.

An encoding is a mapping from a sequence of codepoints (characters) to a sequence of bytes (for storage on byte-based systems like filesystems or memory). The framework does not let you specify encodings, unless there is a compelling reason to, like to make 16-bit codepoints fit on byte-based storage.

So when you're trying to write your XML into a StringBuilder, you're actually building an XML sequence of characters and writing them as a sequence of characters, so no encoding is performed. Therefore, no Encoding field.

If you want to use an encoding, the XmlWriter has to write to a Stream.

About the solution that you found with the MemoryStream, no offense intended, but it's just flapping around arms and moving hot air. You're encoding your codepoints with 'windows-1252', and then parsing it back to codepoints. The only change that may occur is that characters not defined in windows-1252 get converted to a '?' character in the process.

To me, the right solution might be the following one. Depending on what your function is used for, you could pass a Stream as a parameter to your function, so that the caller decides whether it should be written to memory or to a file. So it would be written like this:


        public static void WriteFieldsAsXmlDocument(ICollection fields, Stream outStream)
        {
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Indent = true;
            settings.Encoding = Encoding.GetEncoding("windows-1250");

            using(XmlWriter writer = XmlWriter.Create(outStream, settings)) {
                writer.WriteStartDocument();
                writer.WriteStartElement("data");
                foreach (Field field in fields)
                {
                    writer.WriteStartElement("item");
                    writer.WriteAttributeString("name", field.Id);
                    writer.WriteAttributeString("value", field.Value);
                    writer.WriteEndElement();
                }
                writer.WriteEndElement();
            }
        }
阳光的暖冬 2024-07-19 01:05:11
MemoryStream memoryStream = new MemoryStream();
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Encoding = Encoding.UTF8;

XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings);
xmlWriter.WriteStartDocument();
xmlWriter.WriteStartElement("root", "http://www.timvw.be/ns");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndDocument();
xmlWriter.Flush();
xmlWriter.Close();

string xmlString = Encoding.UTF8.GetString(memoryStream.ToArray());

从这里

MemoryStream memoryStream = new MemoryStream();
XmlWriterSettings xmlWriterSettings = new XmlWriterSettings();
xmlWriterSettings.Encoding = Encoding.UTF8;

XmlWriter xmlWriter = XmlWriter.Create(memoryStream, xmlWriterSettings);
xmlWriter.WriteStartDocument();
xmlWriter.WriteStartElement("root", "http://www.timvw.be/ns");
xmlWriter.WriteEndElement();
xmlWriter.WriteEndDocument();
xmlWriter.Flush();
xmlWriter.Close();

string xmlString = Encoding.UTF8.GetString(memoryStream.ToArray());

From here

嘦怹 2024-07-19 01:05:11

我实际上用 MemoryStream 解决了这个问题:

public static string CreateOutputXmlString(ICollection<Field> fields)
        {
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Indent = true;
            settings.Encoding = Encoding.GetEncoding("windows-1250");

            MemoryStream memStream = new MemoryStream();
            XmlWriter writer = XmlWriter.Create(memStream, settings);

            writer.WriteStartDocument();
            writer.WriteStartElement("data");
            foreach (Field field in fields)
            {
                writer.WriteStartElement("item");
                writer.WriteAttributeString("name", field.Id);
                writer.WriteAttributeString("value", field.Value);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
            writer.Flush();
            writer.Close();

            writer.Flush();
            writer.Close();

            string xml = Encoding.GetEncoding("windows-1250").GetString(memStream.ToArray());

            memStream.Close();
            memStream.Dispose();

            return xml;
        }

I actually solved the problem with MemoryStream:

public static string CreateOutputXmlString(ICollection<Field> fields)
        {
            XmlWriterSettings settings = new XmlWriterSettings();
            settings.Indent = true;
            settings.Encoding = Encoding.GetEncoding("windows-1250");

            MemoryStream memStream = new MemoryStream();
            XmlWriter writer = XmlWriter.Create(memStream, settings);

            writer.WriteStartDocument();
            writer.WriteStartElement("data");
            foreach (Field field in fields)
            {
                writer.WriteStartElement("item");
                writer.WriteAttributeString("name", field.Id);
                writer.WriteAttributeString("value", field.Value);
                writer.WriteEndElement();
            }
            writer.WriteEndElement();
            writer.Flush();
            writer.Close();

            writer.Flush();
            writer.Close();

            string xml = Encoding.GetEncoding("windows-1250").GetString(memStream.ToArray());

            memStream.Close();
            memStream.Dispose();

            return xml;
        }
温柔戏命师 2024-07-19 01:05:11

我通过将字符串输出到变量然后用 utf-8 替换对 utf-16 的任何引用来解决我的问题(我的应用程序需要 UTF8 编码)。 由于您使用的是函数,因此您可以执行类似的操作。 我主要使用 VB.net,但我认为 C# 看起来像这样。

return builder.ToString().Replace("utf-16", "utf-8");

I solved mine by outputting the string to a variable then replacing any references to utf-16 with utf-8 (my app needed UTF8 encoding). Since you're using a function, you could do something similar. I use VB.net mostly, but I think the C# would look something like this.

return builder.ToString().Replace("utf-16", "utf-8");
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文