How to encode 'á' to '&aacute;' with C#? (UTF-8)

Posted on 2024-09-04 18:47:01

I'm trying to write an XML file with UTF-8 encoding, and the original string can contain invalid characters like 'á', so I need to change these invalid characters to valid ones.

I know that there is an encoding method that takes, for example, the character á and transforms it into the group of characters &aacute;.

I am trying to achieve this with C#, but I have had no success. I am using the Encoding.UTF8 functions, but I only end up with the same character (i.e. á) or a '?' character.

So, do you know what the correct way to achieve this character change in C# is?

Thanks for your time and help :)

LLORENS

Answers (4)

许久 2024-09-11 18:47:01

You can use any of these methods.

Here are 4 ways you can encode XML in C#:

  1. string.Replace() 5 times

This is ugly, but it works. Note that Replace("&", "&amp;") has to be the first replacement; otherwise the & at the start of the entities produced by the other calls (such as &lt;) would be escaped a second time.

string xml = "<node>it's my \"node\" & i like it<node>";
string encodedXml = xml.Replace("&", "&amp;").Replace("<", "&lt;").Replace(">", "&gt;").Replace("\"", "&quot;").Replace("'", "&apos;");

// RESULT: &lt;node&gt;it&apos;s my &quot;node&quot; &amp; i like it&lt;node&gt;
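To see why the order matters, here is a minimal sketch that runs the same five Replace calls once with & first and once with & last. The class and method names are made up for illustration.

```csharp
// Hypothetical demo class: both methods use the same five Replace calls,
// only the position of the "&" replacement differs.
public static class ReplaceOrderDemo
{
    // Correct order: "&" is escaped first, so the "&" inside the entities
    // produced by the later calls is never touched again.
    public static string EscapeAmpFirst(string s) =>
        s.Replace("&", "&amp;")
         .Replace("<", "&lt;")
         .Replace(">", "&gt;")
         .Replace("\"", "&quot;")
         .Replace("'", "&apos;");

    // Wrong order: "&" is escaped last, so an "&lt;" produced earlier
    // turns into "&amp;lt;".
    public static string EscapeAmpLast(string s) =>
        s.Replace("<", "&lt;")
         .Replace(">", "&gt;")
         .Replace("\"", "&quot;")
         .Replace("'", "&apos;")
         .Replace("&", "&amp;");
}
```

With & last, "a < b" comes out as "a &amp;lt; b", which an XML parser reads back as the literal text "a &lt; b" instead of "a < b".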
  2. System.Web.HttpUtility.HtmlEncode()

It is meant for encoding HTML, but the escapes it produces for <, >, & and " are also valid XML, so we can use it here too. It is mostly used in ASP.NET apps. Note that in .NET 3.5 and earlier HtmlEncode does NOT encode apostrophes ( ' ); .NET Framework 4.0 and later encode them as &#39;.

string xml = "<node>it's my \"node\" & i like it<node>";
string encodedXml = HttpUtility.HtmlEncode(xml);

// RESULT: &lt;node&gt;it's my &quot;node&quot; &amp; i like it&lt;node&gt;
  3. System.Security.SecurityElement.Escape()

In Windows Forms or console apps I use this method. If nothing else, it saves me from adding a System.Web reference to my projects, and it encodes all five characters.

string xml = "<node>it's my \"node\" & i like it<node>";
string encodedXml = System.Security.SecurityElement.Escape(xml);

// RESULT: &lt;node&gt;it&apos;s my &quot;node&quot; &amp; i like it&lt;node&gt;
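One point relevant to the original question: SecurityElement.Escape only touches those five characters, so an 'á' in the input comes through unchanged, which is fine as long as the file is written as UTF-8. A small sketch; BuildElement is a hypothetical helper that assumes name is already a valid XML element name.

```csharp
using System.Security;

public static class EscapeDemo
{
    // Builds <name>value</name> by hand. SecurityElement.Escape replaces only
    // <, >, &, " and ', so non-ASCII letters such as 'á' are left as they are.
    // Assumption: 'name' is a valid XML name and needs no escaping itself.
    public static string BuildElement(string name, string value) =>
        "<" + name + ">" + SecurityElement.Escape(value) + "</" + name + ">";
}
```

For example, BuildElement("word", "más & <menos>") yields <word>más &amp; &lt;menos&gt;</word>.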
  4. System.Xml.XmlTextWriter

With XmlTextWriter you don't have to worry about escaping anything, since it escapes characters where needed. For example, in attributes it doesn't escape apostrophes, while in node values it escapes neither apostrophes nor quotes.

// Requires: using System.Text; using System.Xml;
string xml = "<node>it's my \"node\" & i like it<node>";
using (XmlTextWriter xtw = new XmlTextWriter(@"c:\xmlTest.xml", Encoding.Unicode))
{
    xtw.WriteStartElement("xmlEncodeTest");
    xtw.WriteAttributeString("testAttribute", xml);
    xtw.WriteString(xml);
    xtw.WriteEndElement();
}

// RESULT:
/*
<xmlEncodeTest testAttribute="&lt;node&gt;it's my &quot;node&quot; &amp; i like it&lt;node&gt;">
    &lt;node&gt;it's my "node" &amp; i like it&lt;node&gt;
</xmlEncodeTest>
*/

[http://weblogs.sqlteam.com/mladenp/archive/2008/10/21/Different-ways-how-to-escape-an-XML-string-in-C.aspx]
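Since .NET 2.0 the documentation recommends creating writers with XmlWriter.Create rather than new XmlTextWriter(...). A minimal sketch of the same idea that writes UTF-8, as the question asks; it assumes an in-memory stream instead of a file, and WriteUtf8 is a hypothetical helper.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class Utf8XmlDemo
{
    // Writes <name attr="value">value</name> as UTF-8 and returns the decoded text.
    // XmlWriter escapes the markup characters itself; 'á' is stored as the two
    // UTF-8 bytes 0xC3 0xA1, so no manual conversion is needed.
    public static string WriteUtf8(string name, string value)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // UTF-8 without a byte-order mark
            OmitXmlDeclaration = true           // keep the output short for the demo
        };

        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartElement(name);
                writer.WriteAttributeString("attr", value);
                writer.WriteString(value);
                writer.WriteEndElement();
            }
            // Disposing the writer flushed it; decode the bytes for inspection.
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

For example, WriteUtf8("p", "á < b") returns <p attr="á &lt; b">á &lt; b</p>: the < is escaped, and the á is simply written as UTF-8.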

乖不如嘢 2024-09-11 18:47:01

á is not an "invalid" character. It has a UTF-8 encoding (bytes 195 and 161), and Nick is right that if you construct everything correctly this will be transparent.
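A quick check of the bytes quoted above; the class and method names are hypothetical. It also shows one common way a '?' can appear: encoding the same text with an encoding that cannot represent á, such as ASCII, which substitutes '?' (byte 63).

```csharp
using System.Text;

public static class Utf8BytesDemo
{
    // UTF-8 encodes U+00E1 ('á') as the two bytes 195 (0xC3) and 161 (0xA1).
    public static byte[] Utf8Bytes(string s) => Encoding.UTF8.GetBytes(s);

    // Encoding.ASCII cannot represent 'á'; its default fallback replaces
    // the character with '?' (byte 63), so the information is lost.
    public static byte[] AsciiBytes(string s) => Encoding.ASCII.GetBytes(s);
}
```

Decoding the UTF-8 bytes with Encoding.UTF8.GetString gives back "á", so the character survives intact as long as the same encoding is used for writing and reading.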

白云悠悠 2024-09-11 18:47:01
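    // Requires: using System.Text; using System.Xml;
    // Lets XmlWriter apply the XML escaping rules for text content
    // (&, < and > are escaped; characters such as 'á' are left as-is).
    private static string Escape(string content)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings 
        { 
            // Fragment mode allows writing bare text without a root element.
            ConformanceLevel = ConformanceLevel.Fragment 
        };

        using (var xmlWriter = XmlWriter.Create(sb, settings))
            xmlWriter.WriteString(content);

        return sb.ToString();
    }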
忆梦 2024-09-11 18:47:01

This is exactly what you need:
(found at http://www.codeproject.com/Articles/20255/Full-HTML-Character-Encoding-in-C)

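// Requires: using System.Text; using System.Web;
// For example, this transforms "čas" to "&#269;as".
public static string HtmlEncode(string text)
{
    // Escape the HTML-special characters first.
    string encoded = HttpUtility.HtmlEncode(text);
    StringBuilder result = new StringBuilder(encoded.Length + (int)(encoded.Length * 0.1));

    for (int i = 0; i < encoded.Length; i++)
    {
        char c = encoded[i];
        if (c > 127)
        {
            // Characters outside the BMP arrive as surrogate pairs; combine them
            // into one code point so a single valid reference is written.
            int value = char.IsSurrogatePair(encoded, i)
                ? char.ConvertToUtf32(encoded, i++)
                : c;
            result.AppendFormat("&#{0};", value);
        }
        else
        {
            result.Append(c);
        }
    }

    return result.ToString();
}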
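A caveat for XML specifically: &aacute; is an HTML entity and is not one of the five entities XML predefines, so an XML parser rejects it unless a DTD declares it; numeric references such as &#225; are always valid. If the goal is really an ASCII-only XML file, XmlWriter can produce such references itself: when the configured encoding cannot represent a character in text content, it writes a numeric character reference instead of a '?'. A sketch under that assumption, with a hypothetical WriteAscii helper. Whether the reference comes out in decimal or hexadecimal is an implementation detail, so the check after the block only parses it back.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class AsciiXmlDemo
{
    // Writes <name>value</name> (with an XML declaration) using ASCII.
    // Characters ASCII cannot represent, such as 'á', are emitted by
    // XmlWriter as numeric character references rather than as '?'.
    public static byte[] WriteAscii(string name, string value)
    {
        var settings = new XmlWriterSettings { Encoding = Encoding.ASCII };

        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteElementString(name, value);
            }
            return stream.ToArray();
        }
    }
}
```

Every byte of the result is below 128, and loading it with XDocument.Load gives back the original "más" as the root element's value.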