XDocument - how to convert non-HTML-safe characters

Posted on 2024-10-04 00:39:24

I have a "title" attribute on elements of my UTF-8 XML, e.g.

<tag title="This is some test with special chars §£" />

Because I want the content of this attribute to be printed directly in an HTML page, I'm trying to get output like:

<tag title="This is some test with special chars &#x00a7;&#x00a3;" />

The code fragment where I add the attribute looks like this:

new XElement( "tag",
    new XAttribute( "title" , title)
);

Characters such as & and " are escaped, but §£ are not, as they're valid UTF-8 characters.
What should I change?

Comments (2)

番薯 2024-10-11 00:39:24

UTF-8 characters are supported in HTML, if the page is declared as UTF-8.

You should always specify the encoding used for an HTML or XML page. If you don't, you risk that characters in your content are incorrectly interpreted. This is not just an issue of human readability, increasingly machines need to understand your data too. You should also check that you are not specifying different encodings in different places.

If the default encoding for the page is a character set with a smaller range, then it will not render all of the UTF-8 characters properly. However, if the document is declared as UTF-8 they should display fine.

Rather than replacing characters with entity references, you may need to explicitly declare the encoding of your page as UTF-8.

There are a variety of ways to do this:

  • <meta charset="UTF-8">
  • <meta http-equiv="Content-type" content="text/html;charset=UTF-8">
  • <?xml version="1.0" encoding="UTF-8"?>
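On the C# side, the same advice means the bytes that are written and the encoding named in the XML declaration should agree. A minimal sketch, assuming the document is serialized through `XmlWriter`; the class and method names (`Utf8XmlExample`, `SaveAsUtf8`, `Demo`) are made up for illustration and are not from the question:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

public static class Utf8XmlExample
{
    // Serializes the document to UTF-8 bytes. XmlWriter writes an XML
    // declaration that names the same encoding it uses for the bytes,
    // so the declaration and the content cannot disagree.
    public static byte[] SaveAsUtf8(XDocument doc)
    {
        var settings = new XmlWriterSettings
        {
            // false = do not emit a byte order mark
            Encoding = new UTF8Encoding(false),
            Indent = true
        };

        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                doc.Save(writer);
            }
            return stream.ToArray();
        }
    }

    // Hypothetical usage: builds the element from the question and
    // returns the serialized text decoded back from UTF-8.
    public static string Demo()
    {
        var doc = new XDocument(
            new XElement("tag",
                new XAttribute("title", "This is some test with special chars §£")));
        return Encoding.UTF8.GetString(SaveAsUtf8(doc));
    }
}
```

With this, § and £ stay as literal characters in the output, which is fine as long as the page that embeds the text is also declared and served as UTF-8.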

残月升风 2024-10-11 00:39:24

Maybe you can encode those characters manually. I have used this before:

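 // Requires: using System.Collections.Generic;
 // Maps an HTML named entity to the literal character it stands for.
 Dictionary<string, char> HTMLSymbolMap = new Dictionary<string, char>()
        {
            {"&ndash;",'–'},
            {"&mdash;",'—'},
            {"&lsquo;",'‘'},
            {"&rsquo;",'’'},
            {"&sbquo;",'‚'},
            {"&ldquo;",'“'},
            {"&rdquo;",'”'},
            {"&bull;",'•'},
            {"&middot;",'·'},
            {"&bdquo;",'„'},
            {"&pound;",'£'},
            {"&sect;",'§'},
        };

   // Replaces each mapped character in the text with its HTML entity.
   public string CleanJunk(string docText)
    {
        foreach (var kv in HTMLSymbolMap)
        {
            docText = docText.Replace(kv.Value.ToString(), kv.Key);
        }

        return docText;
    }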

Refer to this HTMLSymbol table for more info
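
A fixed lookup table only covers the characters listed in it. A more general variant, sketched here under the assumption that every non-ASCII character may be written as a hexadecimal numeric character reference (the form shown in the question), could look like the following. `NumericEscapeExample`, `EscapeNonAscii` and `Demo` are hypothetical names, not part of any library:

```csharp
using System;
using System.Text;
using System.Xml.Linq;

public static class NumericEscapeExample
{
    // Replaces every character outside ASCII with a hexadecimal numeric
    // character reference such as &#x00a7;. Apply it to the serialized
    // markup, not to the value passed to XAttribute: there the '&' would
    // itself be escaped to &amp;.
    public static string EscapeNonAscii(string markup)
    {
        var sb = new StringBuilder(markup.Length);
        for (int i = 0; i < markup.Length; i++)
        {
            char c = markup[i];
            if (c < 128)
            {
                sb.Append(c);
                continue;
            }

            // Combine a surrogate pair into a single code point.
            int codePoint = c;
            if (char.IsHighSurrogate(c) && i + 1 < markup.Length
                && char.IsLowSurrogate(markup[i + 1]))
            {
                codePoint = char.ConvertToUtf32(c, markup[i + 1]);
                i++;
            }

            // At least four lowercase hex digits, e.g. 00a7.
            sb.Append("&#x").Append(codePoint.ToString("x4")).Append(';');
        }
        return sb.ToString();
    }

    // Hypothetical usage: serialize the element first, then escape.
    public static string Demo()
    {
        var element = new XElement("tag",
            new XAttribute("title", "This is some test with special chars §£"));
        return EscapeNonAscii(element.ToString());
    }
}
```

Because the references are numeric rather than named, the result remains well-formed XML as well as valid HTML, whereas named entities such as `&sect;` are only defined for HTML.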
