How do I encode ampersands if they are not already encoded?

Posted on 2024-12-09 13:52:44

I need a C# method to encode ampersands if they are not already encoded or part of another encoded expression.

For example:

"tom & jill" should become "tom &amp; jill"

"tom &amp; jill" should remain "tom &amp; jill"

"tom &euro; jill" should remain "tom &euro; jill"

"tom <&> jill" should become "tom <&amp;> jill"

"tom &quot;&&quot; jill" should become "tom &quot;&amp;&quot; jill"

Comments (3)

蓝海 2024-12-16 13:52:44

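What you actually want to do is decode the string first and then encode it again. Don't bother trying to patch an encoded string.

Any encoding is only worth its salt if it can be decoded easily, so reuse that logic to make your life easier and your software less bug-prone.

Now, if you are unsure whether the string is encoded or not, the problem is almost certainly not the string itself but the ecosystem that produced it. Where did you get it from? Who did it pass through before it got to you? Do you trust it?

If you really have to resort to creating a magic-fix-weird-data function, then consider building a table of "encodings" and their corresponding characters:

&amp; -> &
&euro; -> €
&lt; -> <
// etc.

Then, first decode every encoding you encounter according to the table, and afterwards re-encode the whole string. Sure, you might find more efficient methods that fumble around without decoding first. But you won't be sane next year. And this is your career, right? You need to stay right in the head! You'll lose your mind if you try to be too clever. And you'll lose your job when you go mad. Sad things happen to people who let maintaining their hacks destroy their minds...

EDIT: Using the .NET library, of course, will save you from madness:

I just tested it, and it seems to have no problems decoding strings that contain bare ampersands. So, go ahead:

// Requires: using System.Web;
string magic(string encodedOrNot)
{
    // Decode any entities that are present, then encode everything again.
    var decoded = HttpUtility.HtmlDecode(encodedOrNot);
    return HttpUtility.HtmlEncode(decoded);
}

EDIT#2: It turns out that the decoder HttpUtility.HtmlDecode will work for your purpose, but the encoder will not, since you don't want angle brackets (<, >) to be encoded. But writing an encoder is really easy:

define encoder(string decoded):
    result is a string-builder
    for character in decoded:
        if character in encoding-table:
           result.append(encoding-table[character])
        else:
           result.append(character)
    return result as string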
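
The decode-then-encode idea with a custom encoding table could be sketched in C# roughly as follows. This is only an illustration under some assumptions: the class name EntityNormalizer, the method name Normalize, and the three table entries are made up for the example, and System.Net.WebUtility.HtmlDecode is swapped in for HttpUtility.HtmlDecode so the snippet does not depend on System.Web.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

public static class EntityNormalizer
{
    // Hypothetical encoding table: only the characters listed here are re-encoded.
    // '<' and '>' are deliberately absent so angle brackets pass through untouched.
    private static readonly Dictionary<char, string> EncodingTable =
        new Dictionary<char, string>
        {
            { '&', "&amp;" },
            { '"', "&quot;" },
            { '\u20AC', "&euro;" }, // the euro sign
        };

    public static string Normalize(string encodedOrNot)
    {
        // Step 1: decode whatever entities are present.
        // An ampersand that does not start an entity is left as it is.
        string decoded = WebUtility.HtmlDecode(encodedOrNot);

        // Step 2: re-encode, consulting only the table above.
        var result = new StringBuilder(decoded.Length);
        foreach (char c in decoded)
        {
            if (EncodingTable.TryGetValue(c, out string entity))
                result.Append(entity);
            else
                result.Append(c);
        }
        return result.ToString();
    }
}
```

Note the trade-off: any entity that is decoded in step 1 but missing from the table comes out as a literal character (for example, &lt; in the input becomes < in the output), and every literal character that is in the table gets encoded even if the input had it unencoded. The table therefore has to list every entity you want to survive the round trip.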

浅暮の光 2024-12-16 13:52:44

This should do a pretty good job:

text = Regex.Replace(text, @"
    # Match & that is not part of an HTML entity.
    &                  # Match literal &.
    (?!                # But only if it is NOT...
      \w+;             # an alphanumeric entity,
    | \#[0-9]+;        # or a decimal entity,
    | \#x[0-9A-F]+;    # or a hexadecimal entity.
    )                  # End negative lookahead.",
    "&amp;",
    RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
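
To try this negative-lookahead pattern against the examples from the question, it can be wrapped in a small helper. The following is a sketch; the class name AmpersandRegex and the method name Encode are made up for illustration, and the replacement string is assumed to be "&amp;".

```csharp
using System.Text.RegularExpressions;

public static class AmpersandRegex
{
    // Matches an ampersand that is not the start of something entity-shaped.
    private static readonly Regex BareAmpersand = new Regex(@"
        &                  # literal ampersand
        (?!                # not followed by...
          \w+;             # a named entity,
        | \#[0-9]+;        # a decimal entity,
        | \#x[0-9A-F]+;    # or a hexadecimal entity
        )",
        RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

    public static string Encode(string text)
    {
        // Only the bare ampersands are rewritten; everything else is untouched.
        return BareAmpersand.Replace(text, "&amp;");
    }
}
```

One limitation worth knowing: \w+; accepts any word-like name, not just real entity names, so text such as "R&D; budget" is treated as if it contained an entity and is left unchanged.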

淡写薰衣草的香 2024-12-16 13:52:44

With a regex, it can be done using a negative lookahead:

&(?![^& ]+;)
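
A minimal C# sketch of how this shorter pattern might be applied is shown here. The class name LooseAmpersandRegex and the method name Encode are hypothetical, and the replacement "&amp;" is assumed from the question.

```csharp
using System.Text.RegularExpressions;

public static class LooseAmpersandRegex
{
    // An ampersand is left alone when a run of characters other than '&' and
    // space, followed by ';', comes right after it.
    private static readonly Regex BareAmpersand = new Regex(@"&(?![^& ]+;)");

    public static string Encode(string text)
    {
        return BareAmpersand.Replace(text, "&amp;");
    }
}
```

Because the lookahead accepts any run of characters other than '&' and space before the semicolon, it is more permissive than a pattern that checks for entity-shaped names: input such as "a &<b>; c" is left unchanged even though "<b>" is not an entity name.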

