How to decode Feedburner results containing \x3c etc.

Posted 2024-10-07 19:00:41


Feedburner changed the results its blog service returns: it now returns blocks of JavaScript similar to:

document.write("\x3cdiv class\x3d\x22feedburnerFeedBlock\x22 id\x3d\x22RitterInsuranceMarketingRSSv3iugf6igask14fl8ok645b6l0\x22\x3e");
document.write("\x3cul\x3e");
document.write("\x3cli\x3e\x3cspan class\x3d\x22headline\x22\x3e\x3ca href\x3d\x22

I want the raw HTML out of this. Previously I could easily use .Replace to cut out the document.write syntax, but I can't figure out what kind of encoding this is, or at least how to decode it with C#.
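The \x3c-style sequences look like JavaScript string-literal hex escapes, where \xHH is the character whose code is the hexadecimal number HH (\x3c is <, \x3d is =, \x22 is a double quote, \x3e is >). As a hedged sketch of the first step, assuming every statement has the form document.write("..."), a regex can collect the still-escaped string arguments instead of a chain of .Replace calls. The class and method names are hypothetical.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: pulls the string arguments out of document.write(...) calls.
public static class FeedburnerScript
{
    // Captures the raw (still escaped) literal inside document.write("...").
    // A backslash always consumes the next character, so \" cannot end the match early.
    private static readonly Regex WriteCall = new Regex(
        @"document\.write\(\s*""((?:[^""\\]|\\.)*)""\s*\)\s*;?");

    // Concatenates the arguments of every document.write call, escapes left intact.
    public static string ExtractWrittenLiterals(string script)
    {
        var builder = new StringBuilder();
        foreach (Match match in WriteCall.Matches(script))
        {
            builder.Append(match.Groups[1].Value);
        }
        return builder.ToString();
    }
}
```

The result still contains the \xHH escapes, so it has to be decoded afterwards.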

Edit: Well, this was a semi-nightmare to finally solve. Here's what I came up with, in case anyone has any improvements to offer:

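// Extension methods must live in a static class; the class name here is arbitrary.
public static class StringHexExtensions
{
    // "3c" -> '<': parses two hex digits and returns that character code.
    public static char ConvertHexToASCII(this string hex)
    {
        if (hex == null) throw new ArgumentNullException(nameof(hex));
        return (char)Convert.ToByte(hex, 16);
    }
}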


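// Requires: using System; using System.Collections.Generic; using System.Text;
private string DecodeFeedburnerHtml(string html)
{
    var builder = new StringBuilder(html.Length);
    // Holds a partially read escape: '\', then 'x', then up to two hex digits.
    var stack = new Stack<char>(4);

    // Writes out a partial sequence that turned out not to be a \xHH escape.
    void FlushPending()
    {
        var pending = stack.ToArray(); // newest first
        Array.Reverse(pending);        // restore the original order
        builder.Append(pending);
        stack.Clear();
    }

    foreach (var chr in html)
    {
        switch (chr)
        {
            case '\\':
                if (stack.Count == 0)
                {
                    stack.Push(chr);
                }
                else if (stack.Count == 1)
                {
                    // "\\" is an escaped backslash: emit a single '\'.
                    stack.Clear();
                    builder.Append(chr);
                }
                else
                {
                    // Broken "\x" or "\xH": keep it, then start a new escape.
                    FlushPending();
                    stack.Push(chr);
                }
                break;
            case 'x':
                if (stack.Count == 1)
                {
                    stack.Push(chr);
                }
                else
                {
                    FlushPending();
                    builder.Append(chr);
                }
                break;
            default:
                if (stack.Count >= 2 && Uri.IsHexDigit(chr))
                {
                    stack.Push(chr);

                    if (stack.Count == 4)
                    {
                        // Pop returns the last digit first, so "{1}{0}" restores "HH".
                        string hexString = string.Format("{1}{0}", stack.Pop(),
                                                     stack.Pop());

                        builder.Append(hexString.ConvertHexToASCII());
                        stack.Clear();
                    }
                }
                else
                {
                    // Not part of an escape (e.g. "\n" or "\xzz"): copy it unchanged.
                    FlushPending();
                    builder.Append(chr);
                }
                break;
        }
    }

    // Don't drop a truncated escape at the very end of the input.
    FlushPending();
    return builder.ToString();
}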

Not sure what else I could do better. For some reason, code like this always feels really dirty to me, even though it's a linear-time algorithm; I guess it's because of how long it has to be.
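One way to shorten the decoder, sketched under the assumption that only \xHH escapes need handling, is to let Regex.Replace do the scanning and convert each match with a MatchEvaluator. This swaps the hand-written stack for the regex engine; it is still a single left-to-right pass. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical regex-based alternative to the stack-based decoder.
public static class FeedburnerDecoder
{
    // A backslash, an 'x', then exactly two hex digits, e.g. \x3c.
    private static readonly Regex HexEscape = new Regex(@"\\x([0-9A-Fa-f]{2})");

    public static string DecodeHexEscapes(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));

        // Each match becomes the character whose code is the captured hex byte.
        return HexEscape.Replace(html, match =>
            ((char)Convert.ToByte(match.Groups[1].Value, 16)).ToString());
    }
}
```

Malformed sequences such as \xzz don't match the pattern, so they are left in place instead of reaching Convert.ToByte and throwing a FormatException.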


稳稳的幸福 2024-10-14 19:00:41


In .NET Core you can use Uri.UnescapeDataString(originalString.Replace(@"\x", "%"))
to convert it, by first turning it into a URL-encoded string.
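A minimal sketch of that one-liner, wrapped in a hypothetical helper class. Two caveats apply: any literal % already in the text that happens to be followed by two hex digits is decoded as well, and because UnescapeDataString reads %XX sequences as UTF-8 bytes, escapes above \x7f (for example \xe9) are not guaranteed to come back as the same character JavaScript would produce.

```csharp
using System;

// Hypothetical wrapper around the Uri.UnescapeDataString approach.
public static class UrlUnescapeDecoder
{
    // Rewrites every \x as %, turning \x3c into the URL escape %3c, then unescapes.
    // Caveat: a literal "%" followed by two hex digits in the input is decoded too.
    public static string Decode(string escaped) =>
        Uri.UnescapeDataString(escaped.Replace(@"\x", "%"));
}
```

Note the verbatim string @"\x": in a regular C# literal "\x" is not a valid escape sequence and does not compile.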

↙厌世 2024-10-14 19:00:41


Those look like ASCII values encoded in hex. You could traverse the string and, whenever you find a \x followed by two hexadecimal digits (0-9, a-f), replace it with the corresponding ASCII character. If the string is long, it would be faster to build the result incrementally in a StringBuilder instead of using String.Replace().
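A minimal sketch of that traversal, assuming only \xHH escapes need decoding; the class and method names are hypothetical. It walks the string by index instead of keeping a stack, looks ahead for the four characters of a complete escape, and copies everything else through unchanged.

```csharp
using System.Text;

// Hypothetical index-based scanner for \xHH escapes.
public static class HexEscapeScanner
{
    // Value of a hex digit, or -1 if c is not one.
    private static int HexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static string Decode(string input)
    {
        var builder = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            // A full escape needs '\', 'x' and two more characters, so i + 3 must be in range.
            if (input[i] == '\\' && i + 3 < input.Length && input[i + 1] == 'x')
            {
                int high = HexValue(input[i + 2]);
                int low = HexValue(input[i + 3]);
                if (high >= 0 && low >= 0)
                {
                    builder.Append((char)(high * 16 + low));
                    i += 4;
                    continue;
                }
            }
            // Not a complete \xHH escape: copy the character unchanged.
            builder.Append(input[i]);
            i++;
        }
        return builder.ToString();
    }
}
```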

I don't know the encoding specification, but there might be more rules to follow (for example, if \\ is an escape character for a literal \).
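To illustrate why such a rule matters, here is a hedged sketch assuming the text follows JavaScript string-literal rules, where \\ stands for a literal backslash. Decoding then has to happen in one pass: replacing \\ first and \xHH second would turn the literal text \\x3c into <, when it should become \x3c. Only \xHH, \\, \", \' and \/ are handled; other escapes such as \n are deliberately left untouched. The class name is hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical single-pass decoder for a small subset of JavaScript escapes.
public static class JsEscapeDecoder
{
    // One alternation so each backslash is consumed exactly once, left to right:
    // group 1 = two hex digits after \x, group 2 = an escaped \ " ' or /.
    private static readonly Regex Escape = new Regex(@"\\(?:x([0-9A-Fa-f]{2})|([\\""'/]))");

    public static string Decode(string input) =>
        Escape.Replace(input, match => match.Groups[1].Success
            ? ((char)Convert.ToByte(match.Groups[1].Value, 16)).ToString()
            // Assumption: \\, \", \' and \/ stand for the escaped character itself.
            : match.Groups[2].Value);
}
```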

夜声 2024-10-14 19:00:41


That is a PHP Twig encoding:

http://www.twig-project.org/

Since you are using C#, you will most likely have to create a dictionary to translate the symbols and then use a series of .Replace() string methods to convert them back to HTML characters.
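A minimal sketch of the dictionary-plus-Replace idea. The table below is an assumption: it only lists the escapes visible in the question's sample, so anything else (including uppercase forms such as \x3C) passes through unchanged until it is added. Each entry is a separate pass over the string; order doesn't matter for these entries, but it would once something like \\ is added. The class name is hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical lookup-table decoder built from a series of string.Replace calls.
public static class ReplaceTableDecoder
{
    // Only the escapes seen in the question's sample; extend as new ones turn up.
    private static readonly Dictionary<string, string> Escapes = new Dictionary<string, string>
    {
        [@"\x3c"] = "<",
        [@"\x3e"] = ">",
        [@"\x3d"] = "=",
        [@"\x22"] = "\"",
    };

    public static string Decode(string input)
    {
        // One full pass over the string per dictionary entry.
        foreach (var pair in Escapes)
        {
            input = input.Replace(pair.Key, pair.Value);
        }
        return input;
    }
}
```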

Alternatively, you can save that data to a file, run a Perl script to decode the text and then read the file back in C#, but that might be more costly.
