How to decode Feedburner results containing \x3c and similar sequences
FeedBurner changed its blog service so that it now returns blocks of JavaScript similar to:
document.write("\x3cdiv class\x3d\x22feedburnerFeedBlock\x22 id\x3d\x22RitterInsuranceMarketingRSSv3iugf6igask14fl8ok645b6l0\x22\x3e");
document.write("\x3cul\x3e");
document.write("\x3cli\x3e\x3cspan class\x3d\x22headline\x22\x3e\x3ca href\x3d\x22
I want the raw HTML out of this. Previously I could simply use .Replace to strip out the document.write syntax, but I can't figure out what kind of encoding this is, or at least how to decode it with C#.
Edit: Well, this was a semi-nightmare to finally solve. Here's what I came up with, in case anyone has any improvements to offer:
    // Converts a two-digit hex string such as "3c" to its ASCII character.
    public static char ConvertHexToASCII(this string hex)
    {
        if (hex == null) throw new ArgumentNullException(nameof(hex));
        return (char)Convert.ToByte(hex, 16);
    }
.
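    private string DecodeFeedburnerHtml(string html)
    {
        var builder = new StringBuilder(html.Length);
        // Pending escape characters: '\', then 'x', then up to two hex digits.
        var stack = new Stack<char>(4);
        foreach (var chr in html)
        {
            switch (chr)
            {
                case '\\':
                    if (stack.Count == 0)
                    {
                        stack.Push(chr);
                    }
                    else
                    {
                        // Backslash inside a pending escape: discard the pending
                        // characters and emit a literal backslash.
                        stack.Clear();
                        builder.Append(chr);
                    }
                    break;
                case 'x':
                    if (stack.Count == 1)
                    {
                        stack.Push(chr);
                    }
                    else
                    {
                        // A plain 'x', or an 'x' inside a broken escape: reset and emit it.
                        stack.Clear();
                        builder.Append(chr);
                    }
                    break;
                default:
                    if (stack.Count >= 2)
                    {
                        stack.Push(chr);
                        if (stack.Count == 4)
                        {
                            // Pop returns the second hex digit first, so "{1}{0}" restores the order.
                            string hexString = string.Format("{1}{0}", stack.Pop(), stack.Pop());
                            builder.Append(hexString.ConvertHexToASCII());
                            stack.Clear();
                        }
                    }
                    else
                    {
                        // Not part of a \xNN escape: drop any pending backslash
                        // so it cannot corrupt the next escape, then emit the character.
                        stack.Clear();
                        builder.Append(chr);
                    }
                    break;
            }
        }
        return builder.ToString();
    }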
I'm not sure what else I could do better. For some reason, code like this always feels really dirty to me even though it's a linear-time algorithm; I guess that's related to how long it has to be.
Comments (3)
In .NET Core you can use Uri.UnescapeDataString(originalString.Replace(@"\x", "%")) to decode it by first turning it into a URL-encoded string.
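A minimal sketch of the Uri-based approach, under two assumptions that are not stated in the answer: the input contains no literal % characters (an existing % would be read as the start of a URL escape), and every escape is in the ASCII range (Uri.UnescapeDataString decodes bytes as UTF-8). The class and method names are hypothetical. Note the verbatim string @"\x": a plain "\x" is not a valid C# string literal.

```csharp
using System;

public static class FeedburnerUriDecoder
{
    // Hypothetical helper: rewrites each \xNN as %NN, then lets Uri decode it.
    public static string Decode(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        return Uri.UnescapeDataString(input.Replace(@"\x", "%"));
    }
}
```

For example, FeedburnerUriDecoder.Decode(@"\x3cul\x3e") should give <ul>.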
Those look like ASCII values encoded in hex. You could traverse the string and, whenever you find a \x followed by two hexadecimal digits (0-9, a-f), replace it with the corresponding ASCII character. If the string is long, it would be faster to append the result incrementally to a StringBuilder instead of using String.Replace().
I don't know the encoding specification, but there might be more rules to follow (for example, if \\ is an escape sequence for a literal \).
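The traversal described above (scan for \x followed by two hex digits and append the decoded character to a StringBuilder) might look roughly like the sketch below. The class and method names are hypothetical, and the sketch deliberately handles only \xNN; any other rule, such as \\ standing for a literal backslash, is left untouched and would need to be added.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class HexEscapeDecoder
{
    // Hypothetical helper: replaces each \xNN (two hex digits) with the
    // character whose code is NN; everything else is copied unchanged.
    public static string Decode(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var builder = new StringBuilder(input.Length);
        int i = 0;
        while (i < input.Length)
        {
            // An escape needs four characters: '\', 'x', and two hex digits.
            bool isEscape = i + 3 < input.Length
                && input[i] == '\\'
                && input[i + 1] == 'x'
                && Uri.IsHexDigit(input[i + 2])
                && Uri.IsHexDigit(input[i + 3]);

            if (isEscape)
            {
                int code = int.Parse(input.Substring(i + 2, 2),
                    NumberStyles.HexNumber, CultureInfo.InvariantCulture);
                builder.Append((char)code);
                i += 4;
            }
            else
            {
                builder.Append(input[i]);
                i++;
            }
        }
        return builder.ToString();
    }
}
```

Because malformed sequences such as \xzz fail the hex-digit check, they are copied through as-is rather than throwing.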
That is a PHP Twig encoding: http://www.twig-project.org/
Since you are using C#, you will most likely have to create a dictionary to translate the symbols and then use a series of .Replace() string methods to convert them back to HTML characters.
Alternatively, you can save that data to a file, run a Perl script to decode the text, and then read the file back in C#, but that might be more costly.
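As a rough illustration of the dictionary-plus-Replace idea, the sketch below maps only the four escapes visible in the sample output from the question; the class name, method name, and table contents are assumptions, and any other escape would pass through undecoded. String.Replace is case-sensitive, so an uppercase form such as \x3C would also need its own entry.

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryDecoder
{
    // Only the escapes seen in the sample; extend the table as needed.
    private static readonly Dictionary<string, string> Escapes =
        new Dictionary<string, string>
        {
            { @"\x3c", "<" },
            { @"\x3e", ">" },
            { @"\x3d", "=" },
            { @"\x22", "\"" },
        };

    // Hypothetical helper: applies one String.Replace per table entry.
    public static string Decode(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        foreach (var pair in Escapes)
        {
            input = input.Replace(pair.Key, pair.Value);
        }
        return input;
    }
}
```

Each Replace call scans the whole string and allocates a new one, so this trades speed and completeness for simplicity.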