Unterminated string literal from escaped HTML in a JavaScript string
I'm seeing an issue with some JavaScript string literals when encoding this value:
Unencoded
<!-- Start ValueClick Media 300x250 Code for Test Tag -->
<script language="javascript" src="http://media.fastclick.net/w/get.media?sid=38901&m=6&tp=8&d=j&t=n"></script>
<noscript><a href="http://media.fastclick.net/w/click.here?sid=38901&m=6&c=1" target="_blank">
<img src="http://media.fastclick.net/w/get.media?sid=38901&m=6&tp=8&d=s&c=1"width=300 height=250 border=1></a></noscript>
<!-- End ValueClick Media 300x250 Code for Test Tag -->
I end up with this value:
Decoded
"<!-- Start ValueClick Media 300x250 Code for Test Tag -->\r\n<script language=\"javascript\" src=\"http://media.fastclick.net/w/get.media?sid=38901&m=6&tp=8&d=j&t=n\"></script>\r\n<noscript><a href=\"http://media.fastclick.net/w/click.here?sid=38901&m=6&c=1\" target=\"_blank\">\r\n<img src=\"http://media.fastclick.net/w/get.media?sid=38901&m=6&tp=8&d=s&c=1\"width=300 height=250 border=1></a></noscript>\r\n<!-- End ValueClick Media 300x250 Code for Test Tag -->"
When this is used as a string literal in some JavaScript code, Firefox complains that the literal is unterminated, but I can't see why.
Oddly enough, if I remove the "</script>" closing tag from the above HTML, the encoded version works correctly, as below:
Unencoded
<!-- Start ValueClick Media 300x250 Code for Test Tag -->
<script language="javascript" src="http://media.fastclick.net/w/get.media?sid=38901&m=6&tp=8&d=j&t=n">
<noscript><a href="http://media.fastclick.net/w/click.here?sid=38901&m=6&c=1" target="_blank">
<img src="http://media.fastclick.net/w/get.media?sid=38901&m=6&tp=8&d=s&c=1"width=300 height=250 border=1></a></noscript>
<!-- End ValueClick Media 300x250 Code for Test Tag -->
Encoded
"<!-- Start ValueClick Media 300x250 Code for Test Tag -->\r\n<script language=\"javascript\" src=\"http://media.fastclick.net/w/get.media?sid=38901&m=6&tp=8&d=j&t=n\">\r\n<noscript><a href=\"http://media.fastclick.net/w/click.here?sid=38901&m=6&c=1\" target=\"_blank\">\r\n<img src=\"http://media.fastclick.net/w/get.media?sid=38901&m=6&tp=8&d=s&c=1\"width=300 height=250 border=1></a></noscript>\r\n<!-- End ValueClick Media 300x250 Code for Test Tag -->"
This encoded value works...
Anyone know what I'm missing?
Update
Seems rather obvious now; I blame lack of sleep. In this case the application was relying on an older release of JSON.Net to encode the JavaScript, so I worked around the issue by introducing a new JsonConverter for strings that escapes closing tags in a second pass, after the JavaScript escaping has been applied.
using System;
using System.Globalization;
using System.IO;
using System.Text;
using Newtonsoft.Json;

public class EscapeTagsStringConverter : JsonConverter
{
    public override void WriteJson(JsonWriter writer, object value, JsonSerializer serializer)
    {
        if (value == null)
        {
            writer.WriteNull();
            return;
        }
        // First pass: normal JavaScript escaping. Second pass: turn every "</" into "<\/"
        // so the HTML parser never sees an end tag inside the string literal.
        string escapedValue = ToEscapedJavaScriptString(value.ToString(), '"').Replace("</", "<\\/");
        writer.WriteRawValue("\"" + escapedValue + "\"");
    }

    // Signature of the older Json.NET release used here; newer releases add an
    // "object existingValue" parameter to this override.
    public override object ReadJson(JsonReader reader, Type objectType, JsonSerializer serializer)
    {
        // Guard against a JSON null, which would otherwise throw on ToString().
        return reader.Value == null ? null : reader.Value.ToString();
    }

    public override bool CanConvert(Type objectType)
    {
        return (objectType == typeof (string));
    }

    // Maps 0-15 to the lowercase hex digit '0'-'9' or 'a'-'f'.
    public static char IntToHex(int n)
    {
        if (n <= 9)
        {
            return (char)(n + 48);
        }
        return (char)((n - 10) + 97);
    }

    // Writes c as a \uXXXX escape sequence.
    public static void WriteCharAsUnicode(TextWriter writer, char c)
    {
        char h1 = IntToHex((c >> 12) & '\x000f');
        char h2 = IntToHex((c >> 8) & '\x000f');
        char h3 = IntToHex((c >> 4) & '\x000f');
        char h4 = IntToHex(c & '\x000f');
        writer.Write('\\');
        writer.Write('u');
        writer.Write(h1);
        writer.Write(h2);
        writer.Write(h3);
        writer.Write(h4);
    }

    public static void WriteEscapedJavaScriptChar(TextWriter writer, char c, char delimiter)
    {
        switch (c)
        {
            case '\t':
                writer.Write(@"\t");
                break;
            case '\n':
                writer.Write(@"\n");
                break;
            case '\r':
                writer.Write(@"\r");
                break;
            case '\f':
                writer.Write(@"\f");
                break;
            case '\b':
                writer.Write(@"\b");
                break;
            case '\\':
                writer.Write(@"\\");
                break;
            case '\'':
                // Only escape the quote character that delimits the literal.
                writer.Write((delimiter == '\'') ? @"\'" : @"'");
                break;
            case '"':
                writer.Write((delimiter == '"') ? "\\\"" : @"""");
                break;
            default:
                // Control characters below U+0020 are written as \uXXXX.
                if (c > '\u001f')
                    writer.Write(c);
                else
                    WriteCharAsUnicode(writer, c);
                break;
        }
    }

    public void WriteEscapedJavaScriptString(TextWriter writer, string value, char delimiter)
    {
        if (value != null)
        {
            for (int i = 0; i < value.Length; i++)
            {
                WriteEscapedJavaScriptChar(writer, value[i], delimiter);
            }
        }
    }

    public string ToEscapedJavaScriptString(string value)
    {
        return ToEscapedJavaScriptString(value, '"');
    }

    public string ToEscapedJavaScriptString(string value, char delimiter)
    {
        using (StringWriter w = CreateStringWriter(GetLength(value) ?? 16))
        {
            WriteEscapedJavaScriptString(w, value, delimiter);
            return w.ToString();
        }
    }

    public static StringWriter CreateStringWriter(int capacity)
    {
        StringBuilder sb = new StringBuilder(capacity);
        StringWriter sw = new StringWriter(sb, CultureInfo.InvariantCulture);
        return sw;
    }

    public static int? GetLength(string value)
    {
        if (value == null)
            return null;
        return value.Length;
    }
}
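For readers who want to try the second-pass idea without the C# converter, here is a minimal JavaScript sketch of the same approach. It is not the JSON.Net code above: `toScriptSafeLiteral` is a hypothetical helper name, `JSON.stringify` stands in for the first escaping pass, and the sample HTML is a shortened, made-up stand-in for the ad tag. It assumes the input is a string and that the script runs on its own (for example in Node), not inline in an HTML page.

```javascript
// Hypothetical helper: build a string literal that is safe to embed in an
// inline <script> block. Assumes `value` is a string.
function toScriptSafeLiteral(value) {
  // First pass: ordinary JavaScript/JSON string escaping.
  const literal = JSON.stringify(value);
  // Second pass: rewrite every "</" as "<\/" so no end tag appears in the source.
  return literal.replace(/<\//g, "<\\/");
}

// Illustrative input only, loosely modeled on the ad tag in the question.
const html = '<script src="http://example.invalid/ad.js"></script>\r\n<noscript></noscript>';
const literal = toScriptSafeLiteral(html);

console.log(literal.includes("</"));       // false
console.log(JSON.parse(literal) === html); // true
```

Because `\/` is a valid escape for `/` in both JSON and JavaScript string literals, the second pass changes only the source text, not the string it represents.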
Comments (2)
Well, yeah. If the page contains an inline script whose string literal includes </script>, how is the browser supposed to know that the first </script> isn't the real end of the script element? Every browser, not just Firefox, ends the script element at that point, which cuts the string literal off before its closing quote.
To avoid a premature end to a string literal containing the </ (ETAGO) sequence, you must escape it in some way. You could say '<\/script>', or '\x3C/script>', or even '<'+'/script>' (that one is popular, though I find it quite inelegant).
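The point about premature termination can be illustrated with a rough sketch. `scriptSeenByBrowser` is a hypothetical function that models only the simplest rule, namely that an inline script's source stops at the first `</script`. Real HTML parsers have additional rules (for example around `<!--` inside scripts), so treat this as an approximation, not as the parsing algorithm. The sample sources are made up for illustration.

```javascript
// Simplified model: the inline script's source ends at the first "</script"
// (case-insensitive), regardless of which JavaScript token it falls inside.
function scriptSeenByBrowser(inlineSource) {
  const end = inlineSource.toLowerCase().indexOf("</script");
  return end === -1 ? inlineSource : inlineSource.slice(0, end);
}

// Illustrative inline script sources, written here as JavaScript strings.
const unsafe = 'var s = "<script src=\\"a.js\\"></script>";';
const safe = 'var s = "<script src=\\"a.js\\"><\\/script>";';

console.log(scriptSeenByBrowser(unsafe));        // var s = "<script src=\"a.js\">
console.log(scriptSeenByBrowser(safe) === safe); // true

// The three spellings suggested in the answer evaluate to the same string.
const forms = ['<\/script>', '\x3C/script>', '<' + '/script>'];
console.log(forms.every((f) => f === forms[0])); // true
```

In the unsafe case the JavaScript engine receives a source that stops inside the string literal, which matches the "unterminated string literal" error described in the question.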
The decoded value doesn't throw an error in Chrome or Firefox 3.6.10.
Which Firefox version are you using?