Unescape HTML entities in JavaScript?
I have some JavaScript code that communicates with an XML-RPC backend.
The XML-RPC returns strings of the form:
&lt;img src='myimage.jpg'&gt;
However, when I use JavaScript to insert the strings into HTML, they render literally. I don't see an image, I see the string:
<img src='myimage.jpg'>
I guess that the HTML is being escaped over the XML-RPC channel.
How can I unescape the string in JavaScript? I tried the techniques on this page, unsuccessfully: http://paulschreiber.com/blog/2008/09/20/javascript-how-to-unescape-html-entities/
What are other ways to diagnose the issue?
Most answers given here have a huge disadvantage: if the string you are trying to convert isn't trusted then you will end up with a Cross-Site Scripting (XSS) vulnerability. For the function in the accepted answer, consider the following:
The string here contains an unescaped HTML tag, so instead of decoding anything the htmlDecode function will actually run JavaScript code specified inside the string.
This can be avoided by using DOMParser, which is supported in all modern browsers:
This function is guaranteed to not run any JavaScript code as a side-effect. Any HTML tags will be ignored, only text content will be returned.
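The original snippet is missing from this page, so what follows is only a minimal sketch of the DOMParser approach described above. The name htmlDecode is taken from the text; the body is an assumption about what such a function could look like. It is browser code, since DOMParser is not available in plain Node.js.

```javascript
// Sketch: decode HTML entities with DOMParser (browser API).
// Assumption: dropping any tags in the input is acceptable, because
// only the text content of the parsed document is returned.
function htmlDecode(input) {
  // The string is parsed into a separate, inert document, so scripts
  // and event handlers contained in the input are not executed.
  const doc = new DOMParser().parseFromString(input, "text/html");
  return doc.documentElement.textContent;
}

// Usage, guarded so the file also loads where DOMParser is missing.
if (typeof DOMParser !== "undefined") {
  // Escaped markup comes back as plain text: <img src='myimage.jpg'>
  console.log(htmlDecode("&lt;img src='myimage.jpg'&gt;"));
  // Real markup is parsed but never run; the text content is empty.
  console.log(htmlDecode("<img src='dummy' onerror='alert(/xss/)'>"));
}
```

Note that the result is a string of text. If it is later assigned to innerHTML of a live element, it is parsed as markup again, so the usual caution about untrusted input still applies at that point.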
Compatibility note: Parsing HTML with DOMParser requires at least Chrome 30, Firefox 12, Opera 17, Internet Explorer 10, Safari 7.1 or Microsoft Edge. So all browsers without support are way past their EOL and as of 2017 the only ones that can still be seen in the wild occasionally are older Internet Explorer and Safari versions (usually these still aren't numerous enough to bother).
Do you need to decode all encoded HTML entities or just &amp; itself?
If you only need to handle &amp; then you can do this:
If you need to decode all HTML entities then you can do it without jQuery:
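The snippets for these two cases did not survive on this page. As a hedged sketch of what they could look like: the first function handles only &amp; with a plain string replace, and the second uses a detached textarea, as recommended in this answer, to let the browser decode every entity. Both function names are hypothetical, and the second function needs a browser DOM.

```javascript
// Case 1: only &amp; needs decoding. Pure string work, no DOM involved.
function decodeAmpersands(encoded) {
  return encoded.replace(/&amp;/g, "&");
}

// Case 2: decode all entities without jQuery (browser only).
// A detached textarea is used rather than a div: its content is parsed
// as text, so tags in the input are not turned into elements.
function decodeAllEntities(encoded) {
  const textarea = document.createElement("textarea");
  textarea.innerHTML = encoded;
  return textarea.value;
}

console.log(decodeAmpersands("fish &amp; chips")); // fish & chips
```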
Please take note of Mark's comments below which highlight security holes in an earlier version of this answer and recommend using textarea rather than div to mitigate against potential XSS vulnerabilities. These vulnerabilities exist whether you use jQuery or plain JavaScript.
EDIT: You should use the DOMParser API as Wladimir suggests. I edited my previous answer since the function posted introduced a security vulnerability.
The following snippet is the old answer's code with a small modification: using a textarea instead of a div reduces the XSS vulnerability, but it is still problematic in IE9 and Firefox.
Basically I create a DOM element programmatically, assign the encoded HTML to its innerHTML and retrieve the nodeValue from the text node created on the innerHTML insertion. Since it just creates an element but never adds it, no site HTML is modified.
It will work cross-browser (including older browsers) and accept all the HTML Character Entities.
EDIT: The old version of this code did not work on IE with blank inputs, as evidenced here on jsFiddle (view in IE). The version above works with all inputs.
UPDATE: It appears this doesn't work with large strings, and it also introduces a security vulnerability; see the comments.
A more modern option for interpreting HTML (text and otherwise) from JavaScript is the HTML support in the DOMParser API (see here in MDN). This allows you to use the browser's native HTML parser to convert a string to an HTML document. It has been supported in new versions of all major browsers since late 2014.
If we just want to decode some text content, we can put it as the sole content in a document body, parse the document, and pull out its .body.textContent.
We can see in the draft specification for DOMParser that JavaScript is not enabled for the parsed document, so we can perform this text conversion without security concerns.
It's beyond the scope of this question, but please note that if you're taking the parsed DOM nodes themselves (not just their text content) and moving them to the live document DOM, it's possible that their scripting would be reenabled, and there could be security concerns. I haven't researched it, so please exercise caution.
Mathias Bynens has a library for this: https://github.com/mathiasbynens/he
Example:
I suggest favouring it over hacks involving setting an element's HTML content and then reading back its text content. Such approaches can work, but are deceptively dangerous and present XSS opportunities if used on untrusted user input.
If you really can't bear to load in a library, you can use the textarea hack described in this answer to a near-duplicate question, which, unlike various similar approaches that have been suggested, has no security holes that I know of:
But take note of the security issues, affecting similar approaches to this one, that I list in the linked answer! This approach is a hack, and future changes to the permissible content of a textarea (or bugs in particular browsers) could lead to code that relies upon it suddenly having an XSS hole one day.
If you're using jQuery:
Otherwise, use Strictly Software's Encoder Object, which has an excellent htmlDecode() function.
You can use the Lodash unescape / escape functions: https://lodash.com/docs/4.17.5#unescape
For example, _.unescape('fred, barney, &amp; pebbles') returns 'fred, barney, & pebbles'.
This is from ExtJS source code.
The trick is to use the power of the browser to decode the special HTML characters, but not allow the browser to execute the results as if it was actual html... This function uses a regex to identify and replace encoded HTML characters, one character at a time.
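The function this answer refers to is not shown on this page. As a rough sketch of the idea it describes, a regex can pick out each entity and hand only that entity to a detached element for decoding. The function name and the exact pattern are assumptions, and the code needs a browser DOM.

```javascript
// Sketch: let the browser decode entities one at a time (browser only).
// Only the matched entity (for example "&lt;") is ever assigned to
// innerHTML, so any real tags in the input never reach the HTML parser.
function unescapeOneByOne(html) {
  const el = document.createElement("div");
  // Matches decimal (&#60;), hexadecimal (&#x3C;) and named (&lt;) references.
  const entity = /&(?:#\d+|#x[0-9a-f]+|[a-z][a-z0-9]*);/gi;
  return html.replace(entity, (match) => {
    el.innerHTML = match;
    return el.textContent;
  });
}
```

Names the browser does not recognise come back unchanged, because the parser leaves unknown references as literal text.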
element.innerText also does the trick.
In case you're looking for it, like me: there is now a nice and safe jQuery method.
https://api.jquery.com/jquery.parsehtml/
For example, you can type this in your console:
So $.parseHTML(x) returns an array, and if you have HTML markup within your text, the array.length will be greater than 1.
jQuery will encode and decode for you. However, you need to use a textarea tag, not a div.
CMS' answer works fine, unless the HTML you want to unescape is very long, longer than 65536 chars. Because then in Chrome the inner HTML gets split into many child nodes, each one at most 65536 long, and you need to concatenate them. This function works also for very long strings:
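The function itself is missing from this page. A minimal sketch of the concatenation idea, under the assumption that a detached textarea is used as in the answers above (the function name is hypothetical, and a browser DOM is required):

```javascript
// Sketch: decode a very long escaped string (browser only).
// The browser may split the decoded content into several text nodes,
// so every child node is read and the pieces are joined.
function decodeLongHtml(escapedHtml) {
  const el = document.createElement("textarea");
  el.innerHTML = escapedHtml;
  let result = "";
  for (let i = 0; i < el.childNodes.length; i++) {
    result += el.childNodes[i].nodeValue;
  }
  return result; // empty input yields "" because there are no child nodes
}
```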
See this answer about innerHTML max length for more info: https://stackoverflow.com/a/27545633/694469
To unescape HTML entities* in JavaScript you can use the small library html-escaper:
npm install html-escaper
Or the unescape function from Lodash or Underscore, if you are using it.
*) Please note that these functions don't cover all HTML entities, but only the most common ones, i.e. &amp;, &lt;, &gt;, &#39;, &quot;. To unescape all HTML entities you can use the he library.
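To make the limitation concrete, here is a small dependency-free sketch that decodes only the five common entities listed above. It is not the implementation of html-escaper, Lodash or Underscore; the names are hypothetical. It runs in Node.js as well as in the browser.

```javascript
// Lookup table for the five most common entities only.
const COMMON_ENTITIES = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
};

// A single pass over the string, so "&amp;lt;" becomes "&lt;" and is
// not decoded a second time into "<".
function unescapeCommon(str) {
  return String(str).replace(/&(?:amp|lt|gt|quot|#39);/g, (m) => COMMON_ENTITIES[m]);
}

console.log(unescapeCommon("&lt;img src=&#39;myimage.jpg&#39;&gt;"));
// <img src='myimage.jpg'>
```

Anything outside the table, such as &nbsp; or &eacute;, is left untouched, which is why a full library is suggested for arbitrary input.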
First create a <span id="decodeIt" style="display:none;"></span> somewhere in the body.
Next, assign the string to be decoded as innerHTML to this:
Finally,
Here is the overall code:
The question doesn't specify the origin of x but it makes sense to defend, if we can, against malicious (or just unexpected, from our own application) input. For example, suppose x has a value of &amp; <script>alert('hello');</script>. A safe and simple way to handle this in jQuery is:
Found via https://gist.github.com/jmblog/3222899. I can't see many reasons to avoid using this solution given it is at least as short as, if not shorter than, some alternatives and provides defence against XSS.
(I originally posted this as a comment, but am adding it as an answer since a subsequent comment in the same thread requested that I do so).
Not a direct response to your question, but wouldn't it be better for your RPC to return some structure (be it XML or JSON or whatever) with the image data (URLs in your example) inside that structure?
Then you could just parse it in your JavaScript and build the <img> using JavaScript itself.
The structure you receive from RPC could look like:
I think it's better this way, as injecting code that comes from an external source into your page doesn't look very secure. Imagine someone hijacking your XML-RPC script and putting something you wouldn't want in there (even some JavaScript...)
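As an illustration of this suggestion, the sketch below assumes the backend returns JSON of the hypothetical shape {"image": {"src": "myimage.jpg"}}; the answer's own example structure is not shown on this page, and the function name is made up. The element is built through the DOM API, so the value from the server is only ever used as an attribute value and is never parsed as markup. It needs a browser DOM.

```javascript
// Sketch: build the <img> from structured data instead of injecting HTML.
// Assumption: payloadJson looks like {"image": {"src": "myimage.jpg"}}.
function buildImage(payloadJson) {
  const payload = JSON.parse(payloadJson);
  const img = document.createElement("img");
  img.src = payload.image.src;
  return img;
}

// Usage in a page:
//   document.body.appendChild(buildImage('{"image": {"src": "myimage.jpg"}}'));
```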
A JavaScript solution that catches the common ones:
This is the reverse of https://stackoverflow.com/a/4835406/2738039
For one-line guys:
You're welcome...just a messenger...full credit goes to ourcodeworld.com, link below.
Full Credit: https://ourcodeworld.com/articles/read/188/encode-and-decode-html-entities-using-pure-javascript
I know there are a lot of good answers here, but since I have implemented a slightly different approach, I thought I would share it.
This code is a perfectly safe approach security-wise, as the escaping handler depends on the browser rather than on the function. So if a new vulnerability is discovered in the future, this solution will be covered.
By the way, I have chosen to use the characters ⪪ and ⪫ because they are rarely used, so the chance of impacting performance by matching them is significantly lower.
Chris' answer is nice & elegant but it fails if value is undefined. A simple improvement makes it solid:
I tried everything to remove &amp; from a JSON array. None of the above examples worked, but https://stackoverflow.com/users/2030321/chris gave a great solution that led me to fix my problem.
I did not use it as is, because I did not understand how to insert it into a modal window that was pulling JSON data into an array, but I did try this based upon the example, and it worked:
I like it because it is simple and it works, but I am not sure why it's not widely used. I searched high and low to find a simple solution.
I continue to seek understanding of the syntax, and whether there is any risk in using this. I have not found anything yet.
I was crazy enough to go through and make this function that should be pretty, if not completely, exhaustive:
Used like so:
Prints:
Ich Heiße David
P.S. this took like an hour and a half to make.
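The author's full function is not reproduced on this page. As a much smaller sketch of the same table-driven idea, the code below decodes numeric character references generically and looks named entities up in a deliberately tiny table. The function name, the table contents and the sample input are assumptions, not the author's code. It runs in Node.js as well as in the browser.

```javascript
// Tiny illustrative table; a complete one has over two thousand names.
const NAMED_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", szlig: "ß" };

function decodeEntitiesSketch(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      // Numeric reference: hexadecimal (&#xDF;) or decimal (&#223;).
      const isHex = body[1] === "x" || body[1] === "X";
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Named reference: decode only if it is in the table.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(decodeEntitiesSketch("Ich Hei&szlig;e David")); // Ich Heiße David
```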
This is the most comprehensive solution I've tried so far:
Closures can avoid creating unnecessary objects.
A more concise way
Use Dentity! I found none of the answers above satisfying, so I cherry picked some stuff from here, fixed their problems and added the complete W3C entity definitions, and some more functionality. I also made it as small as possible, which is now 31KB minified and 14KB when gzipped. You can download it from https://github.com/arashkazemi/dentity
It includes both the decoder and encoder functions and it works both in browser and in node environment. I hope it solves the problem efficiently!
I use this in my project. It is inspired by other answers but takes an extra secure parameter, which can be useful when you deal with decorated characters:
And it's usable like: