What's the right way to decode a string that has special HTML entities in it?

Posted 2024-12-04 17:55:09 | Length: 379 | Views: 0 | Comments: 0

Say I get some JSON back from a service request that looks like this:

{
    "message": "We&#39;re unable to complete your request at this time."
}

I'm not sure why that apostrophe is encoded like that (&#39;); all I know is that I want to decode it.

Here's one approach using jQuery that popped into my head:

function decodeHtml(html) {
    return $('<div>').html(html).text();
}

That seems (very) hacky, though. What's a better way? Is there a "right" way?

Comments (7)

断念 2024-12-11 17:55:09

This is my favourite way of decoding HTML characters. The advantage of using this code is that tags are also preserved.

function decodeHtml(html) {
    var txt = document.createElement("textarea");
    txt.innerHTML = html;
    return txt.value;
}

Example: http://jsfiddle.net/k65s3/

Input:

Entity:&nbsp;Bad attempt at XSS:<script>alert('new\nline?')</script><br>

Output:

Entity: Bad attempt at XSS:<script>alert('new\nline?')</script><br>
弱骨蛰伏 2024-12-11 17:55:09

Don’t use the DOM to do this if you care about legacy compatibility. Using the DOM to decode HTML entities (as suggested in the currently accepted answer) leads to differences in cross-browser results on non-modern browsers.

For a robust & deterministic solution that decodes character references according to the algorithm in the HTML Standard, use the he library. From its README:

he (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and — contrary to many other JavaScript solutions — he handles astral Unicode symbols just fine. An online demo is available.

Here’s how you’d use it:

he.decode("We&#39;re unable to complete your request at this time.");
→ "We're unable to complete your request at this time."

Disclaimer: I'm the author of the he library.

See this Stack Overflow answer for some more info.

[浮城] 2024-12-11 17:55:09

If you don't want to use html/dom, you could use regex. I haven't tested this; but something along the lines of:

function parseHtmlEntities(str) {
    return str.replace(/&#([0-9]+);/g, function(match, numStr) { // any number of digits, not just 1-3
        var num = parseInt(numStr, 10); // read num as normal number
        return String.fromCharCode(num);
    });
}

[Edit]

Note: this would only work for numeric html-entities, and not stuff like &oring;.

[Edit 2]

Fixed the function (some typos), test here: http://jsfiddle.net/Be2Bd/1/
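
A regex-based decoder along these lines can also be stretched to cover hexadecimal references such as `&#x27;` and code points above U+FFFF. The following is only a minimal sketch, not an implementation of the HTML Standard's full algorithm; the name `decodeNumericEntities` is hypothetical, and named entities are still left untouched.

```javascript
// Hypothetical helper: decodes decimal (&#39;) and hexadecimal (&#x27;)
// numeric character references. Named entities such as &amp; are left as-is.
function decodeNumericEntities(str) {
  return str.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, function (match, hex, dec) {
    var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range or surrogate values untouched rather than throwing.
    if (codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
      return match;
    }
    // fromCodePoint (unlike fromCharCode) handles values above 0xFFFF.
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericEntities("We&#39;re unable to complete your request at this time."));
console.log(decodeNumericEntities("&#x27;quoted&#x27; and &amp; stays encoded"));
```

If named entities matter too, a library that follows the standard is probably a safer choice than growing this regex.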

我一直都在从未离去 2024-12-11 17:55:09

There's a JS function to deal with &#xxxx-styled entities:
function at GitHub

// encode(decode) html text into html entity
var decodeHtmlEntity = function(str) {
  return str.replace(/&#(\d+);/g, function(match, dec) {
    return String.fromCharCode(dec);
  });
};

var encodeHtmlEntity = function(str) {
  var buf = [];
  for (var i=str.length-1;i>=0;i--) {
    buf.unshift(['&#', str[i].charCodeAt(), ';'].join(''));
  }
  return buf.join('');
};

var entity = '&#39640;&#32423;&#31243;&#24207;&#35774;&#35745;';
var str = '高级程序设计';

let element = document.getElementById("testFunct");
element.innerHTML = (decodeHtmlEntity(entity));

console.log(decodeHtmlEntity(entity) === str);
console.log(encodeHtmlEntity(str) === entity);
// output:
// true
// true
<div><span id="testFunct"></span></div>
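
One caveat with the functions above: `charCodeAt` and `String.fromCharCode` work on UTF-16 code units, so a character outside the Basic Multilingual Plane (an emoji, say) gets encoded as two surrogate-half references instead of one. Below is a minimal sketch of a code-point-aware variant, assuming an ES2015-capable environment; the names `encodeByCodePoint` and `decodeByCodePoint` are made up for this illustration.

```javascript
// Hypothetical code-point-aware variants of the functions above.
function encodeByCodePoint(str) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(str, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  }).join("");
}

function decodeByCodePoint(str) {
  return str.replace(/&#(\d+);/g, function (match, dec) {
    var codePoint = Number(dec);
    // Values above 0x10FFFF would make String.fromCodePoint throw; keep them as text.
    return codePoint <= 0x10FFFF ? String.fromCodePoint(codePoint) : match;
  });
}

var sample = "高级 😀";
var encoded = encodeByCodePoint(sample);
console.log(encoded);
console.log(decodeByCodePoint(encoded) === sample);
```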

冷情 2024-12-11 17:55:09

jQuery will encode and decode for you.

function htmlDecode(value) {
  return $("<textarea/>").html(value).text();
}

function htmlEncode(value) {
  return $('<textarea/>').text(value).html();
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script>
$(document).ready(function() {
   $("#encoded")
  .text(htmlEncode("<img src onerror='alert(0)'>"));
   $("#decoded")
  .text(htmlDecode("&lt;img src onerror='alert(0)'&gt;"));
});
</script>

<span>htmlEncode() result:</span><br/>
<div id="encoded"></div>
<br/>
<span>htmlDecode() result:</span><br/>
<div id="decoded"></div>

笑看君怀她人 2024-12-11 17:55:09

_.unescape does what you're looking for

https://lodash.com/docs/#unescape
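
Per the lodash documentation, `_.unescape` converts only the five entities `&amp;`, `&lt;`, `&gt;`, `&quot;`, and `&#39;`. That is enough for the string in the question, but not for arbitrary HTML entities. For anyone who would rather not pull in the library, here is a rough dependency-free sketch of the same idea; `unescapeBasic` is a hypothetical name, and this is not lodash's actual source.

```javascript
// Hypothetical stand-in covering the same five entities as _.unescape.
var BASIC_ENTITIES = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'"
};

function unescapeBasic(str) {
  // Single pass, so "&amp;lt;" becomes "&lt;" rather than "<".
  return str.replace(/&(?:amp|lt|gt|quot|#39);/g, function (entity) {
    return BASIC_ENTITIES[entity];
  });
}

console.log(unescapeBasic("We&#39;re unable to complete your request at this time."));
```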

心凉怎暖 2024-12-11 17:55:09

This is such a good answer. You can use it with Angular like this:

 moduleDefinitions.filter('sanitize', ['$sce', function($sce) {
    return function(htmlCode) {
        var txt = document.createElement("textarea");
        txt.innerHTML = htmlCode;
        return $sce.trustAsHtml(txt.value);
    }
}]);