Replace numeric HTML entities with their actual characters using a JavaScript regex

Posted 2024-10-05 02:48:44

I'm trying to use JavaScript and a regex to replace numeric HTML entities with their actual Unicode characters, e.g.

foo&#39;s bar
→
foo's bar

This is what I have so far:

"foo&#39;s bar".replace(/&#([^\s]*);/g, "$1"); // "foo39s bar"

All that's left is to replace the number with String.fromCharCode($1), but I can't seem to get it to work. How can I do this?

Comments (4)

昇り龍 2024-10-12 02:48:44
"foo&#39;s bar".replace(/&#(\d+);/g, function(match, match2) {return String.fromCharCode(+match2);})
站稳脚跟 2024-10-12 02:48:44
"foo&#39;s bar".replace(/&#([^\s]*);/g, function(x, y) { return String.fromCharCode(y) })

In the current example, the first argument (x) is the whole match, "&#39;", and y is the captured group, "39".
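
To make the callback's arguments concrete, here is a minimal sketch; the helper name decodeDecimal is only illustrative and does not come from the answers here. It also shows why \d+ may be a safer capture than [^\s]*: the latter is greedy and can swallow everything up to the last semicolon when two entities appear with no whitespace between them.

```javascript
// Illustrative helper (hypothetical name): decodes decimal references only.
function decodeDecimal(s) {
  // match is the whole reference, e.g. "&#39;";
  // digits is the captured group, e.g. "39".
  return s.replace(/&#(\d+);/g, function (match, digits) {
    return String.fromCharCode(parseInt(digits, 10));
  });
}

console.log(decodeDecimal("foo&#39;s bar")); // foo's bar

// With [^\s]* the capture runs to the last ";" in a whitespace-free run,
// so two adjacent entities are treated as a single match.
var greedy = [];
"&#39;a&#39;".replace(/&#([^\s]*);/g, function (match, captured) {
  greedy.push(captured);
  return match;
});
console.log(greedy); // [ '39;a&#39' ]
```

Note that String.fromCharCode only covers code units up to 0xFFFF, so this sketch is limited to characters in the Basic Multilingual Plane.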

你的他你的她 2024-10-12 02:48:44

As well as using a callback function, you may want to consider adding support for hex character references (of the form &#xHH;).

Also, fromCharCode may not be enough. For example, &#67840; is a valid reference to a Phoenician character, but because it lies outside the Basic Multilingual Plane, and JavaScript's String model is based on UTF-16 code units rather than complete code points, fromCharCode(67840) won't work. You'd need a UTF-16 encoder, for example:

// Encode code points as UTF-16 code units, using surrogate pairs above U+FFFF.
String.fromCharCodePoint= function(/* codepoints */) {
    var codeunits= [];
    for (var i= 0; i<arguments.length; i++) {
        var c= arguments[i];
        if (c<0x10000) {
            codeunits.push(c);
        } else if (c<0x110000) {
            c-= 0x10000;
            codeunits.push((c>>10 & 0x3FF) + 0xD800); // high surrogate
            codeunits.push((c&0x3FF) + 0xDC00);       // low surrogate
        }
    }
    return String.fromCharCode.apply(String, codeunits);
};

// Decode decimal references first, then hex references.
function decodeCharacterReferences(s) {
    return s.replace(/&#(\d+);/g, function(_, n) {
        return String.fromCharCodePoint(parseInt(n, 10));
    }).replace(/&#x([0-9a-f]+);/gi, function(_, n) {
        return String.fromCharCodePoint(parseInt(n, 16));
    });
}

alert(decodeCharacterReferences('Hello &#67840; mum &#x10900;!'));
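
In engines that support ES2015, the built-in String.fromCodePoint performs this surrogate-pair encoding itself, so a hand-written encoder may not be needed. The following is a minimal sketch under that assumption; the name decodeNumericReferences is illustrative. It handles decimal and hex references in a single pass, which avoids re-decoding the output of an earlier replace (with two chained passes, &#38;#x41; would first become &#x41; and then A), and it leaves out-of-range or surrogate values untouched rather than throwing.

```javascript
// Sketch assuming an ES2015+ engine, where String.fromCodePoint is built in.
// decodeNumericReferences is an illustrative name, not part of any library.
function decodeNumericReferences(s) {
  // One pass: group 1 holds hex digits (after "x"), group 2 holds decimal digits.
  return s.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function (match, hex, dec) {
    var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave references beyond U+10FFFF, or in the surrogate range, as they are.
    if (codePoint > 0x10FFFF || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}

// fromCharCode keeps only the low 16 bits, so 67840 (0x10900) becomes U+0900.
console.log(String.fromCharCode(67840) === "\u0900"); // true
// fromCodePoint produces the surrogate pair D802 DD00 instead.
console.log(String.fromCodePoint(67840) === "\uD802\uDD00"); // true

console.log(decodeNumericReferences("foo&#39;s bar")); // foo's bar
console.log(decodeNumericReferences("&#38;#x41;")); // &#x41;
```

Whether to pass invalid references through unchanged, as done here, or to substitute a replacement character is a design choice that depends on how closely the result needs to follow a browser's HTML parser.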
泡沫很甜 2024-10-12 02:48:44

If you don't want to define all the entities yourself, you can let the browser do it for you. This snippet creates an empty p element, writes the HTML into it, and returns the text it produces.
The p element is never added to the document.

function translateEntities(string){
    var text, p=document.createElement('p');
    p.innerHTML=string;                 // let the browser parse the entities
    text= p.innerText || p.textContent; // read back the decoded text
    p.innerHTML='';
    return text;
}
var s= 'foo&#39;s bar';
translateEntities(s);

/*  returned value: (String)
foo's bar
*/