Replace numeric HTML entities with their actual characters using a JavaScript regex
I'm trying to use JavaScript and a regex to replace numeric HTML entities with their actual Unicode characters, e.g.
foo&#39;s bar
→
foo's bar
This is what I've got so far:
"foo&#39;s bar".replace(/&#([^\s]*);/g, "$1"); // "foo39s bar"
All that's left to do is to replace the number with String.fromCharCode($1), but I can't seem to get it to work. How can I do this?
In the replacement callback, the first argument (x) is the whole match, "&#39;" in the current example, and the second argument (y) is the captured number, 39.
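The code this answer originally carried is not preserved here, so the following is only a minimal sketch of the callback approach it describes. The helper name decodeDecimalEntities and the parameter names match and digits are hypothetical. Writing String.fromCharCode("$1") directly as the replacement does not work because that call is evaluated before replace() runs, so it never sees the captured number. Passing a function defers the conversion until each match is found.

```javascript
// Decode decimal numeric character references such as &#39;.
// decodeDecimalEntities is a hypothetical helper name, not part of any library.
function decodeDecimalEntities(text) {
  // replace() calls the function once per match: the first argument is the
  // whole match ("&#39;"), the second is the first capture group ("39").
  return text.replace(/&#(\d+);/g, function (match, digits) {
    return String.fromCharCode(parseInt(digits, 10));
  });
}

console.log(decodeDecimalEntities("foo&#39;s bar")); // "foo's bar"
```

Using \d+ instead of [^\s]* keeps the match from running past the first semicolon and from accepting non-numeric text.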
As well as using a callback function, you may want to consider adding support for hex character references (&#x1234;).
Also, fromCharCode may not be enough. For example, &#67840; is a valid reference to a Phoenician character, but because it is outside the Basic Multilingual Plane, and JavaScript's String model is based on UTF-16 code units, not complete character code points, fromCharCode(67840) won't work. You'd need a UTF-16 encoder that turns such code points into a surrogate pair.
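The encoder from this answer is missing from this copy, so here is one possible sketch of what it describes, not the author's original code. The names toUtf16 and decodeNumericEntities are hypothetical. It accepts both decimal and hexadecimal references and splits code points above U+FFFF into a high and low surrogate. In engines that support ES2015, String.fromCodePoint does the same conversion and could replace toUtf16.

```javascript
// Convert a Unicode code point to a UTF-16 string (hypothetical helper).
function toUtf16(codePoint) {
  if (codePoint <= 0xFFFF) {
    return String.fromCharCode(codePoint);
  }
  // Outside the BMP: encode as a surrogate pair.
  var offset = codePoint - 0x10000;
  return String.fromCharCode(0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF));
}

// Decode decimal (&#39;) and hexadecimal (&#x27;) character references.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, function (match, hex, dec) {
    var codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
    // Assumption: references beyond U+10FFFF are left as written.
    return codePoint > 0x10FFFF ? match : toUtf16(codePoint);
  });
}

console.log(decodeNumericEntities("&#67840;") === "\uD802\uDD00"); // true
```

For 67840 (0x10900) the offset is 0x900, which gives the pair 0xD802, 0xDD00.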
If you don't want to define all the entities yourself, you can let the browser do it for you: create an empty p element, write the HTML into it, and return the text it produces.
The p element is never added to the document.
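The snippet this answer refers to is not included in this copy. A rough sketch of the idea, assuming a browser environment, might look like the following. The name decodeEntitiesWithDom and the optional doc parameter are hypothetical, and the parameter exists only so that a different document object can be supplied.

```javascript
// Let the browser's HTML parser decode entities (named and numeric).
// decodeEntitiesWithDom is a hypothetical helper name.
function decodeEntitiesWithDom(html, doc) {
  // Falls back to the global document when no document object is passed.
  var p = (doc || document).createElement("p");
  p.innerHTML = html;   // the parser decodes the entities here
  return p.textContent; // the element is never attached to the page
}

// In a browser: decodeEntitiesWithDom("foo&#39;s bar") returns "foo's bar"
```

Because the string is parsed as HTML, any tags in it are dropped from the returned text, and this approach is probably best kept to trusted input, since some markup can have side effects even on a detached element.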