JavaScript regular expressions to parse HTML and automatically word wrap?
I need to create a bit of JavaScript that can search HTML entered in a text box and, ignoring all the tags, automatically word wrap at a set width (say 70) by adding a <br> tag.
I also need to find all the character entities, like © and –, and count each as one character rather than 5 or 4.
So the code would take:
<b>Hello</b> Here is some code that I would like to wrap. Lets pretend this goes on for over 70 spaces.
Output would be:
<b>Hello</b> Here is some code that I would like to wrap. Lets pretend <br>
this goes on for over 70 spaces.
Is this possible? How would I begin? Is there already a tool for this?
By the way, using CSS is out of the question.
Comments (3)
While the combination of the phrases "regular expression" and "parse HTML" usually causes entire universes to crumble, your use case seems simple enough that it could work. Since you want to preserve the HTML formatting after wrapping, it is much easier to just work on a space-delimited sequence. Here is a very rough approximation of what you'd like to do:
which results in
Note that the HTML tags (b, em, strong) are preserved; it's just that Markdown doesn't show them.
Basically, the input string is split into words at each space, which is naïve and likely to cause trouble, but it's a start. Then the length of each word is calculated after anything resembling an HTML tag or entity has been removed. After that it's a simple matter of iterating over the words while keeping a running tally of the column we're on; once we've hit 70, we pop the aggregated words into the output string and reset. Again, it's very rough, but it should suffice for most basic HTML.
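As a minimal sketch of the steps just described (split on spaces, measure each word without tags and with entities counted as one character, keep a running column tally), something along these lines could work. The function name wrapWords, its parameters, and both regular expressions are assumptions made for illustration and are not the answer's original code.

```javascript
// Sketch only: wrapWords and its regexes are illustrative, not the original answer's code.
function wrapWords(html, width) {
  // Visible length of a word: drop anything that looks like a tag,
  // and count anything that looks like an entity as a single character.
  const visibleLength = (word) =>
    word.replace(/<[^>]*>/g, "").replace(/&#?\w+;/g, "x").length;

  const lines = [];
  let current = []; // words collected for the line being built
  let column = 0;   // visible width of that line so far

  for (const word of html.split(" ")) {
    const len = visibleLength(word);
    // The +1 accounts for the space that joins this word to the line.
    const needed = current.length > 0 ? column + 1 + len : len;
    if (current.length > 0 && needed > width) {
      // The word would overflow: flush the line and start a new one.
      lines.push(current.join(" "));
      current = [word];
      column = len;
    } else {
      current.push(word);
      column = needed;
    }
  }
  if (current.length > 0) lines.push(current.join(" "));
  return lines.join(" <br>\n");
}
```

For example, wrapWords("<b>Hello</b> big &copy; world", 9) returns "<b>Hello</b> big <br>\n&copy; world". Because the split is on single spaces, a tag with attributes such as <a href="..." class="..."> would be cut in two and miscounted, which is the kind of trouble the naïve split invites.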
This solution "walks" the string token by token counting up to the desired line length. The regex captures one of four different tokens:
Note that I've added a line terminator token in case your text box is already formatted with linefeeds (with optional carriage returns). Here is a JavaScript function, breakupHTML(text, len), that walks the string using String.replace() and an anonymous callback, counting tokens as it goes:
Here's a breakdown of the regex in commented format so you can see what is being captured:
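A minimal sketch of this token-walking idea follows, reusing the breakupHTML(text, len) signature mentioned above. The body is an assumption, and so is the choice of the four tokens: a tag (width 0), an entity (width 1), a line terminator (resets the counter), and any other character (width 1). The wordWidth helper is hypothetical; it looks ahead so that a break lands on the space before a word that would not fit.

```javascript
// Sketch only: the pattern and the look-ahead helper are assumptions, not the original regex.
function breakupHTML(text, len) {
  let column = 0; // visible characters on the current line

  // Alternatives, in order:
  //   1. <[^>]*>     an HTML tag           (takes no width)
  //   2. &#?\w+;     an HTML entity        (counts as one character)
  //   3. \r?\n       a line terminator     (resets the column)
  //   4. [\s\S]      any other character   (counts as one character)
  const token = /(<[^>]*>)|(&#?\w+;)|(\r?\n)|([\s\S])/g;

  // Visible width of the word that starts at index `from`.
  const wordWidth = (str, from) => {
    const unit = /(<[^>]*>)|&#?\w+;|\S/y; // sticky: matches only at lastIndex
    unit.lastIndex = from;
    let width = 0;
    let m;
    while ((m = unit.exec(str)) !== null) {
      if (m[1] === undefined) width += 1; // tags add no width
    }
    return width;
  };

  return text.replace(token, (match, tag, _entity, newline, _other, offset, str) => {
    if (tag) return match;
    if (newline) {
      column = 0;
      return match;
    }
    // Break at a space when the next word would push the line past len.
    if (match === " " && column > 0 && column + 1 + wordWidth(str, offset + 1) > len) {
      column = 0;
      return " <br>\n";
    }
    column += 1;
    return match;
  });
}
```

With this sketch, breakupHTML("<b>Hello</b> big &copy; world", 9) gives "<b>Hello</b> big <br>\n&copy; world". Unlike splitting on spaces, the tag alternative swallows a whole tag in one token, so attributes containing spaces do not disturb the count.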
Not wanting to unleash Cthulhu, I decided (unlike my fellow answerers) to provide an answer to your problem that does not attempt to parse HTML with regular expressions. Instead, I turned to the awe-inspiring force for good that is jQuery, and used that to parse your HTML on the client side.
A working fiddle: http://jsfiddle.net/CKQ9f/6/
The html:
The jQuery:
Note the recursion - can't do that with a regex!
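To illustrate the recursive idea without a browser, here is a minimal sketch that walks an already-parsed tree instead of the raw string. It is not the fiddle's code: plain objects stand in for DOM nodes, the name wrapTree is hypothetical, attributes are ignored, and words are assumed not to span two nodes. Since a parser has already decoded entities, a character like © is one character in a text node and needs no special handling.

```javascript
// Sketch only: plain objects stand in for DOM nodes; wrapTree is an illustrative name.
//   Text node:    { text: "some words" }
//   Element node: { tag: "b", children: [ ...nodes ] }   (attributes are ignored here)
function wrapTree(nodes, width) {
  let column = 0; // shared by every level of the recursion, so nested tags never reset it

  // Re-escape the characters that are special in HTML text.
  const escapeText = (s) =>
    s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

  const wrapText = (text) => {
    let out = "";
    // Alternate runs of whitespace and runs of non-whitespace (words).
    for (const piece of text.match(/\s+|\S+/g) || []) {
      if (/^\s/.test(piece)) {
        out += piece;
        column += piece.length;
        continue;
      }
      if (column > 0 && column + piece.length > width) {
        out += "<br>\n"; // the word does not fit: break before it
        column = 0;
      }
      out += escapeText(piece);
      column += piece.length;
    }
    return out;
  };

  const render = (node) => {
    if (typeof node.text === "string") return wrapText(node.text);
    // Recurse into the children; tags themselves add nothing to the column count.
    const inner = node.children.map(render).join("");
    return `<${node.tag}>${inner}</${node.tag}>`;
  };

  return nodes.map(render).join("");
}
```

For the tree [{ tag: "b", children: [{ text: "Hello" }] }, { text: " big \u00A9 world" }] and a width of 9, this returns "<b>Hello</b> big <br>\n\u00A9 world". In a real page the tree would come from the browser's parser rather than hand-built objects, and the break would be inserted as a node rather than as text.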