Remove <br> tags between <ul> tags
I have a CMS that allows people to use HTML code as well, but nl2br is applied at the end of the function, which turns this:
<ul>
<li></li>
</ul>
into this:
<ul><br/>
<li></li><br/>
</ul>
Now I want to remove the <br/> tags that sit between the <ul> tags.
I already found another question that asks almost the same thing, but for newlines. I've integrated that into my CMS, but for one client all the content is already filled in, so I have to fix this after the \n's have been replaced with <br/>'s.
The other question provides this regex to match \n within <ul></ul>:
/(?<=<ul>|<\/li>)\s*?(?=<\/ul>|<li>)/is
I'd think something like this:
/(?<=<ul>|<\/li>)(<br>|<br\/>|<br \/>)(?=<\/ul>|<li>)/is
would do the trick, but it doesn't. What am I missing?
EDIT
I am very open to DOMDocument solutions. If there's a way to query line breaks with XPath, that would probably fix my problem.
Comments (2)
In the example you provided, the <br> tags are surrounded by some white-space (at least by newline characters), so this needs to be reflected in the regular expression. In many cases regular expressions are NOT the best way to parse HTML (I definitely agree with the comments above/below), but they can be good enough for some particular purposes.
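To make the white-space point concrete, here is a minimal sketch in PHP. It keeps the lookarounds from the question and allows optional white-space on either side of the <br> tag. The function name strip_list_breaks is made up for illustration, and the pattern assumes plain <ul> and <li> tags without attributes and without nested lists.

```php
<?php
// Hypothetical helper (the name is not from the question): removes <br> tags,
// together with surrounding white-space, that sit right after <ul> or </li>
// and right before <li> or </ul>.
function strip_list_breaks(string $html): string
{
    // Same lookbehind/lookahead as in the question. The difference is the
    // \s* on both sides of the tag, because nl2br() leaves the original
    // newline in place after each inserted <br>.
    // <br\s*\/?> covers <br>, <br/> and <br />.
    $pattern = '/(?<=<ul>|<\/li>)\s*<br\s*\/?>\s*(?=<\/ul>|<li>)/is';

    return preg_replace($pattern, '', $html);
}

// Usage:
// echo strip_list_breaks("<ul><br/>\n<li></li><br/>\n</ul>");
// prints: <ul><li></li></ul>
```

Because the white-space is consumed as part of the match, the newlines around the removed tags disappear as well. If a <ul> can carry attributes (for example a class), the lookbehind no longer matches and the pattern would need to be adapted.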
There are plenty of examples on SO that demonstrate why parsing HTML with regular expressions is a bad idea, so I won't include another one here.
Instead, consider using an HTML parser such as HTMLCleaner or HTML Agility Pack to accomplish this task.
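Since the question mentions PHP's DOMDocument and XPath, a parser-based version could look roughly like the sketch below. It is not taken from either library named above. The function name remove_br_in_lists and the wrapper id fragment-root are made up for illustration, and character-encoding handling is left out, so non-ASCII content would need extra care.

```php
<?php
// Hypothetical helper: removes <br> elements that are direct children of a <ul>.
function remove_br_in_lists(string $html): string
{
    // Collect libxml warnings internally instead of emitting PHP warnings
    // for sloppy markup.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Wrap the fragment in a marker <div> so it can be found again after
    // the parser has added the implied <html> and <body> elements.
    $doc->loadHTML('<div id="fragment-root">' . $html . '</div>');

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Copy the matches into an array before modifying the tree.
    foreach (iterator_to_array($xpath->query('//ul/br')) as $br) {
        $br->parentNode->removeChild($br);
    }

    // Serialize only the children of the wrapper, not the whole document.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}
```

The XPath expression //ul/br only selects <br> elements whose parent is a <ul>, so line breaks elsewhere in the content (inside a paragraph or inside a list item) are left alone. The serializer may normalize the markup, for example writing <br/> as <br>, so the output is not guaranteed to be byte-identical to the input outside the removed tags.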