Remove all spaces, line breaks, and tabs outside of <code> blocks
Ok, so I'm currently running this script to remove all the excess spaces, linebreaks, and tabs from my final HTML output:
$html = preg_replace(array("/\t/", "/\s{2,}/", "/\n/"), array("", " ", " "), $html);
However, this causes a problem with my code blocks, which are similar to the code blocks here: they lose their indentation, and the entire block ends up on one line. So I was wondering how I could run the code above only on text that is not enclosed in <code></code> tags, which is the only element I need this exception for. I would know how to do this for the text inside a code block, but I'm a bit lost on how to approach it for the text outside of code blocks.
The only reasonable thing I've come up with is removing all the code blocks, doing the replacement, and then putting the code blocks back in.
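The remove, replace, and restore idea described above could be sketched roughly as follows. This is only an illustration under some assumptions: the function name minify_outside_code() and the NUL-delimited placeholder format are made up for this example, <code> elements are assumed not to be nested, and the whitespace rules are the same preg_replace() call from the question.

```php
<?php
// Sketch: stash <code> blocks, apply the whitespace rules elsewhere, restore.
// minify_outside_code() and the placeholder format are hypothetical names.
function minify_outside_code(string $html): string
{
    $stash = [];

    // 1. Swap every <code>...</code> block for a unique placeholder.
    //    The placeholder contains no whitespace, so step 2 leaves it alone.
    $stripped = preg_replace_callback(
        '~<code\b[^>]*>.*?</code>~is',
        function (array $m) use (&$stash): string {
            $key = "\x00CODE" . count($stash) . "\x00";
            $stash[$key] = $m[0];
            return $key;
        },
        $html
    );
    if ($stripped === null) {
        return $html; // regex engine error: leave the input untouched
    }

    // 2. The original replacement from the question, now applied to
    //    everything except the stashed blocks.
    $stripped = preg_replace(
        array("/\t/", "/\s{2,}/", "/\n/"),
        array("", " ", " "),
        $stripped
    );

    // 3. Put the untouched code blocks back.
    return $stash ? strtr($stripped, $stash) : $stripped;
}
```

For example, calling minify_outside_code() on a page fragment keeps the line breaks and tabs between <code> and </code> while collapsing the whitespace around them. Very large documents can hit PCRE backtracking limits with a pattern like this, which is why the sketch falls back to returning the input unchanged.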
Comments (1)
I would avoid using regular expressions alone for this. I'm sure someone will post a half-baked regex that will be either 1) unmaintainable or 2) buggy (or both), but realistically, you'll want to lex your input into tokens and output it according to the context those tokens construct.
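To make the "lex into tokens, then output by context" idea concrete, here is a deliberately naive sketch. It is not lexentity's code: the function name collapse_outside_code() is hypothetical, and the tokenizer simply splits on anything that looks like a tag, so it assumes well-formed markup with no stray "<" in text, no ">" inside attribute values, and no comments or scripts that need special handling.

```php
<?php
// Sketch: split HTML into tag and text tokens, track whether we are inside
// <code>, and only touch whitespace in text tokens outside of it.
// collapse_outside_code() is a hypothetical name, not part of lexentity.
function collapse_outside_code(string $html): string
{
    // Capturing the delimiter keeps the tags in the token stream.
    $tokens = preg_split(
        '~(<[^>]*>)~',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
    if ($tokens === false) {
        return $html; // regex engine error: leave the input untouched
    }

    $depth = 0; // number of <code> elements currently open
    $out = '';

    foreach ($tokens as $token) {
        if (preg_match('~^<code\b[^>]*>$~i', $token)) {
            $depth++;
            $out .= $token;
        } elseif (preg_match('~^</code\s*>$~i', $token)) {
            $depth = max(0, $depth - 1);
            $out .= $token;
        } elseif ($depth > 0 || $token[0] === '<') {
            // Anything inside <code>, and any other tag: copy verbatim.
            $out .= $token;
        } else {
            // Text outside <code>: apply the rules from the question.
            $out .= preg_replace(
                array("/\t/", "/\s{2,}/", "/\n/"),
                array("", " ", " "),
                $token
            );
        }
    }

    return $out;
}
```

Because the context lives in ordinary variables ($depth here), adding more protected elements or other context-dependent rules means extending the loop rather than growing a single regular expression. Note that this sketch leaves whitespace inside tags (between attributes, for instance) untouched, which differs slightly from running the replacement over the whole document.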
I have a tool that I use to create HTML entities from existing HTML. For example, it turns I'm into I’m as long as it's in a context where changing that entity would make sense (for example, not in a <code> block, not in a URL, etc.).
I've just imported this from my old, dusty Subversion repository to GitHub, here: https://github.com/scoates/lexentity
Here's an example of lexentity in use: http://files.seancoates.com/lexentity/ (we use it for the articles at http://phpadvent.org/)
All of this is to say that, in my opinion, a system like this makes for a much more flexible and robust solution than a purely regex-based one. You'll have to modify lexentity for your purposes, but feel free to borrow as much or as little as you need.
S