Remove all classes from p tags

Posted on 2024-07-28 00:11:53

I was just wondering if anyone knew of a function to remove all classes from a string in PHP. Basically, I only want

<p>

tags rather than

<p class="...">

If that makes sense :)

Comments (6)

无远思近则忧 2024-08-04 00:11:53

A fairly naive regex will probably work for you:

$html = preg_replace('/class=".*?"/', '', $html);

I say naive because it would fail if your body text happened to contain class="something" for some reason! It could be made a little more robust by only looking for class="..." inside angle-bracketed tags, if need be.
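The more robust variant suggested above might look like the following minimal sketch. It is not from the original answer: the helper name strip_class_attributes is hypothetical, and it assumes that attribute values never contain a literal > character. It first matches each opening tag, then removes the class attribute only inside that tag, so body text is left alone.

```php
<?php
// Hypothetical helper (not from the original answer): strips class attributes
// only inside opening tags, so body text containing class="..." is untouched.
function strip_class_attributes(string $html): string
{
    // Outer pattern: each opening tag, e.g. <p class="a" id="b">.
    return preg_replace_callback(
        '/<[a-zA-Z][^<>]*>/',
        function (array $m): string {
            // Inner pattern: class="..." or class='...' plus its leading whitespace.
            return preg_replace('/\s+class\s*=\s*("[^"]*"|\'[^\']*\')/i', '', $m[0]);
        },
        $html
    );
}

// prints: <p>Use class="something" here</p>
echo strip_class_attributes('<p class="intro">Use class="something" here</p>');
```

Requiring whitespace before class means attributes such as data-class are not affected.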

过气美图社 2024-08-04 00:11:53

Maybe it's a bit overkill for your needs, but to parse, validate, or clean HTML data, the best tool I know is HTML Purifier.

It lets you define which tags and which attributes are allowed, and/or which ones are not, and it gives valid, clean (X)HTML as output.

(Using regexes to "parse" HTML seems OK at the beginning. Then, when you want to add specific stuff, it generally becomes hell to understand and maintain.)
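As a rough sketch of what that could look like, the snippet below forbids the class attribute on p elements through HTML Purifier's HTML.ForbiddenAttributes directive. It assumes the library was installed with Composer (package ezyang/htmlpurifier); the autoload path and the sample input are illustrative assumptions, not part of the original answer.

```php
<?php
// Assumption: HTML Purifier was installed via Composer (ezyang/htmlpurifier)
// and the autoloader lives at this path in your project.
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();
// Forbid the class attribute on <p> only; '*@class' would forbid it on every tag.
$config->set('HTML.ForbiddenAttributes', 'p@class');

$purifier = new HTMLPurifier($config);

// Illustrative input: the class on <p> should be dropped, the one on <a> kept.
$dirty = '<p class="MsoNormal">Hello <a href="https://example.com" class="x">world</a></p>';
echo $purifier->purify($dirty);
```

Because the purifier also normalizes the markup, other parts of the output may differ from the input.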

恰似旧人归 2024-08-04 00:11:53

$html = "<p id='fine' class='r3e1 b4d 1' style='widows: inherit;'>";
$html = preg_replace('/\sclass=[\'"][^\'"]+[\'"]/', '', $html);

If you are being put to the test against Microsoft Office-exported HTML, you'll need more than class removal, but HTML Tidy has a config flag just for Microsoft Office!

Otherwise, this should be safer than some of the other answers, given that they are a little greedy and you don't know which kind of quoting will be used (' or ").

Note: The pattern is actually /\sclass=['"][^'"]+['"]/ but, as it contains both double quotes (") and apostrophes ('), I had to escape every apostrophe (\') in order to wrap the pattern in a single-quoted PHP string.

九厘米的零° 2024-08-04 00:11:53

Load the HTML into a DOMDocument, then import that into SimpleXML. Run an XPath query for all p elements and loop through them. On each iteration, set the class attribute to a placeholder such as "killmeplease".

When that's done, re-output the SimpleXML object as XML (which, by the way, may change the HTML, but usually only for the better), and you will have an HTML string where each p has a class of "killmeplease". Use str_replace to actually remove them.

Example:

$html_file = "somehtmlfile.html";

$dom = new DOMDocument();
$dom->loadHTMLFile($html_file);

$xml = simplexml_import_dom($dom);

// Find every p element in the document.
$paragraphs = $xml->xpath("//p");

foreach ($paragraphs as $paragraph) {
    // Overwrite (or add) the class attribute with a known placeholder.
    $paragraph['class'] = "killmeplease";
}

$new_html = $xml->asXML();

// Strip the placeholder, including its leading space, so "<p>" remains.
$better_html = str_replace(' class="killmeplease"', "", $new_html);

Or, if you want simpler code at the cost of tangling with preg_replace, you could go with:

$html_file = "somehtmlfile.html";
$html_string = file_get_contents($html_file);

// $1 = "<p" plus any attributes before class, $3 = the rest of the tag.
$bad_p_class = '/(<p\b[^>]*?)\s+class=("[^"]*"|\'[^\']*\')([^>]*>)/i';

$better_html = preg_replace($bad_p_class, '$1$3', $html_string);

The tricky part with regular expressions is that they tend to be greedy, and trying to turn that off can cause problems if your p element tag has a line break in it. But give either of those a shot.
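As an alternative to the placeholder-and-replace step, the class attribute can also be removed directly on the DOM nodes. The following is only a sketch under stated assumptions: the helper name remove_p_classes and the wrapper div are hypothetical, and the input is assumed to be an ASCII HTML fragment (DOMDocument::loadHTML needs an explicit encoding hint for other text).

```php
<?php
// Hypothetical helper: removes the class attribute from every <p> element
// using PHP's bundled DOM extension, with no regex and no placeholder class.
function remove_p_classes(string $html): string
{
    $dom = new DOMDocument();
    // Collect parser warnings about imperfect markup instead of printing them.
    libxml_use_internal_errors(true);
    // Wrap the fragment in a known container so it can be extracted afterwards.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('p') as $p) {
        $p->removeAttribute('class');
    }

    // The wrapper is the first div in document order; serialize only its
    // children so the html/body tags added by the parser are left out.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo remove_p_classes('<p class="intro" id="x">Hello</p><span class="keep">world</span>');
```

Only p elements are touched, so the class on the span in the usage line is expected to survive.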

无需解释 2024-08-04 00:11:53

HTML Purifier

HTML can be very tricky to handle with regexes because of the hundreds of different ways code can be written or formatted.

HTML Purifier is a mature open-source library for cleaning up HTML. I would advise using it in this case.

HTML Purifier's configuration documentation describes how to specify which classes and attributes should be allowed and what the purifier should do when it finds them.

http://htmlpurifier.org/docs/

别挽留 2024-08-04 00:11:53

I would do something like this with jQuery. Place this in your page header:

$(document).ready(function () {
    // Select every p element on the page.
    $("p").each(function () {
        $(this).removeAttr("class");
        // or $(this).removeClass("className");
    });
});
