Remove all classes from p tags

Posted on 2024-07-28 00:11:53

I was just wondering if anyone knows a function to remove ALL classes from a string in PHP. Basically, I only want

<p>

tags rather than

<p class="...">

If that makes sense :)


无远思近则忧 2024-08-04 00:11:53

A fairly naive regex will probably work for you:

$html = preg_replace('/class=".*?"/', '', $html);

I say naive because it would misfire if your body text happened to contain class="something" for some reason: that text would be stripped too. It could be made a little more robust by only looking for class="..." inside angle-bracketed tags, if need be.
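The more robust variant hinted at above, which only touches class attributes inside p tags, might look something like the following. This is a minimal sketch rather than a complete solution: the function name stripClassInsidePTags is made up for illustration, and it assumes that attribute values never contain a ">" character.

```php
<?php
// Hypothetical helper (not from the original answer): removes the class
// attribute only when it appears inside an opening <p ...> tag.
function stripClassInsidePTags(string $html): string
{
    return preg_replace_callback(
        '/<p\b[^>]*>/i',                       // each opening <p ...> tag
        function (array $m): string {
            // Within that tag only, drop class="..." or class='...'
            return preg_replace('/\s+class\s*=\s*("[^"]*"|\'[^\']*\')/i', '', $m[0]);
        },
        $html
    );
}

echo stripClassInsidePTags('<p class="intro" id="x">Use class="note" here</p>'), PHP_EOL;
// prints: <p id="x">Use class="note" here</p>
```

Because the inner replacement only ever sees the text of the tag itself, body text that happens to contain class="something" is left alone.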


过气美图社 2024-08-04 00:11:53

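Maybe this is overkill for your needs, but for parsing, validating, and sanitizing HTML data, the best tool I know of is HTML Purifier.

It lets you define which tags and which attributes are OK, and/or which are not, and it gives you valid, clean (X)HTML as output.

(Using regexes to "parse" HTML seems fine at first... and then, once you want to add specific things, it generally becomes hard to understand and maintain.)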


恰似旧人归 2024-08-04 00:11:53

$html = "<p id='fine' class='r3e1 b4d 1' style='widows: inherit;'>";
$html = preg_replace('/\sclass=[\'"][^\'"]*[\'"]/', '', $html);

If you are up against HTML exported from Microsoft Office, you'll need more than class removal, but HTML Tidy has a configuration flag just for Microsoft Office!

Otherwise, this should be safer than some of the other answers, given that they are a little greedy and you don't know which kind of quoting will be used (' or ").

Note: The pattern is actually /\sclass=['"][^'"]*['"]/ but, as it contains both double quotes (") and apostrophes ('), I had to escape every occurrence of one of them (\') to wrap the pattern in a PHP string.
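One way to cope with not knowing which quote style is used is to capture the opening quote and require the same character to close the value. The sketch below does that with a backreference; the function name removeClassAttributes is invented for illustration, and it still treats the input as plain text, so class="..." in body text would be removed as well.

```php
<?php
// Hypothetical helper (not from the original answer).
function removeClassAttributes(string $html): string
{
    // (["\']) captures the opening quote and \1 demands the same closing quote,
    // so a value such as class="it's" is matched as a whole.
    // .*? is lazy, so the match stops at the first matching closing quote.
    return preg_replace('/\sclass=(["\']).*?\1/s', '', $html);
}

echo removeClassAttributes("<p id='fine' class='r3e1 b4d 1' style='widows: inherit;'>"), PHP_EOL;
// prints: <p id='fine' style='widows: inherit;'>
```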


九厘米的零° 2024-08-04 00:11:53

Load the HTML into a DOMDocument, then import that into SimpleXML. Run an XPath query for all p elements and loop through them. On each iteration, set the class attribute to a placeholder value such as "killmeplease".

When that's done, output the SimpleXML object as XML again (which, by the way, may change the HTML, but usually only for the better), and you will have an HTML string in which every p has a class of "killmeplease". Use str_replace to actually remove them.

Example:

$html_file = "somehtmlfile.html";

$dom = new DOMDocument();
$dom->loadHTMLFile($html_file);

$xml = simplexml_import_dom($dom);

$paragraphs = $xml->xpath("//p");

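foreach ($paragraphs as $paragraph) {
    // Overwrite (or add) the class with a placeholder that is easy to find
    $paragraph['class'] = "killmeplease";
}

$new_html = $xml->asXML();

// Remove the placeholder attribute together with its leading space
$better_html = str_replace(' class="killmeplease"', "", $new_html);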

Or, if you want simpler code and don't mind wrestling with preg_replace, you could go with:

$html_file = "somehtmlfile.html";
$html_string = file_get_contents($html_file);

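// Group 1: the tag up to the class attribute; group 3: the rest of the tag
$bad_p_class = '/(<p\b[^>]*?)\s+class\s*=\s*("[^"]*"|\'[^\']*\')([^>]*>)/i';

$better_html = preg_replace($bad_p_class, '$1$3', $html_string);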

The tricky part with regular expressions is that quantifiers are greedy by default, and a "." does not match a line break unless the s modifier is set, so a p tag that is split across lines can cause problems. But give either of those a shot.
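The DOM-based idea above can also be carried out without the placeholder class, by removing the attribute directly on each element. The following is a minimal sketch, not the answer author's code: the function name stripParagraphClasses is made up, it works on an HTML fragment held in a string rather than a file, and it assumes the input is plain ASCII (other encodings need extra handling when loading).

```php
<?php
// Hypothetical helper (not from the original answer).
function stripParagraphClasses(string $html): string
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely perfectly valid; collect parser warnings
    // internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Remove the class attribute from every <p>; other elements are untouched.
    foreach ($dom->getElementsByTagName('p') as $p) {
        $p->removeAttribute('class');
    }

    // Serialize only the children of <body> so the result is a fragment again.
    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo stripParagraphClasses('<p class="intro lead" id="a">Hello</p><div class="keep"><p class="x">World</p></div>');
```

As with the SimpleXML route, the parser may normalize the markup (quoting, whitespace between block elements), so the output is not guaranteed to be byte-for-byte identical to the input apart from the removed attributes.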


无需解释 2024-08-04 00:11:53

HTML Purifier

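HTML can be very tricky to handle with regular expressions, because there are hundreds of ways the code can be written or formatted.

HTML Purifier is a mature open-source library for cleaning up HTML. I would suggest using it in this case.

In HTML Purifier's configuration documentation, you can specify which classes and attributes should be allowed and what the purifier should do when it finds them.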

http://htmlpurifier.org/docs/


别挽留 2024-08-04 00:11:53

I would do something like this with jQuery. Place this in your page header:

$(document).ready(function () {
    // Select every <p> element; the selector must be a string
    $("p").each(function () {
        $(this).removeAttr("class");
        // or, to drop a single class: $(this).removeClass("className");
    });
});



About the author: 病女
