PHP: regular expressions and stripping specific tags

Posted 2024-10-06 21:17:52, 1549 characters, 3 views, 0 comments

I am looking for a way to strip all anchor tags. I also want everything from ',' to <br> to be removed, but the <br> itself should remain.

Dirty input:

Abstractor HLTH<br>
Account Representative, Major <a href="#P">P</a><br>
Accountant <a href="#NP">NP</a>, <a href="#M">M</a>, <a href="#REA">REA</a>, <a href="#SKI">SKI</a><br>

It should look like this:

Abstractor HLTH<br>
Account Representative<br>
Accountant <br>

Please help!

The following is the full dirty text:

$str = sprintf('

Abstractor HLTH<br>
Account Representative, Major <a href="#P">P</a><br>

Accountant <a href="#NP">NP</a>, <a href="#M">M</a>, <a href="#REA">REA</a>, <a href="#SKI">SKI</a><br>
Accountant, Cost I & II (See Cost Accountant I, II) <a href="#FR">FR</a><br>
Accountant, General <a href="#G">G</a><br>
Accountant, General I (Junior) (See General Accountant) <a href="#FR">FR</a>, <a href="#O/G">O/G</a>, <a href="#W">W</a><br>

Accountant, General II (Intermediate) (See General Accountant) <a href="#FR">FR</a>, <a href="#O/G">O/G</a>, <a href="#W">W</a>, <a href="#HA">HA</a> <br>
Accountant, General III (Senior) (See General Accountant) <a href="#FR">FR</a>, <a href="#O/G">O/G</a>, <a href="#W">W</a> <br>

');

Comments (6)

末骤雨初歇 2024-10-13 21:17:52

Normally it's a bad idea to use regex to deal with HTML strings, but assuming all your links are formed like that, using preg_replace() shouldn't pose problems. Try this:

// Removes all links
$str = preg_replace("/<a href=\"#([A-Z\\/]+?)\">\\1<\\/a>(?:, )?/i", "", $str);

// Strip the comma and everything from the comma
// to the next <br> in the line
$str = preg_replace("/,(.*?)(?=<br>)/i", "", $str);
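
To check the two patterns above against the sample from the question, here is a minimal sketch. The three input lines are copied from the question. Writing the patterns as single-quoted strings with `~` as the delimiter is my own choice to cut down on backslash escaping; it is meant to be equivalent to the patterns above, not a replacement for them.

```php
<?php
// Sample input taken from the question (three lines, each ending in <br>).
$str = 'Abstractor HLTH<br>' . "\n"
     . 'Account Representative, Major <a href="#P">P</a><br>' . "\n"
     . 'Accountant <a href="#NP">NP</a>, <a href="#M">M</a>, <a href="#REA">REA</a>, <a href="#SKI">SKI</a><br>';

// Step 1: remove each anchor whose text repeats its fragment id,
// together with an optional ", " separator that follows it.
$str = preg_replace('~<a href="#([A-Z/]+?)">\1</a>(?:, )?~i', '', $str);

// Step 2: remove everything from the first comma on a line
// up to (but not including) the next <br>.
$str = preg_replace('~,.*?(?=<br>)~i', '', $str);

echo $str, "\n";
```

For this input the script prints the three lines the question asks for: `Abstractor HLTH<br>`, `Account Representative<br>` and `Accountant <br>`. The approach only holds as long as every link has the `<a href="#X">X</a>` shape shown in the question.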

To the other answers suggesting strip_tags(): it removes the tags themselves, but it won't erase the text contained by a pair of HTML tags that it strips. For example,

Accountant <a href="#NP">NP</a>

becomes

Accountant NP

which isn't quite what the OP wants.
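The behaviour described above is easy to confirm with a short sketch. The input line is a shortened version of one from the question, and `'<br>'` as the second argument (the list of tags to keep) is taken from the other answers in this thread.

```php
<?php
$line = 'Accountant <a href="#NP">NP</a>, <a href="#M">M</a><br>';

// strip_tags() removes the <a> tags but keeps the text between them;
// the second argument lists tags that should be left in place.
$stripped = strip_tags($line, '<br>');

echo $stripped, "\n"; // Accountant NP, M<br>
```

The link labels `NP` and `M` survive, which is why strip_tags() alone does not produce the output the question asks for.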

烂柯人 2024-10-13 21:17:52

I would strongly advise using HTML Purifier http://htmlpurifier.org/

It is fairly simple to set up, has an excellent reputation, and is extremely powerful.

近箐 2024-10-13 21:17:52

strip_tags() for the tags, str_replace() with strpos() for the other thing.

汐鸠 2024-10-13 21:17:52

HTML Purifier is your friend. It has flexible options, and is very sophisticated. Doing such things with str_replace or regular expressions is wrong.

贪了杯 2024-10-13 21:17:52

strip_tags has a second argument which allows you to supply a string of allowable tags. It will strip all tags except the ones you supply:

$string = strip_tags($string, '<br>'); // will leave <br>-tags in place

不羁少年 2024-10-13 21:17:52

$clean_string = strip_tags($original_string, '<br>');

This will strip everything apart from br tags.

As KingCrunch says, str_replace and strpos for the rest.
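
One way the strip_tags() plus strpos() suggestion might be put together, as a rough sketch only. The helper name `cleanLines` is hypothetical, splitting on a literal `<br>` assumes the markup is as regular as in the question, and the cut at the comma is done with substr() rather than str_replace(), which is my substitution.

```php
<?php
// Hypothetical helper: not from any answer, just one way to combine the suggestions.
function cleanLines(string $html): string
{
    $out = [];
    // Assumes every record ends with a literal "<br>".
    foreach (explode('<br>', $html) as $line) {
        $text = trim(strip_tags($line));      // drop tags, keep their text
        if ($text === '') {
            continue;                         // skip empty fragments
        }
        $comma = strpos($text, ',');
        if ($comma !== false) {
            $text = substr($text, 0, $comma); // keep only what precedes the first comma
        }
        $out[] = rtrim($text) . '<br>';
    }
    return implode("\n", $out);
}

$dirty = 'Abstractor HLTH<br>' . "\n"
       . 'Account Representative, Major <a href="#P">P</a><br>' . "\n"
       . 'Accountant <a href="#NP">NP</a>, <a href="#M">M</a><br>';

echo cleanLines($dirty), "\n";
```

This prints `Abstractor HLTH<br>`, `Account Representative<br>` and `Accountant NP<br>`. The last line shows the limitation: because strip_tags() keeps link text, a line whose first link comes before any comma retains that link's label, so this route does not fully reproduce the output requested in the question.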
