PHP: Regular expressions and stripping specific tags
I am looking for a way to strip all anchor tags (including their link text). I also want everything from the first ',' up to the <br> to be removed, but the <br> itself should remain.
Dirty input:
Abstractor HLTH<br>
Account Representative, Major <a href="#P">P</a><br>
Accountant <a href="#NP">NP</a>, <a href="#M">M</a>, <a href="#REA">REA</a>, <a href="#SKI">SKI</a><br>
It should look like this:
Abstractor HLTH<br>
Account Representative<br>
Accountant <br>
Please help!
--
The full dirty text follows:
$str = sprintf('
Abstractor HLTH<br>
Account Representative, Major <a href="#P">P</a><br>
Accountant <a href="#NP">NP</a>, <a href="#M">M</a>, <a href="#REA">REA</a>, <a href="#SKI">SKI</a><br>
Accountant, Cost I & II (See Cost Accountant I, II) <a href="#FR">FR</a><br>
Accountant, General <a href="#G">G</a><br>
Accountant, General I (Junior) (See General Accountant) <a href="#FR">FR</a>, <a href="#O/G">O/G</a>, <a href="#W">W</a><br>
Accountant, General II (Intermediate) (See General Accountant) <a href="#FR">FR</a>, <a href="#O/G">O/G</a>, <a href="#W">W</a>, <a href="#HA">HA</a> <br>
Accountant, General III (Senior) (See General Accountant) <a href="#FR">FR</a>, <a href="#O/G">O/G</a>, <a href="#W">W</a> <br>
');
Comments (6)
Normally it's a bad idea to use regex on HTML strings, but assuming all your links are formed like that, using preg_replace() shouldn't pose problems.
To the other answers suggesting strip_tags(): it won't erase the text contained by a pair of HTML tags that it strips. For example, <a href="#P">P</a> becomes P, which isn't quite what the OP wants.
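For the preg_replace() route, a minimal sketch could look like the following. It is not the answerer's original code; the function name cleanJobTitles() and both patterns are assumptions that only fit input shaped like the question's sample (simple, non-nested anchors and a literal <br> at the end of each entry).

```php
<?php
// Hypothetical helper; assumes simple <a ...>label</a> anchors and
// entries that each end in a literal <br>, as in the question's sample.
function cleanJobTitles(string $html): string
{
    // 1. Remove every anchor together with its label text.
    $html = preg_replace('~<a\b[^>]*>.*?</a>~is', '', $html);

    // 2. Remove everything from the first comma of an entry up to,
    //    but not including, the <br> that ends it. The lookahead
    //    keeps the <br> in place.
    $html = preg_replace('~,.*?(?=<br>)~i', '', $html);

    return $html;
}

$dirty = 'Abstractor HLTH<br>
Account Representative, Major <a href="#P">P</a><br>
Accountant <a href="#NP">NP</a>, <a href="#M">M</a>, <a href="#REA">REA</a>, <a href="#SKI">SKI</a><br>';

echo cleanJobTitles($dirty);
// Abstractor HLTH<br>
// Account Representative<br>
// Accountant <br>
```

Removing the anchors first matters here: once they are gone, whatever is left between the first comma and the <br> is only separators and qualifiers, so the second pattern can stay simple.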
I would strongly advise using HTML Purifier http://htmlpurifier.org/
It is fairly simple to set up, has an excellent reputation, and is extremely powerful.
Use strip_tags() for the tags, and str_replace() with strpos() for the other thing (removing the text between the comma and the <br>).
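A rough sketch of that idea, handling one entry at a time. The helper name trimEntry() is made up for illustration, and it uses substr() rather than str_replace() to cut at the comma position found by strpos(), which is a swap on my part.

```php
<?php
// Hypothetical helper: processes one entry (the text before a <br>).
function trimEntry(string $entry): string
{
    // Cut the entry at its first comma, if it has one.
    $comma = strpos($entry, ',');
    if ($comma !== false) {
        $entry = substr($entry, 0, $comma);
    }

    // Remove any tags that are left. strip_tags() keeps the text
    // inside the tags it removes.
    return strip_tags($entry);
}

// Sample input written on one line so that splitting on <br> gives clean entries.
$dirty = 'Abstractor HLTH<br>Account Representative, Major <a href="#P">P</a><br>Accountant <a href="#NP">NP</a>, <a href="#M">M</a><br>';

$clean = '';
foreach (explode('<br>', $dirty) as $entry) {
    if (trim($entry) !== '') {
        $clean .= trimEntry($entry) . "<br>\n";
    }
}

echo $clean;
// Abstractor HLTH<br>
// Account Representative<br>
// Accountant NP<br>
```

Note the last line: an anchor that sits before the first comma loses its tags but keeps its label (NP), so this differs from the output requested in the question for that entry.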
HTML Purifier is your friend. It has flexible options, and is very sophisticated. Doing such things with str_replace or regular expressions is wrong.
strip_tags has a second argument which allows you to supply a string of allowable tags. It will strip all tags except the ones you supply, for example strip_tags($str, '<br>').
This will strip everything apart from br tags.
As KingCrunch says, use str_replace and strpos for the rest.
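To show what the allowed-tags argument does on one line of the sample input, here is a small sketch (the variable names are only illustrative):

```php
<?php
$line = 'Account Representative, Major <a href="#P">P</a><br>';

// Keep only <br>. Every other tag is removed, but the text
// between the removed tags stays in the string.
$kept = strip_tags($line, '<br>');

echo $kept;
// Account Representative, Major P<br>
```

The <br> survives and the anchor markup is gone, but the label P and the ", Major" part are still there, so the comma-to-<br> removal remains a separate step.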