I have a bunch of strings, each containing an anchor tag and a URL.
Example string:
here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!
I want to strip out the anchor tag and everything in between.
Desired result:
here is a link. enjoy!
However, the URL in the href= portion doesn't always match the link text (sometimes it is a shortened URL, sometimes the text is just descriptive).
I'm having an extremely difficult time figuring out how to do this with either regular expressions or PHP functions. How can I remove an entire anchor tag/link from a string?
Thanks!
Looking at your result example, it seems like you're just removing the tags and their content. Did you want to keep what you stripped out or not? If not, you might be looking for strip_tags().
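For reference, here is a small sketch of what strip_tags() does with the example string from the question. Note that it removes only the markup and keeps the text between the tags, so the link text stays in the result. The variable names are just for illustration.

```php
<?php
// Example string taken from the question.
$input = 'here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!';

// strip_tags() drops the <a ...> and </a> markup but keeps the inner text.
$stripped = strip_tags($input);

echo $stripped, PHP_EOL;
```

So on its own this is a fit only if the link text should survive; removing the text as well needs one of the other approaches.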
You shouldn't use regex to parse HTML; use an HTML parser instead.
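A rough sketch of the parser route, assuming PHP's DOM extension is available. The function name strip_anchors_dom and the wrapping <div> are my own choices for illustration, not something from the question. It returns plain text, so any other markup in the string is dropped too.

```php
<?php
// Hypothetical helper: removes every <a> element (tag and contents)
// and returns the remaining text.
function strip_anchors_dom(string $html): string
{
    $doc = new DOMDocument();

    // Silence libxml warnings about fragment-style input.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so there is a single known container.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list to an array before removing nodes from it.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $anchor) {
        $anchor->parentNode->removeChild($anchor);
    }

    // The first <div> in document order is the wrapper added above.
    return $doc->getElementsByTagName('div')->item(0)->textContent;
}

$input = 'here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!';
echo strip_anchors_dom($input), PHP_EOL;
```

The space that preceded the tag is left in place, so some whitespace cleanup may still be wanted afterwards.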
But if you must use regex, and the inner contents of your anchor tags are guaranteed to be free of HTML like </a>, and each string is guaranteed to contain only one anchor tag as in the example case, then, and only then, you can use something like this: replace
/^(.+)<a.+<\/a>(.+)$/
with
$1$2
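As a sketch, that replacement could be wired up with preg_replace() as below. The function names are made up for illustration. The second function is a variant of my own (not part of the pattern above) that also swallows the whitespace in front of the tag, since the original pattern leaves the space before the period.

```php
<?php
// Hypothetical helper using the pattern from this answer.
// Assumes exactly one anchor tag per string, with text on both sides.
function remove_single_anchor(string $text): string
{
    return preg_replace('/^(.+)<a.+<\/a>(.+)$/', '$1$2', $text) ?? $text;
}

// Variant (an assumption, not from the answer): also removes whitespace
// before the tag and does not require surrounding text.
function remove_anchor_and_space(string $text): string
{
    return preg_replace('/\s*<a\b[^>]*>.*?<\/a>/is', '', $text) ?? $text;
}

$input = 'here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!';

echo remove_single_anchor($input), PHP_EOL;    // space before the period remains
echo remove_anchor_and_space($input), PHP_EOL; // matches the desired result
```

Both still rely on the anchor contents containing no nested </a>.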
Since your problem seems to be very specific, I think this should do it:
Just use your normal PHP string functions.
Output:
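One way a plain string-function approach might look is sketched below. This is an illustration under assumptions, not a definitive version: the helper name cut_anchor is hypothetical, it handles only the first anchor in the string, and the simple '<a' search would also match other tags that start with "a" (such as <abbr>).

```php
<?php
// Hypothetical helper: cuts the first <a ...>...</a> out of the string
// using only stripos(), substr() and rtrim().
function cut_anchor(string $text): string
{
    $start = stripos($text, '<a');
    if ($start === false) {
        return $text; // no anchor tag found
    }

    $end = stripos($text, '</a>', $start);
    if ($end === false) {
        return $text; // unterminated tag, leave the string alone
    }

    // Text before the tag, minus the trailing whitespace that preceded it.
    $before = rtrim(substr($text, 0, $start));
    // Everything after the closing </a>.
    $after = substr($text, $end + strlen('</a>'));

    return $before . $after;
}

$input = 'here is a link <a href="http://www.google.com">http://www.google.com</a>. enjoy!';
echo cut_anchor($input), PHP_EOL;
```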