PHP preg_match: finding and locating dynamic URLs in an HTML page
I need help with a regex that will find a link that comes in different formats, depending on how it was inserted into the HTML page.
I am able to read the pages into PHP; I just can't come up with the right regex to find the URLs and isolate them.
There are several ways they get inserted. Sometimes they are plain-text URLs, and sometimes a link (an <a> tag) is wrapped around them. Occasionally, text that is not part of the link is even inserted right after it without any spacing.
Neither the ArticleID nor the article key (AidKey) is ever the same from one URL to the next. The AidKey, however, always ends with a digit. If this is possible, I could sure use the help. Thanks
Here are a few examples.
http://www.example.com/ArticleDetails.aspx?ArticleID=3D10045411&AidKey=3D-2086622941
http://example.com/ArticleDetails.aspx?ArticleID=10919199&AidKey=1956996566
<a href="http://www.example.com/ArticleDetails.aspx?ArticleID=10773616&AidKey=1998267392">http://www.example.com/ArticleDetails.aspx?ArticleID=10773616&AidKey=1998267392</a>
<a href="http://www.example.com/ArticleDetails.aspx?ArticleID=10773616&AidKey=1998267392">This is a link description</a>
http://example.com/ArticleDetails.aspx?ArticleID=10975137&AidKey=701321736this is not part of the url.
In the end, I am just looking for the URL itself, e.g. for the last example:
http://example.com/ArticleDetails.aspx?ArticleID=10975137&AidKey=701321736
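A minimal sketch of one possible pattern, assuming the host is always example.com (with or without www.) as in the examples; extractArticleUrls is a hypothetical helper name. It relies on the AidKey ending in digits: -?\d+ stops at the first non-digit, so text glued onto the key is left out. The optional 3D tolerates the quoted-printable "=3D" residue in the first example, and the optional amp; (an assumption, not seen in the examples) tolerates an HTML-escaped &amp; between the parameters.

```php
<?php
// Hypothetical helper: pull ArticleDetails.aspx URLs out of raw HTML or plain text.
// Assumes the host is example.com (optionally www.), as in the question.
function extractArticleUrls(string $html): array
{
    // (?:3D)?    tolerates quoted-printable residue ("=3D" instead of "=")
    // (?:amp;)?  tolerates an HTML-escaped "&amp;" between the parameters
    // -?\d+      AidKey may be negative and ends at its last digit, so text
    //            glued directly after it is not captured
    $pattern = '~https?://(?:www\.)?example\.com/ArticleDetails\.aspx\?'
             . 'ArticleID=(?:3D)?\d+&(?:amp;)?AidKey=(?:3D)?-?\d+~i';

    if (!preg_match_all($pattern, $html, $matches)) {
        return [];                      // no matches (or a PCRE error)
    }

    $urls = [];
    foreach ($matches[0] as $url) {
        // Normalise "=3D" -> "=" and "&amp;" -> "&"
        $url = str_ireplace(['=3D', '&amp;'], ['=', '&'], $url);
        $urls[$url] = true;             // using the URL as a key drops duplicates
    }
    return array_keys($urls);
}

$html = <<<'HTML'
http://www.example.com/ArticleDetails.aspx?ArticleID=3D10045411&AidKey=3D-2086622941
http://example.com/ArticleDetails.aspx?ArticleID=10919199&AidKey=1956996566
<a href="http://www.example.com/ArticleDetails.aspx?ArticleID=10773616&AidKey=1998267392">http://www.example.com/ArticleDetails.aspx?ArticleID=10773616&AidKey=1998267392</a>
<a href="http://www.example.com/ArticleDetails.aspx?ArticleID=10773616&AidKey=1998267392">This is a link description</a>
http://example.com/ArticleDetails.aspx?ArticleID=10975137&AidKey=701321736this is not part of the url.
HTML;

print_r(extractArticleUrls($html));
```

For the five sample lines this returns four distinct URLs: the <a href="...">...</a> examples produce the same URL in both the href and the label, which is why the results are de-duplicated. The "=3D" cleanup only makes sense if that residue really comes from quoted-printable encoding (e.g. HTML taken from an e-mail).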
Comments (2)
Do not use a regex on the raw HTML! Use an XML parser instead.
That way, $regexToMatchUrls only needs to be a regex that matches the URLs you are looking for, not any of the HTML, which is much simpler. Then you can take action whenever a match occurs.
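A sketch of the parser-first approach from this answer, using PHP's built-in DOMDocument and DOMXPath classes. The pattern assigned to $regexToMatchUrls and the findUrlsInHtml helper are illustrative assumptions, not code from the answer. The sample HTML escapes & as &amp;, which the parser decodes before the regex ever sees the values.

```php
<?php
// Assumed stand-in for the answer's $regexToMatchUrls: it only has to match
// a URL, because the parser has already separated text from markup.
$regexToMatchUrls = '~https?://(?:www\.)?example\.com/ArticleDetails\.aspx\?ArticleID=\d+&AidKey=-?\d+~';

// Hypothetical helper: parse the HTML, then run the regex on text and hrefs only.
function findUrlsInHtml(string $html, string $regex): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $found = [];
    // href attributes of links, plus every text node (plain-text URLs, link labels)
    foreach ($xpath->query('//a/@href | //text()') as $node) {
        if (preg_match_all($regex, $node->nodeValue, $m)) {
            foreach ($m[0] as $url) {
                $found[$url] = true;    // keys de-duplicate href/label pairs
            }
        }
    }
    return array_keys($found);
}

$html = '<p>See http://example.com/ArticleDetails.aspx?ArticleID=10975137&amp;AidKey=701321736this is not part of the url.</p>'
      . '<a href="http://www.example.com/ArticleDetails.aspx?ArticleID=10773616&amp;AidKey=1998267392">This is a link description</a>';

print_r(findUrlsInHtml($html, $regexToMatchUrls));
```

The design choice is the one the answer argues for: the parser deals with tags, attributes and entities, so the regex stays a plain URL pattern instead of also having to describe HTML.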
This regex works for me:
UPDATE: I added a \d at the end of the regex. To use it in PHP, you need to write it with delimiters and the m, s and i modifiers: /.../msi
PHP Example in action: http://ideone.com/N0TKM
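The answer's actual regex is not shown here, so this sketch uses two hypothetical stand-in patterns to illustrate why ending the pattern with \d helps with the "glued text" case. Both are written in the /.../msi form the answer mentions: because / is the delimiter, the slashes inside the URL must be escaped. The m and s modifiers only change how ^, $ and . behave, which these stand-ins do not use; i makes the match case-insensitive.

```php
<?php
// Hypothetical stand-in patterns (not the answer's regex), in /.../msi form.
$withoutDigit = '/https?:\/\/(?:www\.)?example\.com\/ArticleDetails\.aspx\?ArticleID=[-\w]+&AidKey=[-\w]+/msi';
$withDigit    = '/https?:\/\/(?:www\.)?example\.com\/ArticleDetails\.aspx\?ArticleID=[-\w]+&AidKey=[-\w]*\d/msi';

$line = 'http://example.com/ArticleDetails.aspx?ArticleID=10975137&AidKey=701321736this is not part of the url.';

// [-\w]+ happily swallows the glued-on word "this".
preg_match($withoutDigit, $line, $m1);
echo $m1[0], PHP_EOL;   // ends in ...&AidKey=701321736this

// [-\w]*\d must end on a digit, so the engine backtracks to the last digit of the key.
preg_match($withDigit, $line, $m2);
echo $m2[0], PHP_EOL;   // ends in ...&AidKey=701321736
```

This trick has a limit: if the glued-on text itself contains a digit (e.g. "701321736this2"), the match extends up to that digit.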