Can't capture HTML tags using a regular expression!
Possible Duplicate:
How to parse HTML with PHP?
crawling a html page using php?
I'm trying to find a way to detect HTML tags.
So I tried using the preg_match_all function to find them.
Here is the code I used:
$code = "<div>This is a test</div>";
preg_match_all("/(<[^<>]+>)([^<>]+)(<[^<>]+>)/", $code, $matches);
var_dump($matches);
When I ran this code, the page returned:
array(4) { [0]=> array(1) { [0]=> string(25) "
This is a test
" } [1]=> array(1) { [0]=> string(5) "
" } [2]=> array(1) { [0]=> string(14) "This is a test" } [3]=> array(1) { [0]=> string(6) "
" } }
As you can see in the arrays, the <div> and </div> tags were not detected.
Can you help me and tell me exactly where the problem is?
Sorry for my English.
Thanks,
Comments (1)
Please see: RegEx match open tags except XHTML self-contained tags
As Bobince "explains", you shouldn't use regexes to parse HTML.
Since you're using PHP, you can check out DOMDocument, which allows you to safely parse HTML. Take a look at the reference material, attempt to incorporate DOMDocument into your app, and if you still have problems, ask a new question or appropriately edit this one.
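As a rough illustration of the DOMDocument suggestion, here is a minimal sketch that parses the snippet from the question. The `$tags` array and its keys are hypothetical names chosen for this example, not part of the original code, and the sketch assumes the goal is to list each `<div>` with its text.

```php
<?php
// Sketch only: read the question's snippet with DOMDocument
// instead of a regular expression.
$code = "<div>This is a test</div>";

$doc = new DOMDocument();
$doc->loadHTML($code);

// $tags and its keys are illustrative names, not from the original post.
$tags = [];
foreach ($doc->getElementsByTagName('div') as $element) {
    $tags[] = [
        'tag'  => $element->nodeName,        // element name
        'text' => $element->textContent,     // text inside the element
        'html' => $doc->saveHTML($element),  // element serialized back to markup
    ];
}

// Escape before echoing so a browser displays the tags instead of rendering them.
echo htmlspecialchars(print_r($tags, true));
```

Note that the lengths in the question's var_dump output (string(5), string(6), and string(25) for the full match) are consistent with `<div>` and `</div>` having been captured and then rendered by the browser as markup, so escaping the output with htmlspecialchars may be all that is needed to see them.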