How do I get the content of a div that is loaded by JavaScript/Ajax on a website?
I have a PHP script that loads page content from another website using cURL and the simple_html_dom PHP library. This works well. If I echo the returned HTML, I can see the div content there.
However, if I try to select only that div with simple_html_dom, it always comes back empty. At first I didn't know why. Now I know it's because its content is apparently populated by JavaScript/Ajax.
How can I fetch the site's content and then select the div's content after the JavaScript has populated it with the correct content?
Is that even possible?
Thanks!
Comments (5)
Yes, it's a piece of cake if you are interested only in the particular HTML that is returned by the Ajax call.
For this kind of screen scraping you could try phpQuery or Snoopy.
phpQuery has a web browser plugin, and Snoopy claims to simulate one.
You can always bind to the event that is fired when the XHR returns data to the browser and do your operations there.
Yes, it is possible.
What you need to do is the following:
For example, say you want to get the content of http://www.domain.com/page.html, and page.html retrieves some other data using Ajax, say $("#div").load("http://www.domain.com/ajax/data.php?time=48484&c=487387").
First, make a cURL request to page.html and extract the full URL of the Ajax call using PHP's preg_match() function, or an equivalent function in any other language. After that, make another cURL request to that URL - http://www.domain.com/ajax/data.php?time=48484&c=487387 - and read its content.
You're all set!
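The two requests described above might look roughly like the sketch below. It is only a sketch under some assumptions: the helper names (fetchUrl, extractLoadUrl, fetchAjaxContent) are made up for illustration, the .load() call is assumed to appear inline in the page with an absolute URL as a quoted string, and the regular expression would need adjusting for any other way the page issues its Ajax request.

```php
<?php
// Hypothetical helpers for illustration; they are not part of cURL or simple_html_dom.

/** Fetch a URL with cURL and return the response body, or null on failure. */
function fetchUrl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // seconds
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

/**
 * Find the URL passed to a jQuery .load() call on the given selector.
 * Assumes the call is written inline as $("selector").load("url").
 */
function extractLoadUrl(string $html, string $selector): ?string
{
    $pattern = '/\$\(\s*["\']' . preg_quote($selector, '/')
             . '["\']\s*\)\s*\.load\(\s*["\']([^"\']+)["\']/';
    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

/** Step 1: fetch the page. Step 2: fetch the Ajax URL found inside it. */
function fetchAjaxContent(string $pageUrl, string $selector): ?string
{
    $page = fetchUrl($pageUrl);
    if ($page === null) {
        return null;
    }
    $ajaxUrl = extractLoadUrl($page, $selector);
    return $ajaxUrl === null ? null : fetchUrl($ajaxUrl);
}

// Example use with the URLs from this answer:
// echo fetchAjaxContent('http://www.domain.com/page.html', '#div');
```

If the Ajax URL carries values that change on every page load (such as the time parameter in the example), extracting it fresh from page.html on each run, as above, is probably safer than hard-coding it.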
Unfortunately, JavaScript runs client-side, in a browser, so unless the page is loaded in a web browser there is no simple way to do it.
The only way I can think of is to have a browser running in the background on a server, reloading the page and automatically saving the generated result to a file that a PHP script can then fetch.
Well... I don't know of anyone who has implemented such an idea.
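For what it's worth, a variation on the background-browser idea can be sketched with a headless Chrome/Chromium build, which has a --dump-dom switch that prints the page's DOM after scripts have run. The sketch below runs the browser on demand and captures its output instead of saving to a file. The helper names are hypothetical, the binary name "chromium" is an assumption that varies between systems, and content that arrives late through Ajax may still be missing from the dump.

```php
<?php
// Hypothetical helpers for illustration only.

/**
 * Build a shell command that makes headless Chrome print the page's DOM.
 * The binary name is an assumption; it may be "chromium",
 * "chromium-browser", "google-chrome", or a full path on your system.
 */
function buildDumpDomCommand(string $url, string $binary = 'chromium'): string
{
    return escapeshellcmd($binary) . ' --headless --dump-dom ' . escapeshellarg($url);
}

/** Run the browser and return the rendered HTML, or null if nothing came back. */
function renderPage(string $url, string $binary = 'chromium'): ?string
{
    $html = shell_exec(buildDumpDomCommand($url, $binary));
    return is_string($html) && $html !== '' ? $html : null;
}

/** Return the inner HTML of the element with the given id, or null if absent. */
function innerHtmlById(string $html, string $id): ?string
{
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumes $id contains no double quotes.
    $nodes = (new DOMXPath($doc))->query('//*[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child); // serialize each child node
    }
    return $inner;
}

// Example use:
// $html = renderPage('http://www.domain.com/page.html');
// if ($html !== null) { echo innerHtmlById($html, 'div'); }
```

Starting a browser for every request is slow, so caching the rendered output (for example in a file, as suggested above) would likely be worthwhile.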
It is better to try to get the URL the div is being populated from. If the div's contents are generated through Ajax, for example, then fetching that data-source URL with cURL may make the data available to you as well.
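If you already know the data-source URL (for instance from the browser's network inspector), a direct request could be sketched as follows. The function names are hypothetical, and the extra headers are an assumption: some endpoints only answer when the request looks like the browser's own Ajax call, while others ignore these headers entirely.

```php
<?php
// Hypothetical helpers; the header choices are assumptions about what the target site checks.

/** Headers that make a cURL request resemble the browser's Ajax call. */
function ajaxRequestHeaders(string $refererUrl): array
{
    return [
        'X-Requested-With: XMLHttpRequest', // jQuery sends this on same-origin Ajax requests
        'Referer: ' . $refererUrl,          // the page that normally issues the call
    ];
}

/** Request the data URL directly and return the body, or null on failure. */
function fetchDataUrl(string $dataUrl, string $refererUrl): ?string
{
    $ch = curl_init($dataUrl);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_HTTPHEADER     => ajaxRequestHeaders($refererUrl),
        CURLOPT_TIMEOUT        => 15,   // seconds
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Example use:
// echo fetchDataUrl(
//     'http://www.domain.com/ajax/data.php?time=48484&c=487387',
//     'http://www.domain.com/page.html'
// );
```

The response is usually an HTML fragment or JSON, so it can be passed straight to simple_html_dom or json_decode() depending on what the endpoint returns.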