Using PHP to replicate a JavaScript request (with its arguments) to a remote server
I am building an application that grabs the HTML source from various sites.
Using XPath or Simple HTML DOM, I can then quite easily parse this HTML and dump it into a database, etc.
Unfortunately, this approach does not work for one particular site.
This is because the site loads its content with JavaScript, so most of its content is not visible in the HTML source.
I have googled this over and over and read loads of threads on the subject here on Stack Overflow, but I'm still not sure how to go about solving this problem.
Here is the important part of the code this site uses to display its content.
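<script type="text/javascript" src="/jquery-1.3.2.min.js"></script>
<script>
var example = {
    // Rewrites the "Filter:<value>" segment of the service URL, then injects
    // a <script> tag pointing at the rewritten URL (a JSONP-style request).
    getServiceCall: function(url) {
        var srtPos = url.indexOf('Filter');
        var endPos = url.indexOf('/', srtPos);
        var filter = $.getUrlVar("Filter");
        var filterInServiceUrl = url.slice(srtPos, endPos).split(":");
        // Prefer ?Filter=... from the page's own URL; otherwise fall back to
        // the value embedded in the service URL.
        url = (filter)
            ? url.slice(0, srtPos) + filter + url.slice(endPos, url.length)
            : url.slice(0, srtPos) + filterInServiceUrl[1] + url.slice(endPos, url.length);
        document.writeln('<scri' + 'pt src="' + url + '" type="text/javascript"> </sc' + 'ript>');
    }
};
$.extend({
    // The bodies of getUrlVars/getUrlVar were cut short in this excerpt; they are
    // completed here with the common jQuery getUrlVars pattern, which may differ
    // from the site's exact code.
    getUrlVars: function() {
        var vars = [], hash;
        var hashes = window.location.href.slice(window.location.href.indexOf('?') + 1).split('&');
        for (var i = 0; i < hashes.length; i++) {
            hash = hashes[i].split('=');
            vars.push(hash[0]);
            vars[hash[0]] = hash[1];
        }
        return vars;
    },
    getUrlVar: function(name) {
        return $.getUrlVars()[name];
    }
});
</script>
<div id="content">
<script language="javascript" type="text/javascript">
// Callback invoked by the injected script; puts the returned HTML into #content.
function doPerItem(html){ $("#content").html(html.toString()); }
example.getServiceCall('http://www.example.com/?callback=doPerItem');
</script>
</div>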
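Before anything appears on the page, getServiceCall rewrites the Filter:<value> segment of the service URL (using ?Filter=... from the page's own address when present) and only then injects the <script> tag, so replicating the request means replicating that rewrite first. A minimal sketch of the rewrite as a plain function, assuming the real service URL contains a Filter:<value> path segment (the anonymized example.com URL in the excerpt does not, so the URL used below is hypothetical). The name buildServiceUrl is hypothetical, and its two guards are additions that the original code does not have.

```javascript
// Reproduces the URL rewriting done by example.getServiceCall, without a browser.
// serviceUrl: the URL passed to getServiceCall.
// pageFilter: the value of ?Filter=... on the page's own URL, or null if absent.
function buildServiceUrl(serviceUrl, pageFilter) {
  const srtPos = serviceUrl.indexOf("Filter");
  // Guard added for this sketch: the original assumes "Filter" is present.
  if (srtPos === -1) {
    return serviceUrl;
  }
  let endPos = serviceUrl.indexOf("/", srtPos);
  // Guard added for this sketch: a missing "/" means the segment runs to the end.
  if (endPos === -1) {
    endPos = serviceUrl.length;
  }
  // "Filter:abc" -> ["Filter", "abc"]
  const filterInServiceUrl = serviceUrl.slice(srtPos, endPos).split(":");
  // Same truthiness check as the original ternary.
  const replacement = pageFilter ? pageFilter : filterInServiceUrl[1];
  return serviceUrl.slice(0, srtPos) + replacement + serviceUrl.slice(endPos);
}

// Hypothetical service URL containing a "Filter:<value>" path segment.
const url = "http://www.example.com/Filter:abc/items?callback=doPerItem";
console.log(buildServiceUrl(url, null));  // http://www.example.com/abc/items?callback=doPerItem
console.log(buildServiceUrl(url, "xyz")); // http://www.example.com/xyz/items?callback=doPerItem
```

The resulting URL is what the browser fetches via the injected tag, and it is the URL a server-side script would need to request.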
Using Inspect Element in Google Chrome, I can see that there is a file that contains the HTML source I want.
How can I use PHP to make the same request, with the same arguments, to the remote server and then save the response to a file?
I will then be in a position to parse it with XPath or Simple HTML DOM, just like the other sites.
Your help will be much appreciated.
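The file seen in Inspect Element is fetched with an ordinary HTTP GET (triggered by the injected <script> tag), so any HTTP client can repeat the request; a browser is only needed to execute the response. A minimal sketch in Node.js, assuming Node 18+ for the global fetch; in PHP the same steps map onto file_get_contents or cURL followed by file_put_contents. The helper name saveServiceResponse, the URL, and the output path are hypothetical, and some sites may also check headers such as User-Agent or Referer, which are sent only if you pass them.

```javascript
const fs = require("fs/promises");

// Hypothetical helper: GET the service URL and write the raw body to a file.
// fetchImpl defaults to the global fetch (Node 18+); it can be replaced for testing.
async function saveServiceResponse(serviceUrl, outPath, headers = {}, fetchImpl = fetch) {
  const response = await fetchImpl(serviceUrl, { headers });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  // Loaded through a <script> tag, the body is expected to be JavaScript
  // (e.g. a callback call) rather than plain HTML; it is saved unchanged.
  const body = await response.text();
  await fs.writeFile(outPath, body, "utf8");
  return body;
}

// Usage (hypothetical URL and output path):
// saveServiceResponse("http://www.example.com/?callback=doPerItem", "response.js",
//   { "User-Agent": "Mozilla/5.0" })
//   .then(() => console.log("saved"))
//   .catch((err) => console.error(err));
```

Saving the raw body first keeps the network step separate from parsing, so the saved file can be inspected before deciding how to extract the HTML from it.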
I don't know of any PHP-based remote access tool (including cURL) that interprets JavaScript. Selenium (normally used for testing) might do this, but Selenium-RC did not work for me at all with PHP, and the Selenium IDE had bugs.
You cannot practically use Ajax, because that doesn't resolve JavaScript either (maybe you could resolve it somehow with eval(), which has its own security concerns), and JSONP will only work if the remote server deliberately offers an API for getting its data (you could write your own proxy and then serve the data as JSONP, but then you'd still have the problem of resolving JavaScript). What you could do (though it has real security risks for your site):
Unfortunately, you can't avoid step 1, because you can't listen in on an iframe unless it comes from the same domain as yours.
Note that if the site you are visiting crafts its JavaScript in a certain way, it could access your containing HTML and do things like grab your users' cookies in order to steal passwords, or find out your domain or what's showing on your page.
There may be better solutions out there, but I'm not aware of any.
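On the JSONP point: the question's snippet requests the service with callback=doPerItem, and doPerItem(html) is presumably what the loaded script calls. If the service really answers in that JSONP style (an assumption; the actual response is not shown in the question), the HTML can be recovered without executing any JavaScript, by stripping the callback wrapper and decoding the string argument. A minimal sketch; the helper name unwrapJsonp is hypothetical, and in PHP the same idea could be expressed with a regular expression plus json_decode.

```javascript
// Hypothetical helper: extract the value passed to a JSONP callback such as
// doPerItem("..."). Assumes the argument is JSON-compatible (e.g. a double-quoted
// string); single-quoted JavaScript strings or other non-JSON literals make
// JSON.parse throw.
function unwrapJsonp(body, callbackName) {
  const start = body.indexOf(callbackName + "(");
  const end = body.lastIndexOf(")");
  if (start === -1 || end <= start) {
    throw new Error(`No call to ${callbackName}(...) found`);
  }
  const arg = body.slice(start + callbackName.length + 1, end).trim();
  // JSON.parse decodes escapes such as \" and \/ inside the string literal.
  return JSON.parse(arg);
}

// Hypothetical response body of the kind the page's <script> tag might receive.
const response = 'doPerItem("<div class=\\"item\\">Hello<\\/div>");';
console.log(unwrapJsonp(response, "doPerItem")); // <div class="item">Hello</div>
```

The decoded string is plain HTML, which could then be saved and parsed with XPath or Simple HTML DOM like the other sites.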