I've tested this, and it does work (hopefully my understanding of "work" matches yours). There are a few tricks involved. First, the images you need are produced by JavaScript rendering in the browser, so server-side screen scraping alone would not suffice (unless things got really complicated, and I'm not going there). Second, the content lives on a remote domain (www.google.com), which restricts how you can get at it from the browser: you can't fetch it via Ajax unless the Google server sends proper Access-Control-Allow-Origin headers, and you can't read it out of an iframe pointed directly at Google, because the same-origin policy blocks cross-domain DOM access. So I had to work around both limitations. One note: I realize you didn't ask for jQuery, but it honestly makes life so much easier here that I went ahead and used it. Here's how I got it working:
index.cfm
<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script>
$(function () {
    // Bind to the frame's load event, so we know the embedded resources
    // have all finished rendering (including the images we're after).
    $("#googleFrame").load(function () {
        // This simply includes the images from Google on the current page.
        $("#rippedGoogleImages")
            .html('') // remove the loading message
            .append($(this).contents().find("img.goog-serverchart-image")); // pull the loaded images out of the frame

        // If you just want to see the URLs of those images:
        /*
        $(this).contents().find("img.goog-serverchart-image").each(function () {
            console.log($(this).attr('src'));
        });
        */
    });

    // Point the frame at the local proxy (the trick is explained below).
    $("#googleFrame").attr("src", "googleProxy.cfm");
});
</script>
<iframe id="googleFrame" style="display:none"></iframe><!-- hidden iframe -->
<h1>Finance Images</h1>
<div id="rippedGoogleImages">
    Loading Images From Google...
</div>
The key to getting around the cross-domain restrictions is a file that makes the request to Google on the server, so the browser sees the content as coming from your own domain. googleProxy.cfm:
<cfhttp url="http://www.google.com/finance?q=NASDAQ:SQNM&fstype=ii">
<cfoutput>
#cfhttp.filecontent#
</cfoutput>
<!--- The next line injects the necessary base href to allow the resources (js, css, images, etc...) to resolve correctly when served from this new location --->
<cfhtmlhead text="<base href='http://www.google.com/finance/'>">
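As a rough illustration of why the base href matters, here is a small sketch of how a relative resource path resolves with and without it. The `resolveAgainstBase` helper, the `example.com` host, and the `chart?q=SQNM` path are all made up for the example; only the Google base URL comes from the proxy file above.

```javascript
// Hypothetical helper: resolve a path the way a browser resolves
// relative URLs against a page URL or a <base href>.
function resolveAgainstBase(path, baseHref) {
    return new URL(path, baseHref).href;
}

// Assumed location of the proxy page on your own server (illustrative only).
const proxyPageUrl = "http://example.com/googleProxy.cfm";
// The base href injected by googleProxy.cfm.
const googleBase = "http://www.google.com/finance/";
// Made-up relative path standing in for a resource referenced by the page.
const relativePath = "chart?q=SQNM";

// Without the base href, the path resolves against your own server,
// where the resource does not exist.
const withoutBase = resolveAgainstBase(relativePath, proxyPageUrl);
console.log(withoutBase); // http://example.com/chart?q=SQNM

// With the base href, it resolves against Google again.
const withBase = resolveAgainstBase(relativePath, googleBase);
console.log(withBase); // http://www.google.com/finance/chart?q=SQNM
```

The same resolution applies to scripts and stylesheets, which is why the frame can still render the charts even though the HTML is served from a different location.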
All accomplished without nasty server-side screen scraping or regular expressions.
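To make the cross-domain point concrete, here is a minimal sketch of the origin comparison that decides whether a page's script may reach into a frame's DOM. It is a simplification of what browsers actually do, and the `sameOrigin` helper and the `example.com` URLs are assumptions for illustration, not part of the solution above.

```javascript
// Hypothetical helper: two URLs share an origin when scheme, host,
// and port all match. Browsers apply a check along these lines before
// letting script in one document touch another document's DOM.
function sameOrigin(urlA, urlB) {
    return new URL(urlA).origin === new URL(urlB).origin;
}

// Assumed URLs for your own pages (illustrative only).
const pageUrl = "http://example.com/index.cfm";
const proxyUrl = "http://example.com/googleProxy.cfm";
// The remote page the proxy fetches on the server.
const googleUrl = "http://www.google.com/finance?q=NASDAQ:SQNM&fstype=ii";

// Frame pointed straight at Google: different origin, so the DOM is off limits.
const directFrameReadable = sameOrigin(pageUrl, googleUrl);
console.log(directFrameReadable); // false

// Frame pointed at the local proxy: same origin, so .contents() can read it.
const proxiedFrameReadable = sameOrigin(pageUrl, proxyUrl);
console.log(proxiedFrameReadable); // true
```

This is why the iframe loads googleProxy.cfm rather than the Google URL itself: the markup is the same, but it now arrives from your own origin.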