Not sure how to handle JavaScript with Mechanize in this particular instance
I'm going to be accessing a number of accounts on Amazon's KDP - http://kdp.amazon.com/
My task is to log in to each account and check the account's earnings. Mechanize works great for logging in and dealing with the cookies and such, but the page that displays the account earnings uses JavaScript to populate the page dynamically.
I did a little bit of digging and found that the JavaScript sends out the following request:
https://kdp.amazon.com/self-publishing/reports/transactionSummary?_=1326419839161&marketplaceID=ATVPDKIKX0DER
The request is sent along with a cookie that contains a session ID, a token, and some random stuff. Every time I click a link to display the results, the numerical part of the GET URL above is different, even if it's the same link.
In response to the request, the browser then receives this (I cut out a bunch of it so it doesn't take up the whole page):
{"iTotalDisplayRecords":13,"iTotalRecords":13,"aaData":[["12/03/2011","<span title=\"Booktitle\">Hold That ...<\/span>","<span title=\"Author\">Amy .... <\/span>","B004PGMHEM","1","1","0","70%","4.47","0.06","4.47","0.01","0.00",""],
["","","","","","","","","","","","","<div class='grandtotal'>Total: $ 39.53<\/div>","Junk"]]}
I think I can use Mechanize's cookie jar to extract the cookies that are part of that request, but how do I figure out what that number is and how it's generated? The JavaScript files in the page source are cryptic on the best of days. Here's one of them:
http://kdp.amazon.com/DTPUIFramework/js/all-signin-thin.js
Is there a way to track down which JavaScript is running "behind the scenes," so to speak, after I click on something on the page, so that I can emulate that request with Mechanize?
Danke..
PS: I can't (or rather, I don't want to) use Watir for this task, because in theory I might be handling more than just a handful of accounts, so this has got to be pretty snappy.
Comments (2)
It's just a timestamp and it's only used for cache busting. Try this:
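A minimal sketch of the idea, not the answerer's original code: it assumes the `_` parameter is simply the current time in milliseconds since the Unix epoch (the value in the question, 1326419839161, corresponds to a date in January 2012) and that the server ignores its exact value. The helper name `transaction_summary_url` and the constant `BASE_URL` are hypothetical; the URL and the marketplace ID are the ones from the question.

```ruby
require 'uri'

# Endpoint copied from the question.
BASE_URL = 'https://kdp.amazon.com/self-publishing/reports/transactionSummary'

# Hypothetical helper: builds the report URL with a fresh millisecond
# timestamp in the "_" parameter. Assumption: the value is only there to
# defeat caching, so any current timestamp should do.
def transaction_summary_url(marketplace_id, now: Time.now)
  # Whole seconds plus the millisecond part, using integer arithmetic only.
  millis = now.to_i * 1000 + now.usec / 1000
  query = URI.encode_www_form('_' => millis, 'marketplaceID' => marketplace_id)
  "#{BASE_URL}?#{query}"
end

puts transaction_summary_url('ATVPDKIKX0DER')
```

With a Mechanize agent that has already logged in, passing this URL to `agent.get` should send the session cookies from the agent's cookie jar along with the request, and the `body` of the returned object would then hold the JSON shown in the question.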
Mechanize doesn't run JavaScript that is embedded in the page. It only retrieves the HTML.
If the page contains JavaScript, Mechanize can see it, and you can use Nokogiri, which Mechanize uses internally, to retrieve the <script> tags' content. But anything that would be loaded as a result of the JavaScript being executed in a browser will not be loaded by Mechanize. Watir is the solution for that, because it drives the browser itself, which will interpret and run the JavaScript in the page.
You can step through the pages in a browser and use Firebug to look at the source code and get an idea of what is running. From that information you can get an understanding of what the JavaScript is doing, and then use Mechanize and Nokogiri to extract the information you need from a page to build up your next URLs, but it can be a lot of work.
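In this particular case the report endpoint returns JSON rather than HTML, so the extraction step can be sketched with the standard library alone. The example below assumes the response keeps the shape quoted in the question (an `aaData` array whose last row carries a `grandtotal` cell). `SAMPLE_BODY` is a trimmed stand-in for the real response, and `grand_total` is a hypothetical helper. It uses a regular expression instead of Nokogiri purely to stay dependency-free.

```ruby
require 'json'

# Trimmed sample shaped like the response quoted in the question; the real
# response has more columns and rows.
SAMPLE_BODY = <<~'BODY'
  {"iTotalDisplayRecords":13,"iTotalRecords":13,
   "aaData":[["12/03/2011","<span title=\"Booktitle\">Hold That ...<\/span>","B004PGMHEM","4.47"],
             ["","","<div class='grandtotal'>Total: $ 39.53<\/div>","Junk"]]}
BODY

# Hypothetical helper: finds the cell containing the grand total and returns
# the amount as a Float, or nil when no such cell is present.
def grand_total(body)
  cells = JSON.parse(body).fetch('aaData', []).flatten
  cell = cells.find { |c| c.to_s.include?('grandtotal') }
  return nil unless cell

  # Pull the number that follows "Total: $", tolerating thousands separators.
  match = cell.to_s.match(/Total:\s*\$\s*([\d,]+\.\d+)/)
  match && match[1].delete(',').to_f
end

puts grand_total(SAMPLE_BODY)
```

If the markup inside the cells turns out to be more varied than this, parsing each cell with Nokogiri would be the sturdier choice.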
What you're asking is similar to many others' questions about Mechanize and JavaScript. I'd recommend you look at these SO links to get alternate ideas:
Or search Stack Overflow for questions about Ruby, JavaScript and Mechanize.