Getting website content that is not in the source code
I want to grab some financial data from sites like http://www.fxstreet.com/rates-charts/currency-rates/
Up to now I've been using liburl to fetch the page source and a regular-expression search to extract the data, which I then store in a file.
Yet there is a little problem:
On the page as I see it in the browser, the data is updated almost every second. When I open the source code, however, the data I'm looking for changes only every two minutes.
So my program only gets the data at a much lower time resolution than is possible.
I have two questions:
(i) How is it possible that source code which remains static for two minutes produces a table that changes every second? What is the mechanism?
(ii) How do I get the data with one-second time resolution, i.e. how do I read out such a changing table that's not shown in the source code?
Thanks in advance,
David
Comments (2)
You can use the network panel in Firebug to examine the HTTP requests being sent out (typically to fetch data) while the page is open. The page you referenced appears to be sending POST requests to http://ttpush.fxstreet.com/http_push/, then receiving and parsing a JSON response.
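As a rough illustration of the "receive and parse a JSON response" step, here is a minimal Python sketch. The payload format is not known from this thread, so the sample bytes and the field names (`pair`, `bid`, `ask`) are purely hypothetical; inspect a real response in the network panel to see what the server actually sends.

```python
import json


def parse_rates(raw: bytes):
    """Decode a JSON response body (UTF-8 bytes) into Python objects."""
    return json.loads(raw.decode("utf-8"))


# Hypothetical payload: the real field names must be taken from an
# actual response captured in the browser's network panel.
sample = b'{"pair": "EURUSD", "bid": 1.3021, "ask": 1.3023}'
rates = parse_rates(sample)
print(rates["pair"], rates["bid"], rates["ask"])
```

Once the data is a dict, there is no need for regular expressions on the HTML source at all.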
Try sending a POST request to http://ttpush.fxstreet.com/http_push/connect and see what you get.
It will continuously load new data.
EDIT:
You can use liburl or Python; it doesn't really matter. Under HTTP, when you browse the web, you send GET or POST requests.
Go to the website, open the Developer Tools (Chrome) or Firebug (Firefox plugin), and you will see that after all the data is loaded there is one request that doesn't close; it stays open.
When you have a website and want to fetch data continuously, there are a few techniques you can use:
The website you posted uses the second method: when it detects a POST request to that page, it keeps the connection open and dumps data continuously.
What you need to do is make a POST request to that page. You need to find out which parameters (if any) have to be sent. It doesn't matter how you make the request, as long as you send the right parameters.
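A minimal sketch of building such a POST request with Python's standard library follows. The URL is the one mentioned above; the parameter name `example_param` is hypothetical and only stands in for whatever the browser's network panel shows the page really sending. The code only constructs the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

PUSH_URL = "http://ttpush.fxstreet.com/http_push/connect"  # URL from this answer


def build_post(url, params):
    """Build a form-encoded POST request; supplying a body makes it a POST."""
    body = urlencode(params).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers)


# "example_param" is a placeholder, not a parameter the site is known to use.
request = build_post(PUSH_URL, {"example_param": "value"})
print(request.get_method(), request.full_url, request.data)

# To actually send it (network access required):
#   from urllib.request import urlopen
#   response = urlopen(request, timeout=30)
```

If the server expects a different body encoding (for example JSON), the `body` and `Content-Type` lines would need to change accordingly.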
You need to read the response with a delimiter: probably every time they want a piece of data processed, they send \n or some other delimiter.
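Reading with a delimiter could look roughly like the sketch below. It assumes the delimiter is `\n`, which is only a guess; the function name `iter_messages` and the sample bytes are made up for illustration. It works on any binary file-like object, so it is demonstrated here on an in-memory stream instead of a live connection.

```python
import io


def iter_messages(stream, delimiter=b"\n", chunk_size=1024):
    """Yield delimiter-separated messages from a binary stream as they arrive."""
    # Prefer read1 when available: it returns whatever bytes are ready
    # instead of waiting until chunk_size bytes have accumulated.
    read = getattr(stream, "read1", stream.read)
    buffer = b""
    while True:
        chunk = read(chunk_size)
        if not chunk:  # the server closed the connection
            break
        buffer += chunk
        while delimiter in buffer:
            message, buffer = buffer.split(delimiter, 1)
            if message:  # skip empty keep-alive lines
                yield message
    if buffer:  # trailing data without a final delimiter
        yield buffer


# Simulated stream; a real one would be the object returned by urlopen().
fake_stream = io.BytesIO(b'{"a":1}\n{"b":2}\npartial')
messages = list(iter_messages(fake_stream, chunk_size=4))
print(messages)
```

With a real connection, the response object returned by `urlopen` can be passed in place of `fake_stream`, and each yielded message can be decoded and stored as soon as it arrives.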
Hope this helps. If you still can't get around this, let me know and I'll get into more technical details.