Getting website content that is not in the page source

Posted 2024-12-01 03:46:38

I want to grab some financial data from sites like http://www.fxstreet.com/rates-charts/currency-rates/

Up to now I'm using liburl to grab the source code and some regexp searches to get the data, which I afterwards store in a file.
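
In Python, that fetch-and-regexp pipeline might look roughly like the sketch below. This is only an illustration: "liburl" is assumed to mean the standard urllib module, and the regular expression is a placeholder, not the real pattern for this page.

```python
# Sketch of the fetch-and-regexp approach described above (assumptions:
# Python 3's urllib; the pattern below is a placeholder, not the page's
# actual markup).
import re
import urllib.request

URL = "http://www.fxstreet.com/rates-charts/currency-rates/"

def fetch_rate():
    with urllib.request.urlopen(URL) as response:
        html = response.read().decode("utf-8", errors="replace")
    # Hypothetical pattern: grab whatever number follows "EUR/USD" in the markup.
    match = re.search(r"EUR/USD.*?([\d.]+)", html, re.DOTALL)
    return match.group(1) if match else None

if __name__ == "__main__":
    rate = fetch_rate()
    if rate is not None:
        with open("rates.txt", "a") as f:
            f.write(rate + "\n")
```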

Yet there is a little problem:
On the page as I see it in the browser, the data is updated almost every second. When I open the source code, however, the data I'm looking for changes only every two minutes.
So my program only gets the data with a much lower time resolution than would be possible.

I have two questions:

(i) How is it possible that source code which remains static for two minutes produces a table that changes every second? What is the mechanism?

(ii) How do I get the data with one-second time resolution, i.e. how do I read out such a changing table that is not shown in the source code?

thanks in advance,
David

Comments (2)

感悟人生的甜 2024-12-08 03:46:38

You can use the network panel in FireBug to examine the HTTP requests being sent out (typically to fetch data) while the page is open. This particular page you've referenced appears to be sending POST requests to http://ttpush.fxstreet.com/http_push/, then receiving and parsing a JSON response.
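
In Python, that might look roughly like the sketch below. The endpoint is the one named in the answer, but the form parameters are hypothetical and would have to be copied from the actual request Firebug shows.

```python
# Sketch: POST to the push endpoint seen in the network panel and decode the
# JSON that comes back. The payload below is a made-up placeholder; the real
# parameters must be taken from the request the browser sends.
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://ttpush.fxstreet.com/http_push/"

payload = urllib.parse.urlencode({"example_param": "example_value"}).encode("ascii")
request = urllib.request.Request(ENDPOINT, data=payload)  # data= makes this a POST

with urllib.request.urlopen(request, timeout=30) as response:
    body = response.read().decode("utf-8", errors="replace")

print(json.loads(body))  # the page itself parses a JSON response, per the answer
```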

徒留西风 2024-12-08 03:46:38

Try sending a POST request to http://ttpush.fxstreet.com/http_push/connect and see what you get.

It will continuously load new data.

EDIT:

You can use liburl or Python; it doesn't really matter. Under HTTP, when you browse the web, you send GET or POST requests.
Go to the website, open the developer tools (Chrome) or Firebug (Firefox plugin), and you will see that after all the data has loaded, there is one request that doesn't close: it stays open.

When you have a website and you want to fetch data continuously, there are a few techniques you can use:

  • make separate requests (using Ajax) every few seconds - this opens a new connection for each request, and if you want frequent data updates it's wasteful (see the polling sketch right after this list)
  • use long polling or server polling - make one request that fetches the data. It stays open, and the server flushes data to the socket (to your browser) whenever it needs to; the TCP connection remains open. When the connection times out, you can reopen it. This is normally more efficient than the above, but the connection stays open.
  • use XMPP or some other protocol (not HTTP) - used mainly for chat, e.g. Facebook/MSN and, I think, probably Google's and some others.
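
For reference, the first technique is just a loop with a delay; a minimal Python sketch (with the extraction step left as a comment) could look like this:

```python
# Sketch of technique 1: periodic polling. Each iteration opens a fresh
# connection, which is why this is wasteful for frequent updates.
import time
import urllib.request

URL = "http://www.fxstreet.com/rates-charts/currency-rates/"

while True:
    with urllib.request.urlopen(URL) as response:
        html = response.read().decode("utf-8", errors="replace")
    # ...extract and store the values of interest here...
    time.sleep(5)  # poll every few seconds
```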

The website you posted uses the second method: when it detects a POST request to that page, it keeps the connection open and dumps data continuously.
What you need to do is make a POST request to that page; you need to see which parameters (if any) have to be sent. It doesn't matter how you make the request, as long as you send the right parameters.

You need to read the response using a delimiter; probably every time they push data, they send \n or some other delimiter.
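
Putting those two steps together, a hedged Python sketch of the read loop might look like the following. The /connect endpoint is the one mentioned above; the POST parameters and the "\n" delimiter are assumptions that need to be checked against what the browser actually sends and receives.

```python
# Sketch of the long-polling read: POST to the /connect endpoint, keep the
# response open, and split the incoming stream on '\n'. The payload is a
# placeholder; copy the real parameters from the browser's network panel.
import urllib.parse
import urllib.request

ENDPOINT = "http://ttpush.fxstreet.com/http_push/connect"
payload = urllib.parse.urlencode({"example_param": "example_value"}).encode("ascii")

request = urllib.request.Request(ENDPOINT, data=payload)
with urllib.request.urlopen(request) as response:
    buffer = b""
    while True:
        chunk = response.read(1024)  # blocks until the server flushes more data
        if not chunk:
            break  # server closed the connection; reopen it to keep streaming
        buffer += chunk
        # Split off complete, delimiter-terminated records; keep the remainder.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            print(line.decode("utf-8", errors="replace"))
```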

Hope this helps. If you find that you still can't get around this, let me know and I'll go into more technical detail.
