JS or any other language: hook on resources loaded by an HTML page

Posted 2024-08-22 05:20:19


What I am trying to accomplish:

HTTP GET the contents of a site (say google.com)
Then have some sort of hook or filter that will catch all resources that this page tries to load (for instance the CSS files, all JavaScript files, all images, all iframes, etc.)

The first thing that comes to mind is to parse the downloaded page/code and extract all tags that might link to a resource. However, there are very many of them and some are tricky, like a background image declared in CSS, for example:

body {background-image:url('paper.gif');}

Also, I need to catch all resources that are intended to be loaded via JavaScript. For instance, a JS function might generate a URL and then interpret it to load the resource.

For this reason I think having some sort of hook or filter/monitor is what I need.

The programming language does not matter (although something that works on a Unix box would be nice).

UPDATE: This needs to be an automated solution.

Thank you.


忆离笙 2024-08-29 05:20:19

I'm assuming you are looking for a fully automated solution.

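There are several ways to parse a file (in all major scripting languages, wget-based tools and others), but none that I know of will actually interpret JavaScript (because that is what this boils down to).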
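To illustrate the static parsing route and its limit, here is a minimal sketch using only Python's standard library. The attribute table, the `ResourceCollector` class and the `static_resources` helper are illustrative names made up for this example, and the list of tags is an assumption that is far from exhaustive.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag -> attributes that usually reference a sub-resource (illustrative, not complete).
URL_ATTRS = {
    "script": ("src",),
    "img": ("src",),
    "iframe": ("src",),
    "link": ("href",),
    "embed": ("src",),
    "source": ("src",),
}

# Matches url(...) references in CSS, with or without quotes.
CSS_URL = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""")


class ResourceCollector(HTMLParser):
    """Collects resource URLs that are visible in the markup itself."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []
        self._in_style = False

    def _add(self, url):
        absolute = urljoin(self.base_url, url.strip())
        if absolute not in self.resources:
            self.resources.append(absolute)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for name in URL_ATTRS.get(tag, ()):
            if attrs.get(name):
                self._add(attrs[name])
        # Inline style="..." attributes may also carry url(...) references.
        if attrs.get("style"):
            for match in CSS_URL.findall(attrs["style"]):
                self._add(match)
        if tag == "style":
            self._in_style = True

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Text inside <style> blocks is scanned for url(...) references.
        if self._in_style:
            for match in CSS_URL.findall(data):
                self._add(match)


def static_resources(html, base_url):
    """Return absolute URLs of resources referenced statically in html."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.resources
```

Calling `static_resources(page_source, "http://example.com/")` would pick up the `paper.gif` background from the question's CSS example, but a URL that a script assembles at run time never appears in the markup, so a scan like this cannot see it.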

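I think your only option is to set up a Firefox (or other modern browser) instance on your Unix/Linux machine, feed it the URL and monitor/block all outgoing connections it tries to make. On a client PC, this is what the "Net" tab in Firebug shows. I don't know whether, or to what extent, this can be automated without actually rewriting parts of the browser. Maybe Selenium RC or one of the other tools in the Selenium suite is a starting point.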
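As one possible way to automate the browser route, the sketch below drives a browser through Selenium and reads back the network requests it made. Note the assumptions: it uses headless Chrome rather than Firefox, because Chrome exposes a "performance" log containing DevTools `Network.requestWillBeSent` events; it needs the `selenium` package and a local Chrome install; and `requested_urls` and `capture` are hypothetical helper names, not part of Selenium.

```python
import json


def requested_urls(performance_log):
    """Extract request URLs from Chrome 'performance' log entries."""
    urls = []
    for entry in performance_log:
        # Each entry wraps a DevTools event as a JSON string.
        event = json.loads(entry["message"])["message"]
        if event.get("method") == "Network.requestWillBeSent":
            url = event["params"]["request"]["url"]
            if url not in urls:
                urls.append(url)
    return urls


def capture(url):
    """Load url in headless Chrome and return every URL it requested."""
    # Imported lazily so requested_urls() is usable without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    # Ask ChromeDriver to record DevTools events in the performance log.
    options.set_capability("goog:loggingPrefs", {"performance": "ALL"})
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return requested_urls(driver.get_log("performance"))
    finally:
        driver.quit()
```

Something like `for u in capture("http://google.com"): print(u)` would list the page itself plus the scripts, stylesheets, images and frames the browser fetched, including those triggered by JavaScript. Requests that fire well after the page's load event may be missed unless the script waits before reading the log.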
