Web scraping in a Google Chrome extension (JavaScript + Chrome APIs)

Posted on 2025-02-07 16:09:13, 692 characters, 3 views, 0 comments


What are the best options for web scraping a page that is not currently open in a tab, from within a Google Chrome extension, using JavaScript and whatever other technologies are available? Other JavaScript libraries are also acceptable.

The important thing is to mask the scraping so that it behaves like a normal web request, with no indication of AJAX or XMLHttpRequest such as an X-Requested-With: XMLHttpRequest or Origin header.

The scraped content must be accessible from JavaScript for further manipulation and presentation within the extension, most probably as a string.

Are there any hooks in WebKit- or Chrome-specific APIs that can be used to make a normal web request and get the result for manipulation?

var pageContent = getPageContent(url); // TODO: Implement
var items = $(pageContent).find('.item');
// Display items with further selections
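For reference, a minimal sketch of getPageContent, assuming an asynchronous version is acceptable: fetch returns a Promise, so the synchronous call in the snippet above becomes a .then (or await). The fetchImpl parameter is a hypothetical addition that only exists so the function can be exercised without a network; inside an extension the target site would also need to be covered by the extension's host permissions. fetch does not add an X-Requested-With header by itself (that header is typically set by libraries such as jQuery).

```javascript
// Minimal async stand-in for the question's getPageContent(url).
// fetchImpl (hypothetical) defaults to the global fetch.
async function getPageContent(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    // Treat non-2xx responses as failures instead of scraping an error page.
    throw new Error(`HTTP ${response.status} for ${url}`);
  }
  return response.text(); // resolves to the page HTML as a string
}

// Usage in an extension page with jQuery loaded:
// getPageContent(url).then((pageContent) => {
//   const items = $(pageContent).find(".item");
// });
```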

Bonus points for making this work from a local file on disk, for initial debugging. But if that is the only thing standing in the way of a solution, disregard the bonus points.


怼怹恏 2025-02-14 16:09:13

Try using XHR2 and fall back on (new DOMParser).parseFromString(responseText, getResponseHeader("Content-Type")) with my text/html patch. See https://gist.github.com/1138724 for an example of how I detect responseType = "document" support (by synchronously checking whether response === null on an object URL created from a text/html Blob).
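A sketch of the approach this answer describes, assuming an extension page (or a Manifest V2 background page) where both XMLHttpRequest and DOMParser exist; a Manifest V3 background service worker has neither. The supportsDocumentType flag is a hypothetical parameter standing in for the feature detection in the linked gist, which is not reproduced here. toDomParserType is also an illustrative helper: parseFromString only accepts a bare MIME type such as "text/html", so a raw header like "text/html; charset=utf-8" has to be trimmed first.

```javascript
// MIME types accepted by DOMParser.prototype.parseFromString(); any other
// value (including "text/html; charset=utf-8") makes it throw a TypeError.
const DOMPARSER_TYPES = [
  "text/html", "text/xml", "application/xml",
  "application/xhtml+xml", "image/svg+xml",
];

// Reduce a raw Content-Type header to a bare type DOMParser accepts,
// falling back to "text/html" when the header is missing or unsupported.
function toDomParserType(contentType) {
  const bare = String(contentType || "").split(";")[0].trim().toLowerCase();
  return DOMPARSER_TYPES.includes(bare) ? bare : "text/html";
}

// Fetch a URL and resolve to a parsed Document. With supportsDocumentType,
// XHR2 parses the response itself (responseType = "document"); otherwise the
// raw responseText is parsed with DOMParser. Note that onload also fires
// for HTTP error statuses such as 404.
function getPageDocument(url, supportsDocumentType) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open("GET", url); // asynchronous; "document" is not allowed for sync XHR
    if (supportsDocumentType) xhr.responseType = "document";
    xhr.onload = () => {
      if (supportsDocumentType) {
        resolve(xhr.response);
      } else {
        const type = toDomParserType(xhr.getResponseHeader("Content-Type"));
        resolve(new DOMParser().parseFromString(xhr.responseText, type));
      }
    };
    xhr.onerror = () => reject(new Error("Request failed: " + url));
    xhr.send();
  });
}
```

In a page with jQuery loaded, getPageDocument(url, true).then(function (doc) { var items = $(doc).find(".item"); }) then has the same shape as the question's snippet.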

Use the Chrome webRequest API to hide X-Requested-With and similar headers.
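A minimal sketch of that idea, assuming a Manifest V2 extension with the "webRequest" and "webRequestBlocking" permissions plus host permissions for the target site (ordinary Manifest V3 extensions cannot register blocking webRequest listeners and would use declarativeNetRequest header rules instead). The URL pattern https://example.com/* and the helper names are hypothetical. Restricting the filter to tabId -1 is a design choice so that only requests not tied to a browser tab, such as those from the background page, are touched.

```javascript
// Headers that reveal a scripted request (lower-case for comparison).
const HEADERS_TO_STRIP = ["x-requested-with", "origin"];

// Return a copy of a webRequest requestHeaders array without the listed headers.
function stripHeaders(requestHeaders, names = HEADERS_TO_STRIP) {
  return requestHeaders.filter(
    (header) => !names.includes(header.name.toLowerCase())
  );
}

// Register a blocking listener that rewrites outgoing request headers.
function installHeaderStripper(urlPatterns) {
  chrome.webRequest.onBeforeSendHeaders.addListener(
    (details) => ({ requestHeaders: stripHeaders(details.requestHeaders) }),
    // tabId -1: only requests that do not belong to a tab.
    { urls: urlPatterns, tabId: -1 },
    // In Chrome 79 and later, "extraHeaders" is required to modify Origin.
    ["blocking", "requestHeaders", "extraHeaders"]
  );
}

// Only register when running inside an extension that exposes the API.
if (typeof chrome !== "undefined" && chrome.webRequest) {
  installHeaderStripper(["https://example.com/*"]);
}
```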


盗琴音 2025-02-14 16:09:13

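Many tools have been released since this question was asked.

artoo.js is one of them. It is a piece of JavaScript code meant to be run in your browser's console to provide you with some scraping utilities. It can also be used as a Chrome extension.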


是伱的 2025-02-14 16:09:13

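If you can look beyond a Google Chrome plugin, have a look at PhantomJS, which uses Qt-WebKit in the background and runs just like a browser, including making AJAX requests. You can call it a headless browser, since it does not display any output on screen and can work quietly in the background while you do other things. If you want, you can export images or PDFs of the pages it fetches. It provides a JS interface for loading pages, clicking buttons and so on, much as you would in a browser. You can also inject custom JS, for example jQuery, into any page you want to scrape and use it to access the DOM and export the data you need. Because it uses WebKit, its rendering behavior is very similar to Google Chrome's.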
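A sketch of the inject-jQuery-and-extract workflow this answer describes, written against PhantomJS's own scripting API (require("webpage").create(), page.open, page.includeJs, page.evaluate, page.render, phantom.exit). It is meant to be run as phantomjs scrape.js (a hypothetical file name), not in Node or in the extension; the target URL, the jQuery URL and the .item selector (taken from the question) are placeholders. PhantomJS development has since been suspended, so this illustrates the answer's approach rather than a currently maintained tool.

```javascript
// Placeholder inputs (hypothetical); replace with the page you want to scrape.
var TARGET_URL = "https://example.com/";
var JQUERY_URL = "https://code.jquery.com/jquery-1.12.4.min.js";

function scrapeWithPhantom(targetUrl, jqueryUrl) {
  var page = require("webpage").create();
  page.open(targetUrl, function (status) {
    if (status !== "success") {
      console.log("Failed to load " + targetUrl);
      phantom.exit(1);
      return;
    }
    // Inject jQuery into the loaded page, then query the DOM from inside it.
    page.includeJs(jqueryUrl, function () {
      var items = page.evaluate(function () {
        // Runs in the page context; only serializable values come back.
        return $(".item").map(function () {
          return $(this).text().trim();
        }).get();
      });
      console.log(JSON.stringify(items));
      // Export the rendered page; the format follows the file extension.
      page.render("page.pdf");
      phantom.exit(0);
    });
  });
}

// `phantom` is a global only inside PhantomJS, so this is a no-op elsewhere.
if (typeof phantom !== "undefined") {
  scrapeWithPhantom(TARGET_URL, JQUERY_URL);
}
```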

Another option is Aptana Jaxer, which is based on the Mozilla engine and is a very good concept in itself. It can also be used as a simple scraping tool.
