Chrome Extension V3 - Sending HTTP requests and scraping a website
With Manifest V2, one could write a simple content script that makes GET requests to another website. XMLHttpRequest would return the webpage as a parsed document when setting
xhr.responseType = 'document'
And one could then easily call something like
this.responseXML.getElementsByClassName('Some class name')
to scrape the returned webpage.
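For comparison, the Manifest V2 approach described above might look roughly like the following sketch. The function name `scrapeByClassName` and the Promise wrapper are illustrative additions, not part of the original question; the `Xhr` parameter exists only so the function can be exercised outside a browser (in the extension, the default, the browser's `XMLHttpRequest`, is used).

```javascript
// Manifest V2 content script (sketch): fetch another page as a parsed Document
// and collect the text of every element with the given class name.
// Xhr defaults to the browser's XMLHttpRequest; it is a parameter only so the
// function can be tested without a browser.
function scrapeByClassName(url, className, Xhr = XMLHttpRequest) {
  return new Promise((resolve, reject) => {
    const xhr = new Xhr();
    xhr.open('GET', url);
    // Ask the browser to parse the response body as HTML into a Document.
    xhr.responseType = 'document';
    xhr.onload = function () {
      if (this.status !== 200 || !this.responseXML) {
        reject(new Error(`Request failed with status ${this.status}`));
        return;
      }
      const nodes = this.responseXML.getElementsByClassName(className);
      // HTMLCollection is array-like, so Array.from can map it directly.
      resolve(Array.from(nodes, (node) => node.textContent.trim()));
    };
    xhr.onerror = () => reject(new Error('Network error'));
    xhr.send();
  });
}
```

Under Manifest V3 the same request made from a content script is subject to the page's CORS rules, which is what produces the error below.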
With Manifest V3, I'm getting the following CORS error (here I just want to make a GET request to a Stack Overflow question while the user has a quora.com tab open):
Access to fetch at 'https://stackoverflow.com/questions/72333126/chrome-extension-v3-sending-http-requests-and-scrape-a-website' from origin 'https://www.quora.com/' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.
I need the response text, so following the suggestion in the error message (setting the request's mode to 'no-cors') is not an option: the resulting opaque response does not expose its body.
A possible workaround would be to have the background script make the HTTP request; however, the background script (a service worker in Manifest V3) does not have access to the DOM APIs available to the content script. Specifically, as far as I'm aware, a background script can't scrape the data.
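A minimal sketch of that background side, assuming a Manifest V3 service worker file named `background.js` and a `host_permissions` entry for stackoverflow.com (both assumptions; the question's manifest is not shown). Extension service workers can make cross-origin `fetch` requests to hosts the extension has host permission for, so this request is not blocked the way the content script's was. The message type `'fetchPage'` and the helper names are hypothetical; `fetchImpl` and `runtime` are parameters only so the logic can be exercised without Chrome.

```javascript
// background.js (MV3 service worker) sketch.
// Assumed manifest.json entries (illustrative):
//   "manifest_version": 3,
//   "background": { "service_worker": "background.js" },
//   "host_permissions": ["https://stackoverflow.com/*"]

// Fetch a page and return its body as text; no DOM APIs are needed here.
async function fetchPageText(url, fetchImpl = fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}`);
  }
  return response.text();
}

// Answer hypothetical { type: 'fetchPage', url } messages with the page HTML.
function registerFetchHandler(runtime, fetchImpl = fetch) {
  runtime.onMessage.addListener((message, sender, sendResponse) => {
    if (message?.type !== 'fetchPage') {
      return false; // not ours; let other listeners handle it
    }
    fetchPageText(message.url, fetchImpl)
      .then((html) => sendResponse({ ok: true, html }))
      .catch((err) => sendResponse({ ok: false, error: String(err.message) }));
    return true; // keep the channel open until sendResponse is called
  });
}

// Register only when actually running inside the extension.
if (typeof chrome !== 'undefined' && chrome.runtime?.onMessage) {
  registerFetchHandler(chrome.runtime);
}
```

Returning `true` from the listener tells Chrome that `sendResponse` will be called asynchronously; without it, the message channel closes before the fetch finishes.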
An even more complicated solution would be to make the HTTP request in the background script and then pass the response to the content script in a message. The problem is that the response is not JSON but text, and I don't know of a good way to encode the whole response text as JSON.
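On the encoding concern: the response text is already a JavaScript string, and any string is a valid JSON value, so it can be placed directly in a message object; `JSON.stringify` escapes quotes, newlines and other special characters on its own. A minimal sketch of the content-script side, assuming a background listener that replies with `{ ok, html }` or `{ ok: false, error }` (a hypothetical message shape); `requestPageHtml` is an illustrative name, and `runtime` is a parameter only so the function can be exercised without Chrome.

```javascript
// content.js (sketch): ask the background service worker for a page's HTML.
// The { type: 'fetchPage', url } message shape is hypothetical; it only has to
// match whatever the background listener checks for. reply.html is a plain
// string, so no extra encoding step is needed to send it through the channel.
function requestPageHtml(url, runtime = chrome.runtime) {
  return new Promise((resolve, reject) => {
    runtime.sendMessage({ type: 'fetchPage', url }, (reply) => {
      // Chrome sets lastError when, for example, no listener answered.
      if (runtime.lastError) {
        reject(new Error(runtime.lastError.message));
      } else if (!reply || !reply.ok) {
        reject(new Error(reply ? reply.error : 'No reply from background script'));
      } else {
        resolve(reply.html);
      }
    });
  });
}
```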
This is a very basic feature for a browser extension, so I'm sure there is a way to make this work.
Content Scripts Interact With The DOM
Manifest:
Here is the official example from Google's Manifest V3 docs:
https://developer.chrome.com/docs/extensions/mv3/getstarted/tut-reading-time/
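Applied to this question, the content script can turn HTML text received from the background script back into a Document and scrape it with the same `getElementsByClassName` call used under Manifest V2. A minimal sketch, assuming the HTML string has already arrived from the background; `scrapeHtml` is a hypothetical helper, and the `parser` parameter exists only so it can be exercised outside a browser. `DOMParser` is available in content scripts but not in the Manifest V3 service worker.

```javascript
// content.js (sketch): parse an HTML string into a Document and collect the
// trimmed text of every element with the given class name.
// parser defaults to the browser's DOMParser; it is a parameter only so the
// function can be tested without a browser.
function scrapeHtml(html, className, parser = new DOMParser()) {
  // 'text/html' produces an HTML Document; scripts in it are not executed.
  const doc = parser.parseFromString(html, 'text/html');
  return Array.from(doc.getElementsByClassName(className), (el) => el.textContent.trim());
}
```

Usage in the content script would be along the lines of `scrapeHtml(htmlFromBackground, 'Some class name')`, where `htmlFromBackground` stands for the text the background script sent back.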