Workarounds for a 403 error when using UrlFetchApp with Google Apps Script (external website)
I've used Stack Overflow for many years (I almost always find my answers!), but I'm stuck on my current project, so this is my first post here. :)
I want to get the product price from www.hermes.com using either the URL or the product ref.
Example: https://www.hermes.com/fr/fr/product/portefeuille-dogon-duo-H050896CK5E/
ref = H050896CK5E
The URLs and Refs are stored in a Spreadsheet.
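Since the ref in the example is the last hyphen-separated token of the URL, a spreadsheet row that only has the URL could derive the ref from it. A minimal sketch, assuming every product URL follows the shape of the example above; `extractProductRef` is a hypothetical helper name, not part of any API:

```javascript
// Hypothetical helper: pull the product reference out of a product URL.
// Assumes the ref is everything after the last hyphen in the final path
// segment, as in the example URL; other URL shapes return null.
function extractProductRef(url) {
  // Drop any query string or fragment, then trailing slashes.
  var path = String(url).split(/[?#]/)[0].replace(/\/+$/, "");
  var lastSegment = path.substring(path.lastIndexOf("/") + 1);
  var dash = lastSegment.lastIndexOf("-");
  if (dash === -1 || dash === lastSegment.length - 1) {
    return null; // no "-REF" suffix found
  }
  return lastSegment.substring(dash + 1);
}
```

The function uses only plain string operations, so it should behave the same in Apps Script and in any other JavaScript runtime.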
When I call the UrlFetchApp.fetch function in my script, I get a 403 error.
If my understanding is correct, that means the hermes.com server is blocking me.
I also tried =IMPORTXML, which reports that the spreadsheet cannot access the URL.
Here is the workaround I found: use the Google Custom Search API to search for the URL and iterate until a result URL matches the query.
[Current issues]
- If the item is out of stock or if the URL is not found, I am unable to get the price.
Example:
When I search for https://www.hermes.com/it/it/product/cappello-alla-pescatora-eden-H221007NvA259/
it returns nothing.
I know it can return
https://www.hermes.com/it/it/product/cappello-alla-pescatora-eden-H221007Nv0156/
but that is not the same colour (and sometimes the price does differ between colours).
So my question is:
How would you get around the 403 error? (Not by bypassing security, of course, but if you have any ideas on how to retrieve the hermes.com prices, please let me know!)
I will paste the scripts below.
Thank you in advance.
→ What I used for hermes.com.
With muteHttpExceptions set to true, I get the captcha HTML.
// muteHttpExceptions: true returns the error response instead of throwing on a 403.
var response = UrlFetchApp.fetch("http://www.hermes.com/",
  {
    method: "get",
    contentType: "application/json",
    muteHttpExceptions: true,
  });
→ Result of the above (a captcha HTML page; I think hermes.com knows I'm a bot)
<html><head><title>hermes.com</title><style>#cmsg{animation: A 1.5s;}@keyframes A{0%{opacity:0;}99%{opacity:0;}100%{opacity:1;}}</style></head><body style="margin:0"><p id="cmsg">Please enable JS and disable any ad blocker</p><script>var dd={'cid':'AHrlqAAAAAMAs2XwactPh88AInQWTw==','hsh':'2211F522B61E269B869FA6EAFFB5E1','t':'fe','s':13461,'host':'geo.captcha-delivery.com'}</script><script src="https://ct.captcha-delivery.com/c.js"></script></body></html>
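To tell a challenge page like this apart from a real product page inside the script, one option is to check the status code and look for the captcha host in the body. This is only a sketch: `looksBlocked` is a hypothetical helper, and it is an assumption that the `captcha-delivery.com` marker seen in the HTML above appears on every challenge page.

```javascript
// Hypothetical helper: decide whether a fetched page is a bot challenge.
// The marker string is taken from the captcha HTML shown above; that it is
// stable across challenge pages is an assumption.
function looksBlocked(statusCode, body) {
  var hasCaptchaMarker = String(body).indexOf("captcha-delivery.com") !== -1;
  return statusCode === 403 || hasCaptchaMarker;
}

// In Apps Script this would be fed from the HTTPResponse, for example:
//   var response = UrlFetchApp.fetch(url, { muteHttpExceptions: true });
//   if (looksBlocked(response.getResponseCode(), response.getContentText())) {
//     // fall back to another source, or mark the row as unavailable
//   }
```

This only detects the block so the script can fall back cleanly; it does not get past it.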
→ What I'm using now (Google Custom Search)
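// apiKey, searchId and query (the product URL) are defined elsewhere in the script.
var productPrice = 0;
var currency = "";
for (var i = 0; i < 5; i++) {
  // The API returns at most 10 results per page; start is the 1-based index of the first result.
  var start = (i * 10) + 1;
  // Encode the search terms so that the "/" and ":" in the product URL do not break the request.
  var apiUrl = "https://www.googleapis.com/customsearch/v1?key=" + apiKey + "&cx=" + searchId + "&q=" + encodeURIComponent("search " + query) + "&start=" + start;
  var apiOptions = {
    method: 'get'
  };
  var responseApi = UrlFetchApp.fetch(apiUrl, apiOptions);
  var responseJson = JSON.parse(responseApi.getContentText());
  var checkDomain = "";
  for (var v = 0; v < 10; v++) {
    if (responseJson["items"] != null && responseJson["items"][v] != null) {
      checkDomain = responseJson["items"][v]["link"];
      // Only accept a result whose URL is exactly the product URL.
      if (checkDomain != null && checkDomain == query) {
        productPrice = responseJson["items"][v]["pagemap"]["metatags"][0]["product:price:amount"];
        currency = responseJson["items"][v]["pagemap"]["metatags"][0]["product:price:currency"];
        break;
      }
    }
  }
  // Stop paging as soon as a price has been found.
  if (productPrice > 0) { break; }
}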
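The loop above throws if a matching result has no `pagemap` or `metatags` entry. A sketch of the same lookup split into two small helpers that tolerate missing fields; `buildSearchUrl` and `findPrice` are hypothetical names, and the response shape is taken only from the fields the script above already reads:

```javascript
// Build the Custom Search request URL with every parameter percent-encoded.
function buildSearchUrl(apiKey, searchId, query, start) {
  return "https://www.googleapis.com/customsearch/v1" +
    "?key=" + encodeURIComponent(apiKey) +
    "&cx=" + encodeURIComponent(searchId) +
    "&q=" + encodeURIComponent(query) +
    "&start=" + encodeURIComponent(start);
}

// Return { amount, currency } for the item whose link equals the query URL,
// or null when no item matches or the price metatag is missing.
function findPrice(responseJson, query) {
  var items = (responseJson && responseJson.items) || [];
  for (var i = 0; i < items.length; i++) {
    if (items[i].link !== query) continue;
    var metatags = (items[i].pagemap && items[i].pagemap.metatags) || [];
    var tags = metatags[0] || {};
    if (tags["product:price:amount"] == null) return null;
    return {
      amount: tags["product:price:amount"],
      currency: tags["product:price:currency"]
    };
  }
  return null;
}
```

Inside the paging loop, the parsed JSON from each `UrlFetchApp.fetch(buildSearchUrl(...))` call would be passed to `findPrice`, and the loop would stop at the first non-null result.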