Opening a Wikipedia URL that contains a comma with `open-uri`
I am running into an `OpenURI::HTTPError: 403 Forbidden` error when I try to open a URL that contains a comma (or other special characters such as `.`). I can open the same URL in a browser.
```ruby
require 'open-uri'

url = "http://en.wikipedia.org/wiki/Thor_Industries,_Inc."
f = URI.open(url)  # plain open(url) only works on Ruby < 3.0
# raises OpenURI::HTTPError: 403 Forbidden
```
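A 403 raised by open-uri still carries the server's response, and its body often states why the request was refused. The sketch below reads it from `OpenURI::HTTPError#io`; `describe_http_error` and `fetch_or_describe` are hypothetical helper names, and `URI.open` assumes Ruby 2.5 or later.

```ruby
require 'open-uri'
require 'stringio'

# Hypothetical helper: summarise an OpenURI::HTTPError as
# "<code> <reason>: <first max_chars characters of the response body>".
def describe_http_error(error, max_chars = 300)
  code, reason = error.io.status   # e.g. ["403", "Forbidden"]
  body = error.io.read.to_s        # open-uri has already buffered the body
  "#{code} #{reason}: #{body[0, max_chars]}"
end

# Hypothetical helper for debugging: return the page body on success,
# or the error summary instead of raising on 4xx/5xx.
def fetch_or_describe(url)
  URI.open(url, &:read)
rescue OpenURI::HTTPError => e
  describe_http_error(e)
end
```

Calling `puts fetch_or_describe(url)` then prints the refusal page instead of only the bare exception message.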
How do I escape such a URL?

I have tried escaping the URL with `CGI::escape`, and I get the same error:

```ruby
require 'cgi'

f = URI.open(CGI::escape(url))
```
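Escaping the whole URL cannot help, because it also escapes the `:` and `/` that make the string a URL. A quick check of what `CGI::escape` produces, and of the fact that `,` and `.` are already legal in a URL path:

```ruby
require 'cgi'
require 'uri'

url = "http://en.wikipedia.org/wiki/Thor_Industries,_Inc."

# Escaping everything turns the scheme and slashes into %3A and %2F,
# so the result no longer looks like an http URL at all.
escaped = CGI.escape(url)
puts escaped
# => http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FThor_Industries%2C_Inc.

# The unescaped URL parses fine: "," and "." are allowed in a path.
puts URI.parse(url).path
# => /wiki/Thor_Industries,_Inc.
```

In the open-uri versions I know of, a string that does not start with a scheme followed by `://` is not treated as a URL and is handed to ordinary file opening instead.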
Answer:
Typically, one would simply require the `cgi` module and then use `CGI::escape(str)`. However, this doesn't seem to work in your particular case, and the request still returns a 403. I'll leave this here for reference regardless.
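When escaping is needed, it should be applied to the page title only, not to the scheme, host or `/wiki/` prefix. A minimal sketch, assuming a hypothetical `wiki_url` helper and the `http://en.wikipedia.org/wiki/` base from the question. Escaping the comma is optional (it is legal in a path); escaping matters for characters such as spaces, `?`, `#` or `+`. As noted above, this alone did not get past the 403 here.

```ruby
require 'cgi'

# Hypothetical helper: escape only the article title and append it to the base.
def wiki_url(title, base = "http://en.wikipedia.org/wiki/")
  # CGI.escape is form encoding and turns " " into "+", which is not decoded
  # as a space in a path, so replace it with %20. A literal "+" in the title
  # has already become %2B, so every remaining "+" came from a space.
  base + CGI.escape(title).gsub("+", "%20")
end

puts wiki_url("Thor_Industries,_Inc.")
# => http://en.wikipedia.org/wiki/Thor_Industries%2C_Inc.
```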
Edit: Wikipedia is refusing your requests because it suspects that you are a bot. It seems that certain pages that are clearly content are served to you, but those that don't match its "safe" pattern (e.g. those that contain dots or commas) are subject to its screening. If you actually output the content (I did this with `Net::HTTP`), you get the following:

Providing a user-agent string, however, solves the issue:
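A minimal sketch of sending an explicit `User-Agent`, both with `Net::HTTP` and with open-uri (which sends String keys of its options hash as request headers). The `USER_AGENT` value and the helper names are placeholders, not anything Wikipedia prescribes; Wikimedia asks for a descriptive string with contact details, so substitute your own.

```ruby
require 'net/http'
require 'open-uri'
require 'uri'

# Placeholder identifier (an assumption): use your tool name and a real contact.
USER_AGENT = "MyWikiFetcher/0.1 (you@example.com)"

# Build a GET request whose User-Agent replaces Net::HTTP's default ("Ruby").
def build_request(url, user_agent = USER_AGENT)
  request = Net::HTTP::Get.new(URI.parse(url))
  request["User-Agent"] = user_agent
  request
end

# Net::HTTP variant: returns the response for any status (4xx/5xx do not
# raise), so response.code and response.body can be inspected.
# Redirects (e.g. http -> https) are NOT followed.
def fetch_with_net_http(url, user_agent = USER_AGENT)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(build_request(url, user_agent))
  end
end

# open-uri variant: returns the body, raises OpenURI::HTTPError on 4xx/5xx.
def fetch_with_open_uri(url, user_agent = USER_AGENT)
  URI.open(url, "User-Agent" => user_agent, &:read)
end
```

For example, `puts fetch_with_open_uri("https://en.wikipedia.org/wiki/Thor_Industries,_Inc.")[0, 200]` prints the start of the page if the request is accepted.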