Is there a workaround to open URLs containing underscores in Ruby?

Posted on 2024-10-20 12:44:39 · 296 characters · 6 views · 0 comments

I'm using open-uri to open URLs.

resp = open("http://sub_domain.domain.com")

If it contains underscore I get an error:

URI::InvalidURIError: the scheme http does not accept registry part: sub_domain.domain.com (or bad hostname?)

I understand that this is because, according to the RFC, URLs can contain only letters and numbers. Is there any workaround?

Comments (9)

时光病人 2024-10-27 12:44:39

This looks like a bug in URI; open-uri, HTTParty, and many other gems rely on URI.parse.

Here's a workaround:

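require 'net/http'
require 'open-uri'

def hopen(url)
  open(url)
rescue URI::InvalidURIError
  # URI.parse rejected the URL, so pull the host and path out by hand.
  host = url.match(%r{.+://([^/]+)})[1]
  path = url.partition(host)[2]
  # partition returns "" (never nil) when nothing follows the host.
  path = "/" if path.empty?
  # Note: this returns the body as a String, not the IO-like object open returns.
  Net::HTTP.get(host, path)
end

resp = hopen("http://dear_raed.blogspot.com/2009_01_01_archive.html")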
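
The fallback branch above boils down to splitting the URL into a host and a path by hand. Here is a minimal sketch of just that step, with no network access. `split_host_and_path` is a hypothetical helper name, not part of any library, and it assumes an absolute URL without userinfo or a port:

```ruby
# Hypothetical helper: split an absolute URL into [host, path] without URI.parse.
# Assumption: no userinfo ("user@") or port (":8080") in the authority part.
def split_host_and_path(url)
  match = url.match(%r{\A[a-z][a-z0-9+.\-]*://([^/?#]+)([^#]*)}i)
  raise ArgumentError, "not an absolute URL: #{url}" unless match

  host = match[1]
  path = match[2]                          # path plus query string, fragment dropped
  path = "/" if path.empty?                # bare host: request the root
  path = "/#{path}" unless path.start_with?("/")  # e.g. "?x=1" becomes "/?x=1"
  [host, path]
end

p split_host_and_path("http://sub_domain.domain.com")
# prints ["sub_domain.domain.com", "/"]
```

The resulting pair can be handed to Net::HTTP.get(host, path), which is what the workaround does in its rescue branch.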
最美的太阳 2024-10-27 12:44:39

URI has an old-fashioned idea of what a URL looks like.

Lately I'm using addressable to get around that:

require 'open-uri'
require 'addressable/uri'

class URI::Parser
  def split url
    a = Addressable::URI::parse url
    [a.scheme, a.userinfo, a.host, a.port, nil, a.path, nil, a.query, a.fragment]
  end
end

resp = open("http://sub_domain.domain.com") # Yay!

Don't forget to run gem install addressable.

心的位置 2024-10-27 12:44:39

This initializer in my Rails app seems to make URI.parse work, at least:

# config/initializers/uri_underscore.rb
class URI::Generic
  def initialize_with_registry_check(scheme,
                 userinfo, host, port, registry,
                 path, opaque,
                 query,
                 fragment,
                 parser = DEFAULT_PARSER,
                 arg_check = false)
    if %w(http https).include?(scheme) && host.nil? && registry =~ /_/
      initialize_without_registry_check(scheme, userinfo, registry, port, nil, path, opaque, query, fragment, parser, arg_check)
    else
      initialize_without_registry_check(scheme, userinfo, host, port, registry, path, opaque, query, fragment, parser, arg_check)
    end
  end
  alias_method_chain :initialize, :registry_check
end
执笔绘流年 2024-10-27 12:44:39

Here is a patch that solves the problem for a wide variety of situations (rest-client, open-uri, etc.) without using external gems or overriding parts of URI.parse:

module URI
  DEFAULT_PARSER = Parser.new(:HOSTNAME => "(?:(?:[a-zA-Z\\d](?:[-\\_a-zA-Z\\d]*[a-zA-Z\\d])?)\\.)*(?:[a-zA-Z](?:[-\\_a-zA-Z\\d]*[a-zA-Z\\d])?)\\.?")
end

Source: lib/uri/rfc2396_parser.rb#L86

Ruby-core has an open issue: https://bugs.ruby-lang.org/issues/8241
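
To see what the modified HOSTNAME pattern in the patch above changes, it can be compiled on its own and tried against a few hosts. This is only a sketch: `hostname_matches?` is a hypothetical helper, and `STRICT_HOSTNAME` is an assumption standing in for the stock rule (the same pattern with the underscore removed), not a copy of the library source.

```ruby
# The HOSTNAME pattern from the patch above (underscore allowed inside labels).
PATCHED_HOSTNAME = "(?:(?:[a-zA-Z\\d](?:[-\\_a-zA-Z\\d]*[a-zA-Z\\d])?)\\.)*(?:[a-zA-Z](?:[-\\_a-zA-Z\\d]*[a-zA-Z\\d])?)\\.?"

# Assumption: the same pattern without the underscore, approximating the unpatched rule.
STRICT_HOSTNAME = PATCHED_HOSTNAME.gsub("\\_", "")

# Hypothetical helper: true when the whole host string matches the pattern.
def hostname_matches?(pattern, host)
  Regexp.new("\\A(?:#{pattern})\\z").match?(host)
end

puts hostname_matches?(STRICT_HOSTNAME, "sub_domain.domain.com")   # false
puts hostname_matches?(PATCHED_HOSTNAME, "sub_domain.domain.com")  # true
```

Hyphenated hosts such as sub-domain.domain.com match both patterns, so the patch only widens what is accepted.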

最丧也最甜 2024-10-27 12:44:39

An underscore cannot appear in a domain name like that. That is part of the DNS standard. Did you mean to use a dash (-)?

Even if open-uri didn't throw an error, such a command would be pointless. Why? Because there is no way it can resolve such a domain name. At best you'd get an unknown host error. There is no way for you to register a domain name with an _ in it, and even if you run your own private DNS server, using an _ is against the specification. You could bend the rules and allow it (by modifying the DNS server software), but then your operating system's DNS resolver won't support it, and neither will your router's DNS software.

Solution: Don't try to use an _ in a DNS name. It won't work anywhere and it's against the specifications.
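
The hostname syntax this comment refers to (labels made of letters, digits, and hyphens, as in RFC 952 and RFC 1123) can be sketched as a small check. `rfc1123_hostname?` is a hypothetical helper name; the sketch only tests syntax and says nothing about whether a given resolver will actually accept or reject a name.

```ruby
# One label: 1 to 63 characters, letters/digits/hyphens, no leading or trailing hyphen.
HOST_LABEL = /\A[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\z/i

# Hypothetical helper: syntax check only, no DNS lookup is performed.
def rfc1123_hostname?(name)
  return false if name.empty? || name.length > 253

  labels = name.chomp(".").split(".", -1)  # tolerate one trailing dot
  !labels.empty? && labels.all? { |label| label.match?(HOST_LABEL) }
end

puts rfc1123_hostname?("sub_domain.domain.com")  # false
puts rfc1123_hostname?("sub-domain.domain.com")  # true
```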

菊凝晚露 2024-10-27 12:44:39

I had this same error while trying to use gem update / gem install etc., so I used the IP address instead and it's fine now.

贱人配狗天长地久 2024-10-27 12:44:39

Here is another ugly hack, no gem needed:

def parse(url = nil)
    begin
        URI.parse(url)
    rescue URI::InvalidURIError
        host = url.match(".+\:\/\/([^\/]+)")[1]
        uri = URI.parse(url.sub(host, 'dummy-host'))
        uri.instance_variable_set('@host', host)
        uri
    end
end
三生池水覆流年 2024-10-27 12:44:39

I recommend using the Curb gem (https://github.com/taf2/curb), which just wraps libcurl. Here is a simple example that will automatically follow redirects and print the response code and response body:

require 'curb'

rsp = Curl::Easy.http_get(url) { |curl| curl.follow_location = true; curl.max_redirects = 10 }
puts rsp.response_code
puts rsp.body_str

I usually avoid the Ruby URI classes since they are too strict about the spec, and as you know, the web is the wild west :) Curl / curb handles every URL I throw at it like a champ.

策马西风 2024-10-27 12:44:39

For anyone stumbling upon this:

Ruby's URI.parse used to be based on RFC2396 (published in Aug 1998), see https://bugs.ruby-lang.org/issues/8241

But starting with Ruby 2.2, URI was upgraded to RFC 3986, so if you're on a modern version, no monkey patches are necessary.
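
A quick way to confirm this on your own installation, assuming Ruby 2.2 or newer (the /index.html path is just an illustrative value added to the question's host):

```ruby
require 'uri'

# On Ruby 2.2+ this parses without raising URI::InvalidURIError.
uri = URI.parse("http://sub_domain.domain.com/index.html")

puts uri.host  # sub_domain.domain.com
puts uri.path  # /index.html
```

If this raises URI::InvalidURIError instead, the interpreter predates the RFC 3986 parser and one of the workarounds above still applies.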
