How do I parse this request parameter in Rails?

Posted on 2024-10-03 07:07:34

I receive parameters such as s = "%u041D%u0430%u0434%u043E%u0435%u043B" in incoming requests to my web server.

How can I decode this into a normal UTF-8 string in Rails?
Thank you!

最初的梦 2024-10-10 07:07:34

This looks like the non-standard format produced by JavaScript's escape function. If you control the code that sends this data, you should probably arrange for it to use encodeURI instead, which yields "normal" percent-encoding of the UTF-8 encoded characters.
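For comparison, here is what the "normal" form looks like from the Ruby side. This is a minimal sketch using the standard library's URI module; the variable names are illustrative only, and the encoded string is the UTF-8 percent-encoding of the word from the question.

```ruby
require 'uri'

# Standard percent-encoding of the UTF-8 bytes, one %hh per byte.
encoded = '%D0%9D%D0%B0%D0%B4%D0%BE%D0%B5%D0%BB'

# The standard library decodes this form directly (UTF-8 by default).
decoded = URI.decode_www_form_component(encoded)
puts decoded           # prints Надоел
puts decoded.encoding  # prints UTF-8

# Encoding the decoded text again reproduces the original escapes.
round_trip = URI.encode_www_form_component(decoded)
puts round_trip == encoded  # prints true
```

The %u form has no such built-in decoder, which is why the function that follows handles it by hand.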

# Unescape percent encoding.
#
# The normal byte-oriented format ("%41") and the non-standard <em>%u</em>
# format ("%u0410") are both supported. The single-byte variant is decoded
# as if it represents bytes encoded with the same encoding as +str+. The
# two-byte <em>%u</em> variant is decoded as UTF-16BE and then re-encoded
# with the same encoding as +str+; surrogate pairs are supported.
#
# Since the resulting string will have the same encoding as +str+, all byte
# sequences resulting from the byte-oriented decoding must be valid sequences
# in the encoding of +str+. Correspondingly, the encoding of +str+ must
# be compatible with any extended characters that are decoded from the
# UTF-16BE <em>%u</em> encodings.

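def unescape(str)
  hh = /[0-9a-f]{2}/i
  hhhh = /[0-9a-f]{4}/i
  str.gsub(/((?:%#{hh})+)|((?:%u#{hhhh})+)/) do
    if $1
      # Run of %hh escapes: raw bytes, interpreted in the encoding of str.
      $1.scan(hh).map(&:hex).pack('C*').force_encoding(str.encoding)
    elsif $2
      # Run of %uhhhh escapes: pack as big-endian 16-bit units ('n*') so the
      # bytes really are UTF-16BE regardless of the host's byte order
      # ('S*' uses native order and breaks on little-endian machines).
      $2.scan(hhhh).map(&:hex).pack('n*').force_encoding(Encoding::UTF_16BE).
        encode!(str.encoding)
    else
      raise 'unhandled match'
    end
  end
end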


def all_same?(e)
  first = e.first
  e.drop(1).all? { |o| o.eql?(first) }
end

ss = [
  # %-encoded-UTF-16BE -> SJIS (just for something fun... UTF-8 works fine)
  '%u041D%u0430%u0434%u043E%u0435%u043B'.encode!(Encoding::SJIS),
  # %-encoded-ISO-8859-5 -> ISO-8859-5
  '%bd%d0%d4%de%d5%db'.encode!(Encoding::ISO8859_5),
  # %-encoded-UTF-8 -> UTF-8
  '%d0%9d%d0%b0%d0%b4%d0%be%d0%b5%d0%bb'.encode!(Encoding::UTF_8),
]

ss2 = [ # demonstrate non-decoded content and UTF-16BE surrogate pair decoding
  # %-encoded-UTF-16BE -> UTF-8
  'A%uD801%uDC10%u0410'.encode!(Encoding::UTF_8),
  # %-encoded-UTF-8 -> UTF-8
  '%41%f0%90%90%90%D0%90'.encode!(Encoding::UTF_8),
]

ss = ss.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
all_same? ss.map { |s| s.encode(Encoding::UTF_8) }

ss2 = ss2.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
all_same? ss2.map { |s| s.encode(Encoding::UTF_8) }

When run through irb:

ruby-1.9.2-head >   ss = ss.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
[#<Encoding:Shift_JIS>, #<Encoding:ISO-8859-5>, #<Encoding:UTF-8>]
 => ["\x{844E}\x{8470}\x{8474}\x{8480}\x{8475}\x{847C}", "\xBD\xD0\xD4\xDE\xD5\xDB", "Надоел"] 
ruby-1.9.2-head > all_same? ss.map { |s| s.encode(Encoding::UTF_8) }
 => true 
ruby-1.9.2-head > 
ruby-1.9.2-head >   ss2 = ss2.map { |s| s = unescape(s) }.tap { |ss| p ss.map { |s| s.encoding } }
[#<Encoding:UTF-8>, #<Encoding:UTF-8>]
 => ["A𐐐А", "A𐐐А"] 
ruby-1.9.2-head > all_same? ss2.map { |s| s.encode(Encoding::UTF_8) }
 => true 
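
If the target is always UTF-8, as in the question, a shorter variant may be enough. The following is a minimal sketch rather than a drop-in replacement: decode_percent_u is a hypothetical name, it only touches %uXXXX escapes (plain %hh escapes are left alone), and it assumes the input string is UTF-8 or plain ASCII.

```ruby
# Hypothetical helper: decode only the non-standard %uXXXX escapes to UTF-8.
def decode_percent_u(str)
  # Handle each run of consecutive %uXXXX escapes as one unit so that
  # surrogate pairs (two escapes for one character) stay together.
  str.gsub(/(?:%u[0-9a-f]{4})+/i) do |run|
    units = run.scan(/[0-9a-f]{4}/i).map(&:hex)
    # 'n*' packs big-endian 16-bit integers, i.e. UTF-16BE code units.
    units.pack('n*').force_encoding(Encoding::UTF_16BE).encode(Encoding::UTF_8)
  end
end

decoded = decode_percent_u('%u041D%u0430%u0434%u043E%u0435%u043B')
puts decoded  # prints Надоел
```

Because whole runs are converted at once, the pair %uD801%uDC10 from the second test set is combined into the single code point U+10410, whose UTF-8 bytes are F0 90 90 90. That is why both ss2 inputs above decode to the same string. A lone surrogate would make encode raise an exception, so untrusted input may need a rescue around the call.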