Strange symbols in web page source
I have a problem.
I am trying to parse a web page that is in UTF-8 and contains Russian text, using Hpricot.
The Russian text comes back with some strange symbols, and I get an error when I try to convert it (with iconv) from UTF-8 to windows-1251 or ASCII.
The page is http://market.yandex.ru/model-spec.xml?modelid=929123&hid=90548
Here is the code:
require 'rubygems'
require 'open-uri'
require 'hpricot'
require 'net/http'
url = "http://market.yandex.ru/model-spec.xml?modelid=929123&hid=90548"
f = open(url).read                           # fetch the page body via open-uri
doc = Hpricot(f)                             # parse the HTML
html = doc.search("th.b-properties__title")  # select the property title cells
html.each do |h|
  puts h.inner_html
end
The source is UTF-8, but the output contains several strange symbols such as "\u{2192}".
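A likely reason for the conversion error is that U+2192 (the rightwards arrow, →) is a valid UTF-8 character with no counterpart in windows-1251 or ASCII. The following is only a minimal sketch, assuming Ruby 1.9 or later where String#encode is available. The sample text and the helper names to_cp1251 and to_cp1251_lossy are hypothetical and do not come from the original post.

```ruby
# encoding: utf-8

# Hypothetical sample standing in for one scraped header cell.
SAMPLE = "Тип \u2192 значение"

# Strict conversion: returns nil when some character has no
# Windows-1251 equivalent (as is the case for U+2192).
def to_cp1251(str)
  str.encode("Windows-1251")
rescue Encoding::UndefinedConversionError
  nil
end

# Lossy conversion: substitutes an ASCII replacement for every
# character that Windows-1251 cannot represent.
def to_cp1251_lossy(str, replacement = "->")
  str.encode("Windows-1251", undef: :replace, replace: replacement)
end
```

With these helpers, to_cp1251(SAMPLE) gives nil because of the arrow, while to_cp1251_lossy(SAMPLE) yields a windows-1251 string with the arrow replaced by "->". Whether replacing or dropping such characters is acceptable depends on what the text is used for.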
Comments (1)
So, I solved it.
I was running the script in PowerShell on Windows and had used chcp 65001 to output everything in UTF-8.
So that was the problem!
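When the console itself is a suspect, one way to tell a display problem from a data problem is to print code points rather than the characters themselves, since plain ASCII output renders the same under any code page. A small sketch, where non_ascii_codepoints is a hypothetical helper name:

```ruby
# List the Unicode code point of every non-ASCII character in a string,
# formatted as ASCII text so the console code page cannot distort it.
def non_ascii_codepoints(str)
  str.each_char.reject(&:ascii_only?).map { |ch| format("U+%04X", ch.ord) }
end

puts non_ascii_codepoints("a \u2192 b").inspect
```

If the code points listed are the expected ones (here U+2192), the string is intact and the odd symbols come from how the terminal renders it.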