Why am I getting a "wrong status line" error with Nokogiri?
My Ruby/Nokogiri script is:
require 'rubygems'
require 'nokogiri'
require 'open-uri'

f = File.new("enterret" + ".txt", 'w')
1.upto(100) do |page|
  urltext = "http://xxxxxxx.com/" + "page/"
  urltext << page.to_s + "/"
  doc = Nokogiri::HTML(open(urltext))
  doc.css(".photoPost").each do |post|
    quote = post.css("h1 + p").text
    author = post.css("h1 + p + p").text
    f.puts "#{quote}" + "#{author}"
    f.puts "--------------------------------------------------------"
  end
end
When running this script I get the following error:
http.rb:2030:in `read_status_line': wrong status line: "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"" (Net::HTTPBadResponse)
However, my script writes to the file correctly; it's just that this error keeps coming up. What does the error mean?
Comments (1)
Without knowing what site you are accessing it is hard to say for sure, but I suspect that the problem isn't in Nokogiri.
The error is being reported by http.rb, which is most likely complaining about the HTTP response headers being returned. http.rb is concerned with the handshake with the HTTP server and would complain about missing or malformed headers, but it doesn't care about the payload. Nokogiri, on the other hand, is concerned with the payload, i.e., the HTML.
The DOCTYPE is supposed to be part of the HTML payload, so I suspect their server is sending the HTML DOCTYPE at the point where http.rb expects an HTTP status line (something like "HTTP/1.1 200 OK") followed by headers such as a Content-Type of "text/html".
In the Ruby 1.8.7 http.rb file, line 2030 is inside read_status_line, the method named in your error message, which raises Net::HTTPBadResponse when the first line of a response doesn't look like a status line. That seems a likely place to generate the sort of message you're seeing.
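To make the failure mode concrete, here is a minimal sketch of the kind of check an HTTP client runs on the first line of a response. It is not the actual http.rb source: the regular expression, the `parse_status_line` helper, and the `BadStatusLine` class are all hypothetical names made up for illustration.

```ruby
# Hypothetical stand-in for Net::HTTPBadResponse, so the sketch needs no network code.
class BadStatusLine < StandardError; end

# Assumed pattern for a status line such as "HTTP/1.1 200 OK".
# This is an illustrative regex, not the one used by Net::HTTP.
STATUS_LINE = /\AHTTP\/(\d+\.\d+)\s+(\d{3})\s*(.*)\z/

# Parse the first line of a response, or raise if it is not a status line.
def parse_status_line(line)
  m = STATUS_LINE.match(line.chomp)
  raise BadStatusLine, "wrong status line: #{line.dump}" unless m
  { version: m[1], code: m[2].to_i, message: m[3] }
end

# A well-formed first line parses cleanly.
ok = parse_status_line("HTTP/1.1 200 OK\r\n")
puts "#{ok[:code]} #{ok[:message]}"

# A response that starts straight away with the HTML body is rejected.
begin
  parse_status_line('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"')
rescue BadStatusLine => e
  puts e.message
end
```

If the real check works along these lines, the text quoted in your error message is simply the first line the client read from the socket, which suggests that for at least one of the 100 requests the server returned the page without a status line in front of it.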