保存一系列页面然后尝试组合它们但只组合一个的脚本?
这是我的代码..
require "open-uri"
base_url = "http://en.wikipedia.org/wiki"
(1..5).each do |x|
# sets up the url
full_url = base_url + "/" + x.to_s
# reads the url
read_page = open(full_url).read
# saves the contents to a file and closes it
local_file = "my_copy_of-" + x.to_s + ".html"
file = open(local_file,"w")
file.write(read_page)
file.close
# open a file to store all entrys in
combined_numbers = open("numbers.html", "w")
entrys = open(local_file, "r")
combined_numbers.write(entrys.read)
entrys.close
combined_numbers.close
end
如你所见。它基本上抓取了维基百科文章 1 到 5 的内容,然后尝试将它们合并到一个名为numbers.html 的文件中。
它做到了第一点。但到了第二个的时候。好像只在循环中写第五篇的内容。
但我看不出哪里出了问题。有什么帮助吗?
Here's my code..
require "open-uri"
base_url = "http://en.wikipedia.org/wiki"
(1..5).each do |x|
# sets up the url
full_url = base_url + "/" + x.to_s
# reads the url
read_page = open(full_url).read
# saves the contents to a file and closes it
local_file = "my_copy_of-" + x.to_s + ".html"
file = open(local_file,"w")
file.write(read_page)
file.close
# open a file to store all entrys in
combined_numbers = open("numbers.html", "w")
entrys = open(local_file, "r")
combined_numbers.write(entrys.read)
entrys.close
combined_numbers.close
end
As you can see. It basically scrapes the contents of the wikipedia articles 1 through 5 and then attempts to combine them nto a single file called numbers.html.
It does the first bit right. But when it gets to the second. It only seem's to write in the contents of the fifth article in the loop.
I can't see where im going wrong though. Any help?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
打开摘要文件时,您选择了错误的模式。 “w”覆盖现有文件,而“a”附加到现有文件。
因此,请使用它来让您的代码正常工作:
否则,每次循环时,numbers.html 的文件内容都会被当前文章覆盖。
此外,我认为您应该使用
read_page
中的内容写入numbers.html
而不是从新编写的文件中读回它们:You chose the wrong mode when opening your summary file. "w" overwrites existing files while "a" appends to existing files.
So use this to get your code working:
Otherwise with each pass of the loop the file contents of numbers.html are overwritten with the current article.
Besides I think you should use the contents in
read_page
to write tonumbers.html
instead of reading them back in from your freshly written file: