Script that saves a series of pages and then tries to combine them, but only combines one?

Posted on 2025-01-05 01:33:17


Here's my code:

require "open-uri"

base_url = "http://en.wikipedia.org/wiki"

(1..5).each do |x|
  # sets up the url
  full_url = base_url + "/" + x.to_s
  # reads the url
  read_page = open(full_url).read
  # saves the contents to a file and closes it
  local_file = "my_copy_of-" + x.to_s + ".html"
  file = open(local_file,"w")
  file.write(read_page)
  file.close

  # open a file to store all entries in

  combined_numbers = open("numbers.html", "w")

  entrys = open(local_file, "r")

  combined_numbers.write(entrys.read)

  entrys.close
  combined_numbers.close

end

As you can see, it scrapes the contents of Wikipedia articles 1 through 5 and then attempts to combine them into a single file called numbers.html.

The first part works. But in the second part, it only seems to write the contents of the fifth article in the loop.

I can't see where I'm going wrong, though. Any help?


九公里浅绿 2025-01-12 01:33:17


You chose the wrong mode when opening your summary file: "w" truncates (overwrites) an existing file, while "a" appends to it.

So use this to get your code working:

combined_numbers = open("numbers.html", "a")

Otherwise, on each pass of the loop, the contents of numbers.html are overwritten with the current article, which is why only the fifth article is left at the end.
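To see the difference between the two modes without touching the network, here is a small sketch that reopens the same file inside a loop, once with "w" and once with "a". The helper name write_in_loop and the file name demo.html are made up for this illustration; the file lives in a temporary directory that is deleted afterwards.

```ruby
require "tmpdir"

# Hypothetical helper: write each item to the same file, reopening the file
# on every iteration with the given mode, and return the final contents.
def write_in_loop(mode, items)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "demo.html") # illustrative file name
    items.each do |item|
      # "w" truncates the file on every open; "a" keeps existing data and appends.
      File.open(path, mode) { |f| f.write(item) }
    end
    File.read(path) # Dir.mktmpdir returns the block's value
  end
end

articles = ["one ", "two ", "three"]
puts write_in_loop("w", articles) # prints "three"
puts write_in_loop("a", articles) # prints "one two three"
```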

Besides that, I think you should write the contents of read_page to numbers.html directly instead of reading them back in from the file you just wrote:

require "open-uri"

(1..5).each do |x|
  # set up and read the URL
  # (on Ruby 3.0+ plain open no longer fetches URLs, so use URI.open)
  url = "https://en.wikipedia.org/wiki/#{x}"
  article = URI.open(url).read

  # save the current article to its own file
  # (File.write needs Ruby 1.9.3 or later; on 1.8.x use open with "w" instead)
  File.write("my_copy_of-#{x}.html", article)

  # append the current article to the summary file
  open("numbers.html", "a") do |f|
    f.write(article)
  end
end
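One side effect of "a" is that running the script a second time appends another five articles to the numbers.html left over from the previous run. If that matters, a variant (a swapped-in approach, not part of the answer above) is to open the summary file exactly once, in "w" mode, before the loop. The sketch below assumes Ruby 2.5+ for URI.open; the method name combine_articles and its summary:/fetch: keyword parameters are hypothetical, added so the download step can be replaced, for example in a test.

```ruby
require "open-uri"

# Hypothetical helper: fetch each article, save it to its own file, and
# write all of them into one summary file that is opened only once.
def combine_articles(ids, summary: "numbers.html",
                     fetch: ->(id) { URI.open("https://en.wikipedia.org/wiki/#{id}", &:read) })
  # "w" is safe here: the summary is truncated once per run, before the loop,
  # instead of once per article as in the original question.
  File.open(summary, "w") do |combined|
    ids.each do |id|
      article = fetch.call(id)
      File.write("my_copy_of-#{id}.html", article) # per-article copy
      combined.write(article)                      # add to the summary
    end
  end
end

# Usage (downloads five pages): combine_articles(1..5)
```

Because the file handle stays open for the whole loop, each write lands after the previous one, and rerunning the script starts numbers.html fresh.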
