Find the first 50,000 records, then the next 50,000, and so on, saving each batch to a separate file

Posted 2024-11-04 12:59:51


I have the following Rake file. Using RoR 2.3.8.

desc "Create shops sitemap"
task(:shops => :environment) do
  sitemap = Sitemap.new
  # add every item (most recently updated first, capped at 50,000)
  for i in Shop.find(:all, :select => 'id, updated_at', :order => 'updated_at DESC', :limit => 50000)
    sitemap.add_url("http://abc.com/shops/#{i.id}", w3c_date(i.updated_at), 'daily', '1.0')
  end

  puts "#{sitemap.urls.length} total urls"
  # delete any previously generated gzip file
  FileUtils.rm(File.join(RAILS_ROOT, "public/sitemap_shops_1.xml.gz"), :force => true)

  f = File.new(File.join(RAILS_ROOT, "public/sitemap_shops_1.xml"), 'w')

  sitemap.write(f, 2)
  f.close

  system("gzip #{File.join(RAILS_ROOT, 'public/sitemap_shops_1.xml')}")
end

The task above fetches the 50,000 most recently updated records and saves them in a file numbered 1.

How do I modify the code so that it also fetches the next 50,000 records and saves them in a file numbered 2, then the next 50,000 in a file numbered 3, and so on?

Thanks.

Comments (1)

探春 2024-11-11 12:59:51


Instead of find, you can use find_in_batches, which returns groups of 1,000 records at a time (you can override that to 50,000 with the :batch_size option). Throw in a counter variable (since I don't think find_in_batches has anything like each_with_index) and you can produce all the files you need. Note that in Rails 2.3 find_in_batches raises an error if you pass :order or :limit; batches are always walked in primary-key order, so the :order => 'updated_at DESC' from the original query has to be dropped.

desc "Create shops sitemap"
task(:shops => :environment) do
  file_name_index = 0
  Shop.find_in_batches(:all, :select => 'id, updated_at', :order => 'updated_at DESC', :batch_size => 50000) do |group_of_50000|
    file_name_index += 1
    sitemap = Sitemap.new
    #add every item
    for i in group_of_50000
      sitemap.add_url("http://abc.com/shops/#{i.id}",w3c_date(i.updated_at),'daily','1.0')
    end

    puts "#{sitemap.urls.length} total urls"
    #delete the file
    FileUtils.rm(File.join(RAILS_ROOT, "public/sitemap_shops_#{file_name_index}.xml.gz"), :force => true)

    f =File.new(File.join(RAILS_ROOT, "public/sitemap_shops_#{file_name_index}.xml"), 'w')

    sitemap.write(f,2)
    f.close

    system("gzip #{File.join(RAILS_ROOT, 'public/sitemap_shops_#{file_name_index}.xml')}")
  end
end
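
One caveat: because find_in_batches always walks records in primary-key order, the updated_at DESC ordering from the original query is lost. If that ordering matters for the sitemaps, a manual :limit/:offset loop does the same batching while keeping the order. This is only a sketch under the same assumptions as the code above (the Sitemap class with add_url and write, the w3c_date helper, and standard Rails 2.3 finder options); the task name :shops_by_offset is just an illustrative label.

desc "Create shops sitemap (offset-based, keeps updated_at order)"
task(:shops_by_offset => :environment) do
  batch_size = 50000
  file_name_index = 0
  offset = 0

  loop do
    # :limit/:offset page through the ordered result set 50,000 rows at a time
    batch = Shop.find(:all, :select => 'id, updated_at',
                            :order => 'updated_at DESC',
                            :limit => batch_size, :offset => offset)
    break if batch.empty?

    file_name_index += 1
    sitemap = Sitemap.new
    batch.each do |shop|
      sitemap.add_url("http://abc.com/shops/#{shop.id}", w3c_date(shop.updated_at), 'daily', '1.0')
    end
    puts "#{sitemap.urls.length} urls in file #{file_name_index}"

    xml_path = File.join(RAILS_ROOT, "public/sitemap_shops_#{file_name_index}.xml")
    FileUtils.rm("#{xml_path}.gz", :force => true)
    File.open(xml_path, 'w') { |f| sitemap.write(f, 2) }
    system("gzip #{xml_path}")

    offset += batch_size
  end
end

Offset-based paging re-runs the ORDER BY for every batch and can skip or duplicate rows if records are updated mid-run, so for very large tables the find_in_batches version above is the safer default.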