Getting an array containing the number of posts created in the past 2 weeks

Posted on 2024-08-23 17:02:36

I want to create sparklines that illustrate the number of posts created on my blog in the last 2 weeks. To do this, I need to first generate an array that contains the number of posts created on each day during the period in question.

For example, this array:

[40, 18, 0, 2, 39, 37, 22, 25, 30, 60, 36, 5, 2, 2]

generates this sparkline: (I'm using the Googlecharts wrapper around the Google Charts API)

My question is how to create these arrays. Here's what I'm doing now: (I'm using Searchlogic to do the queries, but it should be understandable even if you've never used it)

  history = []
  14.downto(1) do |days_ago|
    history.push(Post.created_at_after((days_ago + 1).day.ago.beginning_of_day).created_at_before((days_ago - 1).days.ago.beginning_of_day).size)
  end

This approach is ugly and slow -- there must be a better way!

千柳 2024-08-30 17:02:36

This will give you a hash mapping dates to post counts:

# start at midnight 14 days ago so the oldest day is counted in full
counts = Post.count(
  :conditions => ["created_at >= ?", 14.days.ago.beginning_of_day],
  :group => "DATE(created_at)"
)

You can then turn this into an array:

counts_array = []
14.downto(1) do |d|
  counts_array << (counts[d.days.ago.to_date.to_s] || 0)
end

只等公子 2024-08-30 17:02:36

You need to have your data indexed properly or this will never work efficiently. If you're using a granularity of "day" then it pays to have a Date column. You can then use a standard SQL GROUP BY operation to get the values you need directly.

For example, a migration could be done like:

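def self.up
  # add_column needs an explicit column type
  add_column :posts, :created_on_date, :date
  add_index :posts, :created_on_date

  # backfill the new column from the existing timestamps
  execute "UPDATE posts SET created_on_date=DATE(created_at)"
end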

Then retrieval is really fast since it can exercise the index:

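def sparkline_data
  # select_rows returns [date, count] pairs; select_values would keep
  # only the first column (the date) and drop the counts
  self.class.connection.select_rows("
    SELECT created_on_date, COUNT(id) FROM posts
      WHERE created_on_date>DATE_SUB(UTC_TIMESTAMP(), INTERVAL 14 DAY)
      GROUP BY created_on_date
      ORDER BY created_on_date
  ").collect { |date, count| [date.to_s, count.to_i] }
end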

Keep in mind if you're potentially missing a day you'll have to account for that by inserting a zero value into your results. The date is returned here, so you should be able to compute the missing values and fill them in. Typically this is done by iterating over a group of days using collect.
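
As a rough illustration of that zero-fill step, here is a minimal plain-Ruby sketch. It assumes the query results are available as [date_string, count] pairs; the helper name `fill_missing_days` and the sample rows are hypothetical and only there for illustration.

```ruby
require 'date'

# Hypothetical helper: turns sparse [date_string, count] rows into a dense
# array with one entry per day, oldest first, using 0 for days without rows.
def fill_missing_days(rows, last_day, days = 14)
  counts = Hash[rows.map { |date, count| [date.to_s, count.to_i] }]
  first_day = last_day - (days - 1)
  (first_day..last_day).collect { |day| counts.fetch(day.to_s, 0) }
end

# Illustrative data only: three days with posts in a five-day window.
rows = [["2010-03-01", "4"], ["2010-03-03", "7"], ["2010-03-05", "1"]]
filled = fill_missing_days(rows, Date.new(2010, 3, 5), 5)
# => [4, 0, 7, 0, 1]
```

Passing the last day in explicitly (rather than calling Date.today inside the helper) keeps the window easy to check against whatever time zone the database uses.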

When you need to retrieve a thin slice of data quickly, loading instances of the models will always be a huge bottleneck. Often you need to go directly to SQL if there's no simple way to fetch what you need.

倒数 2024-08-30 17:02:36

Try this:

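days_ago = 13  # 13 days back plus today = 14 days
n_days_ago, today = (Date.today - days_ago), Date.today

# get the count by date from the database
# (the upper bound is tomorrow at midnight so that today's posts are included)
post_count_hash = Post.count(:group => "DATE(created_at)",
             :conditions => ["created_at >= ? AND created_at < ?", n_days_ago, today + 1])

# now fill the missing dates with 0
(n_days_ago..today).each{ |date| post_count_hash[date.to_s] ||= 0 }

# sort by date and keep only the counts
post_count_hash.sort.collect{|kv| kv[1]}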

Note 1:
If you add an index on created_at this method should scale well. If you run into
millions of records each day, you are better off storing the per-day post count in a separate table.

Note 2:
You can cache and age the results to improve performance. In my system I typically set the TTL to 10-15 minutes.
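
The caching idea in Note 2 could look roughly like the sketch below. It is plain Ruby with a hypothetical `TtlCache` class invented for illustration, not the answerer's actual setup; in a Rails app you would more likely rely on the application's configured cache store. The block passed to `fetch` stands in for the expensive count query.

```ruby
# Hypothetical in-memory cache with a time-to-live (TTL).
class TtlCache
  def initialize(ttl_seconds)
    @ttl = ttl_seconds
    @value = nil
    @expires_at = nil
  end

  # Returns the cached value, recomputing it via the block once it has expired.
  # `now` is a parameter only so the expiry logic is easy to exercise.
  def fetch(now = Time.now)
    if @expires_at.nil? || now >= @expires_at
      @value = yield
      @expires_at = now + @ttl
    end
    @value
  end
end

calls = 0
cache = TtlCache.new(15 * 60)  # 15-minute TTL
start = Time.at(0)
first  = cache.fetch(start)        { calls += 1; [1, 2, 3] }
second = cache.fetch(start + 60)   { calls += 1; [4, 5, 6] }  # still fresh, block not run
third  = cache.fetch(start + 1000) { calls += 1; [7, 8, 9] }  # expired, block runs again
```

With a TTL in this range the sparkline can lag reality by a few minutes, which is usually acceptable for a two-week history.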

流年里的时光 2024-08-30 17:02:36

In addition to tadman's answer, if you have the required administrator access, you may want to investigate partitioning based on date, especially if you receive an extremely high volume of posts per day.

天冷不及心凉 2024-08-30 17:02:36

The majority of the time spent is doing the 14 database queries that each need to scan every row in the table to check the date (assuming you are not indexing by created_at).

To minimize this, we can do a single database query to grab the relevant rows, and then sort through them.

history = Array.new(14, 0)
# 13 days back plus today = 14 calendar days
recent_posts = Post.created_at_after(13.days.ago.beginning_of_day)
recent_posts.each do |post|
  days_ago = (Date.today - post.created_at.to_date).to_i
  # oldest day first, today last
  history[13 - days_ago] += 1
end

I also recommend you add an index, like tadman recommended, but in this case to the created_at field in the posts table.
