Get an array containing the number of posts created in the past 2 weeks
I want to create sparklines that illustrate the number of posts created on my blog in the last 2 weeks. To do this, I need to first generate an array that contains the number of posts created on each day during the period in question.
For example, this array:
[40, 18, 0, 2, 39, 37, 22, 25, 30, 60, 36, 5, 2, 2]
generates this sparkline: (I'm using the Googlecharts wrapper around the Google Charts API)
My question is how to create these arrays. Here's what I'm doing now: (I'm using Searchlogic to do the queries, but it should be understandable even if you've never used it)
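history = []
14.downto(1) do |days_ago|
  # Note: each window runs from the start of (days_ago + 1) days ago to the
  # start of (days_ago - 1) days ago, so it covers two calendar days.
  window_start = (days_ago + 1).day.ago.beginning_of_day
  window_end   = (days_ago - 1).days.ago.beginning_of_day
  history.push(Post.created_at_after(window_start).created_at_before(window_end).size)
end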
This approach is ugly and slow -- there must be a better way!
Comments (5)
This will give you a hash mapping dates to post counts:
You can then turn this into an array:
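A minimal sketch of that conversion in plain Ruby, assuming the grouped count comes back as a hash keyed by Date objects. The helper name counts_to_array and the sample data are made up for illustration; they are not from the answer. The idea is to walk the 14 days in order and report 0 for any day missing from the hash:

```ruby
require 'date'

# Hypothetical helper: turns a { Date => count } hash into an array of
# daily counts, oldest day first, ending on `last_day`.
# Days missing from the hash are reported as 0.
def counts_to_array(counts_by_date, last_day, days = 14)
  first_day = last_day - (days - 1)
  (first_day..last_day).map { |day| counts_by_date.fetch(day, 0) }
end

# Made-up sample data: only three of the 14 days have posts.
last_day = Date.new(2010, 3, 14)
counts = {
  Date.new(2010, 3, 1)  => 40,
  Date.new(2010, 3, 2)  => 18,
  Date.new(2010, 3, 14) => 2
}

history = counts_to_array(counts, last_day)
puts history.inspect
```

If the database returns the dates as strings rather than Date objects, the keys would need to be parsed (for example with Date.parse) before the lookup.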
You need to have your data indexed properly or this will never work efficiently. If you're using a granularity of "day" then it pays to have a Date column. You can then use a standard SQL GROUP BY operation to get the values you need directly.
For example, a migration could be done like:
Then retrieval is really fast since it can exercise the index:
Keep in mind if you're potentially missing a day you'll have to account for that by inserting a zero value into your results. The date is returned here, so you should be able to compute the missing values and fill them in. Typically this is done by iterating over a group of days using collect.
When you need to retrieve a thin slice of data quickly, loading instances of the models will always be a huge bottleneck. Often you need to go directly to SQL if there's no simple way to fetch what you need.
Try this:
Note 1:
If you add an index on created_at, this method should scale well. If you run into millions of records each day, you are better off storing the post count by day in another table.
Note 2:
You can cache and age the results to improve performance. In my system I typically set the TTL to be 10-15min.
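As a rough illustration of Note 2, here is a plain-Ruby sketch of a value that is recomputed only after its TTL has passed. The class name ExpiringValue and the injectable clock are assumptions made for this example, not part of the answer; in a Rails app, Rails.cache.fetch with the expires_in option would typically play this role instead.

```ruby
# Hypothetical in-process cache: stores one computed value and recomputes it
# once it is at least `ttl` seconds old.
class ExpiringValue
  # `clock` is injectable so the expiry logic can be exercised without waiting.
  def initialize(ttl, clock = -> { Time.now })
    @ttl = ttl
    @clock = clock
    @value = nil
    @stored_at = nil
  end

  # Returns the cached value, calling the block only when the cache is
  # empty or stale.
  def fetch
    now = @clock.call
    if @stored_at.nil? || now - @stored_at >= @ttl
      @value = yield
      @stored_at = now
    end
    @value
  end
end

history_cache = ExpiringValue.new(10 * 60) # 10-minute TTL

# In a real app the block would run the expensive database query;
# the array here is placeholder data.
history = history_cache.fetch { [40, 18, 0, 2] }
puts history.inspect
```

A single-process cache like this is only a sketch; with several app servers a shared cache store would be needed so that all of them see the same value.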
In addition to tadman's answer, if you have the required administrator access, you may want to investigate partitioning based on date, especially if you receive an extremely high volume of posts per day.
The majority of the time spent is doing the 14 database queries that each need to scan every row in the table to check the date (assuming you are not indexing by created_at).
To minimize this, we can do a single database query to grab the relevant rows, and then sort through them.
I also recommend you add an index, like tadman recommended, but in this case to the created_at field in the posts table.
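The second step, sorting through the rows after a single query, could look roughly like the plain-Ruby sketch below. It assumes the query has already returned the created_at timestamps for the period as Time objects; the helper name daily_counts and the sample timestamps are invented for illustration.

```ruby
require 'date'

# Hypothetical helper: counts timestamps per calendar day for the `days`
# days ending on `last_day`, oldest day first. Timestamps outside that
# window are ignored, and days without posts are reported as 0.
def daily_counts(timestamps, last_day, days = 14)
  first_day = last_day - (days - 1)
  counts = Hash.new(0)
  timestamps.each do |time|
    day = time.to_date
    counts[day] += 1 if day >= first_day && day <= last_day
  end
  (first_day..last_day).map { |day| counts[day] }
end

# Made-up timestamps standing in for the created_at values of fetched posts.
created_ats = [
  Time.new(2010, 3, 13, 9, 30),
  Time.new(2010, 3, 13, 18, 5),
  Time.new(2010, 3, 14, 0, 0),
  Time.new(2010, 2, 1, 12, 0) # outside the window, ignored
]

history = daily_counts(created_ats, Date.new(2010, 3, 14))
puts history.inspect
```

This trades 14 queries for one, at the cost of loading every matching row into memory, so it suits modest post volumes better than very large ones.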