返回“评论最多”列表的单个 SQL 查询相对于过去 7 天,但列表始终必须包含某些内容
起初看似简单的事情现在变得有点脑筋急转弯。
我正在构建您在新闻/博客网站右侧栏中看到的“评论最多”或“电子邮件最多”列表之一。
该列表必须与最近的日期范围(例如最近 7 天)相关,因为您希望列表始终看起来新鲜并包含较新的内容。否则,获得大量评论的少数文章将永远位居榜首。
例如,在我的例子中,该网站有几篇文章,每篇都有数百条评论,但其余的文章都少于 20 条。因此,如果像这样拉出“评论最多”列表:
select
ArticleId,
count(CommentId) as Comments
from
Comment
group by
ArticleId
order by
count(CommentId) desc
limit 10;
那么具有数百条评论的几篇文章将始终处于顶部。该清单永远不会改变。
因此,我重写了选择以仅包含过去 7 天:
select
ArticleId,
count(CommentId) as Comments
from
Comment
where
Created >= '2011-06-14'
group by
ArticleId
order by
count(CommentId) desc
limit 10;
这看起来更好,但它仍然不起作用,因为它没有考虑过去 7 天内没有活动的情况。
如果过去 7 天内没有发生任何事情(或只发生了一些评论),则列表应包含在此之前的活动。底线 - 列表始终需要显示 10 篇文章。
我总是可以调用多个 SQL 语句,以 7 天为增量返回,直到填满列表,但我不想这样做。我将结果缓存在进程内内存中,但如果可能的话,我仍然希望只进行一次 SQL 调用。
想法?我觉得这很容易做到,我只是忽略了某个地方显而易见的事情。
What first seemed easy has now become a bit of a brain teaser.
I'm building one of those "Most Commented" or "Most Emailed" lists that you see in the right sidebar of news/blog websites.
The list has to be relative to a recent date range (say the last 7 days) because you want the list to always to look fresh and contain newer content. Otherwise, the few articles that got tons of comments will always sit atop the list.
For instance, in my case, the website has a few articles that have several hundred comments each but all the rest have less than 20. So if the "Most Commented" list was pulled like this:
select
ArticleId,
count(CommentId) as Comments
from
Comment
group by
ArticleId
order by
count(CommentId) desc
limit 10;
Then the few articles with several hundred comments would always be at the top. The list would never change.
So I rewrote the select to include just the last 7 days:
select
ArticleId,
count(CommentId) as Comments
from
Comment
where
Created >= '2011-06-14'
group by
ArticleId
order by
count(CommentId) desc
limit 10;
That looks better, but it still won't work because it doesn't account for situations in which there is no activity in the last 7 days.
If nothing happened (or only a few comments happened) in the last 7 days, the list should contain activity from before that. Bottom line - the list always needs to show 10 articles.
I could always call multiple SQL statements, going back in 7 days increments until I fill the list, but I don't want to do that. I'm caching the results in in-process memory, but I still want to only make one SQL call if at all possible.
Thoughts? I feel like this is easily doable and I'm just overlooking the obvious somewhere.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(6)
我建议计算每个结果的权重并按其排序。例如,您的体重可以是操作次数(评论、电子邮件等)/年龄(以天为单位)。这样,文章越旧,其权重就越低,除非它具有极高的行动率。
类似于:
您需要尝试使用权重公式,直到获得较新内容与稍旧但活跃的内容的正确组合。
I recommend computing a weight for each result and sort by that. For example, your weight could be the number of actions (comments, emails, etc) / age in days. That way, the older the article, the lower its weight unless it has an extremely high action rate.
Something like:
You'll want to play with your weight formula until you get the right mix of newer content with slightly older but active content.
这是另一种方法,可以回答您所写的问题(我之前的回答建议采用不同的方法)。
这个将返回本周的顶部,如果需要填充前 10 名,则返回前几周的更多内容。它的工作原理是计算一列,该列表示文章的发布周数,然后按该列进行排序,然后按评论数进行排序。
这里的优点是,将始终返回 10 个结果(假设您有超过 10 篇文章),并将返回按年龄和评论排序的结果,最近几周的文章位于顶部,接下来是一周前的文章,等等。
Here is another approach that answers your question as written (my previous answer suggests a different approach).
This one will return the top for this week and more from previous weeks if needed to fill the top 10. It works by computing a column that is the number of weeks old the article is and sorts by that and then the number of comments.
The advantage here is that will will always return 10 results (assuming you have more than 10 articles) and will return them sorted by age and comments with the most recent weeks articles at the top and the articles that are one week old next, etc.
我假设您还想限制记录数量。尝试如下操作:(注意:我使用 MSSQL,因此某些详细信息/语法对于 mysql 可能不正确...抱歉!)
I assume you also want to limit to a number of records. Try something like the following: (NOTE: I use MSSQL, so some details/syntax might be incorrect for mysql...sorry!)
假设您想继续以 7 天的增量填充它:
我不使用 MySQL,因此您可能需要使用语法,尤其是日期函数。
另外,这使用了一个数字表,它只是一个从 1 到任何数字的整数表。也许 MySQL 有一个内置函数可以实现此目的,否则您将需要创建或生成表。在这种情况下,它最多只能返回四个星期,因此您甚至可以简单地使用
SELECT 1 UNION ALL SELECT 2...
最后,由于它只能返回四个星期,您仍然可能会得到少于 10 篇文章。您可以将周数增加到在您的业务案例中看起来合理的值。
Assuming that you want to keep filling it in by 7 day increments:
I don't work with MySQL, so you may need to play with the syntax, especially with the date functions.
Also, this uses a Numbers table, which is just a table of numeric integers from 1 to whatever. Maybe MySQL has a built-in function for this, otherwise you would need to create or generate the table. It only goes back up to four weeks in this case, so you could even simply use a
SELECT 1 UNION ALL SELECT 2...
Finally, since it only goes back four weeks, you could still end up with less than 10 articles. You can increase the number of weeks to something that seems reasonable in your business case.
我可能遗漏了一些东西,但你不能将你已经拥有的东西结合起来吗?
我不知道 mySQL 语法,所以你必须转换。
I may be missing something but couldn't you Union what you already have?
I don't know the mySQL syntax so you will have to convert.
这将为您提供根据文章收到的评论总数排序的最近评论的文章。
希望这更接近你想要的。如果您需要准确的查询,即仅在 CommentsCount 字段中包含最近几周的评论,请告诉我:-)
This will give you the recently commented articles sorted based on the total comments received by an article.
Hope that is closer to what you want. If you need the accurate query i.e. include only the last weeks comment in the CommentsCount field, please let me know :-)