Beginner SQL question: querying gold and silver tag badges in Stack Exchange Data Explorer
I'm using the Stack Exchange Data Explorer to learn SQL, but I think the fundamentals of the question are applicable to other databases.
I'm trying to query the Badges table, which according to Stexdex (that's what I'm going to call it from now on) has the following schema:
- Badges
  - Id
  - UserId
  - Name
  - Date
This works well for badges like [Epic] and [Legendary], which have unique names, but the silver and gold tag-specific badges seem to be mixed together because they have exactly the same name.
Here's an example query I wrote for the [mysql] tag:
SELECT
  UserId AS [User Link],
  Date
FROM
  Badges
WHERE
  Name = 'mysql'
ORDER BY
  Date ASC
The (slightly annotated) output, as seen on Stexdex, is:
User Link        Date
---------------  -------------------  // all for silver except where noted
Bill Karwin      2009-02-20 11:00:25
Quassnoi         2009-06-01 10:00:16
Greg             2009-10-22 10:00:25
Quassnoi         2009-10-31 10:00:24  // for gold
Bill Karwin      2009-11-23 11:00:30  // for gold
cletus           2010-01-01 11:00:23
OMG Ponies       2010-01-03 11:00:48
Pascal MARTIN    2010-02-17 11:00:29
Mark Byers       2010-04-07 10:00:35
Daniel Vassallo  2010-05-14 10:00:38
This is consistent with the current list of silver and gold earners at the time of writing. To speak in more timeless terms: as of the end of May 2010, only 2 users have earned the gold [mysql] tag badge, Quassnoi and Bill Karwin, as evidenced in the above result by their names being the only ones that appear twice.
So this is the way I understand it:
- The first time a UserId appears (in chronological order) is for the silver badge
- The second time is for the gold
Now, the above result mixes the silver and gold entries together. My questions are:
- Is this a typical design, or are there much friendlier schemas/normalizations/whatever you call them?
- In the current design, how would you query the silver and gold badges separately? GROUP BY UserId and picking the min/max or first/second by the Date somehow?
- How can you write a query that lists all the silver badges first, then all the gold badges next?
- Imagine also that the "real" query may be more complicated, i.e. not just listing by date.
- How would you write it so that it doesn't have too much repetition between the silver and gold subqueries?
- Is it perhaps more typical to do two totally separate queries instead?
- What is this idiom called? A row "partitioning" query to put them into "buckets" or something?
Requirement clarification
Originally I wanted the following output, essentially:
User Link        Date
---------------  -------------------
Bill Karwin      2009-02-20 11:00:25  // result of query for silver
Quassnoi         2009-06-01 10:00:16  // :
Greg             2009-10-22 10:00:25  // :
cletus           2010-01-01 11:00:23  // :
OMG Ponies       2010-01-03 11:00:48  // :
Pascal MARTIN    2010-02-17 11:00:29  // :
Mark Byers       2010-04-07 10:00:35  // :
Daniel Vassallo  2010-05-14 10:00:38  // :
------- maybe some sort of row separator here? can SQL do this? -------
Quassnoi         2009-10-31 10:00:24  // result of query for gold
Bill Karwin      2009-11-23 11:00:30  // :
But the answers so far with a separate column for silver and gold are also great, so feel free to pursue that angle as well. I'm still curious how you'd do the above, though.
Answers (2)
Sure, you could add a type code to make it more explicit. But when you consider that one cannot get a gold badge before a silver one, the date stamp makes a lot of sense as a way to differentiate between them.
Yes - joining onto a derived table (AKA inline view) that is a list of users and their minimum dates would return the silver badges. Using HAVING COUNT(*) >= 1 would work too. You'd have to use a combination of GROUP BY and HAVING COUNT(*) = 2 to get gold badges - the max date alone doesn't ensure that there is more than one record for a userid...
Sorry - ordered by users, or all silvers first and then golds? The former might be done simply by using ORDER BY t.userid, t.date; for the latter I'd likely use analytic functions (i.e. ROW_NUMBER(), RANK())... See above about how vague your requirements are, to me anyway...
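The derived-table and GROUP BY/HAVING ideas described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's actual query: it runs against an in-memory SQLite database through Python's sqlite3 module (Data Explorer itself runs on SQL Server), uses a subset of the sample rows from the question, and stores user names in UserId for readability. The aliases FirstDate and GoldDate are made up for the example.

```python
import sqlite3

# Subset of the sample rows from the question. User names stand in for
# the numeric UserId values (an assumption made for readability).
ROWS = [
    ("Bill Karwin", "2009-02-20 11:00:25"),
    ("Quassnoi", "2009-06-01 10:00:16"),
    ("Greg", "2009-10-22 10:00:25"),
    ("Quassnoi", "2009-10-31 10:00:24"),
    ("Bill Karwin", "2009-11-23 11:00:30"),
    ("cletus", "2010-01-01 11:00:23"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Badges (Id INTEGER PRIMARY KEY, UserId TEXT, Name TEXT, Date TEXT)"
)
conn.executemany(
    "INSERT INTO Badges (UserId, Name, Date) VALUES (?, 'mysql', ?)", ROWS
)

# Silver: join each badge row to a derived table of (user, earliest date).
silver = conn.execute("""
    SELECT b.UserId, b.Date
    FROM Badges AS b
    JOIN (SELECT UserId, MIN(Date) AS FirstDate
          FROM Badges
          WHERE Name = 'mysql'
          GROUP BY UserId) AS f
      ON f.UserId = b.UserId AND f.FirstDate = b.Date
    WHERE b.Name = 'mysql'
    ORDER BY b.Date
""").fetchall()

# Gold: only users with exactly two rows; the later date is the gold award.
gold = conn.execute("""
    SELECT UserId, MAX(Date) AS GoldDate
    FROM Badges
    WHERE Name = 'mysql'
    GROUP BY UserId
    HAVING COUNT(*) = 2
    ORDER BY GoldDate
""").fetchall()

print(silver)
print(gold)
```

The HAVING clause is what keeps users with a single row out of the gold list; MAX(Date) on its own would return their silver date instead.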
What you're asking about is referred to by the following synonyms: analytic, windowing, ranking...
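As a rough illustration of the analytic (window) function approach, the sketch below numbers each user's rows by date with ROW_NUMBER() and labels row 1 as silver and row 2 as gold. The assumptions are the same as for any toy example here: SQLite (3.25 or later, for window function support) stands in for SQL Server, the data is a subset of the question's sample rows with user names in UserId, and the names rn and Class are invented for the example.

```python
import sqlite3

# Subset of the sample rows from the question; names stand in for UserId.
ROWS = [
    ("Bill Karwin", "2009-02-20 11:00:25"),
    ("Quassnoi", "2009-06-01 10:00:16"),
    ("Greg", "2009-10-22 10:00:25"),
    ("Quassnoi", "2009-10-31 10:00:24"),
    ("Bill Karwin", "2009-11-23 11:00:30"),
    ("cletus", "2010-01-01 11:00:23"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Badges (Id INTEGER PRIMARY KEY, UserId TEXT, Name TEXT, Date TEXT)"
)
conn.executemany(
    "INSERT INTO Badges (UserId, Name, Date) VALUES (?, 'mysql', ?)", ROWS
)

# Number each user's badge rows in chronological order, then map the
# row number to a badge class (1 = silver, 2 = gold).
labelled = conn.execute("""
    SELECT UserId,
           Date,
           CASE rn WHEN 1 THEN 'silver' WHEN 2 THEN 'gold' END AS Class
    FROM (SELECT UserId,
                 Date,
                 ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date) AS rn
          FROM Badges
          WHERE Name = 'mysql') AS r
    ORDER BY Date
""").fetchall()

for row in labelled:
    print(row)
```

Because the class is derived in one place, filtering on it (or on rn) gives silver-only or gold-only results without repeating the underlying query.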
You'd do something like this and rely only on date or count in an aggregate.
Arguably, it also makes no sense to query silver followed by gold, but rather get data side by side like this:
Unfortunately, you haven't really specified what you want, but a good starting point for aggregates is to express the requirement in plain English.
Example: "Give me dates of silver and gold badge awards per user for tag mysql". Which this does:
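One way the plain-English request could translate into an aggregate is sketched below: one row per user, with the earliest date as the silver award and the latest date as the gold award only when the user has more than one row. This is a hedged reconstruction rather than the answer's original query. It assumes SQLite via Python's sqlite3 in place of SQL Server, a subset of the question's sample rows with user names in UserId, and invented column aliases Silver and Gold.

```python
import sqlite3

# Subset of the sample rows from the question; names stand in for UserId.
ROWS = [
    ("Bill Karwin", "2009-02-20 11:00:25"),
    ("Quassnoi", "2009-06-01 10:00:16"),
    ("Greg", "2009-10-22 10:00:25"),
    ("Quassnoi", "2009-10-31 10:00:24"),
    ("Bill Karwin", "2009-11-23 11:00:30"),
    ("cletus", "2010-01-01 11:00:23"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Badges (Id INTEGER PRIMARY KEY, UserId TEXT, Name TEXT, Date TEXT)"
)
conn.executemany(
    "INSERT INTO Badges (UserId, Name, Date) VALUES (?, 'mysql', ?)", ROWS
)

# One row per user: silver is the earliest date; gold is the latest date,
# but only if the user has a second row (otherwise NULL).
side_by_side = conn.execute("""
    SELECT UserId,
           MIN(Date) AS Silver,
           CASE WHEN COUNT(*) > 1 THEN MAX(Date) END AS Gold
    FROM Badges
    WHERE Name = 'mysql'
    GROUP BY UserId
    ORDER BY Silver
""").fetchall()

for row in side_by_side:
    print(row)
```

Users without a gold badge come back with NULL (None in Python) in the Gold column, which relies only on the date and the count, as the answer suggests.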
Edit, after update:
Your desired output is not really SQL: it's 2 separate recordsets. The separator is a no-go. As a set-based operation, there is no "natural" order, so this introduces one:
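A minimal sketch of introducing such an order, assuming a UNION ALL of a silver query and a gold query with an explicit sort column, might look like this. It is not the answer's original code: SQLite via Python's sqlite3 stands in for SQL Server, the rows are a subset of the question's sample data with user names in UserId, and AwardDate, Class and SortKey are hypothetical names chosen for the example.

```python
import sqlite3

# Subset of the sample rows from the question; names stand in for UserId.
ROWS = [
    ("Bill Karwin", "2009-02-20 11:00:25"),
    ("Quassnoi", "2009-06-01 10:00:16"),
    ("Greg", "2009-10-22 10:00:25"),
    ("Quassnoi", "2009-10-31 10:00:24"),
    ("Bill Karwin", "2009-11-23 11:00:30"),
    ("cletus", "2010-01-01 11:00:23"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Badges (Id INTEGER PRIMARY KEY, UserId TEXT, Name TEXT, Date TEXT)"
)
conn.executemany(
    "INSERT INTO Badges (UserId, Name, Date) VALUES (?, 'mysql', ?)", ROWS
)

# Stack the silver and gold result sets, tagging each with a sort key so
# that all silver rows come first, then all gold rows, each by date.
ordered = conn.execute("""
    SELECT UserId, AwardDate, Class
    FROM (SELECT UserId, MIN(Date) AS AwardDate, 'silver' AS Class, 1 AS SortKey
          FROM Badges
          WHERE Name = 'mysql'
          GROUP BY UserId
          UNION ALL
          SELECT UserId, MAX(Date), 'gold', 2
          FROM Badges
          WHERE Name = 'mysql'
          GROUP BY UserId
          HAVING COUNT(*) > 1) AS t
    ORDER BY SortKey, AwardDate
""").fetchall()

for row in ordered:
    print(row)
```

The Class column takes the place of the row separator asked about in the question: the result is still a single recordset, and the client can draw a divider wherever the class changes.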