Beginner SQL question: querying gold and silver tag badges in Stack Exchange Data Explorer

Posted 2024-09-04 06:21:28

I'm using the Stack Exchange Data Explorer to learn SQL, but I think the fundamentals of the question are applicable to other databases.

I'm trying to query the Badges table, which according to Stexdex (that's what I'm going to call it from now on) has the following schema:

Badges
- Id
- UserId
- Name
- Date

This works well for badges like [Epic] and [Legendary], which have unique names, but the silver and gold tag-specific badges seem to be mixed together because they share the exact same name.

Here's an example query I wrote for [mysql] tag:

SELECT
  UserId as [User Link],
  Date
FROM
  Badges
Where
  Name = 'mysql'
Order By
  Date ASC

The (slightly annotated) output, as seen on Stexdex, is:

User Link       Date                    
--------------- -------------------     // all for silver except where noted
Bill Karwin     2009-02-20 11:00:25     
Quassnoi        2009-06-01 10:00:16     
Greg            2009-10-22 10:00:25     
Quassnoi        2009-10-31 10:00:24     // for gold
Bill Karwin     2009-11-23 11:00:30     // for gold
cletus          2010-01-01 11:00:23    
OMG Ponies      2010-01-03 11:00:48     
Pascal MARTIN   2010-02-17 11:00:29 
Mark Byers      2010-04-07 10:00:35     
Daniel Vassallo 2010-05-14 10:00:38

This is consistent with the current list of silver and gold earners at the time of writing. To put it in more timeless terms: as of the end of May 2010, only 2 users have earned the gold [mysql] tag badge, Quassnoi and Bill Karwin, as evidenced in the above result by their names being the only ones that appear twice.

So this is the way I understand it:

- The first time a UserId appears (in chronological order) is for the silver badge
- The second time is for the gold badge

Now, the above result mixes the silver and gold entries together. My questions are:

- Is this a typical design, or is there a much friendlier schema/normalization/whatever you call it?
- In the current design, how would you query the silver and gold badges separately?
  - GROUP BY UserId and pick the min/max or first/second by Date somehow?
  - How can you write a query that lists all the silver badges first, then all the gold badges?
    - Imagine also that the "real" query may be more complicated, i.e. not just listing by date.
    - How would you write it so that there isn't too much repetition between the silver and gold subqueries?
  - Is it perhaps more typical to do two totally separate queries instead?
  - What is this idiom called? A row "partitioning" query that puts rows into "buckets" or something?

Requirement clarification

Originally I wanted the following output, essentially:

User Link       Date                    
--------------- -------------------     
Bill Karwin     2009-02-20 11:00:25     // result of query for silver
Quassnoi        2009-06-01 10:00:16     // :
Greg            2009-10-22 10:00:25     // :
cletus          2010-01-01 11:00:23     // :
OMG Ponies      2010-01-03 11:00:48     // :
Pascal MARTIN   2010-02-17 11:00:29     // :
Mark Byers      2010-04-07 10:00:35     // :
Daniel Vassallo 2010-05-14 10:00:38     // :
------- maybe some sort of row separator here? can SQL do this? -------
Quassnoi        2009-10-31 10:00:24     // result of query for gold
Bill Karwin     2009-11-23 11:00:30     // :

But the answers so far, with a separate column each for silver and gold, are also great, so feel free to pursue that angle as well. I'm still curious how you'd do the above, though.

仙气飘飘 2024-09-11 06:21:28

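Is this a typical design, or are there much friendlier schema/normalization/whatever you call it?

Sure, you could add a type code to make it more explicit. But when you consider that a gold badge can't be earned before the silver one, the date stamp makes a lot of sense as a way to tell them apart.

In the current design, how would you query the silver and gold badges separately? GROUP BY Id and picking the min/max or first/second by the Date somehow?

Yes - joining onto a derived table (AKA inline view) that lists each user with their minimum date would return the silver badges. Using HAVING COUNT(*) >= 1 would work too. To get the gold badges you'd have to combine GROUP BY with HAVING COUNT(*) = 2 - the max date alone doesn't ensure that there is more than one record for a user id...

How can you write a query that lists all the silver badges first then all the gold badges next?

Sorry - do you mean ordered by user, or all silvers first and then the golds? The former can be done simply with ORDER BY t.userid, t.date; for the latter I'd likely use analytic functions (i.e. ROW_NUMBER(), RANK())...

Is it perhaps more typical to do two totally separate queries instead?

See above about how vague your requirements are, to me anyway...

What is this idiom called? A row "partitioning" query to put them into "buckets" or something?

What you're asking about is referred to by the following synonyms: analytic, windowing, ranking...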
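
The analytic-function route mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not the answerer's own query: Python's sqlite3 module stands in for the Data Explorer's SQL Server (window functions need SQLite 3.25 or later), user names are stored in UserId for readability, and the ten rows are the sample output from the question. The helper name tiered_badges and the Tier column are hypothetical.

```python
import sqlite3

# Sample rows copied from the question's output (user name, award date).
# Assumption: names are stored directly in UserId to keep the sketch readable.
ROWS = [
    ("Bill Karwin", "2009-02-20 11:00:25"),
    ("Quassnoi", "2009-06-01 10:00:16"),
    ("Greg", "2009-10-22 10:00:25"),
    ("Quassnoi", "2009-10-31 10:00:24"),
    ("Bill Karwin", "2009-11-23 11:00:30"),
    ("cletus", "2010-01-01 11:00:23"),
    ("OMG Ponies", "2010-01-03 11:00:48"),
    ("Pascal MARTIN", "2010-02-17 11:00:29"),
    ("Mark Byers", "2010-04-07 10:00:35"),
    ("Daniel Vassallo", "2010-05-14 10:00:38"),
]

# Number each user's rows by date: row 1 is treated as silver, any later row as gold.
# The badge filter appears once, so nothing is repeated between the two tiers.
QUERY = """
SELECT UserId, Date,
       CASE WHEN rn = 1 THEN 'silver' ELSE 'gold' END AS Tier
FROM (
    SELECT UserId, Date,
           ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date) AS rn
    FROM Badges
    WHERE Name = 'mysql'
) AS ranked
ORDER BY rn, Date
"""


def tiered_badges():
    """Load the sample rows into an in-memory table and run the ranking query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Badges (Id INTEGER PRIMARY KEY, UserId TEXT, Name TEXT, Date TEXT)"
    )
    conn.executemany(
        "INSERT INTO Badges (UserId, Name, Date) VALUES (?, 'mysql', ?)", ROWS
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in tiered_badges():
        print(row)
```

Sorting by the row number first and the date second lists every silver row before any gold row, which matches the layout asked for in the requirement clarification. The same SELECT should carry over to SQL Server largely unchanged, since ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) is supported there as well.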

谁的年少不轻狂 2024-09-11 06:21:28

You'd do something like this and rely only on date or count in an aggregate.

Arguably, it also makes little sense to query silver followed by gold; it is more natural to get the data side by side.

Unfortunately, you haven't really specified what you want, but a good starting point for aggregates is to express the requirement in plain English.

Example: "Give me dates of silver and gold badge awards per user for tag mysql". Which this does:

SELECT
  UserId AS [User Link],
  MIN(Date) AS [Silver Date],
  -- a user with a single row has only the silver badge
  CASE WHEN COUNT(*) = 1 THEN NULL ELSE MAX(Date) END AS [Gold Date]
FROM
  Badges
WHERE
  Name = 'mysql'
GROUP BY
  UserId
ORDER BY
  [Gold Date] DESC, [Silver Date]

Edit, after update:

Your desired output is not really SQL: it's 2 separate recordsets. The separator is a no-go. Because SQL is set-based, there is no "natural" order, so this query introduces one:

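-- silver: each user's earliest row
SELECT
  UserId AS [User Link],
  MIN(Date) AS [Date],
  0 AS dummyorder
FROM
  Badges
WHERE
  Name = 'mysql'
GROUP BY
  UserId
UNION ALL
-- gold: the later row, only for users who have two rows
SELECT
  UserId AS [User Link],
  MAX(Date) AS [Date],
  1 AS dummyorder
FROM
  Badges
WHERE
  Name = 'mysql'
GROUP BY
  UserId
HAVING
  COUNT(*) = 2
-- the ORDER BY applies to the combined result: silver block first, then gold
ORDER BY
  dummyorder, Date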
