当前位置：文江博客话题详情

如何在 Stata 中创建 group by 语句（来自 MySQL）？

发布于 2024-12-02 21:17:14 字数 921 浏览 3 评论 0原文

我不是统计人员，但必须处理相当多的数据。在大多数情况下，这些数据集来自在线调查；因此我确实有一个 MySQL 数据库并且知道如何从中获取一些结果。

然而，现在我得到了一个Stata文件，我需要做一些分析。在 MySQL 中我知道如何做到这一点，但我被困在 Stata 中并请求你的帮助。

我有一个不太小的表（大约 50k 行），包含以下列（还有更多列，但这些是我必须使用的列）：

id - 对象 ID、唯一值

name - 对象名称、字符串值

class - 类对象的来源，整数范围 1 - 6

origin - 对象的来源，整数范围 1 - 2

在 50k 行中，只有大约 7k 个不同的名称。在 Stata 中，我可以检索带有 list name 的所有名称，甚至可以将其限制为带有 list name if class == 2 的单个类。

现在我想要一个所有不同名称的列表以及具有该名称的对象的计数并按计数对列表进行排序。在 MySQL 中，我会查询 SELECT name, COUNT(*) AS cnt FROM objects GROUP BY name ORDER BY cnt DESC。 但是在 Stata 中如何做到这一点？

接下来的步骤是为每个类或两个来源获取这样的列表，即SELECT name, COUNT(*) AS cnt FROM objects WHERE class = 2 GROUP BY name ORDER BY cnt DESC，Stata也可以吗？

ps：我不知道 stackoverflow 是否是正确的地方，因为 Stata 并不是真正的编程语言，是吗？但我在这里发现了一些Stata相关的问题；这就是我将其发布在这里的原因。如果有更好的地方可以这样做，请指出我正确的地方。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

南城追梦 2024-12-09 21:17:14

请记住，Stata 仅适用于固定长度的矩形表，因此您只能添加跨越整个 50k 行的列。在此设置中，您可以执行以下操作。

对于第一个问题（名称和频率列表），您可以

   collapse (count) freq = name, by(class)
   sort class freq name
   list class name freq, sepby(class)

注意，collapse 将删除现有数据并替换为摘要。（通常，我讨厌这个命令用于数据管理的这方面，但它应该在这里工作。）如果您不希望这种情况发生，这里有一个更复杂的技巧：（

   bysort class name : generate long freq = _N
   bysort class name : generate byte first = (_n==1)
   sort class freq name
   list class name freq if first, sepby(class)

说明：_N是by-group 中的观测值数量，_n 是 by-group 中当前观测值的数量。）

然后您可以将其子集化使用 if 到感兴趣的类class==#，正如您所知。

Keep in mind that Stata only works with rectangular tables of fixed length, so you can only add columns that span the whole 50k rows. Within this setup, this is what you can do.

For the first problem (the list of names and frequencies), you can

   collapse (count) freq = name, by(class)
   sort class freq name
   list class name freq, sepby(class)

Note that collapse will delete the existing data and replace with the summary. (Usually, I hate this command for this aspect of data management, but it should work here.) If you don't want this to happen, here's a more sophisticated trick:

   bysort class name : generate long freq = _N
   bysort class name : generate byte first = (_n==1)
   sort class freq name
   list class name freq if first, sepby(class)

(Explanation: _N is the number of observations in by-group, and _n is the number of the current observation within the by-group.)

You can then subset this to the class of interest with if class==#, as you already know.

回复收藏 0 原文