如何在 Stata 中创建 group by 语句(来自 MySQL)?

发布于 2024-12-02 21:17:14 字数 921 浏览 1 评论 0原文

我不是统计人员,但必须处理相当多的数据。在大多数情况下,这些数据集来自在线调查;因此我确实有一个 MySQL 数据库并且知道如何从中获取一些结果。

然而,现在我得到了一个Stata文件,我需要做一些分析。在 MySQL 中我知道如何做到这一点,但我被困在 Stata 中并请求你的帮助。

我有一个不太小的表(大约 50k 行),包含以下列(还有更多列,但这些是我必须使用的列):

id - 对象 ID、唯一值

name - 对象名称、字符串值

class - 类对象的来源,整数范围 1 - 6

origin - 对象的来源,整数范围 1 - 2

在 50k 行中,只有大约 7k 个不同的名称。在 Stata 中,我可以检索带有 list name 的所有名称,甚至可以将其限制为带有 list name if class == 2 的单个类。

现在我想要一个所有不同名称的列表以及具有该名称的对象的计数并按计数对列表进行排序。 在 MySQL 中,我会查询 SELECT name, COUNT(*) AS cnt FROM objects GROUP BY name ORDER BY cnt DESC。 但是在 Stata 中如何做到这一点?

接下来的步骤是为每个类或两个来源获取这样的列表,即SELECT name, COUNT(*) AS cnt FROM objects WHERE class = 2 GROUP BY name ORDER BY cnt DESC,Stata也可以吗?

ps:我不知道 stackoverflow 是否是正确的地方,因为 Stata 并不是真正的编程语言,是吗?但我在这里发现了一些Stata相关的问题;这就是我将其发布在这里的原因。如果有更好的地方可以这样做,请指出我正确的地方。

I am not a statistics guy but have to deal with quite some data. In most cases these data sets come from an online survey; hence I do have a MySQL database and know how to get some results out of that.

However, now I got a Stata file and I am required to do some analysis. In MySQL I'd know how to do that, but I am stuck in Stata and ask for your help.

I have a not too small table (roughly 50k rows) containing following columns (there are more cols but these are the ones I have to work with):

id - Object ID, unique values

name - Name of object, string value

class - Class of object, integer range 1 - 6

origin - Origin of object, integer range 1 - 2

Within the 50k rows there are only about 7k different names. In Stata I can retrieve all names with list name and could even restrict it to a single class with list name if class == 2.

Now I want a list of all different names along with a count of objects having that name and have the list sorted by count.
In MySQL I'd query SELECT name, COUNT(*) AS cnt FROM objects GROUP BY name ORDER BY cnt DESC. But how would that be done in Stata?

Next steps would be to get such lists for each class or for both origins, i.e. SELECT name, COUNT(*) AS cnt FROM objects WHERE class = 2 GROUP BY name ORDER BY cnt DESC, is that possible with Stata, too?

ps: I don't know if stackoverflow is the right place as Stata is not really a programming language, is it? But I found some Stata-related questions here; that's why I posted it here. If there's a better place to do so, please point me to the right place.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

南城追梦 2024-12-09 21:17:14

请记住,Stata 仅适用于固定长度的矩形表,因此您只能添加跨越整个 50k 行的列。在此设置中,您可以执行以下操作。

对于第一个问题(名称和频率列表),您可以

   collapse (count) freq = name, by(class)
   sort class freq name
   list class name freq, sepby(class)

注意,collapse 将删除现有数据并替换为摘要。 (通常,我讨厌这个命令用于数据管理的这方面,但它应该在这里工作。)如果您不希望这种情况发生,这里有一个更复杂的技巧:(

   bysort class name : generate long freq = _N
   bysort class name : generate byte first = (_n==1)
   sort class freq name
   list class name freq if first, sepby(class)

说明:_Nby-group 中的观测值数量,_nby-group 中当前观测值的数量。)

然后您可以将其子集化使用 if 到感兴趣的类class==#,正如您所知。

Keep in mind that Stata only works with rectangular tables of fixed length, so you can only add columns that span the whole 50k rows. Within this setup, this is what you can do.

For the first problem (the list of names and frequencies), you can

   collapse (count) freq = name, by(class)
   sort class freq name
   list class name freq, sepby(class)

Note that collapse will delete the existing data and replace with the summary. (Usually, I hate this command for this aspect of data management, but it should work here.) If you don't want this to happen, here's a more sophisticated trick:

   bysort class name : generate long freq = _N
   bysort class name : generate byte first = (_n==1)
   sort class freq name
   list class name freq if first, sepby(class)

(Explanation: _N is the number of observations in by-group, and _n is the number of the current observation within the by-group.)

You can then subset this to the class of interest with if class==#, as you already know.

明媚殇 2024-12-09 21:17:14

另请查看可使用 ssc inst groups 下载的 groups 命令。

Also check out the groups command downloadable using ssc inst groups.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文