How do I aggregate relational data in Stata?
I can't wrap my head around the following Stata programming problem:
I have a table listing all car purchases by customer and make:
Customer | Make | Price
-----------------------
c1 | m1 | 1
c1 | m1 | 2
c1 | m3 | 1
c2 | m2 | 2
c3 | . | .
I want to transform this into a table with one observation/row per customer, listing the maximum price paid for every make:
Customer | m1 | m2 | m3
-----------------------
c1 | 2 | 0 | 1
c2 | 0 | 2 | 0
c3 | 0 | 0 | 0
How do I achieve this? I know reshape wide, but that doesn't work because of the doubled c1 | m1 row. Also, the missing values for c3 are causing trouble.
Comments (1)
Depending on what you want to do, I suggest approaching this a little differently. For example, using -bysort- you can find the maximum price by customer for each make.
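A minimal sketch of the -bysort- approach, assuming the example data from the question is typed in with -input-. The variable name maxPrice is illustrative and not from the original post.

```stata
clear
* example data from the question; c3 has no purchase
input str2 Customer str2 Make Price
"c1" "m1" 1
"c1" "m1" 2
"c1" "m3" 1
"c2" "m2" 2
"c3" ""   .
end

* store the maximum price of each Customer-Make group on every row of that group
bysort Customer Make: egen maxPrice = max(Price)

list, sepby(Customer)
```

This keeps every original observation, so the doubled c1 | m1 rows are still there; both simply carry the same maxPrice. For c3, maxPrice stays missing because there is no price to take a maximum of.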
Or, you can use collapse to find the max price by customer and make:
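A sketch of what the -collapse- variant might look like, again assuming the question's example data is entered with -input-:

```stata
clear
* example data from the question; c3 has no purchase
input str2 Customer str2 Make Price
"c1" "m1" 1
"c1" "m1" 2
"c1" "m3" 1
"c2" "m2" 2
"c3" ""   .
end

* reduce the data to one row per Customer-Make pair, keeping the highest price
collapse (max) Price, by(Customer Make)

list, sepby(Customer)
```

Unlike the -bysort- version, this replaces the data in memory, so the two c1 | m1 rows become a single row. The c3 row survives with a missing Price.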
But, if you really want the table you posted using -reshape- you could run the following:
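One way this could be done, sketched under the assumption that the question's example data is entered with -input- and that the rows without a purchase are simply dropped first:

```stata
clear
* example data from the question; c3 has no purchase
input str2 Customer str2 Make Price
"c1" "m1" 1
"c1" "m1" 2
"c1" "m3" 1
"c2" "m2" 2
"c3" ""   .
end

* remove observations without a purchase (here: c3)
drop if missing(Price)

* one row per Customer-Make pair, so the pair uniquely identifies an observation
collapse (max) Price, by(Customer Make)

* one row per customer, one column per make (Make is a string variable)
reshape wide Price, i(Customer) j(Make) string

list
```

The resulting columns are named Pricem1, Pricem2 and Pricem3. Makes a customer never bought show up as missing rather than zero, and c3 is absent because its only row was dropped.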
Note that reshape will fail if it encounters missing data in the Price column. I dropped these observations in the code above, but you may choose to do something different, such as replacing the missing data with zeros, as you show in your posted target table.
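If the zeros from the target table are wanted, including a row for c3, one possible sketch is to save the full customer list before dropping anything, merge it back after the reshape, and recode missing values to zero. This is an assumption about how to get there, not part of the original answer; the tempfile name customers is illustrative.

```stata
clear
* example data from the question; c3 has no purchase
input str2 Customer str2 Make Price
"c1" "m1" 1
"c1" "m1" 2
"c1" "m3" 1
"c2" "m2" 2
"c3" ""   .
end

* remember every customer, including those without a purchase
preserve
keep Customer
duplicates drop
tempfile customers
save `customers'
restore

* reshape only the rows that have a purchase
drop if missing(Price)
collapse (max) Price, by(Customer Make)
reshape wide Price, i(Customer) j(Make) string

* bring back customers that were dropped, then turn missing prices into zeros
merge 1:1 Customer using `customers', nogenerate
mvencode Price*, mv(0)

sort Customer
list
```

Because the code uses a temporary file and local macro, it should be run as a whole from a do-file rather than line by line.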