Counting hierarchical data in R
I have a list of counties in each state that received nonattainment status in years 1995-2005.
I want to know how many counties in each state received this status in each year.
If my data is formatted like this,
State1 County1 Yr1 Yr2 Yr3 Yr4...
State1 County2 Yr1 Yr2 Yr3 Yr4
State2 County1 Yr1 Yr2.....
Each year variable can be 1 or 0, since a county may gain or lose this status from one year to the next.
For each year, I need to count how many counties in each state have nonattainment status (Yrx = 1), but I can't think of how to do it.
Comments (2)
I used the following example:
Result:
Is this data organized as a dataframe? If so, how are the rows defined? If your data were organized this way:
Then it would be possible to get the kind of summary data you're looking for with 1 line of code. Hopefully your notation means that your data is organized like this:
Use melt() from the reshape package to get from this format to the one laid out above. It'll call the Year variable "variable" and the Attainment variable "value". Now, with clever use of the cast function, also from the reshape package, you can get the summary you want.
However, if your data is organized so that every state, county and year is a column, and it's only 1 row long, I think your best next step would be to reformat the data outside of R so that it looks at least like my second example.