Counting stratified data in R

Posted on 2024-09-06 08:19:18

I have a list of counties in each state that had nonattainment status during the years 1995-2005.

I want to know how many counties in each state had this status in each year.

My data is formatted like this:

State1 County1 Yr1 Yr2 Yr3 Yr4...
State1 County2 Yr1 Yr2 Yr3 Yr4
State2 County1 Yr1 Yr2.....

Each year variable is either 1 or 0, since a county may gain or lose this status over time.

For each year, I need to count how many counties in each state have nonattainment status (YRx = 1), but I can't think of how to do it.

负佳期 2024-09-13 08:19:18

I used the following example:

data <- read.table(textConnection("
state county Yr1 Yr2 Yr3 Yr4
state1 county1 1 0 0 1
state1 county2 0 0 0 0
state1 county3 0 1 0 0
state1 county4 0 0 0 0
state1 county5 0 1 0 1
state2 county6 0 0 0 0
state2 county7 0 0 1 0
state2 county8 1 0 0 1
state2 county9 0 0 0 0
state2 county10 0 1 0 0
state3 county11 1 1 1 1
state3 county12 0 0 0 0
state3 county13 0 1 1 0
state3 county14 0 0 0 1
state4 county15 0 0 0 0
state4 county16 1 0 1 0
state4 county17 0 0 0 0
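state4 county18 1 1 1 1
"), header = TRUE)

library(reshape)
# Reshape to long format: one row per state, county, and year
data2 <- melt(data, id = c("state", "county"))
# Sum the 0/1 flags to count nonattainment counties per state and year
cast(data2, state ~ variable, fun.aggregate = sum)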

Result:

   state Yr1 Yr2 Yr3 Yr4
1 state1   1   2   0   2
2 state2   1   1   1   1
3 state3   1   2   2   2
4 state4   2   1   2   1
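
If the reshape package is not available, the same per-state, per-year counts could probably be obtained in base R. The following is a minimal sketch rather than part of the original answer; it reuses the example data above and assumes every year column holds only 0 or 1, so that summing a column within a state is the same as counting its nonattainment counties. The name counts is chosen only for illustration.

```r
# Same example data as above, read from an inline string
data <- read.table(text = "
state county Yr1 Yr2 Yr3 Yr4
state1 county1 1 0 0 1
state1 county2 0 0 0 0
state1 county3 0 1 0 0
state1 county4 0 0 0 0
state1 county5 0 1 0 1
state2 county6 0 0 0 0
state2 county7 0 0 1 0
state2 county8 1 0 0 1
state2 county9 0 0 0 0
state2 county10 0 1 0 0
state3 county11 1 1 1 1
state3 county12 0 0 0 0
state3 county13 0 1 1 0
state3 county14 0 0 0 1
state4 county15 0 0 0 0
state4 county16 1 0 1 0
state4 county17 0 0 0 0
state4 county18 1 1 1 1
", header = TRUE)

# Drop the county column, then sum every remaining year column within each state.
# Assumption: the flags are strictly 0/1, so each sum is a count of counties.
counts <- aggregate(. ~ state, data = data[, names(data) != "county"], FUN = sum)
print(counts)
```

Because aggregate() works on the wide layout directly, no melt step is needed here; the reshape approach is still handy when the long format is wanted for other summaries.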

深空失忆 2024-09-13 08:19:18

Is this data organized as a data frame? If so, how are the rows defined? If your data were organized this way:

State   County  Year    Attainment  
State1   County1  1       1  
State1   County1  2       0
State1   County1  3       1
State1   County1  4       1
State1   County2  1       1
State1   County2  2       1
...

Then you could get the summary you're looking for with one line of code. Hopefully your notation means that your data is organized like this:

State   County  Yr1 Yr2 Yr3 Yr4
State1   County1 1  0   1   1
State1   County2 1  1   1   1

Use melt() from the reshape package to get from this format to the one laid out above.

library(reshape)
new.df <- melt(df, id = 1:2)

melt() names the Year column variable and the Attainment column value. Now, with the cast function, also from the reshape package, you can get the summary you want.

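# One row per state and year; the column for value 1 counts nonattainment counties
counties <- cast(new.df, State + variable ~ value, fun.aggregate = length)
head(counties)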

However, if your data is organized so that every state, county, and year combination is its own column and the table is only one row long, I think your best next step would be to reformat the data outside of R so that it looks at least like my second example.
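
For data that is already in the long State / County / Year / Attainment layout from the first example, the one-line summary might look like the sketch below. It uses base R's xtabs() rather than reshape, so it is an alternative and not the method described in this answer. The data frame long.df and its values are hypothetical and exist only to make the example runnable; Attainment is assumed to be strictly 0 or 1.

```r
# Hypothetical long-format data; the values are illustrative only
long.df <- data.frame(
  State      = c("State1", "State1", "State1", "State1", "State2", "State2"),
  County     = c("County1", "County1", "County2", "County2", "County1", "County1"),
  Year       = c(1, 2, 1, 2, 1, 2),
  Attainment = c(1, 0, 1, 1, 0, 1)
)

# Sum the 0/1 Attainment flags for each State and Year combination,
# which gives the number of nonattainment counties per state per year
counts <- xtabs(Attainment ~ State + Year, data = long.df)
print(counts)
```

The result is a table with one row per state and one column per year, which matches the shape of summary the question asks for.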
