如何旋转/取消旋转（铸造/熔化）数据框？

发布于 2024-12-13 07:02:20 字数 412 浏览 4 评论 0原文

如何“取消透视”表格？正确的技术术语是什么？

更新：这个术语称为 melt

我有一个国家/地区数据框架和每年的数据

Country     2001    2002    2003
Nigeria     1       2       3
UK          2       NA       1

我想要类似的东西

Country    Year    Value
Nigeria    2001    1
Nigeria    2002    2
Nigeria    2003    3
UK         2001    2
UK         2002    NA
UK         2003    1

原文

How can I 'unpivot' a table? What is the proper technical term for this?

UPDATE: The term is called melt

I have a data frame for countries and data for each year

Country     2001    2002    2003
Nigeria     1       2       3
UK          2       NA       1

And I want to have something like

Country    Year    Value
Nigeria    2001    1
Nigeria    2002    2
Nigeria    2003    3
UK         2001    2
UK         2002    NA
UK         2003    1

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

握住我的手 2024-12-20 07:02:20

我仍然不敢相信我用答案击败了安德里。 :)

> library(reshape)
> my.df <- read.table(text = "Country     2001    2002    2003
   + Nigeria     1       2       3
   + UK          2       NA       1", header = TRUE)
> my.result <- melt(my.df, id = c("Country"))
> my.result[order(my.result$Country),]
     Country variable value
   1 Nigeria    X2001     1
   3 Nigeria    X2002     2
   5 Nigeria    X2003     3
   2      UK    X2001     2
   4      UK    X2002    NA
   6      UK    X2003     1

I still can't believe I beat Andrie with an answer. :)

> library(reshape)
> my.df <- read.table(text = "Country     2001    2002    2003
   + Nigeria     1       2       3
   + UK          2       NA       1", header = TRUE)
> my.result <- melt(my.df, id = c("Country"))
> my.result[order(my.result$Country),]
     Country variable value
   1 Nigeria    X2001     1
   3 Nigeria    X2002     2
   5 Nigeria    X2003     3
   2      UK    X2001     2
   4      UK    X2002    NA
   6      UK    X2003     1

回复收藏 0 原文

菊凝晚露 2024-12-20 07:02:20

解决这个问题的基本 R reshape 方法非常丑陋，特别是因为名称不是 reshape 喜欢的形式。如下所示，第一行 setNames 将列名称修改为 reshape 可以使用的名称。

reshape(
  setNames(mydf, c("Country", paste0("val.", c(2001, 2002, 2003)))), 
  direction = "long", idvar = "Country", varying = 2:ncol(mydf), 
  sep = ".", new.row.names = seq_len(prod(dim(mydf[-1]))))

基础 R 中更好的替代方案是使用 stack，如下所示：

cbind(mydf[1], stack(mydf[-1]))
#   Country values  ind
# 1 Nigeria      1 2001
# 2      UK      2 2001
# 3 Nigeria      2 2002
# 4      UK     NA 2002
# 5 Nigeria      3 2003
# 6      UK      1 2003

现在还有用于重塑数据的新工具，例如“tidyr”包，它为我们提供了聚集功能。当然，tidyr:::gather_.data.frame方法只是调用reshape2::melt，所以我的这部分答案不一定会增加太多，除了介绍您可能在 Hadleyverse 中遇到的较新语法。

library(tidyr)
gather(mydf, year, value, `2001`:`2003`) ## Note the backticks
#   Country year value
# 1 Nigeria 2001     1
# 2      UK 2001     2
# 3 Nigeria 2002     2
# 4      UK 2002    NA
# 5 Nigeria 2003     3
# 6      UK 2003     1

如果您想要问题中显示的行顺序，则此处的所有三个选项都需要对行进行重新排序。

第四个选项是使用我的“splitstackshape”包中的merged.stack。与基本 R 的 reshape 一样，您需要将列名称修改为包含“变量”和“时间”指示符的名称。

library(splitstackshape)
merged.stack(
  setNames(mydf, c("Country", paste0("V.", 2001:2003))),
  var.stubs = "V", sep = ".")
#    Country .time_1  V
# 1: Nigeria    2001  1
# 2: Nigeria    2002  2
# 3: Nigeria    2003  3
# 4:      UK    2001  2
# 5:      UK    2002 NA
# 6:      UK    2003  1

样本数据

 mydf <- structure(list(Country = c("Nigeria", "UK"), `2001` = 1:2, `2002` = c(2L, 
     NA), `2003` = c(3L, 1L)), .Names = c("Country", "2001", "2002",               
     "2003"), row.names = 1:2, class = "data.frame")

The base R reshape approach for this problem is pretty ugly, particularly since the names aren't in a form that reshape likes. It would be something like the following, where the first setNames line modifies the column names into something that reshape can make use of.

reshape(
  setNames(mydf, c("Country", paste0("val.", c(2001, 2002, 2003)))), 
  direction = "long", idvar = "Country", varying = 2:ncol(mydf), 
  sep = ".", new.row.names = seq_len(prod(dim(mydf[-1]))))

A better alternative in base R is to use stack, like this:

cbind(mydf[1], stack(mydf[-1]))
#   Country values  ind
# 1 Nigeria      1 2001
# 2      UK      2 2001
# 3 Nigeria      2 2002
# 4      UK     NA 2002
# 5 Nigeria      3 2003
# 6      UK      1 2003

There are also new tools for reshaping data now available, like the "tidyr" package, which gives us gather. Of course, the tidyr:::gather_.data.frame method just calls reshape2::melt, so this part of my answer doesn't necessarily add much except introduce the newer syntax that you might be encountering in the Hadleyverse.

library(tidyr)
gather(mydf, year, value, `2001`:`2003`) ## Note the backticks
#   Country year value
# 1 Nigeria 2001     1
# 2      UK 2001     2
# 3 Nigeria 2002     2
# 4      UK 2002    NA
# 5 Nigeria 2003     3
# 6      UK 2003     1

All three options here would need reordering of rows if you want the row order you showed in your question.

A fourth option would be to use merged.stack from my "splitstackshape" package. Like base R's reshape, you'll need to modify the column names to something that includes a "variable" and "time" indicator.

library(splitstackshape)
merged.stack(
  setNames(mydf, c("Country", paste0("V.", 2001:2003))),
  var.stubs = "V", sep = ".")
#    Country .time_1  V
# 1: Nigeria    2001  1
# 2: Nigeria    2002  2
# 3: Nigeria    2003  3
# 4:      UK    2001  2
# 5:      UK    2002 NA
# 6:      UK    2003  1

Sample data

 mydf <- structure(list(Country = c("Nigeria", "UK"), `2001` = 1:2, `2002` = c(2L, 
     NA), `2003` = c(3L, 1L)), .Names = c("Country", "2001", "2002",               
     "2003"), row.names = 1:2, class = "data.frame")

回复收藏 0 原文