Reshaping data.frame from wide to long format

Posted on 2024-08-19 15:52:23

I'm having trouble converting my data.frame from a wide table to a long table.
At the moment it looks like this:

Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246

Now I would like to transform this data.frame into a long data.frame.
Something like this:

Code Country        Year    Value
AFG  Afghanistan    1950    20,249
AFG  Afghanistan    1951    21,352
AFG  Afghanistan    1952    22,532
AFG  Afghanistan    1953    23,557
AFG  Afghanistan    1954    24,555
ALB  Albania        1950    8,097
ALB  Albania        1951    8,986
ALB  Albania        1952    10,058
ALB  Albania        1953    11,123
ALB  Albania        1954    12,246

I have looked at and already tried using the melt() and the reshape() functions
as some people were suggesting in similar questions.
However, so far I only get messy results.

If it is possible I would like to do it with the reshape() function since
it looks a little bit nicer to handle.

紫南 2024-08-26 15:52:23

Two alternative solutions:

1) With data.table:

You can use the melt function:

library(data.table)
long <- melt(setDT(wide), id.vars = c("Code","Country"), variable.name = "year")

which gives:

> long
    Code     Country year  value
 1:  AFG Afghanistan 1950 20,249
 2:  ALB     Albania 1950  8,097
 3:  AFG Afghanistan 1951 21,352
 4:  ALB     Albania 1951  8,986
 5:  AFG Afghanistan 1952 22,532
 6:  ALB     Albania 1952 10,058
 7:  AFG Afghanistan 1953 23,557
 8:  ALB     Albania 1953 11,123
 9:  AFG Afghanistan 1954 24,555
10:  ALB     Albania 1954 12,246

Some alternative notations:

melt(setDT(wide), id.vars = 1:2, variable.name = "year")
melt(setDT(wide), measure.vars = 3:7, variable.name = "year")
melt(setDT(wide), measure.vars = as.character(1950:1954), variable.name = "year")

2) With tidyr:

Use pivot_longer():

library(tidyr)

long <- wide %>% 
  pivot_longer(
    cols = `1950`:`1954`, 
    names_to = "year",
    values_to = "value"
)

Note:

names_to and values_to default to "name" and "value", respectively, so you could write this extra-succinctly as wide %>% pivot_longer(`1950`:`1954`).
The cols argument uses the highly flexible tidyselect DSL, so you can select the same columns using a negative selection (!c(Code, Country)), a selection helper (starts_with("19") or matches("^\\d{4}$")), numeric indices (3:7), and more.
tidyr::pivot_longer() is the successor to tidyr::gather() and reshape2::melt(), which are no longer under development.
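
As a rough check of the tidyselect note, the sketch below (assuming a reasonably recent tidyr is installed, and rebuilding the wide data from the Data section) compares four ways of writing cols. The object names by_range, by_negation, by_regex and by_index are made up for this illustration.

```r
library(tidyr)

# Same sample data as in the "Data" section of this answer
wide <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
                   header = TRUE, check.names = FALSE)

# Four different column specifications for the same year columns
by_range    <- pivot_longer(wide, cols = `1950`:`1954`)
by_negation <- pivot_longer(wide, cols = !c(Code, Country))
by_regex    <- pivot_longer(wide, cols = matches("^\\d{4}$"))
by_index    <- pivot_longer(wide, cols = 3:7)

# Compare each variant against the first one
same <- c(identical(by_range, by_negation),
          identical(by_range, by_regex),
          identical(by_range, by_index))
all(same)
```

Which form to prefer is mostly a matter of robustness: the negative selection keeps working if more year columns are added later, whereas numeric indices silently depend on column order.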

Transforming values

Another problem with the data is that the values will be read by R as character values (as a result of the , in the numbers). You can repair this with gsub and as.numeric, either after reshaping:

long$value <- as.numeric(gsub(",", "", long$value))
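
The same repair can also be applied to the wide table itself, before any reshaping. This is a minimal base R sketch, assuming the first two columns are the identifiers and everything else is a year column; the names year_cols and strip_commas are made up for the example.

```r
# Same sample data as in the "Data" section of this answer
wide <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
                   header = TRUE, check.names = FALSE)

# Every column except the identifiers is treated as a year column
year_cols <- setdiff(names(wide), c("Code", "Country"))

# Remove the thousands separator, then convert to numeric
strip_commas <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))
wide[year_cols] <- lapply(wide[year_cols], strip_commas)

str(wide)
```

After this step any of the reshaping approaches above yields a numeric value column directly.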

Or during reshaping, with data.table or tidyr:

# data.table
long <- melt(setDT(wide),
             id.vars = c("Code","Country"),
             variable.name = "year")[, value := as.numeric(gsub(",", "", value))]

# tidyr
long <- wide %>%
  pivot_longer(
    cols = `1950`:`1954`, 
    names_to = "year",
    values_to = "value",
    values_transform = ~ as.numeric(gsub(",", "", .x))
  )

Data:

wide <- read.table(text="Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246", header=TRUE, check.names=FALSE)

述情 2024-08-26 15:52:23

reshape() takes a while to get used to, just as melt/cast. Here is a solution with reshape, assuming your data frame is called d:

reshape(d, 
        direction = "long",
        varying = list(names(d)[3:7]),
        v.names = "Value",
        idvar = c("Code", "Country"),
        timevar = "Year",
        times = 1950:1954)
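
For a version that can be pasted and run as is, here is a sketch that builds d from the sample data in the question and tidies the result. The sorting, the row-name reset and the comma removal are additions for illustration, not part of the call above.

```r
# Sample data from the question; check.names = FALSE keeps "1950" etc. as names
d <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
                header = TRUE, check.names = FALSE)

long <- reshape(d,
                direction = "long",
                varying = list(names(d)[3:7]),   # columns to stack
                v.names = "Value",               # name of the stacked column
                idvar = c("Code", "Country"),    # columns identifying a record
                timevar = "Year",                # name of the new key column
                times = 1950:1954)               # key values, one per varying column

# reshape() returns rows grouped by year with composite row names;
# sort by country and year and drop the row names for readability
long <- long[order(long$Code, long$Year), ]
rownames(long) <- NULL

# The values still contain thousands separators, so convert them
long$Value <- as.numeric(gsub(",", "", long$Value))
long
```

Note that times must have the same length as the vector of varying columns, otherwise reshape() stops with an error.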

与他有关 2024-08-26 15:52:23

With tidyr 1.0.0 or later, another option is pivot_longer:

library(tidyr)
pivot_longer(df1, -c(Code, Country), values_to = "Value", names_to = "Year")
# A tibble: 10 x 4
#   Code  Country     Year  Value 
#   <fct> <fct>       <chr> <fct> 
# 1 AFG   Afghanistan 1950  20,249
# 2 AFG   Afghanistan 1951  21,352
# 3 AFG   Afghanistan 1952  22,532
# 4 AFG   Afghanistan 1953  23,557
# 5 AFG   Afghanistan 1954  24,555
# 6 ALB   Albania     1950  8,097 
# 7 ALB   Albania     1951  8,986 
# 8 ALB   Albania     1952  10,058
# 9 ALB   Albania     1953  11,123
#10 ALB   Albania     1954  12,246

data

df1 <- structure(list(Code = structure(1:2, .Label = c("AFG", "ALB"), class = "factor"), 
    Country = structure(1:2, .Label = c("Afghanistan", "Albania"
    ), class = "factor"), `1950` = structure(1:2, .Label = c("20,249", 
    "8,097"), class = "factor"), `1951` = structure(1:2, .Label = c("21,352", 
    "8,986"), class = "factor"), `1952` = structure(2:1, .Label = c("10,058", 
    "22,532"), class = "factor"), `1953` = structure(2:1, .Label = c("11,123", 
    "23,557"), class = "factor"), `1954` = structure(2:1, .Label = c("12,246", 
    "24,555"), class = "factor")), class = "data.frame", row.names = c(NA, 
-2L))
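
Since the Value column in the output above is a factor, a small caution on converting it: calling as.numeric() directly on a factor returns the internal level codes, not the numbers shown. A minimal base R sketch (the vector f is made up to mirror the `1950` column):

```r
# A factor like the `1950` column in df1
f <- factor(c("20,249", "8,097"))

# Direct conversion returns the level codes, not the displayed numbers
codes <- as.numeric(f)

# Go through the character labels, drop the commas, then convert
vals <- as.numeric(gsub(",", "", as.character(f)))

codes
vals
```

gsub() would coerce the factor to character on its own, but writing as.character() explicitly makes the intent clearer.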

本王不退位尔等都是臣 2024-08-26 15:52:23

Using reshape package:

#data
x <- read.table(textConnection(
"Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246"), header=TRUE)

library(reshape)

x2 <- melt(x, id = c("Code", "Country"), variable_name = "Year")
x2[,"Year"] <- as.numeric(gsub("X", "" , x2[,"Year"]))
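
The gsub("X", ...) step is needed because read.table() is called here without check.names = FALSE, so the year headers are turned into syntactically valid names by prefixing an X. A small base R sketch of that behaviour (no reshape package required; the name years is made up for the example):

```r
# Read the sample data with the default check.names = TRUE
x <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
                header = TRUE)

# The year columns now carry an "X" prefix
names(x)

# Strip only a leading "X" and convert back to numbers
years <- as.numeric(sub("^X", "", names(x)[3:7]))
years
```

Anchoring the pattern with ^ avoids removing an X that happens to appear elsewhere in a column name.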

青瓷清茶倾城歌 2024-08-26 15:52:23

Since this answer is tagged with r-faq, I felt it would be useful to share another alternative from base R: stack.

Note, however, that stack does not work with factors--it only works if is.vector is TRUE, and from the documentation for is.vector, we find that:

is.vector returns TRUE if x is a vector of the specified mode having no attributes other than names. It returns FALSE otherwise.

I'm using the sample data from @Jaap's answer, where the values in the year columns are factors.

Here's the stack approach:

cbind(wide[1:2], stack(lapply(wide[-c(1, 2)], as.character)))
##    Code     Country values  ind
## 1   AFG Afghanistan 20,249 1950
## 2   ALB     Albania  8,097 1950
## 3   AFG Afghanistan 21,352 1951
## 4   ALB     Albania  8,986 1951
## 5   AFG Afghanistan 22,532 1952
## 6   ALB     Albania 10,058 1952
## 7   AFG Afghanistan 23,557 1953
## 8   ALB     Albania 11,123 1953
## 9   AFG Afghanistan 24,555 1954
## 10  ALB     Albania 12,246 1954
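
If you want the column names and order asked for in the question, the stacked result can be tidied a little. This is a sketch under the assumption that the first two columns are the identifiers; the explicit row repetition and the names stacked and ids are additions for illustration, not part of the one-liner above.

```r
# Sample data with the year headers kept as-is
wide <- read.table(text = "Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246",
                   header = TRUE, check.names = FALSE)

# stack() works column by column: all 1950 values, then all 1951 values, ...
stacked <- stack(lapply(wide[-(1:2)], as.character))

# Repeat the identifier rows once per year column, in the same order
ids <- wide[rep(seq_len(nrow(wide)), times = ncol(wide) - 2), c("Code", "Country")]

long <- cbind(ids, stacked)
names(long)[3:4] <- c("Value", "Year")
long <- long[, c("Code", "Country", "Year", "Value")]

# ind is returned as a factor; convert it via character to get real years
long$Year <- as.integer(as.character(long$Year))
rownames(long) <- NULL
long
```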

错爱 2024-08-26 15:52:23

Here is another example showing the use of gather from tidyr. You can select the columns to gather either by removing them individually (as I do here), or by including the years you want explicitly.

Note that, to handle the commas (and X's added if check.names = FALSE is not set), I am also using dplyr's mutate with parse_number from readr to convert the text values back to numbers. These are all part of the tidyverse and so can be loaded together with library(tidyverse)

library(tidyverse)

wide %>%
  gather(Year, Value, -Code, -Country) %>%
  mutate(Year = parse_number(Year),
         Value = parse_number(Value))

Returns:

   Code     Country Year Value
1   AFG Afghanistan 1950 20249
2   ALB     Albania 1950  8097
3   AFG Afghanistan 1951 21352
4   ALB     Albania 1951  8986
5   AFG Afghanistan 1952 22532
6   ALB     Albania 1952 10058
7   AFG Afghanistan 1953 23557
8   ALB     Albania 1953 11123
9   AFG Afghanistan 1954 24555
10  ALB     Albania 1954 12246

溺孤伤于心 2024-08-26 15:52:23

Here's a sqldf solution:

library(sqldf)

sqldf("Select Code, Country, '1950' As Year, `1950` As Value From wide
        Union All
       Select Code, Country, '1951' As Year, `1951` As Value From wide
        Union All
       Select Code, Country, '1952' As Year, `1952` As Value From wide
        Union All
       Select Code, Country, '1953' As Year, `1953` As Value From wide
        Union All
       Select Code, Country, '1954' As Year, `1954` As Value From wide;")

To make the query without typing in everything, you can use the following:

(Thanks to G. Grothendieck for implementing it.)

ValCol <- tail(names(wide), -2)

s <- sprintf("Select Code, Country, '%s' As Year, `%s` As Value from wide", ValCol, ValCol)
mquery <- paste(s, collapse = "\n Union All\n")

cat(mquery) #just to show the query
 #> Select Code, Country, '1950' As Year, `1950` As Value from wide
 #>  Union All
 #> Select Code, Country, '1951' As Year, `1951` As Value from wide
 #>  Union All
 #> Select Code, Country, '1952' As Year, `1952` As Value from wide
 #>  Union All
 #> Select Code, Country, '1953' As Year, `1953` As Value from wide
 #>  Union All
 #> Select Code, Country, '1954' As Year, `1954` As Value from wide

sqldf(mquery)

 #>    Code     Country Year  Value
 #> 1   AFG Afghanistan 1950 20,249
 #> 2   ALB     Albania 1950  8,097
 #> 3   AFG Afghanistan 1951 21,352
 #> 4   ALB     Albania 1951  8,986
 #> 5   AFG Afghanistan 1952 22,532
 #> 6   ALB     Albania 1952 10,058
 #> 7   AFG Afghanistan 1953 23,557
 #> 8   ALB     Albania 1953 11,123
 #> 9   AFG Afghanistan 1954 24,555
 #> 10  ALB     Albania 1954 12,246

Unfortunately, I don't think that PIVOT and UNPIVOT would work for R SQLite. If you want to write up your query in a more sophisticated manner, you can also take a look at these posts:

别靠近我心 2024-08-26 15:52:23

You can also use the cdata package, which uses the concept of (transformation) control table:

# data
wide <- read.table(text="Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246", header=TRUE, check.names=FALSE)

library(cdata)
# build control table
drec <- data.frame(
    Year=as.character(1950:1954),
    Value=as.character(1950:1954),
    stringsAsFactors=FALSE
)
drec <- cdata::rowrecs_to_blocks_spec(drec, recordKeys=c("Code", "Country"))

# apply control table
cdata::layout_by(drec, wide)

I am currently exploring that package and find it quite accessible. It is designed for much more complicated transformations and includes the backtransformation. There is a tutorial available.

反目相谮 2024-08-26 15:52:23

Here are two options in base R (use x=unlist(df) instead of x=c(m) when the input is a dataframe and not a matrix):

> m=matrix(sample(1:100,6),3,dimnames=list(2021:2023,c("male","female")))
> m
     male female
2021   89     42
2022   39     96
2023   26     40
> cbind(expand.grid(dimnames(m)),x=c(m))
  Var1   Var2  x
1 2021   male 89
2 2022   male 39
3 2023   male 26
4 2021 female 42
5 2022 female 96
6 2023 female 40
> data.frame(row=rownames(m),col=colnames(m)[col(m)],x=c(m))
   row    col  x
1 2021   male 89
2 2022   male 39
3 2023   male 26
4 2021 female 42
5 2022 female 96
6 2023 female 40
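
For the dataframe case mentioned in the parenthetical, a minimal sketch could look like this. The dataframe d is hard-coded with the same numbers as the matrix above so the result is reproducible, and the explicit rep() calls are a choice made for clarity.

```r
# A dataframe with the same content as the matrix m shown above
d <- data.frame(male = c(89, 39, 26), female = c(42, 96, 40),
                row.names = c("2021", "2022", "2023"))

# unlist() flattens a dataframe column by column, like c(m) does for a matrix
long <- data.frame(row = rep(rownames(d), times = ncol(d)),
                   col = rep(names(d), each = nrow(d)),
                   x   = unlist(d, use.names = FALSE))
long
```

Keep in mind that unlist() coerces all columns to a common type, so this only makes sense when the value columns share one type.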

A third option is to use as.table followed by as.data.frame, but it converts the row and column names to factors, and if your input is a dataframe then you have to convert it to a matrix first:

> as.data.frame(as.table(m))
  Var1   Var2 Freq
1 2021   male   89
2 2022   male   39
3 2023   male   26
4 2021 female   42
5 2022 female   96
6 2023 female   40
> as.data.frame(as.table(m))|>sapply(class)
     Var1      Var2      Freq
 "factor"  "factor" "integer"
> d=as.data.frame(m)
> as.data.frame(as.table(d))
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 'as.data.frame': cannot coerce to a table
> as.data.frame(as.table(as.matrix(d)))
  Var1   Var2 Freq
1 2021   male   89
2 2022   male   39
3 2023   male   26
4 2021 female   42
5 2022 female   96
6 2023 female   40

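A fourth option is to use stack, but it converts the row names and column names to factors, and the column names get converted to an Rle factor when the input is a matrix (but not when the input is a dataframe). The DataFrame and Rle classes in the output below belong to the Bioconductor S4Vectors package, so this call presumably relies on that package's stack method being loaded rather than on base R alone: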

> stack(m)
DataFrame with 6 rows and 3 columns
       row    col     value
  <factor>  <Rle> <integer>
1     2021   male        89
2     2022   male        39
3     2023   male        26
4     2021 female        42
5     2022 female        96
6     2023 female        40

When the input for stack is a dataframe, the rownames don't get included as a column so you have to cbind them:

> d=as.data.frame(m);cbind(row=rownames(d),stack(d))
   row values    ind
1 2021     89   male
2 2022     39   male
3 2023     26   male
4 2021     42 female
5 2022     96 female
6 2023     40 female

汹涌人海 2024-08-26 15:52:23

Using collapse::pivot

collapse::pivot(wide, ids = c("Code", "Country"))

#   Code     Country variable  value
#1   AFG Afghanistan     1950 20,249
#2   ALB     Albania     1950  8,097
#3   AFG Afghanistan     1951 21,352
#4   ALB     Albania     1951  8,986
#5   AFG Afghanistan     1952 22,532
#6   ALB     Albania     1952 10,058
#7   AFG Afghanistan     1953 23,557
#8   ALB     Albania     1953 11,123
#9   AFG Afghanistan     1954 24,555
#10  ALB     Albania     1954 12,246

data

wide <- structure(list(Code = c("AFG", "ALB"), Country = c("Afghanistan", 
"Albania"), `1950` = c("20,249", "8,097"), `1951` = c("21,352", 
"8,986"), `1952` = c("22,532", "10,058"), `1953` = c("23,557", "11,123"), 
`1954` = c("24,555", "12,246")), class = "data.frame", row.names = c(NA, -2L))
