Reshaping data.frame from wide to long format
I am having trouble converting my data.frame from a wide table to a long table. At the moment it looks like this:

Code Country     1950   1951   1952   1953   1954
AFG  Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB  Albania     8,097  8,986  10,058 11,123 12,246

Now I would like to transform this data.frame into a long data.frame, something like this:

Code Country     Year Value
AFG  Afghanistan 1950 20,249
AFG  Afghanistan 1951 21,352
AFG  Afghanistan 1952 22,532
AFG  Afghanistan 1953 23,557
AFG  Afghanistan 1954 24,555
ALB  Albania     1950 8,097
ALB  Albania     1951 8,986
ALB  Albania     1952 10,058
ALB  Albania     1953 11,123
ALB  Albania     1954 12,246

I have looked at and already tried the melt() and reshape() functions, as some people suggested in similar questions. However, so far I only get messy results.
If possible, I would like to do it with the reshape() function, since it looks a little nicer to handle.
Two alternative solutions:
1) With data.table:
You can use the melt function to get the long format. Alternative notations for selecting the columns are also possible.
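The code for this answer did not survive in this copy, so the following is only a sketch of how melt from data.table could be applied to the question's data. The object names wide, dt and long, and the choice of "Year" and "Value" as output column names, are assumptions made for illustration.

```r
library(data.table)

# Sample data from the question; check.names = FALSE keeps "1950" instead of "X1950"
wide <- read.table(text = "
Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246
", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

dt <- as.data.table(wide)

# Keep Code and Country as id columns; every other column is stacked
long <- melt(dt, id.vars = c("Code", "Country"),
             variable.name = "Year", value.name = "Value")

# One alternative notation: give the id columns by position
long2 <- melt(dt, id.vars = 1:2,
              variable.name = "Year", value.name = "Value")
```

Note that Year comes back as a factor and Value is still character here, because of the commas in the numbers.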
2) With tidyr:
Use pivot_longer().
Note:
names_to and values_to default to "name" and "value", respectively, so you could write this extra-succinctly as wide %>% pivot_longer(`1950`:`1954`).
The cols argument uses the highly flexible tidyselect DSL, so you can select the same columns using a negative selection (!c(Code, Country)), a selection helper (starts_with("19"); matches("^\\d{4}$")), numeric indices (3:7), and more.
tidyr::pivot_longer() is the successor to tidyr::gather() and reshape2::melt(), which are no longer under development.
Transforming values
Another problem with the data is that the values will be read by R as character values (because of the commas in the numbers). You can repair this with gsub and as.numeric, either before reshaping or during reshaping with data.table or tidyr.
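As a rough sketch of the tidyr route combined with the "before reshaping" repair described above (the original code is not preserved here, so the object names wide, year_cols and long and the output column names "Year" and "Value" are assumptions):

```r
library(tidyr)

# Sample data from the question; the comma-formatted numbers are read as character
wide <- read.table(text = "
Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246
", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Repair the values before reshaping: drop the commas, then convert to numeric
year_cols <- names(wide)[3:7]
wide[year_cols] <- lapply(wide[year_cols], function(x) as.numeric(gsub(",", "", x)))

# Stack the year columns into a Year/Value pair
long <- pivot_longer(wide, cols = `1950`:`1954`,
                     names_to = "Year", values_to = "Value")

# Column names arrive as text, so convert Year separately
long$Year <- as.integer(long$Year)
```

pivot_longer() keeps the rows grouped by the original row, so all Afghanistan years come first, followed by all Albania years.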
reshape() takes a while to get used to, just like melt/cast. A solution with reshape() is possible, assuming your data frame is called d.
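Since the question asks for reshape() specifically, here is a minimal sketch using base R only. The data frame is called d as in the answer; the column positions 3:7 and the names "Year" and "Value" are taken from the question's example and would need adjusting for other data.

```r
# Sample data from the question; check.names = FALSE keeps the year column names intact
d <- read.table(text = "
Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246
", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

long <- reshape(d,
                direction = "long",
                varying   = list(names(d)[3:7]),  # the columns to stack
                v.names   = "Value",              # name of the stacked column
                idvar     = c("Code", "Country"), # columns identifying a row
                timevar   = "Year",               # name of the new key column
                times     = 1950:1954)            # values for the key column
```

The result is ordered by year rather than by country, and Value is still character because of the commas.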
With tidyr_1.0.0, another option is pivot_longer.
Using the reshape package.
Since this question is tagged with r-faq, I felt it would be useful to share another alternative from base R: stack.
Note, however, that stack does not work with factors. It only works if is.vector is TRUE, and from the documentation for is.vector, we find that it returns TRUE only if x is a vector of the specified mode having no attributes other than names. A factor carries levels and class attributes, so it does not qualify.
I'm using the sample data from @Jaap's answer, where the values in the year columns are factors.
The stack approach is therefore to convert the year columns to character, stack them, and cbind the result to the id columns.
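A minimal sketch of the stack approach follows. The original code is missing from this copy, so the details are assumptions: the sample data is rebuilt from the question (where the year columns are character rather than factor), and the names years, long, "Value" and "Year" are illustrative.

```r
# Sample data from the question
wide <- read.table(text = "
Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246
", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Year columns only; as.character() is a no-op here but is needed
# when the columns are factors, which stack() would otherwise ignore
years <- wide[-(1:2)]
years[] <- lapply(years, as.character)

# stack() returns two columns, "values" and "ind";
# cbind() recycles the two id rows to match the ten stacked rows
long <- cbind(wide[1:2], stack(years))
names(long)[3:4] <- c("Value", "Year")
```

The recycling in cbind() only lines up because stack() emits the columns one after another, each in the original row order.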
Here is another example showing the use of gather from tidyr. You can select the columns to gather either by removing them individually (as I do here), or by including the years you want explicitly.
Note that, to handle the commas (and the X's added if check.names = FALSE is not set), I am also using dplyr's mutate with parse_number from readr to convert the text values back to numbers. These are all part of the tidyverse and so can be loaded together with library(tidyverse).
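The description above could be sketched roughly as follows. The original code and output are not preserved in this copy, so the object names wide and long are assumptions, and the three packages are loaded individually instead of through library(tidyverse).

```r
library(tidyr)
library(dplyr)
library(readr)

# Sample data from the question
wide <- read.table(text = "
Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246
", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

long <- wide %>%
  # Gather everything except the two id columns into Year/Value pairs
  gather(Year, Value, -Code, -Country) %>%
  # parse_number() drops the grouping commas (and would drop a leading "X")
  mutate(Year = parse_number(Year),
         Value = parse_number(Value))
```

gather() is superseded by pivot_longer() in current tidyr, but it is still available.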
Here's a sqldf solution. To make the query without typing in everything, you can build it programmatically.
Thanks to G. Grothendieck for implementing it.
Unfortunately, I don't think that PIVOT and UNPIVOT work with R's SQLite. If you want to write up your query in a more sophisticated manner, you can also take a look at these posts:
Using sprintf writing up sql queries
Pass variables to sqldf
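The query itself is missing from this copy. One way such a query could be built without typing every year is sketched below; the UNION ALL shape, the use of sprintf, and the names years, selects and query are assumptions for illustration, not necessarily what the answer used.

```r
# Sample data from the question
wide <- read.table(text = "
Code Country 1950 1951 1952 1953 1954
AFG Afghanistan 20,249 21,352 22,532 23,557 24,555
ALB Albania 8,097 8,986 10,058 11,123 12,246
", header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

years <- names(wide)[3:7]

# One SELECT per year column; backticks quote the numeric column names for SQLite
selects <- sprintf("SELECT Code, Country, '%s' AS Year, `%s` AS Value FROM wide",
                   years, years)

# Glue the per-year SELECTs into a single query
query <- paste(selects, collapse = " UNION ALL ")

# Run it only if sqldf is installed
if (requireNamespace("sqldf", quietly = TRUE)) {
  long <- sqldf::sqldf(query)
}
```

Printing query with cat(query) before running it is an easy way to check the generated SQL.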
You can also use the cdata package, which uses the concept of a (transformation) control table.
I am currently exploring that package and find it quite accessible. It is designed for much more complicated transformations and includes the back-transformation. There is a tutorial available.
Here are two options in base R (use x=unlist(df) instead of x=c(m) when the input is a dataframe and not a matrix).
A third option is to use as.table followed by as.data.frame, but it converts the row and column names to factors, and if your input is a dataframe then you have to convert it to a matrix first.
A fourth option is to use stack, but it converts the rownames and column names to factors, and the column names get converted to an Rle factor when the input is a matrix (but not when the input is a dataframe).
When the input for stack is a dataframe, the rownames don't get included as a column, so you have to cbind them.
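The code for these options is not preserved here. The sketch below shows two plausible base R forms built around x=c(m), plus the as.table route; the small matrix m and the exact form of the first two options are assumptions made for illustration.

```r
# A small hypothetical matrix with row and column names
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

# Index the dimnames alongside the flattened values (column-major order)
long1 <- data.frame(row = rownames(m)[row(m)],
                    col = colnames(m)[col(m)],
                    x = c(m))

# Same idea, letting expand.grid() enumerate the name combinations
long2 <- data.frame(expand.grid(dimnames(m)), x = c(m))

# as.table() then as.data.frame(): names become factors, values land in Freq
long3 <- as.data.frame(as.table(m))
```

For a data frame input, c(m) would be replaced by unlist(df) in the first two forms, and as.matrix(df) would be needed before as.table().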
Using collapse::pivot.