How to reshape data from long to wide format
I'm having trouble rearranging the following data frame:
set.seed(45)
dat1 <- data.frame(
name = rep(c("firstName", "secondName"), each=4),
numbers = rep(1:4, 2),
value = rnorm(8)
)
dat1
name numbers value
1 firstName 1 0.3407997
2 firstName 2 -0.7033403
3 firstName 3 -0.3795377
4 firstName 4 -0.7460474
5 secondName 1 -0.8981073
6 secondName 2 -0.3347941
7 secondName 3 -0.5013782
8 secondName 4 -0.1745357
I want to reshape it so that each unique "name" variable is a rowname, with the "values" as observations along that row and the "numbers" as colnames. Sort of like this:
name 1 2 3 4
1 firstName 0.3407997 -0.7033403 -0.3795377 -0.7460474
5 secondName -0.8981073 -0.3347941 -0.5013782 -0.1745357
I've looked at melt and cast and a few other things, but none seem to do the job.
Comments (15)
Using the reshape function:
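The answer's own code did not survive on this page, so the following is only a minimal sketch of a base reshape() call on the question's data. The choice of idvar and timevar is an assumption read off the desired output, and the final renaming step is optional.

```r
# Rebuild the example data from the question.
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# idvar identifies one row of the wide table; timevar supplies the new columns.
wide <- reshape(dat1, idvar = "name", timevar = "numbers", direction = "wide")

# reshape() names the new columns value.1 ... value.4; drop the prefix
# to get the 1 ... 4 headers asked for in the question.
names(wide) <- sub("^value\\.", "", names(wide))
wide
```

The row names of the result are taken from the first row of each group, which is where the 1 and 5 in the question's desired output come from.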
The new (in 2014) tidyr package also does this simply, with gather()/spread() being the terms for melt/cast.
Edit: Now, in 2019, tidyr v1.0 has launched and set spread and gather on a deprecation path, preferring instead pivot_wider and pivot_longer, which you can find described in this answer. Read on if you want a brief glimpse into the brief life of spread/gather.
From github,
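For readers who meet spread() in older code, here is a minimal sketch of the call on the question's data. It is not the text quoted from GitHub, which is missing from this page. It assumes tidyr is installed; spread() still exists there but is superseded by pivot_wider().

```r
library(tidyr)

set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# key: the column whose values become the new column names.
# value: the column whose values fill the new cells.
wide <- spread(dat1, key = numbers, value = value)
wide
```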
You can do this with the reshape() function, or with the melt()/cast() functions in the reshape package. For the second option, example code is:
Or using reshape2:
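A minimal sketch of the reshape2 variant, assuming the package is installed. The formula puts the row identifier on the left and the column that becomes the headers on the right. The melt() call at the end is an added illustration of the reverse direction, not part of the original answer.

```r
library(reshape2)

set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# Long -> wide: one row per name, one column per value of numbers.
wide <- dcast(dat1, name ~ numbers, value.var = "value")
wide

# Wide -> long again.
long <- melt(wide, id.vars = "name",
             variable.name = "numbers", value.name = "value")
```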
Another option if performance is a concern is to use data.table's extension of reshape2's melt & dcast functions (Reference: Efficient reshaping using data.tables):
And, as of data.table v1.9.6, we can cast on multiple columns:
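Both casts can be sketched as follows, assuming data.table is installed. The second measure column value2 is hypothetical and exists only to show a cast over several value columns.

```r
library(data.table)

set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# Work on a data.table copy so the original data frame is left untouched.
dt <- as.data.table(dat1)

# Single value column: long -> wide.
wide <- dcast(dt, name ~ numbers, value.var = "value")

# Hypothetical second measure, added only for illustration.
dt[, value2 := value * 2]

# Several value columns at once (data.table >= 1.9.6);
# the result has columns value_1 ... value_4 and value2_1 ... value2_4.
wide2 <- dcast(dt, name ~ numbers, value.var = c("value", "value2"))
wide2
```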
With tidyr, there is pivot_wider() and pivot_longer(), which are generalized to do reshaping from long -> wide or wide -> long, respectively. Using the OP's data:
single column long -> wide
multiple columns long -> wide
pivot_wider() is also capable of more complex pivot operations. For example, you can pivot multiple columns simultaneously:
There is much more functionality to be found in the docs.
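The single-column and multi-column cases above can be sketched like this, assuming tidyr 1.0 or later. The column value2 is a hypothetical second measure introduced only for the multi-column example.

```r
library(tidyr)

set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# Single column, long -> wide.
wide <- pivot_wider(dat1, names_from = numbers, values_from = value)

# And back, wide -> long.
long <- pivot_longer(wide, cols = -name,
                     names_to = "numbers", values_to = "value")

# Multiple columns, long -> wide (value2 is made up for illustration);
# the new columns are named value_1 ... value_4 and value2_1 ... value2_4.
dat2 <- dat1
dat2$value2 <- dat2$value * 2
wide2 <- pivot_wider(dat2, names_from = numbers,
                     values_from = c(value, value2))
wide2
```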
Using your example dataframe, we could:
Other two options:
Base package:
sqldf package:
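The code for these two options is missing from this page. The sketch below shows one plausible way to do each and should not be read as the answerer's own code: unstack() for the base package, and a CASE WHEN query for sqldf. The column aliases x1 to x4 are arbitrary, and the sqldf part assumes that package is installed.

```r
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# Base R: unstack() splits value by numbers into columns X1 ... X4.
# This relies on the rows being ordered by name within each numbers group.
wide_base <- unstack(dat1, form = value ~ numbers)
rownames(wide_base) <- unique(dat1$name)
wide_base

# sqldf: one CASE WHEN per target column, collapsed to one row per name.
library(sqldf)
wide_sql <- sqldf("
  SELECT name,
         MAX(CASE WHEN numbers = 1 THEN value END) AS x1,
         MAX(CASE WHEN numbers = 2 THEN value END) AS x2,
         MAX(CASE WHEN numbers = 3 THEN value END) AS x3,
         MAX(CASE WHEN numbers = 4 THEN value END) AS x4
  FROM dat1
  GROUP BY name
")
wide_sql
```

The SQL version needs one line per output column, so it suits cases where the set of columns is small and known in advance.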
Using the base R aggregate function:
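A minimal sketch of the aggregate() idea follows. It assumes a pass-through function as FUN; the exact call used in the original answer is not preserved here. Because every group has the same length, aggregate() binds the per-group vectors into a matrix column.

```r
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# FUN returns each group's values unchanged, so `value` becomes a
# 2 x 4 matrix column (one row per name). Column order follows the
# row order within each group, so the data should be sorted by numbers.
wide <- aggregate(value ~ name, data = dat1, FUN = function(v) v)

# Flatten the matrix column into ordinary data frame columns.
wide_df <- do.call(data.frame, wide)
wide_df
```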
The base reshape function works perfectly fine:
Where
idvar is the column of classes that separates rows
timevar is the column of classes to cast wide
v.names is the column containing numeric values
direction specifies wide or long format
sep is the separator used between timevar class names and v.names in the output data.frame.
If no idvar exists, create one before using the reshape() function:
Just remember that idvar is required! The timevar and v.names part is easy. The output of this function is more predictable than some of the others, as everything is explicitly defined.
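The arguments described above can be put together as in the sketch below. The second half illustrates the "no idvar" case with a hypothetical data frame dat2 that lacks the name column; the ave() construction of the id is an assumption, not the answer's original code.

```r
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# Every argument spelled out; sep = "_" yields value_1 ... value_4.
wide <- reshape(dat1,
                idvar = "name",
                timevar = "numbers",
                v.names = "value",
                direction = "wide",
                sep = "_")

# Hypothetical case without an id column: number the occurrences of
# each timevar level and use that counter as the idvar.
dat2 <- dat1[, c("numbers", "value")]
dat2$id <- ave(dat2$numbers, dat2$numbers, FUN = seq_along)
wide2 <- reshape(dat2,
                 idvar = "id",
                 timevar = "numbers",
                 v.names = "value",
                 direction = "wide",
                 sep = "_")
wide2
```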
There's a very powerful new package from genius data scientists at Win-Vector (folks that made vtreat, seplyr and replyr) called cdata. It implements the "coordinated data" principles described in this document and also in this blog post. The idea is that regardless of how you organize your data, it should be possible to identify individual data points using a system of "data coordinates". Here's an excerpt from the recent blog post by John Mount:
We will first build the control table (see blog post for details) and then perform the move of data from rows to columns.
Much easier way!
If you want to go back from wide to long, only change Wide to Long, and make no changes to the objects.
This works even if you have missing pairs and it doesn't require sorting (as.matrix(dat1)[,1:2] can be replaced with cbind(dat1[,1], dat1[,2])):
This doesn't work if you have missing pairs and it requires sorting, but it's a bit shorter in case the pairs are already sorted:
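The two variants described above can be sketched as follows. The shuffled, incomplete data frame d is made up to show the missing-pair case, and the explicit as.character() calls are a defensive choice of this sketch, so that the index matrix matches the dimnames exactly.

```r
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# Variant 1: index an empty matrix by (row name, column name) pairs.
# d is unsorted and lacks the pair (secondName, 3).
d <- dat1[c(8, 3, 1, 6, 2, 5, 4), ]
u1 <- unique(d$name)
u2 <- as.character(sort(unique(d$numbers)))
m <- matrix(NA_real_, nrow = length(u1), ncol = length(u2),
            dimnames = list(u1, u2))
# A two-column character matrix indexes a matrix by its dimnames.
m[cbind(as.character(d$name), as.character(d$numbers))] <- d$value
m  # the missing pair stays NA

# Variant 2: shorter, but only valid for complete, sorted data.
m2 <- matrix(dat1$value, nrow = length(unique(dat1$name)), byrow = TRUE,
             dimnames = list(unique(dat1$name),
                             as.character(unique(dat1$numbers))))
m2
```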
Here's a function version of the first approach (add as.data.frame to make it work with tibbles):
Or if you only have the values of the lower triangle, you can do this:
Or here's another approach:
Another simple method in base R is to use xtabs. The result of xtabs is basically just a matrix with a fancy class name, but you can make it look like a regular matrix with class(x)=NULL;attr(x,"call")=NULL;dimnames(x)=unname(dimnames(x)):
Normally as.data.frame(x) converts the result of xtabs back to long format, but you can avoid it with class(x)=NULL:
This converts data in wide format to long format (unlist converts a data frame to a vector and c converts a matrix to a vector):
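A minimal sketch of the xtabs route and of the wide-to-long conversion just described. The way the long data frame is reassembled with rep() is an assumption of this sketch. Note that xtabs sums the values in each cell, so a missing pair would show up as 0 rather than NA.

```r
set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# Long -> wide: cross-tabulate value by name and numbers.
x <- xtabs(value ~ name + numbers, data = dat1)

# Strip the xtabs/table class so x behaves like a plain matrix.
class(x) <- NULL
attr(x, "call") <- NULL
dimnames(x) <- unname(dimnames(x))

# With the class removed, as.data.frame() keeps the wide layout.
wide_df <- as.data.frame(x)

# Wide -> long: c() flattens a matrix column by column, and
# unlist() does the same for a data frame.
long <- data.frame(
  name = rep(rownames(x), times = ncol(x)),
  numbers = rep(colnames(x), each = nrow(x)),
  value = c(x)
)
long
```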
Came here via a linked question, Reshape three column data frame to matrix ("long" to "wide" format). That question is closed, so I am writing an alternative solution here.
I found an alternative solution, perhaps useful for someone looking to convert three columns to a matrix. I am referring to the decoupleR (2.3.2) package. Below is copied from their site:
Generates a kind of table where the rows come from id_cols, the columns from names_from and the values from values_from.
Usage
I have recently started looking into the collapse package, which is super fast and useful. In collapse we can use the pivot function to do this transformation.
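A minimal sketch of that call, assuming a recent collapse release in which pivot() is available. The argument mapping (ids, names, values) is read from the package documentation rather than from the original answer, whose code is missing here.

```r
library(collapse)

set.seed(45)
dat1 <- data.frame(
  name = rep(c("firstName", "secondName"), each = 4),
  numbers = rep(1:4, 2),
  value = rnorm(8)
)

# ids: columns that identify a row of the wide result.
# names: column whose values become the new column names.
# values: column whose values fill the new cells.
wide <- pivot(dat1, ids = "name", names = "numbers", values = "value",
              how = "wider")
wide
```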
Using only dplyr and map.