How do I split a data frame?
I want to break a single data frame into several smaller ones. This seems like a very trivial problem, but I cannot find a solution from a web search.
9 Answers
You may also want to cut the data frame into an arbitrary number of smaller data frames. Here, we cut into two data frames, as sketched below.
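A minimal sketch (the example data frame and the 13/13 split are illustrative):

```r
# Illustrative data frame
x <- data.frame(num = 1:26, let = letters)

# Cut into two smaller data frames by assigning each row to one of two groups
halves <- split(x, rep(1:2, each = 13))

halves[["1"]]  # first piece
halves[["2"]]  # second piece
```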
You can also split a data frame based upon an existing column. For example, to create three data frames based on the cyl column in mtcars:
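One possible version, using base R's split() (the original snippet was not preserved, so the exact code is an assumption):

```r
# One data frame per value of cyl (4, 6, 8)
by_cyl <- split(mtcars, mtcars$cyl)

names(by_cyl)   # "4" "6" "8"
by_cyl[["4"]]   # just the 4-cylinder cars
```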
If you want to split a data frame according to the values of some variable, I'd suggest using daply() from the plyr package (see the sketch below).

Now, x is an array of data frames. To access one of the data frames, you can index it with the name of the level of the splitting variable.

I'd make sure there aren't other, more clever ways to deal with your data before splitting it up into many data frames, though.
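A minimal sketch with plyr, assuming mtcars and cyl as the data and splitting variable; note it uses dlply(), which returns a list of per-group data frames:

```r
library(plyr)

# Split mtcars by cyl; dlply() returns one data frame per level of cyl
x <- dlply(mtcars, .(cyl), identity)

# Index a piece by the name of the level of the splitting variable
x[["4"]]
```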
You could also use plain logical indexing (sketched below). This makes a data frame containing only the rows where sum_points = 2500.
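A minimal sketch, assuming a data frame with a sum_points column (everything other than sum_points is made up for illustration):

```r
scores <- data.frame(player = c("a", "b", "c", "d"),
                     sum_points = c(1200, 2500, 2500, 900))

# Keep only the rows where sum_points equals 2500
scores[scores$sum_points == 2500, ]
```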
I just posted a kind of an RFC that might help you: Split a vector into chunks in R

Cheers,
Sebastian
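For reference, one common way to chunk a vector in base R (the chunk size is arbitrary):

```r
x <- 1:26
chunk_size <- 10

# Split the vector into chunks of at most chunk_size elements
split(x, ceiling(seq_along(x) / chunk_size))
```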
The answer you want depends very much on how and why you want to break up the data frame.

For example, if you want to leave out some variables, you can create new data frames from specific columns of the data frame. The subscripts in brackets after the data frame refer to row and column numbers; check out Spoetry for a complete description.

Or, you can choose specific rows.

And these subscripts can also be logical tests, such as choosing rows that contain a particular value, or factors with a desired value.

What do you want to do with the chunks you're left with? Do you need to perform the same operation on each chunk? Then you'll want to make sure the subsets of the data frame end up in a convenient object, such as a list, which will help you run the same command on each chunk. All of these are sketched below.
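A minimal sketch of these ideas using mtcars (the particular columns, rows, and test are illustrative):

```r
# Leave out some variables: keep only specific columns
cols_only <- mtcars[, c("mpg", "cyl")]

# Choose specific rows
rows_only <- mtcars[1:10, ]

# Subscripts can be logical tests, e.g. rows where a column has a desired value
eights <- mtcars[mtcars$cyl == 8, ]

# Put the chunks in a list so the same operation can be applied to each
chunks <- split(mtcars, mtcars$cyl)
lapply(chunks, summary)
```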
subset() is also useful (see the example below).

For survey data, maybe the survey package is pertinent? http://faculty.washington.edu/tlumley/survey/
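For instance (the condition and selected columns are illustrative):

```r
# Rows where cyl == 8, keeping only a few columns
subset(mtcars, cyl == 8, select = c(mpg, cyl, gear))
```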
If you want to split by values in one of the columns, you can use lapply. For instance, to split ChickWeight into a separate dataset for each chick:
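A minimal sketch of the lapply approach (ChickWeight and its Chick column ship with R; the list name per_chick is illustrative):

```r
# One data frame per chick, collected in a named list
chicks <- unique(ChickWeight$Chick)
per_chick <- lapply(chicks, function(ch) ChickWeight[ChickWeight$Chick == ch, ])
names(per_chick) <- as.character(chicks)

head(per_chick[["1"]])  # the records for chick 1
```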
Splitting the data frame seems counter-productive. Instead, use the split-apply-combine paradigm. For example, generate some data:
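For instance (the column names g and x and the sample size are assumptions):

```r
# Illustrative data: a grouping column g and a numeric column x
set.seed(123)
df <- data.frame(g = sample(letters[1:3], 100, replace = TRUE),
                 x = rnorm(100))
```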
then split only the relevant column, apply the scale() function to x in each group, and combine the results (using split<- or ave):
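A minimal sketch of that step, continuing with the df above:

```r
# Scale x within each group and combine the results, via ave()
df$x <- ave(df$x, df$g, FUN = function(v) as.numeric(scale(v)))

# The combine step can equally be written with split<-:
# split(df$x, df$g) <- lapply(split(df$x, df$g), scale)
```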
This will be very fast compared to splitting data frames, and the result remains usable in downstream analysis without iteration. I think the dplyr syntax is:
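A possible version, again assuming columns g and x:

```r
library(dplyr)

# Group-wise scaling of x, without splitting the data frame apart
df <- df %>%
  group_by(g) %>%
  mutate(x = as.numeric(scale(x))) %>%
  ungroup()
```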
In general this dplyr solution is faster than splitting data frames but not as fast as split-apply-combine.
If you want to split a data frame based on values in specific columns, the tidyverse now has a function called group_split that does this, and you can also split easily on multiple columns:
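A minimal sketch (cyl and gear come from the answer; group_split() lives in dplyr):

```r
library(dplyr)

# One data frame per (cyl, gear) combination present in mtcars
pieces <- mtcars %>%
  group_split(cyl, gear)

length(pieces)  # 8
```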
The above code gives you a list containing 8 data frames, each with a unique combination of cyl and gear.