r dataframe split r-faq

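How do I split a data frame?

Posted on 2024-09-10 12:16:40, 60 words, 12 views, 0 comments

I want to split a data frame into several smaller ones. This looks like a very trivial question, but I could not find a solution through web searches.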

樱花坊 2024-09-17 12:16:40

You may also want to cut the data frame into an arbitrary number of smaller data frames. Here, we cut into two data frames.

x = data.frame(num = 1:26, let = letters, LET = LETTERS)
set.seed(10)
split(x, sample(rep(1:2, 13)))

gives

$`1`
   num let LET
3    3   c   C
6    6   f   F
10  10   j   J
12  12   l   L
14  14   n   N
15  15   o   O
17  17   q   Q
18  18   r   R
20  20   t   T
21  21   u   U
22  22   v   V
23  23   w   W
26  26   z   Z

$`2`
   num let LET
1    1   a   A
2    2   b   B
4    4   d   D
5    5   e   E
7    7   g   G
8    8   h   H
9    9   i   I
11  11   k   K
13  13   m   M
16  16   p   P
19  19   s   S
24  24   x   X
25  25   y   Y

You can also split a data frame based upon an existing column. For example, to create three data frames based on the cyl column in mtcars:

split(mtcars,mtcars$cyl)
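
As a rough sketch of what split() hands back here (the names by_cyl and restored are only illustrative), the result is a named list with one data frame per value of cyl, and unsplit() can put the pieces back together:

```r
# Split the built-in mtcars data by number of cylinders.
by_cyl <- split(mtcars, mtcars$cyl)

names(by_cyl)          # one list element per distinct cyl value: "4" "6" "8"
sapply(by_cyl, nrow)   # number of rows in each piece
head(by_cyl[["4"]])    # pull out a single piece by name

# unsplit() reverses the operation using the same grouping vector.
restored <- unsplit(by_cyl, mtcars$cyl)
all.equal(restored, mtcars)
```

Keeping the grouping vector around is what makes the round trip possible, since unsplit() needs it to restore the original row order.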

美煞众生 2024-09-17 12:16:40

If you want to split a data frame according to the values of some variable, I'd suggest using dlply() from the plyr package, which returns a list of data frames. (daply() is meant for results that fit into an array.)

library(plyr)
# df and splitting_variable are placeholders for your own data frame and column
x <- dlply(df, .(splitting_variable), function(piece) piece)

Now, x is a list of data frames. To access one of them, index it with the name of the corresponding level of the splitting variable.

x$Level1
# or
x[["Level1"]]

I'd make sure there isn't a cleverer way to deal with your data before splitting it into many data frames, though.

源来凯始玺欢你 2024-09-17 12:16:40

You could also use

data2 <- data[data$sum_points == 2500, ]

This creates a data frame containing only the rows where sum_points equals 2500.

For example, starting from data such as

    airfoils sum_points field_points   init_t contour_t   field_t
...
491        5       2500         5625 0.000086  0.004272  6.321774
498        5       2500         5625 0.000087  0.004507  6.325083
504        5       2500         5625 0.000088  0.004370  6.336034
603        5        250        10000 0.000072  0.000525  1.111278
577        5        250        10000 0.000104  0.000559  1.111431
587        5        250        10000 0.000072  0.000528  1.111524
606        5        250        10000 0.000079  0.000538  1.111685
....

it gives:

> data2 <- data[data$sum_points == 2500, ]
> data2
    airfoils sum_points field_points   init_t contour_t   field_t
108        5       2500          625 0.000082  0.004329  0.733109
106        5       2500          625 0.000102  0.004564  0.733243
117        5       2500          625 0.000087  0.004321  0.733274
112        5       2500          625 0.000081  0.004428  0.733587
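
One caveat with logical subscripts is worth a small sketch: if the tested column contains NA, the comparison yields NA and base R returns an all-NA row for it. The toy data frame below and its values are made up for illustration; wrapping the test in which() drops those rows.

```r
# Toy data (hypothetical values) with a missing entry in the tested column.
d <- data.frame(id = 1:4, sum_points = c(2500, 250, NA, 2500))

with_na    <- d[d$sum_points == 2500, ]          # keeps an all-NA row for the NA
without_na <- d[which(d$sum_points == 2500), ]   # which() ignores NA comparisons

nrow(with_na)      # 3
nrow(without_na)   # 2
```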

网白 2024-09-17 12:16:40

I just posted a kind of RFC that might help you: Split a vector into chunks in R

x = data.frame(num = 1:26, let = letters, LET = LETTERS)
## number of chunks
n <- 2
## sort() groups the remainders, so each chunk is a block of consecutive rows
dfchunk <- split(x, factor(sort(rank(row.names(x))%%n)))
dfchunk

$`0`
   num let LET
1    1   a   A
2    2   b   B
3    3   c   C
4    4   d   D
5    5   e   E
6    6   f   F
7    7   g   G
8    8   h   H
9    9   i   I
10  10   j   J
11  11   k   K
12  12   l   L
13  13   m   M

$`1`
   num let LET
14  14   n   N
15  15   o   O
16  16   p   P
17  17   q   Q
18  18   r   R
19  19   s   S
20  20   t   T
21  21   u   U
22  22   v   V
23  23   w   W
24  24   x   X
25  25   y   Y
26  26   z   Z

Cheers,
Sebastian
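
If the goal is consecutive chunks of a fixed size rather than a fixed number of chunks, one possible variation (a sketch, with chunk_size chosen arbitrarily here) builds the grouping vector from the row positions:

```r
x <- data.frame(num = 1:26, let = letters, LET = LETTERS)

chunk_size <- 10                                   # arbitrary, for illustration
group  <- ceiling(seq_len(nrow(x)) / chunk_size)   # 1,1,...,2,2,...,3,...
chunks <- split(x, group)

sapply(chunks, nrow)   # chunk sizes: 10, 10, 6
```

The last chunk simply holds whatever rows are left over when the row count is not a multiple of chunk_size.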

反话 2024-09-17 12:16:40

The answer you want depends very much on how and why you want to break up the data frame.

For example, if you want to leave out some variables, you can create a new data frame from specific columns of the original one. The subscripts in brackets after the data frame refer to rows and columns, in that order. Check out S Poetry for a complete description.

newdf <- mydf[,1:3]

Or, you can choose specific rows.

newdf <- mydf[1:3,]

These subscripts can also be logical tests, for example to choose the rows that contain a particular value or that have a desired factor level.

What do you want to do with the chunks left over? Do you need to perform the same operation on each chunk of the data frame? If so, make sure the subsets end up in a convenient object, such as a list, which makes it easy to run the same command on each chunk.
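
To make that last point concrete, here is a minimal sketch using the built-in mtcars data (not the poster's own data): the pieces are kept in a list, and lapply() runs the same command on each one.

```r
# Keep the pieces in a list, one data frame per number of gears.
pieces <- split(mtcars, mtcars$gear)

# Run the same command on every piece; here, a linear fit of mpg on wt.
fits <- lapply(pieces, function(d) coef(lm(mpg ~ wt, data = d)))

fits[["4"]]   # intercept and slope for the 4-gear cars
```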

一页 2024-09-17 12:16:40

subset() is also useful:

subset(DATAFRAME, COLUMNNAME == "")

If you are working with survey data, the survey package may also be pertinent:

http://faculty.washington.edu/tlumley/survey/
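
As a small illustration of subset() on the built-in mtcars data (the variable name small is arbitrary), the second argument filters rows and select picks columns. Rows where the condition evaluates to NA are treated as not matching.

```r
# Rows: 4-cylinder cars with mpg above 30; columns: only mpg, cyl and wt.
small <- subset(mtcars, cyl == 4 & mpg > 30, select = c(mpg, cyl, wt))
small
```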

悍妇囚夫 2024-09-17 12:16:40

If you want to split by values in one of the columns, you can use lapply. For instance, to split ChickWeight into a separate dataset for each chick:

data(ChickWeight)
lapply(unique(ChickWeight$Chick), function(x) ChickWeight[ChickWeight$Chick == x,])
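
Note that the list returned by lapply() above carries no names, so it is not obvious which element belongs to which chick. One way to keep the labels, sketched here with illustrative variable names, is to attach them with setNames():

```r
# Chick identifiers as plain character strings.
ids <- as.character(unique(ChickWeight$Chick))

# Same idea as above, but comparing on character values.
by_chick <- lapply(ids, function(id) {
  ChickWeight[as.character(ChickWeight$Chick) == id, ]
})
by_chick <- setNames(by_chick, ids)   # label each piece with its chick id

length(by_chick)        # one element per chick
head(by_chick[["1"]])   # rows for the chick labelled "1"
```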

久随 2024-09-17 12:16:40

Splitting the data frame seems counter-productive. Instead, use the split-apply-combine paradigm, e.g., generate some data

df = data.frame(grp=sample(letters, 100, TRUE), x=rnorm(100))

then split only the relevant columns and apply the scale() function to x in each group, and combine the results (using split<- or ave)

df$z = 0
split(df$z, df$grp) = lapply(split(df$x, df$grp), scale)
## alternative: df$z = ave(df$x, df$grp, FUN=scale)

This will be very fast compared to splitting data.frames, and the result remains usable in downstream analysis without iteration. I think the dplyr syntax is

library(dplyr)
df %>% group_by(grp) %>% mutate(z=scale(x))

In general this dplyr solution is faster than splitting data frames but not as fast as split-apply-combine.
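
A small self-contained check of the two base R routes, under the assumption of a tiny made-up data set with three groups of four rows each: both should produce the same within-group z-scores. The helper standardise is hypothetical and wraps scale() in as.vector(), because scale() returns a one-column matrix.

```r
set.seed(1)
df <- data.frame(grp = rep(c("a", "b", "c"), each = 4), x = rnorm(12))

# Standardise within a group; as.vector() drops the matrix that scale() returns.
standardise <- function(v) as.vector(scale(v))

# Route 1: split only the column, apply, and assign back with split<-.
df$z <- 0
split(df$z, df$grp) <- lapply(split(df$x, df$grp), standardise)

# Route 2: ave() does the same split, apply and recombine in one call.
df$z_ave <- ave(df$x, df$grp, FUN = standardise)

all.equal(df$z, df$z_ave)
tapply(df$z, df$grp, mean)   # each group mean is numerically zero
```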

也只是曾经 2024-09-17 12:16:40

If you want to split a data frame by the values in specific columns, dplyr (part of the tidyverse) now has a function called group_split() that does this, and it also makes splitting on several columns easy:

library(tidyverse)

cars <- mtcars %>% 
  group_by(cyl, gear)

cars_split <- group_split(cars)

The above code will give you a list containing 8 dataframes, each with a unique combination of cyl and gear.
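
For comparison, a rough base R counterpart (a sketch, not part of the tidyverse approach above) passes a list of grouping columns to split(). Unlike group_split(), the result is named, and drop = TRUE is needed to discard combinations that never occur in the data. The name cars_split_base is only illustrative.

```r
# Split on every observed combination of cyl and gear.
cars_split_base <- split(mtcars, list(mtcars$cyl, mtcars$gear), drop = TRUE)

length(cars_split_base)   # 8 combinations actually present in the data
names(cars_split_base)    # labels of the form "cyl.gear", e.g. "4.3"
```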

About the author: 浪漫之都
