Creating a large data frame

Posted on 2024-12-01 10:23:56

Let's say that I want to generate a large data frame from scratch.

I would normally create data frames with the data.frame function.
However, calls like the following are extremely error-prone and tedious to write.

So is there a more efficient way of creating the following data frame?

df <- data.frame(GOOGLE_CAMPAIGN=c(rep("Google - Medicare - US", 928), rep("MedicareBranded", 2983),
                                   rep("Medigap", 805), rep("Medigap Branded", 1914),
                                   rep("Medicare Typos", 1353), rep("Medigap Typos", 635),
                                   rep("Phone - MedicareGeneral", 585),
                                   rep("Phone - MedicareBranded", 2967),
                                   rep("Phone-Medigap", 812),
                                   rep("Auto Broad Match", 27),
                                   rep("Auto Exact Match", 80),
                                   rep("Auto Exact Match", 875)),                   
                 GOOGLE_AD_GROUP=c(rep("Medicare", 928), rep("MedicareBranded", 2983),
                                   rep("Medigap", 805), rep("Medigap Branded", 1914),
                                   rep("Medicare Typos", 1353), rep("Medigap Typos", 635),
                                   rep("Phone ads 1-Medicare Terms",585),
                                   rep("Ad Group #1", 2967), rep("Medigap-phone", 812),
                                   rep("Auto Insurance", 27),
                                   rep("Auto General", 80),
                                   rep("Auto Brand", 875)))

Yikes, that is some 'bad' code. How can I generate this 'large' data frame in a more efficient manner?

御弟哥哥 2024-12-08 10:23:56

If your only source for that information is a piece of paper, then you probably won't get much better than that, but you can at least consolidate all that into a single rep call for each column:

# I'm going to cheat and not type out all those strings by hand
x <- unique(df[,1])
y <- unique(df[,2])

# Repeat counts for each unique campaign; "Auto Exact Match" spans two ad groups (80 + 875 = 955)
x1 <- c(928,2983,805,1914,1353,635,585,2967,812,27,955)
# Same counts for the ad groups, except the final 955 splits back into 80 and 875
y1 <- c(x1[-11],80,875)

dd <- data.frame(GOOGLE_CAMPAIGN = rep(x, times = x1),
                 GOOGLE_AD_GROUP = rep(y, times = y1))

which should be the same:

> all.equal(dd,df)
[1] TRUE

If this information already lives in some data structure in R and you just need to transform it, the job could be even easier, but we'd need to know what that structure is.
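As a purely hypothetical illustration of that last point, suppose the counts were already available as a small summary table with one row per campaign/ad-group pair. The table name `counts` and the column name `n` are assumptions made up for this sketch; the strings and numbers are copied from the question. Under that assumption, one rep call per column is enough and nothing depends on the original df:

```r
# Hypothetical summary table (names `counts` and `n` are assumptions, not from the question).
counts <- data.frame(
  GOOGLE_CAMPAIGN = c("Google - Medicare - US", "MedicareBranded", "Medigap",
                      "Medigap Branded", "Medicare Typos", "Medigap Typos",
                      "Phone - MedicareGeneral", "Phone - MedicareBranded",
                      "Phone-Medigap", "Auto Broad Match",
                      "Auto Exact Match", "Auto Exact Match"),
  GOOGLE_AD_GROUP = c("Medicare", "MedicareBranded", "Medigap",
                      "Medigap Branded", "Medicare Typos", "Medigap Typos",
                      "Phone ads 1-Medicare Terms", "Ad Group #1",
                      "Medigap-phone", "Auto Insurance",
                      "Auto General", "Auto Brand"),
  n = c(928, 2983, 805, 1914, 1353, 635, 585, 2967, 812, 27, 80, 875)
)

# Repeat each value as many times as its row's count says.
dd2 <- data.frame(
  GOOGLE_CAMPAIGN = rep(counts$GOOGLE_CAMPAIGN, times = counts$n),
  GOOGLE_AD_GROUP = rep(counts$GOOGLE_AD_GROUP, times = counts$n)
)

nrow(dd2) == sum(counts$n)
```

Keeping each name next to its count makes a mistyped number easier to spot than in a long chain of nested rep calls.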

公布 2024-12-08 10:23:56

Manually, (1) create this data frame:

> dfu <- unique(df)
> rownames(dfu) <- NULL
> dfu
           GOOGLE_CAMPAIGN            GOOGLE_AD_GROUP
1   Google - Medicare - US                   Medicare
2          MedicareBranded            MedicareBranded
3                  Medigap                    Medigap
4          Medigap Branded            Medigap Branded
5           Medicare Typos             Medicare Typos
6            Medigap Typos              Medigap Typos
7  Phone - MedicareGeneral Phone ads 1-Medicare Terms
8  Phone - MedicareBranded                Ad Group #1
9            Phone-Medigap              Medigap-phone
10        Auto Broad Match             Auto Insurance
11        Auto Exact Match               Auto General
12        Auto Exact Match                 Auto Brand

and (2) this vector of lengths:

> lens <- rle(as.numeric(interaction(df[[1]], df[[2]])))$lengths
> lens
 [1]  928 2983  805 1914 1353  635  585 2967  812   27   80  875

From these two inputs (dfu and lens) we can reconstruct df (here called df2):

> df2 <- dfu[rep(seq_along(lens), lens), ]
> rownames(df2) <- NULL
> identical(df, df2)
[1] TRUE
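The same compress-then-expand idea can be sketched as a pair of small helpers. This is only an illustration on toy data: the function names `compress_runs` and `expand_runs` and the toy frame are assumptions, not part of the answer above. Unlike unique(), this version keeps the first row of every run, so it also covers a combination that shows up in two separate runs. It assumes the input has at least one row.

```r
# Compress a data frame into the first row of each run plus the run lengths.
# Helper names are hypothetical; assumes nrow(d) >= 1.
compress_runs <- function(d) {
  key <- do.call(paste, c(d, sep = "\r"))          # one string per row
  r <- rle(key)                                    # runs of identical consecutive rows
  starts <- cumsum(c(1L, head(r$lengths, -1L)))    # index of the first row of each run
  rows <- d[starts, , drop = FALSE]
  rownames(rows) <- NULL
  list(rows = rows, lens = r$lengths)
}

# Rebuild the full data frame by repeating each stored row `lens` times.
expand_runs <- function(rows, lens) {
  out <- rows[rep(seq_len(nrow(rows)), times = lens), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy data: the pair ("x", "p") appears in two separate runs.
toy <- data.frame(a = rep(c("x", "y", "x"), times = c(3, 2, 4)),
                  b = rep(c("p", "q", "p"), times = c(3, 2, 4)))

z <- compress_runs(toy)
back <- expand_runs(z$rows, z$lens)
all.equal(toy, back)
```

Here `z$rows` plays the role of dfu and `z$lens` the role of lens.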

About the author: 野侃
