How to add a column to a data frame in R

Posted 2024-10-09 10:06:35

I have imported data from a file into a data frame in R. It looks something like this:

Name      Count   Category
A         100     Cat1
C         10      Cat2
D         40      Cat1 
E         30      Cat3
H         3       Cat3
Z         20      Cat2
M         50      Cat10

So now I want to add the Category column based on the values in the Name column. For example, if Name is A or D, then Category = 'Cat1', and so on.

This is just a simple example. I have a large number of Names and Categories, so I want a compact syntax. How can I do this?

Edit: I've changed the example to better reflect my needs, since the names can be anything non-numeric. Sorry for not being clearer before.

Answers (5)

月亮坠入山谷 2024-10-16 10:06:35

You can use ifelse. If your data frame were called df you would do (this targets the original version of the question, where Name held numbers):

df$Category <- ifelse(df$Name < 100, "Ones", "Hundreds")
df$Category <- ifelse(df$Name < 1000, df$Category, "Thousands")
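
A minimal runnable sketch of the two-step ifelse() approach, assuming a hypothetical numeric Name column like the one in the pre-edit question. The second call only relabels values of 1000 or more, so the pair is equivalent to a single nested ifelse():

```r
# Hypothetical numeric data standing in for the pre-edit question
df <- data.frame(Name = c(5, 250, 4000))

# Step 1: below 100 -> "Ones", everything else -> "Hundreds"
df$Category <- ifelse(df$Name < 100, "Ones", "Hundreds")
# Step 2: keep the step-1 label below 1000, otherwise "Thousands"
df$Category <- ifelse(df$Name < 1000, df$Category, "Thousands")

# The same result as one nested call
nested <- ifelse(df$Name < 100, "Ones",
                 ifelse(df$Name < 1000, "Hundreds", "Thousands"))
print(df)
```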
救赎№ 2024-10-16 10:06:35

You can use a map, i.e. a named vector used as a lookup table. (UPDATED to use stringsAsFactors = FALSE.)

df <- data.frame( Name = c('A', 'C', 'D', 'E', 'H', 'Z', 'M'), 
                  Count = c(100,10,40,30,3,20,50), stringsAsFactors = FALSE)
Categories <- list(Cat1 = c('A','D'), 
                   Cat2 = c('C','Z'), 
                   Cat3 = c('E','H'), 
                   Cat10 = 'M')
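# Build a named lookup vector: values are categories, names are the Names
nams <- names( Categories )
nums <- sapply(Categories, length)
CatMap <- unlist( Map( rep, nams, nums ) )   # "Cat1" "Cat1" "Cat2" "Cat2" ...
names(CatMap) <- unlist( Categories )        # "A" "D" "C" "Z" ...

# Look up each Name in the map to get its category
df <- transform( df, Category = CatMap[ Name ])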
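
A sketch of a more compact way to build the same named lookup vector: lengths() (base R 3.2.0 and later) replaces the sapply()/Map() steps, and unname() drops the names that indexing carries over into the result. Names that are missing from Categories come back as NA; the extra name "Q" below is hypothetical, added only to show that case.

```r
df <- data.frame(Name = c('A', 'C', 'D', 'E', 'H', 'Z', 'M', 'Q'),
                 Count = c(100, 10, 40, 30, 3, 20, 50, 1),
                 stringsAsFactors = FALSE)
Categories <- list(Cat1 = c('A','D'),
                   Cat2 = c('C','Z'),
                   Cat3 = c('E','H'),
                   Cat10 = 'M')

# Values: each category name repeated once per member; names: the members
CatMap <- setNames(rep(names(Categories), lengths(Categories)),
                   unlist(Categories, use.names = FALSE))

# Index by Name; "Q" is not in the map, so it gets NA
df$Category <- unname(CatMap[df$Name])
print(df)
```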
墨洒年华 2024-10-16 10:06:35

[Updated following the OP's comment and the altered question]

DF <- data.frame(Name = c("A","C","D","E","H","Z","M"),
                 Count = c(100,10,40,30,3,20,50), stringsAsFactors = FALSE)
lookup <- data.frame(Name = c("A","C","D","E","H","Z","M"),
                     Category = paste("Cat", c(1,2,1,3,3,2,10), sep = ""),
                     stringsAsFactors = FALSE)

Using the data frames above, we can do a database-style merge (a join on Name). You need to set up lookup with the Name/Category combinations you want, which is fine unless you have a very large number of Names (at least you only need to list each Name once in lookup, and you don't have to list them in any particular order, such as all Cat1 Names first):

> merge(DF, lookup, by = "Name")
  Name Count Category
1    A   100     Cat1
2    C    10     Cat2
3    D    40     Cat1
4    E    30     Cat3
5    H     3     Cat3
6    M    50    Cat10
7    Z    20     Cat2
> merge(DF, lookup, by = "Name", sort = FALSE)
  Name Count Category
1    A   100     Cat1
2    C    10     Cat2
3    D    40     Cat1
4    E    30     Cat3
5    H     3     Cat3
6    Z    20     Cat2
7    M    50    Cat10
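
By default merge() performs an inner join, so any row of DF whose Name is missing from lookup is silently dropped. The sketch below adds a hypothetical unmatched name "Q" to show that case: all.x = TRUE (a left join) keeps the row with NA, and match() is an alternative that fills the column in place while preserving DF's original row order.

```r
DF <- data.frame(Name = c("A","C","D","E","H","Z","M","Q"),
                 Count = c(100,10,40,30,3,20,50,7), stringsAsFactors = FALSE)
lookup <- data.frame(Name = c("A","C","D","E","H","Z","M"),
                     Category = paste0("Cat", c(1,2,1,3,3,2,10)),
                     stringsAsFactors = FALSE)

# Inner join (the default): "Q" has no match, so its row is dropped
inner <- merge(DF, lookup, by = "Name")
# Left join: keep every row of DF; unmatched rows get NA in Category
left <- merge(DF, lookup, by = "Name", all.x = TRUE)

# match() gives each Name's position in lookup (NA if absent),
# so DF keeps its original row order
DF$Category <- lookup$Category[match(DF$Name, lookup$Name)]
print(DF)
```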

For the original, numeric version of the question, one option is indexing:

foo <- function(x) {
    out <- character(length = length(x))
    chars <- c("Ones", "Tens", "Hundreds", "Thousands")
    out[x < 10] <- chars[1]
    out[x >= 10 & x < 100] <- chars[2]
    out[x >= 100 & x < 1000] <- chars[3]
    out[x >= 1000 & x < 10000] <- chars[4]
    return(factor(out, levels = chars))
}

An alternative that scales better, because it derives the category from the number of digits, is:

bar <- function(x, cats = c("Ones", "Tens", "Hundreds", "Thousands")) {
    out <- cats[floor(log10(x)) + 1]
    factor(out, levels = cats)
}
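
A quick check of the two functions above, as a sketch that runs both on the same values. For x >= 1, floor(log10(x)) + 1 is the number of digits, so bar() picks the same label as foo(). One caveat: for 0 < x < 1 the index is 0, and R silently drops zero indices, so bar() returns a shorter vector than its input. bar_safe() is a hypothetical guarded variant that maps such values to NA instead.

```r
foo <- function(x) {
    out <- character(length = length(x))
    chars <- c("Ones", "Tens", "Hundreds", "Thousands")
    out[x < 10] <- chars[1]
    out[x >= 10 & x < 100] <- chars[2]
    out[x >= 100 & x < 1000] <- chars[3]
    out[x >= 1000 & x < 10000] <- chars[4]
    return(factor(out, levels = chars))
}

bar <- function(x, cats = c("Ones", "Tens", "Hundreds", "Thousands")) {
    out <- cats[floor(log10(x)) + 1]
    factor(out, levels = cats)
}

# Hypothetical guarded variant: out-of-range indices become NA
bar_safe <- function(x, cats = c("Ones", "Tens", "Hundreds", "Thousands")) {
    idx <- floor(log10(x)) + 1
    idx[!is.finite(idx) | idx < 1] <- NA   # x <= 0 or 0 < x < 1
    factor(cats[idx], levels = cats)       # idx > length(cats) also gives NA
}

x <- c(3, 40, 250, 2500)
print(foo(x))
print(bar(x))
print(bar_safe(c(0.5, x, 20000)))
```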
苏别ゝ 2024-10-16 10:06:35

Check out:

  • cut()
  • recode() in the car package
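
For the numeric, pre-edit version of the question, a minimal base-R cut() sketch that bins values into the Ones/Tens/Hundreds/Thousands labels used in the other answers; the break points are assumptions chosen to match those labels. right = FALSE closes each interval on the left, so 10 counts as "Tens" rather than "Ones". cut() only works on numbers; for arbitrary names, recode() or a lookup table fits better.

```r
x <- c(3, 10, 40, 100, 2500)

# Intervals [0,10), [10,100), [100,1000), [1000,10000)
category <- cut(x,
                breaks = c(0, 10, 100, 1000, 10000),
                labels = c("Ones", "Tens", "Hundreds", "Thousands"),
                right = FALSE)
print(category)
```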
烟柳画桥 2024-10-16 10:06:35

Perhaps simpler and more readable using ifelse and %in%:

df <- data.frame( Name = c('A', 'C', 'D', 'E', 'H', 'Z', 'M'),
                  Count = c(100,10,40,30,3,20,50), stringsAsFactors = FALSE)

cat1 <- c("A","D")
cat2 <- c("C","Z")
cat3 <- c("E","H")
cat10 <- c("M")

df$Category <- ifelse(df$Name %in% cat1, "Cat1",
               ifelse(df$Name %in% cat2, "Cat2",
               ifelse(df$Name %in% cat3, "Cat3",
               ifelse(df$Name %in% cat10, "Cat10",
               NA))))

   Name Count Category
1    A   100     Cat1
2    C    10     Cat2
3    D    40     Cat1
4    E    30     Cat3
5    H     3     Cat3
6    Z    20     Cat2
7    M    50    Cat10
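
The nested ifelse() grows by one level per category. A sketch of the same %in% idea that scales to many categories, assuming the name groups are stored in a named list (the list cats is hypothetical, mirroring cat1 through cat10 above): loop over the categories and label the matching rows; rows that match no group stay NA.

```r
df <- data.frame(Name = c('A', 'C', 'D', 'E', 'H', 'Z', 'M'),
                 Count = c(100, 10, 40, 30, 3, 20, 50),
                 stringsAsFactors = FALSE)

# Hypothetical named list: category label -> Names in that category
cats <- list(Cat1 = c("A", "D"), Cat2 = c("C", "Z"),
             Cat3 = c("E", "H"), Cat10 = "M")

# Start with NA, then label every row whose Name is in each group
df$Category <- NA_character_
for (label in names(cats)) {
    df$Category[df$Name %in% cats[[label]]] <- label
}
print(df)
```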