How to sum a column conditionally based on a categorical column in R

Posted on 2025-02-13 02:51:13


I have a data frame in R called house_expenses that looks like this (two columns: DESCRIPTION and AMOUNT):

DESCRIPTION             AMOUNT
-----------            ---------
COUCH                    $801.713
TV                       $4999.996
TV_MOUNT                 $575.867
ENTERTAINMENT_SYSTEM     $1102.392
MATTRESS                 $1225.893
BEDFRAME                 $356.789
PILLOWS                  $528.989

I would like to add two columns to the data frame that hold these sums, rounded to 2 decimal places:

LIVING_ROOM_COSTS = round(sum(COUCH, TV, TV_MOUNT, ENTERTAINMENT_SYSTEM), digits = 2)
BEDROOM_COSTS = round(sum(MATTRESS, BEDFRAME, PILLOWS), digits = 2)

I have tried doing

house_expenses  <- house_expenses %>%
                   group_by(DESCRIPTION) %>%
                   mutate(LIVING_ROOM_COSTS  = sum(round(DESCRIPTION == "COUCH" &
                                                         DESCRIPTION == "TV" &
                                                         DESCRIPTION == "TV_MOUNT" &
                                                         DESCRIPTION == "ENTERTAINMENT_SYSTEM" , digits = 2)),
                    mutate(BEDROOM_COSTS = sum(round(DESCRIPTION == "MATTRESS" &
                                                     DESCRIPTION == "BEDFRAME" &
                                                     DESCRIPTION == "PILLOWS", digits = 2)))

Unfortunately, this hasn't worked. Has anyone come across this before and knows how to approach it?


寄离 2025-02-20 02:51:13


To get the result you want, you need to subset. This expression

Description %in% c("COUCH", "TV", "TV_MOUNT", "ENTERTAINMENT_SYSTEM")

gives TRUE or FALSE for each row, and you then use it to subset Amount:

Amount[Description %in% c("COUCH", "TV", "TV_MOUNT", "ENTERTAINMENT_SYSTEM")]
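To see why the question's attempt didn't work, it helps to compare a chain of == joined with & against %in%. The sketch below uses small illustrative vectors x and amount (hypothetical, not from the question): a single value can never equal two different strings at once, so the & chain is FALSE for every element, while %in% tests membership in a set and yields one TRUE/FALSE per element that can then be used to subset.

```r
# Hypothetical example vectors, for illustration only
x <- c("COUCH", "TV", "MATTRESS")
amount <- c(801.713, 4999.996, 1225.893)

# No single value equals "COUCH" and "TV" at the same time, so every element is FALSE
and_chain <- x == "COUCH" & x == "TV"
print(and_chain)

# %in% asks "is this value one of these?", giving one TRUE/FALSE per element
in_set <- x %in% c("COUCH", "TV")
print(in_set)

# Logical subsetting keeps only the amounts where in_set is TRUE,
# which can then be summed and rounded
print(amount[in_set])
print(round(sum(amount[in_set]), 2))
```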

Then wrap the selected values in sum() and round the result to 2 decimal places:

# df is the data frame shown in the dput() output further down
# Sum the living-room items, rounded to 2 decimal places
df$LIVING_ROOM_COSTS <- with(df, round(sum(Amount[Description %in% c("COUCH", "TV", "TV_MOUNT", "ENTERTAINMENT_SYSTEM")]), 2))
# Sum the bedroom items, rounded to 2 decimal places
df$BEDROOM_COSTS <- with(df, round(sum(Amount[Description %in% c("MATTRESS", "BEDFRAME", "PILLOWS")]), 2))

This gives us the data.frame:

           Description   Amount LIVING_ROOM_COSTS BEDROOM_COSTS
1                COUCH  801.713           7479.97       2111.67
2                   TV 4999.996           7479.97       2111.67
3             TV_MOUNT  575.867           7479.97       2111.67
4 ENTERTAINMENT_SYSTEM 1102.392           7479.97       2111.67
5             MATTRESS 1225.893           7479.97       2111.67
6             BEDFRAME  356.789           7479.97       2111.67
7              PILLOWS  528.989           7479.97       2111.67

Using with() lets us refer to column names directly, without the df$ prefix.
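Since the question used dplyr, note that mutate() also lets you refer to columns by bare name, so the same calculation can be written as a pipeline. This is a sketch assuming dplyr is installed and the columns are named Description and Amount as in this answer's data (with the question's DESCRIPTION and AMOUNT, swap the names). Compared with the question's attempt, it drops group_by(DESCRIPTION) (which would make every item its own one-row group), uses %in% instead of == joined with &, subsets Amount before summing, and creates both columns in a single mutate() call. The helper vectors living_items and bedroom_items are hypothetical names introduced for readability.

```r
library(dplyr)

# Same data as the dput() output in this answer
df <- data.frame(
  Description = c("COUCH", "TV", "TV_MOUNT", "ENTERTAINMENT_SYSTEM",
                  "MATTRESS", "BEDFRAME", "PILLOWS"),
  Amount = c(801.713, 4999.996, 575.867, 1102.392, 1225.893, 356.789, 528.989)
)

# Hypothetical helper vectors listing the items in each room
living_items <- c("COUCH", "TV", "TV_MOUNT", "ENTERTAINMENT_SYSTEM")
bedroom_items <- c("MATTRESS", "BEDFRAME", "PILLOWS")

df <- df %>%
  # No group_by(): the sums must run over the whole table, not per item.
  # Each sum() returns a single value, which mutate() recycles to every row.
  mutate(
    LIVING_ROOM_COSTS = round(sum(Amount[Description %in% living_items]), 2),
    BEDROOM_COSTS = round(sum(Amount[Description %in% bedroom_items]), 2)
  )

print(df)
```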

The reason this wasn't answered sooner is probably that the data as posted took extra work to reproduce, and people are generally lazy.

If you had formatted your data.frame like this:

           Description   Amount
1                COUCH  801.713
2                   TV 4999.996
3             TV_MOUNT  575.867
4 ENTERTAINMENT_SYSTEM 1102.392
5             MATTRESS 1225.893
6             BEDFRAME  356.789
7              PILLOWS  528.989

Or like this using the function dput:

structure(list(Description = c("COUCH", "TV", "TV_MOUNT", "ENTERTAINMENT_SYSTEM", 
"MATTRESS", "BEDFRAME", "PILLOWS"), Amount = c(801.713, 4999.996, 
575.867, 1102.392, 1225.893, 356.789, 528.989)), class = "data.frame", row.names = c(NA, 
-7L))

It would have been answered swiftly.
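The dput() output above stores Amount as numbers. The question's listing, however, shows values such as $801.713. If AMOUNT really is a character column with a leading dollar sign (an assumption, since the question doesn't say), sum() will stop with an "invalid 'type' (character)" error until the column is converted. A minimal base R sketch, using a small hypothetical house_expenses built to mimic that case:

```r
# Hypothetical data mimicking the question: AMOUNT stored as text with a "$"
house_expenses <- data.frame(
  DESCRIPTION = c("COUCH", "TV", "MATTRESS"),
  AMOUNT = c("$801.713", "$4999.996", "$1225.893")
)

# fixed = TRUE makes "$" a literal character rather than the regex end-of-string anchor
house_expenses$AMOUNT <- as.numeric(sub("$", "", house_expenses$AMOUNT, fixed = TRUE))

# AMOUNT is now numeric, so sum() and round() work on it
str(house_expenses)

# dput() prints R code that recreates the object, which is handy for sharing data
dput(house_expenses)
```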
