Populate a column based on two other columns in R

Posted on 2025-02-10 11:15:09

I have the below dataset, and I am trying to create a more meaningful path.

Row#  Session  Click      Page
1     123      Enter      Pg1
2     123      phpbutton  Pg1
3     123      Enter      Pg2
4     123      Enter      Pg3
5     123      Form1      Pg3
6     123      Form2      Pg3
7     123      Form1      Pg3
8     123      Form1      Pg3
9     123      abcbutton  Pg3
10    123      Enter      Pg1
11    123      xyzselect  Pg1
12    123      Enter      Pg4
13    123      Enter      Pg3
14    123      Back       Pg3
15    123      Enter      Pg1

I would like the outcome to look like this:

Session  Activity
123      Pg1
123      phpbutton
123      Pg2
123      Pg3
123      Form1
123      Form2
123      Form1
123      abcbutton
123      Pg1
123      xyzselect
123      Pg4
123      Pg3
123      Back
123      Pg1

If the Click column is Enter, the Activity column should show the Page. But if a row's Page is the same as the previous row's Page, the Activity column should show the value from the Click column instead. For instance, rows 1 and 2 have the same Page, so I would like the Activity column to show Pg1, then phpbutton. And if the Click column has two or more consecutive identical values, as in rows 7 and 8, I would like the Activity column to show just one entry of Form1.

Thanks a lot.

Comments (2)

鹿港巷口少年归 2025-02-17 11:15:09

Try this:

library(dplyr)

df |>
  group_by(Session) |>
  mutate(Activity = case_when(Click == "Enter" ~ Page,
                              lag(Page) == Page ~ Click)) |>
  select(Session, Activity)
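The pipeline above fills in Activity but does not collapse the repeated Form1 in rows 7 and 8. A minimal sketch of one way to add that step, assuming the question's data is in a data frame named `df` with columns `Session`, `Click` and `Page` (names taken from the question's table); the `filter()` line is an addition for illustration, not part of the original answer:

```r
library(dplyr)

# Sample data rebuilt from the question's table
df <- data.frame(
  Session = 123L,
  Click = c("Enter", "phpbutton", "Enter", "Enter", "Form1", "Form2", "Form1",
            "Form1", "abcbutton", "Enter", "xyzselect", "Enter", "Enter",
            "Back", "Enter"),
  Page = c("Pg1", "Pg1", "Pg2", "Pg3", "Pg3", "Pg3", "Pg3", "Pg3", "Pg3",
           "Pg1", "Pg1", "Pg4", "Pg3", "Pg3", "Pg1")
)

result <- df |>
  group_by(Session) |>
  mutate(Activity = case_when(Click == "Enter" ~ Page,
                              lag(Page) == Page ~ Click)) |>
  # keep a row only if its Activity differs from the previous row's
  # (the first row of each session has no previous row and is kept)
  filter(is.na(lag(Activity)) | Activity != lag(Activity)) |>
  ungroup() |>
  select(Session, Activity)

print(result)
```

Note that this comparison is on Activity alone, so two consecutive Enter rows for the same page would also collapse into one, and rows where neither condition matches (Activity is NA) are dropped. Whether that is wanted depends on the data.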
可是我不能没有你 2025-02-17 11:15:09
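library(dplyr)  # for lag(); dat is the data frame given at the end of this answer

# "Enter" on a new page (or on the very first row) shows the page;
# any row on the same page as the previous row shows the click
dat$activity <- ifelse(dat$click == "Enter" & (lag(dat$page) != dat$page | is.na(lag(dat$page))), dat$page,
       ifelse(lag(dat$page) == dat$page, dat$click, NA))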

   row session     click page  activity
1    1     123     Enter  Pg1       Pg1
2    2     123 phpbutton  Pg1 phpbutton
3    3     123     Enter  Pg2       Pg2
4    4     123     Enter  Pg3       Pg3
5    5     123     Form1  Pg3     Form1
6    6     123     Form2  Pg3     Form2
7    7     123     Form1  Pg3     Form1
8    8     123     Form1  Pg3     Form1
9    9     123 abcbutton  Pg3 abcbutton
10  10     123     Enter  Pg1       Pg1
11  11     123 xyzselect  Pg1 xyzselect
12  12     123     Enter  Pg4       Pg4
13  13     123     Enter  Pg3       Pg3
14  14     123      Back  Pg3      Back
15  15     123     Enter  Pg1       Pg1
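The `lag()` call above runs over the whole data frame, which is fine for a single session. With several sessions, the first row of one session would be compared with the last row of the previous one. A minimal base-R sketch of a per-session version, using a small made-up data frame (the two sessions and the `btn` click are hypothetical, chosen only to show the boundary case):

```r
# Hypothetical two-session data, for illustration only
dat2 <- data.frame(
  session = c(1L, 1L, 1L, 2L, 2L),
  click   = c("Enter", "btn", "Enter", "Enter", "btn"),
  page    = c("Pg1", "Pg1", "Pg2", "Pg2", "Pg2"),
  stringsAsFactors = FALSE
)

# Previous page within each session; NA on each session's first row
prev_page <- ave(dat2$page, dat2$session,
                 FUN = function(p) c(NA_character_, head(p, -1)))

# TRUE where the row stays on the same page as the previous row of its session
same_page <- !is.na(prev_page) & prev_page == dat2$page

dat2$activity <- ifelse(dat2$click == "Enter" & !same_page, dat2$page,
                 ifelse(same_page, dat2$click, NA))

print(dat2)
```

Row 4 is the interesting one: it is the first row of session 2 and has the same page as the last row of session 1. Grouping by session makes it count as a fresh "Enter", so its activity is the page rather than the click.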

You can then drop consecutive duplicate rows like this (it keeps the last row of each run, which is why row 8 is kept rather than row 7):

dat[cumsum(rle(paste0(dat$session, dat$click, dat$page, dat$activity))$lengths),]

   row session     click page  activity
1    1     123     Enter  Pg1       Pg1
2    2     123 phpbutton  Pg1 phpbutton
3    3     123     Enter  Pg2       Pg2
4    4     123     Enter  Pg3       Pg3
5    5     123     Form1  Pg3     Form1
6    6     123     Form2  Pg3     Form2
8    8     123     Form1  Pg3     Form1
9    9     123 abcbutton  Pg3 abcbutton
10  10     123     Enter  Pg1       Pg1
11  11     123 xyzselect  Pg1 xyzselect
12  12     123     Enter  Pg4       Pg4
13  13     123     Enter  Pg3       Pg3
14  14     123      Back  Pg3      Back
15  15     123     Enter  Pg1       Pg1
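To see why the `rle()` and `cumsum()` indexing above keeps one row per run, it may help to look at a toy vector (the vector `x` is made up for illustration). `rle()` returns the length of each run of equal values, and the cumulative sum of those lengths gives the position of the last element of each run:

```r
x <- c("a", "b", "b", "b", "c", "c", "a")

runs <- rle(x)

# Position of the last element of each run
last_idx <- cumsum(runs$lengths)

# Position of the first element of each run, if the first row is preferred
first_idx <- last_idx - runs$lengths + 1

print(runs$lengths)
print(last_idx)
print(first_idx)
print(x[last_idx])
```

Indexing a data frame with `first_idx` instead of `last_idx` would keep row 7 rather than row 8 in the example above. For the question's purpose either choice gives the same Activity sequence.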

Data:

dat <- structure(list(row = 1:15, session = c(123L, 123L, 123L, 123L, 
123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L
), click = c("Enter", "phpbutton", "Enter", "Enter", "Form1", 
"Form2", "Form1", "Form1", "abcbutton", "Enter", "xyzselect", 
"Enter", "Enter", "Back", "Enter"), page = c("Pg1", "Pg1", "Pg2", 
"Pg3", "Pg3", "Pg3", "Pg3", "Pg3", "Pg3", "Pg1", "Pg1", "Pg4", 
"Pg3", "Pg3", "Pg1")), row.names = c(NA, -15L), class = "data.frame")