Code a new variable based on grep return in R

Posted 2025-01-02 16:40:02 · 876 words · 1 view · 0 comments

I have a variable actor, which is a string and contains values like "military forces of guinea-bissau (1989-1992)" along with a large range of other, fairly complex values. I have been using grep() to find character patterns that match different types of actors. For example, I would like to code a new variable actor_type as 1 when actor contains "military forces of", does not contain "mutiny of", and the value of the string variable country is also contained in actor.

I am at a loss as to how to conditionally create this new variable without resorting to some type of horrible for loop. Help me!

Data looks roughly like this:

|   | actor                                              | country         |
|---+----------------------------------------------------+-----------------|
| 1 | "military forces of guinea-bissau"                 | "guinea-bissau" |
| 2 | "mutiny of military forces of guinea-bissau"       | "guinea-bissau" |
| 3 | "unidentified armed group (guinea-bissau)"         | "guinea-bissau" |
| 4 | "mfdc: movement of democratic forces of casamance" | "guinea-bissau" |

Comments (1)

遗弃M 2025-01-09 16:40:02

If your data is in a data.frame df:

> ifelse(!grepl('mutiny of' , df$actor) & grepl('military forces of',df$actor) & apply(df,1,function(x) grepl(x[2],x[1])),1,0)
[1] 1 0 0 0

grepl returns a logical vector and this can be assigned to whatever, e.g. df$actor_type.
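
As a minimal sketch of that assignment, assuming the four sample rows from the question are rebuilt by hand into df (the data.frame construction and the intermediate names is_military, is_mutiny and names_country are illustrative, not the asker's actual code):

```r
# Rebuild the sample rows from the question (illustrative data only)
df <- data.frame(
  actor = c("military forces of guinea-bissau",
            "mutiny of military forces of guinea-bissau",
            "unidentified armed group (guinea-bissau)",
            "mfdc: movement of democratic forces of casamance"),
  country = "guinea-bissau",
  stringsAsFactors = FALSE
)

# Each condition is a logical vector with one element per row
is_military <- grepl("military forces of", df$actor)
is_mutiny   <- grepl("mutiny of", df$actor)

# Row-wise check: column 2 (country) must occur in column 1 (actor)
names_country <- apply(df, 1, function(x) grepl(x[2], x[1]))

# Combine the conditions and store the result as a 0/1 column
df$actor_type <- ifelse(is_military & !is_mutiny & names_country, 1, 0)
df$actor_type
# [1] 1 0 0 0
```

Splitting the conditions into named vectors makes it easier to add further actor types later, since each new type is just another combination of logical vectors.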

Breaking that apart:

!grepl('mutiny of', df$actor) and grepl('military forces of', df$actor) satisfy your first two requirements. The last piece, apply(df,1,function(x) grepl(x[2],x[1])), goes row by row and greps for country in actor. Note that it relies on actor being the first column and country the second.
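
One possible variation, sketched here under the same assumed sample data, pairs the columns by name with mapply() rather than by position with apply(). It also passes fixed = TRUE so that each country is matched as a literal string. By default grepl() treats its pattern as a regular expression, so a country name containing characters such as parentheses or dots could otherwise match in unintended ways. The helper name country_in_actor is hypothetical.

```r
# Sample rows from the question, rebuilt for illustration
df <- data.frame(
  actor = c("military forces of guinea-bissau",
            "mutiny of military forces of guinea-bissau",
            "unidentified armed group (guinea-bissau)",
            "mfdc: movement of democratic forces of casamance"),
  country = "guinea-bissau",
  stringsAsFactors = FALSE
)

# Pair each country with its own actor string; fixed = TRUE means
# the country is searched for literally, not as a regular expression
country_in_actor <- mapply(
  function(pattern, text) grepl(pattern, text, fixed = TRUE),
  df$country, df$actor,
  USE.NAMES = FALSE
)

# Logical AND of the three conditions, converted to integer 0/1
df$actor_type <- as.integer(
  grepl("military forces of", df$actor, fixed = TRUE) &
    !grepl("mutiny of", df$actor, fixed = TRUE) &
    country_in_actor
)
df$actor_type
# [1] 1 0 0 0
```

Because the columns are referenced as df$country and df$actor, this version keeps working if the data.frame gains extra columns or the column order changes.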
