Coding a new variable based on grep return in R
I have a variable actor which is a string and contains values like "military forces of guinea-bissau (1989-1992)" and a large range of other different values that are fairly complex. I have been using grep() to find character patterns that match different types of actors. For example, I would like to code a new variable actor_type as 1 when actor contains "military forces of", doesn't contain "mutiny of", and the string variable country is also contained in the variable actor.
I am at a loss as to how to conditionally create this new variable without resorting to some type of horrible for loop. Help me!
Data looks roughly like this:
| | actor | country |
|---+----------------------------------------------------+-----------------|
| 1 | "military forces of guinea-bissau" | "guinea-bissau" |
| 2 | "mutiny of military forces of guinea-bissau" | "guinea-bissau" |
| 3 | "unidentified armed group (guinea-bissau)" | "guinea-bissau" |
| 4 | "mfdc: movement of democratic forces of casamance" | "guinea-bissau" |
Comments (1)
If your data is in a data.frame df, the three conditions can be combined with & into a single logical expression. grepl returns a logical vector, and this can be assigned to whatever you like, e.g. df$actor_type.
Breaking that apart: !grepl('mutiny of', df$actor) and grepl('military forces of', df$actor) satisfy your first two requirements. The last piece, apply(df, 1, function(x) grepl(x[2], x[1])), goes row by row and greps for country in actor.
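A minimal sketch of how the three pieces might fit together, using the four example rows from the question as toy data. The helper names is_military, is_mutiny and has_country are illustrative, not from the original answer. The row-wise step uses mapply rather than the apply call above, so it does not depend on column order, and fixed = TRUE is an assumption that the strings should be matched literally rather than as regular expressions.

```r
# Toy data mirroring the four example rows from the question.
df <- data.frame(
  actor = c(
    "military forces of guinea-bissau",
    "mutiny of military forces of guinea-bissau",
    "unidentified armed group (guinea-bissau)",
    "mfdc: movement of democratic forces of casamance"
  ),
  country = rep("guinea-bissau", 4),
  stringsAsFactors = FALSE
)

# Conditions 1 and 2: vectorised matches over the whole actor column.
is_military <- grepl("military forces of", df$actor, fixed = TRUE)
is_mutiny   <- grepl("mutiny of", df$actor, fixed = TRUE)

# Condition 3: for each row, check whether country occurs inside actor.
# grepl(pattern, x): country is passed as the pattern, actor as the text.
# fixed = TRUE treats country as a literal string (an assumption about intent).
has_country <- mapply(grepl, df$country, df$actor,
                      MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Combine the logical vectors and convert TRUE/FALSE to 1/0.
df$actor_type <- as.integer(is_military & !is_mutiny & has_country)
```

With this toy data only the first row meets all three conditions. If a logical column is enough, the as.integer() wrapper can be dropped.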