Using multiple criteria in the subset function and logical operators

Posted on 2024-11-03 11:02:33

If I want to select a subset of data in R, I can use the subset function. I wanted to base an analysis on data that matched one of a few criteria, e.g. that a certain variable was either 1, 2 or 3.
I tried

myNewDataFrame <- subset(bigfive, subset = (bigfive$bf11==(1||2||3)))

It always selected only the values that matched the first of the criteria, here 1. My assumption was that it would start with 1 and, if that evaluated to "false", go on to 2 and then to 3; if none of them matched, the statement after == would be "false", and if one of them matched, it would be "true".

I got the right result using

 newDataFrame <- subset(bigfive, subset = (bigfive$bf11==c(1,2,3)))

But I would like to be able to select data via logical operators, so: why did the first approach not work?

Comments (2)

左耳近心 2024-11-10 11:02:33

The correct operator is %in% here. Here is an example with dummy data:

set.seed(1)
dat <- data.frame(bf11 = sample(4, 10, replace = TRUE),
                  foo = runif(10))

giving:

> head(dat)
  bf11       foo
1    2 0.2059746
2    2 0.1765568
3    3 0.6870228
4    4 0.3841037
5    1 0.7698414
6    4 0.4976992

The subset of dat where bf11 equals any of the set 1,2,3 is taken as follows using %in%:

> subset(dat, subset = bf11 %in% c(1,2,3))
   bf11       foo
1     2 0.2059746
2     2 0.1765568
3     3 0.6870228
5     1 0.7698414
8     3 0.9919061
9     3 0.3800352
10    1 0.7774452
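It may help to see how %in% differs from the bf11 == c(1,2,3) form used in the question. The sketch below hard-codes the bf11 values printed above (an assumption made so the example does not depend on re-running sample()) and counts how many rows each test keeps. With ==, R recycles c(1, 2, 3) along the column and compares position by position, so a row is kept only when its value happens to line up with the recycled value at that position.

```r
# bf11 values copied from the printed example output (hard-coded, not re-sampled)
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)

# == recycles c(1, 2, 3) along bf11: row 1 vs 1, row 2 vs 2, row 3 vs 3, row 4 vs 1, ...
# (R warns here because 10 is not a multiple of 3; the warning is silenced for the demo)
by_position <- suppressWarnings(bf11 == c(1, 2, 3))

# %in% asks, for every element, whether it occurs anywhere in the set
by_membership <- bf11 %in% c(1, 2, 3)

sum(by_position)    # rows kept by the positional comparison
sum(by_membership)  # rows kept by the membership test
```

On this data the positional comparison keeps fewer rows than the membership test, so the == form can look right on some data while silently dropping matches.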

As to why your original didn't work, break it down to see the problem. Look at what 1||2||3 evaluates to:

> 1 || 2 || 3
[1] TRUE

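and you'd get the same using | instead. As a result, the comparison becomes bf11 == TRUE, and because TRUE is coerced to 1 in a numeric comparison, the subset() call would only return rows where bf11 was 1 (or something else that compared equal to TRUE). That is why only the first value appeared to match.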
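A small sketch of that breakdown, using hard-coded bf11 values copied from the example output above (an assumption, so the result does not depend on the random number generator):

```r
# 1 || 2 || 3 collapses to a single logical value before == ever sees it
collapsed <- 1 || 2 || 3
collapsed

# bf11 values copied from the printed example output (hard-coded)
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)

# TRUE is coerced to the number 1 here, so only elements equal to 1 compare as TRUE
matches <- bf11 == collapsed
which(matches)   # positions of the rows that would be kept
```

Only the positions holding the value 1 come back, which is the behaviour described in the question.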

What you could have written would have been something like:

subset(dat, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)

This gives the same result as my earlier subset() call. The point is that you need a series of single comparisons, not a comparison against a series of options. But as you can see, %in% is far more useful and less verbose in such circumstances. Notice also that I have to use | rather than ||, because I want to compare each element of bf11 against 1, 2, and 3 in turn. Compare:

> with(dat, bf11 == 1 || bf11 == 2)
[1] TRUE
> with(dat, bf11 == 1 | bf11 == 2)
 [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
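The transcript above comes from an older R session, where || quietly used only the first element of each side. To my knowledge, more recent R releases reject || on vectors longer than one (a warning in R 4.2, an error from R 4.3.0), so the first call may not run as shown. The sketch below wraps that call in tryCatch so it runs either way, and uses any() as one explicit way to get a single TRUE/FALSE from the element-wise result. The bf11 values are hard-coded from the example output.

```r
# bf11 values copied from the printed example output (hard-coded)
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)

# Element-wise OR: one logical value per row, which is what subset() needs
per_row <- bf11 == 1 | bf11 == 2

# A single yes/no answer for the whole column: collapse explicitly with any()
any_match <- any(per_row)

# || on vectors longer than one; caught so the sketch runs on old and new R alike
scalar_or <- tryCatch(bf11 == 1 || bf11 == 2,
                      warning = function(w) "warning",
                      error   = function(e) "error")

per_row
any_match
scalar_or   # TRUE on older R, "warning" or "error" on newer versions
```

Either way, || never yields the per-row vector that subset() needs, so | is the operator to use inside a subsetting condition.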

李不 2024-11-10 11:02:33

For your example, I believe the following should work:

myNewDataFrame <- subset(bigfive, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)

See the examples in ?subset for more. Just to demonstrate, a more complicated logical subset would be:

data(airquality)
dat <- subset(airquality, subset = (Temp > 80 & Month > 5) | Ozone < 40)
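One detail worth knowing with this dataset is that Ozone contains missing values, so the combined condition is NA for some rows. The sketch below evaluates the condition on its own and compares subset() with plain bracket indexing. The names cond and manual are introduced here for illustration only.

```r
data(airquality)

# The same condition, evaluated on its own so the NA values are visible
cond <- with(airquality, (Temp > 80 & Month > 5) | Ozone < 40)
table(cond, useNA = "ifany")

# subset() keeps only rows where the condition is TRUE; NA rows are dropped
dat <- subset(airquality, subset = (Temp > 80 & Month > 5) | Ozone < 40)

# Bracket indexing needs which() (or an explicit !is.na) to behave the same way
manual <- airquality[which(cond), ]
identical(rownames(dat), rownames(manual))
```

Without which(), airquality[cond, ] would return extra all-NA rows for every NA in the condition, which is one reason subset() is convenient for interactive work.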

And as Chase points out, %in% would be more efficient in your example:

myNewDataFrame <- subset(bigfive, subset = bf11 %in% c(1, 2, 3))

As Chase also points out, make sure you understand the difference between | and ||. To see help pages for operators, use ?'||', where the operator is quoted.
