r operator-precedence logical-operators subset

Using multiple criteria in subset function and logical operators

Posted on 2024-11-03 11:02:33

If I want to select a subset of data in R, I can use the subset function. I wanted to base an analysis on data that matched one of a few criteria, e.g. that a certain variable was either 1, 2 or 3.
I tried

myNewDataFrame <- subset(bigfive, subset = (bigfive$bf11==(1||2||3)))

It always selected only the values that matched the first of the criteria, here 1. My assumption was that it would start with 1, and if that evaluated to "false" it would go on to 2 and then to 3; if none matched, the statement after == would be "false", and if one of them matched, it would be "true".

I got the right result using

 newDataFrame <- subset(bigfive, subset = (bigfive$bf11==c(1,2,3)))

But I would like to be able to select data via logical operators, so: why did the first approach not work?


左耳近心 2024-11-10 11:02:33

The correct operator is %in% here. Here is an example with dummy data:

set.seed(1)
dat <- data.frame(bf11 = sample(4, 10, replace = TRUE),
                  foo = runif(10))

giving:

> head(dat)
  bf11       foo
1    2 0.2059746
2    2 0.1765568
3    3 0.6870228
4    4 0.3841037
5    1 0.7698414
6    4 0.4976992

The subset of dat where bf11 equals any of the set 1,2,3 is taken as follows using %in%:

> subset(dat, subset = bf11 %in% c(1,2,3))
   bf11       foo
1     2 0.2059746
2     2 0.1765568
3     3 0.6870228
5     1 0.7698414
8     3 0.9919061
9     3 0.3800352
10    1 0.7774452
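As a side note on the `== c(1,2,3)` form from the question, here is a minimal sketch of why it is not equivalent to %in%. `==` recycles the shorter vector along the longer one, so each element is compared with only one of 1, 2 and 3, depending on its position. The vector x below is made up purely for illustration and is not part of the original data.

```r
# Hypothetical values, chosen only to illustrate recycling
x <- c(2, 2, 3, 4, 1, 4)

# `==` recycles c(1, 2, 3): x[1] is compared with 1, x[2] with 2, x[3] with 3,
# x[4] with 1 again, and so on
eq_result <- x == c(1, 2, 3)

# %in% checks each element of x for membership in the whole set
in_result <- x %in% c(1, 2, 3)

print(eq_result)
print(in_result)
```

The two results differ wherever a value is in the set but does not happen to line up with the recycled position, so the `==` form can silently drop rows.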

As to why your original didn't work, break it down to see the problem. Look at what 1||2||3 evaluates to:

> 1 || 2 || 3
[1] TRUE

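and you'd get the same using | instead. As a result, the subset() call compares bf11 with TRUE. Because TRUE is coerced to 1 in a numeric comparison, only the rows where bf11 equals 1 are returned, which is why only the first criterion appeared to match.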
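The breakdown can be traced step by step in a small sketch. The vector x is a hypothetical stand-in for bf11, not the real data.

```r
# Non-zero numbers are coerced to TRUE, so the whole right-hand side
# collapses to a single TRUE before == is ever evaluated
rhs <- 1 || 2 || 3

# Hypothetical values for illustration
x <- c(2, 2, 3, 4, 1, 4)

# In the comparison, TRUE is coerced to the number 1, so this is x == 1
matches <- x == rhs

print(rhs)
print(matches)
```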

What you could have written would have been something like:

subset(dat, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)

This gives the same result as my earlier subset() call. The point is that you need a series of single comparisons, not a single comparison against a series of options. But as you can see, %in% is far more useful and less verbose in such circumstances. Notice also that I have to use | rather than ||, as I want to compare each element of bf11 against 1, 2, and 3 in turn. Compare:

> with(dat, bf11 == 1 || bf11 == 2)
[1] TRUE
> with(dat, bf11 == 1 | bf11 == 2)
 [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
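One caveat, offered as a hedged note: the `[1] TRUE` shown for `||` reflects older R versions, which silently used only the first element of each operand. As far as I know, from R 4.3.0 onwards passing an operand longer than one to `||` signals an error instead. The sketch below is written to run either way. The vector x and the message string are illustrative assumptions.

```r
x <- c(2, 2, 3, 4, 1, 4)  # hypothetical values

# `|` is vectorised: one result per element
elementwise <- x == 1 | x == 2

# `||` expects length-one operands. Older R versions used only the first
# element; newer versions (4.3.0 onwards, as far as I know) signal an error,
# which is caught here so the script keeps running.
scalar <- tryCatch(x == 1 || x == 2,
                   error = function(e) "error: operand longer than one")

# any() collapses a logical vector to a single value when that is what is wanted
any_match <- any(x == 1 | x == 2)

print(elementwise)
print(scalar)
print(any_match)
```

For row selection in subset(), the element-wise `|` is the one to use. `||` is meant for single conditions, such as those in if() statements.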


李不 2024-11-10 11:02:33

For your example, I believe the following should work:

myNewDataFrame <- subset(bigfive, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)

See the examples in ?subset for more. Just to demonstrate, a more complicated logical subset would be:

data(airquality)
dat <- subset(airquality, subset = (Temp > 80 & Month > 5) | Ozone < 40)
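On operator precedence in a condition like this, here is a small sketch using the same built-in airquality data. In R, comparison operators bind tighter than `&`, and `&` binds tighter than `|`, so the parentheses above are not strictly required, though they make the intent easier to read. The variable names are my own.

```r
data(airquality)  # built-in dataset from the datasets package

# Comparison operators bind tighter than `&`, and `&` binds tighter than `|`,
# so these two conditions are parsed the same way
with_parens    <- with(airquality, (Temp > 80 & Month > 5) | Ozone < 40)
without_parens <- with(airquality, Temp > 80 & Month > 5 | Ozone < 40)

# Grouping the other way is a different condition
other_grouping <- with(airquality, Temp > 80 & (Month > 5 | Ozone < 40))

print(identical(with_parens, without_parens))
print(identical(with_parens, other_grouping))
```

Note that Ozone contains missing values, so the condition can be NA for some rows. subset() treats those rows as not selected.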

And as Chase points out, %in% would be more efficient in your example:

myNewDataFrame <- subset(bigfive, subset = bf11 %in% c(1, 2, 3))

As Chase also points out, make sure you understand the difference between | and ||. To see help pages for operators, use ?'||', where the operator is quoted.
