在WHERE子句/语句中使用indexw函数

发布于 2025-02-12 03:34:40 字数 2353 浏览 1 评论 0原文

我试图了解以下代码的行为。对我来说似乎是错误的,但我很感谢别人看看我只是生气或其他事情。

data testdata;
  length var1 var2 $ 10;
  var1 = "house";  var2 = "";       output;
  var1 = "house";  var2 = "car";    output;
  var1 = "house";  var2 = "house";  output;
run;

proc sql;

  * Select all three obs- ok;
  create table try1 as select * from testdata where indexw("house", var1);

  * Selects one obs - ok;
  create table try2 as select * from testdata where indexw("house", var2);

  * Selects one obs - ok;
  create table try3 as select * from testdata where indexw("car", var2);

  * Selects all three obs - why?;
  create table try4 as select * from testdata where indexw("house", var1) and indexw("house", var2);

  * Selects one obs - ok;
  create table try5 as select * from testdata where indexw("house", var1) and indexw("car", var2);

  * Explicit comparison to zero - selects one obs - ok;
  create table try6 as select * from testdata where indexw("house", var1) and (indexw("house", var2) ne 0);

  * Compare to VAR2 first - selects one obs - ok;
  create table try7 as select * from testdata where indexw("house", var2) and indexw("house", var1);

quit;

当我运行此代码时,Try1具有三个观察结果,Try2具有一个-var1 =“ House”和var2 =“ House”的观测值。这就是我所期望的,基于这一点,我希望Try4仅包含var1和var2都是“房屋”的单个观察结果。而是Try4从输入中选择所有三个观察结果。

更奇怪的是,Try6使用明确的比较零,并且只能按预期选择一个观察值,而Try7也可以逆转比较的顺序。

如果我使用某个语句,则在数据步骤中发生了类似的事情,但是如果我使用if语句:

114  data t;
115    set testdata;
116    where indexw("house", var1) and indexw("house", var2);
117  run;

NOTE: There were 3 observations read from the data set WORK.TESTDATA.
      WHERE not (not INDEXW('house', var1));
NOTE: The data set WORK.T has 3 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


118  data t;
119    set testdata;
120    if indexw("house", var1) and indexw("house", var2);
121  run;

NOTE: There were 3 observations read from the data set WORK.TESTDATA.
NOTE: The data set WORK.T has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

请注意,第一个数据步骤输出所有三个观察结果,而第二个数据仅输出一个。

第一步的日志揭示了一个线索 - 看起来SAS以改变其含义的方式重新解释了Where子句。

这是怎么回事?

在Windows 10上以64位SAS 9.4(TS1M7)运行。

I'm trying to understand the behaviour of the code below. It seems wrong to me but I'd appreciate someone else taking a look in case I'm just going mad or something.

data testdata;
  length var1 var2 $ 10;
  var1 = "house";  var2 = "";       output;
  var1 = "house";  var2 = "car";    output;
  var1 = "house";  var2 = "house";  output;
run;

proc sql;

  * Select all three obs- ok;
  create table try1 as select * from testdata where indexw("house", var1);

  * Selects one obs - ok;
  create table try2 as select * from testdata where indexw("house", var2);

  * Selects one obs - ok;
  create table try3 as select * from testdata where indexw("car", var2);

  * Selects all three obs - why?;
  create table try4 as select * from testdata where indexw("house", var1) and indexw("house", var2);

  * Selects one obs - ok;
  create table try5 as select * from testdata where indexw("house", var1) and indexw("car", var2);

  * Explicit comparison to zero - selects one obs - ok;
  create table try6 as select * from testdata where indexw("house", var1) and (indexw("house", var2) ne 0);

  * Compare to VAR2 first - selects one obs - ok;
  create table try7 as select * from testdata where indexw("house", var2) and indexw("house", var1);

quit;

When I run this code, TRY1 has three observations and TRY2 has one - the ones with VAR1="house" and VAR2="house" respectively. This is what I would expect, and based on that, I would expect TRY4 to only contain the single observation where both VAR1 and VAR2 are "house". Instead TRY4 selects all three observations from the input.

Even more strangely, TRY6 uses an explicit compare to zero and only selects one observation as expected, as does TRY7, which reverses the order of the comparisons.

A similar thing happens in data step if I use a WHERE statement, but not if I use an IF statement:

114  data t;
115    set testdata;
116    where indexw("house", var1) and indexw("house", var2);
117  run;

NOTE: There were 3 observations read from the data set WORK.TESTDATA.
      WHERE not (not INDEXW('house', var1));
NOTE: The data set WORK.T has 3 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


118  data t;
119    set testdata;
120    if indexw("house", var1) and indexw("house", var2);
121  run;

NOTE: There were 3 observations read from the data set WORK.TESTDATA.
NOTE: The data set WORK.T has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Note that the first data step outputs all three observations, while the second only outputs one.

The log for the first step reveals a clue - it looks like SAS has re-interpreted the WHERE clause in a way that changes its meaning.

What's going on here?

Run in 64-bit SAS 9.4 (TS1M7) on Windows 10.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

泡沫很甜 2025-02-19 03:34:40

该错误至少在SAS中至少可以追溯到9.4m5。直接联系SAS,以获取有关此错误的详细信息。您可以通过以下方式提出一张票: http://support.sas.com

我记得它与使用有关相同的功能(可能还要求两个用法都使用相同的第一个参数?),并且作为您的示例显示将功能结果视为布尔值。

两种表达式中的相似性使优化逻辑混淆。添加您展示的工作以避免错误的工作的显式比较操作员的工作。

 where 0<indexw("house", var1) and 0<indexw("house", var2);

The bug is in SAS at least as far back as 9.4M5 also. Contact SAS directly to get details on this bug. You can raise a ticket at: http://support.sas.com

As I remember it is related to using the same function (possibly it also requires that both usages use the same first argument?) and also as your example shows treating the function result as a boolean.

The similarity in the two expressions is confusing the optimization logic. The work around of adding a explicit comparison operator that you showed works to avoid the bug.

 where 0<indexw("house", var1) and 0<indexw("house", var2);
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文