是否可以使用 SAS (PROC SQL) 执行不区分大小写的 DISTINCT?

发布于 2024-07-23 03:07:10 字数 304 浏览 12 评论 0原文

有没有办法从此 SAS SQL 查询中获取不区分大小写的不同行? ...

SELECT DISTINCT country FROM companies;

理想的解决方案将包含单个查询。

结果现在看起来像:

Australia
australia
AUSTRALIA
Hong Kong
HONG KONG

... 其中确实需要 2 个不同行中的任何一个

可以将数据大写,但这会不必要地以不适合此查询目的的方式更改值。

Is there a way to get the case-insensitive distinct rows from this SAS SQL query? ...

SELECT DISTINCT country FROM companies;

The ideal solution would consist of a single query.

Results now look like:

Australia
australia
AUSTRALIA
Hong Kong
HONG KONG

... where any of the 2 distinct rows is really required

One could upper-case the data, but this unnecessarily changes values in a manner that doesn't suit the purpose of this query.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

也只是曾经 2024-07-30 03:07:10

如果你有一些主 int 键(我们称之为 ID),你可以使用:

SELECT country FROM companies
WHERE id =
(
    SELECT Min(id) FROM companies
    GROUP BY Upper(country)
)

If you have some primary int key (let's call it ID), you could use:

SELECT country FROM companies
WHERE id =
(
    SELECT Min(id) FROM companies
    GROUP BY Upper(country)
)
碍人泪离人颜 2024-07-30 03:07:10

规范化大小写似乎确实是可取的——如果“澳大利亚”、“澳大利亚”和“澳大利亚”都出现,那么您希望这三个中的哪一个作为您的查询的“区分大小写的唯一”答案? 如果您热衷于某些特定的启发式方法(例如,计算它们出现的次数并选择最受欢迎的),这当然可以完成,但可能需要大量的额外工作 - 那么,这种挑剔对您来说值多少钱?

Normalizing case does seem advisable -- if 'Australia', 'australia' and 'AUSTRALIA' all occur, which one of the three would you want as the "case-sensitively unique" answer to your query, after all? If you're keen on some specific heuristics (e.g. count how many times they occur and pick the most popular), this can surely be done but might be a huge amount of extra work -- so, how much is such persnicketiness worth to you?

ま柒月 2024-07-30 03:07:10

非 SQL 方法(实际上只有一个步骤,因为数据步骤只是创建视图):


data companies_v /view=companies_v;
  set companies (keep=country);
  _upcase_country = upcase(country);
run;

proc sort data=companies_v out=companies_distinct_countries (drop=_upcase_country) nodupkey noequals;
  by _upcase_country;
run;

A non-SQL method (really only a single step as the data step just creates a view) would be:


data companies_v /view=companies_v;
  set companies (keep=country);
  _upcase_country = upcase(country);
run;

proc sort data=companies_v out=companies_distinct_countries (drop=_upcase_country) nodupkey noequals;
  by _upcase_country;
run;
束缚m 2024-07-30 03:07:10

也许我错过了一些东西,但为什么不只是:

data testZ;
    input Name $;
    cards4;
Bob
Zach
Tim
Eric
Frank
ZacH
BoB
eric
;;;;
run;

proc sql;
    create view distinctNames as
    select distinct Upper(Name) from testz;
quit;

这会创建一个仅将不同名称作为行值的视图。

Maybe I'm missing something, but why not just:

data testZ;
    input Name $;
    cards4;
Bob
Zach
Tim
Eric
Frank
ZacH
BoB
eric
;;;;
run;

proc sql;
    create view distinctNames as
    select distinct Upper(Name) from testz;
quit;

This creates a view with only distinct names as row values.

独木成林 2024-07-30 03:07:10

我的想法与扎克相同,但我想我会用一个更详细的例子来看待这个问题,

proc sql;
    CREATE TABLE contacts (
        line1 CHAR(30), line2 CHAR(30), pcode CHAR(4)
    );
    * Different versions of the same address - L23 Bass Plaza 2199;
    INSERT INTO contacts values('LEVEL 23 bass', 'plaza'  '2199');
    INSERT INTO contacts values('level 23 bass ', ' PLAZA'  '2199');

    INSERT INTO contacts values('Level 23', 'bass plaza'  '2199');
    INSERT INTO contacts values('level 23', 'BASS plaza'  '2199');

    *full address in line 1;
    INSERT INTO contacts values('Level 23 bass plaza', ''  '2199');
    INSERT INTO contacts values(' Level 23 BASS plaza  ', ''  '2199');

;quit;

现在我们可以输出
我。 每个类别一个? 即三个地址?
OR
二. 还是只有一个地址? 如果是这样,我们应该选择哪个版本?

实现案例 1 可以像这样简单:

proc sql;
    SELECT DISTINCT UPCASE(trim(line1)), UPCASE(trim(line2)), pcode 
    FROM contacts 
;quit;

实现案例 2 可以像这样简单:

proc sql;
    SELECT DISTINCT UPCASE( trim(line1) || ' ' || trim(line2) ) , pcode 
    FROM contacts 
;quit;

I was thinking along the same lines as Zach, but thought I would look at the problem with a more elaborate example,

proc sql;
    CREATE TABLE contacts (
        line1 CHAR(30), line2 CHAR(30), pcode CHAR(4)
    );
    * Different versions of the same address - L23 Bass Plaza 2199;
    INSERT INTO contacts values('LEVEL 23 bass', 'plaza'  '2199');
    INSERT INTO contacts values('level 23 bass ', ' PLAZA'  '2199');

    INSERT INTO contacts values('Level 23', 'bass plaza'  '2199');
    INSERT INTO contacts values('level 23', 'BASS plaza'  '2199');

    *full address in line 1;
    INSERT INTO contacts values('Level 23 bass plaza', ''  '2199');
    INSERT INTO contacts values(' Level 23 BASS plaza  ', ''  '2199');

;quit;

Now we can output
i. One from each category? Ie three addresses ?
OR
ii. Or just one address ? if so which version should we prefer ?

Implementing case 1 can be as simple as :

proc sql;
    SELECT DISTINCT UPCASE(trim(line1)), UPCASE(trim(line2)), pcode 
    FROM contacts 
;quit;

Implementing case 2 can be as simple as:

proc sql;
    SELECT DISTINCT UPCASE( trim(line1) || ' ' || trim(line2) ) , pcode 
    FROM contacts 
;quit;
述情 2024-07-30 03:07:10

从 SAS 9 开始:

proc sort data=input_ds sortseq=linguistic(strengh=primary);

  by sort_vars;

跑步;

From SAS 9:

proc sort data=input_ds sortseq=linguistic(strengh=primary);

  by sort_vars;

run;

风柔一江水 2024-07-30 03:07:10

我认为正则表达式可以帮助您找到您想要在搜索字符串中使用的模式。

对于正则表达式,您可以定义一个 UDF,可以参见教程进行准备。 www.sqlteam.com/article/regular-expressions-in-t-sql

谢谢。

I think Regular expressions can help you out with the pattern you want to have in your search string.

For the regular expression you can define a UDF which can be prepared seeing the tutorial. www.sqlteam.com/article/regular-expressions-in-t-sql

Thanks.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文