Is it possible to perform a case-insensitive DISTINCT with SAS (PROC SQL)?
Is there a way to get the case-insensitive distinct rows from this SAS SQL query? ...
SELECT DISTINCT country FROM companies;
The ideal solution would consist of a single query.
Results now look like:
Australia
australia
AUSTRALIA
Hong Kong
HONG KONG
... where only 2 distinct rows are really required (any one spelling of each country will do).
One could upper-case the data, but this unnecessarily changes values in a manner that doesn't suit the purpose of this query.
Comments (7)
If you have some primary int key (let's call it ID), you could use:
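The code for this answer is not shown here. A minimal sketch of the idea, assuming the `companies` table has a unique numeric column named `id` (a hypothetical name, not confirmed by the question), might look like this. It keeps the row with the smallest key in each case-insensitive group, so the value returned is one of the original spellings rather than an upper-cased copy.

```sas
proc sql;
  /* Assumption: companies has a unique numeric key called id. */
  /* The subquery picks the smallest id per case-insensitive group; */
  /* the outer query returns the original spelling stored on that row. */
  select country
    from companies
    where id in (select min(id)
                   from companies
                   group by upcase(country));
quit;
```

Which spelling is returned depends only on which row happens to carry the lowest `id`, so this suits cases where any one variant is acceptable.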
Normalizing case does seem advisable -- if 'Australia', 'australia' and 'AUSTRALIA' all occur, which one of the three would you want as the "case-sensitively unique" answer to your query, after all? If you're keen on some specific heuristics (e.g. count how many times they occur and pick the most popular), this can surely be done but might be a huge amount of extra work -- so, how much is such persnicketiness worth to you?
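The "most popular spelling" heuristic mentioned above could be sketched roughly as follows. The table `spelling_counts` and the columns `country_key` and `n` are names made up for this illustration, and the sketch takes two statements rather than the single query the question asks for.

```sas
proc sql;
  /* Count how often each spelling occurs within its case-insensitive group. */
  create table spelling_counts as
    select upcase(country) as country_key,
           country,
           count(*) as n
      from companies
      group by calculated country_key, country;

  /* Keep the most frequent spelling per group. */
  /* PROC SQL remerges max(n) back onto the detail rows to evaluate this. */
  /* If two spellings tie, both are returned. */
  select country
    from spelling_counts
    group by country_key
    having n = max(n);
quit;
```

Ties are not broken here, so a further rule would be needed if exactly one row per country is required.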
A non-SQL method (really only a single step as the data step just creates a view) would be:
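The code for this answer is not shown here. One way such a method could look, assuming the input data set is `companies` and using the made-up names `companies_v`, `country_key`, and `distinct_countries`, is a DATA step view that adds an upper-cased key, followed by a de-duplicating sort on that key.

```sas
/* The view adds an upper-cased key; it is evaluated only when read. */
data companies_v / view=companies_v;
  set companies;
  country_key = upcase(country);
run;

/* NODUPKEY keeps one row per key, so country retains an original spelling. */
proc sort data=companies_v out=distinct_countries nodupkey;
  by country_key;
run;
```

The output data set still contains `country_key` alongside `country`; it can be ignored or dropped in a later step.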
Maybe I'm missing something, but why not just:
This creates a view with only distinct names as row values.
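The code for this answer is not shown here either. One possible reading, offered only as a guess, is a PROC SQL view over the upper-cased values; the view name `distinct_countries` is invented for this sketch.

```sas
proc sql;
  /* A view holding one row per country, compared without regard to case. */
  /* Note that the values it returns are upper-cased. */
  create view distinct_countries as
    select distinct upcase(country) as country
      from companies;
quit;
```

If the answer did work this way, it returns `AUSTRALIA` and `HONG KONG` rather than any of the original mixed-case spellings, which is the behaviour the question wanted to avoid.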
I was thinking along the same lines as Zach, but thought I would look at the problem with a more elaborate example:
Now we can output either:
i. One from each category, i.e. three addresses?
OR
ii. Just one address? If so, which version should we prefer?
Implementing case 1 can be as simple as:
Implementing case 2 can be as simple as:
From SAS 9:
proc sort data=input_ds sortseq=linguistic(strength=primary);
by country; /* BY variable assumed from the question; PROC SORT requires one */
run;
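To turn that sort into a case-insensitive de-duplication, the same collation option can be combined with `NODUPKEY`. This is a sketch under assumptions: the BY variable `country` comes from the question, and the output name `distinct_countries` is invented here.

```sas
/* STRENGTH=PRIMARY compares base letters only, so case is ignored. */
/* NODUPKEY then keeps one row per country under that comparison. */
proc sort data=input_ds
          out=distinct_countries
          nodupkey
          sortseq=linguistic(strength=primary);
  by country;
run;
```

Writing to `out=` leaves `input_ds` unchanged, and the surviving rows keep their original spelling.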
I think regular expressions can help you with the pattern you want to match in your search string.
For the regular expression you can define a UDF; this tutorial shows how to prepare one: www.sqlteam.com/article/regular-expressions-in-t-sql
Thanks.