How to sort in SQL, ignoring articles ("the", "a", "an", etc.)

Posted on 2024-09-10 03:24:20

This comes up a lot, and I can see it has come up on StackOverflow for XSLT, Ruby, and Drupal, but I don't see it asked specifically for SQL.

So the question is, how do you sort titles correctly when they begin with "The", "A", or "An"?

One way is simply to TRIM() those strings:

ORDER BY TRIM( 
  LEADING 'a ' FROM 
  TRIM( 
    LEADING 'an ' FROM 
    TRIM( 
      LEADING 'the ' FROM LOWER( title ) 
      ) 
    ) 
  )

which was suggested on AskMeFi a while back (does it need that LOWER() function?).

I know I've also seen some kind of Case/Switch implementation of this but it's a little hard to Google for.
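
For reference, a CASE-based version might look like the sketch below. It is an illustration, not the AskMeFi suggestion: the table name `books` is hypothetical, and the `SUBSTRING(... FROM n)` form is the one MySQL and PostgreSQL accept. Checking the prefix explicitly also sidesteps a portability trap in the nested TRIM() version, since MySQL treats the TRIM argument as a whole string to strip while PostgreSQL treats it as a set of characters.

```sql
-- Sketch only: `books` is a hypothetical table; `title` is the column from the question.
SELECT title
FROM books
ORDER BY LOWER(
  CASE
    WHEN LOWER(title) LIKE 'the %' THEN SUBSTRING(title FROM 5)  -- drop 'the '
    WHEN LOWER(title) LIKE 'an %'  THEN SUBSTRING(title FROM 4)  -- drop 'an '
    WHEN LOWER(title) LIKE 'a %'   THEN SUBSTRING(title FROM 3)  -- drop 'a '
    ELSE title
  END
);
```

The outer LOWER() only matters when the column's collation is case-sensitive.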

Obviously there are a number of possible solutions. It would be good to have SQL gurus weigh in on the performance implications of each.

回梦 2024-09-17 03:24:20

One approach I've seen was to have two columns - one for display and the other for sorting:

description  |  sort_desc
-------------------------
The the      |  the, The
A test       |  test, A
I, Robot     |  i, Robot

I haven't done any real world testing, but this has the benefit of being able to use an index and doesn't require string manipulation every time you want to order by the description. Unless your database supports materialized views (which MySQL doesn't), implementing the logic as a computed column in a view wouldn't provide any benefit because you can't index the computed column.
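
A minimal sketch of the two-column idea, assuming a hypothetical table `items` whose display column is `description`. Unlike the sample rows above, it stores only the lowercased text without the leading article instead of moving the article to the end. The syntax is limited to what MySQL and PostgreSQL share.

```sql
-- Sketch only: `items`, `sort_desc`, and the index name are placeholder names.
ALTER TABLE items ADD COLUMN sort_desc VARCHAR(255);

-- Populate the sort key once: strip one leading article and lowercase the rest.
UPDATE items
SET sort_desc = LOWER(
  CASE
    WHEN LOWER(description) LIKE 'the %' THEN SUBSTRING(description FROM 5)
    WHEN LOWER(description) LIKE 'an %'  THEN SUBSTRING(description FROM 4)
    WHEN LOWER(description) LIKE 'a %'   THEN SUBSTRING(description FROM 3)
    ELSE description
  END
);

-- An ordinary index on the precomputed key lets ORDER BY avoid per-row string work.
CREATE INDEX idx_items_sort_desc ON items (sort_desc);

SELECT description
FROM items
ORDER BY sort_desc;
```

The sort column has to be refreshed whenever `description` changes, for example from a trigger or from application code.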

刘备忘录 2024-09-17 03:24:20

I've been using this for years, but can't remember where I found it:

SELECT 
CASE
    WHEN SUBSTRING_INDEX(Title, ' ', 1) IN ('a', 'an', 'the') 
    THEN CONCAT( SUBSTRING( Title, INSTR(Title, ' ') + 1 ), ', ', SUBSTRING_INDEX(Title, ' ', 1) ) 
    ELSE Title 
    END AS TitleSort,
Title AS OriginalTitle 
FROM yourtable 
ORDER BY TitleSort 

Yielding:

TitleSort                  | OriginalTitle
------------------------------------------------------
All About Everything       | All About Everything
Beginning Of The End, The  | The Beginning Of The End
Interesting Story, An      | An Interesting Story
Very Long Story, A         | A Very Long Story

肩上的翅膀 2024-09-17 03:24:20

For Postgres specifically, you can use regexp_replace to do the work for you:

BEGIN;
CREATE TEMPORARY TABLE book (name VARCHAR NOT NULL) ON COMMIT DROP;
INSERT INTO book (name) VALUES ('The Hitchhiker’s Guide to the Galaxy');
INSERT INTO book (name) VALUES ('The Restaurant at the End of the Universe');
INSERT INTO book (name) VALUES ('Life, the Universe and Everything');
INSERT INTO book (name) VALUES ('So Long, and Thanks for All the Fish');
INSERT INTO book (name) VALUES ('Mostly Harmless');
INSERT INTO book (name) VALUES ('A book by Douglas Adams');
INSERT INTO book (name) VALUES ('Another book by Douglas Adams');
INSERT INTO book (name) VALUES ('An omnibus of books by Douglas Adams');

SELECT name FROM book ORDER BY name;
SELECT name, regexp_replace(lower(name), '^(an?|the) (.*)$', '\2, \1') FROM book ORDER BY 2;
SELECT name FROM book ORDER BY regexp_replace(lower(name), '^(an?|the) (.*)$', '\2, \1');
COMMIT;

指尖凝香 2024-09-17 03:24:20

I can only speak for SQL Server: you use LTRIM within CASE statements. No LOWER function is needed because selections are not case-sensitive by default. However, if you want to ignore articles, I would suggest using a noise word dictionary and setting up a full-text indexing catalog. I am unsure whether other SQL implementations support this.
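
A rough T-SQL sketch of the LTRIM-within-CASE idea, assuming a hypothetical table `dbo.Books` with a `Title` column and a case-insensitive collation (so `LIKE 'The %'` also matches titles starting with 'the '). The full-text alternative is not shown.

```sql
-- T-SQL sketch only: dbo.Books and Title are placeholder names.
SELECT Title
FROM dbo.Books
ORDER BY
  CASE
    -- SUBSTRING needs an explicit length in T-SQL; LEN(Title) is always long enough.
    WHEN Title LIKE 'The %' THEN LTRIM(SUBSTRING(Title, 5, LEN(Title)))
    WHEN Title LIKE 'An %'  THEN LTRIM(SUBSTRING(Title, 4, LEN(Title)))
    WHEN Title LIKE 'A %'   THEN LTRIM(SUBSTRING(Title, 3, LEN(Title)))
    ELSE Title
  END;
```

LTRIM here only cleans up any extra spaces left after the article is cut off.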

旧时光的容颜 2024-09-17 03:24:20

LOWER is needed. While SELECT is not case-sensitive, ORDER BY is.

心凉 2024-09-17 03:24:20

Try the following:

ORDER BY
-- Note: replace() removes every occurrence in the string, not only a leading article.
replace(replace(replace(YOURCOLUMN, 'The ', ''), 'An ', ''), 'A ', '')

Not tested!
