How do I sort an SQLite query while ignoring articles ("the", "a", etc.)?

Posted 2024-09-19 00:37:17

I'm using C# to display a list of movie titles that I am calling from an SQLite database. Currently, I'm using a custom ListBox class that has a function to sort the text stripping the word 'The' from the beginning of every item. However, it doesn't exactly seem to be the simplest way to do it, since it calls from the SQLite database and then sorts. I'd prefer to cut it down to just one step, hopefully sorting straight from the database in my "SELECT" query.

I've done some searching on this and found some suggestions, including creating an extra sort-by column in the database. While this is certainly a possibility, I'm wondering if there are any simpler options that don't require inserting almost identical duplicate information (especially if the database becomes larger). I'm pretty new to SQLite, but I've read something about creating a collation function that can be used to define a custom ordering. However, I'm not sure whether this is an appropriate use for it, and I can't seem to find any help with implementing it in C#.

Was hoping someone might be able to share some guidance. If an extra sorting column is the best way to go, then that is what I shall do.

Comments (6)

忆梦 2024-09-26 00:37:33

Create a virtual column (result of a function that can be implemented in C#) and sort on this virtual column. The function could move "The" to the end as in "Three Musketeers, The" or discard "The", whatever you want it to do.
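One way this could look is sketched below. It assumes the Microsoft.Data.Sqlite package, a table named movies with a non-null title column, and a SQL function name sort_title; all three are illustrative choices, not something stated in the answer. The C# method moves a leading article to the end, and CreateFunction exposes it to SQL so the ORDER BY can use it as a computed sort key.

```csharp
using System;
using Microsoft.Data.Sqlite; // assumption: Microsoft.Data.Sqlite NuGet package

public static class SortTitleDemo
{
    // Moves a leading article to the end:
    // "The Three Musketeers" becomes "Three Musketeers, The".
    public static string MoveArticleToEnd(string title)
    {
        string[] articles = { "The", "An", "A" };
        foreach (string article in articles)
        {
            // Require a space after the article so "Avatar" is left alone.
            if (title.Length > article.Length + 1
                && title.StartsWith(article + " ", StringComparison.OrdinalIgnoreCase))
            {
                string rest = title.Substring(article.Length + 1).TrimStart();
                return rest + ", " + title.Substring(0, article.Length);
            }
        }
        return title;
    }

    // Prints titles ordered by the computed key. The connection must be open,
    // and the movies table (hypothetical) must already exist.
    public static void PrintSorted(SqliteConnection connection)
    {
        // Register the C# method as a SQL function (sort_title is a made-up name).
        connection.CreateFunction<string, string>(
            "sort_title", MoveArticleToEnd, isDeterministic: true);

        using SqliteCommand command = connection.CreateCommand();
        command.CommandText =
            "SELECT title FROM movies ORDER BY sort_title(title) COLLATE NOCASE";
        using SqliteDataReader reader = command.ExecuteReader();
        while (reader.Read())
        {
            Console.WriteLine(reader.GetString(0));
        }
    }
}
```

The function is registered per connection, so it has to be created again on every connection that runs the query.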

稀香 2024-09-26 00:37:32

You could try building a table that supports full-text searching (using the FTS module) on the title. Then you'll be able to do fast searches on any word in the title without much extra work on your part. For example, a user query of good bad ugly might produce “The Good, the Bad and the Ugly” as one of its first results. The extra storage cost is generally about a quarter of the length of the text itself, but it might be more for your dataset, since titles aren't full English text. You also need to spend time building those extra indices (obviously not against your main dataset on a live system), but that shouldn't be too big a problem.
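A minimal sketch of such a table follows, assuming the Microsoft.Data.Sqlite package and an SQLite build with the FTS5 module enabled. The table name movie_fts and the sample rows are made up for illustration.

```csharp
using System.Collections.Generic;
using Microsoft.Data.Sqlite; // assumption: Microsoft.Data.Sqlite NuGet package

public static class TitleSearchDemo
{
    // Builds a small in-memory FTS5 table and returns the titles matching the query.
    public static List<string> Search(string query)
    {
        using var connection = new SqliteConnection("Data Source=:memory:");
        connection.Open();

        using (SqliteCommand setup = connection.CreateCommand())
        {
            // movie_fts is a hypothetical table name; FTS5 must be compiled in.
            setup.CommandText = @"
                CREATE VIRTUAL TABLE movie_fts USING fts5(title);
                INSERT INTO movie_fts (title) VALUES
                    ('The Good, the Bad and the Ugly'),
                    ('The Three Musketeers'),
                    ('A Fistful of Dollars');";
            setup.ExecuteNonQuery();
        }

        var results = new List<string>();
        using SqliteCommand search = connection.CreateCommand();
        // Bare words in an FTS5 query are combined with an implicit AND.
        search.CommandText =
            "SELECT title FROM movie_fts WHERE movie_fts MATCH $query ORDER BY rank";
        search.Parameters.AddWithValue("$query", query);
        using SqliteDataReader reader = search.ExecuteReader();
        while (reader.Read())
        {
            results.Add(reader.GetString(0));
        }
        return results;
    }
}
```

Calling TitleSearchDemo.Search("good bad ugly") should match only the first sample row. This helps with searching by any word in a title; it does not by itself change the order of a plain listing.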

最美的太阳 2024-09-26 00:37:31

Within C# Code

If you wanted to do this within C#, use LINQ to do the ordering for you. I've posted a full sample on PasteBin. This will allow you to:

  • avoid duplicating data in your database
  • take advantage of DB indexes as you normally would, no matter which RDBMS
  • put in noise words in a config file, thereby reducing downtime/rebuild/redeploy when modifying the list
  • ensure a solution is more readable in your client code
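
DropDownList1.DataSource = myBooks.OrderBy(n => ReplaceNoise(n.Title));

// Requires "using System;" for StringComparison.
public string ReplaceNoise(string input)
{
     // Leading noise words; the trailing space keeps titles such as
     // "Avatar" or "Theory of Everything" from losing their first letters.
     string[] noise = new string[] { "the ", "an ", "a " };

     //surely this could be LINQ'd 
     foreach (string n in noise)
     {
         // Case-insensitive prefix check without building a lowercased copy.
         if (input.StartsWith(n, StringComparison.OrdinalIgnoreCase))
         {
             return input.Substring(n.Length).Trim();
         }
     }
     return input;
}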
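
The comment in the snippet above wonders whether the loop could be written with LINQ. One possible shape is sketched here as a reusable comparer; the class name ArticleInsensitiveComparer and its SortKey method are hypothetical. The noise words are passed in, so they could come from a config file as the list above suggests.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: compares titles while ignoring one leading noise word.
public sealed class ArticleInsensitiveComparer : IComparer<string>
{
    private readonly string[] _articles;

    public ArticleInsensitiveComparer(IEnumerable<string> articles)
    {
        _articles = articles.ToArray();
    }

    // Returns the title without its leading noise word, if it has one.
    public string SortKey(string title)
    {
        string trimmed = (title ?? string.Empty).TrimStart();
        // The noise word must be followed by a space, so "Avatar" is untouched.
        string match = _articles.FirstOrDefault(a =>
            trimmed.StartsWith(a + " ", StringComparison.OrdinalIgnoreCase));
        return match == null
            ? trimmed
            : trimmed.Substring(match.Length + 1).TrimStart();
    }

    public int Compare(string x, string y) =>
        string.Compare(SortKey(x), SortKey(y), StringComparison.OrdinalIgnoreCase);
}
```

It could then be used as myBooks.OrderBy(n => n.Title, new ArticleInsensitiveComparer(new[] { "the", "an", "a" })).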

Within your SQLite statement

How about simply stripping the noise words in the ORDER BY clause? It's an ugly first step; strongly consider adding a new column that stores this value for sorting purposes. Note that REPLACE is case-sensitive and matches anywhere in the string, so each noise word needs its trailing space to avoid deleting letters inside other words.

ORDER BY REPLACE(REPLACE([title],'The ',''), 'A ', '')

Admittedly, this gets ugly when you end up with this:

REPLACE(REPLACE(REPLACE(REPLACE([title],'The ',''),'A ',''),'of ',''),'by ','')

没有伤那来痛 2024-09-26 00:37:28

You could store each title in 2 parts: title and prefix.

With SQLite you can combine 2 string values via the || operator also known as the concatenate operator.

Here's an example:

SELECT prefix || ' ' || title FROM movies ORDER BY title

You can also use ltrim in case prefix is empty, so you don't have a space at the front:

SELECT ltrim(prefix || ' ' || title) FROM movies ORDER BY title

Another alternative is to store the prefix at the end of the title. For example at a lot of movie stores you will see something like:

Three Musketeers, The
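
To populate the two parts from existing titles, a helper along these lines might be used. This is only a sketch: the class name TitleParts and the fixed article list are assumptions, not part of the answer.

```csharp
using System;

// Hypothetical helper for the prefix/title split described above.
public static class TitleParts
{
    private static readonly string[] Articles = { "The", "An", "A" };

    // Splits "The Three Musketeers" into ("The", "Three Musketeers").
    // Titles without a leading article get an empty prefix.
    public static (string Prefix, string Title) Split(string fullTitle)
    {
        string trimmed = fullTitle.Trim();
        foreach (string article in Articles)
        {
            // The article must be followed by a space to count as a prefix.
            if (trimmed.StartsWith(article + " ", StringComparison.OrdinalIgnoreCase))
            {
                return (trimmed.Substring(0, article.Length),
                        trimmed.Substring(article.Length + 1).TrimStart());
            }
        }
        return (string.Empty, trimmed);
    }

    // Mirrors ltrim(prefix || ' ' || title) from the query above.
    public static string Display(string prefix, string title) =>
        (prefix + " " + title).TrimStart();
}
```

Split would run once when a row is inserted, and Display (or the SQL expression above) rebuilds the full title for presentation.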

黑凤梨 2024-09-26 00:37:26

Here is the solution:

ORDER BY (CASE 
    WHEN sortTitle LIKE 'the %' THEN substr(sortTitle,5) 
    WHEN sortTitle LIKE 'a %' THEN substr(sortTitle,3) 
    WHEN sortTitle LIKE 'an %' THEN substr(sortTitle,4) 
    ELSE sortTitle END)
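
The question also asks about doing this with a collation function from C#. The same article-stripping rule as the CASE expression could be sketched as a custom collation, assuming the Microsoft.Data.Sqlite package. The collation name NOARTICLE, the movies table, and the helper names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Data.Sqlite; // assumption: Microsoft.Data.Sqlite NuGet package

public static class NoArticleCollationDemo
{
    // Drops one leading "the ", "an " or "a " (case-insensitive), if present.
    public static string StripArticle(string title)
    {
        foreach (string article in new[] { "the ", "an ", "a " })
        {
            if (title.StartsWith(article, StringComparison.OrdinalIgnoreCase))
            {
                return title.Substring(article.Length);
            }
        }
        return title;
    }

    // Loads the titles into an in-memory table and reads them back
    // ordered by the custom collation.
    public static List<string> SortedTitles(IEnumerable<string> titles)
    {
        using var connection = new SqliteConnection("Data Source=:memory:");
        connection.Open();

        // NOARTICLE is a hypothetical collation name.
        connection.CreateCollation("NOARTICLE", (x, y) =>
            string.Compare(StripArticle(x), StripArticle(y),
                           StringComparison.OrdinalIgnoreCase));

        using (SqliteCommand create = connection.CreateCommand())
        {
            create.CommandText = "CREATE TABLE movies (title TEXT NOT NULL)";
            create.ExecuteNonQuery();
        }
        foreach (string title in titles)
        {
            using SqliteCommand insert = connection.CreateCommand();
            insert.CommandText = "INSERT INTO movies (title) VALUES ($title)";
            insert.Parameters.AddWithValue("$title", title);
            insert.ExecuteNonQuery();
        }

        var sorted = new List<string>();
        using SqliteCommand select = connection.CreateCommand();
        select.CommandText =
            "SELECT title FROM movies ORDER BY title COLLATE NOARTICLE";
        using SqliteDataReader reader = select.ExecuteReader();
        while (reader.Read())
        {
            sorted.Add(reader.GetString(0));
        }
        return sorted;
    }
}
```

A collation registered this way exists only on the connection that created it, so it has to be registered again on each new connection before the query runs.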

跨年 2024-09-26 00:37:25

To avoid inserting duplicate data, how about having two columns? TITLE_PREFIX is usually empty but sometimes contains "The " or "A " (no index on this column), and TITLE contains the title without "The " or "A " (this is the column you create the index on). To display the data, you combine TITLE_PREFIX and TITLE, but you search on TITLE alone.
