SQL full-text indexer, exact matching and escaping

Posted on 2024-10-12 22:32:32

I'm trying to replace a Lucene.NET index built on the KeywordAnalyzer with one based on SQL Server 2008 R2.

I have a table that contains custom indexed fields that I need to query. The value of the Index column (see below) is a combination of the name/value pairs of the custom index fields from a series of .NET types; the actual values are pulled from attributes at run time, because the structure is not known in advance.

I need to be able to search for a set of name/value pairs, combined with ANDs and ORs, and return the rows that match the query.

Id          Index
====================================================================
1           [Descriptor.Type]=[5][Descriptor.Url]=[/]
2           [Descriptor.Type]=[23][Descriptor.Url]=[/test]
3           [Descriptor.Type]=[25][Descriptor.Alternative]=[hello]
4           [Descriptor.Type]=[26][Descriptor.Alternative]=[hello][Descriptor.FriendlyName]=[this is a test]
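Judging from the sample rows, each Index value is simply the fields' `[name]=[value]` segments concatenated with no separator. A minimal sketch of building such a value, assuming that format; the helper name `build_index_value` is hypothetical, and Python stands in for the .NET code, which isn't shown. Note that a name or value containing `]` would make this format ambiguous.

```python
# Hypothetical helper: builds an Index column value from (name, value) pairs,
# assuming the "[name]=[value]" format seen in the sample rows.
def build_index_value(fields: list[tuple[str, str]]) -> str:
    """Concatenate each pair as [name]=[value], with no separator between pairs."""
    return "".join(f"[{name}]=[{value}]" for name, value in fields)


row4 = build_index_value([
    ("Descriptor.Type", "26"),
    ("Descriptor.Alternative", "hello"),
    ("Descriptor.FriendlyName", "this is a test"),
])
print(row4)
# [Descriptor.Type]=[26][Descriptor.Alternative]=[hello][Descriptor.FriendlyName]=[this is a test]
```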

A simple query looks like this:

select * from Indices where contains ([Index], '[Descriptor.Url]=[/]');

That query results in the following error:

Msg 7630, Level 15, State 2, Line 1
Syntax error near '[' in the full-text search condition '[Descriptor.Url]=[/]'.

So with that in mind, I altered the data in the Index column to use | instead of [ and ]:

select * from Indices where contains ([Index], '|Descriptor.Url|=|/|');

Although that query is now valid, running it returns every row that contains Descriptor.Url with a value starting with /, instead of only the records that match exactly (in this case, exactly one).

My question is, how can I escape the query to account for the [ and ] and ensure that just the exact matching row is returned?

A more complex query looks a little like this:

select * from Indices where contains ([Index], '[Descriptor.Type]=[12] AND ([Descriptor.Url]=[/] OR [Descriptor.Url]=[/test])');

Thanks,
Kieron

过去的过去 2024-10-19 22:32:32

Your main issue is that you are using a SQL Server word breaker together with the CONTAINS syntax. By default, SQL Server word breakers eliminate punctuation and normalize numbers, dates, URLs, email addresses and the like. They also lowercase everything and stem words.

So, for your input string:

[Descriptor.Type]=[5][Descriptor.Url]=[/]

You would have the following tokens added to the index (along with their positions):

descriptor type nn5 5 descriptor url

(Note: nn5 is a way to simplify querying numbers and dates given in different formats; the original number is also indexed at the same position.)

So, as you can see, the punctuation is not even stored in the full-text index, and thus there is no way to query it using the CONTAINS predicate.

So your statement:

select * from Indices where contains ([Index], '|Descriptor.Url|=|/|'); 

would actually be normalized down to "descriptor url" by the query generator before it is submitted to the full-text index, which is why you get hits on every entry that has "descriptor" next to "url", ignoring punctuation.
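To make this concrete, here is a rough Python approximation of that tokenization. It is an assumption-laden sketch, not SQL Server's actual word breaker: it just keeps runs of letters and digits and lowercases them, ignoring the `nn5`-style extra tokens and stemming. The helper names `tokens` and `phrase_match` are hypothetical. Even so, it shows why the `/` in the query vanishes and why rows 1 and 2 both match.

```python
import re


# Rough approximation of a word breaker (NOT SQL Server's implementation):
# keep runs of letters/digits, lowercase them, and drop all punctuation.
def tokens(text: str) -> list[str]:
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def phrase_match(row_text: str, query: str) -> bool:
    """True if the query's tokens appear consecutively among the row's tokens."""
    row, q = tokens(row_text), tokens(query)
    return any(row[i:i + len(q)] == q for i in range(len(row) - len(q) + 1))


rows = {
    1: "[Descriptor.Type]=[5][Descriptor.Url]=[/]",
    2: "[Descriptor.Type]=[23][Descriptor.Url]=[/test]",
    3: "[Descriptor.Type]=[25][Descriptor.Alternative]=[hello]",
    4: "[Descriptor.Type]=[26][Descriptor.Alternative]=[hello]"
       "[Descriptor.FriendlyName]=[this is a test]",
}

query = "|Descriptor.Url|=|/|"
print(tokens(query))  # ['descriptor', 'url']
print([rid for rid, text in rows.items() if phrase_match(text, query)])  # [1, 2]
```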

What you need is a LIKE predicate.
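A minimal sketch of that approach, assuming SQL Server's LIKE rules: `%` and `_` are wildcards, `[` opens a character class, and enclosing a character in brackets (`[[]`, `[%]`, `[_]`) makes it literal, while a lone `]` is already literal. The helper names are hypothetical, and the `?` placeholders assume a qmark-style DB-API driver such as pyodbc. Because each pattern includes the closing `]` after the value, `[Descriptor.Url]=[/]` will not match `[Descriptor.Url]=[/test]`.

```python
# Hypothetical helpers that build a parameterized T-SQL WHERE clause from
# name/value pairs, matching each [name]=[value] segment with LIKE.

# In a T-SQL LIKE pattern, '%', '_' and '[' are special; enclosing each in
# brackets makes it a literal. A lone ']' is already treated literally.
_LIKE_SPECIAL = {"[": "[[]", "%": "[%]", "_": "[_]"}


def escape_like(text: str) -> str:
    """Escape text so every character is matched literally by T-SQL LIKE."""
    return "".join(_LIKE_SPECIAL.get(ch, ch) for ch in text)


def pair(name: str, value: str) -> tuple[str, list[str]]:
    """Predicate matching the exact segment [name]=[value] anywhere in [Index]."""
    pattern = "%" + escape_like(f"[{name}]=[{value}]") + "%"
    return "[Index] LIKE ?", [pattern]


def _join(op: str, clauses) -> tuple[str, list[str]]:
    sql = op.join(f"({clause_sql})" for clause_sql, _ in clauses)
    params = [p for _, clause_params in clauses for p in clause_params]
    return sql, params


def all_of(*clauses):
    """AND together (sql, params) clauses."""
    return _join(" AND ", clauses)


def any_of(*clauses):
    """OR together (sql, params) clauses."""
    return _join(" OR ", clauses)


# The "more complex query" from the question, expressed with the helpers.
where, params = all_of(
    pair("Descriptor.Type", "12"),
    any_of(pair("Descriptor.Url", "/"), pair("Descriptor.Url", "/test")),
)
print("SELECT * FROM Indices WHERE " + where)
print(params)
```

Passing the patterns as parameters avoids having to double single quotes inside values. The sketch assumes names and values never contain `]` themselves; also, a pattern with a leading `%` cannot use an index seek, so SQL Server will scan the column for each predicate.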

半边脸i 2024-10-19 22:32:32

Using "|" as your delimiter causes the CONTAINS query to treat it as OR, which is why you are getting unexpected results. You should be able to escape the bracket like so:

SELECT * FROM Indices WHERE 
    contains ([Index], '[[]Descriptor.Type]=[[]12]') 