对文档和相关数据进行全文检索 mssql

发布于 2024-09-25 12:39:00 字数 335 浏览 4 评论 0原文

目前正在构建知识库应用程序,我有点不确定存储和索引文档信息的最佳方式。

用户上传文档,并在执行此操作时从下拉列表中选择多个选项(例如类别、主题、区域...,注意这些并非全部是强制性的),他们还输入一些关键字和文档的描述。目前,所选类别(和其他类别)使用类别表中的 id 作为外键存储在文档表中。 我们想要做的是不仅对文档所在的 varchar(max) 列中的信息进行 FREETEXTTABLE 或 CONTAINSTABLE 操作,还对类别名称、主题名称和区域名称等进行操作。

我查看了该选项创建索引视图的方法,但由于针对类别列的 LEFT JOIN,这是不可能的。因此,我不确定如何能够做到这一点,任何想法将不胜感激。

Currently in the middle of building a knowledge base app and am a bit unsure on the best way to store and index the document information.

The user uploads the document and when doing so selects a number of options from dropdown lists (such as category,topic,area..., note these are not all mandatory) they also enter some keywords and a description of the document. At the moment the category (and others) selected is stored as foreign key in the documents table using the id from the categories table.
What we want to be able to do is do a FREETEXTTABLE or CONTAINSTABLE on not only the information within the varchar(max) column where the document is located but also on the category name, topic name and area name etc.

I looked at the option of creating an indexed view but this wasn't possible due to the LEFT JOIN against the category column. So I'm not sure how to go about being able to do this any ideas would be most appreciated.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

丶情人眼里出诗心の 2024-10-02 12:39:00

我假设您想将这两个搜索组合在一起。例如,查找包含文本“foo”且位于“汽车维修”类别中的所有文档。

也许您不需要完整显示附加数据,只需使用 = 或类似的符号即可?如果附加数据相当小,则可能无法保证全文的复杂性。

但是,如果您想对两者使用全文,请使用存储过程将结果汇总在一起。这里的技巧是暂存结果,而不是试图立即返回结果集。

这是一个粗略的起点。

-- a staging table variable for the document results
declare @documentResults table (
    Id int,       
    Rank int
)

insert into @documentResults
select d.Id, results.[rank]
from containstable (documents, (text), '"foo*"') results
inner join documents d on results.[key] = d.Id

-- now you have all of the primary keys that match the search criteria
-- whittle this list down to only include keys that are in the correct categories

-- a staging table variable for each the metadata results
declare @categories table (
    Id int        
)

insert into @categories
select results.[KEY]
from containstable (Categories, (Category), '"Automotive Repair*"') results

declare @topics table (
    Id int        
)

insert into @topics
select results.[KEY]
from containstable (Topics, (Topic), '"Automotive Repair*"') results

declare @areas table (
    Id int        
)

insert into @areas
select results.[KEY]
from containstable (Areas, (Area), '"Automotive Repair*"') results


select d.text, c.category, t.topic, a.area
from @results r
inner join documents d on d.Id = r.Id
inner join @categories c on c.Id = d.CategoryId
inner join @topics t on t.Id = d.TopicId
inner join @areas a on a.Id = d.AreaId

I assume that you want to AND the two searches together. For example find all documents containing the text "foo" AND in category the "Automotive Repair".

Perhaps you don't need to full text the additional data and can just use = or like? If the additional data is reasonably small it may not warrant the complication of full text.

However, if you want to use full text on both, use a stored procedure that pulls the results together for you. The trick here is to stage the results rather than trying to get a result set back straight away.

This is rough starting point.

-- a staging table variable for the document results
declare @documentResults table (
    Id int,       
    Rank int
)

insert into @documentResults
select d.Id, results.[rank]
from containstable (documents, (text), '"foo*"') results
inner join documents d on results.[key] = d.Id

-- now you have all of the primary keys that match the search criteria
-- whittle this list down to only include keys that are in the correct categories

-- a staging table variable for each the metadata results
declare @categories table (
    Id int        
)

insert into @categories
select results.[KEY]
from containstable (Categories, (Category), '"Automotive Repair*"') results

declare @topics table (
    Id int        
)

insert into @topics
select results.[KEY]
from containstable (Topics, (Topic), '"Automotive Repair*"') results

declare @areas table (
    Id int        
)

insert into @areas
select results.[KEY]
from containstable (Areas, (Area), '"Automotive Repair*"') results


select d.text, c.category, t.topic, a.area
from @results r
inner join documents d on d.Id = r.Id
inner join @categories c on c.Id = d.CategoryId
inner join @topics t on t.Id = d.TopicId
inner join @areas a on a.Id = d.AreaId
心安伴我暖 2024-10-02 12:39:00

您可以为全文索引创建一个新列,其中包含原始文档以及作为元数据附加的类别。然后对该列的搜索可以同时搜索文档和类别。您需要发明一个标签系统,使它们在文档中保持唯一,但标签本身不太可能用作搜索短语。也许是这样的:

This is my regular document text. <FTCategory: Automotive Repair> <FTCategory: Transmissions>

You could create a new column for your full text index which would contain the original document plus the categories appended as metadata. Then a search on that column could search both the document and the categories simultaneously. You'd need to invent a tagging system that would keep them unique within your document yet the tags would not be likely to be used as search phrases themselves. Perhaps something like:

This is my regular document text. <FTCategory: Automotive Repair> <FTCategory: Transmissions>
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文