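In Azure Search, can an indexer combine information from different documents into a single index item without the documents overwriting each other?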
My goal is to create a single searchable Azure index that has all of the relevant information currently stored in many different SQL tables.
I'm also using an Azure Cognitive Service to add additional info from related documents. Each document is tied to only a single item in my Index, but each item in the index will be tied to many documents.
According to my understanding, if two documents have the same value for the indexer's Key, then the index will overwrite the extracted information from the first document with the information extracted from the second. I'm hoping there's a way to append the information instead of overwriting it. For example: if two documents relate to the same index item, I want the values mapped to keyphrases for that item to include the keyphrases found in the first document and the keyphrases found in the second document.
Is this possible? Is there a different way I should be approaching this?
If it is possible, can I do it without having duplicate values?
Currently I have multiple indexes and I'm combining the search results from each one, but this seems inefficient and likely messes up the default scoring algorithm.
Every code example I find only has one document for each index item and doesn't address my problem. Admittedly, I haven't tried to set up my index as described above, because it would take a lot of refactoring, and I'm confident it would just overwrite itself.
I am currently creating my indexes and indexers programmatically using .NET. I'm assuming my code isn't relevant to my question, but I can provide it if need be.
Thank you so much! I'd appreciate any feedback you can give.
Edit: I'm thinking about creating a custom skill to do the aggregation for me, but I don't know how the skill would access everything it needs. It needs the extracted info from the current document, and it needs the previously aggregated info from previous documents. I guess the custom skill could perform a search on the index and get the item that way, but that sounds dangerously hacky. Any thoughts would be appreciated.
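Instead of a custom skill, the aggregation can also be done outside the indexer. The minimal sketch below assumes the extracted key phrases are already available for each source document. It groups them by the index key, keeps only the first occurrence of each phrase, and emits one document per key. The function name aggregate_keyphrases and the field names "itemId" and "keyPhrases" are hypothetical and do not come from any real schema.

```python
from collections import defaultdict
from typing import Iterable


def aggregate_keyphrases(extracted: Iterable[dict], key_field: str = "itemId") -> list:
    """Combine key phrases from many source documents into one entry per index key.

    Hypothetical input shape: each dict has a key field (default "itemId")
    and an optional "keyPhrases" list. Phrases keep their first-seen order,
    and duplicates are dropped.
    """
    merged = defaultdict(dict)  # index key -> dict used as an ordered set of phrases
    for doc in extracted:
        phrases = merged[doc[key_field]]
        for phrase in doc.get("keyPhrases", []):
            phrases.setdefault(phrase, None)  # only the first occurrence is kept
    return [{key_field: key, "keyPhrases": list(phrases)} for key, phrases in merged.items()]


if __name__ == "__main__":
    docs = [
        {"itemId": "42", "keyPhrases": ["invoice", "budget"]},
        {"itemId": "42", "keyPhrases": ["budget", "travel"]},
        {"itemId": "7", "keyPhrases": ["contract"]},
    ]
    print(aggregate_keyphrases(docs))
```

This approach only works if all source documents for a key are available in the same pass. In that case each index item is written once with its complete, deduplicated list, so no later document can overwrite an earlier one.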
Comments (1)
Pasting from docs:
Indexing actions: upload, merge, mergeOrUpload, delete
You can control the type of indexing action on a per-document basis, specifying whether the document should be uploaded in full, merged with existing document content, or deleted.
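In a REST indexing batch, this per-document choice is expressed through the "@search.action" property, which each entry in the "value" array carries separately. The sketch below only builds the JSON body and does not call the service. The helper name build_index_batch, the key field "id", the "keyPhrases" field, and the sample values are all hypothetical.

```python
import json

# The indexing actions listed in the docs, as used in "@search.action".
VALID_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}


def build_index_batch(actions: list) -> str:
    """Build a JSON body for an index's docs/index REST endpoint.

    Each (action, document) pair becomes one entry in "value", so the action
    is chosen per document. Field names inside the documents are hypothetical.
    """
    value = []
    for action, document in actions:
        if action not in VALID_ACTIONS:
            raise ValueError(f"unsupported indexing action: {action}")
        value.append({"@search.action": action, **document})
    return json.dumps({"value": value}, indent=2)


if __name__ == "__main__":
    body = build_index_batch([
        ("mergeOrUpload", {"id": "42", "keyPhrases": ["invoice", "budget"]}),
        ("delete", {"id": "7"}),
    ])
    # This body would be POSTed to .../indexes/<index>/docs/index?api-version=...
    print(body)
```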
Whether you use the REST API or an SDK, the following document operations are supported for data import:
upload, similar to an "upsert" where the document is inserted if it is new, and updated or replaced if it exists. If the document is missing values that the index requires, the document field's value is set to null.
merge updates a document that already exists, and fails a document that cannot be found. Merge replaces existing values. For this reason, be sure to check for collection fields that contain multiple values, such as fields of type Collection(Edm.String). For example, if a tags field starts with a value of ["budget"] and you execute a merge with ["economy", "pool"], the final value of the tags field is ["economy", "pool"]. It won't be ["budget", "economy", "pool"].
mergeOrUpload behaves like merge if the document exists, and upload if the document is new.
delete removes the entire document from the index. If you want to remove an individual field, use merge instead, setting the field in question to null.
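The toy in-memory model below illustrates the collection behavior. It is not the Azure service, and every name in it is hypothetical. It first reproduces the tags example above. It then shows one possible workaround: read the stored collection, union it with the new values while dropping duplicates, and send the combined list with merge or mergeOrUpload.

```python
def apply_action(index: dict, action: str, doc: dict, fields: list, key: str = "id") -> None:
    """Toy in-memory model of the documented indexing actions (not the Azure service).

    index maps key values to stored documents; fields lists the index's field names.
    """
    doc_key = doc[key]
    if action == "upload" or (action == "mergeOrUpload" and doc_key not in index):
        # upload inserts or replaces the whole document; fields it omits become None (null).
        index[doc_key] = {f: doc.get(f) for f in fields}
    elif action in ("merge", "mergeOrUpload"):
        if doc_key not in index:
            raise KeyError(f"merge failed: document {doc_key!r} not found")
        # merge overwrites every supplied field, including collections; it never appends.
        index[doc_key].update(doc)
    elif action == "delete":
        # delete drops the whole document (a missing key is simply ignored in this model).
        index.pop(doc_key, None)
    else:
        raise ValueError(f"unsupported action: {action}")


def union_payload(index: dict, doc_key: str, field: str, new_values: list, key: str = "id") -> dict:
    """Build a merge payload whose collection is stored values plus new values, without duplicates."""
    existing = index.get(doc_key, {}).get(field) or []
    combined = list(dict.fromkeys([*existing, *new_values]))  # keeps first-seen order
    return {key: doc_key, field: combined}


if __name__ == "__main__":
    fields = ["id", "tags"]
    index: dict = {}
    apply_action(index, "upload", {"id": "1", "tags": ["budget"]}, fields)
    apply_action(index, "merge", {"id": "1", "tags": ["economy", "pool"]}, fields)
    print(index["1"]["tags"])  # ['economy', 'pool'] ("budget" is gone)

    payload = union_payload(index, "1", "tags", ["pool", "budget"])
    apply_action(index, "mergeOrUpload", payload, fields)
    print(index["1"]["tags"])  # ['economy', 'pool', 'budget']
```

On a real index, the read step would be a document lookup by key. This read-modify-write sequence is not atomic. If several writers can update the same key at the same time, one update can silently replace another, so writes for a given key would need to be serialized, or the values aggregated before pushing.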