XQuery on MarkLogic using OR
This is a newbie MarkLogic question. Imagine an XML structure like this, a condensation of my real business problem:
<Person id="1">
<Name>Bob</Name>
<City>Oakland</City>
<Phone>2122931022</Phone>
<Phone>3123032902</Phone>
</Person>
Note that a document can and will have multiple Phone elements.
I have a requirement to return information from EVERY document that has a Phone element that matches ANY of a list of phone numbers. The list may have a couple of dozen phone numbers in it.
I have tried this:
let $a := cts:word-query("3738494044")
let $b := cts:word-query("2373839383")
let $c := cts:word-query("3933849383")
let $or := cts:or-query( ($a, $b, $c) )
return cts:search(/Person/Phone, $or)
which does the query properly, but it returns a sequence of Phone elements inside a Results element. My goal is instead to return all the Name and City elements along with the id attribute from the Person element, for every matching document. Example:
<results>
<match id="18" phone="2123339494" name="bob" city="oakland"/>
<match id="22" phone="3940594844" name="mary" city="denver"/>
etc...
</results>
So I think I need some form of cts:search that allows both this boolean capability and lets me specify what part of each document gets returned. At that point I could further process the result with XPath. I need to do this efficiently; for example, I think it would NOT be efficient to return a list of document URIs and then query each document in a loop. Thanks!
Your approach is not as bad as you might think. There are only a few changes necessary to make it work as you like.
First of all, you are better off using cts:element-value-query instead of cts:word-query. It will allow you to limit the searched values to a specific element. It performs best when you add an element range index for that element, but that is not required; it can also rely on the always-present word index.

Secondly, there is no need for the cts:or-query. Both cts:word-query and cts:element-value-query (as well as all other related functions) accept multiple search strings as one sequence argument, and they automatically treat them as an or-query.

Thirdly, the phone numbers are your 'primary key' in the result, so returning a list of all matching Phone elements is the way to go. You just need to realize that the resulting Phone elements are still aware of where they came from. You can easily use XPath to navigate to the parent and siblings.

Fourthly, there is nothing against looping over the search results. It may sound a bit weird, but it doesn't cost much extra performance; in MarkLogic Server it is pretty much negligible. Most performance could be lost when you try to return many results (more than several thousand), in which case most of the time goes into serializing them all. If it is likely you will have to handle lots of search results, it is wise to start using pagination straight away.
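A minimal sketch of the first, second and fourth points, assuming the Person documents are stored without a namespace; $numbers, $start and $page-size are hypothetical names chosen for illustration. A single cts:element-value-query over a sequence of numbers should behave like an or-query of one value query per number, and subscripting the cts:search call is a common way to page through results in MarkLogic.

```xquery
xquery version "1.0-ml";

(: Hypothetical list of numbers to look for; any one of them may match. :)
let $numbers := ("3738494044", "2373839383", "3933849383")

(: One value query over the whole sequence: an implicit OR, equivalent to
   an or-query with one element-value-query per number in $numbers. :)
let $query := cts:element-value-query(xs:QName("Phone"), $numbers)

(: Hypothetical paging parameters. :)
let $start := 1
let $page-size := 20

(: Returns the matching Phone elements, one page at a time. :)
return cts:search(/Person/Phone, $query)[$start to $start + $page-size - 1]
```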
To get what you ask, you could use the following code:
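The answer's original code did not survive in this copy of the page. What follows is a hedged reconstruction of the approach described above, not the answerer's exact code: search for the matching Phone elements, then use XPath (..) to reach each Phone's Person parent and read its id, Name and City. It assumes documents without a namespace and uses a hypothetical $numbers list.

```xquery
xquery version "1.0-ml";

(: Hypothetical list of phone numbers to look for. :)
let $numbers := ("2123339494", "3940594844")
return
  <results>{
    (: Each result should be a Phone element equal to one of $numbers. :)
    for $phone in cts:search(
        /Person/Phone,
        cts:element-value-query(xs:QName("Phone"), $numbers)
      )
    (: The Phone still sits in its document, so .. is its Person. :)
    let $person := $phone/..
    return
      <match id="{ $person/@id }" phone="{ $phone }"
             name="{ $person/Name }" city="{ $person/City }"/>
  }</results>
```

If one person matches two numbers, this produces two match elements for that person, one per matching Phone.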
Best of luck.
Here's what I would do:
First, there's an implicit OR when passing multiple values into word-query and value-query and their cousins, and this query is more efficiently resolved from the indexes, so do this when you can.
Second, an individual might match on more than one phone number, so you need that additional inner loop to effectively group by individual.
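A minimal sketch of that inner loop, under the same assumptions as the question (no namespace) and with a hypothetical $numbers list; it is not the answer's original code, which is missing from this copy. Searching at the Person level returns each matching person once, and the inner for then emits one match per listed number that person actually has, so all of a person's matches come out together.

```xquery
xquery version "1.0-ml";

(: Hypothetical list of phone numbers to look for. :)
let $numbers := ("2123339494", "3940594844")
return
  <results>{
    (: Outer loop: each matching Person document appears once. :)
    for $person in cts:search(
        /Person,
        cts:element-value-query(xs:QName("Phone"), $numbers)
      )
    (: Inner loop: only this person's Phone elements that are in the list. :)
    for $phone in $person/Phone[. = $numbers]
    return
      <match id="{ $person/@id }" phone="{ $phone }"
             name="{ $person/Name }" city="{ $person/City }"/>
  }</results>
```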
I would not create a range index for this - no need, and it isn't necessarily faster. There are indexes for element values by default, so you can leverage those with element-value-query.
You could do all of this with the Search API and a little XSLT. That would make it easy to start combining names, numbers and other conditions in a single query.
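A hedged sketch of the Search API route, assuming the standard search library module and a value constraint named phone (a hypothetical name); the XSLT step for reshaping the results is left out. The default search grammar accepts OR between terms, so several numbers can be combined in one query string.

```xquery
xquery version "1.0-ml";

import module namespace search = "http://marklogic.com/appservices/search"
  at "/MarkLogic/appservices/search/search.xqy";

(: A value constraint that matches the whole value of a Phone element
   in no namespace. "phone" is a hypothetical constraint name. :)
let $options :=
  <options xmlns="http://marklogic.com/appservices/search">
    <constraint name="phone">
      <value>
        <element ns="" name="Phone"/>
      </value>
    </constraint>
  </options>

(: Any of the listed numbers may match. :)
return search:search("phone:2123339494 OR phone:3940594844", $options)
```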