Neo4j Cypher query with several parts: if one part is empty, nothing is returned
I am fairly new to Neo4j and sometimes struggle to understand what is going on under the hood.
The main thing I want to do is avoid double queries. Therefore I need to carry the contracts already seen in part 1 over to part 2, so that I do not look at them again, and so on.
I tried to use UNION instead of COLLECT as described below, but that is much slower overall.
I have a query built out of four parts like this (simplified):
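```cypher
// PART 1: X and Z with the same start and the same end
MATCH (x1:X)-[:STARTS_AT]->(start1)<-[:STARTS_AT]-(z1:Z)
MATCH (x1)-[:ENDS_AT]->(end1)<-[:ENDS_AT]-(z1)
// WHERE somewhereconditions   (conditions elided in this simplified version)
WITH collect(x1.contractid) AS already_seen_contracts,
     collect({x_contractid: x1.contractid, z_contractid: z1.contractid,
              x_createdate: x1.createdate, z_createdate: z1.createdate}) AS taking_over
// PART 2: Z whose start and end lie on a shortest route of an X not seen yet
MATCH (x2:X)-[:STARTS_AT]->()-[:IS_IN]->(s2:S),
      (x2)-[:ENDS_AT]->()-[:IS_IN]->(e2:S)
WHERE s2 <> e2   // shortest-path search rejects identical start/end nodes by default
  AND NOT x2.contractid IN already_seen_contracts
MATCH path = allShortestPaths((s2)-[*]-(e2))
MATCH (z2:Z)-[:STARTS_AT]->()-[:IS_IN]->(r:S),
      (z2)-[:ENDS_AT]->()-[:IS_IN]->(t:S)
WHERE r IN nodes(path) AND t IN nodes(path)   // "z2 on path"
// carried lists are explicit grouping keys; DISTINCT because several shortest paths may match
WITH already_seen_contracts, taking_over,
     collect(DISTINCT x2.contractid) AS new_contracts,
     collect(DISTINCT {x_contractid: x2.contractid, z_contractid: z2.contractid,
                       x_createdate: x2.createdate, z_createdate: z2.createdate}) AS new_rows
WITH already_seen_contracts + new_contracts AS already_seen_contracts,
     taking_over + new_rows AS taking_over
// PART 3 and PART 4 similar
UNWIND taking_over AS taking_over_unwind
RETURN taking_over_unwind.x_contractid AS x_contractid,
       taking_over_unwind.z_contractid AS z_contractid,
       taking_over_unwind.x_createdate AS x_createdate,
       taking_over_unwind.z_createdate AS z_createdate
```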
I hope my clumsy attempt at simplifying makes the structure clear. My idea was that the COLLECT in each part creates a kind of container. I can add the new results of each part to it, carry it over to the next part, and return all results at the end. The first COLLECT and the NOT IN filter are there to avoid the double queries, which is why I set it up this way in the first place.
This works fine as long as every part has results.
Now one part is empty (the others are not), and the whole query returns nothing at all. I expected it to still return the things found in the non-empty parts.
I tried OPTIONAL MATCH, but that is not an option, since I explicitly do not want NULLs returned, for example from the second MATCH in part 1.
I have seen a solution using a dummy node. Would I really have to add it to the COLLECT every time to make sure there is at least one result?
Any ideas are highly appreciated, either on how to avoid the double queries in another way or on how to fix the problem above, where one empty part makes all results vanish.
Also: why does this happen at all?
Thanks a lot for helping out!!
Edit: sample data:
There are nodes with label X and nodes with label Z, both with the properties contractid and createdate.
I want to get all of
x1.contractids, z1.contractids, x1.createdates, z1.createdates
x2.contractids, z2.contractids, x2.createdates, z2.createdates
For the '1' pairs, x and z have the same start and end; for the '2' pairs, z lies along the route of x.
Answer:

Try profiling your query, like this:
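Profiling means putting the PROFILE keyword in front of the query. PROFILE runs the query and reports per-operator statistics, while EXPLAIN only shows the plan without running it.

Below is a minimal, self-contained sketch that reproduces the problem. It uses a hypothetical label `NoSuchLabel` that is assumed to match no nodes:

```cypher
// Hypothetical reproduction: NoSuchLabel is assumed to match no nodes.
PROFILE
UNWIND [1, 2, 3] AS i
WITH collect(i) AS carried   // exactly one row: carried = [1, 2, 3]
MATCH (n:NoSuchLabel)        // nothing matches, so that single row is dropped
RETURN carried               // the query returns no rows at all
```

In the profiled plan you should see the row count fall to 0 where the MATCH on `NoSuchLabel` is applied. As a result, the row carrying `carried` never reaches RETURN.

By contrast, a WITH that contains only aggregations, like the collect() at the end of Part 1, still emits one row (with empty lists) when its input is empty. What removes the carried row is a MATCH in a later part that finds nothing.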
This gives you the execution plan of the query, including how many rows each step produces. Note that internally, Neo4j passes data from one stage of the plan to the next in the form of rows. So when all the rows get filtered out at one of the stages because of some condition (for example, a MATCH that finds nothing), the later stages have nothing to work on, and the result set is empty.

In your case, what you can do is break your query into multiple parts. Pass the primary keys or unique identifiers of the nodes that you want to reuse to the other queries, and create indexes on those unique identifiers. That way performance should not be a problem, and you will get the expected output from each part. The results can then be combined at the application level.
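Below is a minimal sketch of the splitting approach. It reuses the labels, relationship types, and properties from the question. A few parts are assumptions:

* The index name `x_contractid` and the parameter `$already_seen_contracts` are hypothetical.
* The index syntax assumes Neo4j 4.x or 5.x.
* The statements are meant to be run one at a time, for example in cypher-shell or as separate calls from the application's driver.

```cypher
// Run once: index the identifier that is passed between queries (hypothetical name).
CREATE INDEX x_contractid IF NOT EXISTS FOR (x:X) ON (x.contractid);

// Query 1 (PART 1): X and Z with the same start and the same end.
MATCH (x1:X)-[:STARTS_AT]->(start1)<-[:STARTS_AT]-(z1:Z),
      (x1)-[:ENDS_AT]->(end1)<-[:ENDS_AT]-(z1)
// WHERE somewhereconditions   (elided in the question)
RETURN x1.contractid AS x_contractid, z1.contractid AS z_contractid,
       x1.createdate AS x_createdate, z1.createdate AS z_createdate;

// Query 2 (PART 2): the application passes the x_contractid values from
// query 1 as the (hypothetical) list parameter $already_seen_contracts.
MATCH (x2:X)-[:STARTS_AT]->()-[:IS_IN]->(s2:S),
      (x2)-[:ENDS_AT]->()-[:IS_IN]->(e2:S)
WHERE s2 <> e2   // shortest-path search rejects identical start/end nodes by default
  AND NOT x2.contractid IN $already_seen_contracts
MATCH path = allShortestPaths((s2)-[*]-(e2))
MATCH (z2:Z)-[:STARTS_AT]->()-[:IS_IN]->(r:S),
      (z2)-[:ENDS_AT]->()-[:IS_IN]->(t:S)
WHERE r IN nodes(path) AND t IN nodes(path)   // assumed meaning of "z2 on path"
RETURN DISTINCT x2.contractid AS x_contractid, z2.contractid AS z_contractid,
       x2.createdate AS x_createdate, z2.createdate AS z_createdate;
```

With this split, an empty result from one query no longer affects the others. The application concatenates the result lists and extends the parameter before running parts 3 and 4.

The index mainly helps follow-up queries that look nodes up by the passed ids, such as `MATCH (x:X) WHERE x.contractid IN $ids`. A NOT IN filter, as in query 2, generally cannot use it.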