Neo4j Cypher query with different parts returns nothing if one part is empty

Posted on 2025-02-09 20:22:11


I am fairly new to Neo4j and sometimes struggle to understand what is going on under the hood.
The main thing I want to do is avoid double queries. Therefore I need to carry the already-seen contracts over from part 1 to part 2 so that I don't look at them again, and so on.
I tried to use UNION instead of COLLECT as described below, but that is much slower overall.
I have a query built out of four parts like this (simplified):

//PART1
MATCH (x1:X) -[:STARTS_AT]-> (somewhere) <-[:STARTS_AT]- (z1:Z)
MATCH 
   (x1) -[:ENDS_AT]-> (somewhereelse) <-[:ENDS_AT]- (z1)
WHERE somewhereconditions
WITH COLLECT(x contract_ids) as already_seen_contracts
   , COLLECT(all other stuff of interested from x and z) as taking_over

//PART2
MATCH (x2:X) -[:STARTS_AT]-> (somewhere) -[:IS_IN]-> (s2:S),
      (x2) -[:ENDS_AT]-> (somewhereelse) -[:IS_IN]-> (e2:S)
      path = allshortestpath(s2) CONNECTED (e2)

MATCH (z2:Z) -[:STARTS_AT]-> (somewhere) -[:IS_IN]-> (r:S),
        (z2) -[:ENDS_AT]-> (somewhereelse) -[:IS_IN]-> (t:S) 
WHERE z2 on path
AND x2.contracts not in already_seen_contracts

WITH already_seen + COLLECT(x contract_ids) as already_seen
   , taking_over +  COLLECT(all other stuff of interested from x and z) as taking_over

//PART3 and PART4 similar

UNWIND taking_over AS taking_over_unwind 
RETURN stuff from taking_over_unwind

I hope that my clumsy attempt at simplifying makes clear what structure I am using. My idea was that the COLLECT in each part creates a kind of container to which I can add the new results from each part, carry it over to the next part, and return all results at the end. The first COLLECT together with the NOT IN filter is what avoids the double queries, which is why I set it up this way in the first place.
This works fine as long as there are results for each part.
Now one part is empty (the others are not) and the whole query returns nothing at all. I expected that it would still return the things found in the non-empty parts.

I tried OPTIONAL MATCH, but that's not an option since I explicitly don't want NULLs returned, for example from the second MATCH in part 1.

I have seen a solution using a dummy node, which I would have to add to the COLLECT every time to make sure there is at least one result. Is that the way to go?

Any ideas on either how to avoid the double queries in the first place, or how to fix the approach above so that one empty part does not make all results vanish, are highly appreciated!

Also: why is this happening at all?

Thanks a lot for helping out!!

Edit: sample data:
I have nodes X of one kind and nodes Z of another kind, both with the properties contractid and createdate.
I want to get out all of:

x1.contractids, z1.contractids, x1.createdates, z1.createdates
x2.contractids, z2.contractids, x2.createdates, z2.createdates

For the '1s', x and z have the same start and end; for the '2s', z lies along the route of x.


感受沵的脚步 2025-02-16 20:22:11

Try profiling your query, like this:

PROFILE MATCH (x1:X) -[:STARTS_AT]-> (somewhere) <-[:STARTS_AT]- (z1:Z)
MATCH 
   (x1) -[:ENDS_AT]-> (somewhereelse) <-[:ENDS_AT]- (z1)
WHERE somewhereconditions
WITH COLLECT(x contract_ids) as already_seen_contracts
   , COLLECT(all other stuff of interested from x and z) as taking_over

MATCH (x2:X) -[:STARTS_AT]-> (somewhere) -[:IS_IN]-> (s2:S),
      (x2) -[:ENDS_AT]-> (somewhereelse) -[:IS_IN]-> (e2:S)
      path = allshortestpath(s2) CONNECTED (e2)

MATCH (z2:Z) -[:STARTS_AT]-> (somewhere) -[:IS_IN]-> (r:S),
        (z2) -[:ENDS_AT]-> (somewhereelse) -[:IS_IN]-> (t:S) 
WHERE z2 on path
AND x2.contracts not in already_seen_contracts

WITH already_seen + COLLECT(x contract_ids) as already_seen
   , taking_over +  COLLECT(all other stuff of interested from x and z) as taking_over

UNWIND taking_over AS taking_over_unwind 
RETURN stuff from taking_over_unwind

You will get the execution plan for the query. Note that internally, Neo4j passes data from one stage to the next in the form of rows. So when all rows get filtered out at one stage by some condition, the stages after it have nothing to work on, and the result set will be empty.
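The row behaviour described above can be reproduced without any graph data. The following is a minimal sketch, assuming Neo4j 4.1 or later; the literal lists and the names `candidate` and `new_ids` are made up for illustration and stand in for the contract ids from the question. The first statement loses everything because the carried-over list acts as a grouping key for `collect()`, and zero input rows means zero groups. The second statement is one possible single-query workaround, not necessarily the best fit for the real query: the aggregation moves into a `CALL` subquery where it has no grouping key, so it always yields exactly one row, even if the list it collects is empty.

```cypher
// Statement 1: every candidate is already seen, so WHERE removes all rows.
// already_seen is a grouping key for collect(); with no rows there are
// no groups, and the carried-over list disappears with them.
WITH [1, 2, 3] AS already_seen
UNWIND [1, 2] AS candidate
WITH already_seen, candidate
WHERE NOT candidate IN already_seen
WITH already_seen, collect(candidate) AS new_ids
RETURN already_seen + new_ids AS already_seen;
// -> no rows

// Statement 2: same filter, but the aggregation runs inside a subquery.
// collect() without a grouping key returns one row holding an empty list,
// so the outer row carrying already_seen survives.
WITH [1, 2, 3] AS already_seen
CALL {
  WITH already_seen
  UNWIND [1, 2] AS candidate
  WITH candidate, already_seen
  WHERE NOT candidate IN already_seen
  RETURN collect(candidate) AS new_ids
}
RETURN already_seen + new_ids AS already_seen;
// -> one row: [1, 2, 3]
```

This also explains why an empty part 1 is harmless while an empty later part is not: the first `WITH COLLECT(...)` has no grouping key, whereas the later ones group by the lists carried over from the previous part.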

In your case, what you can do is break the query into multiple separate queries. Pass the primary keys or other unique identifiers of the nodes you want to reuse into the later queries, and create indexes on those identifiers. That way performance should not be an issue, and each part returns its expected output, which you can then combine at the application level.
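A rough sketch of that split, assuming Neo4j 4.0 or later and the `contractid` and `createdate` properties from the question's sample data. The index name `x_contractid` and the parameter name `$already_seen` are hypothetical, and the real pattern for each part would replace the bare `MATCH`. The application runs part 1, gathers the contract ids it returned, and supplies them as a parameter to the next part.

```cypher
// One-time setup: index the identifier used to look up X nodes.
CREATE INDEX x_contractid FOR (x:X) ON (x.contractid);

// Later part, run as its own query. $already_seen is a list parameter
// filled by the application with the ids returned by the earlier part.
MATCH (x2:X)
WHERE NOT x2.contractid IN $already_seen
RETURN x2.contractid AS contractid, x2.createdate AS createdate;
```

Because each part is now a separate query, an empty result from one part is just an empty list on the application side and cannot wipe out the results of the others.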
