海王星上的 Gremlin 缓慢 joinTime

发布于 2025-01-09 11:04:43 字数 6527 浏览 0 评论 0原文

我对 Neptune 中的请求性能有疑问。我有一个这样的图表：

hasId('A_id')（计数：1）-> out('has_group') (数量: 12) -> out('has_class').hasLabel('C') (计数: 9751) -> out('has_type').hasLabel('D') (计数: 9749) -> out('has_element') (计数: 472370) -> hasLabel（在（11个元素标签））（计数：107233）

我替换了所有标签以简化，但图形完全是这样的。内部会大大降低查询的性能。有关更多详细信息：

g.V('0f7a21df-9413-4c71-99f3-242ae25356a5').out('has_group').out('has_class').hasLabel('C').out('has_type').hasLabel('D').out('has_element').count()

此请求花费不到 1 秒并返回 472370。

如果我像这样添加最后一个 hasLabel(whithin()) ：

g.V('0f7a21df-9413-4c71-99f3-242ae25356a5').out('has_group').out('has_class').hasLabel('C').out('has_type').hasLabel('D').out('has_element').hasLabel(P.within('element_1','element_2','element_3','element_4','element_5','element_6','element_7','element_8','element_9','element_10','element_11')).count()

时间减少到 18 秒。当我分析查询时，我们可以看到 joinTime 占用了内部至少 90% 的查询执行时间：

Optimized Traversal
===================
Neptune steps: [
    NeptuneCountGlobalStep {
        JoinGroupNode {
            PatternNode[(?1=<0f7a21df-9413-4c71-99f3-242ae25356a5>, ?5=<has_group>, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .
            ],
            {estimatedCardinality=12, expectedTotalOutput=12, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=12
            }
            PatternNode[(?3, ?9=<has_class>, ?7, ?10) . project ?3,?7 . IsEdgeIdFilter(?10) .
            ],
            {estimatedCardinality=208482, expectedTotalOutput=2000, indexTime=0, joinTime=9, numSearches=1, actualTotalOutput=10945
            }
            PatternNode[(?7, <~label>, ?8=<C>, <~>) . project ask .
            ],
            {estimatedCardinality=424296, expectedTotalOutput=2000, indexTime=5, joinTime=111, numSearches=10945, actualTotalOutput=9751
            }
            PatternNode[(?7, ?13=<has_type>, ?11, ?14) . project ?7,?11 . IsEdgeIdFilter(?14) .
            ],
            {estimatedCardinality=9675934, expectedTotalOutput=11695, indexTime=15, joinTime=95, numSearches=10, actualTotalOutput=42226
            }
            PatternNode[(?11, <~label>, ?12=<D>, <~>) . project ask .
            ],
            {estimatedCardinality=2333386, expectedTotalOutput=11695, indexTime=23, joinTime=402, numSearches=42226, actualTotalOutput=9749
            }
            PatternNode[(?11, ?17=<has_element>, ?15, ?18) . project ?11,?15 . IsEdgeIdFilter(?18) .
            ],
            {estimatedCardinality=8562896, expectedTotalOutput=556904, indexTime=18, joinTime=442, numSearches=10, actualTotalOutput=472370
            }
            PatternNode[(?15, <~label>, ?16, <~>) . project ask . ContainsFilter(?16 in (<element_1>, <element_2>, <element_3>, <element_4>, <element_5>, <element_6>, <element_7>, <element_8>, <element_9>, <element_10>, <element_11>)) .
            ],
            {estimatedCardinality=1158922, indexTime=598, joinTime=18991, numSearches=472370
            }
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, Vertex(?7):VertexStep, Vertex(?11):VertexStep, Vertex(?15):VertexStep
            ], joinStats=true, optimizationTime=1, maxVarId=19, executionTime=20799
        }
    }
]

请求看起来非常简单，而且量也没有那么大。内部不是一个好的方法吗？您有一些改进查询的线索吗？

编辑1：

我尝试了@saikiranboga提供的请求，索引操作的数量越大越好（除以10），但连接时间仍然很高。我很困惑。

索引操作数

Index Operations
================
Query execution:
    # of statement index ops: 525563
    # of unique statement index ops: 525563
    Duplication ratio: 1.0
    # of terms materialized: 0

之前：之后的

Index Operations
================
Query execution:
    # of statement index ops: 53666
    # of unique statement index ops: 53666
    Duplication ratio: 1.0
    # of terms materialized: 0

Optimized Traversal
===================
Neptune steps: [
    NeptuneCountGlobalStep {
        JoinGroupNode {
            PatternNode[(?1=<0f7a21df-9413-4c71-99f3-242ae25356a5>, ?5=<has_group>, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .
            ],
            {estimatedCardinality=12, expectedTotalOutput=12, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=12
            }
            PatternNode[(?3, ?9=<has_class>, ?7, ?10) . project ?3,?7 . IsEdgeIdFilter(?10) .
            ],
            {estimatedCardinality=208000, expectedTotalOutput=2000, indexTime=0, joinTime=10, numSearches=1, actualTotalOutput=10945
            }
            PatternNode[(?7, <~label>, ?8=<C>, <~>) . project ask .
            ],
            {estimatedCardinality=424296, expectedTotalOutput=2000, indexTime=4, joinTime=102, numSearches=10945, actualTotalOutput=9751
            }
            PatternNode[(?7, ?13=<has_type>, ?11, ?14) . project ?7,?11 . IsEdgeIdFilter(?14) .
            ],
            {estimatedCardinality=9456689, expectedTotalOutput=11695, indexTime=13, joinTime=94, numSearches=10, actualTotalOutput=42226
            }
            PatternNode[(?11, <~label>, ?12=<D>, <~>) . project ask .
            ],
            {estimatedCardinality=2333386, expectedTotalOutput=11695, indexTime=17, joinTime=341, numSearches=42226, actualTotalOutput=9749
            }
            PatternNode[(?11, ?17=<has_element>, ?15, ?18) . project ?11,?15 . IsEdgeIdFilter(?18) .
            ],
            {estimatedCardinality=7919022, expectedTotalOutput=556904, indexTime=17, joinTime=411, numSearches=10, actualTotalOutput=472370
            }
            PatternNode[(?15, <~label>, ?16, <~>) . project ?16 . ContainsFilter(?16 in (<element_1>, <element_2>, <element_3>, <element_4>, <element_5>, <element_6>, <element_7>, <element_8>, <element_9>, <element_10>, <element_11>)) .
            ],
            {estimatedCardinality=1145096, indexTime=848, joinTime=15268, numSearches=473
            }
        }, finishers=[dedup(?15)
        ], annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, Vertex(?7):VertexStep, Vertex(?11):VertexStep, Vertex(?15):VertexStep@[element
                ], VertexLabel(?16):LabelStep
            ], joinStats=true, optimizationTime=1, maxVarId=19, executionTime=17283
        }
    }
]

原文

I have an issue with a request performance in Neptune. I have a graph like this :

hasId('A_id') (count: 1) -> out('has_group') (count: 12) -> out('has_class').hasLabel('C') (count: 9751) -> out('has_type').hasLabel('D') (count: 9749) ->
out('has_element') (count: 472370) -> hasLabel(Within(11 elements label)) (count: 107233)

I replace all the label to simplify, but the graph is exactly like this. The within decrease a lot the performance of the query. For more details :

g.V('0f7a21df-9413-4c71-99f3-242ae25356a5').out('has_group').out('has_class').hasLabel('C').out('has_type').hasLabel('D').out('has_element').count()

This request take less than 1 secondes and return 472370.

If I add the last hasLabel(whithin()) like this :

g.V('0f7a21df-9413-4c71-99f3-242ae25356a5').out('has_group').out('has_class').hasLabel('C').out('has_type').hasLabel('D').out('has_element').hasLabel(P.within('element_1','element_2','element_3','element_4','element_5','element_6','element_7','element_8','element_9','element_10','element_11')).count()

The time decrease to 18 seconds. And when I profile the query we can see a joinTime which take at least 90% of the query execution time on the within :

Optimized Traversal
===================
Neptune steps: [
    NeptuneCountGlobalStep {
        JoinGroupNode {
            PatternNode[(?1=<0f7a21df-9413-4c71-99f3-242ae25356a5>, ?5=<has_group>, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .
            ],
            {estimatedCardinality=12, expectedTotalOutput=12, indexTime=0, joinTime=1, numSearches=1, actualTotalOutput=12
            }
            PatternNode[(?3, ?9=<has_class>, ?7, ?10) . project ?3,?7 . IsEdgeIdFilter(?10) .
            ],
            {estimatedCardinality=208482, expectedTotalOutput=2000, indexTime=0, joinTime=9, numSearches=1, actualTotalOutput=10945
            }
            PatternNode[(?7, <~label>, ?8=<C>, <~>) . project ask .
            ],
            {estimatedCardinality=424296, expectedTotalOutput=2000, indexTime=5, joinTime=111, numSearches=10945, actualTotalOutput=9751
            }
            PatternNode[(?7, ?13=<has_type>, ?11, ?14) . project ?7,?11 . IsEdgeIdFilter(?14) .
            ],
            {estimatedCardinality=9675934, expectedTotalOutput=11695, indexTime=15, joinTime=95, numSearches=10, actualTotalOutput=42226
            }
            PatternNode[(?11, <~label>, ?12=<D>, <~>) . project ask .
            ],
            {estimatedCardinality=2333386, expectedTotalOutput=11695, indexTime=23, joinTime=402, numSearches=42226, actualTotalOutput=9749
            }
            PatternNode[(?11, ?17=<has_element>, ?15, ?18) . project ?11,?15 . IsEdgeIdFilter(?18) .
            ],
            {estimatedCardinality=8562896, expectedTotalOutput=556904, indexTime=18, joinTime=442, numSearches=10, actualTotalOutput=472370
            }
            PatternNode[(?15, <~label>, ?16, <~>) . project ask . ContainsFilter(?16 in (<element_1>, <element_2>, <element_3>, <element_4>, <element_5>, <element_6>, <element_7>, <element_8>, <element_9>, <element_10>, <element_11>)) .
            ],
            {estimatedCardinality=1158922, indexTime=598, joinTime=18991, numSearches=472370
            }
        }, annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, Vertex(?7):VertexStep, Vertex(?11):VertexStep, Vertex(?15):VertexStep
            ], joinStats=true, optimizationTime=1, maxVarId=19, executionTime=20799
        }
    }
]

The request seems quite simple and the volume not that much. The within is not the good approach here ? Do you have some clue to improve the query ?

EDIT 1:

I tried with the request provided by @saikiranboga and the number of index operation is large better (divided by 10) but the join time is still high. I'm quite confuse.

The index operation number before :

Index Operations
================
Query execution:
    # of statement index ops: 525563
    # of unique statement index ops: 525563
    Duplication ratio: 1.0
    # of terms materialized: 0

and after

Index Operations
================
Query execution:
    # of statement index ops: 53666
    # of unique statement index ops: 53666
    Duplication ratio: 1.0
    # of terms materialized: 0

Optimized Traversal
===================
Neptune steps: [
    NeptuneCountGlobalStep {
        JoinGroupNode {
            PatternNode[(?1=<0f7a21df-9413-4c71-99f3-242ae25356a5>, ?5=<has_group>, ?3, ?6) . project ?1,?3 . IsEdgeIdFilter(?6) .
            ],
            {estimatedCardinality=12, expectedTotalOutput=12, indexTime=0, joinTime=0, numSearches=1, actualTotalOutput=12
            }
            PatternNode[(?3, ?9=<has_class>, ?7, ?10) . project ?3,?7 . IsEdgeIdFilter(?10) .
            ],
            {estimatedCardinality=208000, expectedTotalOutput=2000, indexTime=0, joinTime=10, numSearches=1, actualTotalOutput=10945
            }
            PatternNode[(?7, <~label>, ?8=<C>, <~>) . project ask .
            ],
            {estimatedCardinality=424296, expectedTotalOutput=2000, indexTime=4, joinTime=102, numSearches=10945, actualTotalOutput=9751
            }
            PatternNode[(?7, ?13=<has_type>, ?11, ?14) . project ?7,?11 . IsEdgeIdFilter(?14) .
            ],
            {estimatedCardinality=9456689, expectedTotalOutput=11695, indexTime=13, joinTime=94, numSearches=10, actualTotalOutput=42226
            }
            PatternNode[(?11, <~label>, ?12=<D>, <~>) . project ask .
            ],
            {estimatedCardinality=2333386, expectedTotalOutput=11695, indexTime=17, joinTime=341, numSearches=42226, actualTotalOutput=9749
            }
            PatternNode[(?11, ?17=<has_element>, ?15, ?18) . project ?11,?15 . IsEdgeIdFilter(?18) .
            ],
            {estimatedCardinality=7919022, expectedTotalOutput=556904, indexTime=17, joinTime=411, numSearches=10, actualTotalOutput=472370
            }
            PatternNode[(?15, <~label>, ?16, <~>) . project ?16 . ContainsFilter(?16 in (<element_1>, <element_2>, <element_3>, <element_4>, <element_5>, <element_6>, <element_7>, <element_8>, <element_9>, <element_10>, <element_11>)) .
            ],
            {estimatedCardinality=1145096, indexTime=848, joinTime=15268, numSearches=473
            }
        }, finishers=[dedup(?15)
        ], annotations={path=[Vertex(?1):GraphStep, Vertex(?3):VertexStep, Vertex(?7):VertexStep, Vertex(?11):VertexStep, Vertex(?15):VertexStep@[element
                ], VertexLabel(?16):LabelStep
            ], joinStats=true, optimizationTime=1, maxVarId=19, executionTime=17283
        }
    }
]

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

暖伴 2025-01-16 11:04:43

您的观察是正确的，最后一个模式确实是在查询中花费更多时间的模式。 “ask”投影通常涉及多个索引查找并可能导致速度缓慢，这些通常由数据库优化，但在本例中并非如此。

您可以尝试重写查询的版本来获取标签并过滤它们，而不是使用 hasLabel(...) 过滤器，如下所示：

g.V('0f7a21df-9413-4c71-99f3-242ae25356a5')
.out('has_group').out('has_class').hasLabel('C')
.out('has_type').hasLabel('D')
.out('has_element').as("element")
.label().is(
  within(
    'element_1','element_2','element_3','element_4','element_5',
    'element_6','element_7','element_8','element_9','element_10','element_11'
  )
)
.dedup("element")
.count()

Your observation is correct, the last pattern is indeed what is taking more time in the query. "ask" projections usually involve multiple index look ups and could cause slowness, these are usually optimized by the database, but it is not in this case.

Could you try a rewritten version of the query that fetches the labels and filters them instead of the hasLabel(...) filter, like below:

g.V('0f7a21df-9413-4c71-99f3-242ae25356a5')
.out('has_group').out('has_class').hasLabel('C')
.out('has_type').hasLabel('D')
.out('has_element').as("element")
.label().is(
  within(
    'element_1','element_2','element_3','element_4','element_5',
    'element_6','element_7','element_8','element_9','element_10','element_11'
  )
)
.dedup("element")
.count()

回复收藏 0 原文

~没有更多了~