`project` step is way slower than equivalent lambda function #3438

Kukant · 2022-12-29T21:41:56Z

Hi all,
I have a query that starts at a given vertex, traverses the graph, and saves all the edges on the way.
After that, for each edge, I want to know the parent of its source and the parent of its target. It looks like this:

g.V().has("fqn", "SomeFullyQualifiedName").
repeat(outE("flows_into").dedup().store("edges").inV()).
until(
    or(
        cyclicPath(),
        outE("flows_into").count().is(eq(0))
    )
).
cap("edges").
unfold().
dedup().
project('i', 'o'). // this projection is very slow
    by(inV().in('child').in('child').id()).
    by(outV().in('child').in('child').id()).
dedup()

As stated in the title, the project step is very slow. I tried replacing it with the following map step:

map {
 g.V(it.get().getVertex(0).id()).in('child').in('child').id().next().toString() +
 ',' +
 g.V(it.get().getVertex(1).id()).in('child').in('child').id().next().toString()
}.

Otherwise, the query stays the same.

The performance difference is huge! The map step is approx. 50x faster than the equivalent project.

Do you know why?
It does not seem straightforward, as the lambda starts 2 new traversals and then does the same steps. I would expect the project step to be faster.

Additional info

Backend: I tried both hbase and berkeleyje, same result

RAM - 12 GB
CPU - 4 cores, but only one is used at 100%

config:

gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=berkeleyje
storage.directory=../db/berkeley
query.batch=true
storage.parallel-backend-executor-service.core-pool-size=32

.profile() outputs

.map()

gremlin> g.V().has("fqn", "S00000071   - O00003795   - E00065775  ").
......1> repeat(outE("flows_into").dedup().store("edges").inV()).
......2> until(
......3>     or(
......4>         cyclicPath(),
......5>         outE("flows_into").count().is(eq(0))
......6>     )
......7> ).
......8> cap("edges").
......9> unfold().
.....10> dedup().
.....11> map {
.....12>  g.V(it.get().getVertex(0).id()).in('child').in('child').id().next().toString() +
.....13>  ',' +
.....14>  g.V(it.get().getVertex(1).id()).in('child').in('child').id().next().toString()
.....15> }.
.....16> dedup().count().profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[fqn.eq(S00000071   - O000037...                     1           1           4.878     0.05
  constructGraphCentricQuery                                                                   0.618
  GraphCentricQuery                                                                         6068.429
    \_condition=(fqn = S00000071   - O00003795   - E00065775  )
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]
    \_index=byFqn
    backend-query                                                      1                       2.674
    \_query=byFqn:multiKSQ[1]
RepeatStep([JanusGraphMultiQueryStep, JanusGrap...                 15364       15364        6054.281    57.25
  OrStep([[PathFilterStep(cyclic,null,null)], [...                                          5548.037
    PathFilterStep(cyclic,null,null)                                                          86.405
    NotStep([VertexStep(OUT,[flows_into],edge)])                                            5330.086
      VertexStep(OUT,[flows_into],edge)                                                     5251.315
  JanusGraphMultiQueryStep                                         13187       13187           8.786
  JanusGraphVertexStep(OUT,[flows_into],edge)                      42237       42237         270.210
    \_condition=type[flows_into]
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=flows_into:SliceQuery[0x71A0,0x71A1)
    \_vertices=1
    optimization                                                                               0.099
    backend-query                                                     71                       0.109
    \_query=flows_into:SliceQuery[0x71A0,0x71A1)
    optimization                                                                               0.006
    optimization                                                                               0.006
    optimization                                                                               0.011
    optimization                                                                               0.006
    optimization                                                                               0.008
    .... This repeats appx 13000 times ....
    optimization                                                                               0.001
    optimization                                                                               0.001
  DedupGlobalStep(null,null)                                       28550       28550          32.335
  AggregateLocalStep(edges,null)                                   28550       28550          37.484
  EdgeVertexStep(IN)                                               28550       28550          22.724
  RepeatEndStep                                                    15364       15364        5674.057
SideEffectCapStep([edges])                                             1           1          12.266     0.12
UnfoldStep                                                         28550       28550          12.162     0.11
DedupGlobalStep(null,null)                                         28550       28550          20.728     0.20
LambdaMapStep(lambda)                                              28550       28550        4453.018    42.10
DedupGlobalStep(null,null)                                           716         716          15.927     0.15
CountGlobalStep                                                        1           1           2.799     0.03
                                            >TOTAL                     -           -       10576.062        -

.project() (40x slower in this case)

gremlin> g.V().has("fqn", "S00000071   - O00003795   - E00065775  ").
......1> repeat(outE("flows_into").dedup().store("edges").inV()).
......2> until(
......3>     or(
......4>         cyclicPath(),
......5>         outE("flows_into").count().is(eq(0))
......6>     )
......7> ).
......8> cap("edges").
......9> unfold().
.....10> dedup().
.....11> project('i', 'o').
.....12>     by(inV().in('child').in('child').id()).
.....13>     by(outV().in('child').in('child').id()).
.....14> dedup().count().profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
JanusGraphStep([],[fqn.eq(S00000071   - O000037...                     1           1           1.792     0.00
  constructGraphCentricQuery                                                                   0.481
  GraphCentricQuery                                                                          894.758
    \_condition=(fqn = S00000071   - O00003795   - E00065775  )
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=multiKSQ[1]
    \_index=byFqn
RepeatStep([JanusGraphMultiQueryStep, JanusGrap...                 15364       15364         887.501     0.22
  OrStep([[PathFilterStep(cyclic,null,null)], [...                                           481.243
    PathFilterStep(cyclic,null,null)                                                          72.213
    NotStep([VertexStep(OUT,[flows_into],edge)])                                             296.736
      VertexStep(OUT,[flows_into],edge)                                                      236.547
  JanusGraphMultiQueryStep                                         13187       13187           6.389
  JanusGraphVertexStep(OUT,[flows_into],edge)                      42237       42237         218.388
    \_condition=type[flows_into]
    \_orders=[]
    \_isFitted=true
    \_isOrdered=true
    \_query=flows_into:SliceQuery[0x71A0,0x71A1)
    \_vertices=1
    optimization                                                                               0.085
    backend-query                                                     71                       0.127
    \_query=flows_into:SliceQuery[0x71A0,0x71A1)
    optimization                                                                               0.002
    optimization                                                                               0.001
    optimization                                                                               0.001
    .... This repeats appx 13000 times ....
    optimization                                                                               0.001
    optimization                                                                               0.001
    optimization                                                                               0.001
  DedupGlobalStep(null,null)                                       28550       28550          24.340
  AggregateLocalStep(edges,null)                                   28550       28550          26.407
  EdgeVertexStep(IN)                                               28550       28550          17.597
  RepeatEndStep                                                    15364       15364         587.807
SideEffectCapStep([edges])                                             1           1           6.419     0.00
UnfoldStep                                                         28550       28550          25.409     0.01
DedupGlobalStep(null,null)                                         28550       28550          42.338     0.01
ProjectStep([i, o],[[CoalesceStep([[EdgeVertexS...                 28550       28550      401861.770    99.74
  CoalesceStep([[EdgeVertexStep(IN), VertexStep...                 28550       28550      151088.086
    EdgeVertexStep(IN)                                             28550       28550         197.506
    VertexStep(IN,[child],vertex)                                  28550       28550         758.922
    NoOpBarrierStep(2500)                                          28550       28550       49934.798
    VertexStep(IN,[child],vertex)                                  28550       28550         294.999
    NoOpBarrierStep(2500)                                          28550       28550       49922.020
    IdStep                                                         28550       28550          43.948
  CoalesceStep([[EdgeVertexStep(OUT), VertexSte...                 28550       28550      150442.530
    EdgeVertexStep(OUT)                                            28550       28550         183.143
    VertexStep(IN,[child],vertex)                                  28550       28550         287.669
    NoOpBarrierStep(2500)                                          28550       28550       49915.113
    VertexStep(IN,[child],vertex)                                  28550       28550         225.585
    NoOpBarrierStep(2500)                                          28550       28550       49839.859
    IdStep                                                         28550       28550          44.249
DedupGlobalStep(null,null)                                           716         716          66.330     0.02
CountGlobalStep                                                        1           1           3.852     0.00
                                            >TOTAL                     -           -      402895.415        -
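
To put the two profiles side by side, the ratios can be worked out from the reported times. This is a minimal sketch in Python that uses only numbers copied from the profiles above; the variable names are illustrative, and since nested profile timings are not guaranteed to be additive, the barrier share is only a rough indicator.

```python
# Times in milliseconds, copied from the two .profile() outputs above.
MAP_TOTAL_MS = 10576.062        # >TOTAL of the .map() query
PROJECT_TOTAL_MS = 402895.415   # >TOTAL of the .project() query
LAMBDA_MAP_STEP_MS = 4453.018   # LambdaMapStep(lambda)
PROJECT_STEP_MS = 401861.770    # ProjectStep([i, o], ...)
# The four NoOpBarrierStep(2500) rows nested under ProjectStep.
NOOP_BARRIER_MS = [49934.798, 49922.020, 49915.113, 49839.859]
EDGES = 28550                   # traversers entering map()/project()

# Slowdown of project() relative to map(), for the whole query
# and for the single step that differs between the two queries.
whole_query_ratio = PROJECT_TOTAL_MS / MAP_TOTAL_MS
step_only_ratio = PROJECT_STEP_MS / LAMBDA_MAP_STEP_MS

# Fraction of the ProjectStep time reported by the barrier steps.
barrier_share = sum(NOOP_BARRIER_MS) / PROJECT_STEP_MS

# Average cost per edge in each variant.
project_ms_per_edge = PROJECT_STEP_MS / EDGES
map_ms_per_edge = LAMBDA_MAP_STEP_MS / EDGES

print(f"whole query: {whole_query_ratio:.1f}x slower with project()")
print(f"step only:   {step_only_ratio:.1f}x slower with project()")
print(f"NoOpBarrierStep share of ProjectStep time: {barrier_share:.1%}")
print(f"per edge: project {project_ms_per_edge:.2f} ms, map {map_ms_per_edge:.3f} ms")
```

For these runs the whole query comes out roughly 38x slower with project(), while the project step alone is roughly 90x slower than the lambda step, with about half of its reported time sitting in the four NoOpBarrierStep rows.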

vtslab · 2023-01-02T17:53:11Z

This is a remarkable difference indeed, for which I do not see an immediate explanation. It would be interesting to see the full unedited output of the .profile() for both queries.
Normally, run times are dominated by vertex retrievals from the storage backend, so your argument about two additional queries does not hold. There must be some reason why a retrieval optimization (like multiquery) or some cache reuse is not applied in the query with the project() step.
Finally, do you use the latest janusgraph-0.6.2?

1 reply

Kukant Jan 5, 2023
Author

I have added the full .profile() outputs to the description. Please let me know what you are able to deduce.
Yes, I am using the latest version 0.6.2.

vtslab · 2023-01-06T15:48:10Z

No idea what is going on, just two things that catch my eye:

In the project() profile, the first step does not trigger a backend query. I guess this is because the starting vertex is still present in the transaction cache. To be sure, can you check that a g.tx().rollback() between the queries does not make a significant difference? For janusgraph-0.6 the db-cache is disabled by default and cannot influence this behavior.
You are still using the store() step, which is an alias for the lazy version aggregate(local, 'edges'), while TinkerPop gave aggregate('edges') eager evaluation by default (see https://tinkerpop.apache.org/docs/3.5.3/reference/#aggregate-step). Can you also check whether using aggregate("edges") makes any difference?

1 reply

Kukant Jan 13, 2023
Author

Sadly, there is no noticeable difference when using g.tx().rollback() or aggregate instead of store.

I also checked what the CPU is doing while processing these queries, and it seems that:

using .project - one core is used at 100% load the whole time
using .map - one core is used at 100% at the beginning, then all cores get used for a while (moderate load).

Therefore, I would assume the .project step just does not get parallelized.

vtslab · 2023-01-14T10:35:03Z

Excellent find and it makes sense: starting the new traversals in the closure allows them to be parallelized! If you want you can make an issue of this in this github repo to keep the research you did available. Actually, JanusGraph (or upstream TinkerPop) could implement some ParallellizeAnonymousTraversalStrategy to perform the optimization you found by chance automatically. Then, users can activate the Strategy using the .withStrategies(ParallellizeAnonymousTraversalStrategy) step.
Also note the following:

You use JanusGraph embedded in a Java client, but on systems with Gremlin Server, allowing closures is often a security risk, so your optimization would be unavailable there.
A ParallellizeAnonymousTraversalStrategy is not likely to become a default strategy, because in Gremlin Server you often do not want one query to grab all computing resources. Earlier versions of TinkerPop allowed using g mid-query, but this was even ruled out for this reason, see: https://issues.apache.org/jira/browse/TINKERPOP-2361

You did a great job answering this question yourself and you were a bit unlucky that your question did not draw the attention of the project committers who read this forum. I am not a committer myself, but I have a broad general knowledge and memory of JanusGraph and TinkerPop that I like to leverage to stimulate talented developers.

1 reply

porunov Feb 7, 2023
Maintainer

Thank you @Kukant and @vtslab for the valuable discussion. I've created an issue from this discussion: #3559
