- [x] I have searched the existing issues and this bug is not already filed.
- [x] My model is hosted on OpenAI or Azure. If not, please look at the "model providers" issue and don't file a new one here.
- [x] I believe this is a legitimate bug, not just a question. If this is a question, please use the Discussions area.
Describe the bug
While profiling to find opportunities to speed up queries, I found that local search spends a good amount of time in `get_entity_by_key()`: 7s out of 20s for 49K entities, with another 9s spent waiting for GPT-4o to generate the response. That method performs an O(N) scan over the entity list when, at least in the default case where `embedding_vectorstore_key == EntityVectorStoreKey.ID`, it could do an O(1) lookup in the entity dictionary. In a quick test, replacing `matched = get_entity_by_key(...)` with `matched = all_entities_dict.get(result.document.id)` effectively made those 7s go away. Also, in the general O(N) full-scan case, since `value` is constant inside `get_entity_by_key()`, the calls to `isinstance()`, `is_valid_uuid()`, and `replace()` could be hoisted out of the loop to reduce the hot spot.
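For illustration, here is a minimal sketch of the two changes described above. The names `get_entity_by_key`, `all_entities_dict`, and `is_valid_uuid` mirror names from the GraphRAG query code, but the entity type and helpers here are simplified stand-ins, not the actual implementation:

```python
import uuid
from dataclasses import dataclass


@dataclass
class Entity:
    # Simplified stand-in for GraphRAG's entity model.
    id: str
    title: str


def is_valid_uuid(value: str) -> bool:
    # Hypothetical helper mirroring the one referenced above.
    try:
        uuid.UUID(value)
        return True
    except ValueError:
        return False


def get_entity_by_key_scan(entities, key_attr, value):
    # Current behavior: O(N) scan, re-checking the loop-invariant
    # `value` on every iteration.
    for entity in entities:
        key = (
            value.replace("-", "")
            if isinstance(value, str) and is_valid_uuid(value)
            else value
        )
        if getattr(entity, key_attr) == key:
            return entity
    return None


def find_by_key_hoisted(entities, key_attr, value):
    # Same scan, but the invariant normalization of `value` is done
    # once, before the loop, instead of N times inside it.
    if isinstance(value, str) and is_valid_uuid(value):
        value = value.replace("-", "")
    for entity in entities:
        if getattr(entity, key_attr) == value:
            return entity
    return None


def get_entity_fast(all_entities_dict, entity_id):
    # Proposed default-case fast path: O(1) dict lookup when the
    # vector-store key is the entity id.
    return all_entities_dict.get(entity_id)


entities = [Entity(id=str(i), title=f"entity-{i}") for i in range(50_000)]
all_entities_dict = {e.id: e for e in entities}
```

All three variants return the same entity for a given id; the dict lookup just avoids touching the other ~49,999 entries per query result.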
Steps to reproduce
Run local search with 50K entities.
Expected Behavior
Most of the query time is spent on generating the AI summary.
GraphRAG Config Used
Logs and screenshots
Output from Python `cProfile`:
Additional Information