Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Adding a dimension in --where but not in --group-by of an MF query will output dysfunctional SQL #1392

Open
2 tasks done
siljamardla opened this issue Sep 5, 2024 · 0 comments
Labels
bug Something isn't working triage Tasks that need to be triaged

Comments

@siljamardla
Copy link

siljamardla commented Sep 5, 2024

Is this a new bug in metricflow?

  • I believe this is a new bug in metricflow
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Here's a query to get my_metric value, grouped by user name and filtered for country (a valid dimension):

mf query --metrics my_metric --where "city_id__country_id__country_code='fr'" --explain --group-by user_id__user_name

This produces SQL that has the following WHERE clause

WHERE city_id__country_id__country_code='fr'

but it does not have a join to the dimension table dim_city (nor to dim_country, which is also needed). Therefore, the SQL cannot be run.

There is a workaround: by adding the same dimension in the --group-by like this:

mf query --metrics my_metric --where "city_id__country_id__country_code='fr'" --explain --group-by user_id__user_name,city_id__country_id__country_code

the output SQL will have the necessary joins and the WHERE clause becomes valid.

Expected Behavior

MetricFlow should be able to recognise dimensions in --where clauses and add the necessary joins.

At the moment the WHERE clause can be arbitrary and compiled of multiple conditions, even complex things with brackets and whatnot. I agree that this solution allows for the most flexibility and the option should remain. However, there should also be a more controlled way of adding where clauses that relies on the semantics that we have defined and makes sure the outcome is valid SQL.

Steps To Reproduce

I have not tested if the issue only happens due to this particular dimension needing two joins (to dim_city and to dim_country). I suspect not, because the output SQL did not even have a join to dim_city either.

To reproduce, it should be enough to try a --where clause with a dimension and without a --group-by for the same dimension.

Relevant log output

No response

Environment

- OS:
- Python:
- dbt: 1.8
- metricflow:

Which database are you using?

other (mention it in "Additional Context")

Additional Context

spark/databricks

@siljamardla siljamardla added bug Something isn't working triage Tasks that need to be triaged labels Sep 5, 2024
@siljamardla siljamardla changed the title [Bug] Adding a (multi-hop?) dimension in --where but not in --group-by of an MF query will output dysfunctional SQL [Bug] Adding a dimension in --where but not in --group-by of an MF query will output dysfunctional SQL Sep 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage Tasks that need to be triaged
Projects
None yet
Development

No branches or pull requests

1 participant