find "smart" command #2076

Open
jqnatividad opened this issue Aug 22, 2024 · 1 comment
Labels
enhancement New feature or request. Once marked with this label, it's in the backlog.

Comments

@jqnatividad
Owner

jqnatividad commented Aug 22, 2024

Unlike the search command, the find command will be a "smart" command: it will use the stats cache along with a frequency cache (see #2075) to find values in the CSV based on the valid domain values of a column.

For example:
qsv find data.csv fruits_column apple

lists all rows in data.csv where fruits_column=apple.

It will return no rows if fruits_column does not have the value apple.
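A minimal Python sketch of the plain-value case, assuming the frequency cache can be modeled as a mapping from each distinct column value to its count (here an in-memory `collections.Counter`; the real cache format from #2075 is not specified). The helper names `build_frequency_table` and `find_exact` are hypothetical, not part of qsv. The point it illustrates: if the value is absent from the column's domain, the answer is "no rows" without scanning the data at all.

```python
import csv
import io
from collections import Counter


# Hypothetical stand-in for the frequency cache: value -> count for one column.
def build_frequency_table(rows, column):
    return Counter(row[column] for row in rows)


def find_exact(rows, column, value, freq):
    # If the value is not in the column's domain, skip the row scan entirely.
    if value not in freq:
        return []
    return [row for row in rows if row[column] == value]


data = """id,fruits_column
1,apple
2,banana
3,apple
4,cherry
"""
rows = list(csv.DictReader(io.StringIO(data)))
freq = build_frequency_table(rows, "fruits_column")

print([r["id"] for r in find_exact(rows, "fruits_column", "apple", freq)])  # ['1', '3']
print(find_exact(rows, "fruits_column", "mango", freq))  # []
```

In this toy version the table is built from the same rows, so nothing is saved; the benefit only appears when the table is precomputed and cached alongside the CSV.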

So far, nothing new... but if we issue the command

qsv find data.csv fruits_column _top

It will return all rows where fruits_column is the "top" occurring value (i.e., the mode).

qsv find data.csv fruits_column _bottom

returns all rows where fruits_column is equal to the least occurring value.
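One way `_top` and `_bottom` could be resolved is sketched below in Python, again modeling the frequency cache as a `Counter` of value counts. The function names are hypothetical, and the tie handling is an assumption the issue does not settle: here every value sharing the highest (or lowest) count is kept, so a column with two equally common modes matches both.

```python
from collections import Counter


def resolve_top_bottom(freq, token):
    """Map _top/_bottom to the set of matching domain values.

    Assumption: ties are kept, so every value sharing the highest
    (or lowest) count is included.
    """
    if not freq:
        return set()
    if token == "_top":
        target = max(freq.values())
    elif token == "_bottom":
        target = min(freq.values())
    else:
        raise ValueError(f"unsupported token: {token}")
    return {value for value, count in freq.items() if count == target}


def find_by_token(rows, column, token):
    # Hypothetical stand-in for reading the cached frequency table.
    freq = Counter(row[column] for row in rows)
    wanted = resolve_top_bottom(freq, token)
    return [row for row in rows if row[column] in wanted]


rows = [
    {"id": "1", "fruits_column": "apple"},
    {"id": "2", "fruits_column": "banana"},
    {"id": "3", "fruits_column": "apple"},
    {"id": "4", "fruits_column": "cherry"},
    {"id": "5", "fruits_column": "banana"},
    {"id": "6", "fruits_column": "apple"},
]

print([r["id"] for r in find_by_token(rows, "fruits_column", "_top")])     # ['1', '3', '6']
print([r["id"] for r in find_by_token(rows, "fruits_column", "_bottom")])  # ['4']
```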

Further:
qsv find data.csv fruits_column _top3
returns all rows where fruits_column is one of its top 3 domain values (e.g., apple, strawberry, and banana, the three most frequently occurring values).

qsv find data.csv fruits_column _TOP3
on the other hand, returns only rows where fruits_column is equal to its third most frequently occurring value (e.g., banana).

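_bottom and _BOTTOM will behave symmetrically, counting from the least occurring values instead (e.g., _bottom3 returns rows matching any of the three least occurring values, while _BOTTOM3 returns only rows matching the third least occurring value).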
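The lowercase/uppercase distinction for the numbered tokens can be sketched in Python as a small token parser over a ranked domain. Everything here is hypothetical: the token grammar `_top<N>`, `_TOP<N>`, `_bottom<N>`, `_BOTTOM<N>` is inferred from the examples above, the frequency cache is modeled as a `Counter`, and ties in count are broken alphabetically only so that ranks are deterministic (the issue does not say how ties are ranked).

```python
import re
from collections import Counter

# Hypothetical grammar for the numbered tokens: _top3, _TOP3, _bottom2, _BOTTOM2, ...
TOKEN_RE = re.compile(r"_(top|TOP|bottom|BOTTOM)(\d+)")


def ranked_domain(freq, descending):
    # Assumption: ties in count are broken alphabetically for deterministic ranks.
    if descending:
        return sorted(freq, key=lambda v: (-freq[v], v))
    return sorted(freq, key=lambda v: (freq[v], v))


def resolve_token(freq, token):
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        return None  # not a special token; treat it as a literal value
    kind, digits = match.groups()
    n = int(digits)
    ranked = ranked_domain(freq, descending=kind.lower() == "top")
    if kind.islower():
        return set(ranked[:n])  # lowercase: any of the top/bottom N values
    return set(ranked[n - 1:n])  # uppercase: only the Nth-ranked value


def find(rows, column, token):
    freq = Counter(row[column] for row in rows)
    wanted = resolve_token(freq, token)
    if wanted is None:
        wanted = {token}
    return [row for row in rows if row[column] in wanted]


fruits = ["apple", "strawberry", "apple", "banana", "strawberry",
          "apple", "cherry", "banana", "strawberry", "apple"]
rows = [{"id": str(i), "fruits_column": f} for i, f in enumerate(fruits, 1)]
freq = Counter(fruits)  # apple 4, strawberry 3, banana 2, cherry 1

print(sorted(resolve_token(freq, "_top3")))  # ['apple', 'banana', 'strawberry']
print(resolve_token(freq, "_TOP3"))          # {'banana'}
print([r["id"] for r in find(rows, "fruits_column", "_TOP3")])  # ['4', '8']
```

Resolving the token against the small domain first means the row scan itself is always a single set-membership test, whichever variant was requested.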

This was inspired by #2056

@jqnatividad added the enhancement label Aug 22, 2024
@jqnatividad
Owner Author

If a regular expression is used, it will only be applied to the domain values of the column, as compiled in the frequency table cache.
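A minimal Python sketch of that idea, assuming the frequency table is available as a `Counter` of distinct values (the helper name `find_regex` is hypothetical, and `re.search` semantics rather than a full match are an assumption). The regex is evaluated once per distinct domain value; the rows are then selected with a set-membership test, so a column with millions of rows but a few hundred distinct values needs only a few hundred regex evaluations.

```python
import re
from collections import Counter


def find_regex(rows, column, pattern):
    # Hypothetical stand-in for the cached frequency table (distinct values only).
    freq = Counter(row[column] for row in rows)
    regex = re.compile(pattern)
    # The regex runs once per distinct domain value, not once per row.
    matching = {value for value in freq if regex.search(value)}
    # Rows are then selected by a cheap set-membership test.
    return [row for row in rows if row[column] in matching]


rows = [
    {"id": str(i), "fruits_column": f}
    for i, f in enumerate(["apple", "banana", "blueberry", "apple", "cherry"], 1)
]

print([r["id"] for r in find_regex(rows, "fruits_column", r"^b")])  # ['2', '3']
```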

@jqnatividad changed the title from find command to find "smart" command Sep 5, 2024