Feature Request: Improve search accuracy for repeatable compound fields #10935

Open
vera opened this issue Oct 18, 2024 · 0 comments
Labels
Type: Feature a feature request

vera commented Oct 18, 2024

Overview of the Feature Request

I would like to request the ability to search for metadata records where multiple subfields within a single repetition of a repeatable metadata field can be matched in the search query. For example, in the "contributor" field within the citation block that contains the subfields "contributorName" and "contributorType," I would like to search for records where both subfields are matched in the same "contributor" instance. Currently, searches may match "name" in one contributor block and "type" in another, which leads to inaccurate search results.

Example: searching for datasets with contributions from research groups with names including "germany"

https://dataverse.harvard.edu/dataverse/harvard/?q=contributorName%3Agermany+AND+contributorType%3A%22Research+group%22

The first result (https://data.cipotato.org/dataset.xhtml?persistentId=doi:10.21223/P3/8J2ALV) doesn't match my intended query. It has a contributor that is a research group, and another contributor whose name includes "germany", but I wanted both to match in a single contributor.

image

(I tried to find a good example. I am sorry that this one is somewhat contrived, but I hope it illustrates the feature request.)

What kind of user is the feature intended for?
(Example user roles: API User, Curator, Depositor, Guest, Superuser, Sysadmin)

Anyone who is doing search queries (API User, Guest)

What inspired the request?

In searching for specific contributors (e.g., matching both their name and their role/type), the current system allows cross-matching across different repetitions of the same repeatable block. This makes it difficult to retrieve precise results.
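The cross-matching can be reproduced outside of Dataverse with a small sketch. It assumes that each subfield is indexed as an independent multi-valued field on the dataset document, which would explain the behavior described above; the sample contributors and the function names are hypothetical and only serve as illustration.

```python
# Hypothetical dataset with two contributor repetitions (illustrative data only).
dataset = {
    "contributors": [
        {"contributorName": "Example Institute Germany", "contributorType": "Funder"},
        {"contributorName": "Example Potato Lab", "contributorType": "Research Group"},
    ]
}


def flatten(record):
    """Collect each subfield into its own multi-valued list.

    Assumption: this mirrors how the subfields end up in the search index,
    so the link between a name and its type is lost.
    """
    flat = {}
    for contributor in record["contributors"]:
        for subfield, value in contributor.items():
            flat.setdefault(subfield, []).append(value)
    return flat


def matches_flat(flat, name_term, type_value):
    """Match name and type independently, as a query on flattened fields would."""
    name_hit = any(name_term.lower() in n.lower() for n in flat.get("contributorName", []))
    type_hit = any(type_value.lower() == t.lower() for t in flat.get("contributorType", []))
    return name_hit and type_hit


def matches_same_instance(record, name_term, type_value):
    """Require name and type to match within one contributor repetition."""
    return any(
        name_term.lower() in c["contributorName"].lower()
        and type_value.lower() == c["contributorType"].lower()
        for c in record["contributors"]
    )


flat_hit = matches_flat(flatten(dataset), "germany", "Research Group")
same_instance_hit = matches_same_instance(dataset, "germany", "Research Group")
print(flat_hit, same_instance_hit)
```

The flattened check reports a hit even though no single contributor is both a research group and named "germany", while the per-instance check does not. The second behavior is what this request asks for.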

What existing behavior do you want changed?

-

Any brand new behavior do you want to add to Dataverse?

I would like to add support for a query syntax that restricts a search so that all subfield matches must occur within the same instance of the repeatable field.

Any open or closed issues related to this feature request?

I didn't find any.

Are you thinking about creating a pull request for this feature?

Possibly. I've already done some brief research on how Solr might support this kind of search, and it seems that each repetition of a repeatable block must be stored as a separate document in the index:

  1. https://yonik.com/solr-nested-objects/
  2. https://solr.apache.org/guide/8_0/searching-nested-documents.html, https://solr.apache.org/guide/8_0/indexing-nested-documents.html
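As a rough sketch of what the nested-document approach from these links could look like: each contributor repetition becomes a child document of the dataset, and a Solr block join parent query (`{!parent which=...}`) returns datasets that have one child matching all clauses. The `doc_type` discriminator field, the child ID scheme, the sample DOI, and the helper names are assumptions made for illustration; they are not part of the current Dataverse Solr schema.

```python
import json


def to_nested_doc(dataset_id, title, contributors):
    """Build a Solr JSON document with one child document per contributor.

    Assumptions: a hypothetical 'doc_type' field separates parents from
    children, and children are attached via the '_childDocuments_' key.
    """
    children = [
        {
            "id": f"{dataset_id}!contributor{i}",  # hypothetical child ID scheme
            "doc_type": "contributor",
            "contributorName": c["contributorName"],
            "contributorType": c["contributorType"],
        }
        for i, c in enumerate(contributors)
    ]
    return {
        "id": dataset_id,
        "doc_type": "dataset",
        "title": title,
        "_childDocuments_": children,
    }


def same_instance_query(name_term, type_value):
    """Return a block join query that matches parent datasets having a single
    child in which both the name and the type clause match."""
    child_clause = f'+contributorName:{name_term} +contributorType:"{type_value}"'
    return '{!parent which="doc_type:dataset"}' + child_clause


# Illustrative data only; the DOI uses a test prefix and is not a real dataset.
doc = to_nested_doc(
    "doi:10.5072/FK2/EXAMPLE",
    "Example dataset",
    [
        {"contributorName": "Example Institute Germany", "contributorType": "Funder"},
        {"contributorName": "Example Potato Lab", "contributorType": "Research Group"},
    ],
)
query = same_instance_query("germany", "Research Group")
print(json.dumps(doc, indent=2))
print(query)
```

With this layout, the query above would not return the example dataset, because no single child document satisfies both clauses. Whether Dataverse could adopt such a layout without affecting existing facets and field queries is an open question.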
@vera vera added the Type: Feature a feature request label Oct 18, 2024