WIP: Constrained sampling based on EBNF grammars #354
Matches the LlamaCPP behavior. I finished the EBNF parser, which encodes the grammar in the same way as the implementation in huggingface/transformers#27557.
Unfortunately, I think we may have to refactor or redesign much of the logic for processing acceptances so that it is compatible with XLA. I know we can mimic a stack-like data structure (which I started to do), and I believe we can mimic the trie and the containers as well. What I'm unsure about is whether something like https://github.com/huggingface/transformers/pull/27557/files#diff-b7135bf8eda80faf271e4c9588eae893ebad019d2508df2f0afbe5b7ad5bbf4eR389 can be implemented in a non-recursive way. That is, unless my understanding is incorrect and the incremental grammar acceptance process is actually fixed for a given grammar, in which case we could "compile" the acceptance up front.
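
To make the non-recursive idea concrete, here is a rough sketch in plain Python of how a recursive expansion of rule references might be replaced by a loop over a preallocated, fixed-capacity stack. Everything in it is an assumption for illustration only: the `GRAMMAR` encoding, `MAX_DEPTH`, and `first_terminals` are hypothetical and are not the format or functions used in this PR or in huggingface/transformers#27557. It also assumes every alternative is non-empty.

```python
# Hypothetical toy encoding (NOT the PR's or transformers' actual format):
# each rule maps to a list of alternatives, each alternative is a list of
# symbols, and a symbol is ("t", char) for a terminal or ("r", name) for a
# reference to another rule. Alternatives are assumed to be non-empty.
GRAMMAR = {
    "root": [[("r", "greeting"), ("t", "!")]],
    "greeting": [[("t", "h"), ("t", "i")], [("r", "formal")]],
    "formal": [[("t", "H"), ("t", "e")]],
}

# Fixed capacity, standing in for a statically shaped buffer.
MAX_DEPTH = 8


def first_terminals(grammar, start):
    """Collect the terminals that can begin `start`, without recursion."""
    stack = [None] * MAX_DEPTH  # preallocated buffer, never resized
    top = 0                     # number of live entries in the buffer
    stack[top] = start
    top += 1
    seen = set()    # rules already expanded, guards against cycles
    result = set()
    while top > 0:
        # Pop the most recently pushed rule.
        top -= 1
        rule = stack[top]
        if rule in seen:
            continue
        seen.add(rule)
        for alternative in grammar[rule]:
            kind, value = alternative[0]
            if kind == "t":
                result.add(value)
            else:
                # Push the referenced rule instead of recursing into it.
                if top >= MAX_DEPTH:
                    raise OverflowError("stack capacity exceeded")
                stack[top] = value
                top += 1
    return result
```

In this toy case the result depends only on the grammar, so it could be computed once before sampling starts. Whether the real acceptance logic can be precomputed in the same way is exactly the open question above.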